Knowledge Layer¶

This directory is the curated layer that sits between raw source documents and model training.

Raw documents tell the system what exists in your world. The files in knowledge/ tell the system what you actually believe, prefer, reject, and want repeated consistently.

That distinction matters:

data/raw/ and data/processed/ are evidence stores
knowledge/ is your high-signal operating manual
data/training/ is where that knowledge gets turned into SFT and refusal examples

What belongs here¶

your preferred response style
your domain-specific opinions and heuristics
default architecture choices
tool preferences and anti-patterns
finance, tax, and governance cautions you want repeated
explicit boundary rulings for difficult mixed-domain questions

Suggested workflow¶

edit persona.md
fill in the relevant files listed in domains/README.md
create training examples using docs/10_dataset_design_examples.md
record a versioned snapshot under snapshots/README.md
build or refresh your LoRA dataset

Directory layout¶

Path	Purpose
`persona.md`	How the assistant should sound and reason
`domains/`	One canonical file per professional domain cluster
`templates/`	Reusable authoring templates
`snapshots/`	Versioning notes that tie knowledge state to dataset and adapter runs

Best practice¶

Do not try to fine-tune directly from all raw documents. Use raw documents mainly for RAG and evidence retrieval. Use curated knowledge files to encode:

your stable opinions
your preferred decision logic
the exact type of concise or detailed answers you want

That combination usually produces a much better personal assistant than training on noisy source material alone.