Knowledge Layer¶
This directory is the curated layer that sits between raw source documents and model training.
Raw documents tell the system what exists in your world. The files in knowledge/ tell the system what you actually believe, prefer, reject, and want repeated consistently.
That distinction matters:
data/raw/anddata/processed/are evidence storesknowledge/is your high-signal operating manualdata/training/is where that knowledge gets turned into SFT and refusal examples
What belongs here¶
- your preferred response style
- your domain-specific opinions and heuristics
- default architecture choices
- tool preferences and anti-patterns
- finance, tax, and governance cautions you want repeated
- explicit boundary rulings for difficult mixed-domain questions
Suggested workflow¶
- edit persona.md
- fill in the relevant files listed in domains/README.md
- create training examples using docs/10_dataset_design_examples.md
- record a versioned snapshot under snapshots/README.md
- build or refresh your LoRA dataset
Directory layout¶
| Path | Purpose |
|---|---|
persona.md |
How the assistant should sound and reason |
domains/ |
One canonical file per professional domain cluster |
templates/ |
Reusable authoring templates |
snapshots/ |
Versioning notes that tie knowledge state to dataset and adapter runs |
Best practice¶
Do not try to fine-tune directly from all raw documents. Use raw documents mainly for RAG and evidence retrieval. Use curated knowledge files to encode:
- your stable opinions
- your preferred decision logic
- the exact type of concise or detailed answers you want
That combination usually produces a much better personal assistant than training on noisy source material alone.