11 Personal Knowledge¶

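This guide explains how to build the part of the repository that is actually personal: the curated layer that records your preferences, defaults, and judgment rather than what your documents say.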

Why this layer matters¶

If you only ingest raw files, the system will know what documents say. It will not reliably know:

your preferred defaults
which tradeoffs you care about most
which tools you trust or reject
how cautious or challenging the assistant should be
what answer structure feels useful to you

That is why the repository includes knowledge/ as a curated layer.

Three knowledge tiers¶

Tier 1: raw evidence¶

Location:

data/raw/
data/processed/

Use for:

RAG
citations
source-of-truth lookup

Tier 2: curated personal knowledge¶

Location:

knowledge/persona.md
knowledge/domains/README.md

Use for:

your stable opinions
default recommendations
tone and response structure
anti-patterns and heuristics

Tier 3: training-ready examples¶

Location:

data/training/
data/training/examples/README.md

Use for:

converting the curated layer into SFT, refusal, and boundary-case examples
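The three tiers map onto a handful of paths, so a quick layout check can show which tier is still empty. The following is a minimal sketch, not part of the repository: the function name `missing_tier_paths` and the tier labels used as dictionary keys are illustrative assumptions, while the paths are the ones listed above.

```python
from pathlib import Path

# Paths come from this guide; the tier labels (keys) are illustrative names.
TIERS = {
    "raw_evidence": ["data/raw", "data/processed"],
    "curated_knowledge": ["knowledge/persona.md", "knowledge/domains/README.md"],
    "training_examples": ["data/training", "data/training/examples/README.md"],
}


def missing_tier_paths(repo_root):
    """Return {tier: [paths that do not exist]} for every tier with gaps."""
    root = Path(repo_root)
    missing = {}
    for tier, paths in TIERS.items():
        absent = [p for p in paths if not (root / p).exists()]
        if absent:
            missing[tier] = absent
    return missing
```

Calling `missing_tier_paths(".")` from the repository root returns an empty dictionary once every listed path exists.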

Recommended authoring order¶

1. define the persona
2. write domain files for your strongest knowledge areas
3. mark boundary cases that should be allowed or refused
4. create a small but high-quality training set
5. snapshot the knowledge state before training

What to write in domain files¶

Each domain file should capture:

default recommendation
core heuristics
preferred tools or patterns
anti-patterns
exceptions to your usual rule
realistic example Q&A

Use knowledge/templates/domain_knowledge_template.md as the baseline.
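One way to keep domain files complete is to check a draft against the six items listed above. This is a sketch under a stated assumption: the actual layout of the template is not shown here, so the check simply treats a section as present if any line of the file contains its name. The names `REQUIRED_SECTIONS` and `missing_sections` are hypothetical.

```python
# Section names paraphrase the list in this guide; the template's real
# headings may differ, so adjust these strings to match it.
REQUIRED_SECTIONS = [
    "default recommendation",
    "core heuristics",
    "preferred tools or patterns",
    "anti-patterns",
    "exceptions",
    "example q&a",
]


def missing_sections(domain_text):
    """Return required section names that appear on no line of the text."""
    # Assumption: a case-insensitive substring match on any line is enough.
    lines = [line.strip().lower() for line in domain_text.splitlines()]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(name in line for line in lines)
    ]
```

An empty result means every item has at least a heading; it says nothing about whether the content under that heading is any good.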

How much detail is enough¶

For the first useful adapter, aim for:

one persona file
three to five strong domain files
twenty to fifty carefully written training pairs derived from them

That is usually more valuable than hundreds of noisy auto-generated examples.
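The targets above can be turned into a small advisory check. The sketch below assumes you have already counted the files and training pairs yourself; the function name `check_first_adapter_scope` is hypothetical, and the ranges are the guide's suggestions rather than hard limits.

```python
def check_first_adapter_scope(persona_files, domain_files, training_pairs):
    """Compare counts with the guide's first-adapter targets.

    Returns a list of advisory messages; an empty list means all
    three counts fall inside the suggested ranges.
    """
    notes = []
    if persona_files != 1:
        notes.append(f"expected 1 persona file, found {persona_files}")
    if not 3 <= domain_files <= 5:
        notes.append(f"expected 3-5 domain files, found {domain_files}")
    if not 20 <= training_pairs <= 50:
        notes.append(f"expected 20-50 training pairs, found {training_pairs}")
    return notes
```

Treat the messages as prompts to review scope, not as failures: a count above the range mostly signals that quality may be harder to maintain.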

Versioning strategy¶

Best practice for a first-time user:

keep one current working version in knowledge/
before each serious LoRA run, create a small snapshot note in knowledge/snapshots/README.md
in that note, record the persona, changed domain files, guardrail profile, dataset path, and evaluation report

This is simpler and safer than trying to build a complex versioning system immediately.
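A snapshot note can be as simple as a few lines of plain text. The sketch below renders one from the five items named above; the note layout, the function name `build_snapshot_note`, and every file name in the usage example are assumptions made for illustration, not a format the repository prescribes.

```python
from datetime import date


def build_snapshot_note(persona, changed_domains, guardrail_profile,
                        dataset_path, evaluation_report, on=None):
    """Render a plain-text snapshot note covering the five listed items."""
    on = on or date.today()
    lines = [
        f"Snapshot {on.isoformat()}",
        f"persona: {persona}",
        "changed domain files:",
    ]
    # One indented line per domain file that changed since the last run.
    lines += [f"  {path}" for path in changed_domains]
    lines += [
        f"guardrail profile: {guardrail_profile}",
        f"dataset path: {dataset_path}",
        f"evaluation report: {evaluation_report}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All paths and the profile name below are hypothetical examples.
    print(build_snapshot_note(
        persona="knowledge/persona.md",
        changed_domains=["knowledge/domains/databases.md"],
        guardrail_profile="default",
        dataset_path="data/training/run1.jsonl",
        evaluation_report="reports/eval1.md",
    ))
```

Appending the returned text to the snapshot file before each run keeps a readable history without any extra tooling.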

What should stay out of LoRA data¶

Usually keep these in RAG only unless they are very stable:

long vendor docs
transient tickets
stale PDFs
duplicate notes
one-off spreadsheets with shifting assumptions

Fine-tune only on material you want the model to repeat consistently.

Signals that a topic belongs in curated knowledge¶

you have a strong default answer
you want the same framing every time
the advice is more about judgment than about memorizing facts
you want the response style to feel like your own
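The four signals above can be used as a quick checklist when triaging a topic. The following sketch counts how many signals hold and suggests a layer; the guide does not define a cutoff, so the `threshold` of two is an arbitrary assumption, and the names `TopicSignals` and `suggest_layer` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopicSignals:
    strong_default_answer: bool
    same_framing_every_time: bool
    judgment_over_facts: bool
    personal_style_matters: bool


def suggest_layer(signals, threshold=2):
    """Suggest 'curated' when enough signals hold, otherwise 'rag_only'.

    The threshold is an assumed cutoff, not a rule from the guide.
    """
    hits = sum([
        signals.strong_default_answer,
        signals.same_framing_every_time,
        signals.judgment_over_facts,
        signals.personal_style_matters,
    ])
    return "curated" if hits >= threshold else "rag_only"
```

The result is a starting point for your own decision; a single very strong signal can still justify a curated entry.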
