Architecture Overview¶

Design principles¶

Keep the first-run experience simple on Apple Silicon.
Preserve stable interfaces so the local quickstart can evolve into a more production-style deployment.
Keep training adapter-based and retrieval-grounded.
Bias the system strongly toward professional domains through multiple independent control layers.
Keep raw data private and minimize unnecessary export to external compute.

Diagrams¶

System overview¶

flowchart TD
    user["User / Agent Client"] --> api["FastAPI Orchestrator"]
    api --> router["Topic Router + Policy Validator"]
    router -->|allowed| retriever["Retriever"]
    router -->|disallowed| refusal["Refusal Policy"]
    retriever --> vectordb["Chroma or Qdrant"]
    vectordb --> chunks["Allowed Domain Chunks"]
    api --> backend["MLX / Ollama / llama.cpp"]
    backend --> adapters["LoRA Adapters"]
    connectors["Drive / Email / Linkwarden / Files / Code"] --> ingest["Ingestion Pipelines"]
    ingest --> processed["Processed JSONL + Parquet"]
    processed --> vectordb
    processed --> training["Training Dataset Builder"]
    training --> adapters

Ingestion flow¶

flowchart LR
    source["Source Connectors"] --> raw["data/raw"]
    raw --> extract["Text Extraction"]
    extract --> dedup["Checksum + Near-Duplicate Detection"]
    dedup --> classify["Domain Classification"]
    classify --> chunk["Chunking + Metadata Tags"]
    chunk --> manifest["JSONL Manifest"]
    chunk --> parquet["Parquet Export"]
    manifest --> embed["Embedding Jobs"]
    embed --> vector["Vector Store"]

Training flow¶

flowchart TD
    corpus["Processed Chunks"] --> curate["Dataset Curation"]
    prompts["Refusal + Professional Prompt Assets"] --> curate
    curate --> sft["SFT JSONL"]
    curate --> eval["Eval JSONL"]
    sft --> mlx["Local MLX LoRA"]
    sft --> runpod["Remote RunPod Axolotl/PEFT"]
    mlx --> adapters["Adapters + Metrics"]
    runpod --> adapters

Serving sequence¶

sequenceDiagram
    participant C as Client
    participant A as API
    participant P as Policy
    participant R as Retriever
    participant V as Vector DB
    participant M as Model Backend
    C->>A: chat.completions
    A->>P: classify query
    alt disallowed
        P-->>A: refuse + redirect
        A-->>C: refusal response
    else allowed
        A->>R: retrieve context
        R->>V: semantic search + metadata filter
        V-->>R: top chunks
        R-->>A: citations + context
        A->>M: final prompt
        M-->>A: answer
        A->>P: validate output
        A-->>C: grounded response
    end

Security boundaries¶

flowchart TD
    raw["Raw Personal Documents"] --> secure["Encrypted Local Storage"]
    secure --> process["Local Processing"]
    process --> filtered["Approved Training / Retrieval Corpus"]
    filtered --> local["Local MLX / Chroma / API"]
    filtered --> remote["Optional Remote GPU Export"]
    remote --> review["Explicit Approval + Redaction Review"]
    review --> adapter["Remote LoRA Job"]
    classDef sensitive fill:#fbe9e7,stroke:#bf360c,color:#000;
    classDef controlled fill:#e8f5e9,stroke:#1b5e20,color:#000;
    class raw,secure sensitive;
    class process,filtered,local,remote,review,adapter controlled;

Components¶

Connectors acquire or scan sources and write immutable raw artifacts plus normalized source manifests.
Pipelines extract text, deduplicate, chunk, and classify domains before exporting processed corpora.
Training builds SFT and refusal datasets for MLX or remote Axolotl/PEFT jobs.
RAG indexes allowed-domain chunks and filters disallowed or sensitive data by policy.
The API orchestrator classifies every query before retrieval or generation.