Architecture Overview

Design principles

  • Keep the first-run experience simple on Apple Silicon.
  • Preserve stable interfaces so the local quickstart can evolve into a more production-style deployment.
  • Keep training adapter-based and retrieval-grounded.
  • Bias the system strongly toward professional domains through multiple independent control layers.
  • Keep raw data private and minimize unnecessary export to external compute.

Diagrams

System overview

```mermaid
flowchart TD
    user["User / Agent Client"] --> api["FastAPI Orchestrator"]
    api --> router["Topic Router + Policy Validator"]
    router -->|allowed| retriever["Retriever"]
    router -->|disallowed| refusal["Refusal Policy"]
    retriever --> vectordb["Chroma or Qdrant"]
    vectordb --> chunks["Allowed Domain Chunks"]
    api --> backend["MLX / Ollama / llama.cpp"]
    backend --> adapters["LoRA Adapters"]
    connectors["Drive / Email / Linkwarden / Files / Code"] --> ingest["Ingestion Pipelines"]
    ingest --> processed["Processed JSONL + Parquet"]
    processed --> vectordb
    processed --> training["Training Dataset Builder"]
    training --> adapters
```
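The routing step in the overview can be sketched as a small classifier that every request passes through before retrieval or generation. This is a minimal illustration only: the domain names, keyword lists, and refusal message below are placeholders, not the project's actual policy, and a real router would use a trained classifier rather than keyword overlap.

```python
# Illustrative topic router: decide "allowed + domain" or "disallowed + refusal"
# before any retrieval or generation happens. Keyword sets are stand-ins for a
# real policy model.
ALLOWED_DOMAINS = {
    "engineering": {"deploy", "pipeline", "architecture"},
    "research": {"dataset", "evaluation", "benchmark"},
}
REFUSAL = "This assistant only answers questions in approved professional domains."

def route(query: str) -> dict:
    """Return a routing decision for one query."""
    tokens = set(query.lower().split())
    for domain, keywords in ALLOWED_DOMAINS.items():
        if tokens & keywords:
            return {"allowed": True, "domain": domain}
    return {"allowed": False, "refusal": REFUSAL}
```

Because the decision happens in the orchestrator rather than the model, disallowed queries never reach the retriever or the backend.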

Ingestion flow

```mermaid
flowchart LR
    source["Source Connectors"] --> raw["data/raw"]
    raw --> extract["Text Extraction"]
    extract --> dedup["Checksum + Near-Duplicate Detection"]
    dedup --> classify["Domain Classification"]
    classify --> chunk["Chunking + Metadata Tags"]
    chunk --> manifest["JSONL Manifest"]
    chunk --> parquet["Parquet Export"]
    manifest --> embed["Embedding Jobs"]
    embed --> vector["Vector Store"]
```
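The dedup and chunking stages of this flow can be sketched as follows. The checksum step is exact-duplicate detection via SHA-256; near-duplicate detection (e.g. MinHash) would be a separate pass not shown here. The chunk size and overlap values are illustrative defaults, not the project's configuration.

```python
import hashlib

def checksum(text: str) -> str:
    """Stable content hash used for exact-duplicate detection."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def dedup(docs: list[str]) -> list[str]:
    """Drop exact duplicates, preserving first-seen order."""
    seen, unique = set(), []
    for doc in docs:
        digest = checksum(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```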

Training flow

```mermaid
flowchart TD
    corpus["Processed Chunks"] --> curate["Dataset Curation"]
    prompts["Refusal + Professional Prompt Assets"] --> curate
    curate --> sft["SFT JSONL"]
    curate --> eval["Eval JSONL"]
    sft --> mlx["Local MLX LoRA"]
    sft --> runpod["Remote RunPod Axolotl/PEFT"]
    mlx --> adapters["Adapters + Metrics"]
    runpod --> adapters
```
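The curation step, which merges processed chunks with refusal prompt assets into one SFT JSONL file, can be sketched like this. The record schema (`prompt`/`completion`), the refusal wording, and the function names are assumptions for illustration; real curation would generate proper instruction pairs rather than the trivial "summarize" prompt used here.

```python
import json

REFUSAL = "I can only help with approved professional topics."

def build_sft_records(chunks: list[dict], refusal_prompts: list[str]) -> list[dict]:
    """Mix grounded pairs from processed chunks with refusal examples,
    mirroring the two curation inputs in the diagram above."""
    records = [
        {"prompt": f"Summarize this passage:\n{c['text']}", "completion": c["text"]}
        for c in chunks
    ]
    records += [{"prompt": p, "completion": REFUSAL} for p in refusal_prompts]
    return records

def write_jsonl(records: list[dict], path: str) -> None:
    """One JSON object per line, as expected by MLX LoRA and Axolotl loaders."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
```

Training refusal examples alongside grounded pairs is what lets the adapter itself reinforce the policy layer, rather than relying on the router alone.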

Serving sequence

```mermaid
sequenceDiagram
    participant C as Client
    participant A as API
    participant P as Policy
    participant R as Retriever
    participant V as Vector DB
    participant M as Model Backend
    C->>A: chat.completions
    A->>P: classify query
    alt disallowed
        P-->>A: refuse + redirect
        A-->>C: refusal response
    else allowed
        A->>R: retrieve context
        R->>V: semantic search + metadata filter
        V-->>R: top chunks
        R-->>A: citations + context
        A->>M: final prompt
        M-->>A: answer
        A->>P: validate output
        A-->>C: grounded response
    end
```
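The "semantic search + metadata filter" step in this sequence can be sketched without any vector-store dependency: filter candidates on metadata first, then rank the survivors by cosine similarity. The in-memory index structure and field names below are hypothetical; in the real system this query would go to Chroma or Qdrant.

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec: list[float], index: list[dict], domain: str, k: int = 2) -> list[dict]:
    """Metadata filter first, then similarity ranking, as in the sequence above."""
    candidates = [e for e in index if e["meta"]["domain"] == domain]
    candidates.sort(key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return candidates[:k]
```

Filtering on domain metadata before ranking is what keeps disallowed or sensitive chunks out of the context window even when they would score highly on similarity alone.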

Security boundaries

```mermaid
flowchart TD
    raw["Raw Personal Documents"] --> secure["Encrypted Local Storage"]
    secure --> process["Local Processing"]
    process --> filtered["Approved Training / Retrieval Corpus"]
    filtered --> local["Local MLX / Chroma / API"]
    filtered --> remote["Optional Remote GPU Export"]
    remote --> review["Explicit Approval + Redaction Review"]
    review --> adapter["Remote LoRA Job"]
    classDef sensitive fill:#fbe9e7,stroke:#bf360c,color:#000;
    classDef controlled fill:#e8f5e9,stroke:#1b5e20,color:#000;
    class raw,secure sensitive;
    class process,filtered,local,remote,review,adapter controlled;
```
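The "explicit approval + redaction review" gate before any remote export can be sketched as below. The redaction pattern (emails only) and the approval flag are illustrative stand-ins: a real review would cover more PII categories and record who approved the export.

```python
import re

# Illustrative PII pattern; a real redaction pass would cover far more than emails.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def redact(text: str) -> str:
    """Replace email addresses with a placeholder before data leaves the machine."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

def export_for_remote(records: list[str], approved: bool) -> list[str]:
    """Remote GPU export is opt-in: refuse outright without explicit approval,
    and apply redaction to everything that does go out."""
    if not approved:
        raise PermissionError("remote export requires explicit approval")
    return [redact(r) for r in records]
```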

Components

  • Connectors acquire or scan sources and write immutable raw artifacts plus normalized source manifests.
  • Pipelines extract text, deduplicate, chunk, and classify domains before exporting processed corpora.
  • Training builds SFT and refusal datasets for MLX or remote Axolotl/PEFT jobs.
  • RAG indexes allowed-domain chunks and filters disallowed or sensitive data by policy.
  • The API orchestrator classifies every query before retrieval or generation.
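To make the "normalized source manifests" concrete, here is a hypothetical shape for one record in the processed JSONL manifest. Every field name below is an assumption for illustration, not the project's actual schema.

```python
import hashlib
import json

# Hypothetical manifest record: one processed chunk with its provenance,
# domain tag, and content checksum.
text = "Deploy the service behind the API orchestrator."
record = {
    "chunk_id": "doc42-0003",                               # assumed ID scheme
    "source": "drive",                                      # originating connector
    "domain": "engineering",                                # domain classification result
    "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
    "text": text,
}
line = json.dumps(record)  # one line of the JSONL manifest
```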