00 Stack Primer¶
This repository uses a few tools whose roles are easy to confuse when you first encounter them. This document explains what each one does in plain language, when you need it, and when you can ignore it.
The simplest mental model¶
Your personal LLM system has five separate jobs:
- collect and clean your documents
- capture your own stable opinions and response style
- search documents when the model needs context
- run or fine-tune the model
- expose the whole thing through a local API
The named tools in this repo map to those jobs like this:
- MLX: runs and fine-tunes models on Apple Silicon
- Ollama: runs local models behind a simple HTTP API
- Qdrant: stores embeddings so the system can retrieve relevant chunks later
- RunPod: rents remote GPU machines when your laptop is not enough
The curated knowledge layer for job 2 lives in knowledge/README.md. That layer is what makes the system personal rather than just document-aware.
What each tool is¶
MLX¶
MLX is Apple’s machine learning framework for Apple Silicon. In this repository it is the default local path for running models and for local LoRA fine-tuning, because it makes good use of the Mac’s unified memory.
Use MLX when:
- you want to run and tune models locally on your Mac
- you want the most Apple-native path in this repo
You can ignore MLX when:
- you only want to call a model managed by Ollama
- you move training entirely to remote GPUs
Ollama¶
Ollama is a local model runner. Think of it as a convenient local model server with a simple API. It downloads and serves models for you and exposes endpoints you can call from code.
Use Ollama when:
- you want a simple local API for chat or embeddings
- you want easier model management than wiring every backend yourself
You can ignore Ollama when:
- you are happy with the MLX-first local path
- you do not need a separate local model server yet
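Because Ollama’s role here is just “a simple HTTP API”, calling it from code needs nothing beyond the standard library. A minimal sketch, assuming a default Ollama install on localhost:11434 (the endpoint path follows the Ollama API docs; the model tag qwen2.5:7b-instruct is an assumption and must already be pulled):

```python
import json
import urllib.request  # used by the commented-out request below

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address

def build_chat_request(model, user_message):
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": False,  # ask for one complete response, not a stream
    }

body = build_chat_request("qwen2.5:7b-instruct", "Summarize my notes on NFS mounts.")

# Uncomment to send the request against a running Ollama server:
# req = urllib.request.Request(
#     f"{OLLAMA_URL}/api/chat",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["message"]["content"])
```

The same server also exposes an embeddings endpoint, which is what makes Ollama a candidate backend for the retrieval job as well as the chat job.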
Qdrant¶
Qdrant is a vector database. It stores embeddings plus metadata so the RAG system can retrieve relevant chunks under filters such as “only infrastructure notes from the last six months” or “only finance documents tagged confidential”.
Use Qdrant when:
- you want stronger metadata filtering and a more production-style retrieval backend
- your corpus is growing and you want a dedicated vector store
You can ignore Qdrant when:
- you are starting with the easier local path
- Chroma is good enough for your first version
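The metadata filtering mentioned above is expressed in Qdrant as a JSON filter attached to the search request. A sketch of the “only infrastructure notes from the last six months” example, assuming each stored point carries doc_type and a created_at unix timestamp in its payload (both field names are illustrative, not this repo’s actual schema):

```python
from datetime import datetime, timedelta, timezone

def build_qdrant_filter(doc_type, max_age_days):
    """Build a Qdrant-style filter payload: exact match on one metadata
    field plus a date-range condition on another."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    return {
        "must": [  # all conditions must hold
            {"key": "doc_type", "match": {"value": doc_type}},
            {"key": "created_at", "range": {"gte": cutoff.timestamp()}},
        ]
    }

f = build_qdrant_filter("infrastructure", 180)  # roughly "last six months"
```

The same dict can be passed as the filter body of a Qdrant search call, whether you use the REST API directly or a client library.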
RunPod¶
RunPod is a GPU cloud service. In this repository it is only for optional remote training. It is not required for local RAG or local inference.
Use RunPod when:
- local training on your Mac is too slow
- you want to run a larger adapter-training job, especially for 14B-class models
You can ignore RunPod when:
- you are building your first local version
- you are staying with local MLX LoRA training
What you actually need for v1¶
For a first complete local build, you only need:
- your documents on local disk or an NFS mount
- your emails as mbox or .eml
- your curated knowledge files in knowledge/
- MLX
- Chroma
- FastAPI
Everything else is optional.
Recommended best practices¶
For a first-time user, I recommend:
- profile-driven model selection
- snapshot-based ingestion
Profile-driven means you pick a model profile such as qwen2.5-7b-instruct, and the repository chooses the matching backend for that profile. This avoids invalid combinations such as pointing an MLX-only model name at an Ollama runtime by accident.
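Profile-driven selection can be as small as a lookup table that refuses unknown names. This is a hypothetical sketch: the profile names come from this document, but the table contents and function are illustrative, not the repo’s actual config loader:

```python
# Illustrative profile table: each profile pins both a backend and a model id,
# so an MLX-only model can never be pointed at an Ollama runtime by accident.
PROFILES = {
    "qwen2.5-3b-instruct": {"backend": "mlx", "model": "Qwen2.5-3B-Instruct"},
    "qwen2.5-7b-instruct": {"backend": "mlx", "model": "Qwen2.5-7B-Instruct"},
}

def resolve_profile(name):
    """Return the backend and model id for a profile, failing loudly on typos."""
    try:
        return PROFILES[name]
    except KeyError:
        raise ValueError(f"Unknown profile {name!r}; choose from {sorted(PROFILES)}")

profile = resolve_profile("qwen2.5-7b-instruct")
```

Failing loudly on an unknown profile name is the point: the invalid combination is caught at startup rather than surfacing as a confusing runtime error from the wrong backend.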
For a fast full-process dry run, use qwen2.5-3b-instruct. It is the smallest practical real-model path in the repo and keeps you in the same Qwen family as the default 7B target.
Snapshot-based ingestion means the system copies local and NFS files into data/raw/ before extraction. This is the best default for reproducibility because your processed corpus can always be traced back to the exact raw bytes that were indexed or used for training.
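The snapshot step can be sketched with the standard library alone: copy each file into data/raw/ and record a content hash, so processed output can later be traced to exact raw bytes. This is a minimal illustration, not the repo’s actual connector (which may organize the snapshot differently); it also flattens subdirectories, so name collisions are ignored here:

```python
import hashlib
import shutil
from pathlib import Path

def snapshot(src_dir, raw_dir="data/raw"):
    """Copy every file under src_dir into raw_dir and return a manifest
    mapping file name -> SHA-256 of the copied bytes."""
    raw = Path(raw_dir)
    raw.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for f in Path(src_dir).rglob("*"):
        if f.is_file():
            dest = raw / f.name  # flat layout; a real connector would preserve paths
            shutil.copy2(f, dest)  # copy2 keeps modification times
            manifest[f.name] = hashlib.sha256(dest.read_bytes()).hexdigest()
    return manifest
```

All later extraction reads only from the snapshot, never from the original location, which is exactly what makes the pipeline reproducible when the source changes.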
NFS and email archives¶
If your documents live on a standard NFS share, mount it on macOS and point the local_files connector to that path. The ingestion pipeline treats it as normal local storage.
Best practice: do not extract directly from the live mount. This repository snapshots mounted files into data/raw/ first so later processing is reproducible even if the NFS share changes.
If you do not want to connect to your mail server over IMAP, use one of these export formats instead:
- mbox: best for whole-mailbox exports
- a directory of .eml files: best for simple file-based ingestion
- Gmail Takeout archives: supported through the email connector
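For mbox exports specifically, Python’s standard-library mailbox module is enough for a first ingestion pass. A sketch that pulls out subject and body text, skipping multipart (HTML/attachment) messages for simplicity:

```python
import mailbox

def iter_messages(mbox_path):
    """Yield (subject, body_text) pairs from an mbox file using the stdlib.

    Multipart messages are yielded with an empty body in this sketch;
    a real connector would walk the MIME parts.
    """
    for msg in mailbox.mbox(mbox_path):
        if msg.is_multipart():
            body = b""
        else:
            body = msg.get_payload(decode=True) or b""
        yield msg.get("Subject", ""), body.decode("utf-8", errors="replace")
```

The same loop works for a directory of .eml files by swapping in email.parser from the standard library; the downstream chunking and embedding steps do not care which format the text came from.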
Official documentation¶
- MLX: https://github.com/ml-explore/mlx
- MLX LM: https://github.com/ml-explore/mlx-lm
- Ollama docs: https://docs.ollama.com/
- Ollama API: https://docs.ollama.com/api/introduction
- Qdrant overview: https://qdrant.tech/documentation/overview/
- RunPod pods overview: https://docs.runpod.io/pods/overview