# 06 External Training

## Default provider: RunPod
RunPod is the primary provider with end-to-end automation in this repository.
In plain language, RunPod is a service that rents you GPU machines on demand. In this repository, you use it only when your Mac is not the right place to run an adapter training job.
## GPU guidance
- 7B LoRA: L4 or A10G 24 GB class GPUs
- 14B LoRA: L40S or A100 40 GB+ class GPUs
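The mapping above can be expressed as a small helper. This is an illustration only: the function name and return values are hypothetical, not part of the repository's tooling.

```python
# Illustrative only: encodes the GPU guidance above.
# suggest_gpu is a hypothetical helper, not a repo API.

def suggest_gpu(model_size_b: int) -> list[str]:
    """Return GPU classes suited to a LoRA job of the given model size (in billions)."""
    if model_size_b <= 7:
        return ["L4", "A10G"]          # 24 GB class
    if model_size_b <= 14:
        return ["L40S", "A100 40GB"]   # 40 GB+ class
    raise ValueError("No guidance for models above 14B in this repository")

print(suggest_gpu(7))   # ['L4', 'A10G']
print(suggest_gpu(14))  # ['L40S', 'A100 40GB']
```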
## Cost and time bands
- 7B LoRA on L4/A10G: typically low tens of USD for moderate-size adapter jobs, often completed within a few hours
- 14B LoRA on L40S/A100: higher tens to low hundreds of USD depending on dataset size, epochs, context length, and storage overhead
Treat these as planning ranges rather than fixed prices.
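For rough planning, a job's cost is just GPU-hours at the provider's rate plus storage overhead. The sketch below uses made-up hourly rates for illustration; they are assumptions, not quoted RunPod prices.

```python
# Back-of-the-envelope cost planner.
# HOURLY_USD values are illustrative assumptions, not actual RunPod pricing.

HOURLY_USD = {"L4": 0.50, "A10G": 0.75, "L40S": 1.10, "A100": 1.90}

def estimate_cost(gpu: str, hours: float, storage_usd: float = 0.0) -> float:
    """Rough job cost: GPU time at an assumed hourly rate plus storage overhead."""
    return round(HOURLY_USD[gpu] * hours + storage_usd, 2)

print(estimate_cost("L4", 4))        # 2.0  -> a short 7B LoRA run
print(estimate_cost("A100", 20, 5))  # 43.0 -> a longer 14B LoRA run
```

The point is not the exact numbers but the shape of the calculation: runtime (driven by dataset size, epochs, and context length) dominates, with storage as a smaller additive term.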
## Included automation
- package dataset and config
- upload training bundle
- submit RunPod job spec
- poll job state
- pull back adapter artifacts and metrics
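The submit/poll/fetch portion of the steps above can be sketched as a generic loop. The `submit`, `get_state`, and `fetch_artifacts` callables here are placeholders standing in for the repository's actual scripts and RunPod calls, and the state names are assumptions.

```python
# A minimal sketch of the submit -> poll -> fetch cycle.
# submit, get_state, and fetch_artifacts are hypothetical callables,
# not the repository's real interface; state strings are assumed.
import time

def run_remote_job(submit, get_state, fetch_artifacts,
                   poll_interval: float = 30.0, timeout: float = 6 * 3600):
    """Submit a job, poll its state until it finishes, then pull artifacts back."""
    job_id = submit()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state(job_id)
        if state == "COMPLETED":
            return fetch_artifacts(job_id)
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"job {job_id} ended in state {state}")
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

Polling with a deadline rather than looping forever matters here: a hung remote job should fail loudly instead of silently burning GPU credit.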
## When you do and do not need RunPod
Use RunPod when:
- you want to train a larger adapter faster than your Mac can handle
- you want to use 14B-class models more comfortably
Do not use RunPod when:
- you are still validating the ingestion and RAG pipeline
- local MLX LoRA is fast enough for your current dataset
## Additional providers
The repository includes guidance and config templates for:
- Lambda Labs
- Modal
- Together AI
- Replicate
- Hugging Face
- Vast.ai
These are documented but not fully automated in v1.