Baseline Evaluation Template
Run metadata
- date:
- model profile:
- guardrail profile:
- backend:
- adapter:
- eval case file:
Summary
- overall impression:
- strongest domain:
- weakest domain:
- refusal behavior:
- boundary-case behavior:
- persona fit:
Quantitative notes
- hard refusal rate:
- boundary-case pass rate:
- citation coverage:
- unsupported-claim proxy:
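
The template does not define how these four figures are computed. One possible way to derive them from per-case results is sketched below. Everything here is an assumption for illustration: the `CaseResult` record, its field names, and the metric definitions (in particular, treating the unsupported-claim proxy as the share of claims without a citation) are hypothetical and should be replaced with whatever the eval harness actually records.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaseResult:
    """Hypothetical per-case record; all field names are assumptions."""
    is_boundary_case: bool  # case is tagged as a boundary case in the eval file
    hard_refusal: bool      # model refused outright
    passed: bool            # grader judged the response acceptable
    claims: int             # factual claims counted in the response
    cited_claims: int       # claims that carry a citation


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when there is nothing to divide by."""
    return numerator / denominator if denominator else None


def quantitative_notes(results: List[CaseResult]) -> Dict[str, Optional[float]]:
    """Compute the four figures under the assumed definitions above."""
    boundary = [r for r in results if r.is_boundary_case]
    total_claims = sum(r.claims for r in results)
    cited = sum(r.cited_claims for r in results)
    return {
        "hard refusal rate": ratio(sum(r.hard_refusal for r in results), len(results)),
        "boundary-case pass rate": ratio(sum(r.passed for r in boundary), len(boundary)),
        "citation coverage": ratio(cited, total_claims),
        # Assumed proxy: share of claims that carry no citation.
        "unsupported-claim proxy": ratio(total_claims - cited, total_claims),
    }
```

A metric comes back as `None` when its denominator is zero (for example, no boundary cases in the file), so an empty category is not reported as a 0% rate.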
Qualitative examples
Good response
- prompt:
- why it was good:
Weak response
- prompt:
- why it was weak:
Decision
- keep as baseline:
- changes to make before LoRA: