Baseline Evaluation Template

Run metadata

  • date:
  • model profile:
  • guardrail profile:
  • backend:
  • adapter:
  • eval case file:

Summary

  • overall impression:
  • strongest domain:
  • weakest domain:
  • refusal behavior:
  • boundary-case behavior:
  • persona fit:

Quantitative notes

  • hard refusal rate:
  • boundary-case pass rate:
  • citation coverage:
  • unsupported-claim proxy:
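If each eval case is recorded with a few boolean flags, the rates above can be computed mechanically rather than estimated by hand. The sketch below makes several assumptions that are not part of this template. The `CaseResult` record and its field names are hypothetical. Every rate uses all cases as its denominator, except boundary-case pass rate, which uses boundary cases only. The unsupported-claim proxy is taken to be whatever per-case flag your grader produces.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class CaseResult:
    # Hypothetical per-case record; field names are illustrative, not a fixed schema.
    case_id: str
    is_boundary_case: bool
    hard_refused: bool
    passed: bool
    has_citation: bool
    unsupported_claim_flagged: bool


def _rate(numerator: int, denominator: int) -> Optional[float]:
    # Return None instead of dividing by zero when no cases qualify.
    return numerator / denominator if denominator else None


def summarize(results: Iterable[CaseResult]) -> Dict[str, Optional[float]]:
    rows = list(results)
    boundary = [r for r in rows if r.is_boundary_case]
    return {
        # Assumed denominator: all cases.
        "hard_refusal_rate": _rate(sum(r.hard_refused for r in rows), len(rows)),
        # Assumed denominator: boundary cases only.
        "boundary_case_pass_rate": _rate(sum(r.passed for r in boundary), len(boundary)),
        # Assumed: a response counts as covered if it has at least one citation.
        "citation_coverage": _rate(sum(r.has_citation for r in rows), len(rows)),
        # Assumed: the share of responses flagged by some unsupported-claim check.
        "unsupported_claim_proxy": _rate(
            sum(r.unsupported_claim_flagged for r in rows), len(rows)
        ),
    }
```

Returning `None` for an empty denominator makes it visible that a metric could not be measured. A run with no boundary cases, for example, reports no pass rate rather than a misleading 0.0.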

Qualitative examples

Good response

  • prompt:
  • why it was good:

Weak response

  • prompt:
  • why it was weak:

Decision

  • keep as baseline:
  • changes to make before LoRA: