Evaluation Case Suites

This folder contains the benchmark files used to measure baseline behavior, guardrail behavior, and post-training changes.

Case files

When to use each file

  • use core_eval_cases.jsonl when you want a rigorous baseline or release-candidate comparison
  • use guardrail_profile_matrix.jsonl when you want a faster local comparison across original_model, relaxed, standard, and strict
  • use sample_request.json when you want to send a prompt through the live API and inspect the full response shape
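
Before running either .jsonl suite, it can help to confirm that the file parses and to see how its cases are distributed. The following is a minimal sketch, not part of this repository: it assumes only that each line of a .jsonl file holds one JSON object. The helper names load_cases and count_by, and the "profile" field name, are hypothetical and should be replaced with whatever keys the case files actually use.

```python
import json
from collections import Counter
from pathlib import Path


def load_cases(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    cases = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between cases
            try:
                cases.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a broken case is easy to find.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return cases


def count_by(cases, key):
    """Count cases per value of `key`; cases lacking the key go under '<missing>'."""
    return Counter(case.get(key, "<missing>") for case in cases)


if __name__ == "__main__":
    matrix_path = Path("guardrail_profile_matrix.jsonl")
    if matrix_path.exists():
        cases = load_cases(matrix_path)
        print(len(cases), "cases loaded")
        # "profile" is an assumed field name, used here only for illustration.
        print(count_by(cases, "profile"))
```

The same load_cases call should work for core_eval_cases.jsonl. A large '<missing>' count from count_by means the assumed field name does not match the file's schema.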