Docent is a behavior analysis platform for agents. After you run an evaluation, Docent analyzes your traces and explains which failure modes or environment issues are driving your team’s evaluation results. Teams use Docent to:
  • Iterate on scaffolds. Docent returns actionable insights to inform prompt tuning, tool instructions, or orchestration logic.
  • Post-train models. Compare behavior across checkpoints or training steps to identify what’s driving shifts in eval results.
  • Build better benchmarks. Catch reward hacking, evaluation awareness, broken environments, and ambiguous task specifications.
Read about how Docent helped align Claude 4 and debug a regression between two Codex checkpoints on Terminal-Bench.

Get started

Quickstart

Install the Docent plugin, run an analysis on sample data, and ingest your own.

Get in touch

Join our Slack community to ask questions and chat with the Docent team.