Analysis Quickstart

If you have not installed Docent yet, start with Installation to set up the SDK, coding agent plugin, and API key.

This quickstart guide will help you run your first Docent analysis on our sample Terminal-Bench data. We recommend using your coding agent’s auto-approval mode so it can generate and run scripts without stopping for each permission prompt. See Claude Code Auto mode or Codex Auto Review, depending on your agent. The prompts below contain the collection ID of our sample Terminal-Bench collection. To use your own collection, copy the ID in the top left corner of your collection, next to the collection name.
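If you plan to reuse these prompts with your own collection, a small helper can catch a mis-copied ID before the prompt reaches your agent. The sketch below is illustrative only: `build_prompt` is a hypothetical helper, not part of the Docent SDK, and it assumes collection IDs are always UUIDs like the sample ID in this guide.

```python
import uuid

# Sample Terminal-Bench collection ID used throughout this guide.
SAMPLE_COLLECTION_ID = "479b7093-5a33-47f1-8d7b-fc9f6f16bb75"


def build_prompt(question: str, collection_id: str = SAMPLE_COLLECTION_ID) -> str:
    """Return a /docent prompt with the two-line trailer used in this guide.

    Hypothetical helper: assumes collection IDs are UUIDs.
    """
    # uuid.UUID raises ValueError if the copied text is not a well-formed UUID.
    normalized = str(uuid.UUID(collection_id.strip()))
    return (
        f"/docent {question}\n\n"
        f"- Collection ID: {normalized}\n"
        "- Auto accept analysis plan"
    )


if __name__ == "__main__":
    print(build_prompt("Give me an overview of this collection."))
```

Pass your own ID as the second argument to target a different collection.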

Failure modes
Compare models
Collection Overview

/docent What are the main reasons why GPT-5.1 Codex fails?

Identify runs where GPT-5.1 failed. Summarize the primary failure modes in those runs and explain why you think they were decisive. Cluster common failure modes or failing strategies across all runs. Continue to cluster within clusters until you reach failures that are prevalent (i.e. common in the data) and specific (i.e. it is evident to a developer what a concrete fix would look like).

- Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
- Auto accept analysis plan

/docent What are the main reasons why GPT-5.1 Codex underperforms GPT-5 Codex?

Identify tasks where GPT-5.1 regresses on average. On those tasks, compare a failed GPT-5.1 run against the successful GPT-5 runs. Summarize the main failure modes and analyze whether avoiding those failures was material to the result of the successful runs.

- Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
- Auto accept analysis plan
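One way to read "regresses on average" in the prompt above is that a task's mean pass rate under GPT-5.1 is lower than under GPT-5. The sketch below illustrates that reading on made-up data. The `(task_id, model, passed)` record layout, the model labels, and the `regressed_tasks` function are assumptions for illustration, not Docent's schema or the method your agent will necessarily use.

```python
from collections import defaultdict
from statistics import mean


def regressed_tasks(runs, baseline, candidate):
    """Map each task where `candidate` has a lower mean pass rate than
    `baseline` to the (negative) difference in pass rate.

    `runs` is an iterable of (task_id, model, passed) tuples, a
    hypothetical layout chosen for this example.
    """
    # task_id -> model -> list of 1.0 / 0.0 outcomes
    outcomes = defaultdict(lambda: defaultdict(list))
    for task_id, model, passed in runs:
        outcomes[task_id][model].append(1.0 if passed else 0.0)

    regressed = {}
    for task_id, by_model in outcomes.items():
        # Only compare tasks that both models attempted.
        if baseline in by_model and candidate in by_model:
            delta = mean(by_model[candidate]) - mean(by_model[baseline])
            if delta < 0:
                regressed[task_id] = delta
    return regressed


# Made-up example records, not real Terminal-Bench results.
example_runs = [
    ("task-a", "gpt-5", True),
    ("task-a", "gpt-5", True),
    ("task-a", "gpt-5.1", False),
    ("task-a", "gpt-5.1", True),
    ("task-b", "gpt-5", False),
    ("task-b", "gpt-5.1", True),
]

if __name__ == "__main__":
    print(regressed_tasks(example_runs, baseline="gpt-5", candidate="gpt-5.1"))
```

Averaging per task before comparing keeps a task with many runs from dominating the result, which is why the prompt asks for regressions at the task level and not at the run level.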

/docent Give me an overview of this collection.

- Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
- Auto accept analysis plan

Next steps

Ingest your own data

Ready to analyze your own agent runs? Follow the ingestion guide to load your logs into Docent.


Read more about Analysis Plans
