This quickstart guide will help you run your first Docent analysis on our sample Terminal-Bench data.
The prompts below contain the collection ID of our sample Terminal-Bench collection. To use your own collection, copy the ID in the top left corner of your collection, next to the collection name.
We recommend using your coding agent’s auto-approval mode so it can generate and run scripts without stopping for each permission prompt. See Claude Code Auto mode or Codex Auto Review, depending on your agent.
Failure modes
/docent What are the main reasons why GPT-5.1 Codex fails?
Identify runs where GPT-5.1 failed. Summarize the primary failure modes in those runs and explain why you think they were decisive. Cluster common failure modes or failing strategies across all runs. Continue to cluster within clusters until you reach failures that are prevalent (i.e. common in the data) and specific (i.e. it is evident to a developer what a concrete fix would look like).
- Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
- Auto accept reading plan

Compare models
/docent What are the main reasons why GPT-5.1 Codex underperforms GPT-5 Codex?
Identify tasks where GPT-5.1 regresses on average. On those tasks, compare a failed GPT-5.1 run against the successful GPT-5 runs. Summarize the main failure modes and analyze whether avoiding those failures was material to the result of the successful runs.
- Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
- Auto accept reading plan

Collection Overview
/docent Give me an overview of this collection.
- Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
- Auto accept reading plan
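
If you reuse these prompts with your own collection, a small helper can swap in the ID and catch copy-paste mistakes before you hand the prompt to your agent. The following is a minimal sketch, not part of Docent: the function name `build_docent_prompt` is hypothetical, and it assumes collection IDs are UUIDs, as the sample ID appears to be.

```python
import uuid

# Sample Terminal-Bench collection ID used in the prompts above.
SAMPLE_COLLECTION_ID = "479b7093-5a33-47f1-8d7b-fc9f6f16bb75"


def build_docent_prompt(question: str, collection_id: str = SAMPLE_COLLECTION_ID) -> str:
    """Assemble a /docent prompt in the same shape as the examples above.

    Hypothetical helper: it only formats text and does not call Docent.
    """
    # Assumption: collection IDs are UUIDs. uuid.UUID raises ValueError
    # on a malformed ID, and str() normalizes it to lowercase with hyphens.
    normalized_id = str(uuid.UUID(collection_id.strip()))
    return "\n".join(
        [
            f"/docent {question.strip()}",
            f"- Collection ID: {normalized_id}",
            "- Auto accept reading plan",
        ]
    )


if __name__ == "__main__":
    print(build_docent_prompt("Give me an overview of this collection."))
```

Pass your own ID as the second argument, then paste the printed text into your coding agent. A truncated or mistyped ID raises `ValueError` instead of producing a prompt that points at the wrong collection.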