# Docent Agent

> Ask questions about your agent runs in natural language and review a verifiable report.

The Docent Agent investigates behavior across an entire collection of agent runs. It combines structured queries over your run metadata with LLM-driven analysis of individual transcripts. A typical investigation might compare scores between two model versions, pinpoint where one regressed, and read the failing transcripts to explain why. You get a report that cites every run behind each claim.

Use the `/docent` slash command in Claude Code, Cursor, or any IDE where you've installed the skill. See the [Quickstart](/quickstart) to set it up.

## What you can do

Here are some ways to use the Docent Agent:

<AccordionGroup>
  <Accordion title="Diagnose a regression" icon="chart-line-down" defaultOpen>
    When one model version underperforms another, pinpoint the tasks where the regression shows up, compare failed runs to successful runs on the same task, and check whether the failure modes you identify are also present in successful runs.

    ```text wrap theme={null}
    /docent What are the main reasons why GPT-5.1 Codex underperforms GPT-5 Codex?

    Identify tasks where GPT-5.1 regresses on average. On those tasks, compare a failed GPT-5.1 run against the successful GPT-5 runs. Summarize the main failure modes and analyze whether avoiding those failures was material to the result of the successful runs.

    - Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
    - Auto accept reading plan and generate a report
    ```

    <Info>
      We used this workflow to investigate why GPT-5.1 Codex underperformed GPT-5 Codex on Terminal-Bench. See the [writeup](https://transluce.org/docent/blog/terminal-bench) for the full report.
    </Info>
  </Accordion>

  <Accordion title="Find common failure modes" icon="magnifying-glass">
    Surface the recurring ways your agent fails so you know what to fix next. Give the Docent Agent a collection, define what "prevalent" and "specific" mean for your case, and let it recursively cluster the failures until each category is both common enough to matter and specific enough to act on.

    ```text wrap theme={null}
    /docent What are the main reasons why GPT-5.1 Codex fails?

    Identify runs where GPT-5.1 failed. Summarize the primary failure modes in those runs and explain why you think they were decisive. Cluster common failure modes or failing strategies across all runs. Continue to cluster within clusters until you reach failures that are prevalent (i.e. common in the data) and specific (i.e. it is evident to a developer what a concrete fix would look like).

    - Collection ID: 479b7093-5a33-47f1-8d7b-fc9f6f16bb75
    - Auto accept reading plan and generate a report
    ```
  </Accordion>
</AccordionGroup>
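To make "regresses on average" concrete in a prompt, it can help to spell out the comparison you have in mind. The sketch below is only an illustration in plain Python, not the Docent Agent's implementation or API: the field names `task_id`, `model`, and `score`, the model labels, and the sample records are all hypothetical. It groups runs by task, averages each model's score, and keeps the tasks where the candidate model's mean falls below the baseline's.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical run records; the field names are assumptions, not a Docent schema.
runs = [
    {"task_id": "t1", "model": "old", "score": 1.0},
    {"task_id": "t1", "model": "new", "score": 0.0},
    {"task_id": "t1", "model": "new", "score": 1.0},
    {"task_id": "t2", "model": "old", "score": 0.0},
    {"task_id": "t2", "model": "new", "score": 1.0},
]


def regressed_tasks(runs, baseline, candidate):
    """Map task_id to (candidate mean - baseline mean) for tasks where that difference is negative."""
    # scores[task_id][model] collects every score that model got on that task.
    scores = defaultdict(lambda: defaultdict(list))
    for run in runs:
        scores[run["task_id"]][run["model"]].append(run["score"])

    deltas = {}
    for task_id, by_model in scores.items():
        # Only compare tasks that both models attempted.
        if baseline in by_model and candidate in by_model:
            delta = mean(by_model[candidate]) - mean(by_model[baseline])
            if delta < 0:
                deltas[task_id] = delta
    return deltas


print(regressed_tasks(runs, baseline="old", candidate="new"))
```

Stating the grouping key, the score field, and the direction of the comparison in your prompt gives the agent the same information this function takes as arguments.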

## Tips

* **Be precise about the workflow.** Name the metadata fields, the comparison you want, and how to group results. The agent plans better when it knows exactly what "failure" or "regression" means in your collection.
* **Review the reading plan.** Use it to verify claims in the report and to understand how the Docent Agent operationalized your instructions.
* **Citations only reach as far as the prompt.** Each step in the reading plan can only cite items directly passed into it, not items transitively cited by earlier steps.
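The citation rule in the last tip can be pictured as a set check. The following is a minimal sketch under stated assumptions, not how the Docent Agent is implemented: the `Step` class, its fields, and the item IDs are hypothetical names chosen for illustration. A step may cite only the items in its own `inputs`; anything an earlier step cited is out of reach unless it is passed in again.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical model of one reading-plan step."""
    name: str
    inputs: set[str]  # item IDs passed directly into this step
    citations: set[str] = field(default_factory=set)  # item IDs the step tries to cite


def uncitable(step: Step) -> set[str]:
    """Return the citations that point at items the step did not receive directly."""
    return step.citations - step.inputs


# The first step reads two runs and cites one of them.
summarize = Step("summarize", inputs={"run-1", "run-2"}, citations={"run-1"})

# The second step receives only the summary, so citing run-1 is out of scope.
conclude = Step("conclude", inputs={"summary"}, citations={"summary", "run-1"})

# Passing run-1 in directly brings it back into scope.
conclude_fixed = Step("conclude", inputs={"summary", "run-1"}, citations={"summary", "run-1"})

print(uncitable(summarize), uncitable(conclude), uncitable(conclude_fixed))
```

In practice, if you want a later step to cite specific runs, ask for those runs to be included in that step's inputs rather than relying on an earlier summary.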

## What's next

<CardGroup cols={2}>
  <Card title="Refine a behavior rubric" icon="sliders" href="/analysis/rubrics/refinement">
    Turn insights from your report into a judge you can run over the whole collection.
  </Card>

  <Card title="Write DQL queries" icon="terminal" href="/analysis/dql">
    Pull specific slices of runs and metadata directly with SQL.
  </Card>

  <Card title="Search and cluster" icon="magnifying-glass" href="/analysis/search-and-clustering">
    Find behaviors in the UI and group results automatically.
  </Card>

  <Card title="Export data" icon="download" href="/analysis/exporting">
    Download transcripts and metadata for local analysis.
  </Card>
</CardGroup>
