When you use the Docent plugin, your coding agent writes a Python script that calls Docent’s analysis tools. These operations show up in the Docent UI as an Analysis Plan that you can review and approve. An Analysis Plan contains two kinds of steps:
- A DQL step displays and executes a structured query. These steps filter, group, or aggregate over metadata, transcripts, or prior Reading results. DQL steps are fast and deterministic.
- A Reading step uses a language model to evaluate the results of a DQL query, which may return transcripts, metadata, prior Reading results, or text.

Creating and executing Analysis Plans
Analysis Plans appear in the UI after your coding agent writes and executes a script calling Docent’s analysis tools. Reading steps may require your approval before running. You can approve an individual Reading step by clicking the Approve button in the top right corner of the step. You can also approve all pending steps by clicking the Approve All button in the top right of the page. Steps that are waiting on your approval display in purple on the minimap.
Common patterns
Search and cluster
A Reading step evaluates each transcript independently for a behavior. A separate Reading step then clusters the results.
The per-transcript step applies a rubric to each transcript one at a time. For example: “Does the agent attempt to access files that don’t exist? If so, describe what it tried to access and why.” The clustering (reduce) step takes those per-transcript results and groups them: “Cluster these file-access failures by root cause.”
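The shape of this pattern can be sketched in plain Python. This is only an illustration of the per-transcript step followed by the reduce step. It does not use Docent’s API, and `read_transcript`, `cluster_findings`, and the sample transcripts are hypothetical stand-ins for work a language model would do in a real plan.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Optional

MARKER = "No such file or directory: "


def read_transcript(transcript: str) -> Optional[str]:
    """Per-transcript step (hypothetical stand-in for a language model).

    Returns a short finding, or None when the behavior is absent.
    """
    if MARKER not in transcript:
        return None
    # Take the first whitespace-delimited token after the marker as the path.
    path = transcript.split(MARKER, 1)[1].split()[0]
    return f"tried to access missing file {path}"


def cluster_findings(findings: dict[str, str]) -> dict[str, list[str]]:
    """Reduce step (hypothetical stand-in): group run IDs by a crude label."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for run_id, finding in findings.items():
        if finding.endswith((".yaml", ".json")):
            label = "missing config"
        else:
            label = "other missing file"
        clusters[label].append(run_id)
    return dict(clusters)


# Made-up transcripts, keyed by run ID.
transcripts = {
    "run-1": "cat: No such file or directory: settings.yaml",
    "run-2": "All tests passed.",
    "run-3": "open failed. No such file or directory: data/input.csv then retried",
}

# Map: evaluate every transcript independently and keep only the hits.
findings = {}
for run_id, text in transcripts.items():
    result = read_transcript(text)
    if result is not None:
        findings[run_id] = result

# Reduce: a separate step sees all findings at once and groups them.
clusters = cluster_findings(findings)
print(clusters)
```

Keeping the two steps separate means the per-transcript step never needs to see other transcripts, while the clustering step only sees the short findings rather than full transcripts.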
Recursive clustering
After clustering your transcripts, create Reading steps that cluster again within each category to increase specificity.
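As a rough sketch of the idea, the snippet below clusters a list of findings with a coarse labeler, then re-clusters each resulting group with a finer one. It is plain Python and does not reflect Docent’s API. The `coarse` and `fine` labelers and the sample findings are hypothetical; in a real plan each level would be a Reading step.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Union

Labeler = Callable[[str], str]
Tree = Union[list, dict]


def cluster(items: list[str], labeler: Labeler) -> dict[str, list[str]]:
    """Group items by the label the labeler assigns to each one."""
    groups: dict[str, list[str]] = defaultdict(list)
    for item in items:
        groups[labeler(item)].append(item)
    return dict(groups)


def recursive_cluster(items: list[str], labelers: list[Labeler]) -> Tree:
    """Cluster with labelers[0], then cluster each group with the rest."""
    if not labelers:
        return items
    first, rest = labelers[0], labelers[1:]
    return {
        label: recursive_cluster(group, rest)
        for label, group in cluster(items, first).items()
    }


# Hypothetical labelers standing in for language-model judgments.
def coarse(finding: str) -> str:
    return "file access" if "file" in finding else "network"


def fine(finding: str) -> str:
    return "permission" if "denied" in finding else "not found"


findings = [
    "file not found: config.yaml",
    "file permission denied: /etc/shadow",
    "network timeout: host not found",
    "file not found: data.csv",
]

tree = recursive_cluster(findings, [coarse, fine])
print(tree)
```

Each extra labeler adds one level of nesting, so the depth of the tree is a choice you make per analysis rather than something fixed in advance.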
Pairwise comparison
Compare two models on the same tasks. A DQL step selects runs where one model regresses relative to the other, then a Reading step identifies the main differences between a successful and a failed run on the same task.
We used this workflow to investigate why GPT-5.1 Codex underperformed GPT-5 Codex on Terminal-Bench. See the writeup for the full report.
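The selection half of this pattern can be illustrated in plain Python. In a real plan a DQL step would perform this selection, so the record layout, the `find_regressions` helper, and the model names `model-a` and `model-b` below are assumptions made for the example, not Docent’s schema or API.

```python
from __future__ import annotations

# Made-up run records: one row per (task, model) pair.
runs = [
    {"task": "t1", "model": "model-a", "passed": True},
    {"task": "t1", "model": "model-b", "passed": False},
    {"task": "t2", "model": "model-a", "passed": True},
    {"task": "t2", "model": "model-b", "passed": True},
    {"task": "t3", "model": "model-a", "passed": False},
    {"task": "t3", "model": "model-b", "passed": False},
]


def find_regressions(
    runs: list[dict], baseline: str, candidate: str
) -> list[str]:
    """Return tasks the baseline model passed and the candidate failed."""
    outcome = {(r["task"], r["model"]): r["passed"] for r in runs}
    tasks = sorted({r["task"] for r in runs})
    return [
        task
        for task in tasks
        # Tasks missing a run for either model are skipped (get returns None).
        if outcome.get((task, baseline)) is True
        and outcome.get((task, candidate)) is False
    ]


regressed = find_regressions(runs, baseline="model-a", candidate="model-b")
print(regressed)

# Each regressed task yields one (successful run, failed run) pair that a
# reading step could then compare side by side.
pairs = [
    (
        next(r for r in runs if r["task"] == t and r["model"] == "model-a"),
        next(r for r in runs if r["task"] == t and r["model"] == "model-b"),
    )
    for t in regressed
]
```

Restricting the comparison to tasks where exactly one model fails keeps the subsequent language-model step focused on differences that plausibly explain the regression.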

