- A DQL step displays and executes a structured query. These steps can help filter, group, or aggregate over metadata, transcripts, or prior reading results. DQL steps are fast and deterministic.
- A Reading step uses a language model to evaluate the results of a DQL query, which may return transcripts, metadata, prior Reading results, or text.
- Markdown notes are queued in script order like other steps but are not executed. They are meant for plan context (for example, the behavior a rubric measures), usually one note at the top of the plan.

Common use-cases
Search and cluster
Search and cluster
A reader evaluates each transcript independently for a behavior. A separate Reading step clusters the results.The per-transcript step applies a rubric to each transcript one at a time. For example: “Does the agent attempt to access files that don’t exist? If so, describe what it tried to access and why.” The reduce step takes those per-transcript results and groups them: “Cluster these file-access failures by root cause.”
Recursive clustering
Recursive clustering
After clustering your transcripts, create reading steps that cluster within categories to increase specificity.
Pairwise comparison
Pairwise comparison
Compare two models on the same tasks. A DQL step selects runs where one model regresses relative to the other, then a reading step identifies the main differences between a successful and a failed run on the same task.
We used this workflow to investigate why GPT-5.1 Codex underperformed GPT-5 Codex on Terminal-Bench. See the writeup for the full report.
Analysis Plans are programs
Under the hood, an Analysis Plan is a Python script built with the Docent SDK. Your coding agent writes these scripts for you, but they’re ordinary code: you can read them, edit them, and re-run them. Analysis Plans is a lazily evaluated computation framework. Each call toclient.query() or client.read() registers a step and immediately returns a lightweight handle. Handles feed into later steps, forming a dependency graph. When you ask for results, or when the script exits, the graph is submitted as an Analysis Plan. Docent executes the steps in dependency order, and the results flow back into your script as plain Python objects.
Here’s a complete search-and-cluster pipeline:
How DQL rows feed into readings
Every Reading step takes its inputs from a DQL query, which returns a table. Each row of that table becomes one LLM call. The columns of the table fill in the prompt. Accessing an attribute on a query handle, likesampled.transcript in step $2, gives a reference to that column. Where the reference appears in the prompt template, each row’s value is substituted. The .as_type(...) annotation controls how the value is rendered: as_type("text") embeds the value literally, while types like "transcript" or "agent_run" treat the value as an ID and render the full object for the judge.
This means the DQL query controls both what each judge call sees and how many calls there are. Step $1 returns 100 rows, so step $2 makes 100 LLM calls, one per transcript. Step $3 uses array_agg to collapse all of step $2’s outputs into a single row with a list-valued column, so step $4 makes one LLM call that sees every summary at once. Going from “one call per item” to “one call over all items” is just a change to the query.
Working with the graph
A few properties fall out of this design:- Plans are built with ordinary Python. You can create steps in loops, build prompts with string formatting, and wrap common patterns in functions.
- Dependencies are tracked for you. Referencing a query’s column in a prompt ties the reading to that query. Interpolating a
Readinghandle into a DQL string ties the query to that reading. Docent infers the execution order from these references, so you never schedule anything yourself. - Results are available whenever you want them. Accessing
reading.resultsmid-script runs that step and everything upstream of it. The outputs come back as plain dicts, so you can use ordinary Python to shape later steps. For example, you can take the cluster names proposed by one reading and use them as theenumvalues in the next reading’s output schema. - Re-running is cheap. Steps are content-addressed. When you re-run a script, any step whose inputs and configuration are unchanged reuses its cached results. The standard way to iterate is to append steps to the script and re-run the whole thing; only the new steps execute.
client.flush(auto_approve=True) to skip manual approval.
Creating and executing Analysis Plans
Analysis Plans display in the UI after your coding agent writes and executes a script calling Docent’s analysis tools. The UI view is read-only: to make revisions, instruct your coding agent to change the plan. Reading steps may require your approval before running. You can approve individual reading steps by clicking the Approve button in the top right corner of the step. You can also approve all pending steps by clicking the Approve All button in the top right of the page. Steps that are waiting on your approval will display in purple on the minimap.

