Collection
A Collection is a Docent workspace for one experiment. It holds the set of agent runs you want to analyze together. The typical unit is a single benchmark or eval, with metadata (model, checkpoint, scaffold) attached to each run so you can slice and compare across runs within it. Filters, DQL, rubrics, and the Docent Agent all operate within a single Collection. The Collection is the frame that all analysis sits inside. See the Collection reference or the SDK collections API for details.
Agent Run
An AgentRun is one execution of an agent against one task. Think of it as the row in your dataset. Filters, DQL queries, and clustering return sets of AgentRuns. A rubric grades one AgentRun at a time. Search returns AgentRuns that match your prompt. An AgentRun bundles one or more Transcripts. Metadata can attach to the AgentRun as a whole, or to individual Transcripts within it. See the Agent Run reference for details.
Transcript
A Transcript is the sequence of ChatMessages from one agent's point of view. Single-agent runs have one Transcript per AgentRun. See the Transcript reference. In multi-agent setups, Transcripts can be organized into optional TranscriptGroups within a single AgentRun; most single-agent evals ignore them.
Chat Message
A ChatMessage is one turn in a conversation: SystemMessage, UserMessage, AssistantMessage (optionally with tool calls), or ToolMessage. The schema is OpenAI-compatible. If your messages are already in that format, parse_chat_message converts them directly.
See the Chat Messages reference.
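A minimal sketch of what the OpenAI-compatible shape looks like as plain dicts, including an assistant turn with a tool call. The specific tool name, arguments, and content here are invented for illustration; the field names follow the standard OpenAI chat schema.

```python
# One short agent turn in OpenAI-compatible format (illustrative values).
messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "List the files in /tmp."},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "bash", "arguments": '{"cmd": "ls /tmp"}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "a.txt  b.txt"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'tool']
```

Because these dicts are already in the OpenAI format, each one can be handed to parse_chat_message directly when building Transcripts with the SDK.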
Metadata
Metadata is a JSON dict you attach to Collections, AgentRuns, TranscriptGroups, and Transcripts. Richer metadata makes everything else in Docent more useful: DQL queries join on it, rubric prompts reference it, and dashboard filters slice by it. Common fields on an AgentRun:
- scores (e.g. {"reward": 0.7, "passed": true}). Scoring info goes under metadata["scores"] by convention.
- model, checkpoint, agent_scaffold for cross-run comparisons.
- task_id, difficulty, category for slicing.
- cost, latency_ms, token_count for quantitative rollups.
Comparing runs by model or reward is easy when model and reward are in metadata. It's impossible when they're not.
See the Metadata reference.
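Putting the conventional fields above together, an AgentRun's metadata might look like the following sketch. All values are invented examples; only the "scores" key and the field names come from the conventions listed above, and any other keys are free-form JSON.

```python
# Illustrative metadata dict for one AgentRun.
metadata = {
    "scores": {"reward": 0.7, "passed": True},  # scoring info, by convention
    "model": "gpt-4o",                          # example values below are assumptions
    "checkpoint": "step-2000",
    "agent_scaffold": "react",
    "task_id": "swe-bench-123",
    "difficulty": "hard",
    "category": "bugfix",
    "cost": 0.42,
    "latency_ms": 18000,
    "token_count": 5400,
}

print(metadata["scores"]["reward"])  # 0.7
```

With fields like these attached, a DQL query can group by model, a rubric prompt can reference task_id, and a dashboard filter can slice on difficulty.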
Next steps
- Quickstart: set up Docent and run your first analysis.
- Ingestion Overview: pick a path to load your data.
- Analysis Overview: pick a mode to analyze it.

