Collection
A Collection is a Docent workspace for one experiment. It holds the set of agent runs you want to analyze together. The typical unit is a single benchmark or eval, with metadata (model, checkpoint, scaffold) attached to each run so you can slice and compare across runs within it. Filters, DQL, rubrics, and the Docent Agent all operate within a single Collection. The Collection is the frame that all analysis sits inside. See the Collection reference or the SDK collections API for details.
Agent Run
An AgentRun is one execution of an agent against one task. Think of it as the row in your dataset. Filters, DQL queries, and clustering return sets of AgentRuns. A rubric grades one AgentRun at a time. Search returns AgentRuns that match your prompt. An AgentRun bundles one or more Transcripts. Metadata can attach to the AgentRun as a whole, or to individual Transcripts within it. See the Agent Run reference for details.
Transcript
A Transcript is the sequence of ChatMessages from one agent's point of view. Single-agent runs have one Transcript per AgentRun. See the Transcript reference. In multi-agent setups, Transcripts can be organized into optional TranscriptGroups within a single AgentRun; most single-agent evals ignore them.
Chat Message
A ChatMessage is one turn in a conversation: SystemMessage, UserMessage, AssistantMessage (optionally with tool calls), or ToolMessage. The schema is OpenAI-compatible. If your messages are already in that format, parse_chat_message converts them directly.
See the Chat Messages reference.
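A minimal sketch of what the OpenAI-compatible shape looks like as plain dicts, including an assistant turn with a tool call. The specific tool name, arguments, and content here are invented for illustration; the field names follow the standard OpenAI chat schema.

```python
# One short agent turn in OpenAI-compatible format (illustrative values).
messages = [
    {"role": "system", "content": "You are a coding agent."},
    {"role": "user", "content": "List the files in /tmp."},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "bash", "arguments": '{"cmd": "ls /tmp"}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "a.txt  b.txt"},
]

roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant', 'tool']
```

Because these dicts are already in the OpenAI format, each one can be handed to parse_chat_message directly when building Transcripts with the SDK.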
Metadata
Metadata is a JSON dict you attach to Collections, AgentRuns, TranscriptGroups, and Transcripts. Richer metadata makes everything else in Docent more useful: DQL queries join on it, rubric prompts reference it, and dashboard filters slice by it. Common fields on an AgentRun:
- scores (e.g. {"reward": 0.7, "passed": true}). Scoring info goes under metadata["scores"] by convention.
- model, checkpoint, agent_scaffold for cross-run comparisons.
- task_id, difficulty, category for slicing.
- cost, latency_ms, token_count for quantitative rollups.
Comparing runs by model or reward is easy when model and reward are in metadata. It's impossible when they're not.
See the Metadata reference.
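Putting the conventional fields above together, an AgentRun's metadata might look like the following sketch. All values are invented examples; only the "scores" key and the field names come from the conventions listed above, and any other keys are free-form JSON.

```python
# Illustrative metadata dict for one AgentRun.
metadata = {
    "scores": {"reward": 0.7, "passed": True},  # scoring info, by convention
    "model": "gpt-4o",                          # example values below are assumptions
    "checkpoint": "step-2000",
    "agent_scaffold": "react",
    "task_id": "swe-bench-123",
    "difficulty": "hard",
    "category": "bugfix",
    "cost": 0.42,
    "latency_ms": 18000,
    "token_count": 5400,
}

print(metadata["scores"]["reward"])  # 0.7
```

With fields like these attached, a DQL query can group by model, a rubric prompt can reference task_id, and a dashboard filter can slice on difficulty.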
Next steps
- Quickstart: set up Docent and run your first analysis.
- Ingestion Overview: pick a path to load your data.
- Analysis Overview: pick a mode to analyze it.

