A Collection is a workspace for one experiment. It holds the AgentRuns from that experiment along with the rubrics, labels, and metadata you build up about them. Queries, search, and the Docent Agent all operate inside one Collection at a time.

What a Collection contains

A Collection is the home for everything tied to one experiment: the runs themselves, the analysis you generate about them, and the access list for the whole set.
  • AgentRuns. The runs themselves, one per execution you want to analyze.
  • Rubrics and their results. Judges you have defined and the scores they produced.
  • Labels. Human annotations attached to AgentRuns.
  • Collection metadata. Fields that describe the dataset as a whole, like eval config, environment, or dataset name.
  • Shared access. The list of users who can view or edit this Collection.
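The relationships above can be sketched as plain Python structures. These dataclasses are illustrative only, not the SDK's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class AgentRun:
    # One execution to analyze; metadata holds per-run fields like model or task_id.
    run_id: str
    metadata: dict

@dataclass
class Collection:
    # Workspace for one experiment: the runs plus the analysis built on top of them.
    collection_id: str
    name: str
    metadata: dict = field(default_factory=dict)        # dataset-wide fields
    agent_runs: list[AgentRun] = field(default_factory=list)
    labels: dict = field(default_factory=dict)          # human annotations, keyed by run_id
    shared_with: list[str] = field(default_factory=list)  # users with view/edit access

coll = Collection("c1", "Terminal-Bench", metadata={"eval": "terminal-bench"})
coll.agent_runs.append(AgentRun("r1", {"model": "gpt-5"}))
```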

Analysis tools operate within a single Collection

All analysis tools currently operate within a single Collection. These include:
  • DQL queries
  • Rubric runs
  • Search and clustering
  • The Docent Agent’s context
  • Filter state in the web UI
Your account and the data model sit above the workspace level:
  • Your account and API keys
  • Sharing permissions (one user can belong to many Collections)
  • The shapes of AgentRuns, Transcripts, and ChatMessages
To compare data across Collections, you must export it and join it yourself. Better, put things you want to compare in the same Collection from the start. For example, to compare the performance of different models on one benchmark, include runs from all of those models in the same Collection.
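A minimal pure-Python sketch of why this works: when every run in one Collection carries a `model` and a `score` in its AgentRun metadata (field names here are illustrative), a cross-model comparison is just a group-by inside that Collection.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-run records as they might appear inside one Collection:
# model and score live on each AgentRun's metadata so you can slice by them.
runs = [
    {"metadata": {"model": "gpt-5",   "score": 0.82}},
    {"metadata": {"model": "gpt-5",   "score": 0.78}},
    {"metadata": {"model": "gpt-5.1", "score": 0.90}},
]

# Group scores by model, then average each group.
by_model = defaultdict(list)
for run in runs:
    by_model[run["metadata"]["model"]].append(run["metadata"]["score"])

averages = {model: round(mean(scores), 2) for model, scores in by_model.items()}
# averages == {"gpt-5": 0.8, "gpt-5.1": 0.9}
```

Had the GPT-5 and GPT-5.1 runs landed in separate Collections, this comparison would require exporting both and joining them by hand.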

Collection metadata vs. AgentRun metadata

Collection metadata describes the dataset as a whole. AgentRun metadata describes each run.
  • Put it on the Collection when every run shares the value, like dataset name, eval version, environment, or the date the eval ran.
  • Put it on the AgentRun when you would ever filter or group by it, like model, checkpoint, task_id, or scores.
  • When unsure, put it on the AgentRun. DQL can query AgentRun metadata; Collection metadata provides context but does not support slicing.
See Metadata for the full pattern and the tracing equivalents.
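The placement rule above can be sketched as two plain dictionaries. The specific field values are invented for illustration:

```python
# Fields shared by every run go on the Collection.
collection_metadata = {
    "dataset": "terminal-bench",   # same for every run in this experiment
    "eval_version": "1.2",
    "environment": "docker",
    "run_date": "2025-12-10",
}

# Fields you would ever filter or group by go on each AgentRun,
# because DQL queries AgentRun metadata, not Collection metadata.
agent_run_metadata = {
    "model": "gpt-5",              # varies per run: slice on these
    "checkpoint": "step-12000",
    "task_id": "tb-017",
    "scores": {"success": 1},
}

# Sanity check: no field appears at both levels.
overlap = collection_metadata.keys() & agent_run_metadata.keys()
```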

Create, update, and share Collections

You create and update a Collection through the SDK or the web UI; sharing and deletion happen only in the web UI. Create a Collection in one SDK call and keep the returned collection_id for everything downstream:
from docent import Docent

client = Docent()

collection_id = client.create_collection(
    name="Terminal-Bench: GPT-5 vs GPT-5.1",
    description="December 2025 head-to-head run",
    metadata={"eval": "terminal-bench", "date": "2025-12-10"},
)
The SDK covers the common operations, with a couple of web-UI-only exceptions:
  • List, update, or remove runs: see Manage collections.
  • Read or merge Collection metadata: see Collection metadata.
  • Share a Collection: web UI only.
  • Delete a Collection: web UI only. The SDK can remove AgentRuns from a Collection but cannot delete the Collection itself.

Next steps

  • AgentRun: the unit that lives inside a Collection.
  • Metadata: how to shape the fields you will query against.
  • Ingestion Overview: pick a path to load your first Collection.