This guide helps you ingest agent runs into Docent. Before starting, navigate to docent.transluce.org and sign up for an account.

Ingesting transcripts

Docent provides three main ways to ingest transcripts:
  1. Tracing: Automatically capture LLM interactions in real-time using Docent’s tracing SDK
  2. Drag-and-drop Inspect .eval files: Upload existing logs through the web UI
  3. SDK Ingestion: Programmatically ingest transcripts using the Python SDK

Option 1: Tracing

Docent’s tracing system automatically captures LLM interactions and organizes them into agent runs. Tracing allows you to:
  • Automatically instrument LLM provider calls (OpenAI, Anthropic; examples below)
  • Organize code into logical agent runs with metadata and scores
  • Track chat conversations and tool calls
  • Attach metadata to your runs and transcripts
  • Resume agent runs across different parts of your codebase
from openai import OpenAI

from docent.trace import initialize_tracing

# Basic initialization
initialize_tracing("my-collection-name")

# Your existing LLM code will now be automatically traced
client = OpenAI()  # standard OpenAI client; reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello!"}]
)
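Tracing hooks both providers listed above, not just OpenAI. Below is a minimal sketch of an Anthropic call being captured, assuming the anthropic package is installed and an API key is configured; the model name is illustrative:
import anthropic
from docent.trace import initialize_tracing

initialize_tracing("my-collection-name")

# Anthropic calls are also captured automatically once tracing is initialized
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
message = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",  # illustrative model name
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}],
)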
For detailed tracing documentation, see Tracing Introduction.

Option 2: Upload Inspect Evaluations

You can upload Inspect AI evaluation files directly through the Docent web interface:
  1. Create a collection on the Docent website
  2. Click “Add Data”
  3. Select “Upload Inspect Log”
  4. Upload your Inspect evaluation file
This is the quickest way to get started if you already have Inspect evaluation logs.

Option 3: SDK Ingestion

For programmatic ingestion or custom data formats, use the Python SDK:
pip install docent-python
First go to the API keys page, create a key, and instantiate a client object with that key:
import os
from docent import Docent

client = Docent(
    api_key=os.getenv("DOCENT_API_KEY"),  # this is the default, so it can be omitted

    # Uncomment and adjust these if you're self-hosting
    # server_url="http://localhost:8889",
    # web_url="http://localhost:3001",
)
Let’s create a fresh collection of agent runs:
collection_id = client.create_collection(
    name="sample collection",
    description="example that comes with the Docent repo",
)
Now we’re ready to ingest some logs! For an end-to-end example, say we have three simple agent runs:
transcript_1 = [
    {
        "role": "user",
        "content": "What's the weather like in New York today?"
    },
    {
        "role": "assistant",
        "content": "The weather in New York today is mostly sunny with a high of 75°F (24°C)."
    }
]
metadata_1 = {"model": "gpt-3.5-turbo", "agent_scaffold": "foo", "hallucinated": True}
transcript_2 = [
    {
        "role": "user",
        "content": "What's the weather like in San Francisco today?"
    },
    {
        "role": "assistant",
        "content": "The weather in San Francisco today is mostly cloudy with a high of 65°F (18°C)."
    }
]
metadata_2 = {"model": "gpt-3.5-turbo", "agent_scaffold": "foo", "hallucinated": True}
transcript_3 = [
    {
        "role": "user",
        "content": "What's the weather like in Paris today?"
    },
    {
        "role": "assistant",
        "content": "I'm sorry, I don't know because I don't have access to weather tools."
    }
]
metadata_3 = {"model": "gpt-3.5-turbo", "agent_scaffold": "bar", "hallucinated": False}

transcripts = [transcript_1, transcript_2, transcript_3]
metadata = [metadata_1, metadata_2, metadata_3]
We need to convert each input into an AgentRun object. An AgentRun holds Transcript objects, and each message in a Transcript must be a ChatMessage. We could construct the messages manually, but it’s easier to use the parse_chat_message function, since the raw dicts already conform to the expected schema.
from docent.data_models.chat import parse_chat_message
from docent.data_models import Transcript

parsed_transcripts = [
    Transcript(messages=[parse_chat_message(msg) for msg in transcript])
    for transcript in transcripts
]
Now we can create the AgentRun objects.
from docent.data_models import AgentRun

agent_runs = [
    AgentRun(
        transcripts=[t],
        metadata={
            "model": m["model"],
            "agent_scaffold": m["agent_scaffold"],
            "scores": {"hallucinated": m["hallucinated"]},
        }
    )
    for t, m in zip(parsed_transcripts, metadata)
]
We can finally ingest the agent runs and watch the UI update:
client.add_agent_runs(collection_id, agent_runs)
If you navigate to the frontend URL printed by client.create_collection(...), you should see the runs available for viewing.

Tips and tricks

Including sufficient context

Docent can only catch issues that are evident from the context it has about your evaluation. For example (see the sketch after this list):
  • If you’re looking to catch issues with solution labels, you should provide the exact label in the metadata, not just the agent’s score.
  • For software engineering tasks, if you want to know why agents failed, you should include information about what tests were run and their traceback/execution logs.
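For instance, when constructing AgentRun objects as in the SDK example above, you can put the exact ground-truth label and the test execution details straight into the metadata. Here is a minimal sketch; the field names correct_solution and test_results are illustrative, not a required schema:
from docent.data_models import AgentRun, Transcript
from docent.data_models.chat import parse_chat_message

messages = [
    {"role": "user", "content": "Fix the failing test in utils.py."},
    {"role": "assistant", "content": "I updated utils.py and re-ran the tests."},
]

agent_run = AgentRun(
    transcripts=[Transcript(messages=[parse_chat_message(m) for m in messages])],
    metadata={
        "model": "gpt-5",
        # The exact ground-truth label, not just the agent's self-reported outcome
        "correct_solution": "raise ValueError on empty input",
        # What was actually executed, plus its output/traceback
        "test_results": {
            "command": "pytest tests/test_utils.py",
            "output": "<full pytest output or traceback here>",
        },
        "scores": {"passed": False},
    },
)
With the exact label and the execution logs attached, Docent has enough context to distinguish, for example, an incorrect solution from a mislabeled one or a broken test harness.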