# Ingest via SDK

> Ingest agent runs into Docent using the Python SDK

## Before you start

We generally recommend using the [`/docent` plugin](/installation) to [ingest your traces](/ingestion/quickstart). The plugin writes the SDK script for you from your existing logs. Use this page if you want to debug what `/docent` produced, have unusual data formats your coding agent can't infer, or need fine-grained control.

If you already have an Inspect `.eval` file, the fastest path is [drag-and-drop upload](/ingestion/integrations/inspect). Otherwise, follow the steps below.

## Setup

Install the SDK:

```bash theme={null}
uv add docent-python
```

Go to the [API keys page](https://docent.transluce.org/settings/api-keys), create a key, and instantiate a client object with that key:

```python theme={null}
import os
from docent import Docent

client = Docent(
    api_key=os.getenv("DOCENT_API_KEY"),  # this is the default, so the argument can be omitted

    # Uncomment and adjust these if you're self-hosting
    # server_url="http://localhost:8889",
    # web_url="http://localhost:3001",
)
```
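
`os.getenv` returns `None` when the variable is unset, so a missing key may only surface later as an authentication error. As a minimal sketch, a small helper can fail early instead. The name `require_env` is hypothetical and not part of the Docent SDK.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        # Fail before any network call so the cause is obvious
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return value
```

You could then pass `api_key=require_env("DOCENT_API_KEY")` when constructing the client.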

## Create a collection

```python theme={null}
collection_id = client.create_collection(
    name="sample collection",
    description="example that comes with the Docent repo",
)
```

## Convert your data

There are three end-to-end examples below; pick whichever matches your data.

<Info>
  If your messages are already in OpenAI chat format (`{"role": ..., "content": ..., "tool_calls": ...}`), use `parse_chat_message` to convert each one into a `ChatMessage`. All three examples below use this helper.
</Info>

<Tabs>
  <Tab title="Simple example">
    Say we have three simple agent runs.

    ```python theme={null}
    transcript_1 = [
        {
            "role": "user",
            "content": "What's the weather like in New York today?"
        },
        {
            "role": "assistant",
            "content": "The weather in New York today is mostly sunny with a high of 75°F (24°C)."
        }
    ]
    metadata_1 = {"model": "gpt-3.5-turbo", "agent_scaffold": "foo", "hallucinated": True}
    transcript_2 = [
        {
            "role": "user",
            "content": "What's the weather like in San Francisco today?"
        },
        {
            "role": "assistant",
            "content": "The weather in San Francisco today is mostly cloudy with a high of 65°F (18°C)."
        }
    ]
    metadata_2 = {"model": "gpt-3.5-turbo", "agent_scaffold": "foo", "hallucinated": True}
    transcript_3 = [
        {
            "role": "user",
            "content": "What's the weather like in Paris today?"
        },
        {
            "role": "assistant",
            "content": "I'm sorry, I don't know because I don't have access to weather tools."
        }
    ]
    metadata_3 = {"model": "gpt-3.5-turbo", "agent_scaffold": "bar", "hallucinated": False}

    transcripts = [transcript_1, transcript_2, transcript_3]
    metadata = [metadata_1, metadata_2, metadata_3]
    ```

    We need to convert each input into an [AgentRun](/concepts/agent-run) object. An `AgentRun` holds one or more `Transcript` objects, and every message in a transcript must be a [ChatMessage](/concepts/chat-messages). We could construct the messages manually, but since the raw dicts already conform to the expected schema, it's easier to use the `parse_chat_message` function.

    ```python theme={null}
    from docent.data_models.chat import parse_chat_message
    from docent.data_models import Transcript

    parsed_transcripts = [
        Transcript(messages=[parse_chat_message(msg) for msg in transcript])
        for transcript in transcripts
    ]
    ```

    Now we can create the [AgentRun](/concepts/agent-run) objects.

    ```python theme={null}
    from docent.data_models import AgentRun

    agent_runs = [
        AgentRun(
            transcripts=[t],
            metadata={
                "model": m["model"],
                "agent_scaffold": m["agent_scaffold"],
                "scores": {"hallucinated": m["hallucinated"]},
            }
        )
        # strict=True (Python 3.10+) raises if the two lists differ in length
        for t, m in zip(parsed_transcripts, metadata, strict=True)
    ]
    ```
  </Tab>

  <Tab title="τ-Bench">
    For a more complex case that involves tool calls, Docent ships with a sample τ-bench log file, generated by running Sonnet 3.5 (new) on *one* task from the τ-bench-airline dataset.

    To inspect the log, we can load it as a dictionary.

    ```python theme={null}
    from docent.samples import get_tau_bench_airline_fpath
    import json
    with open(get_tau_bench_airline_fpath(), "r") as f:
        tb_log = json.load(f)
    print(tb_log)
    ```

    Next, we write a function that parses the dict into an [AgentRun](/concepts/agent-run) object, complete with metadata. Most of the effort is in converting the raw tool calls into the expected format.

    ```python theme={null}
    from typing import Any

    from docent.data_models import AgentRun, Transcript
    from docent.data_models.chat import ChatMessage, ToolCall, parse_chat_message

    def load_tau_bench_log(data: dict[str, Any]) -> AgentRun:
        traj, info, reward = data["traj"], data["info"], data["reward"]

        messages: list[ChatMessage] = []
        for msg in traj:
            # Extract raw message data
            role = msg.get("role")
            content = msg.get("content", "")
            raw_tool_calls = msg.get("tool_calls")
            tool_call_id = msg.get("tool_call_id")

            # Create a message data dictionary
            message_data = {
                "role": role,
                "content": content,
            }

            # For tool messages, include the tool name and the ID of the call being answered
            if role == "tool":
                message_data["name"] = msg.get("name")
                message_data["tool_call_id"] = tool_call_id

            # For assistant messages, include tool calls if present
            if role == "assistant" and raw_tool_calls:
                # Convert tool calls to the expected format
                parsed_tool_calls: list[ToolCall] = []
                for tc in raw_tool_calls:
                    tool_call = ToolCall(
                        id=tc.get("id"),
                        function=tc.get("function", {}).get("name"),
                        arguments=tc.get("function", {}).get("arguments", {}),
                        type="function",
                        parse_error=None,
                    )
                    parsed_tool_calls.append(tool_call)

                message_data["tool_calls"] = parsed_tool_calls

            # Parse the message into the appropriate type
            chat_message = parse_chat_message(message_data)
            messages.append(chat_message)

        # Extract metadata from the sample
        task_id = info["task"]["user_id"]
        scores = {"reward": round(reward, 3)}

        # Build metadata
        metadata = {
            "benchmark_id": task_id,
            "task_id": task_id,
            "model": "sonnet-35-new",
            "scores": scores,
            "additional_metadata": info,
            "scoring_metadata": info["reward_info"],
        }

        # Create the transcript and wrap in AgentRun
        transcript = Transcript(
            messages=messages,
            metadata=metadata,
        )
        agent_run = AgentRun(
            transcripts=[transcript],
            metadata=metadata,
        )

        return agent_run
    ```

    Now load the single run and print its text representation.

    ```python theme={null}
    agent_runs = [load_tau_bench_log(tb_log)]
    print(agent_runs[0].text)
    ```
  </Tab>

  <Tab title="Inspect AI logs">
    You can upload Inspect files directly into Docent! After making a collection on the website, just click "Add Data" and then "Upload Inspect Log".

    You can also add Inspect logs via the SDK; the rest of this tab walks through an example.

    Our [ChatMessage](/concepts/chat-messages) schema is compatible with Inspect AI's format (as of `inspect-ai==0.3.93`), which means you can directly use the `parse_chat_message` function to parse Inspect messages.

    Docent ships with a sample Inspect log file, generated by running GPT-4o on a subset of the Intercode CTF benchmark.

    First install [Inspect](https://inspect.aisi.org.uk/):

    <CodeGroup>
      ```bash uv theme={null}
      uv add inspect-ai
      ```

      ```bash pip theme={null}
      pip install inspect-ai
      ```
    </CodeGroup>

    Inspect provides a library function to read the log; we can convert it to a dictionary for easier viewing.

    ```python theme={null}
    from docent.samples import get_inspect_fpath
    from inspect_ai.log import read_eval_log
    from pydantic_core import to_jsonable_python

    ctf_log = read_eval_log(get_inspect_fpath())
    ctf_log_dict = to_jsonable_python(ctf_log)
    ```

    Now we can write a function that takes the Inspect log and converts it into an [AgentRun](/concepts/agent-run) object.

    ```python theme={null}
    from inspect_ai.log import EvalLog
    from docent.data_models import AgentRun, Transcript
    from docent.data_models.chat import parse_chat_message

    def load_inspect_log(log: EvalLog) -> list[AgentRun]:
        if log.samples is None:
            return []

        agent_runs: list[AgentRun] = []

        for s in log.samples:
            # Identify the sample and the epoch it belongs to
            sample_id = s.id
            epoch_id = s.epoch

            # Gather scores
            scores: dict[str, int | float | bool] = {}

            # Evaluate correctness (for this CTF benchmark)
            if s.scores and "includes" in s.scores:
                scores["correct"] = s.scores["includes"].value == "C"

            # Set metadata
            metadata = {
                "task_id": log.eval.task,
                "sample_id": str(sample_id),
                "epoch_id": epoch_id,
                "model": log.eval.model,
                "scores": scores,
                "additional_metadata": s.metadata,
                "scoring_metadata": s.scores,
            }

            # Wrap the parsed messages in a Transcript and an AgentRun
            agent_runs.append(
                AgentRun(
                    transcripts=[
                        Transcript(
                            messages=[parse_chat_message(m.model_dump()) for m in s.messages]
                        )
                    ],
                    metadata=metadata,
                )
            )

        return agent_runs
    ```

    Let's check on our loaded run:

    ```python theme={null}
    agent_runs = load_inspect_log(ctf_log)
    print(agent_runs[0].text)
    ```
  </Tab>
</Tabs>
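
The τ-Bench example passes each tool call's `arguments` through as found in the log. In OpenAI's chat format, `function.arguments` is a JSON-encoded string rather than a dict. If your logs follow that convention, a helper along these lines could decode the arguments before you build each `ToolCall`. This is a sketch, not part of the SDK: `normalize_tool_call` is a hypothetical name, and the choice to keep undecodable text under a `raw` key is an assumption.

```python
import json
from typing import Any


def normalize_tool_call(raw: dict[str, Any]) -> dict[str, Any]:
    """Flatten an OpenAI-style tool call and decode string-encoded arguments."""
    function = raw.get("function") or {}
    arguments = function.get("arguments", {})
    parse_error = None

    if isinstance(arguments, str):
        try:
            # An empty string is treated as "no arguments"
            arguments = json.loads(arguments) if arguments.strip() else {}
        except json.JSONDecodeError as exc:
            # Keep the raw text so nothing is lost, and record why decoding failed
            parse_error = str(exc)
            arguments = {"raw": arguments}

    if not isinstance(arguments, dict):
        # Valid JSON that is not an object (e.g. a list) is wrapped in a dict
        arguments = {"value": arguments}

    return {
        "id": raw.get("id"),
        "function": function.get("name"),
        "arguments": arguments,
        "type": "function",
        "parse_error": parse_error,
    }
```

The returned dict uses the same field names as the `ToolCall(...)` call in the τ-Bench tab, so it could be passed as `ToolCall(**normalize_tool_call(tc))`, assuming `ToolCall` expects a dict for `arguments`.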

## Upload the runs

```python theme={null}
client.add_agent_runs(collection_id, agent_runs)
```

If you navigate to the frontend URL printed by `client.create_collection(...)`, you should see the uploaded runs available for viewing.

<Note>
  Docent assigns the `id` field on `AgentRun`, `Transcript`, and `TranscriptGroup` automatically. You cannot set these IDs yourself — reassigning `id` after construction raises a `ValueError`, and the upload path rejects payloads whose IDs were set by the caller (for example, runs round-tripped through `client.get_agent_run(...)` or loaded from a JSON dump).

  To wire references between objects in the same upload, construct the parent first and read its assigned `id`:

  ```python theme={null}
  group = TranscriptGroup(agent_run_id=run.id)
  transcript = Transcript(transcript_group_id=group.id, messages=[...])
  ```

  To re-upload runs that already have IDs, regenerate them first with the clone helper:

  ```python theme={null}
  from docent import clone_agent_runs_with_random_ids

  agent_runs = clone_agent_runs_with_random_ids(agent_runs)
  client.add_agent_runs(collection_id, agent_runs)
  ```

  A single-run variant, `clone_agent_run_with_random_ids`, is also exported from `docent`.
</Note>

## Tips and tricks

### Including sufficient context

Docent can only catch issues that are evident from the context it has about your evaluation. For example:

* If you're looking to catch issues with solution labels, you should provide the exact label in the metadata, not just the agent's score.
* For software engineering tasks, if you want to know *why* agents failed, you should include information about what tests were run and their traceback/execution logs.
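
As a rough illustration of both points, the metadata for a software engineering run might carry the expected label and the test output alongside the score. The helper `build_swe_metadata` and every field name other than `model` and `scores` are hypothetical; adapt them to whatever your evaluation actually records.

```python
from typing import Any


def build_swe_metadata(
    model: str,
    passed: bool,
    expected_label: str,
    tests_run: list[str],
    test_log: str,
) -> dict[str, Any]:
    """Collect the score together with the context needed to explain it."""
    return {
        "model": model,
        "scores": {"passed": passed},
        # The exact solution label, not just whether the agent matched it
        "expected_label": expected_label,
        # Which tests were executed and what they printed, including tracebacks
        "tests_run": tests_run,
        "test_log": test_log,
    }
```

The resulting dict can be passed as the `metadata` argument when constructing an `AgentRun`, as in the examples above.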