Agent Run

An AgentRun represents a complete agent run. It contains a collection of Transcript objects, as well as metadata (scores, experiment info, etc.).

  • In single-agent (most common) settings, each AgentRun contains a single Transcript.
  • In multi-agent settings, an AgentRun may contain multiple Transcript objects. For example, in a two-agent debate setting, you'll have one Transcript per agent in the same AgentRun.
  • Docent's LLM search features operate over complete AgentRun objects. Runs are passed to LLMs in their .text form.

Usage

AgentRun objects require a list of Transcript objects, plus an optional metadata dictionary whose keys are strings. The metadata should be JSON-serializable.

from docent.data_models import AgentRun, Transcript
from docent.data_models.chat import UserMessage, AssistantMessage

transcripts = [
    Transcript(
        messages=[
            UserMessage(content="Hello, what's 1 + 1?"),
            AssistantMessage(content="2"),
        ]
    )
]

agent_run = AgentRun(
    transcripts=transcripts,
    metadata={
        "scores": {"correct": True, "reward": 1.0},
    }
)
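
In a multi-agent setting, such as the two-agent debate mentioned above, you can pass several Transcript objects in the same list. A minimal sketch (the message contents and metadata keys here are illustrative):

debater_a = Transcript(
    messages=[
        UserMessage(content="Argue for the motion."),
        AssistantMessage(content="I argue in favor because..."),
    ]
)
debater_b = Transcript(
    messages=[
        UserMessage(content="Argue against the motion."),
        AssistantMessage(content="I argue against because..."),
    ]
)

debate_run = AgentRun(
    transcripts=[debater_a, debater_b],
    metadata={"setting": "two-agent debate"},
)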

Rendering

To see how your AgentRun is being rendered to an LLM, you can print(agent_run.text). This might be useful for validating that your metadata is being included properly.
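
For example, continuing from the Usage snippet above:

print(agent_run.text)

The output wraps each transcript in <transcript> tags and appends an <agent run metadata> block containing the YAML-serialized metadata (plus the run's name and description, if set).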

docent.data_models.agent_run

AgentRun

Bases: BaseModel

Represents a complete run of an agent with transcripts and metadata.

An AgentRun encapsulates the execution of an agent, storing all communication transcripts and associated metadata. It must contain at least one transcript.

Attributes:

  • id (str): Unique identifier for the agent run, auto-generated by default.
  • name (str | None): Optional human-readable name for the agent run.
  • description (str | None): Optional description of the agent run.
  • transcripts (list[Transcript]): List of Transcript objects.
  • transcript_groups (list[TranscriptGroup]): List of TranscriptGroup objects.
  • metadata (dict[str, Any]): Additional structured metadata about the agent run as a JSON-serializable dictionary.

Source code in docent/data_models/agent_run.py
class AgentRun(BaseModel):
    """Represents a complete run of an agent with transcripts and metadata.

    An AgentRun encapsulates the execution of an agent, storing all communication
    transcripts and associated metadata. It must contain at least one transcript.

    Attributes:
        id: Unique identifier for the agent run, auto-generated by default.
        name: Optional human-readable name for the agent run.
        description: Optional description of the agent run.
        transcripts: List of Transcript objects.
        transcript_groups: List of TranscriptGroup objects.
        metadata: Additional structured metadata about the agent run as a JSON-serializable dictionary.
    """

    id: str = Field(default_factory=lambda: str(uuid4()))
    name: str | None = None
    description: str | None = None

    transcripts: list[Transcript]
    transcript_groups: list[TranscriptGroup] = Field(default_factory=list)
    metadata: dict[str, Any] = Field(default_factory=dict)

    @field_validator("transcripts", mode="before")
    @classmethod
    def _validate_transcripts_type(cls, v: Any) -> Any:
        if isinstance(v, dict):
            logger.warning(
                "dict[str, Transcript] for transcripts is deprecated. Use list[Transcript] instead."
            )
            v = cast(dict[str, Transcript], v)
            return [Transcript.model_validate(t) for t in v.values()]
        return v

    @field_validator("transcript_groups", mode="before")
    @classmethod
    def _validate_transcript_groups_type(cls, v: Any) -> Any:
        if isinstance(v, dict):
            logger.warning(
                "dict[str, TranscriptGroup] for transcript_groups is deprecated. Use list[TranscriptGroup] instead."
            )
            v = cast(dict[str, TranscriptGroup], v)
            return [TranscriptGroup.model_validate(tg) for tg in v.values()]
        return v

    @model_validator(mode="after")
    def _validate_transcripts_not_empty(self):
        """Validates that the agent run contains at least one transcript.

        Raises:
            ValueError: If the transcripts list is empty.

        Returns:
            AgentRun: The validated AgentRun instance.
        """
        if len(self.transcripts) == 0:
            raise ValueError("AgentRun must have at least one transcript")
        return self

    def get_filterable_fields(self, max_depth: int = 1) -> list[FilterableField]:
        """Returns a list of all fields that can be used to filter the agent run,
        by recursively exploring the model_dump() for singleton types in dictionaries.

        Returns:
            list[FilterableField]: A list of filterable fields, where each field is a
                                   dictionary containing its 'name' (path) and 'type'.
        """

        result: list[FilterableField] = []

        def _explore_dict(d: dict[str, Any], prefix: str, depth: int):
            nonlocal result

            if depth > max_depth:
                return

            for k, v in d.items():
                if isinstance(v, (str, int, float, bool)):
                    result.append(
                        {
                            "name": f"{prefix}.{k}",
                            "type": cast(Literal["str", "bool", "int", "float"], type(v).__name__),
                        }
                    )
                elif isinstance(v, dict):
                    _explore_dict(cast(dict[str, Any], v), f"{prefix}.{k}", depth + 1)

        # Look at the agent run metadata
        _explore_dict(to_jsonable_python(self.metadata), "metadata", 0)
        # Look at the transcript metadata
        # TODO(mengk): restore this later when we have the ability to integrate with SQL.
        # for t_id, t in self.transcripts.items():
        #     _explore_dict(
        #         t.metadata.model_dump(strip_internal_fields=True), f"transcript.{t_id}.metadata", 0
        #     )

        # Append the text field
        result.append({"name": "text", "type": "str"})

        return result

    ######################
    # Converting to text #
    ######################

    def _to_text_impl(self, token_limit: int = sys.maxsize, use_blocks: bool = False) -> list[str]:
        """
        Core implementation for converting agent run to text representation.

        Args:
            token_limit: Maximum tokens per returned string under the GPT-4 tokenization scheme
            use_blocks: If True, use individual message blocks. If False, use action units.

        Returns:
            List of strings, each at most token_limit tokens
        """
        # Generate transcript strings using appropriate method
        transcript_strs: list[str] = []
        for i, t in enumerate(self.transcripts):
            if use_blocks:
                transcript_content = t.to_str_blocks_with_token_limit(
                    token_limit=sys.maxsize,
                    transcript_idx=i,
                    agent_run_idx=None,
                )[0]
            else:
                transcript_content = t.to_str_with_token_limit(
                    token_limit=sys.maxsize,
                    transcript_idx=i,
                    agent_run_idx=None,
                )[0]
            transcript_strs.append(f"<transcript>\n{transcript_content}\n</transcript>")

        transcripts_str = "\n\n".join(transcript_strs)

        # Gather metadata
        metadata_obj = to_jsonable_python(self.metadata)
        if self.name is not None:
            metadata_obj["name"] = self.name
        if self.description is not None:
            metadata_obj["description"] = self.description

        yaml_width = float("inf")
        transcripts_str = (
            f"Here is a complete agent run for analysis purposes only:\n{transcripts_str}\n\n"
        )
        metadata_str = f"Metadata about the complete agent run:\n<agent run metadata>\n{yaml.dump(metadata_obj, width=yaml_width)}\n</agent run metadata>"

        if token_limit == sys.maxsize:
            return [f"{transcripts_str}" f"{metadata_str}"]

        # Compute message length; if fits, return the full transcript and metadata
        transcript_str_tokens = get_token_count(transcripts_str)
        metadata_str_tokens = get_token_count(metadata_str)
        if transcript_str_tokens + metadata_str_tokens <= token_limit:
            return [f"{transcripts_str}" f"{metadata_str}"]

        # Otherwise, split up the transcript and metadata into chunks
        else:
            results: list[str] = []
            transcript_token_counts = [get_token_count(t) for t in transcript_strs]
            ranges = group_messages_into_ranges(
                transcript_token_counts, metadata_str_tokens, token_limit - 50
            )
            for msg_range in ranges:
                if msg_range.include_metadata:
                    cur_transcript_str = "\n\n".join(
                        transcript_strs[msg_range.start : msg_range.end]
                    )
                    results.append(
                        f"Here is a partial agent run for analysis purposes only:\n{cur_transcript_str}"
                        f"{metadata_str}"
                    )
                else:
                    assert (
                        msg_range.end == msg_range.start + 1
                    ), "Ranges without metadata should be a single message"
                    t = self.transcripts[msg_range.start]
                    if msg_range.num_tokens < token_limit - 50:
                        if use_blocks:
                            transcript = f"<transcript>\n{t.to_str_blocks_with_token_limit(token_limit=sys.maxsize)[0]}\n</transcript>"
                        else:
                            transcript = f"<transcript>\n{t.to_str_with_token_limit(token_limit=sys.maxsize)[0]}\n</transcript>"
                        result = (
                            f"Here is a partial agent run for analysis purposes only:\n{transcript}"
                        )
                        results.append(result)
                    else:
                        if use_blocks:
                            transcript_fragments = t.to_str_blocks_with_token_limit(
                                token_limit=token_limit - 50,
                            )
                        else:
                            transcript_fragments = t.to_str_with_token_limit(
                                token_limit=token_limit - 50,
                            )
                        for fragment in transcript_fragments:
                            result = f"<transcript>\n{fragment}\n</transcript>"
                            result = (
                                f"Here is a partial agent run for analysis purposes only:\n{result}"
                            )
                            results.append(result)
            return results

    def to_text(self, token_limit: int = sys.maxsize) -> list[str]:
        """
        Represents an agent run as a list of strings, each of which is at most token_limit tokens
        under the GPT-4 tokenization scheme.

        We'll try to split up long AgentRuns along transcript boundaries and include metadata.
        For very long transcripts, we'll have to split them up further and remove metadata.
        """
        return self._to_text_impl(token_limit=token_limit, use_blocks=False)

    def to_text_blocks(self, token_limit: int = sys.maxsize) -> list[str]:
        """
        Represents an agent run as a list of strings using individual message blocks,
        each of which is at most token_limit tokens under the GPT-4 tokenization scheme.

        Unlike to_text() which uses action units, this method formats each message
        as an individual block.
        """
        return self._to_text_impl(token_limit=token_limit, use_blocks=True)

    @property
    def text(self) -> str:
        """Concatenates all transcript texts with double newlines as separators.

        Returns:
            str: A string representation of all transcripts.
        """
        return self._to_text_impl(token_limit=sys.maxsize, use_blocks=False)[0]

    @property
    def text_blocks(self) -> str:
        """Concatenates all transcript texts using individual blocks format.

        Returns:
            str: A string representation of all transcripts using individual message blocks.
        """
        return self._to_text_impl(token_limit=sys.maxsize, use_blocks=True)[0]

    ##############################
    # New text rendering methods #
    ##############################

    # Transcript ID -> Transcript
    _transcript_dict: dict[str, Transcript] | None = PrivateAttr(default=None)
    # Transcript Group ID -> Transcript Group
    _transcript_group_dict: dict[str, TranscriptGroup] | None = PrivateAttr(default=None)
    # Canonical tree cache keyed by full_tree flag
    _canonical_tree_cache: dict[bool, dict[str | None, list[tuple[Literal["t", "tg"], str]]]] = (
        PrivateAttr(default_factory=dict)
    )
    # Transcript IDs (depth-first) cache keyed by full_tree flag
    _transcript_ids_ordered_cache: dict[bool, list[str]] = PrivateAttr(default_factory=dict)

    @property
    def transcript_dict(self) -> dict[str, Transcript]:
        """Lazily compute and cache a mapping from transcript ID to Transcript."""
        if self._transcript_dict is None:
            self._transcript_dict = {t.id: t for t in self.transcripts}
        return self._transcript_dict

    @property
    def transcript_group_dict(self) -> dict[str, TranscriptGroup]:
        """Lazily compute and cache a mapping from transcript group ID to TranscriptGroup."""
        if self._transcript_group_dict is None:
            self._transcript_group_dict = {tg.id: tg for tg in self.transcript_groups}
        return self._transcript_group_dict

    def get_canonical_tree(
        self, full_tree: bool = False
    ) -> dict[str | None, list[tuple[Literal["t", "tg"], str]]]:
        """Compute and cache the canonical, sorted transcript group tree.

        Args:
            full_tree: If True, include all transcript groups regardless of whether
                they contain transcripts. If False, include only the minimal tree
                that connects relevant groups and transcripts.

        Returns:
            Canonical tree mapping parent group id (or "__global_root") to a list of
            children (type, id) tuples sorted by creation time.
        """
        if (
            full_tree not in self._canonical_tree_cache
            or full_tree not in self._transcript_ids_ordered_cache
        ):
            canonical_tree, transcript_idx_map = self._build_canonical_tree(full_tree=full_tree)
            self._canonical_tree_cache[full_tree] = canonical_tree
            self._transcript_ids_ordered_cache[full_tree] = list(transcript_idx_map.keys())
        return self._canonical_tree_cache[full_tree]

    def get_transcript_ids_ordered(self, full_tree: bool = False) -> list[str]:
        """Compute and cache the depth-first transcript id ordering.

        Args:
            full_tree: Whether to compute based on the full tree or the minimal tree.

        Returns:
            List of transcript ids in depth-first order.
        """
        if (
            full_tree not in self._transcript_ids_ordered_cache
            or full_tree not in self._canonical_tree_cache
        ):
            canonical_tree, transcript_idx_map = self._build_canonical_tree(full_tree=full_tree)
            self._canonical_tree_cache[full_tree] = canonical_tree
            self._transcript_ids_ordered_cache[full_tree] = list(transcript_idx_map.keys())
        return self._transcript_ids_ordered_cache[full_tree]

    def _build_canonical_tree(self, full_tree: bool = False):
        t_dict = self.transcript_dict
        tg_dict = self.transcript_group_dict

        # Find all transcript groups that have direct transcript children
        # Also keep track of transcripts that are not in a group
        tgs_to_transcripts: dict[str, set[str]] = {}
        for transcript in t_dict.values():
            if transcript.transcript_group_id is None:
                tgs_to_transcripts.setdefault("__global_root", set()).add(transcript.id)
            else:
                tgs_to_transcripts.setdefault(transcript.transcript_group_id, set()).add(
                    transcript.id
                )

        # tg_tree maps from parent -> children. A child can be a group or a transcript.
        #   A parent must be a group (or None, for transcripts that are not in a group).
        tg_tree: dict[str, set[tuple[Literal["t", "tg"], str]]] = {}

        if full_tree:
            for tg_id, tg in tg_dict.items():
                tg_tree.setdefault(tg.parent_transcript_group_id or "__global_root", set()).add(
                    ("tg", tg_id)
                )
                for t_id in tgs_to_transcripts.get(tg_id, []):
                    tg_tree.setdefault(tg_id, set()).add(("t", t_id))
            for t_id, t in t_dict.items():
                tg_tree.setdefault(t.transcript_group_id or "__global_root", set()).add(("t", t_id))
        else:
            # Initialize q with "important" tgs
            q, seen = Queue[str](), set[str]()
            for tg_id in tgs_to_transcripts.keys():
                q.put(tg_id)
                seen.add(tg_id)

            # Do an "upwards BFS" from leaves up to the root. Builds a tree of only relevant nodes.
            while q.qsize() > 0:
                u_id = q.get()
                u = tg_dict.get(u_id)  # None if __global_root

                # Add the transcripts under this tg
                for t_id in tgs_to_transcripts.get(u_id, []):
                    tg_tree.setdefault(u_id, set()).add(("t", t_id))

                # Add an edge from the parent
                if u is not None:
                    par_id = u.parent_transcript_group_id or "__global_root"
                    # Mark u as a child of par
                    tg_tree.setdefault(par_id, set()).add(("tg", u_id))
                    # If we haven't investigated the parent before, add to q
                    if par_id not in seen:
                        q.put(par_id)
                        seen.add(par_id)

        # For each node, sort by created_at timestamp

        def _cmp(element: tuple[Literal["t", "tg"], str]) -> datetime:
            obj_type, obj_id = element
            if obj_type == "tg":
                return tg_dict[obj_id].created_at or datetime.max
            else:
                return t_dict[obj_id].created_at or datetime.max

        c_tree: dict[str | None, list[tuple[Literal["t", "tg"], str]]] = {}
        for tg_id in tg_tree:
            children_ids = list(set(tg_tree[tg_id]))
            sorted_children_ids = sorted(children_ids, key=_cmp)
            c_tree[tg_id] = sorted_children_ids

        # Compute transcript indices as the depth-first traversal index
        transcript_idx_map: dict[str, int] = {}

        def _assign_transcript_indices(cur_tg_id: str, next_idx: int) -> int:
            children = c_tree.get(cur_tg_id, [])
            for child_type, child_id in children:
                if child_type == "tg":
                    next_idx = _assign_transcript_indices(child_id, next_idx)
                else:
                    transcript_idx_map[child_id] = next_idx
                    next_idx += 1
            return next_idx

        _assign_transcript_indices("__global_root", 0)

        return c_tree, transcript_idx_map

    def to_text_new(self, indent: int = 0, full_tree: bool = False):
        c_tree = self.get_canonical_tree(full_tree=full_tree)
        t_ids_ordered = self.get_transcript_ids_ordered(full_tree=full_tree)
        t_idx_map = {t_id: i for i, t_id in enumerate(t_ids_ordered)}
        t_dict = self.transcript_dict
        tg_dict = self.transcript_group_dict

        # Traverse the tree and render the string
        def _recurse(tg_id: str) -> str:
            children_ids = c_tree.get(tg_id, [])
            children_texts: list[str] = []
            for child_type, child_id in children_ids:
                if child_type == "tg":
                    children_texts.append(_recurse(child_id))
                else:
                    cur_text = t_dict[child_id].to_text_new(
                        transcript_idx=t_idx_map[child_id],
                        indent=indent,
                    )
                    children_texts.append(cur_text)
            children_text = "\n".join(children_texts)

            # No wrapper for global root
            if tg_id == "__global_root":
                return children_text
            # Delegate rendering to TranscriptGroup
            else:
                tg = tg_dict[tg_id]
                return tg.to_text_new(children_text=children_text, indent=indent)

        text = _recurse("__global_root")

        # Append agent run metadata below the full content
        yaml_text = yaml_dump_metadata(self.metadata)
        if yaml_text is not None:
            if indent > 0:
                yaml_text = textwrap.indent(yaml_text, " " * indent)
            text += f"\n<|agent run metadata|>\n{yaml_text}\n</|agent run metadata|>"

        return text

text property

text: str

Concatenates all transcript texts with double newlines as separators.

Returns:

  • str: A string representation of all transcripts.

text_blocks property

text_blocks: str

Concatenates all transcript texts using individual blocks format.

Returns:

  • str: A string representation of all transcripts using individual message blocks.

transcript_dict property

transcript_dict: dict[str, Transcript]

Lazily compute and cache a mapping from transcript ID to Transcript.

transcript_group_dict property

transcript_group_dict: dict[str, TranscriptGroup]

Lazily compute and cache a mapping from transcript group ID to TranscriptGroup.

get_filterable_fields

get_filterable_fields(max_depth: int = 1) -> list[FilterableField]

Returns a list of all fields that can be used to filter the agent run, by recursively exploring the model_dump() for singleton types in dictionaries.

Returns:

  • list[FilterableField]: A list of filterable fields, where each field is a dictionary containing its 'name' (path) and 'type'.
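
For the agent_run constructed in the Usage section above, this yields roughly the following (a sketch; ordering follows the metadata keys):

fields = agent_run.get_filterable_fields()
# [{'name': 'metadata.scores.correct', 'type': 'bool'},
#  {'name': 'metadata.scores.reward', 'type': 'float'},
#  {'name': 'text', 'type': 'str'}]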


to_text

to_text(token_limit: int = maxsize) -> list[str]

Represents an agent run as a list of strings, each of which is at most token_limit tokens under the GPT-4 tokenization scheme.

We'll try to split up long AgentRuns along transcript boundaries and include metadata. For very long transcripts, we'll have to split them up further and remove metadata.
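
A short sketch, assuming the agent_run from the Usage section:

# Short runs fit in one chunk; long runs are split along transcript boundaries
chunks = agent_run.to_text(token_limit=8000)
print(len(chunks))  # 1 for this small example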


to_text_blocks

to_text_blocks(token_limit: int = maxsize) -> list[str]

Represents an agent run as a list of strings using individual message blocks, each of which is at most token_limit tokens under the GPT-4 tokenization scheme.

Unlike to_text() which uses action units, this method formats each message as an individual block.


get_canonical_tree

get_canonical_tree(full_tree: bool = False) -> dict[str | None, list[tuple[Literal['t', 'tg'], str]]]

Compute and cache the canonical, sorted transcript group tree.

Parameters:

  • full_tree (bool, default False): If True, include all transcript groups regardless of whether they contain transcripts. If False, include only the minimal tree that connects relevant groups and transcripts.

Returns:

  • dict[str | None, list[tuple[Literal['t', 'tg'], str]]]: Canonical tree mapping parent group id (or "__global_root") to a list of children (type, id) tuples sorted by creation time.
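
For a run whose transcripts belong to no transcript group (as in the Usage example), the minimal tree is a single level under the global root. A sketch, with the transcript id shown as a placeholder:

tree = agent_run.get_canonical_tree()
# {'__global_root': [('t', '<transcript-id>')]}

ordered = agent_run.get_transcript_ids_ordered()
# ['<transcript-id>']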


get_transcript_ids_ordered

get_transcript_ids_ordered(full_tree: bool = False) -> list[str]

Compute and cache the depth-first transcript id ordering.

Parameters:

  • full_tree (bool, default False): Whether to compute based on the full tree or the minimal tree.

Returns:

  • list[str]: List of transcript ids in depth-first order.
