Transcript

A Transcript object represents a sequence of chat messages (user, assistant, system, tool) from the perspective of a single agent. See the chat message documentation for more details on the message schemas.

TranscriptGroup

Bases: BaseModel

Represents a group of transcripts that are logically related. A transcript group can contain multiple transcripts and can have a hierarchical structure with parent groups. This is useful for organizing transcripts into logical units like experiments, tasks, or sessions.

Attributes:

Name	Type	Description
`id`	`str`	Unique identifier, auto-generated; cannot be set by callers.
`name`	`str | None`	Optional human-readable name for the transcript group.
`description`	`str | None`	Optional description of the transcript group.
`agent_run_id`	`str`	ID of the agent run this transcript group belongs to.
`parent_transcript_group_id`	`str | None`	Optional ID of the parent transcript group.
`created_at`	`datetime | None`	Optional creation timestamp. Both naive and timezone-aware `datetime` values are accepted; tz-aware values are converted to UTC on ingest and stored as naive UTC. Leave as `None` to let the server assign one.
`metadata`	`dict[str, Any]`	Additional structured metadata about the transcript group.

docent/data_models/transcript.py

class TranscriptGroup(BaseModel):
    """Represents a group of transcripts that are logically related.

    A transcript group can contain multiple transcripts and can have a hierarchical
    structure with parent groups. This is useful for organizing transcripts into
    logical units like experiments, tasks, or sessions.

    Attributes:
        id: Unique identifier, auto-generated; cannot be set by callers.
        name: Optional human-readable name for the transcript group.
        description: Optional description of the transcript group.
        agent_run_id: ID of the agent run this transcript group belongs to.
        parent_transcript_group_id: Optional ID of the parent transcript group.
        metadata: Additional structured metadata about the transcript group.
    """

    id: str = Field(default_factory=lambda: str(uuid4()), frozen=True)
    name: str | None = None
    description: str | None = None
    agent_run_id: str
    parent_transcript_group_id: str | None = None
    created_at: datetime | None = None
    metadata: dict[str, Any] = Field(default_factory=dict)

    def __setattr__(self, name: str, value: Any) -> None:
        if name == "id":
            raise ValueError(
                "Cannot set `id` on TranscriptGroup. Docent assigns IDs automatically; "
                "the assigned value is already available as `group.id`."
            )
        super().__setattr__(name, value)

    def to_text(self, children_text: str, indent: int = 0, render_metadata: bool = True) -> str:
        """Render this transcript group with its children and metadata.

        Metadata appears below the rendered children content.

        Args:
            children_text: Pre-rendered text of this group's children (groups/transcripts).
            indent: Number of spaces to indent the rendered output.
            render_metadata: Whether to include metadata in the output.

        Returns:
            str: XML-like wrapped text including the group's metadata.
        """
        # Prepare YAML metadata
        if render_metadata:
            metadata_text = dump_metadata(self.metadata)
            if metadata_text is not None:
                if indent > 0:
                    metadata_text = textwrap.indent(metadata_text, " " * indent)
                inner = f"{children_text}\n<|{self.name} metadata|>\n{metadata_text}\n</|{self.name} metadata|>"
            else:
                inner = children_text
        else:
            inner = children_text

        # Compose final text: content first, then metadata, all inside the group wrapper
        if indent > 0:
            inner = textwrap.indent(inner, " " * indent)
        return f"<|{self.name}|>\n{inner}\n</|{self.name}|>"

to_text

to_text(children_text: str, indent: int = 0, render_metadata: bool = True) -> str

Render this transcript group with its children and metadata. Metadata appears below the rendered children content.

Parameters:

Name	Type	Description	Default
`children_text`	`str`	Pre-rendered text of this group’s children (groups/transcripts).	required
`indent`	`int`	Number of spaces to indent the rendered output.	`0`
`render_metadata`	`bool`	Whether to include metadata in the output.	`True`

Returns:

Type	Description
`str`	XML-like wrapped text containing the children and, when rendered, the group’s metadata.

Transcript

Bases: BaseModel

Represents a transcript of messages in a conversation with an AI agent. A transcript contains a sequence of messages exchanged between different roles (system, user, assistant, tool) and provides methods to organize these messages into logical units of action.

Attributes:

Name	Type	Description
`id`	`str`	Unique identifier, auto-generated; cannot be set by callers.
`name`	`str | None`	Optional human-readable name for the transcript.
`description`	`str | None`	Optional description of the transcript.
`transcript_group_id`	`str | None`	Optional ID of the transcript group this transcript belongs to.
`created_at`	`datetime | None`	Optional creation timestamp. Both naive and timezone-aware `datetime` values are accepted; tz-aware values are converted to UTC on ingest and stored as naive UTC. Leave as `None` to let the server assign one.
`messages`	`list[ChatMessage]`	List of chat messages in the transcript.
`metadata`	`dict[str, Any]`	Additional structured metadata about the transcript.

docent/data_models/transcript.py

class Transcript(BaseModel):
    """Represents a transcript of messages in a conversation with an AI agent.

    A transcript contains a sequence of messages exchanged between different roles
    (system, user, assistant, tool) and provides methods to organize these messages
    into logical units of action.

    Attributes:
        id: Unique identifier, auto-generated; cannot be set by callers.
        name: Optional human-readable name for the transcript.
        description: Optional description of the transcript.
        transcript_group_id: Optional ID of the transcript group this transcript belongs to.
        messages: List of chat messages in the transcript.
        metadata: Additional structured metadata about the transcript.
    """

    id: str = Field(default_factory=lambda: str(uuid4()), frozen=True)
    name: str | None = None
    description: str | None = None
    transcript_group_id: str | None = None
    created_at: datetime | None = None

    messages: list[ChatMessage]
    metadata: dict[str, Any] = Field(default_factory=dict)

    def __setattr__(self, name: str, value: Any) -> None:
        if name == "id":
            raise ValueError(
                "Cannot set `id` on Transcript. Docent assigns IDs automatically; "
                "the assigned value is already available as `transcript.id`."
            )
        super().__setattr__(name, value)

    def _enumerate_messages(self) -> Iterable[tuple[int, ChatMessage]]:
        """Yield (index, message) tuples for rendering.

        Override in subclasses to customize index assignment.
        """
        return enumerate(self.messages)

    def to_text(
        self,
        transcript_alias: int | str = 0,
        indent: int = 0,
        render_metadata: bool = True,
        transcript_metadata_comments: list[Comment] | None = None,
        block_metadata_comments: dict[int, list[Comment]] | None = None,
        block_content_comments: dict[int, list[Comment]] | None = None,
    ) -> str:
        """Render this transcript as formatted text with optional comments.

        Args:
            transcript_alias: Identifier for the transcript (e.g., 0 becomes "T0").
            indent: Number of spaces to indent nested content.
            render_metadata: Whether to include transcript metadata in the output.
            transcript_metadata_comments: Comments on this transcript's metadata.
                Rendered after the transcript metadata block.
            block_metadata_comments: Mapping from block index to comments on that
                block's metadata. Keyed by block index because comments need to be
                rendered inline with each block at the correct position.
            block_content_comments: Mapping from block index to comments on that
                block's content. Keyed by block index because comments need to be
                rendered inline with each block, and may include text range
                selections that highlight specific portions of the block content.

        Returns:
            Formatted text representation of the transcript.
        """
        if isinstance(transcript_alias, int):
            transcript_alias = f"T{transcript_alias}"

        # Format individual message blocks
        blocks: list[str] = []
        for msg_idx, message in self._enumerate_messages():
            block_label = f"{transcript_alias}B{msg_idx}"
            # Get block-level comments for this message index
            msg_metadata_comments = (
                block_metadata_comments.get(msg_idx) if block_metadata_comments else None
            )
            msg_content_comments = (
                block_content_comments.get(msg_idx) if block_content_comments else None
            )
            block_text = format_chat_message(
                message,
                block_label,
                block_metadata_comments=msg_metadata_comments,
                block_content_comments=msg_content_comments,
                indent=indent,
            )
            blocks.append(block_text)
        blocks_str = "\n".join(blocks)
        if indent > 0:
            blocks_str = textwrap.indent(blocks_str, " " * indent)

        content_str = f"<|{transcript_alias} blocks|>\n{blocks_str}\n</|{transcript_alias} blocks|>"

        # Gather metadata and add to content
        if render_metadata:
            metadata_text = dump_metadata(self.metadata)
            if metadata_text is not None:
                if indent > 0:
                    metadata_text = textwrap.indent(metadata_text, " " * indent)
                metadata_label = f"{transcript_alias}M"
                content_str += f"\n<|transcript metadata {metadata_label}|>\n{metadata_text}\n</|transcript metadata {metadata_label}|>"

            # Add transcript metadata comments after the metadata
            if transcript_metadata_comments:
                metadata_comments_text = render_metadata_comments(transcript_metadata_comments)
                if metadata_comments_text:
                    if indent > 0:
                        metadata_comments_text = textwrap.indent(
                            metadata_comments_text, " " * indent
                        )
                    content_str += f"\n<|transcript metadata comments|>\n{metadata_comments_text}\n</|transcript metadata comments|>"

        # Format content and return
        if indent > 0:
            content_str = textwrap.indent(content_str, " " * indent)
        return f"<|transcript {transcript_alias}|>\n{content_str}\n</|transcript {transcript_alias}|>\n"
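`_enumerate_messages` is the hook that decides which index each message is rendered under, and therefore its `T0B<index>` block label and which entries of `block_metadata_comments` and `block_content_comments` it picks up. The sketch below uses hypothetical stand-in classes (`Msg`, `BaseTranscript`, `NoSystemTranscript`), not Docent's, to show a subclass that skips system messages while keeping each remaining message's original position as its index, so comments keyed by the original indices still line up.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Msg:
    # Hypothetical minimal message: just a role and text content
    role: str
    content: str


class BaseTranscript:
    def __init__(self, messages: list[Msg]) -> None:
        self.messages = messages

    def _enumerate_messages(self) -> Iterable[tuple[int, Msg]]:
        # Default behavior, as in Transcript: index = position in the list
        return enumerate(self.messages)

    def block_labels(self, alias: str = "T0") -> list[str]:
        # Labels follow the "{alias}B{index}" scheme used by Transcript.to_text
        return [f"{alias}B{idx}" for idx, _ in self._enumerate_messages()]


class NoSystemTranscript(BaseTranscript):
    def _enumerate_messages(self) -> Iterable[tuple[int, Msg]]:
        # Skip system messages but keep original indices for the rest
        return ((i, m) for i, m in enumerate(self.messages) if m.role != "system")


msgs = [Msg("system", "Be brief."), Msg("user", "Hi"), Msg("assistant", "Hello")]
print(BaseTranscript(msgs).block_labels())      # ['T0B0', 'T0B1', 'T0B2']
print(NoSystemTranscript(msgs).block_labels())  # ['T0B1', 'T0B2']
```

Renumbering the kept messages from 0 instead would also be valid, but then any comments stored against the original indices would attach to the wrong blocks.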

to_text

to_text(transcript_alias: int | str = 0, indent: int = 0, render_metadata: bool = True, transcript_metadata_comments: list[Comment] | None = None, block_metadata_comments: dict[int, list[Comment]] | None = None, block_content_comments: dict[int, list[Comment]] | None = None) -> str

Render this transcript as formatted text with optional comments.

Parameters:

Name	Type	Description	Default
`transcript_alias`	`int | str`	Identifier for the transcript (e.g., 0 becomes “T0”).	`0`
`indent`	`int`	Number of spaces to indent nested content.	`0`
`render_metadata`	`bool`	Whether to include transcript metadata in the output.	`True`
`transcript_metadata_comments`	`list[Comment] | None`	Comments on this transcript’s metadata. Rendered after the transcript metadata block.	`None`
`block_metadata_comments`	`dict[int, list[Comment]] | None`	Mapping from block index to comments on that block’s metadata. Keyed by block index because comments need to be rendered inline with each block at the correct position.	`None`
`block_content_comments`	`dict[int, list[Comment]] | None`	Mapping from block index to comments on that block’s content. Keyed by block index because comments need to be rendered inline with each block, and may include text range selections that highlight specific portions of the block content.	`None`

Returns:

Type	Description
`str`	Formatted text representation of the transcript.

render_metadata_comments

render_metadata_comments(comments: list[Comment]) -> str

Render metadata comments (agent run, transcript, or block metadata). For metadata comments, we render the key on which the comment was written and the user’s content.

TODO(mengk): known limitation: `text_range` selections are not highlighted, even when available. It is unclear whether the UI supports them, but this is noted for the backend.

Parameters:

Name	Type	Description	Default
`comments`	`list[Comment]`	List of Comment objects targeting metadata.	required

Returns:

Type	Description
`str`	Formatted string with all comments.
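The output format is one `<comment key="...">` line per comment, with the key taken from the first citation that targets metadata and `"unknown"` as the fallback. The sketch below reproduces that logic with hypothetical stand-in dataclasses: `MetadataItem` replaces `TranscriptMetadataItem` / `AgentRunMetadataItem`, and each comment's citations are flattened to their target items. None of these names are Docent's.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataItem:
    # Hypothetical stand-in for TranscriptMetadataItem / AgentRunMetadataItem
    metadata_key: str


@dataclass
class OtherItem:
    # Hypothetical stand-in for a citation target that is not metadata
    label: str


@dataclass
class Comment:
    content: str
    # In Docent each citation wraps a target with an `item`; flattened here to the item
    citation_items: list[object] = field(default_factory=list)


def render_metadata_comments_sketch(comments: list[Comment]) -> str:
    if not comments:
        return ""
    lines = []
    for comment in comments:
        key = "unknown"  # fallback when no citation points at a metadata key
        for item in comment.citation_items:
            if isinstance(item, MetadataItem):
                key = item.metadata_key
                break  # the first metadata citation wins
        lines.append(f'<comment key="{key}">{comment.content}</comment>')
    return "\n".join(lines)


print(render_metadata_comments_sketch([
    Comment("Seed looks wrong", [OtherItem("x"), MetadataItem("seed")]),
    Comment("General note"),
]))
```

As in the source, neither the key nor the comment content is escaped, so text containing `<` or `"` passes through unchanged.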

docent/data_models/transcript.py

def render_metadata_comments(comments: list[Comment]) -> str:
    """Render metadata comments (agent run, transcript, or block metadata).

    For metadata comments, we render the key on which the comment was written
    and the user's content.

    TODO(mengk): known limitation: does not highlight text_range selections, if available.
        I'm not sure if it's supported in the UI, but just pointing this out for the backend.

    Args:
        comments: List of Comment objects targeting metadata.

    Returns:
        Formatted string with all comments.
    """
    if not comments:
        return ""

    lines: list[str] = []
    for comment in comments:
        # Iterate through citations to find the right target
        metadata_key = "unknown"
        for citation in comment.citations:
            item = citation.target.item
            if isinstance(item, TranscriptMetadataItem):
                metadata_key = item.metadata_key
                break
            elif isinstance(item, AgentRunMetadataItem):
                metadata_key = item.metadata_key
                break
        lines.append(f'<comment key="{metadata_key}">{comment.content}</comment>')

    return "\n".join(lines)

render_block_content_comments

render_block_content_comments(comments: list[Comment], content: str, comment_index_offset: int = 0) -> tuple[str, str]

Render block content comments with text range highlighting. For block content comments with a `text_range`, we surround the range with `<COMMENT_X_SELECTION></COMMENT_X_SELECTION>` tags and render the comment content below with a reference to the selection.

Parameters:

Name	Type	Description	Default
`comments`	`list[Comment]`	List of Comment objects targeting block content.	required
`content`	`str`	The block content text to annotate.	required
`comment_index_offset`	`int`	Starting index for comment numbering (local to message).	`0`

Returns:

Type	Description
`tuple[str, str]`	Tuple of (annotated_content, comments_text), where annotated_content has selection tags inserted and comments_text contains the rendered comments.
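The core of this function is inserting tags at precomputed offsets from the highest position down, so earlier insertions never shift the offsets still waiting to be applied. The stdlib sketch below isolates that technique using plain `(start, end)` ranges instead of `Comment` objects; `annotate_ranges` is a hypothetical name, and its bounds check mirrors the one in the source.

```python
def annotate_ranges(content: str, ranges: list[tuple[int, int]], offset: int = 0) -> str:
    """Wrap each valid [start, end) range in COMMENT_<n>_SELECTION tags."""
    insertions: list[tuple[int, str]] = []
    for i, (start, end) in enumerate(ranges):
        # Same validity check as the source: non-empty and within bounds
        if 0 <= start < len(content) and start < end <= len(content):
            idx = offset + i
            insertions.append((end, f"</COMMENT_{idx}_SELECTION>"))
            insertions.append((start, f"<COMMENT_{idx}_SELECTION>"))
    # Highest positions first, so each insertion leaves lower offsets valid
    insertions.sort(key=lambda pair: pair[0], reverse=True)
    out = content
    for pos, tag in insertions:
        out = out[:pos] + tag + out[pos:]
    return out


# Two overlapping ranges: "hello" and "lo world"
print(annotate_ranges("hello world", [(0, 5), (3, 11)]))
# <COMMENT_0_SELECTION>hel<COMMENT_1_SELECTION>lo</COMMENT_0_SELECTION> world</COMMENT_1_SELECTION>
```

Empty ranges, or ranges extending past the content, are skipped silently; in the real function such comments are then rendered as plain `<comment>` entries without a selection reference.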

docent/data_models/transcript.py

def render_block_content_comments(
    comments: list[Comment],
    content: str,
    comment_index_offset: int = 0,
) -> tuple[str, str]:
    """Render block content comments with text range highlighting.

    For block content comments with text_range, we surround the range with
    <COMMENT_X_SELECTION></COMMENT_X_SELECTION> tags and render the comment
    content below with a reference to the selection.

    Args:
        comments: List of Comment objects targeting block content.
        content: The block content text to annotate.
        comment_index_offset: Starting index for comment numbering (local to message).

    Returns:
        Tuple of (annotated_content, comments_text) where annotated_content has
        selection tags inserted and comments_text contains the rendered comments.
    """
    if not comments:
        return content, ""

    # Build a list of (position, tag_text) for all tag insertions.
    # By treating start and end tags as independent insertions sorted by position
    # descending, we correctly handle overlapping/nested ranges. Each insertion
    # only affects positions after it, so processing from the end backward
    # preserves all indices.
    insertions: list[tuple[int, str]] = []
    comments_with_text_range: set[str] = set()
    for i, comment in enumerate(comments):
        # Iterate through citations to find the right target
        text_range: CitationTargetTextRange | None = None
        for citation in comment.citations:
            item = citation.target.item
            if isinstance(item, TranscriptBlockContentItem):
                text_range = citation.target.text_range
                break

        # If the text range exists, add the start and end tags
        if (
            text_range
            and text_range.target_start_idx is not None
            and text_range.target_end_idx is not None
        ):
            start_idx = text_range.target_start_idx
            end_idx = text_range.target_end_idx
            if 0 <= start_idx < len(content) and start_idx < end_idx <= len(content):
                # End tag goes at end_idx, start tag goes at start_idx
                comment_idx = comment_index_offset + i
                insertions.append((end_idx, f"</COMMENT_{comment_idx}_SELECTION>"))
                insertions.append((start_idx, f"<COMMENT_{comment_idx}_SELECTION>"))

                # Keep track of comments with text ranges
                comments_with_text_range.add(comment.id)

    # Sort by position descending. For ties (e.g., end of one range = start of another),
    # end tags (closing) should come before start tags (opening) at the same position
    # to produce valid nesting, but since our tags don't need to be valid XML, order
    # at ties doesn't matter for correctness.
    insertions.sort(key=lambda x: x[0], reverse=True)

    # Apply insertions from the end backward to preserve indices
    annotated_content = content
    for pos, tag in insertions:
        annotated_content = annotated_content[:pos] + tag + annotated_content[pos:]

    # Build comment text
    comment_lines: list[str] = []
    for i, comment in enumerate(comments):
        if comment.id in comments_with_text_range:
            comment_idx = comment_index_offset + i
            comment_lines.append(
                f'<comment selection="COMMENT_{comment_idx}_SELECTION">{comment.content}</comment>'
            )
        else:
            comment_lines.append(f"<comment>{comment.content}</comment>")

    return annotated_content, "\n".join(comment_lines)
