Bases: BaseModel

Represents a group of transcripts that are logically related.

A transcript group can contain multiple transcripts and can have a hierarchical structure with parent groups. This is useful for organizing transcripts into logical units like experiments, tasks, or sessions.

Attributes:
| Name | Type | Description |
| --- | --- | --- |
| id | `str` | Unique identifier for the transcript group, auto-generated by default. |
| name | `str \| None` | Optional human-readable name for the transcript group. |
| description | `str \| None` | Optional description of the transcript group. |
| agent_run_id | `str` | ID of the agent run this transcript group belongs to. |
| parent_transcript_group_id | `str \| None` | Optional ID of the parent transcript group. |
| metadata | `dict[str, Any]` | Additional structured metadata about the transcript group. |
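Because each group stores only its parent's ID, a hierarchy is recovered by following `parent_transcript_group_id` links. The following is a minimal sketch of that walk. It uses a stdlib dataclass stand-in instead of the real pydantic model, so `GroupStub` and `ancestry` are hypothetical names for illustration and not part of the library.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any
from uuid import uuid4


@dataclass
class GroupStub:
    """Hypothetical stand-in mirroring the TranscriptGroup fields listed above."""

    agent_run_id: str
    id: str = field(default_factory=lambda: str(uuid4()))
    name: str | None = None
    parent_transcript_group_id: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)


def ancestry(group: GroupStub, by_id: dict[str, GroupStub]) -> list[str | None]:
    """Return group names from the root down to `group` by following parent IDs."""
    names: list[str | None] = []
    current: GroupStub | None = group
    while current is not None:
        names.append(current.name)
        parent_id = current.parent_transcript_group_id
        # A missing or unknown parent ID ends the walk.
        current = by_id.get(parent_id) if parent_id is not None else None
    return list(reversed(names))


experiment = GroupStub(agent_run_id="run-1", name="experiment")
task = GroupStub(
    agent_run_id="run-1", name="task", parent_transcript_group_id=experiment.id
)
by_id = {g.id: g for g in (experiment, task)}
print(ancestry(task, by_id))  # ['experiment', 'task']
```

The sketch assumes the parent links contain no cycles; real data may need a visited-set guard.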
docent/data_models/transcript.py
    class TranscriptGroup(BaseModel):
        """Represents a group of transcripts that are logically related.

        A transcript group can contain multiple transcripts and can have a hierarchical
        structure with parent groups. This is useful for organizing transcripts into
        logical units like experiments, tasks, or sessions.

        Attributes:
            id: Unique identifier for the transcript group, auto-generated by default.
            name: Optional human-readable name for the transcript group.
            description: Optional description of the transcript group.
            agent_run_id: ID of the agent run this transcript group belongs to.
            parent_transcript_group_id: Optional ID of the parent transcript group.
            metadata: Additional structured metadata about the transcript group.
        """

        id: str = Field(default_factory=lambda: str(uuid4()))
        name: str | None = None
        description: str | None = None
        agent_run_id: str
        parent_transcript_group_id: str | None = None
        created_at: datetime | None = None
        metadata: dict[str, Any] = Field(default_factory=dict)

        def to_text(self, children_text: str, indent: int = 0, render_metadata: bool = True) -> str:
            """Render this transcript group with its children and metadata.

            Metadata appears below the rendered children content.

            Args:
                children_text: Pre-rendered text of this group's children (groups/transcripts).
                indent: Number of spaces to indent the rendered output.
                render_metadata: Whether to include metadata in the output.

            Returns:
                str: XML-like wrapped text including the group's metadata.
            """
            # Prepare YAML metadata
            if render_metadata:
                metadata_text = dump_metadata(self.metadata)
                if metadata_text is not None:
                    if indent > 0:
                        metadata_text = textwrap.indent(metadata_text, " " * indent)
                    inner = f"{children_text}\n<|{self.name} metadata|>\n{metadata_text}\n</|{self.name} metadata|>"
                else:
                    inner = children_text
            else:
                inner = children_text

            # Compose final text: content first, then metadata, all inside the group wrapper
            if indent > 0:
                inner = textwrap.indent(inner, " " * indent)
            return f"<|{self.name}|>\n{inner}\n</|{self.name}|>"
Render this transcript group with its children and metadata.

Metadata appears below the rendered children content.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| children_text | `str` | Pre-rendered text of this group’s children (groups/transcripts). | required |
| indent | `int` | Number of spaces to indent the rendered output. | `0` |
| render_metadata | `bool` | Whether to include metadata in the output. | `True` |
Returns:

| Type | Description |
| --- | --- |
| `str` | XML-like wrapped text including the group’s metadata. |
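To make the wrapper layout concrete, here is a minimal sketch of how the group text is composed: children first, then a metadata block, all inside the group tags. `wrap_group` is a hypothetical standalone helper written for this illustration. It takes the metadata as already-rendered text, since the metadata serializer used by the library is not shown on this page.

```python
from __future__ import annotations

import textwrap


def wrap_group(
    name: str, children_text: str, metadata_text: str | None, indent: int = 0
) -> str:
    """Compose group output: children, then metadata, inside <|name|> ... </|name|>."""
    if metadata_text is not None:
        if indent > 0:
            metadata_text = textwrap.indent(metadata_text, " " * indent)
        inner = (
            f"{children_text}\n<|{name} metadata|>\n{metadata_text}\n</|{name} metadata|>"
        )
    else:
        inner = children_text
    if indent > 0:
        # The whole body is indented once more, so metadata ends up indented twice.
        inner = textwrap.indent(inner, " " * indent)
    return f"<|{name}|>\n{inner}\n</|{name}|>"


print(wrap_group("task", "<child output>", "difficulty: hard", indent=2))
```

With `indent=2`, the children and the metadata tags are indented by two spaces and the metadata body by four, because the metadata text is indented before the combined body is indented again.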
Bases: BaseModel

Represents a transcript of messages in a conversation with an AI agent.

A transcript contains a sequence of messages exchanged between different roles (system, user, assistant, tool) and provides methods to organize these messages into logical units of action.

Attributes:
| Name | Type | Description |
| --- | --- | --- |
| id | `str` | Unique identifier for the transcript, auto-generated by default. |
| name | `str \| None` | Optional human-readable name for the transcript. |
| description | `str \| None` | Optional description of the transcript. |
| transcript_group_id | `str \| None` | Optional ID of the transcript group this transcript belongs to. |
| messages | `list[ChatMessage]` | List of chat messages in the transcript. |
| metadata | `dict[str, Any]` | Additional structured metadata about the transcript. |
docent/data_models/transcript.py
    class Transcript(BaseModel):
        """Represents a transcript of messages in a conversation with an AI agent.

        A transcript contains a sequence of messages exchanged between different roles
        (system, user, assistant, tool) and provides methods to organize these messages
        into logical units of action.

        Attributes:
            id: Unique identifier for the transcript, auto-generated by default.
            name: Optional human-readable name for the transcript.
            description: Optional description of the transcript.
            transcript_group_id: Optional ID of the transcript group this transcript belongs to.
            messages: List of chat messages in the transcript.
            metadata: Additional structured metadata about the transcript.
        """

        id: str = Field(default_factory=lambda: str(uuid4()))
        name: str | None = None
        description: str | None = None
        transcript_group_id: str | None = None
        created_at: datetime | None = None
        messages: list[ChatMessage]
        metadata: dict[str, Any] = Field(default_factory=dict)

        def _enumerate_messages(self) -> Iterable[tuple[int, ChatMessage]]:
            """Yield (index, message) tuples for rendering.

            Override in subclasses to customize index assignment.
            """
            return enumerate(self.messages)

        def to_text(
            self,
            transcript_alias: int | str = 0,
            indent: int = 0,
            render_metadata: bool = True,
            transcript_metadata_comments: list[Comment] | None = None,
            block_metadata_comments: dict[int, list[Comment]] | None = None,
            block_content_comments: dict[int, list[Comment]] | None = None,
        ) -> str:
            """Render this transcript as formatted text with optional comments.

            Args:
                transcript_alias: Identifier for the transcript (e.g., 0 becomes "T0").
                indent: Number of spaces to indent nested content.
                render_metadata: Whether to include transcript metadata in the output.
                transcript_metadata_comments: Comments on this transcript's metadata.
                    Rendered after the transcript metadata block.
                block_metadata_comments: Mapping from block index to comments on that
                    block's metadata. Keyed by block index because comments need to be
                    rendered inline with each block at the correct position.
                block_content_comments: Mapping from block index to comments on that
                    block's content. Keyed by block index because comments need to be
                    rendered inline with each block, and may include text range
                    selections that highlight specific portions of the block content.

            Returns:
                Formatted text representation of the transcript.
            """
            if isinstance(transcript_alias, int):
                transcript_alias = f"T{transcript_alias}"

            # Format individual message blocks
            blocks: list[str] = []
            for msg_idx, message in self._enumerate_messages():
                block_label = f"{transcript_alias}B{msg_idx}"

                # Get block-level comments for this message index
                msg_metadata_comments = (
                    block_metadata_comments.get(msg_idx) if block_metadata_comments else None
                )
                msg_content_comments = (
                    block_content_comments.get(msg_idx) if block_content_comments else None
                )

                block_text = format_chat_message(
                    message,
                    block_label,
                    block_metadata_comments=msg_metadata_comments,
                    block_content_comments=msg_content_comments,
                    indent=indent,
                )
                blocks.append(block_text)

            blocks_str = "\n".join(blocks)
            if indent > 0:
                blocks_str = textwrap.indent(blocks_str, " " * indent)

            content_str = f"<|{transcript_alias} blocks|>\n{blocks_str}\n</|{transcript_alias} blocks|>"

            # Gather metadata and add to content
            if render_metadata:
                metadata_text = dump_metadata(self.metadata)
                if metadata_text is not None:
                    if indent > 0:
                        metadata_text = textwrap.indent(metadata_text, " " * indent)
                    metadata_label = f"{transcript_alias}M"
                    content_str += f"\n<|transcript metadata {metadata_label}|>\n{metadata_text}\n</|transcript metadata {metadata_label}|>"

                # Add transcript metadata comments after the metadata
                if transcript_metadata_comments:
                    metadata_comments_text = render_metadata_comments(transcript_metadata_comments)
                    if metadata_comments_text:
                        if indent > 0:
                            metadata_comments_text = textwrap.indent(
                                metadata_comments_text, " " * indent
                            )
                        content_str += f"\n<|transcript metadata comments|>\n{metadata_comments_text}\n</|transcript metadata comments|>"

            # Format content and return
            if indent > 0:
                content_str = textwrap.indent(content_str, " " * indent)
            return f"<|transcript {transcript_alias}|>\n{content_str}\n</|transcript {transcript_alias}|>\n"
Render this transcript as formatted text with optional comments.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| transcript_alias | `int \| str` | Identifier for the transcript (e.g., 0 becomes “T0”). | `0` |
| indent | `int` | Number of spaces to indent nested content. | `0` |
| render_metadata | `bool` | Whether to include transcript metadata in the output. | `True` |
| transcript_metadata_comments | `list[Comment] \| None` | Comments on this transcript’s metadata. Rendered after the transcript metadata block. | `None` |
| block_metadata_comments | `dict[int, list[Comment]] \| None` | Mapping from block index to comments on that block’s metadata. Keyed by block index because comments need to be rendered inline with each block at the correct position. | `None` |
| block_content_comments | `dict[int, list[Comment]] \| None` | Mapping from block index to comments on that block’s content. Keyed by block index because comments need to be rendered inline with each block, and may include text range selections that highlight specific portions of the block content. | `None` |
Returns:

| Type | Description |
| --- | --- |
| `str` | Formatted text representation of the transcript. |
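The labels that appear in the rendered text follow a small naming scheme: an integer alias `0` becomes `T0`, each message block is labelled `<alias>B<index>`, and the metadata block is labelled `<alias>M`. A minimal sketch of that scheme follows. `labels_for` is a hypothetical helper written for this illustration, and it assumes the default zero-based message indices (subclasses may assign indices differently).

```python
from __future__ import annotations


def labels_for(transcript_alias: int | str, num_messages: int) -> tuple[list[str], str]:
    """Return (block labels, metadata label) for a transcript alias."""
    if isinstance(transcript_alias, int):
        # Integer aliases are prefixed with "T"; string aliases are used as given.
        transcript_alias = f"T{transcript_alias}"
    block_labels = [f"{transcript_alias}B{i}" for i in range(num_messages)]
    return block_labels, f"{transcript_alias}M"


print(labels_for(0, 3))      # (['T0B0', 'T0B1', 'T0B2'], 'T0M')
print(labels_for("T7", 1))   # (['T7B0'], 'T7M')
```

The comment mappings use the same block indices as keys, so a comment stored under key `2` is rendered with the block labelled `T0B2` when the alias is `0`.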
Render metadata comments (agent run, transcript, or block metadata).

For metadata comments, we render the key on which the comment was written and the user’s content.

TODO(mengk): known limitation: does not highlight text_range selections, if available. I’m not sure if it’s supported in the UI, but just pointing this out for the backend.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| comments | `list[Comment]` | List of Comment objects targeting metadata. | required |
Returns:

| Type | Description |
| --- | --- |
| `str` | Formatted string with all comments. |
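The output format is one `<comment key="...">` element per comment, joined by newlines, with the key falling back to `unknown` when no metadata target is found. The sketch below reproduces only that formatting. `MetadataCommentStub` and `render_stub_comments` are hypothetical stand-ins: the real function reads the key from each comment's citations, which is simplified here to an optional `metadata_key` field.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetadataCommentStub:
    """Hypothetical stand-in: comment text plus the metadata key it targets, if any."""

    content: str
    metadata_key: str | None = None


def render_stub_comments(comments: list[MetadataCommentStub]) -> str:
    """Format each comment as <comment key="...">content</comment>, one per line."""
    lines: list[str] = []
    for comment in comments:
        key = comment.metadata_key if comment.metadata_key is not None else "unknown"
        lines.append(f'<comment key="{key}">{comment.content}</comment>')
    return "\n".join(lines)


print(
    render_stub_comments(
        [
            MetadataCommentStub("Score looks off", metadata_key="score"),
            MetadataCommentStub("No target recorded"),
        ]
    )
)
```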
docent/data_models/transcript.py
    def render_metadata_comments(comments: list[Comment]) -> str:
        """Render metadata comments (agent run, transcript, or block metadata).

        For metadata comments, we render the key on which the comment was written
        and the user's content.

        TODO(mengk): known limitation: does not highlight text_range selections, if available.
            I'm not sure if it's supported in the UI, but just pointing this out for the backend.

        Args:
            comments: List of Comment objects targeting metadata.

        Returns:
            Formatted string with all comments.
        """
        if not comments:
            return ""

        lines: list[str] = []
        for comment in comments:
            # Iterate through citations to find the right target
            metadata_key = "unknown"
            for citation in comment.citations:
                item = citation.target.item
                if isinstance(item, TranscriptMetadataItem):
                    metadata_key = item.metadata_key
                    break
                elif isinstance(item, AgentRunMetadataItem):
                    metadata_key = item.metadata_key
                    break

            lines.append(f'<comment key="{metadata_key}">{comment.content}</comment>')

        return "\n".join(lines)
Render block content comments with text range highlighting.

For block content comments with text_range, we surround the range with `<COMMENT_X_SELECTION></COMMENT_X_SELECTION>` tags and render the comment content below with a reference to the selection.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| comments | `list[Comment]` | List of Comment objects targeting block content. | required |
| content | `str` | The block content text to annotate. | required |
| comment_index_offset | `int` | Starting index for comment numbering (local to message). | `0` |
Returns:

| Type | Description |
| --- | --- |
| `tuple[str, str]` | Tuple of (annotated_content, comments_text) where annotated_content has selection tags inserted and comments_text contains the rendered comments. |
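The key trick in the tagging step is to collect every opening and closing tag as an independent `(position, tag)` insertion and apply them from the highest position down, so earlier indices stay valid as the string grows. A minimal sketch of that technique on plain `(start, end)` ranges follows. `annotate` is a hypothetical helper for illustration and takes tuples rather than `Comment` objects.

```python
from __future__ import annotations


def annotate(content: str, ranges: list[tuple[int, int]], offset: int = 0) -> str:
    """Wrap each valid (start, end) range in COMMENT_<n>_SELECTION tags."""
    insertions: list[tuple[int, str]] = []
    for i, (start, end) in enumerate(ranges):
        # Skip ranges that fall outside the content or are empty.
        if 0 <= start < len(content) and start < end <= len(content):
            n = offset + i
            insertions.append((end, f"</COMMENT_{n}_SELECTION>"))
            insertions.append((start, f"<COMMENT_{n}_SELECTION>"))

    # Highest position first: each insertion only shifts text after it,
    # so the remaining (smaller) positions are still correct.
    insertions.sort(key=lambda item: item[0], reverse=True)
    for pos, tag in insertions:
        content = content[:pos] + tag + content[pos:]
    return content


print(annotate("hello world", [(0, 5), (6, 11)]))
# <COMMENT_0_SELECTION>hello</COMMENT_0_SELECTION> <COMMENT_1_SELECTION>world</COMMENT_1_SELECTION>
```

Nested ranges work the same way: `annotate("abcdef", [(0, 6), (2, 4)])` places the inner pair of tags inside the outer pair without any index bookkeeping.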
docent/data_models/transcript.py
    def render_block_content_comments(
        comments: list[Comment],
        content: str,
        comment_index_offset: int = 0,
    ) -> tuple[str, str]:
        """Render block content comments with text range highlighting.

        For block content comments with text_range, we surround the range with
        <COMMENT_X_SELECTION></COMMENT_X_SELECTION> tags and render the comment
        content below with a reference to the selection.

        Args:
            comments: List of Comment objects targeting block content.
            content: The block content text to annotate.
            comment_index_offset: Starting index for comment numbering (local to message).

        Returns:
            Tuple of (annotated_content, comments_text) where annotated_content has
            selection tags inserted and comments_text contains the rendered comments.
        """
        if not comments:
            return content, ""

        # Build a list of (position, tag_text) for all tag insertions.
        # By treating start and end tags as independent insertions sorted by position
        # descending, we correctly handle overlapping/nested ranges. Each insertion
        # only affects positions after it, so processing from the end backward
        # preserves all indices.
        insertions: list[tuple[int, str]] = []
        comments_with_text_range: set[str] = set()

        for i, comment in enumerate(comments):
            # Iterate through citations to find the right target
            text_range: CitationTargetTextRange | None = None
            for citation in comment.citations:
                item = citation.target.item
                if isinstance(item, TranscriptBlockContentItem):
                    text_range = citation.target.text_range
                    break

            # If the text range exists, add the start and end tags
            if (
                text_range
                and text_range.target_start_idx is not None
                and text_range.target_end_idx is not None
            ):
                start_idx = text_range.target_start_idx
                end_idx = text_range.target_end_idx
                if 0 <= start_idx < len(content) and start_idx < end_idx <= len(content):
                    # End tag goes at end_idx, start tag goes at start_idx
                    comment_idx = comment_index_offset + i
                    insertions.append((end_idx, f"</COMMENT_{comment_idx}_SELECTION>"))
                    insertions.append((start_idx, f"<COMMENT_{comment_idx}_SELECTION>"))

                    # Keep track of comments with text ranges
                    comments_with_text_range.add(comment.id)

        # Sort by position descending. For ties (e.g., end of one range = start of another),
        # end tags (closing) should come before start tags (opening) at the same position
        # to produce valid nesting, but since our tags don't need to be valid XML, order
        # at ties doesn't matter for correctness.
        insertions.sort(key=lambda x: x[0], reverse=True)

        # Apply insertions from the end backward to preserve indices
        annotated_content = content
        for pos, tag in insertions:
            annotated_content = annotated_content[:pos] + tag + annotated_content[pos:]

        # Build comment text
        comment_lines: list[str] = []
        for i, comment in enumerate(comments):
            if comment.id in comments_with_text_range:
                comment_idx = comment_index_offset + i
                comment_lines.append(
                    f'<comment selection="COMMENT_{comment_idx}_SELECTION">{comment.content}</comment>'
                )
            else:
                comment_lines.append(f"<comment>{comment.content}</comment>")

        return annotated_content, "\n".join(comment_lines)