Getting Started with the SDK
The Python SDK provides several methods for executing DQL queries and working with results.

Basic Query Execution
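For instance, a minimal sketch, assuming a configured `client` object that exposes `execute_dql` as documented in the method reference below (the query and column positions are illustrative):

```python
def recent_run_names(client, collection_id):
    """Return the names of the five most recently created runs (sketch)."""
    result = client.execute_dql(
        collection_id,
        "SELECT name FROM agent_runs ORDER BY created_at DESC LIMIT 5",
    )
    # result is a dict with "columns" and "rows" arrays
    # (see the method reference below).
    return [row[0] for row in result["rows"]]
```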
Getting the Schema
Before writing queries, you can discover available tables and columns with get_dql_schema.

Converting Results
DQL results come as `columns` and `rows` arrays. Convert them to more convenient formats with the helpers documented below.

SDK Method Reference
execute_dql(collection_id, dql)
Primary method for running DQL queries.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| collection_id | str | ID of the collection to query |
| dql | str | The DQL query string |
Returns: dict with keys:

- `columns` - List of column names
- `rows` - List of row data arrays
- `truncated` - Boolean indicating if results were capped
- `applied_limit` - The limit that was applied
- `row_count` - Number of rows returned
- `execution_time_ms` - Query execution time in milliseconds
- `requested_limit` - The LIMIT specified in the query (or null)
- `selected_columns` - Detailed column information with source tracking
get_dql_schema(collection_id)
Retrieve available tables and columns for a collection.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| collection_id | str | ID of the collection |
Returns: dict containing table definitions with column names, types, nullable flags, primary key indicators, and foreign key references.
Example:
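A sketch, assuming the returned dict is keyed by table name (verify against the actual payload shape before relying on it):

```python
def table_names(client, collection_id):
    """Return the table names available for DQL in a collection (sketch;
    assumes the schema dict is keyed by table name)."""
    schema = client.get_dql_schema(collection_id)
    return sorted(schema)
```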
dql_result_to_dicts(dql_result)
Convert a DQL result to a list of dictionaries.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| dql_result | dict | Result from `execute_dql()` |
Returns: list[dict] where each dict maps column names to values.
Example:
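An illustration with a hand-built result; in practice you would pass the dict returned by `execute_dql` to `dql_result_to_dicts`. The zip below mirrors the documented column-to-value mapping:

```python
sample_result = {
    "columns": ["id", "name"],
    "rows": [["run-1", "baseline"], ["run-2", "variant"]],
}

# dql_result_to_dicts(sample_result) produces one dict per row,
# mapping column names to values; this comprehension shows the shape.
records = [dict(zip(sample_result["columns"], row)) for row in sample_result["rows"]]
```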
dql_result_to_df_experimental(dql_result)
Convert a DQL result to a pandas DataFrame. Automatically attempts to cast string values to numeric types where possible.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| dql_result | dict | Result from `execute_dql()` |
Returns: pd.DataFrame.
Example:
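An illustrative sketch of the documented behavior using a hand-built result; the real helper may handle casting edge cases differently:

```python
import pandas as pd

sample_result = {
    "columns": ["id", "score"],
    "rows": [["run-1", "0.90"], ["run-2", "0.75"]],
}

# dql_result_to_df_experimental(sample_result) yields a frame of this
# shape, with numeric-looking strings cast to numeric types.
df = pd.DataFrame(sample_result["rows"], columns=sample_result["columns"])
df["score"] = pd.to_numeric(df["score"])
```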
This method is experimental and the casting behavior may change.
select_agent_run_ids(collection_id, where_clause, limit)
Convenience helper to fetch agent run IDs matching a filter.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| collection_id | str | ID of the collection |
| where_clause | str \| None | Optional WHERE clause (without the WHERE keyword) |
| limit | int \| None | Optional limit on results |
Returns: list[str] of agent run IDs.
Example:
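A sketch; the `status` metadata key and its value are placeholders for fields your runs actually record:

```python
def failed_run_ids(client, collection_id):
    """Fetch up to 100 run IDs whose metadata marks them failed (sketch)."""
    return client.select_agent_run_ids(
        collection_id,
        where_clause="metadata_json->>'status' = 'failed'",
        limit=100,
    )
```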
Available Tables and Columns
| Table | Description |
|---|---|
| agent_runs | Information about each agent run in a collection. |
| transcripts | Individual transcripts tied to an agent run; stores serialized messages and per-transcript metadata. |
| transcript_groups | Hierarchical groupings of transcripts for runs. |
| judge_results | Scored rubric outputs keyed by agent run and rubric version. |
| labels | Labels applied to agent runs. |
| tags | Tags applied to agent runs. |
| rubric_centroids | Clustered rubric score categories. |
| judge_result_centroids | Junction table for judge result clustering. |
| results | Results from result sets. |
agent_runs
| Column | Description |
|---|---|
| id | Agent run identifier (UUID). |
| collection_id | Collection that owns the run. |
| name | Optional user-provided display name. |
| description | Optional description supplied at ingest time. |
| metadata_json | User-supplied metadata, stored as JSON. |
| created_at | When the run was recorded in Docent. |
transcripts
| Column | Description |
|---|---|
| id | Transcript identifier (UUID). |
| collection_id | Collection that owns the transcript. |
| agent_run_id | Parent run identifier; joins back to agent_runs.id. |
| name | Optional transcript title. |
| description | Optional description. |
| transcript_group_id | Optional grouping identifier. |
| messages | Binary-encoded JSON payload of message turns. |
| metadata_json | Binary-encoded metadata describing the transcript. |
| dict_key | Dictionary key for transcript identification. |
| created_at | Timestamp recorded during ingest. |
transcript_groups
| Column | Description |
|---|---|
| id | Transcript group identifier. |
| collection_id | Collection that owns the group. |
| agent_run_id | Parent run identifier; joins back to agent_runs.id. |
| name | Optional name for the group. |
| description | Optional descriptive text. |
| parent_transcript_group_id | Identifier of the parent group (for hierarchical groupings). |
| metadata_json | JSONB metadata payload for the group. |
| created_at | Timestamp recorded during ingest. |
judge_results
| Column | Description |
|---|---|
| id | Judge result identifier. |
| agent_run_id | Run scored by the rubric. |
| rubric_id | Rubric identifier. |
| rubric_version | Version of the rubric used when scoring. |
| output | JSON representation of rubric outputs. |
| value | Deprecated: use output instead. |
| result_metadata | Optional JSON metadata attached to the result. |
| result_type | Enum describing the rubric output type. |
JSON Metadata Paths
Docent stores user-supplied metadata as JSON, which can be accessed with Postgres-style operators. JSON operators work on `agent_runs.metadata_json`, `transcript_groups.metadata_json`, `judge_results.output`, and `judge_results.result_metadata` (stored as JSONB). The `transcripts.metadata_json` column is stored as binary and does not support direct JSON operators in queries.

Access Patterns
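For example (the `model` and `config` metadata keys are placeholders for your own fields):

```sql
-- Extract a top-level text value
SELECT id, metadata_json->>'model' AS model
FROM agent_runs
WHERE metadata_json->>'model' IS NOT NULL;

-- Traverse a nested path
SELECT id, metadata_json#>>'{config,temperature}' AS temperature
FROM agent_runs;
```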
Allowed Syntax
DQL supported keywords:

| Feature |
|---|
| SELECT, DISTINCT, FROM, WHERE, subqueries |
| JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, CROSS JOIN |
| WITH (CTEs) |
| UNION [ALL], INTERSECT, EXCEPT |
| GROUP BY, HAVING |
| Aggregations (COUNT, AVG, MIN, MAX, SUM, STDDEV_POP, STDDEV_SAMP, ARRAY_AGG, STRING_AGG, JSON_AGG, JSONB_AGG, JSON_OBJECT_AGG, PERCENTILE_CONT, PERCENTILE_DISC (WITHIN GROUP)) |
| Window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, CUME_DIST) |
| ORDER BY, LIMIT, OFFSET |
| Conditional & null helpers (CASE, COALESCE, NULLIF) |
| Boolean logic (AND, OR, NOT) |
| Comparison operators (=, !=, <, <=, >, >=, IS, IS NOT, IS DISTINCT FROM, IN, BETWEEN, LIKE, ILIKE, EXISTS, SIMILAR TO, ~, ~*, !~, !~*) |
| Arithmetic & math (+, -, *, /, %, POWER, ABS, SIGN, SQRT, LN, LOG, EXP, GREATEST, LEAST, FLOOR, CEIL, ROUND, RANDOM) |
| String helpers (SUBSTRING, LEFT, RIGHT, LENGTH, UPPER, LOWER, INITCAP, TRIM, REPLACE, SPLIT_PART, POSITION, CONCAT, CONCAT_WS, STRING_AGG) |
| JSON operators & functions (`->`, `->>`, `#>`, `#>>`, `@>`, `?`, `?\|`, `?&`, `json_agg`, `jsonb_agg`, `json_object_agg`) |
| Date/time basics (CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, NOW(), EXTRACT, DATE_TRUNC, AT TIME ZONE) |
| Interval arithmetic (timestamp +/- INTERVAL, INTERVAL literals) |
| Construction & conversion (MAKE_DATE, MAKE_TIME, MAKE_TIMESTAMP, TO_CHAR) |
| Array helpers (array_cat, unnest) |
| Type helpers (CAST, ::) |
Not supported: `*` in SELECT clauses (e.g., SELECT *, COUNT(*)), user-defined functions, and any DDL or DML commands.
Common Patterns
Quick-reference snippets for frequent tasks.

Filter by Metadata Field
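A sketch (the `environment` key and its value are placeholders):

```sql
SELECT id, name
FROM agent_runs
WHERE metadata_json->>'environment' = 'production';
```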
Filter by Date Range
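For example, runs from the last seven days, using the documented interval arithmetic:

```sql
SELECT id, name, created_at
FROM agent_runs
WHERE created_at >= CURRENT_TIMESTAMP - INTERVAL '7 days'
ORDER BY created_at DESC;
```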
Count by Category
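For example, counting runs per metadata category (the `category` key is a placeholder; COUNT(*) is not supported, so count a column instead):

```sql
SELECT metadata_json->>'category' AS category, COUNT(id) AS runs
FROM agent_runs
GROUP BY metadata_json->>'category'
ORDER BY runs DESC;
```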
Check if Metadata Field Exists
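Using the `?` existence operator (the `score` key is a placeholder):

```sql
SELECT id
FROM agent_runs
WHERE metadata_json ? 'score';
```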
Numeric Comparison on JSON Field
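Casting before comparing, since JSON values are strings by default (the `duration_ms` key is a placeholder):

```sql
SELECT id
FROM agent_runs
WHERE CAST(metadata_json->>'duration_ms' AS BIGINT) > 5000;
```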
Get Transcripts with Their Agent Runs
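A join on the documented `agent_run_id` foreign key:

```sql
SELECT t.id AS transcript_id, t.name AS transcript_name, r.name AS run_name
FROM transcripts t
JOIN agent_runs r ON r.id = t.agent_run_id;
```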
Examples
Basic Queries
Recent Runs
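For example, the 20 most recently created runs:

```sql
SELECT id, name, created_at
FROM agent_runs
ORDER BY created_at DESC
LIMIT 20;
```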
Aggregations
Completion Rate by Environment
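A sketch; the `environment` and `status` keys and the 'completed' value are placeholders:

```sql
WITH runs AS (
  SELECT
    id,
    metadata_json->>'environment' AS environment,
    CASE WHEN metadata_json->>'status' = 'completed' THEN 1 ELSE 0 END AS completed
  FROM agent_runs
)
SELECT
  environment,
  AVG(completed) AS completion_rate,
  COUNT(id) AS total_runs
FROM runs
GROUP BY environment
ORDER BY completion_rate DESC;
```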
Aggregates per-environment success rates by normalizing metadata into a CTE.

Joins and CTEs
Transcript Counts per Group
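A sketch using the documented `transcript_group_id` link:

```sql
SELECT g.id, g.name, COUNT(t.id) AS transcript_count
FROM transcript_groups g
LEFT JOIN transcripts t ON t.transcript_group_id = g.id
GROUP BY g.id, g.name
ORDER BY transcript_count DESC;
```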
Transcript Coverage Audit
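A sketch; the `must_have` flag is assumed to live in the group's metadata:

```sql
SELECT g.id, g.name
FROM transcript_groups g
WHERE g.metadata_json->>'must_have' = 'true'
  AND NOT EXISTS (
    SELECT t.id FROM transcripts t WHERE t.transcript_group_id = g.id
  );
```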
Finds transcript groups that are marked as `must_have` but have no associated transcripts.
Advanced Patterns
Flagged Judge Results
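A sketch; the `flagged` key inside `output` is a placeholder for your rubric's fields:

```sql
SELECT j.agent_run_id, j.rubric_id, j.output
FROM judge_results j
WHERE j.output->>'flagged' = 'true';
```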
Latest Rubric Scores by Model
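A sketch; the `model` metadata key is a placeholder, and recency is approximated by `rubric_version` since `judge_results` has no timestamp column in the schema above:

```sql
WITH ranked AS (
  SELECT
    j.agent_run_id,
    j.output,
    ROW_NUMBER() OVER (
      PARTITION BY j.agent_run_id
      ORDER BY j.rubric_version DESC
    ) AS rn
  FROM judge_results j
)
SELECT r.metadata_json->>'model' AS model, ranked.output
FROM ranked
JOIN agent_runs r ON r.id = ranked.agent_run_id
WHERE ranked.rn = 1;
```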
Pulls the most recent rubric result per run, then joins runs to surface the model responsible for the score.

Troubleshooting
“column X does not exist”
- DQL requires explicit column selection. Wildcards (`*`) are not supported.
- Check the schema using `client.get_dql_schema(collection_id)` to see available columns.
Numeric comparisons not working as expected
JSON fields are strings by default. Cast them for numeric operations, e.g. `CAST(metadata_json->>'duration_ms' AS BIGINT) > 5000`.

Query returns no results but data exists
- Check that you’re querying the correct collection
- Verify metadata field names are exact matches (case-sensitive)
- Use the `?` operator to check if a field exists before filtering on it
Results truncated unexpectedly
DQL caps results at 10,000 rows. Use `LIMIT` and `OFFSET` for pagination.
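For example, fetching the second page of 1,000 rows:

```sql
SELECT id, name
FROM agent_runs
ORDER BY created_at DESC
LIMIT 1000 OFFSET 1000;
```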
“syntax error” on valid-looking SQL
Some SQL features aren’t supported in DQL:

- No `*` wildcard in SELECT
- No `INSERT`, `UPDATE`, `DELETE`
- No user-defined functions
Restrictions and Best Practices
- Read-only: Only `SELECT`-style queries are permitted. Use bulk exports or ingestion utilities to modify data outside of DQL.
- Single statement: Batches or multiple statements are rejected to avoid mixed workloads.
- Explicit projection: Wildcard projections (`*`) are disallowed. List the columns you need so downstream tooling (schema builders, type generation) stays predictable.
- Collection scoping: A single query can only access data within a single collection.
- Limit enforcement: Every query is capped at 10,000 rows by the server. If you omit `LIMIT` or request more, Docent automatically applies the cap; use pagination (`OFFSET`/`LIMIT`) or offline exports for larger result sets.
- JSON performance: Metadata fields are stored as JSON; heavy traversal across large collections can be slower than filtering on indexed scalar columns. Prefer top-level fields when available.
- Type awareness: JSON metadata paths are exposed in the schema with a generic `json` type. Since the underlying values may be strings, numbers, or other JSON types, cast values explicitly when precision matters (e.g., `CAST(metadata_json->>'duration_ms' AS BIGINT)`).

