Skip to main content
DQL (Docent Query Language) lets you query agent runs, metadata, and evaluation results using a SQL-like syntax.
For the conceptual overview of DQL, see Structured queries (DQL). For the column schema of each table, see DQL schema reference.

Execute a Query

from docent import Docent

client = Docent()

result = client.execute_dql(
    "my-collection-id",
    "SELECT id, metadata_json->>'model' AS model FROM agent_runs LIMIT 10",
)

Parameters

collection_id
str
required
ID of the collection to query.
dql
str
required
The DQL query string.
reading_plan_id
str | None
default:"None"
Optional reading plan ID for $alias substitution in queries that reference reading step aliases.
source
Literal["endpoint", "mcp"]
default:"\"endpoint\""
Analytics source for the execution path. Defaults to "endpoint"; the MCP server sets "mcp" automatically. You typically do not need to set this.
ensure_latest_results
bool
default:"False"
When True, the server refuses the optimized parquet path unless the parquet replica reflects every committed mutation, and runs the query against Postgres otherwise. Use this when you cannot tolerate parquet lag — for example, immediately after adding or deleting agent runs, or when verifying a write.

Returns

result
dict
Query result.

Errors

  • ValueErrordql is empty
  • HTTPError — Invalid DQL syntax or collection not found

Convert Results to Dicts

result = client.execute_dql(
    "my-collection-id",
    "SELECT id AS run_id, metadata_json->>'model' AS model FROM agent_runs LIMIT 5",
)

rows = client.dql_result_to_dicts(result)
for row in rows:
    print(row)  # {"run_id": "abc-123", "model": "gpt-4"}

Parameters

dql_result
dict
required
A result dict returned by execute_dql.

Returns

rows
list[dict]
List of dictionaries, one per row, with column names as keys.

Common Query Patterns

Filter by metadata

result = client.execute_dql(
    collection_id,
    """
    SELECT id, metadata_json->>'score' AS score
    FROM agent_runs
    WHERE metadata_json->>'model' = 'gpt-4'
      AND CAST(metadata_json->>'score' AS DOUBLE PRECISION) > 0.8
    """
)

Join with evaluation results

result = client.execute_dql(
    collection_id,
    """
    SELECT
        ar.id,
        jr.output->>'label' AS label,
        jr.output->>'explanation' AS explanation
    FROM agent_runs ar
    JOIN judge_results jr ON ar.id = jr.agent_run_id
    WHERE jr.rubric_id = 'rubric-123'
    """
)

Count and aggregate

result = client.execute_dql(
    collection_id,
    """
    SELECT
        jr.output->>'label' AS label,
        COUNT() AS count
    FROM judge_results jr
    WHERE jr.rubric_id = 'rubric-123'
    GROUP BY jr.output->>'label'
    """
)

Allowed syntax

DQL supported keywords:
Feature
SELECT, DISTINCT, FROM, WHERE, subqueries
JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN, CROSS JOIN
WITH (CTEs)
UNION [ALL], INTERSECT, EXCEPT
GROUP BY, HAVING
Aggregations (COUNT, AVG, MIN, MAX, SUM, STDDEV_POP, STDDEV_SAMP, ARRAY_AGG, STRING_AGG, JSON_AGG, JSONB_AGG, JSON_OBJECT_AGG, PERCENTILE_CONT, PERCENTILE_DISC (WITHIN GROUP))
Window functions (ROW_NUMBER, RANK, DENSE_RANK, NTILE, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE, PERCENT_RANK, CUME_DIST)
ORDER BY, LIMIT, OFFSET
Conditional & null helpers (CASE, COALESCE, NULLIF)
Boolean logic (AND, OR, NOT)
Comparison operators (=, !=, <, <=, >, >=, IS, IS NOT, IS DISTINCT FROM, IN, BETWEEN, LIKE, ILIKE, EXISTS, SIMILAR TO, ~, ~*, !~, !~*)
Arithmetic & math (+, -, *, /, %, POWER, ABS, SIGN, SQRT, LN, LOG, EXP, GREATEST, LEAST, FLOOR, CEIL, ROUND, RANDOM)
String helpers (SUBSTRING, LEFT, RIGHT, LENGTH, UPPER, LOWER, INITCAP, TRIM, REPLACE, SPLIT_PART, POSITION, CONCAT, CONCAT_WS, STRING_AGG)
JSON operators & functions (->, ->>, #>, #>>, @>, ?, `?, ?&, jsonb_array_length, json_agg, jsonb_agg, json_object_agg, convert_from, convert_to`)
Date/time basics (CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, NOW(), EXTRACT, DATE_TRUNC, AT TIME ZONE)
Interval arithmetic (timestamp +/- INTERVAL, INTERVAL literals)
Construction & conversion (MAKE_DATE, MAKE_TIME, MAKE_TIMESTAMP, TO_CHAR)
Array helpers (array_cat, unnest)
Type helpers (CAST, ::)
Unsupported constructs include wildcard * in SELECT clauses (e.g., SELECT *, COUNT(*)), user-defined functions, and any DDL or DML commands.

SQL patterns

Bare DQL snippets for common tasks.

Filter by metadata field

SELECT id, name FROM agent_runs
WHERE metadata_json->>'environment' = 'prod'

Filter by date range

SELECT id, name, created_at FROM agent_runs
WHERE created_at >= NOW() - INTERVAL '7 days'
ORDER BY created_at DESC

Count by category

SELECT
  metadata_json->>'model' AS model,
  COUNT() AS run_count
FROM agent_runs
GROUP BY metadata_json->>'model'
ORDER BY run_count DESC

Check if a metadata field exists

SELECT id, name FROM agent_runs
WHERE metadata_json ? 'custom_field'

Numeric comparison on a JSON field

SELECT id, name FROM agent_runs
WHERE CAST(metadata_json->>'score' AS DOUBLE PRECISION) > 0.8

Cast JSON for aggregation

SELECT
  AVG(CAST(metadata_json->>'latency_ms' AS DOUBLE PRECISION)) AS avg_latency_ms
FROM agent_runs
WHERE metadata_json ? 'latency_ms';

Counting transcript messages

transcripts.messages is stored as bytea (UTF-8 JSON), not jsonb. Operators like messages -> 0 or jsonb_array_length(messages) raise operator does not exist: bytea -> integer. Decode to jsonb first, then count array elements:
jsonb_array_length(convert_from(messages, 'UTF8')::jsonb)
The same applies to transcripts.metadata_json — decode with convert_from(metadata_json, 'UTF8')::jsonb before using JSON operators. Agent runs with at least N messages in any transcript:
SELECT DISTINCT ar.id AS agent_run_id
FROM agent_runs ar
JOIN transcripts t ON t.agent_run_id = ar.id
WHERE jsonb_array_length(convert_from(t.messages, 'UTF8')::jsonb) >= 10;
Per-transcript message counts (compute once in a subquery, then filter):
SELECT
  transcript_id,
  agent_run_id,
  message_count
FROM (
  SELECT
    t.id AS transcript_id,
    t.agent_run_id,
    jsonb_array_length(convert_from(t.messages, 'UTF8')::jsonb) AS message_count
  FROM transcripts t
) AS counted
WHERE message_count >= 10
ORDER BY message_count DESC;
Reading nested transcripts.metadata_json fields:
SELECT
  id,
  meta->'conversation'->>'speaker' AS speaker,
  meta->'conversation'->>'topic' AS topic
FROM (
  SELECT
    id,
    convert_from(metadata_json, 'UTF8')::jsonb AS meta
  FROM transcripts
) AS t
WHERE meta->>'status' = 'flagged';
Express filters like “≥10 messages” with the pattern above. Don’t materialize matching IDs elsewhere and paste them into a giant WHERE id IN ('…', '…', …) clause — that blows past query size limits and is hard to maintain.

Join transcripts with agent runs

SELECT
  ar.name AS run_name,
  t.name AS transcript_name
FROM agent_runs ar
JOIN transcripts t ON t.agent_run_id = ar.id

Transcript counts per group

SELECT
  tg.id AS group_id,
  tg.name AS group_name,
  COUNT(t.id) AS transcript_count
FROM transcript_groups tg
JOIN transcripts t ON t.transcript_group_id = tg.id
GROUP BY tg.id, tg.name
HAVING COUNT(t.id) > 1
ORDER BY transcript_count DESC;

Transcript coverage audit

Finds transcript groups that are marked as must_have but have no associated transcripts.
SELECT
  tg.id AS group_id,
  tg.name AS group_name,
  COUNT(t.id) AS transcript_count
FROM transcript_groups tg
LEFT JOIN transcripts t
  ON t.transcript_group_id = tg.id
  AND t.collection_id = tg.collection_id
WHERE tg.metadata_json->>'priority' = 'must_have'
GROUP BY tg.id, tg.name
HAVING COUNT(t.id) = 0
ORDER BY group_name;

Flagged judge results

SELECT
  jr.agent_run_id,
  jr.rubric_id,
  jr.result_metadata->>'label' AS label,
  jr.output->>'score' AS score
FROM judge_results jr
WHERE jr.result_metadata->>'severity' = 'high'
  AND EXISTS (
    SELECT 1
    FROM agent_runs ar
    WHERE ar.id = jr.agent_run_id
      AND ar.metadata_json->>'environment' = 'prod'
  )
ORDER BY score DESC
LIMIT 25;

Common gotchas

”column X does not exist”

  • DQL requires explicit column selection. Wildcards (*) are not supported.
  • Check the schema using client.get_dql_schema(collection_id) to see available columns.

Numeric comparisons not working as expected

JSON fields are strings by default. Cast them for numeric operations:
-- Wrong: string comparison
WHERE metadata_json->>'score' > '0.5'

-- Correct: numeric comparison
WHERE CAST(metadata_json->>'score' AS DOUBLE PRECISION) > 0.5

Query returns no results but data exists

  • Check that you’re querying the correct collection
  • Verify metadata field names are exact matches (case-sensitive)
  • Use ? operator to check if a field exists before filtering on it

Results truncated unexpectedly

DQL caps results at 10,000 rows. Use LIMIT and OFFSET for pagination:
-- First page
SELECT id, name FROM agent_runs LIMIT 1000 OFFSET 0
-- Second page
SELECT id, name FROM agent_runs LIMIT 1000 OFFSET 1000

“syntax error” on valid-looking SQL

Some SQL features aren’t supported in DQL:
  • No * wildcard in SELECT
  • No INSERT, UPDATE, DELETE
  • No user-defined functions