> ## Documentation Index
> Fetch the complete documentation index at: https://docs.transluce.org/llms.txt
> Use this file to discover all available pages before exploring further.

# Structured Queries (DQL)

> A read-only SQL subset for structured, quantitative questions about your data

Docent Query Language (DQL) is a read-only SQL subset for ad-hoc queries over a Docent collection. Queries can only run over a single collection by design (if you need multi-collection support, please reach out to us!)

The Docent Agent is the most ergonomic way to write DQL. To write DQL by hand, see the [DQL schema reference](/sdk/dql/schema) for the column schema of each table and the [execute\_dql reference](/sdk/dql/execute) for Python SDK methods, operators, and common patterns. For filtering agent runs (`select_agent_run_ids`), see [Query Agent Runs](/sdk/agent-runs/query).

## When to use DQL

DQL is great for structured queries like getting the average reward by model, or identifying tasks where one model regressed compared to another. Ask the agent:

```text wrap theme={null}
/docent What's the average reward by model across this collection?
```

```text wrap theme={null}
/docent Which tasks had one model scoring lower than another when averaging across rollouts?
```

## A few illustrative queries

### A simple `SELECT`

```sql theme={null}
SELECT
  id,
  name,
  metadata_json->'model'->>'name' AS model_name,
  created_at
FROM agent_runs
WHERE metadata_json->>'status' = 'completed'
ORDER BY created_at DESC
LIMIT 10;
```

### Aggregating with CTEs

Per-environment success rates, normalized via a CTE.

```sql theme={null}
WITH normalized_runs AS (
  SELECT
    metadata_json->>'environment' AS environment,
    metadata_json->>'status' AS status
  FROM agent_runs
  WHERE metadata_json ? 'environment'
)
SELECT
  environment,
  COUNT() AS total_runs,
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed_runs,
  CAST(SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS DOUBLE PRECISION)
    / NULLIF(COUNT(), 0) AS completion_rate
FROM normalized_runs
GROUP BY environment
ORDER BY total_runs DESC;
```

### Joining runs with judge results

Pulls the most recent rubric result per run, then joins to surface the model responsible for each score.

```sql theme={null}
WITH latest_scores AS (
  SELECT
    agent_run_id,
    MAX(rubric_version) AS rubric_version
  FROM judge_results
  WHERE rubric_id = 'helpful_response_v1'
  GROUP BY agent_run_id
)
SELECT
  ar.id,
  ar.metadata_json->'model'->>'name' AS model_name,
  jr.output->>'score' AS score,
  jr.result_metadata->>'label' AS label
FROM latest_scores ls
JOIN judge_results jr
  ON jr.agent_run_id = ls.agent_run_id
  AND jr.rubric_version = ls.rubric_version
  AND jr.rubric_id = 'helpful_response_v1'
JOIN agent_runs ar ON ar.id = jr.agent_run_id
WHERE ar.metadata_json->>'environment' = 'prod'
ORDER BY CAST(jr.output->>'score' AS DOUBLE PRECISION) DESC
LIMIT 15;
```

## What DQL does and doesn't do

* **Read-only.** Only `SELECT`-style queries are permitted.
* **Single statement.** Batches and multiple statements are rejected.
* **Explicit projection.** Wildcard `*` is disallowed; list columns explicitly so downstream tooling stays predictable.
* **Single-collection scope.** A query runs against one collection at a time.
* **10,000 row cap.** Every query is capped by the server. Use `LIMIT`/`OFFSET` for pagination, or export offline for larger sets.
* **JSON performance.** Metadata is stored as JSON; heavy traversal across large collections is slower than filtering on indexed scalar columns.
* **Type awareness.** JSON paths expose a generic `json` type. Cast explicitly (e.g., `CAST(metadata_json->>'duration_ms' AS BIGINT)`) when precision matters.

## Writing DQL by hand

See the [full schema reference](/sdk/dql/schema) for the column schema of each table.
