Rubrics define evaluation criteria for agent runs. A judge is an LLM configured to evaluate
runs against a rubric. See Rubrics and Judges for concepts.
Custom prompt templates are a list of messages, each with a role ("system", "user", or "assistant")
and a content string. The content string can use the {rubric}, {agent_run}, and
{output_schema} template variables.
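As an illustration, a minimal sketch of rendering such templates — the message dicts and the `render` helper below are hypothetical stand-ins, not part of the SDK; only the role/content shape and the three template variables come from the description above:

```python
# Hypothetical prompt templates: role/content messages whose content
# uses the {rubric}, {agent_run}, and {output_schema} placeholders.
templates = [
    {"role": "system", "content": "You are a judge. Rubric:\n{rubric}"},
    {"role": "user", "content": (
        "Evaluate this run:\n{agent_run}\n"
        "Respond with JSON matching this schema:\n{output_schema}"
    )},
]

def render(templates, rubric, agent_run, output_schema):
    """Substitute the template variables into each message's content."""
    return [
        {
            "role": t["role"],
            "content": t["content"].format(
                rubric=rubric,
                agent_run=agent_run,
                output_schema=output_schema,
            ),
        }
        for t in templates
    ]

messages = render(
    templates,
    rubric="Pass if the agent completed the task.",
    agent_run="<run transcript>",
    output_schema='{"label": "string", "explanation": "string"}',
)
```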
Download a rubric configuration and create a callable judge instance. The judge reads
LLM provider API keys from environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
```python
judge = client.get_judge("my-collection-id", rubric_id)

# Inspect the configuration
print(judge.cfg.rubric_text)
print(judge.cfg.judge_model)

# Run locally (async)
import asyncio

async def evaluate():
    run = client.get_agent_run("my-collection-id", "run-id-123")
    result = await judge(run)
    print(result.output)       # {"label": "pass", "explanation": "..."}
    print(result.result_type)  # ResultType.DIRECT_RESULT

asyncio.run(evaluate())
```
Evaluate an agent run. Returns a JudgeResult with output, result_type,
and result_metadata fields.
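To make the result shape concrete, here is a sketch that inspects those three fields. The `JudgeResult` dataclass and `ResultType` enum below are local stand-ins mirroring the fields described above, not imports from the SDK:

```python
from dataclasses import dataclass, field
from enum import Enum

class ResultType(Enum):  # stand-in; the SDK defines the real enum
    DIRECT_RESULT = "direct_result"

@dataclass
class JudgeResult:  # stand-in mirroring the documented fields
    output: dict
    result_type: ResultType
    result_metadata: dict = field(default_factory=dict)

# In real usage this would come from `await judge(run)`.
result = JudgeResult(
    output={"label": "pass", "explanation": "Agent completed the task."},
    result_type=ResultType.DIRECT_RESULT,
)

if result.result_type is ResultType.DIRECT_RESULT:
    label = result.output["label"]
```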
Running a judge locally requires the appropriate LLM provider API key set in your
environment (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY). The required provider
depends on the rubric’s judge_model configuration.