Rubrics define evaluation criteria for agent runs. A judge is an LLM configured to evaluate runs against a rubric. See Rubrics and Judges for concepts.

Create a Rubric

from docent import Docent
from docent.judges.types import Rubric

client = Docent()

rubric = Rubric(
    rubric_text="""
    Evaluate whether the agent successfully completed the user's request.

    Decision procedure:
    1. Identify what the user asked for
    2. Check if the agent's final response addresses the request
    3. Verify the response is factually correct
    """,
    output_schema={
        "type": "object",
        "properties": {
            "label": {"type": "string", "enum": ["pass", "fail"]},
            "explanation": {"type": "string", "citations": True},
        },
        "required": ["label", "explanation"],
    },
)

rubric_id = client.create_rubric("my-collection-id", rubric)
print(rubric_id)
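The output_schema above follows JSON Schema conventions ("citations" is a Docent-specific extension). As a rough illustration of what a judge output must satisfy under this schema, here is a minimal hand-rolled check; validate_output is a hypothetical helper for this sketch, not part of the SDK:

```python
# Hypothetical helper, not part of the Docent SDK: checks a judge output
# dict against the required keys and enum values of the schema above.
def validate_output(output: dict) -> list[str]:
    errors = []
    # Both fields appear under "required" in the schema.
    for key in ("label", "explanation"):
        if key not in output:
            errors.append(f"missing required field: {key}")
    # "label" is constrained to the enum ["pass", "fail"].
    if "label" in output and output["label"] not in ("pass", "fail"):
        errors.append(f"label not in enum: {output['label']!r}")
    return errors

print(validate_output({"label": "pass", "explanation": "Request satisfied."}))  # []
print(validate_output({"label": "maybe"}))
```

In practice the judge model is prompted to emit output matching this schema, so a check like this is only needed if you post-process results yourself.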

Parameters

collection_id
str
required
ID of the collection.
rubric
Rubric
required
The rubric configuration. Must have version=1 for new rubrics.

Returns

rubric_id
str
The ID of the created rubric.

Get a Rubric

rubric = client.get_rubric("my-collection-id", rubric_id)
print(rubric.rubric_text)
print(rubric.output_schema)

Parameters

collection_id
str
required
ID of the collection.
rubric_id
str
required
ID of the rubric to retrieve.
version
int | None
Specific version number. If None, returns the latest version.

Returns

rubric
Rubric
The rubric configuration object.

List Rubrics

rubrics = client.list_rubrics("my-collection-id")
for r in rubrics:
    print(f"{r['id']}: {r.get('rubric_text', '')[:80]}")

Parameters

collection_id
str
required
ID of the collection.

Returns

rubrics
list[dict]
List of rubric information dictionaries.
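Each dictionary includes at least the "id" and "rubric_text" keys used in the example above (other keys may vary). A small sketch that indexes the list by rubric ID:

```python
# Sketch: build an id -> rubric_text lookup from list_rubrics output.
# Assumes only the "id" and "rubric_text" keys shown in the example above.
def index_rubrics(rubrics: list[dict]) -> dict[str, str]:
    return {r["id"]: r.get("rubric_text", "") for r in rubrics}

sample = [
    {"id": "rb-1", "rubric_text": "Check task completion."},
    {"id": "rb-2"},  # rubric_text may be absent
]
print(index_rubrics(sample))  # {'rb-1': 'Check task completion.', 'rb-2': ''}
```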

Get a Judge

Download a rubric configuration and create a callable judge instance for evaluating runs locally.

judge = client.get_judge("my-collection-id", rubric_id)

# Inspect the configuration
print(judge.cfg.rubric_text)
print(judge.cfg.judge_model)

# Run locally (async)
import asyncio

async def evaluate():
    run = client.get_agent_run("my-collection-id", "run-id-123")
    result = await judge(run)
    print(result.output)  # {"label": "pass", "explanation": "..."}
    print(result.result_type)  # ResultType.DIRECT_RESULT

asyncio.run(evaluate())
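Because the judge is an async callable, multiple runs can be scored concurrently. The sketch below shows one way to do this with asyncio.gather and a concurrency cap; stub_judge is a stand-in for the real judge returned by client.get_judge (which returns a result object, not a plain dict), and the run names are made up:

```python
import asyncio

async def evaluate_all(judge, runs, max_concurrency: int = 4):
    # Cap in-flight judge calls so we don't flood the LLM provider.
    sem = asyncio.Semaphore(max_concurrency)

    async def one(run):
        async with sem:
            return await judge(run)

    # Results come back in the same order as the input runs.
    return await asyncio.gather(*(one(r) for r in runs))

# Stub standing in for a real judge instance.
async def stub_judge(run):
    await asyncio.sleep(0)
    return {"label": "pass", "run": run}

results = asyncio.run(evaluate_all(stub_judge, ["run-a", "run-b", "run-c"]))
print([r["label"] for r in results])  # ['pass', 'pass', 'pass']
```

With a real judge, max_concurrency should be tuned to your provider's rate limits.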

Parameters

collection_id
str
required
ID of the collection.
rubric_id
str
required
ID of the rubric/judge to retrieve.
version
int | None
Specific version number. If None, returns the latest version.

Returns

judge
BaseJudge
A callable judge instance. Use await judge(agent_run) to evaluate a run.

Running a judge locally requires the appropriate LLM provider API key set in your environment (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY). The required provider depends on the rubric's judge_model configuration.