We no longer recommend authoring rubrics by hand. The Docent plugin generates Reading steps inside an Analysis Plan for you. This SDK reference is kept for users with existing rubrics.
Rubrics define evaluation criteria for agent runs. A judge is an LLM configured to evaluate
runs against a rubric. See Rubrics and Judges for concepts.
Custom prompt templates. Each has a role ("system", "user", "assistant")
and content string. The content can use {rubric}, {agent_run}, and
{output_schema} template variables.
Format the judge is instructed to emit and that the SDK parses. "yaml"
is the default for new rubrics; "json" is preserved for rubrics created
before this field existed.
Download a rubric configuration and create a callable judge instance. The judge reads
LLM provider API keys from environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
judge = client.get_judge("my-collection-id", rubric_id)# Inspect the configurationprint(judge.cfg.rubric_text)print(judge.cfg.judge_model)# Run locally (async)import asyncioasync def evaluate(): run = client.get_agent_run("my-collection-id", "run-id-123") result = await judge(run) print(result.output) # {"label": "pass", "explanation": "..."} print(result.result_type) # ResultType.DIRECT_RESULTasyncio.run(evaluate())
Evaluate an agent run. Returns a JudgeResult with output, result_type,
and result_metadata fields.
Running a judge locally requires the appropriate LLM provider API key set in your
environment (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY). The required provider
depends on the rubric’s judge_model configuration.