Rubrics define evaluation criteria for agent runs. A judge is an LLM configured to evaluate
runs against a rubric. See Rubrics and Judges for concepts.
Custom prompt templates are a list of messages, each with a role ("system", "user", or "assistant")
and a content string. The content string can use the {rubric}, {agent_run}, and
{output_schema} template variables.
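As an illustration, a minimal sketch of rendering such templates — the message dicts and the `render` helper below are hypothetical stand-ins, not part of the SDK; only the role/content shape and the three template variables come from the description above:

```python
# Hypothetical prompt templates: role/content messages whose content
# uses the {rubric}, {agent_run}, and {output_schema} placeholders.
templates = [
    {"role": "system", "content": "You are a judge. Rubric:\n{rubric}"},
    {"role": "user", "content": (
        "Evaluate this run:\n{agent_run}\n"
        "Respond with JSON matching this schema:\n{output_schema}"
    )},
]

def render(templates, rubric, agent_run, output_schema):
    """Substitute the template variables into each message's content."""
    return [
        {
            "role": t["role"],
            "content": t["content"].format(
                rubric=rubric,
                agent_run=agent_run,
                output_schema=output_schema,
            ),
        }
        for t in templates
    ]

messages = render(
    templates,
    rubric="Pass if the agent completed the task.",
    agent_run="<run transcript>",
    output_schema='{"label": "string", "explanation": "string"}',
)
```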
Download a rubric configuration and create a callable judge instance. The judge reads
LLM provider API keys from environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.).
```python
judge = client.get_judge("my-collection-id", rubric_id)

# Inspect the configuration
print(judge.cfg.rubric_text)
print(judge.cfg.judge_model)

# Run locally (async)
import asyncio

async def evaluate():
    run = client.get_agent_run("my-collection-id", "run-id-123")
    result = await judge(run)
    print(result.output)       # {"label": "pass", "explanation": "..."}
    print(result.result_type)  # ResultType.DIRECT_RESULT

asyncio.run(evaluate())
```
Evaluate an agent run. Returns a JudgeResult with output, result_type,
and result_metadata fields.
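To make the result shape concrete, here is a sketch that inspects those three fields. The `JudgeResult` dataclass and `ResultType` enum below are local stand-ins mirroring the fields described above, not imports from the SDK:

```python
from dataclasses import dataclass, field
from enum import Enum

class ResultType(Enum):  # stand-in; the SDK defines the real enum
    DIRECT_RESULT = "direct_result"

@dataclass
class JudgeResult:  # stand-in mirroring the documented fields
    output: dict
    result_type: ResultType
    result_metadata: dict = field(default_factory=dict)

# In real usage this would come from `await judge(run)`.
result = JudgeResult(
    output={"label": "pass", "explanation": "Agent completed the task."},
    result_type=ResultType.DIRECT_RESULT,
)

if result.result_type is ResultType.DIRECT_RESULT:
    label = result.output["label"]
```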
Running a judge locally requires the appropriate LLM provider API key set in your
environment (e.g., OPENAI_API_KEY, ANTHROPIC_API_KEY). The required provider
depends on the rubric’s judge_model configuration.