Evaluation jobs run a rubric’s judge against agent runs in a collection.
The evaluation runs server-side — you start the job and monitor progress.
See Rubrics and Judges for evaluation concepts.
Start an Evaluation Job
```python
from docent import Docent

client = Docent()

job_id = client.start_rubric_eval_job(
    "my-collection-id",
    rubric_id="rubric-123",
    max_agent_runs=500,
)
print(f"Started evaluation job: {job_id}")
```
Parameters
ID of the rubric to evaluate with.
Maximum number of agent runs to evaluate. If None, evaluates all runs in the collection.
Number of independent judge rollouts per agent run. More rollouts improve reliability
at the cost of more LLM calls.
Backend concurrency limit for the evaluation job. If None, uses the server default.
Whether the judge prompt should include agent run metadata.
Returns
ID of the created (or reused) evaluation job. If an identical job is already running,
its ID is returned instead of creating a duplicate.
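The dedup behavior can be pictured with a small sketch. This is a hypothetical model of the server-side logic, not the actual implementation: jobs are keyed by their request parameters, and an identical request reuses the running job's ID.

```python
import uuid

# Hypothetical sketch: running jobs keyed by (collection_id, rubric_id)
_running_jobs: dict[tuple[str, str], str] = {}

def start_job(collection_id: str, rubric_id: str) -> str:
    key = (collection_id, rubric_id)
    if key in _running_jobs:
        # An identical job is already running: return its ID
        # instead of creating a duplicate.
        return _running_jobs[key]
    job_id = str(uuid.uuid4())
    _running_jobs[key] = job_id
    return job_id

first = start_job("my-collection-id", "rubric-123")
second = start_job("my-collection-id", "rubric-123")
assert first == second  # the duplicate request reused the existing job
```

Because of this, retrying a start request after a dropped connection is safe: you get back the same job rather than double the LLM calls.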
Get Evaluation Results
Retrieve the current state of a rubric evaluation, including results and progress.
```python
state = client.get_rubric_run_state("my-collection-id", "rubric-123")
print(f"Total results needed: {state['total_results_needed']}")
print(f"Results so far: {len(state.get('results', []))}")
```
Parameters
Rubric version. If None, uses the latest version.
Optional filter to apply to results.
Whether to include failed judge results in the response.
Returns
Evaluation state, including a list of per-agent-run result groups (AgentRunJudgeResults). Each entry contains:
The agent run that was evaluated.
List of individual judge results, each with output, result_type, and result_metadata.
Reflection data, if the judge variant uses multi-reflection.
ID of the evaluation job, if one exists.
Status of the job: "pending", "running", "completed", or "canceled".
Total number of results expected when evaluation is complete.
Number of results completed so far.
get_rubric_run_state does not start an evaluation. Use start_rubric_eval_job()
first, then poll get_rubric_run_state() to check progress.
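The start-then-poll pattern can be wrapped in a small helper. This is a sketch, not part of the SDK: `wait_for_rubric_eval` is a hypothetical name, and its `fetch_state` parameter stands in for a zero-argument call such as `lambda: client.get_rubric_run_state(collection_id, rubric_id)`.

```python
import time

def wait_for_rubric_eval(fetch_state, poll_interval=5.0, timeout=600.0):
    """Poll fetch_state() until job_status is terminal; return the final state.

    fetch_state: zero-argument callable returning the evaluation state dict,
    e.g. lambda: client.get_rubric_run_state(collection_id, rubric_id).
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        # "completed" and "canceled" are the terminal job statuses.
        if state.get("job_status") in ("completed", "canceled"):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("rubric evaluation did not finish in time")
        time.sleep(poll_interval)
```

A timeout is worth having in any polling loop so a stalled job fails loudly instead of hanging the caller forever.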
Example: Run and Monitor an Evaluation
```python
import time

from docent import Docent

client = Docent()
collection_id = "my-collection-id"
rubric_id = "rubric-123"

# Start evaluation
job_id = client.start_rubric_eval_job(collection_id, rubric_id)
print(f"Started job: {job_id}")

# Poll for completion
while True:
    state = client.get_rubric_run_state(collection_id, rubric_id)
    current = state.get("current_results_count", 0)
    total = state.get("total_results_needed", 0)
    print(f"Progress: {current}/{total}")
    if state.get("job_status") in ("completed", "canceled"):
        break
    time.sleep(5)

# Analyze results: each entry groups judge results by agent run
for entry in state.get("results", []):
    for judge_result in entry["results"]:
        print(f"Run {entry['agent_run_id']}: {judge_result['output']}")
```
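When the job was started with multiple judge rollouts per agent run, each result group holds several judge outputs; a common way to collapse them is a majority vote. A minimal sketch, assuming the result shape shown above (an `agent_run_id` plus a `results` list of dicts with an `output` field):

```python
from collections import Counter

def majority_output(entry: dict):
    """Return the most common judge output across rollouts for one agent run."""
    outputs = [r["output"] for r in entry.get("results", [])]
    if not outputs:
        return None  # no judge results for this run
    return Counter(outputs).most_common(1)[0][0]

entry = {
    "agent_run_id": "run-1",
    "results": [{"output": "pass"}, {"output": "pass"}, {"output": "fail"}],
}
print(majority_output(entry))  # "pass"
```

This is why more rollouts improve reliability: a single noisy judge call is outvoted by the consensus.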