> ## Documentation Index
> Fetch the complete documentation index at: https://docs.transluce.org/llms.txt
> Use this file to discover all available pages before exploring further.

# Run Evaluations

> Start evaluation jobs and track their progress

We no longer recommend running rubric evaluation jobs as a primary workflow. The [Docent plugin](/installation) generates [Reading steps](/analysis/reading-steps) inside an [Analysis Plan](/analysis/analysis-plans) for you. This SDK reference is kept for users with existing evaluation jobs.

Evaluation jobs run a rubric's judge against agent runs in a collection. The evaluation runs server-side: you start the job and monitor progress. See [Rubrics and Judges](/legacy/rubrics) for evaluation concepts.

## Start an Evaluation Job

```python
from docent import Docent

client = Docent()

job_id = client.start_rubric_eval_job(
    "my-collection-id",
    rubric_id="rubric-123",
    max_agent_runs=500,
)
print(f"Started evaluation job: {job_id}")
```

### Parameters

- ID of the collection.
- ID of the rubric to evaluate with.
- Maximum number of agent runs to evaluate. If `None`, evaluates all runs in the collection.
- Number of independent judge rollouts per agent run. More rollouts improve reliability at the cost of more LLM calls.
- Backend concurrency limit for the evaluation job. If `None`, uses the server default.
- Whether the judge prompt should include agent run metadata.

### Returns

ID of the created (or reused) evaluation job. If an identical job is already running, its ID is returned instead of creating a duplicate.

***

## Get Evaluation Results

Retrieve the current state of a rubric evaluation, including results and progress.

```python
state = client.get_rubric_run_state("my-collection-id", "rubric-123")

print(f"Total results needed: {state['total_results_needed']}")
print(f"Results so far: {len(state.get('results', []))}")
```

### Parameters

- ID of the collection.
- ID of the rubric.
- Rubric version. If `None`, uses the latest version.
- Optional filter to apply to results.
- Whether to include failed judge results in the response.

### Returns

Evaluation state.

- List of per-agent-run result groups. Each entry contains:
  - The agent run that was evaluated.
  - The rubric used.
  - The rubric version used.
  - List of individual judge results, each with `output`, `result_type`, and `result_metadata`.
  - Reflection data, if the judge variant uses multi-reflection.
- ID of the evaluation job, if one exists.
- Status of the job: `"pending"`, `"running"`, `"completed"`, or `"canceled"`.
- Total number of results expected when evaluation is complete.
- Number of results completed so far.

`get_rubric_run_state` does **not** start an evaluation. Use `start_rubric_eval_job()` first, then poll `get_rubric_run_state()` to check progress.

***

## Example: Run and Monitor an Evaluation

```python
import time
from docent import Docent

client = Docent()
collection_id = "my-collection-id"
rubric_id = "rubric-123"

# Start evaluation
job_id = client.start_rubric_eval_job(collection_id, rubric_id)
print(f"Started job: {job_id}")

# Poll for completion
while True:
    state = client.get_rubric_run_state(collection_id, rubric_id)
    current = state.get("current_results_count", 0)
    total = state.get("total_results_needed", 0)
    print(f"Progress: {current}/{total}")

    if state.get("job_status") in ("completed", "canceled"):
        break
    time.sleep(5)

# Analyze results: each entry groups judge results by agent run
for entry in state.get("results", []):
    for judge_result in entry["results"]:
        print(f"Run {entry['agent_run_id']}: {judge_result['output']}")
```
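
The polling loop in the example runs until the job reports `"completed"` or `"canceled"`, so a stalled job would block the script indefinitely. One way to bound the wait is sketched below. `wait_for_rubric_eval` and its `poll_interval` and `timeout` parameters are hypothetical names introduced here, not part of the Docent SDK; the only SDK call assumed is `get_rubric_run_state(collection_id, rubric_id)` returning a dict with a `job_status` key, as in the example on this page.

```python
import time


def wait_for_rubric_eval(client, collection_id, rubric_id,
                         poll_interval=5.0, timeout=600.0):
    """Poll get_rubric_run_state() until the job ends or `timeout` seconds pass.

    Hypothetical helper, not part of the Docent SDK. `client` can be any
    object exposing get_rubric_run_state(collection_id, rubric_id).
    """
    # Use a monotonic clock so wall-clock adjustments do not affect the deadline.
    deadline = time.monotonic() + timeout
    while True:
        state = client.get_rubric_run_state(collection_id, rubric_id)

        # Terminal statuses, as listed in the Returns section.
        if state.get("job_status") in ("completed", "canceled"):
            return state

        # Give up once the deadline has passed, reporting the last seen status.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Evaluation still {state.get('job_status')!r} after {timeout}s"
            )
        time.sleep(poll_interval)
```

With a real client this could be called as `state = wait_for_rubric_eval(client, collection_id, rubric_id, timeout=1800)`. The returned state is the same dict the example iterates over, so the result-printing loop works unchanged. Catching `TimeoutError` lets the caller decide whether to keep waiting or report the stall.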