Walkthrough
Let’s check for issues with the agent scaffolding that might have caused spurious failures. First, we filter to runs where the agent failed, then search for potential issues with the environment the agent is operating in:
We can then cluster the results and see what the most common issues are:
Sharing results
You can open access permissions to share these results with anyone. You can also link to specific parts of the agent run.
Tips for using search
- If you don’t precisely know what you’re looking for, start with a general rubric (e.g., “cases of cheating” or “types of environment issues”). Then, based on initial results, refine your rubric.
- If you do know what you’re looking for, feel free to provide lots of detail in your rubric; that’s why the text box is so large.
- Use appropriate metadata filters to narrow the scope of your search.
Customizing the judge output schema
Judges produce data in JSON format. Each judge has an associated schema that is used to prompt the language model and validate its output. The default schema defines a label property whose allowed values are listed under properties.label.enum. See the JSON Schema documentation for more information on how to write a schema.
"citations" is a non-standard keyword which indicates whether a string property should include citations to a run’s transcript(s). If any part of the schema uses citations, the judge model will receive a prompt about how to write them. Citations are rendered as clickable links.
Retrieving results from the SDK
The Python SDK exposes rubric results via get_rubric_run_state, given a Collection ID (collection_id) and a rubric_id.
get_rubric_run_state doesn’t run a search; it retrieves the results of a completed rubric evaluation, along with the job status and the total number of agent runs.
If you don’t know which collection_id to use, you can call client.list_collections() to find the right Collection.
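The retrieval flow above can be sketched as follows. The client below is a stand-in stub so the snippet is self-contained and runnable; with the real SDK you would construct its client and call get_rubric_run_state with your own collection_id and rubric_id, and the exact response shape and field names here are assumptions for illustration.

```python
class StubClient:
    """Stand-in for the SDK client (hypothetical shape, for illustration only)."""

    def get_rubric_run_state(self, collection_id, rubric_id):
        # A completed evaluation: rubric results plus job status and total agent runs.
        return {
            "job_status": "completed",
            "total_agent_runs": 3,
            "results": [
                {"agent_run_id": "run-1", "label": "match"},
                {"agent_run_id": "run-2", "label": "no_match"},
                {"agent_run_id": "run-3", "label": "match"},
            ],
        }


client = StubClient()
state = client.get_rubric_run_state(collection_id="col-123", rubric_id="rub-456")

# Only read results once the evaluation job has finished.
matches = []
if state["job_status"] == "completed":
    matches = [r for r in state["results"] if r["label"] == "match"]
    print(f"{len(matches)}/{state['total_agent_runs']} runs matched the rubric")
```

Checking job_status before consuming results matters because get_rubric_run_state retrieves state rather than running a search: an in-progress evaluation may return partial results.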
How to get rubrics for the current Collection
For programmatic access to rubrics, you can use:
- list_rubrics to get a list of rubric objects given a collection_id
- get_clustering_state with the rubric_id
- get_cluster_assignments to see which rubric results match which clusters
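A minimal sketch of working with these outputs, using stubbed data in place of live SDK calls. The rubric and assignment shapes below are assumptions for demonstration; with the real SDK, the rubrics list would come from list_rubrics(collection_id=...) and the assignments from get_cluster_assignments for a given rubric_id.

```python
from collections import Counter

# Stubbed rubric objects, e.g. as returned by list_rubrics (hypothetical shape).
rubrics = [
    {"rubric_id": "rub-1", "text": "cases of cheating"},
    {"rubric_id": "rub-2", "text": "types of environment issues"},
]

# Stubbed result-to-cluster mapping, e.g. as returned by get_cluster_assignments
# for one rubric (hypothetical shape).
assignments = {
    "result-1": "cluster-a",
    "result-2": "cluster-a",
    "result-3": "cluster-b",
}

# Count how many rubric results landed in each cluster, largest first.
cluster_sizes = Counter(assignments.values())
for cluster, size in cluster_sizes.most_common():
    print(cluster, size)
```

Tallying cluster sizes like this is a quick way to surface the most common issue categories a rubric found, mirroring the clustering step in the walkthrough above.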

