# Rubric Graders
Rubric graders use language models to evaluate submissions based on custom criteria. They’re ideal for subjective, nuanced evaluation.
Rubric graders work by sending the judge LLM a prompt that describes the evaluation criteria; the model then returns a structured JSON response containing a score and a rationale.
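For example, a judge response with the score and rationale fields described above might look like this (the exact values are illustrative):

```json
{
  "score": 0.85,
  "rationale": "The response answers the question accurately but omits one requested detail."
}
```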
## Basic Configuration
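A minimal grader configuration might look like the following sketch. This assumes a YAML config; the key names (`kind`, `prompt_path`, `model`) are illustrative assumptions, not the exact schema:

```yaml
graders:
  quality:
    kind: rubric                      # use an LLM judge with a rubric prompt (assumed key)
    prompt_path: quality_rubric.txt   # file containing the rubric (assumed key)
    model: gpt-4o                     # judge model (assumed key)
```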
## Rubric Prompt Format
Your rubric file should describe the evaluation criteria. Use these placeholders:

- `{input}`: The original input from the dataset
- `{submission}`: The extracted agent response
- `{ground_truth}`: Ground truth from the dataset (if available)

Example `quality_rubric.txt`:
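An illustrative rubric along these lines would work (this is a sketch, not the original example file; it uses the placeholders described above):

```text
Evaluate the quality of the agent's response.

Input: {input}
Response: {submission}

Score 1.0 if the response is accurate, complete, and clearly written.
Score 0.5 if it is partially correct or unclear.
Score 0.0 if it is inaccurate or off-topic.

Provide a brief rationale for your score.
```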
## Model Configuration
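The judge model can typically be set per grader. A sketch of such a configuration (key names and options are assumptions, not the exact schema):

```yaml
graders:
  quality:
    kind: rubric
    prompt_path: quality_rubric.txt
    model: claude-sonnet-4   # judge model override (assumed key)
    temperature: 0.0         # deterministic judging (assumed option)
```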
## Agent-as-Judge
Use a Letta agent as the judge instead of a direct LLM API call:
Requirements: The judge agent must have a tool with the signature `submit_grade(score: float, rationale: str)`.
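A minimal `submit_grade` tool could be sketched as follows. Only the signature above is required by the grader; the body here (clamping and the return message) is an assumption for illustration:

```python
def submit_grade(score: float, rationale: str) -> str:
    """Submit a grade for the evaluated submission.

    Args:
        score: Numeric grade, expected in [0.0, 1.0].
        rationale: Short explanation of the score.
    """
    # Clamp the score to the expected range before recording it.
    score = max(0.0, min(1.0, score))
    return f"Recorded grade {score:.2f}: {rationale}"
```

Attach this tool to the judge agent so it can report its verdict in the structured form the grader expects.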
## Next Steps
- Tool Graders - Deterministic grading functions
- Multi-Metric - Combine multiple graders
- Custom Graders - Write your own grading logic