Custom Graders

Write your own grading functions to implement custom evaluation logic.

Custom graders let you implement domain-specific evaluation, parse complex formats, and apply custom scoring algorithms.

Basic Structure

from letta_evals.decorators import grader
from letta_evals.models import GradeResult, Sample

@grader
def my_custom_grader(sample: Sample, submission: str) -> GradeResult:
    """Custom grading logic."""

    # Your evaluation logic. calculate_score is a placeholder for your own
    # scoring function; it is not provided by letta_evals.
    score = calculate_score(submission, sample.ground_truth)

    # Ensure score is between 0.0 and 1.0
    score = max(0.0, min(1.0, score))

    return GradeResult(
        score=score,
        rationale=f"Score based on custom logic: {score}",
    )
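
The `calculate_score` call in the example above is left undefined. One way it could be filled in is sketched below as a token-level F1 between the submission and the ground truth. This is an illustrative assumption, not part of letta_evals: the function body, the whitespace tokenization, and the lowercase normalization are all hypothetical choices you would adapt to your domain.

```python
from collections import Counter


def calculate_score(submission: str, ground_truth: str) -> float:
    """Token-level F1 between submission and ground truth (illustrative only)."""
    sub_tokens = submission.lower().split()
    truth_tokens = ground_truth.lower().split()

    if not sub_tokens or not truth_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(sub_tokens == truth_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(sub_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(sub_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

Because F1 already falls in the range 0.0 to 1.0, the clamping step in the grader acts only as a safety net here. It matters more for scoring functions that can overshoot, such as weighted sums.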

Example: JSON Validation

import json
from letta_evals.decorators import grader
from letta_evals.models import GradeResult, Sample

@grader
def valid_json(sample: Sample, submission: str) -> GradeResult:
    """Check if submission is valid JSON."""
    try:
        json.loads(submission)
        return GradeResult(score=1.0, rationale="Valid JSON")
    except json.JSONDecodeError as e:
        return GradeResult(score=0.0, rationale=f"Invalid JSON: {e}")
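
The grader above is all-or-nothing. A partial-credit variant might score the fraction of expected keys present in the parsed object. The sketch below shows only the scoring logic as a plain function so it can run without letta_evals. The name `json_key_coverage` and the `required_keys` parameter are hypothetical, and where the expected keys come from (for example, the sample's ground truth) is an assumption left to you.

```python
import json


def json_key_coverage(submission: str, required_keys: list[str]) -> tuple[float, str]:
    """Return (score, rationale) for how many required keys the JSON object has."""
    try:
        parsed = json.loads(submission)
    except json.JSONDecodeError as e:
        return 0.0, f"Invalid JSON: {e}"

    # Only JSON objects have keys; arrays, strings, and numbers get no credit.
    if not isinstance(parsed, dict):
        return 0.0, "Valid JSON, but not an object"

    if not required_keys:
        return 1.0, "Valid JSON object; no keys required"

    missing = [key for key in required_keys if key not in parsed]
    score = (len(required_keys) - len(missing)) / len(required_keys)
    rationale = f"Missing keys: {missing}" if missing else "All required keys present"
    return score, rationale
```

Inside a grader, the returned pair would map directly onto the `score` and `rationale` fields of `GradeResult`.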

Registration

Custom graders are automatically registered when you import them in your suite’s setup script or custom evaluators file.
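
Import-time registration of this kind is commonly built on a decorator that records each function in a module-level registry. The sketch below illustrates that general pattern only. It is not the letta_evals implementation, and `GRADER_REGISTRY`, `register_grader`, and `always_pass` are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical module-level registry mapping function names to graders.
GRADER_REGISTRY: Dict[str, Callable] = {}


def register_grader(func: Callable) -> Callable:
    """Record the function under its own name, then return it unchanged."""
    GRADER_REGISTRY[func.__name__] = func
    return func


@register_grader
def always_pass(sample, submission):
    """Toy grader used only to show that decoration triggers registration."""
    return 1.0


# The decorator runs when the module is imported, so the function is
# discoverable by name without any explicit registration call.
looked_up = GRADER_REGISTRY["always_pass"]
```

Under this pattern, a grader defined in a module that is never imported never reaches the registry, which is why the import step matters.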

Configuration

graders:
  my_metric:
    kind: tool
    function: my_custom_grader  # Your function name
    extractor: last_assistant

Next Steps