Multi-Metric Evaluation
Evaluate multiple aspects of agent performance simultaneously in a single evaluation suite.
Multi-metric evaluation allows you to define multiple graders, each measuring a different dimension of your agent’s behavior.
Why Multiple Metrics?
Agents are complex systems. You might want to evaluate:
- Correctness: Does the answer match the expected output?
- Quality: Is the explanation clear and complete?
- Tool usage: Does the agent call the right tools with correct arguments?
- Memory: Does the agent correctly update its memory blocks?
- Format: Does the output follow required formatting rules?
Configuration
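As a rough illustration of the idea (not the framework's actual configuration format), a multi-metric suite can be thought of as a mapping from metric names to graders, where every grader scores the same agent output independently. The `Sample` type, the grader functions, and `evaluate` below are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# Hypothetical types for illustration only; not the framework's real API.
@dataclass
class Sample:
    expected: str  # reference answer
    output: str    # what the agent actually produced


# Each grader maps a sample to a score in [0.0, 1.0].
Grader = Callable[[Sample], float]


def correctness(sample: Sample) -> float:
    """Exact match against the expected output, ignoring surrounding whitespace."""
    return 1.0 if sample.output.strip() == sample.expected.strip() else 0.0


def format_ok(sample: Sample) -> float:
    """Toy formatting rule (an assumption): the output must end with a period."""
    return 1.0 if sample.output.strip().endswith(".") else 0.0


# The "suite": one named grader per dimension being measured.
GRADERS: Dict[str, Grader] = {
    "correctness": correctness,
    "format": format_ok,
}


def evaluate(sample: Sample, graders: Dict[str, Grader]) -> Dict[str, float]:
    """Run every grader on the same sample and collect one score per metric."""
    return {name: grade(sample) for name, grade in graders.items()}


scores = evaluate(Sample(expected="Paris.", output="Paris."), GRADERS)
print(scores)  # {'correctness': 1.0, 'format': 1.0}
```

Keeping graders keyed by name is what lets a later step refer to a single metric without losing the others.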
Gating on One Metric
The gate can check any one of the metrics defined in the suite.
Results include scores for all graders, even if you gate on only one.
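The separation between gating and reporting can be sketched as follows. This is a minimal model under stated assumptions: the gate is assumed to compare the mean score of one named metric against a threshold, and `gate_passes`, `summarize`, and the sample scores are hypothetical, not the framework's API or real results.

```python
from typing import Dict, List

# Hypothetical per-sample scores, one dict per sample, one entry per grader.
results: List[Dict[str, float]] = [
    {"correctness": 1.0, "quality": 0.5},
    {"correctness": 1.0, "quality": 0.75},
    {"correctness": 0.0, "quality": 0.25},
    {"correctness": 1.0, "quality": 0.5},
]


def summarize(results: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean score per metric across all samples (every grader is reported)."""
    if not results:
        raise ValueError("no results to summarize")
    metrics = results[0].keys()
    return {m: sum(r[m] for r in results) / len(results) for m in metrics}


def gate_passes(results: List[Dict[str, float]], metric: str, threshold: float) -> bool:
    """Assumed gate rule: the mean of the chosen metric must meet the threshold."""
    return summarize(results)[metric] >= threshold


summary = summarize(results)
passed = gate_passes(results, metric="correctness", threshold=0.75)

print(summary)  # {'correctness': 0.75, 'quality': 0.5}
print(passed)   # True
```

Here only `correctness` decides pass or fail, while the `quality` average remains in the summary for inspection.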
Next Steps
- Tool Graders - Deterministic evaluation
- Rubric Graders - LLM-as-judge evaluation
- Gates - Setting pass/fail criteria