---
title: Multi-metric evaluation | Letta Docs
description: Evaluate agents across multiple metrics simultaneously with composite grading functions.
---

Evaluate multiple aspects of agent performance simultaneously in a single evaluation suite.

Multi-metric evaluation lets you define several graders, each under its own key and each measuring a different dimension of your agent’s behavior, so a single run produces a separate score per grader.

## Why Multiple Metrics?

Agents are complex systems. You might want to evaluate:

- **Correctness**: Does the answer match the expected output?
- **Quality**: Is the explanation clear and complete?
- **Tool usage**: Does the agent call the right tools with correct arguments?
- **Memory**: Does the agent correctly update its memory blocks?
- **Format**: Does the output follow required formatting rules?

## Configuration

```
graders:
  accuracy: # Check if answer is correct
    kind: tool
    function: exact_match
    extractor: last_assistant

  completeness: # LLM judges response quality
    kind: rubric
    prompt_path: rubrics/completeness.txt
    model: gpt-4o-mini
    extractor: last_assistant

  tool_usage: # Verify correct tool was called
    kind: tool
    function: contains
    extractor: tool_arguments
    extractor_config:
      tool_name: search
```
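
Conceptually, each grader scores the same sample independently, and the result is one score per grader key. The Python sketch below illustrates only that idea and is not Letta's implementation: `exact_match` and `contains` are simplified stand-ins for the built-in functions of the same name, and `score_sample`, the extracted strings, and the ground truth are hypothetical.

```python
from typing import Callable, Dict

# Simplified stand-ins for deterministic graders (assumptions, not Letta's code).
def exact_match(submission: str, ground_truth: str) -> float:
    """Score 1.0 when the extracted text equals the ground truth, else 0.0."""
    return 1.0 if submission.strip() == ground_truth.strip() else 0.0

def contains(submission: str, ground_truth: str) -> float:
    """Score 1.0 when the ground truth appears inside the extracted text."""
    return 1.0 if ground_truth in submission else 0.0

def score_sample(
    graders: Dict[str, Callable[[str, str], float]],
    extracted: Dict[str, str],
    ground_truth: str,
) -> Dict[str, float]:
    """Run every grader on its own extracted text; return one score per key."""
    return {key: grade(extracted[key], ground_truth) for key, grade in graders.items()}

# Hypothetical sample: each grader sees what its extractor would have produced.
graders = {"accuracy": exact_match, "tool_usage": contains}
extracted = {
    "accuracy": "Paris",                           # e.g. last assistant message
    "tool_usage": '{"query": "Paris population"}',  # e.g. arguments of a search call
}
scores = score_sample(graders, extracted, ground_truth="Paris")
print(scores)  # {'accuracy': 1.0, 'tool_usage': 1.0}
```

Keying scores by grader name is what lets a gate later refer to exactly one of them.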

## Gating on One Metric

The gate checks a single metric, selected with `metric_key`. It can be any of the grader keys defined under `graders`:

```
gate:
  metric_key: accuracy # Gate on accuracy (others still computed)
  op: gte
  value: 0.9
```

Results will include scores for all graders, even if you only gate on one.
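
The gate's behavior can be pictured as a comparison applied to one entry in a dictionary of per-grader scores. The Python sketch below assumes that model and is not Letta's implementation: `passes_gate` is a hypothetical helper, the score values are invented for illustration, and only the `gte` operator appears in the configuration above (the other operator names are assumptions).

```python
import operator
from typing import Dict

# Only "gte" comes from the config above; the other names are assumed for illustration.
OPS = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
    "eq": operator.eq,
}

def passes_gate(scores: Dict[str, float], metric_key: str, op: str, value: float) -> bool:
    """Compare the score of a single metric against the gate threshold."""
    return OPS[op](scores[metric_key], value)

# Invented scores: every grader reports a value, but only one is gated on.
scores = {"accuracy": 0.92, "completeness": 0.61, "tool_usage": 0.75}

passed = passes_gate(scores, metric_key="accuracy", op="gte", value=0.9)
print(passed)  # True: 0.92 >= 0.9, regardless of the other two scores
```

In this model, the low `completeness` score does not affect the pass/fail outcome, but it remains available in the results for inspection.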

## Next Steps

- [Tool Graders](/guides/evals/graders/tool-graders/index.md) - Deterministic evaluation
- [Rubric Graders](/guides/evals/graders/rubric-graders/index.md) - LLM-as-judge evaluation
- [Gates](/guides/evals/concepts/gates/index.md) - Setting pass/fail criteria
