Letta Evals

Systematic testing for stateful AI agents. Validate changes, prevent regressions, and ship with confidence.

Test agent memory, tool usage, multi-turn conversations, and state evolution with automated grading and pass/fail gates.
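As a rough illustration of what automated grading with a pass/fail gate means, here is a minimal, generic Python sketch. It does not use the Letta Evals API: `Sample`, `exact_match`, `run_gate`, and `toy_agent` are hypothetical names made up for this example, and the "agent" is a plain function standing in for a real stateful agent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Sample:
    """One test case: an input prompt and the answer we expect back."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Hypothetical grader: 1.0 if the trimmed, case-folded strings match, else 0.0."""
    return 1.0 if output.strip().casefold() == expected.strip().casefold() else 0.0


def run_gate(
    agent: Callable[[str], str],
    samples: Sequence[Sample],
    grader: Callable[[str, str], float],
    threshold: float,
) -> Tuple[float, bool]:
    """Grade every sample, average the scores, and compare against the gate threshold."""
    if not samples:
        raise ValueError("at least one sample is required")
    scores = [grader(agent(s.prompt), s.expected) for s in samples]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold


def toy_agent(prompt: str) -> str:
    """Stand-in for a real agent (assumption: a real run would call a deployed agent)."""
    answers = {"capital of France?": "Paris", "2 + 2?": "4"}
    return answers.get(prompt, "unknown")


samples = [
    Sample("capital of France?", "paris"),
    Sample("2 + 2?", "4"),
    Sample("largest planet?", "Jupiter"),
]

mean_score, passed = run_gate(toy_agent, samples, exact_match, threshold=0.6)
print(f"mean score: {mean_score:.2f}, gate passed: {passed}")
```

In a CI pipeline, the boolean result would typically decide whether a change is allowed to ship; raising the threshold makes the gate stricter.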

Ready to start? Jump to Getting Started or learn the Core Concepts first.

Core Concepts

Understand the building blocks of evaluations.

Grading & Extraction

Choose how to score your agents.

Advanced

Reference

Resources