Letta Evals Documentation

Welcome to the documentation for the Letta Evals Kit, a framework for evaluating Letta AI agents.

Table of Contents

Getting Started

Getting Started - Installation, first evaluation, and core concepts

Core Concepts

Overview - Understanding the evaluation framework
Suites - Evaluation suite configuration
Datasets - Creating and managing test datasets
Targets - What you’re evaluating
Graders - How responses are scored
Extractors - Extracting submissions from agent output
Gates - Pass/fail criteria

Graders

Grader Overview - Understanding grader types
Tool Graders - Built-in and custom function graders
Rubric Graders - LLM-as-judge evaluation
Multi-Metric Grading - Evaluating with multiple metrics

Extractors

Extractor Overview - Understanding extractors
Built-in Extractors - All available extractors
Custom Extractors - Writing your own extractors

Configuration

Suite YAML Reference - Complete YAML schema
Target Configuration - Target setup options
Grader Configuration - Grader parameters
Environment Variables - Environment setup

Advanced Usage

Custom Graders - Writing custom grading functions
Multi-Turn Conversations - Testing conversational memory and state
Agent Factories - Programmatic agent creation
Multi-Model Evaluation - Testing across models
Setup Scripts - Pre-evaluation setup
Memory Block Testing - Testing agent memory
Result Streaming - Real-time results and caching

Results & Metrics

Understanding Results - Result structure and interpretation
Metrics - Aggregate statistics
Output Formats - JSON, JSONL, and console output

CLI Reference

Commands - All CLI commands
Options - Command-line options

Examples

Example Walkthroughs - Detailed example explanations

API Reference

Data Models - Pydantic models reference
Decorators - @grader and @extractor decorators

Best Practices

Troubleshooting