---
title: Getting started | Letta Docs
description: Get started with Letta evaluations by creating datasets, running tests, and analyzing agent outputs.
---

Run your first Letta agent evaluation in 5 minutes.

## Prerequisites

- Python 3.11 or higher
- A running Letta server (Docker or Letta API)
- A Letta agent to test, either in agent file format or by ID (see [Targets](/guides/evals/concepts/targets/index.md) for more details)

## Installation

Terminal window

```
pip install letta-evals
```

Or with uv:

Terminal window

```
uv pip install letta-evals
```

## Getting an Agent to Test

Export an existing agent to a file using the Letta SDK:

```
from letta_client import Letta
import os


# Connect to the Letta API
client = Letta(api_key=os.getenv("LETTA_API_KEY"))


# Export an agent to a file
agent_file = client.agents.export_file(agent_id="agent-123")


# Save to disk
with open("my_agent.af", "w") as f:
    f.write(agent_file)
```

Or export via the Agent Development Environment (ADE) by selecting “Export Agent”.

Then reference it in your suite:

```
target:
  kind: agent
  agent_file: my_agent.af
```

**Other options:** You can also use existing agents by ID or programmatically generate agents. See [Targets](/guides/evals/concepts/targets/index.md) for all agent configuration options.

## Quick Start

Let’s create your first evaluation in 3 steps:

### 1. Create a Test Dataset

Create a file named `dataset.jsonl`:

```
{"input": "What's the capital of France?", "ground_truth": "Paris"}
{"input": "Calculate 2+2", "ground_truth": "4"}
{"input": "What color is the sky?", "ground_truth": "blue"}
```

Each line is a JSON object with:

- `input`: The prompt to send to your agent
- `ground_truth`: The expected answer (used for grading)

`ground_truth` is optional for some graders (like rubric graders), but required for tool graders like `contains` and `exact_match`.

Read more about [Datasets](/guides/evals/concepts/datasets/index.md) for details on how to create your dataset.

### 2. Create a Suite Configuration

Create a file named `suite.yaml`:

```
name: my-first-eval
dataset: dataset.jsonl


target:
  kind: agent
  agent_file: my_agent.af # Path to your agent file
  base_url: https://api.letta.com # Letta API (default)
  token: ${LETTA_API_KEY} # Your API key


graders:
  quality:
    kind: tool
    function: contains # Check if response contains the ground truth
    extractor: last_assistant # Use the last assistant message


gate:
  metric_key: quality
  op: gte
  value: 0.75 # Require 75% pass rate
```

The suite configuration defines:

- The [dataset](/guides/evals/concepts/datasets/index.md) to use
- The [agent](/guides/evals/concepts/targets/index.md) to test
- The [graders](/guides/evals/concepts/graders/index.md) to use
- The [gate](/guides/evals/concepts/gates/index.md) criteria

Read more about [Suites](/guides/evals/concepts/suites/index.md) for details on how to configure your evaluation.

### 3. Run the Evaluation

Run your evaluation with the following command:

Terminal window

```
letta-evals run suite.yaml
```

You’ll see real-time progress as your evaluation runs:

```
Running evaluation: my-first-eval
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/3 100%
✓ PASSED (2.25/3.00 avg, 75.0% pass rate)
```

Read more about [CLI Commands](/guides/evals/cli/commands/index.md) for details about the available commands and options.

## Understanding the Results

The core evaluation flow is:

**Dataset → Target (Agent) → Extractor → Grader → Gate → Result**

The evaluation runner:

1. Loads your dataset
2. Sends each input to your agent (Target)
3. Extracts the relevant information (using the Extractor)
4. Grades the response (using the Grader function)
5. Computes aggregate metrics
6. Checks if metrics pass the Gate criteria

The output shows:

- **Average score**: Mean score across all samples
- **Pass rate**: Percentage of samples that passed
- **Gate status**: Whether the evaluation passed or failed overall

## Next Steps

Now that you’ve run your first evaluation, explore more advanced features:

- [Core Concepts](/guides/evals/concepts/overview/index.md) - Understand suites, datasets, graders, and extractors
- [Grader Types](/guides/evals/concepts/graders/index.md) - Learn about tool graders vs rubric graders
- [Multi-Metric Evaluation](/guides/evals/graders/multi-metric/index.md) - Test multiple aspects simultaneously
- [Custom Graders](/guides/evals/advanced/custom-graders/index.md) - Write custom grading functions
- [Multi-Turn Conversations](/guides/evals/advanced/multi-turn-conversations/index.md) - Test conversational memory

## Common Use Cases

### Strict Answer Checking

Use exact matching for cases where the answer must be precisely correct:

```
graders:
  accuracy:
    kind: tool
    function: exact_match
    extractor: last_assistant
```

### Subjective Quality Evaluation

Use an LLM judge to evaluate subjective qualities like helpfulness or tone:

```
graders:
  quality:
    kind: rubric
    prompt_path: rubric.txt
    model: gpt-4o-mini
    extractor: last_assistant
```

Then create `rubric.txt`:

```
Rate the helpfulness and accuracy of the response.
- Score 1.0 if helpful and accurate
- Score 0.5 if partially helpful
- Score 0.0 if unhelpful or wrong
```

### Testing Tool Calls

Verify that your agent calls specific tools with expected arguments:

```
graders:
  tool_check:
    kind: tool
    function: contains
    extractor: tool_arguments
    extractor_config:
      tool_name: search
```

### Testing Memory Persistence

Check if the agent correctly updates its memory blocks:

```
graders:
  memory_check:
    kind: tool
    function: contains
    extractor: memory_block
    extractor_config:
      block_label: human
```

## Troubleshooting

**“Agent file not found”**

Make sure your `agent_file` path is correct. Paths are relative to the suite YAML file location. Use absolute paths if needed:

```
target:
  agent_file: /absolute/path/to/my_agent.af
```

**“Connection refused”**

Your Letta server isn’t running or isn’t accessible. Start it using Docker:

Terminal window

```
docker run -p 8283:8283 -e OPENAI_API_KEY="your_api_key" letta/letta:latest
```

By default, it runs at `http://localhost:8283`. See the [self-hosting guide](/guides/docker/index.md) for more information.

**“No ground\_truth provided”**

Tool graders like `exact_match` and `contains` require `ground_truth` in your dataset. Either:

- Add `ground_truth` to your samples, or
- Use a rubric grader which doesn’t require ground truth

**Agent didn’t respond as expected**

Try testing your agent manually first using the Letta SDK or Agent Development Environment (ADE) to see how it behaves before running evaluations. See the [Letta documentation](https://docs.letta.com) for more information.

For more help, see the [Troubleshooting Guide](/guides/evals/troubleshooting/index.md).
