CLI Commands

The letta-evals command-line interface lets you run evaluations, validate configurations, and inspect available components.

Quick overview:

  • run - Execute an evaluation suite (most common)
  • validate - Check suite configuration without running
  • list-extractors - Show available extractors
  • list-graders - Show available grader functions
  • Exit codes - 0 for pass, 1 for fail (perfect for CI/CD)

Typical workflow:

  1. Validate your suite: letta-evals validate suite.yaml
  2. Run evaluation: letta-evals run suite.yaml --output results/
  3. Check exit code: echo $? (0 = passed, 1 = failed)

run

Run an evaluation suite.

$ letta-evals run <suite.yaml> [options]

Arguments

  • suite.yaml: Path to the suite configuration file (required)

Options

--output, -o

Save results to a directory.

$ letta-evals run suite.yaml --output results/

Creates:

  • results/header.json: Evaluation metadata
  • results/summary.json: Aggregate metrics and configuration
  • results/results.jsonl: Per-sample results (one JSON per line)
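The JSONL results file is easy to post-process with a few lines of Python. The sketch below uses hypothetical record fields (`sample_id`, `score`, `passed`); check your actual results.jsonl for the real schema.

```python
import json
import os
import tempfile

def load_results(path):
    """Read per-sample results from a JSONL file (one JSON object per line)."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical records for illustration; real field names may differ.
records = [
    '{"sample_id": 0, "score": 1.0, "passed": true}',
    '{"sample_id": 1, "score": 0.0, "passed": false}',
]
path = os.path.join(tempfile.gettempdir(), "results.jsonl")
with open(path, "w") as f:
    f.write("\n".join(records))

results = load_results(path)
pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 50%
```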

--quiet, -q

Quiet mode - only show the pass/fail result.

$ letta-evals run suite.yaml --quiet

Output:

✓ PASSED

--max-concurrent

Maximum concurrent sample evaluations. Default: 15

$ letta-evals run suite.yaml --max-concurrent 10

Higher values finish faster but use more resources and send more concurrent requests to the server.
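The usual way to cap concurrency in async Python is a semaphore. This is a generic sketch of the pattern, not letta-evals internals: each sample must acquire one of `max_concurrent` slots before running.

```python
import asyncio

async def eval_sample(i, sem):
    # Acquire a slot first; at most `max_concurrent` samples
    # are inside this block at any moment.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for one sample evaluation
        return i

async def run_all(n_samples, max_concurrent):
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(eval_sample(i, sem) for i in range(n_samples)))

scores = asyncio.run(run_all(30, 10))
print(len(scores))  # 30
```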

--api-key

Letta API key (overrides the LETTA_API_KEY environment variable).

$ letta-evals run suite.yaml --api-key your-key

--base-url

Letta server base URL (overrides suite config and environment variable).

$ letta-evals run suite.yaml --base-url http://localhost:8283

--project-id

Letta project ID for cloud deployments.

$ letta-evals run suite.yaml --project-id proj_abc123

--cached, -c

Path to cached results (JSONL) for re-grading trajectories without re-running the agent.

$ letta-evals run suite.yaml --cached previous_results.jsonl

Use this to test different graders on the same agent trajectories.

--num-runs

Run the evaluation multiple times to measure consistency. Default: 1

$ letta-evals run suite.yaml --num-runs 10

Output with multiple runs:

  • Each run creates a separate run_N/ directory with individual results
  • An aggregate_stats.json file contains statistics across all runs (mean, standard deviation, pass rate)
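The aggregate statistics boil down to standard summary math over the per-run numbers. A minimal sketch with hypothetical run values (aggregate_stats.json holds the real ones):

```python
from statistics import mean, stdev

# Hypothetical per-run numbers for illustration.
run_scores = [0.90, 0.85, 0.95, 0.90]   # e.g. accuracy of each run
run_passed = [True, True, True, False]  # whether each run met the gate

stats = {
    "mean": round(mean(run_scores), 4),
    "stdev": round(stdev(run_scores), 4),
    "pass_rate": sum(run_passed) / len(run_passed),
}
print(stats)  # {'mean': 0.9, 'stdev': 0.0408, 'pass_rate': 0.75}
```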

Examples

Basic run:

$ letta-evals run suite.yaml  # Run evaluation, show results in terminal

Save results:

$ letta-evals run suite.yaml --output evaluation-results/  # Save to directory

Letta Cloud:

$ letta-evals run suite.yaml \
>   --base-url https://api.letta.com \
>   --api-key $LETTA_API_KEY \
>   --project-id proj_abc123

Quiet CI mode:

letta-evals run suite.yaml --quiet
if [ $? -eq 0 ]; then
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1
fi

Exit Codes

  • 0: Evaluation passed (gate criteria met)
  • 1: Evaluation failed (gate criteria not met or error)

validate

Validate a suite configuration without running it.

$ letta-evals validate <suite.yaml>

Checks:

  • YAML syntax is valid
  • Required fields are present
  • Paths exist
  • Configuration is consistent
  • Grader/extractor combinations are valid
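Conceptually, validation collects every problem before reporting, rather than stopping at the first one. A sketch of that shape, with assumed field names (`name`, `dataset`, `graders`, `agent_file`) that may not match the real suite schema:

```python
import os

REQUIRED_FIELDS = ("name", "dataset", "graders")  # assumed, not the real schema

def validate_suite(suite, base_dir="."):
    """Return a list of error messages; an empty list means the suite looks valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in suite:
            errors.append(f"missing required field: {field}")
    agent_file = suite.get("agent_file")
    if agent_file and not os.path.exists(os.path.join(base_dir, agent_file)):
        errors.append(f"Agent file not found: {agent_file}")
    return errors

errs = validate_suite({"name": "demo", "agent_file": "missing.af"})
print(errs)
```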

Output on success:

✓ Suite configuration is valid

Output on error:

✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function

list-extractors

List all available extractors.

$ letta-evals list-extractors

Output:

Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)

list-graders

List all available grader functions.

$ letta-evals list-graders

Output:

Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content

help

Show help information.

$ letta-evals --help

Show help for a specific command:

$ letta-evals run --help
$ letta-evals validate --help

Environment Variables

LETTA_API_KEY

API key for Letta authentication.

$ export LETTA_API_KEY=your-key-here

LETTA_BASE_URL

Letta server base URL.

$ export LETTA_BASE_URL=http://localhost:8283

LETTA_PROJECT_ID

Letta project ID (for cloud).

$ export LETTA_PROJECT_ID=proj_abc123

OPENAI_API_KEY

OpenAI API key (for rubric graders).

$ export OPENAI_API_KEY=your-openai-key

Configuration Priority

Configuration values are resolved in this order (highest to lowest priority):

  1. CLI arguments (--api-key, --base-url, --project-id)
  2. Suite YAML configuration
  3. Environment variables
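The resolution order amounts to "first non-empty source wins". A small sketch of that logic (a generic pattern, not the tool's actual code):

```python
import os

def resolve(cli_value, suite_value, env_var):
    """First non-None source wins: CLI argument, then suite YAML, then environment."""
    if cli_value is not None:
        return cli_value
    if suite_value is not None:
        return suite_value
    return os.environ.get(env_var)

os.environ["LETTA_BASE_URL"] = "http://localhost:8283"
print(resolve(None, "https://api.letta.com", "LETTA_BASE_URL"))
# https://api.letta.com  (suite YAML beats the environment variable)
print(resolve("http://other:8283", "https://api.letta.com", "LETTA_BASE_URL"))
# http://other:8283  (CLI argument wins)
```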

Using in CI/CD

GitHub Actions

name: Run Evals
on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install dependencies
        run: pip install letta-evals

      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/

      - name: Upload results
        uses: actions/upload-artifact@v2
        with:
          name: eval-results
          path: results/

GitLab CI

evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY

Debugging

Common Issues

“Agent file not found”

$ # Check the file exists relative to the suite YAML location
$ ls -la path/to/agent.af

“Connection refused”

$ # Verify the Letta server is running
$ curl http://localhost:8283/v1/health

“Invalid API key”

$ # Check the environment variable is set
$ echo $LETTA_API_KEY

Next Steps