CLI Commands

The letta-evals command-line interface lets you run evaluations, validate configurations, and inspect available components.

Quick overview:

  • run - Execute an evaluation suite (most common)
  • validate - Check suite configuration without running
  • list-extractors - Show available extractors
  • list-graders - Show available grader functions
  • Exit codes - 0 for pass, 1 for fail (perfect for CI/CD)

Typical workflow:

  1. Validate your suite: letta-evals validate suite.yaml
  2. Run evaluation: letta-evals run suite.yaml --output results/
  3. Check exit code: echo $? (0 = passed, 1 = failed)

The sections below describe each command and its options in detail.

Run an evaluation suite.

Terminal window
letta-evals run <suite.yaml> [options]

  • suite.yaml: Path to the suite configuration file (required)

--output: Save results to a directory.

Terminal window
letta-evals run suite.yaml --output results/

Creates:

  • results/header.json: Evaluation metadata
  • results/summary.json: Aggregate metrics and configuration
  • results/results.jsonl: Per-sample results (one JSON per line)
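
A quick way to sanity-check these files from the shell is sketched below; it assumes jq is installed, and the field names inside the files depend on your letta-evals version, so inspect them rather than relying on any names shown here.

Terminal window
# Peek at the saved results
jq . results/summary.json                   # aggregate metrics and configuration
wc -l results/results.jsonl                 # one line per evaluated sample
head -n 1 results/results.jsonl | jq keys   # see which fields each sample record has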

--quiet: Quiet mode; only show the pass/fail result.

Terminal window
letta-evals run suite.yaml --quiet

Output:

✓ PASSED

--max-concurrent: Maximum number of concurrent sample evaluations.

Terminal window
letta-evals run suite.yaml --max-concurrent 10

Default: 15

Higher values make evaluation faster but use more resources.
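
A rough way to see the trade-off on your own suite is plain shell timing; no flags beyond those documented above are involved:

Terminal window
# Compare wall-clock time at two concurrency levels
time letta-evals run suite.yaml --max-concurrent 5
time letta-evals run suite.yaml --max-concurrent 20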

--api-key: Letta API key (overrides the LETTA_API_KEY environment variable).

Terminal window
letta-evals run suite.yaml --api-key your-key

--base-url: Letta server base URL (overrides the suite configuration and environment variable).

Terminal window
letta-evals run suite.yaml --base-url http://localhost:8283

--project-id: Letta project ID for cloud deployments.

Terminal window
letta-evals run suite.yaml --project-id proj_abc123

--cached: Path to cached results (JSONL) for re-grading trajectories without re-running the agent.

Terminal window
letta-evals run suite.yaml --cached previous_results.jsonl

Use this to test different graders on the same agent trajectories.
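
A typical re-grading workflow might look like the sketch below. It assumes the results.jsonl written by --output can be fed back in via --cached, and suite-new-grader.yaml is a hypothetical copy of suite.yaml with a different grader configured:

Terminal window
# First run: execute the agent and save trajectories
letta-evals run suite.yaml --output results/

# Later: re-grade the saved trajectories without re-running the agent
letta-evals run suite-new-grader.yaml --cached results/results.jsonl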

--num-runs: Run the evaluation multiple times to measure consistency and collect aggregate statistics.

Terminal window
letta-evals run suite.yaml --num-runs 10

Default: 1 (single run)

Output with multiple runs:

  • Each run creates a separate run_N/ directory with individual results
  • An aggregate_stats.json file contains statistics across all runs (mean, standard deviation, pass rate)

Use cases:

  • Measuring consistency of non-deterministic agents
  • Getting confidence intervals for evaluation metrics
  • Testing agent variability across multiple runs

See Results - Multiple Runs for details on the statistics output.

Basic run:

Terminal window
letta-evals run suite.yaml # Run evaluation, show results in terminal

Save results:

Terminal window
letta-evals run suite.yaml --output evaluation-results/ # Save to directory

High concurrency:

Terminal window
letta-evals run suite.yaml --max-concurrent 20 # Run 20 samples in parallel

Letta Cloud:

Terminal window
# Point the CLI at Letta Cloud with your API key and project
letta-evals run suite.yaml \
  --base-url https://api.letta.com \
  --api-key $LETTA_API_KEY \
  --project-id proj_abc123

Quiet CI mode:

Terminal window
letta-evals run suite.yaml --quiet   # Only show pass/fail
if [ $? -eq 0 ]; then                # Check exit code
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1                             # Fail the CI build
fi

Multiple runs with statistics:

Terminal window
letta-evals run suite.yaml --num-runs 10 --output results/
# Creates results/run_1/, results/run_2/, ..., results/run_10/
# Plus results/aggregate_stats.json with mean, stddev, and pass rate
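
To pull the cross-run numbers into a script, something like the sketch below works; the exact key names inside aggregate_stats.json are not specified here, so inspect the file before scripting against it:

Terminal window
# List per-run outputs and the cross-run statistics
ls results/                          # run_1/ ... run_10/ plus aggregate_stats.json
jq . results/aggregate_stats.json    # key names vary; check them before extracting fields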

Exit codes:

  • 0: Evaluation passed (gate criteria met)
  • 1: Evaluation failed (gate criteria not met or error)

Validate a suite configuration without running it.

Terminal window
letta-evals validate <suite.yaml>

Checks:

  • YAML syntax is valid
  • Required fields are present
  • Paths exist
  • Configuration is consistent
  • Grader/extractor combinations are valid

Terminal window
letta-evals validate suite.yaml

Output on success:

✓ Suite configuration is valid

Output on error:

✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function
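
Assuming validate exits non-zero when the configuration is invalid (as the error output above suggests), you can use it to gate a run:

Terminal window
# Only run the evaluation if the suite validates
letta-evals validate suite.yaml && letta-evals run suite.yaml --output results/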

List all available extractors.

Terminal window
letta-evals list-extractors

Shows:

  • Built-in extractors
  • Custom extractors (if registered)
  • Brief description of each

Output:

Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)

List all available grader functions.

Terminal window
letta-evals list-graders

Shows:

  • Built-in tool graders
  • Custom graders (if registered)
  • Brief description of each

Output:

Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content

Show help information.

Terminal window
letta-evals --help

Show help for a specific command:

Terminal window
letta-evals run --help
letta-evals validate --help

These environment variables affect CLI behavior:

LETTA_API_KEY: API key for Letta authentication.

Terminal window
export LETTA_API_KEY=your-key-here

LETTA_BASE_URL: Letta server base URL.

Terminal window
export LETTA_BASE_URL=http://localhost:8283

LETTA_PROJECT_ID: Letta project ID (for cloud deployments).

Terminal window
export LETTA_PROJECT_ID=proj_abc123

OPENAI_API_KEY: OpenAI API key (for rubric graders).

Terminal window
export OPENAI_API_KEY=your-openai-key

OPENAI_BASE_URL: Custom OpenAI-compatible endpoint (optional).

Terminal window
export OPENAI_BASE_URL=https://your-endpoint.com/v1

Configuration values are resolved in this order (highest to lowest priority):

  1. CLI arguments (--api-key, --base-url, --project-id)
  2. Suite YAML configuration
  3. Environment variables
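
A concrete example of the precedence: when both an environment variable and a CLI flag are set, the flag wins.

Terminal window
# LETTA_BASE_URL is set, but the --base-url flag takes priority,
# so this run talks to https://api.letta.com
export LETTA_BASE_URL=http://localhost:8283
letta-evals run suite.yaml --base-url https://api.letta.com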

Example GitHub Actions workflow:

name: Run Evals
on: [push]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install letta-evals
      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: results/

Equivalent GitLab CI job:

evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY

The CLI currently uses a single, standard verbosity level. For debugging:

  1. Check the output directory for detailed results
  2. Examine summary.json for aggregate metrics
  3. Check results.jsonl for per-sample details
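
When a run fails, filtering results.jsonl for the failing samples is usually the fastest way in. The field name in the jq filter below is an assumption; check the real schema first (for example with head -n 1 results/results.jsonl | jq keys):

Terminal window
# Show only the samples that did not pass
# (".passed" is an assumed field name; adjust to your results.jsonl schema)
jq 'select(.passed == false)' results/results.jsonl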

Common errors:

“Agent file not found”

Terminal window
# Check file exists relative to suite YAML location
ls -la path/to/agent.af

“Connection refused”

Terminal window
# Verify Letta server is running
curl http://localhost:8283/v1/health

“Invalid API key”

Terminal window
# Check environment variable is set
echo $LETTA_API_KEY