CLI Commands

The letta-evals command-line interface lets you run evaluations, validate configurations, and inspect available components.

Quick overview:

  • run - Execute an evaluation suite (most common)
  • validate - Check suite configuration without running
  • list-extractors - Show available extractors
  • list-graders - Show available grader functions
  • Exit codes - 0 for pass, 1 for fail (perfect for CI/CD)

Typical workflow:

  1. Validate your suite: letta-evals validate suite.yaml
  2. Run evaluation: letta-evals run suite.yaml --output results/
  3. Check exit code: echo $? (0 = passed, 1 = failed)

run

Run an evaluation suite.

$ letta-evals run <suite.yaml> [options]

Arguments

  • suite.yaml: Path to the suite configuration file (required)

Options

--output, -o

Save results to a directory.

$ letta-evals run suite.yaml --output results/

Creates:

  • results/header.json: Evaluation metadata
  • results/summary.json: Aggregate metrics and configuration
  • results/results.jsonl: Per-sample results (one JSON per line)
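The JSONL results file is easy to post-process with a few lines of Python. The sketch below uses hypothetical record fields (`sample_id`, `score`, `passed`); check your actual results.jsonl for the real schema.

```python
import json
import os
import tempfile

def load_results(path):
    """Read per-sample results from a JSONL file (one JSON object per line)."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical records for illustration; real field names may differ.
records = [
    '{"sample_id": 0, "score": 1.0, "passed": true}',
    '{"sample_id": 1, "score": 0.0, "passed": false}',
]
path = os.path.join(tempfile.gettempdir(), "results.jsonl")
with open(path, "w") as f:
    f.write("\n".join(records))

results = load_results(path)
pass_rate = sum(r["passed"] for r in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 50%
```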

--quiet, -q

Quiet mode - only show the pass/fail result.

$ letta-evals run suite.yaml --quiet

Output:

✓ PASSED

--max-concurrent

Maximum concurrent sample evaluations. Default: 15

$ letta-evals run suite.yaml --max-concurrent 10

Higher values finish faster but use more resources and send more concurrent requests to the server.
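The usual way to cap concurrency in async Python is a semaphore. This is a generic sketch of the pattern, not letta-evals internals: each sample must acquire one of `max_concurrent` slots before running.

```python
import asyncio

async def eval_sample(i, sem):
    # Acquire a slot first; at most `max_concurrent` samples
    # are inside this block at any moment.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for one sample evaluation
        return i

async def run_all(n_samples, max_concurrent):
    sem = asyncio.Semaphore(max_concurrent)
    return await asyncio.gather(*(eval_sample(i, sem) for i in range(n_samples)))

scores = asyncio.run(run_all(30, 10))
print(len(scores))  # 30
```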

--api-key

Letta API key (overrides the LETTA_API_KEY environment variable).

$ letta-evals run suite.yaml --api-key your-key

--base-url

Letta server base URL (overrides suite config and environment variable).

$ letta-evals run suite.yaml --base-url http://localhost:8283

--project-id

Letta project ID for cloud deployments.

$ letta-evals run suite.yaml --project-id proj_abc123

--cached, -c

Path to cached results (JSONL) for re-grading trajectories without re-running the agent.

$ letta-evals run suite.yaml --cached previous_results.jsonl

Use this to test different graders on the same agent trajectories.

--num-runs

Run the evaluation multiple times to measure consistency. Default: 1

$ letta-evals run suite.yaml --num-runs 10

Output with multiple runs:

  • Each run creates a separate run_N/ directory with individual results
  • An aggregate_stats.json file contains statistics across all runs (mean, standard deviation, pass rate)
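The aggregate statistics boil down to standard summary math over the per-run numbers. A minimal sketch with hypothetical run values (aggregate_stats.json holds the real ones):

```python
from statistics import mean, stdev

# Hypothetical per-run numbers for illustration.
run_scores = [0.90, 0.85, 0.95, 0.90]   # e.g. accuracy of each run
run_passed = [True, True, True, False]  # whether each run met the gate

stats = {
    "mean": round(mean(run_scores), 4),
    "stdev": round(stdev(run_scores), 4),
    "pass_rate": sum(run_passed) / len(run_passed),
}
print(stats)  # {'mean': 0.9, 'stdev': 0.0408, 'pass_rate': 0.75}
```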

Examples

Basic run:

$ letta-evals run suite.yaml  # Run evaluation, show results in terminal

Save results:

$ letta-evals run suite.yaml --output evaluation-results/  # Save to directory

Letta Cloud:

$ letta-evals run suite.yaml \
>   --base-url https://api.letta.com \
>   --api-key $LETTA_API_KEY \
>   --project-id proj_abc123

Quiet CI mode:

letta-evals run suite.yaml --quiet
if [ $? -eq 0 ]; then
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1
fi

Exit Codes

  • 0: Evaluation passed (gate criteria met)
  • 1: Evaluation failed (gate criteria not met or error)

validate

Validate a suite configuration without running it.

$ letta-evals validate <suite.yaml>

Checks:

  • YAML syntax is valid
  • Required fields are present
  • Paths exist
  • Configuration is consistent
  • Grader/extractor combinations are valid
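Conceptually, validation collects every problem before reporting, rather than stopping at the first one. A sketch of that shape, with assumed field names (`name`, `dataset`, `graders`, `agent_file`) that may not match the real suite schema:

```python
import os

REQUIRED_FIELDS = ("name", "dataset", "graders")  # assumed, not the real schema

def validate_suite(suite, base_dir="."):
    """Return a list of error messages; an empty list means the suite looks valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in suite:
            errors.append(f"missing required field: {field}")
    agent_file = suite.get("agent_file")
    if agent_file and not os.path.exists(os.path.join(base_dir, agent_file)):
        errors.append(f"Agent file not found: {agent_file}")
    return errors

errs = validate_suite({"name": "demo", "agent_file": "missing.af"})
print(errs)
```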

Output on success:

✓ Suite configuration is valid

Output on error:

✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function

list-extractors

List all available extractors.

$ letta-evals list-extractors

Output:

Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)

list-graders

List all available grader functions.

$ letta-evals list-graders

Output:

Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content

help

Show help information.

$ letta-evals --help

Show help for a specific command:

$ letta-evals run --help
$ letta-evals validate --help

Environment Variables

LETTA_API_KEY

API key for Letta authentication.

$ export LETTA_API_KEY=your-key-here

LETTA_BASE_URL

Letta server base URL.

$ export LETTA_BASE_URL=http://localhost:8283

LETTA_PROJECT_ID

Letta project ID (for cloud).

$ export LETTA_PROJECT_ID=proj_abc123

OPENAI_API_KEY

OpenAI API key (for rubric graders).

$ export OPENAI_API_KEY=your-openai-key

Configuration Priority

Configuration values are resolved in this order (highest to lowest priority):

  1. CLI arguments (--api-key, --base-url, --project-id)
  2. Suite YAML configuration
  3. Environment variables
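The resolution order amounts to "first non-empty source wins". A small sketch of that logic (a generic pattern, not the tool's actual code):

```python
import os

def resolve(cli_value, suite_value, env_var):
    """First non-None source wins: CLI argument, then suite YAML, then environment."""
    if cli_value is not None:
        return cli_value
    if suite_value is not None:
        return suite_value
    return os.environ.get(env_var)

os.environ["LETTA_BASE_URL"] = "http://localhost:8283"
print(resolve(None, "https://api.letta.com", "LETTA_BASE_URL"))
# https://api.letta.com  (suite YAML beats the environment variable)
print(resolve("http://other:8283", "https://api.letta.com", "LETTA_BASE_URL"))
# http://other:8283  (CLI argument wins)
```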

Using in CI/CD

GitHub Actions

name: Run Evals
on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install dependencies
        run: pip install letta-evals

      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/

      - name: Upload results
        uses: actions/upload-artifact@v2
        with:
          name: eval-results
          path: results/

GitLab CI

evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY

Debugging

Common Issues

“Agent file not found”

$ # Check the file exists relative to the suite YAML location
$ ls -la path/to/agent.af

“Connection refused”

$ # Verify the Letta server is running
$ curl http://localhost:8283/v1/health

“Invalid API key”

$ # Check the environment variable is set
$ echo $LETTA_API_KEY

Next Steps