CLI Commands

The letta-evals command-line interface lets you run evaluations, validate configurations, and inspect available components.

Quick overview:

  • run - Execute an evaluation suite (most common)
  • validate - Check suite configuration without running
  • list-extractors - Show available extractors
  • list-graders - Show available grader functions
  • Exit codes - 0 for pass, 1 for fail (perfect for CI/CD)

Typical workflow:

  1. Validate your suite: letta-evals validate suite.yaml
  2. Run evaluation: letta-evals run suite.yaml --output results/
  3. Check exit code: echo $? (0 = passed, 1 = failed)

The sections below describe each command and its options in detail.

Run an evaluation suite.

Terminal window
letta-evals run <suite.yaml> [options]

  • suite.yaml: Path to the suite configuration file (required)

--output: Save results to a directory.

Terminal window
letta-evals run suite.yaml --output results/

Creates:

  • results/header.json: Evaluation metadata
  • results/summary.json: Aggregate metrics and configuration
  • results/results.jsonl: Per-sample results (one JSON per line)
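
A quick way to sanity-check these files from the shell is sketched below; it assumes jq is installed, and the field names inside the files depend on your letta-evals version, so inspect them rather than relying on any names shown here.

Terminal window
# Peek at the saved results
jq . results/summary.json                   # aggregate metrics and configuration
wc -l results/results.jsonl                 # one line per evaluated sample
head -n 1 results/results.jsonl | jq keys   # see which fields each sample record has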

--quiet: Quiet mode; only show the pass/fail result.

Terminal window
letta-evals run suite.yaml --quiet

Output:

✓ PASSED

--max-concurrent: Maximum number of concurrent sample evaluations.

Terminal window
letta-evals run suite.yaml --max-concurrent 10

Default: 15

Higher values make evaluation faster but use more resources.
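
A rough way to see the trade-off on your own suite is plain shell timing; no flags beyond those documented above are involved:

Terminal window
# Compare wall-clock time at two concurrency levels
time letta-evals run suite.yaml --max-concurrent 5
time letta-evals run suite.yaml --max-concurrent 20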

--api-key: Letta API key (overrides the LETTA_API_KEY environment variable).

Terminal window
letta-evals run suite.yaml --api-key your-key

--base-url: Letta server base URL (overrides the suite configuration and environment variable).

Terminal window
letta-evals run suite.yaml --base-url http://localhost:8283

--project-id: Letta project ID for cloud deployments.

Terminal window
letta-evals run suite.yaml --project-id proj_abc123

--cached: Path to cached results (JSONL) for re-grading trajectories without re-running the agent.

Terminal window
letta-evals run suite.yaml --cached previous_results.jsonl

Use this to test different graders on the same agent trajectories.
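
A typical re-grading workflow might look like the sketch below. It assumes the results.jsonl written by --output can be fed back in via --cached, and suite-new-grader.yaml is a hypothetical copy of suite.yaml with a different grader configured:

Terminal window
# First run: execute the agent and save trajectories
letta-evals run suite.yaml --output results/

# Later: re-grade the saved trajectories without re-running the agent
letta-evals run suite-new-grader.yaml --cached results/results.jsonl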

--num-runs: Run the evaluation multiple times to measure consistency and collect aggregate statistics.

Terminal window
letta-evals run suite.yaml --num-runs 10

Default: 1 (single run)

Output with multiple runs:

  • Each run creates a separate run_N/ directory with individual results
  • An aggregate_stats.json file contains statistics across all runs (mean, standard deviation, pass rate)

Use cases:

  • Measuring consistency of non-deterministic agents
  • Getting confidence intervals for evaluation metrics
  • Testing agent variability across multiple runs

See Results - Multiple Runs for details on the statistics output.

Basic run:

Terminal window
letta-evals run suite.yaml # Run evaluation, show results in terminal

Save results:

Terminal window
letta-evals run suite.yaml --output evaluation-results/ # Save to directory

High concurrency:

Terminal window
letta-evals run suite.yaml --max-concurrent 20 # Run 20 samples in parallel

Letta Cloud:

Terminal window
# Point the CLI at Letta Cloud with your API key and project
letta-evals run suite.yaml \
  --base-url https://api.letta.com \
  --api-key $LETTA_API_KEY \
  --project-id proj_abc123

Quiet CI mode:

Terminal window
letta-evals run suite.yaml --quiet   # Only show pass/fail
if [ $? -eq 0 ]; then                # Check exit code
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1                             # Fail the CI build
fi

Multiple runs with statistics:

Terminal window
letta-evals run suite.yaml --num-runs 10 --output results/
# Creates results/run_1/, results/run_2/, ..., results/run_10/
# Plus results/aggregate_stats.json with mean, stddev, and pass rate
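
To pull the cross-run numbers into a script, something like the sketch below works; the exact key names inside aggregate_stats.json are not specified here, so inspect the file before scripting against it:

Terminal window
# List per-run outputs and the cross-run statistics
ls results/                          # run_1/ ... run_10/ plus aggregate_stats.json
jq . results/aggregate_stats.json    # key names vary; check them before extracting fields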

Exit codes:

  • 0: Evaluation passed (gate criteria met)
  • 1: Evaluation failed (gate criteria not met or error)

Validate a suite configuration without running it.

Terminal window
letta-evals validate <suite.yaml>

Checks:

  • YAML syntax is valid
  • Required fields are present
  • Paths exist
  • Configuration is consistent
  • Grader/extractor combinations are valid

Terminal window
letta-evals validate suite.yaml

Output on success:

✓ Suite configuration is valid

Output on error:

✗ Validation failed:
- Agent file not found: agent.af
- Grader 'my_metric' references unknown function
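
Assuming validate exits non-zero when the configuration is invalid (as the error output above suggests), you can use it to gate a run:

Terminal window
# Only run the evaluation if the suite validates
letta-evals validate suite.yaml && letta-evals run suite.yaml --output results/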

List all available extractors.

Terminal window
letta-evals list-extractors

Shows:

  • Built-in extractors
  • Custom extractors (if registered)
  • Brief description of each

Output:

Available extractors:
last_assistant - Extract the last assistant message
first_assistant - Extract the first assistant message
all_assistant - Concatenate all assistant messages
pattern - Extract content matching regex
tool_arguments - Extract tool call arguments
tool_output - Extract tool return value
after_marker - Extract content after a marker
memory_block - Extract from memory block (requires agent_state)

List all available grader functions.

Terminal window
letta-evals list-graders

Shows:

  • Built-in tool graders
  • Custom graders (if registered)
  • Brief description of each

Output:

Available graders:
exact_match - Exact string match with ground_truth
contains - Check if contains ground_truth
regex_match - Match regex pattern
ascii_printable_only - Validate ASCII-only content

Show help information.

Terminal window
letta-evals --help

Show help for a specific command:

Terminal window
letta-evals run --help
letta-evals validate --help

These environment variables affect CLI behavior:

LETTA_API_KEY: API key for Letta authentication.

Terminal window
export LETTA_API_KEY=your-key-here

LETTA_BASE_URL: Letta server base URL.

Terminal window
export LETTA_BASE_URL=http://localhost:8283

LETTA_PROJECT_ID: Letta project ID (for cloud deployments).

Terminal window
export LETTA_PROJECT_ID=proj_abc123

OPENAI_API_KEY: OpenAI API key (for rubric graders).

Terminal window
export OPENAI_API_KEY=your-openai-key

OPENAI_BASE_URL: Custom OpenAI-compatible endpoint (optional).

Terminal window
export OPENAI_BASE_URL=https://your-endpoint.com/v1

Configuration values are resolved in this order (highest to lowest priority):

  1. CLI arguments (--api-key, --base-url, --project-id)
  2. Suite YAML configuration
  3. Environment variables
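
A concrete example of the precedence: when both an environment variable and a CLI flag are set, the flag wins.

Terminal window
# LETTA_BASE_URL is set, but the --base-url flag takes priority,
# so this run talks to https://api.letta.com
export LETTA_BASE_URL=http://localhost:8283
letta-evals run suite.yaml --base-url https://api.letta.com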

Example GitHub Actions workflow:

name: Run Evals
on: [push]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install letta-evals
      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/
      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: results/

Equivalent GitLab CI job:

evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY

The CLI currently uses a single, standard verbosity level. For debugging:

  1. Check the output directory for detailed results
  2. Examine summary.json for aggregate metrics
  3. Check results.jsonl for per-sample details
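
When a run fails, filtering results.jsonl for the failing samples is usually the fastest way in. The field name in the jq filter below is an assumption; check the real schema first (for example with head -n 1 results/results.jsonl | jq keys):

Terminal window
# Show only the samples that did not pass
# (".passed" is an assumed field name; adjust to your results.jsonl schema)
jq 'select(.passed == false)' results/results.jsonl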

Common errors:

“Agent file not found”

Terminal window
# Check file exists relative to suite YAML location
ls -la path/to/agent.af

“Connection refused”

Terminal window
# Verify Letta server is running
curl http://localhost:8283/v1/health

“Invalid API key”

Terminal window
# Check environment variable is set
echo $LETTA_API_KEY