# CLI Commands

The `letta-evals` command-line interface lets you run evaluations, validate configurations, and inspect available components.
Quick overview:

- `run` - Execute an evaluation suite (most common)
- `validate` - Check suite configuration without running
- `list-extractors` - Show available extractors
- `list-graders` - Show available grader functions
- Exit codes - 0 for pass, 1 for fail (useful for CI/CD)
Typical workflow:

1. Validate your suite:

   ```shell
   letta-evals validate suite.yaml
   ```

2. Run the evaluation:

   ```shell
   letta-evals run suite.yaml --output results/
   ```

3. Check the exit code:

   ```shell
   echo $?  # 0 = passed, 1 = failed
   ```
## run

Run an evaluation suite.

```shell
letta-evals run <suite.yaml> [options]
```

### Arguments

- `suite.yaml`: Path to the suite configuration file (required)
### Options

#### --output, -o

Save results to a directory.

```shell
letta-evals run suite.yaml --output results/
```

Creates:

- `results/header.json`: Evaluation metadata
- `results/summary.json`: Aggregate metrics and configuration
- `results/results.jsonl`: Per-sample results (one JSON object per line)
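Because each line of `results.jsonl` is a standalone JSON object, it is easy to post-process. Below is a minimal sketch that tallies pass/fail counts from such a file; note that the `passed` field name is an assumption for illustration, so check your actual output for the exact schema.

```python
import json

def summarize(jsonl_path):
    """Tally per-sample results from a JSONL results file.

    NOTE: the "passed" field name is an assumption; inspect your
    actual results.jsonl for the real schema.
    """
    passed = failed = 0
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)  # one JSON object per line
            if record.get("passed"):
                passed += 1
            else:
                failed += 1
    return passed, failed
```

Call it as, for example, `summarize("results/results.jsonl")` to get a `(passed, failed)` tuple.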
#### --quiet, -q

Quiet mode - only show the pass/fail result.

```shell
letta-evals run suite.yaml --quiet
```

Output:

```
✓ PASSED
```

#### --max-concurrent

Maximum number of concurrent sample evaluations.

```shell
letta-evals run suite.yaml --max-concurrent 10
```

Default: 15

Higher values mean faster evaluation but more resource usage.
#### --api-key

Letta API key (overrides the LETTA_API_KEY environment variable).

```shell
letta-evals run suite.yaml --api-key your-key
```

#### --base-url

Letta server base URL (overrides the suite config and environment variable).

```shell
letta-evals run suite.yaml --base-url http://localhost:8283
```

#### --project-id

Letta project ID for cloud deployments.

```shell
letta-evals run suite.yaml --project-id proj_abc123
```

#### --cached, -c

Path to cached results (JSONL) for re-grading trajectories without re-running the agent.

```shell
letta-evals run suite.yaml --cached previous_results.jsonl
```

Use this to test different graders on the same agent trajectories.
#### --num-runs

Run the evaluation multiple times to measure consistency and get aggregate statistics.

```shell
letta-evals run suite.yaml --num-runs 10
```

Default: 1 (single run)

Output with multiple runs:

- Each run creates a separate `run_N/` directory with individual results
- An `aggregate_stats.json` file contains statistics across all runs (mean, standard deviation, pass rate)

Use cases:

- Measuring consistency of non-deterministic agents
- Getting confidence intervals for evaluation metrics
- Testing agent variability across multiple runs

See Results - Multiple Runs for details on the statistics output.
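To get a feel for what the aggregate statistics represent, here is a rough sketch of computing mean, standard deviation, and pass rate across per-run scores, similar in spirit to what `aggregate_stats.json` reports. The exact field names, formulas, and pass threshold used by letta-evals may differ; the `threshold` parameter here is a hypothetical gate for illustration.

```python
import statistics

def aggregate(run_scores, threshold=1.0):
    """Aggregate per-run scores, similar in spirit to aggregate_stats.json.

    run_scores: one overall score per run (e.g. fraction of samples passed).
    threshold:  hypothetical gate - a run "passes" if score >= threshold.
    """
    return {
        "mean": statistics.mean(run_scores),
        # Sample stddev is undefined for a single run, so report 0.0 there.
        "stddev": statistics.stdev(run_scores) if len(run_scores) > 1 else 0.0,
        "pass_rate": sum(s >= threshold for s in run_scores) / len(run_scores),
    }
```

With ten runs, a low standard deviation suggests the agent behaves consistently; a high one suggests you should not trust a single-run result.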
### Examples

Basic run:

```shell
letta-evals run suite.yaml  # Run evaluation, show results in terminal
```

Save results:

```shell
letta-evals run suite.yaml --output evaluation-results/  # Save to directory
```

High concurrency:

```shell
letta-evals run suite.yaml --max-concurrent 20  # Run 20 samples in parallel
```

Letta Cloud:

```shell
# Cloud endpoint, your API key, and your project
letta-evals run suite.yaml \
  --base-url https://api.letta.com \
  --api-key $LETTA_API_KEY \
  --project-id proj_abc123
```

Quiet CI mode:

```shell
letta-evals run suite.yaml --quiet  # Only show pass/fail
if [ $? -eq 0 ]; then               # Check exit code
  echo "Evaluation passed"
else
  echo "Evaluation failed"
  exit 1                            # Fail the CI build
fi
```

Multiple runs with statistics:

```shell
letta-evals run suite.yaml --num-runs 10 --output results/
# Creates results/run_1/, results/run_2/, ..., results/run_10/
# Plus results/aggregate_stats.json with mean, stddev, and pass rate
```

### Exit Codes

- `0`: Evaluation passed (gate criteria met)
- `1`: Evaluation failed (gate criteria not met or error)
## validate

Validate a suite configuration without running it.

```shell
letta-evals validate <suite.yaml>
```

Checks:

- YAML syntax is valid
- Required fields are present
- Referenced paths exist
- Configuration is consistent
- Grader/extractor combinations are valid
### Examples

```shell
letta-evals validate suite.yaml
```

Output on success:

```
✓ Suite configuration is valid
```

Output on error:

```
✗ Validation failed:
  - Agent file not found: agent.af
  - Grader 'my_metric' references unknown function
```

## list-extractors

List all available extractors.

```shell
letta-evals list-extractors
```

Shows:

- Built-in extractors
- Custom extractors (if registered)
- A brief description of each

Output:

```
Available extractors:
  last_assistant  - Extract the last assistant message
  first_assistant - Extract the first assistant message
  all_assistant   - Concatenate all assistant messages
  pattern         - Extract content matching regex
  tool_arguments  - Extract tool call arguments
  tool_output     - Extract tool return value
  after_marker    - Extract content after a marker
  memory_block    - Extract from memory block (requires agent_state)
```
## list-graders

List all available grader functions.

```shell
letta-evals list-graders
```

Shows:

- Built-in tool graders
- Custom graders (if registered)
- A brief description of each

Output:

```
Available graders:
  exact_match          - Exact string match with ground_truth
  contains             - Check if contains ground_truth
  regex_match          - Match regex pattern
  ascii_printable_only - Validate ASCII-only content
```

## --help

Show help information.

```shell
letta-evals --help
```

Show help for a specific command:

```shell
letta-evals run --help
letta-evals validate --help
```
## Environment Variables

These environment variables affect CLI behavior:

### LETTA_API_KEY

API key for Letta authentication.

```shell
export LETTA_API_KEY=your-key-here
```

### LETTA_BASE_URL

Letta server base URL.

```shell
export LETTA_BASE_URL=http://localhost:8283
```

### LETTA_PROJECT_ID

Letta project ID (for cloud).

```shell
export LETTA_PROJECT_ID=proj_abc123
```

### OPENAI_API_KEY

OpenAI API key (for rubric graders).

```shell
export OPENAI_API_KEY=your-openai-key
```

### OPENAI_BASE_URL

Custom OpenAI-compatible endpoint (optional).

```shell
export OPENAI_BASE_URL=https://your-endpoint.com/v1
```
## Configuration Priority

Configuration values are resolved in this order (highest to lowest priority):

1. CLI arguments (`--api-key`, `--base-url`, `--project-id`)
2. Suite YAML configuration
3. Environment variables
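The precedence rule above can be sketched as a simple fallback chain. This is an illustration of the documented order, not the library's actual resolution code:

```python
import os

def resolve(cli_value, suite_value, env_var):
    """Resolve one setting: CLI argument > suite YAML > environment variable."""
    if cli_value is not None:      # highest priority: explicit CLI flag
        return cli_value
    if suite_value is not None:    # next: value from the suite YAML
        return suite_value
    return os.environ.get(env_var)  # lowest: environment variable, or None
```

For example, passing `--base-url` on the command line wins even when the suite YAML and `LETTA_BASE_URL` are both set.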
## Using in CI/CD

### GitHub Actions

```yaml
name: Run Evals
on: [push]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2

      - name: Install dependencies
        run: pip install letta-evals

      - name: Run evaluation
        env:
          LETTA_API_KEY: ${{ secrets.LETTA_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          letta-evals run suite.yaml --quiet --output results/

      - name: Upload results
        uses: actions/upload-artifact@v2
        with:
          name: eval-results
          path: results/
```

### GitLab CI

```yaml
evaluate:
  script:
    - pip install letta-evals
    - letta-evals run suite.yaml --quiet --output results/
  artifacts:
    paths:
      - results/
  variables:
    LETTA_API_KEY: $LETTA_API_KEY
    OPENAI_API_KEY: $OPENAI_API_KEY
```
## Debugging

### Verbose Output

Currently, the CLI uses standard verbosity. For debugging:

- Check the output directory for detailed results
- Examine `summary.json` for aggregate metrics
- Check `results.jsonl` for per-sample details
### Common Issues

"Agent file not found"

```shell
# Check that the file exists relative to the suite YAML location
ls -la path/to/agent.af
```

"Connection refused"

```shell
# Verify the Letta server is running
curl http://localhost:8283/v1/health
```

"Invalid API key"

```shell
# Check that the environment variable is set
echo $LETTA_API_KEY
```
## Next Steps

- Understanding Results - Interpreting evaluation output
- Suite YAML Reference - Complete configuration options
- Getting Started - Complete tutorial with examples