Troubleshooting

Common issues and solutions when using Letta Evals.

Installation Issues

“Command not found: letta-evals”

Problem: CLI not available after installation

Solution:

    # Verify installation
    pip list | grep letta-evals

    # Reinstall if needed
    pip install --upgrade letta-evals

Import errors

Problem: ModuleNotFoundError: No module named 'letta_evals'

Solution:

    # Ensure you're in the right environment
    which python

    # Install in correct environment
    source .venv/bin/activate
    pip install letta-evals
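
As a sanity check, confirm the package is importable from the interpreter you are actually running (the module name letta_evals comes from the error message above):

    # Should print the installed package location instead of raising ModuleNotFoundError
    python -c "import letta_evals; print(letta_evals.__file__)"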

Configuration Issues

“Agent file not found”

Problem: FileNotFoundError: agent.af

Solution:

  • Check the path is correct relative to the suite YAML
  • Use absolute paths if needed
  • Verify file exists: ls -la path/to/agent.af
    # Correct relative path
    target:
      agent_file: ./agents/my_agent.af

“Dataset not found”

Problem: Cannot load dataset file

Solution:

  • Verify dataset path in YAML
  • Check file exists: ls -la dataset.jsonl
  • Ensure proper JSONL format (one JSON object per line)
    # Validate JSONL format
    cat dataset.jsonl | jq .
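
Each line must be a single, complete JSON object. A minimal example using the input and ground_truth fields described later in this guide:

    {"input": "What is 2+2?", "ground_truth": "4"}
    {"input": "What is the capital of France?", "ground_truth": "Paris"}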

“Validation failed: unknown function”

Problem: Grader function not found

Solution:

    # List available graders
    letta-evals list-graders

    # Check spelling in suite.yaml
    graders:
      my_metric:
        function: exact_match   # Correct

Connection Issues

“Connection refused”

Problem: Cannot connect to Letta server

Solution:

    # Verify server is running
    curl http://localhost:8283/v1/health

    # Check base_url in suite.yaml
    target:
      base_url: http://localhost:8283

“Unauthorized” or “Invalid API key”

Problem: Authentication failed

Solution:

    # Set API key
    export LETTA_API_KEY=your-key-here

    # Verify key is correct
    echo $LETTA_API_KEY

Runtime Issues

“No ground_truth provided”

Problem: Grader requires ground truth but sample doesn’t have it

Solution:

  • Add ground_truth to dataset samples:

      {"input": "What is 2+2?", "ground_truth": "4"}

  • Or use a grader that doesn't require ground truth:

      graders:
        quality:
          kind: rubric              # Doesn't require ground_truth
          prompt_path: rubric.txt

Performance Issues

Evaluation is very slow

Solutions:

  1. Increase concurrency:

      letta-evals run suite.yaml --max-concurrent 20

  2. Reduce samples for testing:

      max_samples: 10   # Test with small subset first

  3. Use tool graders instead of rubric graders:

      graders:
        accuracy:
          kind: tool              # Much faster than rubric
          function: exact_match

High API costs

Solutions:

  1. Use cheaper models:
1graders:
2 quality:
3 model: gpt-4o-mini # Cheaper than gpt-4o
  1. Test with small sample first:
1max_samples: 5 # Verify before running full suite

Results Issues

“All scores are 0.0”

Solutions:

  1. Verify the extractor is returning content (see the snippet after this list)
  2. Check the grader logic
  3. Test the agent manually first
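
A quick way to check the first two items is to run a small debug eval and inspect the raw per-sample records; if the extracted submissions are empty, the problem is upstream of the grader. Field names inside results.jsonl can vary, so print whole records first:

    # Write outputs to debug/ and inspect a few per-sample records
    letta-evals run suite.yaml --output debug/
    head -n 3 debug/results.jsonl | jq .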

“Gates failed but scores look good”

Solution:

  • Check gate configuration:

      gate:
        metric_key: accuracy   # Correct metric?
        metric: avg_score      # Or accuracy?
        op: gte                # Correct operator?
        value: 0.8             # Correct threshold?

Debug Tips

Enable verbose output

Run without --quiet to see detailed progress:

    letta-evals run suite.yaml

Examine output files

    letta-evals run suite.yaml --output debug/

    # Check summary
    cat debug/summary.json | jq .

    # Check individual results
    cat debug/results.jsonl | jq .

Validate configuration

    letta-evals validate suite.yaml

Check component availability

    letta-evals list-graders
    letta-evals list-extractors

Getting Help

If you’re still stuck:

  1. Check the Getting Started guide
  2. Review the Core Concepts
  3. Report issues at the Letta Evals GitHub repository

When reporting issues, include:

  • Suite YAML configuration
  • Dataset sample (if not sensitive)
  • Error message and full stack trace
  • Environment info (OS, Python version)
    # Get environment info
    python --version
    pip show letta-evals