Troubleshooting

Common issues and solutions when using Letta Evals.

Installation Issues

“Command not found: letta-evals”

Problem: CLI not available after installation

Solution:

    # Verify installation
    pip list | grep letta-evals

    # Reinstall if needed
    pip install --upgrade letta-evals

Import errors

Problem: ModuleNotFoundError: No module named 'letta_evals'

Solution:

    # Ensure you're in the right environment
    which python

    # Install in correct environment
    source .venv/bin/activate
    pip install letta-evals
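
As a sanity check, confirm the package is importable from the interpreter you are actually running (the module name letta_evals comes from the error message above):

    # Should print the installed package location instead of raising ModuleNotFoundError
    python -c "import letta_evals; print(letta_evals.__file__)"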

Configuration Issues

“Agent file not found”

Problem: FileNotFoundError: agent.af

Solution:

  • Check the path is correct relative to the suite YAML
  • Use absolute paths if needed
  • Verify file exists: ls -la path/to/agent.af
    # Correct relative path
    target:
      agent_file: ./agents/my_agent.af

“Dataset not found”

Problem: Cannot load dataset file

Solution:

  • Verify dataset path in YAML
  • Check file exists: ls -la dataset.jsonl
  • Ensure proper JSONL format (one JSON object per line)
    # Validate JSONL format
    cat dataset.jsonl | jq .
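
Each line must be a single, complete JSON object. A minimal example using the input and ground_truth fields described later in this guide:

    {"input": "What is 2+2?", "ground_truth": "4"}
    {"input": "What is the capital of France?", "ground_truth": "Paris"}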

“Validation failed: unknown function”

Problem: Grader function not found

Solution:

    # List available graders
    letta-evals list-graders

    # Check spelling in suite.yaml
    graders:
      my_metric:
        function: exact_match   # Correct

Connection Issues

“Connection refused”

Problem: Cannot connect to Letta server

Solution:

    # Verify server is running
    curl http://localhost:8283/v1/health

    # Check base_url in suite.yaml
    target:
      base_url: http://localhost:8283

“Unauthorized” or “Invalid API key”

Problem: Authentication failed

Solution:

    # Set API key
    export LETTA_API_KEY=your-key-here

    # Verify key is correct
    echo $LETTA_API_KEY

Runtime Issues

“No ground_truth provided”

Problem: Grader requires ground truth but sample doesn’t have it

Solution:

  • Add ground_truth to dataset samples:

      {"input": "What is 2+2?", "ground_truth": "4"}

  • Or use a grader that doesn't require ground truth:

      graders:
        quality:
          kind: rubric              # Doesn't require ground_truth
          prompt_path: rubric.txt

Performance Issues

Evaluation is very slow

Solutions:

  1. Increase concurrency:

      letta-evals run suite.yaml --max-concurrent 20

  2. Reduce samples for testing:

      max_samples: 10   # Test with small subset first

  3. Use tool graders instead of rubric graders:

      graders:
        accuracy:
          kind: tool              # Much faster than rubric
          function: exact_match

High API costs

Solutions:

  1. Use cheaper models:
1graders:
2 quality:
3 model: gpt-4o-mini # Cheaper than gpt-4o
  1. Test with small sample first:
1max_samples: 5 # Verify before running full suite

Results Issues

“All scores are 0.0”

Solutions:

  1. Verify the extractor is returning content (see the snippet after this list)
  2. Check the grader logic
  3. Test the agent manually first
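
A quick way to check the first two items is to run a small debug eval and inspect the raw per-sample records; if the extracted submissions are empty, the problem is upstream of the grader. Field names inside results.jsonl can vary, so print whole records first:

    # Write outputs to debug/ and inspect a few per-sample records
    letta-evals run suite.yaml --output debug/
    head -n 3 debug/results.jsonl | jq .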

“Gates failed but scores look good”

Solution:

  • Check gate configuration:

      gate:
        metric_key: accuracy   # Correct metric?
        metric: avg_score      # Or accuracy?
        op: gte                # Correct operator?
        value: 0.8             # Correct threshold?

Debug Tips

Enable verbose output

Run without --quiet to see detailed progress:

    letta-evals run suite.yaml

Examine output files

    letta-evals run suite.yaml --output debug/

    # Check summary
    cat debug/summary.json | jq .

    # Check individual results
    cat debug/results.jsonl | jq .

Validate configuration

    letta-evals validate suite.yaml

Check component availability

    letta-evals list-graders
    letta-evals list-extractors

Getting Help

If you’re still stuck:

  1. Check the Getting Started guide
  2. Review the Core Concepts
  3. Report issues at the Letta Evals GitHub repository

When reporting issues, include:

  • Suite YAML configuration
  • Dataset sample (if not sensitive)
  • Error message and full stack trace
  • Environment info (OS, Python version)
    # Get environment info
    python --version
    pip show letta-evals