# Troubleshooting

Common issues and solutions when using Letta Evals.
## Installation Issues

### "Command not found: letta-evals"

Problem: CLI not available after installation
Solution:
```bash
# Verify installation
pip list | grep letta-evals

# Reinstall if needed
pip install --upgrade letta-evals

# Or with uv
uv sync
```
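If the package shows up in `pip list` but the shell still can't find the command, the CLI's install location is probably not on your PATH. A quick way to check (assuming a Unix-like shell):

```bash
# See whether (and where) the CLI is on PATH
which letta-evals

# Show where the package itself was installed
pip show letta-evals
```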
### Import errors

Problem: `ModuleNotFoundError: No module named 'letta_evals'`
Solution:
```bash
# Ensure you're in the right environment
which python

# Install in correct environment
source .venv/bin/activate  # or activate your environment
pip install letta-evals
```

## Configuration Issues
### "Agent file not found"

Problem: `FileNotFoundError: agent.af`
Solution:
- Check the path is correct relative to the suite YAML
- Use absolute paths if needed
- Verify file exists:
```bash
ls -la path/to/agent.af
```

```yaml
# Correct relative path
target:
  agent_file: ./agents/my_agent.af

# Or absolute path
target:
  agent_file: /absolute/path/to/agent.af
```
### "Dataset not found"

Problem: Cannot load dataset file
Solution:
- Verify dataset path in YAML
- Check file exists:
```bash
ls -la dataset.jsonl
```

- Ensure proper JSONL format (one JSON object per line); see the sketch below

```bash
# Validate JSONL format
cat dataset.jsonl | jq .
```
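For reference, a minimal JSONL dataset might look like the sketch below; the `input` and `ground_truth` fields mirror the sample shown later in this guide, and your suite may use additional fields:

```jsonl
{"input": "What is 2+2?", "ground_truth": "4"}
{"input": "What is the capital of France?", "ground_truth": "Paris"}
```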
### "Validation failed: unknown function"

Problem: Grader function not found
Solution:
```bash
# List available graders
letta-evals list-graders
```

```yaml
# Check spelling in suite.yaml
graders:
  my_metric:
    function: exact_match  # Correct
    # not: exactMatch or exact-match
```
### "Validation failed: unknown extractor"

Problem: Extractor not found
Solution:
```bash
# List available extractors
letta-evals list-extractors
```

```yaml
# Check spelling
graders:
  my_metric:
    extractor: last_assistant  # Correct
    # not: lastAssistant or last-assistant
```
## Connection Issues

### "Connection refused"

Problem: Cannot connect to Letta server
Solution:
```bash
# Verify server is running
curl http://localhost:8283/v1/health
```

```yaml
# Check base_url in suite.yaml
target:
  base_url: http://localhost:8283  # Correct port?
```

```bash
# Or use environment variable
export LETTA_BASE_URL=http://localhost:8283
```
### "Unauthorized" or "Invalid API key"

Problem: Authentication failed
Solution:
```bash
# Set API key
export LETTA_API_KEY=your-key-here
```

```yaml
# Or in suite.yaml
target:
  api_key: your-key-here
```

```bash
# Verify key is correct
echo $LETTA_API_KEY
```
### "Request timeout"

Problem: Requests taking too long
Solution:
```yaml
# Increase timeout
target:
  timeout: 600.0  # 10 minutes

# Rubric grader timeout
graders:
  quality:
    kind: rubric
    timeout: 300.0  # 5 minutes
```
## Runtime Issues

### "No ground_truth provided"

Problem: Grader requires ground truth but sample doesn’t have it
Solution:
- Add ground_truth to dataset samples:
```json
{ "input": "What is 2+2?", "ground_truth": "4" }
```

- Or use a grader that doesn’t require ground truth:

```yaml
graders:
  quality:
    kind: rubric  # Doesn't require ground_truth
    prompt_path: rubric.txt
```
### "Extractor requires agent_state"

Problem: The `memory_block` extractor needs agent state but it wasn’t fetched
Solution: This should be automatic, but if you see this error:
- Check that the extractor is correctly configured
- Ensure the agent exists and is accessible
- Try using a different extractor if memory isn’t needed; see the sketch below
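For example, if you only need to grade the agent’s reply rather than its memory contents, point the grader at the `last_assistant` extractor shown earlier. A minimal sketch (`my_metric` and `exact_match` are just the illustrative names used above):

```yaml
graders:
  my_metric:
    function: exact_match
    extractor: last_assistant  # instead of memory_block
```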
### "Score must be between 0.0 and 1.0"

Problem: Custom grader returning invalid score
Solution:
```python
from letta_evals.models import GradeResult

@grader
def my_grader(sample, submission):
    score = calculate_score(submission)  # your scoring logic
    # Clamp score to valid range
    score = max(0.0, min(1.0, score))
    return GradeResult(score=score, rationale="...")
```
### "Invalid JSON in response"

Problem: Rubric grader got non-JSON response
Solution:
- Check OpenAI API key is valid
- Verify model name is correct
- Check for network issues
- Try increasing max_retries:
```yaml
graders:
  quality:
    kind: rubric
    max_retries: 10
```
## Performance Issues

### Evaluation is very slow

Problem: Taking too long to complete
Solutions:
- Increase concurrency:
```bash
letta-evals run suite.yaml --max-concurrent 20
```

- Reduce samples for testing:
```yaml
max_samples: 10  # Test with small subset first
```

- Use tool graders instead of rubric graders when possible:
```yaml
graders:
  accuracy:
    kind: tool  # Much faster than rubric
    function: exact_match
```

- Check network latency:
```bash
# Test server response time
time curl http://localhost:8283/v1/health
```
### High API costs

Problem: Rubric graders costing too much
Solutions:
- Use cheaper models:
```yaml
graders:
  quality:
    model: gpt-4o-mini  # Cheaper than gpt-4o
```

- Reduce number of rubric graders:
```yaml
graders:
  accuracy:
    kind: tool  # Free
  quality:
    kind: rubric  # Only use for subjective evaluation
```

- Test with small sample first:
```yaml
max_samples: 5  # Verify before running full suite
```
## Results Issues

### "No results generated"

Problem: No output files created
Solution:
```bash
# Specify output directory
letta-evals run suite.yaml --output results/

# Check for errors in console output
letta-evals run suite.yaml  # Without --quiet
```
### "All scores are 0.0"

Problem: Everything failing
Solutions:
- Check if agent is working:
```bash
# Test agent manually first
```

- Verify extractor is getting content:
- Add debug logging
- Check sample results in output
- Check grader logic:
```python
# Test grader independently
from letta_evals.models import Sample, GradeResult

sample = Sample(id=0, input="test", ground_truth="test")
result = my_grader(sample, "test")
print(result)
```
### "Gates failed but scores look good"

Problem: Passing samples but gate failing
Solution:
- Check gate configuration:
```yaml
gate:
  metric_key: accuracy  # Correct metric?
  metric: avg_score     # Or accuracy?
  op: gte               # Correct operator?
  value: 0.8            # Correct threshold?
```

- Understand the difference between `avg_score` and `accuracy`
- Check per-sample pass criteria with `pass_op` and `pass_value` (see the sketch below)
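As a rough illustration of the distinction, here is a sketch rather than a verified config: it assumes `pass_op` and `pass_value` sit alongside `op` and `value` in the gate block, so check it with `letta-evals validate` against your version. The usual reading is that `avg_score` aggregates the mean score across samples, while `accuracy` is the fraction of samples whose individual scores meet the per-sample pass criteria.

```yaml
gate:
  metric_key: accuracy
  metric: accuracy   # fraction of passing samples, not the mean score
  pass_op: gte       # assumed placement: per-sample pass criterion
  pass_value: 0.5    # a sample "passes" if its score >= 0.5
  op: gte
  value: 0.8         # gate passes if at least 80% of samples pass
```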
## Environment Issues

### "OPENAI_API_KEY not found"

Problem: Rubric grader can’t find API key
Solution:
```bash
# Set in environment
export OPENAI_API_KEY=your-key-here

# Or in .env file
echo "OPENAI_API_KEY=your-key-here" >> .env

# Verify
echo $OPENAI_API_KEY
```
### "Cannot use both model_configs and model_handles"

Problem: Specified both in target config
Solution:
```yaml
# Use one or the other, not both
target:
  model_configs: [gpt-4o-mini]           # For local server
  # OR
  model_handles: ["openai/gpt-4o-mini"]  # For cloud
```
## Debug Tips

### Enable verbose output

Run without `--quiet` to see detailed progress:
```bash
letta-evals run suite.yaml
```

### Examine output files
Section titled “Examine output files”letta-evals run suite.yaml --output debug/
# Check summarycat debug/summary.json | jq .
# Check individual resultscat debug/results.jsonl | jq .Test with minimal suite
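To zero in on failing samples, you can filter the per-sample results with `jq`. This sketch assumes each line of `results.jsonl` carries a numeric `score` field; adjust the field name to whatever your output actually contains:

```bash
# Show only samples that scored below 1.0 (field name assumed)
jq 'select(.score < 1.0)' debug/results.jsonl
```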
### Test with minimal suite

Create a minimal test:
```yaml
name: debug-test
dataset: test.jsonl  # Just 1-2 samples

target:
  kind: agent
  agent_file: agent.af

graders:
  test:
    kind: tool
    function: contains
    extractor: last_assistant

gate:
  op: gte
  value: 0.0  # Always pass
```

### Validate configuration
Section titled “Validate configuration”letta-evals validate suite.yamlCheck component availability
Section titled “Check component availability”letta-evals list-gradersletta-evals list-extractorsGetting Help
## Getting Help

If you’re still stuck:
- Check the documentation
- Look at examples
- Report issues on the Letta Evals GitHub repository
When reporting issues, include:
- Suite YAML configuration
- Dataset sample (if not sensitive)
- Error message and full stack trace
- Output from the `--output` directory
- Environment info (OS, Python version)
```bash
# Get environment info
python --version
pip show letta-evals
```