# Targets
A target is the agent you’re evaluating. In Letta Evals, the target configuration determines how agents are created, accessed, and tested.
Quick overview:

- Three ways to specify agents: agent file (`.af`), existing agent ID, or programmatic creation script
- Critical distinction: `agent_file`/`agent_script` create fresh agents per sample (isolated tests), while `agent_id` uses one agent for all samples (stateful conversation)
- Multi-model support: Test the same agent configuration across different LLM models
- Flexible connection: Connect to local Letta servers or Letta Cloud
When to use each approach:

- `agent_file` - Pre-configured agents saved as `.af` files (most common)
- `agent_id` - Testing existing agents or multi-turn conversations with state
- `agent_script` - Dynamic agent creation with per-sample customization
The target configuration specifies how to create or access the agent for evaluation.
## Target Configuration
All targets have a `kind` field (currently only `agent` is supported):
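A minimal sketch (the top-level `target:` key and exact nesting are assumptions; the field names are the ones documented on this page):

```yaml
target:
  kind: agent   # "agent" is currently the only supported kind
```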
## Agent Sources
You must specify exactly ONE of these:
### agent_file

Path to a `.af` (Agent File) to upload:
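For example (same assumed nesting; the path is a placeholder):

```yaml
target:
  kind: agent
  agent_file: agents/my_agent.af   # placeholder path to the .af file
```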
The agent file will be uploaded to the Letta server and a new agent created for the evaluation.
### agent_id

ID of an existing agent on the server:
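For example (the ID shown is a placeholder):

```yaml
target:
  kind: agent
  agent_id: agent-00000000-0000-0000-0000-000000000000   # placeholder ID
```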
**Modifies agent in-place:** Using `agent_id` will modify your agent's state, memory, and message history during evaluation. The same agent instance is used for all samples, processing them sequentially. Do not use production agents or agents you don't want to modify. Use `agent_file` or `agent_script` for reproducible, isolated testing.
### agent_script

Path to a Python script with an agent factory function for programmatic agent creation.

Format: `path/to/script.py:function_name`

The function must be decorated with `@agent_factory` and have the signature `async (client: AsyncLetta, sample: Sample) -> str`:
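A sketch of what such a factory might look like. Only the decorator and the signature come from this page; the import paths, the `agents.create()` parameters, and the placeholder name are assumptions and may differ in your setup:

```python
from letta_client import AsyncLetta

# Assumed import path for the decorator and the Sample type.
from letta_evals import agent_factory, Sample


@agent_factory
async def create_agent(client: AsyncLetta, sample: Sample) -> str:
    # Create a fresh agent for this sample, customized with any per-sample
    # arguments supplied by the dataset (sample.agent_args).
    agent = await client.agents.create(
        name="eval-agent",             # hypothetical name
        **(sample.agent_args or {}),   # e.g. model, memory blocks, tools
    )
    # Return the new agent's ID so the runner knows which agent to evaluate.
    return agent.id
```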
Key features:
- Creates a fresh agent for each sample
- Can customize agents using `sample.agent_args` from the dataset
- Allows testing agent creation logic itself
- Useful when you don’t have pre-saved agent files
When to use:
- Testing agent creation workflows
- Dynamic per-sample agent configuration
- Agents that need sample-specific memory or tools
- Programmatic agent testing
## Connection Configuration
### base_url

Letta server URL:
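For example (same assumed nesting):

```yaml
target:
  # ...agent source fields...
  base_url: http://localhost:8283
```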
Default: `http://localhost:8283`
### api_key

API key for authentication (required for Letta Cloud):
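For example (the value is a placeholder; avoid hard-coding real keys):

```yaml
target:
  # ...agent source fields...
  api_key: your-letta-api-key
```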
Or set it via the `LETTA_API_KEY` environment variable.
### project_id

Letta project ID (for Letta Cloud):
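For example (placeholder value):

```yaml
target:
  # ...agent source fields...
  project_id: your-project-id
```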
Or set it via the `LETTA_PROJECT_ID` environment variable.
### timeout

Request timeout in seconds:
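For example (same assumed nesting):

```yaml
target:
  # ...agent source fields...
  timeout: 300   # seconds
```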
Default: 300 seconds
## Multi-Model Evaluation
Test the same agent across different models:
### model_configs

List of model configuration names from JSON files:
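For example (the config names are hypothetical; each must correspond to a JSON file as described below):

```yaml
target:
  # ...agent source fields...
  model_configs:
    - gpt-4o
    - claude-sonnet-4
```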
The evaluation will run once for each model config. Model configs are JSON files in `letta_evals/llm_model_configs/`.
### model_handles

List of model handles (cloud-compatible identifiers):
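For example (the handles are illustrative placeholders):

```yaml
target:
  # ...agent source fields...
  model_handles:
    - openai/gpt-4o
    - anthropic/claude-sonnet-4
```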
Use this for Letta Cloud deployments.
Note: You cannot specify both `model_configs` and `model_handles`.
## Complete Examples
### Local Development
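A sketch of a local target configuration (same assumed nesting; the file path is a placeholder):

```yaml
target:
  kind: agent
  agent_file: agents/my_agent.af
  base_url: http://localhost:8283   # local Letta server (the default)
```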
### Letta Cloud
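A sketch for Letta Cloud (placeholder values; in practice prefer the `LETTA_API_KEY` and `LETTA_PROJECT_ID` environment variables over hard-coding secrets):

```yaml
target:
  kind: agent
  agent_file: agents/my_agent.af
  api_key: your-letta-api-key   # or set LETTA_API_KEY
  project_id: your-project-id   # or set LETTA_PROJECT_ID
```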
### Multi-Model Testing
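A sketch that runs the same agent against several model configs (names are hypothetical and must match JSON files in `letta_evals/llm_model_configs/`):

```yaml
target:
  kind: agent
  agent_file: agents/my_agent.af
  model_configs:
    - gpt-4o
    - claude-sonnet-4
```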
Results will include per-model metrics.
### Programmatic Agent Creation
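A sketch pointing the target at an agent factory script (the script path and function name are placeholders, in the `path/to/script.py:function_name` format described above):

```yaml
target:
  kind: agent
  agent_script: scripts/create_agent.py:create_agent
  base_url: http://localhost:8283
```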
## Environment Variable Precedence
Configuration values are resolved in this order (highest priority first):

1. CLI arguments (`--api-key`, `--base-url`, `--project-id`)
2. Suite YAML configuration
3. Environment variables (`LETTA_API_KEY`, `LETTA_BASE_URL`, `LETTA_PROJECT_ID`)
## Agent Lifecycle and Testing Behavior
The way your agent is specified fundamentally changes how the evaluation runs:
### With `agent_file` or `agent_script`: Independent Testing
Agent lifecycle:
- A fresh agent instance is created for each sample
- Agent processes the sample input(s)
- Agent remains on the server after the sample completes
Testing behavior: Each sample is an independent, isolated test. Agent state (memory, message history) does not carry over between samples. This enables parallel execution and ensures reproducible results.
Use cases:
- Testing how the agent responds to various independent inputs
- Ensuring consistent behavior across different scenarios
- Regression testing where each case should be isolated
- Evaluating agent responses without prior context
Example: If you have 10 test cases, 10 separate agent instances will be created (one per test case), and they can run in parallel.
### With `agent_id`: Sequential Script Testing
Agent lifecycle:
- The same agent instance is used for all samples
- Agent processes each sample in sequence
- Agent state persists throughout the entire evaluation
Testing behavior: The dataset becomes a conversation script where each sample builds on previous ones. Agent memory and message history accumulate, and earlier interactions affect later responses. Samples must execute sequentially.
Use cases:
- Testing multi-turn conversations with context
- Evaluating how agent memory evolves over time
- Simulating a single user session with multiple interactions
- Testing scenarios where context should accumulate
Example: If you have 10 test cases, they all run against the same agent instance in order, with state carrying over between each test.
### Critical Differences

| | `agent_file` / `agent_script` | `agent_id` |
| --- | --- | --- |
| Agent instances | Fresh agent per sample | One agent shared by all samples |
| State between samples | Does not carry over | Memory and message history accumulate |
| Execution | Samples can run in parallel | Samples run sequentially |
| Best for | Isolated, reproducible tests | Multi-turn, stateful conversations |
**Best practice:** Use `agent_file` or `agent_script` for most evaluations to ensure reproducible, isolated tests. Use `agent_id` only when you specifically need to test how agent state evolves across multiple interactions.
## Validation
The runner validates that:

- Exactly one of `agent_file`, `agent_id`, or `agent_script` is specified
- Agent files have the `.af` extension
- Agent script paths are valid
## Next Steps
- Suite YAML Reference - Complete target configuration options
- Datasets - Using `agent_args` for sample-specific configuration
- Getting Started - Complete tutorial with target examples