Extractors
Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.
Quick overview:
- Purpose: Agent responses are complex (messages, tool calls, memory) - extractors isolate what to grade
- Built-in options: `last_assistant`, `tool_arguments`, `memory_block`, `pattern`, and more
- Flexible: Different graders can use different extractors in the same suite
- Automatic: No setup needed - just specify in your grader config
Common patterns:
- `last_assistant` - Most common; gets the agent’s final message (90% of use cases)
- `tool_arguments` - Verify the agent called the right tool with correct args
- `memory_block` - Check whether the agent updated memory correctly
- `pattern` - Extract structured data with regex
Extractors determine what part of the agent’s response gets graded. They pull out specific content from the conversation trajectory.
Why Extractors?
An agent’s response is complex - it includes assistant messages, tool calls, tool returns, memory updates, etc. Extractors let you focus on exactly what you want to evaluate.
The evaluation flow:
For example:
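One way to picture it, using `last_assistant` (an illustrative sketch, not an exact flow diagram):

```
trajectory (all turns and messages)
        │  extractor: last_assistant
        ▼
submission: "The answer is Paris."
        │  grader
        ▼
score: 1.0
```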
Trajectory Structure
A trajectory is a list of turns, where each turn is a list of Letta messages:
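A minimal sketch of that shape in Python — plain dicts stand in for Letta’s typed message objects here, so the field names are assumptions:

```python
# A trajectory: outer list = turns, inner lists = messages within a turn.
trajectory = [
    # Turn 1
    [
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "Let me check."},
    ],
    # Turn 2
    [
        {"role": "assistant", "content": "The answer is Paris."},
    ],
]

def last_assistant(trajectory):
    """Walk the turns in reverse and return the last assistant message."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if message["role"] == "assistant":
                return message["content"]
    return ""  # empty extraction -> grader typically scores 0.0

submission = last_assistant(trajectory)
print(submission)  # The answer is Paris.
```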
Extractors navigate this structure to pull out the submission text.
Built-in Extractors
last_assistant
Extracts the last assistant message content.
Most common extractor - gets the agent’s final response.
first_assistant
Extracts the first assistant message content.
Useful for testing immediate responses before tool usage.
all_assistant
Concatenates all assistant messages with a separator.
Use when you need the full conversation context.
last_turn
Extracts all assistant messages from the last turn only.
Useful when the agent makes multiple statements in the final turn.
pattern
Extracts content matching a regex pattern from assistant messages.
Example: Extract “42” from “The answer is Result: 42”
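The behavior can be sketched with Python’s `re` module (the capture-group handling is an assumption about the built-in extractor):

```python
import re

text = "The answer is Result: 42"

# If the pattern has a capture group, take group 1; otherwise the whole match.
match = re.search(r"Result:\s*(\d+)", text)
extracted = match.group(1) if match else ""

print(extracted)  # 42
```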
tool_arguments
Extracts arguments from a specific tool call.
Returns the JSON arguments as a string.
Example: If the agent calls `search(query="pandas", limit=10)`, it extracts `{"query": "pandas", "limit": 10}`.
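The lookup can be sketched in Python — the message shape is illustrative, not the letta-evals internals:

```python
import json

# Toy message list; real Letta tool calls are typed objects.
messages = [
    {"type": "tool_call", "name": "search",
     "arguments": {"query": "pandas", "limit": 10}},
    {"type": "assistant", "content": "Found it."},
]

def tool_arguments(messages, tool_name):
    """Return the JSON arguments of the most recent call to tool_name."""
    for message in reversed(messages):
        if message.get("type") == "tool_call" and message.get("name") == tool_name:
            return json.dumps(message["arguments"])
    return ""

args = tool_arguments(messages, "search")
print(args)  # {"query": "pandas", "limit": 10}
```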
tool_output
Extracts the return value from a specific tool call.
Finds the tool call and its corresponding return message.
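A sketch of pairing a call with its return by ID — the field names are assumptions, not the letta-evals internals:

```python
# Toy messages pairing a tool call with its return (illustrative shapes).
messages = [
    {"type": "tool_call", "id": "c1", "name": "search",
     "arguments": {"query": "pandas"}},
    {"type": "tool_return", "tool_call_id": "c1", "content": "3 results"},
]

def tool_output(messages, tool_name):
    """Find the last call to tool_name, then return its matching return."""
    call_id = None
    for message in reversed(messages):
        if message.get("type") == "tool_call" and message.get("name") == tool_name:
            call_id = message["id"]
            break
    if call_id is None:
        return ""
    for message in messages:
        if message.get("type") == "tool_return" and message.get("tool_call_id") == call_id:
            return message["content"]
    return ""

output = tool_output(messages, "search")
print(output)  # 3 results
```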
after_marker
Extracts content after a specific marker string.
Example: From “Here’s my analysis… ANSWER: Paris”, extracts “Paris”
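In Python terms, roughly (whether the first or last occurrence of the marker wins is an assumption about the built-in extractor):

```python
text = "Here's my analysis... ANSWER: Paris"
marker = "ANSWER:"

# Take everything after the marker and trim surrounding whitespace.
idx = text.find(marker)
extracted = text[idx + len(marker):].strip() if idx != -1 else ""

print(extracted)  # Paris
```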
memory_block
Extracts content from a specific memory block (requires agent_state).
Important: This extractor requires the agent’s final state, which adds overhead. The runner automatically fetches agent_state when this extractor is used.
Example use case: Verify the agent correctly updated its memory about the user.
Extractor Configuration
Some extractors accept additional configuration via `extractor_config`:
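A hypothetical example — the surrounding suite keys are assumptions, but `pattern` and `tool_name` match the options referenced elsewhere on this page:

```yaml
graders:
  final_answer:
    extractor: pattern
    extractor_config:
      pattern: "Answer: (.*)"
  search_args:
    extractor: tool_arguments
    extractor_config:
      tool_name: "search"
```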
Choosing an Extractor
Content Flattening
Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.
Empty Extraction
If an extractor finds no matching content, it returns an empty string (`""`). This typically results in a score of 0.0 from the grader.
Custom Extractors
You can write custom extractors. See Custom Extractors for details.
Example:
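A sketch of a custom extractor that pulls out every tool name the agent used — the function signature and registration mechanism here are assumptions; the Custom Extractors guide documents the real API:

```python
def tools_called(trajectory):
    """Custom extractor: comma-separated names of all tools called."""
    names = []
    for turn in trajectory:
        for message in turn:
            if message.get("type") == "tool_call":
                names.append(message["name"])
    return ", ".join(names)

# Toy trajectory using illustrative dict-shaped messages.
trajectory = [
    [{"type": "tool_call", "name": "search"}],
    [{"type": "tool_call", "name": "web_search"},
     {"type": "assistant", "content": "Done."}],
]

result = tools_called(trajectory)
print(result)  # search, web_search
```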
Register by importing in your suite’s setup script or custom evaluators file.
Multi-Metric Extraction
Different graders can use different extractors:
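For instance (a hypothetical suite snippet — the key names are assumptions, with `extractor_config` as introduced above):

```yaml
graders:
  answer_quality:
    extractor: last_assistant      # grade the final response text
  memory_updated:
    extractor: memory_block        # grade what the agent wrote to memory
    extractor_config:
      block_label: "human"
```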
Each grader independently extracts and evaluates different aspects.
Listing Extractors
See all available extractors:
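The CLI ships a listing command (the same one referenced in the troubleshooting tips below):

```shell
letta-evals list-extractors
```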
Examples
Extract Final Answer
Agent: “Let me search… (uses tool) …The answer is Paris.”
Extracted: “The answer is Paris.”
Extract Tool Arguments
Agent calls: `search(query="pandas", limit=5)`
Extracted: `{"query": "pandas", "limit": 5}`
Extract Pattern
Agent: “After calculation… RESULT: SUCCESS”
Extracted: “SUCCESS”
Extract Memory
Agent updates memory block “human” to: “User’s name is Alice”
Extracted: “User’s name is Alice”
Troubleshooting
Extractor returns empty string
Problem: Grader always gives score 0.0 because extractor finds nothing.
Common causes:
- Wrong extractor: using `first_assistant` but the agent doesn’t respond until after tool use → use `last_assistant`
- Wrong tool name: `tool_arguments` with `tool_name: "search"` but the agent calls `"web_search"` → check the actual tool name
- Wrong memory block: `memory_block` with `block_label: "user"` but the block is actually labeled `"human"` → check block labels
- Pattern doesn’t match: `pattern: "Answer: (.*)"` but the agent says “The answer is…” → adjust the regex
Debug tips:
- Check the trajectory in results JSON to see actual agent output
- Use `last_assistant` first to see what’s there
- Verify tool names with `letta-evals list-extractors`
Next Steps
- Built-in Extractors Reference - Complete extractor documentation
- Custom Extractors Guide - Write your own extractors
- Graders - How to use extractors with graders