Extractors
Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.
Quick overview:
- Purpose: Agent responses are complex (messages, tool calls, memory) - extractors isolate what to grade
- Built-in options: last_assistant, tool_arguments, memory_block, pattern, and more
- Flexible: Different graders can use different extractors in the same suite
- Automatic: No setup needed - just specify in your grader config
Common patterns:
- last_assistant - Most common, gets the agent’s final message (90% of use cases)
- tool_arguments - Verify agent called the right tool with correct args
- memory_block - Check if agent updated memory correctly
- pattern - Extract structured data with regex
Extractors determine what part of the agent’s response gets graded. They pull out specific content from the conversation trajectory.
Why Extractors?
An agent’s response is complex - it includes assistant messages, tool calls, tool returns, memory updates, etc. Extractors let you focus on exactly what you want to evaluate.
The evaluation flow:

    Agent Response → Extractor → Submission Text → Grader → Score

For example:

    Full trajectory:
      UserMessage: "What's the capital of France?"
      ToolCallMessage: search(query="capital of france")
      ToolReturnMessage: "Paris is the capital..."
      AssistantMessage: "The capital of France is Paris."
              ↓ extractor: last_assistant ↓
    Extracted: "The capital of France is Paris."
              ↓ grader: contains (ground_truth="Paris") ↓
    Score: 1.0

Trajectory Structure
A trajectory is a list of turns, where each turn is a list of Letta messages:

    [
        [UserMessage(...), AssistantMessage(...), ToolCallMessage(...), ToolReturnMessage(...)],  # Turn 1
        [AssistantMessage(...)],  # Turn 2
    ]

Extractors navigate this structure to pull out the submission text.
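The flow from trajectory to score can be sketched in plain Python. This is only an illustration: the `Message` dataclass, `last_assistant`, and `contains` below are hypothetical stand-ins written for this example, not the letta-evals or Letta client types.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a Letta message (not the real client type)."""
    kind: str     # e.g. "user", "assistant", "tool_call", "tool_return"
    content: str


Trajectory = List[List[Message]]  # a list of turns, each a list of messages


def last_assistant(trajectory: Trajectory) -> str:
    """Return the content of the final assistant message, or "" if none."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if message.kind == "assistant":
                return message.content
    return ""


def contains(submission: str, ground_truth: str) -> float:
    """Toy grader: 1.0 if the ground truth appears in the submission."""
    return 1.0 if ground_truth in submission else 0.0


trajectory = [
    [
        Message("user", "What's the capital of France?"),
        Message("tool_call", 'search(query="capital of france")'),
        Message("tool_return", "Paris is the capital..."),
        Message("assistant", "The capital of France is Paris."),
    ]
]

submission = last_assistant(trajectory)  # the extractor step
score = contains(submission, "Paris")    # the grader step
```

The extractor only decides which text is graded; the grader never sees the tool call or tool return unless an extractor selects them.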
Built-in Extractors
last_assistant
Extracts the last assistant message content.

    graders:
      quality:
        kind: tool
        function: contains
        extractor: last_assistant  # Extract final agent message

Most common extractor - gets the agent’s final response.
first_assistant
Extracts the first assistant message content.

    graders:
      initial_response:
        kind: tool
        function: contains
        extractor: first_assistant  # Extract first agent message

Useful for testing immediate responses before tool usage.
all_assistant
Concatenates all assistant messages with a separator.

    graders:
      complete_response:
        kind: rubric
        prompt_path: rubric.txt
        extractor: all_assistant  # Concatenate all agent messages
        extractor_config:
          separator: "\n\n"  # Join messages with double newline

Use when you need everything the agent said across the conversation, not just its final message.
last_turn
Extracts all assistant messages from the last turn only.

    graders:
      final_turn:
        kind: tool
        function: contains
        extractor: last_turn  # Messages from final turn only
        extractor_config:
          separator: " "  # Join with spaces

Useful when the agent makes multiple statements in the final turn.
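To illustrate how all_assistant and last_turn differ, here is a rough sketch that uses `(kind, content)` tuples as hypothetical stand-ins for Letta messages. The function names mirror the extractors, but this is not the library's code, and the separator is passed explicitly because the real defaults are not stated here.

```python
from typing import List, Tuple

# Hypothetical stand-in for a Letta message: (kind, content)
Message = Tuple[str, str]


def assistant_texts(turn: List[Message]) -> List[str]:
    """Collect the content of assistant messages in one turn."""
    return [content for kind, content in turn if kind == "assistant"]


def all_assistant(trajectory: List[List[Message]], separator: str) -> str:
    """Join assistant messages from every turn."""
    texts = [text for turn in trajectory for text in assistant_texts(turn)]
    return separator.join(texts)


def last_turn(trajectory: List[List[Message]], separator: str) -> str:
    """Join assistant messages from the final turn only."""
    if not trajectory:
        return ""
    return separator.join(assistant_texts(trajectory[-1]))


trajectory = [
    [("user", "Plan a trip."), ("assistant", "Let me check flights.")],
    [("assistant", "Flights found."), ("assistant", "Booking confirmed.")],
]

everything = all_assistant(trajectory, separator="\n\n")
final_only = last_turn(trajectory, separator=" ")
```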
pattern
Extracts content matching a regex pattern from assistant messages.

    graders:
      extract_number:
        kind: tool
        function: exact_match
        extractor: pattern  # Extract using regex
        extractor_config:
          pattern: 'Result: (\d+)'  # Regex pattern to match
          group: 1  # Extract capture group 1
          search_all: false  # Only find first match

Example: Extract “42” from “The answer is Result: 42”
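The difference between a first-match and an all-matches search can be tried out with Python's standard `re` module. The helpers `extract_first` and `extract_all` are hypothetical names for this sketch and only approximate what the pattern extractor's `group` and `search_all` options describe.

```python
import re
from typing import List


def extract_first(pattern: str, text: str, group: int = 0) -> str:
    """Return the requested group of the first match, or "" if no match."""
    match = re.search(pattern, text)
    return match.group(group) if match else ""


def extract_all(pattern: str, text: str, group: int = 0) -> List[str]:
    """Return the requested group of every match, in order."""
    return [m.group(group) for m in re.finditer(pattern, text)]


text = "The answer is Result: 42, then Result: 7"

first = extract_first(r"Result: (\d+)", text, group=1)
every = extract_all(r"Result: (\d+)", text, group=1)
missing = extract_first(r"Result: (\d+)", "no result here", group=1)
```

How the built-in extractor combines multiple matches when search_all is true is not stated here, so the list return type is only an illustration.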
tool_arguments
Extracts arguments from a specific tool call.

    graders:
      search_query:
        kind: tool
        function: contains
        extractor: tool_arguments  # Extract tool call arguments
        extractor_config:
          tool_name: search  # Which tool to extract from

Returns the JSON arguments as a string.
Example: If agent calls search(query="pandas", limit=10), extracts:

    { "query": "pandas", "limit": 10 }

tool_output
Extracts the return value from a specific tool call.

    graders:
      search_results:
        kind: tool
        function: contains
        extractor: tool_output  # Extract tool return value
        extractor_config:
          tool_name: search  # Which tool's output to extract

Finds the tool call and its corresponding return message.
after_marker
Extracts content after a specific marker string.

    graders:
      answer_section:
        kind: tool
        function: contains
        extractor: after_marker  # Extract content after marker
        extractor_config:
          marker: "ANSWER:"  # Marker string to find
          include_marker: false  # Don't include "ANSWER:" in output

Example: From “Here’s my analysis… ANSWER: Paris”, extracts “Paris”
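The marker logic can be approximated with `str.partition`. This is a minimal sketch, not the extractor's actual code: the function below is hypothetical, and stripping surrounding whitespace is an assumption inferred from the example output “Paris”.

```python
def after_marker(text: str, marker: str, include_marker: bool = False) -> str:
    """Return the text following the first occurrence of marker, or ""."""
    _before, found, rest = text.partition(marker)
    if not found:
        return ""  # marker absent: nothing to extract
    result = found + rest if include_marker else rest
    return result.strip()  # assumption: surrounding whitespace is dropped


message = "Here's my analysis... ANSWER: Paris"

answer = after_marker(message, "ANSWER:")
with_marker = after_marker(message, "ANSWER:", include_marker=True)
```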
memory_block
Extracts content from a specific memory block (requires agent_state).

    graders:
      human_memory:
        kind: tool
        function: exact_match
        extractor: memory_block  # Extract from agent memory
        extractor_config:
          block_label: human  # Which memory block to extract

Important: This extractor requires the agent’s final state, which adds overhead. The runner automatically fetches agent_state when this extractor is used.
Example use case: Verify the agent correctly updated its memory about the user.
Extractor Configuration
Some extractors accept additional configuration via extractor_config:

    graders:
      my_metric:
        kind: tool
        function: contains
        extractor: pattern  # Use pattern extractor
        extractor_config:  # Configuration for this extractor
          pattern: "Answer: (.*)"  # Regex pattern
          group: 1  # Extract capture group 1

Choosing an Extractor
| Use Case | Recommended Extractor |
|---|---|
| Final agent response | last_assistant |
| First response before tools | first_assistant |
| Complete conversation | all_assistant |
| Specific format extraction | pattern |
| Tool usage validation | tool_arguments |
| Tool result checking | tool_output |
| Memory validation | memory_block |
| Structured output | after_marker |
Content Flattening
Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.
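As a rough picture of what flattening means, the sketch below accepts either a plain string or a list of content parts and returns plain text. The part shape (`{"type": "text", "text": ...}`), the `flatten_content` name, and joining parts without a separator are all assumptions made for illustration; the library's actual behavior may differ.

```python
from typing import Dict, List, Union

# Assumed shape of a content part, for illustration only.
ContentPart = Dict[str, str]


def flatten_content(content: Union[str, List[ContentPart]]) -> str:
    """Reduce message content to plain text, ignoring non-text parts."""
    if isinstance(content, str):
        return content
    return "".join(
        part.get("text", "") for part in content if part.get("type") == "text"
    )


simple = flatten_content("Hello")
mixed = flatten_content(
    [
        {"type": "text", "text": "Hello, "},
        {"type": "image", "url": "https://example.com/cat.png"},
        {"type": "text", "text": "world"},
    ]
)
```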
Empty Extraction
If an extractor finds no matching content, it returns an empty string "". This typically results in a score of 0.0 from the grader.
Custom Extractors
You can write custom extractors. See Custom Extractors for details.
Example:

    from typing import List

    from letta_client import LettaMessageUnion
    from letta_evals.decorators import extractor

    @extractor
    def my_extractor(trajectory: List[List[LettaMessageUnion]], config: dict) -> str:
        # Custom extraction logic goes here; return "" when nothing matches
        extracted_text = ""
        return extracted_text

Register by importing in your suite’s setup script or custom evaluators file.
Multi-Metric Extraction
Different graders can use different extractors:

    graders:
      response_quality:  # Evaluate final message quality
        kind: rubric
        prompt_path: quality.txt
        extractor: last_assistant  # Extract final response

      tool_usage:  # Check tool was called correctly
        kind: tool
        function: exact_match
        extractor: tool_arguments  # Extract tool args
        extractor_config:
          tool_name: search  # From search tool

      memory_update:  # Verify memory updated
        kind: tool
        function: contains
        extractor: memory_block  # Extract from memory
        extractor_config:
          block_label: human  # Human memory block

Each grader independently extracts and evaluates different aspects.
Listing Extractors
See all available extractors:

    letta-evals list-extractors

Examples
Extract Final Answer

    extractor: last_assistant  # Get final agent message

Agent: “Let me search… uses tool … The answer is Paris.”
Extracted: “The answer is Paris.”
Extract Tool Arguments

    extractor: tool_arguments  # Get tool call args
    extractor_config:
      tool_name: search  # From search tool

Agent calls: search(query="pandas", limit=5)
Extracted: {"query": "pandas", "limit": 5}
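Because the extracted value is a JSON string rather than a dictionary, it can help to see how such a string is produced and read back. In this sketch the tool-call records are plain dictionaries and `tool_arguments` is a hypothetical helper, not the library function; it returns the first matching call.

```python
import json
from typing import Any, Dict, List

# Hypothetical stand-in for the tool calls found in a trajectory, in call order.
ToolCall = Dict[str, Any]  # {"name": str, "arguments": dict}


def tool_arguments(tool_calls: List[ToolCall], tool_name: str) -> str:
    """Return the JSON arguments of the first call to tool_name, or ""."""
    for call in tool_calls:
        if call["name"] == tool_name:
            return json.dumps(call["arguments"])
    return ""


calls = [{"name": "search", "arguments": {"query": "pandas", "limit": 5}}]

extracted = tool_arguments(calls, "search")
parsed = json.loads(extracted)  # back to a dict for inspection
not_called = tool_arguments(calls, "web_search")
```

A substring-style grader can match a single value such as "pandas" in this string, whereas an exact comparison needs the whole serialized form, including key order and spacing.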
Extract Pattern

    extractor: pattern  # Extract with regex
    extractor_config:
      pattern: 'RESULT: (\w+)'  # Match pattern
      group: 1  # Extract capture group 1

Agent: “After calculation… RESULT: SUCCESS”
Extracted: “SUCCESS”
Extract Memory

    extractor: memory_block  # Extract from agent memory
    extractor_config:
      block_label: human  # Human memory block

Agent updates memory block “human” to: “User’s name is Alice”
Extracted: “User’s name is Alice”
Troubleshooting
Extractor returns empty string
Problem: Grader always gives score 0.0 because extractor finds nothing.
Common causes:
- Wrong extractor: Using first_assistant but agent doesn’t respond until after tool use → use last_assistant
- Wrong tool name: tool_arguments with tool_name: "search" but agent calls "web_search" → check actual tool name
- Wrong memory block: memory_block with block_label: "user" but block is actually labeled "human" → check block labels
- Pattern doesn’t match: pattern: "Answer: (.*)" but agent says “The answer is…” → adjust regex
Debug tips:
- Check the trajectory in results JSON to see actual agent output
- Use last_assistant first to see what’s there
- Confirm the extractor name with letta-evals list-extractors, and check tool names against the tool calls recorded in the trajectory
Pattern extractor not working
Problem: Pattern extractor returns empty or wrong content.
Solutions:
- Test your regex separately first
- Remember to escape special characters: \., \(, \)
- Use group: 0 to see the full match (default)
- Use group: 1 to extract first capture group
- Set search_all: true if you need all matches
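One way to test a regex separately is a few lines of Python with the standard `re` module. The sample text and patterns below are made up for illustration; they show how unescaped parentheses silently change the meaning of a pattern, and how group 0 and group 1 differ.

```python
import re

text = "Total (USD): 3.50"

# Unescaped: "(USD)" is read as a capture group matching "USD",
# so the literal parentheses in the text are never matched.
unescaped = re.search(r"Total (USD): (\d+.\d+)", text)

# Escaped: "\(" and "\)" match literal parentheses, "\." a literal dot.
escaped = re.search(r"Total \(USD\): (\d+\.\d+)", text)

# re.escape builds the literal prefix without hand-escaping it.
built = re.search(re.escape("Total (USD): ") + r"(\d+\.\d+)", text)

full_match = escaped.group(0) if escaped else ""     # whole match
first_group = escaped.group(1) if escaped else ""    # first capture group
```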
Memory block extractor fails
Problem: memory_block extractor causes errors or returns nothing.
Solutions:
- Verify the block label exactly matches (case-sensitive)
- Check that agent actually has this memory block
- Remember: this adds overhead by fetching agent state
Tool extractor finds wrong tool
Problem: Multiple tool calls, but extractor gets the wrong one.
Current behavior: Extractors get the first matching tool call.
Workaround: Use a custom extractor to implement more specific logic.
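As one possible shape for that logic, the sketch below selects the last matching tool call instead of the first. It works on plain dictionaries so it can run on its own; `last_tool_arguments` and the record layout are hypothetical. In a real suite the same loop would sit inside a custom extractor and read tool calls from the trajectory, whose message fields are not covered here.

```python
import json
from typing import Any, Dict, List

# Hypothetical stand-in for the tool calls found in a trajectory, in call order.
ToolCall = Dict[str, Any]  # {"name": str, "arguments": dict}


def last_tool_arguments(tool_calls: List[ToolCall], tool_name: str) -> str:
    """Return the JSON arguments of the LAST call to tool_name, or ""."""
    for call in reversed(tool_calls):
        if call["name"] == tool_name:
            return json.dumps(call["arguments"])
    return ""


calls = [
    {"name": "search", "arguments": {"query": "pandas"}},
    {"name": "fetch", "arguments": {"url": "https://example.com"}},
    {"name": "search", "arguments": {"query": "pandas merge"}},
]

extracted = last_tool_arguments(calls, "search")
```

The same pattern extends to other selection rules, such as picking the call whose arguments contain a particular key.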
Next Steps
- Built-in Extractors Reference - Complete extractor documentation
- Custom Extractors Guide - Write your own extractors
- Graders - How to use extractors with graders