Extractors

Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.

Quick overview:

  • Purpose: Agent responses are complex (messages, tool calls, memory) - extractors isolate what to grade
  • Built-in options: last_assistant, tool_arguments, memory_block, pattern, and more
  • Flexible: Different graders can use different extractors in the same suite
  • Automatic: No setup needed - just specify in your grader config

Common patterns:

  • last_assistant - Most common, gets the agent’s final message (90% of use cases)
  • tool_arguments - Verify agent called the right tool with correct args
  • memory_block - Check if agent updated memory correctly
  • pattern - Extract structured data with regex

Extractors determine what part of the agent’s response gets graded. They pull out specific content from the conversation trajectory.

Why Extractors?

An agent’s response is complex - it includes assistant messages, tool calls, tool returns, memory updates, etc. Extractors let you focus on exactly what you want to evaluate.

The evaluation flow:

Agent Response → Extractor → Submission Text → Grader → Score

For example:

Full trajectory:
UserMessage: "What's the capital of France?"
ToolCallMessage: search(query="capital of france")
ToolReturnMessage: "Paris is the capital..."
AssistantMessage: "The capital of France is Paris."
↓ extractor: last_assistant ↓
Extracted: "The capital of France is Paris."
↓ grader: contains (ground_truth="Paris") ↓
Score: 1.0

Trajectory Structure

A trajectory is a list of turns, where each turn is a list of Letta messages:

```python
[
    [UserMessage(...), AssistantMessage(...), ToolCallMessage(...), ToolReturnMessage(...)],  # Turn 1
    [AssistantMessage(...)],  # Turn 2
]
```

Extractors navigate this structure to pull out the submission text.
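For intuition, here is a minimal sketch of how a last_assistant-style extractor might walk this nested structure. Messages are simulated as plain dicts; real Letta messages are typed objects, so treat this as illustrative only:

```python
# Simulated trajectory: a list of turns, each turn a list of messages.
trajectory = [
    [  # Turn 1
        {"type": "user", "content": "What's the capital of France?"},
        {"type": "tool_call", "content": 'search(query="capital of france")'},
        {"type": "tool_return", "content": "Paris is the capital..."},
        {"type": "assistant", "content": "The capital of France is Paris."},
    ],
    [{"type": "assistant", "content": "Anything else?"}],  # Turn 2
]

def last_assistant_like(trajectory):
    """Return the content of the last assistant message, or "" if none."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if message["type"] == "assistant":
                return message["content"]
    return ""  # empty extraction -> grader typically scores 0.0

print(last_assistant_like(trajectory))  # Anything else?
```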

Built-in Extractors

last_assistant

Extracts the last assistant message content.

```yaml
graders:
  quality:
    kind: tool
    function: contains
    extractor: last_assistant  # Extract final agent message
```

Most common extractor - gets the agent’s final response.

first_assistant

Extracts the first assistant message content.

```yaml
graders:
  initial_response:
    kind: tool
    function: contains
    extractor: first_assistant  # Extract first agent message
```

Useful for testing immediate responses before tool usage.

all_assistant

Concatenates all assistant messages with a separator.

```yaml
graders:
  complete_response:
    kind: rubric
    prompt_path: rubric.txt
    extractor: all_assistant  # Concatenate all agent messages
    extractor_config:
      separator: "\n\n"  # Join messages with double newline
```

Use when you need the full conversation context.

last_turn

Extracts all assistant messages from the last turn only.

```yaml
graders:
  final_turn:
    kind: tool
    function: contains
    extractor: last_turn  # Messages from final turn only
    extractor_config:
      separator: " "  # Join with spaces
```

Useful when the agent makes multiple statements in the final turn.

pattern

Extracts content matching a regex pattern from assistant messages.

```yaml
graders:
  extract_number:
    kind: tool
    function: exact_match
    extractor: pattern  # Extract using regex
    extractor_config:
      pattern: 'Result: (\d+)'  # Regex pattern to match
      group: 1  # Extract capture group 1
      search_all: false  # Only find first match
```

Example: Extract “42” from “The answer is Result: 42”
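The extraction logic amounts to a plain `re.search` plus a capture-group lookup; here is a sketch outside the library:

```python
import re

# Sketch of pattern-extractor semantics: apply the regex to the
# assistant text and return the requested capture group ("" on no match).
text = "The answer is Result: 42"
match = re.search(r"Result: (\d+)", text)
extracted = match.group(1) if match else ""
print(extracted)  # 42
```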

tool_arguments

Extracts arguments from a specific tool call.

```yaml
graders:
  search_query:
    kind: tool
    function: contains
    extractor: tool_arguments  # Extract tool call arguments
    extractor_config:
      tool_name: search  # Which tool to extract from
```

Returns the JSON arguments as a string.

Example: If agent calls search(query="pandas", limit=10), extracts:

```json
{"query": "pandas", "limit": 10}
```
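Because the submission is a JSON string, a custom grader (hypothetical here) could parse it to assert on individual fields rather than matching the raw text:

```python
import json

# The tool_arguments extractor yields a JSON string; parsing it lets a
# grader check individual arguments rather than do substring matching.
submission = '{"query": "pandas", "limit": 10}'
args = json.loads(submission)
print(args["query"], args["limit"])  # pandas 10
```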

tool_output

Extracts the return value from a specific tool call.

```yaml
graders:
  search_results:
    kind: tool
    function: contains
    extractor: tool_output  # Extract tool return value
    extractor_config:
      tool_name: search  # Which tool's output to extract
```

Finds the tool call and its corresponding return message.

after_marker

Extracts content after a specific marker string.

```yaml
graders:
  answer_section:
    kind: tool
    function: contains
    extractor: after_marker  # Extract content after marker
    extractor_config:
      marker: "ANSWER:"  # Marker string to find
      include_marker: false  # Don't include "ANSWER:" in output
```

Example: From “Here’s my analysis… ANSWER: Paris”, extracts “Paris”
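The semantics can be sketched in plain Python as a string find plus slice (the library's actual implementation may differ):

```python
# Everything after the first occurrence of the marker is kept;
# with include_marker: false, the marker itself is dropped.
text = "Here's my analysis... ANSWER: Paris"
marker = "ANSWER:"
idx = text.find(marker)
extracted = text[idx + len(marker):].strip() if idx != -1 else ""
print(extracted)  # Paris
```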

memory_block

Extracts content from a specific memory block (requires agent_state).

```yaml
graders:
  human_memory:
    kind: tool
    function: exact_match
    extractor: memory_block  # Extract from agent memory
    extractor_config:
      block_label: human  # Which memory block to extract
```

Important: This extractor requires the agent’s final state, which adds overhead. The runner automatically fetches agent_state when this extractor is used.

Example use case: Verify the agent correctly updated its memory about the user.

Extractor Configuration

Some extractors accept additional configuration via extractor_config:

```yaml
graders:
  my_metric:
    kind: tool
    function: contains
    extractor: pattern  # Use pattern extractor
    extractor_config:  # Configuration for this extractor
      pattern: 'Answer: (.*)'  # Regex pattern
      group: 1  # Extract capture group 1
```

Choosing an Extractor

  • Final agent response → last_assistant
  • First response before tools → first_assistant
  • Complete conversation → all_assistant
  • Specific format extraction → pattern
  • Tool usage validation → tool_arguments
  • Tool result checking → tool_output
  • Memory validation → memory_block
  • Structured output → after_marker

Content Flattening

Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.
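A sketch of what flattening amounts to, assuming text parts carry a `text` field (the real message schema may differ):

```python
# Multi-part assistant content is joined into a single plain string
# before grading; non-text parts are skipped in this sketch.
content = [
    {"type": "text", "text": "The capital of France "},
    {"type": "text", "text": "is Paris."},
]
flattened = "".join(part["text"] for part in content if part["type"] == "text")
print(flattened)  # The capital of France is Paris.
```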

Empty Extraction

If an extractor finds no matching content, it returns an empty string "". This typically results in a score of 0.0 from the grader.

Custom Extractors

You can write custom extractors. See Custom Extractors for details.

Example:

```python
from typing import List

from letta_client import LettaMessageUnion
from letta_evals.decorators import extractor

@extractor
def my_extractor(trajectory: List[List[LettaMessageUnion]], config: dict) -> str:
    # Custom extraction logic goes here; return "" when nothing matches
    extracted_text = ""
    return extracted_text
```
Register by importing in your suite’s setup script or custom evaluators file.

Multi-Metric Extraction

Different graders can use different extractors:

```yaml
graders:
  response_quality:  # Evaluate final message quality
    kind: rubric
    prompt_path: quality.txt
    extractor: last_assistant  # Extract final response

  tool_usage:  # Check tool was called correctly
    kind: tool
    function: exact_match
    extractor: tool_arguments  # Extract tool args
    extractor_config:
      tool_name: search  # From search tool

  memory_update:  # Verify memory updated
    kind: tool
    function: contains
    extractor: memory_block  # Extract from memory
    extractor_config:
      block_label: human  # Human memory block
```

Each grader independently extracts and evaluates different aspects.

Listing Extractors

See all available extractors:

```shell
$ letta-evals list-extractors
```

Examples

Extract Final Answer

```yaml
extractor: last_assistant  # Get final agent message
```

Agent: “Let me search… (uses tool) … The answer is Paris.”
Extracted: “The answer is Paris.”

Extract Tool Arguments

```yaml
extractor: tool_arguments  # Get tool call args
extractor_config:
  tool_name: search  # From search tool
```

Agent calls: search(query="pandas", limit=5)
Extracted: {"query": "pandas", "limit": 5}

Extract Pattern

```yaml
extractor: pattern  # Extract with regex
extractor_config:
  pattern: 'RESULT: (\w+)'  # Match pattern
  group: 1  # Extract capture group 1
```

Agent: “After calculation… RESULT: SUCCESS”
Extracted: “SUCCESS”

Extract Memory

```yaml
extractor: memory_block  # Extract from agent memory
extractor_config:
  block_label: human  # Human memory block
```

Agent updates memory block “human” to: “User’s name is Alice”
Extracted: “User’s name is Alice”

Troubleshooting

Extractor returns empty string

Problem: Grader always gives score 0.0 because extractor finds nothing.

Common causes:

  • Wrong extractor: Using first_assistant but agent doesn’t respond until after tool use → use last_assistant
  • Wrong tool name: tool_arguments with tool_name: "search" but agent calls "web_search" → check actual tool name
  • Wrong memory block: memory_block with block_label: "user" but block is actually labeled "human" → check block labels
  • Pattern doesn’t match: pattern: "Answer: (.*)" but agent says “The answer is…” → adjust regex

Debug tips:

  1. Check the trajectory in results JSON to see actual agent output
  2. Use last_assistant first to see what’s there
  3. Verify extractor names with letta-evals list-extractors; check actual tool names against the trajectory itself

Next Steps