
Extractors

Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.

Quick overview:

  • Purpose: Agent responses are complex (messages, tool calls, memory) - extractors isolate what to grade
  • Built-in options: last_assistant, tool_arguments, memory_block, pattern, and more
  • Flexible: Different graders can use different extractors in the same suite
  • Automatic: No setup needed - just specify in your grader config

Common patterns:

  • last_assistant - Most common, gets the agent’s final message (90% of use cases)
  • tool_arguments - Verify agent called the right tool with correct args
  • memory_block - Check if agent updated memory correctly
  • pattern - Extract structured data with regex

Extractors determine what part of the agent’s response gets graded. They pull out specific content from the conversation trajectory.

An agent’s response is complex - it includes assistant messages, tool calls, tool returns, memory updates, etc. Extractors let you focus on exactly what you want to evaluate.

The evaluation flow:

Agent Response → Extractor → Submission Text → Grader → Score

For example:

Full trajectory:
UserMessage: "What's the capital of France?"
ToolCallMessage: search(query="capital of france")
ToolReturnMessage: "Paris is the capital..."
AssistantMessage: "The capital of France is Paris."
↓ extractor: last_assistant ↓
Extracted: "The capital of France is Paris."
↓ grader: contains (ground_truth="Paris") ↓
Score: 1.0

A trajectory is a list of turns, where each turn is a list of Letta messages:

[
  [UserMessage(...), AssistantMessage(...), ToolCallMessage(...), ToolReturnMessage(...)],  # Turn 1
  [AssistantMessage(...)],  # Turn 2
]

Extractors navigate this structure to pull out the submission text.
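To illustrate how an extractor walks this structure, here is a minimal sketch of last_assistant-style logic. Plain dicts stand in for the typed Letta message objects the real extractor receives, so the field names here are illustrative, not the library's actual schema:

```python
# Sketch only: dicts stand in for Letta message objects.
def last_assistant_text(trajectory):
    """Walk turns in reverse and return the most recent assistant message text."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if message.get("message_type") == "assistant_message":
                return message.get("content", "")
    return ""  # no assistant message found -> empty submission

trajectory = [
    [
        {"message_type": "user_message", "content": "What's the capital of France?"},
        {"message_type": "assistant_message", "content": "The capital of France is Paris."},
    ],
    [{"message_type": "assistant_message", "content": "Anything else?"}],
]

print(last_assistant_text(trajectory))  # Anything else?
```

Searching in reverse means the extractor finds the final message without scanning the whole trajectory.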

last_assistant - Extracts the last assistant message content.

graders:
  quality:
    kind: tool
    function: contains
    extractor: last_assistant  # Extract final agent message

Most common extractor - gets the agent’s final response.

first_assistant - Extracts the first assistant message content.

graders:
  initial_response:
    kind: tool
    function: contains
    extractor: first_assistant  # Extract first agent message

Useful for testing immediate responses before tool usage.

all_assistant - Concatenates all assistant messages with a separator.

graders:
  complete_response:
    kind: rubric
    prompt_path: rubric.txt
    extractor: all_assistant  # Concatenate all agent messages
    extractor_config:
      separator: "\n\n"  # Join messages with double newline

Use when you need the full conversation context.
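Conceptually the concatenation is just a string join using the configured separator; a tiny sketch of what the config above produces:

```python
# Sketch of all_assistant behavior with separator "\n\n" from extractor_config.
messages = ["Let me check.", "The capital of France is Paris."]
separator = "\n\n"
submission = separator.join(messages)
print(submission)
```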

last_turn - Extracts all assistant messages from the last turn only.

graders:
  final_turn:
    kind: tool
    function: contains
    extractor: last_turn  # Messages from final turn only
    extractor_config:
      separator: " "  # Join with spaces

Useful when the agent makes multiple statements in the final turn.

pattern - Extracts content matching a regex pattern from assistant messages.

graders:
  extract_number:
    kind: tool
    function: exact_match
    extractor: pattern  # Extract using regex
    extractor_config:
      pattern: 'Result: (\d+)'  # Regex pattern to match
      group: 1  # Extract capture group 1
      search_all: false  # Only find first match

Example: Extract “42” from “The answer is Result: 42”
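The behavior corresponds to a standard `re.search`; a quick sketch of the config above (the empty-string fallback mirrors what extractors return when nothing matches):

```python
import re

text = "The answer is Result: 42"
match = re.search(r"Result: (\d+)", text)       # pattern from the config above
extracted = match.group(1) if match else ""      # group: 1; "" when no match
print(extracted)  # 42
```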

tool_arguments - Extracts arguments from a specific tool call.

graders:
  search_query:
    kind: tool
    function: contains
    extractor: tool_arguments  # Extract tool call arguments
    extractor_config:
      tool_name: search  # Which tool to extract from

Returns the JSON arguments as a string.

Example: If agent calls search(query="pandas", limit=10), extracts:

{ "query": "pandas", "limit": 10 }
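Because the extraction is a JSON string, a contains grader matches against the raw text. If you need field-level checks, parsing the string yourself (for example inside a custom tool function, which is an assumption about how you would wire it up) is straightforward:

```python
import json

# What tool_arguments hands to the grader: JSON arguments as a string.
extracted = '{"query": "pandas", "limit": 10}'
args = json.loads(extracted)
print(args["query"])  # pandas
```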

tool_output - Extracts the return value from a specific tool call.

graders:
  search_results:
    kind: tool
    function: contains
    extractor: tool_output  # Extract tool return value
    extractor_config:
      tool_name: search  # Which tool's output to extract

Finds the tool call and its corresponding return message.

after_marker - Extracts content after a specific marker string.

graders:
  answer_section:
    kind: tool
    function: contains
    extractor: after_marker  # Extract content after marker
    extractor_config:
      marker: "ANSWER:"  # Marker string to find
      include_marker: false  # Don't include "ANSWER:" in output

Example: From “Here’s my analysis… ANSWER: Paris”, extracts “Paris”
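The marker logic amounts to a substring search; here is a sketch (whether the real extractor also strips surrounding whitespace is an assumption):

```python
def after_marker(text, marker, include_marker=False):
    """Return everything after the first occurrence of marker, or "" if absent."""
    idx = text.find(marker)
    if idx == -1:
        return ""  # marker not found -> empty submission
    start = idx if include_marker else idx + len(marker)
    return text[start:].strip()

print(after_marker("Here's my analysis... ANSWER: Paris", "ANSWER:"))  # Paris
```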

memory_block - Extracts content from a specific memory block (requires agent_state).

graders:
  human_memory:
    kind: tool
    function: exact_match
    extractor: memory_block  # Extract from agent memory
    extractor_config:
      block_label: human  # Which memory block to extract

Important: This extractor requires the agent’s final state, which adds overhead. The runner automatically fetches agent_state when this extractor is used.

Example use case: Verify the agent correctly updated its memory about the user.

Some extractors accept additional configuration via extractor_config:

graders:
  my_metric:
    kind: tool
    function: contains
    extractor: pattern  # Use pattern extractor
    extractor_config:  # Configuration for this extractor
      pattern: "Answer: (.*)"  # Regex pattern
      group: 1  # Extract capture group 1
Use case → recommended extractor:

  • Final agent response → last_assistant
  • First response before tools → first_assistant
  • Complete conversation → all_assistant
  • Specific format extraction → pattern
  • Tool usage validation → tool_arguments
  • Tool result checking → tool_output
  • Memory validation → memory_block
  • Structured output → after_marker

Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.

If an extractor finds no matching content, it returns an empty string "". This typically results in a score of 0.0 from the grader.
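The two behaviors above can be sketched together: flatten content parts to plain text, and fall back to an empty string when nothing is extractable. The part shape (dicts with a "text" field) is an assumption standing in for Letta's content types:

```python
def flatten_content(content):
    """Reduce message content (a string or a list of parts) to plain text."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Keep only text-like parts; join them with spaces.
        return " ".join(
            part.get("text", "") for part in content if isinstance(part, dict)
        ).strip()
    return ""  # nothing extractable -> "", which graders typically score 0.0

parts = [{"type": "text", "text": "The capital"}, {"type": "text", "text": "is Paris."}]
print(flatten_content(parts))  # The capital is Paris.
```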

You can write custom extractors. See Custom Extractors for details.

Example:

from typing import List

from letta_client import LettaMessageUnion
from letta_evals.decorators import extractor

@extractor
def my_extractor(trajectory: List[List[LettaMessageUnion]], config: dict) -> str:
    # Custom extraction logic goes here; return the text to grade
    return extracted_text

Register by importing in your suite’s setup script or custom evaluators file.

Different graders can use different extractors:

graders:
  response_quality:  # Evaluate final message quality
    kind: rubric
    prompt_path: quality.txt
    extractor: last_assistant  # Extract final response
  tool_usage:  # Check tool was called correctly
    kind: tool
    function: exact_match
    extractor: tool_arguments  # Extract tool args
    extractor_config:
      tool_name: search  # From search tool
  memory_update:  # Verify memory updated
    kind: tool
    function: contains
    extractor: memory_block  # Extract from memory
    extractor_config:
      block_label: human  # Human memory block

Each grader independently extracts and evaluates different aspects.

See all available extractors:

letta-evals list-extractors
extractor: last_assistant # Get final agent message

Agent: “Let me search… [uses tool] …The answer is Paris.”
Extracted: “The answer is Paris.”

extractor: tool_arguments  # Get tool call args
extractor_config:
  tool_name: search  # From search tool

Agent calls: search(query="pandas", limit=5)
Extracted: {"query": "pandas", "limit": 5}

extractor: pattern  # Extract with regex
extractor_config:
  pattern: 'RESULT: (\w+)'  # Match pattern
  group: 1  # Extract capture group 1

Agent: “After calculation… RESULT: SUCCESS”
Extracted: “SUCCESS”

extractor: memory_block  # Extract from agent memory
extractor_config:
  block_label: human  # Human memory block

Agent updates memory block “human” to: “User’s name is Alice”
Extracted: “User’s name is Alice”

Problem: Grader always gives score 0.0 because extractor finds nothing.

Common causes:

  • Wrong extractor: Using first_assistant but agent doesn’t respond until after tool use → use last_assistant
  • Wrong tool name: tool_arguments with tool_name: "search" but agent calls "web_search" → check actual tool name
  • Wrong memory block: memory_block with block_label: "user" but block is actually labeled "human" → check block labels
  • Pattern doesn’t match: pattern: "Answer: (.*)" but agent says “The answer is…” → adjust regex

Debug tips:

  1. Check the trajectory in results JSON to see actual agent output
  2. Use last_assistant first to see what’s there
  3. Confirm extractor names with letta-evals list-extractors, and check the actual tool name in the trajectory

Problem: Pattern extractor returns empty or wrong content.

Solutions:

  • Test your regex separately first
  • Remember to escape special characters: \., \(, \)
  • Use group: 0 to see the full match (default)
  • Use group: 1 to extract first capture group
  • Set search_all: true if you need all matches
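Testing the pattern outside the suite is usually the fastest check. For example, the mismatch from the troubleshooting list above reproduces directly in the interpreter:

```python
import re

agent_output = "The answer is Paris."

pattern = r"Answer: (.*)"            # fails: agent wrote "The answer is..."
print(re.search(pattern, agent_output))   # None -> extractor returns ""

fixed = r"(?i)answer is (\w+)"       # case-insensitive, matches the real phrasing
match = re.search(fixed, agent_output)
print(match.group(1))                # Paris
```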

Problem: memory_block extractor causes errors or returns nothing.

Solutions:

  • Verify the block label exactly matches (case-sensitive)
  • Check that agent actually has this memory block
  • Remember: this adds overhead by fetching agent state

Problem: Multiple tool calls, but extractor gets the wrong one.

Current behavior: Extractors get the first matching tool call.

Workaround: Use custom extractor to implement more specific logic.
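A custom extractor for this case can simply search the trajectory in reverse. The sketch below uses dict stand-ins for Letta message objects (field names are assumptions); in a real suite you would write the same logic against typed messages and register it with the @extractor decorator shown earlier:

```python
import json

def last_tool_arguments(trajectory, tool_name):
    """Return the JSON arguments of the LAST call to tool_name, not the first."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if (message.get("message_type") == "tool_call_message"
                    and message.get("tool_call", {}).get("name") == tool_name):
                return message["tool_call"]["arguments"]
    return ""  # tool never called -> empty submission

trajectory = [[
    {"message_type": "tool_call_message",
     "tool_call": {"name": "search", "arguments": json.dumps({"query": "first"})}},
    {"message_type": "tool_call_message",
     "tool_call": {"name": "search", "arguments": json.dumps({"query": "second"})}},
]]
print(last_tool_arguments(trajectory, "search"))  # {"query": "second"}
```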