Extractors

Extractors select what content to evaluate from an agent’s response. They navigate the conversation trajectory and extract the specific piece you want to grade.

Quick overview:

  • Purpose: Agent responses are complex (messages, tool calls, memory) - extractors isolate what to grade
  • Built-in options: last_assistant, tool_arguments, memory_block, pattern, and more
  • Flexible: Different graders can use different extractors in the same suite
  • Automatic: No setup needed - just specify in your grader config

Common patterns:

  • last_assistant - Most common, gets the agent’s final message (90% of use cases)
  • tool_arguments - Verify agent called the right tool with correct args
  • memory_block - Check if agent updated memory correctly
  • pattern - Extract structured data with regex

Extractors determine what part of the agent’s response gets graded. They pull out specific content from the conversation trajectory.

Why Extractors?

An agent’s response is complex - it includes assistant messages, tool calls, tool returns, memory updates, etc. Extractors let you focus on exactly what you want to evaluate.

The evaluation flow:

Agent Response → Extractor → Submission Text → Grader → Score

For example:

Full trajectory:
UserMessage: "What's the capital of France?"
ToolCallMessage: search(query="capital of france")
ToolReturnMessage: "Paris is the capital..."
AssistantMessage: "The capital of France is Paris."
↓ extractor: last_assistant ↓
Extracted: "The capital of France is Paris."
↓ grader: contains (ground_truth="Paris") ↓
Score: 1.0

Trajectory Structure

A trajectory is a list of turns, where each turn is a list of Letta messages:

```python
[
    [UserMessage(...), AssistantMessage(...), ToolCallMessage(...), ToolReturnMessage(...)],  # Turn 1
    [AssistantMessage(...)],  # Turn 2
]
```

Extractors navigate this structure to pull out the submission text.
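For intuition, here is a minimal sketch of how a last_assistant-style extractor might walk this nested structure. Messages are simulated as plain dicts; real Letta messages are typed objects, so treat this as illustrative only:

```python
# Simulated trajectory: a list of turns, each turn a list of messages.
trajectory = [
    [  # Turn 1
        {"type": "user", "content": "What's the capital of France?"},
        {"type": "tool_call", "content": 'search(query="capital of france")'},
        {"type": "tool_return", "content": "Paris is the capital..."},
        {"type": "assistant", "content": "The capital of France is Paris."},
    ],
    [{"type": "assistant", "content": "Anything else?"}],  # Turn 2
]

def last_assistant_like(trajectory):
    """Return the content of the last assistant message, or "" if none."""
    for turn in reversed(trajectory):
        for message in reversed(turn):
            if message["type"] == "assistant":
                return message["content"]
    return ""  # empty extraction -> grader typically scores 0.0

print(last_assistant_like(trajectory))  # Anything else?
```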

Built-in Extractors

last_assistant

Extracts the last assistant message content.

```yaml
graders:
  quality:
    kind: tool
    function: contains
    extractor: last_assistant  # Extract final agent message
```

Most common extractor - gets the agent’s final response.

first_assistant

Extracts the first assistant message content.

```yaml
graders:
  initial_response:
    kind: tool
    function: contains
    extractor: first_assistant  # Extract first agent message
```

Useful for testing immediate responses before tool usage.

all_assistant

Concatenates all assistant messages with a separator.

```yaml
graders:
  complete_response:
    kind: rubric
    prompt_path: rubric.txt
    extractor: all_assistant  # Concatenate all agent messages
    extractor_config:
      separator: "\n\n"  # Join messages with double newline
```

Use when you need the full conversation context.

last_turn

Extracts all assistant messages from the last turn only.

```yaml
graders:
  final_turn:
    kind: tool
    function: contains
    extractor: last_turn  # Messages from final turn only
    extractor_config:
      separator: " "  # Join with spaces
```

Useful when the agent makes multiple statements in the final turn.

pattern

Extracts content matching a regex pattern from assistant messages.

```yaml
graders:
  extract_number:
    kind: tool
    function: exact_match
    extractor: pattern  # Extract using regex
    extractor_config:
      pattern: 'Result: (\d+)'  # Regex pattern to match
      group: 1  # Extract capture group 1
      search_all: false  # Only find first match
```

Example: Extract “42” from “The answer is Result: 42”
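The extraction logic amounts to a plain `re.search` plus a capture-group lookup; here is a sketch outside the library:

```python
import re

# Sketch of pattern-extractor semantics: apply the regex to the
# assistant text and return the requested capture group ("" on no match).
text = "The answer is Result: 42"
match = re.search(r"Result: (\d+)", text)
extracted = match.group(1) if match else ""
print(extracted)  # 42
```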

tool_arguments

Extracts arguments from a specific tool call.

```yaml
graders:
  search_query:
    kind: tool
    function: contains
    extractor: tool_arguments  # Extract tool call arguments
    extractor_config:
      tool_name: search  # Which tool to extract from
```

Returns the JSON arguments as a string.

Example: If agent calls search(query="pandas", limit=10), extracts:

```json
{"query": "pandas", "limit": 10}
```
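Because the submission is a JSON string, a custom grader (hypothetical here) could parse it to assert on individual fields rather than matching the raw text:

```python
import json

# The tool_arguments extractor yields a JSON string; parsing it lets a
# grader check individual arguments rather than do substring matching.
submission = '{"query": "pandas", "limit": 10}'
args = json.loads(submission)
print(args["query"], args["limit"])  # pandas 10
```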

tool_output

Extracts the return value from a specific tool call.

```yaml
graders:
  search_results:
    kind: tool
    function: contains
    extractor: tool_output  # Extract tool return value
    extractor_config:
      tool_name: search  # Which tool's output to extract
```

Finds the tool call and its corresponding return message.

after_marker

Extracts content after a specific marker string.

```yaml
graders:
  answer_section:
    kind: tool
    function: contains
    extractor: after_marker  # Extract content after marker
    extractor_config:
      marker: "ANSWER:"  # Marker string to find
      include_marker: false  # Don't include "ANSWER:" in output
```

Example: From “Here’s my analysis… ANSWER: Paris”, extracts “Paris”
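The semantics can be sketched in plain Python as a string find plus slice (the library's actual implementation may differ):

```python
# Everything after the first occurrence of the marker is kept;
# with include_marker: false, the marker itself is dropped.
text = "Here's my analysis... ANSWER: Paris"
marker = "ANSWER:"
idx = text.find(marker)
extracted = text[idx + len(marker):].strip() if idx != -1 else ""
print(extracted)  # Paris
```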

memory_block

Extracts content from a specific memory block (requires agent_state).

```yaml
graders:
  human_memory:
    kind: tool
    function: exact_match
    extractor: memory_block  # Extract from agent memory
    extractor_config:
      block_label: human  # Which memory block to extract
```

Important: This extractor requires the agent’s final state, which adds overhead. The runner automatically fetches agent_state when this extractor is used.

Example use case: Verify the agent correctly updated its memory about the user.

Extractor Configuration

Some extractors accept additional configuration via extractor_config:

```yaml
graders:
  my_metric:
    kind: tool
    function: contains
    extractor: pattern  # Use pattern extractor
    extractor_config:  # Configuration for this extractor
      pattern: 'Answer: (.*)'  # Regex pattern
      group: 1  # Extract capture group 1
```

Choosing an Extractor

  • Final agent response → last_assistant
  • First response before tools → first_assistant
  • Complete conversation → all_assistant
  • Specific format extraction → pattern
  • Tool usage validation → tool_arguments
  • Tool result checking → tool_output
  • Memory validation → memory_block
  • Structured output → after_marker

Content Flattening

Assistant messages can contain multiple content parts. Extractors automatically flatten complex content to plain text.
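A sketch of what flattening amounts to, assuming text parts carry a `text` field (the real message schema may differ):

```python
# Multi-part assistant content is joined into a single plain string
# before grading; non-text parts are skipped in this sketch.
content = [
    {"type": "text", "text": "The capital of France "},
    {"type": "text", "text": "is Paris."},
]
flattened = "".join(part["text"] for part in content if part["type"] == "text")
print(flattened)  # The capital of France is Paris.
```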

Empty Extraction

If an extractor finds no matching content, it returns an empty string "". This typically results in a score of 0.0 from the grader.

Custom Extractors

You can write custom extractors. See Custom Extractors for details.

Example:

```python
from typing import List

from letta_client import LettaMessageUnion
from letta_evals.decorators import extractor

@extractor
def my_extractor(trajectory: List[List[LettaMessageUnion]], config: dict) -> str:
    # Custom extraction logic goes here; return "" when nothing matches
    extracted_text = ""
    return extracted_text
```
Register by importing in your suite’s setup script or custom evaluators file.

Multi-Metric Extraction

Different graders can use different extractors:

```yaml
graders:
  response_quality:  # Evaluate final message quality
    kind: rubric
    prompt_path: quality.txt
    extractor: last_assistant  # Extract final response

  tool_usage:  # Check tool was called correctly
    kind: tool
    function: exact_match
    extractor: tool_arguments  # Extract tool args
    extractor_config:
      tool_name: search  # From search tool

  memory_update:  # Verify memory updated
    kind: tool
    function: contains
    extractor: memory_block  # Extract from memory
    extractor_config:
      block_label: human  # Human memory block
```

Each grader independently extracts and evaluates different aspects.

Listing Extractors

See all available extractors:

```shell
$ letta-evals list-extractors
```

Examples

Extract Final Answer

```yaml
extractor: last_assistant  # Get final agent message
```

Agent: “Let me search… (uses tool) … The answer is Paris.”
Extracted: “The answer is Paris.”

Extract Tool Arguments

```yaml
extractor: tool_arguments  # Get tool call args
extractor_config:
  tool_name: search  # From search tool
```

Agent calls: search(query="pandas", limit=5)
Extracted: {"query": "pandas", "limit": 5}

Extract Pattern

```yaml
extractor: pattern  # Extract with regex
extractor_config:
  pattern: 'RESULT: (\w+)'  # Match pattern
  group: 1  # Extract capture group 1
```

Agent: “After calculation… RESULT: SUCCESS”
Extracted: “SUCCESS”

Extract Memory

```yaml
extractor: memory_block  # Extract from agent memory
extractor_config:
  block_label: human  # Human memory block
```

Agent updates memory block “human” to: “User’s name is Alice”
Extracted: “User’s name is Alice”

Troubleshooting

Extractor returns empty string

Problem: Grader always gives score 0.0 because extractor finds nothing.

Common causes:

  • Wrong extractor: Using first_assistant but agent doesn’t respond until after tool use → use last_assistant
  • Wrong tool name: tool_arguments with tool_name: "search" but agent calls "web_search" → check actual tool name
  • Wrong memory block: memory_block with block_label: "user" but block is actually labeled "human" → check block labels
  • Pattern doesn’t match: pattern: "Answer: (.*)" but agent says “The answer is…” → adjust regex

Debug tips:

  1. Check the trajectory in results JSON to see actual agent output
  2. Use last_assistant first to see what’s there
  3. Verify extractor names with letta-evals list-extractors; check actual tool names against the trajectory itself

Next Steps