Multi-turn conversations

Development tools

Testing & evals

Advanced

Multi-turn conversations allow you to test how agents handle context across multiple exchanges.

Why Use Multi-Turn?

Multi-turn conversations enable testing that single-turn prompts cannot:

Memory storage: Verify agents persist information to memory blocks
Tool call sequences: Test multi-step workflows
Context retention: Ensure agents remember details from earlier
State evolution: Track how agent state changes across interactions
Conversational coherence: Test if agents maintain context appropriately

Format

Single-Turn (Default)

{
  "input": "What is the capital of France?",
  "ground_truth": "Paris"
}

Multi-Turn

{
  "input": [
    "My name is Alice",
    "What's my name?"
  ],
  "ground_truth": "Alice"
}

The agent processes each input in sequence, with state carrying over between turns.

Example 1: Memory Recall Testing

Test if the agent remembers information across turns:

{
  "input": [
    "Remember that my favorite color is blue",
    "What's my favorite color?"
  ],
  "ground_truth": "blue"
}

Suite configuration:

graders:
  response_check:
    kind: tool
    function: contains
    extractor: last_assistant # Check the agent's response

Example 2: Memory Correction Testing

Test if the agent correctly updates memory when users correct themselves:

{
  "input": [
    "Please remember that I like bananas.",
    "Actually, sorry, I meant I like apples."
  ],
  "ground_truth": "apples"
}

Suite configuration:

graders:
  memory_check:
    kind: tool
    function: contains
    extractor: memory_block
    extractor_config:
      block_label: human # Check the actual memory block, not just the response

When to Test Memory Blocks vs. Responses

Use last_assistant or all_assistant extractors when:

Testing what the agent says in conversation
Verifying response content and phrasing
Checking conversational coherence

Use memory_block extractor when:

Verifying information was actually stored in memory
Testing memory updates and corrections
Validating persistent state changes
Ensuring the agent’s internal state is correct

See the multiturn-memory-block-extractor example for a complete working implementation.

Next Steps

Datasets - Creating test datasets
Extractors - Extracting from trajectories
Targets - Agent lifecycle and testing behavior