Streaming agent responses
Messages from the Letta server can be streamed to the client. If you’re building a UI on the Letta API, enabling streaming allows your UI to update in real-time as the agent generates a response to an input message.
When working with agents that execute long-running operations (e.g., complex tool calls, extensive searches, or code execution), you may encounter timeouts with the message routes. See our tips on handling long-running tasks for more info.
Quick Start
Letta supports two streaming modes: step streaming (default) and token streaming.
To enable streaming, use the `/v1/agents/{agent_id}/messages/stream` endpoint instead of `/messages`:
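Below is a minimal sketch that calls the streaming endpoint over raw HTTP with Python's `requests` library. The base URL, agent ID, and request-body shape are placeholders/assumptions; substitute your server's address, credentials, and payload.

```python
import requests

# Assumptions: a locally running Letta server and a placeholder agent ID.
BASE_URL = "http://localhost:8283"
AGENT_ID = "agent-..."

response = requests.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/messages/stream",
    # Assumed body shape: the same message payload the non-streaming /messages route accepts.
    json={"messages": [{"role": "user", "content": "Hello!"}]},
    stream=True,  # keep the connection open so SSE events arrive as they are generated
)

for line in response.iter_lines():
    if line:
        # Each event arrives as 'data: {...}'; the stream ends with 'data: [DONE]'.
        print(line.decode("utf-8"))
```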
Streaming Modes Comparison
Understanding Message Flow
Message Types and Flow Patterns
The messages you receive depend on your agent’s configuration:
With reasoning enabled (default):
- Simple response: `reasoning_message` → `assistant_message`
- With tool use: `reasoning_message` → `tool_call_message` → `tool_return_message` → `reasoning_message` → `assistant_message`

With reasoning disabled (`reasoning=false`):
- Simple response: `assistant_message`
- With tool use: `tool_call_message` → `tool_return_message` → `assistant_message`
Message Type Reference
- `reasoning_message`: Agent's internal thinking process (only when `reasoning=true`)
- `assistant_message`: The actual response shown to the user
- `tool_call_message`: Request to execute a tool
- `tool_return_message`: Result from tool execution
- `stop_reason`: Indicates end of response (`end_turn`)
- `usage_statistics`: Token usage and step count metrics
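As a rough illustration, a client can branch on `message_type` once each event has been parsed from the stream. The payload field names other than `message_type` (`reasoning`, `content`, `tool_call`, `tool_return`) are assumptions about the event shape, so check the actual events your server emits.

```python
def handle_event(event: dict) -> None:
    """Dispatch one parsed streaming event by its message_type."""
    message_type = event.get("message_type")

    if message_type == "reasoning_message":
        print("[thinking]", event.get("reasoning"))       # assumed field name
    elif message_type == "assistant_message":
        print("[assistant]", event.get("content"))        # assumed field name
    elif message_type == "tool_call_message":
        print("[tool call]", event.get("tool_call"))      # assumed field name
    elif message_type == "tool_return_message":
        print("[tool result]", event.get("tool_return"))  # assumed field name
    elif message_type == "stop_reason":
        print("[stop]", event)
    elif message_type == "usage_statistics":
        print("[usage]", event)
```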
Controlling Reasoning Messages
Step Streaming (Default)
Step streaming delivers complete messages after each agent step completes. This is the default behavior when you use the streaming endpoint.
How It Works
- Agent processes your request through steps (reasoning, tool calls, generating responses)
- After each step completes, you receive a complete `LettaMessage` via SSE
- Each message can be processed immediately without reassembly
Example
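A sketch of consuming step streaming over raw HTTP (the base URL, agent ID, and body shape are assumptions). Because step streaming is the default, no `stream_tokens` flag is set, and every `data:` line carries one complete message.

```python
import json
import requests

BASE_URL = "http://localhost:8283"   # assumed local server
AGENT_ID = "agent-..."               # placeholder agent ID

with requests.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/messages/stream",
    json={"messages": [{"role": "user", "content": "What tools do you have?"}]},
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue  # blank lines separate SSE events
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue
        payload = decoded[len("data: "):]
        if payload == "[DONE]":
            break
        message = json.loads(payload)
        # In step streaming each event is a complete LettaMessage,
        # so it can be rendered as soon as it arrives -- no reassembly needed.
        print(message.get("message_type"), message)
```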
Example Output
Token Streaming
Token streaming provides partial content chunks as they’re generated by the LLM, enabling a ChatGPT-like experience where text appears character by character.
How It Works
- Set `stream_tokens: true` in your request
- Receive multiple chunks with the same message ID
- Each chunk contains a piece of the content
- Client must accumulate chunks by ID to rebuild complete messages
Example with Reassembly
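A sketch of the reassembly loop, assuming the same endpoint and body shape as above with `stream_tokens: true`. The chunk fields used for accumulation (`id` for the message ID, `content` for the partial text) are assumptions for illustration.

```python
import json
from collections import defaultdict

import requests

BASE_URL = "http://localhost:8283"   # assumed local server
AGENT_ID = "agent-..."               # placeholder agent ID

response = requests.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/messages/stream",
    json={
        "messages": [{"role": "user", "content": "Write me a short poem."}],
        "stream_tokens": True,  # enable token streaming
    },
    stream=True,
)

# Accumulate partial chunks keyed by message ID until the stream ends.
accumulated = defaultdict(str)

for line in response.iter_lines():
    if not line:
        continue
    decoded = line.decode("utf-8")
    if not decoded.startswith("data: "):
        continue
    payload = decoded[len("data: "):]
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    if chunk.get("message_type") == "assistant_message":
        piece = chunk.get("content") or ""      # assumed field holding the partial text
        accumulated[chunk["id"]] += piece       # assumed field holding the message ID
        print(piece, end="", flush=True)        # render text as it grows, ChatGPT-style

print()
# After [DONE], each entry in `accumulated` is a fully reassembled message.
for message_id, text in accumulated.items():
    print(f"{message_id}: {text!r}")
```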
Example Output
Implementation Tips
Universal Handling Pattern
The accumulator pattern shown above works for both streaming modes:
- Step streaming: Each message is complete (single chunk per ID)
- Token streaming: Multiple chunks per ID need accumulation
This means you can write your client code once to handle both cases.
SSE Format Notes
All streaming responses follow the Server-Sent Events (SSE) format:
- Each event starts with `data: ` followed by JSON
- Stream ends with `data: [DONE]`
- Empty lines separate events
Learn more about SSE format here.
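For reference, a minimal parsing helper built on these rules might look like the sketch below. It assumes each event's JSON arrives on a single `data:` line, as in the examples above.

```python
import json
from typing import Iterator

def iter_sse_events(response) -> Iterator[dict]:
    """Yield parsed JSON payloads from a streaming requests.Response."""
    for line in response.iter_lines():
        if not line:
            continue  # empty lines separate events
        decoded = line.decode("utf-8")
        if not decoded.startswith("data: "):
            continue
        payload = decoded[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(payload)
```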
Handling Different LLM Providers
If your Letta server connects to multiple LLM providers, some may not support token streaming. Your client code will still work: the server falls back to step streaming automatically when token streaming isn't available.