Streaming agent responses

Messages from the Letta server can be streamed to the client. If you’re building a UI on the Letta API, enabling streaming allows your UI to update in real time as the agent generates its response to an input message.

When working with agents that execute long-running operations (e.g., complex tool calls, extensive searches, or code execution), you may encounter timeouts with the message routes. See our tips on handling long-running tasks for more info.

Quick Start

Letta supports two streaming modes: step streaming (default) and token streaming.
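The examples in this guide assume you already have a client and an agent. A minimal setup might look like the following; this is a sketch that assumes the letta_client Python SDK and a locally running Letta server, so adjust the base URL (or authenticate with an API token) for your own deployment:

from letta_client import Letta

# Assumption: letta-client SDK pointed at a local Letta server
client = Letta(base_url="http://localhost:8283")

# Any existing agent works; this mirrors the creation call shown later in this guide
agent = client.agents.create(model="openai/gpt-4o-mini")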

To enable streaming, use the /v1/agents/{agent_id}/messages/stream endpoint instead of /messages:

# Step streaming (default) - returns complete messages
stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}]
)
for chunk in stream:
    print(chunk)  # Complete message objects

# Token streaming - returns partial chunks for real-time UX
stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}],
    stream_tokens=True  # Enable token streaming
)
for chunk in stream:
    print(chunk)  # Partial content chunks

Streaming Modes Comparison

| Aspect            | Step Streaming (default)          | Token Streaming                   |
|-------------------|-----------------------------------|-----------------------------------|
| What you get      | Complete messages after each step | Partial chunks as tokens generate |
| When to use       | Simple implementation             | ChatGPT-like real-time UX         |
| Reassembly needed | No                                | Yes (by message ID)               |
| Message IDs       | Unique per message                | Same ID across chunks             |
| Content format    | Full text in each message         | Incremental text pieces           |
| Enable with       | Default behavior                  | stream_tokens: true               |

Understanding Message Flow

Message Types and Flow Patterns

The messages you receive depend on your agent’s configuration:

With reasoning enabled (default):

  • Simple response: reasoning_message → assistant_message
  • With tool use: reasoning_message → tool_call_message → tool_return_message → reasoning_message → assistant_message

With reasoning disabled (reasoning=false):

  • Simple response: assistant_message
  • With tool use: tool_call_message → tool_return_message → assistant_message

Message Type Reference

  • reasoning_message: Agent’s internal thinking process (only when reasoning=true)
  • assistant_message: The actual response shown to the user
  • tool_call_message: Request to execute a tool
  • tool_return_message: Result from tool execution
  • stop_reason: Indicates end of response (end_turn)
  • usage_statistics: Token usage and step count metrics
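
Putting the flow patterns and the type reference together, a stream consumer can dispatch on message_type. The sketch below uses the step-streaming endpoint from the Quick Start; the reasoning and content fields match the examples in this guide, while the tool-related chunks are printed whole because their exact attribute names may vary by SDK version.

stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Look something up for me"}]
)

for chunk in stream:
    msg_type = getattr(chunk, "message_type", None)
    if msg_type == "reasoning_message":
        print(f"[thinking] {chunk.reasoning}")
    elif msg_type == "tool_call_message":
        print(f"[tool call] {chunk}")      # request to execute a tool
    elif msg_type == "tool_return_message":
        print(f"[tool result] {chunk}")    # result from tool execution
    elif msg_type == "assistant_message":
        print(f"[assistant] {chunk.content}")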

Controlling Reasoning Messages

# With reasoning (default) - includes reasoning_message events
agent = client.agents.create(
    model="openai/gpt-4o-mini",
    # reasoning=True is the default
)

# Without reasoning - no reasoning_message events
agent = client.agents.create(
    model="openai/gpt-4o-mini",
    reasoning=False  # Disable reasoning messages
)

Step Streaming (Default)

Step streaming delivers complete messages after each agent step completes. This is the default behavior when you use the streaming endpoint.

How It Works

  1. Agent processes your request through steps (reasoning, tool calls, generating responses)
  2. After each step completes, you receive a complete LettaMessage via SSE
  3. Each message can be processed immediately without reassembly

Example

stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What's 2+2?"}]
)

for chunk in stream:
    if hasattr(chunk, 'message_type'):
        if chunk.message_type == 'reasoning_message':
            print(f"Thinking: {chunk.reasoning}")
        elif chunk.message_type == 'assistant_message':
            print(f"Response: {chunk.content}")

Example Output

data: {"id":"msg-123","message_type":"reasoning_message","reasoning":"User is asking a simple math question."}
data: {"id":"msg-456","message_type":"assistant_message","content":"2 + 2 equals 4!"}
data: {"message_type":"stop_reason","stop_reason":"end_turn"}
data: {"message_type":"usage_statistics","completion_tokens":50,"total_tokens":2821}
data: [DONE]
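
The example code above only prints reasoning and assistant messages; the stop_reason and usage_statistics events at the end of the stream can be used to detect completion and track token spend. A small extension of the same loop, with field names taken from the example output above:

for chunk in stream:
    msg_type = getattr(chunk, "message_type", None)
    if msg_type == "assistant_message":
        print(f"Response: {chunk.content}")
    elif msg_type == "stop_reason":
        print(f"Finished: {chunk.stop_reason}")       # e.g. "end_turn"
    elif msg_type == "usage_statistics":
        print(f"Total tokens: {chunk.total_tokens}")  # field shown in the output above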

Token Streaming

Token streaming provides partial content chunks as they’re generated by the LLM, enabling a ChatGPT-like experience where text appears character by character.

How It Works

  1. Set stream_tokens: true in your request
  2. Receive multiple chunks with the same message ID
  3. Each chunk contains a piece of the content
  4. Client must accumulate chunks by ID to rebuild complete messages

Example with Reassembly

# Token streaming with reassembly
message_accumulators = {}

stream = client.agents.messages.create_stream(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream_tokens=True
)

for chunk in stream:
    if hasattr(chunk, 'id') and hasattr(chunk, 'message_type'):
        msg_id = chunk.id
        msg_type = chunk.message_type

        # Initialize accumulator for new messages
        if msg_id not in message_accumulators:
            message_accumulators[msg_id] = {
                'type': msg_type,
                'content': ''
            }

        # Accumulate content and print each new piece as it arrives
        if msg_type == 'reasoning_message':
            message_accumulators[msg_id]['content'] += chunk.reasoning
            print(chunk.reasoning, end='', flush=True)
        elif msg_type == 'assistant_message':
            message_accumulators[msg_id]['content'] += chunk.content
            print(chunk.content, end='', flush=True)

Example Output

# Same ID across chunks of the same message
data: {"id":"msg-abc","message_type":"assistant_message","content":"Why"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" did"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" the"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" scarecrow"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" win"}
# ... more chunks with same ID
data: [DONE]

Implementation Tips

Universal Handling Pattern

The accumulator pattern shown above works for both streaming modes:

  • Step streaming: Each message is complete (single chunk per ID)
  • Token streaming: Multiple chunks per ID need accumulation

This means you can write your client code once to handle both cases.
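
As a sketch of that idea, the accumulator from the token-streaming example can be factored into a mode-agnostic helper. The function name and return shape are illustrative, not part of the SDK:

def accumulate_stream(stream):
    """Collect a message stream (step or token mode) into complete
    messages keyed by message ID. Illustrative helper, not an SDK API."""
    messages = {}
    for chunk in stream:
        if not (hasattr(chunk, "id") and hasattr(chunk, "message_type")):
            continue  # stop_reason / usage_statistics events carry no id
        entry = messages.setdefault(chunk.id, {"type": chunk.message_type, "content": ""})
        if chunk.message_type == "reasoning_message":
            entry["content"] += chunk.reasoning
        elif chunk.message_type == "assistant_message":
            entry["content"] += chunk.content
    return messages

# Works unchanged for both modes: step streams yield one chunk per ID,
# token streams yield many chunks that share an ID.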

SSE Format Notes

All streaming responses follow the Server-Sent Events (SSE) format:

  • Each event starts with data: followed by JSON
  • Stream ends with data: [DONE]
  • Empty lines separate events

Learn more about SSE format here.
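
If you’re not using an SDK, the same stream can be consumed directly over HTTP. The sketch below uses the requests library; the base URL, port, and placeholder agent ID are assumptions to adapt for your deployment, while the request-body field names mirror the SDK parameters shown earlier:

import json
import requests

agent_id = "your-agent-id"  # placeholder
url = f"http://localhost:8283/v1/agents/{agent_id}/messages/stream"
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream_tokens": True,
}

with requests.post(url, json=payload, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # blank lines separate SSE events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end of stream
        event = json.loads(data)
        print(event.get("message_type"), event.get("content", ""))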

Handling Different LLM Providers

If your Letta server connects to multiple LLM providers, some may not support token streaming. Your client code will still work: the server automatically falls back to step streaming when token streaming isn’t available.