Streaming agent responses

Messages from the Letta server can be streamed to the client. If you’re building a UI on the Letta API, enabling streaming allows your UI to update in real time as the agent generates a response to an input message.

When working with agents that execute long-running operations (e.g., complex tool calls, extensive searches, or code execution), you may encounter timeouts with the message routes. See our tips on handling long-running tasks for more info.

Quick Start

Letta supports two streaming modes: step streaming (default) and token streaming.

To enable streaming, use the /v1/agents/{agent_id}/messages/stream endpoint instead of /messages:

import { LettaClient } from '@letta-ai/letta-client';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

// Step streaming (default) - returns complete messages
const stream = await client.agents.messages.createStream(
  agent.id, {
    messages: [{role: "user", content: "Hello!"}]
  }
);
for await (const chunk of stream) {
  console.log(chunk); // Complete message objects
}

// Token streaming - returns partial chunks for real-time UX
const tokenStream = await client.agents.messages.createStream(
  agent.id, {
    messages: [{role: "user", content: "Hello!"}],
    streamTokens: true // Enable token streaming
  }
);
for await (const chunk of tokenStream) {
  console.log(chunk); // Partial content chunks
}

Streaming Modes Comparison

| Aspect | Step Streaming (default) | Token Streaming |
| --- | --- | --- |
| What you get | Complete messages after each step | Partial chunks as tokens generate |
| When to use | Simple implementation | ChatGPT-like real-time UX |
| Reassembly needed | No | Yes (by message ID) |
| Message IDs | Unique per message | Same ID across chunks |
| Content format | Full text in each message | Incremental text pieces |
| Enable with | Default behavior | stream_tokens: true |

Understanding Message Flow

Message Types and Flow Patterns

The messages you receive depend on your agent’s configuration:

With reasoning enabled (default):

  • Simple response: reasoning_message → assistant_message
  • With tool use: reasoning_message → tool_call_message → tool_return_message → reasoning_message → assistant_message

With reasoning disabled (reasoning=false):

  • Simple response: assistant_message
  • With tool use: tool_call_message → tool_return_message → assistant_message
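
To observe these flow patterns directly, a small sketch like the one below records the ordered message types emitted during a single turn. It reuses the streaming call from the Quick Start; the agent.id reference and the example prompt are placeholders, and whether tool events appear depends on the tools attached to your agent.

import { LettaClient } from '@letta-ai/letta-client';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

// Record the order of message types emitted during one agent turn
const flow: string[] = [];

const stream = await client.agents.messages.createStream(
  agent.id, {
    messages: [{role: "user", content: "Look up today's date and greet me"}]
  }
);

for await (const chunk of stream) {
  if ((chunk as any).messageType) {
    flow.push((chunk as any).messageType);
  }
}

// With reasoning enabled and a tool call, expect something like:
// reasoning_message → tool_call_message → tool_return_message → reasoning_message → assistant_message
// (the stream also ends with stop_reason and usage_statistics events)
console.log(flow.join(' → '));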

Message Type Reference

  • reasoning_message: Agent’s internal thinking process (only when reasoning=true)
  • assistant_message: The actual response shown to the user
  • tool_call_message: Request to execute a tool
  • tool_return_message: Result from tool execution
  • stop_reason: Indicates end of response (end_turn)
  • usage_statistics: Token usage and step count metrics
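
A single loop that branches on the message type can cover all of these events. The sketch below follows the field names used elsewhere on this page (reasoning, content); the fields on tool call and tool return events aren’t shown here, so it simply logs those chunks whole, and agent.id is assumed to reference an existing agent.

import { LettaClient } from '@letta-ai/letta-client';
import type { LettaMessage } from '@letta-ai/letta-client/api/types';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

const stream = await client.agents.messages.createStream(
  agent.id, {
    messages: [{role: "user", content: "What's the weather like?"}]
  }
);

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  const msgType = chunk.messageType as string;
  switch (msgType) {
    case 'reasoning_message':
      console.log(`Thinking: ${(chunk as any).reasoning}`);
      break;
    case 'assistant_message':
      console.log(`Assistant: ${(chunk as any).content}`);
      break;
    case 'tool_call_message':
    case 'tool_return_message':
      // Tool event fields aren't documented above, so log the raw chunk
      console.log(JSON.stringify(chunk));
      break;
    case 'stop_reason':
    case 'usage_statistics':
      // End-of-turn bookkeeping; nothing to render
      break;
  }
}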

Controlling Reasoning Messages

// With reasoning (default) - includes reasoning_message events
const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  // reasoning: true is the default
});

// Without reasoning - no reasoning_message events
const agentNoReasoning = await client.agents.create({
  model: "openai/gpt-4o-mini",
  reasoning: false // Disable reasoning messages
});

Step Streaming (Default)

Step streaming delivers complete messages after each agent step completes. This is the default behavior when you use the streaming endpoint.

How It Works

  1. Agent processes your request through steps (reasoning, tool calls, generating responses)
  2. After each step completes, you receive a complete LettaMessage via SSE
  3. Each message can be processed immediately without reassembly

Example

import { LettaClient } from '@letta-ai/letta-client';
import type { LettaMessage } from '@letta-ai/letta-client/api/types';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

const stream = await client.agents.messages.createStream(
  agent.id, {
    messages: [{role: "user", content: "What's 2+2?"}]
  }
);

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.messageType === 'reasoning_message') {
    console.log(`Thinking: ${(chunk as any).reasoning}`);
  } else if (chunk.messageType === 'assistant_message') {
    console.log(`Response: ${(chunk as any).content}`);
  }
}

Example Output

data: {"id":"msg-123","message_type":"reasoning_message","reasoning":"User is asking a simple math question."}
data: {"id":"msg-456","message_type":"assistant_message","content":"2 + 2 equals 4!"}
data: {"message_type":"stop_reason","stop_reason":"end_turn"}
data: {"message_type":"usage_statistics","completion_tokens":50,"total_tokens":2821}
data: [DONE]

Token Streaming

Token streaming provides partial content chunks as they’re generated by the LLM, enabling a ChatGPT-like experience where text appears character by character.

How It Works

  1. Set stream_tokens: true in your request
  2. Receive multiple chunks with the same message ID
  3. Each chunk contains a piece of the content
  4. Client must accumulate chunks by ID to rebuild complete messages

Example with Reassembly

import { LettaClient } from '@letta-ai/letta-client';
import type { LettaMessage } from '@letta-ai/letta-client/api/types';

const client = new LettaClient({ token: 'YOUR_API_KEY' });

// Token streaming with reassembly
interface MessageAccumulator {
  type: string;
  content: string;
}

const messageAccumulators = new Map<string, MessageAccumulator>();

const stream = await client.agents.messages.createStream(
  agent.id, {
    messages: [{role: "user", content: "Tell me a joke"}],
    streamTokens: true // Note: camelCase
  }
);

for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.id && chunk.messageType) {
    const msgId = chunk.id;
    const msgType = chunk.messageType;

    // Initialize accumulator for new messages
    if (!messageAccumulators.has(msgId)) {
      messageAccumulators.set(msgId, {
        type: msgType,
        content: ''
      });
    }

    const acc = messageAccumulators.get(msgId)!;

    // Extract this chunk's partial content based on message type,
    // only if the type matches (in case types share IDs)
    let piece = '';
    if (acc.type === msgType) {
      if (msgType === 'reasoning_message') {
        piece = (chunk as any).reasoning || '';
      } else if (msgType === 'assistant_message') {
        piece = (chunk as any).content || '';
      }
    }
    acc.content += piece;

    // Write only the new piece to the terminal; a UI would instead
    // re-render with the accumulated acc.content
    process.stdout.write(piece);
  }
}

Example Output

# Same ID across chunks of the same message
data: {"id":"msg-abc","message_type":"assistant_message","content":"Why"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" did"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" the"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" scarecrow"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" win"}
# ... more chunks with same ID
data: [DONE]

Implementation Tips

Universal Handling Pattern

The accumulator pattern shown above works for both streaming modes:

  • Step streaming: Each message is complete (single chunk per ID)
  • Token streaming: Multiple chunks per ID need accumulation

This means you can write your client code once to handle both cases.
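
As a concrete sketch of that idea, a single accumulate-by-ID helper can back both modes; the handleChunk name below is ours, not part of the SDK. With step streaming each ID simply arrives as one complete chunk, so the same code path applies unchanged.

interface MessageAccumulator {
  type: string;
  content: string;
}

const accumulators = new Map<string, MessageAccumulator>();

// Works for both modes: one complete chunk per ID (step streaming)
// or many partial chunks per ID (token streaming)
function handleChunk(chunk: any): void {
  if (!chunk.id || !chunk.messageType) return; // skip stop_reason / usage_statistics

  if (!accumulators.has(chunk.id)) {
    accumulators.set(chunk.id, { type: chunk.messageType, content: '' });
  }
  const acc = accumulators.get(chunk.id)!;

  if (acc.type === 'reasoning_message') {
    acc.content += chunk.reasoning || '';
  } else if (acc.type === 'assistant_message') {
    acc.content += chunk.content || '';
  }

  // Re-render your UI with acc.content (replace, don't append)
}

// Usage is identical for both modes:
// for await (const chunk of stream) handleChunk(chunk);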

SSE Format Notes

All streaming responses follow the Server-Sent Events (SSE) format:

  • Each event starts with data: followed by JSON
  • Stream ends with data: [DONE]
  • Empty lines separate events

Learn more about SSE format here.
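
If you’re calling the streaming endpoint directly rather than through the SDK, a minimal parsing sketch might look like the following. Assumptions here: the Letta Cloud base URL (substitute your own server’s address for self-hosted deployments), a Bearer token in the Authorization header, and an existing agentId variable; check the API reference for the exact request schema.

// Minimal raw SSE consumption sketch (no SDK)
const response = await fetch(
  `https://api.letta.com/v1/agents/${agentId}/messages/stream`,
  {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.LETTA_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      messages: [{ role: 'user', content: 'Hello!' }],
      stream_tokens: true,
    }),
  },
);

const reader = response.body!.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });

  // Events are separated by blank lines; each data line carries JSON or [DONE]
  const events = buffer.split('\n\n');
  buffer = events.pop() ?? '';
  for (const event of events) {
    for (const line of event.split('\n')) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice('data: '.length);
      if (payload !== '[DONE]') {
        console.log(JSON.parse(payload));
      }
    }
  }
}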

Handling Different LLM Providers

If your Letta server connects to multiple LLM providers, some may not support token streaming. Your client code will still work - the server will fall back to step streaming automatically when token streaming isn’t available.