---
title: Streaming | Letta Docs
description: Learn how to stream messages and tokens in the Letta API
---

Messages from the Letta API can be streamed to the client. If you’re building a UI on the Letta API, enabling streaming lets your UI update in real time as the agent generates a response to an input message.

Streaming is enabled by default when using the [Letta Code SDK](/letta-code-sdk/quickstart/index.md).

When using agents that execute long-running operations (e.g., 1 minute or more), you may encounter timeouts with the default message routes. See our [tips on handling long-running executions](/guides/core-concepts/messages/long-running-executions/index.md) for more info.

## Quickstart

Letta supports two streaming modes: **step streaming** (default) and **token streaming**. To enable step streaming, set the `streaming` flag to `true` in the [`agent.messages.create` route](/api/resources/agents/subresources/messages/methods/create/index.md). To enable token streaming, also set `stream_tokens` to `true`.

```typescript
import Letta from "@letta-ai/letta-client";


const client = new Letta({ apiKey: process.env.LETTA_API_KEY });


// Step streaming (default) - returns complete messages
const stream = await client.agents.messages.create(agent.id, {
  messages: [{ role: "user", content: "Hello!" }],
  streaming: true,
});
for await (const chunk of stream) {
  console.log(chunk); // Complete message objects
}


// Token streaming - returns partial chunks for real-time UX
const tokenStream = await client.agents.messages.create(agent.id, {
  messages: [{ role: "user", content: "Hello!" }],
  streaming: true,
  stream_tokens: true, // Enable token streaming
});
for await (const chunk of tokenStream) {
  console.log(chunk); // Partial content chunks
}
```

```python
# Step streaming (default) - returns complete messages
stream = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}],
    streaming=True
)
for chunk in stream:
    print(chunk)  # Complete message objects


# Token streaming - returns partial chunks for real-time UX
stream = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Hello!"}],
    streaming=True,
    stream_tokens=True  # Enable token streaming
)
for chunk in stream:
    print(chunk)  # Partial content chunks
```

## Streaming Modes Comparison

| Aspect                | Step Streaming (default)          | Token Streaming                   |
| --------------------- | --------------------------------- | --------------------------------- |
| **What you get**      | Complete messages after each step | Partial chunks as tokens generate |
| **When to use**       | Simple implementation             | ChatGPT-like real-time UX         |
| **Reassembly needed** | No                                | Yes (by message ID)               |
| **Message IDs**       | Unique per message                | Same ID across chunks             |
| **Content format**    | Full text in each message         | Incremental text pieces           |
| **Enable with**       | Default behavior                  | `stream_tokens: true`             |

## Understanding Message Flow

### Message Types and Flow Patterns

The messages you receive depend on your agent’s configuration:

**With reasoning enabled (default):**

- Simple response: `reasoning_message` → `assistant_message`
- With tool use: `reasoning_message` → `tool_call_message` → `tool_return_message` → `reasoning_message` → `assistant_message`

**With reasoning disabled (`reasoning=false`):**

- Simple response: `assistant_message`
- With tool use: `tool_call_message` → `tool_return_message` → `assistant_message`

### Message Type Reference

- **`reasoning_message`**: Agent’s internal thinking process (only when `reasoning=true`)
- **`assistant_message`**: The actual response shown to the user
- **`tool_call_message`**: Request to execute a tool
- **`tool_return_message`**: Result from tool execution
- **`stop_reason`**: Indicates end of response (`end_turn`)
- **`usage_statistics`**: Token usage and step count metrics
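
To make the reference above concrete, here is a minimal sketch of routing streamed events by `message_type`. It works on plain dictionaries shaped like the example outputs on this page rather than on SDK objects, and the function name `summarize_event` is a hypothetical helper, not part of the Letta SDK. Payload fields for tool messages are not shown on this page, so the sketch deliberately does not read them.

```python
from typing import Any


def summarize_event(event: dict[str, Any]) -> str:
    """Return a one-line summary of a streamed event (hypothetical helper)."""
    kind = event.get("message_type")
    if kind == "reasoning_message":
        return f"Thinking: {event.get('reasoning', '')}"
    if kind == "assistant_message":
        return f"Response: {event.get('content', '')}"
    if kind == "tool_call_message":
        # Payload fields are not documented here, so only the type is reported.
        return "Tool call requested"
    if kind == "tool_return_message":
        return "Tool returned"
    if kind == "stop_reason":
        return f"Stopped: {event.get('stop_reason')}"
    if kind == "usage_statistics":
        return f"Usage: {event.get('total_tokens')} total tokens"
    # Unknown types are reported rather than dropped silently.
    return f"Unhandled event type: {kind}"


if __name__ == "__main__":
    sample_events = [
        {"id": "msg-123", "message_type": "reasoning_message",
         "reasoning": "User is asking a simple math question."},
        {"id": "msg-456", "message_type": "assistant_message",
         "content": "2 + 2 equals 4!"},
        {"message_type": "stop_reason", "stop_reason": "end_turn"},
    ]
    for event in sample_events:
        print(summarize_event(event))
```

Keeping an explicit fallback branch means that an event type the client does not recognize shows up in logs instead of disappearing.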

### Controlling Reasoning Messages

```typescript
// With reasoning (default) - includes reasoning_message events
const agent = await client.agents.create({
  model: "openai/gpt-4o-mini",
  // reasoning: true is the default
});


// Without reasoning - no reasoning_message events
const agentNoReasoning = await client.agents.create({
  model: "openai/gpt-4o-mini",
  reasoning: false, // Disable reasoning messages
});
```

```python
# With reasoning (default) - includes reasoning_message events
agent = client.agents.create(
    model="openai/gpt-4o-mini",
    # reasoning=True is the default
)


# Without reasoning - no reasoning_message events
agent = client.agents.create(
    model="openai/gpt-4o-mini",
    reasoning=False  # Disable reasoning messages
)
```

## Step Streaming (Default)

Step streaming delivers **complete messages** after each agent step completes. This is the default behavior when you use the streaming endpoint.

### How It Works

1. Agent processes your request through steps (reasoning, tool calls, generating responses)
2. After each step completes, you receive a complete `LettaMessage` via SSE
3. Each message can be processed immediately without reassembly

### Example

```typescript
import Letta from "@letta-ai/letta-client";
import type { LettaMessage } from "@letta-ai/letta-client/api/types";


const client = new Letta({ apiKey: process.env.LETTA_API_KEY });


const stream = await client.agents.messages.stream(agent.id, {
  messages: [{ role: "user", content: "What's 2+2?" }],
});


for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.message_type === "reasoning_message") {
    console.log(`Thinking: ${(chunk as any).reasoning}`);
  } else if (chunk.message_type === "assistant_message") {
    console.log(`Response: ${(chunk as any).content}`);
  }
}
```

```python
stream = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "What's 2+2?"}],
    streaming=True
)


for chunk in stream:
    if hasattr(chunk, 'message_type'):
        if chunk.message_type == 'reasoning_message':
            print(f"Thinking: {chunk.reasoning}")
        elif chunk.message_type == 'assistant_message':
            print(f"Response: {chunk.content}")
```

```bash
curl -N --request POST \
  --url https://api.letta.com/v1/agents/$AGENT_ID/messages/stream \
  --header "Authorization: Bearer $LETTA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{"messages": [{"role": "user", "content": "What is 2+2?"}]}'


# For self-hosted: Replace https://api.letta.com with http://localhost:8283
```

### Example Output

```
data: {"id":"msg-123","message_type":"reasoning_message","reasoning":"User is asking a simple math question."}
data: {"id":"msg-456","message_type":"assistant_message","content":"2 + 2 equals 4!"}
data: {"message_type":"stop_reason","stop_reason":"end_turn"}
data: {"message_type":"usage_statistics","completion_tokens":50,"total_tokens":2821}
data: [DONE]
```

## Token Streaming

Token streaming provides **partial content chunks** as the LLM generates them, enabling a ChatGPT-like experience where text appears incrementally, a few tokens at a time.

### How It Works

1. Set `stream_tokens: true` in your request
2. Receive multiple chunks with the **same message ID**
3. Each chunk contains a piece of the content
4. Client must accumulate chunks by ID to rebuild complete messages

### Example with Reassembly

```typescript
import Letta from "@letta-ai/letta-client";
import type { LettaMessage } from "@letta-ai/letta-client/api/types";


const client = new Letta({ apiKey: process.env.LETTA_API_KEY });


// Token streaming with reassembly
interface MessageAccumulator {
  type: string;
  content: string;
}


const messageAccumulators = new Map<string, MessageAccumulator>();


const stream = await client.agents.messages.stream(agent.id, {
  messages: [{ role: "user", content: "Tell me a joke" }],
  stream_tokens: true,
});


for await (const chunk of stream as AsyncIterable<LettaMessage>) {
  if (chunk.id && chunk.message_type) {
    const msgId = chunk.id;
    const msgType = chunk.message_type;


    // Initialize accumulator for new messages
    if (!messageAccumulators.has(msgId)) {
      messageAccumulators.set(msgId, {
        type: msgType,
        content: "",
      });
    }


    // Accumulate content based on message type
    const acc = messageAccumulators.get(msgId)!;


    // Only accumulate if the type matches (in case types share IDs)
    let delta = "";
    if (acc.type === msgType) {
      if (msgType === "reasoning_message") {
        delta = (chunk as any).reasoning || "";
      } else if (msgType === "assistant_message") {
        delta = (chunk as any).content || "";
      }
      acc.content += delta;
    }


    // Write only the new piece; acc.content holds the full text so far
    process.stdout.write(delta);
  }
}
```

```python
# Token streaming with reassembly
message_accumulators = {}


stream = client.agents.messages.create(
    agent_id=agent.id,
    messages=[{"role": "user", "content": "Tell me a joke"}],
    streaming=True,
    stream_tokens=True
)


for chunk in stream:
    if hasattr(chunk, 'id') and hasattr(chunk, 'message_type'):
        msg_id = chunk.id
        msg_type = chunk.message_type


        # Initialize accumulator for new messages
        if msg_id not in message_accumulators:
            message_accumulators[msg_id] = {
                'type': msg_type,
                'content': ''
            }


        # Accumulate content (the field may be missing or None on some chunks)
        delta = ''
        if msg_type == 'reasoning_message':
            delta = getattr(chunk, 'reasoning', None) or ''
        elif msg_type == 'assistant_message':
            delta = getattr(chunk, 'content', None) or ''
        message_accumulators[msg_id]['content'] += delta


        # Print only the new piece; the accumulator holds the full text so far
        print(delta, end='', flush=True)
```

```bash
curl -N --request POST \
  --url https://api.letta.com/v1/agents/$AGENT_ID/messages/stream \
  --header "Authorization: Bearer $LETTA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "messages": [{"role": "user", "content": "Tell me a joke"}],
    "stream_tokens": true
  }'
```

### Example Output

```
# Same ID across chunks of the same message
data: {"id":"msg-abc","message_type":"assistant_message","content":"Why"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" did"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" the"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" scarecrow"}
data: {"id":"msg-abc","message_type":"assistant_message","content":" win"}
# ... more chunks with same ID
data: [DONE]
```

## Implementation Tips

### Universal Handling Pattern

The accumulator pattern shown above works for **both** streaming modes:

- **Step streaming**: Each message is complete (single chunk per ID)
- **Token streaming**: Multiple chunks per ID need accumulation

This means you can write your client code once to handle both cases.
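
As a rough illustration of that claim, the sketch below runs one accumulator over both kinds of input. It uses plain dictionaries shaped like the example outputs on this page instead of SDK objects, and `accumulate` is a hypothetical helper name, not part of the Letta SDK.

```python
from typing import Any, Iterable


def accumulate(chunks: Iterable[dict[str, Any]]) -> dict[str, dict[str, str]]:
    """Group chunks by message ID and concatenate their text (hypothetical helper)."""
    messages: dict[str, dict[str, str]] = {}
    for chunk in chunks:
        msg_id = chunk.get("id")
        msg_type = chunk.get("message_type")
        if not msg_id or not msg_type:
            # Events without an ID (stop_reason and usage_statistics in the
            # example output) are not part of any message.
            continue
        if msg_type == "reasoning_message":
            piece = chunk.get("reasoning") or ""
        elif msg_type == "assistant_message":
            piece = chunk.get("content") or ""
        else:
            piece = ""
        acc = messages.setdefault(msg_id, {"type": msg_type, "content": ""})
        acc["content"] += piece
    return messages


if __name__ == "__main__":
    # Step streaming: one complete chunk per ID.
    step_chunks = [
        {"id": "msg-456", "message_type": "assistant_message",
         "content": "2 + 2 equals 4!"},
        {"message_type": "stop_reason", "stop_reason": "end_turn"},
    ]
    # Token streaming: several partial chunks sharing one ID.
    token_chunks = [
        {"id": "msg-abc", "message_type": "assistant_message", "content": "Why"},
        {"id": "msg-abc", "message_type": "assistant_message", "content": " did"},
        {"id": "msg-abc", "message_type": "assistant_message", "content": " the"},
    ]
    print(accumulate(step_chunks))
    print(accumulate(token_chunks))
```

With step streaming each ID is seen once, so the accumulator simply stores the full text; with token streaming the same code path concatenates the pieces.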

### SSE Format Notes

All streaming responses follow the Server-Sent Events (SSE) format:

- Each event starts with `data: ` followed by JSON
- Stream ends with `data: [DONE]`
- Empty lines separate events

For more on the SSE format, see MDN's guide to [using server-sent events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events).
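
If you consume the raw HTTP stream instead of using an SDK, the notes above translate into a small parser. The following is a minimal sketch that assumes each event is a single `data:` line holding one JSON object, as in the example outputs on this page; the general SSE format also allows multi-line `data` fields and other field names, which this sketch ignores. The name `iter_sse_events` is hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield parsed JSON payloads from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            # Skip the empty lines that separate events, and any other fields.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # End of stream
        yield json.loads(payload)


if __name__ == "__main__":
    raw_lines = [
        'data: {"id":"msg-456","message_type":"assistant_message","content":"2 + 2 equals 4!"}\n',
        "\n",
        'data: {"message_type":"stop_reason","stop_reason":"end_turn"}\n',
        "\n",
        "data: [DONE]\n",
    ]
    for event in iter_sse_events(raw_lines):
        print(event["message_type"])
```

Any iterable of text lines can be passed in, for example the decoded line iterator of an HTTP client's streaming response.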

### Handling Different LLM Providers

If your Letta server connects to multiple LLM providers, some may not support token streaming. Your client code will still work: the server falls back to step streaming automatically when token streaming isn’t available.
