Streaming agent responses
Messages from the Letta server can be streamed to the client. If you’re building a UI on the Letta API, enabling streaming allows your UI to update in real-time as the agent generates a response to an input message.
There are two kinds of streaming you can enable: streaming agent steps and streaming tokens.
To enable streaming (either mode), you need to use the `/v1/agent/messages/stream` API route instead of the `/v1/agent/messages` API route.
Streaming agent steps
When you send a message to the Letta server, the agent may run multiple steps while generating a response. For example, an agent may run a search query, then use the results of that query to generate a response.
When you use the `/messages/stream` route, `stream_steps` is enabled by default, and the response to the `POST` request will stream back as server-sent events (read more about the SSE format here):
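For example, here is a minimal sketch of consuming the step stream with Python and the `requests` library. The base URL, agent ID, message payload shape, and the `/v1/agents/{agent_id}/messages/stream` path form are assumptions based on a typical Letta deployment; check the API reference for your server version:

```python
import json
import requests

BASE_URL = "http://localhost:8283"  # placeholder: your Letta server
AGENT_ID = "agent-123"              # placeholder: your agent's ID

response = requests.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/messages/stream",
    json={"messages": [{"role": "user", "content": "Hello!"}]},
    headers={"Accept": "text/event-stream"},
    stream=True,  # keep the connection open and read the body incrementally
)

# Each SSE event arrives as a "data: ..." line followed by a blank line.
for line in response.iter_lines():
    if line and line.startswith(b"data: "):
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":  # assumed terminal sentinel, as in many SSE APIs
            break
        print(json.loads(payload))
```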
Streaming tokens
You can also stream chunks of tokens from the agent as they are generated by the underlying LLM by setting `stream_tokens` to `true` in your API request:
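Continuing the sketch above, the only change needed is the `stream_tokens` flag in the request body (again, the URL, agent ID, and payload shape are placeholders, not the definitive API surface):

```python
import requests

BASE_URL = "http://localhost:8283"  # placeholder: your Letta server
AGENT_ID = "agent-123"              # placeholder: your agent's ID

response = requests.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/messages/stream",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream_tokens": True,  # request per-token chunks instead of whole messages
    },
    headers={"Accept": "text/event-stream"},
    stream=True,
)
```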
With token streaming enabled, the response looks very similar to the prior example (agent steps streaming), but instead of complete messages, the client receives many messages, each carrying a chunk of the response; most of the chunks are omitted from the example output for brevity. The client is responsible for reassembling the full response from the chunks.
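For example, a client might accumulate the assistant's chunks into a buffer as they arrive and render the joined text at the end. This sketch assumes each event is a JSON-serialized `LettaMessage` with a `message_type` field and a `content` field carrying the text; the exact field names may differ by server version:

```python
import json
import requests

BASE_URL = "http://localhost:8283"  # placeholder: your Letta server
AGENT_ID = "agent-123"              # placeholder: your agent's ID

response = requests.post(
    f"{BASE_URL}/v1/agents/{AGENT_ID}/messages/stream",
    json={
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream_tokens": True,
    },
    headers={"Accept": "text/event-stream"},
    stream=True,
)

assistant_text = []  # accumulates chunks of the assistant's reply

for line in response.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":  # assumed terminal sentinel
        break
    event = json.loads(payload)
    # Field names are illustrative; check the LettaMessage schema for the
    # exact keys your server version emits.
    if event.get("message_type") == "assistant_message":
        assistant_text.append(event.get("content", ""))

print("".join(assistant_text))
```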
Tips on handling streaming in your client code
The data structure for token streaming is the same as for agent steps streaming (`LettaMessage`); the only difference is that instead of returning complete messages, the Letta server returns multiple messages, each with a chunk of the response.
Because the data format is the same, frontend code written to handle token streaming will also work for agent steps streaming.
For example, if the Letta server is connected to multiple LLM backend providers and only a subset of them support LLM token streaming, you can use the same frontend code (interacting with the Letta API) to handle both streaming and non-streaming providers.
If you send a message to an agent with streaming enabled (`stream_tokens` set to `true`), the server will stream back `LettaMessage` objects containing chunks if the selected LLM provider supports token streaming, and `LettaMessage` objects containing complete strings if it does not.
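In practice, this means one event handler can serve both cases: whether an event carries a chunk or a complete string, appending it to a per-message-type buffer yields the full text either way. A minimal sketch, with the same illustrative field names as above:

```python
def handle_event(event: dict, buffers: dict) -> None:
    """Append a complete message or a chunk to the buffer for its type.

    Works unchanged whether the provider streams tokens (many small
    chunks) or not (one complete string per message).
    """
    message_type = event.get("message_type", "unknown")
    buffers.setdefault(message_type, []).append(event.get("content", ""))


# Usage: feed every parsed SSE event through the same handler, then join.
buffers: dict = {}
for event in events:  # events parsed from the SSE stream as in the sketches above
    handle_event(event, buffers)
full_reply = "".join(buffers.get("assistant_message", []))
```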