Low Latency Voice Agents
All Letta agents can be connected to a voice provider by using the voice chat completion endpoint at http://localhost:8283/v1/voice-beta/<AGENT_ID>. However, for voice applications, we recommend using the voice_convo_agent agent architecture, which is a low-latency architecture optimized for voice.
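Voice providers usually ask for a single endpoint URL per agent. As a minimal sketch, the URL pattern above can be filled in with a small helper. The function name voice_endpoint_url is hypothetical and not part of any Letta SDK; the base URL and path are the ones quoted above, and a hosted deployment would presumably use a different base URL.

```python
from urllib.parse import quote

# Base URL taken from the text above (a local Letta server); adjust for your deployment.
LETTA_BASE_URL = "http://localhost:8283"


def voice_endpoint_url(agent_id: str, base_url: str = LETTA_BASE_URL) -> str:
    """Build the voice chat completion endpoint for one agent (hypothetical helper)."""
    if not agent_id:
        raise ValueError("agent_id must be a non-empty string")
    # Percent-encode the ID so it always occupies exactly one path segment.
    return f"{base_url.rstrip('/')}/v1/voice-beta/{quote(agent_id, safe='')}"


print(voice_endpoint_url("agent-123"))
```

The resulting string is what you would paste into the voice provider's custom endpoint setting.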
Creating a latency-optimized voice agent
You can create a latency-optimized voice agent by using the voice_convo_agent agent architecture and setting enable_sleeptime to True.
This creates a low-latency agent with a sleep-time agent that manages memory and rewrites its context in the background. You can attach additional tools and blocks to this agent just as you would with any other Letta agent.
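The two settings can be gathered into one dictionary before being handed to whatever client you use to create the agent. This is only a sketch: voice_agent_settings is a hypothetical helper, and the key name agent_type for the architecture is an assumption, since the text above names the value voice_convo_agent but not the field that carries it.

```python
import json
from typing import Any, Dict, Optional


def voice_agent_settings(extra: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return creation settings for a latency-optimized voice agent (hypothetical helper)."""
    settings: Dict[str, Any] = dict(extra or {})
    # Applied last so caller-supplied extras cannot override the voice settings.
    settings["agent_type"] = "voice_convo_agent"  # assumed field name
    settings["enable_sleeptime"] = True
    return settings


print(json.dumps(voice_agent_settings({"name": "voice-demo"}), indent=2))
```

Any other options you normally pass when creating an agent (model, memory blocks, tools) would go in the extra dictionary.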
Configuring message buffer size
You can configure the agent's message buffer size, which controls how many messages are kept in the buffer before older ones are evicted. For latency-sensitive applications, we recommend a small buffer size.
You can configure:
max_message_buffer_length: the maximum number of messages in the buffer until a compaction (summarization) is triggered
min_message_buffer_length: the minimum number of messages to keep in the buffer (to ensure continuity of the conversation)
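To see how the two limits interact, here is a toy model of the buffer. It is not Letta's implementation: it assumes compaction fires once the buffer holds more than max_message_buffer_length messages and that it evicts the oldest messages until only min_message_buffer_length remain. The function name compact and the exact trigger condition are assumptions made for illustration.

```python
from typing import List, Tuple


def compact(buffer: List[str], max_len: int, min_len: int) -> Tuple[List[str], List[str]]:
    """Toy compaction step: return (evicted, kept) for the given buffer."""
    if not 0 < min_len <= max_len:
        raise ValueError("require 0 < min_len <= max_len")
    if len(buffer) <= max_len:
        # Still within the limit: nothing is evicted.
        return [], list(buffer)
    # Over the limit: evict the oldest messages, keep the newest min_len.
    cut = len(buffer) - min_len
    return list(buffer[:cut]), list(buffer[cut:])


messages = [f"msg-{i}" for i in range(1, 12)]  # 11 messages, oldest first
evicted, kept = compact(messages, max_len=10, min_len=6)
print(len(evicted), len(kept))  # prints "5 6"
```

In this model the evicted messages are the ones that would be summarized, so a lower maximum means more frequent compaction but a shorter context for the voice agent on each turn.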
You can configure these parameters in the ADE or from the SDK.
Configuring the sleep-time agent
Voice agents have a sleep-time agent that manages memory and rewrites context in the background. The sleep-time agent can use a different model than the main agent. We recommend a larger model for the sleep-time agent to improve context and memory quality, and a smaller model for the main voice agent to minimize latency.
For example, you can configure the sleep-time agent to use claude-sonnet-4 by getting its agent ID from the group and then changing that agent's model.
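The ID lookup can be sketched as a small pure function. It assumes the group exposes a list of member agent IDs and that exactly one of them is not the main voice agent; both the function name sleeptime_agent_id and that shape of the group data are assumptions, so check them against the group object your SDK returns.

```python
from typing import Sequence


def sleeptime_agent_id(main_agent_id: str, group_agent_ids: Sequence[str]) -> str:
    """Pick the sleep-time agent's ID out of a group's member IDs (hypothetical helper)."""
    # Drop the main agent in case the group lists it alongside the sleep-time agent.
    others = [agent_id for agent_id in group_agent_ids if agent_id != main_agent_id]
    if len(others) != 1:
        raise ValueError(f"expected exactly one other agent in the group, found {len(others)}")
    return others[0]


print(sleeptime_agent_id("agent-main", ["agent-main", "agent-sleep"]))  # prints "agent-sleep"
```

With that ID in hand, you would update the sleep-time agent's model the same way you would for any other agent.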