Low Latency Voice Agents

Any Letta agent can be connected to a voice provider through the voice chat completion endpoint at http://localhost:8283/v1/voice-beta/<AGENT_ID>. For voice applications, however, we recommend the voice_convo_agent agent architecture, which is optimized for low latency.
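As a minimal sketch, the endpoint URL for a given agent can be assembled as shown below. The helper name voice_endpoint_url and the agent ID "agent-123" are hypothetical and used only for illustration. The default base URL is the local server address quoted above, so adjust it if your server runs elsewhere.

```python
from urllib.parse import quote


def voice_endpoint_url(agent_id: str, base_url: str = "http://localhost:8283") -> str:
    """Build the voice chat completion URL for one agent.

    Hypothetical helper: only the URL pattern comes from the documentation.
    """
    if not agent_id:
        raise ValueError("agent_id must be a non-empty string")
    # Percent-encode the ID so unexpected characters cannot alter the path.
    return f"{base_url.rstrip('/')}/v1/voice-beta/{quote(agent_id, safe='')}"


# "agent-123" is a placeholder; use the ID returned when you create your agent.
url = voice_endpoint_url("agent-123")
print(url)
```

The resulting string is what you would enter wherever your voice provider asks for a chat completion URL.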

Creating a latency-optimized voice agent

You can create a latency-optimized voice agent by using the voice_convo_agent agent architecture and setting enable_sleeptime to True.

import os

from letta_client import Letta

# Read the API key from the environment
client = Letta(token=os.getenv("LETTA_API_KEY"))

# Create the Letta agent
agent = client.agents.create(
    agent_type="voice_convo_agent",
    memory_blocks=[
        {"value": "Name: ?", "label": "human"},
        {"value": "You are a helpful assistant.", "label": "persona"},
    ],
    model="openai/gpt-4o-mini",  # Use 4o-mini for speed
    embedding="openai/text-embedding-3-small",
    enable_sleeptime=True,
    initial_message_sequence=[],
)

This creates a low-latency agent with a sleep-time agent that manages memory and rewrites its context in the background. You can attach additional tools and blocks to this agent just as you would to any other Letta agent.

Configuring message buffer size

You can configure the agent's message buffer size, which controls how many messages are kept in the buffer before the oldest ones are evicted. For latency-sensitive applications, we recommend a small buffer size.

You can configure:

  • max_message_buffer_length: the maximum number of messages in the buffer until a compaction (summarization) is triggered
  • min_message_buffer_length: the minimum number of messages to keep in the buffer (to ensure continuity of the conversation)
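To make the interaction between the two limits concrete, here is a toy simulation. It is not Letta's implementation. It assumes that compaction fires once the buffer grows past the maximum, and that compaction keeps only the newest messages up to the minimum while handing the older ones to a summarizer. The MessageBuffer class and its fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MessageBuffer:
    """Toy model of a bounded message buffer (illustrative only, not Letta's code)."""

    max_length: int
    min_length: int
    messages: list = field(default_factory=list)
    evicted: list = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.min_length <= self.max_length:
            raise ValueError("expected 0 <= min_length <= max_length")

    def append(self, message: str) -> bool:
        """Add a message; return True if this append triggered a compaction."""
        self.messages.append(message)
        # Assumption: compaction fires once the buffer grows past max_length.
        if len(self.messages) <= self.max_length:
            return False
        # Keep the newest min_length messages; the rest would be summarized.
        cut = len(self.messages) - self.min_length
        self.evicted.extend(self.messages[:cut])
        self.messages = self.messages[cut:]
        return True


buffer = MessageBuffer(max_length=10, min_length=6)
# Record which appends (1-based) caused a compaction.
compactions = [i for i in range(1, 21) if buffer.append(f"message {i}")]
print(compactions, len(buffer.messages), len(buffer.evicted))
```

Under these assumptions, a smaller gap between the two limits means compaction runs more often but the buffer stays shorter, which is the trade-off to tune for latency-sensitive applications.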

You can configure these parameters in the ADE or from the SDK:

from letta_client import VoiceSleeptimeManagerUpdate

# Get the group and its current buffer settings
group_id = agent.multi_agent_group.id
max_message_buffer_length = agent.multi_agent_group.max_message_buffer_length
min_message_buffer_length = agent.multi_agent_group.min_message_buffer_length
print(f"Group id: {group_id}, max_message_buffer_length: {max_message_buffer_length}, min_message_buffer_length: {min_message_buffer_length}")

# Change the buffer limits so that compaction is triggered more frequently
group = client.groups.modify(
    group_id=group_id,
    manager_config=VoiceSleeptimeManagerUpdate(
        max_message_buffer_length=10,
        min_message_buffer_length=6,
    ),
)

Configuring the sleep-time agent

Voice agents have a sleep-time agent that manages memory and rewrites context in the background. The sleep-time agent can use a different model than the main agent. We recommend a larger model for the sleep-time agent, to improve the quality of the context and memory, and a smaller model for the main voice agent, to minimize latency.

For example, you can configure the sleep-time agent to use claude-sonnet-4 by getting the agent’s ID from the group:

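# Pick the agent in the group that is not the main voice agent
sleeptime_agent_id = [agent_id for agent_id in group.agent_ids if agent_id != agent.id][0]

# Switch the sleep-time agent to a larger model
client.agents.modify(
    agent_id=sleeptime_agent_id,
    model="anthropic/claude-sonnet-4-20250514",
)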
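The list-indexing lookup above raises a bare IndexError if the group holds no other agent. A slightly more defensive variant might look like the sketch below. The helper name find_sleeptime_agent_id is hypothetical, and the sketch assumes the group holds exactly one agent besides the main voice agent. The IDs in the usage line are placeholders.

```python
def find_sleeptime_agent_id(group_agent_ids, main_agent_id):
    """Return the single agent ID in the group that is not the main agent.

    Hypothetical helper; assumes exactly one such agent exists.
    """
    others = [agent_id for agent_id in group_agent_ids if agent_id != main_agent_id]
    if len(others) != 1:
        raise ValueError(
            f"expected exactly one sleep-time agent in the group, found {len(others)}"
        )
    return others[0]


# Placeholder IDs; in practice pass group.agent_ids and agent.id.
sleeptime_agent_id = find_sleeptime_agent_id(["agent-main", "agent-sleep"], "agent-main")
print(sleeptime_agent_id)
```

Failing loudly here is a design choice: it avoids silently changing the model of the wrong agent if the group ever contains a different number of agents than expected.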