Low-latency Agents
Agents optimized for low-latency environments like voice
Low-latency agents optimize for minimal response time by using a constrained context window and aggressive memory management. They’re ideal for real-time applications like voice interfaces where latency matters more than context retention.
Architecture
Low-latency agents use a much smaller context window than standard MemGPT agents. This reduces time-to-first-token, but at the cost of a shorter conversation history and smaller memory blocks. A sleep-time agent aggressively manages memory so that only the most relevant information stays in context.
Key differences from MemGPT v2:
- Artificially constrained context window for faster response times
- More aggressive memory management with smaller memory blocks
- Optimized sleep-time agent tuned for minimal context size
- Prioritizes speed over comprehensive context retention
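To make the trade-off concrete, here is a toy sketch of what a constrained context window implies: only the newest messages that fit a small budget stay in context. This is an illustration only, not the actual memory-management logic; the function name trim_to_budget and the whitespace-based token estimate are assumptions made for the example.

```python
def trim_to_budget(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined size fits within max_tokens.

    Hypothetical helper for illustration; token cost is approximated by
    counting whitespace-separated words, not by a real tokenizer.
    """
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest so recent turns are retained first.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > max_tokens:
            break  # older messages no longer fit the budget
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

A smaller budget means fewer tokens to process before the first response token, which is why older history is dropped rather than kept in context.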
To learn more about how to use low-latency agents for voice applications, see our Voice Agents guide.
Creating Low-latency Agents
Use the voice_convo_agent agent type to create a low-latency agent.
Set enable_sleeptime to true to enable the sleep-time agent, which manages the memory state of the low-latency agent in the background.
Additionally, set initial_message_sequence to an empty array so that the conversation starts with a completely empty message buffer.
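The three settings above can be collected into one set of creation parameters. The sketch below is a minimal example, not a definitive implementation: only agent_type, enable_sleeptime, and initial_message_sequence come from this page. The helper name low_latency_agent_config, the model and embedding parameters, and the handles used in the usage note are assumptions chosen for illustration.

```python
from typing import Any


def low_latency_agent_config(model: str, embedding: str) -> dict[str, Any]:
    """Build creation parameters for a low-latency agent.

    Hypothetical helper: the `model` and `embedding` keys are assumed
    parameters and are not described on this page.
    """
    return {
        # Agent type that selects the low-latency architecture.
        "agent_type": "voice_convo_agent",
        # Enable the background sleep-time agent that manages memory.
        "enable_sleeptime": True,
        # Empty array: start with no initial messages in the buffer.
        "initial_message_sequence": [],
        # Assumed parameters; substitute values valid for your deployment.
        "model": model,
        "embedding": embedding,
    }
```

Assuming a configured Letta Python SDK client, these parameters could then be passed along the lines of client.agents.create(**low_latency_agent_config("openai/gpt-4o-mini", "openai/text-embedding-3-small")), where both handles are placeholders. Check the exact signature against the SDK reference before relying on it.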