Context Management
Understanding Context Management and Agent Memory
Effectively managing what tokens are included in the context window is critical for the performance of your agent. Deciding what is or isn’t included in the context window determines what information (such as long-term memories) or instructions the agent is aware of.
Typical context windows have a system prompt at the beginning of the context window, and then the message history. Letta adds additional sections of the context window, called memory blocks. These memory blocks are units of context management. Memory blocks can be modified by the agent itself (via tools), by other agents, or by the developer (via the API).
Memory Blocks
Memory blocks represent a section of an agent’s context window. An agent may have multiple memory blocks, or none at all. A memory block consists of:
- A
limit
, corresponding to the character limit of the block (i.e. how many characters in the context window can be used up by this block) - A
value
, corresponding to the data represented in the context window for this block - A
label
, corresponding to the type of data represented in the block (e.g.human
,persona
)
Creating an agent with memory blocks
When you create an agent, you can specify memory blocks to also be created with the agent. For most chat applications, we recommend create a human
block (to represent memories about the user) and a persona
block (to represent the agent’s persona).
When the agent is created, the corresponding blocks are also created and attached to the agent, so that the block value will be in the context window.
Creating and attaching memory blocks
You can also directly create blocks and attach them to an agent. This can be useful if you want to create blocks that are shared between multiple agents. If multiple agents are attached to a block, they will all have the block data in their context windows (essentially providing shared memory).
Below is an example of creating a block directory, and attaching the block to two agents by specifying the block_ids
field.
You can also attach blocks to existing agents:
You can see all agents attached to a block by using the block_id
field in the blocks retrieve endpoint.
Context (Memory) Management
Letta agents are able to manage their own context window (and the context window of other agents!) using memory management tools.
Default memory management
By default, Letta agents are provided with tools to modify their own memory blocks. This allows agents to learn and form memories over time, as described in the MemGPT paper.
The default tools are:
core_memory_replace
: Replace a value inside a blockcore_memory_append
: Append a new value to a block
If you do not want your agents to manage their memory, you should disable default tools with include_base_tools=False
during the agent creation. You can also detach the memory editing tools post-agent creation - if you do so, remember to check the system instructions to make sure there are no references to tools that no longer exist.
Memory management with sleep-time compute
If you want to enable memory management with sleep-time compute, you can set enable_sleeptime=True
in the agent creation. For agents enabled with sleep-time, Letta will automatically create sleep-time agents which have the ability to update the blocks of the primary agent.
Memory management with sleep-time compute can reduce the latency of your main agent (since it is no longer responsible for managing its own memory), but can come at the cost of higher token usage. See our documentation on sleeptime agents for more details.
Enabling agents to modify their own memory blocks with tools
You can enable agents to modify their own blocks with tools. By default, agents with type memgpt_agent
will have the tools core_memory_replace
and core_memory_append
to allow them to replace or append values in their own blocks. You can also make custom modification to blocks by implementing your own custom tools that can access the agent’s state by passing in the special agent_state
parameter into your tools.
Below is an example of a tool that re-writes the entire memory block of an agent with a new string:
Modifying blocks via the API
You can also modify blocks via the API to directly edit agents’ context windows and memory. This can be useful in cases where you want to extract the contents of an agents memory some place in your application (for example, a dashboard or memory viewer), or when you want to programatically modify an agents memory state (for example, allowing an end-user to directly correct or modify their agent’s memory).
Modifying blocks of other Letta agents via API tools
Importing the Letta Python client inside a tool is a powerful way to allow agents to interact with other agents, since you can use any of the API endpoints. For example, you could create a custom tool that allows an agent to create another Letta agent.
You can allow agents to modify the blocks of other agents by creating tools that import the Letta Python SDK, then using the block update endpoint:
Stateful Workflows (advanced)
In some advanced usecases, you may want your agent to have persistent memory while not retaining conversation history. For example, if you are using a Letta agent as a “workflow” that’s run many times across many different users, you may not want to keep the conversation or event history inside of the message buffer.
You can create a stateful agent that does not retain conversation (event) history (i.e. a “stateful workflow”) by setting the message_buffer_autoclear
flag to true
during agent creation. If set to true
(default false
), the message history will not be persisted in-context between requests (though the agent will still have access to core, archival, and recall memory).