Context Management

Understanding Context Management and Agent Memory

Effectively managing what tokens are included in the context window is critical for the performance of your agent. Deciding what is or isn’t included in the context window determines what information (such as long-term memories) or instructions the agent is aware of.

Typical context windows have a system prompt at the beginning, followed by the message history. Letta adds additional sections to the context window, called memory blocks. These memory blocks are units of context management. Memory blocks can be modified by the agent itself (via tools), by other agents, or by the developer (via the API).

Memory Blocks

Each memory block represents a section of an agent’s context window. An agent may have multiple memory blocks, or none at all. A memory block consists of:

  • A limit, corresponding to the character limit of the block (i.e. how many characters in the context window can be used up by this block)
  • A value, corresponding to the data represented in the context window for this block
  • A label, corresponding to the type of data represented in the block (e.g. human, persona)
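To make the three fields concrete, here is a minimal sketch of a memory block as a plain Python dataclass, together with a function that places blocks between the system prompt and the message history. This is a toy model, not Letta's own class: the names `Block` and `render_context` and the tag-style rendering are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Toy stand-in for a memory block (hypothetical, not Letta's class)."""

    label: str  # type of data, e.g. "human" or "persona"
    value: str  # text that appears in the context window
    limit: int  # maximum number of characters allowed in value

    def __post_init__(self) -> None:
        # Reject values that would exceed the block's character limit.
        if len(self.value) > self.limit:
            raise ValueError(
                f"block '{self.label}' has {len(self.value)} characters, "
                f"limit is {self.limit}"
            )


def render_context(system_prompt: str, blocks: list[Block], messages: list[str]) -> str:
    """Assemble a context window: system prompt, then blocks, then messages."""
    # The tag-style wrapping below is an assumed format for illustration.
    block_text = "\n".join(f"<{b.label}>\n{b.value}\n</{b.label}>" for b in blocks)
    return "\n\n".join([system_prompt, block_text, *messages])


blocks = [
    Block(label="human", value="The human's name is Bob the Builder.", limit=5000),
    Block(label="persona", value="My name is Sam, the all-knowing sentient AI.", limit=5000),
]
context = render_context("You are a helpful agent.", blocks, ["user: hi"])
print(context)
```

Because the block value is rendered on every request, editing a block changes what the agent sees on its next step without touching the message history.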

Creating an agent with memory blocks

When you create an agent, you can specify memory blocks to be created along with it. For most chat applications, we recommend creating a human block (to represent memories about the user) and a persona block (to represent the agent’s persona).

# install letta_client with `pip install letta-client`
from letta_client import Letta

# create a client to connect to your local Letta Server
client = Letta(
    base_url="http://localhost:8283"
)

# create an agent with two basic self-editing memory blocks
agent_state = client.agents.create(
    memory_blocks=[
        {
            "label": "human",
            "value": "The human's name is Bob the Builder.",
            "limit": 5000
        },
        {
            "label": "persona",
            "value": "My name is Sam, the all-knowing sentient AI.",
            "limit": 5000
        }
    ],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

When the agent is created, the corresponding blocks are also created and attached to the agent, so that the block value will be in the context window.

Creating and attaching memory blocks

You can also directly create blocks and attach them to an agent. This can be useful if you want to create blocks that are shared between multiple agents. If multiple agents are attached to a block, they will all have the block data in their context windows (essentially providing shared memory).

Below is an example of creating a block directly, and attaching the block to two agents by specifying the block_ids field.

# create a persisted block, which can be attached to agents
block = client.blocks.create(
    label="organization",
    value="Organization: Letta",
    limit=4000,
)

# create an agent with both a shared block and its own blocks
shared_block_agent1 = client.agents.create(
    name="shared_block_agent1",
    memory_blocks=[
        {
            "label": "persona",
            "value": "I am agent 1"
        },
    ],
    block_ids=[block.id],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

# create another agent sharing the block
shared_block_agent2 = client.agents.create(
    name="shared_block_agent2",
    memory_blocks=[
        {
            "label": "persona",
            "value": "I am agent 2"
        },
    ],
    block_ids=[block.id],
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small"
)

You can also attach blocks to existing agents:

# `agent` is an existing agent and `block` is an existing block
client.agents.blocks.attach(agent_id=agent.id, block_id=block.id)

You can see all agents attached to a block by using the block_id field in the blocks retrieve endpoint.

Context (Memory) Management

Letta agents are able to manage their own context window (and the context window of other agents!) using memory management tools.

Default memory management

By default, Letta agents are provided with tools to modify their own memory blocks. This allows agents to learn and form memories over time, as described in the MemGPT paper.

The default tools are:

  • core_memory_replace: Replace a value inside a block
  • core_memory_append: Append a new value to a block

If you do not want your agents to manage their own memory, disable the default tools by passing include_base_tools=False during agent creation. You can also detach the memory editing tools after the agent has been created. If you do so, check the system instructions to make sure they contain no references to tools that no longer exist.
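As a minimal sketch of the first option, the helper below passes include_base_tools=False at creation time. The helper name `create_agent_without_default_tools` is hypothetical; the client is taken as an argument so the function can be used with the `Letta` client shown earlier, and the model and embedding handles are copied from the examples above.

```python
def create_agent_without_default_tools(client, persona: str):
    """Create an agent that is not given the default tools."""
    return client.agents.create(
        memory_blocks=[{"label": "persona", "value": persona}],
        model="openai/gpt-4o-mini",
        embedding="openai/text-embedding-3-small",
        include_base_tools=False,  # skip the default tools, including memory editing
    )


# Usage against a local server (requires `pip install letta-client`):
# from letta_client import Letta
# client = Letta(base_url="http://localhost:8283")
# agent_state = create_agent_without_default_tools(client, "I am a fixed persona.")
```

With this setting, the blocks of such an agent change only when you, or another agent, modify them from outside.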

Memory management with sleep-time compute

If you want to enable memory management with sleep-time compute, set enable_sleeptime=True during agent creation. For agents with sleep-time enabled, Letta will automatically create sleep-time agents that can update the blocks of the primary agent.

Memory management with sleep-time compute can reduce the latency of your main agent (since it is no longer responsible for managing its own memory), but can come at the cost of higher token usage. See our documentation on sleeptime agents for more details.

Enabling agents to modify their own memory blocks with tools

You can enable agents to modify their own blocks with tools. By default, agents with type memgpt_agent have the tools core_memory_replace and core_memory_append, which allow them to replace or append values in their own blocks. You can also make custom modifications to blocks by implementing your own tools that access the agent’s state through the special agent_state parameter.

Below is an example of a tool that re-writes the entire memory block of an agent with a new string:

def rethink_memory(agent_state: "AgentState", new_memory: str, target_block_label: str) -> None:
    """
    Rewrite memory block for the main agent, new_memory should contain all current information from the block that is not outdated or inconsistent, integrating any new information, resulting in a new memory block that is organized, readable, and comprehensive.

    Args:
        new_memory (str): The new memory with information integrated from the memory block. If there is no new information, then this should be the same as the content in the source block.
        target_block_label (str): The name of the block to write to.

    Returns:
        None: None is always returned as this function does not produce a response.
    """

    if agent_state.memory.get_block(target_block_label) is None:
        agent_state.memory.create_block(label=target_block_label, value=new_memory)

    agent_state.memory.update_block_value(label=target_block_label, value=new_memory)
    return None

Modifying blocks via the API

You can also modify blocks via the API to directly edit agents’ context windows and memory. This can be useful when you want to display the contents of an agent’s memory somewhere in your application (for example, a dashboard or memory viewer), or when you want to programmatically modify an agent’s memory state (for example, allowing an end-user to directly correct or modify their agent’s memory).
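A minimal sketch of the second case, where an end-user corrects their agent's memory, might look like the following. The helper name `correct_block_text` is hypothetical. It assumes the SDK exposes `client.agents.blocks.retrieve` next to the `client.agents.blocks.modify` call used elsewhere on this page, and that the returned block has a `value` attribute; check the API reference for your installed version.

```python
def correct_block_text(
    client, agent_id: str, block_label: str, old_text: str, new_text: str
) -> str:
    """Read one block of an agent, apply a text correction, and write it back."""
    # Assumed call: fetch the current block by agent ID and label.
    block = client.agents.blocks.retrieve(agent_id=agent_id, block_label=block_label)
    updated_value = block.value.replace(old_text, new_text)
    # Write the corrected value back to the same block.
    client.agents.blocks.modify(
        agent_id=agent_id,
        block_label=block_label,
        value=updated_value,
    )
    return updated_value


# Usage against a local server (requires `pip install letta-client`):
# from letta_client import Letta
# client = Letta(base_url="http://localhost:8283")
# correct_block_text(client, agent_state.id, "human", "Bob the Builder", "Robert")
```

The same retrieve step on its own is enough for a read-only memory viewer.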

Modifying blocks of other Letta agents via API tools

Importing the Letta Python client inside a tool is a powerful way to allow agents to interact with other agents, since you can use any of the API endpoints. For example, you could create a custom tool that allows an agent to create another Letta agent.

You can allow agents to modify the blocks of other agents by creating tools that import the Letta Python SDK, then using the block update endpoint:

def update_supervisor_block(agent_id: str, block_label: str, new_value: str) -> None:
    """
    Update the value of a block in the supervisor agent.

    Args:
        agent_id (str): The ID of the supervisor agent.
        block_label (str): The label of the block to update.
        new_value (str): The new value for the block.

    Returns:
        None: None is always returned as this function does not produce a response.
    """
    # import inside the function so the tool is self-contained
    from letta_client import Letta

    client = Letta(
        base_url="http://localhost:8283"
    )

    client.agents.blocks.modify(
        agent_id=agent_id,
        block_label=block_label,
        value=new_value
    )

Stateful Workflows (advanced)

In some advanced use cases, you may want your agent to have persistent memory while not retaining conversation history. For example, if you are using a Letta agent as a “workflow” that’s run many times across many different users, you may not want to keep the conversation or event history inside of the message buffer.

You can create a stateful agent that does not retain conversation (event) history (i.e. a “stateful workflow”) by setting the message_buffer_autoclear flag to true during agent creation. If set to true (default false), the message history will not be persisted in-context between requests (though the agent will still have access to core, archival, and recall memory).
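The behavior of this flag can be illustrated with a toy model in which memory blocks persist across requests while the message buffer is optionally cleared. The `WorkflowAgent` class below is hypothetical and only mimics the described behavior; in Letta you would pass message_buffer_autoclear=True to agent creation instead.

```python
class WorkflowAgent:
    """Toy agent: blocks persist, the message buffer is optionally cleared."""

    def __init__(self, blocks: dict[str, str], message_buffer_autoclear: bool = False):
        self.blocks = dict(blocks)  # persistent memory, kept across requests
        self.messages: list[str] = []  # in-context message history
        self.message_buffer_autoclear = message_buffer_autoclear

    def handle(self, user_message: str) -> int:
        """Process one request and return how many messages were in context."""
        if self.message_buffer_autoclear:
            # Start every request with an empty message buffer.
            self.messages.clear()
        self.messages.append(user_message)
        return len(self.messages)


workflow = WorkflowAgent({"persona": "I triage support tickets."}, message_buffer_autoclear=True)
chat = WorkflowAgent({"persona": "I am a chat assistant."})

for ticket in ["ticket from user A", "ticket from user B"]:
    print(workflow.handle(ticket), chat.handle(ticket))
```

In the workflow agent, the second request never sees the first user's ticket, yet both requests see the same persona block.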