Context engineering
Engineer agent context windows by controlling what information appears and when.
Context engineering (also known as “memory management” or “context management”) is the process of managing an agent’s context window so that the agent has access to the information it needs to perform its task.
Letta and MemGPT introduced the concept of agentic context engineering, in which the context window is engineered by one or more AI agents themselves. In Letta, agents can manage their own context window (and the context windows of other agents!) using special memory management tools.
Memory management in regular agents
By default, Letta agents are provided with tools to modify their own memory blocks. This allows agents to learn and form memories over time, as described in the MemGPT paper.
The default tools are:
- `memory_insert`: Insert content into a block
- `memory_replace`: Replace content in a block
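As an illustrative sketch of what these two operations do to a plain-text block (hypothetical helpers for explanation only, not the SDK implementation):

```python
def memory_insert(block: str, content: str, line: int = -1) -> str:
    """Insert content into a block, appending by default or at a given line index."""
    lines = block.splitlines()
    if line < 0:
        lines.append(content)
    else:
        lines.insert(line, content)
    return "\n".join(lines)


def memory_replace(block: str, old: str, new: str) -> str:
    """Replace an exact occurrence of old content in a block with new content."""
    if old not in block:
        raise ValueError("old content not found in block")
    return block.replace(old, new)


# An agent might grow and then correct a persona block over time:
persona = "I am a helpful assistant."
persona = memory_insert(persona, "The user's name is Alice.")
persona = memory_replace(persona, "Alice", "Bob")
```

The real tools operate on the agent's in-context memory blocks rather than plain strings, but the edit semantics are the same: insertion adds new lines, replacement requires the old content to match exactly.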
If you do not want your agents to manage their own memory, disable the default tools with include_base_tools=False during agent creation. You can also detach the memory editing tools after the agent is created; if you do, check the system instructions to make sure they no longer reference tools that do not exist.
Memory management with sleep-time compute
To enable memory management with sleep-time compute, set enable_sleeptime=True when creating the agent. For sleep-time-enabled agents, Letta automatically creates sleep-time agents that can update the primary agent’s blocks. Sleep-time agents also include the memory_rethink and memory_finish_edits tools.
Memory management with sleep-time compute can reduce the latency of your main agent (since it is no longer responsible for managing its own memory), but can come at the cost of higher token usage. See our documentation on sleeptime agents for more details.
Enabling agents to modify their own memory blocks with tools
You can enable agents to modify their own blocks with tools. By default, agents with type memgpt_v2_agent will have the tools memory_insert and memory_replace to allow them to manage values in their own blocks. The legacy tools core_memory_replace and core_memory_append are deprecated but still available for backwards compatibility with type memgpt_agent. You can also make custom modifications to blocks by implementing your own custom tools that access the agent’s state via the special agent_state parameter.
Below is an example of a tool that rewrites the entire memory block of an agent with a new string:
```typescript
/**
 * Rewrite memory block for the main agent. newMemory should contain all current
 * information from the block that is not outdated or inconsistent, integrating
 * any new information, resulting in a new memory block that is organized,
 * readable, and comprehensive.
 *
 * @param newMemory - The new memory with information integrated from the memory
 *   block. If there is no new information, then this should be the same as the
 *   content in the source block.
 * @param targetBlockLabel - The name of the block to write to.
 *
 * @returns void - Always returns void as this function does not produce a response.
 */
function rethinkMemory(
  agentState: AgentState,
  newMemory: string,
  targetBlockLabel: string,
): void {
  if (agentState.memory.getBlock(targetBlockLabel) === null) {
    agentState.memory.createBlock(targetBlockLabel, newMemory);
  }
  agentState.memory.updateBlockValue(targetBlockLabel, newMemory);
}
```

```python
def rethink_memory(agent_state: "AgentState", new_memory: str, target_block_label: str) -> None:
    """
    Rewrite memory block for the main agent. new_memory should contain all current
    information from the block that is not outdated or inconsistent, integrating
    any new information, resulting in a new memory block that is organized,
    readable, and comprehensive.

    Args:
        new_memory (str): The new memory with information integrated from the memory
            block. If there is no new information, then this should be the same as
            the content in the source block.
        target_block_label (str): The name of the block to write to.

    Returns:
        None: None is always returned as this function does not produce a response.
    """
    if agent_state.memory.get_block(target_block_label) is None:
        agent_state.memory.create_block(label=target_block_label, value=new_memory)

    agent_state.memory.update_block_value(label=target_block_label, value=new_memory)
    return None
```

Modifying blocks via the API
You can also modify blocks via the API to directly edit agents’ context windows and memory. This can be useful when you want to surface the contents of an agent’s memory somewhere in your application (for example, a dashboard or memory viewer), or when you want to programmatically modify an agent’s memory state (for example, allowing an end-user to directly correct or modify their agent’s memory).
Modifying blocks of other Letta agents via API tools
You can allow agents to modify the blocks of other agents by creating tools that import the Letta SDK, then using the block update endpoint:
```typescript
/**
 * Update the value of a block in the supervisor agent.
 *
 * @param blockLabel - The label of the block to update.
 * @param newValue - The new value for the block.
 *
 * @returns void - Always returns void as this function does not produce a response.
 */
async function updateSupervisorBlock(blockLabel: string, newValue: string): Promise<void> {
  const { Letta } = require("@letta-ai/letta-client");

  const client = new Letta({
    apiKey: process.env.LETTA_API_KEY,
  });

  // agentId is the ID of the supervisor agent (assumed to be available in scope)
  await client.agents.blocks.update(agentId, blockLabel, newValue);
}
```

```python
def update_supervisor_block(block_label: str, new_value: str) -> None:
    """
    Update the value of a block in the supervisor agent.

    Args:
        block_label (str): The label of the block to update.
        new_value (str): The new value for the block.

    Returns:
        None: None is always returned as this function does not produce a response.
    """
    import os

    from letta_client import Letta

    client = Letta(api_key=os.getenv("LETTA_API_KEY"))

    # agent_id is the ID of the supervisor agent (assumed to be available in scope)
    client.agents.blocks.update(
        agent_id=agent_id,
        block_label=block_label,
        value=new_value,
    )
```

Automatic compaction
When an agent’s conversation history grows too long to fit in its context window, Letta automatically compacts (summarizes) older messages to make room for new ones. The compaction_settings field lets you customize how this compaction works.
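The trigger logic can be sketched in a few lines of plain Python (illustrative only; Letta's actual implementation works on tokens and message objects, and both helper names here are hypothetical):

```python
def maybe_compact(messages, summarize, max_messages=8, window_pct=0.3):
    """If history exceeds the budget, summarize the oldest window_pct of messages."""
    if len(messages) <= max_messages:
        return messages  # history still fits; nothing to do
    cutoff = max(1, int(len(messages) * window_pct))
    # Replace the oldest slice with a single summary message
    return [summarize(messages[:cutoff])] + messages[cutoff:]


history = [f"msg{i}" for i in range(1, 11)]  # 10 messages, over budget
compacted = maybe_compact(history, lambda msgs: f"summary of {len(msgs)} messages")
```

In this sketch a 10-message history over an 8-message budget has its oldest 3 messages (30%) collapsed into one summary, leaving 8 entries.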
Default behavior
If you don’t specify compaction_settings, Letta uses sensible defaults:
- Mode: `sliding_window` (keeps recent messages, summarizes older ones)
- Model: same as the agent’s main model
- Sliding window: `sliding_window_percentage=0.3` (targets keeping ~70% of the most recent history; increases the summarized portion in ~10% steps if needed to fit)
- Summary limit: 2000 characters
For most use cases, the defaults work well and you don’t need to configure compaction.
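Spelled out explicitly, the defaults correspond roughly to the following configuration (illustrative; when model is omitted, the agent’s main model is used):

```python
default_compaction_settings = {
    "mode": "sliding_window",          # keep recent messages, summarize older ones
    "sliding_window_percentage": 0.3,  # summarize the oldest ~30% of history
    "clip_chars": 2000,                # cap the summary at 2000 characters
    # "model" defaults to the agent's main model when not set
}
```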
When to customize compaction
Customize compaction_settings when you want to:
- Use a cheaper/faster model for summarization
- Preserve more or less recent context
- Change the summarization strategy
- Customize the summarization prompt
Compaction settings schema
If you specify compaction_settings, the only required field is:
- `model` (string): the summarizer model handle (e.g. `"openai/gpt-4o-mini"`)
All other fields are optional.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Summarizer model handle (format: provider/model-name) |
| model_settings | object | No | Optional overrides for the summarizer model defaults |
| prompt | string | No | Custom system prompt for the summarizer |
| prompt_acknowledgement | boolean | No | Whether to include an acknowledgement post-prompt |
| clip_chars | int \| null | No | Max summary length in characters (default: 2000) |
| mode | string | No | "sliding_window" or "all" (default: "sliding_window") |
| sliding_window_percentage | float | No | How aggressively older history is summarized (default: 0.3) |
Compaction modes
Sliding window (default): Preserves recent messages and only summarizes older ones.
Before compaction (10 messages), the oldest ~30% is selected for summarization:

```
[msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
 |-- summarized --|
```

After compaction:

```
[summary of msg1-3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
```

The `sliding_window_percentage` controls how aggressively older history is summarized:

- `0.2` = summarize less (keep more recent context)
- `0.5` = summarize more (keep less recent context)
All mode: The entire conversation history is summarized. Use when you need maximum space reduction.
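The difference between the two modes can be sketched as follows (hypothetical helper for illustration, not the Letta implementation):

```python
def compact(messages, summarize, mode="sliding_window", window_pct=0.3):
    """Summarize either the oldest slice of history or all of it."""
    if mode == "all":
        return [summarize(messages)]  # entire history collapses to one summary
    cutoff = max(1, int(len(messages) * window_pct))
    return [summarize(messages[:cutoff])] + messages[cutoff:]


history = [f"msg{i}" for i in range(1, 11)]
sliding = compact(history, lambda m: f"summary of {len(m)}")
everything = compact(history, lambda m: f"summary of {len(m)}", mode="all")
```

With 10 messages, sliding-window mode keeps 7 recent messages plus one summary, while all mode reduces the entire history to a single summary message.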
Example: Custom compaction settings
```python
import os

from letta_client import Letta

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

agent = client.agents.create(
    name="my_agent",
    model="openai/gpt-4o",
    compaction_settings={
        "model": "openai/gpt-4o-mini",  # Cheaper model for summarization
        "mode": "sliding_window",
        "sliding_window_percentage": 0.2,  # Preserve more context
    },
)
```

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const agent = await client.agents.create({
  name: "my_agent",
  model: "openai/gpt-4o",
  compaction_settings: {
    model: "openai/gpt-4o-mini", // Cheaper model for summarization
    mode: "sliding_window",
    sliding_window_percentage: 0.2, // Preserve more context
  },
} as any);
```