Compaction (summarization)
Configuring compaction settings in the Letta API
When an agent’s conversation history grows too long to fit in its context window, Letta automatically compacts (summarizes) older messages to make room for new ones.
The compaction_settings field lets you customize how this compaction works.
Default behavior
If you don’t specify compaction_settings, Letta uses sensible defaults:
- Mode: sliding_window (keeps recent messages, summarizes older ones)
- Model: Same as the agent’s main model
- Sliding window: sliding_window_percentage=0.3 (targets keeping ~70% of the most recent history; increases the summarized portion in ~10% steps if needed to fit)
- Summary limit: 2000 characters
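The stepping behavior described for the sliding window default can be pictured with a small local simulation. This is only a sketch under stated assumptions: the function name `pick_summarized_fraction`, the per-message token counts, and the token budget are hypothetical and are not part of the Letta API, and the real implementation may measure and round differently.

```python
def pick_summarized_fraction(message_tokens, budget, start=0.3, step=0.1):
    """Return the fraction of the oldest messages to summarize.

    Illustrative only: starts at `start` (the 0.3 default) and grows the
    summarized portion in `step` increments until the kept messages fit
    within `budget` tokens. The size of the summary itself is ignored.
    """
    step_pct = round(step * 100)
    if step_pct <= 0:
        raise ValueError("step must be positive")
    n = len(message_tokens)
    pct = round(start * 100)  # work in whole percentage points
    while True:
        pct = min(pct, 100)
        cutoff = round(n * pct / 100)  # number of oldest messages summarized
        kept_tokens = sum(message_tokens[cutoff:])
        if kept_tokens <= budget or pct == 100:
            return pct / 100
        pct += step_pct


# Ten messages of 100 tokens each.
history = [100] * 10
print(pick_summarized_fraction(history, budget=700))
print(pick_summarized_fraction(history, budget=500))
```

With a budget that the most recent 70% already satisfies, the starting fraction is returned unchanged; a tighter budget pushes the summarized portion up one step at a time.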
For most use cases, the defaults work well and you don’t need to configure compaction.
When to customize compaction
Customize compaction_settings when you want to:
- Use a cheaper/faster model for summarization
- Preserve more or less recent context
- Change the summarization strategy (e.g. to maximize prefix caching)
- Customize the summarization prompt
Compaction settings schema
If you specify compaction_settings, the only required field is:
- model (string): the summarizer model handle (e.g. "openai/gpt-4o-mini")
All other fields are optional.
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Summarizer model handle (format: provider/model-name) |
| model_settings | object | No | Optional overrides for the summarizer model defaults |
| prompt | string | No | Custom system prompt for the summarizer |
| prompt_acknowledgement | boolean | No | Whether to include an acknowledgement post-prompt |
| clip_chars | int or null | No | Max summary length in characters (default: 2000) |
| mode | string | No | "sliding_window" or "all" (default: "sliding_window") |
| sliding_window_percentage | float | No | How aggressively older history is summarized (default: 0.3) |
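To make the schema concrete, here is a minimal sketch of a client-side helper that assembles a compaction_settings dictionary from the fields in the table. The helper `build_compaction_settings` is hypothetical, not something the Letta SDK provides, and the check that sliding_window_percentage lies strictly between 0 and 1 is an assumption rather than a documented constraint.

```python
ALLOWED_MODES = {"sliding_window", "all"}


def build_compaction_settings(
    model,
    *,
    model_settings=None,
    prompt=None,
    prompt_acknowledgement=None,
    clip_chars=None,
    mode=None,
    sliding_window_percentage=None,
):
    """Build a compaction_settings dict (hypothetical helper, not an SDK call)."""
    # `model` is the only required field: a "provider/model-name" handle.
    if not isinstance(model, str) or "/" not in model:
        raise ValueError("model must be a handle like 'provider/model-name'")
    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    # Assumed range check; the documented default is 0.3.
    if sliding_window_percentage is not None and not 0 < sliding_window_percentage < 1:
        raise ValueError("sliding_window_percentage is assumed to lie in (0, 1)")

    optional = {
        "model_settings": model_settings,
        "prompt": prompt,
        "prompt_acknowledgement": prompt_acknowledgement,
        "clip_chars": clip_chars,
        "mode": mode,
        "sliding_window_percentage": sliding_window_percentage,
    }
    # Omit unset fields so the defaults apply. Note that this helper
    # therefore cannot send an explicit null for clip_chars.
    settings = {"model": model}
    settings.update({k: v for k, v in optional.items() if v is not None})
    return settings


settings = build_compaction_settings(
    "openai/gpt-4o-mini",
    mode="all",
    clip_chars=1000,
    prompt="Summarize the conversation, keeping names, dates, and decisions.",
)
print(settings)
```

The resulting dictionary can be passed as the compaction_settings argument when creating an agent.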
Compaction modes
Sliding window (default): Preserves recent messages and only summarizes older ones.

Before compaction (10 messages):

```
[msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
|---- oldest ~30% summarized ----|
```

After compaction:

```
[summary of msg1-3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
```

The sliding_window_percentage controls how aggressively older history is summarized:
- 0.2 = summarize less (keep more recent context)
- 0.5 = summarize more (keep less recent context)

All mode: The entire conversation history is summarized. Use when you need maximum space reduction.
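The difference between the two modes can be sketched with a toy function that operates on a plain list of message labels. This is an illustration of the before/after picture above, not Letta's implementation: the function `compact` is hypothetical, a placeholder string stands in for the model-generated summary, and the rounding of the cutoff is an assumption.

```python
def compact(messages, mode="sliding_window", sliding_window_percentage=0.3,
            clip_chars=2000):
    """Toy model of compaction on a list of message labels (illustrative only)."""
    if mode == "all":
        cutoff = len(messages)  # summarize the entire history
    elif mode == "sliding_window":
        # Summarize roughly the oldest N% of messages (rounding is assumed).
        cutoff = round(len(messages) * sliding_window_percentage)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    if cutoff == 0:
        return list(messages)  # nothing old enough to summarize

    # Placeholder for the summarizer model's output, clipped to clip_chars.
    summary = f"summary of {messages[0]}..{messages[cutoff - 1]}"[:clip_chars]
    return [summary] + list(messages[cutoff:])


history = [f"msg{i}" for i in range(1, 11)]
print(compact(history))              # oldest 3 replaced by one summary
print(compact(history, mode="all"))  # whole history replaced by one summary
```

Lowering sliding_window_percentage to 0.2 leaves eight of the ten messages untouched, while 0.5 leaves only five, which mirrors the trade-off between recent context and freed space.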
Example: Custom compaction settings
```python
from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

agent = client.agents.create(
    name="my_agent",
    model="openai/gpt-4o",
    compaction_settings={
        "model": "openai/gpt-4o-mini",  # Cheaper model for summarization
        "mode": "sliding_window",
        "sliding_window_percentage": 0.2,  # Preserve more context
    },
)
```

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const agent = await client.agents.create({
  name: "my_agent",
  model: "openai/gpt-4o",
  compaction_settings: {
    model: "openai/gpt-4o-mini", // Cheaper model for summarization
    mode: "sliding_window",
    sliding_window_percentage: 0.2, // Preserve more context
  },
} as any);
```