Compaction (summarization)
Configuring compaction settings in the Letta API
When an agent’s conversation history grows too long to fit in its context window, Letta automatically compacts (summarizes) older messages to make room for new ones.
The compaction_settings field lets you customize how this compaction works.
Default behavior
If you don’t specify compaction_settings, Letta uses sensible defaults:
- Mode: `sliding_window` (keeps recent messages, summarizes older ones)
- Model: Provider-specific default (`claude-haiku-4-5`, `gpt-5-mini`, or `gemini-2.5-flash`), falling back to the agent’s model
- Sliding window: `sliding_window_percentage=0.3` (targets keeping ~70% of the most recent history; increases the summarized portion in ~10% steps if needed to fit)
- Summary limit: 50,000 characters
For most use cases, the defaults work well and you don’t need to configure compaction.
When to customize compaction
Customize compaction_settings when you want to:
- Use a cheaper/faster model for summarization
- Preserve more or less recent context
- Maximize prefix caching by using self-compaction
- Customize the summarization prompt
Compaction settings schema
All fields are optional. If you don’t specify a model, Letta uses a provider-specific default: claude-haiku-4-5 for Anthropic, gpt-5-mini for OpenAI, gemini-2.5-flash for Google AI. If the provider isn’t recognized, the agent’s own model is used as fallback.
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | No | Summarizer model handle (format: `provider/model-name`). If not set, uses a provider-specific default (see above). |
| `model_settings` | object | No | Optional overrides for the summarizer model defaults |
| `prompt` | string | No | Custom system prompt for the summarizer |
| `prompt_acknowledgement` | boolean | No | Whether to include an acknowledgement post-prompt |
| `clip_chars` | int \| null | No | Max summary length in characters (default: 50,000) |
| `mode` | string | No | Compaction strategy (default: `"sliding_window"`). See Compaction modes. |
| `sliding_window_percentage` | float | No | Fraction of messages to summarize (default: 0.3, meaning summarize ~30% and keep ~70%) |
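Pulling the table together, a settings object exercising several of these fields might look like the following sketch. All values here are illustrative choices, not recommendations or defaults (apart from the mode and window percentage):

```python
# Illustrative compaction_settings payload; every field is optional.
compaction_settings = {
    "model": "openai/gpt-5-mini",  # summarizer model handle (provider/model-name)
    "prompt": "Condense the older conversation, keeping decisions and open questions.",
    "prompt_acknowledgement": True,  # include an acknowledgement post-prompt
    "clip_chars": 20_000,  # cap summary length below the 50,000-character default
    "mode": "sliding_window",
    "sliding_window_percentage": 0.3,  # summarize the oldest ~30% of messages
}

# Passed at agent creation or update, e.g.:
# client.agents.create(..., compaction_settings=compaction_settings)
```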
Compaction modes
There are four compaction modes:
`sliding_window` (default): Preserves recent messages and summarizes older ones using a separate summarizer call.

```
Before compaction (10 messages):
[msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
 |---- oldest ~30% summarized ----|

After compaction:
[summary of msg1-3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
```

The `sliding_window_percentage` controls what fraction of messages get summarized:

- `0.2` = summarize 20% of messages (keep 80%)
- `0.5` = summarize 50% of messages (keep 50%)
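The split above can be sketched arithmetically. This is a simplified model of the behavior described here, not Letta’s internal code, which may round or adjust the boundary differently:

```python
# Simplified sketch of the sliding-window split (illustrative, not Letta internals).
def split_messages(num_messages: int, sliding_window_percentage: float) -> tuple[int, int]:
    """Return (num_summarized, num_kept) for a given window percentage."""
    num_summarized = int(num_messages * sliding_window_percentage)
    num_kept = num_messages - num_summarized
    return num_summarized, num_kept

# With the default 0.3, a 10-message history summarizes the oldest 3 messages
# and keeps the most recent 7.
print(split_messages(10, 0.3))
```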
`all`: The entire conversation history is summarized in a separate summarizer call. Use when you need maximum space reduction.

`self_compact_sliding_window`: Same sliding window strategy, but the summarization request includes the agent’s system prompt and tool definitions. The summarization instruction is appended as a user message within the agent’s existing context. This keeps the prompt prefix identical to normal agent requests, improving cache hit rates.

`self_compact_all`: Same as `all`, but with the agent’s system prompt and tools included in the request for cache compatibility.
Self-compaction uses the same provider-specific default models as other modes (see above). You can set an explicit model or prompt to override the defaults.
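As a sketch, overriding both the model and the prompt for self-compaction might look like this. The agent ID, model choice, and prompt text are placeholders, not defaults:

```python
# Hypothetical overrides for self-compaction (illustrative values).
compaction_settings = {
    "mode": "self_compact_all",
    "model": "anthropic/claude-haiku-4-5",  # explicit summarizer model
    "prompt": "Summarize the conversation so far, preserving open tasks and key facts.",
}

# Applied via the update call, e.g.:
# client.agents.update(agent_id="agent-xxx", compaction_settings=compaction_settings)
```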
Example: Custom compaction with a separate summarizer
```python
from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

agent = client.agents.create(
    name="my_agent",
    model="anthropic/claude-sonnet-4-6",
    compaction_settings={
        "model": "anthropic/claude-haiku-4-5",  # Cheaper model for summarization
        "mode": "sliding_window",
        "sliding_window_percentage": 0.2,  # Preserve more context
    },
)
```

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const agent = await client.agents.create({
  name: "my_agent",
  model: "anthropic/claude-sonnet-4-6",
  compaction_settings: {
    model: "anthropic/claude-haiku-4-5", // Cheaper model for summarization
    mode: "sliding_window",
    sliding_window_percentage: 0.2, // Preserve more context
  },
});
```

Example: Self-compaction for prompt caching
```python
# Enable self-compaction to maximize prefix cache hits
client.agents.update(
    agent_id="agent-xxx",
    compaction_settings={
        "mode": "self_compact_sliding_window",
    },
)
```

```typescript
// Enable self-compaction to maximize prefix cache hits
await client.agents.update("agent-xxx", {
  compaction_settings: {
    mode: "self_compact_sliding_window",
  },
});
```