Compaction (summarization)
Configuring compaction settings in the Letta API
When an agent’s conversation history grows too long to fit in its context window, Letta automatically compacts (summarizes) older messages to make room for new ones.
The compaction_settings field lets you customize how this compaction works.
Default behavior
If you don’t specify compaction_settings, Letta uses sensible defaults:
- Mode: `sliding_window` (keeps recent messages, summarizes older ones)
- Model: Provider-specific default (`claude-haiku-4-5`, `gpt-5-mini`, or `gemini-2.5-flash`), falling back to the agent’s model
- Sliding window: `sliding_window_percentage=0.3` (targets keeping ~70% of the most recent history; increases the summarized portion in ~10% steps if needed to fit)
- Summary limit: 50,000 characters
For most use cases, the defaults work well and you don’t need to configure compaction.
When to customize compaction
Customize compaction_settings when you want to:
- Use a cheaper/faster model for summarization
- Preserve more or less recent context
- Maximize prefix caching by using self-compaction
- Customize the summarization prompt
Compaction settings schema
All fields are optional. If you don’t specify a model, Letta uses a provider-specific default: claude-haiku-4-5 for Anthropic, gpt-5-mini for OpenAI, gemini-2.5-flash for Google AI. If the provider isn’t recognized, the agent’s own model is used as fallback.
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | No | Summarizer model handle (format: `provider/model-name`). If not set, uses a provider-specific default (see above). |
| `model_settings` | object | No | Optional overrides for the summarizer model defaults |
| `prompt` | string | No | Custom system prompt for the summarizer |
| `prompt_acknowledgement` | boolean | No | Whether to include an acknowledgement post-prompt |
| `clip_chars` | int \| null | No | Max summary length in characters (default: 50,000) |
| `mode` | string | No | Compaction strategy (default: `"sliding_window"`). See Compaction modes. |
| `sliding_window_percentage` | float | No | Fraction of messages to summarize (default: 0.3, meaning summarize ~30% and keep ~70%) |
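Pulling the table together, a settings object exercising several of these fields might look like the following sketch. All values here are illustrative choices, not recommendations or defaults (apart from the mode and window percentage):

```python
# Illustrative compaction_settings payload; every field is optional.
compaction_settings = {
    "model": "openai/gpt-5-mini",  # summarizer model handle (provider/model-name)
    "prompt": "Condense the older conversation, keeping decisions and open questions.",
    "prompt_acknowledgement": True,  # include an acknowledgement post-prompt
    "clip_chars": 20_000,  # cap summary length below the 50,000-character default
    "mode": "sliding_window",
    "sliding_window_percentage": 0.3,  # summarize the oldest ~30% of messages
}

# Passed at agent creation or update, e.g.:
# client.agents.create(..., compaction_settings=compaction_settings)
```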
Compaction modes
There are four compaction modes:
`sliding_window` (default): Preserves recent messages and summarizes older ones using a separate summarizer call.

```
Before compaction (10 messages):
[msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
 |---- oldest ~30% summarized ----|

After compaction:
[summary of msg1-3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
```

The `sliding_window_percentage` controls what fraction of messages get summarized:

- `0.2` = summarize 20% of messages (keep 80%)
- `0.5` = summarize 50% of messages (keep 50%)
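The split above can be sketched arithmetically. This is a simplified model of the behavior described here, not Letta’s internal code, which may round or adjust the boundary differently:

```python
# Simplified sketch of the sliding-window split (illustrative, not Letta internals).
def split_messages(num_messages: int, sliding_window_percentage: float) -> tuple[int, int]:
    """Return (num_summarized, num_kept) for a given window percentage."""
    num_summarized = int(num_messages * sliding_window_percentage)
    num_kept = num_messages - num_summarized
    return num_summarized, num_kept

# With the default 0.3, a 10-message history summarizes the oldest 3 messages
# and keeps the most recent 7.
print(split_messages(10, 0.3))
```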
`all`: The entire conversation history is summarized in a separate summarizer call. Use when you need maximum space reduction.

`self_compact_sliding_window`: Same sliding window strategy, but the summarization request includes the agent’s system prompt and tool definitions. The summarization instruction is appended as a user message within the agent’s existing context. This keeps the prompt prefix identical to normal agent requests, improving cache hit rates.

`self_compact_all`: Same as `all`, but with the agent’s system prompt and tools included in the request for cache compatibility.
Self-compaction uses the same provider-specific default models as other modes (see above). You can set an explicit model or prompt to override the defaults.
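As a sketch, overriding both the model and the prompt for self-compaction might look like this. The agent ID, model choice, and prompt text are placeholders, not defaults:

```python
# Hypothetical overrides for self-compaction (illustrative values).
compaction_settings = {
    "mode": "self_compact_all",
    "model": "anthropic/claude-haiku-4-5",  # explicit summarizer model
    "prompt": "Summarize the conversation so far, preserving open tasks and key facts.",
}

# Applied via the update call, e.g.:
# client.agents.update(agent_id="agent-xxx", compaction_settings=compaction_settings)
```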
Example: Custom compaction with a separate summarizer
```python
from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

agent = client.agents.create(
    name="my_agent",
    model="anthropic/claude-sonnet-4-6",
    compaction_settings={
        "model": "anthropic/claude-haiku-4-5",  # Cheaper model for summarization
        "mode": "sliding_window",
        "sliding_window_percentage": 0.2,  # Preserve more context
    },
)
```

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const agent = await client.agents.create({
  name: "my_agent",
  model: "anthropic/claude-sonnet-4-6",
  compaction_settings: {
    model: "anthropic/claude-haiku-4-5", // Cheaper model for summarization
    mode: "sliding_window",
    sliding_window_percentage: 0.2, // Preserve more context
  },
});
```

Example: Self-compaction for prompt caching
```python
# Enable self-compaction to maximize prefix cache hits
client.agents.update(
    agent_id="agent-xxx",
    compaction_settings={
        "mode": "self_compact_sliding_window",
    },
)
```

```typescript
// Enable self-compaction to maximize prefix cache hits
await client.agents.update("agent-xxx", {
  compaction_settings: {
    mode: "self_compact_sliding_window",
  },
});
```