---
title: Compaction (summarization) | Letta Docs
description: Configuring compaction settings in the Letta API
---

When an agent’s conversation history grows too long to fit in its context window, Letta automatically **compacts** (summarizes) older messages to make room for new ones. The `compaction_settings` field lets you customize how this compaction works.

### Default behavior

If you don’t specify `compaction_settings`, Letta uses sensible defaults:

- **Mode**: `sliding_window` (keeps recent messages, summarizes older ones)
- **Model**: Provider-specific default (`claude-haiku-4-5`, `gpt-5-mini`, or `gemini-2.5-flash`), falling back to the agent’s model
- **Sliding window**: `sliding_window_percentage=0.3` (summarizes \~30% of messages and keeps \~70%; increases the summarized portion in \~10% steps if needed to fit the token budget)
- **Summary limit**: 50,000 characters

For most use cases, the defaults work well and you don’t need to configure compaction.

### When to customize compaction

Customize `compaction_settings` when you want to:

- Use a cheaper/faster model for summarization
- Preserve more or less recent context
- Maximize prefix caching by using self-compaction
- Customize the summarization prompt

### Compaction settings schema

All fields are optional. If you don’t specify a `model`, Letta uses a provider-specific default: `claude-haiku-4-5` for Anthropic, `gpt-5-mini` for OpenAI, `gemini-2.5-flash` for Google AI. If the provider isn’t recognized, the agent’s own model is used as fallback.
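The model-resolution rule above can be sketched in a few lines of Python. This is a hypothetical helper (`resolve_summarizer_model` is not part of the Letta SDK or server). It assumes handles use the `provider/model-name` format, that the provider prefixes are `anthropic`, `openai`, and `google_ai`, and that each default is expressed as a handle under the same provider. The real server-side logic may differ.

```python
from typing import Optional

# Hypothetical mapping based on the defaults listed on this page.
# Assumption: provider prefixes are "anthropic", "openai", and "google_ai".
PROVIDER_DEFAULT_SUMMARIZERS = {
    "anthropic": "anthropic/claude-haiku-4-5",
    "openai": "openai/gpt-5-mini",
    "google_ai": "google_ai/gemini-2.5-flash",
}


def resolve_summarizer_model(agent_model: str, compaction_model: Optional[str] = None) -> str:
    """Return the summarizer handle: explicit setting, else provider default, else agent model."""
    if compaction_model:
        return compaction_model
    provider = agent_model.split("/", 1)[0]
    # Unrecognized providers fall back to the agent's own model.
    return PROVIDER_DEFAULT_SUMMARIZERS.get(provider, agent_model)


print(resolve_summarizer_model("anthropic/claude-sonnet-4-6"))  # anthropic/claude-haiku-4-5
print(resolve_summarizer_model("openai/gpt-4.1", "openai/gpt-4.1-mini"))  # openai/gpt-4.1-mini
print(resolve_summarizer_model("some_provider/custom-model"))  # some_provider/custom-model
```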

| Field                       | Type        | Required | Description                                                                                                      |
| --------------------------- | ----------- | -------- | ---------------------------------------------------------------------------------------------------------------- |
| `model`                     | string      | No       | Summarizer model handle (format: provider/model-name). If not set, uses a provider-specific default (see above). |
| `model_settings`            | object      | No       | Optional overrides for the summarizer model defaults                                                             |
| `prompt`                    | string      | No       | Custom system prompt for the summarizer                                                                          |
| `prompt_acknowledgement`    | boolean     | No       | Whether to include an acknowledgement post-prompt                                                                |
| `clip_chars`                | int \| null | No       | Max summary length in characters (default: 50,000)                                                               |
| `mode`                      | string      | No       | Compaction strategy (default: `"sliding_window"`). See [Compaction modes](#compaction-modes).                    |
| `sliding_window_percentage` | float       | No       | Fraction of messages to summarize (default: 0.3, meaning summarize \~30% and keep \~70%)                         |
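Before sending settings to the API, you might want a quick local check that a `compaction_settings` dict matches the table above. The function below is a hypothetical sketch, not part of `letta_client`. The allowed modes come from this page, while the numeric ranges (a fraction strictly between 0 and 1, a positive `clip_chars`) are assumptions. The server remains the source of truth for validation.

```python
from typing import Any, List, Mapping

# Modes listed under "Compaction modes" on this page.
VALID_MODES = {"sliding_window", "all", "self_compact_sliding_window", "self_compact_all"}


def check_compaction_settings(settings: Mapping[str, Any]) -> List[str]:
    """Return human-readable problems found in a compaction_settings dict (empty if none).

    Hypothetical client-side check; the numeric ranges below are assumptions.
    """
    problems = []
    mode = settings.get("mode", "sliding_window")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    pct = settings.get("sliding_window_percentage", 0.3)
    if not isinstance(pct, (int, float)) or not 0 < pct < 1:
        problems.append("sliding_window_percentage should be a fraction between 0 and 1")
    clip = settings.get("clip_chars", 50_000)
    if clip is not None and (not isinstance(clip, int) or clip <= 0):
        problems.append("clip_chars should be a positive int or None")
    model = settings.get("model")
    if model is not None and "/" not in model:
        problems.append("model should use the provider/model-name format")
    return problems


print(check_compaction_settings({"mode": "sliding_window", "sliding_window_percentage": 0.2}))  # []
print(check_compaction_settings({"mode": "summarize_everything", "clip_chars": -1}))
```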

`sliding_window_percentage` is the fraction of messages that get summarized. A value of `0.3` means \~30% of messages are summarized and \~70% are kept. Higher values summarize more aggressively. This is the starting target: if the remaining context is still too large, Letta can increase the summarized fraction in \~10% steps until it fits.
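To make the escalation concrete, here is a simplified Python sketch of choosing how many of the oldest messages to summarize. It assumes per-message token counts are already known, rounds `fraction * n` to the nearest whole message, and ignores the size of the summary and the system prompt. `choose_summarize_count` is a hypothetical name, not Letta's internal implementation.

```python
from typing import List, Tuple


def choose_summarize_count(
    message_tokens: List[int],
    token_budget: int,
    start_fraction: float = 0.3,
    step: float = 0.1,
) -> Tuple[float, int]:
    """Return (fraction, count): how many of the oldest messages to summarize
    so that the remaining (kept) messages fit within token_budget."""
    n = len(message_tokens)
    steps = 0
    while True:
        # Compute from the step index to avoid accumulating float error.
        fraction = min(1.0, start_fraction + steps * step)
        count = min(n, round(fraction * n))
        kept_tokens = sum(message_tokens[count:])
        if kept_tokens <= token_budget or count >= n:
            return round(fraction, 2), count
        steps += 1  # still too large: summarize ~10% more


tokens = [500] * 10  # ten messages of ~500 tokens each (made-up numbers)
print(choose_summarize_count(tokens, token_budget=4000))  # (0.3, 3)
print(choose_summarize_count(tokens, token_budget=3000))  # (0.4, 4)
```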

### Compaction modes

There are four compaction modes:

**`sliding_window` (default)**: Preserves recent messages and summarizes older ones using a separate summarizer call.

```text
Before compaction (10 messages):
[msg1, msg2, msg3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
 |--summarized--|
  (oldest ~30%)

After compaction:
[summary of msg1-3, msg4, msg5, msg6, msg7, msg8, msg9, msg10]
```

The `sliding_window_percentage` controls what fraction of messages get summarized:

- `0.2` = summarize 20% of messages (keep 80%)
- `0.5` = summarize 50% of messages (keep 50%)
- `0.8` = summarize 80% of messages (keep 20%)
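The split in the diagram can be reproduced with list slicing. This sketch assumes the cut point is `percentage * len(messages)` rounded to the nearest message, which may not match Letta's exact rounding, and uses a placeholder string where the summarizer's output would go. `sliding_window_split` is a hypothetical helper.

```python
from typing import List, Tuple


def sliding_window_split(messages: List[str], percentage: float) -> Tuple[List[str], List[str]]:
    """Split into (oldest messages to summarize, recent messages to keep)."""
    cut = round(percentage * len(messages))  # assumption: round to nearest message
    return messages[:cut], messages[cut:]


messages = [f"msg{i}" for i in range(1, 11)]
for pct in (0.2, 0.5, 0.8):
    old, recent = sliding_window_split(messages, pct)
    print(f"{pct}: summarize {len(old)}, keep {len(recent)}")

# Reproduce the diagram: the summary (placeholder text here) replaces the oldest messages.
old, recent = sliding_window_split(messages, 0.3)
after = [f"summary of {', '.join(old)}"] + recent
print(after)
```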

**`all`**: The entire conversation history is summarized in a separate summarizer call. Use when you need maximum space reduction.

**`self_compact_sliding_window`**: Uses the same sliding window strategy as `sliding_window`, but the summarization request includes the agent’s system prompt and tool definitions. The summarization instruction is appended as a user message within the agent’s existing context. This keeps the prompt prefix identical to normal agent requests, improving cache hit rates.

**`self_compact_all`**: Same as `all`, but with the agent’s system prompt and tools included in the request for cache compatibility.

Because self-compaction modes keep the agent’s system prompt and tools at the start of the summarization request, its prefix matches normal agent requests, so the LLM provider can serve that prefix from its prompt cache.
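One way to see why this helps caching is to compare the leading parts of each request. The sketch below models a request as an ordered list of placeholder segments and counts how many leading segments match a normal agent turn. Real providers cache on token prefixes with provider-specific rules, and real requests carry structured messages and tool schemas, so treat this as an illustration only; `shared_prefix_len` is a hypothetical helper.

```python
from typing import List


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Count leading segments two requests have in common (a stand-in for the cacheable prefix)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Placeholder request pieces, not real Letta payloads.
AGENT_SYSTEM = "system: <agent system prompt and memory>"
AGENT_TOOLS = "tools: <agent tool definitions>"
history = ["user: msg1", "assistant: msg2", "user: msg3", "assistant: msg4"]

# A normal agent turn: system prompt, tools, then the conversation.
normal_request = [AGENT_SYSTEM, AGENT_TOOLS, *history, "user: next question"]

# Separate-summarizer modes (sliding_window, all): a different system prompt up front.
separate_request = ["system: <summarizer prompt>", *history]

# Self-compaction modes: same prefix, summarization instruction appended as a user message.
self_compact_request = [AGENT_SYSTEM, AGENT_TOOLS, *history, "user: <summarize the conversation>"]

print(shared_prefix_len(normal_request, separate_request))  # 0
print(shared_prefix_len(normal_request, self_compact_request))  # 6
```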

Self-compaction uses the same provider-specific default models as other modes (see above). You can set an explicit `model` or `prompt` to override the defaults.

### Example: Custom compaction with a separate summarizer


```python
from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

agent = client.agents.create(
    name="my_agent",
    model="anthropic/claude-sonnet-4-6",
    compaction_settings={
        "model": "anthropic/claude-haiku-4-5",  # Cheaper model for summarization
        "mode": "sliding_window",
        "sliding_window_percentage": 0.2,  # Preserve more context (summarize ~20%, keep ~80%)
    },
)
```

```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

const agent = await client.agents.create({
  name: "my_agent",
  model: "anthropic/claude-sonnet-4-6",
  compaction_settings: {
    model: "anthropic/claude-haiku-4-5",  // Cheaper model for summarization
    mode: "sliding_window",
    sliding_window_percentage: 0.2,  // Preserve more context (summarize ~20%, keep ~80%)
  },
});
```

### Example: Self-compaction for prompt caching


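```python
from letta_client import Letta
import os

client = Letta(api_key=os.getenv("LETTA_API_KEY"))

# Enable self-compaction to maximize prefix cache hits
client.agents.update(
    agent_id="agent-xxx",  # Replace with your agent's ID
    compaction_settings={
        "mode": "self_compact_sliding_window",
    },
)
```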

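```typescript
import Letta from "@letta-ai/letta-client";

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

// Enable self-compaction to maximize prefix cache hits
await client.agents.update("agent-xxx", { // Replace with your agent's ID
  compaction_settings: {
    mode: "self_compact_sliding_window",
  },
});
```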