Summarize Messages

post/v1/agents/{agent_id}/summarize

Summarize an agent's conversation history.

Path ParametersExpand Collapse

agent_id: string

The ID of the agent in the format 'agent-'

minLength42

maxLength42

Body ParametersExpand Collapse

compaction_settings: optional object { clip_chars, mode, model, 4 more }

Configuration for conversation compaction / summarization.

Per-model settings (temperature, max tokens, etc.) are derived from the default configuration for that handle.

clip_chars: optional number

The maximum length of the summary in characters. If none, no clipping is performed.

mode: optional "all" or "sliding_window" or "self_compact_all" or "self_compact_sliding_window"

The type of summarization technique use.

Accepts one of the following:

"all"

"sliding_window"

"self_compact_all"

"self_compact_sliding_window"

model: optional string

Model handle to use for sliding_window/all summarization (format: provider/model-name). If None, uses lightweight provider-specific defaults.

model_settings: optional OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } or AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } or GoogleAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 3 more } or 10 more

Optional model settings used to override defaults for the summarizer model.

Accepts one of the following:

OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 4 more }

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openai"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text reasoning models

Accepts one of the following:

"none"

"minimal"

"low"

"medium"

"high"

"xhigh"

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 6 more }

effort: optional "low" or "medium" or "high"

Effort level for Opus 4.5 model (controls token conservation). Not setting this gives similar performance to 'high'.

Accepts one of the following:

"low"

"medium"

"high"

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "anthropic"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

thinking: optional object { budget_tokens, type }

The thinking configuration for the model.

budget_tokens: optional number

The maximum number of tokens the model can use for extended thinking.

type: optional "enabled" or "disabled"

The type of thinking to use.

Accepts one of the following:

"enabled"

"disabled"

verbosity: optional "low" or "medium" or "high"

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:

"low"

"medium"

"high"

GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_ai"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_vertex"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Azure OpenAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "azure"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

xAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "xai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

Z.ai (ZhipuAI) model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "zai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking: optional object { clear_thinking, type }

The thinking configuration for GLM-4.5+ models.

clear_thinking: optional boolean

If False, preserved thinking is used (recommended for agents).

type: optional "enabled" or "disabled"

Whether thinking is enabled or disabled.

Accepts one of the following:

"enabled"

"disabled"

GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Groq model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "groq"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Deepseek model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "deepseek"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Together AI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "together"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

AWS Bedrock model configuration.

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "bedrock"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Openrouter = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

OpenRouter model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openrouter"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:

TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

ChatgptOAuth = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

ChatGPT OAuth model configuration (uses ChatGPT backend API).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "chatgpt_oauth"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "low" or "medium" or 2 more

The reasoning effort level for GPT-5.x and o-series models.

Accepts one of the following:

"none"

"low"

"medium"

"high"

"xhigh"

temperature: optional number

The temperature of the model.

prompt: optional string

The prompt to use for summarization. If None, uses mode-specific default.

prompt_acknowledgement: optional boolean

Whether to include an acknowledgement post-prompt (helps prevent non-summary outputs).

sliding_window_percentage: optional number

The percentage of the context window to keep post-summarization (only used in sliding window modes).

ReturnsExpand Collapse

CompactionResponse = object { num_messages_after, num_messages_before, summary }

num_messages_after: number

num_messages_before: number

summary: string

Summarize Messages

curl https://api.letta.com/v1/agents/$AGENT_ID/summarize \
    -X POST \
    -H "Authorization: Bearer $LETTA_API_KEY"

{
  "num_messages_after": 0,
  "num_messages_before": 0,
  "summary": "summary"
}

Returns Examples

{
  "num_messages_after": 0,
  "num_messages_before": 0,
  "summary": "summary"
}