Summarize Messages
Summarize an agent's conversation history.
Path Parameters
agent_id: string
The ID of the agent in the format 'agent-<UUID>'.
Body Parameters
compaction_settings: optional object { model, clip_chars, mode, model_settings, prompt, prompt_acknowledgement, sliding_window_percentage }
Configuration for conversation compaction / summarization. model is the only required user-facing field; it specifies the summarizer model handle (e.g. "openai/gpt-4o-mini"). Per-model settings (temperature, max tokens, etc.) are derived from the default configuration for that handle. A full request sketch appears after this parameter list.
model: string
Model handle to use for summarization (format: provider/model-name).
clip_chars: optional number
The maximum length of the summary in characters. If not set, no clipping is performed.
mode: optional "all" or "sliding_window"
The type of summarization technique to use.
model_settings: optional OpenAIModelSettings or AnthropicModelSettings or GoogleAIModelSettings or GoogleVertexModelSettings or AzureModelSettings or XaiModelSettings or Zai or GroqModelSettings or DeepseekModelSettings or TogetherModelSettings or BedrockModelSettings
Optional model settings used to override defaults for the summarizer model.
OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "openai"
The type of the provider.
reasoning: optional object { reasoning_effort }
The reasoning configuration for the model.
reasoning_effort: optional "none" or "minimal" or "low" or 3 more
The reasoning effort to use when generating text with reasoning models.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
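For illustration, an OpenAI model_settings override combining a reasoning effort and a JSON-schema response format might look like the following sketch (the values shown are illustrative, not defaults):
{
  "provider_type": "openai",
  "max_output_tokens": 512,
  "temperature": 0.2,
  "reasoning": { "reasoning_effort": "low" },
  "response_format": {
    "type": "json_schema",
    "json_schema": { "type": "object", "properties": { "summary": { "type": "string" } } }
  }
}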
AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 5 more }
effort: optional "low" or "medium" or "high"
Effort level for the Opus 4.5 model (controls token conservation). Leaving this unset gives performance similar to 'high'.
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "anthropic"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
thinking: optional object { budget_tokens, type }
The thinking configuration for the model.
budget_tokens: optional number
The maximum number of tokens the model can use for extended thinking.
type: optional "enabled" or "disabled"
The type of thinking to use.
verbosity: optional "low" or "medium" or "high"
Soft control for how verbose model output should be, used for GPT-5 models.
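Similarly, an Anthropic model_settings override might enable extended thinking (values illustrative):
{
  "provider_type": "anthropic",
  "max_output_tokens": 1024,
  "temperature": 0.3,
  "thinking": { "type": "enabled", "budget_tokens": 2048 }
}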
GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "google_ai"
The type of the provider.
response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response schema for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
thinking_config: optional object { include_thoughts, thinking_budget }
The thinking configuration for the model.
include_thoughts: optional boolean
Whether to include thoughts in the model's response.
thinking_budget: optional number
The thinking budget for the model.
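A Google AI model_settings override uses thinking_config and response_schema instead (values illustrative):
{
  "provider_type": "google_ai",
  "max_output_tokens": 1024,
  "thinking_config": { "include_thoughts": false, "thinking_budget": 1024 },
  "response_schema": { "type": "text" }
}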
GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "google_vertex"
The type of the provider.
response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response schema for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
thinking_config: optional object { include_thoughts, thinking_budget }
The thinking configuration for the model, with the same fields as thinking_config on GoogleAIModelSettings above.
AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Azure OpenAI model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "azure"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
xAI model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "xai"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Z.ai (ZhipuAI) model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "zai"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Groq model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "groq"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Deepseek model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "deepseek"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Together AI model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "together"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
AWS Bedrock model configuration.
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "bedrock"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model. The variants are the same as those defined under OpenAIModelSettings above.
temperature: optional number
The temperature of the model.
prompt: optional string
The prompt to use for summarization.
prompt_acknowledgement: optional boolean
Whether to include an acknowledgement post-prompt (helps prevent non-summary outputs).
sliding_window_percentage: optional number
The percentage of the context window to keep post-summarization (only used in sliding window mode).
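Putting these together, a request that selects a summarizer model and sliding-window compaction might look like the following sketch (field values are illustrative; sliding_window_percentage is assumed here to be expressed as a fraction of the context window):
curl https://api.letta.com/v1/agents/$AGENT_ID/summarize \
  -X POST \
  -H "Authorization: Bearer $LETTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "compaction_settings": {
          "model": "openai/gpt-4o-mini",
          "mode": "sliding_window",
          "sliding_window_percentage": 0.3,
          "clip_chars": 4000
        }
      }'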
Returns
A summarization result containing num_messages_before, num_messages_after, and summary.
Example request
curl https://api.letta.com/v1/agents/$AGENT_ID/summarize \
  -X POST \
  -H "Authorization: Bearer $LETTA_API_KEY"
Example response
{
  "num_messages_after": 0,
  "num_messages_before": 0,
  "summary": "summary"
}