Compact Conversation
Compact (summarize) a conversation's message history.
This endpoint summarizes the in-context messages for a specific conversation, reducing the message count while preserving important context.
Agent-direct mode: Pass conversation_id="default" with agent_id in request body to compact the agent's default conversation messages.
Deprecated: Passing an agent ID as conversation_id still works but will be removed.
Path ParametersExpand Collapse
conversation_id: string
The conversation identifier. Can be a conversation ID ('conv-
Body ParametersExpand Collapse
agent_id: optional string
Agent ID for agent-direct mode with 'default' conversation. Use with conversation_id='default' in the URL path.
compaction_settings: optional object { clip_chars, mode, model, 4 more }
Configuration for conversation compaction / summarization.
Per-model settings (temperature, max tokens, etc.) are derived from the default configuration for that handle.
clip_chars: optional number
The maximum length of the summary in characters. If none, no clipping is performed.
mode: optional "all" or "sliding_window" or "self_compact_all" or "self_compact_sliding_window"
The type of summarization technique use.
model: optional string
Model handle to use for sliding_window/all summarization (format: provider/model-name). If None, uses lightweight provider-specific defaults.
model_settings: optional OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } or AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } or GoogleAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 3 more } or 10 more
Optional model settings used to override defaults for the summarizer model.
OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 4 more }
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "openai"
The type of the provider.
reasoning: optional object { reasoning_effort }
The reasoning configuration for the model.
reasoning_effort: optional "none" or "minimal" or "low" or 3 more
The reasoning effort to use when generating text reasoning models
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
strict: optional boolean
Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.
temperature: optional number
The temperature of the model.
AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 6 more }
effort: optional "low" or "medium" or "high" or "max"
Effort level for supported Anthropic models (controls token spending). 'max' is only available on Opus 4.6. Not setting this gives similar performance to 'high'.
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "anthropic"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
strict: optional boolean
Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.
temperature: optional number
The temperature of the model.
thinking: optional object { budget_tokens, type }
The thinking configuration for the model.
budget_tokens: optional number
The maximum number of tokens the model can use for extended thinking.
type: optional "enabled" or "disabled"
The type of thinking to use.
verbosity: optional "low" or "medium" or "high"
Soft control for how verbose model output should be, used for GPT-5 models.
GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "google_ai"
The type of the provider.
response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response schema for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
thinking_config: optional object { include_thoughts, thinking_budget }
The thinking configuration for the model.
include_thoughts: optional boolean
Whether to include thoughts in the model's response.
thinking_budget: optional number
The thinking budget for the model.
GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "google_vertex"
The type of the provider.
response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response schema for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
thinking_config: optional object { include_thoughts, thinking_budget }
The thinking configuration for the model.
include_thoughts: optional boolean
Whether to include thoughts in the model's response.
thinking_budget: optional number
The thinking budget for the model.
AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Azure OpenAI model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "azure"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
xAI model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "xai"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
Z.ai (ZhipuAI) model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "zai"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
thinking: optional object { clear_thinking, type }
The thinking configuration for GLM-4.5+ models.
clear_thinking: optional boolean
If False, preserved thinking is used (recommended for agents).
type: optional "enabled" or "disabled"
Whether thinking is enabled or disabled.
GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Groq model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "groq"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Deepseek model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "deepseek"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Together AI model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "together"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
AWS Bedrock model configuration.
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "bedrock"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
Openrouter = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
OpenRouter model configuration (OpenAI-compatible).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "openrouter"
The type of the provider.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
temperature: optional number
The temperature of the model.
ChatgptOAuth = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
ChatGPT OAuth model configuration (uses ChatGPT backend API).
max_output_tokens: optional number
The maximum number of tokens the model can generate.
parallel_tool_calls: optional boolean
Whether to enable parallel tool calling.
provider_type: optional "chatgpt_oauth"
The type of the provider.
reasoning: optional object { reasoning_effort }
The reasoning configuration for the model.
reasoning_effort: optional "none" or "low" or "medium" or 2 more
The reasoning effort level for GPT-5.x and o-series models.
temperature: optional number
The temperature of the model.
prompt: optional string
The prompt to use for summarization. If None, uses mode-specific default.
prompt_acknowledgement: optional boolean
Whether to include an acknowledgement post-prompt (helps prevent non-summary outputs).
sliding_window_percentage: optional number
The percentage of the context window to keep post-summarization (only used in sliding window modes).
ReturnsExpand Collapse
CompactionResponse = object { num_messages_after, num_messages_before, summary }
Compact Conversation
curl https://api.letta.com/v1/conversations/$CONVERSATION_ID/compact \
-X POST \
-H "Authorization: Bearer $LETTA_API_KEY"
{
"num_messages_after": 0,
"num_messages_before": 0,
"summary": "summary"
}
Returns Examples
{
"num_messages_after": 0,
"num_messages_before": 0,
"summary": "summary"
}