
Messages

List Conversation Messages
conversations.messages.list(conversation_id: str, **kwargs: MessageListParams) -> SyncArrayPage[Message]
GET /v1/conversations/{conversation_id}/messages
Send Conversation Message
conversations.messages.create(conversation_id: str, **kwargs: MessageCreateParams) -> LettaResponse
POST /v1/conversations/{conversation_id}/messages
Retrieve Conversation Stream
conversations.messages.stream(conversation_id: str, **kwargs: MessageStreamParams) -> object
POST /v1/conversations/{conversation_id}/stream
Compact Conversation
conversations.messages.compact(conversation_id: str, **kwargs: MessageCompactParams) -> CompactionResponse
POST /v1/conversations/{conversation_id}/compact
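
Together, these four endpoints cover the read, write, stream, and compact lifecycle of a conversation. The sketch below shows one possible call sequence with the Python client; the package import, `Letta` constructor arguments, and message payload shape are assumptions based on the signatures above, not verbatim from this page.

```python
import os

from letta_client import Letta  # assumed package/client name; match your installed SDK

client = Letta(api_key=os.environ["LETTA_API_KEY"])  # constructor args are an assumption
conversation_id = "conv_123"  # hypothetical conversation ID

# POST /v1/conversations/{conversation_id}/messages
response = client.conversations.messages.create(
    conversation_id,
    messages=[{"role": "user", "content": "What did we decide yesterday?"}],  # assumed payload shape
)

# GET /v1/conversations/{conversation_id}/messages (paginated)
for message in client.conversations.messages.list(conversation_id):
    print(message)

# POST /v1/conversations/{conversation_id}/stream
for chunk in client.conversations.messages.stream(conversation_id):
    print(chunk)

# POST /v1/conversations/{conversation_id}/compact
result = client.conversations.messages.compact(conversation_id)
print(result.summary)
```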
Models
class CompactionRequest:
compaction_settings: Optional[CompactionSettings]

Configuration for conversation compaction / summarization.

model is the only required user-facing field: it specifies the summarizer model handle (e.g. "openai/gpt-4o-mini"). Per-model settings (temperature, max tokens, etc.) are derived from the default configuration for that handle.

model: str

Model handle to use for summarization (format: provider/model-name).

clip_chars: Optional[int]

The maximum length of the summary in characters. If None, no clipping is performed.

mode: Optional[Literal["all", "sliding_window"]]

The type of summarization technique to use.

Accepts one of the following:
"all"
"sliding_window"
model_settings: Optional[CompactionSettingsModelSettings]

Optional model settings used to override defaults for the summarizer model.

Accepts one of the following:
class OpenAIModelSettings:
max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["openai"]]

The type of the provider.

reasoning: Optional[Reasoning]

The reasoning configuration for the model.

reasoning_effort: Optional[Literal["none", "minimal", "low", "medium", "high", "xhigh"]]

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

strict: Optional[bool]

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: Optional[float]

The temperature of the model.
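
For OpenAI handles, these settings can be supplied under model_settings to override the handle's defaults. A sketch of the dict form, assuming keys mirror the fields above; the handle and JSON schema are illustrative:

```python
compaction_settings = {
    "model": "openai/gpt-4o-mini",
    "model_settings": {
        "provider_type": "openai",
        "temperature": 0.2,           # low temperature for stable summaries
        "max_output_tokens": 1024,
        "reasoning_effort": "low",
        "response_format": {
            "type": "json_schema",
            "json_schema": {          # illustrative schema, not from this page
                "type": "object",
                "properties": {"summary": {"type": "string"}},
                "required": ["summary"],
            },
        },
    },
}

result = client.conversations.messages.compact("conv_123", compaction_settings=compaction_settings)
```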

class AnthropicModelSettings:
effort: Optional[Literal["low", "medium", "high"]]

Effort level for the Opus 4.5 model (controls token conservation). Leaving this unset gives performance similar to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["anthropic"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

strict: Optional[bool]

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: Optional[float]

The temperature of the model.

thinking: Optional[Thinking]

The thinking configuration for the model.

budget_tokens: Optional[int]

The maximum number of tokens the model can use for extended thinking.

type: Optional[Literal["enabled", "disabled"]]

The type of thinking to use.

Accepts one of the following:
"enabled"
"disabled"
verbosity: Optional[Literal["low", "medium", "high"]]

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
class GoogleAIModelSettings:
max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["google_ai"]]

The type of the provider.

response_schema: Optional[ResponseSchema]

The response schema for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

thinking_config: Optional[ThinkingConfig]

The thinking configuration for the model.

include_thoughts: Optional[bool]

Whether to include thoughts in the model's response.

thinking_budget: Optional[int]

The thinking budget for the model.
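
For Google AI handles, thinking is controlled through thinking_config. A sketch with a hypothetical handle; GoogleVertexModelSettings below accepts the same shape with provider_type "google_vertex":

```python
compaction_settings = {
    "model": "google_ai/gemini-2.0-flash",  # hypothetical handle
    "model_settings": {
        "provider_type": "google_ai",
        "temperature": 0.3,
        "thinking_config": {
            "include_thoughts": False,  # keep thoughts out of the summary output
            "thinking_budget": 512,
        },
    },
}
```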

class GoogleVertexModelSettings:
max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["google_vertex"]]

The type of the provider.

response_schema: Optional[ResponseSchema]

The response schema for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

thinking_config: Optional[ThinkingConfig]

The thinking configuration for the model.

include_thoughts: Optional[bool]

Whether to include thoughts in the model's response.

thinking_budget: Optional[int]

The thinking budget for the model.

class AzureModelSettings:

Azure OpenAI model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["azure"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class XaiModelSettings:

xAI model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["xai"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class CompactionSettingsModelSettingsZaiModelSettings:

Z.ai (ZhipuAI) model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["zai"]]

The type of the provider.

response_format: Optional[CompactionSettingsModelSettingsZaiModelSettingsResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

thinking: Optional[CompactionSettingsModelSettingsZaiModelSettingsThinking]

The thinking configuration for GLM-4.5+ models.

clear_thinking: Optional[bool]

If False, preserved thinking is used (recommended for agents).

type: Optional[Literal["enabled", "disabled"]]

Whether thinking is enabled or disabled.

Accepts one of the following:
"enabled"
"disabled"
class GroqModelSettings:

Groq model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["groq"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class DeepseekModelSettings:

Deepseek model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["deepseek"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class TogetherModelSettings:

Together AI model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["together"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class BedrockModelSettings:

AWS Bedrock model configuration.

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["bedrock"]]

The type of the provider.

response_format: Optional[ResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class CompactionSettingsModelSettingsOpenRouterModelSettings:

OpenRouter model configuration (OpenAI-compatible).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["openrouter"]]

The type of the provider.

response_format: Optional[CompactionSettingsModelSettingsOpenRouterModelSettingsResponseFormat]

The response format for the model.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

temperature: Optional[float]

The temperature of the model.

class CompactionSettingsModelSettingsChatGptoAuthModelSettings:

ChatGPT OAuth model configuration (uses ChatGPT backend API).

max_output_tokens: Optional[int]

The maximum number of tokens the model can generate.

parallel_tool_calls: Optional[bool]

Whether to enable parallel tool calling.

provider_type: Optional[Literal["chatgpt_oauth"]]

The type of the provider.

reasoning: Optional[CompactionSettingsModelSettingsChatGptoAuthModelSettingsReasoning]

The reasoning configuration for the model.

reasoning_effort: Optional[Literal["none", "low", "medium", "high", "xhigh"]]

The reasoning effort level for GPT-5.x and o-series models.

Accepts one of the following:
"none"
"low"
"medium"
"high"
"xhigh"
temperature: Optional[float]

The temperature of the model.
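
For the ChatGPT OAuth provider, reasoning effort is set on the nested reasoning object rather than as a top-level field. A sketch with a hypothetical handle:

```python
compaction_settings = {
    "model": "chatgpt_oauth/gpt-5",  # hypothetical handle
    "model_settings": {
        "provider_type": "chatgpt_oauth",
        "reasoning": {"reasoning_effort": "low"},
    },
}
```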

prompt: Optional[str]

The prompt to use for summarization.

prompt_acknowledgement: Optional[bool]

Whether to include an acknowledgement post-prompt (helps prevent non-summary outputs).

sliding_window_percentage: Optional[float]

The percentage of the context window to keep post-summarization (only used in sliding window mode).
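
Putting the top-level fields together, a sliding-window compaction that keeps roughly 30% of the context window and caps the summary length might look like the following sketch (all values are illustrative):

```python
result = client.conversations.messages.compact(
    "conv_123",  # hypothetical conversation ID
    compaction_settings={
        "model": "openai/gpt-4o-mini",
        "mode": "sliding_window",
        "sliding_window_percentage": 0.3,  # keep ~30% of the context window
        "clip_chars": 4000,                # hard cap on summary length
        "prompt": "Summarize the conversation so far, keeping key decisions.",
        "prompt_acknowledgement": True,    # guard against non-summary outputs
    },
)
```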

class CompactionResponse:
num_messages_after: int

The number of messages in the conversation after compaction.

num_messages_before: int

The number of messages in the conversation before compaction.

summary: str

The summary generated for the compacted messages.
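
Continuing the sketches above, the response fields can be read directly:

```python
print(f"Messages: {result.num_messages_before} -> {result.num_messages_after}")
print(result.summary)
```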