Compact Conversation
Models
CompactionRequest { compaction_settings }
compaction_settings?: CompactionSettings | null
Configuration for conversation compaction / summarization.
model is the only required user-facing field – it specifies the summarizer
model handle (e.g. "openai/gpt-4o-mini"). Per-model settings (temperature,
max tokens, etc.) are derived from the default configuration for that handle.
model: string
Model handle to use for summarization (format: provider/model-name).
clip_chars?: number | null
The maximum length of the summary in characters. If null, no clipping is performed.
mode?: "all" | "sliding_window"
The summarization technique to use.
model_settings?: OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } | AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } | GoogleAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 3 more } | 10 more | null
Optional model settings used to override defaults for the summarizer model.
OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more }
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "openai"
The type of the provider.
reasoning?: Reasoning { reasoning_effort }
The reasoning configuration for the model.
reasoning_effort?: "none" | "minimal" | "low" | 3 more
The reasoning effort to use when generating text with reasoning models.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
strict?: boolean
Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.
temperature?: number
The temperature of the model.
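As a rough illustration of the OpenAI variant above, the sketch below builds a compaction_settings object that overrides a few summarizer defaults. The interfaces are local stand-ins that mirror the documented fields rather than types imported from an SDK, the reasoning_effort union lists only the three values shown in this reference, and the numeric values are arbitrary placeholders.

```typescript
// Local sketch types mirroring the documented fields; not imported from any SDK.
type ResponseFormat =
  | { type?: "text" }
  | { type?: "json_schema"; json_schema: Record<string, unknown> }
  | { type?: "json_object" };

interface OpenAIModelSettings {
  provider_type?: "openai";
  max_output_tokens?: number;
  parallel_tool_calls?: boolean;
  // Only the values visible in the reference; the remaining ones are omitted here.
  reasoning?: { reasoning_effort?: "none" | "minimal" | "low" };
  response_format?: ResponseFormat | null;
  strict?: boolean;
  temperature?: number;
}

interface CompactionSettings {
  model: string; // handle in provider/model-name form
  model_settings?: OpenAIModelSettings | null;
}

const settings: CompactionSettings = {
  model: "openai/gpt-4o-mini",
  model_settings: {
    provider_type: "openai",
    max_output_tokens: 1024, // placeholder cap on summary length
    temperature: 0.2, // placeholder value
    response_format: { type: "text" },
  },
};

// The request body wraps the settings under compaction_settings.
console.log(JSON.stringify({ compaction_settings: settings }, null, 2));
```

Fields left out of model_settings presumably fall back to the defaults for the model handle, as described at the top of this reference.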
AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more }
effort?: "low" | "medium" | "high" | null
Effort level for the Opus 4.5 model (controls token conservation). Leaving this unset gives performance similar to 'high'.
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "anthropic"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
strict?: boolean
Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.
temperature?: number
The temperature of the model.
thinking?: Thinking { budget_tokens, type }
The thinking configuration for the model.
budget_tokens?: number
The maximum number of tokens the model can use for extended thinking.
type?: "enabled" | "disabled"
The type of thinking to use.
verbosity?: "low" | "medium" | "high" | null
Soft control for how verbose model output should be, used for GPT-5 models.
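The thinking field of the Anthropic variant pairs a type switch with an optional token budget. The sketch below shows one way to build that object; anthropicSummarizerSettings is a hypothetical helper, the interfaces are local stand-ins for the documented fields, and the 2048 cap is an arbitrary placeholder.

```typescript
// Local sketch types mirroring the documented fields; not imported from any SDK.
interface AnthropicThinking {
  type?: "enabled" | "disabled";
  budget_tokens?: number;
}

interface AnthropicModelSettings {
  provider_type?: "anthropic";
  effort?: "low" | "medium" | "high" | null;
  max_output_tokens?: number;
  temperature?: number;
  thinking?: AnthropicThinking;
}

// Hypothetical helper: pass null to disable extended thinking,
// or a token budget to enable it.
function anthropicSummarizerSettings(
  budgetTokens: number | null,
): AnthropicModelSettings {
  const thinking: AnthropicThinking =
    budgetTokens === null
      ? { type: "disabled" }
      : { type: "enabled", budget_tokens: budgetTokens };
  return {
    provider_type: "anthropic",
    max_output_tokens: 2048, // placeholder value
    thinking,
  };
}

console.log(JSON.stringify(anthropicSummarizerSettings(1024)));
console.log(JSON.stringify(anthropicSummarizerSettings(null)));
```

The helper omits budget_tokens when thinking is disabled, so a stale budget is never sent alongside a disabled switch.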
GoogleAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "google_ai"
The type of the provider.
response_schema?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response schema for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
thinking_config?: ThinkingConfig { include_thoughts, thinking_budget }
The thinking configuration for the model.
include_thoughts?: boolean
Whether to include thoughts in the model's response.
thinking_budget?: number
The thinking budget for the model.
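The Google AI variant differs from the OpenAI-style variants in two field names: the output format lives under response_schema rather than response_format, and the thinking options live under thinking_config. A minimal sketch of such an object follows, using local stand-in types and placeholder values.

```typescript
// Local sketch types mirroring the documented fields; not imported from any SDK.
type ResponseFormat =
  | { type?: "text" }
  | { type?: "json_schema"; json_schema: Record<string, unknown> }
  | { type?: "json_object" };

interface GoogleAIModelSettings {
  provider_type?: "google_ai";
  max_output_tokens?: number;
  parallel_tool_calls?: boolean;
  response_schema?: ResponseFormat | null; // note: not response_format
  temperature?: number;
  thinking_config?: {
    include_thoughts?: boolean;
    thinking_budget?: number;
  };
}

const googleSettings: GoogleAIModelSettings = {
  provider_type: "google_ai",
  max_output_tokens: 1024, // placeholder value
  response_schema: { type: "text" },
  thinking_config: {
    include_thoughts: false, // keep the model's thoughts out of the response
    thinking_budget: 512, // placeholder value
  },
};

console.log(JSON.stringify(googleSettings, null, 2));
```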
GoogleVertexModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "google_vertex"
The type of the provider.
response_schema?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response schema for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
thinking_config?: ThinkingConfig { include_thoughts, thinking_budget }
The thinking configuration for the model.
include_thoughts?: boolean
Whether to include thoughts in the model's response.
thinking_budget?: number
The thinking budget for the model.
AzureModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Azure OpenAI model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "azure"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
XaiModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
xAI model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "xai"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
ZaiModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
Z.ai (ZhipuAI) model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "zai"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
thinking?: Thinking { clear_thinking, type }
The thinking configuration for GLM-4.5+ models.
clear_thinking?: boolean
If false, preserved thinking is used (recommended for agents).
type?: "enabled" | "disabled"
Whether thinking is enabled or disabled.
GroqModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Groq model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "groq"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
DeepseekModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
DeepSeek model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "deepseek"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
TogetherModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
Together AI model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "together"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
BedrockModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
AWS Bedrock model configuration.
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "bedrock"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
OpenRouterModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
OpenRouter model configuration (OpenAI-compatible).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "openrouter"
The type of the provider.
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null
The response format for the model.
TextResponseFormat { type }
Response format for plain text responses.
type?: "text"
The type of the response format.
JsonSchemaResponseFormat { json_schema, type }
Response format for JSON schema-based responses.
json_schema: Record<string, unknown>
The JSON schema of the response.
type?: "json_schema"
The type of the response format.
JsonObjectResponseFormat { type }
Response format for JSON object responses.
type?: "json_object"
The type of the response format.
temperature?: number
The temperature of the model.
ChatGptoAuthModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 2 more }
ChatGPT OAuth model configuration (uses ChatGPT backend API).
max_output_tokens?: number
The maximum number of tokens the model can generate.
parallel_tool_calls?: boolean
Whether to enable parallel tool calling.
provider_type?: "chatgpt_oauth"
The type of the provider.
reasoning?: Reasoning { reasoning_effort }
The reasoning configuration for the model.
reasoning_effort?: "none" | "low" | "medium" | 2 more
The reasoning effort level for GPT-5.x and o-series models.
temperature?: number
The temperature of the model.
prompt?: string
The prompt to use for summarization.
prompt_acknowledgement?: boolean
Whether to include an acknowledgement post-prompt (helps prevent non-summary outputs).
sliding_window_percentage?: number
The percentage of the context window to keep post-summarization (only used in sliding window mode).
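To tie the top-level fields together, the sketch below assembles a CompactionRequest body and only sets sliding_window_percentage when mode is "sliding_window", since the reference says it is ignored otherwise. buildCompactionRequest is a hypothetical helper and the interfaces are local stand-ins. The value 0.3 assumes the percentage is expressed as a fraction, which this reference does not state.

```typescript
// Local sketch types mirroring the documented fields; not imported from any SDK.
type CompactionMode = "all" | "sliding_window";

interface CompactionSettings {
  model: string;
  clip_chars?: number | null;
  mode?: CompactionMode;
  prompt?: string;
  prompt_acknowledgement?: boolean;
  sliding_window_percentage?: number;
}

interface CompactionRequest {
  compaction_settings?: CompactionSettings | null;
}

// Hypothetical helper: builds the request body and drops the window
// percentage unless sliding window mode is selected.
function buildCompactionRequest(
  model: string,
  mode: CompactionMode,
  keep?: number,
): CompactionRequest {
  const settings: CompactionSettings = { model, mode };
  if (mode === "sliding_window" && keep !== undefined) {
    settings.sliding_window_percentage = keep;
  }
  return { compaction_settings: settings };
}

// ASSUMPTION: 0.3 is read as "keep 30% of the context window".
const request = buildCompactionRequest("openai/gpt-4o-mini", "sliding_window", 0.3);
console.log(JSON.stringify(request, null, 2));
```

Optional fields such as prompt, prompt_acknowledgement, and clip_chars can be added to the settings object in the same way before the body is serialized.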