
Conversations

Create Conversation
POST /v1/conversations/
List Conversations
GET /v1/conversations/
Retrieve Conversation
GET /v1/conversations/{conversation_id}
Update Conversation
PATCH /v1/conversations/{conversation_id}
Delete Conversation
DELETE /v1/conversations/{conversation_id}
Cancel Conversation
POST /v1/conversations/{conversation_id}/cancel
Recompile Conversation
POST /v1/conversations/{conversation_id}/recompile
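
The seven endpoints above share one path prefix, so a client can derive each request from an operation name and an optional conversation ID. The following is a minimal sketch using only the Python standard library. The `BASE_URL` placeholder, the `build_request` helper, and the bearer-token header are assumptions made for illustration and are not defined by this reference; the request is built but never sent.

```python
from urllib.request import Request

# Assumption: placeholder base URL; substitute the address of your own deployment.
BASE_URL = "https://example.invalid"

# Operation name -> (HTTP method, path template), copied from the endpoint list.
ENDPOINTS = {
    "create": ("POST", "/v1/conversations/"),
    "list": ("GET", "/v1/conversations/"),
    "retrieve": ("GET", "/v1/conversations/{conversation_id}"),
    "update": ("PATCH", "/v1/conversations/{conversation_id}"),
    "delete": ("DELETE", "/v1/conversations/{conversation_id}"),
    "cancel": ("POST", "/v1/conversations/{conversation_id}/cancel"),
    "recompile": ("POST", "/v1/conversations/{conversation_id}/recompile"),
}


def build_request(operation, conversation_id=None, token="YOUR_API_KEY"):
    """Return an unsent Request for the named operation (hypothetical helper)."""
    method, template = ENDPOINTS[operation]
    if "{conversation_id}" in template:
        if not conversation_id:
            raise ValueError(f"{operation!r} requires a conversation_id")
        path = template.format(conversation_id=conversation_id)
    else:
        path = template
    # Assumption: bearer-token authentication; check your deployment's scheme.
    headers = {"Authorization": f"Bearer {token}"}
    return Request(BASE_URL + path, method=method, headers=headers)


# "conv-123" is a made-up ID used only to show the path substitution.
req = build_request("cancel", conversation_id="conv-123")
print(req.get_method(), req.full_url)
```

Keeping the method and path in one table means a new endpoint needs a single new entry rather than a new function.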
Models
Conversation = object { id, agent_id, created_at, 8 more }

Represents a conversation on an agent for concurrent messaging.

id: string

The unique identifier of the conversation.

agent_id: string

The ID of the agent this conversation belongs to.

created_at: optional string

The timestamp when the object was created.

format: date-time
created_by_id: optional string

The ID of the user who created this object.

in_context_message_ids: optional array of string

The IDs of in-context messages for the conversation.

isolated_block_ids: optional array of string

IDs of blocks that are isolated (specific to this conversation, overriding agent defaults).

last_updated_by_id: optional string

The ID of the user who last updated this object.

model: optional string

The model handle for this conversation (overrides agent's model). Format: provider/model-name.
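
A model handle in the `provider/model-name` format can be checked before it is sent. The sketch below splits on the first slash only, so a model name that itself contains a slash stays intact. The helper name `split_model_handle` and the example handle are illustrative assumptions, not values taken from this reference.

```python
def split_model_handle(handle):
    """Split 'provider/model-name' into (provider, model_name).

    Hypothetical helper: raises ValueError when either part is missing.
    """
    provider, sep, model_name = handle.partition("/")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected 'provider/model-name', got {handle!r}")
    return provider, model_name


# Illustrative handle only; use a handle your deployment actually offers.
provider, model_name = split_model_handle("openai/gpt-4o-mini")
print(provider, model_name)
```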

model_settings: optional OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } or object { max_output_tokens, parallel_tool_calls, provider_type, 5 more } or AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } or 11 more

The model settings for this conversation (overrides agent's model settings).

Accepts one of the following:
OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 4 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openai"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.
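
As a rough illustration of how the OpenAIModelSettings fields fit together, the sketch below assembles a settings dictionary, validates `reasoning_effort` against the six documented values, and nests a JSON schema under `response_format`. The `openai_settings` helper is hypothetical, and the contents of the example schema are an assumption, since this reference types `json_schema` only as `map[unknown]`.

```python
import json

# The six values listed for reasoning_effort in this reference.
REASONING_EFFORTS = {"none", "minimal", "low", "medium", "high", "xhigh"}


def openai_settings(reasoning_effort=None, json_schema=None, **fields):
    """Build an OpenAIModelSettings-shaped dict (hypothetical helper)."""
    settings = {"provider_type": "openai", **fields}
    if reasoning_effort is not None:
        if reasoning_effort not in REASONING_EFFORTS:
            raise ValueError(f"unsupported reasoning_effort: {reasoning_effort!r}")
        settings["reasoning"] = {"reasoning_effort": reasoning_effort}
    if json_schema is not None:
        # JsonSchemaResponseFormat: the schema plus the "json_schema" type tag.
        settings["response_format"] = {
            "type": "json_schema",
            "json_schema": json_schema,
        }
    return settings


settings = openai_settings(
    reasoning_effort="low",
    # Assumption: an illustrative schema; the expected inner shape is not specified here.
    json_schema={"type": "object", "properties": {"answer": {"type": "string"}}},
    max_output_tokens=1024,
    temperature=0.2,
    strict=True,
)
print(json.dumps(settings, indent=2))
```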

Sglang = object { max_output_tokens, parallel_tool_calls, provider_type, 5 more }

SGLang model configuration (OpenAI-compatible runtime with SGLang-specific parsing).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "sglang"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

tool_call_parser: optional string

SGLang tool call parser name (for example 'glm47', 'qwen25', or 'hermes').

AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 6 more }
effort: optional "low" or "medium" or "high" or "max"

Effort level for supported Anthropic models (controls token spending). 'max' is only available on Opus 4.6. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "anthropic"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

thinking: optional object { budget_tokens, type }

The thinking configuration for the model.

budget_tokens: optional number

The maximum number of tokens the model can use for extended thinking.

type: optional "enabled" or "disabled"

The type of thinking to use.

Accepts one of the following:
"enabled"
"disabled"
verbosity: optional "low" or "medium" or "high"

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
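
The `effort` and `thinking` fields of AnthropicModelSettings can be combined as in the sketch below. The `anthropic_settings` helper is hypothetical. The check that `budget_tokens` stays below `max_output_tokens` mirrors a constraint in Anthropic's own API and is an assumption here; this reference does not say whether the server enforces it.

```python
# The four values listed for effort in this reference.
EFFORTS = {"low", "medium", "high", "max"}


def anthropic_settings(max_output_tokens, budget_tokens=None, effort=None):
    """Build an AnthropicModelSettings-shaped dict (hypothetical helper)."""
    settings = {
        "provider_type": "anthropic",
        "max_output_tokens": max_output_tokens,
    }
    if effort is not None:
        if effort not in EFFORTS:
            raise ValueError(f"unsupported effort: {effort!r}")
        settings["effort"] = effort
    if budget_tokens is None:
        # No budget given: explicitly turn extended thinking off.
        settings["thinking"] = {"type": "disabled"}
    else:
        # Assumption: the thinking budget must leave room inside the output limit.
        if budget_tokens >= max_output_tokens:
            raise ValueError("budget_tokens must be less than max_output_tokens")
        settings["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    return settings


settings = anthropic_settings(4096, budget_tokens=1024, effort="high")
print(settings)
```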
GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_ai"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_vertex"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Azure OpenAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "azure"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

xAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "xai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

Z.ai (ZhipuAI) model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "zai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking: optional object { clear_thinking, type }

The thinking configuration for GLM-4.5+ models.

clear_thinking: optional boolean

If false, preserved thinking is used (recommended for agents).

type: optional "enabled" or "disabled"

Whether thinking is enabled or disabled.

Accepts one of the following:
"enabled"
"disabled"
GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Groq model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "groq"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

DeepSeek model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "deepseek"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Together AI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "together"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

AWS Bedrock model configuration.

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "bedrock"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Openrouter = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

OpenRouter model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openrouter"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

ChatgptOAuth = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

ChatGPT OAuth model configuration (uses ChatGPT backend API).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "chatgpt_oauth"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "low" or "medium" or 2 more

The reasoning effort level for GPT-5.x and o-series models.

Accepts one of the following:
"none"
"low"
"medium"
"high"
"xhigh"
temperature: optional number

The temperature of the model.

summary: optional string

A summary of the conversation.

updated_at: optional string

The timestamp when the object was last updated.

format: date-time
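
A client might map a Conversation response onto a typed object, treating every field except `id` and `agent_id` as optional and parsing the two date-time strings. The sketch below is one possible shape under those assumptions; the `Conversation` dataclass, the `parse_timestamp` helper, and the sample IDs are illustrative and not part of any official SDK. It covers only a subset of the fields.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


def parse_timestamp(value):
    """Parse a date-time string, accepting a trailing 'Z' for UTC."""
    if value is None:
        return None
    # Older Python versions reject "Z" in fromisoformat, so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Conversation:
    id: str
    agent_id: str
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    model: Optional[str] = None
    summary: Optional[str] = None
    in_context_message_ids: List[str] = field(default_factory=list)
    isolated_block_ids: List[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, data):
        """Build a Conversation from a decoded JSON object."""
        return cls(
            id=data["id"],
            agent_id=data["agent_id"],
            created_at=parse_timestamp(data.get("created_at")),
            updated_at=parse_timestamp(data.get("updated_at")),
            model=data.get("model"),
            summary=data.get("summary"),
            in_context_message_ids=data.get("in_context_message_ids") or [],
            isolated_block_ids=data.get("isolated_block_ids") or [],
        )


# Made-up sample payload for illustration only.
sample = {
    "id": "conv-123",
    "agent_id": "agent-456",
    "created_at": "2025-01-01T12:00:00Z",
    "model": "openai/gpt-4o-mini",
}
conv = Conversation.from_dict(sample)
print(conv.id, conv.created_at.isoformat())
```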
CreateConversation = object { isolated_block_labels, model, model_settings, summary }

Request model for creating a new conversation.

isolated_block_labels: optional array of string

List of block labels that should be isolated (conversation-specific) rather than shared across conversations. New blocks will be created as copies of the agent's blocks with these labels.

model: optional string

The model handle for this conversation (overrides agent's model). Format: provider/model-name.
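
All four CreateConversation fields are optional, so a request body can contain any subset of them. The sketch below builds such a body, rejects field names outside the four listed in the model signature, and drops fields left unset. The `create_conversation_body` helper, the block label, and the model handle are illustrative assumptions.

```python
import json

# Field names from the CreateConversation signature in this reference.
ALLOWED_FIELDS = {"isolated_block_labels", "model", "model_settings", "summary"}


def create_conversation_body(**fields):
    """Build a CreateConversation-shaped dict (hypothetical helper)."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Omit unset optional fields instead of sending explicit nulls.
    return {name: value for name, value in fields.items() if value is not None}


body = create_conversation_body(
    isolated_block_labels=["human"],  # assumption: an agent block with this label exists
    model="openai/gpt-4o-mini",       # illustrative handle
    model_settings=None,              # unset, so it is dropped from the body
    summary="Support thread",
)
payload = json.dumps(body)
print(payload)
```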

model_settings: optional OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } or object { max_output_tokens, parallel_tool_calls, provider_type, 5 more } or AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } or 11 more

The model settings for this conversation (overrides agent's model settings).

Accepts one of the following:
OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 4 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openai"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

Sglang = object { max_output_tokens, parallel_tool_calls, provider_type, 5 more }

SGLang model configuration (OpenAI-compatible runtime with SGLang-specific parsing).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "sglang"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

tool_call_parser: optional string

SGLang tool call parser name (for example 'glm47', 'qwen25', or 'hermes').

AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 6 more }
effort: optional "low" or "medium" or "high" or "max"

Effort level for supported Anthropic models (controls token spending). 'max' is only available on Opus 4.6. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "anthropic"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

thinking: optional object { budget_tokens, type }

The thinking configuration for the model.

budget_tokens: optional number

The maximum number of tokens the model can use for extended thinking.

type: optional "enabled" or "disabled"

The type of thinking to use.

Accepts one of the following:
"enabled"
"disabled"
verbosity: optional "low" or "medium" or "high"

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_ai"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_vertex"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Azure OpenAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "azure"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

xAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "xai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

Z.ai (ZhipuAI) model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "zai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking: optional object { clear_thinking, type }

The thinking configuration for GLM-4.5+ models.

clear_thinking: optional boolean

If False, preserved thinking is used (recommended for agents).

type: optional "enabled" or "disabled"

Whether thinking is enabled or disabled.

Accepts one of the following:
"enabled"
"disabled"
GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Groq model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "groq"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Deepseek model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "deepseek"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Together AI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "together"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

AWS Bedrock model configuration.

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "bedrock"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Openrouter = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

OpenRouter model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openrouter"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

ChatgptOAuth = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

ChatGPT OAuth model configuration (uses ChatGPT backend API).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "chatgpt_oauth"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "low" or "medium" or 2 more

The reasoning effort level for GPT-5.x and o-series models.

Accepts one of the following:
"none"
"low"
"medium"
"high"
"xhigh"
temperature: optional number

The temperature of the model.

summary: optional string

A summary of the conversation.
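Each model_settings variant listed in this reference carries its own literal provider_type value, so a client can use that field to tell which variant it is holding. The following is a minimal sketch, assuming the settings arrive as a plain dict decoded from JSON. The names PROVIDER_VARIANTS and settings_variant are hypothetical helpers for illustration, not part of any SDK.

```python
from typing import Optional

# provider_type literal -> variant name, copied from the variants in this reference.
PROVIDER_VARIANTS = {
    "openai": "OpenAIModelSettings",
    "sglang": "Sglang",
    "anthropic": "AnthropicModelSettings",
    "google_ai": "GoogleAIModelSettings",
    "google_vertex": "GoogleVertexModelSettings",
    "azure": "AzureModelSettings",
    "xai": "XaiModelSettings",
    "zai": "Zai",
    "groq": "GroqModelSettings",
    "deepseek": "DeepseekModelSettings",
    "together": "TogetherModelSettings",
    "bedrock": "BedrockModelSettings",
    "openrouter": "Openrouter",
    "chatgpt_oauth": "ChatgptOAuth",
}


def settings_variant(model_settings: Optional[dict]) -> Optional[str]:
    """Return the variant name for a model_settings dict, or None if it cannot be determined."""
    if not model_settings:
        return None
    # provider_type is optional in every variant, so it may be absent.
    return PROVIDER_VARIANTS.get(model_settings.get("provider_type"))
```

Because provider_type is optional, a caller should treat a None result as "unknown" rather than as an error.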

UpdateConversation = object { model, model_settings, summary }

Request model for updating a conversation.

model: optional string

The model handle for this conversation (overrides agent's model). Format: provider/model-name.

model_settings: optional OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } or object { max_output_tokens, parallel_tool_calls, provider_type, 5 more } or AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } or 11 more

The model settings for this conversation (overrides agent's model settings).

Accepts one of the following:
OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 4 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openai"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

Sglang = object { max_output_tokens, parallel_tool_calls, provider_type, 5 more }

SGLang model configuration (OpenAI-compatible runtime with SGLang-specific parsing).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "sglang"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

tool_call_parser: optional string

SGLang tool call parser name (for example 'glm47', 'qwen25', or 'hermes').

AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 6 more }
effort: optional "low" or "medium" or "high" or "max"

Effort level for supported Anthropic models (controls token spending). 'max' is only available on Opus 4.6. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "anthropic"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

thinking: optional object { budget_tokens, type }

The thinking configuration for the model.

budget_tokens: optional number

The maximum number of tokens the model can use for extended thinking.

type: optional "enabled" or "disabled"

The type of thinking to use.

Accepts one of the following:
"enabled"
"disabled"
verbosity: optional "low" or "medium" or "high"

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_ai"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_vertex"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Azure OpenAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "azure"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

xAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "xai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

Z.ai (ZhipuAI) model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "zai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking: optional object { clear_thinking, type }

The thinking configuration for GLM-4.5+ models.

clear_thinking: optional boolean

If False, preserved thinking is used (recommended for agents).

type: optional "enabled" or "disabled"

Whether thinking is enabled or disabled.

Accepts one of the following:
"enabled"
"disabled"
GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Groq model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "groq"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Deepseek model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "deepseek"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Together AI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "together"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

AWS Bedrock model configuration.

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "bedrock"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Openrouter = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

OpenRouter model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openrouter"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

ChatgptOAuth = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

ChatGPT OAuth model configuration (uses ChatGPT backend API).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "chatgpt_oauth"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "low" or "medium" or 2 more

The reasoning effort level for GPT-5.x and o-series models.

Accepts one of the following:
"none"
"low"
"medium"
"high"
"xhigh"
temperature: optional number

The temperature of the model.

summary: optional string

A summary of the conversation.
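As an illustration of the UpdateConversation shape above, the sketch below assembles a JSON request body from its three optional fields. It is only a sketch under stated assumptions: build_update_body is a hypothetical helper, the model handle is a placeholder in the documented provider/model-name format, and leaving unset fields out of the body is a choice made here, not behavior confirmed by this reference.

```python
import json
from typing import Any, Optional


def build_update_body(
    model: Optional[str] = None,
    model_settings: Optional[dict[str, Any]] = None,
    summary: Optional[str] = None,
) -> str:
    """Serialize an UpdateConversation body, leaving out fields that were not provided."""
    fields = {"model": model, "model_settings": model_settings, "summary": summary}
    body = {key: value for key, value in fields.items() if value is not None}
    return json.dumps(body, sort_keys=True)


# Placeholder handle and values, chosen only for illustration.
example_body = build_update_body(
    model="anthropic/model-name",
    model_settings={
        "provider_type": "anthropic",
        "max_output_tokens": 1024,
        "thinking": {"type": "enabled", "budget_tokens": 512},
    },
)
```

The resulting string would be sent as the body of patch/v1/conversations/{conversation_id}.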

Conversations Messages

List Conversation Messages
get/v1/conversations/{conversation_id}/messages
Send Conversation Message
post/v1/conversations/{conversation_id}/messages
Retrieve Conversation Stream
post/v1/conversations/{conversation_id}/stream
Compact Conversation
post/v1/conversations/{conversation_id}/compact
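The four message endpoints above share one path prefix and differ only in HTTP method and suffix. A minimal sketch of resolving them for a given conversation follows. MESSAGE_ENDPOINTS and resolve_endpoint are hypothetical names, the operation keys are invented labels for this example, and the conversation ID used in the usage note is a placeholder.

```python
from urllib.parse import quote

# Operation label -> (HTTP method, path template), taken from the endpoint list above.
MESSAGE_ENDPOINTS = {
    "list_messages": ("GET", "/v1/conversations/{conversation_id}/messages"),
    "send_message": ("POST", "/v1/conversations/{conversation_id}/messages"),
    "retrieve_stream": ("POST", "/v1/conversations/{conversation_id}/stream"),
    "compact": ("POST", "/v1/conversations/{conversation_id}/compact"),
}


def resolve_endpoint(operation: str, conversation_id: str) -> tuple[str, str]:
    """Return (method, path) for an operation, percent-encoding the conversation ID."""
    method, template = MESSAGE_ENDPOINTS[operation]
    # safe="" also encodes "/" so the ID cannot add extra path segments.
    return method, template.format(conversation_id=quote(conversation_id, safe=""))
```

For example, resolve_endpoint("compact", "conv-123") yields the POST method together with the compact path for that placeholder ID. The base URL is deliberately left out because this section does not state it.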
Models
CompactionRequest = object { compaction_settings }
compaction_settings: optional object { clip_chars, mode, model, 4 more }

Configuration for conversation compaction / summarization.

Per-model settings (temperature, max tokens, etc.) are derived from the default configuration for that handle.

clip_chars: optional number

The maximum length of the summary in characters. If none, no clipping is performed.

mode: optional "all" or "sliding_window" or "self_compact_all" or "self_compact_sliding_window"

The type of summarization technique to use.

Accepts one of the following:
"all"
"sliding_window"
"self_compact_all"
"self_compact_sliding_window"
model: optional string

Model handle to use for sliding_window/all summarization (format: provider/model-name). If None, uses lightweight provider-specific defaults.

model_settings: optional OpenAIModelSettings { max_output_tokens, parallel_tool_calls, provider_type, 4 more } or object { max_output_tokens, parallel_tool_calls, provider_type, 5 more } or AnthropicModelSettings { effort, max_output_tokens, parallel_tool_calls, 6 more } or 11 more

Optional model settings used to override defaults for the summarizer model.

Accepts one of the following:
OpenAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 4 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openai"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

Sglang = object { max_output_tokens, parallel_tool_calls, provider_type, 5 more }

SGLang model configuration (OpenAI-compatible runtime with SGLang-specific parsing).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "sglang"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "minimal" or "low" or 3 more

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

tool_call_parser: optional string

SGLang tool call parser name (for example 'glm47', 'qwen25', or 'hermes').

AnthropicModelSettings = object { effort, max_output_tokens, parallel_tool_calls, 6 more }
effort: optional "low" or "medium" or "high" or "max"

Effort level for supported Anthropic models (controls token spending). 'max' is only available on Opus 4.6. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "anthropic"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

strict: optional boolean

Enable strict mode for tool calling. When true, tool outputs are guaranteed to match JSON schemas.

temperature: optional number

The temperature of the model.

thinking: optional object { budget_tokens, type }

The thinking configuration for the model.

budget_tokens: optional number

The maximum number of tokens the model can use for extended thinking.

type: optional "enabled" or "disabled"

The type of thinking to use.

Accepts one of the following:
"enabled"
"disabled"
verbosity: optional "low" or "medium" or "high"

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
GoogleAIModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_ai"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.

GoogleVertexModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }
max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "google_vertex"

The type of the provider.

response_schema: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response schema for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking_config: optional object { include_thoughts, thinking_budget }

The thinking configuration for the model.

include_thoughts: optional boolean

Whether to include thoughts in the model's response.

thinking_budget: optional number

The thinking budget for the model.
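
The Google settings above name the structured-output field `response_schema`, where the other providers in this list use `response_format`. The sketch below shows one way a payload might look. The schema contents and numeric values are made up for illustration, and passing the raw JSON schema directly as the `json_schema` map is an assumption, since the reference only describes that field as a map.

```python
import json

# Hypothetical schema describing the JSON the model should return.
answer_schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

google_settings = {
    "provider_type": "google_vertex",
    "max_output_tokens": 1024,  # illustrative value
    # Google providers call this field response_schema, not response_format.
    "response_schema": {"type": "json_schema", "json_schema": answer_schema},
    "thinking_config": {"include_thoughts": False, "thinking_budget": 512},
}

# Serialize to the JSON text that would be sent in a request body.
payload = json.dumps(google_settings)
print(payload)
```
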

AzureModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Azure OpenAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "azure"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

XaiModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

xAI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "xai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Zai = object { max_output_tokens, parallel_tool_calls, provider_type, 3 more }

Z.ai (ZhipuAI) model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "zai"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

thinking: optional object { clear_thinking, type }

The thinking configuration for GLM-4.5+ models.

clear_thinking: optional boolean

If false, preserved thinking is used (recommended for agents).

type: optional "enabled" or "disabled"

Whether thinking is enabled or disabled.

Accepts one of the following:
"enabled"
"disabled"
GroqModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Groq model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "groq"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

DeepseekModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

DeepSeek model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "deepseek"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

TogetherModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

Together AI model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "together"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

BedrockModelSettings = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

AWS Bedrock model configuration.

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "bedrock"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

Openrouter = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

OpenRouter model configuration (OpenAI-compatible).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "openrouter"

The type of the provider.

response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }

The response format for the model.

Accepts one of the following:
TextResponseFormat = object { type }

Response format for plain text responses.

type: optional "text"

The type of the response format.

JsonSchemaResponseFormat = object { json_schema, type }

Response format for JSON schema-based responses.

json_schema: map[unknown]

The JSON schema of the response.

type: optional "json_schema"

The type of the response format.

JsonObjectResponseFormat = object { type }

Response format for JSON object responses.

type: optional "json_object"

The type of the response format.

temperature: optional number

The temperature of the model.

ChatgptOAuth = object { max_output_tokens, parallel_tool_calls, provider_type, 2 more }

ChatGPT OAuth model configuration (uses ChatGPT backend API).

max_output_tokens: optional number

The maximum number of tokens the model can generate.

parallel_tool_calls: optional boolean

Whether to enable parallel tool calling.

provider_type: optional "chatgpt_oauth"

The type of the provider.

reasoning: optional object { reasoning_effort }

The reasoning configuration for the model.

reasoning_effort: optional "none" or "low" or "medium" or 2 more

The reasoning effort level for GPT-5.x and o-series models.

Accepts one of the following:
"none"
"low"
"medium"
"high"
"xhigh"
temperature: optional number

The temperature of the model.
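
Because each provider variant above accepts a slightly different set of fields, it can help to check a payload against the fields documented for its `provider_type` before sending it. The sketch below does this for a handful of the variants. The names `DOCUMENTED_KEYS` and `undocumented_keys` are hypothetical, and the key sets are transcribed from this reference rather than fetched from the API.

```python
# Fields shared by every variant covered in this sketch.
COMMON_KEYS = {"max_output_tokens", "parallel_tool_calls", "provider_type", "temperature"}

# Per-provider fields, transcribed from the listings above (subset of providers).
DOCUMENTED_KEYS = {
    "sglang": COMMON_KEYS | {"reasoning", "response_format", "strict", "tool_call_parser"},
    "anthropic": COMMON_KEYS | {"effort", "response_format", "strict", "thinking", "verbosity"},
    "google_ai": COMMON_KEYS | {"response_schema", "thinking_config"},
    "zai": COMMON_KEYS | {"response_format", "thinking"},
    "chatgpt_oauth": COMMON_KEYS | {"reasoning"},
}


def undocumented_keys(settings: dict) -> set:
    """Return the keys that are not listed for the payload's provider_type."""
    provider = settings.get("provider_type")
    if provider not in DOCUMENTED_KEYS:
        raise ValueError(f"provider_type not covered by this sketch: {provider!r}")
    return set(settings) - DOCUMENTED_KEYS[provider]


# response_format is not listed for chatgpt_oauth, so it is reported.
leftover = undocumented_keys({
    "provider_type": "chatgpt_oauth",
    "reasoning": {"reasoning_effort": "high"},
    "response_format": {"type": "text"},
})
print(leftover)
```
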

prompt: optional string

The prompt to use for summarization. If not set, a mode-specific default is used.

prompt_acknowledgement: optional boolean

Whether to include an acknowledgement post-prompt (helps prevent non-summary outputs).

sliding_window_percentage: optional number

The percentage of the context window to keep post-summarization (only used in sliding window modes).

CompactionResponse = object { num_messages_after, num_messages_before, summary }
num_messages_after: number
num_messages_before: number
summary: string
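
A minimal sketch of reading a `CompactionResponse` in Python follows. The dataclass mirrors the three fields listed above; the `messages_removed` property and the sample response body are additions made for this example, and treating the two counts as integers is an assumption (the reference types them only as number).

```python
import json
from dataclasses import dataclass


@dataclass
class CompactionResponse:
    num_messages_after: int
    num_messages_before: int
    summary: str

    @property
    def messages_removed(self) -> int:
        """Difference between the message counts before and after compaction."""
        return self.num_messages_before - self.num_messages_after


# Made-up response body used only to exercise the parser.
raw = '{"num_messages_before": 40, "num_messages_after": 12, "summary": "User asked about billing."}'
response = CompactionResponse(**json.loads(raw))
print(response.messages_removed)  # prints 28
```
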