Models
List LLM Models
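As a quick orientation, here is a minimal sketch of calling this endpoint over HTTP with Python's requests library. The base URL, API key, and the /v1/models/ path are assumptions about a typical deployment, not guarantees of this API's exact shape:

import requests

# Assumed deployment details; substitute your own server URL and key.
BASE_URL = "http://localhost:8283"
API_KEY = "your-api-key"

# List the available LLM models (path assumed to be GET /v1/models/).
resp = requests.get(
    f"{BASE_URL}/v1/models/",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
for model in resp.json():
    # Prefer the non-deprecated fields documented below.
    print(model.get("handle"), model.get("max_context_window"))

Each element of the returned list should match the Model schema described below.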
Models
EmbeddingConfig = object { embedding_dim, embedding_endpoint_type, embedding_model, 7 more }
Configuration for embedding model connection and processing parameters.
embedding_dim: number
The dimension of the embedding.
embedding_endpoint_type: "openai" or "anthropic" or "bedrock" or 16 more
The endpoint type for the model.
embedding_model: string
The model for the embedding.
azure_deployment: optional string
The Azure deployment for the model.
azure_endpoint: optional string
The Azure endpoint for the model.
azure_version: optional string
The Azure version for the model.
batch_size: optional number
The maximum batch size for processing embeddings.
embedding_chunk_size: optional number
The chunk size of the embedding.
embedding_endpoint: optional string
The endpoint for the model (None if local).
handle: optional string
The handle for this config, in the format provider/model-name.
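To make the fields above concrete, here is an illustrative EmbeddingConfig as a Python dict. The model choice and numeric values are examples only, not defaults:

embedding_config = {
    "embedding_endpoint_type": "openai",
    "embedding_model": "text-embedding-3-small",
    "embedding_dim": 1536,  # text-embedding-3-small returns 1536-dim vectors
    "embedding_endpoint": "https://api.openai.com/v1",
    "embedding_chunk_size": 300,  # illustrative chunk size
    "batch_size": 32,  # illustrative batch size
    "handle": "openai/text-embedding-3-small",  # provider/model-name format
}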
EmbeddingModel = object { display_name, embedding_dim, embedding_endpoint_type, 12 more }
display_name: string
Display name for the model shown in the UI.
embedding_dim: number
The dimension of the embedding.
Deprecated embedding_endpoint_type: "openai" or "anthropic" or "bedrock" or 16 more
Deprecated: Use 'provider_type' field instead. The endpoint type for the embedding model.
Deprecated embedding_model: string
Deprecated: Use 'name' field instead. Embedding model name.
name: string
The actual model name used by the provider.
provider_name: string
The name of the provider.
provider_type: "openai" or "anthropic" or "bedrock" or 16 more
The type of the provider.
Deprecated azure_deployment: optional string
Deprecated: The Azure deployment for the model.
Deprecated azure_endpoint: optional string
Deprecated: The Azure endpoint for the model.
Deprecated azure_version: optional string
Deprecated: The Azure version for the model.
Deprecated batch_size: optional number
Deprecated: The maximum batch size for processing embeddings.
Deprecated embedding_chunk_size: optional number
Deprecated: The chunk size of the embedding.
Deprecated embedding_endpoint: optional string
Deprecated: The endpoint for the model.
handle: optional string
The handle for this config, in the format provider/model-name.
model_type: optional "embedding"
Type of model (llm or embedding)
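Since several EmbeddingModel fields are deprecated in favor of newer ones, a consumer may want to prefer the current fields and fall back to the old names. A sketch of that fallback logic (the fallback behavior itself is an assumption, mirroring the 'use X instead' notes above):

def embedding_model_summary(m: dict) -> str:
    # Prefer current fields; fall back to deprecated counterparts
    # when reading responses from older servers.
    name = m.get("name") or m.get("embedding_model")
    provider = m.get("provider_type") or m.get("embedding_endpoint_type")
    return f"{provider}/{name} ({m['embedding_dim']} dims)"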
LlmConfig = object { context_window, model, model_endpoint_type, 20 more }
Configuration for Language Model (LLM) connection and generation parameters.
Deprecated: LLMConfig should not be used as an input or return type in API calls. Use the schemas in letta.schemas.model (ModelSettings, OpenAIModelSettings, etc.) instead. For conversion, use the _to_model() method or the Model._from_llm_config() method.
context_window: number
The context window size for the model.
model: string
LLM model name.
model_endpoint_type: "openai" or "anthropic" or "google_ai" or 22 more
The endpoint type for the model.
compatibility_type: optional "gguf" or "mlx"
The framework compatibility type for the model.
display_name: optional string
A human-friendly display name for the model.
effort: optional "low" or "medium" or "high" or "max"
The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.
enable_reasoner: optional boolean
Whether the model should use extended thinking if it is a 'reasoning' style model.
frequency_penalty: optional number
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. From OpenAI: Number between -2.0 and 2.0.
handle: optional string
The handle for this config, in the format provider/model-name.
max_reasoning_tokens: optional number
Configurable thinking budget for extended thinking. Used for enable_reasoner and also for Google Vertex models like Gemini 2.5 Flash. Minimum value is 1024 when used with enable_reasoner.
max_tokens: optional number
The maximum number of tokens to generate. If not set, the model will use its default value.
model_endpoint: optional string
The endpoint for the model.
model_wrapper: optional string
The wrapper for the model.
Deprecated parallel_tool_calls: optional boolean
Deprecated: Use model_settings to configure parallel tool calls instead. If set to True, enables parallel tool calling. Defaults to False.
provider_category: optional "base" or "byok"
The provider category for the model.
provider_name: optional string
The provider name for the model.
put_inner_thoughts_in_kwargs: optional boolean
Puts 'inner_thoughts' as a kwarg in the function call if this is set to True. This helps with function calling performance and also the generation of inner thoughts.
reasoning_effort: optional "none" or "minimal" or "low" or 3 more
The reasoning effort to use when generating text with reasoning models.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
strict: optional boolean
Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing that tool outputs match their JSON schemas.
temperature: optional number
The temperature to use when generating text with the model. A higher temperature will result in more random text.
tier: optional string
The cost tier for the model (cloud only).
verbosity: optional "low" or "medium" or "high"
Soft control for how verbose model output should be, used for GPT-5 models.
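For illustration, here is what an LlmConfig-shaped payload can look like as a Python dict, including a json_schema response format. Note the deprecation warning above: new code should prefer the letta.schemas.model settings classes. The model choice is illustrative, and the inner layout of json_schema follows OpenAI's structured-outputs convention, which is an assumption here since the field is typed map[unknown]:

llm_config = {
    "model": "gpt-4o-mini",
    "model_endpoint_type": "openai",
    "context_window": 128000,
    "max_tokens": 1024,
    "temperature": 0.7,
    # Structured outputs via a JSON-schema response format.
    "response_format": {
        "type": "json_schema",
        "json_schema": {  # inner layout assumed (OpenAI-style)
            "name": "city_answer",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    },
}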
Model = object { context_window, max_context_window, model, 24 more }
Deprecated context_window: number
Deprecated: Use 'max_context_window' field instead. The context window size for the model.
max_context_window: number
The maximum context window for the model.
Deprecated model: string
Deprecated: Use 'name' field instead. LLM model name.
Deprecated model_endpoint_type: "openai" or "anthropic" or "google_ai" or 22 more
Deprecated: Use 'provider_type' field instead. The endpoint type for the model.
name: string
The actual model name used by the provider.
provider_type: "openai" or "anthropic" or "google_ai" or 22 more
The type of the provider.
Deprecated compatibility_type: optional "gguf" or "mlx"
Deprecated: The framework compatibility type for the model.
display_name: optional string
A human-friendly display name for the model.
effort: optional "low" or "medium" or "high" or "max"
The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.
Deprecated enable_reasoner: optional boolean
Deprecated: Whether the model should use extended thinking if it is a 'reasoning' style model.
Deprecated frequency_penalty: optional number
Deprecated: Positive values penalize new tokens based on their existing frequency in the text so far.
handle: optional string
The handle for this config, in the format provider/model-name.
Deprecated max_reasoning_tokens: optional number
Deprecated: Configurable thinking budget for extended thinking.
Deprecated max_tokens: optional number
Deprecated: The maximum number of tokens to generate.
Deprecated model_endpoint: optional string
Deprecated: The endpoint for the model.
model_type: optional "llm"
Type of model (llm or embedding)
Deprecated model_wrapper: optional string
Deprecated: The wrapper for the model.
Deprecated parallel_tool_calls: optional boolean
Deprecated: If set to True, enables parallel tool calling.
Deprecated provider_category: optional "base" or "byok"
Deprecated: The provider category for the model.
provider_name: optional string
The provider name for the model.
Deprecated put_inner_thoughts_in_kwargs: optional boolean
Deprecated: Puts 'inner_thoughts' as a kwarg in the function call.
Deprecated reasoning_effort: optional "none" or "minimal" or "low" or 3 more
Deprecated: The reasoning effort to use when generating text with reasoning models.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
strict: optional boolean
Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing that tool outputs match their JSON schemas.
Deprecated temperature: optional number
Deprecated: The temperature to use when generating text with the model.
Deprecated tier: optional string
Deprecated: The cost tier for the model (cloud only).
Deprecated verbosity: optional "low" or "medium" or "high"
Deprecated: Soft control for how verbose model output should be.
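As with EmbeddingModel, a consumer of Model objects may want to prefer the current fields over their deprecated counterparts. A sketch under that assumption:

def model_summary(m: dict) -> str:
    # Prefer current fields; fall back to deprecated ones when
    # reading responses from older servers (assumed fallback logic).
    name = m.get("name") or m.get("model")
    provider = m.get("provider_type") or m.get("model_endpoint_type")
    ctx = m.get("max_context_window") or m.get("context_window")
    handle = m.get("handle") or f"{provider}/{name}"
    return f"{handle}: {ctx}-token context window"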