Models
List LLM Models
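As a quick orientation, here is a minimal sketch of calling this endpoint over HTTP with Python's requests library. The base URL, API key, and the /v1/models/ path are assumptions about a typical deployment, not guarantees of this API's exact shape:

import requests

# Assumed deployment details; substitute your own server URL and key.
BASE_URL = "http://localhost:8283"
API_KEY = "your-api-key"

# List the available LLM models (path assumed to be GET /v1/models/).
resp = requests.get(
    f"{BASE_URL}/v1/models/",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
for model in resp.json():
    # Prefer the non-deprecated fields documented below.
    print(model.get("handle"), model.get("max_context_window"))

Each element of the returned list should match the Model schema described below.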
Models
EmbeddingConfig = object { embedding_dim, embedding_endpoint_type, embedding_model, 7 more }
Configuration for embedding model connection and processing parameters.
embedding_dim: number
The dimension of the embedding.
embedding_endpoint_type: "openai" or "anthropic" or "bedrock" or 16 more
The endpoint type for the model.
embedding_model: string
The model for the embedding.
azure_deployment: optional string
The Azure deployment for the model.
azure_endpoint: optional string
The Azure endpoint for the model.
azure_version: optional string
The Azure version for the model.
batch_size: optional number
The maximum batch size for processing embeddings.
embedding_chunk_size: optional number
The chunk size of the embedding.
embedding_endpoint: optional string
The endpoint for the model (None if local).
handle: optional string
The handle for this config, in the format provider/model-name.
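To make the fields above concrete, here is an illustrative EmbeddingConfig as a Python dict. The model choice and numeric values are examples only, not defaults:

embedding_config = {
    "embedding_endpoint_type": "openai",
    "embedding_model": "text-embedding-3-small",
    "embedding_dim": 1536,  # text-embedding-3-small returns 1536-dim vectors
    "embedding_endpoint": "https://api.openai.com/v1",
    "embedding_chunk_size": 300,  # illustrative chunk size
    "batch_size": 32,  # illustrative batch size
    "handle": "openai/text-embedding-3-small",  # provider/model-name format
}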
EmbeddingModel = object { display_name, embedding_dim, embedding_endpoint_type, 12 more }
display_name: string
Display name for the model shown in the UI.
embedding_dim: number
The dimension of the embedding.
Deprecated embedding_endpoint_type: "openai" or "anthropic" or "bedrock" or 16 more
Deprecated: Use 'provider_type' field instead. The endpoint type for the embedding model.
Deprecated embedding_model: string
Deprecated: Use 'name' field instead. Embedding model name.
name: string
The actual model name used by the provider.
provider_name: string
The name of the provider.
provider_type: "openai" or "anthropic" or "bedrock" or 16 more
The type of the provider.
Deprecated azure_deployment: optional string
Deprecated: The Azure deployment for the model.
Deprecated azure_endpoint: optional string
Deprecated: The Azure endpoint for the model.
Deprecated azure_version: optional string
Deprecated: The Azure version for the model.
Deprecated batch_size: optional number
Deprecated: The maximum batch size for processing embeddings.
Deprecated embedding_chunk_size: optional number
Deprecated: The chunk size of the embedding.
Deprecated embedding_endpoint: optional string
Deprecated: The endpoint for the model.
handle: optional string
The handle for this config, in the format provider/model-name.
model_type: optional "embedding"
Type of model (llm or embedding)
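Since several EmbeddingModel fields are deprecated in favor of newer ones, a consumer may want to prefer the current fields and fall back to the old names. A sketch of that fallback logic (the fallback behavior itself is an assumption, mirroring the 'use X instead' notes above):

def embedding_model_summary(m: dict) -> str:
    # Prefer current fields; fall back to deprecated counterparts
    # when reading responses from older servers.
    name = m.get("name") or m.get("embedding_model")
    provider = m.get("provider_type") or m.get("embedding_endpoint_type")
    return f"{provider}/{name} ({m['embedding_dim']} dims)"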
LlmConfig = object { context_window, model, model_endpoint_type, 20 more }
Configuration for Language Model (LLM) connection and generation parameters.
Deprecated: LLMConfig should not be used as an input or return type in API calls. Use the schemas in letta.schemas.model (ModelSettings, OpenAIModelSettings, etc.) instead. For conversion, use the _to_model() method or the Model._from_llm_config() method.
context_window: number
The context window size for the model.
model: string
LLM model name.
model_endpoint_type: "openai" or "anthropic" or "google_ai" or 22 more
The endpoint type for the model.
compatibility_type: optional "gguf" or "mlx"
The framework compatibility type for the model.
display_name: optional string
A human-friendly display name for the model.
effort: optional "low" or "medium" or "high" or "max"
The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.
enable_reasoner: optional boolean
Whether the model should use extended thinking if it is a 'reasoning' style model.
frequency_penalty: optional number
Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. From OpenAI: Number between -2.0 and 2.0.
handle: optional string
The handle for this config, in the format provider/model-name.
max_reasoning_tokens: optional number
Configurable thinking budget for extended thinking. Used for enable_reasoner and also for Google Vertex models like Gemini 2.5 Flash. Minimum value is 1024 when used with enable_reasoner.
max_tokens: optional number
The maximum number of tokens to generate. If not set, the model will use its default value.
model_endpoint: optional string
The endpoint for the model.
model_wrapper: optional string
The wrapper for the model.
Deprecated parallel_tool_calls: optional boolean
Deprecated: Use model_settings to configure parallel tool calls instead. If set to True, enables parallel tool calling. Defaults to False.
provider_category: optional "base" or "byok"
The provider category for the model.
provider_name: optional string
The provider name for the model.
put_inner_thoughts_in_kwargs: optional boolean
Puts 'inner_thoughts' as a kwarg in the function call if this is set to True. This helps with function calling performance and also the generation of inner thoughts.
reasoning_effort: optional "none" or "minimal" or "low" or 3 more
The reasoning effort to use when generating text with reasoning models.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
strict: optional boolean
Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing that tool outputs match their JSON schemas.
temperature: optional number
The temperature to use when generating text with the model. A higher temperature will result in more random text.
tier: optional string
The cost tier for the model (cloud only).
verbosity: optional "low" or "medium" or "high"
Soft control for how verbose model output should be, used for GPT-5 models.
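For illustration, here is what an LlmConfig-shaped payload can look like as a Python dict, including a json_schema response format. Note the deprecation warning above: new code should prefer the letta.schemas.model settings classes. The model choice is illustrative, and the inner layout of json_schema follows OpenAI's structured-outputs convention, which is an assumption here since the field is typed map[unknown]:

llm_config = {
    "model": "gpt-4o-mini",
    "model_endpoint_type": "openai",
    "context_window": 128000,
    "max_tokens": 1024,
    "temperature": 0.7,
    # Structured outputs via a JSON-schema response format.
    "response_format": {
        "type": "json_schema",
        "json_schema": {  # inner layout assumed (OpenAI-style)
            "name": "city_answer",  # hypothetical schema name
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    },
}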
Model = object { context_window, max_context_window, model, 24 more }
Deprecated context_window: number
Deprecated: Use 'max_context_window' field instead. The context window size for the model.
max_context_window: number
The maximum context window for the model.
Deprecated model: string
Deprecated: Use 'name' field instead. LLM model name.
Deprecated model_endpoint_type: "openai" or "anthropic" or "google_ai" or 22 more
Deprecated: Use 'provider_type' field instead. The endpoint type for the model.
name: string
The actual model name used by the provider.
provider_type: "openai" or "anthropic" or "google_ai" or 22 more
The type of the provider.
Deprecated compatibility_type: optional "gguf" or "mlx"
Deprecated: The framework compatibility type for the model.
display_name: optional string
A human-friendly display name for the model.
effort: optional "low" or "medium" or "high" or "max"
The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.
Deprecated enable_reasoner: optional boolean
Deprecated: Whether the model should use extended thinking if it is a 'reasoning' style model.
Deprecated frequency_penalty: optional number
Deprecated: Positive values penalize new tokens based on their existing frequency in the text so far.
handle: optional string
The handle for this config, in the format provider/model-name.
Deprecated max_reasoning_tokens: optional number
Deprecated: Configurable thinking budget for extended thinking.
Deprecated max_tokens: optional number
Deprecated: The maximum number of tokens to generate.
Deprecated model_endpoint: optional string
Deprecated: The endpoint for the model.
model_type: optional "llm"
Type of model (llm or embedding)
Deprecated model_wrapper: optional string
Deprecated: The wrapper for the model.
Deprecated parallel_tool_calls: optional boolean
Deprecated: If set to True, enables parallel tool calling.
Deprecated provider_category: optional "base" or "byok"
Deprecated: The provider category for the model.
provider_name: optional string
The provider name for the model.
Deprecated put_inner_thoughts_in_kwargs: optional boolean
Deprecated: Puts 'inner_thoughts' as a kwarg in the function call.
Deprecated reasoning_effort: optional "none" or "minimal" or "low" or 3 more
Deprecated: The reasoning effort to use when generating text with reasoning models.
response_format: optional TextResponseFormat { type } or JsonSchemaResponseFormat { json_schema, type } or JsonObjectResponseFormat { type }
The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings.
TextResponseFormat = object { type }
Response format for plain text responses.
type: optional "text"
The type of the response format.
JsonSchemaResponseFormat = object { json_schema, type }
Response format for JSON schema-based responses.
json_schema: map[unknown]
The JSON schema of the response.
type: optional "json_schema"
The type of the response format.
JsonObjectResponseFormat = object { type }
Response format for JSON object responses.
type: optional "json_object"
The type of the response format.
strict: optional boolean
Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing that tool outputs match their JSON schemas.
Deprecated temperature: optional number
Deprecated: The temperature to use when generating text with the model.
Deprecated tier: optional string
Deprecated: The cost tier for the model (cloud only).
Deprecated verbosity: optional "low" or "medium" or "high"
Deprecated: Soft control for how verbose model output should be.
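As with EmbeddingModel, a consumer of Model objects may want to prefer the current fields over their deprecated counterparts. A sketch under that assumption:

def model_summary(m: dict) -> str:
    # Prefer current fields; fall back to deprecated ones when
    # reading responses from older servers (assumed fallback logic).
    name = m.get("name") or m.get("model")
    provider = m.get("provider_type") or m.get("model_endpoint_type")
    ctx = m.get("max_context_window") or m.get("context_window")
    handle = m.get("handle") or f"{provider}/{name}"
    return f"{handle}: {ctx}-token context window"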