Models

List LLM Models
client.models.list(ModelListParams { provider_category, provider_name, provider_type } query?, RequestOptions options?): ModelListResponse { context_window, max_context_window, model, 24 more }
GET /v1/models/
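
A minimal call sketch in TypeScript. The package name, client class, and auth style below are assumptions (check your SDK's README for the exact import); the query filters map to ModelListParams above, and iteration assumes the response is an array of Model objects:

```typescript
// Assumed import; the actual package/class names may differ in your SDK version.
import Letta from '@letta-ai/letta-client';

const client = new Letta({ apiKey: process.env.LETTA_API_KEY });

async function listLlmModels() {
  // All three filters are optional (see ModelListParams).
  const models = await client.models.list({
    provider_category: 'base',
    provider_type: 'openai',
  });
  for (const m of models) {
    // Prefer the current fields over their deprecated twins (see the Model schema below).
    console.log(`${m.name}: ${m.max_context_window} token context`);
  }
}

listLlmModels();
```
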
Models
EmbeddingConfig { embedding_dim, embedding_endpoint_type, embedding_model, 7 more }

Configuration for embedding model connection and processing parameters.

embedding_dim: number

The dimension of the embedding.

embedding_endpoint_type: "openai" | "anthropic" | "bedrock" | 16 more

The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"bedrock"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"pinecone"
embedding_model: string

The model for the embedding.

azure_deployment?: string | null

The Azure deployment for the model.

azure_endpoint?: string | null

The Azure endpoint for the model.

azure_version?: string | null

The Azure version for the model.

batch_size?: number

The maximum batch size for processing embeddings.

embedding_chunk_size?: number | null

The chunk size of the embedding.

embedding_endpoint?: string | null

The endpoint for the model (None if local).

handle?: string | null

The handle for this config, in the format provider/model-name.
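
For orientation, an EmbeddingConfig literal with the three required fields plus common optional ones might look like the following sketch (the values are illustrative, not defaults):

```typescript
// Illustrative EmbeddingConfig; field names follow the schema above.
const embeddingConfig = {
  embedding_dim: 1536,                              // required: dimension of the vectors
  embedding_endpoint_type: 'openai' as const,       // required: one of the endpoint types listed above
  embedding_model: 'text-embedding-3-small',        // required: model name
  embedding_endpoint: 'https://api.openai.com/v1',  // optional: null/omitted for local models
  handle: 'openai/text-embedding-3-small',          // optional: provider/model-name format
};
```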

EmbeddingModel { display_name, embedding_dim, embedding_endpoint_type, 12 more }
display_name: string

Display name for the model shown in the UI.

embedding_dim: number

The dimension of the embedding

Deprecatedembedding_endpoint_type: "openai" | "anthropic" | "bedrock" | 16 more

Deprecated: Use 'provider_type' field instead. The endpoint type for the embedding model.

Accepts one of the following:
"openai"
"anthropic"
"bedrock"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"pinecone"
Deprecatedembedding_model: string

Deprecated: Use 'name' field instead. Embedding model name.

name: string

The actual model name used by the provider

provider_name: string

The name of the provider

provider_type: ProviderType

The type of the provider

Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"chatgpt_oauth"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"minimax"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"sglang"
"openrouter"
"xai"
"zai"
Deprecatedazure_deployment?: string | null

Deprecated: The Azure deployment for the model.

Deprecatedazure_endpoint?: string | null

Deprecated: The Azure endpoint for the model.

Deprecatedazure_version?: string | null

Deprecated: The Azure version for the model.

Deprecatedbatch_size?: number

Deprecated: The maximum batch size for processing embeddings.

Deprecatedembedding_chunk_size?: number | null

Deprecated: The chunk size of the embedding.

Deprecatedembedding_endpoint?: string | null

Deprecated: The endpoint for the model.

handle?: string | null

The handle for this config, in the format provider/model-name.

model_type?: "embedding"

Type of model (llm or embedding)
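
Because several EmbeddingModel fields are kept only for backward compatibility, consuming code should read the current fields; a sketch:

```typescript
// Prefer `name`, `provider_name`, and `embedding_dim` over the deprecated
// `embedding_model` and `embedding_endpoint_type` fields.
function describeEmbeddingModel(m: {
  name: string;
  provider_name: string;
  embedding_dim: number;
}): string {
  return `${m.provider_name}/${m.name} (${m.embedding_dim}-dim)`;
}
```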

LlmConfig { context_window, model, model_endpoint_type, 20 more }

Configuration for Language Model (LLM) connection and generation parameters.

Deprecated: LLMConfig should not be used as an input or return type in API calls. Use the schemas in letta.schemas.model (ModelSettings, OpenAIModelSettings, etc.) instead. For conversion, use the _to_model() method or the Model._from_llm_config() method.

context_window: number

The context window size for the model.

model: string

LLM model name.

model_endpoint_type: "openai" | "anthropic" | "google_ai" | 22 more

The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"lmstudio-chatcompletions"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"minimax"
"mistral"
"together"
"bedrock"
"deepseek"
"xai"
"zai"
"openrouter"
"chatgpt_oauth"
compatibility_type?: "gguf" | "mlx" | null

The framework compatibility type for the model.

Accepts one of the following:
"gguf"
"mlx"
display_name?: string | null

A human-friendly display name for the model.

effort?: "low" | "medium" | "high" | "max" | null

The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
enable_reasoner?: boolean

Whether the model should use extended thinking if it is a 'reasoning'-style model.

frequency_penalty?: number | null

Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. From OpenAI: Number between -2.0 and 2.0.

handle?: string | null

The handle for this config, in the format provider/model-name.

max_reasoning_tokens?: number

Configurable thinking budget for extended thinking. Used for enable_reasoner and also for Google Vertex models like Gemini 2.5 Flash. Minimum value is 1024 when used with enable_reasoner.

max_tokens?: number | null

The maximum number of tokens to generate. If not set, the model will use its default value.

model_endpoint?: string | null

The endpoint for the model.

model_wrapper?: string | null

The wrapper for the model.

Deprecatedparallel_tool_calls?: boolean | null

Deprecated: Use model_settings to configure parallel tool calls instead. If set to True, enables parallel tool calling. Defaults to False.

provider_category?: ProviderCategory | null

The provider category for the model.

Accepts one of the following:
"base"
"byok"
provider_name?: string | null

The provider name for the model.

put_inner_thoughts_in_kwargs?: boolean | null

If set to True, passes 'inner_thoughts' as a kwarg in the function call. This helps with function-calling performance and with the generation of inner thoughts.

reasoning_effort?: "none" | "minimal" | "low" | 3 more | null

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null

The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings; a concrete sketch follows the LlmConfig schema below.

Accepts one of the following:
TextResponseFormat { type }

Response format for plain text responses.

type?: "text"

The type of the response format.

JsonSchemaResponseFormat { json_schema, type }

Response format for JSON schema-based responses.

json_schema: Record<string, unknown>

The JSON schema of the response.

type?: "json_schema"

The type of the response format.

JsonObjectResponseFormat { type }

Response format for JSON object responses.

type?: "json_object"

The type of the response format.

strict?: boolean

Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing that tool outputs match their JSON schemas.

temperature?: number

The temperature to use when generating text with the model. A higher temperature will result in more random text.

tier?: string | null

The cost tier for the model (cloud only).

verbosity?: "low" | "medium" | "high" | null

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
Model { context_window, max_context_window, model, 24 more }
Deprecatedcontext_window: number

Deprecated: Use 'max_context_window' field instead. The context window size for the model.

max_context_window: number

The maximum context window for the model

Deprecatedmodel: string

Deprecated: Use 'name' field instead. LLM model name.

Deprecatedmodel_endpoint_type: "openai" | "anthropic" | "google_ai" | 22 more

Deprecated: Use 'provider_type' field instead. The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"lmstudio-chatcompletions"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"minimax"
"mistral"
"together"
"bedrock"
"deepseek"
"xai"
"zai"
"openrouter"
"chatgpt_oauth"
name: string

The actual model name used by the provider

provider_type: ProviderType

The type of the provider

Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"chatgpt_oauth"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"minimax"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"sglang"
"openrouter"
"xai"
"zai"
Deprecatedcompatibility_type?: "gguf" | "mlx" | null

Deprecated: The framework compatibility type for the model.

Accepts one of the following:
"gguf"
"mlx"
display_name?: string | null

A human-friendly display name for the model.

effort?: "low" | "medium" | "high" | "max" | null

The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
Deprecatedenable_reasoner?: boolean

Deprecated: Whether the model should use extended thinking if it is a 'reasoning'-style model.

Deprecatedfrequency_penalty?: number | null

Deprecated: Positive values penalize new tokens based on their existing frequency in the text so far.

handle?: string | null

The handle for this config, in the format provider/model-name.

Deprecatedmax_reasoning_tokens?: number

Deprecated: Configurable thinking budget for extended thinking.

Deprecatedmax_tokens?: number | null

Deprecated: The maximum number of tokens to generate.

Deprecatedmodel_endpoint?: string | null

Deprecated: The endpoint for the model.

model_type?: "llm"

Type of model (llm or embedding)

Deprecatedmodel_wrapper?: string | null

Deprecated: The wrapper for the model.

Deprecatedparallel_tool_calls?: boolean | null

Deprecated: If set to True, enables parallel tool calling.

Deprecatedprovider_category?: ProviderCategory | null

Deprecated: The provider category for the model.

Accepts one of the following:
"base"
"byok"
provider_name?: string | null

The provider name for the model.

Deprecatedput_inner_thoughts_in_kwargs?: boolean | null

Deprecated: If set to True, passes 'inner_thoughts' as a kwarg in the function call.

Deprecatedreasoning_effort?: "none" | "minimal" | "low" | 3 more | null

Deprecated: The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format?: TextResponseFormat { type } | JsonSchemaResponseFormat { json_schema, type } | JsonObjectResponseFormat { type } | null

The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings.

Accepts one of the following:
TextResponseFormat { type }

Response format for plain text responses.

type?: "text"

The type of the response format.

JsonSchemaResponseFormat { json_schema, type }

Response format for JSON schema-based responses.

json_schema: Record<string, unknown>

The JSON schema of the response.

type?: "json_schema"

The type of the response format.

JsonObjectResponseFormat { type }

Response format for JSON object responses.

type?: "json_object"

The type of the response format.

strict?: boolean

Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing that tool outputs match their JSON schemas.

Deprecatedtemperature?: number

Deprecated: The temperature to use when generating text with the model.

Deprecatedtier?: string | null

Deprecated: The cost tier for the model (cloud only).

Deprecatedverbosity?: "low" | "medium" | "high" | null

Deprecated: Soft control for how verbose model output should be.

Accepts one of the following:
"low"
"medium"
"high"
ProviderCategory = "base" | "byok"
Accepts one of the following:
"base"
"byok"
ProviderType = "anthropic" | "azure" | "bedrock" | 19 more
Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"chatgpt_oauth"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"minimax"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"sglang"
"openrouter"
"xai"
"zai"

Models / Embeddings

List Embedding Models
client.models.embeddings.list(RequestOptions options?): EmbeddingListResponse { display_name, embedding_dim, embedding_endpoint_type, 12 more }
GET /v1/models/embedding
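
A minimal call sketch, reusing the client-setup assumptions from the LLM listing example above; this endpoint takes no query parameters, and iteration assumes the response is an array of EmbeddingModel objects:

```typescript
async function listEmbeddingModels() {
  // Returns the available embedding models.
  const embeddingModels = await client.models.embeddings.list();
  for (const m of embeddingModels) {
    console.log(`${m.display_name}: ${m.embedding_dim}-dim (${m.provider_name})`);
  }
}

listEmbeddingModels();
```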