
Models

List LLM Models
models.list(**kwargs: ModelListParams) -> ModelListResponse
GET /v1/models/
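
As a minimal sketch, the route can be called over plain HTTP, assuming bearer-token auth and a JSON array of Model objects in the response; the base URL, port, and API key below are placeholders for your deployment:

import os
import requests

base_url = os.environ.get("LETTA_BASE_URL", "http://localhost:8283")  # placeholder default
api_key = os.environ.get("LETTA_API_KEY", "")  # placeholder; assumes bearer-token auth

resp = requests.get(
    f"{base_url}/v1/models/",
    headers={"Authorization": f"Bearer {api_key}"},
)
resp.raise_for_status()
for model in resp.json():  # assumes the body is a JSON array of Model objects
    print(model.get("handle"), model.get("max_context_window"))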
Models
class EmbeddingConfig:

Configuration for embedding model connection and processing parameters.

embedding_dim: int

The dimension of the embedding.

embedding_endpoint_type: Literal["openai", "anthropic", "bedrock", 16 more]

The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"bedrock"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"pinecone"
embedding_model: str

The model name used for the embedding.

azure_deployment: Optional[str]

The Azure deployment for the model.

azure_endpoint: Optional[str]

The Azure endpoint for the model.

azure_version: Optional[str]

The Azure version for the model.

batch_size: Optional[int]

The maximum batch size for processing embeddings.

embedding_chunk_size: Optional[int]

The chunk size of the embedding.

embedding_endpoint: Optional[str]

The endpoint for the model (None if local).

handle: Optional[str]

The handle for this config, in the format provider/model-name.
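
As an illustration of how the fields above fit together, here is a hypothetical OpenAI-backed configuration; the model name, dimension, and sizes are example values, not documented defaults:

embedding_config = {
    "embedding_endpoint_type": "openai",                # one of the endpoint types listed above
    "embedding_model": "text-embedding-3-small",        # example model name
    "embedding_dim": 1536,                              # dimension of that model's embeddings
    "embedding_endpoint": "https://api.openai.com/v1",  # None for local backends
    "embedding_chunk_size": 300,                        # example chunk size
    "batch_size": 32,                                   # example max batch size
    "handle": "openai/text-embedding-3-small",          # provider/model-name format
}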

class EmbeddingModel:
display_name: str

Display name for the model shown in the UI

embedding_dim: int

The dimension of the embedding

(Deprecated) embedding_endpoint_type: Literal["openai", "anthropic", "bedrock", 16 more]

Deprecated: Use 'provider_type' field instead. The endpoint type for the embedding model.

Accepts one of the following:
"openai"
"anthropic"
"bedrock"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"mistral"
"together"
"pinecone"
(Deprecated) embedding_model: str

Deprecated: Use 'name' field instead. Embedding model name.

name: str

The actual model name used by the provider

provider_name: str

The name of the provider

provider_type: ProviderType

The type of the provider

Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"chatgpt_oauth"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"minimax"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"sglang"
"openrouter"
"xai"
"zai"
(Deprecated) azure_deployment: Optional[str]

Deprecated: The Azure deployment for the model.

(Deprecated) azure_endpoint: Optional[str]

Deprecated: The Azure endpoint for the model.

(Deprecated) azure_version: Optional[str]

Deprecated: The Azure version for the model.

(Deprecated) batch_size: Optional[int]

Deprecated: The maximum batch size for processing embeddings.

(Deprecated) embedding_chunk_size: Optional[int]

Deprecated: The chunk size of the embedding.

(Deprecated) embedding_endpoint: Optional[str]

Deprecated: The endpoint for the model.

handle: Optional[str]

The handle for this config, in the format provider/model-name.

model_type: Optional[Literal["embedding"]]

Type of model (llm or embedding)
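
Because several fields above are deprecated in favor of newer ones, a consumer should prefer name and provider_type. A small sketch, where the fallback to the deprecated fields is only for older payloads:

def describe_embedding_model(m: dict) -> str:
    # Prefer the current fields; fall back to deprecated ones for old payloads.
    name = m.get("name") or m.get("embedding_model")
    provider = m.get("provider_type") or m.get("embedding_endpoint_type")
    return f"{provider}/{name} (dim={m.get('embedding_dim')})"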

class LlmConfig:

Configuration for Language Model (LLM) connection and generation parameters.

Deprecated: LLMConfig should not be used as an input or return type in API calls. Use the schemas in letta.schemas.model (ModelSettings, OpenAIModelSettings, etc.) instead. For conversion, use the _to_model() method or the Model._from_llm_config() method.

context_window: int

The context window size for the model.

model: str

LLM model name.

model_endpoint_type: Literal["openai", "anthropic", "google_ai", 22 more]

The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"lmstudio-chatcompletions"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"minimax"
"mistral"
"together"
"bedrock"
"deepseek"
"xai"
"zai"
"openrouter"
"chatgpt_oauth"
compatibility_type: Optional[Literal["gguf", "mlx"]]

The framework compatibility type for the model.

Accepts one of the following:
"gguf"
"mlx"
display_name: Optional[str]

A human-friendly display name for the model.

effort: Optional[Literal["low", "medium", "high", "max"]]

The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
enable_reasoner: Optional[bool]

Whether the model should use extended thinking if it is a 'reasoning'-style model.

frequency_penalty: Optional[float]

A number between -2.0 and 2.0, per the OpenAI API. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

handle: Optional[str]

The handle for this config, in the format provider/model-name.

max_reasoning_tokens: Optional[int]

Configurable thinking budget for extended thinking. Used for enable_reasoner and also for Google Vertex models like Gemini 2.5 Flash. Minimum value is 1024 when used with enable_reasoner.

max_tokens: Optional[int]

The maximum number of tokens to generate. If not set, the model will use its default value.

model_endpoint: Optional[str]

The endpoint for the model.

model_wrapper: Optional[str]

The wrapper for the model.

(Deprecated) parallel_tool_calls: Optional[bool]

Deprecated: Use model_settings to configure parallel tool calls instead. If set to True, enables parallel tool calling. Defaults to False.

provider_category: Optional[ProviderCategory]

The provider category for the model.

Accepts one of the following:
"base"
"byok"
provider_name: Optional[str]

The provider name for the model.

put_inner_thoughts_in_kwargs: Optional[bool]

If set to True, passes 'inner_thoughts' as a kwarg in the function call, which helps with function-calling performance and the generation of inner thoughts.

reasoning_effort: Optional[Literal["none", "minimal", "low", 3 more]]

The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: Optional[ResponseFormat]

The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings. A json_schema sketch follows this class.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

strict: Optional[bool]

Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing tool outputs match JSON schemas.

temperature: Optional[float]

The temperature to use when generating text with the model. A higher temperature will result in more random text.

tier: Optional[str]

The cost tier for the model (cloud only).

verbosity: Optional[Literal["low", "medium", "high"]]

Soft control for how verbose model output should be, used for GPT-5 models.

Accepts one of the following:
"low"
"medium"
"high"
class Model:
(Deprecated) context_window: int

Deprecated: Use 'max_context_window' field instead. The context window size for the model.

max_context_window: int

The maximum context window for the model

(Deprecated) model: str

Deprecated: Use 'name' field instead. LLM model name.

(Deprecated) model_endpoint_type: Literal["openai", "anthropic", "google_ai", 22 more]

Deprecated: Use 'provider_type' field instead. The endpoint type for the model.

Accepts one of the following:
"openai"
"anthropic"
"google_ai"
"google_vertex"
"azure"
"groq"
"ollama"
"webui"
"webui-legacy"
"lmstudio"
"lmstudio-legacy"
"lmstudio-chatcompletions"
"llamacpp"
"koboldcpp"
"vllm"
"hugging-face"
"minimax"
"mistral"
"together"
"bedrock"
"deepseek"
"xai"
"zai"
"openrouter"
"chatgpt_oauth"
name: str

The actual model name used by the provider

provider_type: ProviderType

The type of the provider

Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"chatgpt_oauth"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"minimax"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"sglang"
"openrouter"
"xai"
"zai"
(Deprecated) compatibility_type: Optional[Literal["gguf", "mlx"]]

Deprecated: The framework compatibility type for the model.

Accepts one of the following:
"gguf"
"mlx"
display_name: Optional[str]

A human-friendly display name for the model.

effort: Optional[Literal["low", "medium", "high", "max"]]

The effort level for Anthropic models that support it (Opus 4.5, Opus 4.6). Controls token spending and thinking behavior. Not setting this gives similar performance to 'high'.

Accepts one of the following:
"low"
"medium"
"high"
"max"
(Deprecated) enable_reasoner: Optional[bool]

Deprecated: Whether the model should use extended thinking if it is a 'reasoning'-style model.

(Deprecated) frequency_penalty: Optional[float]

Deprecated: Positive values penalize new tokens based on their existing frequency in the text so far.

handle: Optional[str]

The handle for this config, in the format provider/model-name.

(Deprecated) max_reasoning_tokens: Optional[int]

Deprecated: Configurable thinking budget for extended thinking.

(Deprecated) max_tokens: Optional[int]

Deprecated: The maximum number of tokens to generate.

(Deprecated) model_endpoint: Optional[str]

Deprecated: The endpoint for the model.

model_type: Optional[Literal["llm"]]

Type of model (llm or embedding)

(Deprecated) model_wrapper: Optional[str]

Deprecated: The wrapper for the model.

(Deprecated) parallel_tool_calls: Optional[bool]

Deprecated: If set to True, enables parallel tool calling.

(Deprecated) provider_category: Optional[ProviderCategory]

Deprecated: The provider category for the model.

Accepts one of the following:
"base"
"byok"
provider_name: Optional[str]

The provider name for the model.

(Deprecated) put_inner_thoughts_in_kwargs: Optional[bool]

Deprecated: Puts 'inner_thoughts' as a kwarg in the function call.

(Deprecated) reasoning_effort: Optional[Literal["none", "minimal", "low", 3 more]]

Deprecated: The reasoning effort to use when generating text with reasoning models.

Accepts one of the following:
"none"
"minimal"
"low"
"medium"
"high"
"xhigh"
response_format: Optional[ResponseFormat]

The response format for the model's output. Supports text, json_object, and json_schema (structured outputs). Can be set via model_settings.

Accepts one of the following:
class TextResponseFormat:

Response format for plain text responses.

type: Optional[Literal["text"]]

The type of the response format.

class JsonSchemaResponseFormat:

Response format for JSON schema-based responses.

json_schema: Dict[str, object]

The JSON schema of the response.

type: Optional[Literal["json_schema"]]

The type of the response format.

class JsonObjectResponseFormat:

Response format for JSON object responses.

type: Optional[Literal["json_object"]]

The type of the response format.

strict: Optional[bool]

Enable strict mode for tool calling. When true, tool schemas include strict: true and additionalProperties: false, guaranteeing tool outputs match JSON schemas.

(Deprecated) temperature: Optional[float]

Deprecated: The temperature to use when generating text with the model.

(Deprecated) tier: Optional[str]

Deprecated: The cost tier for the model (cloud only).

(Deprecated) verbosity: Optional[Literal["low", "medium", "high"]]

Deprecated: Soft control for how verbose model output should be.

Accepts one of the following:
"low"
"medium"
"high"
Literal["base", "byok"]
Accepts one of the following:
"base"
"byok"
Literal["anthropic", "azure", "bedrock", 19 more]
Accepts one of the following:
"anthropic"
"azure"
"bedrock"
"cerebras"
"chatgpt_oauth"
"deepseek"
"google_ai"
"google_vertex"
"groq"
"hugging-face"
"letta"
"lmstudio_openai"
"minimax"
"mistral"
"ollama"
"openai"
"together"
"vllm"
"sglang"
"openrouter"
"xai"
"zai"

Models / Embeddings

List Embedding Models
models.embeddings.list() -> EmbeddingListResponse
GET /v1/models/embedding
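
Mirroring the LLM example earlier, a minimal sketch for this route under the same placeholder assumptions (base URL, bearer-token auth, JSON array response):

import os
import requests

resp = requests.get(
    f"{os.environ.get('LETTA_BASE_URL', 'http://localhost:8283')}/v1/models/embedding",
    headers={"Authorization": f"Bearer {os.environ.get('LETTA_API_KEY', '')}"},
)
resp.raise_for_status()
for m in resp.json():  # assumes a JSON array of EmbeddingModel objects
    print(m.get("handle"), m.get("embedding_dim"))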