
Configuring models

How to select and configure LLM models for Letta agents.

Letta agents can use models from multiple providers. This guide covers how to select models, configure model settings, and change models on existing agents.

Models are specified using a handle in the format provider/model-name:

agent = client.agents.create(
    model="openai/gpt-4o",
    # ...
)
| Provider | Handle prefix | Example |
| --- | --- | --- |
| OpenAI | `openai/` | `openai/gpt-4o`, `openai/gpt-4o-mini` |
| Anthropic | `anthropic/` | `anthropic/claude-sonnet-4-5-20250929` |
| Google AI | `google_ai/` | `google_ai/gemini-2.0-flash` |
| Azure OpenAI | `azure/` | `azure/gpt-4o` |
| AWS Bedrock | `bedrock/` | `bedrock/anthropic.claude-3-5-sonnet` |
| Groq | `groq/` | `groq/llama-3.3-70b-versatile` |
| Together | `together/` | `together/meta-llama/Llama-3-70b` |
| OpenRouter | `openrouter/` | `openrouter/anthropic/claude-3.5-sonnet` |
| Ollama (local) | `ollama/` | `ollama/llama3.2` |
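Note that a model name can itself contain a slash (e.g. `together/meta-llama/Llama-3-70b`), so only the first `/` separates provider from model. A minimal sketch of that parsing rule (the `parse_handle` helper is illustrative, not part of the Letta SDK):

```python
def parse_handle(handle: str) -> tuple[str, str]:
    """Split a model handle into (provider, model-name).

    Only the first "/" separates provider from model, since model
    names (e.g. Together's meta-llama/Llama-3-70b) may contain "/".
    """
    provider, sep, model = handle.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"invalid handle: {handle!r}")
    return provider, model

print(parse_handle("openai/gpt-4o"))                    # ('openai', 'gpt-4o')
print(parse_handle("together/meta-llama/Llama-3-70b"))  # ('together', 'meta-llama/Llama-3-70b')
```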

Use model_settings to configure model behavior:

agent = client.agents.create(
    model="openai/gpt-4o",
    model_settings={
        "provider_type": "openai",
        "temperature": 0.7,
        "max_output_tokens": 4096,
    },
    context_window_limit=128000,
)
| Setting | Type | Description |
| --- | --- | --- |
| `provider_type` | string | Required. Must match the model's provider (`openai`, `anthropic`, `google_ai`, etc.) |
| `temperature` | float | Controls randomness (0.0-2.0). Lower values are more deterministic. |
| `max_output_tokens` | int | Maximum number of tokens in the response. |
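It can help to sanity-check a settings dict client-side before creating the agent. A hedged sketch (`validate_model_settings` is illustrative, not a Letta API; the 0.0-2.0 temperature range follows the table above):

```python
KNOWN_PROVIDERS = {"openai", "anthropic", "google_ai", "azure", "bedrock",
                   "groq", "together", "openrouter", "ollama"}

def validate_model_settings(settings: dict) -> list[str]:
    """Return a list of problems found in a model_settings dict (empty = OK)."""
    problems = []
    provider = settings.get("provider_type")
    if provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider_type: {provider!r}")
    temp = settings.get("temperature")
    if temp is not None and not (0.0 <= temp <= 2.0):
        problems.append(f"temperature {temp} outside 0.0-2.0")
    tokens = settings.get("max_output_tokens")
    if tokens is not None and tokens <= 0:
        problems.append(f"max_output_tokens must be positive, got {tokens}")
    return problems

print(validate_model_settings({"provider_type": "openai", "temperature": 0.7}))  # []
print(validate_model_settings({"provider_type": "openai", "temperature": 3.0}))
```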

The context_window_limit parameter controls how much context the agent can use. This is set at the agent level, not inside model_settings:

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    context_window_limit=200000,  # Use 200K of Claude's context
)

When context fills up, Letta automatically summarizes older messages. See Context Engineering for details.
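The summarization trigger can be reasoned about with a rough token budget. A back-of-the-envelope sketch (the ~4-characters-per-token ratio, the 75% fill threshold, and the `needs_summarization` helper are all assumptions for illustration, not Letta internals):

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def needs_summarization(messages: list[str], context_window_limit: int,
                        fill_ratio: float = 0.75) -> bool:
    """True once estimated message tokens exceed fill_ratio of the window."""
    used = sum(estimate_tokens(m) for m in messages)
    return used > context_window_limit * fill_ratio

history = ["hello world"] * 100  # ~2 tokens each, ~200 total
print(needs_summarization(history, context_window_limit=200))   # True
print(needs_summarization(history, context_window_limit=2000))  # False
```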

For models like o1 and o3 that support extended reasoning:

agent = client.agents.create(
    model="openai/o3-mini",
    model_settings={
        "provider_type": "openai",
        "reasoning": {
            "reasoning_effort": "medium"  # "low", "medium", or "high"
        }
    }
)

For Claude models with extended thinking capability:

agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    model_settings={
        "provider_type": "anthropic",
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000
        }
    }
)
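The two reasoning configurations above differ only in shape: OpenAI takes an effort level, Anthropic a token budget. A small hedged helper (hypothetical, not part of the Letta SDK) that builds the appropriate `model_settings` for either provider:

```python
def reasoning_settings(provider: str, effort: str = "medium",
                       budget_tokens: int = 10000) -> dict:
    """Build a model_settings dict with provider-appropriate reasoning config.

    OpenAI reasoning models take a reasoning_effort level; Anthropic's
    extended thinking takes a token budget (per the examples above).
    """
    if provider == "openai":
        return {"provider_type": "openai",
                "reasoning": {"reasoning_effort": effort}}
    if provider == "anthropic":
        return {"provider_type": "anthropic",
                "thinking": {"type": "enabled", "budget_tokens": budget_tokens}}
    raise ValueError(f"no reasoning config known for provider {provider!r}")

print(reasoning_settings("openai", effort="high"))
print(reasoning_settings("anthropic", budget_tokens=8000))
```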

Update an existing agent’s model using agents.update():

# Change model
client.agents.update(
    agent_id=agent.id,
    model="anthropic/claude-sonnet-4-5-20250929",
)

# Change model and settings
client.agents.update(
    agent_id=agent.id,
    model="openai/gpt-4o",
    model_settings={
        "provider_type": "openai",
        "temperature": 0.5,
    },
    context_window_limit=64000,
)

For self-hosted models or OpenAI-compatible APIs, use model_endpoint:

agent = client.agents.create(
    model="openai/custom-model",
    model_settings={
        "provider_type": "openai",
        "model_endpoint": "https://your-api.example.com/v1",
        "model_endpoint_type": "openai"
    }
)

This works with:

  • vLLM servers
  • LM Studio
  • LocalAI
  • Any OpenAI-compatible endpoint
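When pointing at a self-hosted server, a quick client-side check that `model_endpoint` looks like a usable base URL can save a confusing failure later. A standard-library sketch (the specific checks, such as expecting a `/v1` base path on OpenAI-compatible servers, are common-case assumptions, not Letta requirements):

```python
from urllib.parse import urlparse

def check_endpoint(url: str) -> list[str]:
    """Return warnings about a model_endpoint URL (empty list = looks fine)."""
    warnings = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        warnings.append(f"unexpected scheme: {parsed.scheme!r}")
    if not parsed.netloc:
        warnings.append("missing host")
    if not parsed.path.rstrip("/").endswith("/v1"):
        warnings.append("OpenAI-compatible servers usually expect a /v1 base path")
    return warnings

print(check_endpoint("https://your-api.example.com/v1"))  # []
print(check_endpoint("your-api.example.com"))             # three warnings
```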

Agents also require an embedding model for archival memory search. On Letta Cloud, this is handled automatically. For self-hosted deployments, specify it when creating agents:

agent = client.agents.create(
    model="openai/gpt-4o",
    embedding="openai/text-embedding-3-small",  # Required for self-hosted
)

Common embedding models:

  • openai/text-embedding-3-small (recommended)
  • openai/text-embedding-3-large
  • openai/text-embedding-ada-002
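Archival memory search works by comparing embedding vectors for similarity. A minimal cosine-similarity ranking sketch with made-up 3-dimensional vectors (real embeddings from models like text-embedding-3-small have far more dimensions, e.g. 1536):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; higher = more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings" standing in for stored archival memories.
memories = {
    "likes hiking":   [0.9, 0.1, 0.0],
    "works remotely": [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]  # pretend embedding of a query about outdoor hobbies

best = max(memories, key=lambda k: cosine_similarity(query, memories[k]))
print(best)  # likes hiking
```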