# Configuring models
How to select and configure LLM providers and models for Letta agents.
Letta agents can use models from multiple providers. This guide covers how to select models, configure model settings, and change models on existing agents.
## Model handles

Models are specified using a handle in the format `provider/model-name`:
```python
agent = client.agents.create(
    model="openai/gpt-4o",
    # ...
)
```

### Available providers
| Provider | Handle prefix | Example |
|---|---|---|
| OpenAI | openai/ | openai/gpt-4o, openai/gpt-4o-mini |
| Anthropic | anthropic/ | anthropic/claude-sonnet-4-5-20250929 |
| Google AI | google_ai/ | google_ai/gemini-2.0-flash |
| Azure OpenAI | azure/ | azure/gpt-4o |
| AWS Bedrock | bedrock/ | bedrock/anthropic.claude-3-5-sonnet |
| Groq | groq/ | groq/llama-3.3-70b-versatile |
| Together | together/ | together/meta-llama/Llama-3-70b |
| OpenRouter | openrouter/ | openrouter/anthropic/claude-3.5-sonnet |
| Ollama (local) | ollama/ | ollama/llama3.2 |
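Any handle from the table drops into the same `create` call. As a minimal sketch, using the Anthropic handle from the table above:

```python
# Same create call as before, different provider handle (from the table above)
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    # ...
)
```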
## Model settings

Use `model_settings` to configure model behavior:
```python
agent = client.agents.create(
    model="openai/gpt-4o",
    model_settings={
        "provider_type": "openai",
        "temperature": 0.7,
        "max_output_tokens": 4096,
    },
    context_window_limit=128000,
)
```

```typescript
const agent = await client.agents.create({
  model: "openai/gpt-4o",
  model_settings: {
    provider_type: "openai",
    temperature: 0.7,
    max_output_tokens: 4096,
  },
  context_window_limit: 128000,
});
```

### Common settings
| Setting | Type | Description |
|---|---|---|
| `provider_type` | string | Required. Must match the model's provider (`openai`, `anthropic`, `google_ai`, etc.). |
| `temperature` | float | Controls randomness (0.0-2.0). Lower = more deterministic. |
| `max_output_tokens` | int | Maximum tokens in the response. |
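For example, a near-deterministic configuration would lower `temperature` and cap the response length. A sketch; the values here are illustrative:

```python
agent = client.agents.create(
    model="openai/gpt-4o",
    model_settings={
        "provider_type": "openai",  # must match the provider in the handle
        "temperature": 0.0,         # illustrative: most deterministic setting
        "max_output_tokens": 1024,  # illustrative: tighter response cap
    },
)
```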
## Context window limit

The `context_window_limit` parameter controls how much context the agent can use. It is set at the agent level, not inside `model_settings`:
```python
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    context_window_limit=200000,  # use 200K of Claude's context
)
```

When the context fills up, Letta automatically summarizes older messages. See Context Engineering for details.
## Provider-specific settings

### OpenAI reasoning models

For models like o1 and o3 that support extended reasoning:
```python
agent = client.agents.create(
    model="openai/o3-mini",
    model_settings={
        "provider_type": "openai",
        "reasoning": {
            "reasoning_effort": "medium"  # "low", "medium", or "high"
        },
    },
)
```

### Anthropic extended thinking
For Claude models with extended thinking capability:
```python
agent = client.agents.create(
    model="anthropic/claude-sonnet-4-5-20250929",
    model_settings={
        "provider_type": "anthropic",
        "thinking": {
            "type": "enabled",
            "budget_tokens": 10000,
        },
    },
)
```

## Changing an agent’s model
Update an existing agent’s model using `agents.update()`:
```python
# Change model
client.agents.update(
    agent_id=agent.id,
    model="anthropic/claude-sonnet-4-5-20250929",
)

# Change model and settings
client.agents.update(
    agent_id=agent.id,
    model="openai/gpt-4o",
    model_settings={
        "provider_type": "openai",
        "temperature": 0.5,
    },
    context_window_limit=64000,
)
```

```typescript
// Change model
await client.agents.update(agent.id, {
  model: "anthropic/claude-sonnet-4-5-20250929",
});

// Change model and settings
await client.agents.update(agent.id, {
  model: "openai/gpt-4o",
  model_settings: {
    provider_type: "openai",
    temperature: 0.5,
  },
  context_window_limit: 64000,
});
```

## Custom endpoints (OpenAI-compatible)
For self-hosted models or OpenAI-compatible APIs, use `model_endpoint`:
```python
agent = client.agents.create(
    model="openai/custom-model",
    model_settings={
        "provider_type": "openai",
        "model_endpoint": "https://your-api.example.com/v1",
        "model_endpoint_type": "openai",
    },
)
```

This works with:
- vLLM servers
- LM Studio
- LocalAI
- Any OpenAI-compatible endpoint
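For example, a local vLLM server exposes an OpenAI-compatible API (on port 8000 by default). A sketch, with the endpoint URL and model name as placeholders:

```python
# Sketch: the endpoint URL and model name are placeholders for whatever
# your local server actually serves.
agent = client.agents.create(
    model="openai/my-local-model",
    model_settings={
        "provider_type": "openai",
        "model_endpoint": "http://localhost:8000/v1",  # vLLM's default port
        "model_endpoint_type": "openai",
    },
)
```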
## Embedding models

Agents also require an embedding model for archival memory search. On Letta Cloud, this is handled automatically. For self-hosted deployments, specify it when creating agents:
```python
agent = client.agents.create(
    model="openai/gpt-4o",
    embedding="openai/text-embedding-3-small",  # required for self-hosted
)
```

Common embedding models:
- `openai/text-embedding-3-small` (recommended)
- `openai/text-embedding-3-large`
- `openai/text-embedding-ada-002`
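In TypeScript, the same self-hosted setup would look like the sketch below, assuming the SDK mirrors the Python `embedding` parameter:

```typescript
// Sketch: assumes the TypeScript SDK accepts the same `embedding` field
// as the Python example above.
const agent = await client.agents.create({
  model: "openai/gpt-4o",
  embedding: "openai/text-embedding-3-small", // required for self-hosted
});
```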
## Next steps

- Context Engineering - Optimize context window usage
- Provider setup guides - Configure specific providers for self-hosted deployments
- Streaming - Handle streaming responses