Azure OpenAI
You can use Letta with Azure OpenAI if you have an Azure account and API key. To do so, set the environment variables AZURE_API_KEY and AZURE_BASE_URL; you can also optionally set AZURE_API_VERSION (the default is 2024-09-01-preview). Once AZURE_API_KEY and AZURE_BASE_URL are set in your environment, you can select which model to use and configure the context window size.
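For example, in a POSIX shell (all values shown are placeholders for your own Azure credentials and endpoint):

```sh
export AZURE_API_KEY="..."
export AZURE_BASE_URL="https://YOUR-RESOURCE-NAME.openai.azure.com"
export AZURE_API_VERSION="2024-09-01-preview"  # optional; this is the default
```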
Currently, Letta supports the following OpenAI models:
- gpt-4 (recommended for advanced reasoning)
- gpt-4o-mini (recommended for low latency and cost)
- gpt-4o
- gpt-4-turbo (not recommended; use gpt-4o-mini instead)
- gpt-3.5-turbo (not recommended; use gpt-4o-mini instead)
Enabling Azure OpenAI models
To enable the Azure provider, set your key as an environment variable:
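```sh
export AZURE_API_KEY="..."
export AZURE_BASE_URL="https://YOUR-RESOURCE-NAME.openai.azure.com"
```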
Now, Azure OpenAI models will be enabled when you run letta run or start the Letta server.
Using the docker run server with Azure OpenAI
To enable Azure OpenAI models, simply set AZURE_API_KEY and AZURE_BASE_URL as environment variables:
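A sketch of the invocation, assuming the standard letta/letta image and its default port 8283 (replace the placeholder values with your own credentials):

```sh
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e AZURE_API_KEY="your_azure_api_key" \
  -e AZURE_BASE_URL="your_azure_base_url" \
  letta/letta:latest
```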
Using letta run and letta server with Azure OpenAI (CLI, pypi only)
To chat with an agent, run:
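```sh
letta run
```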
To run the Letta server, run:
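```sh
letta server
```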
To select the model used by the server, use the dropdown in the ADE or specify an LLMConfig object in the Python SDK.
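As a rough sketch of the SDK route (this assumes the legacy letta Python client with create_client and LLMConfig; the field names vary across versions, so treat them as illustrative):

```python
from letta import LLMConfig, create_client

client = create_client()

# Field names below are assumptions based on older letta releases;
# check the schema in your installed version.
agent = client.create_agent(
    llm_config=LLMConfig(
        model="gpt-4o-mini",          # underlying model of your Azure deployment
        model_endpoint_type="azure",  # route requests through the Azure provider
        context_window=16384,         # must not exceed the model's maximum
    ),
)
```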
Specifying agent models
When creating agents, you must specify the LLM and embedding models to use via a handle. You can additionally specify a context window limit (which must be less than or equal to the model's maximum context window size).
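A sketch using the letta-client Python SDK; the azure/gpt-4o-mini and azure/text-embedding-3-small handles are illustrative and depend on which models your Azure resource actually deploys:

```python
from letta_client import Letta

# Connect to a locally running Letta server (default port 8283).
client = Letta(base_url="http://localhost:8283")

agent = client.agents.create(
    model="azure/gpt-4o-mini",                 # LLM handle: provider/model-name
    embedding="azure/text-embedding-3-small",  # embedding handle (illustrative)
    context_window_limit=16000,                # optional; must be <= the model's max
)
```

Handles take the form provider/model-name, so any model deployed on your Azure resource can be addressed the same way.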