To use Letta with Ollama, set the environment variable OLLAMA_BASE_URL=http://localhost:11434.

Make sure to use tags when downloading Ollama models!

For example, don't run ollama pull dolphin2.2-mistral; instead run ollama pull dolphin2.2-mistral:7b-q6_K (note the :7b-q6_K tag).

If you don't specify a tag, Ollama may default to a heavily quantized variant of the model (e.g. Q4). We highly recommend NOT going below Q5 quantization when using GGUF models (stick to Q6 or Q8 if possible). In our testing, certain models become extremely unstable below Q6 when used with Letta/MemGPT.
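If you're not sure which tags (and therefore which quantization variants) a model offers, its page in the Ollama model library lists them. The URL below follows the library's standard /tags page layout and assumes the model is still listed:

# Browse the available tags/quantizations for a model in your browser, e.g.:
#   https://ollama.com/library/dolphin2.2-mistral/tags
# Pick a specific tag from that list and pull it explicitly:
ollama pull dolphin2.2-mistral:7b-q6_K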

Set up Ollama

  1. Download and install Ollama
  2. Download a model to test with by running ollama pull <MODEL_NAME> in the terminal (check the Ollama model library for available models)

For example, if we want to use Dolphin 2.2.1 Mistral, we can download it by running:

# Let's use the q6_K variant
ollama pull dolphin2.2-mistral:7b-q6_K
pulling manifest
pulling d8a5ee4aba09... 100% |█████████████████████████████████████████████████████████████████████████| (4.1/4.1 GB, 20 MB/s)
pulling a47b02e00552... 100% |██████████████████████████████████████████████████████████████████████████████| (106/106 B, 77 B/s)
pulling 9640c2212a51... 100% |████████████████████████████████████████████████████████████████████████████████| (41/41 B, 22 B/s)
pulling de6bcd73f9b4... 100% |████████████████████████████████████████████████████████████████████████████████| (58/58 B, 28 B/s)
pulling 95c3d8d4429f... 100% |█████████████████████████████████████████████████████████████████████████████| (455/455 B, 330 B/s)
verifying sha256 digest
writing manifest
removing any unused layers
success
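Once the pull finishes, you can confirm the model is available locally with the expected tag. Whether ollama show prints quantization details depends on your Ollama version:

# List downloaded models and their tags
ollama list
# Inspect the model (recent Ollama versions include details such as quantization)
ollama show dolphin2.2-mistral:7b-q6_K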

Enabling Ollama as a provider

To enable the Ollama provider, you must set the OLLAMA_BASE_URL environment variable. When it is set, Letta will detect the LLM and embedding models running on your Ollama server and make them available to your agents.
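As a quick sanity check that Letta will be able to reach Ollama at this URL, you can query Ollama's /api/tags endpoint, which returns the models available on the server:

# Verify the Ollama server is reachable and list the models it is serving
curl http://localhost:11434/api/tags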

Using docker run with Ollama

Since Ollama is running on the host network, you will need to use host.docker.internal to connect to the Ollama server instead of localhost. You'll also want to make sure that port 11434 (the default Ollama port) is open on your host machine so the container can reach the Ollama server.

# replace `~/.letta/.persist/pgdata` with wherever you want to store your agent data
docker run \
  -v ~/.letta/.persist/pgdata:/var/lib/postgresql/data \
  -p 8283:8283 \
  -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
  letta/letta:latest
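If the Letta container can't reach Ollama (a common issue on Linux, where host.docker.internal is not defined by default), you can test connectivity from a throwaway container first. The curlimages/curl image and the --add-host mapping below are just one convenient way to run this check:

# From the host: confirm that a container can reach Ollama via host.docker.internal
docker run --rm --add-host=host.docker.internal:host-gateway \
  curlimages/curl:latest http://host.docker.internal:11434/api/tags
# On Linux you may need to add the same --add-host flag to the docker run command above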

Using letta run and letta server with Ollama

To chat with an agent, run:

export OLLAMA_BASE_URL="http://localhost:11434"
letta run

To run the Letta server, run:

export OLLAMA_BASE_URL="http://localhost:11434"
letta server

To select the model used by the server, use the model dropdown in the ADE or specify an LLMConfig object in the Python SDK.
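If you want to confirm which Ollama models the server has picked up before selecting one, you can also query the server's REST API; the /v1/models/ path below is an assumption based on Letta's API and may differ between versions:

# List the LLM models the Letta server has detected (path may vary by Letta version)
curl http://localhost:8283/v1/models/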
