Ollama
Make sure to use tags when downloading Ollama models! For example, don't run `ollama pull dolphin2.2-mistral`; instead run `ollama pull dolphin2.2-mistral:7b-q6_K` (note the `:7b-q6_K` tag).
If you don’t specify a tag, Ollama may default to using a highly compressed model variant (e.g. Q4). We highly recommend NOT using a compression level below Q5 when using GGUF (stick to Q6 or Q8 if possible). In our testing, certain models start to become extremely unstable (when used with Letta/MemGPT) below Q6.
Setup Ollama
- Download + install Ollama
- Download a model to test with by running `ollama pull <MODEL_NAME>` in the terminal (check the Ollama model library for available models)
For example, if we want to use Dolphin 2.2.1 Mistral, we can download it by running:
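```sh
ollama pull dolphin2.2-mistral:7b-q6_K
```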
Enabling Ollama as a provider
To enable the Ollama provider, you must set the `OLLAMA_BASE_URL` environment variable. When this is set, Letta will use available LLM and embedding models running on Ollama.
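For example, if Ollama is running locally on its default port (11434), you might set:

```sh
export OLLAMA_BASE_URL="http://localhost:11434"
```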
Using the `docker run` server with Ollama
Since Ollama is running on the host network, you will need to use `host.docker.internal` to connect to the Ollama server instead of `localhost`.
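For example, a minimal `docker run` invocation might look like the following (a sketch assuming the `letta/letta` image, the Letta server's default port 8283, and Ollama listening on its default port 11434 on the host):

```sh
docker run \
  -p 8283:8283 \
  -e OLLAMA_BASE_URL="http://host.docker.internal:11434" \
  letta/letta:latest
```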
CLI (pypi only)
Using `letta run` and `letta server` with Ollama
To chat with an agent, run:
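```sh
# assumes Ollama is reachable at its default local port
export OLLAMA_BASE_URL="http://localhost:11434"
letta run
```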
To run the Letta server, run:
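```sh
export OLLAMA_BASE_URL="http://localhost:11434"
letta server
```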
To select the model used by the server, use the dropdown in the ADE or specify an `LLMConfig` object in the Python SDK.
Specifying agent models
When creating agents, you must specify the LLM and embedding models to use via a handle. You can additionally specify a context window limit (which must be less than or equal to the maximum size).
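A minimal sketch of what this looks like in the Python SDK, assuming the `letta_client` package, a Letta server on its default port, and that the parameter names shown (`model`, `embedding`, `context_window_limit`) match your SDK version; the Ollama handles are examples and must correspond to models you have actually pulled:

```python
from letta_client import Letta

# Connect to a locally running Letta server (assumed default port 8283)
client = Letta(base_url="http://localhost:8283")

agent = client.agents.create(
    # Handles take the form "<provider>/<model-name>"
    model="ollama/dolphin2.2-mistral:7b-q6_K",
    embedding="ollama/mxbai-embed-large",
    # Optional: cap the context window (must be <= the model's maximum)
    context_window_limit=16000,
    memory_blocks=[
        {"label": "persona", "value": "You are a helpful assistant."},
        {"label": "human", "value": "The user's name is Sarah."},
    ],
)
print(agent.id)
```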