vLLM
To use Letta with vLLM, set the environment variable VLLM_API_BASE to point to your vLLM ChatCompletions server.

Setting up vLLM
- Download + install vLLM (a typical install command is shown after this list)
- Launch a vLLM OpenAI-compatible API server, following the official vLLM documentation
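For step 1, vLLM can typically be installed from PyPI (this assumes a Linux machine with a supported CUDA setup; see the official vLLM installation docs for other platforms):

```sh
pip install vllm
```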
For example, if we want to use the model dolphin-2.2.1-mistral-7b from HuggingFace, we would run something like the following (the exact HuggingFace repository ID is an assumption here; substitute the one for the model you want to serve):
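```sh
# Launch vLLM's OpenAI-compatible API server (classic entrypoint;
# newer vLLM releases also support `vllm serve <model>`).
python -m vllm.entrypoints.openai.api_server \
    --model ehartford/dolphin-2.2.1-mistral-7b
```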
vLLM will automatically download the model (if it’s not already downloaded) and store it in your HuggingFace cache directory.
Enabling vLLM as a provider
To enable the vLLM provider, you must set the VLLM_API_BASE
environment variable. When this is set, Letta will use available LLM and embedding models running on vLLM.
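For example, assuming the server from the previous step is listening on vLLM's default port 8000 (the exact base URL Letta expects, including any /v1 suffix, may vary by Letta version, so check against your setup):

```sh
# Point Letta at the running vLLM server, then start Letta as usual
# (`letta run` shown here as one example entry point).
export VLLM_API_BASE="http://localhost:8000"
letta run
```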