LLM configuration
vLLM
Setting up vLLM
- Download and install vLLM
- Launch a vLLM OpenAI-compatible API server, following the official vLLM documentation
For example, to serve the model dolphin-2.2.1-mistral-7b from Hugging Face, we would run:
python -m vllm.entrypoints.openai.api_server \
--model ehartford/dolphin-2.2.1-mistral-7b
vLLM will automatically download the model (if it is not already present) and store it in your Hugging Face cache directory (~/.cache/huggingface by default).
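Once the server is up, it speaks the OpenAI completions API. As a minimal sketch, assuming vLLM's default port (8000) on localhost, a request body can be built with Python's standard library; the actual POST is left commented out since it requires the server to be running:

```python
import json

# Endpoint assumed from vLLM's defaults (port 8000 on the local machine).
url = "http://localhost:8000/v1/completions"

# Request body for the OpenAI-compatible completions API; "model" must match
# the --model flag passed when launching the server.
payload = {
    "model": "ehartford/dolphin-2.2.1-mistral-7b",
    "prompt": "San Francisco is a",
    "max_tokens": 32,
    "temperature": 0.7,
}
print(json.dumps(payload, indent=2))

# With the server running, send the request with e.g. urllib:
# import urllib.request
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

The same payload works with any OpenAI-compatible client library by pointing its base URL at the vLLM server.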