Deploy LLM
Requirements
Basic Software
- Ubuntu 24.04
- CUDA >= 12.4
- cuDNN >= 9.5.0
- Anaconda
- Docker
Anaconda Libraries
- Python
- PyTorch
- TensorFlow
- Transformers
- SageMath
- Keras
- Jupyter
LLM
- Qwen2.5-72B-Instruct
- Engine: vLLM
- Open WebUI
Install Basic Software
After installing Ubuntu 24.04, install some basic software (it is recommended to switch the APT sources to a nearby mirror first):
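The exact package set depends on the environment; a minimal baseline might look like the following (installing Docker from the Ubuntu repositories is an assumption here, not the only option):
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential git curl wget docker.io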
Install CUDA
Configure the NVIDIA CUDA repository keyring according to https://developer.nvidia.com/cuda-downloads.
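For Ubuntu 24.04 on x86_64, the keyring setup typically looks like the following; the exact keyring package version may differ, so check the page above:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update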
sudo apt install cuda-toolkit-12-6
sudo apt install nvidia-open
sudo apt install nvidia-gds
sudo reboot
By default, nvcc and the CUDA SDK binaries are not on the PATH, so add the following to ~/.bashrc:
export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
Check CUDA version:
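For example, nvcc reports the installed toolkit version and nvidia-smi reports the driver and the highest CUDA version it supports:
nvcc --version
nvidia-smi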
Install cuDNN
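With the NVIDIA repository configured above, cuDNN 9 for CUDA 12 can usually be installed from APT; the package name follows NVIDIA's cuDNN installation guide and should be verified there:
sudo apt install cudnn9-cuda-12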
Check cuDNN version:
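A simple check is to query the installed packages:
dpkg -l | grep cudnn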
Install Anaconda
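Download an installer from https://repo.anaconda.com/archive/ and run it; the file name below is only an example and changes with each release:
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
bash Anaconda3-2024.10-1-Linux-x86_64.sh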
Deploy LLM
Install Huggingface CLI
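The CLI ships with the huggingface_hub package; a common way to install it (inside the conda environment created below) is:
pip install -U "huggingface_hub[cli]"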
Create Conda Environment
source ~/anaconda3/bin/activate
conda init --all
conda create -n open-webui python=3.11
conda activate open-webui
Install Conda Dependencies
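The requirements list above maps roughly to the following installs; this is a sketch, and which extras (TensorFlow, Keras, SageMath) are actually needed depends on the use case:
pip install torch transformers vllm open-webui jupyter
pip install tensorflow keras
conda install -c conda-forge sage    # SageMath, optional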
Download Model
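For example, with the Hugging Face CLI, download the AWQ-quantized variant that is used later in this guide (HF_ENDPOINT points to a mirror and is optional):
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Qwen/Qwen2.5-72B-Instruct-AWQ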
Configure Nginx
Open WebUI needs to be accessed over SSL to enable file uploading, audio input, and other features. Therefore, an Nginx server is required in front of Open WebUI.
Configure OpenSSL Certificate
Generate RSA key:
Make a copy of the key without password:
Generate a certificate signing request:
Generate a self-signed certificate:
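The four steps above can be done with OpenSSL roughly as follows; the file names (ssl.key, ssl_nopass.key, ssl.csr, ssl.crt), key size, and validity period are assumptions and should match the paths used in the Nginx config below:
openssl genrsa -des3 -out ssl.key 2048
openssl rsa -in ssl.key -out ssl_nopass.key
openssl req -new -key ssl_nopass.key -out ssl.csr
openssl x509 -req -days 3650 -in ssl.csr -signkey ssl_nopass.key -out ssl.crt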
With the above certificate files, enable SSL in Nginx.
Configure Nginx
Add the following to /etc/nginx/conf.d/open-webui.conf:
server {
    listen 443 ssl;
    server_name 10.98.36.37;

    ssl_certificate /path/to/ssl.crt;
    ssl_certificate_key /path/to/ssl_nopass.key;

    location / {
        proxy_pass http://localhost:8080;

        # Standard headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Add WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Timeouts for WebSocket connections
        proxy_read_timeout 60s;
        proxy_send_timeout 60s;
    }
}
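After saving the file, validate the configuration and reload Nginx:
sudo nginx -t
sudo systemctl reload nginx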
Start LLM
Start vLLM
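A minimal sketch of starting the vLLM OpenAI-compatible server on port 8000 (matching OPENAI_API_BASE_URL below); --tensor-parallel-size must match the number of available GPUs and is only an assumption here:
conda activate open-webui
export HF_ENDPOINT=https://hf-mirror.com
vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ \
  --quantization awq \
  --tensor-parallel-size 4 \
  --port 8000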
Start UI
conda activate open-webui
export HF_ENDPOINT=https://hf-mirror.com
export ENABLE_OLLAMA_API=False
export OPENAI_API_BASE_URL=http://127.0.0.1:8000/v1
export DEFAULT_MODELS="Qwen/Qwen2.5-72B-Instruct-AWQ"
open-webui serve
Alternatively, a Docker container can be used to start Open WebUI:
docker run \
  -d \
  --add-host host.docker.internal:host-gateway \
  -e DEFAULT_USER_ROLE="user" \
  -e OPENAI_API_BASE_URL="http://host.docker.internal:8000/v1" \
  -e HF_ENDPOINT="https://hf-mirror.com" \
  -e DEFAULT_MODELS="Qwen/Qwen2.5-72B-Instruct-AWQ" \
  -e ENABLE_OLLAMA_API=False \
  -v $(pwd)/data/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  -p 8080:8080 \
  ghcr.io/open-webui/open-webui:main