Deploy LLM

Requirement

Basic Software

Ubuntu 24.04
CUDA >= 12.4
cuDNN >= 9.5.0
Anaconda
Docker

Anaconda Libraries

Python
PyTorch
Tensorflow
Transformer
SageMath
Keras
Jupyter

LLM

Qwen2.5-72B-Instruct
Engine: vLLM
Open WebUI

Requirements

Install CUDA

Configure Nvidia key rings according to https://developer.nvidia.com/cuda-downloads.

sudo apt install cuda-toolkit-12-6
sudo apt install nvidia-open
sudo apt install nvidia-gds
sudo reboot

The default nvcc and CUDA SDK binaries are not in PATH. Add the following to ~/.bashrc:

export CUDA_HOME=/usr/local/cuda
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH

Check CUDA version:

nvcc --version

Install cuDNN

sudo apt install cudnn cudnn-cuda-12

Check cuDNN version:

cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

Install Anaconda

Deploy LLM

Install Huggingface CLI

pip install -U "huggingface_hub[cli]"

Create Conda Environment

source ~/anaconda3/bin/activate
conda init --all
conda create -n open-webui python=3.11
conda activate open-webui

Install Conda Dependencies

pip install -U open-webui vllm torch transformers

Download Model

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Qwen/Qwen2.5-72B-Instruct-AWQ

Configure Nginx

In Open WebUI, we need to enable SSL access to enable file uploads, audio input, and other features. Therefore, an Nginx server is required to serve Open WebUI.

Configure OpenSSL Certificate

Generate RSA key:

openssl genrsa -des3 -out filessl.key 2048

Make a copy of the key without password:

openssl rsa -in filessl.key -out filessl_nopass.key

Generate a certificate signing request:

openssl req -new -key filessl.key -out filessl.csr

Generate a self-signed certificate:

openssl x509 -req -days 365 -in filessl.csr -signkey filessl.key -out filessl.crt

With the certificate files above, enable SSL in Nginx.

Configure Nginx

Add the following into /etc/nginx/conf.d/open-webui.conf:

server {
  listen 443 ssl;
  server_name 10.98.36.37;

  ssl_certificate /path/to/filessl.crt;
  ssl_certificate_key /path/to/filessl_nopass.key;

  location / {
    proxy_pass http://localhost:8080;

    # Standard headers
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;

    # Add WebSocket support
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Timeouts for WebSocket connections
    proxy_read_timeout 60s;
    proxy_send_timeout 60s;
  }
}

Start LLM

Start vLLM

vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ

Start UI

conda activate open-webui
export HF_ENDPOINT=https://hf-mirror.com
export ENABLE_OLLAMA_API=False
export OPENAI_API_BASE_URL=http://127.0.0.1:8000/v1
export DEFAULT_MODELS="Qwen/Qwen2.5-72B-Instruct-AWQ"
open-webui serve

Alternatively, we can also use a Docker container to start Open WebUI:

docker run \
  -d \
  --add-host host.docker.internal:host-gateway \
  -e DEFAULT_USER_ROLE="user" \
  -e OPENAI_API_BASE_URL="http://host.docker.internal:8000/v1" \
  -e HF_ENDPOINT="https://hf-mirror.com" \
  -e DEFAULT_MODELS="Qwen/Qwen2.5-72B-Instruct-AWQ" \
  -e ENABLE_OLLAMA_API=False \
  -v $(pwd)/data/open-webui:/app/backend/data \
  --name open-webui \
  --restart always \
  -p 8080:8080 \
  ghcr.io/open-webui/open-webui:main