Deploy LLM

Requirements

Basic Software

  • Ubuntu 24.04
  • CUDA >= 12.4
  • cuDNN >= 9.5.0
  • Anaconda
  • Docker

Anaconda Libraries

  • Python
  • PyTorch
  • TensorFlow
  • Transformers
  • SageMath
  • Keras
  • Jupyter

LLM

  • Qwen2.5-72B-Instruct
  • Engine: vLLM
  • Open WebUI

Install Basic Software

After installing Ubuntu 24.04, install some basic software (replacing the default apt sources with a faster mirror is recommended):

sudo apt install vim git curl wget neofetch htop byobu build-essential

Install CUDA

Configure the NVIDIA keyring according to https://developer.nvidia.com/cuda-downloads, then install the packages:

sudo apt install cuda-toolkit-12-6
sudo apt install nvidia-open
sudo apt install nvidia-gds
sudo reboot
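
After the reboot, a quick sanity check confirms the driver and toolkit are visible (nvcc may require adding /usr/local/cuda/bin to your PATH first):

```shell
nvidia-smi       # driver loaded, GPUs listed
nvcc --version   # toolkit version, e.g. "release 12.6"
```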

Install cuDNN

sudo apt install cudnn cudnn-cuda-12

Install Anaconda
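
The rest of this guide assumes Anaconda is installed under ~/anaconda3, the installer's default prefix. A minimal sketch; the exact filename depends on the release you pick from https://repo.anaconda.com/archive (the version shown here is an assumption):

```shell
wget https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Linux-x86_64.sh
bash Anaconda3-2024.10-1-Linux-x86_64.sh   # accept the license, keep the default prefix
```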

Deploy LLM

Install Huggingface CLI

pip install -U "huggingface_hub[cli]"

Create Conda Environment

source ~/anaconda3/bin/activate
conda init --all
conda create -n open-webui python=3.11
conda activate open-webui

Install Python Dependencies

pip install -U open-webui vllm torch transformers

Download Model

export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download Qwen/Qwen2.5-72B-Instruct-AWQ
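
By default the CLI stores downloads in the Hugging Face cache under ~/.cache/huggingface/hub, where the repo id is encoded into the directory name ("/" becomes "--"):

```shell
# derive the cache directory name for a repo id,
# following the hub cache naming convention models--{org}--{name}
repo="Qwen/Qwen2.5-72B-Instruct-AWQ"
echo "models--${repo//\//--}"
```

If you prefer a plain directory instead of the cache, pass --local-dir to huggingface-cli download.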

Start vLLM

vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ
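
A 72B model usually does not fit on a single GPU; on a multi-GPU host you may need to add --tensor-parallel-size with your GPU count. Once the server is up, the OpenAI-compatible endpoint can be smoke-tested (the model name must match the served repo id):

```shell
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2.5-72B-Instruct-AWQ",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```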

Start UI

conda activate open-webui
export HF_ENDPOINT=https://hf-mirror.com
export ENABLE_OLLAMA_API=False
export OPENAI_API_BASE_URL=http://127.0.0.1:8000/v1
export DEFAULT_MODELS="Qwen/Qwen2.5-72B-Instruct-AWQ"
open-webui serve

Open WebUI listens on http://localhost:8080 by default.