Install Qwen 2.5- Omni 3B on Ubuntu

Install Qwen 2.5- Omni 3B on Ubuntu
Install Qwen 2.5- Omni 3B on Ubuntu

Qwen2.5-Omni 3B is an advanced multimodal AI model capable of processing text, image, audio, and video in a single, 3-billion-parameter architecture. This guide provides step-by-step instructions for installing Qwen2.5-Omni 3B on Ubuntu, including three different installation methods optimized for GPU usage.

System Requirements

Hardware Specifications

  • Minimum
    • 16GB RAM (32GB recommended)
    • NVIDIA GPU with 24GB VRAM (for BF16 precision)
    • 50GB SSD storage
  • Recommended
    • 32GB RAM
    • NVIDIA A100/A6000 or RTX 4090
    • 100GB NVMe storage

Software Prerequisites

  • Ubuntu 22.04/24.04 LTS
  • Python 3.10+
  • CUDA 12.1/11.8
  • cuDNN 8.9+
  • NVIDIA Driver 525+

Method 1: Hugging Face Transformers Installation

Step 1: System Preparation

sudo apt update && sudo apt upgrade -y
sudo apt install -y python3-pip git cmake build-essential

Step 2: CUDA Toolkit Installation

wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run

Add to .bashrc:

export PATH=/usr/local/cuda-12.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64:$LD_LIBRARY_PATH

Step 3: Environment Setup

python3 -m venv qwen_env
source qwen_env/bin/activate
pip install --upgrade pip

Step 4: Dependency Installation

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install sentencepiece bitsandbytes protobuf numpy einops timm pillow

Step 5: Transformers Installation

pip uninstall -y transformers
pip install git+https://github.com/huggingface/transformers@3a1ead0aabed473eafe527915eea8c197d424356
pip install accelerate soundfile qwen-omni-utils[decord]

Method 2: vLLM Optimization Setup

Step 1: Custom vLLM Build

git clone -b qwen2_omni_public https://github.com/fyabc/vllm.git
cd vllm
git checkout de8f43fbe9428b14d31ac5ec45d065cd3e5c3ee0

Step 2: Dependency Installation

pip install setuptools_scm torchdiffeq resampy x_transformers qwen-omni-utils accelerate
pip install -r requirements/cuda.txt
pip install --upgrade setuptools wheel
pip install .

Step 3: Model Initialization

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-Omni-3B", 
          dtype="bfloat16",
          tensor_parallel_size=2)  # For multi-GPU setups

Method 3: Ollama Quick Deployment

Step 1: Ollama Installation

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl enable ollama

Step 2: Model Download

ollama pull qwen2.5:3b-omni
ollama run qwen2.5:3b-omni

GPU Memory Management

Precision 15s Video 30s Video 60s Video
FP32 89.10 GB N/A N/A
BF16 18.38 GB 22.43 GB 28.22 GB

Optimization Tips

  1. Use attn_implementation="flash_attention_2"
  2. Enable bitsandbytes 4-bit quantization
  3. Implement gradient checkpointing

Multimodal Configuration

Video Processing Backends

# For HTTP/HTTPS support
FORCE_QWENVL_VIDEO_READER=torchvision python script.py

# For local video files
FORCE_QWENVL_VIDEO_READER=decord python script.py

Audio Processing

from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Omni-3B", device_map="auto")

Advanced Deployment

Docker Configuration

FROM nvidia/cuda:12.1-base
RUN pip install vllm qwen-omni-utils[decord]
CMD ["python3", "-m", "vllm.entrypoints.api_server"]

WebUI Integration

pip install gradio
python -m gradio webapp.py

Troubleshooting Guide

CUDA Out of Memory

    • Reduce max_new_tokens
    • Enable --load-in-4bit
    • Use --device-map="balanced"

Audio Generation Issues

sudo apt install libsndfile1
pip install soundfile

Video Processing Errors

sudo apt install ffmpeg
pip install av

Performance Benchmarks

Hardware Tokens/sec VRAM Usage
RTX 3090 (24GB) 42.1 19.8 GB
A100 40GB 78.3 22.1 GB
Dual RTX 4090 135.7 28.4 GB

Use Case Examples

Multimodal Chatbot

response = model.generate(
    input_text="Describe this image",
    input_images=["image.jpg"],
    audio_prompt="audio.wav"
)

Video Summarization

video_summary = model.process_video(
    "video.mp4",
    prompt="Summarize the key events"
)

Maintenance & Updates

Security Patches

sudo unattended-upgrade

Model Updates

pip install --upgrade transformers qwen-omni-utils

Conclusion

Installing and optimizing Qwen2.5-Omni 3B on Ubuntu can seem daunting, but by following the steps outlined in this guide, you’ll be able to take full advantage of its powerful multimodal capabilities.

Whether you choose to go with the Hugging Face Transformers method, the vLLM optimization setup, or the Ollama quick deployment, each option provides a flexible solution tailored to different hardware configurations.

References

  1. Run DeepSeek Janus-Pro 7B on Mac: A Comprehensive Guide Using ComfyUI
  2. Run DeepSeek Janus-Pro 7B on Mac: Step-by-Step Guide
  3. Run DeepSeek Janus-Pro 7B on Windows: A Complete Installation Guide
  4. Install Qwen2.5-Omni 3B on macOS
  5. Install Qwen2.5-Omni 3B on Windows