Install Qwen2.5-Omni 3B on macOS

Qwen2.5-Omni 3B is a cutting-edge multimodal AI model developed to handle text, image, audio, and video processing tasks. While macOS lacks the CUDA GPU acceleration available on Linux and Windows, Apple's Metal (MPS) backend and some careful configuration make it possible to run Qwen2.5-Omni 3B locally.
This guide walks you through the complete installation process on macOS, with additional tips to improve performance on Apple Silicon and CPU-based systems.
System Requirements
To ensure smooth operation, check the following prerequisites:
- macOS: Monterey (12.x) or newer
- RAM: Minimum 16GB (32GB recommended)
- Storage: At least 10GB of free disk space
- Recommended Hardware:
  - Apple Silicon (M1/M2/M3) for ARM optimizations
  - Optional eGPU (24GB+ VRAM) for acceleration on Intel Macs (Apple Silicon does not support eGPUs)
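To check your machine against these requirements, here is a quick Python snippet (relying on the stock macOS sysctl tool) that reports the CPU architecture and installed RAM:

import platform, subprocess
# "arm64" indicates Apple Silicon; "x86_64" indicates an Intel Mac
print("Architecture:", platform.machine())
# Total physical memory, read from the macOS sysctl database
mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]))
print(f"Physical RAM: {mem_bytes / 1e9:.0f} GB")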
Step 1: Install Prerequisites
Homebrew
Install Homebrew by running:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Add Homebrew to your shell environment (this path applies to Apple Silicon; on Intel Macs, Homebrew installs to /usr/local, which is already on PATH):
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc
Python and Tools
Install Python 3.10 and the core build dependencies:
brew install [email protected] cmake ffmpeg
Step 2: Configure Python Environment
Create and activate a virtual environment, then install PyTorch inside it so the packages land in the project environment rather than in Homebrew's global Python:
python3.10 -m venv qwen-env
source qwen-env/bin/activate
pip install --upgrade pip
pip install torch torchvision torchaudio
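To confirm that PyTorch installed correctly and can see Apple's Metal backend (MPS) on Apple Silicon, run a quick check:

import torch
print("PyTorch version:", torch.__version__)
# True on Apple Silicon builds of PyTorch; False means CPU-only execution
print("MPS available:", torch.backends.mps.is_available())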
Step 3: Install Custom Transformers for Qwen
Uninstall any existing transformers library and install the preview branch that adds Qwen2.5-Omni support:
pip uninstall transformers -y
pip install git+https://github.com/huggingface/transformers@v4.51.3-Qwen2.5-Omni-preview
pip install accelerate sentencepiece soundfile einops
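A quick way to verify that the preview branch is active is to import one of the Qwen2.5-Omni classes; the import fails if a stock transformers release without Omni support is still installed:

import transformers
from transformers import Qwen2_5OmniProcessor  # raises ImportError without Omni support
print("transformers version:", transformers.__version__)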
Step 4: Install Qwen-Omni Utilities
Install the toolkit with video decoding support:
pip install "qwen-omni-utils[decord]"
(The quotes keep zsh from expanding the square brackets.)
Note: If the decord installation fails on macOS, use:
pip install qwen-omni-utils
This fallback may be slower for video processing.
Step 5: Download Qwen2.5-Omni 3B Model
Use the huggingface_hub API to download the model from Python:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="Qwen/Qwen2.5-Omni-3B", local_dir="qwen-3b")
Alternatively, download it manually from the Hugging Face Hub.
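Once the download finishes, you can sanity-check the snapshot from Python; the 3B checkpoint occupies several gigabytes:

from pathlib import Path
# Sum the sizes of all files under the local model directory
total = sum(f.stat().st_size for f in Path("qwen-3b").rglob("*") if f.is_file())
print(f"Downloaded {total / 1e9:.1f} GB to qwen-3b/")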
Step 6: Run Inference with Qwen2.5-Omni
Create an inference.py file with the following code:
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info
# Load model and processor
model = Qwen2_5OmniForConditionalGeneration.from_pretrained("qwen-3b", device_map="auto", torch_dtype=torch.float16)
processor = Qwen2_5OmniProcessor.from_pretrained("qwen-3b")
# Prepare a chat-style input (replace [img_path] with the path to an image)
conversation = [{"role": "user", "content": [
    {"type": "image", "image": "[img_path]"},
    {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos, return_tensors="pt", padding=True)
inputs = inputs.to(model.device).to(model.dtype)
# Generate a text-only response and print it
text_ids = model.generate(**inputs, return_audio=False, max_new_tokens=128)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
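Run the script from the activated virtual environment with python inference.py. The first run has to load several gigabytes of weights, so expect a noticeable startup delay before output appears.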
Performance Optimization Tips
Memory Management
- Apple Silicon Optimization: Use torch_dtype=torch.bfloat16 when supported.
- Device Offloading: Use device_map="auto" to split workloads across available devices.
- Quantization for Low RAM:
model = Qwen2_5OmniForConditionalGeneration.from_pretrained("qwen-3b", load_in_4bit=True)
Note that load_in_4bit depends on bitsandbytes, which targets CUDA GPUs and generally does not work on macOS; an MPS-based alternative is sketched below.
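As that MPS-based alternative, here is a minimal sketch that loads the model in half precision on Apple's Metal backend when it is available:

import torch
from transformers import Qwen2_5OmniForConditionalGeneration
# Prefer the Metal (MPS) backend on Apple Silicon; fall back to CPU otherwise
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "qwen-3b", torch_dtype=torch.float16
).to(device)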
Video Processing
- Short Clips: Limit video input to under 15 seconds for stability.
- Torchvision Backend: Force the torchvision video reader:
export FORCE_QWENVL_VIDEO_READER=torchvision
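The same selection can be made from Python, assuming the variable is set before qwen_omni_utils processes any video:

import os
# Must be set before qwen_omni_utils picks a video reader
os.environ["FORCE_QWENVL_VIDEO_READER"] = "torchvision"
from qwen_omni_utils import process_mm_info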
Troubleshooting Common Errors
- KeyError: 'qwen2_5_omni' → Reinstall the correct transformers branch.
- Video Load Failures → Update torchvision to at least version 0.19.0.
- Memory Overflow → Reduce input size or set a max_length value in generate().
Advanced Deployment Options
Ollama (Optional)
Use Ollama for a managed local LLM runtime:
brew install --cask ollama
ollama pull qwen2.5-omni-3b
⚠️ Qwen2.5-Omni may not be available in the official Ollama library, and you may need to configure custom templates for compatibility.
vLLM Server (Experimental)
Clone and run a custom vLLM fork:
git clone -b qwen2_omni_public https://github.com/fyabc/vllm.git
cd vllm && pip install -e .
python -m vllm.entrypoints.api_server --model Qwen/Qwen2.5-Omni-3B
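Assuming the fork keeps vLLM's simple /generate endpoint on the default port 8000, you can smoke-test the server from Python:

import requests
# Minimal request against vLLM's legacy /generate API
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "Hello, Qwen!", "max_tokens": 32},
)
print(resp.json())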
Practical Use Cases
1. Voice-Based Chatbot
This reuses the model and processor from inference.py; "Ethan" is one of the model's built-in voices, and note that official examples also prepend Qwen's default system prompt when generating speech:
import soundfile as sf
conversation = [{"role": "user", "content": [{"type": "text", "text": "Speak a welcome message."}]}]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
inputs = processor(text=text, return_tensors="pt", padding=True).to(model.device)
text_ids, audio = model.generate(**inputs, speaker="Ethan")
sf.write("output.wav", audio.reshape(-1).detach().cpu().numpy(), samplerate=24000)
2. Video Summarization
Build a conversation that includes a video entry, then run it through the same process_mm_info and generate flow as in inference.py:
conversation = [{"role": "user", "content": [{"type": "video", "video": "[video_url]"}, {"type": "text", "text": "Summarize this video."}]}]
Limitations to Consider
- Slow Inference on CPU: Expect under 1 token/sec without GPU acceleration.
- License Restrictions: The 3B model is released under the Qwen Research License, which restricts commercial use; check the model card before deploying.
- Missing Dependencies: Audio tasks require the soundfile package.
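To put a number on throughput for your own machine, you can time a short generation using the model and inputs from the inference example:

import time
start = time.time()
out = model.generate(**inputs, return_audio=False, max_new_tokens=32)
# Count only the newly generated tokens, not the prompt
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / (time.time() - start):.2f} tokens/sec")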
Conclusion
Installing and running Qwen2.5-Omni 3B on macOS is entirely feasible with the right configuration, even without powerful GPUs.
By following the steps outlined above, from setting up the Python environment to installing the custom libraries and tuning performance through quantization and precision settings, you can leverage this powerful multimodal AI model for local experimentation and prototyping.