How to Set Up the Qwen2.5-1M Model Locally on Your Mac
Artificial intelligence (AI) models have revolutionized technology in recent years, enabling applications that were once thought to be science fiction. Among these, the Qwen2.5-1M models stand out for supporting context windows of up to one million tokens in natural language processing (NLP) tasks. If you're keen on leveraging the power of this model locally on your Mac, this guide will walk you through every step of the setup process.
By following these instructions, you'll be able to set up the model and use it effectively for various AI-driven applications.
Prerequisites
Before diving into the installation process, make sure your Mac meets the following requirements to ensure a smooth setup.
System Requirements
- Operating System: macOS (latest version recommended)
- Python Version: 3.9 to 3.12
- CUDA Version: 12.1 or 12.3 (only needed for the optional vLLM path on NVIDIA GPUs; macOS itself does not support CUDA)
VRAM Requirements
- Qwen2.5-7B-Instruct-1M: At least 120GB VRAM across GPUs
- Qwen2.5-14B-Instruct-1M: At least 320GB VRAM across GPUs
Note: If your hardware doesn't meet the VRAM specifications, you can still use the models on shorter contexts, but you won't be able to take advantage of the full 1M-token window.
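If you're unsure how much unified memory your Mac has, you can check it from Python using only the standard library (a minimal sketch; hw.memsize is macOS's standard sysctl key for total physical memory):

import subprocess

# Query total physical memory (in bytes) via macOS's sysctl interface.
mem_bytes = int(subprocess.run(
    ['sysctl', '-n', 'hw.memsize'],
    capture_output=True, text=True, check=True,
).stdout.strip())

print(f"Total unified memory: {mem_bytes / 1024**3:.1f} GB")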
Step-by-Step Installation Guide
Now, let's walk through the process of setting up the Qwen2.5-1M model on your Mac.
Step 1: Install Homebrew
Homebrew is a powerful package manager that simplifies software installation on macOS. If you haven't installed it yet, open the terminal and run the following command:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Follow the on-screen instructions to complete the installation.
Step 2: Install Ollama
Ollama is an essential tool that allows you to run AI models locally. To install Ollama using Homebrew, execute the following command:
brew install --cask ollama
This will install Ollama on your system.
Step 3: Clone the vLLM Repository
Next, clone the Qwen team's fork of the vLLM repository; its dev/dual-chunk-attn branch adds the long-context attention support these models rely on. Run these commands:
git clone -b dev/dual-chunk-attn git@github.com:QwenLM/vllm.git
cd vllm
pip install -e . -v
This will download the repository and install its dependencies.
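To confirm the editable install succeeded, you can import the package from Python (a quick sanity check; it assumes the pip install above completed without errors):

import vllm

# If the editable install worked, this prints the fork's version string.
print(vllm.__version__)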
Step 4: Start the Ollama Service
To interact with the Qwen model, you’ll need to start the Ollama service. Keep the terminal window open while you work with the model:
ollama serve
This command initializes the service and prepares it for incoming requests.
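To confirm the service is actually listening before you go further, you can query Ollama's local REST API for the list of installed models (a minimal sketch using only the standard library; 11434 is Ollama's default port):

import json
import urllib.request

# GET /api/tags lists the models available to the local Ollama server.
with urllib.request.urlopen('http://localhost:11434/api/tags') as resp:
    data = json.load(resp)

for model in data.get('models', []):
    print(model['name'])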
Step 5: Download and Run the Model
With everything set up, you can now download and run the Qwen2.5 model. For example, to run the 7B model, use the following command:
ollama run qwen2.5:7b
For larger models like Qwen2.5-14B, simply replace 7b with 14b in the command.
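If you'd rather script a single prompt than use the interactive session, Ollama also exposes a generate endpoint (a minimal sketch with the standard library; setting stream to false returns one complete JSON response):

import json
import urllib.request

# POST /api/generate runs one prompt against a local model.
payload = json.dumps({
    'model': 'qwen2.5:7b',
    'prompt': 'Summarize what a package manager does in one sentence.',
    'stream': False,  # return a single JSON object instead of a stream
}).encode('utf-8')

req = urllib.request.Request(
    'http://localhost:11434/api/generate',
    data=payload,
    headers={'Content-Type': 'application/json'},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)['response'])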
Step 6: Accessing the Model via API
Once your model is running, you can interact with it programmatically using Python. First, ensure that you have the OpenAI library installed:
pip install openai
Then, use this Python code to send a request to your running model:
from openai import OpenAI

# Point the OpenAI client at the local Ollama server.
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required by the client, but ignored by Ollama
)

response = client.chat.completions.create(
    messages=[
        {'role': 'user', 'content': 'Say this is a test'},
    ],
    model='qwen2.5:7b',
)

# The client returns an object, not a dict, so use attribute access.
print(response.choices[0].message.content)
This code sends a message to the model and prints the response.
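For longer generations you may prefer streaming, which prints tokens as they arrive instead of waiting for the full reply (a sketch reusing the client created above; the delta content can be empty on some chunks, hence the None check):

# Stream the reply token by token instead of waiting for the whole message.
stream = client.chat.completions.create(
    messages=[{'role': 'user', 'content': 'Write a haiku about local AI.'}],
    model='qwen2.5:7b',
    stream=True,
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end='', flush=True)
print()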
Additional Tips for a Smooth Experience
To ensure a smooth experience while using the Qwen2.5 model, here are a few helpful tips:
1. Monitor Resource Usage
Running large models can be resource-intensive, so keep an eye on your Mac's CPU and memory usage. If you experience performance issues, consider closing other applications, shortening the context, or switching to a smaller model; the sketch below shows a simple way to watch usage.
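As a lightweight monitor, you can poll CPU and memory from Python (a sketch that assumes the third-party psutil package, installable with pip install psutil):

import psutil  # third-party: pip install psutil

# Print CPU and memory utilization once per second; stop with Ctrl+C.
while True:
    cpu = psutil.cpu_percent(interval=1)  # sample over one second
    mem = psutil.virtual_memory().percent
    print(f"CPU: {cpu:5.1f}%  Memory: {mem:5.1f}%")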
2. Experiment with Different Models
Depending on your system's capabilities, experiment with different Qwen models (like Qwen2.5-14B) to find the one that best suits your needs; the sketch below shows one way to compare them side by side.
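One simple comparison is to send the same prompt to each tag you have pulled and time the responses (a sketch reusing the OpenAI-compatible client from Step 6; it assumes both tags were downloaded with ollama run or ollama pull):

import time

from openai import OpenAI

client = OpenAI(base_url='http://localhost:11434/v1/', api_key='ollama')
prompt = 'Explain recursion in two sentences.'

# Send the same prompt to each model and report the wall-clock time.
for model in ['qwen2.5:7b', 'qwen2.5:14b']:
    start = time.time()
    response = client.chat.completions.create(
        messages=[{'role': 'user', 'content': prompt}],
        model=model,
    )
    print(f"--- {model} ({time.time() - start:.1f}s) ---")
    print(response.choices[0].message.content)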
3. Stay Updated
AI is a rapidly evolving field, and both Ollama and QwenLM frequently release updates. Make sure to stay up-to-date to take advantage of new features or improvements.
Conclusion
Setting up the Qwen2.5-1M model on your Mac unlocks a powerful tool for natural language processing tasks. By following this guide, you can harness the full potential of AI without relying on cloud services.
Whether you're developing AI applications, conducting research, or exploring NLP tasks, this model will significantly enhance your projects.
Feel free to share this guide with others who might find it helpful. Happy coding!