Run DeepSeek-VL2 on Windows: Installation Guide
DeepSeek AI has rapidly gained prominence as a Chinese AI lab whose models rival OpenAI's ChatGPT. Its open-source models, including DeepSeek R1 and DeepSeek-VL2, are released under the permissive MIT license, making them accessible for both personal and professional use.
Why DeepSeek-VL2 Matters
As the first open-source MoE (Mixture of Experts) vision-language model with MIT licensing, DeepSeek-VL2 offers:
- Multimodal Understanding: Processes images (JPG/PNG) and text simultaneously
- Commercial Flexibility: MIT license enables enterprise deployment
- Efficiency: 3.37B to 27.5B parameter variants balance performance/resource use
- Local Operation: Run offline after initial setup
Minimum System Requirements:
- Operating System: Windows 10 or later
- CPU: Multi-core processor (Quad-core or higher recommended)
- GPU: High-performance GPU (NVIDIA with CUDA support is typically required for AI tasks)
- RAM: Minimum 8GB (16GB or more recommended for AI-related tasks)
- Storage: SSD with at least 50GB free space (more space required for handling large datasets)
- Software Dependencies:
- Python (for model training and scripting)
- CUDA Toolkit (for GPU acceleration)
- Required deep learning libraries (e.g., TensorFlow, PyTorch)
- Critical Notes:
- NVIDIA GPUs require CUDA Toolkit 11.8+
- Enable WSL2 for Linux-containerized AI workloads
- 64-bit Windows mandatory for Ollama compatibility
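Before installing anything, you can sanity-check several of these requirements with a short Python script. This is a minimal sketch: it probes the C: drive for free space and only runs the CUDA check if PyTorch is already installed.
# Quick environment check for the requirements above (minimal sketch)
import platform, shutil, sys

print("OS:", platform.system(), platform.release())
print("Python:", platform.python_version(), "(64-bit)" if sys.maxsize > 2**32 else "(32-bit)")

# Free disk space on the system drive
total, used, free = shutil.disk_usage("C:\\")
print(f"Free disk space: {free / 1e9:.1f} GB")

# GPU/CUDA check only works once PyTorch is installed
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))
except ImportError:
    print("PyTorch not installed yet; skip the CUDA check for now.")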
Installation
Using Ollama
Ollama simplifies installation and lets you run models locally, with no cloud subscription required.
- Download Ollama: Visit the official Ollama website and download the Windows installer.
- Install Ollama: Double-click the installer and follow the on-screen prompts. Ensure there is at least 4GB of free storage space before proceeding.
- Open PowerShell: Once installed, open PowerShell (the commands below use PowerShell syntax).
Start Ollama: Enter the following commands to start the application with debug logging enabled:
$env:OLLAMA_DEBUG="1"
& "ollama app.exe"
Directory Information:
- Data and Logs:
%LOCALAPPDATA%\Ollama
- Program Files:
%LOCALAPPDATA%\Programs\Ollama
- Models and Settings:
%HOMEPATH%\.ollama
Accessing DeepSeek on the Web
If you prefer not to download the software, you can access DeepSeek on the web:
- Visit the Website: Go to DeepSeek Chat.
- Registration: Register using your email address or Google account. Note that new registrations may be temporarily limited during periods of heavy load or security incidents.
- Interact with the Chatbot: Once registered, you can start interacting with the chatbot.
Step-by-Step Guide for Inference of DeepSeek Models
This guide explains how to deploy the DeepSeek model using the vLLM framework.
Prerequisites
- Python Environment: Ensure you have Python installed (preferably Python 3.8 or later).
Install Required Packages: Install the required libraries using pip:
pip install vllm==0.6.6.post1
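With the package installed, a minimal offline-inference script with vLLM looks like the sketch below. The model ID, prompt, and sampling settings are illustrative assumptions; confirm that your installed vLLM version supports the DeepSeek checkpoint you plan to run.
# Minimal vLLM inference sketch (model ID and prompt are illustrative)
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/deepseek-vl2-tiny", trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

# Generate a completion for a single text prompt
outputs = llm.generate(["Explain what a mixture-of-experts model is."], sampling_params)
print(outputs[0].outputs[0].text)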
Installation Methods Compared
Method 1: Ollama (Beginner-Friendly)
- Download Windows installer from Ollama Official Site
- Allocate VRAM (run in an Administrator Command Prompt):
setx OLLAMA_GPUS "1"
- Verify installation:
ollama list
- Launch with debugging (PowerShell):
$env:OLLAMA_DEBUG="1"; ollama run deepseek-vl2-tiny
Pros: One-click setup, automatic updates
Cons: Limited model customization
Method 2: Manual Setup (Advanced Users)
# Create isolated environment
python -m venv deepseek_env
.\deepseek_env\Scripts\activate
# Install core dependencies
pip install torch==2.3.0+cu121 -f https://download.pytorch.org/whl/torch_stable.html
pip install vllm==0.6.6.post1
# DeepSeek-VL2 is installed from source rather than from PyPI
git clone https://github.com/deepseek-ai/DeepSeek-VL2.git
cd DeepSeek-VL2
pip install -e .[gradio]
Method 3: Docker Containers (Production)
FROM nvidia/cuda:12.1.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y python3.11 python3-pip
COPY requirements.txt .
RUN pip3 install -r requirements.txt
DeepSeek-VL2 Code Implementation
To implement DeepSeek-VL2, follow these steps:
Installation
From the root of the cloned DeepSeek-VL2 repository, install the package and its Gradio extra using pip:
pip install -e .[gradio]
Python Code
Example usage of DeepSeek-VL2 in Python:
import torch
from transformers import AutoModelForCausalLM
from deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl2.utils.io import load_pil_images
# specify the path to the model
model_path = "deepseek-ai/deepseek-vl2-tiny"
vl_chat_processor = DeepseekVLV2Processor.from_pretrained(model_path)
tokenizer = vl_chat_processor.tokenizer
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
Gradio Demo
To run the Gradio demo, use the following command (bash-style syntax; on Windows, run it from WSL2 or set the CUDA_VISIBLE_DEVICES environment variable separately in PowerShell):
CUDA_VISIBLE_DEVICES=2 python web_demo.py \
--model_name "deepseek-ai/deepseek-vl2-tiny" \
--port 37914
Vision-Language Workflow: Step-by-Step
1. Image Preprocessing
from deepseek_vl2.utils.io import load_pil_images
conversation = [
    {
        "role": "<|User|>",
        "content": "<image>\nAnalyze this medical scan",
        "images": ["./patient_scan.png"]
    },
    {"role": "<|Assistant|>", "content": ""}
]
# Load the images referenced in the conversation as PIL objects
pil_images = load_pil_images(conversation)
2. Multimodal Encoding
processor = DeepseekVLV2Processor.from_pretrained("deepseek-ai/deepseek-vl2-small")
inputs = processor(
    conversations=conversation,
    images=pil_images,
    force_batchify=True
).to("cuda")
3. Expert Model Inference
model = DeepseekVLV2ForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-vl2-small",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
# Fuse the image and text inputs into a single embedding sequence
inputs_embeds = model.prepare_inputs_embeds(**inputs)
outputs = model.language.generate(
    inputs_embeds=inputs_embeds,
    attention_mask=inputs.attention_mask,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8
)
# Decode the generated tokens into text
answer = processor.tokenizer.decode(outputs[0].cpu().tolist(), skip_special_tokens=True)
print(answer)
Performance Optimization Tips
- VRAM Management (see the VRAM check sketch after this list):
- 7B Model: Requires 20GB+ VRAM
- 20B Model: Needs 40GB+ VRAM
- Use --chunk_size 512 for memory-constrained systems
- Batch Processing (vLLM):
from vllm import LLM
llm = LLM(model="deepseek-ai/deepseek-vl2", max_num_seqs=8, gpu_memory_utilization=0.85)
- Quantization (FP16 → INT8, applied via bitsandbytes at load time):
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_threshold=6.0
    )
)
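To decide which model variant your GPU can hold, the sketch below reports total and free VRAM; it assumes PyTorch with CUDA support is already installed.
# Report total and free VRAM on the first GPU (minimal sketch)
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    free_bytes, total_bytes = torch.cuda.mem_get_info(0)
    print(f"GPU: {props.name}")
    print(f"Total VRAM: {total_bytes / 1e9:.1f} GB, free: {free_bytes / 1e9:.1f} GB")
else:
    print("No CUDA-capable GPU detected.")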
Enterprise Deployment Checklist
- Security:
- TLS: ollama serve exposes a plain-HTTP API (default 127.0.0.1:11434); to enable TLS, place it behind a TLS-terminating reverse proxy such as Nginx or Caddy.
- Set API rate limits at the proxy: 10 requests/second is a reasonable default
CI/CD Pipeline:
# GitHub Actions Example (step within a job)
- name: DeepSeek Model Test
  run: |
    ollama pull deepseek-vl2-tiny
    pytest vision_tests/
  env:
    OLLAMA_HOST: 127.0.0.1
    CUDA_VISIBLE_DEVICES: 0
Monitoring:
Check server.log under %LOCALAPPDATA%\Ollama for request-level timing, and run nvidia-smi to monitor GPU utilization.
Troubleshooting Common Issues
Problem: CUDA Out-of-Memory Error
Solution:
# Reduce image resolution
processor.image_size = (512,512)
# Enable gradient checkpointing (reduces memory when fine-tuning)
model.gradient_checkpointing_enable()
Problem: Ollama Connection Refused
Fix:
netsh advfirewall firewall add rule name="Ollama Port" dir=in action=allow protocol=TCP localport=11434
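After adding the firewall rule, you can confirm that the local Ollama server answers on its default port (11434) with a short Python check; this sketch only verifies that the HTTP endpoint responds.
# Verify the local Ollama server responds on its default port (minimal sketch)
import urllib.request

try:
    with urllib.request.urlopen("http://127.0.0.1:11434", timeout=5) as resp:
        print("Ollama reachable, HTTP status:", resp.status)
except OSError as exc:
    print("Ollama not reachable:", exc)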
Problem: Slow Inference Speed
Optimizations:
- Update NVIDIA drivers to 550+
- Set power management:
nvidia-smi -pm 1
- Enable FP8 precision (if supported by your build):
setx OLLAMA_FP8_MATH "1"
Future-Proofing Your Setup
- Hardware Upgrades:
- NVIDIA RTX 5090 (released early 2025) for improved FP8 performance
- PCIe 5.0 SSDs for faster model loading
- Edge Deployment: ONNX Runtime conversion
torch.onnx.export(model, inputs, "deepseek-vl2.onnx", opset_version=18)
Model Updates:
ollama pull deepseek-vl2-2024Q3
How to Use DeepSeek
To start using DeepSeek:
- Open Terminal: Open the Terminal app.
- Run Command: Type the following command:
ollama run deepseek-r1:8b
- Interact with the Model: Press Enter to run the command, and DeepSeek R1 will start, allowing you to interact with the model through the terminal interface.
Important Considerations
- The provided demo implementation is basic and may result in slower performance.
- For production environments, consider optimized deployment solutions such as vLLM, SGLang, or LMDeploy for faster response times and better cost efficiency.
By following these instructions and utilizing the code examples, you can effectively run and implement DeepSeek-VL2 on a Windows environment.