Run and Install DeepSeek-R1-0528 Locally on Your Computer

DeepSeek-R1-0528 is a cutting-edge open-source large language model (LLM) designed for developers, researchers, and AI enthusiasts. With state-of-the-art benchmark performance, advanced reasoning capabilities and support for JSON output and function calling, this model stands out for both experimentation and production use.
In this guide, you'll learn how to run and install DeepSeek-R1-0528 on your local machine using Ollama, vLLM, and Hugging Face Transformers.
Overview of DeepSeek-R1-0528
DeepSeek-R1-0528 is the latest entry in the DeepSeek-R1 series and offers:
- Enhanced reasoning and factual accuracy
- Reduced hallucinations for better reliability
- JSON output support and function calling
- Commercial-use friendly MIT License
The model is freely available with open-source weights on Hugging Face.
Hardware Requirements
Before setting up DeepSeek-R1-0528, ensure your hardware meets the minimum system requirements based on the model size.
Component | Minimum (1.5B) | Recommended (7B–8B) | Large Models (14B–32B) | Enterprise (671B) |
---|---|---|---|---|
CPU | Intel i7 / AMD Ryzen 7 (8 cores) | 3.5GHz+ latest-gen | Server-grade, multi-socket | Multi-socket, high core count |
RAM | 16GB | 32–64GB | 64–128GB | 256GB+ |
GPU | NVIDIA RTX 3060 (12GB VRAM) | A100, H100 (16–24GB VRAM) | 24–48GB VRAM | 80GB+ VRAM |
Storage | 512GB NVMe SSD | 1–2TB NVMe SSD (PCIe Gen 4/5) | Multiple NVMe SSDs (RAID) | Enterprise SSD arrays |
Tip: Use quantized versions for better performance on consumer-grade GPUs.
Installation Methods
You can install and run DeepSeek-R1-0528 locally using three main methods:
- Ollama – Fastest and easiest setup
- vLLM – High performance for production environments
- Transformers – Full flexibility for development pipelines
Method 1: Installing DeepSeek-R1-0528 with Ollama
Ollama simplifies LLM deployment and is ideal for getting started quickly.
Step-by-Step Setup
1. Check GPU Compatibility
nvidia-smi
2. Update System and Install Dependencies
sudo apt-get update
sudo apt-get install pciutils -y
3. Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
4. Start the Ollama Server
ollama serve
5. Verify Installation
ollama
6. Install the DeepSeek-R1-0528 Model
Currently available model (quantized):
ollama run hf.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF:Q4_K_XL
7. Run Inference
Use the terminal interface to send prompts and receive real-time responses.
Method 2: Installing DeepSeek-R1-0528 with vLLM
vLLM is ideal for high-throughput inference in scalable environments.
Requirements
- Python 3.8+
- CUDA-enabled GPU
- pip
Installation Steps
1. Install vLLM
pip install vllm
2. Download Model Weights
From Hugging Face:
- Visit:
https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
3. Launch the API Server
python -m vllm.entrypoints.openai.api_server \
--model deepseek-ai/DeepSeek-R1-0528 \
--tokenizer deepseek-ai/DeepSeek-R1-0528
4. Query the Model
Use the OpenAI-compatible API to send and receive prompts.
Method 3: Installing DeepSeek-R1-0528 with Transformers
Use this method for programmatic integration and full control over inference.
Requirements
- Python 3.8+
- PyTorch
- Transformers
Installation Steps
1. Install Required Packages
pip install torch transformers
2. Load the Model in Python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-0528")
prompt = "Explain the difference between monorepos and turborepos."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Prompt Engineering and Usage
DeepSeek-R1-0528 supports structured prompts, file handling, and web integration.
System Prompt Example
The assistant is DeepSeek-R1, created by DeepSeek。
Today is Monday, May 28, 2025.
Temperature Setting
- Default:
0.6
File Upload Template
file_template = """[file name]: {file_name}
[file content begin]
{file_content}
[file content end]
{question}"""
Web Search Template (English)
search_answer_en_template = '''
# The following contents are the search results related to the user's message:
{search_results}
...
# The user's message is:
{question}'''
Model Features and Improvements
- Superior Benchmarks – Outperforms previous releases on multiple tasks
- Reduced Hallucinations – More reliable for critical use cases
- Function Calling – Easy tool and API integration
- MIT License – Open for commercial use and distillation
Troubleshooting and Optimization
Common Issues
- Out of Memory – Use quantized models or reduce batch size
- Slow Inference – Optimize SSD and GPU usage
- Dependency Errors – Ensure Python, CUDA, and driver compatibility
Optimization Tips
- Use Ollama for quick experimentation
- Deploy vLLM for scalable APIs
- Choose quantized models for consumer-grade GPUs
Security and Best Practices
- Virtual Environments: Use
venv
orconda
for dependency isolation - Data Handling: Ensure compliance with data privacy laws
- Model Maintenance: Regularly update libraries and model versions
License and Commercial Use
DeepSeek-R1-0528 is licensed under the MIT License, allowing unrestricted commercial use, modification, and redistribution.
Conclusion
DeepSeek-R1-0528 empowers you to run a powerful, open-source LLM locally with a method that fits your workflow—Ollama for simplicity, vLLM for scalability, or Transformers for flexibility.