Run DeepScaleR 1.5B on Windows: Step-by-Step Installation Guide

DeepScaleR, a fine-tuned iteration of DeepSeek-R1-Distill-Qwen-1.5B, represents a substantial advancement in compact language models. With 1.5 billion parameters, it demonstrates exceptional computational efficiency, surpassing OpenAI's o1-preview on mathematical benchmarks.
This guide provides a rigorous, stepwise approach to configuring and deploying DeepScaleR 1.5B on a Windows-based system.
System Prerequisites
Prior to installation, ensure that your system meets the following specifications:
Operating System: Windows 10 or later.
Hardware Specifications:
- Processor: A modern CPU with AVX2 instruction support for optimal computational throughput (a quick verification sketch follows the prerequisites).
- Memory: A minimum of 8 GB of RAM; additional RAM will improve model performance.
- Storage: At least 10 GB of available disk space for the model and its dependencies.
Software Requirements:
- Ollama: The primary runtime environment for executing DeepScaleR.
- Git: Necessary for acquiring the model repository and related assets.
- Python: Required for supplementary scripting utilities.
- CUDA Toolkit (Optional): Recommended for GPU-based acceleration on NVIDIA hardware.
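Before installing anything, you can confirm the CPU and memory prerequisites programmatically. The following is a minimal sketch, assuming the third-party py-cpuinfo and psutil packages are installed (pip install py-cpuinfo psutil); neither is required by DeepScaleR itself:
from cpuinfo import get_cpu_info  # third-party: py-cpuinfo
import psutil  # third-party: psutil

# AVX2 appears in the CPU flag list on supported processors.
print("AVX2 supported:", "avx2" in get_cpu_info().get("flags", []))

# Total physical RAM in GiB; 8 GB is the minimum recommended above.
print(f"RAM: {psutil.virtual_memory().total / 2**30:.1f} GiB")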
Step 1: Installing Ollama
Ollama streamlines the execution of large language models on local systems. Follow these installation steps:
- Obtain Ollama:
- Download the appropriate Windows-compatible package from the official Ollama website or its GitHub repository.
- Install Ollama:
- Execute the installer and adhere to the on-screen instructions.
- Append Ollama’s installation directory to the system’s PATH variable.
- Confirm Installation:
ollama --version
- A successful installation will return the version number.
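In addition to the version check, you can confirm that the Ollama background service is reachable. The sketch below assumes Ollama is serving its local HTTP API on the default port 11434 and queries the /api/tags endpoint, which lists installed models:
import json
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
        models = json.load(resp).get("models", [])
        print("Ollama is running; installed models:", [m["name"] for m in models])
except OSError as exc:
    print("Ollama server not reachable:", exc)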
Step 2: Installing Git
Git facilitates the retrieval of DeepScaleR from repositories such as Hugging Face.
- Download Git:
- Acquire the latest Windows-compatible version from the official Git website.
- Execute the Installer:
- Follow the installation prompts, accepting default settings where applicable.
- Validate Installation:
git --version
- If successful, the version number will be displayed.
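Because Step 3 depends on Git LFS, it is worth confirming now that the LFS extension was included with your Git installation (recent Git for Windows installers bundle it by default):
git lfs version
If the command is not found, install Git LFS separately from its official site.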
Step 3: Downloading the DeepScaleR 1.5B Model
The DeepScaleR 1.5B model is hosted on Hugging Face and can be retrieved via Git Large File Storage (LFS).
- Initialize Git LFS:
git lfs install
- Clone the Repository:
git clone https://huggingface.co/agentica-project/deepscaler
- Obtain Model Files:
- If Git LFS is configured, the model files will download automatically during the clone.
- Otherwise, acquire them manually and place them in the cloned directory.
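If Git LFS proves troublesome on Windows, the Hugging Face Hub client offers an alternative download path. A minimal sketch, assuming the huggingface_hub package is installed (pip install huggingface_hub) and that the repository ID matches the clone URL above:
from huggingface_hub import snapshot_download

# Downloads all repository files, including LFS-tracked weights, into local_dir.
path = snapshot_download(
    repo_id="agentica-project/deepscaler",
    local_dir="deepscaler",
)
print("Model files downloaded to:", path)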
Step 4: Configuring CUDA (Optional)
For those utilizing NVIDIA GPUs, CUDA can substantially enhance computational efficiency.
- Verify GPU Compatibility: Consult the NVIDIA CUDA compatibility list.
- Acquire the CUDA Toolkit: Download the appropriate package from NVIDIA’s official site.
- Install CUDA Toolkit: Follow installation prompts, accepting default configurations.
- Configure Environment Variables:
- Append the CUDA installation path to the system’s PATH variable.
- Define CUDA_HOME as the root directory of the CUDA installation.
- Confirm Installation:
nvcc --version
- A successful installation will return the version number.
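To confirm that both the GPU driver and the toolkit are usable from scripts, the sketch below shells out to nvidia-smi and nvcc; it assumes both executables are on PATH after the steps above:
import subprocess

# nvidia-smi reports the driver and attached GPUs; nvcc reports the toolkit version.
for cmd in (["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            ["nvcc", "--version"]):
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(" ".join(cmd), "->", out.stdout.strip().splitlines()[0])
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(" ".join(cmd), "failed:", exc)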
Step 5: Executing DeepScaleR 1.5B via Ollama
- Define a Modelfile:
- Create a file named Modelfile in the model directory with the following content:
FROM ./
- Build the Model:
ollama create deepscaler -f Modelfile
- Launch the Model:
ollama run deepscaler
- Interact with the Model via CLI:
ollama run deepscaler "What is the square root of 49?"
- Executing DeepScaleR in a Python Environment:
import subprocess

def query_model(prompt):
    # "ollama run <model> <prompt>" performs a one-shot generation and exits.
    result = subprocess.run(
        ["ollama", "run", "deepscaler", prompt],
        capture_output=True, text=True,
    )
    return result.stdout

response = query_model("Summarize the theory of relativity")
print(response)
- Deploying DeepScaleR as a Web API:
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/ask', methods=['POST'])
def ask_model():
    prompt = request.json.get("prompt", "")
    # Forward the prompt to the local model through the Ollama CLI.
    result = subprocess.run(
        ["ollama", "run", "deepscaler", prompt],
        capture_output=True, text=True,
    )
    return jsonify({"response": result.stdout})

if __name__ == '__main__':
    app.run(debug=True)
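Shelling out to the CLI works for quick tests, but Ollama also exposes a local HTTP generation endpoint that avoids per-request process startup. Below is a minimal sketch using only the standard library, assuming the default port 11434 and the deepscaler model created above:
import json
import urllib.request

def query_model(prompt: str) -> str:
    # POST /api/generate; stream=False returns one JSON body with the full answer.
    payload = json.dumps({
        "model": "deepscaler",
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(query_model("What is the square root of 49?"))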
Step 6: Performance Optimization Strategies
To optimize DeepScaleR’s execution on Windows, consider the following (a sample tuning Modelfile follows this list):
- GPU Utilization: Ensure proper CUDA configuration for acceleration.
- Batch Size Adjustment: Experiment with varying batch sizes to balance speed and latency.
- Quantization: Employ quantized model versions to reduce memory overhead.
- System Resource Optimization: Close extraneous applications and configure virtual memory appropriately.
- AVX2 Instruction Set Utilization: Verify that AVX2-enhanced computations are engaged for superior CPU performance.
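As a concrete starting point for these optimizations, several knobs can be set declaratively in the Modelfile and rebuilt with ollama create. The PARAMETER names below are standard Ollama options; the values are illustrative assumptions to adjust for your hardware, not tuned recommendations:
FROM ./
# Keep the context window modest to reduce memory pressure on 8 GB systems.
PARAMETER num_ctx 2048
# Pin CPU threads; matching the physical core count is a common heuristic.
PARAMETER num_thread 8
# Offload layers to the GPU when CUDA is configured (ignored on CPU-only setups).
PARAMETER num_gpu 99
Rebuild the model with ollama create deepscaler -f Modelfile for the changes to take effect.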
Troubleshooting
- Ollama Not Recognized: Confirm that its installation directory is appended to the system PATH.
- CUDA-Related Errors: Ensure GPU compatibility and correct CUDA configuration.
- Model Loading Issues: Verify the correctness of the Modelfile and the placement of the model files.
- Performance Bottlenecks: Reduce batch size, leverage quantized models, or optimize system settings.
Conclusion
By meticulously adhering to these steps, one can effectively install and operationalize DeepScaleR 1.5B on a Windows system. Properly configuring software dependencies and leveraging hardware acceleration techniques will enhance computational efficiency.
Through methodical experimentation with model parameters and execution strategies, users can optimize the system for diverse applications in natural language processing and beyond.