Run DeepScaleR 1.5B on Windows: Step-by-Step Installation Guide

DeepScaleR, a fine-tuned iteration of DeepSeek-R1-Distill-Qwen-1.5B, represents a substantial advancement in compact language models. With 1.5 billion parameters, it demonstrates exceptional computational efficiency, surpassing OpenAI's o1-preview on mathematical benchmarks.
This guide provides a rigorous, stepwise approach to configuring and deploying DeepScaleR 1.5B on a Windows-based system.
System Prerequisites
Prior to installation, ensure that your system meets the following specifications:
Operating System: Windows 10 or later.
Hardware Specifications:
- Processor: A modern CPU with AVX2 instruction support for optimal computational throughput (a quick verification sketch follows the prerequisites).
- Memory: A minimum of 8 GB of RAM; additional RAM will improve model performance.
- Storage: At least 10 GB of available disk space for the model and its dependencies.
Software Requirements:
- Ollama: The primary runtime environment for executing DeepScaleR.
- Git: Necessary for acquiring the model repository and related assets.
- Python: Required for supplementary scripting utilities.
- CUDA Toolkit (Optional): Recommended for GPU-based acceleration on NVIDIA hardware.
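Before installing anything, you can confirm the CPU and memory prerequisites programmatically. The following is a minimal sketch, assuming the third-party py-cpuinfo and psutil packages are installed (pip install py-cpuinfo psutil); neither is required by DeepScaleR itself:
from cpuinfo import get_cpu_info  # third-party: py-cpuinfo
import psutil  # third-party: psutil

# AVX2 appears in the CPU flag list on supported processors.
print("AVX2 supported:", "avx2" in get_cpu_info().get("flags", []))

# Total physical RAM in GiB; 8 GB is the minimum recommended above.
print(f"RAM: {psutil.virtual_memory().total / 2**30:.1f} GiB")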
Step 1: Installing Ollama
Ollama streamlines the execution of large language models on local systems. Follow these installation steps:
- Obtain Ollama:
- Download the appropriate Windows-compatible package from the official Ollama website or its GitHub repository.
- Install Ollama:
- Execute the installer and adhere to the on-screen instructions.
- Append Ollama’s installation directory to the system’s PATH variable.
- Confirm Installation:
ollama --version
- A successful installation will return the version number.
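In addition to the version check, you can confirm that the Ollama background service is reachable. The sketch below assumes Ollama is serving its local HTTP API on the default port 11434 and queries the /api/tags endpoint, which lists installed models:
import json
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=5) as resp:
        models = json.load(resp).get("models", [])
        print("Ollama is running; installed models:", [m["name"] for m in models])
except OSError as exc:
    print("Ollama server not reachable:", exc)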
Step 2: Installing Git
Git facilitates the retrieval of DeepScaleR from repositories such as Hugging Face.
- Download Git:
- Acquire the latest Windows-compatible version from the official Git website.
- Execute the Installer:
- Follow the installation prompts, accepting default settings where applicable.
- Validate Installation:
git --version
- If successful, the version number will be displayed.
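Because Step 3 depends on Git LFS, it is worth confirming now that the LFS extension was included with your Git installation (recent Git for Windows installers bundle it by default):
git lfs version
If the command is not found, install Git LFS separately from its official site.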
Step 3: Downloading the DeepScaleR 1.5B Model
The DeepScaleR 1.5B model is hosted on Hugging Face and can be retrieved via Git Large File Storage (LFS).
- Initialize Git LFS:
git lfs install
- Clone the Repository:
git clone https://huggingface.co/agentica-project/deepscaler
- Obtain Model Files:
- If Git LFS is configured, the model files will download automatically during the clone.
- Otherwise, acquire them manually and place them in the cloned directory.
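If Git LFS proves troublesome on Windows, the Hugging Face Hub client offers an alternative download path. A minimal sketch, assuming the huggingface_hub package is installed (pip install huggingface_hub) and that the repository ID matches the clone URL above:
from huggingface_hub import snapshot_download

# Downloads all repository files, including LFS-tracked weights, into local_dir.
path = snapshot_download(
    repo_id="agentica-project/deepscaler",
    local_dir="deepscaler",
)
print("Model files downloaded to:", path)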
Step 4: Configuring CUDA (Optional)
For those utilizing NVIDIA GPUs, CUDA can substantially enhance computational efficiency.
- Verify GPU Compatibility: Consult the NVIDIA CUDA compatibility list.
- Acquire the CUDA Toolkit: Download the appropriate package from NVIDIA’s official site.
- Install CUDA Toolkit: Follow installation prompts, accepting default configurations.
- Configure Environment Variables:
- Append the CUDA installation path to the system’s PATH variable.
- Define CUDA_HOME as the root directory of the CUDA installation.
- Confirm Installation:
nvcc --version
- A successful installation will return the version number.
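To confirm that both the GPU driver and the toolkit are usable from scripts, the sketch below shells out to nvidia-smi and nvcc; it assumes both executables are on PATH after the steps above:
import subprocess

# nvidia-smi reports the driver and attached GPUs; nvcc reports the toolkit version.
for cmd in (["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            ["nvcc", "--version"]):
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(" ".join(cmd), "->", out.stdout.strip().splitlines()[0])
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(" ".join(cmd), "failed:", exc)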
Step 5: Executing DeepScaleR 1.5B via Ollama
- Define a Modelfile:
- Create a file named Modelfile in the model directory with the following content:
FROM ./
- Build the Model:
ollama create deepscaler -f Modelfile
- Launch the Model:
ollama run deepscaler
- Interact with the Model via CLI:
ollama run deepscaler "What is the square root of 49?"
- Executing DeepScaleR in a Python Environment:
import subprocess

def query_model(prompt):
    # "ollama run <model> <prompt>" performs a one-shot generation and exits.
    result = subprocess.run(
        ["ollama", "run", "deepscaler", prompt],
        capture_output=True, text=True,
    )
    return result.stdout

response = query_model("Summarize the theory of relativity")
print(response)
- Deploying DeepScaleR as a Web API:
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/ask', methods=['POST'])
def ask_model():
    prompt = request.json.get("prompt", "")
    # Forward the prompt to the local model through the Ollama CLI.
    result = subprocess.run(
        ["ollama", "run", "deepscaler", prompt],
        capture_output=True, text=True,
    )
    return jsonify({"response": result.stdout})

if __name__ == '__main__':
    app.run(debug=True)
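Shelling out to the CLI works for quick tests, but Ollama also exposes a local HTTP generation endpoint that avoids per-request process startup. Below is a minimal sketch using only the standard library, assuming the default port 11434 and the deepscaler model created above:
import json
import urllib.request

def query_model(prompt: str) -> str:
    # POST /api/generate; stream=False returns one JSON body with the full answer.
    payload = json.dumps({
        "model": "deepscaler",
        "prompt": prompt,
        "stream": False,
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

print(query_model("What is the square root of 49?"))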
Step 6: Performance Optimization Strategies
To optimize DeepScaleR’s execution on Windows, consider the following (a sample tuning Modelfile follows this list):
- GPU Utilization: Ensure proper CUDA configuration for acceleration.
- Batch Size Adjustment: Experiment with varying batch sizes to balance speed and latency.
- Quantization: Employ quantized model versions to reduce memory overhead.
- System Resource Optimization: Close extraneous applications and configure virtual memory appropriately.
- AVX2 Instruction Set Utilization: Verify that AVX2-enhanced computations are engaged for superior CPU performance.
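As a concrete starting point for these optimizations, several knobs can be set declaratively in the Modelfile and rebuilt with ollama create. The PARAMETER names below are standard Ollama options; the values are illustrative assumptions to adjust for your hardware, not tuned recommendations:
FROM ./
# Keep the context window modest to reduce memory pressure on 8 GB systems.
PARAMETER num_ctx 2048
# Pin CPU threads; matching the physical core count is a common heuristic.
PARAMETER num_thread 8
# Offload layers to the GPU when CUDA is configured (ignored on CPU-only setups).
PARAMETER num_gpu 99
Rebuild the model with ollama create deepscaler -f Modelfile for the changes to take effect.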
Troubleshooting
- Ollama Not Recognized: Confirm that its installation directory is appended to the system PATH.
- CUDA-Related Errors: Ensure GPU compatibility and correct CUDA configuration.
- Model Loading Issues: Verify the correctness of the Modelfile and the placement of the model files.
- Performance Bottlenecks: Reduce batch size, leverage quantized models, or optimize system settings.
Conclusion
By meticulously adhering to these steps, one can effectively install and operationalize DeepScaleR 1.5B on a Windows system. Properly configuring software dependencies and leveraging hardware acceleration techniques will enhance computational efficiency.
Through methodical experimentation with model parameters and execution strategies, users can optimize the system for diverse applications in natural language processing and beyond.