Run Devstral Locally with Ollama

Running advanced AI models like Devstral on your own hardware is now practical, thanks to tools like Ollama, which simplify local deployment. This guide walks you through how to run Devstral locally with Ollama—from setup and installation to advanced configuration, troubleshooting, and real-world use cases.
What Is Devstral?
Devstral is a powerful open-source large language model (LLM) developed by Mistral AI, optimized for software engineering tasks such as:
- Code generation
- Code review
- Bug fixing
- Technical Q&A
The latest model, Devstral-Small-2505, has 24 billion parameters and can run on high-end consumer GPUs like the RTX 4090 or Apple Silicon machines with at least 32GB RAM. It’s ideal for developers aiming to automate or streamline their coding workflows.
Why Run Devstral Locally?
Running Devstral on your machine offers several key benefits:
- Privacy: Keep your code and prompts local
- Performance: No network latency, faster responses
- Cost: Avoid cloud compute expenses
- Customization: Full control over model behavior and parameters
- Offline Access: Work without an internet connection
What Is Ollama?
Ollama is an open-source tool that makes running LLMs locally simple. It handles model loading, hardware acceleration, and provides:
- A user-friendly command-line interface
- A REST API for integration with tools like IDEs, scripts, and local apps
Prerequisites
Before getting started, make sure you have:
- Hardware: 32GB RAM minimum, modern GPU (RTX 4090 or better recommended)
- OS: Linux, macOS, or Windows
- Disk Space: 50GB+ free for model files
- Internet: For initial downloads
- CLI Skills: Basic command-line knowledge
Step 1: Install Ollama
Download & Install
- Visit ollama.com and download the installer for your OS.
- On macOS, you can also install via Homebrew:
brew install ollama
- On Linux, install with the official script (curl -fsSL https://ollama.com/install.sh | sh) or a package from your distribution, if one is available.
Verify Installation
Run:
ollama --version
If successful, you’ll see the installed version.
Step 2: Download Devstral Model Weights
- Using Ollama’s Registry (If Available)
Check the Ollama model library at ollama.com/library to see whether Devstral is published there. If it is, pull it using the name shown on the library page (at the time of writing it is listed simply as devstral):
ollama pull devstral
- Manual Download (If Not Listed)
Use Python and the huggingface_hub package to download the weights from Hugging Face:

```python
# pip install huggingface_hub
from huggingface_hub import snapshot_download
from pathlib import Path

# Download the Devstral weights into ~/mistral_models/Devstral
mistral_models_path = Path.home() / "mistral_models" / "Devstral"
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path,
)
```
Step 3: Load Devstral into Ollama
If you manually downloaded the model:
- Ensure the downloaded files sit together in a single directory
- Register them with Ollama using a Modelfile and ollama create (a sketch follows; check the official import documentation for current format support)
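As a rough sketch of the import step: write a Modelfile whose FROM line points at your local weights, then build a named model with ollama create. Ollama’s importer is most reliable with GGUF files, so the raw consolidated.safetensors checkpoint downloaded above may first need converting to GGUF (for example with llama.cpp’s conversion script); the file name below is a placeholder.

```
# Modelfile — FROM points at the local weights (placeholder path; GGUF recommended)
FROM ./devstral-small-2505.gguf

# Optional: default sampling parameters baked into the model
PARAMETER temperature 0.7
PARAMETER num_predict 512
```

Then build and name the model:

```bash
ollama create devstral-small-2505 -f Modelfile
```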
Then verify:
ollama list
You should see devstral-small-2505 listed (or devstral, if you pulled it from the library). The examples below use devstral-small-2505; substitute your own model name if it differs.
Step 4: Run Devstral Locally
Method 1: Command-Line Inference
ollama run devstral-small-2505 "Write a Python function to reverse a string."
Method 2: Run as Local API Server
ollama serve
Then make API calls like the following (by default the response streams back as a series of JSON lines):

```bash
curl http://localhost:11434/api/generate \
  -d '{"model": "devstral-small-2505", "prompt": "Explain the difference between a list and a tuple in Python."}'
```
Step 5: Advanced Configuration
- GPU Acceleration: Ollama auto-detects your GPU. Make sure your drivers and CUDA toolkit are up to date.
- Custom Parameters: ollama run does not accept sampling flags such as --temperature or --max-tokens. Instead, set parameters in an interactive session with /set parameter (for example /set parameter temperature 0.7), bake them into a Modelfile with PARAMETER directives, or pass them per request through the API’s options field (see the example below).
- Multiple Models: Ollama supports managing and switching between multiple models with ollama list and ollama run.
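For example, here is the same generate endpoint with the temperature and output length set per request via the options object (num_predict is Ollama’s name for the maximum number of tokens to generate):

```bash
curl http://localhost:11434/api/generate \
  -d '{
    "model": "devstral-small-2505",
    "prompt": "Generate a REST API in Flask.",
    "stream": false,
    "options": {"temperature": 0.7, "num_predict": 512}
  }'
```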
Step 6: Integrate Devstral with Development Tools
- API Integration: Connect Devstral to editors and tools such as VS Code extensions or custom chatbots (a sketch using the OpenAI-compatible endpoint follows below).
- Automation: Script interactions with the API to generate code, comments, or docstrings automatically.
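Many editor extensions and chat frontends speak the OpenAI API, and Ollama exposes an OpenAI-compatible endpoint at /v1 on the same port. A minimal sketch with the official openai Python client, assuming Ollama is serving on the default port (the API key is a required placeholder and is not actually checked):

```python
from openai import OpenAI

# Point the OpenAI client at the local Ollama server
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

reply = client.chat.completions.create(
    model="devstral-small-2505",
    messages=[
        {"role": "user", "content": "Write a docstring for a function that merges two sorted lists."}
    ],
)
print(reply.choices[0].message.content)
```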
Step 7: Troubleshooting
Common issues:
- Model not loading: Ensure you meet RAM/GPU requirements
- Slow performance: Update GPU drivers and reduce background load
- API issues: Make sure ollama serve is running and that the server is reachable on port 11434
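A quick way to confirm the server is reachable is to query the tags endpoint, which lists the locally installed models:

```bash
curl http://localhost:11434/api/tags
```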
Step 8: Maintenance & Updates
- Update Ollama: install the latest release from ollama.com (the desktop app prompts when an update is available); on Linux, re-run the install script
- Update your models with ollama pull (pulling an already-installed model fetches its latest version), or re-download from Hugging Face when new releases appear.
Step 9: Security Best Practices
- Keep models and data local; avoid exposing the API externally
- Use containers or virtual environments for isolation
- Monitor system usage, as large models consume substantial RAM and GPU power
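By default Ollama listens only on localhost (127.0.0.1:11434), which is the safest setting. The bind address is controlled by the OLLAMA_HOST environment variable; a sketch of the two settings, assuming you start the server yourself with ollama serve:

```bash
# Default: the API is reachable only from this machine
OLLAMA_HOST=127.0.0.1:11434 ollama serve

# Listens on all interfaces — only do this behind a firewall or authenticated reverse proxy
# OLLAMA_HOST=0.0.0.0:11434 ollama serve
```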
Step 10: Practical Use Cases
- Code Generation: Boilerplate, functions, and templates
- Code Review: Feedback and improvements
- Bug Fixing: Suggest and apply code fixes
- Documentation: Generate docstrings and inline comments
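As a small illustration of the documentation use case, a one-shot prompt can include a source file via shell command substitution (utils.py is a hypothetical file name):

```bash
# Ask Devstral to document a local file (utils.py is a placeholder)
ollama run devstral-small-2505 "Add concise docstrings to this Python module:

$(cat utils.py)"
```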
Devstral Local vs. Cloud Deployment
| Feature | Local with Ollama | Cloud (API) |
|---|---|---|
| Privacy | High (local-only) | Lower (data sent to cloud) |
| Latency | Low | Higher |
| Cost | One-time hardware cost | Ongoing API charges |
| Customization | Full control | Limited |
| Scalability | Limited by hardware | High |
| Setup Complexity | Moderate | Low |
Conclusion
Running Devstral locally with Ollama gives developers privacy, speed, and flexibility in using AI for software engineering. With the right hardware, you can fully utilize Devstral’s capabilities without relying on cloud services.