How to Run Devstral by Mistral

Devstral, Mistral AI’s cutting-edge agentic coding model, is redefining the boundaries of automated software engineering. Whether you’re a hobbyist developer, a seasoned enterprise engineer, or a research scientist, Devstral offers unprecedented capabilities that streamline and scale complex coding workflows.

What is Devstral?

Devstral is a high-performance, open-source agentic coding large language model (LLM) developed by Mistral AI in collaboration with All Hands AI. It is engineered specifically for real-world software engineering tasks and fine-tuned to excel at:

  • Navigating and understanding large, complex codebases
  • Editing multiple files while resolving deep dependency trees
  • Solving actual GitHub issues with production-level context
  • Generating, debugging, and refactoring code at scale

Built on the Mistral Small 3.1 architecture, Devstral features a 128k context window, allowing it to consume and reason about extensive documentation and multi-file codebases. It is a text-only model, with the vision encoder removed to optimize for code-centric tasks.

Why Choose Devstral?

  • Top-Tier Performance: Scores 46.8% on the SWE-Bench Verified benchmark, outperforming many proprietary models.
  • Agentic Intelligence: Integrates seamlessly with frameworks like OpenHands to plan, execute, and verify multi-step engineering tasks.
  • Open and Efficient: Fully open-source and capable of running on both consumer-grade hardware and enterprise GPUs.
  • Scalable: Suitable for local development, cloud deployment, and production-grade pipeline integration.

Key Features and Capabilities

  • 128k Context Window: Ideal for handling large codebases and documentation.
  • Agentic Reasoning: Performs autonomous, multi-step tasks using planning and tool usage.
  • Tool Integration: Works with frameworks like OpenHands and SWE-Agent.
  • Text-Only Specialization: Designed for software engineering with superior precision and speed.
  • High Compatibility: Runs on NVIDIA RTX 4090, H100, A100, and Macs with 32GB+ RAM.

System Requirements

Minimum Hardware:

  • GPU: 1x NVIDIA H100 / 2x RTX A6000 / RTX 4090
  • RAM: 32GB minimum (64GB+ recommended)
  • Disk: 100GB free
  • CPU: Multi-core processor

Software:

  • Python 3.8+
  • Docker (for OpenHands)
  • pip or conda
  • Access to Hugging Face Hub
  • (Optional) JupyterLab for development
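Before installing anything, a quick preflight script can confirm the basics. A minimal sketch; the checks mirror the requirements above, and probing for the GPU via `nvidia-smi` on PATH is an assumption, not an official check:

```python
import shutil
import sys

def preflight(min_python=(3, 8)):
    """Return a list of human-readable problems with the local environment."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    if shutil.which("docker") is None:
        problems.append("Docker not found on PATH (needed for OpenHands)")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; GPU acceleration may be unavailable")
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("WARN:", problem)
```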

Running Devstral Locally

1. Environment Setup

Create a Python virtual environment using Anaconda:

conda create -n devstral python=3.10
conda activate devstral

Install required packages:

pip install mistral_inference --upgrade
pip install huggingface_hub

2. Download Model Files

Use Hugging Face to fetch the model:

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path
)

3. Launch Devstral with CLI

mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300

Test with a prompt like:

Create a REST API from scratch using Python.
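For reference, a response to that prompt typically resembles a small, self-contained API. The sketch below is illustrative only, not actual Devstral output; it implements a minimal in-memory items API with nothing but the standard library:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ITEMS = {}  # naive in-memory store

class ItemHandler(BaseHTTPRequestHandler):
    def _send_json(self, code, payload):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET -> list everything in the store
        self._send_json(200, {"items": ITEMS})

    def do_POST(self):
        # POST with a JSON body -> store it under the next numeric id
        length = int(self.headers.get("Content-Length", 0))
        item = json.loads(self.rfile.read(length))
        ITEMS[str(len(ITEMS) + 1)] = item
        self._send_json(201, item)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging

def run(port=8080):
    HTTPServer(("127.0.0.1", port), ItemHandler).serve_forever()

# run()  # uncomment to start serving
```

In practice you would ask Devstral to iterate on this: add routes, validation, persistence, and tests.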

4. Advanced Deployment with vLLM

vllm serve mistralai/Devstral-Small-2505 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2

Performance Tuning Tips:

  • Set --max-model-len up to 128k to use the full context window
  • Tune --gpu-memory-utilization and --tensor-parallel-size to match your GPUs
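Once the server is running, it exposes an OpenAI-compatible endpoint. A minimal client sketch using only the standard library; the base URL and port are vLLM's defaults, so adjust them if you changed the serve command:

```python
import json
import urllib.request

def build_chat_request(prompt, model="mistralai/Devstral-Small-2505", max_tokens=300):
    """Build an OpenAI-style chat completion payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask_devstral(prompt, base_url="http://localhost:8000"):
    """POST a chat completion request and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```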

Running Devstral in the Cloud

1. Select a Cloud Provider

Options include NodeShift, AWS, GCP, and Azure. NodeShift is particularly affordable and user-friendly.

2. Provision GPU Resources

Recommended:

  • 1x H100 or 2x A6000
  • 100GB disk
  • 80GB RAM

3. Install and Launch

Follow local setup steps. Use VS Code Remote-SSH for development.

4. Build Example Apps

Create full-stack apps (like an RGB Color Mixer) using Devstral. It generates complete HTML, CSS, and JS, ready to deploy.

Using Devstral with OpenHands

OpenHands is a robust automation platform that connects with Devstral for agentic coding.

1. Get a Mistral API Key

Sign up on the Mistral AI platform and fund your account ($5 minimum).

2. Configure OpenHands

export MISTRAL_API_KEY=<YOUR_KEY>
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik

mkdir -p ~/.openhands-state
cat << EOF > ~/.openhands-state/settings.json
{
  "language": "en",
  "agent": "CodeActAgent",
  "llm_model": "mistral/devstral-small-2505",
  "llm_api_key": "${MISTRAL_API_KEY}",
  "enable_default_condenser": true
}
EOF
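A malformed settings.json tends to fail silently inside the container, so a quick validation pass helps. A sketch; the required-key list simply mirrors the file written above, not an official OpenHands schema:

```python
import json
from pathlib import Path

# Keys taken from the settings.json written above (not an official schema).
REQUIRED_KEYS = {"language", "agent", "llm_model", "llm_api_key"}

def check_settings(path):
    """Return a list of problems found in an OpenHands settings file."""
    try:
        settings = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read settings: {exc}"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS - settings.keys()]
    if settings.get("llm_api_key") in (None, "", "<YOUR_KEY>"):
        problems.append("llm_api_key looks unset")
    return problems
```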

3. Run OpenHands

docker run -it --rm --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
  -e LOG_ALL_EVENTS=true \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands-state:/.openhands-state \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  --memory="4g" \
  --cpus="2" \
  docker.all-hands.dev/all-hands-ai/openhands:0.39

Use the web UI or API to delegate coding tasks—Devstral handles them autonomously.

Fine-Tuning and Customization

1. Fine-Tune with Unsloth

  • Up to 2x faster training, 70% less VRAM
  • Supports 8x longer context

Steps:

  • Install unsloth and llama.cpp
  • Prepare datasets (code, issues, docs)
  • Set training parameters (batch size, learning rate, etc.)
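Dataset preparation usually means converting issue/patch pairs into a chat-style JSONL file. A minimal sketch of that conversion; the record layout here is illustrative, so match the field names to whatever format your trainer expects:

```python
import json

def to_chat_record(issue_text, patch_text):
    """Wrap an issue/patch pair as a chat-style training record."""
    return {
        "messages": [
            {"role": "user", "content": issue_text},
            {"role": "assistant", "content": patch_text},
        ]
    }

def write_jsonl(pairs, out_path):
    """Write (issue, patch) pairs as one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for issue, patch in pairs:
            f.write(json.dumps(to_chat_record(issue, patch)) + "\n")
```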

2. Custom Prompt Engineering

Use system-level prompts (like SYSTEM_PROMPT.txt) to guide Devstral’s behavior for specific tasks:

  • Code generation
  • Bug fixing
  • Code documentation
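A system prompt is simply the first message in the conversation, so switching tasks is a matter of swapping that message. A sketch of composing the message list; the fallback prompt text is illustrative, not a Mistral default:

```python
from pathlib import Path

def build_messages(user_prompt, system_prompt_file=None,
                   default_system="You are a careful software engineering assistant."):
    """Prepend a system prompt (loaded from a file if given) to the user's request."""
    if system_prompt_file and Path(system_prompt_file).is_file():
        system = Path(system_prompt_file).read_text().strip()
    else:
        system = default_system
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
```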

Example Use Cases

Devstral shines in real-world development:

  • Code Generation: Full-stack apps, APIs, interfaces
  • Bug Fixing: From single-line issues to repo-wide bugs
  • Refactoring: Suggests and applies cleaner architectures
  • Documentation: Generates docstrings, READMEs, and guides
  • Testing: Writes and runs tests automatically
  • Agentic Integration: Automates workflows via OpenHands

Troubleshooting and Optimization

Common Issues

  • OOM Errors: Reduce batch size or context window
  • Slow Inference: Enable parallelism and upgrade GPU
  • API Issues: Validate keys and check token limits
  • Model Hangups: Inspect Docker logs and file paths

Performance Tips

  • Use GPU offloading
  • Fine-tune thread counts
  • Keep mistral-common and huggingface_hub updated

Security, Cost, and Best Practices

  • Secure Keys: Never hardcode API tokens
  • Monitor Resources: Avoid GPU overuse and cloud cost spikes
  • Data Privacy: Avoid exposing sensitive code via cloud
  • Optimize Costs: Local usage is free post-setup; API usage is metered

Final Thoughts

Devstral by Mistral AI is a revolutionary leap in coding automation—offering a robust, open-source solution for developers who demand more than just code completion.

With support for agentic reasoning, multi-step workflows, and massive codebases, it’s positioned to become a cornerstone of future development stacks.

FAQ

  1. Can I run Devstral on a CPU-only machine? Yes, but performance will be significantly slower; GPU acceleration is strongly recommended for practical use.
  2. Is Devstral suitable for production? Yes, for many coding automation tasks, but always validate outputs and test thoroughly before deploying generated code.
