How to Run Devstral by Mistral

Devstral, Mistral AI’s cutting-edge agentic coding model, is redefining the boundaries of automated software engineering. Whether you’re a hobbyist developer, a seasoned enterprise engineer, or a research scientist, Devstral offers unprecedented capabilities that streamline and scale complex coding workflows.
What is Devstral?
Devstral is a high-performance, open-source agentic coding large language model (LLM) developed by Mistral AI in collaboration with All Hands AI. It is engineered specifically for real-world software engineering tasks and fine-tuned to excel at:
- Navigating and understanding large, complex codebases
- Editing multiple files while resolving deep dependency trees
- Solving actual GitHub issues with production-level context
- Generating, debugging, and refactoring code at scale
Built on the Mistral Small 3.1 architecture, Devstral features a 128k context window, allowing it to consume and reason about extensive documentation and multi-file codebases. It is a text-only model, with the vision encoder removed to optimize for code-centric tasks.
Why Choose Devstral?
- Top-Tier Performance: Scores 46.8% on the SWE-Bench Verified benchmark, outperforming many proprietary models.
- Agentic Intelligence: Integrates seamlessly with frameworks like OpenHands to plan, execute, and verify multi-step engineering tasks.
- Open and Efficient: Fully open-source and capable of running on both consumer-grade hardware and enterprise GPUs.
- Scalable: Suitable for local development, cloud deployment, and production-grade pipeline integration.
Key Features and Capabilities
- 128k Context Window: Ideal for handling large codebases and documentation.
- Agentic Reasoning: Performs autonomous, multi-step tasks using planning and tool usage.
- Tool Integration: Works with frameworks like OpenHands and SWE-Agent.
- Text-Only Specialization: Designed for software engineering with superior precision and speed.
- High Compatibility: Runs on NVIDIA RTX 4090, H100, A100, and Macs with 32GB+ RAM.
System Requirements
Minimum Hardware:
- GPU: 1x NVIDIA H100 / 2x RTX A6000 / RTX 4090
- RAM: 32GB minimum (64GB+ recommended)
- Disk: 100GB free
- CPU: Multi-core processor
Software:
- Python 3.8+
- Docker (for OpenHands)
- pip or conda
- Access to Hugging Face Hub
- (Optional) JupyterLab for development
Running Devstral Locally
1. Environment Setup
Create a Python virtual environment using Anaconda:
conda create -n devstral python=3.10
conda activate devstral
Install required packages:
pip install mistral_inference --upgrade
pip install huggingface_hub
2. Download Model Files
Use Hugging Face to fetch the model:
from huggingface_hub import snapshot_download
from pathlib import Path
mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)
snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path,
)
3. Launch Devstral with CLI
mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300
Test with a prompt like:
Create a REST API from scratch using Python.
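For reference, a minimal REST API of the kind that prompt asks for might look like this stdlib-only sketch (the `/items` endpoint and data model are illustrative, not actual Devstral output):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative in-memory data store.
ITEMS = {1: {"id": 1, "name": "example"}}

def get_items():
    """Return all items as a list of dicts."""
    return list(ITEMS.values())

class ApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/items":
            body = json.dumps(get_items()).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To serve: HTTPServer(("127.0.0.1", 8000), ApiHandler).serve_forever()
```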
4. Advanced Deployment with vLLM
vllm serve mistralai/Devstral-Small-2505 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2
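Once the server is up, vLLM exposes an OpenAI-compatible chat endpoint. A minimal client sketch (port 8000 is vLLM's default; `build_chat_request` is our own helper):

```python
import json
from urllib import request

def build_chat_request(prompt: str,
                       model: str = "mistralai/Devstral-Small-2505") -> dict:
    """Build an OpenAI-style chat-completions payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 300,
    }

payload = build_chat_request("Create a REST API from scratch using Python.")

# With the server running, send it to the OpenAI-compatible endpoint:
# req = request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(request.urlopen(req).read().decode())
```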
Performance Tuning Tips:
- Adjust --threads and --ctx-size (the context window supports up to 128k tokens)
- Offload more layers to the GPU with --n-gpu-layers
Note that these flags apply to llama.cpp-style runners (e.g. GGUF builds); vLLM is tuned through its own options such as --tensor-parallel-size.
Running Devstral in the Cloud
1. Select a Cloud Provider
Options include NodeShift, AWS, GCP, and Azure. NodeShift is particularly affordable and user-friendly.
2. Provision GPU Resources
Recommended:
- 1x H100 or 2x A6000
- 100GB disk
- 80GB RAM
3. Install and Launch
Follow local setup steps. Use VS Code Remote-SSH for development.
4. Build Example Apps
Create full-stack apps (like an RGB Color Mixer) using Devstral. It generates complete HTML, CSS, and JS, ready to deploy.
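The heart of an RGB Color Mixer is simple per-channel averaging. The same logic in Python (a sketch of the algorithm the generated JS would express; function names are our own):

```python
def mix_rgb(a: tuple[int, int, int], b: tuple[int, int, int]) -> tuple[int, int, int]:
    """Mix two RGB colors by averaging each channel."""
    return tuple((x + y) // 2 for x, y in zip(a, b))

def to_hex(rgb: tuple[int, int, int]) -> str:
    """Format an RGB triple as a #rrggbb hex string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)

print(to_hex(mix_rgb((255, 0, 0), (0, 0, 255))))  # red + blue → #7f007f
```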
Using Devstral with OpenHands
OpenHands is a robust automation platform that connects with Devstral for agentic coding.
1. Get a Mistral API Key
Sign up on the Mistral AI platform and fund your account ($5 minimum).
2. Configure OpenHands
export MISTRAL_API_KEY=<YOUR_KEY>
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik
mkdir -p ~/.openhands-state
cat << EOF > ~/.openhands-state/settings.json
{
  "language": "en",
  "agent": "CodeActAgent",
  "llm_model": "mistral/devstral-small-2505",
  "llm_api_key": "${MISTRAL_API_KEY}",
  "enable_default_condenser": true
}
EOF
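If you prefer to generate that settings file programmatically, a Python equivalent of the heredoc above (`write_openhands_settings` is our own helper; it takes the key as a parameter rather than hardcoding it):

```python
import json
import os
from pathlib import Path

def write_openhands_settings(state_dir: Path, api_key: str) -> Path:
    """Write the OpenHands settings.json shown above and return its path."""
    settings = {
        "language": "en",
        "agent": "CodeActAgent",
        "llm_model": "mistral/devstral-small-2505",
        "llm_api_key": api_key,
        "enable_default_condenser": True,
    }
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / "settings.json"
    path.write_text(json.dumps(settings, indent=2))
    return path

# write_openhands_settings(Path.home() / ".openhands-state",
#                          os.environ["MISTRAL_API_KEY"])
```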
3. Run OpenHands
docker run -it --rm --pull=always \
-e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
-e LOG_ALL_EVENTS=true \
-v /var/run/docker.sock:/var/run/docker.sock \
-v ~/.openhands-state:/.openhands-state \
-p 3000:3000 \
--add-host host.docker.internal:host-gateway \
--name openhands-app \
--memory="4g" \
--cpus="2" \
docker.all-hands.dev/all-hands-ai/openhands:0.39
Use the web UI or API to delegate coding tasks—Devstral handles them autonomously.
Fine-Tuning and Customization
1. Fine-Tune with Unsloth
- Up to 2x faster training, 70% less VRAM
- Supports 8x longer context
Steps:
- Install unsloth and llama.cpp
- Prepare datasets (code, issues, docs)
- Set training parameters (batch size, learning rate, etc.)
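As a starting point, those training parameters might look like the values below (illustrative defaults, not tuned for Devstral specifically; adjust for your dataset and GPU):

```python
# Illustrative fine-tuning hyperparameters; tune for your dataset and GPU.
training_config = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,   # effective batch size of 8
    "learning_rate": 2e-4,
    "num_train_epochs": 1,
    "max_seq_length": 8192,             # raise toward 128k as VRAM allows
    "warmup_ratio": 0.03,
    "lr_scheduler_type": "cosine",
}

effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(f"Effective batch size: {effective_batch}")  # → 8
```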
2. Custom Prompt Engineering
Use system-level prompts (such as a SYSTEM_PROMPT.txt file) to guide Devstral’s behavior for specific tasks:
- Code generation
- Bug fixing
- Code documentation
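In chat-style APIs this amounts to prepending a system message to each request. A minimal sketch (the prompt text and task are illustrative):

```python
def build_messages(system_prompt: str, user_task: str) -> list[dict]:
    """Prepend a system-level prompt to a user task, OpenAI chat style."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_task},
    ]

# The system prompt could also be read from a file such as SYSTEM_PROMPT.txt.
system_prompt = "You are a careful software engineer. Fix bugs with minimal diffs."
messages = build_messages(system_prompt, "Fix the failing test in utils/date.py")
```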
Example Use Cases
Devstral shines in real-world development:
- Code Generation: Full-stack apps, APIs, interfaces
- Bug Fixing: From single-line issues to repo-wide bugs
- Refactoring: Suggests and applies cleaner architectures
- Documentation: Generates docstrings, READMEs, and guides
- Testing: Writes and runs tests automatically
- Agentic Integration: Automates workflows via OpenHands
Troubleshooting and Optimization
Common Issues
- OOM Errors: Reduce batch size or context window
- Slow Inference: Enable parallelism and upgrade GPU
- API Issues: Validate keys and check token limits
- Model Hangups: Inspect Docker logs and file paths
Performance Tips
- Use GPU offloading
- Fine-tune thread counts
- Keep mistral-common and huggingface_hub updated
Security, Cost, and Best Practices
- Secure Keys: Never hardcode API tokens
- Monitor Resources: Avoid GPU overuse and cloud cost spikes
- Data Privacy: Avoid exposing sensitive code via cloud
- Optimize Costs: Local usage is free post-setup; API usage is metered
Final Thoughts
Devstral by Mistral AI is a revolutionary leap in coding automation—offering a robust, open-source solution for developers who demand more than just code completion.
With support for agentic reasoning, multi-step workflows, and massive codebases, it’s positioned to become a cornerstone of future development stacks.