How to Run Devstral by Mistral

Devstral, Mistral AI’s cutting-edge agentic coding model, is redefining the boundaries of automated software engineering. Whether you’re a hobbyist developer, a seasoned enterprise engineer, or a research scientist, Devstral offers unprecedented capabilities that streamline and scale complex coding workflows.

What is Devstral?

Devstral is a high-performance, open-source agentic coding large language model (LLM) developed by Mistral AI in collaboration with All Hands AI. It is engineered specifically for real-world software engineering tasks and fine-tuned to excel at:

  • Navigating and understanding large, complex codebases
  • Editing multiple files while resolving deep dependency trees
  • Solving actual GitHub issues with production-level context
  • Generating, debugging, and refactoring code at scale

Built on the Mistral Small 3.1 architecture, Devstral features a 128k context window, allowing it to consume and reason about extensive documentation and multi-file codebases. It is a text-only model, with the vision encoder removed to optimize for code-centric tasks.

Why Choose Devstral?

  • Top-Tier Performance: Scores 46.8% on the SWE-Bench Verified benchmark, outperforming many proprietary models.
  • Agentic Intelligence: Integrates seamlessly with frameworks like OpenHands to plan, execute, and verify multi-step engineering tasks.
  • Open and Efficient: Fully open-source and capable of running on both consumer-grade hardware and enterprise GPUs.
  • Scalable: Suitable for local development, cloud deployment, and production-grade pipeline integration.

Key Features and Capabilities

  • 128k Context Window: Ideal for handling large codebases and documentation.
  • Agentic Reasoning: Performs autonomous, multi-step tasks using planning and tool usage.
  • Tool Integration: Works with frameworks like OpenHands and SWE-Agent.
  • Text-Only Specialization: Designed for software engineering with superior precision and speed.
  • High Compatibility: Runs on NVIDIA RTX 4090, H100, A100, and Macs with 32GB+ RAM.

System Requirements

Minimum Hardware:

  • GPU: 1x NVIDIA H100 / 2x RTX A6000 / RTX 4090
  • RAM: 32GB minimum (64GB+ recommended)
  • Disk: 100GB free
  • CPU: Multi-core processor

Software:

  • Python 3.8+
  • Docker (for OpenHands)
  • pip or conda
  • Access to Hugging Face Hub
  • (Optional) JupyterLab for development
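Before installing anything, a quick preflight script can confirm the basics. A minimal sketch; the checks mirror the requirements above, and probing for the GPU via `nvidia-smi` on PATH is an assumption, not an official check:

```python
import shutil
import sys

def preflight(min_python=(3, 8)):
    """Return a list of human-readable problems with the local environment."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version.split()[0]}"
        )
    if shutil.which("docker") is None:
        problems.append("Docker not found on PATH (needed for OpenHands)")
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found; GPU acceleration may be unavailable")
    return problems

if __name__ == "__main__":
    for problem in preflight():
        print("WARN:", problem)
```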

Running Devstral Locally

1. Environment Setup

Create a Python virtual environment using Anaconda:

conda create -n devstral python=3.10
conda activate devstral

Install required packages:

pip install mistral_inference --upgrade
pip install huggingface_hub

2. Download Model Files

Use Hugging Face to fetch the model:

from huggingface_hub import snapshot_download
from pathlib import Path

mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
mistral_models_path.mkdir(parents=True, exist_ok=True)

snapshot_download(
    repo_id="mistralai/Devstral-Small-2505",
    allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"],
    local_dir=mistral_models_path
)

3. Launch Devstral with CLI

mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300

Test with a prompt like:

Create a REST API from scratch using Python.
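For reference, a response to that prompt typically resembles a small, self-contained API. The sketch below is illustrative only, not actual Devstral output; it implements a minimal in-memory items API with nothing but the standard library:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

ITEMS = {}  # naive in-memory store

class ItemHandler(BaseHTTPRequestHandler):
    def _send_json(self, code, payload):
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # GET -> list everything in the store
        self._send_json(200, {"items": ITEMS})

    def do_POST(self):
        # POST with a JSON body -> store it under the next numeric id
        length = int(self.headers.get("Content-Length", 0))
        item = json.loads(self.rfile.read(length))
        ITEMS[str(len(ITEMS) + 1)] = item
        self._send_json(201, item)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging

def run(port=8080):
    HTTPServer(("127.0.0.1", port), ItemHandler).serve_forever()

# run()  # uncomment to start serving
```

In practice you would ask Devstral to iterate on this: add routes, validation, persistence, and tests.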

4. Advanced Deployment with vLLM

vllm serve mistralai/Devstral-Small-2505 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --tensor-parallel-size 2

Performance Tuning Tips:

  • Set --max-model-len up to 128k to use the full context window
  • Tune --gpu-memory-utilization and --tensor-parallel-size to match your GPUs
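Once the server is running, it exposes an OpenAI-compatible endpoint. A minimal client sketch using only the standard library; the base URL and port are vLLM's defaults, so adjust them if you changed the serve command:

```python
import json
import urllib.request

def build_chat_request(prompt, model="mistralai/Devstral-Small-2505", max_tokens=300):
    """Build an OpenAI-style chat completion payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask_devstral(prompt, base_url="http://localhost:8000"):
    """POST a chat completion request and return the assistant's reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```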

Running Devstral in the Cloud

1. Select a Cloud Provider

Options include NodeShift, AWS, GCP, and Azure. NodeShift is particularly affordable and user-friendly.

2. Provision GPU Resources

Recommended:

  • 1x H100 or 2x A6000
  • 100GB disk
  • 80GB RAM

3. Install and Launch

Follow local setup steps. Use VS Code Remote-SSH for development.

4. Build Example Apps

Create full-stack apps (like an RGB Color Mixer) using Devstral. It generates complete HTML, CSS, and JS, ready to deploy.

Using Devstral with OpenHands

OpenHands is a robust automation platform that connects with Devstral for agentic coding.

1. Get a Mistral API Key

Sign up on the Mistral AI platform and fund your account ($5 minimum).

2. Configure OpenHands

export MISTRAL_API_KEY=<YOUR_KEY>
docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik

mkdir -p ~/.openhands-state
cat << EOF > ~/.openhands-state/settings.json
{
  "language": "en",
  "agent": "CodeActAgent",
  "llm_model": "mistral/devstral-small-2505",
  "llm_api_key": "${MISTRAL_API_KEY}",
  "enable_default_condenser": true
}
EOF
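A malformed settings.json tends to fail silently inside the container, so a quick validation pass helps. A sketch; the required-key list simply mirrors the file written above, not an official OpenHands schema:

```python
import json
from pathlib import Path

# Keys taken from the settings.json written above (not an official schema).
REQUIRED_KEYS = {"language", "agent", "llm_model", "llm_api_key"}

def check_settings(path):
    """Return a list of problems found in an OpenHands settings file."""
    try:
        settings = json.loads(Path(path).read_text())
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot read settings: {exc}"]
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS - settings.keys()]
    if settings.get("llm_api_key") in (None, "", "<YOUR_KEY>"):
        problems.append("llm_api_key looks unset")
    return problems
```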

3. Run OpenHands

docker run -it --rm --pull=always \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
  -e LOG_ALL_EVENTS=true \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v ~/.openhands-state:/.openhands-state \
  -p 3000:3000 \
  --add-host host.docker.internal:host-gateway \
  --name openhands-app \
  --memory="4g" \
  --cpus="2" \
  docker.all-hands.dev/all-hands-ai/openhands:0.39

Use the web UI or API to delegate coding tasks—Devstral handles them autonomously.

Fine-Tuning and Customization

1. Fine-Tune with Unsloth

  • Up to 2x faster training, 70% less VRAM
  • Supports 8x longer context

Steps:

  • Install unsloth and llama.cpp
  • Prepare datasets (code, issues, docs)
  • Set training parameters (batch size, learning rate, etc.)
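Dataset preparation usually means converting issue/patch pairs into a chat-style JSONL file. A minimal sketch of that conversion; the record layout here is illustrative, so match the field names to whatever format your trainer expects:

```python
import json

def to_chat_record(issue_text, patch_text):
    """Wrap an issue/patch pair as a chat-style training record."""
    return {
        "messages": [
            {"role": "user", "content": issue_text},
            {"role": "assistant", "content": patch_text},
        ]
    }

def write_jsonl(pairs, out_path):
    """Write (issue, patch) pairs as one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for issue, patch in pairs:
            f.write(json.dumps(to_chat_record(issue, patch)) + "\n")
```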

2. Custom Prompt Engineering

Use system-level prompts (like SYSTEM_PROMPT.txt) to guide Devstral’s behavior for specific tasks:

  • Code generation
  • Bug fixing
  • Code documentation
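A system prompt is simply the first message in the conversation, so switching tasks is a matter of swapping that message. A sketch of composing the message list; the fallback prompt text is illustrative, not a Mistral default:

```python
from pathlib import Path

def build_messages(user_prompt, system_prompt_file=None,
                   default_system="You are a careful software engineering assistant."):
    """Prepend a system prompt (loaded from a file if given) to the user's request."""
    if system_prompt_file and Path(system_prompt_file).is_file():
        system = Path(system_prompt_file).read_text().strip()
    else:
        system = default_system
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]
```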

Example Use Cases

Devstral shines in real-world development:

  • Code Generation: Full-stack apps, APIs, interfaces
  • Bug Fixing: From single-line issues to repo-wide bugs
  • Refactoring: Suggests and applies cleaner architectures
  • Documentation: Generates docstrings, READMEs, and guides
  • Testing: Writes and runs tests automatically
  • Agentic Integration: Automates workflows via OpenHands

Troubleshooting and Optimization

Common Issues

  • OOM Errors: Reduce batch size or context window
  • Slow Inference: Enable parallelism and upgrade GPU
  • API Issues: Validate keys and check token limits
  • Model Hangups: Inspect Docker logs and file paths

Performance Tips

  • Use GPU offloading
  • Fine-tune thread counts
  • Keep mistral-common and huggingface_hub updated

Security, Cost, and Best Practices

  • Secure Keys: Never hardcode API tokens
  • Monitor Resources: Avoid GPU overuse and cloud cost spikes
  • Data Privacy: Avoid exposing sensitive code via cloud
  • Optimize Costs: Local usage is free post-setup; API usage is metered

Final Thoughts

Devstral by Mistral AI is a revolutionary leap in coding automation—offering a robust, open-source solution for developers who demand more than just code completion.

With support for agentic reasoning, multi-step workflows, and massive codebases, it’s positioned to become a cornerstone of future development stacks.

FAQ

  1. Can I run Devstral on a CPU-only machine? Yes, but performance will be significantly slower; GPU acceleration is strongly recommended for practical use.
  2. Is Devstral suitable for production? Yes, for many coding automation tasks, but always validate outputs and test thoroughly before deploying generated code.
