DeepSeek V4 vs DeepSeek V3.2: What Changed and What Developers Should Use
If you open the DeepSeek API today and look at the available models, you will see deepseek-chat and deepseek-reasoner. Both of those are DeepSeek V3.2 — the current flagship from DeepSeek's last major release. DeepSeek V4 is a different animal: a trillion-parameter multimodal model with a new memory architecture, an 8× larger context window, and benchmark numbers that significantly surpass V3.2. This guide breaks down exactly what changed between DeepSeek V4 and DeepSeek V3.2 and gives you a clear recommendation for which to use in production today.
DeepSeek V3.2: The Model Behind deepseek-chat and deepseek-reasoner
DeepSeek V3.2 is the version currently serving the DeepSeek API under two model identifiers:
- deepseek-chat — V3.2 in standard mode, optimised for instruction following, coding, and general generation
- deepseek-reasoner — V3.2 with the extended thinking (chain-of-thought) mode enabled, equivalent to the "R1" reasoning behaviour
V3.2 is a 671B parameter Mixture-of-Experts (MoE) model with 37B active parameters per token. This is the same efficiency trick that made the original DeepSeek-V3 notable: because only 37B parameters activate per forward pass, you get quality well beyond what a dense model of that active size could deliver, at a fraction of the compute cost of running all 671B parameters. The context window is 128K–164K tokens depending on the provider.
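The arithmetic behind the MoE efficiency claim is easy to check. The sketch below uses the parameter counts quoted above and the standard rule of thumb of roughly 2 FLOPs per active parameter per generated token; it is an estimate, not a vendor-published figure.

```python
# Rough per-token compute for a Mixture-of-Experts model, using the
# common ~2 FLOPs per active parameter per token approximation.

def moe_cost(total_params: float, active_params: float) -> dict:
    """Compare a MoE forward pass against a dense pass over all parameters."""
    return {
        "active_fraction": active_params / total_params,
        "flops_per_token": 2 * active_params,        # MoE: only routed experts run
        "dense_flops_per_token": 2 * total_params,   # dense: every parameter runs
    }

v32 = moe_cost(total_params=671e9, active_params=37e9)
print(f"Active fraction: {v32['active_fraction']:.1%}")            # prints: Active fraction: 5.5%
saving = v32["dense_flops_per_token"] / v32["flops_per_token"]
print(f"Compute saving vs dense: {saving:.1f}x")                   # prints: Compute saving vs dense: 18.1x
```

In other words, each token touches only about 5.5% of the model, which is why a 671B model can be served at roughly the cost of a 37B dense one.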
Key capabilities of V3.2 include:
- Gold-medal-level performance on IMO and IOI mathematical competitions
- deepseek-reasoner supports tool calling during extended thinking — a significant upgrade over R1's original limitation
- DeepSeek Sparse Attention (DSA) for efficient long-context handling
- Text-only — no image, video, or audio input
For a hands-on API guide covering both model variants, see our DeepSeek V3.2 API guide for deepseek-chat and deepseek-reasoner.
DeepSeek V4: What Actually Changed
DeepSeek V4 launched in early March 2026. It is not an incremental update — nearly every dimension of the model changed.
Scale
V4 has approximately 1 trillion total parameters, still in a MoE configuration with roughly 37B active per token. This keeps per-token compute costs comparable to V3.2 despite the dramatic parameter count increase, because the MoE routing activates only a fraction of the model per inference.
Engram — A New Memory Architecture
The most architecturally novel change in V4 is Engram, named after the neuroscience term for a memory trace. Engram separates static knowledge retrieval from dynamic neural reasoning. When the model encounters patterns it has seen many times — syntax rules, library function signatures, named entities — it retrieves them from a hash-based lookup table stored in DRAM instead of running them through attention layers.
This has two effects: it frees attention capacity for genuinely novel reasoning, and it reduces the VRAM requirement for running V4 locally because static knowledge is offloaded to system RAM rather than GPU memory.
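Engram's internals are not public, so the following is only a toy illustration of the idea described above: frequently seen static patterns are served from a cheap hash lookup held in system RAM, and only genuinely novel queries fall through to the expensive neural path. All names here are invented for the sketch.

```python
# Toy sketch of a static-knowledge lookup in front of a neural path.
# This illustrates the concept only; it does not reflect V4's actual design.

class StaticKnowledgeCache:
    """Hash-based table standing in for a DRAM-resident memory store."""

    def __init__(self):
        self._table = {}

    def memorize(self, pattern, value):
        self._table[pattern] = value

    def lookup(self, pattern):
        return self._table.get(pattern)


def expensive_neural_path(query):
    # Stand-in for a full attention forward pass.
    return f"<computed: {query}>"


def answer(query, cache):
    hit = cache.lookup(query)
    if hit is not None:
        return hit                       # cheap lookup in system RAM
    return expensive_neural_path(query)  # full neural path on the GPU
```

The design win is in the second branch: every query served by the table is attention capacity that the model can spend on reasoning instead of recall.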
Context Window
V4 supports a 1 million token context window — 8× larger than V3.2's 128K. For software engineering use cases, this means fitting an entire medium-sized codebase in a single context without chunking or retrieval augmentation.
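Whether a given codebase actually fits in 1M tokens is easy to estimate before committing to a no-retrieval design. The sketch below uses the common heuristic of roughly 4 characters per token; exact counts depend on the tokenizer, so treat the result as a ballpark only.

```python
# Back-of-envelope check of whether a codebase fits in a context window.
# Assumes ~4 bytes per token, which is a rough heuristic, not exact.

from pathlib import Path

SOURCE_EXTS = (".py", ".ts", ".go", ".md")  # adjust for your stack

def estimate_tokens(root: str) -> int:
    """Estimate token count of all source files under root."""
    chars = sum(
        p.stat().st_size
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in SOURCE_EXTS
    )
    return chars // 4

def fits_in_context(root: str, context_tokens: int = 1_000_000) -> bool:
    return estimate_tokens(root) < context_tokens
```

At 4 characters per token, 1M tokens is roughly 4 MB of source text, which comfortably covers most medium-sized repositories once build artifacts and dependencies are excluded.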
Native Multimodal Input
V3.2 is text-only. V4 was trained from the start on text, images, video, and audio. This is not a bolt-on vision module — multimodality is part of V4's base architecture. Developers can pass screenshots, diagrams, or audio clips to the same API endpoint as text.
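If V4 keeps the OpenAI-compatible message format the current DeepSeek API uses, an image request would look something like the sketch below. The model identifier "deepseek-v4" and the image-part schema are assumptions until the official docs confirm them; the code only builds the request payload, it does not call the API.

```python
# Sketch of a multimodal chat payload in OpenAI-compatible format.
# Model name and content-part schema are assumed, not confirmed.

import base64

def image_message(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat request combining text and an inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": "deepseek-v4",  # hypothetical identifier
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{b64}"}},
            ],
        }],
    }
```

The payload could then be sent with any OpenAI-compatible client pointed at DeepSeek's endpoint, the same way text-only requests are sent today.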
Architecture at a Glance
- Total parameters: V3.2 — 671B | V4 — ~1T
- Active parameters per token: V3.2 — 37B | V4 — ~37B (MoE efficiency preserved)
- Context window: V3.2 — 128K–164K tokens | V4 — 1M tokens
- Modalities: V3.2 — text only | V4 — text, image, video, audio
- Memory architecture: V3.2 — standard transformer | V4 — Engram + Manifold-Constrained Hyper-Connections
- License: V3.2 — MIT | V4 — Apache 2.0
- Training hardware: V3.2 — NVIDIA H800 | V4 — Huawei Ascend
The Huawei Ascend training detail is notable: V4 was trained entirely on non-NVIDIA hardware, which has significant geopolitical and supply-chain implications for a model intended to be open-weight under Apache 2.0.
Benchmark Performance: V4 vs V3.2
DeepSeek V4's headline numbers represent a substantial improvement over V3.2:
- SWE-bench Verified: V4 — 81% | V3 (baseline) — 69% | GPT-4o — ~49%
- HumanEval (coding): V4 — approximately 90%
- AIME (math competition): V3.2 — 93.1% | V4 — data pending final release benchmarks
- Long-context coherence: V4 maintains coherence over 1M token prompts — V3.2 degrades beyond 100K
The 81% SWE-bench score is the most important number for developers building agentic coding tools. SWE-bench Verified tests a model's ability to autonomously resolve real GitHub issues — it is the closest proxy to "can this model actually fix bugs in production code?" V4's 12-point improvement over V3's baseline puts it ahead of Claude Sonnet and GPT-4o on this benchmark.
Note: Benchmark data for V4 comes from pre-release and third-party testing. Verify current numbers at DeepSeek's official API documentation before making infrastructure decisions.
Reasoning Mode: deepseek-reasoner vs V4 Hybrid Reasoning
One of the more practically important differences between V3.2 and V4 is in their reasoning modes.
V3.2 reasoning (deepseek-reasoner): Extended thinking is a separate mode you activate via the API. The model produces a chain-of-thought reasoning block before the final answer. As of V3.2, this thinking mode supports tool calling — you can have the model reason through multiple tool calls before outputting its final response.
V4 reasoning: V4 uses a hybrid reasoning mode that does not require a separate model variant. The model dynamically decides how much reasoning to apply based on the complexity of the request. For simple completions it responds immediately; for complex multi-step problems it activates extended thinking automatically. Developers can also force either mode via API parameters.
For most agentic workflows, V4's hybrid approach is more practical: you don't need to maintain two separate API clients or conditionally route requests between deepseek-chat and deepseek-reasoner.
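The routing that V4 makes unnecessary is simple but easy to get wrong in production. Under V3.2 it looks something like the sketch below; the V4 identifier is an assumption, and the reasoning flag is a placeholder for whatever complexity signal your application uses.

```python
# Model routing under V3.2 vs V4. The "deepseek-v4" name is assumed.

def pick_v32_model(needs_reasoning: bool) -> str:
    """Route a request to one of the two V3.2 endpoints."""
    return "deepseek-reasoner" if needs_reasoning else "deepseek-chat"

def pick_v4_model(needs_reasoning: bool) -> str:
    # Hybrid reasoning: the single model decides its own thinking depth,
    # so the flag is only needed if you want to force a mode via API params.
    return "deepseek-v4"
```

Collapsing this branch also removes a class of bugs where a misclassified request silently gets the wrong model's latency and cost profile.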
API Access and Pricing
DeepSeek V3.2 (current API):
Available now via api.deepseek.com as deepseek-chat and deepseek-reasoner. Pricing is among the lowest of any frontier model — check the official docs for current rates, as these change frequently.
DeepSeek V4 (new):
V4 is priced at approximately $0.30 per million input tokens and $0.50 per million output tokens. With cache hits, input costs drop to around $0.03/M — a 90% discount for applications that reuse long system prompts or context windows. Given the 1M token context and multimodal capabilities, this is a substantial cost improvement over comparable frontier multimodal models.
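The cache discount dominates the economics for long-prompt workloads, which a quick calculation makes concrete. The sketch below uses the prices quoted above; verify them against the official pricing page before budgeting.

```python
# Per-request cost using the V4 prices quoted above (subject to change).

PRICE_IN = 0.30 / 1_000_000         # $ per fresh input token
PRICE_IN_CACHED = 0.03 / 1_000_000  # $ per cache-hit input token
PRICE_OUT = 0.50 / 1_000_000        # $ per output token

def request_cost(input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """Dollar cost of one request, given the fraction of cached input."""
    cached = input_tokens * cache_hit_ratio
    fresh = input_tokens - cached
    return fresh * PRICE_IN + cached * PRICE_IN_CACHED + output_tokens * PRICE_OUT

# A 100K-token prompt with a 90% cache hit and a 2K-token response:
print(f"${request_cost(100_000, 2_000, cache_hit_ratio=0.9):.4f}")  # prints: $0.0067
```

At those rates, even an agent that repeatedly resubmits a near-full context stays under a cent per call once the prompt prefix is cached.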
V4 weights are planned for release under Apache 2.0, which would allow commercial use without attribution requirements and enable self-hosting at scale.
For alternatives to DeepSeek V4 in case availability is limited, see our DeepSeek V4 alternatives guide.
Which DeepSeek Version Should You Use?
Here is a direct recommendation based on use case:
Use DeepSeek V3.2 (deepseek-chat / deepseek-reasoner) if:
- You need a stable, production-ready API endpoint today — V3.2 is the current default
- Your use case is text-only and your context requirements are under 100K tokens
- You want the lowest possible per-token cost with proven reliability
- You need explicit control over when reasoning mode activates (deepseek-reasoner vs deepseek-chat)
Use DeepSeek V4 if:
- You are building agentic coding tools that need to understand large codebases — the 1M context window is a genuine capability unlock
- Your application handles image, video, or audio inputs alongside text
- You are evaluating models for SWE-bench-class workloads — V4's 81% is the benchmark to beat
- You want a simpler API (single model, hybrid reasoning) rather than maintaining separate chat/reasoner routing
For new projects starting today: Build against the V4 API if it is available in your region. The cost premium over V3.2 is modest at standard volumes, and the architectural advantages — particularly the 1M context and hybrid reasoning — are significant enough to justify the switch.
For a broader look at how V4 compares to the competition beyond DeepSeek's own model lineup, see our DeepSeek V3 vs V4 deep dive and the official release status tracker.