DeepSeek V4: Release Date, Features, Benchmarks, and What to Expect
DeepSeek V4 is weeks away from launch. This article tracks the confirmed release timeline, explains the three architectural innovations (Engram, DSA, mHC), and gives developers a benchmark comparison and action plan for the transition.
DeepSeek V4 is the most anticipated AI model release in the open-source community right now — and as of April 2026, it has not officially launched. With over 21,000 monthly searches and a wave of pre-release leaks covering architecture papers, benchmark claims, and hardware choices, developers are rightly paying attention. This article tracks the confirmed release timeline, breaks down the DeepSeek V4 architectural changes, and tells you exactly what to do while you wait.
DeepSeek V4 Release Status — April 2026 Update
As of April 11, 2026, DeepSeek V4 has not been publicly released. DeepSeek's official API still serves deepseek-chat and deepseek-reasoner, both mapped to DeepSeek-V3.2 with a 128K context window. No V4 model ID has appeared, no changelog entry, and no official announcement has been made by DeepSeek AI.
The clearest signal of an imminent release came on April 3, 2026, when Reuters — citing The Information — reported that DeepSeek V4 is expected to launch within "the next few weeks" and will run on Huawei's latest chips. That is the closest thing to a confirmed timeline available today.
For a guide to the current production model, see our DeepSeek V3.2-Speciale installation and benchmarks guide.
DeepSeek V4 Release Date — What We Know
The timeline has slipped several times. Here is the full sequence:
- Late 2025: Pre-release leaks suggest a mid-February 2026 launch tied to Lunar New Year (February 17).
- February 2026: No launch. Chinese tech outlet Whale Lab reports the model is being held back for further testing.
- March 16, 2026: Dataconomy reports DeepSeek V4 and Tencent's Hunyuan model will both launch in April.
- April 3, 2026: Reuters, citing The Information, reports V4 is "weeks away" and will run on Huawei Ascend 950PR chips.
The Huawei chip detail matters for international developers. DeepSeek reportedly gave Huawei early hardware access to V4 for its Ascend 950PR chips while denying NVIDIA the same — a deliberate signal amid ongoing US export controls on advanced semiconductors destined for China. The key practical question: will this affect API availability for developers outside China?
Based on DeepSeek's established pattern with V3 and V3.2, international API access through api.deepseek.com is expected to continue. However, this is not officially confirmed for V4, and the geopolitical context is worth monitoring.
DeepSeek V4 Architecture — Three Key Innovations
DeepSeek V4 is not a simple parameter scale-up of V3.2. Three documented architectural innovations distinguish it, each targeting a specific limitation of the prior generation.
Engram Conditional Memory
Engram replaces attention-based retrieval for static knowledge with hash-based O(1) lookups stored in DRAM rather than GPU VRAM. Think of it as a read-only key-value cache for factual recall — separate from the transformer's attention mechanism and not subject to its quadratic scaling cost. A pre-release paper reports 3–5 point benchmark improvements and a Needle-in-a-Haystack accuracy jump from 84.2% to 97% on a 27B test model.
For developers, the practical impact is significant. V4's context window targets 1M+ tokens — the difference between passing in a single file versus an entire repository. Where V3.2 can lose coherence at the far end of long contexts, Engram's architecture is specifically designed for stable retrieval at that scale.
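To make the architectural distinction concrete, here is a minimal illustrative sketch of the core idea behind Engram-style conditional memory: static factual associations live in a hash table with O(1) average lookup, independent of context length, instead of being recovered by an attention scan whose cost grows with the sequence. The class and method names are hypothetical, not DeepSeek's implementation.

```python
# Illustrative sketch only -- not DeepSeek's actual Engram code.
# Static knowledge sits in a plain hash table (conceptually in DRAM),
# outside the transformer's attention path.

class EngramStore:
    """Read-only key-value memory for factual recall."""

    def __init__(self):
        self._table = {}  # hash table: O(1) average-case lookup

    def write(self, key: str, value: str) -> None:
        # Populated once, ahead of inference; read-only afterward.
        self._table[key] = value

    def lookup(self, key: str):
        # Constant-time retrieval, independent of context length --
        # contrast with attention, whose cost scales with sequence size.
        return self._table.get(key)

store = EngramStore()
store.write("capital_of_france", "Paris")
print(store.lookup("capital_of_france"))  # Paris
```

The point of the sketch is the asymptotic contrast: a dictionary hit costs the same whether the surrounding context is 1K or 1M tokens, which is why this kind of memory pairs naturally with a very long context window.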
DeepSeek Sparse Attention (DSA)
DSA is a dynamic attention routing mechanism that selects between dense and sparse computation paths based on token complexity. Routine tokens (boilerplate, repetitive syntax) get efficient sparse attention. Tokens that require deep reasoning get full dense attention. The result: substantially lower inference cost per token without degrading quality on hard reasoning tasks.
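The routing idea can be sketched in a few lines. This is a toy illustration of the dense-vs-sparse decision, not DeepSeek's routing code: the complexity scores and threshold here are made-up placeholders for whatever signal the real router uses.

```python
# Toy sketch of dynamic dense/sparse attention routing -- illustrative only.
# Each token carries a complexity score; cheap tokens take the sparse path,
# hard tokens take the full dense path.

def route_tokens(tokens, complexity, threshold=0.5):
    """Assign each token to 'dense' or 'sparse' attention by score."""
    routes = []
    for tok, score in zip(tokens, complexity):
        path = "dense" if score >= threshold else "sparse"
        routes.append((tok, path))
    return routes

# Boilerplate syntax scores low; a reasoning-heavy identifier scores high.
tokens = ["def", "solve", "(", ")", ":", "proof_step"]
scores = [0.1, 0.3, 0.05, 0.05, 0.05, 0.9]
print(route_tokens(tokens, scores))
```

The economic upshot is that if most tokens in real workloads are "routine," the average per-token cost approaches the sparse path's cost while hard tokens still get full attention.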
Manifold-Constrained Hyper-Connections (mHC)
Training a 1-trillion-parameter model reliably is a hard optimization problem. Gradient instability at that scale produces unpredictable training curves — a challenge that has historically made trillion-parameter MoE models unreliable to bring to convergence. mHC addresses this by constraining parameter updates to stable manifold paths during training, allowing DeepSeek's team to train V4 without the instability spikes that disrupted earlier large-scale attempts.
DeepSeek V4 Benchmarks — Pre-Release Claims
⚠ All V4 benchmark numbers below come from pre-release internal testing and leaks. No independent evaluations have been published. Treat these as projections, not confirmed results.
With that caveat stated, here is what the pre-release data claims:
- SWE-bench Verified: V3.2-Speciale 67.8% → V4 claimed ~81%
- HumanEval: V3.2 ~82% → V4 claimed ~90%
- MMLU-Pro: V3.2 85.0% → V4 claimed ~89%
- LiveCodeBench: V3.2 74.1% → V4 not yet reported
The SWE-bench claim is the most significant for coding use cases. A jump from 67.8% to 81% would put V4 ahead of all current open-weight models on software engineering tasks. The architectural changes — particularly DSA for coding-heavy inference and Engram for long-context retrieval — provide a plausible technical basis for the improvement, even before independent verification.
For current reasoning model benchmarks in the DeepSeek family, see our DeepSeek R1-0528 vs OpenAI O3 comparison.
DeepSeek V4 vs DeepSeek V3.2 — Feature Comparison
- Total parameters: V3.2 671B → V4 ~1T
- Active parameters per token: V3.2 37B → V4 ~32B
- Context window: V3.2 128K → V4 1M+ tokens
- Multimodal support: V3.2 text only → V4 text + image (expected)
- Architecture: V3.2 MoE + standard attention → V4 MoE + Engram + DSA + mHC
- Expert routing: V3.2 top-2/top-4 per token → V4 16 expert pathways per token
- Training hardware: V3.2 NVIDIA H800 → V4 Huawei Ascend 950PR
The active parameter reduction (37B → ~32B) alongside the expanded expert pool (top-4 → 16 pathways) means each inference call draws from a richer knowledge base while activating fewer parameters per token. Combined with the 1M+ context window, this changes the calculus for long-running coding agents significantly.
For a thorough architecture comparison across generations, see our DeepSeek V3 vs V4 architecture deep dive.
DeepSeek V4 vs DeepSeek R1 — Which Model Is Which?
A persistent source of confusion: DeepSeek V4 is a general-purpose language model. DeepSeek R1 (and variants like R1-0528) is the reasoning-focused line — designed for chain-of-thought tasks, math, multi-step logic, and structured problem-solving where showing work improves accuracy.
V4 is the successor to V3.2: the general-purpose model used for chat, code completion, document analysis, and API integrations. When V4 launches, expect DeepSeek to maintain both lines — V4 for general tasks and an eventual R2 for deep reasoning. These are not competing products; they serve different use cases in the same way GPT-4o and o3 serve different use cases. For current comparisons across the general-purpose space, our DeepSeek V3.1 Terminus vs ChatGPT 5 vs Claude 4.1 comparison covers the practical trade-offs.
DeepSeek V4 API Pricing and Access
DeepSeek has not announced official V4 pricing. Based on V3.2's pricing trajectory and pre-release reports, projected figures are:
- Input tokens: ~$0.14 per million (⚠ unverified projection)
- Output tokens: ~$0.28 per million (⚠ unverified projection)
- Cached input: ~$0.07 per million (⚠ unverified projection)
If these projections are accurate, output token costs would drop roughly 75% compared to current V3.2 rates — continuing DeepSeek's consistent pattern of aggressive price reduction with each model generation. For high-throughput applications, that change alone would be a compelling migration reason.
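A quick back-of-envelope calculation shows what the projected rates would mean at scale. The rates below are the unverified projections from this section; plug in the official numbers once DeepSeek publishes them.

```python
# Back-of-envelope monthly cost at the UNVERIFIED projected V4 rates above.
# Replace the rates with official pricing once it is announced.

def monthly_cost(input_tokens_m, output_tokens_m, in_rate, out_rate):
    """Cost in USD; token counts are in millions, rates in $/million."""
    return input_tokens_m * in_rate + output_tokens_m * out_rate

# Example workload: 1,000M input tokens and 200M output tokens per month.
v4_projected = monthly_cost(1000, 200, in_rate=0.14, out_rate=0.28)
print(f"Projected V4 cost: ${v4_projected:.2f}")  # 1000*0.14 + 200*0.28 = $196.00
```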
Today, DeepSeek's API at api.deepseek.com exposes two models: deepseek-chat (mapped to V3.2) and deepseek-reasoner (mapped to R1). When V4 launches, expect either a new deepseek-v4 model ID or a redirect of deepseek-chat — the same pattern used when V3.2 replaced V3.1. Code using the OpenAI-compatible SDK with DeepSeek's base URL requires only a model name change to migrate. For current API usage, see our DeepSeek V3.2-Exp API and performance guide.
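Because the API is OpenAI-compatible, the migration surface is essentially one string. The sketch below shows the request payload shape; deepseek-v4 is a hypothetical model ID used only to illustrate the swap — the real ID is unknown until launch. (In practice you would send this payload via the OpenAI SDK pointed at base_url https://api.deepseek.com.)

```python
# Sketch of the V3.2 -> V4 migration under the OpenAI-compatible API.
# "deepseek-v4" is a HYPOTHETICAL model ID; the real one ships at launch.

def build_chat_request(model: str, prompt: str) -> dict:
    """Payload shape accepted by OpenAI-compatible chat endpoints."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

current = build_chat_request("deepseek-chat", "Explain this function.")   # today: V3.2
future = build_chat_request("deepseek-v4", "Explain this function.")     # hypothetical
print(current["model"], "->", future["model"])  # deepseek-chat -> deepseek-v4
```

Everything else — authentication, base URL, response parsing — stays the same, which is why the article's "model ID swap" framing is accurate for code already on the OpenAI-compatible SDK.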
What Developers Should Do Before DeepSeek V4 Launches
V4 is not live yet, but you can be ready to adopt it the day it ships:
- Watch DeepSeek's API changelog and HuggingFace repo: The fastest signal will be a new model ID appearing at api.deepseek.com/models or a new model card on huggingface.co/deepseek-ai. Model releases typically appear on HuggingFace within hours of announcement.
- Stay on V3.2 for production today: deepseek-chat (V3.2) is production-ready, well-documented, and fully accessible via the OpenAI-compatible SDK. Do not delay current projects waiting for V4. The API interface is expected to be backward-compatible — migration should require only a model ID swap.
- Architect for long context now: V4's 1M+ token window is its headline developer feature. If you are building applications that would benefit from full codebase or large document ingestion, design your context strategy now. Structure your prompts and chunking logic to take advantage of extended context when it becomes available.
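The third point above can be made concrete: treat the context window as a configurable budget so the same assembly code serves V3.2's 128K window today and a 1M+ window later. This is a hedged sketch with a crude whitespace token approximation; in production you would use a real tokenizer.

```python
# Sketch: size context assembly against a configurable token budget so the
# jump from 128K to 1M+ is a one-line config change. Whitespace word count
# stands in for real tokenization here, purely for illustration.

CONTEXT_BUDGET = 128_000  # raise toward 1_000_000+ when V4 ships

def approx_tokens(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def pack_files(files: dict, budget: int = CONTEXT_BUDGET) -> list:
    """Greedily include whole files until the token budget is exhausted."""
    packed, used = [], 0
    for path, content in files.items():
        cost = approx_tokens(content)
        if used + cost > budget:
            break  # under a 1M+ budget, this break rarely triggers
        packed.append(path)
        used += cost
    return packed

repo = {"main.py": "def main(): pass", "util.py": "x = 1 " * 50}
print(pack_files(repo, budget=40))  # only main.py fits a tiny budget
```

The design choice worth stressing: keep the budget out of your prompt-building logic. When the window grows roughly eightfold, code written this way upgrades by changing one constant rather than rewriting the chunking strategy.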
DeepSeek V4 represents a substantive architectural step — not just more parameters. Engram memory, DeepSeek Sparse Attention, and mHC each address real limitations of the prior generation. The benchmark claims are compelling, the pricing trajectory is aggressive, and the release is weeks away. The developers who prepare now will be the first to ship with it.