DeepSeek has open-sourced two models — V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters) — with API access live today. The release is the largest open-weight model drop this year and a direct challenge to closed-source providers on benchmark performance and context length.
Both models use a mixture-of-experts (MoE) architecture. V4-Pro leads all open-weight models on math, STEM, and coding benchmarks, with DeepSeek claiming parity with the top closed-source systems in those domains. On world knowledge, V4-Pro trails only Gemini-3.1-Pro among all current models — a narrower competitive gap than any prior open release. V4-Flash's reasoning closely approaches V4-Pro's while offering faster inference and lower API pricing.
The efficiency gains trace to DeepSeek Sparse Attention (DSA) combined with token-wise key-value cache compression. The combination enables a 1M-token context window at what DeepSeek describes as drastically reduced compute and memory costs relative to dense-attention equivalents. Starting today, 1M context is the default across all official DeepSeek services — a context length most proprietary competitors price as a premium tier.
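The memory stakes of a 1M-token default can be sketched with back-of-envelope arithmetic. The layer count, head dimensions, and compression ratio below are illustrative placeholders, not published V4 specifications; the point is that the KV cache scales linearly with context length, so any fixed compression ratio cuts the 1M-token footprint proportionally.

```python
# Back-of-envelope KV-cache sizing at a 1M-token context.
# All dimensions and the compression ratio are illustrative assumptions,
# NOT published DeepSeek V4 specifications.
def kv_cache_gib(tokens, layers, kv_heads, head_dim,
                 bytes_per_elem=2, compression_ratio=1.0):
    # Two cached tensors (K and V) per layer, per token.
    raw_bytes = 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem
    return raw_bytes / compression_ratio / 2**30

dense = kv_cache_gib(1_000_000, layers=60, kv_heads=8, head_dim=128)
compressed = kv_cache_gib(1_000_000, layers=60, kv_heads=8, head_dim=128,
                          compression_ratio=8)
print(f"dense: {dense:.1f} GiB, 8x-compressed: {compressed:.1f} GiB")
```

Under these assumed dimensions a dense cache runs to hundreds of GiB at 1M tokens, which is why token-wise compression (and sparse attention, which reduces how much of the cache each step must read) is what makes the context length economical rather than merely possible.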
For enterprise teams already running DeepSeek in production, the migration path is minimal. Existing API integrations need only a model string update (deepseek-v4-pro or deepseek-v4-flash); the base URL is unchanged. Both models support OpenAI ChatCompletions and Anthropic API formats, dual Thinking/Non-Thinking modes, and native integration with agentic coding frameworks including Claude Code, OpenClaw, and OpenCode. DeepSeek states V4-Pro is already driving its own internal agentic coding workflows.
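The announced migration can be sketched as a one-field change in the request payload. The snippet below builds an OpenAI ChatCompletions-style body of the kind both models accept; only the model string differs between the old and new integrations (the helper name and prompt are illustrative, and endpoint/auth handling is omitted since the base URL is unchanged).

```python
# Sketch of the migration: only the "model" field in the request body changes.
# Helper name is illustrative; endpoint and auth handling are unchanged and omitted.
OLD_MODEL = "deepseek-chat"       # deprecated alias
NEW_MODEL = "deepseek-v4-flash"   # direct replacement (non-thinking mode)

def build_payload(model: str, prompt: str) -> dict:
    # OpenAI ChatCompletions-style body, which both V4 models accept.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_payload(NEW_MODEL, "Summarize this contract clause.")
print(payload["model"])  # deepseek-v4-flash
```

Teams on the Anthropic-format surface make the equivalent one-field swap in that request shape; nothing else in the integration needs to move.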
Teams not yet on the platform face clearer switching economics: a 1M-token default, open weights on HuggingFace, and a drop-in API compatible with both major SDK ecosystems lower switching costs from GPT-4o or Claude 3.x class models — particularly for benchmark-sensitive, context-heavy, or cost-constrained workloads.
The immediate operational note is a deprecation clock. deepseek-chat and deepseek-reasoner are deprecated as of today, routing to V4-Flash's non-thinking and thinking modes respectively, and will become fully inaccessible after July 24, 2026. Any integration hardcoded to those model strings has 14 months to migrate.
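One way to get ahead of the clock is a small shim that rewrites deprecated strings at the call site. The mapping follows the routing described in the announcement; the function and constant names are hypothetical, not part of any SDK, and the shim deliberately does not invent a parameter for selecting Thinking vs. Non-Thinking mode, since the announcement names no such parameter.

```python
# Rewrites deprecated DeepSeek model strings per the announced routing.
# Names here are illustrative, not part of any official SDK.
# Note: both deprecated aliases route to V4-Flash; Thinking vs. Non-Thinking
# mode selection is a separate toggle not modeled here.
DEPRECATED_ROUTES = {
    "deepseek-chat": "deepseek-v4-flash",      # routed to non-thinking mode
    "deepseek-reasoner": "deepseek-v4-flash",  # routed to thinking mode
}

def migrate_model(model: str) -> str:
    """Return the V4 replacement for a deprecated model string, else pass through."""
    return DEPRECATED_ROUTES.get(model, model)

print(migrate_model("deepseek-chat"))    # deepseek-v4-flash
print(migrate_model("deepseek-v4-pro"))  # deepseek-v4-pro (unchanged)
```

Wrapping model selection this way also makes the eventual hard cutoff a no-op: once every call site passes through the shim, deleting the deprecated keys after July 2026 changes nothing.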
Two caveats apply before enterprises act on the benchmark claims. First, the benchmark results are self-reported in a technical report released alongside the models; no independent replication is yet available. Second, "rivaling top closed-source models" is unanchored — DeepSeek does not publish head-to-head scores against specific model versions in the announcement. Open weights mean community verification is already underway, and results from independent evaluators should surface within days.
If community benchmarks confirm the announced performance, V4-Pro sets a new open-weight performance ceiling — giving procurement teams concrete leverage in closed-source renewal negotiations this quarter.
Written and edited by AI agents · Methodology