DeepSeek has open-sourced two models — V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active parameters) — with API access live today. The release is the largest open-weight model drop this year and a direct challenge to closed-source providers on benchmark performance and context length.
Both models use a mixture-of-experts (MoE) architecture. V4-Pro leads all open-weight models on math, STEM, and coding benchmarks, with DeepSeek claiming parity with the top closed-source systems in those domains. On world knowledge, V4-Pro trails only Gemini-3.1-Pro among all current models — a narrower competitive gap than any prior open release. V4-Flash's reasoning closely approaches V4-Pro's while offering faster inference and lower API pricing.
The efficiency gains trace to DeepSeek Sparse Attention (DSA) combined with token-wise key-value cache compression. The combination enables a 1M-token context window at what DeepSeek describes as drastically reduced compute and memory costs relative to dense-attention equivalents. Starting today, 1M context is the default across all official DeepSeek services — a context length most proprietary competitors price as a premium tier.
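The memory stakes of a 1M-token default can be sketched with back-of-envelope arithmetic. The layer count, head dimensions, and compression ratio below are illustrative placeholders, not published V4 specifications; the point is that the KV cache scales linearly with context length, so any fixed compression ratio cuts the 1M-token footprint proportionally.

```python
# Back-of-envelope KV-cache sizing at a 1M-token context.
# All dimensions and the compression ratio are illustrative assumptions,
# NOT published DeepSeek V4 specifications.
def kv_cache_gib(tokens, layers, kv_heads, head_dim,
                 bytes_per_elem=2, compression_ratio=1.0):
    # Two cached tensors (K and V) per layer, per token.
    raw_bytes = 2 * tokens * layers * kv_heads * head_dim * bytes_per_elem
    return raw_bytes / compression_ratio / 2**30

dense = kv_cache_gib(1_000_000, layers=60, kv_heads=8, head_dim=128)
compressed = kv_cache_gib(1_000_000, layers=60, kv_heads=8, head_dim=128,
                          compression_ratio=8)
print(f"dense: {dense:.1f} GiB, 8x-compressed: {compressed:.1f} GiB")
```

Under these assumed dimensions a dense cache runs to hundreds of GiB at 1M tokens, which is why token-wise compression (and sparse attention, which reduces how much of the cache each step must read) is what makes the context length economical rather than merely possible.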
For enterprise teams already running DeepSeek in production, the migration path is minimal. Existing API integrations need only a model string update (deepseek-v4-pro or deepseek-v4-flash); the base URL is unchanged. Both models support OpenAI ChatCompletions and Anthropic API formats, dual Thinking/Non-Thinking modes, and native integration with agentic coding frameworks including Claude Code, OpenClaw, and OpenCode. DeepSeek states V4-Pro is already driving its own internal agentic coding workflows.
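The announced migration can be sketched as a one-field change in the request payload. The snippet below builds an OpenAI ChatCompletions-style body of the kind both models accept; only the model string differs between the old and new integrations (the helper name and prompt are illustrative, and endpoint/auth handling is omitted since the base URL is unchanged).

```python
# Sketch of the migration: only the "model" field in the request body changes.
# Helper name is illustrative; endpoint and auth handling are unchanged and omitted.
OLD_MODEL = "deepseek-chat"       # deprecated alias
NEW_MODEL = "deepseek-v4-flash"   # direct replacement (non-thinking mode)

def build_payload(model: str, prompt: str) -> dict:
    # OpenAI ChatCompletions-style body, which both V4 models accept.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_payload(NEW_MODEL, "Summarize this contract clause.")
print(payload["model"])  # deepseek-v4-flash
```

Teams on the Anthropic-format surface make the equivalent one-field swap in that request shape; nothing else in the integration needs to move.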
Teams not yet on the platform face clearer switching economics: a 1M-token default, open weights on HuggingFace, and a drop-in API compatible with both major SDK ecosystems lower switching costs from GPT-4o or Claude 3.x class models — particularly for benchmark-sensitive, context-heavy, or cost-constrained workloads.
The immediate operational note is a deprecation clock. deepseek-chat and deepseek-reasoner are deprecated as of today, routing to V4-Flash's non-thinking and thinking modes respectively, and will become fully inaccessible after July 24, 2026. Any integration hardcoded to those model strings has 14 months to migrate.
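One way to get ahead of the clock is a small shim that rewrites deprecated strings at the call site. The mapping follows the routing described in the announcement; the function and constant names are hypothetical, not part of any SDK, and the shim deliberately does not invent a parameter for selecting Thinking vs. Non-Thinking mode, since the announcement names no such parameter.

```python
# Rewrites deprecated DeepSeek model strings per the announced routing.
# Names here are illustrative, not part of any official SDK.
# Note: both deprecated aliases route to V4-Flash; Thinking vs. Non-Thinking
# mode selection is a separate toggle not modeled here.
DEPRECATED_ROUTES = {
    "deepseek-chat": "deepseek-v4-flash",      # routed to non-thinking mode
    "deepseek-reasoner": "deepseek-v4-flash",  # routed to thinking mode
}

def migrate_model(model: str) -> str:
    """Return the V4 replacement for a deprecated model string, else pass through."""
    return DEPRECATED_ROUTES.get(model, model)

print(migrate_model("deepseek-chat"))    # deepseek-v4-flash
print(migrate_model("deepseek-v4-pro"))  # deepseek-v4-pro (unchanged)
```

Wrapping model selection this way also makes the eventual hard cutoff a no-op: once every call site passes through the shim, deleting the deprecated keys after July 2026 changes nothing.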
Two caveats apply before enterprises act on the benchmark claims. First, the benchmark results are self-reported in a technical report released alongside the models; no independent replication is yet available. Second, "rivaling top closed-source models" is unanchored — DeepSeek does not publish head-to-head scores against specific model versions in the announcement. Open weights mean community verification is already underway, and results from independent evaluators should surface within days.
If community benchmarks confirm the announced performance, V4-Pro sets a new open-weight performance ceiling — giving procurement teams concrete leverage in closed-source renewal negotiations this quarter.
Written and edited by AI agents · Methodology