DeepSeek V4-Pro Claims Benchmark Parity With Top Closed-Source Models on Math and STEM
DeepSeek has released DeepSeek-V4-Pro (1.6T total / 49B active parameters, MoE) and V4-Flash (284B total / 13B active), both open-weight and live via API today. V4-Pro claims open-source state of the art on agentic coding and math/STEM benchmarks, rivaling closed-source frontier models, and makes 1M-token context the new default across all DeepSeek services. A novel sparse attention mechanism (DSA plus token-wise compression) underpins the efficiency claims, and the deepseek-chat and deepseek-reasoner endpoints are formally deprecated, with a July 2026 sunset.
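For teams on the deprecated endpoints, migration should amount to a model-id swap. Below is a minimal sketch assuming DeepSeek keeps its existing OpenAI-compatible API at https://api.deepseek.com; the identifier "deepseek-v4-pro" is an assumption, since the release notes here do not name the replacement API model ids:

```python
# Migration sketch: deepseek-chat / deepseek-reasoner -> V4-Pro.
# Assumes DeepSeek's existing OpenAI-compatible endpoint is unchanged.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # DeepSeek's current API host
)

response = client.chat.completions.create(
    # Hypothetical model id; the article does not specify the new ids.
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Summarize sparse attention in one line."}],
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, existing request and response handling code should carry over unchanged; only the model field (and any context-length limits hard-coded for the old 128K window) needs revisiting.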
[FIG. 01: DeepSeek's sparse attention architecture reduces compute costs for 1M-token context.]
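The release does not detail how DSA or the token-wise compression are implemented. As a rough intuition for the efficiency claim, the sketch below shows a generic top-k sparse attention in PyTorch: each query attends only to a small selected subset of keys, so the attention step scales with the selection size k rather than the full sequence length. This is an illustrative stand-in, not DeepSeek's mechanism; the function name and top_k parameter are invented for the example.

```python
# Illustrative top-k sparse attention, NOT DeepSeek's actual DSA.
# Shows why selecting k << L keys per query cuts attention cost.
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=64):
    """q, k, v: (L, d). Each query attends only to its top_k keys.
    The scoring pass is computed in full here for clarity; a real
    system would use a cheaper compressed/indexer pass to pick keys."""
    L, d = q.shape
    scores = q @ k.T / d**0.5                              # (L, L) selection scores
    top_idx = scores.topk(min(top_k, L), dim=-1).indices   # (L, top_k)
    k_sel = k[top_idx]                                     # (L, top_k, d)
    v_sel = v[top_idx]                                     # (L, top_k, d)
    attn = torch.einsum("ld,lkd->lk", q, k_sel) / d**0.5   # scores over selected keys
    w = F.softmax(attn, dim=-1)                            # (L, top_k)
    return torch.einsum("lk,lkd->ld", w, v_sel)            # (L, d)

out = topk_sparse_attention(torch.randn(128, 32), torch.randn(128, 32), torch.randn(128, 32))
print(out.shape)  # torch.Size([128, 32])
```

At 1M tokens, the softmax and value aggregation here touch only top_k keys per query instead of all of them, which is the kind of saving a sparse mechanism would need to make a 1M-token default economical.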

