
Issue Nº 71


Market Wednesday, July 1, 2026 at 01:33 AM

NVIDIA Inference Stack Reduces Token Costs by Up to 5x on Blackwell in One Month

NVIDIA's full-stack inference software on the Blackwell GPU platform has cut token costs by up to 5x for the DeepSeek V4 model within a single month, according to benchmark data released June 30. The gains come from layered optimizations across production serving (disaggregated inference, autoscaling), runtime acceleration (kernel fusion, multi-token prediction), and hardware features exposed to software (NVLink bandwidth, NVFP4 precision). Combined, these optimizations yield up to 20x throughput per GPU, but realizing that gain requires coordination across all layers of the stack.

Real-world adoption is already underway. Baseten deployed DeepSeek V4 Pro on Blackwell with 50% higher token throughput; Deep Infra and Together AI are serving frontier open models at scale; and Cognition uses NVIDIA's Dynamo framework to manage inference GPUs for reinforcement-learning workloads without building custom infrastructure. NVIDIA's ecosystem leverage also shortens the path from research to production: PyTorch natively supports Tensor Cores and NVFP4, and open projects like vLLM and SGLang integrate CUDA optimizations at release, so new research breakthroughs (DFlash speculative decode, FastVideo) translate to production performance in weeks, not months.

For infrastructure architects, this signals the maturation of inference as a commodity: raw tokens-per-dollar is no longer a competitive moat; the game is now vertical integration and software-hardware co-design. Teams running large inference fleets can no longer rely on generic GPU utilization targets. They need to instrument full-stack cost per token and measure the ROI of software stack updates. Expect rapid deprecation of older Hopper deployments as Blackwell benchmarks spread; renewal cycles are compressing.
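
To make "cost per token" concrete, here is a minimal sketch of how a team might compute it before and after a software stack update. The names (FleetSnapshot, cost_per_million_tokens, cost_reduction_factor) and all numbers are hypothetical illustrations, not figures from NVIDIA's benchmark, and the model assumes a flat all-in price per GPU-hour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FleetSnapshot:
    """One measurement window for an inference fleet (hypothetical fields)."""
    gpu_count: int
    gpu_hourly_usd: float      # assumed all-in cost per GPU-hour
    tokens_per_second: float   # aggregate tokens/s served by the whole fleet


def cost_per_million_tokens(snapshot: FleetSnapshot) -> float:
    """USD spent per one million tokens served during the window."""
    fleet_usd_per_hour = snapshot.gpu_count * snapshot.gpu_hourly_usd
    tokens_per_hour = snapshot.tokens_per_second * 3600
    return fleet_usd_per_hour / tokens_per_hour * 1_000_000


def cost_reduction_factor(before: FleetSnapshot, after: FleetSnapshot) -> float:
    """How many times cheaper a token became after a stack update."""
    return cost_per_million_tokens(before) / cost_per_million_tokens(after)


# Illustrative numbers only: same hardware and price, higher throughput
# after a software update.
before = FleetSnapshot(gpu_count=8, gpu_hourly_usd=4.0, tokens_per_second=10_000)
after = FleetSnapshot(gpu_count=8, gpu_hourly_usd=4.0, tokens_per_second=50_000)

print(f"before: ${cost_per_million_tokens(before):.3f} per 1M tokens")
print(f"after:  ${cost_per_million_tokens(after):.3f} per 1M tokens")
print(f"reduction: {cost_reduction_factor(before, after):.1f}x")
```

With hardware and price held fixed, the cost reduction equals the throughput gain, which is why a software-only update can move cost per token without any new hardware spend. A real fleet would also need to account for utilization, latency targets, and input versus output tokens, which this sketch ignores.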

Sources

Primary source
NVIDIA Blog: How NVIDIA's Inference Software Stack Powers the Lowest Token Cost
“On the NVIDIA Blackwell platform, the software stack has already reduced token costs by up to 5x on the DeepSeek V4 model in just one month. Combined, they increase throughput by up to 20x”