MON, JUN 29, 2026
Issue Nº 69
aiexpert
Running the wire
Funding Baidu's Kunlunxin targets $50B Hong Kong IPO, ties chip purchases to allocations Funding Momenta launches Hong Kong IPO targeting $751M for autonomous driving R&D Chips HBM now comprises 35-47% of AI accelerator BOM; GB200 HBM alone costs $4,800/unit Market Samsung HBM4 revenue tops $1 billion; targets $10 billion run-rate by end-2026 Chips OpenAI, Broadcom unveil Jalapeño LLM inference chip; gigawatt-scale deployment targeted by end-2026 Market TSMC warns AI chip shortage to persist into 2027; signals 15% 3nm price increase H2 2026 Research DeepSeek V4 DSpark speculative decoding cuts inference latency 85%, hits Together AI Breaking OpenAI launches $150M Partner Network to certify 300K consultants by year-end Breaking HP becomes flagship Frontier adopter; OpenAI scales enterprise AI agent platform with consulting partnerships Breaking Apple lobbies White House for CXMT DRAM approval as memory costs hit 20% MacBook, iPad price hikes Funding Samsung, SK Hynix plan $1.3T capex over decade on AI memory demand Breaking Lenovo, NVIDIA Partner on AI Cloud Gigafactory; Reduce Inference Server Deployment Timelines from Months to Weeks Chips Google TPUs Power Anthropic Expansion; Up to 1M Ironwood Chips Lock $40B Multi-Gigawatt Capacity Deal Through 2027+ Chips NVIDIA confirms Vera Rubin full production; Rubin GPU leads AgentPerf with 20x efficiency over Hopper Policy FERC orders grid operators to fast-track AI data center connections; 60-day deadline to justify or rewrite tariffs Chips Coherent CHIPS grant $50M for indium phosphide fab expansion; quadruples Sherman wafer output for AI optical networking Chips NVIDIA partners with SK Hynix on next-gen AI memory; codeveloping for Vera Rubin and autonomous fabs Chips TSMC CoWoS hits 98% yield; SoW-X roadmap supports 64 HBM stacks; co-packaged optics production 2026 Chips PNY DDR5-5600 32GB hits $379.99 — cheapest 2x16GB kit amid RAM crisis; 16% discount Chips TSMC 2nm mass production hits 70% yield; Apple, NVIDIA 
locked in through 2026
Research

DeepSeek V4 DSpark speculative decoding cuts inference latency 85%, hits Together AI

DeepSeek released DSpark, a speculative decoding framework for V4-Pro and V4-Flash, on June 27, 2026, claiming up to an 85% reduction in inference latency without new hardware or model retraining. Speculative decoding uses a smaller, cheaper model to draft several tokens ahead, then has the full model verify those drafts in a single pass. It trades extra draft and verification compute for fewer sequential decode steps on the full model, which lowers overall latency when most drafts are accepted. DeepSeek says the technique works across both its hosted API and self-hosted open weights, though independent benchmarks had not been published as of June 28. The speedup figures come from DeepSeek's own benchmarks, run on DeepSeek infrastructure against its own prior baseline (MTP-1), so the claims merit third-party verification before they inform production deployment planning.
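The draft-then-verify loop described above can be illustrated with a toy example. This is a minimal sketch of generic greedy speculative decoding, not DSpark's implementation, whose internals are not described here. The function names, the arithmetic "models", and the draft length `k` are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int,
                       k: int = 4) -> Tuple[List[Token], int]:
    """Greedy speculative decoding over toy models (illustrative only).

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is ONE batched forward pass of the full
    model; here the toy target is simply called once per position.
    """
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)

        # 2. The full model checks every drafted position in one round.
        target_passes += 1
        accepted: List[Token] = []
        ctx = list(out)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep target's token
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all drafts accepted: one bonus token
        out.extend(accepted)
    return out[:len(prompt) + max_new], target_passes


# Hypothetical toy models: the draft agrees with the target most of the time.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] * 3 + 1) % 17


def toy_draft(ctx: List[Token]) -> Token:
    guess = toy_target(ctx)
    return (guess + 1) % 17 if ctx[-1] % 5 == 0 else guess


def greedy_baseline(prompt: List[Token], max_new: int) -> List[Token]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(toy_target(out))
    return out


tokens, passes = speculative_decode(toy_target, toy_draft, [1], max_new=20)
```

With greedy verification the output is identical to decoding with the full model alone; only the number of sequential full-model rounds changes. Production systems that sample rather than decode greedily typically use a rejection-sampling acceptance rule instead, so that the output distribution still matches the full model.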

Together AI launched DeepSeek V4-Pro on its Serverless Inference platform on June 27-28, 2026, with cached input pricing aimed at cost-effective long-context reasoning. V4-Pro is a 1.6T-parameter MoE model (49B activated) that supports 512K context on Together (expandable to 1M on dedicated deployments), offers three reasoning modes (Non-Think, Think High, Think Max), and reports 90.1% on GPQA-Diamond and 95.2% on the HMMT-2026 math benchmark. The availability reflects a structural shift in open-source inference economics: models like V4-Pro now rival or exceed closed-source alternatives on agentic and coding tasks, at a cost per token competitive with smaller proprietary offerings once serving is optimized.
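Two pieces of arithmetic sit behind these figures: the share of parameters an MoE model activates per token, and the blended input cost once some of the prompt is served from cache. The sketch below uses the 1.6T and 49B figures reported above; the prices and the cache hit rate are hypothetical placeholders, since no per-token prices are quoted here.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in an MoE model (both in billions)."""
    return active_params_b / total_params_b


def blended_input_cost(input_tokens: int, cache_hit_rate: float,
                       price_per_m: float, cached_price_per_m: float) -> float:
    """Input cost in dollars when a share of prompt tokens hits the cache.

    price_per_m and cached_price_per_m are dollars per million tokens.
    Both are hypothetical inputs, not published Together AI prices.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    return (uncached * price_per_m + cached * cached_price_per_m) / 1_000_000


# Reported figures: 1.6T total parameters, 49B activated per token.
v4_pro_share = active_fraction(1600, 49)

# Hypothetical workload: 1M input tokens, 80% cache hits,
# $1.00 per million uncached tokens, $0.10 per million cached tokens.
example_cost = blended_input_cost(1_000_000, 0.8, 1.00, 0.10)
```

Under these placeholder prices the blended cost is dominated by the uncached 20% of tokens, which is why long-context agents that reuse a large shared prefix benefit most from cached input pricing.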

For teams evaluating open-source reasoning models for production agents and for work over long documents and large codebases, V4-Pro's availability on Together, plus the option to self-host, materially changes the build-vs-buy calculation. The combination of a hybrid attention architecture (which cuts KV cache by 90% vs V3.2 at 1M context), aggressive quantization (mixed FP4 and FP8), and DSpark speculative decoding suggests that V4's inference cost per token could undercut comparable closed-source workloads in 2027. Watch for third-party latency benchmarks: if independent testing confirms the claimed 85% latency reduction on production inference patterns, it reshapes the ROI on inference silicon (Jalapeño, B200) and on inference infrastructure purchasing decisions more broadly.
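When reading third-party benchmarks, it helps to convert a latency reduction into a speedup factor and to check how much of a request's latency the reduction actually touches. The sketch below is plain arithmetic under a stated assumption: that the gain applies only to some share of end-to-end latency, a hypothetical `affected_share` that would have to be measured per workload.

```python
def speedup_from_reduction(latency_reduction: float) -> float:
    """Convert a fractional latency reduction into a speedup factor."""
    if not 0.0 <= latency_reduction < 1.0:
        raise ValueError("latency_reduction must be in [0, 1)")
    return 1.0 / (1.0 - latency_reduction)


def end_to_end_reduction(affected_share: float, latency_reduction: float) -> float:
    """Overall latency reduction when only part of the request is sped up.

    affected_share is the fraction of end-to-end latency the optimization
    applies to. It is a hypothetical, workload-specific input.
    """
    if not 0.0 <= affected_share <= 1.0:
        raise ValueError("affected_share must be between 0 and 1")
    return affected_share * latency_reduction


# The claimed "up to 85%" reduction corresponds to roughly a 6.7x speedup.
claimed_speedup = speedup_from_reduction(0.85)

# Hypothetical workload where 60% of latency is in the affected phase.
overall = end_to_end_reduction(0.6, 0.85)
overall_speedup = speedup_from_reduction(overall)
```

In the hypothetical 60% case the end-to-end reduction is 51%, or about a 2x speedup, which is why benchmark results on production request mixes matter more than the headline figure.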

Sources