WED, JUL 01, 2026
Issue Nº 71
aiexpert
Research

OpenAI releases GeneBench-Pro; tests AI judgment on 129 multi-stage genomics problems; GPT-5.6 Sol Pro reaches 31.5%

<cite index="63-3,64-1">OpenAI released GeneBench-Pro, a 129-problem benchmark across 10 primary domains and 21 subdomains covering genomics, quantitative biology, and translational medicine. Each problem provides an agent with a realistic, deliberately noisy dataset and a target estimand tied to a downstream scientific or translational decision.</cite> <cite index="64-2">GeneBench-Pro probes what OpenAI calls 'research taste': the chain of judgment calls about which questions a dataset can support, when early diagnostics should change the model, and when a result is decision-ready.</cite> <cite index="61-1">OpenAI submitted 82 of the 129 problems to external domain experts including graduate students, postdoctoral researchers, industry scientists, and professors who assessed each problem's realism and whether the target answer was identifiable.</cite>

<cite index="63-2">GPT-5.6 Sol reaches a 28.7% pass rate at the maximum reasoning level, and GPT-5.6 Sol Pro reaches 31.5%; GPT-5.5 reaches 12%, GPT-5.4 reaches 8.9%, and Anthropic's Claude Opus 4.8 reaches 16%.</cite> <cite index="64-3">Test-time compute scaling shows that at the lowest reasoning level GPT-5.6 Sol scores in the single digits, while at the highest it solves roughly six times as many questions as GPT-5.2 using about two-thirds as many tokens.</cite> <cite index="63-2">Models often complete substantial portions of the workflow but show a consistent gap between noticing and acting: they identify local diagnostic signals but fail to propagate the implications to the corresponding analysis decisions, selecting the wrong estimators or persisting on incorrect paths.</cite>
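To put the reported figures in concrete terms, the short sketch below converts each pass rate into an equivalent number of the 129 problems and works out what "six times as many questions with two-thirds the tokens" implies per token. It is a back-of-the-envelope illustration, not OpenAI's scoring code: the helper names are hypothetical, and it assumes each pass rate is a plain fraction of the problem set and that the two token counts are measured the same way, neither of which the article confirms.

```python
# Hypothetical helpers for reading the reported GeneBench-Pro numbers.
# Assumption: each pass rate is a plain fraction of the 129-problem set.
N_PROBLEMS = 129  # size of the GeneBench-Pro problem set

# Pass rates as reported in the article.
REPORTED_PASS_RATES = {
    "GPT-5.6 Sol Pro": 0.315,
    "GPT-5.6 Sol": 0.287,
    "Claude Opus 4.8": 0.16,
    "GPT-5.5": 0.12,
    "GPT-5.4": 0.089,
}


def problem_equivalents(pass_rate: float, n_problems: int = N_PROBLEMS) -> float:
    """Convert a pass rate into an equivalent number of problems solved."""
    return pass_rate * n_problems


def relative_efficiency(solve_ratio: float, token_ratio: float) -> float:
    """Questions solved per token, relative to a baseline model.

    solve_ratio: questions solved vs. baseline (6.0 for "six times as many").
    token_ratio: tokens used vs. baseline (2/3 for "two-thirds the tokens").
    """
    return solve_ratio / token_ratio


if __name__ == "__main__":
    for model, rate in REPORTED_PASS_RATES.items():
        solved = problem_equivalents(rate)
        print(f"{model:>16}: {rate:.1%} ~ {solved:.1f} of {N_PROBLEMS} problems")
    # GPT-5.6 Sol at its highest reasoning level compared with GPT-5.2.
    print(f"Per-token efficiency vs. GPT-5.2: {relative_efficiency(6.0, 2 / 3):.1f}x")
```

Under these assumptions, GPT-5.6 Sol's 28.7% corresponds to about 37 problems, and the scaling comparison works out to roughly nine times as many solved questions per token as GPT-5.2. Several rates (31.5% is about 40.6 problems) do not map to whole problems, so the published figures may be averaged over runs; the article does not say how they are aggregated.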

<cite index="61-3">If agents can reliably automate this class of analysis, they could significantly accelerate scientific discovery. The limiting factor in biobank-scale genomic research is shifting from data generation to turning the information into actionable insights; models that can consistently perform analyses handled by teams of human experts could transform industrial research by accelerating hypothesis triage and target follow-up.</cite> For biotech teams and pharma researchers evaluating AI-for-science tools, GeneBench-Pro measures the capability that separates an agent that assists discovery from one that confidently produces wrong answers. With more than 60% of problems below a 20% pass rate, the benchmark leaves ample room for improvement before models saturate it.

Sources