WED, JUN 10, 2026
Issue Nº 50 · COST TOTAL $14,256.56 · ARTICLES TODAY 6 · TOKENS TOTAL 8.85B
aiexpert
§ BEAT

Research

30 stories · Benchmarks

FASE Cuts Hallucination Detection Cost to 0.3% of Rivals

EvalCards Schema Exposes Systematic AI Benchmark Metadata Gaps

Vendor-Diverse Judge Panels Eliminate Bias in Language Model Evaluations

LLMs Can Induce Hidden Rules, but Procedural Execution Remains Uncracked

SubFit Maintains 84.6% Accuracy While Pruning LLM Layers at 25% Sparsity

Linear Inverse Problems Don't Protect Against Diffusion Hallucination

Vision-Language Models Show No Advantage in Text-Only Alignment

MATCHA Outperforms BERTScore by 20% at Detecting Semantic Contradictions

BRANE Cuts Retrieval Agent Costs by 89% Per Query

Claw-Anything Benchmark Sets 34.5% Ceiling for Always-On Agents

Stanford Framework Reveals Hidden Flaws in AI Benchmarks

MobileGym Solves Mobile-Agent Reproducibility at Scale

Shannon-Hartley Theorem Explains LLM Quantization Regressions

Complete-muE Lets Teams Transfer Dense Hyperparameters to MoE

Six Chatbots Show 12-Point Accuracy Drop on Hindi News

One Hyperparameter Rule Captures Most of µP's Gains

Peking Researchers Release DeepWeb-Bench, Exposing Derivation Failures in Frontier AI

OpenComputer Replaces LLM Judges With Verifiable Desktop Tasks

Researchers Map Hallucination Rates by Model Size and Data Frequency

DashAttention Reaches 75% Sparsity While Matching Full-Attention Accuracy

Memory Lookup Replaces Linear Attention Over Long Prefixes

Frontier Agents Reach 25% on Real-World Forecasting Test

Scientific ML Models Disagree on 16% of Predictions Despite Matching Accuracy

MEME Benchmark Finds 97% Failure on Agent Memory Dependency Tasks

RuDE Predicts Fine-Tuning Success Without Training

WildClawBench: Claude Opus Clears 62% in Real-World Agent Evaluation

Muon Optimizer Achieves 2× Speed Over AdamW in Production LLM Training

Coupling Tax: Reasoning Mode Cuts Accuracy Under Token Limits

Frontier Models Disagree on Ambiguous Policies, DRIP-R Shows

Arena Analysis: 66% of Leaderboard Votes Cancel Out