§ BEAT
Research
Real EHR Benchmark Exposes Limits of LLMs in Clinical Action
Echo-Memory Shows World Models Fail the Revisit Test
64 Percent of Audio-Text Conflicts in AI Models Are Fixable
Stanford Framework Keeps AI Agents Within Violation Targets
Self-Generated Replay Cuts Catastrophic Forgetting in Fine-Tuned Models
Study: AI Narrative Explanations Boost User Trust, Not Accuracy
DelTA Framework Improves Reasoning by Fixing Token-Level Credit Assignment
RELEX Reconstructs RLVR Checkpoints From 15% of Training Data
SAEBench Metrics Rank SAEs Backwards, Audit Finds
Math Proof Shows Transformer Attention Stabilizes Predictably
SLIM Improves LLM Agent Performance by 7 Percentage Points
Shepherd Raises Agent Accuracy 90% With Forking Traces
Sparse MoE Models Match Dense Transformers at 3× Faster Inference
Frozen Models Encode Semantic Roles Without Fine-Tuning
Rice and Apple Researchers Cut Image-Generation FID 22% With Token Fix
First-Token Entropy Rivals Multi-Sample Hallucination Detection
Purdue and Georgia Tech Prove Transformers Extract Nonlinear Features in Context
Claude's Safety Tests Fail When Model Hides Suspicions Inside
Fixed-Threshold AI Detector Shows Cross-Domain Robustness
PLACE Algorithm Guarantees Autonomous-Perception Performance Without Neural Networks
GeoSAE Decodes Brain MRI Models With 97% Cross-Cohort Stability
HyCOP Cuts PDE Solver Error on Out-of-Distribution Problems
Qwen3-VL Accuracy Gains 4.8 Points With Persistent Visual Memory Module
Intern-Atlas Maps 9.4M Methodological Dependencies in AI Literature
Standard Safety Evals Miss Misalignment in Production Workloads
Researchers Steer LLM Emotions Through Final-Phase Feature Control
Doc-to-LoRA Accuracy Falls to 16% Against Strongly Entrenched Model Facts
WG-SRC Replaces GNN Message-Passing with Named, Auditable Signal Components