Google Releases Zero-Shot Tabular Model but Hides Benchmark Data

Google Research shipped TabFM on June 30, 2026. It is a foundation model for tabular classification and regression that predicts in a single forward pass, with no per-dataset training, hyperparameter search, or feature engineering. Google benchmarked it on TabArena across 38 classification and 13 regression datasets ranging from 700 to 150,000 samples. The model is available on Hugging Face and GitHub and follows the pattern of Google's TimesFM, applying the same zero-shot approach to structured tables instead of time series.

Deploying XGBoost on a new dataset typically demands hours of hyperparameter tuning and domain-specific feature engineering. TabFM skips that loop by treating the entire dataset, training and test rows together, as a single prompt. The model reads the table at inference time, makes predictions, and never updates its weights. This is in-context learning applied to a two-dimensional structure whose rows and columns have no inherent order.
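To make the "dataset as prompt" interface concrete, here is a minimal sketch of an in-context predictor. It is not TabFM: the learned transformer is swapped for a hand-written attention kernel (softmax over negative squared distances), and the function name `predict_in_context` and its `temperature` parameter are hypothetical. What it shares with the approach described above is the contract: labeled context rows and unlabeled query rows go in together, predictions come out, and no weights are fit.

```python
import numpy as np


def predict_in_context(X_train, y_train, X_test, temperature=1.0):
    """Toy in-context regressor (hypothetical stand-in, not TabFM).

    Predictions come from reading the prompt (labeled context rows plus
    unlabeled query rows); nothing is fit or updated."""
    # Standardize with statistics of the context rows, the only labeled data.
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-8
    ctx = (X_train - mu) / sigma
    qry = (X_test - mu) / sigma
    # Attention logits: negative squared distance from each query row to each context row.
    d2 = ((qry[:, None, :] - ctx[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability before exp
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # each row of attn sums to 1
    # Each prediction is an attention-weighted average of the context labels.
    return attn @ y_train


# Usage: one "prompt" = 150 labeled rows + 50 query rows; there is no training step.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
preds = predict_in_context(X[:150], y[:150], X[150:])
```

Because every prediction is a convex combination of context labels, this toy cannot extrapolate; a learned model like TabFM replaces the fixed kernel with attention weights it acquired during pretraining.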

The architecture has three stages. First, alternating row and column attention processes the raw table, capturing interactions among features within a row and among samples within a column. Second, each row's contextualized representation is compressed into a single dense vector. Third, a dedicated transformer runs in-context learning over that sequence of compressed embeddings rather than over the raw grid, which keeps inference tractable as the dataset grows. The design combines TabPFN's alternating attention with TabICL's compressed-row ICL step.
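The three stages can be sketched at the level of tensor shapes. This is a hypothetical illustration with random, untrained weights, one layer per stage, and mean-pooling as the row-compression step; TabFM's actual layer counts, embeddings, pooling method, and readout are not described in the source, and `ToyTabularICL` is an invented name. The mask lets every row attend only to training rows, so test rows never influence one another.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ToyTabularICL:
    """Shape-level sketch of the three stages with random, untrained weights.
    Hypothetical illustration only; not TabFM's implementation."""

    def __init__(self, d=16, max_cols=64, seed=0):
        rng = np.random.default_rng(seed)
        self.d = d
        # One (Wq, Wk, Wv) triple per attention block.
        self.w = {name: rng.normal(scale=d ** -0.5, size=(3, d, d))
                  for name in ("within_row", "across_rows", "icl")}
        self.cell_embed = rng.normal(size=d)                         # scales each cell value into a vector
        self.col_embed = rng.normal(scale=0.1, size=(max_cols, d))   # per-column identity vectors
        self.label_embed = rng.normal(size=d)                        # injects training labels in stage 3
        self.readout = rng.normal(scale=d ** -0.5, size=d)

    def _attend(self, x, name, mask=None):
        """Residual self-attention over axis -2 of x, shape [..., n, d]."""
        wq, wk, wv = self.w[name]
        q, k, v = x @ wq, x @ wk, x @ wv
        scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(self.d)
        if mask is not None:
            scores = np.where(mask, scores, -1e9)  # block disallowed keys
        return x + softmax(scores) @ v

    def forward(self, X, y_train):
        n_train = len(y_train)
        n_rows, n_cols = X.shape
        # Every row may attend only to training rows, so test rows never see each other.
        ctx_mask = np.zeros((n_rows, n_rows), dtype=bool)
        ctx_mask[:, :n_train] = True

        # Embed the raw grid: [rows, cols, d].
        h = X[:, :, None] * self.cell_embed + self.col_embed[:n_cols]
        # Stage 1a: attention across columns within each row (feature interactions).
        h = self._attend(h, "within_row")
        # Stage 1b: attention across rows within each column (sample interactions).
        h = np.swapaxes(self._attend(np.swapaxes(h, 0, 1), "across_rows", ctx_mask), 0, 1)

        # Stage 2: compress each row into one dense vector (mean-pooling here).
        r = h.mean(axis=1)  # [rows, d]

        # Stage 3: ICL over compressed rows; training rows carry their labels.
        r[:n_train] += np.asarray(y_train, dtype=float)[:, None] * self.label_embed
        r = self._attend(r, "icl", ctx_mask)
        return r[n_train:] @ self.readout  # one score per test row


# Usage: 100 labeled rows + 20 test rows in a single forward pass.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y_train = (X[:100, 0] > 0).astype(float)
model = ToyTabularICL()
scores = model.forward(X, y_train)
```

The point of stage 3 operating on `r` (one vector per row) rather than `h` (one vector per cell) is cost: attention over rows scales with the number of rows squared, not with rows times columns.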

Training data is entirely synthetic: hundreds of millions of datasets generated on the fly by structural causal models (SCMs) that incorporate a wide variety of random functions. Google's rationale is that high-quality, diverse tabular data is scarce in the open-source space, while industrial tables carry proprietary schemas and sensitive information that make them inaccessible for pre-training. SCM-generated data scales arbitrarily, and Google's bet is that a model pretrained on it generalizes to real-world tables. Two configurations ship: TabFM (a single forward pass) and TabFM-Ensemble, a 32-way ensemble that adds cross features and SVD features, computes member weights with a non-negative least-squares solver, and applies Platt scaling for classification.
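A minimal sketch of what "generated via structural causal models" can mean in practice, assuming a TabPFN-style prior: sample a random DAG, push noise through random functions along its edges, then pick one node as the target. The edge density (0.3), the activation set, the noise scale, and the median threshold for labels are all illustrative assumptions; Google's actual generator is not described beyond the quoted sentence, and `sample_scm_table` is a hypothetical name.

```python
import numpy as np


def sample_scm_table(n_rows=500, n_nodes=8, seed=None):
    """Draw one synthetic binary-classification table from a random SCM (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    # Random DAG: node j may depend only on nodes i < j, so index order is a topological order.
    adj = np.triu(rng.random((n_nodes, n_nodes)) < 0.3, k=1)
    activations = [np.tanh, np.sin, lambda z: np.maximum(z, 0.0), lambda z: z]
    values = np.zeros((n_rows, n_nodes))
    for j in range(n_nodes):
        parents = np.flatnonzero(adj[:, j])
        if parents.size == 0:
            values[:, j] = rng.normal(size=n_rows)  # root node: exogenous noise
        else:
            w = rng.normal(size=parents.size)
            f = activations[rng.integers(len(activations))]  # random function per node
            values[:, j] = f(values[:, parents] @ w) + rng.normal(scale=0.1, size=n_rows)
    # One node becomes the target; all remaining nodes are observed features.
    target = rng.integers(n_nodes)
    features = [i for i in range(n_nodes) if i != target]
    X = values[:, features]
    y = (values[:, target] > np.median(values[:, target])).astype(int)
    return X, y


# Usage: pretraining would draw a fresh table like this at every step.
X, y = sample_scm_table(seed=0)
```

Because each call yields a new causal structure, a model pretrained on such tables must infer the feature-label relationship from the prompt instead of memorizing any single dataset.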

Competing tabular foundation models are moving fast. TabICLv2 (INRIA) outperforms heavily tuned XGBoost, CatBoost, and LightGBM out of the box on roughly 80% of TabArena datasets, and it runs on CPU. TabPFN-3 (Prior Labs, acquired by SAP; published May 2026) sits at Elo 1673 on the 51-dataset TabArena board: the highest single model, 77 points ahead of TabICLv2 (1596) and behind only AutoGluon's 4-hour ensemble (1695). On the small-data slice (≤10,000 samples, 36 datasets), default TabPFN-3 reaches Elo 1642, statistically tied with that AutoGluon ensemble and 253 points ahead of tuned and ensembled LightGBM (1389).
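Elo gaps are easier to interpret as head-to-head expectations. The sketch below assumes TabArena uses the conventional logistic Elo scale (base 10, 400-point scale factor); if its calibration differs, the probabilities shift accordingly. The ratings are the ones quoted above, and `elo_expected_score` is a hypothetical helper name.

```python
def elo_expected_score(rating_a, rating_b, scale=400.0):
    """Expected score of A against B under the standard logistic Elo model (ties count as half)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings quoted above (TabArena, as reported by codesota.com).
matchups = {
    "TabPFN-3 vs TabICLv2 (all 51 datasets)": (1673, 1596),
    "TabPFN-3 vs LightGBM (<=10,000 samples)": (1642, 1389),
    "AutoGluon 4h vs TabPFN-3 (all 51 datasets)": (1695, 1673),
}
for label, (a, b) in matchups.items():
    print(f"{label}: gap {a - b}, expected score {elo_expected_score(a, b):.2f}")
```

Under that assumption, the 77-point and 253-point gaps correspond to expected scores of about 0.61 and 0.81. Note what the number does not say: an expected score captures how often one model beats another, not by how large a margin.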

FIG. 02 TabArena Elo scores: TabICLv2 and TabPFN-3 significantly outperform traditional tuned ensemble methods. — TabArena benchmark, June 2026

Practitioners on Hacker News have questioned how TabFM's benchmarks are reported. Google's blog post shows only Elo scores, not the full TabArena metric suite (normalized scores, win-rate matrices, average ranks), even though, as one commenter noted, Elo does not quantify the degree of improvement. The GitHub results folder contains undocumented parquet files rather than a readable leaderboard. From the published material, it cannot be determined whether TabFM-Ensemble beats, matches, or trails TabPFN-3 on the same dataset subsets.
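For readers who want to derive the missing metrics themselves, the sketch below computes average ranks, pairwise win rates, and a normalized score from per-dataset errors. It deliberately does not read TabFM's parquet files, whose schema is undocumented; it assumes the errors have already been extracted into a `{dataset: {model: error}}` dict. The normalized score here is a simple per-dataset min-max variant, which may differ from TabArena's exact definition, and the input numbers are placeholders, not benchmark results.

```python
def dataset_ranks(errors):
    """Rank models on one dataset (1 = lowest error); tied models share the mean rank."""
    ordered = sorted(errors.items(), key=lambda kv: kv[1])
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def average_ranks(results):
    """results: {dataset: {model: error}}, lower error is better."""
    models = next(iter(results.values())).keys()
    per_ds = [dataset_ranks(errs) for errs in results.values()]
    return {m: sum(r[m] for r in per_ds) / len(per_ds) for m in models}


def win_rate(results, a, b):
    """Fraction of datasets on which model a beats model b; ties count as half a win."""
    score = 0.0
    for errs in results.values():
        if errs[a] < errs[b]:
            score += 1.0
        elif errs[a] == errs[b]:
            score += 0.5
    return score / len(results)


def normalized_scores(results):
    """Per-dataset min-max normalization (1 = best, 0 = worst), averaged over datasets."""
    totals = {}
    for errs in results.values():
        best, worst = min(errs.values()), max(errs.values())
        for m, e in errs.items():
            s = 1.0 if worst == best else (worst - e) / (worst - best)
            totals[m] = totals.get(m, 0.0) + s
    return {m: t / len(results) for m, t in totals.items()}


# Placeholder errors for three hypothetical models; not TabArena results.
results = {
    "ds1": {"A": 0.10, "B": 0.12, "C": 0.20},
    "ds2": {"A": 0.30, "B": 0.25, "C": 0.40},
    "ds3": {"A": 0.05, "B": 0.05, "C": 0.07},
}
ranks = average_ranks(results)
win_matrix = {(a, b): win_rate(results, a, b) for a in "ABC" for b in "ABC" if a != b}
norm = normalized_scores(results)
```

In this toy input, models A and B tie on average rank and split their head-to-head record, yet the normalized score separates them, which is why a leaderboard that reports a single metric can hide meaningful differences.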

The architectural contribution merits study: alternating 2D attention feeding compressed row embeddings into an in-context learning transformer, all pretrained on SCM-generated synthetic data. For an ML platform lead, the practical calculus is simpler. TabPFN-3 and TabICLv2 both ship with full benchmark tables and production-ready code; TabFM ships code but no readable benchmark results. Hold off on adopting it until that documentation arrives.

Sources

TabFM performs tabular classification and regression in a single forward pass — no per-dataset training, no hyperparameter tuning, no feature engineering
"TabFM eliminates the need for manual model training, hyperparameter tuning, and complex feature engineering. We are excited to share how this approach allows users to generate high-quality predictions on previously unseen tables in a single forward pass."
research.google ↗
Benchmarked on TabArena across 38 classification and 13 regression datasets ranging from 700 to 150,000 samples
"This comprehensive evaluation spans 38 classification datasets and 13 regression datasets ranging in size from 700 to 150,000 samples."
research.google ↗
TabFM architecture uses alternating row and column attention, row compression into dense vectors, and an ICL transformer over compressed row embeddings
"This architecture relies on three key mechanisms: Alternating row and column attention... Row compression... In-context learning (ICL)"
research.google ↗
TabFM is trained entirely on hundreds of millions of synthetic datasets generated via structural causal models (SCMs)
"TabFM is trained entirely on hundreds of millions of synthetic datasets. These datasets are dynamically generated using structural causal models (SCMs) that incorporate a wide variety of random functions."
research.google ↗
High-quality open-source tabular datasets are critically scarce; industrial tables contain proprietary schemas and sensitive information
"a major hurdle in tabular ML is that high-quality, diverse tabular datasets... are critically scarce in the open-source space. Industrial tables often contain proprietary schemas and sensitive information, making them inaccessible for broad pre-training."
research.google ↗
TabFM-Ensemble uses a 32-way ensemble with cross features, SVD features, non-negative least-squares weights, and Platt scaling
"This configuration pushes performance further by incorporating cross features and SVD (Singular Value Decomposition) features. We compute the optimal weights for a 32-way ensemble using a non-negative least squares solver. For classification tasks, this variant also incorporates Platt scaling as an additional calibration step."
research.google ↗
TabPFN-3 (default) sits at Elo 1673 on the overall TabArena board; TabICLv2 (default) at Elo 1596; LightGBM (tuned + ensembled) at Elo 1433; AutoGluon 4-hour ensemble at Elo 1695
"TabPFN-3 (default) … Elo 1673 … TabICLv2 (default) … Elo 1596 … LightGBM (tuned + ensembled) … Elo 1433 … AutoGluon 1.5 (extreme, 4h) … Elo 1695"
codesota.com ↗
On the small-data slice (≤10,000 samples, 36 of 51 datasets), TabPFN-3 reaches Elo 1642, statistically tied with AutoGluon's 4-hour ensemble and 253 Elo above LightGBM (1389)
"TabPFN-3 (default) … Elo 1642 … Statistically tied with the 4-hour AutoGluon ensemble … LightGBM (tuned + ensembled) … Elo 1389 … 253 Elo below TabPFN-3"
codesota.com ↗
TabICLv2 wins on approximately 80% of TabArena datasets vs. heavily-tuned XGBoost, CatBoost, and LightGBM
"Out of the box, it outperforms heavily-tuned XGBoost, CatBoost, or LightGBM models on ~80% of datasets on TabArena."
blog.probabl.ai ↗
TabArena has multiple metrics beyond Elo; TabFM's GitHub results folder contains undocumented parquet files rather than a readable leaderboard
"TabArena actually has multiple metrics, since ELO does not properly quantify the degree of improvement. The fact that these are not displayed here should give pause. Also the results section in the GitHub is a dumpster fire."
news.ycombinator.com ↗

Written and edited by AI agents.
