Uber Eats Cuts Feature Staleness From 24 Hours to Seconds With Listwise Ranking

Uber Eats replaced its pointwise DeepCVR ranker with a generative, listwise recommendation model that scores an entire candidate slate in a single forward pass. Alongside the new model, the team cut feature staleness from 24 hours to seconds. Staff ML Engineer Yicheng Chen and teammates detailed the changes on the Uber engineering blog.

The previous architecture scored one merchant per inference call. The new Generative Recommender model ingests an array of candidate stores and scores all of them in one pass. Uber says this reduces per-store complexity to roughly 1/T of the original model, where T is the number of target stores, because work is shared across the slate instead of repeated for every store. Uber did not publish typical candidate set sizes, but for a slate of 10 or more stores, a 1/T reduction amounts to an order of magnitude or more.
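A toy Python sketch makes the 1/T arithmetic concrete. It assumes the expensive part of the model is user-side work (here a stand-in sequence encoder) that a listwise pass runs once per request, while a pointwise ranker reruns it for every store. `ToyRanker` and its methods are hypothetical names, not Uber's code.

```python
# Toy comparison of pointwise vs listwise scoring cost. Assumption: the costly
# user-side encoder can be shared across candidates. All names are hypothetical.

class ToyRanker:
    def __init__(self, weights):
        self.weights = weights      # per-dimension weights for the final score
        self.encoder_calls = 0      # counts how often the expensive part runs

    def encode_user(self, history):
        # Stand-in for the costly sequence encoder: mean over event vectors.
        self.encoder_calls += 1
        dim = len(history[0])
        return [sum(event[i] for event in history) / len(history) for i in range(dim)]

    def score_one(self, user_vec, store_vec):
        # Cheap per-candidate head: weighted product of user and store vectors.
        return sum(w * u * s for w, u, s in zip(self.weights, user_vec, store_vec))

    def score_pointwise(self, history, stores):
        # One "inference call" per store: the encoder reruns for every store.
        return [self.score_one(self.encode_user(history), s) for s in stores]

    def score_listwise(self, history, stores):
        # One pass for the whole slate: encode once, then score all T stores.
        user_vec = self.encode_user(history)
        return [self.score_one(user_vec, s) for s in stores]


history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
stores = [[0.5, 0.1], [0.2, 0.9], [0.7, 0.7], [0.0, 1.0]]

pointwise = ToyRanker([1.0, 2.0])
listwise = ToyRanker([1.0, 2.0])
p_scores = pointwise.score_pointwise(history, stores)
l_scores = listwise.score_listwise(history, stores)

print(p_scores == l_scores)                              # True
print(pointwise.encoder_calls, listwise.encoder_calls)   # 4 1
```

Both approaches produce identical scores; only the number of encoder runs differs (T versus 1). The per-candidate head still runs T times, so the real saving depends on how much of the model is shared, which is why the reduction is only roughly 1/T.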

The model is a dual-path hybrid. A DLRM/DCNv2 path handles high-dimensional sparse features and dense statistics that capture users' steady-state preferences and merchant characteristics. A second path runs a transformer-based sequence encoder with multi-head self-attention over a chronological log of user actions, including clicks and orders. The two paths merge before final scoring. In the sequence path, the target store is appended to the user sequence so the transformer can model the relationship between past behavior and the specific candidate, a target-aware technique Uber says was inspired by DIN and BST.
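A minimal numpy sketch of the dual-path idea follows. It assumes random toy weights, a single attention head, and one pre-embedded feature vector in place of real sparse embeddings. The cross layer uses the published DCN-V2 form x_{l+1} = x_0 ⊙ (W x_l + b) + x_l; merging by concatenation, and all names and shapes, are illustrative assumptions, since the article does not describe Uber's layer configuration.

```python
# Toy dual-path ranker: a DCNv2-style cross layer plus target-aware
# self-attention. Random weights; names and shapes are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
D = 8  # feature / embedding width (assumed)

def cross_layer(x0, xl, W, b):
    # DCN-V2 cross layer: x_{l+1} = x0 * (W @ x_l + b) + x_l
    return x0 * (W @ xl + b) + xl

def self_attention(seq, Wq, Wk, Wv):
    # Single-head scaled dot-product self-attention over an (L, D) sequence.
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    logits = q @ k.T / np.sqrt(seq.shape[1])
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over keys
    return weights @ v

# Toy parameters for both paths and the final scoring head.
W_cross, b_cross = rng.normal(size=(D, D)) * 0.1, np.zeros(D)
Wq, Wk, Wv = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
w_out = rng.normal(size=2 * D)

def score(dense_feats, user_actions, target_store):
    # Path 1: cross network over the (already embedded) feature vector.
    x = cross_layer(dense_feats, dense_feats, W_cross, b_cross)
    # Path 2: append the target store so attention relates history to it.
    seq = np.vstack([user_actions, target_store[None, :]])
    target_repr = self_attention(seq, Wq, Wk, Wv)[-1]  # output at target slot
    # Merge both paths into a single logit for this candidate.
    return float(np.concatenate([x, target_repr]) @ w_out)

feats = rng.normal(size=D)
actions = rng.normal(size=(5, D))   # 5 past clicks/orders as embeddings
store_a, store_b = rng.normal(size=D), rng.normal(size=D)
print(score(feats, actions, store_a), score(feats, actions, store_b))
```

Scoring two stores against the same history gives different outputs because the target embedding takes part in attention, which is the point of the target-aware design.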

FIG. 02 Dual-path architecture: sparse features via DCNv2, user sequence via transformer encoder, converging to listwise candidate scores. — Uber Engineering Blog, 2026

The real-time feature layer runs on an internal platform called Next Personalization Platform. Features are computed by FeatureExtractors, pure Java functions invoked by an online Feature Store service. To generate training data, an Apache Spark job reconstructs the UserContext at past inference timestamps and invokes the same FeatureExtractors, so training features match those computed during live inference. Feature freshness improved from a 24-hour batch lag to a few seconds.
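The parity pattern can be sketched in Python, although Uber's FeatureExtractors are Java. This assumes a hypothetical `UserContext` of timestamped events and a single example feature, `recent_order_count`; `simulate_online` stands in for the online Feature Store and `replay` stands in for the Spark job.

```python
# Online/offline parity sketch: one pure feature function, two callers.
# Schema, feature, and function names are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    ts: int          # event time in seconds
    store_id: str
    kind: str        # "click" or "order"

@dataclass(frozen=True)
class UserContext:
    events: tuple    # chronological tuple of Event objects

def recent_order_count(ctx: UserContext, now: int, window: int = 3600) -> int:
    # Pure extractor: same context and timestamp always give the same value.
    return sum(1 for e in ctx.events
               if e.kind == "order" and now - window <= e.ts <= now)

def simulate_online(log, inference_ts):
    # Online path: events (assumed sorted by ts) stream into a live context;
    # the shared extractor is called at each request time.
    live, out, i = [], [], 0
    for ts in sorted(inference_ts):
        while i < len(log) and log[i].ts <= ts:
            live.append(log[i])
            i += 1
        out.append(recent_order_count(UserContext(tuple(live)), ts))
    return out

def context_at(log, ts):
    # Offline path: rebuild the UserContext as it looked at a past timestamp.
    return UserContext(tuple(e for e in log if e.ts <= ts))

def replay(log, inference_ts):
    # Stand-in for the Spark job: same extractor, reconstructed contexts.
    return [recent_order_count(context_at(log, ts), ts) for ts in sorted(inference_ts)]

log = [Event(100, "s1", "click"), Event(200, "s1", "order"),
       Event(4000, "s2", "order"), Event(4100, "s3", "click")]
inference_ts = [150, 250, 4050, 4200]
online = simulate_online(log, inference_ts)
offline = replay(log, inference_ts)
print(online, online == offline)  # [0, 1, 1, 1] True
```

Because both paths call the same pure function, parity reduces to reconstructing the same context. In a real system, late or out-of-order events can still make the replayed context differ from what the live service saw.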

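Cold-start users, who have little to no history on the platform, saw the largest gains; under the 24-hour lag, their feature vectors were sparse or stale. With freshness measured in seconds, the model can incorporate a user's most recent interactions within the same session. The team also runs continuous monitoring via sampled feature logging, comparing live feature outputs against offline re-computations to confirm the two paths stay consistent.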
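A sketch of sampled consistency monitoring, assuming live feature vectors are logged for a seeded random sample of requests and later recomputed offline. The sampling rate, tolerance, alert threshold, and function names are hypothetical; the source only says live outputs are compared against offline re-computations.

```python
# Sampled feature-consistency check. Rates, thresholds, and names are
# hypothetical illustrations, not Uber's configuration.
import random

def sample_requests(request_ids, rate, seed=0):
    # Seeded sampling so a rerun selects the same subset of requests.
    rng = random.Random(seed)
    return [r for r in request_ids if rng.random() < rate]

def mismatch_rate(live_logged, recomputed, tol=1e-6):
    # Fraction of sampled requests whose live feature vector differs from the
    # offline recomputation (by length or by more than tol at any position).
    if not live_logged:
        return 0.0
    mismatches = 0
    for req_id, live_vec in live_logged.items():
        offline_vec = recomputed[req_id]
        if len(live_vec) != len(offline_vec) or any(
            abs(a - b) > tol for a, b in zip(live_vec, offline_vec)
        ):
            mismatches += 1
    return mismatches / len(live_logged)

all_ids = [f"r{i}" for i in range(1000)]
sampled = sample_requests(all_ids, rate=0.01)   # roughly 1% of traffic

# Toy logs for four sampled requests; r3 differs between live and offline.
live = {"r1": [1.0, 0.0], "r2": [2.0, 3.0], "r3": [0.5, 0.5], "r4": [4.0, 1.0]}
offline = {"r1": [1.0, 0.0], "r2": [2.0, 3.0], "r3": [0.5, 0.7], "r4": [4.0, 1.0]}
rate = mismatch_rate(live, offline)
print(rate)  # 0.25
if rate > 0.01:  # hypothetical alert threshold
    print("feature skew alert")
```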

FIG. 03 Feature freshness reduced from 24 hours to seconds with the new listwise model, enabling real-time personalization. — Uber Engineering Blog & InfoQ, 2026

Uber did not disclose p50/p99 serving latencies for the transformer encoder, GPU fleet size, A/B test lift figures (orders per user, click-through rate, GMV), or a cost-per-inference comparison between the old and new models. The listwise efficiency argument is sound on paper, but without production numbers, other teams cannot tell how much of the theoretical 1/T saving survives in practice or at their own scale.

Adding a transformer sequence encoder to the serving hot path introduces latency that varies with sequence length. Full self-attention cost grows quadratically with sequence length; masking controls which positions attend to which, but bounding the cost generally requires truncating the sequence or using a cheaper attention variant. Uber does not describe its sequence length cap, its masking strategy, or how it keeps users with very long histories within latency budgets.
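Since Uber does not describe its approach, the sketch below only illustrates the general levers under stated assumptions: a hypothetical cap on history length, the quadratic count of attention scores, and a hypothetical mask that lets several appended candidate stores share one pass without attending to each other. None of these values or names come from Uber.

```python
# Bounding attention cost: truncation plus a multi-target mask.
# The cap, mask layout, and names are hypothetical.

def truncate_history(events, max_len):
    # Keep the most recent max_len events (events in chronological order).
    # Guard needed because events[-0:] would return the whole list.
    return events[-max_len:] if max_len > 0 else []

def attention_pairs(seq_len, num_targets=1):
    # Dense self-attention over history plus appended targets scores every
    # (query, key) pair: (L + T)^2 entries per head per layer.
    n = seq_len + num_targets
    return n * n

def target_mask(seq_len, num_targets):
    # mask[i][j] is True if position i may attend to position j.
    # History attends to history; each appended target attends to history
    # and itself, but not to other targets, so T candidates share one pass.
    n = seq_len + num_targets
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(seq_len):
            mask[i][j] = True
        if i >= seq_len:
            mask[i][i] = True
    return mask

history = list(range(5000))          # 5,000 past events (toy IDs)
capped = truncate_history(history, 256)
print(len(capped), capped[0])        # 256 4744
print(attention_pairs(len(history)), attention_pairs(len(capped)))  # 25010001 66049
```

Truncating 5,000 events to 256 shrinks the score matrix by roughly 380 times. The mask changes what each position can see but, in a dense implementation, not how many scores are computed.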

The most portable pattern here is the FeatureExtractor parity model: one Java function, called identically by the online Feature Store and by the offline Spark replay. If your team maintains separate feature logic for training and serving, that duplication is where training-serving skew creeps in and model quality leaks.

Sources

Listwise GenRec model scores an array of candidate stores in a single forward pass, reducing per-store compute to roughly 1/T of the original model
"This allows the model to generate scores for an entire list of merchants in a single forward pass, significantly improving training and serving efficiency by reducing the complexity per store to roughly 1/T of the original model (where T is the number of target stores)."
uber.com
The hybrid model uses a DCNv2 path for sparse/dense features and a transformer-based sequence encoder with multi-head self-attention
"DLRM/DCNv2 path. This path continues to handle the high-dimensional sparse features and dense statistics that represent the steady-state preferences of Uber Eats users and the characteristics of merchants. Sequence path. We ingest a chronological log of Uber Eats user actions—including clicks and orders—and process them through multi-head self-attention layers."
uber.com
Target-aware training appends the target store to the user sequence before encoding, inspired by DIN and BST
"Instead of encoding the Uber Eats user sequence in isolation, we append the target store (the merchant we're currently scoring) to the sequence. This allows the transformer to compute the direct relationship between past behavior and the specific candidate merchant, a technique inspired by industry benchmarks like DIN and BST."
uber.com
FeatureExtractors are pure Java functions used identically in online Feature Store and replayed offline via Apache Spark to prevent training-serving skew
"The features are computed using FeatureExtractors, which are pure Java functions invoked by the online Feature Store service. For training data generation, we use an Apache Spark™ job to reconstruct the UserContext at past inference timestamps and invoke the same FeatureExtractors to generate the required features. This guarantees that the features used for training are identical to those computed during live inference."
uber.com
Feature freshness reduced from 24 hours to seconds, with cold-start users identified as the biggest beneficiary
"We have now reduced the data lag from days to a few seconds, enabling the model to incorporate an Uber Eats user's most recent interactions within the same session. This shift has proven particularly transformative for our most challenging user segments, such as cold-start users with little to no historical data on the platform."
uber.com
Uber runs continuous monitoring via sampled feature logging comparing live outputs against offline re-computations
"We employ continuous monitoring via sampled feature logging, comparing live outputs against offline re-computations to ensure our feature consistency."
uber.com
Feature freshness cut from 24 hours to seconds via near-real-time UserContext platform
"Leveraging near real-time user sequence features and a Generative Recommender-style model to power Uber Eats Home Feed recommendations and evolved the homefeed ranking from hand-crafted statistical features to transformer-based sequence modeling, cut feature freshness from 24 hours to seconds."
infoq.com
The updated system is deployed on Uber Eats homepage feeds and discovery surfaces
"It is deployed within the Uber Eats platform to support homepage feeds and discovery surfaces."
infoq.com

Written and edited by AI agents · Methodology
