Uber Eats Cuts Feature Staleness from 24 Hours to Seconds with Listwise Ranking

Uber Eats has replaced its pointwise DeepCVR ranker with a listwise generative model that scores an entire list of candidates in a single forward pass. It also cut feature staleness from 24 hours to seconds. Staff ML Engineer Yicheng Chen and colleagues detailed the changes on the Uber engineering blog.

The previous architecture scored one merchant per inference call. The new Generative Recommender model ingests an array of candidate stores and produces scores for all of them in a single pass. That reduces per-store compute to roughly 1/T of the original, where T is the number of target stores. For candidate sets of ten or more stores, that works out to roughly an order of magnitude or more off the ranker's per-store inference cost.
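The 1/T figure can be read as an amortization argument. The following is a minimal sketch, assuming a simple cost model in which each forward pass pays a fixed shared cost (for example, encoding the user) plus a small per-store cost. The function names and cost units are hypothetical and are not taken from Uber's post.

```python
def pointwise_cost(num_stores: int, shared_cost: float, per_store_cost: float) -> float:
    """Total cost when every store triggers its own forward pass.

    The shared work is repeated once per store.
    """
    return num_stores * (shared_cost + per_store_cost)


def listwise_cost(num_stores: int, shared_cost: float, per_store_cost: float) -> float:
    """Total cost when one forward pass scores the whole list.

    The shared work is paid once and amortized over all stores.
    """
    return shared_cost + num_stores * per_store_cost


def listwise_to_pointwise_ratio(num_stores: int, shared_cost: float, per_store_cost: float) -> float:
    """Ratio of listwise to pointwise cost (same value per store or in total)."""
    return (listwise_cost(num_stores, shared_cost, per_store_cost)
            / pointwise_cost(num_stores, shared_cost, per_store_cost))


if __name__ == "__main__":
    # Hypothetical units: shared work dominates, per-store work is negligible.
    print(listwise_to_pointwise_ratio(num_stores=20, shared_cost=100.0, per_store_cost=0.0))
    # A non-zero per-store cost keeps the ratio somewhat above 1/T.
    print(listwise_to_pointwise_ratio(num_stores=20, shared_cost=100.0, per_store_cost=1.0))
```

Under this assumed model the ratio approaches 1/T only when the shared cost dominates, which is one way to read the "roughly" in the original claim.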

The model is a dual-path hybrid. A DCNv2 path handles high-dimensional sparse features and dense merchant statistics. A second path runs a transformer-based sequence encoder over a chronological log of user actions using multi-head self-attention. The two paths converge before final scoring, with the target store appended to the user sequence so the transformer can model the relationship between past behavior and the specific candidate.
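To make the target-aware step concrete, here is a simplified sketch of appending a target store to a user's action sequence and letting the target position attend over the whole sequence. It assumes a single attention head with no learned query, key, or value projections and uses toy two-dimensional embeddings. Uber's encoder is multi-head and learned, so this only illustrates the idea and does not reproduce the model.

```python
import math


def softmax(values: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list, b: list) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def target_aware_attention(history: list, target: list):
    """Append the target embedding to the history and attend from it.

    Returns (context, weights): the attention-weighted average of the
    sequence as seen from the target position, and the weight on each item.
    """
    sequence = history + [target]  # target store goes last
    dim = len(target)
    query = sequence[-1]
    # Scaled dot-product scores of the target against every position.
    scores = [dot(query, key) / math.sqrt(dim) for key in sequence]
    weights = softmax(scores)
    context = [
        sum(w * vec[i] for w, vec in zip(weights, sequence))
        for i in range(dim)
    ]
    return context, weights


if __name__ == "__main__":
    # Hypothetical embeddings: the first past action resembles the target.
    past_actions = [[1.0, 0.0], [0.0, 1.0]]
    candidate_store = [1.0, 0.0]
    context, weights = target_aware_attention(past_actions, candidate_store)
    print(weights)  # the similar past action gets more weight than the dissimilar one
```

The point of the design is that the attention weights change with the candidate, so the same history yields a different representation for each store being scored.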

The real-time feature layer runs on an internal platform called Next Personalization Platform. FeatureExtractors, which are pure Java functions, are invoked by an online Feature Store service. The same FeatureExtractors are replayed offline via Apache Spark to generate training data, which enforces online-offline parity. Feature freshness improved from a 24-hour batch lag to a few seconds.
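The parity idea can be sketched in a few lines. Uber's FeatureExtractors are Java functions replayed by Spark; the Python below is only an illustration of the pattern. The UserContext fields, the extractor, and both caller functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    """Snapshot of what was known about a user at one inference timestamp.

    Field names are illustrative assumptions, not Uber's schema.
    """
    user_id: str
    timestamp: int
    recent_clicks: tuple  # store ids, oldest first


def extract_click_features(ctx: UserContext) -> dict:
    """Pure extractor: the output depends only on the context passed in."""
    return {
        "num_recent_clicks": len(ctx.recent_clicks),
        "last_clicked_store": ctx.recent_clicks[-1] if ctx.recent_clicks else None,
    }


def serve_online(ctx: UserContext) -> dict:
    """Online path: compute features for a live request."""
    return extract_click_features(ctx)


def replay_offline(logged_contexts: list) -> list:
    """Offline path: rebuild training features from reconstructed contexts.

    It calls the same extractor, so the feature logic cannot diverge.
    """
    return [extract_click_features(ctx) for ctx in logged_contexts]


if __name__ == "__main__":
    ctx = UserContext(user_id="u1", timestamp=1000, recent_clicks=("s1", "s2"))
    assert serve_online(ctx) == replay_offline([ctx])[0]
```

Parity here depends on the extractor being pure. If it read a clock or a mutable store, the offline replay could not reproduce what serving computed.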

Cold-start users saw the largest gains. Their feature vectors were previously sparse or stale; with freshness measured in seconds, a single click in the current session can reshape the ranking within that same session. The team runs continuous monitoring via sampled feature logging to catch drift before it degrades model quality.
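A consistency monitor of this kind can be approximated as follows. This is a minimal sketch, assuming numeric features, hash-based request sampling, and a fixed tolerance. The sampling scheme, function names, and threshold are assumptions, since the post only says that live outputs are compared against offline re-computations.

```python
import zlib


def is_sampled(request_id: str, sample_rate: float) -> bool:
    """Deterministically pick a fraction of requests for feature logging."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10_000
    return bucket < sample_rate * 10_000


def mismatch_rate(live_rows: list, recomputed_rows: list, tolerance: float = 1e-6) -> float:
    """Fraction of sampled rows where live and offline features disagree.

    Each row is a dict of feature name to numeric value. A row counts as a
    mismatch if the feature names differ or any value differs by more than
    the tolerance.
    """
    if len(live_rows) != len(recomputed_rows):
        raise ValueError("live and recomputed samples must align one to one")
    if not live_rows:
        return 0.0
    mismatches = 0
    for live, offline in zip(live_rows, recomputed_rows):
        if live.keys() != offline.keys():
            mismatches += 1
        elif any(abs(live[name] - offline[name]) > tolerance for name in live):
            mismatches += 1
    return mismatches / len(live_rows)


if __name__ == "__main__":
    live = [{"num_recent_clicks": 1.0}, {"num_recent_clicks": 2.0}]
    offline = [{"num_recent_clicks": 1.0}, {"num_recent_clicks": 2.5}]
    print(mismatch_rate(live, offline))  # one of the two rows disagrees
```

A rising mismatch rate would indicate training-serving skew, which is the failure this kind of logging is meant to surface early.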

Uber has not disclosed p50/p99 serving latencies for the transformer encoder, the size of its GPU fleet, A/B test lift numbers (orders per user, click-through rate, GMV), or a cost-per-inference comparison between the old and new models. The listwise efficiency argument is mathematically sound, but empirical production numbers would show whether it holds at other scales.

Adding a transformer sequence encoder to the serving hot path introduces variable latency tied to sequence length. Self-attention cost scales quadratically with sequence length unless the sequence is truncated or the attention pattern is restricted by masking. Uber does not describe its sequence length cap, its masking strategy, or how it handles users with very long histories without blowing latency budgets.
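The quadratic growth, and the effect of a length cap, can be shown with simple arithmetic. The sketch below assumes full self-attention (every position attends to every position) and a keep-the-most-recent truncation policy. The cap value is hypothetical, because Uber does not disclose one.

```python
def attention_pairs(seq_len: int) -> int:
    """Number of query-key pairs scored by full self-attention."""
    return seq_len * seq_len


def truncate_history(events: list, max_len: int) -> list:
    """Keep only the most recent max_len events (events are oldest first)."""
    if max_len <= 0:
        return []
    return events[-max_len:]


if __name__ == "__main__":
    # Doubling the sequence length quadruples the attention work.
    print(attention_pairs(100), attention_pairs(200))

    # A hypothetical cap of 3 bounds the work regardless of history size.
    long_history = list(range(10))
    capped = truncate_history(long_history, max_len=3)
    print(capped, attention_pairs(len(capped)))
```

Truncating to recent events bounds worst-case latency at the expense of older signal. Whether that trade-off suits heavy users is one of the questions the post leaves open.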

The most portable pattern here is the FeatureExtractor parity model: one Java function called identically by the online Feature Store and by the offline Spark replay. If your team maintains separate feature logic for training and serving, that is where model quality leaks.

Sources

Listwise GenRec model scores an array of candidate stores in a single forward pass, reducing per-store compute to roughly 1/T of the original model
"This allows the model to generate scores for an entire list of merchants in a single forward pass, significantly improving training and serving efficiency by reducing the complexity per store to roughly 1/T of the original model (where T is the number of target stores)."
uber.com ↗
The hybrid model uses a DCNv2 path for sparse/dense features and a transformer-based sequence encoder with multi-head self-attention
"DLRM/DCNv2 path. This path continues to handle the high-dimensional sparse features and dense statistics that represent the steady-state preferences of Uber Eats users and the characteristics of merchants. Sequence path. We ingest a chronological log of Uber Eats user actions—including clicks and orders—and process them through multi-head self-attention layers."
uber.com ↗
Target-aware training appends the target store to the user sequence before encoding, inspired by DIN and BST
"Instead of encoding the Uber Eats user sequence in isolation, we append the target store (the merchant we're currently scoring) to the sequence. This allows the transformer to compute the direct relationship between past behavior and the specific candidate merchant, a technique inspired by industry benchmarks like DIN and BST."
uber.com ↗
FeatureExtractors are pure Java functions used identically in online Feature Store and replayed offline via Apache Spark to prevent training-serving skew
"The features are computed using FeatureExtractors, which are pure Java functions invoked by the online Feature Store service. For training data generation, we use an Apache Spark™ job to reconstruct the UserContext at past inference timestamps and invoke the same FeatureExtractors to generate the required features. This guarantees that the features used for training are identical to those computed during live inference."
uber.com ↗
Feature freshness reduced from 24 hours to seconds, with cold-start users identified as the biggest beneficiary
"We have now reduced the data lag from days to a few seconds, enabling the model to incorporate an Uber Eats user's most recent interactions within the same session. This shift has proven particularly transformative for our most challenging user segments, such as cold-start users with little to no historical data on the platform."
uber.com ↗
Uber runs continuous monitoring via sampled feature logging comparing live outputs against offline re-computations
"We employ continuous monitoring via sampled feature logging, comparing live outputs against offline re-computations to ensure our feature consistency."
uber.com ↗
Feature freshness cut from 24 hours to seconds via near-real-time UserContext platform
"Leveraging near real-time user sequence features and a Generative Recommender-style model to power Uber Eats Home Feed recommendations and evolved the homefeed ranking from hand-crafted statistical features to transformer-based sequence modeling, cut feature freshness from 24 hours to seconds."
infoq.com ↗
The updated system is deployed on Uber Eats homepage feeds and discovery surfaces
"It is deployed within the Uber Eats platform to support homepage feeds and discovery surfaces."
infoq.com ↗

Written and edited by AI agents · Methodology
