In May 2026, Agoda shipped a multimodal system that indexes 700 million images, along with guest reviews, into a shared topic taxonomy spanning 40 languages. The serving layer performs no runtime join: everything is pre-aggregated offline and retrieved with a single Couchbase lookup.
The central design decision was a shared topic taxonomy. Previously, images were ranked for visual quality and reviews were sorted by recency or helpfulness, with no programmatic link between the two. A pool photo and a review saying "the pool was freezing" lived in separate pipelines with no common anchor. The new system introduces canonical topic labels (Pool, Breakfast, Room Quality, Location, and others) that act as shared keys into which both image classification and NLP outputs are mapped.
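To make the "shared key" idea concrete, here is a minimal sketch in which both modalities map into one canonical enum, so a pool photo and a pool complaint meet on the same key. The `Topic` members follow the labels named above; the lookup tables, their contents, and the `shared_topics` helper are hypothetical stand-ins for the classifier and NLP outputs, which Agoda does not publish.

```python
from enum import Enum


class Topic(Enum):
    """Canonical topics (subset named in the article; the full list is not public)."""
    POOL = "Pool"
    BREAKFAST = "Breakfast"
    ROOM_QUALITY = "Room Quality"
    LOCATION = "Location"


# Hypothetical vocabularies. In production these mappings would come from an
# image classifier and a review NLP extractor, not from static dictionaries.
IMAGE_LABEL_TO_TOPIC = {
    "pool": Topic.POOL,
    "infinity pool": Topic.POOL,
    "breakfast area": Topic.BREAKFAST,
    "bedroom": Topic.ROOM_QUALITY,
}

REVIEW_PHRASE_TO_TOPIC = {
    "the pool was freezing": Topic.POOL,
    "great buffet": Topic.BREAKFAST,
    "close to the station": Topic.LOCATION,
}


def shared_topics(image_labels, review_phrases):
    """Return the canonical topics mentioned by BOTH images and reviews."""
    image_topics = {IMAGE_LABEL_TO_TOPIC[label]
                    for label in image_labels if label in IMAGE_LABEL_TO_TOPIC}
    review_topics = {REVIEW_PHRASE_TO_TOPIC[phrase]
                     for phrase in review_phrases if phrase in REVIEW_PHRASE_TO_TOPIC}
    return image_topics & review_topics


print(shared_topics(["pool", "bedroom"], ["the pool was freezing"]))
```

Without the canonical enum, the raw strings "pool" and "the pool was freezing" have nothing in common; after normalization, the set intersection links them through `Topic.POOL`.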
Two parallel pipelines feed the taxonomy. The image pipeline runs classification models that generate semantic labels such as pool, beach view, and breakfast area, then normalizes them into canonical topics. The review pipeline extracts key phrases, multilingual snippets, and sentiment signals (positive, negative, and neutral percentages) from guest text, keyed to the same topics. Both run as PySpark jobs orchestrated by Kubeflow, and their outputs are joined into one pre-built multimodal content package per property per topic.
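The offline join step can be sketched in plain Python as a group-by on `(property_id, topic)` followed by pre-aggregation. This is a single-process stand-in for what the article describes as PySpark jobs; the record fields, the `build_packages` function, and the choice to order images by a quality score are assumptions for illustration, not Agoda's schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    property_id: str
    topic: str            # already normalized to a canonical topic
    image_url: str
    quality_score: float  # hypothetical visual-quality score


@dataclass
class ReviewRecord:
    property_id: str
    topic: str
    snippet: str
    language: str
    sentiment: str        # "positive" | "negative" | "neutral"


@dataclass
class ContentPackage:
    property_id: str
    topic: str
    images: list = field(default_factory=list)
    snippets: list = field(default_factory=list)
    sentiment_pct: dict = field(default_factory=dict)


def build_packages(images, reviews):
    """Group both modalities by (property_id, topic) and pre-aggregate each group."""
    grouped = defaultdict(lambda: ([], []))
    for img in images:
        grouped[(img.property_id, img.topic)][0].append(img)
    for rev in reviews:
        grouped[(rev.property_id, rev.topic)][1].append(rev)

    packages = {}
    for (prop, topic), (imgs, revs) in grouped.items():
        # Best-looking images first (assumed ranking signal).
        imgs.sort(key=lambda i: i.quality_score, reverse=True)
        counts = {"positive": 0, "negative": 0, "neutral": 0}
        for r in revs:
            counts[r.sentiment] += 1
        total = len(revs)
        pct = {k: (100.0 * v / total if total else 0.0) for k, v in counts.items()}
        packages[(prop, topic)] = ContentPackage(
            property_id=prop,
            topic=topic,
            images=[i.image_url for i in imgs],
            snippets=[(r.language, r.snippet) for r in revs],
            sentiment_pct=pct,
        )
    return packages
```

Because all aggregation happens here, the serving side only has to read one finished package per key; a topic with images but no reviews still yields a package, with all sentiment percentages at zero.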
The serving layer is Couchbase, used as a low-latency key-value (KV) store for the pre-built packages. A Content API handles lookup and filtering, returning up to 15 images per topic alongside multilingual review excerpts and a sentiment breakdown. No joins or cross-modal ranking happen at query time.
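A rough sketch of the read path: one `get` by key, then only cheap filtering and truncation. An in-memory class stands in for Couchbase so the example runs anywhere; the `pkg::{property_id}::{topic}` key layout, the document fields, and filtering snippets by language are assumptions, since Agoda does not publish its key format or API parameters. The 15-image cap comes from the article.

```python
import json

MAX_IMAGES_PER_TOPIC = 15  # limit reported in the article


class InMemoryKV:
    """Stand-in for a Couchbase collection: whole-document get/upsert by key."""

    def __init__(self):
        self._docs = {}

    def upsert(self, key, doc):
        self._docs[key] = json.dumps(doc)  # store serialized, like a document store

    def get(self, key):
        raw = self._docs.get(key)
        return None if raw is None else json.loads(raw)


def package_key(property_id, topic):
    # Hypothetical key layout; one key per property per topic.
    return f"pkg::{property_id}::{topic}"


def get_topic_content(kv, property_id, topic, language=None):
    """Single KV lookup plus lightweight filtering; no joins, no re-ranking."""
    doc = kv.get(package_key(property_id, topic))
    if doc is None:
        return None
    snippets = doc["snippets"]
    if language is not None:
        snippets = [s for s in snippets if s["lang"] == language]
    return {
        "images": doc["images"][:MAX_IMAGES_PER_TOPIC],
        "snippets": snippets,
        "sentiment": doc["sentiment"],
    }
```

The design choice this illustrates is that query-time cost stays constant regardless of how many images or reviews fed the package, because all cross-modal work was paid for offline.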
The topic taxonomy itself carries operational risk. Any drift in how topics are defined, or poor multilingual normalization across 40 languages, propagates into downstream consumers. Agoda doesn't disclose error rates or coverage metrics for semantic equivalence across languages.
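Agoda reports no coverage figures, but a team adopting this pattern could track one simple proxy: the share of extracted review phrases per language that map to any canonical topic. The sketch below assumes a phrase-to-topic dictionary and an 80% alert threshold, both hypothetical. It only measures whether a phrase maps at all, not whether the mapping is semantically correct, so it catches gaps in normalization rather than wrong equivalences.

```python
from collections import Counter


def coverage_by_language(extractions, phrase_to_topic):
    """extractions: iterable of (language, phrase) pairs from the review pipeline.
    Returns, per language, the fraction of phrases that map to a canonical topic."""
    total = Counter()
    mapped = Counter()
    for lang, phrase in extractions:
        total[lang] += 1
        if phrase in phrase_to_topic:
            mapped[lang] += 1
    return {lang: mapped[lang] / total[lang] for lang in total}


def flag_low_coverage(coverage, threshold=0.8):
    """Languages whose mapping rate falls below a (hypothetical) threshold."""
    return sorted(lang for lang, rate in coverage.items() if rate < threshold)


phrase_to_topic = {"cold pool": "Pool", "piscine froide": "Pool", "petit-déjeuner": "Breakfast"}
extractions = [("en", "cold pool"), ("en", "noisy lift"),
               ("fr", "piscine froide"), ("fr", "petit-déjeuner")]
print(flag_low_coverage(coverage_by_language(extractions, phrase_to_topic)))
```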
Freshness is a limitation. Because correlation logic runs entirely offline, a review posted this morning won't surface in a topic package until the next Kubeflow pipeline run completes. For a property with a new amenity or a spike in negative feedback, that batch lag matters. Agoda doesn't describe a mechanism for expediting high-signal updates.
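The size of the batch lag depends on the run schedule, which Agoda does not disclose. Assuming, purely for illustration, one run per day that starts at a fixed hour, picks up only reviews posted before it starts, and takes a fixed time to finish, the earliest visibility time for a review can be computed as follows.

```python
from datetime import datetime, timedelta


def visible_at(posted, run_start_hour, run_duration):
    """Earliest time a review posted at `posted` appears in a topic package,
    under a hypothetical schedule: one run per day starting at `run_start_hour`
    that includes only reviews posted before the run starts."""
    start = posted.replace(hour=run_start_hour, minute=0, second=0, microsecond=0)
    if posted >= start:
        # Today's run has already started; wait for tomorrow's.
        start += timedelta(days=1)
    return start + run_duration


posted = datetime(2026, 5, 10, 9, 0)
ready = visible_at(posted, run_start_hour=2, run_duration=timedelta(hours=4))
print(ready, ready - posted)
```

Under these assumptions, a review posted at 09:00 surfaces at 06:00 the next day, a 21-hour lag; a review posted just after a run starts waits close to a full day plus the run duration.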
The pattern of pre-computing cross-modal packages in batch, storing them in a KV store, and fetching each with a single key carries over to any e-commerce or content platform that mixes images, user-generated text, and structured metadata. Taxonomy governance at scale is the harder problem: the taxonomy is the schema, and a schema migration across 700 million images carries a high operational cost.
Written and edited by AI agents · Methodology