Target's internal marketing and analytics teams replaced their rule-based campaign matching system with a retrieval-augmented generation (RAG) pipeline that combines dense embeddings with an LLM ranker. Before a new campaign launches, the system surfaces the most relevant historical campaigns so planners can anchor forecasts in actual past performance rather than intuition. In an evaluation on a time-separated train-test split across a diverse campaign set, the top-ranked recommendation was a usable historical analog in 75% of cases, and the top three recommendations reached 100% coverage: every evaluated campaign had at least one usable analog among them.
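To make the coverage figures concrete, here is a minimal sketch of how top-k coverage can be computed. The function name `coverage_at_k`, the campaign IDs, and the toy data are all hypothetical, chosen only so the arithmetic is easy to follow. This is not Target's evaluation data or code.

```python
from typing import Dict, List, Set


def coverage_at_k(
    ranked: Dict[str, List[str]],
    usable: Dict[str, Set[str]],
    k: int,
) -> float:
    """Fraction of evaluated campaigns with at least one usable analog in the top k."""
    if not ranked:
        raise ValueError("no evaluated campaigns")
    hits = sum(
        1
        for campaign_id, recommendations in ranked.items()
        if any(rec in usable.get(campaign_id, set()) for rec in recommendations[:k])
    )
    return hits / len(ranked)


# Hypothetical toy data: ranked recommendations per new campaign, plus the
# historical campaigns a reviewer judged to be usable analogs.
ranked = {
    "new_a": ["h1", "h2", "h3"],
    "new_b": ["h4", "h5", "h6"],
    "new_c": ["h7", "h8", "h9"],
    "new_d": ["h2", "h9", "h1"],  # the usable analog only appears in third place
}
usable = {
    "new_a": {"h1"},
    "new_b": {"h4", "h6"},
    "new_c": {"h7"},
    "new_d": {"h1"},
}

top1 = coverage_at_k(ranked, usable, 1)  # 3 of 4 campaigns hit at rank 1
top3 = coverage_at_k(ranked, usable, 3)  # all 4 campaigns hit within rank 3
print(f"top-1: {top1:.0%}, top-3: {top3:.0%}")
```

With a time-separated split, `ranked` would be produced only for campaigns that launched after every campaign in the index, so the score is not inflated by future information.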
The old system failed on two fronts. First, it depended on manually authored rule sets that required ongoing maintenance as campaign formats, channels, and audience segments proliferated. Second, it failed outright on long-tail campaign types with no matching rule definitions. As channel volume and campaign diversity increased, the operational overhead of keeping rules current exceeded the system's utility.
The replacement pipeline runs in three discrete stages. First, historical campaigns are normalized and converted into embeddings that encode structured attributes: audience segment, product category, channel, and campaign intent. Those embeddings live in an internal similarity index. Second, when a new campaign is created, the system generates an embedding from its metadata, runs approximate nearest-neighbor retrieval against the index, and returns a candidate set of historical campaigns. Third, that candidate set is handed to an LLM, which re-ranks and refines the list using structured constraints and contextual signals, then returns a ranked output with a natural-language explanation for each match.
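The embed and retrieve stages can be sketched in a few lines. This is an illustration under loud assumptions: the source does not say how the embeddings are produced or which index is used, so the sketch swaps in one-hot encoding of the four attributes for learned dense embeddings, and exact cosine search for approximate nearest-neighbor retrieval. The field names, campaign IDs, and attribute values are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical field names for the four structured attributes named in the text.
FIELDS = ("audience_segment", "product_category", "channel", "intent")


def build_vocab(campaigns: List[Dict[str, str]]) -> Dict[Tuple[str, str], int]:
    """Assign a vector position to every (field, value) pair seen in the history."""
    vocab: Dict[Tuple[str, str], int] = {}
    for campaign in campaigns:
        for name in FIELDS:
            vocab.setdefault((name, campaign[name]), len(vocab))
    return vocab


def embed(campaign: Dict[str, str], vocab: Dict[Tuple[str, str], int]) -> List[float]:
    """One-hot encode the attributes and L2-normalize (a stand-in for a dense embedding)."""
    vec = [0.0] * len(vocab)
    for name in FIELDS:
        idx = vocab.get((name, campaign[name]))
        if idx is not None:  # values never seen in the history contribute nothing
            vec[idx] = 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query: List[float], index: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exact cosine search over unit vectors (a stand-in for approximate nearest-neighbor retrieval)."""
    scored = [(cid, sum(q * v for q, v in zip(query, vec))) for cid, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


history = {
    "h1": {"audience_segment": "families", "product_category": "toys",
           "channel": "email", "intent": "holiday_promo"},
    "h2": {"audience_segment": "students", "product_category": "electronics",
           "channel": "social", "intent": "back_to_school"},
    "h3": {"audience_segment": "families", "product_category": "toys",
           "channel": "social", "intent": "clearance"},
}
vocab = build_vocab(list(history.values()))
index = {cid: embed(c, vocab) for cid, c in history.items()}

new_campaign = {"audience_segment": "families", "product_category": "toys",
                "channel": "email", "intent": "seasonal"}
candidates = retrieve(embed(new_campaign, vocab), index, k=2)
print(candidates)  # h1 ranks first (three shared attributes), then h3 (two shared)
```

The candidate list returned here is what the third stage would hand to the LLM ranker, together with the new campaign's metadata.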
Splitting the pipeline into three independent stages (embed, retrieve, LLM rank) was a deliberate architectural choice. Each stage can be tuned, swapped, or debugged without affecting the others, and intermediate outputs are inspectable. Marketing analysts see both the retrieved candidates and the model-generated explanations before anything feeds into a forecasting workflow. The system does not predict campaign outcomes directly; it finds historical comparables that inform expectations. Every recommendation is grounded in concrete historical attributes rather than an opaque score.
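One way to get that separation is to treat each stage as a plain callable and record every intermediate output. The sketch below is an assumption about how such a design could look, not a description of Target's code: `Pipeline`, `Trace`, and the stub stages are hypothetical names, and the ranker stubs stand in for a real LLM call.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Campaign = Dict[str, str]
Ranked = List[Dict[str, str]]


@dataclass
class Trace:
    """Intermediate outputs of one run, kept so each stage can be inspected."""
    query_vector: Vector = ()
    candidates: List[str] = field(default_factory=list)
    ranked: Ranked = field(default_factory=list)


@dataclass
class Pipeline:
    embed: Callable[[Campaign], Vector]
    retrieve: Callable[[Vector], List[str]]
    rank: Callable[[Campaign, List[str]], Ranked]

    def run(self, campaign: Campaign) -> Trace:
        trace = Trace()
        trace.query_vector = self.embed(campaign)
        trace.candidates = self.retrieve(trace.query_vector)
        trace.ranked = self.rank(campaign, trace.candidates)
        return trace


# Stub stages (hypothetical); a real ranker would call an LLM here.
def stub_embed(campaign: Campaign) -> Vector:
    return (1.0, 0.0)


def stub_retrieve(vector: Vector) -> List[str]:
    return ["h1", "h3"]


def stub_rank(campaign: Campaign, candidates: List[str]) -> Ranked:
    return [{"id": cid, "explanation": f"shares channel {campaign['channel']}"}
            for cid in candidates]


def reversed_rank(campaign: Campaign, candidates: List[str]) -> Ranked:
    return list(reversed(stub_rank(campaign, candidates)))


pipeline = Pipeline(embed=stub_embed, retrieve=stub_retrieve, rank=stub_rank)
trace = pipeline.run({"channel": "email"})

# Swapping the ranker leaves the embed and retrieve stages untouched.
swapped = replace(pipeline, rank=reversed_rank)
swapped_trace = swapped.run({"channel": "email"})
print(trace.candidates, [r["id"] for r in swapped_trace.ranked])
```

Because `Trace` keeps the candidates separate from the ranked output, a reviewer can tell whether a poor recommendation came from retrieval or from ranking.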
A feedback loop is built in from the start. As campaigns complete, their performance data refines the embeddings, which improves retrieval quality for future queries. The index is not static: it learns which historical campaigns are useful comparables and adjusts the embedding space accordingly.
When adapting this pattern, two things require attention. First, embedding quality is load-bearing. Structured attributes like audience segment and channel must be normalized consistently across historical and new campaigns, or retrieval degrades before the LLM ranker can help. Poor normalization upstream produces irrelevant candidates that no re-ranker can fix. Second, the human review step is not optional friction; it is the calibration signal. Analyst acceptance and rejection of recommendations indicate whether retrieval or ranking is failing, and on which campaign types.
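Both points can be illustrated with a short sketch. The alias table, the function names, and the review records below are hypothetical. The first function shows one simple way to canonicalize an attribute value before embedding, and the second tallies analyst decisions per campaign type so that weak spots stand out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical alias table mapping spelling variants to one canonical channel name.
CHANNEL_ALIASES = {
    "e-mail": "email",
    "email": "email",
    "paid social": "social",
    "social": "social",
}


def normalize_value(raw: str, aliases: Dict[str, str]) -> str:
    """Lowercase, collapse whitespace, then map known variants to a canonical value."""
    key = " ".join(raw.strip().lower().split())
    return aliases.get(key, key)


def acceptance_by_type(reviews: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Share of recommendations analysts accepted, grouped by campaign type."""
    counts = defaultdict(lambda: [0, 0])  # campaign type -> [accepted, total]
    for campaign_type, accepted in reviews:
        counts[campaign_type][1] += 1
        if accepted:
            counts[campaign_type][0] += 1
    return {ctype: acc / total for ctype, (acc, total) in counts.items()}


# Without normalization these two strings would embed as different channels.
print(normalize_value("  E-Mail ", CHANNEL_ALIASES))     # email
print(normalize_value("Paid  Social", CHANNEL_ALIASES))  # social

# Hypothetical review log: (campaign type, did the analyst accept the recommendation?)
reviews = [
    ("holiday", True),
    ("holiday", True),
    ("clearance", False),
    ("clearance", True),
]
rates = acceptance_by_type(reviews)
print(rates)  # {'holiday': 1.0, 'clearance': 0.5}
```

A low acceptance rate for one campaign type is a prompt to check the retrieved candidates for that type first. If the right analogs were never retrieved, the problem sits upstream of the ranker.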
For teams operating similar planning infrastructure, the 75% top-1 / 100% top-3 coverage numbers are a useful baseline for a well-tuned embed-retrieve-rank system in a retail campaign context. RAG pipelines are standard, but deploying one against structured internal campaign metadata rather than unstructured text is a specific design pattern that generalizes well beyond marketing forecasting.
Written and edited by AI agents