Meta published a case study on the classification system underlying its privacy-aware infrastructure (PAI) stack. Rituraj Kirti and Vasileios Lakafosis describe how Meta routes data assets through a hybrid pipeline, LLM-assisted at the edges and driven by deterministic rules at production volume, to enforce retention, access, purpose, and sharing constraints at scale. The case study presents low-latency, replayable, and auditable enforcement as a design property of the architecture, not as a measured improvement over a prior baseline.
The central problem: a field named `age` is personal data when it describes a user, and system metadata when it is a cache TTL. A classifier that cannot resolve that distinction will either over-restrict entire pipelines or leave real PII ungoverned; Meta calls this cascading risk. Asset classification sits at the base of a four-layer dependency pyramid (understand → discover → enforce → demonstrate), so a wrong call at classification propagates into every layer built on top of it.
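A minimal Python sketch of the cascading-risk argument, covering only the classify and enforce layers. The `Field.source` attribute, the `Category` names, and the policy values in `enforcement_for` are hypothetical placeholders, not Meta's schema or retention settings; the point is only that enforcement inherits whatever the classifier decides.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    PERSONAL_DATA = "personal_data"
    SYSTEM_METADATA = "system_metadata"


@dataclass(frozen=True)
class Field:
    name: str
    source: str  # hypothetical: where the value originates at runtime


def classify_by_name(f: Field) -> Category:
    # Name-only heuristic: any field called "age" is treated as personal data.
    return Category.PERSONAL_DATA if f.name == "age" else Category.SYSTEM_METADATA


def classify_by_context(f: Field) -> Category:
    # Context-aware heuristic: the value's origin, not its name, decides.
    return Category.PERSONAL_DATA if f.source == "user_profile" else Category.SYSTEM_METADATA


def enforcement_for(category: Category) -> dict:
    # Downstream layer: constraints are derived purely from the category,
    # so a misclassification flows straight into enforcement.
    # Policy values are placeholders, not real retention settings.
    if category is Category.PERSONAL_DATA:
        return {"retention_days": 90, "access": "restricted", "sharing": False}
    return {"retention_days": None, "access": "open", "sharing": True}


cache_ttl = Field(name="age", source="ttl_config")
user_age = Field(name="age", source="user_profile")
print(enforcement_for(classify_by_name(cache_ttl)))     # over-restricts a cache setting
print(enforcement_for(classify_by_context(cache_ttl)))  # treated as system metadata
print(enforcement_for(classify_by_context(user_age)))   # still protected as personal data
```

Classifying the TTL field by name alone marks it restricted and unshareable, which is the over-restriction failure; the context-aware version leaves it open while still protecting `age` when it comes from a user profile.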
The architecture rests on three principles. First, context beats prompts. When the model reasoned over raw, noisy fields, prompt tuning yielded only marginal gains. The fix was structural: before calling an LLM, the system assembles an "evidence brief" of supporting signals, contradicting signals, provenance metadata, and code resolution. Code resolution traces the actual runtime source of a field value. If `age` in a caching pipeline comes from a TTL config object rather than a user profile, that trace eliminates an entire class of false positives.
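The evidence-brief idea can be sketched as follows, assuming lineage is already available as a simple assignment map. `LINEAGE`, `resolve_source`, `build_brief`, the owner names, and the source-prefix lists are illustrative names invented here; Meta's code resolution works over real program analysis, which this sketch does not attempt.

```python
from dataclasses import dataclass, field

# Hypothetical lineage map: each symbol points to the expression it was assigned from.
# A symbol with no entry is treated as the runtime source of the value.
LINEAGE = {
    "cache_entry.age": "ttl_seconds",
    "ttl_seconds": "CacheConfig.default_ttl",
    "profile_row.age": "UserProfile.age",
}

PERSONAL_SOURCES = ("UserProfile.",)  # illustrative prefixes only
CONFIG_SOURCES = ("CacheConfig.",)


def resolve_source(symbol: str, lineage: dict[str, str]) -> str:
    """Follow assignments back to the symbol that has no further parent."""
    seen: set[str] = set()
    current = symbol
    while current in lineage:
        if current in seen:
            raise ValueError(f"cycle in lineage at {current!r}")
        seen.add(current)
        current = lineage[current]
    return current


@dataclass
class EvidenceBrief:
    field_name: str
    resolved_source: str
    supporting: list[str] = field(default_factory=list)     # signals for "personal data"
    contradicting: list[str] = field(default_factory=list)  # signals against it
    provenance: dict[str, str] = field(default_factory=dict)


def build_brief(field_name: str, symbol: str, lineage: dict[str, str],
                provenance: dict[str, str]) -> EvidenceBrief:
    source = resolve_source(symbol, lineage)
    brief = EvidenceBrief(field_name, source, provenance=dict(provenance))
    if source.startswith(PERSONAL_SOURCES):
        brief.supporting.append(f"value originates from {source}")
    if source.startswith(CONFIG_SOURCES):
        brief.contradicting.append(f"value originates from config object {source}")
    return brief


cache_brief = build_brief("age", "cache_entry.age", LINEAGE, {"owner": "caching-team"})
profile_brief = build_brief("age", "profile_row.age", LINEAGE, {"owner": "profiles-team"})
```

Here the brief for the cache field carries a contradicting signal before any model sees it, which is the mechanism the article credits for removing a class of false positives.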
Second, LLMs handle ambiguity; rules handle scale. The system starts LLM-heavy on novel asset types, then distills consistent LLM decisions into versioned, human-reviewed deterministic rules. As patterns solidify, the LLM's share of production decisions shrinks. LLMs are used narrowly — cold-start classification, semantic interpretation from code context and documentation, and policy reasoning for cases that don't match patterns. Deterministic rules handle routine decisions because they are low-latency, replayable, and auditable.
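One way to express the rules-first, LLM-fallback loop and the distillation step is shown below. The `llm` callable, the `min_support` and `min_agreement` thresholds, the exact-match `pattern` key, and the reviewer string are assumptions for illustration; the article does not say how Meta decides a pattern is consistent enough to promote.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    pattern: str    # hypothetical normalized asset signature, matched exactly
    label: str
    version: int
    approved: bool  # only approved rules are consulted in production


class HybridClassifier:
    def __init__(self, llm: Callable[[str], str],
                 min_support: int = 5, min_agreement: float = 0.95):
        self.llm = llm  # hypothetical stand-in for a model call
        self.min_support = min_support
        self.min_agreement = min_agreement
        self.rules: dict[str, Rule] = {}
        self.llm_history: dict[str, Counter] = defaultdict(Counter)
        self.stats: Counter = Counter()

    def classify(self, pattern: str) -> str:
        rule = self.rules.get(pattern)
        if rule is not None and rule.approved:
            self.stats["rule"] += 1  # cheap, replayable path
            return rule.label
        label = self.llm(pattern)    # novel or ambiguous asset
        self.stats["llm"] += 1
        self.llm_history[pattern][label] += 1
        return label

    def propose_rules(self) -> list[Rule]:
        """Turn consistent LLM decisions into unapproved candidate rules."""
        candidates = []
        for pattern, counts in self.llm_history.items():
            total = sum(counts.values())
            label, top = counts.most_common(1)[0]
            existing = self.rules.get(pattern)
            if existing is not None and existing.approved and existing.label == label:
                continue  # already covered by an approved rule
            if total >= self.min_support and top / total >= self.min_agreement:
                version = existing.version + 1 if existing else 1
                candidates.append(Rule(pattern, label, version, approved=False))
        return candidates

    def promote(self, rule: Rule, approved_by: Optional[str]) -> None:
        # Human gate: a candidate changes production behavior only with sign-off.
        if not approved_by:
            raise PermissionError("rule promotion requires human sign-off")
        self.rules[rule.pattern] = Rule(rule.pattern, rule.label, rule.version, approved=True)

    def llm_share(self) -> float:
        total = self.stats["llm"] + self.stats["rule"]
        return self.stats["llm"] / total if total else 0.0


def toy_llm(pattern: str) -> str:
    # Hypothetical model stand-in.
    return "personal_data" if pattern.startswith("user.") else "system_metadata"


clf = HybridClassifier(toy_llm, min_support=3, min_agreement=1.0)
for _ in range(3):
    clf.classify("cache.age")
for candidate in clf.propose_rules():
    clf.promote(candidate, approved_by="reviewer")  # hypothetical reviewer id
clf.classify("cache.age")  # now served by the approved rule
```

As approved rules accumulate, `llm_share()` falls, mirroring the article's goal of a shrinking LLM footprint. Candidates are created unapproved, so nothing changes production decisions until a reviewer signs off.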
Third, humans stay in the loop at specific gates. The system separates model-generated recommendations from authoritative labels. Human review is required at two points: adjudicating the reference labels that train and validate the classifier, and approving rule promotions before they change how a protection is enforced. This concentrates accountability where risk is highest — when a new rule could expand or contract data protection scope — without making human review a bottleneck.
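A sketch of keeping model recommendations and adjudicated labels in separate fields, assuming a simple in-memory store. `LabelStore`, `recommend`, `adjudicate`, `agreement_rate`, and the reviewer names are hypothetical; the idea shown is that only human-adjudicated labels enter the reference set used to train and validate the classifier, while recommendations are scored against it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class LabelRecord:
    asset: str
    recommended: Optional[str] = None    # model output: advisory only
    authoritative: Optional[str] = None  # written only by human adjudication
    reviewer: Optional[str] = None
    audit: list = field(default_factory=list)


class LabelStore:
    def __init__(self) -> None:
        self._records: dict[str, LabelRecord] = {}

    def _record(self, asset: str) -> LabelRecord:
        return self._records.setdefault(asset, LabelRecord(asset))

    def recommend(self, asset: str, label: str) -> None:
        # Model-generated recommendations never overwrite authoritative labels.
        rec = self._record(asset)
        rec.recommended = label
        rec.audit.append(("recommend", label, None, datetime.now(timezone.utc)))

    def adjudicate(self, asset: str, label: str, reviewer: str) -> None:
        if not reviewer:
            raise PermissionError("authoritative labels require a human reviewer")
        rec = self._record(asset)
        rec.authoritative, rec.reviewer = label, reviewer
        rec.audit.append(("adjudicate", label, reviewer, datetime.now(timezone.utc)))

    def reference_set(self) -> dict[str, str]:
        """Adjudicated labels that may be used to train and validate the classifier."""
        return {a: r.authoritative for a, r in self._records.items()
                if r.authoritative is not None}

    def _scored(self) -> list[LabelRecord]:
        return [r for r in self._records.values()
                if r.authoritative is not None and r.recommended is not None]

    def agreement_rate(self) -> float:
        """Share of adjudicated assets where the model's recommendation matched."""
        scored = self._scored()
        if not scored:
            return 0.0
        return sum(r.recommended == r.authoritative for r in scored) / len(scored)

    def disagreements(self) -> list[str]:
        return [r.asset for r in self._scored() if r.recommended != r.authoritative]


store = LabelStore()
store.recommend("users.age", "personal_data")
store.recommend("cache.age", "personal_data")
store.adjudicate("users.age", "personal_data", reviewer="alice")  # hypothetical reviewers
store.adjudicate("cache.age", "system_metadata", reviewer="bob")
```

The audit trail records who set each authoritative label, which is what makes the human gate inspectable after the fact.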
The cost pressure is concrete. Meta fetches dozens of context fields per asset, and passing all of them to a model drives up token usage and therefore inference cost. The evidence brief approach reduces that cost by pre-selecting and structuring only the signals that matter, shrinking the context footprint before the LLM call.
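Pre-selecting signals under a budget could look like the greedy sketch below. The `Signal` type, the relevance scores, the 0.2 cutoff, and the four-characters-per-token estimate are assumptions, not figures from the article; a production system would use the target model's tokenizer and its own relevance signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    value: str
    relevance: float  # hypothetical score from upstream heuristics or lineage


def render_line(s: Signal) -> str:
    return f"- {s.name}: {s.value}"


def approx_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token), used here as an assumption.
    return max(1, len(text) // 4)


def select_context(signals: list[Signal], budget_tokens: int,
                   min_relevance: float = 0.2) -> list[Signal]:
    """Greedily keep the most relevant signals that fit in the token budget."""
    chosen, used = [], 0
    for s in sorted(signals, key=lambda s: s.relevance, reverse=True):
        if s.relevance < min_relevance:
            break  # everything after this is even less relevant
        cost = approx_tokens(render_line(s))
        if used + cost <= budget_tokens:
            chosen.append(s)
            used += cost
    return chosen


def render_brief(signals: list[Signal]) -> str:
    return "\n".join(render_line(s) for s in signals)


signals = [
    Signal("code_resolution", "CacheConfig.default_ttl", 0.9),
    Signal("owner", "caching-team", 0.6),
    Signal("table_comment", "x" * 400, 0.5),  # long, lower-value free text
    Signal("misc_tag", "misc", 0.1),
]
brief = select_context(signals, budget_tokens=30)
print(render_brief(brief))
```

Greedy selection by relevance is one simple choice; it keeps high-value evidence such as a code-resolution result and drops long or weakly relevant fields first.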
For teams building privacy or governance tooling on AI infrastructure: don't start with the LLM and add rules later. Build the context assembly layer first — code lineage resolution, ownership metadata, semantic annotations — because that infrastructure determines the quality ceiling for everything above it. Place LLMs where they earn their cost: ambiguous assets, novel asset types, policy edge cases. Promote their decisions into deterministic rules as patterns stabilize, and gate rule promotion behind human sign-off. The architecture's goal is a progressively smaller LLM footprint in production, not a larger one.
Written and edited by AI agents.