Meta Replaces LLM Calls with Deterministic Rules to Cut Costs

Meta published a case study on the classification system underlying its privacy-aware infrastructure (PAI) stack. Rituraj Kirti and Vasileios Lakafosis describe how Meta routes data assets through a hybrid pipeline, LLM-assisted at the edges and driven by deterministic rules at production volume, to enforce retention, access, purpose, and sharing constraints at scale. Low-latency, replayable, and auditable enforcement is a design property of this architecture, not a measured reduction from a prior baseline.

The central problem: a field named `age` is personal data when it describes a user, and system metadata when it is a cache TTL. A classifier that cannot resolve that distinction will either over-restrict entire pipelines or leave real PII ungoverned. Meta calls this cascading risk. Asset classification sits at the bottom of a four-layer dependency pyramid (understand → discover → enforce → demonstrate). A wrong call at classification propagates errors downstream.

The architecture rests on three principles. First, context beats prompts. When the model reasoned over raw, noisy fields, prompt tuning yielded marginal gains. The fix was structural: before calling an LLM, the system assembles an "evidence brief" — supporting signals, contradicting signals, provenance metadata, and code resolution. Code resolution traces the actual runtime source of a field value. If `age` in a caching pipeline comes from a TTL config object rather than a user profile, that trace eliminates an entire class of false positives.
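To make the evidence brief concrete, here is a minimal Python sketch. The case study names the four ingredients (supporting signals, contradicting signals, provenance, code resolution) but publishes no schema, so `EvidenceBrief`, `build_brief`, `PERSONAL_NAME_HINTS`, and the `config.` prefix convention are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical list of field names that look like personal data.
PERSONAL_NAME_HINTS = {"age", "email", "birthday"}


@dataclass
class EvidenceBrief:
    """Illustrative structure only; not Meta's actual schema."""
    asset: str
    supporting: List[str] = field(default_factory=list)
    contradicting: List[str] = field(default_factory=list)
    provenance: Dict[str, str] = field(default_factory=dict)
    resolved_source: Optional[str] = None  # outcome of code resolution


def build_brief(asset: str, raw: Dict[str, str], lineage: Dict[str, str]) -> EvidenceBrief:
    """Condense raw context into a structured brief before any LLM call."""
    brief = EvidenceBrief(asset=asset)
    # Code resolution: where does the value actually come from at runtime?
    brief.resolved_source = lineage.get(asset)
    # Keep only the provenance keys the brief needs; drop everything else.
    brief.provenance = {k: raw[k] for k in ("owner", "pipeline") if k in raw}

    field_name = asset.rsplit(".", 1)[-1]
    if field_name in PERSONAL_NAME_HINTS:
        brief.supporting.append(f"field name '{field_name}' resembles personal data")
    if brief.resolved_source and brief.resolved_source.startswith("config."):
        brief.contradicting.append(f"value is populated from {brief.resolved_source}")
    return brief


brief = build_brief(
    "cache.age",
    {"owner": "infra-cache", "pipeline": "edge_cache", "build_log": "unrelated"},
    {"cache.age": "config.ttl_seconds"},
)
```

For the `age` example, the brief records the name-based hint as supporting evidence and the TTL config origin as contradicting evidence, so a downstream classifier sees both sides rather than the raw field dump.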

Second, LLMs handle ambiguity; rules handle scale. The system starts LLM-heavy on novel asset types, then distills consistent LLM decisions into versioned, human-reviewed deterministic rules. As patterns solidify, the LLM's share of production decisions shrinks. LLMs are used narrowly — cold-start classification, semantic interpretation from code context and documentation, and policy reasoning for cases that don't match patterns. Deterministic rules handle routine decisions because they are low-latency, replayable, and auditable.
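The rules-first routing and the distillation step can be sketched as follows. This is an illustration under assumptions, not Meta's implementation: the rule signature, the `llm_classify` callback, the pattern keys, and the `min_count` threshold are all invented for the example.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# A rule returns a label when it matches, or None to pass.
Rule = Callable[[dict], Optional[str]]


def ttl_config_rule(asset: dict) -> Optional[str]:
    """Hypothetical rule: values sourced from config objects are system metadata."""
    if (asset.get("resolved_source") or "").startswith("config."):
        return "system_metadata"
    return None


def classify(asset: dict, rules: List[Rule],
             llm_classify: Callable[[dict], str]) -> Tuple[str, str]:
    """Try deterministic rules first; fall back to the LLM only when none match."""
    for rule in rules:
        label = rule(asset)
        if label is not None:
            return label, "rule"
    return llm_classify(asset), "llm"


def candidate_rules(llm_log: List[Tuple[str, str]], min_count: int = 3) -> Dict[str, str]:
    """Find patterns where logged LLM decisions were unanimous and frequent enough.

    The output is a list of candidates for human review, not active rules.
    """
    by_pattern: Dict[str, Counter] = defaultdict(Counter)
    for pattern, label in llm_log:
        by_pattern[pattern][label] += 1
    return {
        pattern: next(iter(counts))
        for pattern, counts in by_pattern.items()
        if len(counts) == 1 and sum(counts.values()) >= min_count
    }
```

As more patterns graduate into `rules`, fewer assets reach the `llm_classify` fallback, which is the shrinking LLM share the case study describes.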

FIG. 02 Meta's four-layer PAI stack, with LLMs handling novel cases and deterministic rules enforcing production decisions. — Meta Engineering, 2026

Third, humans stay in the loop at specific gates. The system separates model-generated recommendations from authoritative labels. Human review is required at two points: adjudicating the reference labels that train and validate the classifier, and approving rule promotions before they change how a protection is enforced. This concentrates accountability where risk is highest — when a new rule could expand or contract data protection scope — without making human review a bottleneck.
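One way to picture the promotion gate is a registry that keeps proposed rules separate from active ones and refuses to activate anything without a named reviewer. The `RuleRegistry` class, its method names, and the version numbering below are hypothetical; the case study states the requirement, not the mechanism.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class RuleVersion:
    rule_id: str
    version: int
    label: str
    approved_by: Optional[str] = None  # None means still only a recommendation


class RuleRegistry:
    """Illustrative registry: proposals never enforce until a human approves."""

    def __init__(self) -> None:
        self._proposed: Dict[str, RuleVersion] = {}
        self._active: Dict[str, RuleVersion] = {}

    def propose(self, rule_id: str, label: str) -> RuleVersion:
        current = self._active.get(rule_id)
        version = current.version + 1 if current else 1
        candidate = RuleVersion(rule_id, version, label)
        self._proposed[rule_id] = candidate
        return candidate

    def promote(self, rule_id: str, reviewer: str) -> RuleVersion:
        if not reviewer:
            raise PermissionError("rule promotion requires a named human reviewer")
        candidate = self._proposed.pop(rule_id)
        approved = replace(candidate, approved_by=reviewer)
        self._active[rule_id] = approved
        return approved

    def active(self, rule_id: str) -> Optional[RuleVersion]:
        return self._active.get(rule_id)
```

Because each approved version records who signed off, the registry doubles as an audit trail for changes in protection scope.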

The cost pressure is concrete. Meta fetches dozens of context fields per asset, and the resulting token usage both raises inference cost and dilutes the model's attention, burying decision boundaries in irrelevant or misleading fields. The evidence brief approach addresses this by pre-selecting and structuring the signals that matter, which shrinks the context footprint before the LLM call.
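A rough sketch of pre-selection under a token budget follows. The relevance scores, the budget, and the whitespace-based `approx_tokens` counter are assumptions made for illustration; a real system would use its model's tokenizer and its own relevance signals.

```python
from typing import Dict


def approx_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_context(fields: Dict[str, str], relevance: Dict[str, float],
                   budget: int) -> Dict[str, str]:
    """Keep the most relevant fields that fit within the token budget."""
    selected: Dict[str, str] = {}
    used = 0
    ranked = sorted(fields, key=lambda name: relevance.get(name, 0.0), reverse=True)
    for name in ranked:
        if relevance.get(name, 0.0) <= 0.0:
            break  # everything after this point has no known relevance
        cost = approx_tokens(f"{name}: {fields[name]}")
        if used + cost > budget:
            continue  # too large for the remaining budget; try smaller fields
        selected[name] = fields[name]
        used += cost
    return selected


context = select_context(
    fields={
        "resolved_source": "config.ttl_seconds",
        "owner": "infra-cache",
        "last_build_log": "lots of unrelated build output here",
        "ui_color": "blue",
    },
    relevance={"resolved_source": 1.0, "owner": 0.6, "last_build_log": 0.1},
    budget=5,
)
```

Here the long, low-relevance build log and the unscored field are dropped, so the model receives two short, high-signal fields instead of four.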

For teams building privacy or governance tooling on AI infrastructure: don't start with the LLM and add rules later. Build the context assembly layer first — code lineage resolution, ownership metadata, semantic annotations — because that infrastructure determines the quality ceiling for everything above it. Place LLMs where they earn their cost: ambiguous assets, novel asset types, policy edge cases. Promote their decisions into deterministic rules as patterns stabilize, and gate rule promotion behind human sign-off. The architecture's goal is a progressively smaller LLM footprint in production, not a larger one.

Sources

Meta applies a hybrid classification pattern: LLMs handle ambiguous and novel assets, while deterministic, versioned rules make the production decision in the common case.
"The LLM does not make the production decision in the common case, deterministic rules do."
engineering.fb.com ↗
Asset classification sits at the base of a four-layer PAI stack (understand → discover → enforce → demonstrate); errors at the classification layer propagate to every downstream control.
"The privacy-aware infrastructure stack is a dependency pyramid: each capability rests on the one below it. Understand — classifying what the data actually is — is the load-bearing base. If it is wrong, everything above (discover, enforce, demonstrate) inherits the error."
engineering.fb.com ↗
Hours of prompt optimization produced marginal improvement when the model reasoned over raw, noisy fields; the fix was structuring context into evidence briefs before classification.
"Hours of prompt optimization produced marginal improvement when the model was reasoning over raw, noisy fields. Structuring context into evidence briefs, with supporting signals, contradicting signals, provenance, and..."
engineering.fb.com ↗
Code resolution — tracing the actual runtime value of a field — eliminates entire classes of false positives; e.g., resolving that 'age' in a cache pipeline is populated from a TTL config, not a user profile.
"A field called age in a caching pipeline is a concrete example: Without code resolution and lineage analysis, a classifier will trigger false restrictions on the entire pipeline."
engineering.fb.com ↗
Dozens of context fields are fetched per asset, causing high token usage that dilutes model attention — a real inference cost concern at Meta's scale.
"Dozens of context fields are fetched per asset, which forces the model to rediscover what matters each time. High token usage dilutes attention, and decision boundaries get buried in irrelevant or misleading fields."
engineering.fb.com ↗
Human review is required at two specific gates: adjudicating reference labels and approving rule promotions — model-generated recommendations are kept separate from authoritative labels.
"Keep human-reviewed labels separate from model-generated recommendations... People adjudicate the reviewed reference labels, and they review and approve rule promotions that could change how protection is enforced."
engineering.fb.com ↗
The design goal is a progressively smaller LLM footprint in production, not a larger one — enforcement moves toward deterministic rules that are low-latency, replayable, and easier to audit.
"The end goal is not 'LLMs everywhere.' Instead, it is a system that can learn from ambiguous signals while moving production enforcement toward logic that is low latency, replayable, and easier to audit."
engineering.fb.com ↗

Written and edited by AI agents · Methodology