Researchers at the University of Illinois and National Taiwan University published a mechanism-oriented taxonomy of indirect linguistic expressions (ILE) on June 25. When injected into LLM moderation prompts, it outperforms all four prior taxonomies it was tested against. Tested on 2,000 annotated TikTok and Bluesky posts across three LLMs, the taxonomy achieved a 4.7% accuracy gain and a 5.4% F1 improvement over the best existing framework. These are offline benchmark results, but gains of that size matter in production pipelines, where false negatives create direct platform risk.

Content moderation systems are trained largely on direct statements: explicit slurs, literal threats, named substances. Users evading detection turn to algospeak (coded substitutes such as "unalive" for suicide or the sound-alike "seggs" for sex) and to adversarial obfuscation: character substitution, format switching, and code words that spread through closed communities. Existing taxonomies group all of these by communicative intent (harassment, self-harm, extremism) rather than by the mechanism used to encode them. The ILE work separates the two.

The taxonomy categorizes expressions by encoding operation: phonetic transformation, semantic displacement, morphological manipulation, and context-dependent decoding. Mechanism-level categories can generalize to newly emerging coded language where intent-level ones fail. An intent taxonomy only helps a model that already knows the new slang; a mechanism taxonomy lets it recognize that a substitution is happening even when the specific code is unfamiliar.
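To make the distinction concrete, here is a toy contrast between lexicon lookup and a check keyed to one encoding operation. It is a minimal sketch, not the paper's method: the known-code list, the look-alike character set, and filing character substitution under morphological manipulation are all our assumptions.

```python
from __future__ import annotations

from enum import Enum


class Mechanism(Enum):
    # The four mechanism-level categories named in the article.
    PHONETIC = "phonetic transformation"
    SEMANTIC = "semantic displacement"
    MORPHOLOGICAL = "morphological manipulation"
    CONTEXTUAL = "context-dependent decoding"


# Intent-style approach: a fixed lexicon of already-known codes (hypothetical list).
KNOWN_CODES = {"unalive", "seggs"}

# Look-alike characters often swapped for letters (illustrative, far from complete).
LOOKALIKES = set("013457@$")


def lexicon_hit(token: str) -> bool:
    """True only if the token is already in the known-code list."""
    return token.lower() in KNOWN_CODES


def substitution_hit(token: str) -> bool:
    """Mechanism-style check: letters mixed with look-alike digits or symbols.

    It keys on the encoding operation rather than the vocabulary, so it can
    fire on codes that appear in no lexicon. Treating this as MORPHOLOGICAL
    is our assumption, not the paper's mapping.
    """
    has_letter = any(ch.isalpha() for ch in token)
    has_lookalike = any(ch in LOOKALIKES for ch in token)
    return has_letter and has_lookalike


def compare(tokens: list[str]) -> dict[str, tuple[bool, Mechanism | None]]:
    """Per token: (lexicon match?, mechanism flagged by the substitution check, or None)."""
    return {
        t: (lexicon_hit(t), Mechanism.MORPHOLOGICAL if substitution_hit(t) else None)
        for t in tokens
    }


if __name__ == "__main__":
    for token, result in compare(["unalive", "k1ll", "s3ggs", "hello"]).items():
        print(token, result)
```

The substitution check catches "k1ll" and "s3ggs", which the lexicon has never seen, but it misses a purely phonetic respelling like "seggs" and would also flag harmless tokens such as "mp3". A usable system needs coverage for every mechanism, for example an LLM given the full taxonomy in its prompt, which is the approach the article describes.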

The taxonomy functions as a prompt scaffold: it is inserted directly into the LLM system prompt, with no fine-tuning. All three LLMs improved at both the document level (does the post contain ILE?) and the span level (which phrases carry the encoding?). Span-level detection is where moderation struggles most: flagging a post for review is routine, but pinpointing the encoded phrase, which consistent enforcement requires, is harder. That is where the F1 gain matters operationally.
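A prompt scaffold of this kind might look like the sketch below, which asks for both a document-level label and span-level findings in one reply. The one-line category glosses, the prompt wording, and the JSON reply format are hypothetical; the paper's actual prompt is not reproduced here.

```python
from __future__ import annotations

import json

# Hypothetical one-line glosses of the four mechanism categories named in the
# article; the paper's actual definitions and wording are not reproduced here.
TAXONOMY = {
    "phonetic transformation": "a sound-alike respelling of a sensitive word (e.g. 'seggs')",
    "semantic displacement": "an ordinary word or symbol standing in for a sensitive meaning",
    "morphological manipulation": "altered word forms, affixes, or characters",
    "context-dependent decoding": "meaning recoverable only from shared community context",
}


def build_system_prompt(taxonomy: dict[str, str]) -> str:
    """Embed the mechanism taxonomy in a moderation system prompt (no fine-tuning)."""
    lines = [
        "You are a content moderation assistant.",
        "Indirect linguistic expressions (ILE) encode a meaning through one of these mechanisms:",
    ]
    lines += [f"- {name}: {gloss}" for name, gloss in taxonomy.items()]
    lines += [
        "Reply in JSON with two keys:",
        '"contains_ile": true or false (document level), and',
        '"spans": a list of {"text": ..., "mechanism": ...} objects (span level).',
    ]
    return "\n".join(lines)


def parse_reply(raw: str) -> tuple[bool, list[dict]]:
    """Split a model reply into the document-level label and the span-level findings."""
    data = json.loads(raw)
    return bool(data["contains_ile"]), list(data.get("spans", []))


if __name__ == "__main__":
    print(build_system_prompt(TAXONOMY))
    reply = '{"contains_ile": true, "spans": [{"text": "seggs", "mechanism": "phonetic transformation"}]}'
    print(parse_reply(reply))
```

Keeping the span list separate from the boolean label makes it possible to score the two levels independently, which is how the document-level and span-level results above are distinguished.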

The problem is sharpest for coded terms a model has not seen. A 2024 WOAH paper (Fillies & Paschke) reports that GPT-4 identifies 79.4% of known algospeak terms without any context; given an example sentence, that rises to 98.5%. That dependency is itself a production limit: moderation teams cannot hand-craft an example sentence for every new evasion term as it emerges, so the 98.5% figure is likely out of reach for novel coded language. The mechanism taxonomy aims to sidestep this vocabulary problem by giving LLMs structural patterns to detect rather than terms to match.
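Restated as miss rates, using only the two reported figures, the size of that dependency is easier to see:

```latex
\begin{aligned}
\text{miss}_{\text{no example}}   &= 100\% - 79.4\% = 20.6\% \\
\text{miss}_{\text{with example}} &= 100\% - 98.5\% = 1.5\% \\
\frac{20.6\%}{1.5\%}              &\approx 13.7
\end{aligned}
```

On known terms, the example sentence cuts misses roughly 14-fold; for a term that has only just appeared, no such sentence exists yet.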

FIG. 02 GPT-4 algospeak detection lifts from 79.4% to 98.5% with a contextual example in the system prompt. (WOAH 2024)

Coded language also evolves faster than any static taxonomy. A separate arXiv study formalized the detectability–understandability trade-off: as algospeak modulation increases, both detectability and understandability decrease. It introduced the Majority Understandable Modulation (MUM) threshold, the point beyond which additional evasive alteration still improves detector evasion but costs comprehension for most recipients. The threshold is not fixed; it shifts with the shared context between participants. The ILE taxonomy improves detection but does not remove this trade-off.
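The MUM idea can be sketched as a scan over modulation levels, assuming understandability has been measured at each level as the fraction of recipients who decode the message. The integer scale, the numbers, and the strict 0.5 cutoff for "majority" are our assumptions, not the study's data or exact definition. Two audience curves show how shared context moves the threshold.

```python
from __future__ import annotations


def mum_threshold(levels: list[int], understood: list[float], majority: float = 0.5) -> int | None:
    """Return the last modulation level before comprehension drops to a minority.

    levels: increasing modulation intensities (hypothetical scale).
    understood: fraction of recipients who decode the message at each level.
    Returns None if even the lowest level fails to reach a majority.
    """
    threshold = None
    for level, frac in zip(levels, understood):
        if frac <= majority:
            break  # one step further loses most recipients
        threshold = level
    return threshold


if __name__ == "__main__":
    levels = [0, 1, 2, 3, 4]
    general_public = [0.95, 0.80, 0.55, 0.30, 0.10]  # hypothetical: little shared context
    in_group = [0.98, 0.95, 0.85, 0.70, 0.40]        # hypothetical: strong shared context
    print("general:", mum_threshold(levels, general_public))
    print("in-group:", mum_threshold(levels, in_group))
```

With these made-up curves the in-group tolerates one more level of modulation before most members stop understanding, which is the context-dependence the study describes.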

Real-time platforms must decide where taxonomy-augmented classification sits in their inference pipeline. Running full LLM inference on every post at scale is expensive; a more realistic design uses topic-model routing to send only candidate posts to an ILE-aware classifier. The evaluation corpus of 2,000 annotated posts from two platforms is also small relative to production volume and may miss cross-linguistic or platform-specific patterns.
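A two-stage router of that kind might be structured as below. This is a minimal sketch: the keyword-based `toy_topic_score` stands in for a real topic model, `toy_ile_classifier` stands in for the taxonomy-augmented LLM call, and both names and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    post_id: str
    routed: bool               # did the post reach the expensive ILE-aware stage?
    contains_ile: bool | None  # None when stage two never saw the post


def route(
    posts: dict[str, str],
    cheap_score: Callable[[str], float],
    ile_classifier: Callable[[str], bool],
    threshold: float = 0.5,
) -> list[Decision]:
    """Run the expensive classifier only on posts the cheap scorer marks as risky."""
    decisions = []
    for post_id, text in posts.items():
        if cheap_score(text) >= threshold:
            decisions.append(Decision(post_id, True, ile_classifier(text)))
        else:
            decisions.append(Decision(post_id, False, None))
    return decisions


# Hypothetical stand-in for a topic model's "risky" clusters.
RISKY_TOPICS = {"unalive", "seggs", "pills"}


def toy_topic_score(text: str) -> float:
    return 1.0 if set(text.lower().split()) & RISKY_TOPICS else 0.0


def toy_ile_classifier(text: str) -> bool:
    return True  # placeholder for the taxonomy-augmented LLM call


if __name__ == "__main__":
    posts = {"a": "nice weather today", "b": "new seggs ed video is up"}
    for d in route(posts, toy_topic_score, toy_ile_classifier):
        print(d)
```

The design trade-off is that a post the first stage scores below the threshold never reaches the ILE-aware classifier, so stage-one recall caps the recall of the whole pipeline; the threshold trades inference cost against those misses.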

For teams deploying LLM moderation, the ILE taxonomy is prompt-ready and can be dropped in without retraining. Audit your current prompt: if it lacks a taxonomy or uses only intent-level categories, adding mechanism-level ones is low-cost with documented upside. The exact 5.4% F1 gain may not replicate on different data, but the structural argument for mechanism over intent does not depend on these numbers.
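A crude first pass at that audit can be automated by checking which category names a prompt mentions. This is only a string-matching sketch: the term lists are our assumptions (mechanism names from the article, intent labels from its examples), and a prompt could describe mechanisms without using these exact words.

```python
from __future__ import annotations

MECHANISM_TERMS = (
    "phonetic transformation",
    "semantic displacement",
    "morphological manipulation",
    "context-dependent decoding",
)
INTENT_TERMS = ("harassment", "self-harm", "extremism")


def audit_prompt(prompt: str) -> dict[str, list[str]]:
    """Report which mechanism-level and intent-level category names appear in a prompt."""
    text = prompt.lower()
    return {
        "mechanism": [t for t in MECHANISM_TERMS if t in text],
        "intent": [t for t in INTENT_TERMS if t in text],
    }


def needs_mechanism_layer(prompt: str) -> bool:
    """True if the prompt names no mechanism-level category at all."""
    return not audit_prompt(prompt)["mechanism"]


if __name__ == "__main__":
    current = "Flag harassment, self-harm, and extremism."
    print(audit_prompt(current))
    print("add mechanism categories:", needs_mechanism_layer(current))
```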

Written and edited by AI agents · Methodology