Research accepted at the ACM Conference on Fairness, Accountability, and Transparency (FAccT '26) finds that widely deployed large language models portray Global Majority nationalities in subordinated character roles more than 50 times as often as in dominant roles — a structural bias that standard benchmarks and vendor safety ratings do not capture.

The study, authored by researchers from Brown University, George Mason University, and the Young Data Scientists League, ran two parallel investigations. Study 1 analyzed 500,000 LLM-generated narratives produced by GPT-3.5, GPT-4, Llama 2, Claude 2, and PaLM 2 in response to open-ended prompts seeded with US-centric nationality cues such as "American." Study 2 generated 292,500 narratives using GPT-4.1-Nano across all 195 globally recognized nations, enabling direct cross-national comparison. A fine-tuned GPT-4.1-Mini model served as the extraction layer, tagging nationality references across the full corpus.

FIG. 02 Two-study pipeline: 792,500 LLM-generated narratives across five frontier models and 195 nations fed the final 50× bias finding. — Brown University / George Mason University / Young Data Scientists League, 2025
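At this corpus size the extraction step has to run programmatically. Below is a minimal sketch of the generate-then-tag pipeline, assuming the OpenAI chat completions API; the prompt wording and the fine-tune model ID are placeholders, not the authors' released code.

```python
from openai import OpenAI

client = OpenAI()

def generate_narrative(nationality: str) -> str:
    # Stage 1: elicit an open-ended story seeded with a nationality cue.
    # The prompt template is an assumption, not the paper's exact wording.
    resp = client.chat.completions.create(
        model="gpt-4.1-nano",  # Study 2's generator model
        messages=[{"role": "user",
                   "content": f"Write a short story about a {nationality} character."}],
    )
    return resp.choices[0].message.content

def tag_nationalities(narrative: str) -> list[str]:
    # Stage 2: extraction layer. The fine-tune ID below is hypothetical;
    # the study used a fine-tuned GPT-4.1-Mini for this step.
    resp = client.chat.completions.create(
        model="ft:gpt-4.1-mini:org::placeholder",  # hypothetical fine-tune ID
        messages=[{"role": "user",
                   "content": "List each nationality referenced, one per line:\n\n"
                              + narrative}],
    )
    return resp.choices[0].message.content.splitlines()
```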

The pattern was consistent across models: Global Majority national identities are underrepresented in power-neutral story contexts and overrepresented in subordinated character portrayals. The 50× subordination ratio held regardless of which frontier model generated the text. The researchers ruled out prompt sycophancy as an explanation — when US nationality cues were replaced with non-US national identities, the US-centric bias persisted, indicating the skew is embedded in model weights rather than being a surface-level response to explicit framing.
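A hedged sketch of that cue-swap control: hold the prompt template fixed, vary only the nationality cue, and compare role distributions. The template, the cue list, and the caller-supplied generate and classify_role functions are illustrative assumptions, not the study's protocol.

```python
from collections import Counter

# Illustrative template and cues; the study's prompts differed.
TEMPLATE = "Write a short story about a {cue} engineer and their team."
CUES = ["American", "Nigerian", "Bangladeshi", "Bolivian"]

def cue_swap_audit(generate, classify_role, n: int = 100) -> dict[str, Counter]:
    """generate(prompt) -> narrative text; classify_role(text) ->
    'dominant' | 'subordinated' | 'neutral'. Both are caller-supplied."""
    counts = {cue: Counter() for cue in CUES}
    for cue in CUES:
        for _ in range(n):  # small sample per cue; the study used far more
            story = generate(TEMPLATE.format(cue=cue))
            counts[cue][classify_role(story)] += 1
    return counts
```

If the skew persists when "American" is swapped out, the bias sits in the weights, not in sycophantic accommodation of the explicit cue.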

FIG. 03 Global Majority nationalities appear in subordinated character roles 50× more often than in dominant roles across tested frontier LLMs. — arxiv.org/html/2604.22749

The enterprise risk is direct. In October 2024, the US Department of Homeland Security completed a pilot program using generative AI to train immigration officers in simulated interviews with virtual refugee personas — the deployment context the paper examines. Any organization using LLMs to draft customer-facing content, generate employee personas, synthesize case summaries, or support government-adjacent workflows faces the same representational distortions the study documents.

The benchmark miss is the finding with the sharpest operational edge. Teams relying on off-the-shelf fairness evaluations or vendor-supplied safety scorecards will not see this class of bias in their outputs. Existing evaluation methodologies are not designed to probe cross-national narrative bias at scale; internal red-teaming will also underperform unless it constructs prompts across the nationality dimension at narrative length. Procurement teams and legal counsel should treat that gap as open exposure under EU AI Act Article 10 data governance requirements and emerging US federal AI accountability frameworks.
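One way a red team could operationalize that probe, sketched under the assumption that generated narratives have already been labeled by character role; the metric and threshold are illustrative, not the paper's statistic.

```python
def subordination_ratio(labels: list[str]) -> float:
    """Ratio of subordinated to dominant portrayals for one nationality."""
    dom = labels.count("dominant")
    sub = labels.count("subordinated")
    return sub / dom if dom else float("inf")

def flag_skew(results: dict[str, list[str]],
              threshold: float = 2.0) -> dict[str, float]:
    """results maps nationality -> per-narrative role labels.
    Returns nationalities whose ratio exceeds the tolerance threshold."""
    flagged = {}
    for nationality, labels in results.items():
        ratio = subordination_ratio(labels)
        if ratio > threshold:
            flagged[nationality] = ratio
    return flagged
```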

The authors open-sourced the full dataset — 792,500 narratives in total — and the fine-tuning and analysis code on GitHub and HuggingFace, enabling independent audit replication by enterprise AI teams. The paper will be presented at FAccT '26 in Montreal in June 2026. The research leaves open whether retrieval-augmented generation pipelines drawing from more diverse corpora materially reduce the bias, or whether the distortion re-emerges at inference time regardless of retrieval source — a question vendors have not answered publicly.
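For teams attempting that replication, pulling the corpus would look roughly like the following; the dataset identifier is a placeholder, since the release name is not specified here.

```python
from datasets import load_dataset

# Placeholder ID; substitute the identifier from the authors' HuggingFace release.
ds = load_dataset("authors/llm-nationality-narratives")
print(sum(split.num_rows for split in ds.values()))  # expect 792,500 total
```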

For CTOs and AI architects running frontier LLMs in production, the study closes the "we didn't know" defense. The models named — GPT-3.5, GPT-4, Llama 2, Claude 2, PaLM 2 — are the same ones in enterprise contracts today. Subordinated narrative generation is not an edge case; it is the default.
