A seven-author team anchored at the University of Washington, including linguist Emily M. Bender, published a 26-page open-access commentary in Language — the Linguistic Society of America's flagship peer-reviewed journal — on April 27, laying out a methodological framework for how linguistics and NLP researchers should conceptualize, collect, and report race and ethnicity data.
The commentary, "Enhancing linguistic research through critical use of race and ethnicity information," is organized around three research phases: foundations before research starts, study design and conduct, and post-research considerations. Authors span UW's Departments of Linguistics and Psychology, plus Kirby Conrod of Swarthmore College, with Robert Squizzero as corresponding author. The paper is published under a CC BY-SA open-access license, which permits adaptation and derivative works provided attribution and share-alike terms are met.
The central argument is that race and ethnicity are not data-quality problems solvable by choosing better category labels. The authors contend that researchers routinely fall back on "undertheorized and/or essentialized racial categories rather than ones grounded in an understanding of how racialization functions in the community they are studying." Classifying speakers by appearance, place of birth, or convenience, they write, risks excluding valid community members from samples and thereby "biasing or otherwise damaging our empirical results."
For enterprise AI and NLP teams, the practical weight lands on corpus construction. Computational and corpus-based linguistics receives explicit treatment alongside formal, experimental, and qualitative subfields. The framework calls for locally constructed labels, documented analyst positionality, and explicit definition of participant communities before sampling begins — requirements that map directly onto data-card and model-card disclosure norms now demanded by enterprise procurement and AI governance frameworks.
The paper frames undersampling not just as an ethical exposure but as an accuracy problem: inadequate demographic framing degrades descriptive precision for the populations least represented in existing corpora, which tend to be the same populations where production model failure rates are highest. AI risk and red-team functions benchmarking dialect and demographic robustness will find the framing operationally useful.
The framework has practical limits. The commentary targets academic linguists and its guidance is qualitative — no scoring rubrics, mandatory disclosure fields, or quantitative thresholds are defined. Translating recommendations into audit checklists, procurement criteria, or structured dataset metadata schemas requires additional effort from standards bodies or enterprise data-governance teams. The paper's open license lowers that barrier.
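To make the translation gap concrete, here is a minimal sketch of what a structured disclosure record derived from the commentary's recommendations might look like. All field names and the validation rules are illustrative assumptions, not a schema defined by the paper or any standards body:

```python
from dataclasses import dataclass

# Hypothetical data-card fields reflecting the commentary's themes:
# a community defined before sampling, locally constructed labels,
# self-identification rather than analyst-assigned categories, and
# documented analyst positionality. Names are illustrative only.

@dataclass
class RaceEthnicityDisclosure:
    community_definition: str          # who counts as a community member, set before sampling
    label_source: str                  # e.g. "self-identification", not appearance or convenience
    labels_locally_constructed: bool   # categories grounded in local racialization, not defaults
    analyst_positionality: str         # researchers' documented relationship to the community

def validate(card: RaceEthnicityDisclosure) -> list[str]:
    """Return gaps a reviewer or audit checklist might flag."""
    gaps = []
    if not card.community_definition:
        gaps.append("community undefined before sampling")
    if card.label_source != "self-identification":
        gaps.append("labels not self-identified")
    if not card.labels_locally_constructed:
        gaps.append("categories not locally grounded")
    if not card.analyst_positionality:
        gaps.append("positionality undocumented")
    return gaps

card = RaceEthnicityDisclosure(
    community_definition="Speakers who self-identify as members of the study community",
    label_source="self-identification",
    labels_locally_constructed=True,
    analyst_positionality="Documented in the accompanying data statement",
)
print(validate(card))  # []
```

A governance team would still have to decide which fields are mandatory and how they nest inside existing data-card templates; the sketch only shows that the qualitative guidance can be expressed as checkable metadata.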
Foundation-model developers face sharpening regulatory and reputational scrutiny over training-data provenance and demographic representation, and the paper's timing applies deliberate pressure. A peer-reviewed framework published in a flagship linguistics journal gives compliance functions a citable external reference — one with more standing in procurement and audit contexts than internal guidelines alone. Translating those recommendations into tooling and standards infrastructure remains open work.
Written and edited by AI agents · Methodology