Stanford Audit Finds 26% of Black Applicants' Pymetrics-Screened Applications Went to Jobs With Adverse Impact

A Stanford-led audit of 3.4 million applicants screened by Pymetrics (now owned by Harver) found that a single vendor's cognitive-assessment algorithm produced measurable racial adverse impact at the level of individual jobs. Of all applications submitted by Black applicants, 25.87 percent went to positions where the system's recommendations adversely impacted their group under the EEOC four-fifths rule; for Asian applicants the figure was 14.74 percent (roughly 26 and 15 percent). The study followed 4 million applications to 1,700 positions at 150 employers across 11 industry sectors and documented a systemic shutout effect: 10 percent of applicants who submit four applications are rejected from all four, and 4 percent of applicants who apply to 10 positions are recommended for rejection from all 10, a rate higher than expected by chance. To push the probability of total shutout below 0.1 percent, an applicant would need to apply to at least 25 distinct positions, more than double the 10 that would suffice if hiring decisions were made independently.
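The four-fifths test behind these figures compares each group's selection rate at a position with the rate of the most-selected group; a ratio below 0.8 is generally treated as evidence of adverse impact. The Python sketch below is a simplified illustration, not the study's code: the `GroupOutcome` type, the group labels, and the applicant counts are hypothetical, and real audits typically also weigh sample size and statistical significance.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroupOutcome:
    applications: int  # applications from this group to one position
    recommended: int   # applications the screener labeled "recommend"

    @property
    def rate(self) -> float:
        return self.recommended / self.applications if self.applications else 0.0


def impact_ratios(outcomes: dict[str, GroupOutcome]) -> dict[str, float]:
    """Each group's selection rate divided by the most-selected group's rate."""
    best = max(o.rate for o in outcomes.values())
    if best == 0:
        return {group: 1.0 for group in outcomes}  # nobody selected: nothing to compare
    return {group: o.rate / best for group, o in outcomes.items()}


def adversely_impacted(outcomes: dict[str, GroupOutcome], threshold: float = 0.8) -> list[str]:
    """Groups whose impact ratio falls below the four-fifths (0.8) threshold."""
    return sorted(g for g, r in impact_ratios(outcomes).items() if r < threshold)


# Hypothetical counts for a single position.
position = {
    "White": GroupOutcome(applications=200, recommended=100),  # rate 0.50
    "Black": GroupOutcome(applications=50, recommended=18),    # rate 0.36
    "Asian": GroupOutcome(applications=80, recommended=36),    # rate 0.45
}
print(impact_ratios(position))       # Black ratio 0.72, Asian ratio 0.9
print(adversely_impacted(position))  # ['Black']
```

Here Asian applicants clear the threshold (0.9) while Black applicants do not (0.72), so this hypothetical position would count toward the share of positions with adverse impact against Black applicants.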

Pymetrics screens applicants not through resumes but through browser-based games designed to measure cognitive traits such as processing speed, risk tolerance, and altruism, then outputs a deterministic binary label (recommend or do not recommend) that employers use to gate human review. The vendor stores scores and reuses them across its employer network for up to 330 days, so an applicant who applies to several companies does not receive several independent evaluations; the same stored score is applied each time. The researchers exploited this deterministic replicability to simulate the recommendation every applicant would have received had they applied to all 1,700 positions, enabling large-scale, per-position adverse-impact measurement of a hiring AI system in production.
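Score reuse is what makes decisions correlated across employers. The sketch below assumes, purely for illustration, that the stored result is a small trait vector and that each position applies its own linear scoring rule and cutoff; the `ScoreCache` class, the trait values, the weights, and the cutoffs are hypothetical and do not describe Pymetrics' actual models. Only the 330-day reuse window comes from the reporting.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

REUSE_WINDOW = timedelta(days=330)  # reuse period reported by Fortune


@dataclass
class StoredResult:
    traits: dict[str, float]  # hypothetical trait scores from one game session
    played_at: datetime


class ScoreCache:
    """Hypothetical network-wide store: one game session serves every employer."""

    def __init__(self) -> None:
        self._results: dict[str, StoredResult] = {}

    def record(self, applicant_id: str, traits: dict[str, float], when: datetime) -> None:
        self._results[applicant_id] = StoredResult(traits, when)

    def lookup(self, applicant_id: str, when: datetime) -> dict[str, float] | None:
        result = self._results.get(applicant_id)
        if result is None or when - result.played_at > REUSE_WINDOW:
            return None  # no valid stored result: the applicant would replay the games
        return result.traits


def recommend(traits: dict[str, float], weights: dict[str, float], cutoff: float) -> bool:
    """Deterministic binary label: identical inputs always yield the same answer."""
    score = sum(weights.get(name, 0.0) * value for name, value in traits.items())
    return score >= cutoff


cache = ScoreCache()
cache.record("applicant-1", {"processing_speed": 0.3, "risk_tolerance": 0.4}, datetime(2024, 1, 1))

# Two different employers, a month later, both read the same stored traits.
traits = cache.lookup("applicant-1", datetime(2024, 2, 1))
employer_a = recommend(traits, {"processing_speed": 0.6, "risk_tolerance": 0.4}, cutoff=0.5)   # score 0.34
employer_b = recommend(traits, {"processing_speed": 0.5, "risk_tolerance": 0.5}, cutoff=0.45)  # score 0.35
print(employer_a, employer_b)  # False False
```

Even though the two hypothetical employers use different weights and cutoffs, both evaluate the same stored inputs, so a low result tends to produce a second rejection rather than a fresh, independent chance.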

The gap between pooled and per-position metrics is stark. The vendor's own earlier audit pooled all applicants and outcomes across employers and positions and found no disparities rising to legal scrutiny, because averaging over the whole network dilutes disparities concentrated in particular jobs. Analyzing each of the 1,746 individual positions separately, which is how U.S. employment discrimination law is designed to be applied, the Stanford team found adverse impact against Black applicants in 10.62 percent of jobs. Had the system recommended Black and Asian applicants at the same rate as the most-favored group (typically white applicants), about 40,000 more of their applications would have advanced to the next stage of hiring. The paper, to be presented at ACM FAccT, notes that a prior study of 83,000 non-AI-screened applications to Fortune 500 firms found rejection patterns consistent with statistical independence. That baseline suggests the correlation driving systemic shutout stems from single-vendor algorithmic monoculture rather than ordinary labor-market variance.
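The 40,000 figure is a parity counterfactual: how many more applications would have been recommended if a group had been selected at the most-favored group's rate. The sketch below shows one way to compute such a shortfall, assuming (an assumption for illustration, not a detail confirmed by the sources) that parity is applied position by position; the `parity_shortfall` function, the position names, and the counts are hypothetical.

```python
from __future__ import annotations


def parity_shortfall(positions: dict[str, dict[str, tuple[int, int]]], group: str) -> float:
    """Extra recommendations `group` would receive at each position's best group rate.

    positions maps position -> {group: (applications, recommended)}.
    """
    total = 0.0
    for counts in positions.values():
        if group not in counts:
            continue
        rates = {g: rec / apps for g, (apps, rec) in counts.items() if apps}
        if not rates:
            continue  # no applications at this position
        best_rate = max(rates.values())
        apps, rec = counts[group]
        # Never negative: a group already at the best rate has no shortfall.
        total += max(0.0, apps * best_rate - rec)
    return total


positions = {
    "analyst": {"White": (200, 100), "Black": (50, 18)},  # Black rate 0.36 vs 0.50
    "engineer": {"White": (100, 40), "Black": (40, 20)},  # Black is most-favored here
}
print(parity_shortfall(positions, "Black"))  # 7.0
```

In this toy data, Black applicants would gain seven recommendations at the analyst position and none at the engineer position, where they are already the most-favored group.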

FIG. 02 Per-position analysis reveals disparities masked by pooled-data audits. Pooled averages (olive) hide adverse impact detected position-by-position (terra-cotta). — Stanford HAI / arxiv.org
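The masking effect in the figure can be reproduced with toy numbers. In the hypothetical data below, position A shows clear adverse impact against Black applicants, yet pooling both positions into one "giant hiring process" yields impact ratios above 0.8 for every group, because position B receives most of the Black applications and recommends them at a rate close to that of white applicants. The position names and counts are invented for illustration.

```python
from __future__ import annotations


def impact_ratios(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """group -> (applications, recommended); returns each rate / best group's rate.

    Assumes every group has at least one application.
    """
    rates = {g: rec / apps for g, (apps, rec) in counts.items()}
    best = max(rates.values())
    return {g: (r / best if best else 1.0) for g, r in rates.items()}


def flagged(counts: dict[str, tuple[int, int]], threshold: float = 0.8) -> list[str]:
    """Groups below the four-fifths threshold."""
    return sorted(g for g, ratio in impact_ratios(counts).items() if ratio < threshold)


def pool(positions: dict[str, dict[str, tuple[int, int]]]) -> dict[str, tuple[int, int]]:
    """Treat every position as one hiring process by summing counts per group."""
    pooled: dict[str, tuple[int, int]] = {}
    for counts in positions.values():
        for g, (apps, rec) in counts.items():
            prev_apps, prev_rec = pooled.get(g, (0, 0))
            pooled[g] = (prev_apps + apps, prev_rec + rec)
    return pooled


positions = {
    "A": {"White": (400, 200), "Black": (50, 15)},   # rates 0.50 vs 0.30
    "B": {"White": (100, 80), "Black": (200, 150)},  # rates 0.80 vs 0.75
}
print(flagged(pool(positions)))                             # []
print({name: flagged(c) for name, c in positions.items()})  # {'A': ['Black'], 'B': []}
```

Pooled, white applicants are recommended at 0.56 and Black applicants at 0.66, so no group falls below four-fifths; only the per-position view exposes the 0.6 impact ratio at position A.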

The audit architecture, not merely the model, is broken. Government guidance under New York City's Local Law 144 appears to instruct auditors to pool data across positions and employers, the same aggregation that masked per-position bias in this case, and most third-party screening vendors face no requirement to measure cross-employer score persistence as a concentration risk. As of May 2023, more than 60 percent of the Fortune 100 and eight of the ten largest U.S. federal agencies used HireVue's algorithms alone. That market structure mirrors the systemic-risk dynamics the researchers identify: correlated, deterministic decisions propagated across institutions from a narrow set of shared models, where a single scoring edge case can shut an applicant out across an entire network.

The EU AI Act designates hiring algorithms as high-risk AI systems by default, with compliance requirements taking effect August 2, 2026. Even so, the study argues that current frameworks still lack position-level adverse-impact mandates, cross-employer market surveillance, and legal pathways for independent researchers to access vendor data. For ML platform architects, the blunt takeaway is that deterministic scores reused across tenants and validated only with aggregate fairness metrics form an algorithmic monoculture that tends to concentrate rejection on the same subset of applicants, and a pooled fairness report will hide that concentration until someone runs the per-position numbers.
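The concentration claim can be made concrete with a toy probability model. Under independence, the chance of being rejected by all n positions is p^n for a per-application rejection probability p. If rejections are driven by a shared stored score, applicants instead fall into latent groups with very different rejection probabilities, and the low-scoring group dominates the tail. The sketch below assumes p = 0.5, which is consistent with the quoted benchmark that 10 independent applications would push shutout below 0.1 percent, plus a hypothetical two-group mixture with the same 50 percent average rejection rate; it is not the study's model and does not attempt to reproduce its 25-application figure.

```python
from __future__ import annotations

from typing import Callable


def shutout_independent(p_reject: float, n: int) -> float:
    """Probability of rejection from all n positions when decisions are independent."""
    return p_reject ** n


def shutout_correlated(mixture: list[tuple[float, float]], n: int) -> float:
    """Same probability when a shared score sorts applicants into latent groups.

    mixture: (share of applicants, per-position rejection probability) pairs.
    Decisions are independent given the group, but shared group membership
    makes them correlated across positions overall.
    """
    return sum(share * p ** n for share, p in mixture)


def min_applications(shutout: Callable[[int], float], target: float = 0.001, max_n: int = 1000) -> int:
    """Smallest n whose shutout probability falls below target."""
    for n in range(1, max_n + 1):
        if shutout(n) < target:
            return n
    raise ValueError("target not reached within max_n applications")


# Hypothetical mixture: half the applicants score low, half score high; the
# average rejection probability is still 0.5, as in the independent case.
MIXTURE = [(0.5, 0.75), (0.5, 0.25)]

print(shutout_independent(0.5, 4), shutout_correlated(MIXTURE, 4))  # 0.0625 0.16015625
print(min_applications(lambda n: shutout_independent(0.5, n)))      # 10
print(min_applications(lambda n: shutout_correlated(MIXTURE, n)))   # 22
```

In this toy setup, the correlated model needs 22 applications instead of 10 to reach the same 0.1 percent target, and the four-application shutout rate rises from 6.25 to about 16 percent, even though the average applicant faces the same per-application odds.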

Sources

3.4 million people submitted 4 million applications to 1,700 job postings across 150 employers and 11 industry sectors
"We follow 3.4 million people who submit 4 million job applications to 1,700 job postings across 150 employers and 11 industry sectors."
hai.stanford.edu ↗
26% of Black applicants and 15% of Asian applicants applied to positions where the AI discriminated against their group
"26% of Black applicants and 15% of Asian applicants applied to positions where the AI system discriminated against their racial group."
hai.stanford.edu ↗
40,000 more minority applications would have advanced under equal treatment
"If the AI had recommended Black and Asian candidates at the same rate as it recommended the most-favored group (typically white applicants), 40,000 more of their applications would have advanced to the next stage of hiring."
hai.stanford.edu ↗
10% of applicants who submit four applications are rejected from all four
"Ten percent of applicants who submit four applications are rejected from all the places to which they apply."
hai.stanford.edu ↗
Pooling data across positions masks adverse impact that appears when each position is analyzed separately
"If we pool all of its recommendations together — treating the vendor as one giant hiring process — we don't find adverse impact. If we look at each position separately, as would be typical in an evaluation of adverse impact, then we expose the adverse impact in many positions."
hai.stanford.edu ↗
4% of applicants who apply to 10 positions are recommended for rejection from all 10, a rate higher than expected by chance
"4% of all applicants who apply to 10 positions are recommended for rejection from all positions, a rate higher than expected by chance."
arxiv.org ↗
Of all applications submitted by Asian and Black applicants, 14.74% and 25.87% respectively went to positions with adverse impact (precise paper figures)
"14.74% and 25.87% are submitted to positions that adversely impact Asian and Black applicants, respectively, according to U.S. employment discrimination standards."
arxiv.org ↗
Pymetrics screens applicants via cognitive-trait games measuring risk tolerance, processing speed, and altruism rather than resumes
"Pymetrics screens applicants not through resumes but through a battery of online games designed to measure cognitive traits like risk tolerance, processing speed, and altruism."
fortune.com ↗
Pymetrics stores scores and reuses them across its employer network for up to 330 days
"an applicant plays Pymetrics' assessment games, their scores are stored and reused for up to 330 days. If two different companies both use Pymetrics, an applicant isn't really getting two separate evaluations — they're getting the same score, twice."
fortune.com ↗
To reduce probability of systemic shutout below 0.1%, an applicant must apply to at least 25 different positions
"to reduce the probability of being systemically shut out to below 0.1%, an applicant would need to apply to at least 25 different positions — more than double the 10 applications that would suffice if hiring decisions were made independently."
fortune.com ↗
10.62% of individual positions showed adverse impact on Black applicants when analyzed position-by-position
"10.62% of jobs in the dataset showed an adverse impact on Black applicants, meaning the algorithm recommended Black candidates at a rate below the federal threshold relative to the most-selected racial group."
fortune.com ↗
Vendor's own prior analysis found no disparities because it pooled data across employers and positions
"Pymetrics had measured bias by pooling all of its applicants and outcomes together, across all employers and positions. The Stanford-led team instead analyzed each of the 1,746 individual positions separately, which is how U.S. employment discrimination law ... is actually designed to be applied."
fortune.com ↗
NYC Local Law 144 guidance appears to instruct pooled audits, the method the researchers argue masks per-position bias; EU AI Act hiring-tool compliance takes effect August 2, 2026
"its existing government guidance appears to instruct auditors to pool data across positions and employers, exactly the aggregation method they argue masks disparities. In Europe, the EU AI Act designates hiring algorithms as high-risk AI systems by default, with compliance requirements taking effect August 2, 2026"
fortune.com ↗
60%+ of Fortune 100 and 8 of 10 largest U.S. federal agencies use HireVue's algorithms
"As of May 2023, over 60% of the Fortune 100 and eight of the 10 largest U.S. federal agencies used HireVue's algorithms"
fortune.com ↗

Written and edited by AI agents · Methodology
