Algorithmic Bias in Clinical AI: The Health Equity Risk

PM Takeaways

The NAIC’s 2025 survey found one in three US health insurers still don’t regularly test AI for bias despite the Model Bulletin requiring it. EU AI Act Article 10 makes bias testing a legal obligation for life and health pricing AI. Colorado’s AG can enforce algorithmic discrimination outcomes from June 2026. The gap between deployment and testing is now an active enforcement risk.
Excluding protected characteristics from model inputs doesn’t eliminate bias — it hides the mechanism. ZIP codes proxy for race through residential segregation. Credit scores proxy for wealth gaps built by historical lending discrimination. The only meaningful test is whether the model produces materially different outcomes for protected groups controlling for legitimate risk factors.
Colorado’s outcomes-based standard, enforceable from June 2026, places the burden on the insurer to demonstrate that the model does not produce discriminatory outcomes, not merely that it avoids discriminatory inputs. Outcomes, not inputs, are what regulators are testing.
Bias testing must use the actual deployment population, not the training population. A model that tests clean nationally may produce disparate impact in your specific geographic market. Bias testing at launch is a gate. Quarterly production monitoring is the ongoing obligation.
The IAIS Application Paper (July 2025) identifies adaptive insurance AI as carrying heightened bias risk — models that recalibrate after deployment may deviate from the validated version. Bias re-testing after every material model update is required, not just at initial deployment.

Insurance has prohibited unfair discrimination for as long as it has existed as a regulated industry. The principle is embedded in every major insurance regulatory framework: like risks must be treated alike; premiums must reflect risk factors, not irrelevant personal characteristics; coverage cannot be denied based on protected class membership. AI does not change this principle. It creates new and more complex ways to violate it.

The traditional discrimination concern in insurance was explicit: an insurer that directly uses race, gender, or national origin to set premiums or deny coverage is violating anti-discrimination law. AI creates a different problem: models that achieve discriminatory outcomes through variables that appear facially neutral but correlate with protected characteristics in the real world. This is proxy discrimination, and it is harder to detect, harder to prove, and — precisely because it is harder to see — more likely to persist undetected in deployed models.

This article addresses the mechanics of bias in insurance AI, the documented cases, the regulatory requirements across jurisdictions, and the PM governance obligations that address this risk.

How Bias Enters Insurance AI

Historical Training Data Encoding Historical Discrimination

Insurance claims data, loss data, and underwriting data were generated by human processes that have documented histories of discriminatory practice — redlining in property insurance, discriminatory underwriting in life insurance, geographic risk classification that encoded racial segregation. AI trained on this historical data learns these patterns. The model did not invent the discrimination; it learned it from the data. But the outcome — systematically charging more or denying coverage to protected groups — is the same, and the regulatory and legal response is the same.

External Consumer Data as Proxy

The modern expansion of insurance AI is driven partly by external consumer data: credit bureaus, consumer behavior databases, property data, telematics, social media, IoT device data. These data sources are purchased from third-party data brokers and integrated into pricing and underwriting models as additional predictive features. The FCA, examining UK general insurance pricing practices, found that some firms were using datasets — including purchased third-party datasets — that contained factors that could implicitly or explicitly relate to race or ethnicity.

The proxy mechanism is well documented. ZIP codes in US auto and property insurance correlate with race through residential segregation patterns. Credit scores correlate with wealth accumulation, which correlates with race through documented historical patterns of discriminatory lending. Occupation codes correlate with national origin and gender through labor market patterns. Social media engagement patterns correlate with age, political affiliation, and socioeconomic status. None of these variables mention a protected characteristic by name. All of them can produce discriminatory outcomes through correlation.

Intersectionality and Interaction Effects

Advanced AI models use interaction effects — the combination of two or more variables — that can produce discriminatory outputs even when no single variable is problematic. A model that uses credit score, ZIP code, and social media engagement simultaneously may produce an interaction effect that creates a discriminatory premium differential for a specific demographic combination. Testing each variable individually for bias will not detect this. Testing the model output for demographic parity across all relevant subgroups will.
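
The interaction problem can be illustrated with a small synthetic sketch. Everything here is an assumption made for illustration: the two binary flags (low_credit, high_risk_zip), the groups "A" and "B", and the surcharge rule are hypothetical and do not describe any real model. Each flag on its own is equally common in both groups, so a per-variable check finds nothing, yet the combined rule surcharges only one group.

```python
from collections import defaultdict

# Synthetic applicants: (group, low_credit, high_risk_zip). Illustrative values only.
applicants = (
    [("A", 1, 1)] * 50 + [("A", 0, 0)] * 50      # the two flags co-occur in group A
    + [("B", 1, 0)] * 50 + [("B", 0, 1)] * 50    # the two flags never co-occur in group B
)

def surcharge(low_credit, high_risk_zip):
    """Hypothetical model: surcharge only when both flags are set (an interaction)."""
    return int(low_credit and high_risk_zip)

def rate_by_group(rows, value_fn):
    """Mean of value_fn(row) within each group (group is the first field)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        totals[row[0]] += value_fn(row)
        counts[row[0]] += 1
    return {g: totals[g] / counts[g] for g in counts}

# Per-variable checks: how common is each flag in each group?
credit_rates = rate_by_group(applicants, lambda r: r[1])
zip_rates = rate_by_group(applicants, lambda r: r[2])

# Output-level check: how often does the model surcharge each group?
surcharge_rates = rate_by_group(applicants, lambda r: surcharge(r[1], r[2]))

print("low_credit rate by group:   ", credit_rates)
print("high_risk_zip rate by group:", zip_rates)
print("surcharge rate by group:    ", surcharge_rates)
```

In this toy data the per-variable rates are identical across groups while the surcharge rate is not, which is why the test has to be run on the model output rather than on inputs one at a time.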

The Regulatory Framework Across Jurisdictions

Jurisdiction	Bias Prohibition Standard	Testing Requirement
US (NAIC Model Bulletin)	AI must not produce unfair discrimination in regulated insurance processes; adverse consumer outcomes must be mitigated	Written AIS Program must include bias testing methodology; validation, testing, and retesting at each stage of AI lifecycle; bias analysis and minimization in data practices
New York DFS (Circular Letter 7, 2024)	AI must not proxy for protected classes or generate disproportionate adverse effects	Insurers must demonstrate proxy testing; keep explanatory documentation of how AI functions and its inputs; allow DFS review of vendor tools
Colorado (SB 169 / AI Act 2024)	External consumer data and predictive models may not result in unfair discrimination; outcomes-based standard	Governance and testing requirements; documentation of model inputs; AG enforcement from June 2026
EU AI Act (Annex III, Section 5(c))	Article 10: data governance must address biases likely to affect health and safety or lead to discrimination; applicable to life/health pricing AI	Bias testing required as part of data governance documentation (Annex IV technical file); post-market monitoring for bias must continue after deployment
EU (Gender Directive / Test-Achats)	Direct use of gender prohibited in insurance pricing/risk assessment across EU/EEA	Proxy testing: models that achieve gender differentials through proxies violate the same principle; EIOPA Opinion applies bias governance obligations to proxy outcomes
UK FCA	Consumer Duty: fair outcomes required for all customer groups including protected characteristics; no direct AI-specific bias rule	Proportionate testing expected; FCA found UK insurers using datasets correlating with race/ethnicity in pricing practices — this is a live regulatory concern

The Bias Testing Workstream for Insurance AI

Step 1: Identify Proxy Risks Before Modeling

Before model development, assess the correlation between proposed features and protected characteristics in the deployment population. This is the proxy identification step. It is a data science task, but it requires input from legal and compliance on which protected characteristics are legally relevant in the applicable jurisdiction.
For external consumer data: require the data vendor to provide demographic composition documentation and known correlations with protected characteristics. Under EIOPA’s Opinion, this obligation applies to third-party data as directly as to owned data. If the vendor cannot provide this, the insurer cannot assess the proxy risk.
Flag variables with high correlation to protected characteristics as requiring heightened scrutiny during model development and mandatory inclusion in outcome testing.
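
A first-pass proxy screen might look like the sketch below. It assumes a binary protected-class indicator is available for the deployment population, and it uses a plain Pearson correlation with a hypothetical 0.3 flagging threshold; the feature names, data, and threshold are illustrative assumptions, not regulatory values. A linear screen of this kind will not catch non-linear or interaction-based proxies, so it supplements outcome testing rather than replacing it.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def screen_proxies(features, protected, threshold=0.3):
    """Return features whose absolute correlation with the protected
    indicator meets or exceeds the (hypothetical) threshold."""
    flagged = {}
    for name, values in features.items():
        r = pearson(values, protected)
        if abs(r) >= threshold:
            flagged[name] = round(r, 3)
    return flagged

# Illustrative data: 1 = member of the protected group, 0 = not.
protected = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "zip_median_income": [30, 35, 32, 38, 60, 65, 58, 70],
    "vehicle_age":       [3, 8, 5, 10, 4, 9, 5, 10],
}

flagged = screen_proxies(features, protected)
print(flagged)  # features needing heightened scrutiny and mandatory outcome testing
```

Here the income-by-ZIP feature is flagged and vehicle age is not. The threshold itself is a governance decision to be set with legal and compliance input.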

Step 2: Define Outcome Test Groups and Thresholds Before Testing

Define the demographic groups relevant to the deployment population and applicable anti-discrimination law: race/ethnicity, gender, age, national origin, disability status, religion, sexual orientation (jurisdiction-dependent). For EU operations, add groups protected by the Gender Directive.
Set acceptable differential thresholds before testing. The four-fifths rule (80% rule) from US employment discrimination law is frequently applied to insurance; Colorado’s regulation uses disparate impact analysis. The specific threshold should be set with legal input and documented with the rationale.
Define the outcome metrics to be tested: premium differentials, denial rates, coverage limits, claims approval rates. Test each metric across relevant demographic groups.
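
As a rough illustration of the four-fifths rule mentioned above, the sketch below computes each group's favorable-outcome rate as a ratio of a reference group's rate and flags ratios under 0.8. The group names, counts, and the choice of reference group are hypothetical; the threshold actually used should be the one set and documented with legal input.

```python
def adverse_impact_ratios(favorable, totals, reference):
    """Ratio of each group's favorable-outcome rate to the reference group's rate."""
    ref_rate = favorable[reference] / totals[reference]
    return {g: (favorable[g] / totals[g]) / ref_rate for g in totals}

# Illustrative counts of approved applications and total applications per group.
approved = {"group_a": 450, "group_b": 340, "group_c": 140}
applied = {"group_a": 500, "group_b": 400, "group_c": 200}

THRESHOLD = 0.8  # four-fifths rule, used here as an assumed example threshold

ratios = adverse_impact_ratios(approved, applied, reference="group_a")
flags = {g: r < THRESHOLD for g, r in ratios.items()}

for g in ratios:
    print(f"{g}: ratio={ratios[g]:.3f} flagged={flags[g]}")
```

The same calculation can be repeated for each outcome metric (denial rates, coverage limits, claims approvals), with the favorable outcome defined separately for each.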

Step 3: Conduct Outcome Testing on Deployment Population Data

Test using production population data, not training data. The deployment population may differ demographically from the training population; bias testing must reflect where the model will actually be used.
Apply statistical significance testing. A 3-percentage-point differential observed in a small demographic group may not be statistically significant, while a 2-point differential in a large group may be.
Document test design, methodology, data sources, results, and follow-up actions. This documentation will be produced in market conduct examination and potentially in litigation.
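
The point about group size can be sketched with a standard two-proportion z-test on denial rates. The counts below are invented for illustration, and a real testing plan may call for a different test (for example, one suited to continuous premium amounts or to multiple comparisons); this is only a minimal example of why the same differential can be significant in one group and not in another.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.
    x = number of denials, n = number of applications."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Small group: 13% denial rate (13 of 100) vs 10% in the reference group.
z_small, p_small = two_proportion_z_test(13, 100, 1000, 10000)

# Large group: 12% denial rate (600 of 5000) vs the same 10% reference rate.
z_large, p_large = two_proportion_z_test(600, 5000, 1000, 10000)

print(f"small group: z={z_small:.2f}, p={p_small:.4f}")
print(f"large group: z={z_large:.2f}, p={p_large:.6f}")
```

With these made-up counts, the 3-point gap in the small group does not reach the conventional 0.05 level, while the 2-point gap in the large group does. A non-significant result in a small group is not evidence of fairness; it may only mean the sample is too small to tell.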

Step 4: Investigate and Remediate Findings

A bias finding requires investigation before any deployment decision. There are three possible explanations, each requiring a different response:

Actuarial basis: the differential reflects a genuine risk differential that is legally permissible and defensible. Document the actuarial basis, confirm legal permissibility in each deployment jurisdiction, and retain this documentation.
Proxy mechanism: the differential reflects a proxy variable encoding a protected characteristic rather than a genuine risk differential. Remediate by removing the variable, replacing it with a direct risk measure, or applying algorithmic fairness constraints.
Unknown cause: the differential cannot be explained by either actuarial basis or identified proxy. This is not a deployment decision — it is a model quality problem requiring further investigation.
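
One way to start separating an actuarial basis from a proxy mechanism is to compare outcomes within strata of a legitimate risk factor. The sketch below assumes a hypothetical risk_tier field and invented premiums. If a raw differential disappears within tiers, that is consistent with (but does not prove) an actuarial explanation; if it persists, the proxy investigation continues. The tiering variable itself must also be checked, since a tier built from proxy variables would hide the problem.

```python
from collections import defaultdict

def mean_by_key(rows, key_fn, value_fn):
    """Mean of value_fn(row) grouped by key_fn(row)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[key_fn(row)] += value_fn(row)
        counts[key_fn(row)] += 1
    return {k: sums[k] / counts[k] for k in sums}

def within_tier_gaps(policies, group, reference):
    """Premium gap (group minus reference) inside each risk tier."""
    means = mean_by_key(policies, lambda r: (r[0], r[1]), lambda r: r[2])
    tiers = sorted({tier for tier, _ in means})
    return {
        t: means[(t, group)] - means[(t, reference)]
        for t in tiers
        if (t, group) in means and (t, reference) in means
    }

# Illustrative policies: (risk_tier, demographic_group, annual_premium).
policies = [
    ("low", "ref", 100.0), ("low", "ref", 100.0), ("low", "grp", 100.0),
    ("high", "ref", 200.0), ("high", "grp", 200.0), ("high", "grp", 200.0),
]

overall = mean_by_key(policies, lambda r: r[1], lambda r: r[2])
raw_gap = overall["grp"] - overall["ref"]
tier_gaps = within_tier_gaps(policies, group="grp", reference="ref")

print(f"raw gap: {raw_gap:.2f}")
print(f"within-tier gaps: {tier_gaps}")
```

In this toy data the raw gap is positive but the within-tier gaps are zero, because the group is concentrated in the higher tier. Whichever explanation is reached, the analysis and its limitations belong in the documented record.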

Step 5: Post-Deployment Monitoring

Re-run outcome testing quarterly using production data. Bias characteristics can change as the model encounters a different population mix than was present in testing.
For adaptive AI: require re-testing after every material model update. The IAIS Application Paper specifically identifies adaptive models as a heightened bias risk.
Track complaint and appeal patterns by demographic group. Elevated complaint rates from specific groups are early signals of bias in production.
Maintain ongoing monitoring documentation as a governance record. The monitoring record demonstrates that the insurer is not just testing at deployment but tracking performance over time.
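
A quarterly monitoring check could be as simple as recomputing the outcome ratio each quarter from production data and recording which quarters fall below the agreed threshold. The quarter labels, counts, group names, and the 0.8 threshold below are assumptions for illustration; a production program would also log the results to the governance record and rerun the check after each material model update.

```python
def quarterly_alerts(quarterly_counts, group, reference, threshold=0.8):
    """Return the quarters in which the group's approval rate, relative to
    the reference group's rate, falls below the threshold."""
    alerts = []
    for quarter in sorted(quarterly_counts):
        counts = quarterly_counts[quarter]
        grp_approved, grp_total = counts[group]
        ref_approved, ref_total = counts[reference]
        ratio = (grp_approved / grp_total) / (ref_approved / ref_total)
        if ratio < threshold:
            alerts.append(quarter)
    return alerts

# Illustrative production data: (approved, total) per group per quarter.
production = {
    "2025Q1": {"ref": (900, 1000), "grp": (430, 500)},
    "2025Q2": {"ref": (900, 1000), "grp": (410, 500)},
    "2025Q3": {"ref": (900, 1000), "grp": (350, 500)},
}

alerts = quarterly_alerts(production, group="grp", reference="ref")
print(alerts)  # quarters requiring investigation under Step 4
```

The downward trend across the first two quarters is visible before the threshold is breached in the third, which is an argument for tracking the ratio itself over time and not only the pass/fail flag.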

PM Responsibilities for Insurance AI Bias Governance

Scope bias testing as a dedicated project workstream with specific resources, data access, statistical analysis capability, legal review, and governance approval. It is not a checklist item to be assigned to the model team alongside validation.
Require vendors to provide bias testing documentation. For external AI tools: documentation of demographic composition of training data, outcome testing by demographic group in a comparable deployment context, and re-testing obligations after model updates.
Confirm the testing methodology is appropriate for each jurisdiction: Colorado’s outcomes-based standard, NY DFS’s proxy testing requirement, EU AI Act Article 10’s data governance obligation, and EIOPA’s Gender Directive-aligned bias assessment. These are distinct tests requiring distinct methodologies.
Build bias monitoring into the ongoing product management lifecycle, not just the initial deployment. Quarterly outcome monitoring and annual comprehensive bias reports are the minimum standard in most jurisdictions.

Right-Sizing Your AI Governance Approach

Greenfield — Insurance AI Bias Governance Playbook

Proxy discrimination mechanics in insurance AI; external consumer data assessment; minimum outcome testing methodology; threshold-setting basics; AIS Program bias requirements.

Emerging — Insurance AI Bias Governance Playbook

Comprehensive outcome testing framework; proxy identification methodology; EU Article 10 data governance for life/health AI; Gender Directive proxy analysis; Colorado and New York testing requirements; adaptive AI re-testing design; production monitoring program.

Established — Insurance AI Bias Governance Playbook

Enterprise bias governance program; multi-jurisdiction compliance (NAIC, NY DFS, Colorado, EU AI Act, EIOPA); market conduct examination readiness for bias; litigation discovery preparation for bias findings; vendor bias audit program.


Framework References

NAIC Model Bulletin: Use of Artificial Intelligence Systems by Insurers (December 2023). Bias testing and minimization as core AIS Program element; validation and retesting at each AI lifecycle stage; adverse consumer outcome mitigation.

New York DFS Insurance Circular Letter No. 7 (July 11, 2024). Proxy discrimination testing required; insurers must demonstrate AI does not proxy for protected classes or produce disproportionate adverse effects.

Colorado SB 21-169 (2021) / Colorado AI Act (May 2024). Outcomes-based standard: prohibition on external data and predictive models resulting in unfair discrimination; AG enforcement from June 2026.

EU AI Act (Reg. (EU) 2024/1689) Article 10. Data governance obligation for high-risk AI to address biases likely to affect health and safety or lead to discrimination; applicable to life/health insurance underwriting AI.

EIOPA Opinion on AI Governance and Risk Management (August 6, 2025). Bias assessment and mitigation obligation applies to third-party data and proxy variable interactions; Gender Directive principle applies to AI-generated gender differentials.

IAIS Application Paper on the Supervision of Artificial Intelligence (July 2, 2025). Adaptive AI heightened bias risk; training data representativeness obligation; proportionality in supervisory expectations by retail vs. commercial.

Test-Achats Ruling (CJEU, Case C-236/09, 2011) / EU Gender Directive on goods and services (2004/113/EC). Prohibition on direct gender use in EU/EEA insurance pricing; EIOPA applies same principle to proxy gender discrimination through AI.

This article is part of AIPMO’s Insurance series. See also: AI Governance in Insurance | AI in Insurance Underwriting | AI in Insurance Claims | GenAI in Insurance

To err is AI; to govern, human.

AIPMO.co · AI Governance, PM-first