Law Enforcement and Criminal Justice AI: The Highest-Stakes Deployment

PM Takeaways

Facial recognition is a lead generator for investigation, not an identification tool. The DOJ’s December 2024 report establishes that FRT alone cannot be used as proof of identity. In every documented wrongful FRT arrest (at least seven in the US, nearly all involving Black individuals), basic police work would have exonerated the person before arrest.
COMPAS and similar risk tools produce population-level probability estimates, not individual predictions. A high-risk score means the statistical group with similar characteristics re-offends at higher rates, not that this specific person will. Using that score in sentencing without that qualification is statistically incorrect and open to constitutional challenge.
The EU AI Act’s prohibitions on predictive policing based solely on profiling and real-time biometric identification in public spaces took legal effect February 2, 2025. For any deployment touching EU residents, these are active prohibitions applying to existing systems right now.
ShotSpotter audits document 80–90% false positive rates. Independent reviews of predictive policing tools in Chicago and Los Angeles found no measurable crime reduction alongside documented racial profiling effects. The governance question is not whether the AI works in theory — it is whether it has been validated in your specific deployment context.
Training law enforcement AI on historical policing data encodes historical policing bias. More enforcement produces more arrests, which produces more data confirming the prediction. This is a bias amplification loop — not a data quality issue — and it requires specific governance design to interrupt.

Law enforcement and criminal justice AI occupies a category of its own in government AI governance. It can put innocent people in prison. Risk assessment AI has influenced sentences on the basis of crimes people have not committed and may never commit. Facial recognition has resulted in people being arrested, detained, and prosecuted for things they did not do. Predictive policing has directed enforcement resources at communities based on historical data that reflects discriminatory enforcement, reinforcing those patterns at scale.

The consequences of failure in law enforcement AI are not reversible. A year in pretrial detention cannot be refunded. A wrongful conviction that takes years to overturn cannot be repaid. This is why law enforcement and criminal justice AI sits in the highest-risk category in every governance framework that engages seriously with the subject.

AI Uses in Law Enforcement and Criminal Justice

Application	What It Does	Primary Governance Risk
Facial recognition (FRT)	Matches individuals from images/video against databases of photographs.	Misidentification; racial bias in accuracy; automation bias leading to wrongful arrest.
Predictive policing	Uses historical crime data to identify locations or individuals predicted to be involved in future crime.	Encodes historical policing bias; self-fulfilling prophecy; racial profiling at scale.
Pretrial risk assessment (e.g., COMPAS, PSA)	Scores defendants on likelihood of failure to appear or re-offense to inform bail/detention decisions.	Statistical tools misunderstood as individual predictions; racial disparities; liberty deprivation without conviction.
Recidivism risk scoring	Predicts likelihood of re-offense to inform sentencing and parole decisions.	Sentencing based on group characteristics rather than individual conduct; perpetuates racial disparities.
Gunshot detection (e.g., ShotSpotter)	Acoustic sensors and AI to detect and locate gunshots.	High false positive rates; disproportionate deployment in minority communities.
Border and immigration screening	Risk scoring for travelers, visa applications, automated document analysis.	Discrimination; due process for immigration decisions; arbitrary enforcement.

The Three Documented Failure Modes

Facial Recognition: The Automation Bias Loop

Facial recognition technology performs differently across demographic groups. Black individuals, women, and older people are misidentified at higher rates than white men in most commercially available systems tested by NIST. As of the DOJ’s December 2024 final report, at least seven people had been wrongfully arrested in the US as a direct result of FRT misidentification. Almost all were Black. In each case, officers received an FRT match, treated it as definitive identification, and proceeded to arrest without conducting the corroborating investigation that would have established the person’s innocence.

The Robert Williams settlement (Detroit, summer 2024) — the first publicly settled FRT wrongful arrest case in the US — resulted in the strictest FRT operational standards in the country: police cannot apply for an arrest warrant based solely on an FRT result, and cannot proceed directly from an FRT match to a witness identification procedure. These requirements exist because operating without them produced wrongful arrests.

PM lesson: FRT is a lead generation tool, not an identification tool. The governance design requirement is the workflow that connects FRT output to any action affecting a person’s liberty: corroborating investigation required, arrest prohibited based solely on FRT output, investigation steps documented before arrest, FRT match-to-arrest rates tracked by demographic group.
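
The workflow requirement above can be sketched as a gate that refuses to advance an FRT lead toward a warrant application until corroboration and authorization are on record. This is a minimal illustration under stated assumptions: `FrtLead`, its fields, and `may_apply_for_warrant` are hypothetical names, not taken from any agency system or from the Williams settlement text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FrtLead:
    """Hypothetical record of one FRT match handled as an investigative lead."""
    case_id: str
    # Investigation steps that are independent of the FRT output itself.
    corroboration_steps: List[str] = field(default_factory=list)
    # Named human who reviewed the evidence and authorized the next action.
    authorized_by: Optional[str] = None
    # True if a witness identification was run directly off the FRT match.
    witness_id_directly_from_match: bool = False


def may_apply_for_warrant(lead: FrtLead) -> Tuple[bool, str]:
    """Return (allowed, reason). Every check must pass; otherwise the answer is no."""
    if lead.witness_id_directly_from_match:
        return False, "witness identification followed directly from the FRT match"
    if not lead.corroboration_steps:
        return False, "no corroborating investigation documented"
    if not lead.authorized_by:
        return False, "no human authorization recorded"
    return True, "corroboration and authorization documented"
```

The design choice worth noting is the default: the function only returns `True` after every check has passed, so a lead with missing documentation is blocked rather than waved through.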

Risk Assessment: Scoring People, Not Predicting Individuals

The ProPublica analysis of COMPAS (2016) documented that the tool incorrectly flagged Black defendants as future criminals at twice the rate of white defendants, while white defendants who did re-offend were more often incorrectly scored as low-risk. The fundamental issue: algorithmic risk assessment produces group-level probability estimates, not individual predictions. A high-risk score means the statistical population of individuals with similar characteristics has historically had higher re-offense rates — not that this specific person will re-offend. Deploying this score to inform liberty decisions converts a population-level statistical output into a de facto determination about an individual’s future conduct.

PM lesson: Risk assessment tools must be accompanied by documentation of their statistical properties: the training data, the base re-offense rate for each risk category, the demographic performance differentials, and explicit guidance that the score is a population-level estimate. This documentation must be presented to every decision-maker who uses the score.
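
The two statistical ideas in this section, a gap in false positive rates between groups and a category base rate that says nothing about any one person, come down to simple arithmetic. The sketch below uses hypothetical counts chosen only to make the numbers easy to follow; they are not ProPublica’s figures, and the function names are illustrative.

```python
def false_positive_rate(flagged_high_risk: int, did_not_reoffend: int) -> float:
    """Of the people who did not re-offend, the share who were scored high-risk."""
    if did_not_reoffend <= 0:
        raise ValueError("did_not_reoffend must be positive")
    return flagged_high_risk / did_not_reoffend


def expected_non_reoffenders(category_size: int, category_reoffense_rate: float) -> int:
    """People in a risk category who, at the category's base rate, do not re-offend."""
    return round(category_size * (1.0 - category_reoffense_rate))


# Hypothetical counts, not real study data.
fpr_group_a = false_positive_rate(flagged_high_risk=400, did_not_reoffend=1000)
fpr_group_b = false_positive_rate(flagged_high_risk=200, did_not_reoffend=1000)
fpr_ratio = fpr_group_a / fpr_group_b  # group A is wrongly flagged twice as often

# A "high-risk" category with a 60% base rate still contains 40 people in
# every 100 who do not re-offend. The score cannot say which 40.
not_reoffending = expected_non_reoffenders(category_size=100, category_reoffense_rate=0.60)
```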

Predictive Policing: The Bias Amplification Loop

Predictive policing tools trained on historical policing data encode historical policing bias. Communities that were over-policed in the past produce more historical incident data, which the algorithm uses to direct future enforcement toward the same communities. More enforcement produces more arrests, which produces more data confirming the prediction. This is a bias amplification loop that requires specific governance design to interrupt: independent effectiveness evaluation before expansion, historical data audits for discriminatory enforcement patterns, and community transparency about what systems are in use.
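
The loop can be made concrete with a toy simulation. The sketch below is a deliberately simplified model, not a reconstruction of any deployed product: it assumes two or more districts, a fixed number of police contacts per round, and an allocation rule (`hotspot_share`, a hypothetical parameter) that sends most patrols to whichever district has the most recorded incidents.

```python
from typing import List


def simulate_patrol_feedback(
    initial_records: List[float],
    true_incident_rate: List[float],
    rounds: int,
    hotspot_share: float = 0.8,
    contacts_per_round: float = 100.0,
) -> List[List[float]]:
    """Return each district's share of recorded incidents after every round."""
    records = list(initial_records)
    n = len(records)  # assumes n >= 2
    shares_over_time = []
    for _ in range(rounds):
        # The "model": send most patrols wherever past records are highest.
        hotspot = max(range(n), key=lambda i: records[i])
        patrol = [
            hotspot_share if i == hotspot else (1.0 - hotspot_share) / (n - 1)
            for i in range(n)
        ]
        # Incidents are only recorded where officers are present to record them.
        for i in range(n):
            records[i] += patrol[i] * contacts_per_round * true_incident_rate[i]
        total = sum(records)
        shares_over_time.append([r / total for r in records])
    return shares_over_time


# Two districts with the SAME underlying rate. District 0 starts with slightly
# more records only because it was patrolled more heavily in the past.
shares = simulate_patrol_feedback([55.0, 45.0], [0.5, 0.5], rounds=10)
```

In this toy setup district 0 begins with 55% of the records and its share rises every round, even though both districts have identical underlying incident rates. The data appears to confirm the prediction because the prediction decided where the data was collected.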

The Legal Landscape

Jurisdiction	Framework	Key Requirements
European Union	EU AI Act (in force August 1, 2024; prohibitions apply from February 2, 2025)	Prohibited: real-time biometric identification in public spaces for law enforcement (with narrow exceptions), predictive policing based solely on profiling. High-risk (full compliance August 2026): law enforcement risk assessment, evidence reliability evaluation, recidivism risk scoring, individual profiling.
United States	US Constitution, state statutes, common law	At least seven wrongful FRT arrests documented (DOJ December 2024). Williams settlement established: no arrest warrant based solely on FRT; no direct transition from FRT match to witness ID procedure. 15 states have enacted FRT-specific legislation. No federal law directly regulates law enforcement FRT.
United Kingdom	Bridges v South Wales Police [2020] EWCA Civ 1058	Court of Appeal ruled police use of live FRT violated human rights law because there was no clear policy specifying who could be placed on watchlists. Post-Bridges: UK Home Office code of practice for police live facial recognition.
Canada	Charter of Rights and Freedoms, Privacy Act, provincial human rights legislation	Under Canada’s Directive on Automated Decision-Making, the Level 4 impact classification applies to decisions affecting liberty; a human must make the final decision; the Algorithmic Impact Assessment (AIA) must be published. Biometric data governed by Privacy Act and PIPEDA.
Australia	Privacy Act, Australian Privacy Principles, state police legislation	OAIC biometric privacy guidance. Post-Robodebt governance principles apply: individual assessment, genuine human review, practical appeal mechanisms.

Governance Design for Law Enforcement AI

Facial Recognition Technology

Governance Element	Required Design
Prohibited uses	No arrest, detention, or prosecution based solely on FRT output. FRT results are investigative leads requiring corroboration.
Corroboration standard	Documented investigation steps confirming or excluding the FRT match before any action affecting liberty — proportionate to the liberty interest at stake.
Watchlist controls	Clear criteria for who can appear on a watchlist; documented authority for additions; regular review and purge of stale entries.
Demographic performance disclosure	Agencies must receive vendor documentation of performance by demographic group, and must conduct independent performance testing in their own operational environment.
Human authorization chain	The specific human decision-maker who must authorize any action based on FRT output, with documentation of their independent review.
Outcome tracking	Track FRT match rates, corroboration rates, arrest-from-FRT rates, and demographic distribution of all three. Review quarterly.
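
The outcome tracking row can be sketched as a small aggregation over logged FRT events. This is an illustration under assumptions: the event fields (`group`, `corroborated`, `arrested`) and the sample data are hypothetical, and a real log would carry far more detail.

```python
from collections import defaultdict
from typing import Dict, Iterable


def frt_outcomes_by_group(events: Iterable[dict]) -> Dict[str, dict]:
    """Summarize matches, corroboration, and arrests per demographic group."""
    counts = defaultdict(
        lambda: {"matches": 0, "corroborated": 0, "arrests": 0, "uncorroborated_arrests": 0}
    )
    for e in events:
        c = counts[e["group"]]
        c["matches"] += 1
        c["corroborated"] += int(e["corroborated"])
        c["arrests"] += int(e["arrested"])
        # An arrest with no corroboration is the policy violation to surface.
        c["uncorroborated_arrests"] += int(e["arrested"] and not e["corroborated"])
    return {
        group: {
            "matches": c["matches"],
            "corroboration_rate": c["corroborated"] / c["matches"],
            "arrest_rate": c["arrests"] / c["matches"],
            "uncorroborated_arrests": c["uncorroborated_arrests"],
        }
        for group, c in counts.items()
    }


# Hypothetical log entries for illustration only.
sample_events = [
    {"group": "A", "corroborated": True, "arrested": True},
    {"group": "A", "corroborated": False, "arrested": False},
    {"group": "B", "corroborated": False, "arrested": True},
    {"group": "B", "corroborated": True, "arrested": True},
]
report = frt_outcomes_by_group(sample_events)
```

A quarterly review would compare these rates across groups and treat any non-zero `uncorroborated_arrests` count as an incident to investigate, not a statistic to trend.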

Risk Assessment Tools

Score documentation: Every risk score report used in bail, sentencing, or supervision decisions must include: the factors that elevated or lowered the score; the statistical re-offense rate for the risk category in the validation sample; the demographic performance differentials; and the explicit statement that the score is a population-level estimate, not an individual prediction.
Decision-maker training: Every judge, bail magistrate, or parole officer who uses risk scores must receive training on statistical limitations and documented bias patterns. Training must be refreshed when tool version or validation data changes.
Override documentation: Human decision-makers must document whether they followed or departed from the score recommendation. Systematic follow-the-score patterns without documented reasoning are indicators of automation bias.
Validation currency: Risk assessment tools must be re-validated periodically using current population data from the jurisdiction of use.
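
The score documentation requirement lends itself to a completeness check that runs before a report reaches a decision-maker. The sketch below is one possible shape, with hypothetical names throughout (`RiskScoreReport`, `missing_elements`, and the wording of `REQUIRED_STATEMENT` are assumptions, not a standard).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumed wording; a jurisdiction would set its own required statement.
REQUIRED_STATEMENT = "population-level estimate, not an individual prediction"


@dataclass
class RiskScoreReport:
    """Hypothetical container for the elements a score report must carry."""
    risk_category: str
    contributing_factors: List[str] = field(default_factory=list)
    # Observed re-offense rate for this category in the validation sample.
    category_reoffense_rate: Optional[float] = None
    # Performance differentials keyed by demographic group.
    demographic_differentials: Dict[str, float] = field(default_factory=dict)
    limitation_statement: str = ""


def missing_elements(report: RiskScoreReport) -> List[str]:
    """List the required elements that are absent; an empty list means complete."""
    missing = []
    if not report.contributing_factors:
        missing.append("contributing factors")
    if report.category_reoffense_rate is None:
        missing.append("category re-offense rate")
    if not report.demographic_differentials:
        missing.append("demographic performance differentials")
    if REQUIRED_STATEMENT not in report.limitation_statement.lower():
        missing.append("population-level limitation statement")
    return missing
```

A report with a non-empty result would be held back rather than delivered with gaps.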

Predictive Policing

Effectiveness evaluation before expansion: No predictive policing tool should expand beyond a pilot phase without an independent effectiveness evaluation measuring crime outcomes (not police activity) against documented disparate impact.
Training data audit: Historical policing data used to train predictive models must be audited for known discriminatory enforcement patterns. Documented over-policing of specific communities is a bias amplification risk that must be explicitly assessed and mitigated.
Community transparency: Jurisdictions deploying predictive policing should disclose what systems are in use, what data they use, and how patrol resource allocation is affected.
Sunset review: Predictive policing deployments should have defined review cycles at which continuation is actively re-authorized based on effectiveness evidence, not assumed by default.
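
The sunset review principle, continuation re-authorized on evidence and never assumed, can be expressed as a default-deny check. This is a minimal sketch; the function name, parameters, and the idea of a fixed cycle length in days are illustrative assumptions.

```python
from datetime import date, timedelta
from typing import Optional


def continuation_authorized(
    last_reauthorized: Optional[date],
    today: date,
    review_cycle_days: int,
    effectiveness_evidence_on_file: bool,
) -> bool:
    """True only if a current, evidence-based re-authorization exists."""
    # No re-authorization on record, or none backed by evidence: stop by default.
    if last_reauthorized is None or not effectiveness_evidence_on_file:
        return False
    # A re-authorization older than the review cycle has lapsed.
    return today - last_reauthorized <= timedelta(days=review_cycle_days)
```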

PM Responsibilities

Phase	Key Actions
Before Deployment	Assess whether the system falls within EU AI Act prohibited categories or high-risk categories. Document the legal authority for deployment in the specific jurisdiction. Complete a bias assessment of training data and model performance. Define the operational policy before deployment: what FRT output triggers what workflow, who authorizes action, what corroboration is required.
Deployment & Operations	Brief all users on the system’s statistical limitations, the governance policy, and their personal legal exposure for non-compliant use. Confirm outcome tracking is operational before go-live. Confirm audit trail functionality: every use of FRT and every risk score generated must be logged.
Post-Deployment	Review outcome data quarterly: FRT false positive rates by demographic group; risk score override rates. Conduct annual independent effectiveness review for predictive policing tools. Update training when tool version changes.
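
The quarterly review of risk score override rates can be sketched as a per-decision-maker summary that surfaces follow-the-score patterns lacking documented reasoning. The field names, the sample data, and the `flag_threshold` value are hypothetical; an agency would set its own threshold and treat a flag as a prompt for review, not a finding.

```python
from collections import defaultdict
from typing import Dict, Iterable


def follow_the_score_summary(
    decisions: Iterable[dict], flag_threshold: float = 0.95
) -> Dict[str, dict]:
    """Per decision-maker: how often the score was followed, and how often without reasoning."""
    tallies = defaultdict(lambda: {"total": 0, "followed": 0, "undocumented": 0})
    for d in decisions:
        t = tallies[d["decision_maker"]]
        t["total"] += 1
        t["followed"] += int(d["followed_score"])
        # Missing or blank reasoning counts as undocumented.
        t["undocumented"] += int(not (d.get("reasoning") or "").strip())
    summary = {}
    for who, t in tallies.items():
        follow_rate = t["followed"] / t["total"]
        summary[who] = {
            "follow_rate": follow_rate,
            "undocumented": t["undocumented"],
            # Near-total agreement with the score plus missing reasoning
            # is the automation bias indicator.
            "review_flag": follow_rate >= flag_threshold and t["undocumented"] > 0,
        }
    return summary


# Hypothetical decisions for illustration only.
sample_decisions = [
    {"decision_maker": "judge_1", "followed_score": True, "reasoning": ""},
    {"decision_maker": "judge_1", "followed_score": True, "reasoning": None},
    {"decision_maker": "judge_2", "followed_score": True, "reasoning": "Consistent with record."},
    {"decision_maker": "judge_2", "followed_score": False, "reasoning": "Stable employment and housing."},
]
summary = follow_the_score_summary(sample_decisions)
```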

Right-Sizing for Your Situation

Law enforcement AI does not scale down gracefully. The liberty interests at stake — arrest, detention, prosecution, sentencing, supervision — demand high governance intensity regardless of organizational scale. A small-town police department deploying FRT faces the same operational policy, corroboration, and outcome tracking obligations as a large national agency.

Greenfield

For agencies new to law enforcement AI. Covers FRT operational policy design (including prohibited uses and corroboration standards), risk assessment tool disclosure requirements, predictive policing bias audit fundamentals, EU AI Act prohibited practice assessment, and outcome tracking design.

Emerging

For agencies building systematic governance programs. Covers a comprehensive FRT governance framework (including demographic performance assessment and audit trail requirements), risk assessment documentation standards, predictive policing effectiveness evaluation methodology, and multi-jurisdiction legal framework mapping.

Established

For agencies with existing law enforcement AI programs. Covers EU AI Act high-risk compliance readiness, an enterprise outcome monitoring program, independent effectiveness audit design for predictive policing, an FRT demographic audit program, and judicial review readiness for AI-influenced criminal proceedings.

Framework References

DOJ Final Report on AI in Criminal Justice (December 3, 2024) — Documents seven wrongful FRT arrests, almost all involving Black individuals; establishes that FRT results may not be used as sole proof of identity; best practices for FRT use in investigations.

EU AI Act (Reg. (EU) 2024/1689) — Article 5 (prohibited: real-time biometric ID in public spaces, predictive policing based on profiling); Annex III Section 6 (high-risk: law enforcement risk assessment, recidivism risk scoring). Full compliance August 2, 2026.

Bridges v. South Wales Police [2020] EWCA Civ 1058 (UK Court of Appeal) — Established human rights requirements for police FRT: legal authority, proportionality, published watchlist policy, independent oversight.

Williams v. City of Detroit (settled summer 2024) — First publicly settled FRT wrongful arrest case in the US. Settlement terms established strictest FRT guardrails: no arrest warrant based solely on FRT; no direct transition from FRT match to witness identification.

State v. Loomis (Wisconsin Supreme Court, 2016) — Use of COMPAS in sentencing does not violate due process provided judges do not rely solely on the score; requires disclosure of statistical limitations to decision-makers.

ProPublica, “Machine Bias” (2016) — Documented that COMPAS incorrectly flagged Black defendants as high risk at twice the rate of white defendants. The defining analysis of racial disparity in criminal risk assessment tools.

NIST Face Recognition Vendor Testing Program — Ongoing independent testing demonstrating demographic performance differentials. Required reference for any FRT procurement decision.

This article is part of AIPMO’s Government series. See also: AI Governance in Government | Due Process and Automated Government Decisions | Procuring AI for Government

To err is AI; to govern, human.

AIPMO.co · AI Governance, PM-first
