GenAI in Financial Services: New Capabilities, Familiar Governance Risks

FINRA’s 2026 Annual Regulatory Oversight Report documented AI hallucinations in compliance workflows and unauthorized use of non-compliant tools. Generative AI introduces failure modes that traditional model risk management wasn’t designed to catch. Here’s the gap analysis.

By AIPMO
PM Takeaways
  • FINRA’s 2026 Annual Regulatory Oversight Report (December 2025) makes the governance posture explicit: FINRA rules are technologically neutral and continue to apply when firms use GenAI. Supervision obligations, communications rules, recordkeeping requirements, and fair dealing standards all apply to GenAI outputs. “The AI generated it” is not a compliance defense — institutions own their AI’s outputs.
  • The distinction between customer service AI and regulated financial advice is not a technical line — it is a governance decision that must be made before deployment. An LLM that provides specific investment guidance, product recommendations with return projections, or retirement income estimates crosses into regulated territory under existing suitability and fiduciary rules, regardless of the interface. Define the scope of permitted outputs before building, and build guardrails that enforce that scope.
  • Hallucination is not a rare failure mode — it is a structural characteristic of large language models. They generate outputs by predicting probable next tokens, not by retrieving verified facts. A customer-facing LLM that fabricates an interest rate, misstates account terms, or invents a regulatory exemption is creating a compliance event, not a UX issue.
  • Agentic AI — LLMs that can take actions, not just generate text — is arriving in financial services operations. FINRA’s 2026 report specifically flags the autonomous nature of AI agents as a novel supervisory challenge. Any GenAI system that can execute actions in production systems requires explicit authorization boundaries, a human-in-the-loop standard, and an audit trail before deployment.
  • The GenAI vendor concentration risk flagged by the Bank of England and FSOC is real and specific: most major financial institutions are using the same two or three LLM providers. A failure in a leading provider’s model — a safety incident, a performance degradation, a service outage — propagates across the industry simultaneously.

Generative AI is different from the predictive models that SR 11-7 was designed to govern. A credit scoring model produces a number from a defined set of inputs through a deterministic process. A large language model produces text, and that text can vary, can be wrong, and can be confidently, fluently wrong in ways that look indistinguishable from a correct answer.

Financial services has moved quickly. FINRA’s December 2025 Annual Regulatory Oversight Report identified GenAI as a top compliance priority for 2026, noting that member firms have already deployed it primarily for internal efficiency — research support, documentation, compliance drafting — with some beginning to explore customer-facing applications. The regulatory position is consistent: existing rules apply. The technology does not create new exemptions from existing law.


Why GenAI Is Different Within Financial Services Governance

Traditional model risk management was designed for deterministic systems: the same inputs produce the same outputs, performance can be measured against a ground truth, and the model’s decision logic can be examined. SR 11-7’s validation framework — conceptual soundness, performance testing, back-testing — assumes these properties.

GenAI breaks all three assumptions. LLMs are non-deterministic: the same prompt can produce different outputs across runs. Performance cannot be fully captured by traditional metrics, because open-ended text rarely has a single ground-truth answer to score against. And the model’s reasoning process is not interpretable in any conventional sense.

This does not mean governance frameworks do not apply. It means they need to be adapted. The validation question shifts from “does this model produce the right answer?” to “does this system stay within its intended scope, produce outputs grounded in verified information, and escalate to human review when it cannot?”


Three Cases That Define the Stakes

Air Canada’s Chatbot: When Hallucinated Policy Becomes Binding Obligation

In 2022, a passenger asked Air Canada’s website chatbot about bereavement fares before booking. The chatbot told him he could buy a full-price ticket and apply for the bereavement discount retroactively, within 90 days of the ticket being issued. This was not Air Canada’s policy, which did not allow retroactive bereavement claims. The chatbot had hallucinated it. In February 2024, the Civil Resolution Tribunal of British Columbia ruled in the passenger’s favor (Moffatt v. Air Canada). Air Canada’s argument that it was not responsible for misinformation from its chatbot was rejected. The chatbot’s statements were treated as the airline’s statements.

PM lesson: The institution owns its chatbot’s outputs. Customer-facing GenAI requires output grounding in verified, current product and policy documentation before deployment. Not approximate grounding — documented grounding, where every output category has a defined source and outputs outside that source are escalated or declined.
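The documented-grounding rule above can be sketched as a release gate: an answer goes out only if its category has an approved source and the answer cites that source. This is a minimal illustration, not a production design. The registry contents, the `Draft` fields, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: each permitted output category maps to the one
# document treated as the source of truth for that category.
APPROVED_SOURCES = {
    "savings_rates": "rate-sheet/savings-current",
    "overdraft_fees": "fee-schedule/overdraft-current",
}


@dataclass
class Draft:
    category: str                 # category assigned to the customer question
    text: str                     # model-generated answer
    cited_source: Optional[str]   # source the answer claims to rely on


def release_or_escalate(draft: Draft) -> str:
    """Release a draft only if it cites the approved source for its category."""
    expected = APPROVED_SOURCES.get(draft.category)
    if expected is None:
        # No defined source for this category: decline rather than guess.
        return "ESCALATE: category has no approved source"
    if draft.cited_source != expected:
        return "ESCALATE: answer is not grounded in the approved source"
    return draft.text
```

A citation match only shows that the answer points at the right document. Checking that the answer text agrees with that document is a separate step this sketch does not attempt.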

FINRA’s 2026 Report: The Regulatory Framework Is Clear

On December 9, 2025, FINRA released its 2026 Annual Regulatory Oversight Report with a new section dedicated to GenAI — the first time GenAI has been addressed as a standalone topic in FINRA’s annual oversight guidance. The report is unequivocal: FINRA rules are technologically neutral and continue to apply when firms use GenAI. AI-generated content that constitutes customer communication must be reviewed and retained per applicable rules. For agentic AI, FINRA specifically calls for supervisory processes addressing how to monitor agent access and data handling, where to require human oversight, and how to establish guardrails limiting agent actions.

Agentic AI and the $440M Reference Point

AI-related incidents in financial services rose 21% from 2024 to 2025. The Knight Capital incident — $440 million in losses in 45 minutes from a software error in an algorithmic trading system — is a decade-old reference point for what happens when automated systems operate without adequate human oversight. Agentic AI operating in financial workflows creates a contemporary version of the same risk: systems that can take actions autonomously in production environments, where a single unexpected behavior can cascade rapidly.

PM lesson: Agentic AI deployments require explicit authorization boundaries before go-live: what can the agent do, what is it prohibited from doing, what requires human approval, and what triggers immediate halt. An agent that can execute transactions, move funds, or modify account data without real-time human authorization must be governed explicitly before deployment.
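The four boundary questions (permitted, prohibited, approval-required, halt) can be sketched as a gate that every agent action passes through before it reaches a production system. The action names, the three-denial halt threshold, and the `AgentGate` class are assumptions made for illustration. They are not drawn from FINRA guidance.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human_approval"
    DENY = "deny"
    HALT = "halt"


# Hypothetical action names and threshold, chosen only for this sketch.
ALLOWED = {"read_balance", "draft_email"}
HUMAN_APPROVAL = {"execute_trade", "move_funds", "modify_account"}
HALT_AFTER_DENIALS = 3


class AgentGate:
    """Checks each requested action against the authorization boundaries."""

    def __init__(self):
        self.denials = 0
        self.halted = False
        self.audit_log = []  # (action, approver, decision) for every request

    def authorize(self, action, approved_by=None):
        if self.halted:
            decision = Decision.HALT
        elif action in ALLOWED:
            decision = Decision.ALLOW
        elif action in HUMAN_APPROVAL:
            # Sensitive actions proceed only with a named human approver.
            decision = Decision.ALLOW if approved_by else Decision.NEEDS_HUMAN
        else:
            # Anything not listed is prohibited; repeated attempts halt the agent.
            self.denials += 1
            if self.denials >= HALT_AFTER_DENIALS:
                self.halted = True
                decision = Decision.HALT
            else:
                decision = Decision.DENY
        self.audit_log.append((action, approved_by, decision.value))
        return decision
```

The gate is default-deny: an action that appears on neither list is refused, so a new capability has to be added to a list deliberately before the agent can use it.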


The Regulated Advice Boundary

The most consequential governance decision for customer-facing GenAI is where the boundary sits between customer service and regulated financial advice. That boundary is not a technology question. It is a regulatory question that must be resolved before the model is built.

An LLM deployed as a customer service tool crosses into regulated territory when it:

  • Provides specific investment recommendations, even framed as informational.
  • Generates personalized retirement or income projections that could influence financial decisions.
  • Recommends specific loan products, insurance coverage amounts, or account types based on customer circumstances.
  • Provides tax guidance tailored to a customer’s specific situation.

Guardrails must enforce this boundary technically, not just through policy. A system prompt that says “do not provide personalized financial advice” is not sufficient — users will find phrasings that elicit specific recommendations. The system must be designed so that responses in regulated territory either route to a licensed human advisor or decline to answer with a clear explanation.
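One way to enforce the boundary technically is to check the model's output after generation, independently of the system prompt, and replace any response that looks like regulated advice with a handoff. The sketch below uses a short regex list purely as a stand-in. The patterns, the handoff wording, and the function name are hypothetical, and a real deployment would more likely use a classifier reviewed by compliance.

```python
import re
from typing import Tuple

# Illustrative patterns only; a short regex list will miss many phrasings.
REGULATED_PATTERNS = [
    r"\byou should (buy|sell|invest in|roll over)\b",
    r"\bI recommend\b",
    r"\b(expected|projected) (return|income) of\b",
    r"\byour (tax|retirement) (liability|income) (will|would) be\b",
]

HANDOFF_MESSAGE = (
    "I can't provide personalized financial advice. "
    "I can connect you with a licensed advisor."
)


def enforce_advice_boundary(model_output: str) -> Tuple[str, bool]:
    """Return (text_to_send, routed_to_human) for a generated response."""
    for pattern in REGULATED_PATTERNS:
        if re.search(pattern, model_output, flags=re.IGNORECASE):
            # Output looks like regulated advice: decline and route to a human.
            return HANDOFF_MESSAGE, True
    return model_output, False
```

Because the check runs on the output rather than the prompt, a user phrasing that slips past the system prompt still meets the same filter before anything is sent.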


Hallucination Governance by Use Case

| Context | Hallucination Risk | Consequence |
|---|---|---|
| Internal research and drafting | Employee acts on incorrect analysis or fabricated citation. | Operational error, potentially significant if input to a material decision. |
| Compliance documentation | AI generates a regulatory citation that does not exist. | Compliance failure if the document is submitted to regulators or used in examination response. |
| Customer communications | AI misstates product terms, rates, fees, or eligibility criteria. | Regulatory exposure under UDAP/UDAAP, potential consumer harm, disclosure violations. |
| Investment guidance | AI fabricates performance data or makes unsuitable recommendations. | Securities law violations, potential fiduciary breach, customer financial harm. |
| Fraud and AML alerts | AI generates a false alert or incorrect SAR narrative. | Regulatory compliance risk if submitted; operational risk if legitimate activity is flagged. |

Applying Existing Frameworks to GenAI

SR 11-7 Extensions for GenAI

  • Inventory: GenAI systems used for material business functions belong in the model inventory.
  • Risk tiering: Customer-facing GenAI and GenAI producing compliance outputs are high-tier. Internal productivity tools are lower-tier. Agentic AI operating in production systems is high-tier by definition.
  • Validation: For GenAI, validation shifts from predictive accuracy to scope compliance, output grounding, and hallucination rate. What percentage of outputs fall outside intended scope? What is the false information rate on a standardized test set?
  • Monitoring: Prompt and output logging is required for FINRA recordkeeping compliance and for monitoring output quality over time.
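The two validation questions above reduce to simple rates once each output in a standardized test set has been graded by a reviewer. The sketch below assumes grading yields two boolean judgments per output. The `GradedOutput` fields and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GradedOutput:
    in_scope: bool           # reviewer judged the output within permitted scope
    factually_correct: bool  # reviewer verified the output's factual claims


def validation_rates(graded):
    """Compute out-of-scope and false-information rates over a graded test set."""
    n = len(graded)
    if n == 0:
        raise ValueError("test set is empty")
    out_of_scope = sum(1 for g in graded if not g.in_scope)
    false_info = sum(1 for g in graded if not g.factually_correct)
    return {
        "out_of_scope_rate": out_of_scope / n,
        "false_information_rate": false_info / n,
    }
```

Because LLM outputs vary across runs, the same test set may need to be run several times and the rates compared, rather than treating one run as definitive.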

FINRA Communications Rules

  • AI-generated communications that constitute retail communications require principal review before delivery.
  • Records of AI-generated communications must be retained per applicable recordkeeping rules.
  • Communications that cross into investment advice trigger suitability and best interest obligations.
  • Disclosure that the customer is interacting with AI is increasingly expected — FINRA’s 2026 report specifically notes this as an emerging requirement.
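Retaining prompts and outputs is more useful in an examination if the records can be shown to be unaltered. The sketch below illustrates one approach: an append-only log in which each record carries a hash of its own contents and of the previous record. Hash chaining is an assumption chosen for illustration, not a requirement stated in FINRA's rules. The field names and functions are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log, prompt, output, model_version, reviewer=None):
    """Append a prompt/output record linked to the previous record's hash."""
    body = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "prompt": prompt,
        "output": output,
        "reviewer": reviewer,  # e.g. the principal who reviewed the communication
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log):
    """Return True if no record has been altered, removed, or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True
```

An in-memory list stands in for storage here. Retention periods and storage format would follow the firm's applicable recordkeeping rules.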

PM Responsibilities for GenAI Governance

| Phase | Key Actions |
|---|---|
| Before Deployment | Define the use case scope explicitly in writing. Determine the risk tier under SR 11-7 and applicable FINRA rules. Design output grounding into the architecture. Establish hallucination monitoring with defined acceptable rates. |
| At Deployment | Confirm supervision procedures have been updated. Confirm logging and recordkeeping are operational for all prompts and outputs. For agentic AI: confirm authorization boundaries are technically enforced and human-in-the-loop checkpoints are operational. Brief users on hallucination risk and human review requirements. |
| Post-Deployment | Monitor output quality continuously: hallucination rates, scope compliance, escalation trigger rates. Review production logs periodically for outputs that crossed the regulated advice boundary. Assess GenAI vendor concentration risk annually. |

Right-Sizing for Your Situation

Greenfield

For organizations deploying GenAI for the first time. Covers SR 11-7 inventory and risk tiering for GenAI, FINRA communications rules basics, output grounding requirements for customer-facing applications, hallucination monitoring fundamentals, and recordkeeping setup.

Emerging

For organizations deploying GenAI at scale. Covers a comprehensive hallucination governance framework, regulated advice boundary design and enforcement, supervision procedure updates for GenAI, an agentic AI authorization framework, and GenAI vendor concentration risk assessment.

Established

For organizations with mature GenAI deployments. Covers enterprise-wide GenAI governance integration with MRM programs, a FINRA 2026 compliance readiness assessment, an agentic AI governance framework for production deployments, and operational resilience planning for GenAI vendor dependency.


Framework References

FINRA 2026 Annual Regulatory Oversight Report, GenAI Section (December 9, 2025): First dedicated GenAI guidance in FINRA’s annual oversight report. Confirms technologically neutral application of all FINRA rules; identifies supervision, communications, recordkeeping, and fair dealing as applicable; specifically addresses agentic AI supervisory requirements.

Federal Reserve SR 11-7 / OCC Bulletin 2011-12: Supervisory Guidance on Model Risk Management (April 2011): Model inventory, validation, monitoring, and governance obligations that apply to GenAI systems used for material decisions, adapted for the non-deterministic characteristics of generative models.

FSOC Annual Report 2024 (December 2024): Flagged GenAI vendor concentration as a systemic risk. Multiple institutions relying on the same LLM providers creates correlated failure risk.

Bank of England Financial Stability in Focus: AI in the Financial System: Identified common AI model reliance across financial institutions as a systemic vulnerability.

EU AI Act (Reg. (EU) 2024/1689): Article 50 (transparency obligations for AI systems interacting with natural persons, including disclosure requirements; numbered Article 52 in the original proposal). GPAI model obligations effective August 2025.

NIST AI RMF 1.0: GOVERN 1.7 (processes for safely decommissioning and phasing out AI systems), MEASURE 2.5 (demonstrating that the AI system is valid and reliable), MANAGE 2.4 (mechanisms to supersede, disengage, or deactivate AI systems whose performance is inconsistent with intended use).

This article is part of AIPMO’s Financial Services series. See also: AI Governance in Financial Services  |  Model Risk Management and SR 11-7  |  AI Governance in Healthcare
