PM Takeaways

• Human oversight is a legal requirement for high-risk AI systems, not a design preference — EU AI Act Article 14 mandates five specific capabilities that must be built in: the ability to understand system limitations, detect anomalies, interpret outputs, override decisions, and halt the system. Each is a project deliverable.

• Oversight must be assigned to named, competent persons before deployment — EU AI Act Article 26(2) requires deployers to assign human oversight to natural persons who have the necessary competence, training, and authority. An oversight role without a named, trained individual attached to it does not satisfy the regulation.

• Automation bias is not a training problem — EU AI Act Article 14(4)(b) explicitly requires that oversight personnel remain aware of the tendency to over-rely on AI outputs, and that the system be designed to support that awareness. Structural mitigations must be designed in, not just mentioned in onboarding.

• Override tracking is a regulatory and governance input, not just a quality metric — NIST AI RMF notes that data on the frequency and rationale with which humans overrule AI system output in deployed systems is useful to collect and analyse. Override patterns reveal whether oversight is meaningful or performative.

• Oversight requirements do not end at deployment — EU AI Act Article 26(5) requires deployers to monitor operation continuously, and to suspend use and notify authorities if they have reason to consider that use of the system may present a risk. The suspension obligation is active, not passive.
AI systems can process data, identify patterns, and generate recommendations faster than any human. But speed is not always what matters. When AI systems make or influence decisions that affect people’s lives, someone needs to be watching — and that someone needs the ability, the training, and the authority to intervene.
Human oversight is not a nice-to-have. For high-risk AI systems, it is a legal requirement under the EU AI Act, with specific technical and operational obligations that attach to both providers (who build the system) and deployers (who put it into use). And even where it is not legally mandated, it is essential for responsible deployment. As PM, your job is to ensure oversight is designed into the system from the start — not bolted on when an auditor asks for it.
The Oversight Spectrum
Human oversight is not binary. It exists on a spectrum from fully autonomous to fully manual, with several configurations in between. The appropriate configuration depends on the system’s risk profile, the stakes of individual decisions, the volume of decisions to be made, and the time available for human review.
NIST AI RMF MAP 3.5 notes that AI systems have evolved from decision support tools — where humans retained full control — to automated decision-making with limited human involvement. This evolution increases the likelihood of outputs being produced with little human oversight, and makes deliberate configuration choices more important, not less.
Human-in-the-Loop (HITL)
A human reviews and approves every consequential decision before it takes effect. The AI system generates a recommendation; the human decides whether to act on it.
| Dimension | Detail |
|---|---|
| Example | A hiring system screens and ranks candidates, but a human recruiter reviews all recommendations before any candidate is advanced or rejected. No applicant is filtered out without human sign-off. |
| When to use | High-stakes, low-volume decisions where errors have significant consequences for individuals. Required for many high-risk use cases under EU AI Act Annex III, including employment screening, credit assessment, and access to essential services. |
| Trade-off | Slower throughput and higher operational cost, but maximum human control over individual decisions. The system’s productivity advantage is partially offset by the overhead of review. |
| Key risk | Rubber-stamping — the human approves AI recommendations without genuinely evaluating them. Volume and workload must be managed to ensure review is substantive, not nominal. |
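The HITL pattern amounts to a review gate: the AI output is advisory, and nothing takes effect until a named reviewer records an explicit decision. A minimal sketch, with all names and fields hypothetical rather than drawn from any specific system:

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Recommendation:
    candidate_id: str
    ai_suggestion: str   # e.g. "advance" or "reject"; advisory only
    rationale: str       # explanation surfaced to the reviewer


def apply_decision(rec: Recommendation, reviewer: str, decision: Decision) -> dict:
    """Only an explicit human decision makes the recommendation effective.

    Records both the AI suggestion and the human decision, so later
    review can tell agreement from override.
    """
    return {
        "candidate_id": rec.candidate_id,
        "ai_suggestion": rec.ai_suggestion,
        "reviewer": reviewer,
        "decision": decision.value,
    }


rec = Recommendation("c-102", "advance", "Skills match 4/5 required criteria")
outcome = apply_decision(rec, reviewer="jane.doe", decision=Decision.APPROVED)
print(outcome["decision"])  # "approved"
```

The essential property is that there is no code path from AI suggestion to effect that bypasses `apply_decision`; anything else is HOTL, not HITL.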
Human-on-the-Loop (HOTL)
The system operates autonomously, but humans monitor in real time and can intervene when needed. Most decisions proceed without human action; humans watch the system and can halt or override it when something catches their attention.
| Dimension | Detail |
|---|---|
| Example | A fraud detection system automatically flags and holds suspicious transactions. A human analyst monitors the system dashboard and can release held transactions or escalate for investigation — but most transactions clear without review. |
| When to use | When decision speed matters but intervention must remain possible. Well-suited to systems where the majority of decisions are routine but a meaningful minority require human judgment. |
| Trade-off | Requires sustained, vigilant monitoring. Automation complacency is the primary operational risk — when the system usually gets it right, humans gradually stop critically evaluating its outputs, and edge cases go undetected. |
| Key risk | EU AI Act Article 14(4)(b) explicitly names this risk: oversight personnel must remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system. The Act treats this as a design requirement, not only a training topic. |
Human-in-Command (HIC)
Humans set the parameters, boundaries, and goals within which the system operates. The system acts autonomously within those constraints, and humans review aggregate outcomes on a defined cadence rather than individual decisions.
| Dimension | Detail |
|---|---|
| Example | A dynamic pricing system adjusts prices within bounds approved by management. Weekly reviews assess whether aggregate outcomes align with business and governance objectives — but individual price decisions are not reviewed. |
| When to use | High-volume decisions where per-decision review is operationally impractical, but overall outcomes and parameter settings need human accountability and periodic reassessment. |
| Trade-off | Reduced control over individual decisions; accountability operates at the level of system configuration and outcome trends rather than specific outputs. Appropriate for lower-risk, reversible decisions. |
| Key risk | Parameter drift — the system operates within approved bounds but the bounds themselves become outdated as conditions change. Review cadence must be sufficient to catch configuration that is no longer appropriate. |
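The HIC pattern reduces, in code terms, to two checks: enforce the human-approved envelope at decision time, and flag when the approval itself has gone stale. A minimal sketch; the bounds, pricing context, and 90-day review interval are illustrative assumptions:

```python
from datetime import date, timedelta


class ApprovedBounds:
    """A human-approved operating envelope for an autonomous system."""

    def __init__(self, floor: float, ceiling: float, approved_on: date,
                 review_interval: timedelta = timedelta(days=90)):
        self.floor = floor
        self.ceiling = ceiling
        self.approved_on = approved_on
        self.review_interval = review_interval

    def clamp(self, proposed: float) -> float:
        """The system acts autonomously, but never outside the bounds."""
        return max(self.floor, min(self.ceiling, proposed))

    def review_overdue(self, today: date) -> bool:
        """Guard against parameter drift: stale bounds need re-approval."""
        return today - self.approved_on > self.review_interval


bounds = ApprovedBounds(floor=9.99, ceiling=24.99, approved_on=date(2025, 1, 6))
print(bounds.clamp(31.50))                      # 24.99 (capped at the ceiling)
print(bounds.review_overdue(date(2025, 6, 1)))  # True (bounds older than 90 days)
```

Surfacing `review_overdue` as an alert, rather than leaving it to memory, is what turns periodic reassessment from an intention into a mechanism.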
Fully Autonomous
The system makes decisions with no human involvement in individual cases. Humans may review aggregate outcomes periodically, but the system itself operates without human review of individual outputs.
| Dimension | Detail |
|---|---|
| Example | Email spam filters; content recommendation algorithms for entertainment, where individual errors have minimal impact on the person affected. |
| When to use | Low-risk decisions where the cost of occasional errors is minimal, errors are reversible, and decision volume makes human review impractical or impossible. |
| Trade-off | Appropriate for some applications but increasingly restricted by regulation for consequential decisions. EU AI Act Annex III high-risk categories are effectively excluded from fully autonomous operation. |
| Key risk | Scope creep — a system that began in a low-risk context is extended to higher-stakes decisions without a corresponding upgrade to the oversight model. |
What Regulations Require
Human oversight is moving from best practice to legal requirement. For PMs deploying high-risk AI systems, the regulatory obligations are specific and operational — they cannot be satisfied by pointing to governance documentation without corresponding technical and organisational implementation.
EU AI Act: Article 14 and Article 26
Article 14 is the primary human oversight requirement for high-risk AI systems, and it establishes obligations at two levels: what providers must build into the system, and what deployers must implement operationally.
EU AI Act Article 14(1) states that high-risk AI systems shall be designed and developed in such a way that they can be effectively overseen by natural persons during the period in which they are in use. Article 14(3) requires that oversight measures be commensurate with the risks, level of autonomy, and context of use.
Article 14(4) specifies five distinct capabilities that oversight personnel must be enabled to exercise. These are not aspirational principles — each is a technical or operational deliverable.
| Article 14(4) Requirement | Project Deliverable |
|---|---|
| (a) Properly understand the relevant capacities and limitations of the high-risk AI system and be able to duly monitor its operation, including detecting and addressing anomalies, dysfunctions, and unexpected performance | Training programme covering system capabilities, known limitations, and known failure modes. Monitoring dashboard or alert mechanism that surfaces anomalies in real time. Both must exist before deployment. |
| (b) Remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system (automation bias) | Structural design elements that counter complacency — not only training content. Examples: varied output presentation, mandatory justification documentation, randomised spot-check requirements. Designed in, not mentioned in onboarding. |
| (c) Correctly interpret the high-risk AI system’s output, taking into account interpretation tools and methods available | Interpretable output format with explanation sufficient for a trained, non-technical oversight person to evaluate the recommendation. System card or operating guide documenting how to interpret confidence scores, flags, and edge-case indicators. |
| (d) Decide, in any particular situation, not to use the high-risk AI system or to otherwise disregard, override, or reverse the output of the high-risk AI system | A documented, tested override mechanism. Oversight personnel must have the authority — not just the technical ability — to override. Override decisions must be logged per Article 26(6). |
| (e) Intervene in the operation of the high-risk AI system or interrupt the system through a ‘stop’ button or a similar procedure that allows the system to come to a halt in a safe state | A functional circuit breaker — a mechanism that can be triggered by an oversight person and that brings the system to a defined safe state. Tested before production deployment. Named person with authority to use it. |
Article 26(2) adds the personnel obligation: deployers shall assign human oversight to natural persons who have the necessary competence, training, and authority, as well as the necessary support. The word “assign” matters — a generic statement that oversight will be performed is not sufficient. A named individual, with documented competence and formal authority, must be designated before the system goes live.
Article 26(5) establishes the ongoing monitoring obligation: deployers shall monitor the operation of the high-risk AI system on the basis of the instructions for use and, where relevant, inform providers of emerging issues. Critically, where deployers have reason to consider that use of the system may present a risk, they shall without undue delay inform the provider, the distributor, and the relevant market surveillance authority, and shall suspend use of the system. This is an active, ongoing obligation that persists throughout the deployment lifecycle.
Article 26(6) requires deployers to keep logs automatically generated by the high-risk AI system for at least six months, or as required by applicable law. Override decisions, anomalies, and suspension events must be captured and retained.
NIST AI RMF: MAP 3.5 and MANAGE 2.4
NIST AI RMF MAP 3.5 requires that processes for human oversight are defined, assessed, and documented in accordance with organisational policies from the GOVERN function. NIST is explicit that oversight is a shared responsibility: attempts to properly authorise or govern oversight practices will not be effective without organisational buy-in and accountability mechanisms. An oversight framework that exists only on paper, without the backing of organisational authority and incentives, does not function.
NIST AI RMF notes directly that data on the frequency and rationale with which humans overrule AI system output in deployed systems may be useful to collect and analyse. Override patterns are governance data, not only quality metrics. A near-zero override rate is not evidence that the system is performing well — it may be evidence that oversight personnel are not genuinely engaging.
NIST MANAGE 2.4 requires that mechanisms for superseding, disengaging, or deactivating AI systems are in place and applied, and that responsibilities are assigned and understood, before deployment. This maps directly to the circuit breaker requirement in EU AI Act Article 14(4)(e) — both frameworks treat deactivation capability as a pre-deployment requirement, not a post-incident response plan.
UNESCO and Global Frameworks
UNESCO’s Recommendation on the Ethics of AI establishes that it should always be possible to attribute ethical and legal responsibility to humans at any stage of the AI system lifecycle. This principle grounds the oversight requirement in accountability, not only in risk management: the purpose of human oversight is to ensure that a human being remains responsible for consequential decisions, even when AI does the analytical work.
Singapore’s IMDA Agentic AI Governance Framework (2025) extends this to multi-step autonomous systems, requiring that the deploying organisation — defined as the principal — retain accountability for all agent actions regardless of how many automated steps are involved. The principal-agent model makes clear that increasing automation does not dilute human accountability.
Designing for Oversight
Oversight cannot be an afterthought. Once a system is built without interpretable outputs, override mechanisms, or logging infrastructure, retrofitting those capabilities is expensive and often architecturally difficult. The design decisions that enable meaningful oversight must be made early and treated as first-class requirements, not post-delivery additions.
NIST AI RMF MAP 3.5 explicitly states: in critical systems, high-stakes settings, and systems deemed high-risk it is of vital importance to evaluate risks and effectiveness of oversight procedures before an AI system is deployed. Testing oversight before deployment is a framework requirement, not a recommended option.
Technical Requirements
Each of the following must be specified as a system requirement and verified before deployment, not assumed to exist because the system is functional.
| Technical Capability | What It Must Do |
|---|---|
| Stop/pause mechanism (Article 14(4)(e)) | Halt system operation immediately and bring it to a defined safe state. Must be triggerable by oversight personnel, not only by engineers. Must be tested under realistic conditions before go-live. |
| Override capability (Article 14(4)(d)) | Allow oversight personnel to reject or modify individual system outputs, with authority documented and tested. Overrides must be logged automatically. |
| Interpretable outputs (Article 14(4)(c)) | Outputs must be presented in a format that a trained, non-technical oversight person can evaluate. Confidence scores alone are not sufficient. An explanation of the primary factors in the recommendation must be accessible. |
| Audit logging (Article 26(6)) | Automatic record of all decisions, overrides, anomalies, and human interventions. Retained for at least six months. Format must be accessible for review without specialist tooling. |
| Alert thresholds | Automatic notification when system behaviour exceeds defined parameters — accuracy drops, output distribution shifts, anomaly rates. Thresholds must be defined before deployment and linked to documented response procedures. |
| Fallback procedures | Manual processes that can take over if the system is halted. These must be documented and tested. A fallback that has never been rehearsed is not a real fallback. |
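The stop mechanism in the first row can be sketched as a circuit breaker that any authorised oversight person can trip, halting new decisions and recording who stopped the system and why. A simplified single-process illustration; the role names and the definition of "safe state" are assumptions for this sketch:

```python
import threading
from datetime import datetime, timezone


class CircuitBreaker:
    """Halts processing and records who stopped the system and why.

    "Safe state" here means: stop accepting new decisions and route
    pending work to the documented manual fallback.
    """

    def __init__(self, authorised: set):
        self._halted = threading.Event()
        self._authorised = authorised
        self.log = []  # entries feed the Article 26(6) audit log

    def trip(self, person: str, reason: str) -> None:
        # Authority check: the stop button is for oversight personnel,
        # not only engineers with shell access.
        if person not in self._authorised:
            raise PermissionError(f"{person} lacks stop authority")
        self._halted.set()
        self.log.append({
            "event": "halt",
            "by": person,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def operating(self) -> bool:
        return not self._halted.is_set()


breaker = CircuitBreaker(authorised={"oversight.lead"})
breaker.trip("oversight.lead", "anomalous output distribution")
print(breaker.operating())  # False: new decisions go to the manual fallback
```

Whatever the real implementation looks like, the two properties worth preserving are that the trigger is available to named oversight personnel and that every trip is logged with actor and rationale.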
Operational Requirements
Technical capability without operational structure produces oversight that exists on paper but not in practice. Each of the following must be documented and tested before deployment.
| Operational Element | What Must Be Defined |
|---|---|
| Named oversight personnel (Article 26(2)) | Who is assigned? Name, role, and documented authority. What decisions can they make independently? What requires escalation? Vacant oversight roles are a compliance gap, not an organisational convenience. |
| Competence and training | What does an oversight person need to know to evaluate this system’s outputs responsibly? Training must cover system capabilities, known limitations, documented failure modes, and the specific biases the system has been assessed to carry. Completion must be documented. |
| Escalation paths | Under what specific conditions should an oversight person escalate? To whom, and by what channel? What is the expected response time? Escalation paths must be tested before they are needed. |
| Review cadence | How often are aggregate system performance and oversight effectiveness formally reviewed? Who owns that review? What triggers an unscheduled review? Cadence must be set based on system risk, not organisational convenience. |
| Override tracking | Override rate, override reasons, and override outcomes must be tracked and reviewed. NIST AI RMF identifies this data as analytically valuable for governance. An unreviewed override log is not oversight — it is an audit trail. |
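Override tracking presupposes that each override is captured as a structured record: what the system recommended, what the human decided, who decided, and why. A minimal log-entry sketch; the field names are assumptions for illustration, not a schema mandated by the Act:

```python
import json
from datetime import datetime, timezone


def log_override(decision_id: str, ai_output: str, human_decision: str,
                 reviewer: str, reason: str) -> str:
    """One JSON line per reviewed decision, for the Article 26(6) log.

    Logging agreements as well as overrides is deliberate: the override
    rate can only be computed if the denominator is recorded too.
    Retention must be at least six months.
    """
    entry = {
        "decision_id": decision_id,
        "ai_output": ai_output,
        "human_decision": human_decision,
        "overridden": ai_output != human_decision,
        "reviewer": reviewer,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(entry)


line = log_override("d-881", "deny", "approve", "sam.lee",
                    "applicant income verified out-of-band")
print(json.loads(line)["overridden"])  # True
```

A plain JSON-lines file like this also satisfies the "accessible without specialist tooling" expectation from the technical requirements table.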
Warning Signs That Oversight Is Not Working
These indicators suggest that oversight is nominal rather than substantive. Each warrants investigation, not just documentation.
| Warning Sign | What It Likely Means |
|---|---|
| Override rate near zero over a sustained period | Oversight personnel may not be genuinely evaluating outputs — automation bias. Alternatively, the system may have reached a population where it performs well and edge cases are not surfacing. Both possibilities require investigation. |
| Override rate very high | System may not be fit for the deployment context. High override rates suggest the system’s recommendations are frequently misaligned with what oversight personnel would decide independently. This is an accuracy and fitness-for-purpose signal. |
| Response time degradation on flagged items | Oversight personnel may be overwhelmed, disengaged, or insufficiently supported. Workload and attention capacity must match the monitoring demand the system creates. |
| Inconsistent override decisions for similar inputs | May indicate unclear override criteria, insufficient training, inadequate explanation of system outputs, or disagreement about what the system is supposed to do. Requires training review and criteria clarification. |
| No recent escalations despite active system operation | Either the system is functioning within parameters (confirm by reviewing alert thresholds) or escalation procedures are not being followed. Distinguish between absence of problems and absence of reporting. |
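The first two signals can be checked mechanically from the override log. A sketch of the rate checks; the 2% and 30% thresholds are illustrative placeholders, not values from any framework, and should be set per system risk profile:

```python
def override_warnings(total: int, overrides: int,
                      low: float = 0.02, high: float = 0.30) -> list:
    """Flag suspiciously low or high override rates for investigation.

    A warning here is a prompt to investigate, not a verdict: a near-zero
    rate can also mean the system genuinely performs well on the current
    population, which is why both cases require human follow-up.
    """
    if total == 0:
        return ["no decisions recorded: confirm logging is running"]
    rate = overrides / total
    warnings = []
    if rate < low:
        warnings.append(f"override rate {rate:.1%} near zero: possible rubber-stamping")
    if rate > high:
        warnings.append(f"override rate {rate:.1%} very high: possible fitness-for-purpose gap")
    return warnings


print(override_warnings(total=1000, overrides=4))
# ['override rate 0.4% near zero: possible rubber-stamping']
```

Running a check like this on the review cadence, rather than ad hoc, is what makes the override log a governance input instead of a dormant audit trail.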
The Automation Bias Problem
Research consistently shows that humans tend to over-rely on automated systems. When an AI system usually gets it right, humans gradually stop critically evaluating its outputs. This is automation bias — and it is the primary mechanism by which human oversight fails in practice while appearing to function on paper.
EU AI Act Article 14(4)(b) treats automation bias not as a training topic but as a design requirement: high-risk AI systems must be provided to deployers in such a way that oversight personnel are enabled to remain aware of this tendency. The Act places the obligation on the system design, not only on the training programme. If your system’s outputs are presented in a way that makes uncritical acceptance the path of least resistance, the oversight design is inadequate regardless of what training materials say.
Factors That Increase Automation Bias
| Factor | Why It Increases Risk |
|---|---|
| High oversight workload | When oversight personnel are reviewing large volumes of decisions under time pressure, the cognitive effort required for genuine evaluation becomes unsustainable. Review defaults to rubber-stamping. |
| Sustained system reliability in the past | A system that has been accurate for months trains oversight personnel to trust it. When the distribution shifts or an edge case arises, the trained trust is misapplied. |
| High-confidence output presentation | Outputs presented with high stated confidence (percentage scores, strong language, visual design that implies certainty) suppress the critical evaluation that uncertain presentation would trigger. |
| Oversight personnel who lack independent domain expertise | A person who cannot evaluate whether a recommendation is plausible cannot meaningfully override it. Oversight without domain competence is a procedural formality, not a substantive check. |
| No accountability for failures to catch errors | If oversight personnel are never held accountable when an AI error passes their review unchallenged, the incentive for genuine engagement is absent. Accountability must be defined and applied. |
Structural Mitigations
These mitigations address automation bias at the system design and operational process level — not only at the training level. EU AI Act Article 14(4)(b) requires that the system be designed to support awareness of over-reliance tendency. Design choices, not only training content, must carry this obligation.
| Mitigation | How It Works |
|---|---|
| Vary output presentation | Do not always display recommendations in the same format. Occasionally present the system’s underlying data without the final recommendation, and ask the oversight person to form an independent view before seeing the system output. Breaks the conditioned acceptance pattern. |
| Require documented justification for agreement | Require oversight personnel to document why they agreed with a system recommendation before approving it, not only when they override. Agreement without explanation is not substantive engagement. |
| Structured randomised spot checks | Randomly select a sample of approved decisions for secondary review. Compare the secondary reviewer’s independent assessment against the initial oversight decision. Inconsistencies surface both automation bias and training gaps. |
| Workload management | Set explicit caps on the number of decisions a single oversight person reviews per session. Cognitive fatigue is a documented contributor to automation bias. Oversight throughput must be set based on quality of engagement, not operational convenience. |
| Individual accountability tracking | Track oversight decisions at the individual level — not only aggregate override rates. Where patterns emerge (one person overrides far less frequently than peers, or overrides cluster around certain decision types), investigate. |
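The spot-check mitigation can be implemented as a random sample over approved decisions, compared against an independent second review. A sketch; the 5% sample rate is an assumption, and in production the seed would come from a secure source so reviewers cannot predict which approvals will be re-checked:

```python
import random


def select_spot_checks(approved_ids: list, rate: float = 0.05, seed=None) -> list:
    """Randomly sample approved decisions for independent secondary review.

    Unpredictability is the point: a fixed or guessable sample lets
    reviewers anticipate which approvals will be re-examined.
    """
    rng = random.Random(seed)
    k = max(1, round(len(approved_ids) * rate))  # always check at least one
    return rng.sample(approved_ids, k)


def disagreements(first_review: dict, second_review: dict) -> list:
    """Decisions where the secondary reviewer reached a different conclusion.

    Each argument maps decision_id -> conclusion; mismatches surface
    automation bias, training gaps, or unclear criteria.
    """
    return [d for d, first in first_review.items() if second_review.get(d) != first]


sample = select_spot_checks([f"d-{i}" for i in range(100)], rate=0.05, seed=7)
print(len(sample))  # 5
```

The disagreement list is the useful output: each entry is a concrete case for the training review and criteria clarification discussed above.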
PM Responsibilities by Phase
As PM, you are not designing the oversight mechanisms yourself — but you are responsible for ensuring they are specified, built, tested, and sustained. Oversight is a project deliverable, not an operational afterthought.
During Planning
• Define the oversight model in the project charter. What level of human involvement does this system require, given its risk classification and use case? The answer must be documented before development begins.
• Map to EU AI Act Article 14 if the system is high-risk. Work through each of the five capabilities in Article 14(4) and document how the project will satisfy each one. These are acceptance criteria, not guidelines.
• Identify oversight personnel before development, not at deployment. Who will perform oversight? Do those people currently exist in the organisation? Do they have domain expertise? If not, recruitment or training has a lead time that must be planned.
• Budget for ongoing oversight costs. Human oversight has recurring operational costs — staff time, training, tooling, periodic review. These must be budgeted explicitly. An oversight model that disappears when operational budgets are cut is not a real oversight model.
• Define the fallback. What happens if the system is halted? Manual processes must be identified and documented before deployment, not after an incident forces the question.
During Development
• Verify oversight capabilities are built to specification. Can the system be stopped? Can outputs be overridden? Are outputs interpretable without specialist tooling? Test against the Article 14(4) checklist before system acceptance.
• Develop operational procedures in parallel with the system. How will oversight work day-to-day? Escalation procedures, review cadences, and logging processes must be documented and tested, not drafted after go-live.
• Specify and build alert thresholds. Define the performance boundaries that trigger automatic notification to oversight personnel. Thresholds must be set based on the system’s risk profile, not on what is technically easy to configure.
• Build the logging infrastructure before integration testing. You cannot test oversight procedures if the logging required to track them does not yet exist. Article 26(6) minimum retention of six months must be confirmed before deployment.
At Deployment
• Test oversight procedures under realistic conditions. Run scenarios in which the system produces anomalous outputs, in which the circuit breaker must be triggered, and in which escalation procedures are invoked. Oversight that has never been tested is not ready for production.
• Formally train and document oversight personnel competence. EU AI Act Article 26(2) requires competence, training, and authority. Training completion must be documented. Verbal assurance that ‘people know what to do’ does not satisfy the regulation.
• Establish monitoring from day one. Override rates, alert events, and escalation activity must be tracked from the first day of operation. Baseline data collected in the first weeks informs whether oversight is functioning as designed.
Post-Deployment
• Review override rates and patterns on a defined cadence. Are humans engaging meaningfully? Are override rates consistent across oversight personnel? Are patterns clustering around specific decision types or time periods? Each variation is a governance signal.
• Review incidents for oversight effectiveness. When things go wrong, was the oversight process engaged? If an error reached a consequential outcome, at what point in the review process did it pass unchallenged? Post-incident review must include oversight effectiveness, not only technical root cause.
• Reassess the oversight model when scope or use changes. A system extended to new populations, new decision types, or higher volumes may require a more intensive oversight configuration than the original deployment. Scope changes should trigger an oversight review, not assume continuity.
Questions to Ask
Use these questions to assess whether human oversight in your AI project is substantive or nominal.
Design
• Can the system be stopped or paused immediately, and brought to a safe state — as required by EU AI Act Article 14(4)(e)? Who has the authority and the mechanism to do this?
• Can oversight personnel override individual decisions, with that override logged automatically per Article 26(6)?
• Are outputs interpretable by a trained, non-technical oversight person without specialist tooling? Does the output explanation satisfy Article 14(4)(c)?
• Has the system been designed to counter automation bias at the interface level, as required by Article 14(4)(b) — not only addressed in training materials?
• Are there automatic alerts for anomalous behaviour? Have the thresholds been defined based on risk, and tested under conditions similar to production?
Operations
• Who is assigned oversight? Is there a named person with documented competence, training, and authority per Article 26(2)? What happens when that person is unavailable?
• Do oversight personnel have the domain expertise to evaluate system outputs independently — not just the training to approve them procedurally?
• Do they have the time and workload capacity for meaningful review? Has an explicit capacity limit been set?
• What happens when they override the system? Is the override logged, reviewed, and factored into system performance assessment?
Monitoring
• Are override rates and reasons tracked at the individual and aggregate level?
• Are you watching for structural signs of automation bias — not only individual incidents?
• How often is oversight effectiveness formally reviewed, and by whom?
• What triggers an unscheduled review or reassessment of the oversight model?
• If the system were halted today, could manual fallback processes sustain operations? When were those processes last tested?
Right-Sizing for Your Situation
The appropriate oversight model depends on the system’s risk classification, the stakes of individual decisions, and the volume of decisions the system makes. EU AI Act Annex III high-risk categories require the most intensive configuration. Lower-risk systems have more flexibility — but oversight design choices should be documented and defensible regardless of risk level.
• Greenfield — Human Oversight Playbook: For PMs without formal oversight frameworks. Covers how to implement oversight for high-risk decisions without enterprise infrastructure — including the minimum viable Article 14(4) checklist, how to document named oversight personnel to satisfy Article 26(2), and how to set up basic logging and override tracking before you have dedicated monitoring tooling.

• Emerging — Human Oversight Playbook: For PMs building repeatable oversight processes. Full oversight model selection framework, role definition templates, alert threshold design guidance, automation bias mitigation design patterns, and override tracking approaches that feed into NIST AI RMF governance reporting.

• Established — Human Oversight Playbook: For PMs in organisations with formal governance. How to integrate AI oversight into existing operational and compliance frameworks — including how to connect Article 26(5) suspension obligations to incident response procedures, and how to manage oversight consistency across a portfolio of high-risk AI systems.
Framework References
• EU AI Act (Official Journal, 12 July 2024) — Article 14(1) (high-risk AI systems must be designed to allow effective human oversight); Article 14(3) (oversight measures must be commensurate with risks, autonomy level, and context); Article 14(4)(a)–(e) (five specific capabilities that oversight personnel must be enabled to exercise: understanding limitations, detecting anomalies, interpreting outputs, overriding decisions, and halting the system); Article 14(5) (biometric identification systems require verification by at least two natural persons); Article 26(2) (deployers must assign oversight to named persons with necessary competence, training, and authority); Article 26(5) (ongoing monitoring obligation; suspension and notification requirements if risk is identified); Article 26(6) (log retention minimum six months); Recital 73 (human oversight design requirements; guidance and inform mechanisms for oversight decisions)
• NIST AI RMF 1.0 (NIST AI 100-1, 2023) — MAP 3.5 (processes for human oversight must be defined, assessed, and documented; oversight is a shared responsibility requiring organisational buy-in; effectiveness must be evaluated before deployment in high-stakes settings); GOVERN function (roles and responsibilities for human-AI team configurations; mechanisms for making decision-making processes explicit and countering systemic biases); AI RMF note on override data (frequency and rationale of human overrides of AI system output in deployed systems is useful to collect and analyse for governance purposes)
• NIST AI RMF 1.0 (NIST AI 100-1, 2023) — MANAGE 2.4 (mechanisms for superseding, disengaging, or deactivating AI systems must be in place and applied before deployment; responsibilities must be assigned and understood)
• NIST AI 600-1: Generative AI Profile (2024) — MG-2.4-004 (establish and regularly review specific criteria that warrant deactivation of GAI systems in accordance with risk tolerances and appetites); automation bias documentation and mitigation in GAI-specific human-AI configuration contexts
• UNESCO Recommendation on the Ethics of AI (2021) — Principle of human oversight and determination: it should always be possible to attribute ethical and legal responsibility to humans at any stage of the AI system lifecycle; AI systems must not be given legal personality that would dilute human accountability
• Singapore IMDA — Agentic AI Governance Framework (2025): principal-agent accountability model; the deploying organisation retains accountability for all agent actions regardless of automation level; supervised principle (meaningful human oversight must be maintained throughout operation, not only at deployment)
This article is part of AIPMO’s PM Practice series. See also: The AI Project Charter | AI Risk Registers | AI Impact Assessments