PM Takeaways

• The AI RMF is voluntary — no organisation is required to implement it. But ‘voluntary’ has become a relative term. US federal agencies are expected to align with it. Enterprise procurement increasingly asks suppliers to demonstrate alignment. The EU AI Act recitals reference it. The framework has become the de facto common language for AI governance conversations in a way that makes it worth understanding even if you’re not actively implementing it.

• The framework’s most useful structural insight is that GOVERN is not a phase — it’s a precondition for the other three functions. MAP, MEASURE, and MANAGE require decisions that only governance can authorise: what risk tolerance applies, who can accept residual risk, what happens when the system underperforms. A project that starts with MAP without having resolved those questions will stall at exactly the point where it matters most.

• NIST explicitly says the framework is not a checklist. This is worth taking seriously. The AI RMF 1.0 document contains 72 subcategories across the four functions. No one implements all 72. The intended approach is to select what applies based on system risk level, organisational context, and available resources — and to document why the rest doesn’t apply. PMs should resist the impulse to treat the full subcategory list as a task backlog.

• The MAP function asks a question that most project teams skip: what happens if this system is wrong? Not wrong in the ‘bug in the code’ sense, but wrong in the ‘produces an outcome that is technically correct but harmful to someone’ sense. Who bears the cost of an AI error? Is it the user who receives the wrong recommendation, the customer who gets an inaccurate decision, or the organisation that gets sued? MAP 5 asks you to characterise impacts on individuals, groups, communities, and society. That scope is wider than most project stakeholder analyses go.

• The MANAGE function is where most AI projects currently have the largest gaps. Risk identification and testing are disciplines most project teams know how to do. What they’re less practised at is sustained post-deployment risk management — monitoring for drift, maintaining incident response procedures for AI-specific failures, planning for model updates and eventual decommissioning. MANAGE extends the PM’s accountability past the go-live date in ways that traditional project closeout does not.
Most summaries of the NIST AI Risk Management Framework open with the same observation: it’s dense, it’s voluntary, and it sounds like something for a compliance team. All three of those things are true. But they describe the document, not the framework. The framework itself is a structured way of thinking about AI risk that maps reasonably well onto things PMs already do.
Released by NIST in January 2023 (NIST AI 100-1), the AI RMF was developed through an unusually broad consultation process — multiple public workshops, a formal request for information, two public drafts. The result is a framework that tries to be useful across wildly different contexts: small teams deploying off-the-shelf AI tools, large enterprises building proprietary models, federal agencies implementing AI in high-stakes public systems. That breadth is why the language stays abstract. Your job as a PM is to make it concrete for your specific situation.
The framework has two parts. Part 1 establishes foundational concepts around AI risk, trustworthiness, and the ways AI risks differ from traditional software risks — worth reading once. Part 2 is the operational core: four functions (GOVERN, MAP, MEASURE, MANAGE) broken into categories and subcategories. This article focuses on Part 2 and on what each function actually requires a project team to do.
The Trustworthiness Characteristics
Before the four functions, there’s something worth understanding: the AI RMF organises its requirements around seven trustworthiness characteristics, not around a single definition of ‘safe AI’. Those characteristics are valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful biases managed.
The reason this matters for PMs is that the framework treats these as a set, not a hierarchy. A system that is technically accurate but inexplicable to the people affected by it is not trustworthy. A system that is fair to the general population but produces discriminatory outcomes for a demographic subgroup is not trustworthy. NIST is explicit that tradeoffs among these characteristics are common — improving interpretability sometimes reduces accuracy, for instance — and that those tradeoffs require human judgment rather than algorithmic resolution. The framework’s MEASURE and MANAGE functions are partly designed to surface and document those tradeoffs so they can be made deliberately rather than by default.
For PMs, the practical implication is this: acceptance criteria for an AI system need to address all seven characteristics relevant to your context, not just the performance metrics the technical team cares about. Defining what ‘done’ looks like across accuracy, fairness, explainability, and oversight is a project initiation task, not a quality assurance afterthought.
GOVERN: The Function That Isn’t Really a Function
The AI RMF describes GOVERN as ‘cross-cutting’ — meaning it doesn’t happen at one project phase. It is infused throughout the other three functions. That framing is accurate but slightly misleading, because it suggests GOVERN is ambient rather than active. It isn’t. GOVERN produces specific outputs that MAP, MEASURE, and MANAGE depend on.
The most important of those outputs is a documented risk tolerance position for the AI system. Risk tolerance in the AI RMF sense is not a general organisational risk appetite statement. It is a specific answer to the question: what level of harm from this particular AI system’s failures is acceptable, and who has authority to make that determination? Without that answer, every contested decision in MAP and MEASURE — whether a bias rate is acceptable, whether an unexplained edge case justifies deployment delay — becomes a political negotiation rather than a principled one.
GOVERN also establishes accountability structures: who owns the model, who owns the data, who owns the deployment decision, who owns the post-deployment monitoring. None of these owners needs to be the project manager, and the roles persist past project close. Getting those roles documented in project governance artifacts — not just in org charts — is a GOVERN deliverable that most projects skip.
Two other GOVERN requirements are worth flagging specifically. GOVERN 1.1 requires that legal and regulatory requirements involving AI be understood, managed, and documented — which means the legal review of applicable AI regulation belongs at project initiation, not after architecture decisions are made. And GOVERN 1.7 requires processes for safe decommissioning from the start, before the system is even built. Plan the shutdown before you design the launch.
MAP: The Function Most Projects Compress
MAP establishes the context in which the AI system will operate and identifies the risks associated with that context. It is, in project management terms, planning — requirements gathering, stakeholder analysis, scope definition, initial risk identification. Most project teams do some version of MAP. Few do all of it.
The part that gets compressed most often is stakeholder scope. MAP 5 asks organisations to characterise impacts on individuals, groups, communities, organisations, and society. That is a genuinely wider scope than most AI project stakeholder registers capture. The ‘users’ and ‘customers’ who appear on conventional stakeholder matrices are typically the people who interact with the system. MAP 5 is also asking about the people affected by the system’s outputs — the loan applicant who doesn’t appear in any user story, the employee whose performance score was generated by a model they’ll never see. Characterising those impacts requires a different kind of analysis than traditional stakeholder mapping, and it usually takes longer.
MAP also handles something that most project planning disciplines handle poorly: the go/no-go question. MAP 1.5 requires that organisational risk tolerances be determined and documented. MAP 3.2 requires that potential costs — including non-monetary costs arising from AI errors — be examined against that tolerance. Together, these subcategories create the basis for a documented, defensible go/no-go decision before deployment. Not a steering committee vote. A traceable decision with documented rationale.
The third area where MAP does work that project plans typically don’t is assumptions. MAP 1.1 explicitly requires that assumptions about AI system purposes, uses, and risks be documented. Assumptions are where AI risks hide. The team building a recruitment screening tool assumes it will be used to narrow a candidate pool, not to generate final hiring decisions. The team deploying a customer service model assumes it will handle tier-1 queries, not sensitive account disputes. When those assumptions prove wrong — and they often do, quietly, over time — the risk profile of the system changes without the project team knowing it. Documenting the assumptions creates the baseline for detecting that drift.
MEASURE: Testing Is Not QA
The MEASURE function covers quantitative and qualitative methods for assessing, benchmarking, and monitoring AI risks — before deployment and regularly in operation. The key word is ‘regularly.’ MEASURE does not end at go-live.
What distinguishes AI testing from conventional software QA is the scope of what needs to be measured. Software QA asks whether the system does what the specification says. MEASURE asks whether the system is valid and reliable, safe under foreseeable conditions and foreseeable misuse, secure against adversarial inputs, fair across demographic groups, explainable to the people who need to understand it, and privacy-preserving in practice. These are not features to be tested once. They are properties to be demonstrated, documented, and tracked.
MEASURE 2.11 is the subcategory that most often catches teams off-guard: fairness and bias are evaluated and results documented. This is not a one-time bias check during model development. It applies to the deployed system, in the actual population it operates on, over time. Population characteristics shift. The distribution of inputs changes. A system that passes bias evaluation at launch can develop disparate impact six months later without any code change. Catching that requires ongoing measurement, not a pre-deployment certification.
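To make the ongoing-measurement point concrete, here is a minimal sketch of what a recurring disaggregated evaluation might look like. All names (`group`, `approved`, the sample records) are illustrative assumptions, not part of the framework; the AI RMF specifies the outcome, not a method.

```python
# Illustrative sketch of a recurring disaggregated evaluation in the spirit
# of MEASURE 2.11. Field names and sample data are assumptions.
from collections import defaultdict

def disaggregated_rates(records, group_key="group", outcome_key="approved"):
    """Compute the positive-outcome rate per demographic group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        g = r[group_key]
        totals[g] += 1
        positives[g] += 1 if r[outcome_key] else 0
    return {g: positives[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest group rate (1.0 = parity)."""
    lo, hi = min(rates.values()), max(rates.values())
    return lo / hi if hi else 1.0

# Hypothetical production decisions, tagged with a demographic group.
records = [
    {"group": "A", "approved": True},
    {"group": "A", "approved": True},
    {"group": "A", "approved": False},
    {"group": "B", "approved": True},
    {"group": "B", "approved": False},
    {"group": "B", "approved": False},
]

rates = disaggregated_rates(records)
ratio = disparate_impact_ratio(rates)
```

Run on a defined cadence against live production data, a falling ratio is the kind of signal MEASURE 2.11 expects to be caught and documented, not discovered in an audit.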
The other gap that MEASURE surfaces is the absence of baselines. To detect drift — degradation in model performance, shift in output distribution, emerging bias — you need a documented baseline taken at a known point when the system was performing as intended. Establishing that baseline is a MEASURE deliverable. It needs to happen at deployment or shortly after, before noise accumulates. Teams that skip it discover the absence of a baseline at exactly the wrong moment: when a stakeholder asks whether the system’s performance has changed and there’s nothing to measure against.
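The baseline-versus-drift comparison can be sketched in a few lines. This example uses the Population Stability Index, one common drift statistic among many; the 0.2 alert threshold is a widely used rule of thumb, not an AI RMF requirement, and the binned distributions are invented for illustration.

```python
# Minimal sketch of drift detection against a documented baseline.
# PSI and the 0.2 threshold are conventional choices, not AI RMF mandates.
import math

def psi(baseline, current):
    """Population Stability Index between two binned distributions.
    Both inputs are lists of bin proportions that each sum to 1."""
    eps = 1e-6  # guard against empty bins
    total = 0.0
    for b, c in zip(baseline, current):
        b, c = max(b, eps), max(c, eps)
        total += (c - b) * math.log(c / b)
    return total

baseline = [0.25, 0.25, 0.25, 0.25]   # captured at go-live, per MEASURE
current  = [0.10, 0.20, 0.30, 0.40]   # observed in production later

score = psi(baseline, current)
drifted = score > 0.2  # above ~0.2 is often treated as significant drift
```

The specific statistic matters less than the discipline: without the `baseline` list captured at a known-good point, there is nothing for `current` to be compared against.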
| AI-Specific Risk Property | What MEASURE Requires from Your Project |
| --- | --- |
| Validity and reliability | Performance metrics against defined benchmarks; uncertainty estimates; comparison to baseline. MEASURE 2.5. |
| Fairness and bias | Disaggregated evaluation across relevant demographic groups at deployment and on a defined monitoring cadence. MEASURE 2.11. |
| Explainability | Documentation of how model outputs are interpreted, by whom, and with what level of context. MEASURE 2.9. |
| Safety | Evaluation that residual risk does not exceed tolerance; defined failure modes and safe-failure conditions. MEASURE 2.6. |
| Security and resilience | Testing for adversarial inputs and unexpected edge conditions; documentation of results. MEASURE 2.7. |
| Privacy | Privacy risk evaluation specific to the AI system’s data use, not just inherited from the organisation’s general privacy program. MEASURE 2.10. |
MANAGE: Where Most AI Projects Currently Have Gaps
The MANAGE function covers what happens after identified risks are assessed: prioritisation, treatment, response planning, and monitoring. In conventional risk management terms, this is risk response and control. In AI terms, it extends through the operational life of the system in ways that traditional project closeout does not.
Most project teams are reasonably competent at the pre-deployment part of MANAGE — documenting risk treatments, assigning owners, building response procedures. The gap is post-deployment. The AI RMF’s MANAGE function does not treat deployment as a handoff point. Risks identified in MAP continue to be monitored. Measurements taken in MEASURE continue to feed MANAGE. New risks that emerge in production — unanticipated use cases, demographic shifts in the user population, adversarial behaviour that wasn’t anticipated during testing — need to be captured and managed. That requires a structure that persists past project close.
The practical challenge is that project managers close projects. Operations teams run systems. The handoff between the two is where AI risk management most often breaks down. Who receives the risk register? Who is responsible for acting on a monitoring alert? Who decides when model performance has degraded enough to trigger a retraining cycle? Who manages decommissioning when the system is eventually retired? These are not questions for the operations team to figure out post-handoff. They are MANAGE deliverables that need to be built into the project plan, documented, and handed over with the system.
MANAGE also covers something the AI RMF calls ‘residual risk’ decisions — the choice to deploy a system despite known, unresolved risks that fall within acceptable tolerance. These decisions are legitimate and common. What the framework requires is that they be explicit and documented: the risk is known, its level has been assessed, someone with authority has determined it is acceptable, and that determination is recorded. An undocumented residual risk is not a risk that has been accepted. It is a risk that has been ignored.
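The difference between an accepted risk and an ignored one is, in practice, a record. The sketch below shows the fields such a record plausibly needs; the schema, field names, and example values are assumptions of this article, not an AI RMF artifact.

```python
# Hypothetical residual risk acceptance record. The fields reflect what
# the framework's documentation expectation implies, not a NIST schema.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ResidualRiskAcceptance:
    risk_id: str          # links back to the MAP risk register entry
    description: str
    assessed_level: str   # e.g. "low" / "medium", against documented tolerance
    accepted_by: str      # role with authority to accept, per GOVERN
    accepted_on: date
    review_by: date       # acceptance is time-bound, not permanent
    rationale: str

    def is_due_for_review(self, today: date) -> bool:
        return today >= self.review_by

record = ResidualRiskAcceptance(
    risk_id="R-014",
    description="Model under-performs on low-volume regional dialects",
    assessed_level="low",
    accepted_by="Head of Customer Operations",
    accepted_on=date(2025, 3, 1),
    review_by=date(2025, 9, 1),
    rationale="Impact limited to escalation delay; human fallback exists",
)
```

The `review_by` field is the operationally important one: a residual risk accepted at deployment should be re-examined on a schedule, because the conditions that made it acceptable can change.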
The Framework Is Not Linear
One thing that gets lost in most AI RMF summaries is the framework’s explicit position that the functions can be applied in any order. MAP, MEASURE, and MANAGE are not sequential steps. They are concurrent and iterative. You might begin MEASURE activities while MAP is still ongoing. A finding in MANAGE might send you back to MAP to revisit context assumptions. NIST describes this as intentional — the framework’s design reflects the reality that AI systems evolve in ways that don’t follow a linear development lifecycle.
For PMs used to phase-gated development, the iterative nature of the AI RMF can be unsettling. It does not produce a clear sequence of gates and approvals. What it produces instead is a set of outcomes that should be demonstrable at key decision points — before procurement of a third-party system, before deployment to production, before a significant change to an existing system, before decommissioning. Mapping those outcomes to your project’s specific decision points is how you operationalise the framework for your context.
GOVERN, as discussed earlier, is the exception: it genuinely does need to precede the other functions in meaningful respects. Without documented risk tolerance, a clear accountability structure, and a legal and regulatory review, MAP activities will surface questions that the project team cannot answer because the organisational authority to answer them was never established.
Profiles: Tailoring the Framework to Your Context
Part 2 of the AI RMF includes a concept called Profiles — a way for organisations to tailor the framework to their specific context, risk tolerance, and resources. A Profile is essentially a documented selection of relevant categories and subcategories from across the four functions, shaped by the organisation’s mission, legal environment, and the characteristics of the specific AI system.
NIST has released a Generative AI Profile (NIST AI 600-1) that extends the AI RMF for large language model systems specifically. It addresses risks that don’t appear in the base framework — hallucination, prompt injection, homogenisation, and data provenance issues specific to foundation models. If your project involves deploying or building on a generative AI system, the 600-1 profile is worth reviewing alongside the base AI RMF.
The crosswalk between the AI RMF and ISO 42001 is also worth noting. NIST has published mapping documentation showing how the two frameworks relate. Organisations implementing ISO 42001 as a management system standard will find that AI RMF alignment addresses many of the same requirements, particularly in GOVERN and MAP. The two are not duplicative — ISO 42001 provides the certifiable management system structure, the AI RMF provides the risk management process detail — but they are designed to coexist.
Right-Sizing for Your Situation
The AI RMF is designed to scale. A solo PM deploying a low-stakes internal productivity tool and an enterprise team building a high-stakes customer decision system are both within scope — they just apply the framework at very different levels of depth and formality. The relevant variable is the risk level of the AI system and the consequences of failure, not the size of the team or the sophistication of the organisation.
• Greenfield — NIST AI RMF Playbook: For PMs with no existing AI governance infrastructure. A lightweight approach to each of the four functions when you’re building from scratch: which outputs matter most, what’s the minimum documentation that makes the process defensible, and how to make GOVERN decisions when the organisation hasn’t made them yet.

• Emerging — NIST AI RMF Playbook: For PMs creating repeatable processes across multiple projects. How to build AI RMF alignment into project templates, establish consistent monitoring practices, and operationalise MANAGE handoffs so they don’t depend on individual project teams to reinvent the process.

• Established — NIST AI RMF Playbook: For PMs integrating the AI RMF into existing governance frameworks — alongside ISO 42001, existing risk management programs, or sector-specific regulatory requirements. How to map existing controls to AI RMF categories and identify genuine gaps rather than duplicating work.
Framework References
• NIST AI RMF 1.0 (NIST AI 100-1, January 2023) — Part 1 (trustworthiness characteristics, AI risk framing), Part 2 (GOVERN, MAP, MEASURE, MANAGE functions; all categories and subcategories). Primary source for all four functions and subcategory references.
• NIST AI RMF Playbook (NIST, 2023, online) — Suggested actions for each subcategory across all four functions. Voluntary implementation guidance; useful for translating AI RMF subcategories into specific project activities.
• NIST Generative AI Profile (NIST AI 600-1, 2024) — Full profile. Extension of the AI RMF for large language model systems; covers hallucination, prompt injection, data provenance, and homogenisation risks.
• ISO/IEC 42001:2023 — NIST has published crosswalk documentation mapping AI RMF categories to ISO 42001 clauses. Relevant for organisations implementing both frameworks.
This article is part of AIPMO’s Frameworks series. See also: AI Impact Assessments | ISO 42001 for Project Managers | OECD AI Principles: The Framework Behind the Frameworks