The PM's Guide to NIST AI RMF

PM Takeaways

“Voluntary” has become a relative term. US federal agencies are expected to align with the AI RMF, enterprise procurement increasingly asks suppliers to demonstrate alignment, and the EU AI Act recitals reference it. It’s the de facto common language for AI governance conversations.
GOVERN is not a phase — it’s a precondition. MAP, MEASURE, and MANAGE all depend on decisions that only governance can authorize: risk tolerance, accountability structure, legal review. A project that skips GOVERN and starts with MAP will stall the moment a contested decision needs an answer.
The AI RMF contains 72 subcategories across four functions. NIST is explicit that it’s not a checklist and no one implements all 72. The intended approach is to select what applies, document why the rest doesn’t, and move on — not treat the full list as a task backlog.
MAP asks what happens if the system is wrong: not in the “bug in the code” sense, but in the “technically correct output that harms someone” sense. Most project stakeholder analyses don’t go that far. MAP 5 requires characterizing impacts on individuals, groups, communities, organizations, and society.
Most AI project teams are competent at pre-deployment risk management. The gap is post-deployment. MANAGE extends accountability past go-live: monitoring for drift, responding to incidents, planning model updates, and eventually decommissioning — none of which traditional project closeout handles.

Most summaries of the NIST AI Risk Management Framework open with the same observation: it’s dense, it’s voluntary, and it sounds like something for a compliance team. All three of those things are true. But they describe the document, not the framework. The framework itself is a structured way of thinking about AI risk that maps reasonably well onto things PMs already do.

Released by NIST in January 2023 (NIST AI 100-1), the AI RMF was developed through an unusually broad consultation process — multiple public workshops, a formal request for information, two public drafts. The result is a framework that tries to be useful across wildly different contexts: small teams deploying off-the-shelf AI tools, large enterprises building proprietary models, federal agencies implementing AI in high-stakes public systems. That breadth is why the language stays abstract. Your job as a PM is to make it concrete for your specific situation.

The framework has two parts. Part 1 establishes foundational concepts around AI risk, trustworthiness, and the ways AI risks differ from traditional software risks — worth reading once. Part 2 is the operational core: four functions (GOVERN, MAP, MEASURE, MANAGE) broken into categories and subcategories. This article focuses on Part 2 and on what each function actually requires a project team to do.

The Trustworthiness Characteristics

Before the four functions, there’s something worth understanding: the AI RMF organizes its requirements around seven trustworthiness characteristics, not around a single definition of “safe AI.” Those characteristics are: valid and reliable, safe, secure and resilient, accountable and transparent, explainable and interpretable, privacy-enhanced, and fair with harmful biases managed.

The reason this matters for PMs is that the framework treats these as a set, not a hierarchy. A system that is technically accurate but inexplicable to the people affected by it is not trustworthy. A system that looks fair in aggregate but produces discriminatory outcomes for a demographic subgroup is not trustworthy. NIST is explicit that tradeoffs among these characteristics are common (improving interpretability sometimes reduces accuracy, for example) and that those tradeoffs require human judgment rather than algorithmic resolution.

The practical implication: acceptance criteria for an AI system need to address every trustworthiness characteristic relevant to your context, not just the performance metrics the technical team cares about. Defining what “done” looks like across accuracy, fairness, explainability, and oversight is a project initiation task, not a quality assurance afterthought.

GOVERN: The Function That Isn’t Really a Function

The AI RMF describes GOVERN as “cross-cutting” — meaning it doesn’t happen at one project phase. It is infused throughout the other three functions. That framing is accurate but slightly misleading, because it suggests GOVERN is ambient rather than active. It isn’t. GOVERN produces specific outputs that MAP, MEASURE, and MANAGE depend on.

The most important of those outputs is a documented risk tolerance position for the AI system. Risk tolerance in the AI RMF sense is not a general organizational risk appetite statement. It is a specific answer to the question: what level of harm from this particular AI system’s failures is acceptable, and who has authority to make that determination? Without that answer, every contested decision in MAP and MEASURE — whether a bias rate is acceptable, whether an unexplained edge case justifies deployment delay — becomes a political negotiation rather than a principled one.

GOVERN also establishes accountability structures: who owns the model, who owns the data, who owns the deployment decision, who owns the post-deployment monitoring. These owners are not the project manager, and their roles persist past project close. Getting those roles documented in project governance artifacts, not just in org charts, is a GOVERN deliverable that most projects skip.

Two other GOVERN requirements are worth flagging specifically. GOVERN 1.1 requires that legal and regulatory requirements involving AI be understood, managed, and documented — which means the legal review of applicable AI regulation belongs at project initiation, not after architecture decisions are made. And GOVERN 1.7 requires processes for safe decommissioning from the start, before the system is even built. Plan the shutdown before you design the launch.

MAP: The Function Most Projects Compress

MAP establishes the context in which the AI system will operate and identifies the risks associated with that context. It is, in project management terms, planning — requirements gathering, stakeholder analysis, scope definition, initial risk identification. Most project teams do some version of MAP. Few do all of it.

The part that gets compressed most often is stakeholder scope. MAP 5 asks organizations to characterize impacts on individuals, groups, communities, organizations, and society. That is a genuinely wider scope than most AI project stakeholder registers capture. The “users” and “customers” on conventional stakeholder matrices are typically the people who interact with the system. MAP 5 is also asking about the people affected by the system’s outputs — the loan applicant who doesn’t appear in any user story, the employee whose performance score was generated by a model they’ll never see.

MAP also handles something that most project planning disciplines handle poorly: the go/no-go question. MAP 1.5 requires that organizational risk tolerances be determined and documented. MAP 3.2 requires that potential costs — including non-monetary costs arising from AI errors — be examined against that tolerance. Together, these subcategories create the basis for a documented, defensible go/no-go decision before deployment. Not a steering committee vote. A traceable decision with documented rationale.
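To make "traceable decision with documented rationale" concrete, here is a minimal sketch of how a go/no-go record could be derived from assessed costs and a documented tolerance. Everything in it is an assumption for illustration: the four-level severity scale, the `AssessedCost` and `GoNoGoRecord` names, and the example costs. The AI RMF does not prescribe a scale or a record format, and a real decision involves judgment that a comparison like this only supports.

```python
from dataclasses import dataclass
from datetime import date

# Ordinal severity scale. The labels and their ordering are assumptions
# for illustration; the AI RMF does not prescribe a scale.
SEVERITY = {"low": 1, "moderate": 2, "high": 3, "critical": 4}


@dataclass
class AssessedCost:
    description: str
    severity: str   # key into SEVERITY
    monetary: bool  # False for non-monetary harms, which MAP 3.2 also covers


@dataclass
class GoNoGoRecord:
    decision: str     # "go" or "no-go"
    tolerance: str    # documented tolerance the costs were compared against
    exceeding: list   # descriptions of costs above tolerance
    rationale: str
    decided_by: str
    decided_on: date


def decide(costs, tolerance, decided_by, decided_on):
    """Compare each assessed cost to the documented tolerance and record the result."""
    limit = SEVERITY[tolerance]
    exceeding = [c.description for c in costs if SEVERITY[c.severity] > limit]
    rationale = (
        f"{len(exceeding)} of {len(costs)} assessed costs exceed "
        f"the '{tolerance}' tolerance"
    )
    return GoNoGoRecord(
        decision="no-go" if exceeding else "go",
        tolerance=tolerance,
        exceeding=exceeding,
        rationale=rationale,
        decided_by=decided_by,
        decided_on=decided_on,
    )


# Hypothetical example data.
costs = [
    AssessedCost("Manual rework when a prediction is wrong", "low", monetary=True),
    AssessedCost("Qualified applicant wrongly rejected", "high", monetary=False),
]
record = decide(
    costs,
    tolerance="moderate",
    decided_by="Risk owner (hypothetical)",
    decided_on=date(2025, 3, 1),
)
print(record.decision, "-", record.rationale)
```

The point of the record is less the comparison itself than the fields around it: which tolerance was applied, who decided, and when. Those are the details a steering committee vote tends not to leave behind.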

MAP 1.1 explicitly requires that assumptions about AI system purposes, uses, and risks be documented. Assumptions are where AI risks hide. The team building a recruitment screening tool assumes it will be used to narrow a candidate pool, not to generate final hiring decisions. When those assumptions prove wrong — and they often do, quietly, over time — the risk profile of the system changes without the project team knowing it. Documenting the assumptions creates the baseline for detecting that drift.
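A documented assumption only helps if something is later compared against it. The sketch below shows one possible shape for that comparison, using the recruitment screening example above. The `AssumptionBaseline` structure, the use labels, and the date are hypothetical; how observed uses get collected (usage logs, interviews, audits) is left out entirely.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AssumptionBaseline:
    """Intended uses as documented at initiation (illustrative structure)."""
    system_name: str
    recorded_on: date
    intended_uses: frozenset


def undocumented_uses(baseline, observed_uses):
    """Uses seen in operation that the documented baseline never covered."""
    return sorted(set(observed_uses) - baseline.intended_uses)


# Hypothetical baseline for the recruitment screening example.
baseline = AssumptionBaseline(
    system_name="recruitment-screener",
    recorded_on=date(2025, 1, 15),
    intended_uses=frozenset({"narrow candidate pool"}),
)

# Hypothetical uses observed some time after deployment.
observed = ["narrow candidate pool", "final hiring decision"]

print(undocumented_uses(baseline, observed))  # ['final hiring decision']
```

Any non-empty result is a prompt to revisit the MAP context, because the system is now carrying risks nobody assessed.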

MEASURE: Testing Is Not QA

The MEASURE function covers quantitative and qualitative methods for assessing, benchmarking, and monitoring AI risks — before deployment and regularly in operation. The key word is “regularly.” MEASURE does not end at go-live.

What distinguishes AI testing from conventional software QA is the scope of what needs to be measured. Software QA asks whether the system does what the specification says. MEASURE asks whether the system is valid and reliable, safe under foreseeable conditions and misuse, secure against adversarial inputs, fair across demographic groups, explainable to the people who need to understand it, and privacy-preserving in practice. These are not features to be tested once. They are properties to be demonstrated, documented, and tracked.

MEASURE 2.11 is the subcategory that most often catches teams off-guard: fairness and bias are evaluated and results documented. This is not a one-time bias check during model development. It applies to the deployed system, in the actual population it operates on, over time. A system that passes bias evaluation at launch can develop disparate impact six months later without any code change. Catching that requires ongoing measurement, not a pre-deployment certification.
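A rough sketch of what ongoing, disaggregated measurement can look like follows. It computes selection rates per group and compares each to the highest-rate group. The 0.8 threshold is an assumption borrowed from the "four-fifths" heuristic used in US employment selection guidance; the AI RMF does not prescribe a fairness metric or threshold, and the right choice depends on the system. The group labels and counts are made up.

```python
from collections import defaultdict


def selection_rates(records):
    """Share of positive outcomes per group; records are (group, selected) pairs."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        positives[group] += int(selected)
    return {g: positives[g] / totals[g] for g in totals}


def impact_ratios(rates):
    """Each group's selection rate divided by the highest group's rate."""
    if not rates:
        return {}
    best = max(rates.values())
    if best == 0:
        return {g: 1.0 for g in rates}  # nobody selected: no disparity to report
    return {g: r / best for g, r in rates.items()}


def flagged_groups(records, threshold=0.8):
    """Groups whose impact ratio falls below the threshold, sorted by name."""
    ratios = impact_ratios(selection_rates(records))
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)


def cohort(group, selected, total):
    """Build made-up outcome records for one group."""
    return [(group, True)] * selected + [(group, False)] * (total - selected)


# Same model, same code, two measurement dates (hypothetical data).
at_launch = cohort("A", 10, 20) + cohort("B", 9, 20)
six_months_later = cohort("A", 10, 20) + cohort("B", 6, 20)

print(flagged_groups(at_launch))         # []
print(flagged_groups(six_months_later))  # ['B']
```

The second call is the scenario described above: nothing in the code changed, but the population did, and only a repeated measurement would notice.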

The other gap that MEASURE surfaces is the absence of baselines. To detect drift — degradation in model performance, shift in output distribution, emerging bias — you need a documented baseline taken at a known point when the system was performing as intended. Teams that skip it discover the absence of a baseline at exactly the wrong moment: when a stakeholder asks whether the system’s performance has changed and there’s nothing to measure against.
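To show why the baseline matters, here is a small sketch that compares a current output distribution against a stored one using the Population Stability Index (PSI). PSI is a technique swapped in for illustration; the AI RMF does not name a drift metric. The bin proportions and the 0.2 alert threshold are assumptions, and a team would pick its own.

```python
import math


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Both arguments are lists of bin proportions over the same bins.
    """
    if len(baseline) != len(current):
        raise ValueError("baseline and current must use the same bins")
    total = 0.0
    for b, c in zip(baseline, current):
        b, c = max(b, eps), max(c, eps)  # avoid log(0) for empty bins
        total += (c - b) * math.log(c / b)
    return total


def drift_alert(baseline, current, threshold=0.2):
    """True when the shift from baseline exceeds the chosen threshold."""
    return psi(baseline, current) > threshold


# Hypothetical share of model scores falling into four score bands.
baseline_bins = [0.25, 0.25, 0.25, 0.25]  # recorded when the system performed as intended
current_bins = [0.10, 0.20, 0.30, 0.40]   # measured in production later

print(round(psi(baseline_bins, current_bins), 3))
print(drift_alert(baseline_bins, current_bins))
```

Without `baseline_bins` there is nothing to pass as the first argument, which is the whole problem in one line.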

| AI-Specific Risk Property | What MEASURE Requires from Your Project | Subcategory |
| --- | --- | --- |
| Validity and reliability | Performance metrics against defined benchmarks; uncertainty estimates; comparison to baseline. | MEASURE 2.5 |
| Fairness and bias | Disaggregated evaluation across relevant demographic groups at deployment and on a defined monitoring cadence. | MEASURE 2.11 |
| Explainability | Documentation of how model outputs are interpreted, by whom, and with what level of context. | MEASURE 2.9 |
| Safety | Evaluation that residual risk does not exceed tolerance; defined failure modes and safe-failure conditions. | MEASURE 2.6 |
| Security and resilience | Testing for adversarial inputs and unexpected edge conditions; documentation of results. | MEASURE 2.7 |
| Privacy | Privacy risk evaluation specific to the AI system’s data use, not just inherited from the organization’s general privacy program. | MEASURE 2.10 |

MANAGE: Where Most AI Projects Currently Have Gaps

The MANAGE function covers what happens after identified risks are assessed: prioritization, treatment, response planning, and monitoring. In AI terms, it extends through the operational life of the system in ways that traditional project closeout does not.

Most project teams are reasonably competent at the pre-deployment part of MANAGE — documenting risk treatments, assigning owners, building response procedures. The gap is post-deployment. The AI RMF’s MANAGE function does not treat deployment as a handoff point. Risks identified in MAP continue to be monitored. Measurements taken in MEASURE continue to feed MANAGE. New risks that emerge in production — unanticipated use cases, demographic shifts in the user population, adversarial behavior not anticipated during testing — need to be captured and managed. That requires a structure that persists past project close.

The practical challenge is that project managers close projects. Operations teams run systems. The handoff between the two is where AI risk management most often breaks down. Who receives the risk register? Who is responsible for acting on a monitoring alert? Who decides when model performance has degraded enough to trigger a retraining cycle? Who manages decommissioning? These are not questions for the operations team to figure out post-handoff. They are MANAGE deliverables that need to be built into the project plan, documented, and handed over with the system.

MANAGE also covers what the AI RMF calls “residual risk” decisions — the choice to deploy a system despite known, unresolved risks that fall within acceptable tolerance. These decisions are legitimate and common. What the framework requires is that they be explicit and documented: the risk is known, its level has been assessed, someone with authority has determined it is acceptable, and that determination is recorded. An undocumented residual risk is not a risk that has been accepted. It is a risk that has been ignored.
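The distinction between an accepted risk and an ignored one can be expressed as a simple completeness rule. The sketch below treats a residual risk as accepted only when its level, approver, and date are all recorded. The `ResidualRisk` fields and the example register are hypothetical; the framework asks for the documentation, not for this particular shape.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ResidualRisk:
    description: str
    assessed_level: Optional[str] = None  # e.g. "moderate"
    accepted_by: Optional[str] = None     # someone with authority to accept
    accepted_on: Optional[date] = None


def is_accepted(risk):
    """A risk counts as accepted only if level, approver, and date are all recorded."""
    return all([risk.assessed_level, risk.accepted_by, risk.accepted_on])


def ignored_risks(register):
    """Descriptions of risks that are known but have no complete acceptance record."""
    return [r.description for r in register if not is_accepted(r)]


# Hypothetical register entries.
register = [
    ResidualRisk(
        "Lower accuracy on handwritten forms",
        assessed_level="moderate",
        accepted_by="Head of Operations (hypothetical)",
        accepted_on=date(2025, 4, 2),
    ),
    ResidualRisk(
        "Unexplained score swings for edge-case inputs",
        assessed_level="moderate",
    ),
]

print(ignored_risks(register))  # ['Unexplained score swings for edge-case inputs']
```

Running a check like this before the deployment decision turns "did anyone sign off on that?" from a memory exercise into a list.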

The Framework Is Not Linear

One thing that gets lost in most AI RMF summaries is the framework’s explicit position that MAP, MEASURE, and MANAGE are not sequential steps and can be applied in whatever order suits the work. They are concurrent and iterative. You might begin MEASURE activities while MAP is still ongoing. A finding in MANAGE might send you back to MAP to revisit context assumptions. NIST describes this as intentional: the framework’s design reflects the reality that AI systems evolve in ways that don’t follow a linear development lifecycle.

For PMs used to phase-gated development, the iterative nature of the AI RMF can be unsettling. It does not produce a clear sequence of gates and approvals. What it produces instead is a set of outcomes that should be demonstrable at key decision points — before procurement of a third-party system, before deployment to production, before a significant change to an existing system, before decommissioning. Mapping those outcomes to your project’s specific decision points is how you operationalize the framework for your context.

GOVERN is the exception: it genuinely does need to precede the other functions in meaningful respects. Without documented risk tolerance, a clear accountability structure, and a legal and regulatory review, MAP activities will surface questions that the project team cannot answer because the organizational authority to answer them was never established.

Profiles: Tailoring the Framework to Your Context

Part 2 of the AI RMF includes a concept called Profiles — a way for organizations to tailor the framework to their specific context, risk tolerance, and resources. A Profile is essentially a documented selection of relevant categories and subcategories from across the four functions, shaped by the organization’s mission, legal environment, and the characteristics of the specific AI system.
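As a rough illustration, a Profile can be kept as plain data: each subcategory is either in scope or excluded with a recorded reason. The `ProfileEntry` structure, the `build_profile` helper, and the example exclusion are assumptions made for this sketch. NIST does not define a file format for Profiles.

```python
from dataclasses import dataclass


@dataclass
class ProfileEntry:
    subcategory: str     # e.g. "MAP 1.1"
    applies: bool
    rationale: str = ""  # required when applies is False


def build_profile(entries):
    """Split entries into in-scope subcategories and documented exclusions."""
    undocumented = [
        e.subcategory
        for e in entries
        if not e.applies and not e.rationale.strip()
    ]
    if undocumented:
        raise ValueError(f"Exclusions need a rationale: {undocumented}")
    return {
        "in_scope": [e.subcategory for e in entries if e.applies],
        "excluded": {e.subcategory: e.rationale for e in entries if not e.applies},
    }


# Hypothetical selection for a system that handles no personal data.
entries = [
    ProfileEntry("GOVERN 1.1", applies=True),
    ProfileEntry("MAP 1.1", applies=True),
    ProfileEntry("MEASURE 2.11", applies=True),
    ProfileEntry("MEASURE 2.10", applies=False,
                 rationale="System processes no personal data (hypothetical)"),
]

profile = build_profile(entries)
print(profile["in_scope"])
print(profile["excluded"])
```

Refusing to build a Profile with an unexplained exclusion enforces the "select what applies, document why the rest doesn’t" approach rather than leaving it to habit.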

NIST has released a Generative AI Profile (NIST AI 600-1) that extends the AI RMF to generative AI systems, including large language models. It addresses risks that don’t appear in the base framework: hallucination (which the profile calls confabulation), prompt injection, homogenization, and data provenance issues specific to foundation models. If your project involves deploying or building on a generative AI system, the 600-1 profile is worth reviewing alongside the base AI RMF.

The crosswalk between the AI RMF and ISO 42001 is also worth noting. NIST has published mapping documentation showing how the two frameworks relate. ISO 42001 provides the certifiable management system structure; the AI RMF provides the risk management process detail. They are designed to coexist, not compete.

Right-Sizing for Your Situation

The AI RMF is designed to scale. A solo PM deploying a low-stakes internal productivity tool and an enterprise team building a high-stakes customer decision system are both within scope — they just apply the framework at very different levels of depth and formality. The relevant variable is the risk level of the AI system and the consequences of failure, not the size of the team.

Greenfield — Starting Out

Start with GOVERN, even informally. Before any MAP or MEASURE activity, get three things documented: who has authority to set the risk tolerance for this system, what applicable legal or regulatory requirements have been identified, and who owns post-deployment monitoring. These don’t need to be formal deliverables — a decision log entry and an email trail are better than nothing. Then use MAP 1.1 to log your key assumptions about how the system will be used. Those two steps give you a defensible foundation and create the baseline for detecting drift.

Emerging — Building Repeatability

The highest-value investment at this stage is the MANAGE handoff template. Define what the operations team receives when a project closes: the risk register with owners, monitoring cadence and alert thresholds, baseline performance metrics, and the residual risk acceptance record. Standardizing that handoff across projects means MANAGE doesn’t depend on individual PMs to reinvent it. Pair this with a bias evaluation protocol in MEASURE that specifies which demographic disaggregations apply to each system type — codifying that judgment once saves relitigating it on every project.

Established — Mature Programs

At this level the work is mapping, not building. NIST has published crosswalk documentation between the AI RMF and ISO 42001, the EU AI Act, and other major frameworks. Use those crosswalks to identify which AI RMF categories are already addressed by existing controls and where genuine gaps remain. The goal is not to implement the AI RMF on top of existing governance — it’s to demonstrate that your existing governance addresses the AI RMF’s intent, supplemented where it doesn’t. That distinction matters when regulators or auditors ask.

The AI Governance Advisor at app.aipmo.co can help you work through how the AI RMF’s four functions apply to your specific project context, methodology, and risk level.

Framework References

NIST AI Risk Management Framework 1.0 (NIST AI 100-1, January 2023). Part 1 (trustworthiness characteristics, AI risk framing) and Part 2 (GOVERN, MAP, MEASURE, MANAGE functions and all subcategories). Primary source for all function and subcategory references in this article.

NIST AI RMF Playbook (NIST, 2023, online). Suggested actions for each subcategory across all four functions. Voluntary implementation guidance for translating AI RMF subcategories into specific project activities.

NIST Generative AI Profile (NIST AI 600-1, 2024). Extension of the AI RMF for generative AI systems, including large language models. Covers hallucination, prompt injection, data provenance, and homogenization risks not addressed in the base framework.

ISO/IEC 42001:2023. NIST has published crosswalk documentation mapping AI RMF categories to ISO 42001 clauses. Relevant for organizations implementing both frameworks simultaneously.

This article is part of AIPMO’s Frameworks series. See also: AI Impact Assessments | ISO 42001 for Project Managers | OECD AI Principles | AI Risk Classification

To err is AI; to govern, human.

AIPMO.co · AI Governance, PM-first
