- When you deploy a third-party AI system, you own the accountability — even if you didn't write a line of it. EU AI Act Article 25 and NIST AI RMF GOVERN 6.1 are both clear: compliance obligations don't transfer to the vendor. They sit with you.
- Standard software contracts were not written for AI. Before you sign, negotiate change notification, audit rights, IP indemnification for training data, and data handling terms. These aren't extras — they're your primary protection when something goes wrong.
- A vendor's test results on their general population don't substitute for your own validation in your specific context. Test against your users, your use case, before go-live. Then keep testing in production.
- If a vendor won't share model cards, validation reports, or testing results, that silence is a risk signal in itself. NIST AI RMF GOVERN 6.1 flags vendors who obstruct risk mapping as elevated risk. Make documentation transparency a formal evaluation criterion.
- Plan for vendor failure before you need to act on it. Have a tested fallback, even if that means reverting to a manual process, and know exactly how to escalate a critical incident to your vendor at 2am on a Sunday.
Most AI projects don't build from scratch. They use pre-trained models, integrate APIs, license AI-powered software, or deploy vendor solutions. That's faster and cheaper — but it introduces risks that traditional vendor management wasn't designed to handle.
When you deploy third-party AI, you inherit the choices someone else made about training data, testing, and risk mitigation — along with their limitations and blind spots. You also inherit compliance obligations that don't transfer to the vendor, regardless of what the contract says. Understanding that changes everything about how you approach procurement.
The AI Value Chain
AI systems frequently involve multiple third-party components, each contributing its own risk profile to your deployment. A single production system may draw on several of these simultaneously.
| Component Type | Examples |
|---|---|
| Foundation models | GPT, Claude, Llama, Gemini — pre-trained models used via API or fine-tuned for your use case |
| Pre-trained models | Specialized models for vision, speech, translation, sentiment, or document processing |
| Training datasets | Licensed or purchased data used to train or fine-tune custom models |
| AI platforms | Cloud ML services, MLOps platforms, feature stores, labeling services |
| Embedded AI | AI capabilities built into enterprise software — CRM, ERP, HRIS, procurement tools |
| AI APIs | Third-party services for specific functions: OCR, fraud detection, recommendations, scoring |
| Software libraries | Open-source ML frameworks, data processing tools, evaluation packages |
Every component in this chain carries its own risk. A problem in one layer flows downstream — and tracing it back through a complex dependency stack is genuinely hard. NIST AI RMF MAP 4.2 requires you to identify and document internal risk controls for third-party AI components. You can't manage a chain you haven't mapped.
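One way to make the mapping concrete is a small machine-readable inventory. The sketch below is illustrative only: the `ThirdPartyComponent` record, its fields, and the vendor names are assumptions, not a format defined by NIST AI RMF or by any vendor. It records each component's upstream dependencies and documented internal controls, then reports which components have no control on file and what a given component ultimately depends on.

```python
from dataclasses import dataclass, field


@dataclass
class ThirdPartyComponent:
    """One entry in a hypothetical third-party AI component inventory."""
    name: str
    component_type: str  # e.g. "foundation model", "AI API", "embedded AI"
    vendor: str
    depends_on: list[str] = field(default_factory=list)         # upstream component names
    internal_controls: list[str] = field(default_factory=list)  # documented risk controls


def unmapped_components(inventory: list[ThirdPartyComponent]) -> list[str]:
    """Return names of components with no documented internal risk control."""
    return [c.name for c in inventory if not c.internal_controls]


def upstream_of(name: str, inventory: list[ThirdPartyComponent]) -> set[str]:
    """Collect every component that `name` depends on, directly or transitively."""
    by_name = {c.name: c for c in inventory}
    seen: set[str] = set()
    stack = list(by_name[name].depends_on)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in by_name:
            stack.extend(by_name[current].depends_on)
    return seen


# Made-up example entries; vendor names are placeholders.
inventory = [
    ThirdPartyComponent("base-llm", "foundation model", "ExampleModelCo",
                        internal_controls=["output review sampling"]),
    ThirdPartyComponent("ocr-api", "AI API", "ExampleOcrCo"),
    ThirdPartyComponent("claims-triage", "embedded AI", "ExampleSuiteCo",
                        depends_on=["base-llm", "ocr-api"],
                        internal_controls=["human sign-off on denials"]),
]

print(unmapped_components(inventory))
print(sorted(upstream_of("claims-triage", inventory)))
```

Keeping the dependency links in the record is what lets you trace a downstream problem back to the layer it came from.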
Where Third-Party Risk Hides
Documentation Gaps
If a vendor can't or won't tell you what data the model was trained on, which populations are underrepresented, what testing was done, or what the known limitations are — that's not just inconvenient. It means you can't fully assess whether the system is appropriate for your use case, and it makes your own downstream transparency obligations nearly impossible to meet. NIST AI RMF GOVERN 6.1 makes documentation transparency a core requirement.
Inherited Bias
A model trained on biased data produces biased outputs — regardless of your intent when you deployed it. When you use a third-party model, you inherit whatever biases are baked into its training pipeline, and you're accountable for what happens in your context. A vendor's bias testing on their own general population doesn't substitute for your own validation with your actual users and use case.
Intellectual Property Risk
Training data sometimes includes copyrighted material, personal data, or content with legal restrictions — and using a model trained on such data can expose your organization to infringement claims. The EU AI Act puts copyright compliance obligations on GPAI model providers, but that doesn't mean deployers are off the hook. Assess your IP exposure, especially in sectors where it's a known sensitivity.
Security Vulnerabilities
Third-party AI components can introduce attack vectors your standard security controls don't cover: malware in software libraries, backdoors in pre-trained weights, data poisoning in training datasets, adversarial vulnerabilities, and privacy leaks from model memorization. NIST GenAI GV-6.1-009 requires vendor assessments to check against incident and vulnerability databases — a step most IT procurement processes still skip.
Dependency and Lock-In
Vendor dependency is easy to underestimate at procurement and painful to discover at renewal. Vendors discontinue services, revise API terms, modify model behavior in silent updates, change pricing, or exit the market. NIST GenAI GV-6.2-007 advises reviewing contracts specifically to avoid arbitrary termination rights and liability terms that shift risk to you in ways that aren't obvious upfront. Map your dependency risk before you sign.
Compliance Stays With You
Regulatory obligations don't follow the supply chain; they sit with the deployer. Under the EU AI Act, if you deploy a high-risk AI system, you are responsible for assigning human oversight, using the system according to the provider's instructions, monitoring its operation, and retaining its logs, regardless of who built the underlying model. You need your own risk controls and documentation. Contracting those obligations to your vendor doesn't satisfy the regulator.
Due Diligence: Beyond the Brochure
Standard vendor due diligence looks at financial stability, service reliability, and contract terms. AI due diligence needs to go further — into the model itself: how it was trained, what it was tested against, and what its known failure modes are. NIST GenAI GV-6.1-009 requires due diligence to cover IP, data privacy, security, and ongoing monitoring for third-party generative AI specifically.
Documentation to Request
| Document | Purpose |
|---|---|
| Model cards | Intended use, training data summary, performance metrics, known limitations, out-of-scope uses |
| Datasheets for datasets | Data sources, collection methods, preprocessing, potential biases, gaps in coverage |
| System cards | Full system behavior, safety measures, human oversight integration, deployment constraints |
| Testing and validation reports | Evaluation methodology, fairness and bias testing, adversarial testing results |
| Incident history | Past problems, how they were identified, how they were addressed, recurrence rate |
If a vendor won't or can't provide this documentation, that itself is a risk signal. NIST AI RMF GOVERN 6.1 flags vendors who obstruct risk mapping as indicators of elevated risk. Gaps in transparency at procurement almost never improve after the contract is signed.
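The documentation request can be tracked as a simple completeness check per vendor. This is a minimal sketch under stated assumptions: the `REQUIRED_DOCS` names mirror the table above, but the function names and the cut-off of two missing documents are hypothetical choices, not thresholds taken from NIST or the EU AI Act.

```python
# Document types from the request table above, in lowercase for matching.
REQUIRED_DOCS = (
    "model card",
    "datasheet",
    "system card",
    "testing and validation report",
    "incident history",
)


def missing_documents(provided: set[str]) -> list[str]:
    """Return required documents the vendor has not supplied, in table order."""
    normalized = {d.strip().lower() for d in provided}
    return [doc for doc in REQUIRED_DOCS if doc not in normalized]


def documentation_risk(provided: set[str], max_gaps: int = 2) -> str:
    """Classify transparency. `max_gaps` is an assumed, adjustable threshold."""
    missing = missing_documents(provided)
    if not missing:
        return "complete"
    if len(missing) <= max_gaps:
        return "gaps: follow up before contract execution"
    return "elevated risk: vendor is not transparent"


print(missing_documents({"Model Card", "incident history"}))
print(documentation_risk({"Model Card", "incident history"}))
```

Recording the result at procurement gives you a baseline to compare against at each periodic vendor review.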
Evaluation Criteria
| Criterion | Key Questions |
|---|---|
| Fitness for purpose | Is this AI appropriate for your specific use case? Has it been validated in similar operational contexts? |
| Performance | What are the accuracy metrics? How were they measured? On what populations and in what conditions? |
| Fairness | Has bias testing been performed? Across which demographic groups? What were the results and accepted thresholds? |
| Transparency | Can individual decisions be explained? Is documentation sufficient for your compliance obligations? |
| Security | What security testing has been performed? What vulnerabilities are known and how have they been addressed? |
| Privacy | What data was used for training? How is your input data handled at inference? Is it used to retrain the model? |
| Regulatory support | Does the vendor understand the regulatory requirements in your operating jurisdictions? |
| Incident response | What are the vendor's notification timelines and support obligations when the AI causes harm? |
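The criteria above can be turned into a repeatable rubric so that every procurement is scored the same way. The sketch below is one possible shape, not a prescribed method: the weights, the 0 to 5 rating scale, and the `minimum` floor are all assumptions you would set for your own context.

```python
# Hypothetical weights per criterion; they sum to 1.0.
WEIGHTS = {
    "fitness_for_purpose": 0.20,
    "performance": 0.15,
    "fairness": 0.15,
    "transparency": 0.15,
    "security": 0.10,
    "privacy": 0.10,
    "regulatory_support": 0.05,
    "incident_response": 0.10,
}


def score_vendor(ratings: dict[str, int], minimum: int = 2) -> tuple[float, list[str]]:
    """Score a vendor from 0-5 ratings per criterion.

    Returns the weighted score and the criteria rated below `minimum`,
    which are treated as blockers regardless of the overall score.
    """
    missing = set(WEIGHTS) - set(ratings)
    if missing:
        raise ValueError(f"unrated criteria: {sorted(missing)}")
    total = sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)
    blockers = [c for c in WEIGHTS if ratings[c] < minimum]
    return round(total, 2), blockers


# Example: strong everywhere except fairness.
ratings = {criterion: 4 for criterion in WEIGHTS}
ratings["fairness"] = 1
score, blockers = score_vendor(ratings)
print(score, blockers)
```

Returning blockers separately is a deliberate choice: a high weighted average should not be able to hide a single failing criterion such as fairness or privacy.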
Approved Vendor Lists
NIST GenAI GV-6.1-007 recommends keeping an inventory of all third-party entities with access to your content and maintaining an approved AI provider list. Set criteria for getting on the list, a review schedule, and clear conditions for removal. This is a governance document — procurement, legal, and technology should own it together.
Contracts That Don't Disappoint You Later
Standard software contracts weren't designed for AI. They assume stable, deterministic behavior, and their SLAs and liability clauses are built around availability and defects rather than model behavior. AI systems change, drift, and fail in ways those templates don't anticipate. Before you sign, make sure the contract actually addresses the governance scenarios you'll face in production.
Key Contract Elements
| Element | Why It Matters |
|---|---|
| Liability allocation | Who is responsible when the AI causes harm? Avoid blanket terms that shift all liability to the deployer. |
| Incident notification | Require timely notification of serious incidents arising from the vendor's AI, with defined timelines. |
| Change notification | Require advance notice before material changes to model behavior, architecture, or training data. |
| Performance SLAs | Define acceptable performance levels — including fairness metrics — and remedies if they are not met. |
| Audit rights | Include the right to evaluate the vendor's AI processes, documentation, and testing results. |
| Data handling | Specify how your input data is used — particularly whether it is used to train or fine-tune the model. |
| IP indemnification | Protect against claims arising from the vendor's training data, especially for copyright and personal data. |
| Termination rights | Ensure you can exit the contract if the AI does not meet requirements or compliance obligations. |
| Data portability | Ensure you can retrieve your data and any fine-tuned artifacts if you switch vendors. |
Terms to Watch Out For
Standard AI vendor agreements increasingly contain terms that should concern you: unlimited use of your data for model training, the right to change model behavior without notice, blanket disclaimers on AI outputs, switching costs baked into proprietary formats, and requirements to indemnify the vendor for AI-related claims. These aren't hidden — they're in the standard agreement. Read and negotiate them out before you sign.
Monitoring: Because Validation Day Is Not Enough
Due diligence doesn't stop at contract signing. Model behavior can change after a vendor update, after your own fine-tuning, or as your production data distribution drifts from what the model was trained on. A model that passed validation at procurement may behave meaningfully differently six months in. NIST AI RMF MANAGE 3.2 requires pre-trained models to be monitored as part of regular AI system maintenance.
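A common way to catch behavior changes after a vendor update is to replay a fixed "golden set" of inputs and compare the new outputs with the ones recorded at validation. The sketch below assumes discrete outputs (labels or decisions) that can be compared exactly; free-text outputs would need a similarity measure instead. The function names, case IDs, and the 5% threshold are hypothetical.

```python
def changed_cases(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return IDs of golden-set cases whose output differs from the baseline.

    A case missing from `current` counts as changed.
    """
    return sorted(
        case_id
        for case_id, expected in baseline.items()
        if current.get(case_id) != expected
    )


def needs_review(
    baseline: dict[str, str],
    current: dict[str, str],
    max_change_rate: float = 0.05,  # assumed tolerance; tune to your risk level
) -> bool:
    """Flag the update for formal review if too many outputs changed."""
    if not baseline:
        raise ValueError("golden set is empty")
    rate = len(changed_cases(baseline, current)) / len(baseline)
    return rate > max_change_rate


# Outputs recorded at validation vs. outputs after a vendor update (made-up data).
baseline = {"c1": "approve", "c2": "deny", "c3": "refer", "c4": "approve"}
after_update = {"c1": "approve", "c2": "approve", "c3": "refer", "c4": "approve"}

print(changed_cases(baseline, after_update))
print(needs_review(baseline, after_update))
```

Running this check on a schedule, and not only when the vendor announces a change, is what surfaces silent updates.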
What to Monitor
| Area | Focus |
|---|---|
| Performance | Is the AI still performing as expected in your environment against your baseline metrics? |
| Behavior changes | Have vendor updates changed outputs in ways that affect your use case or compliance posture? |
| Fairness drift | Are bias metrics stable across demographic groups, or drifting as the user population or data changes? |
| Incidents | Has the vendor reported incidents? Have you experienced issues that should be reported upstream? |
| Compliance alignment | Do vendor practices still align with evolving regulatory requirements in your operating jurisdictions? |
| Security | Are there new vulnerabilities reported in vendor components, model architectures, or dependencies? |
Review Cadence
Calibrate your review cadence to the risk level of the AI system. Lower-risk internal tools can run on a lighter schedule; anything affecting consequential decisions deserves tighter oversight.
- Continuous: Automated performance and fairness metric monitoring with defined alert thresholds.
- Monthly: Review of incidents, user feedback, and any vendor communications about model changes.
- Quarterly: Vendor performance against SLAs, including incident response and change notification.
- Annually: Full reassessment of vendor relationship, AI suitability, and alignment with current regulatory requirements.
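For the continuous tier, a fairness alert can be as simple as tracking the gap in selection rates between groups and comparing it with the gap measured at validation. The sketch below uses the demographic parity difference, which is only one of several fairness metrics and may not be the right one for your use case. The group labels, baseline gap, and tolerance are assumptions for illustration.

```python
from collections import defaultdict


def selection_rates(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute the share of positive outcomes per group.

    Each record is (group_label, was_selected).
    """
    totals: dict[str, int] = defaultdict(int)
    positives: dict[str, int] = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        positives[group] += int(selected)
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates: dict[str, float]) -> float:
    """Difference between the highest and lowest group selection rate."""
    return max(rates.values()) - min(rates.values())


def fairness_alert(
    records: list[tuple[str, bool]],
    baseline_gap: float,
    tolerance: float = 0.05,  # assumed alert threshold
) -> bool:
    """Alert when the current gap exceeds the validated baseline plus tolerance."""
    return parity_gap(selection_rates(records)) > baseline_gap + tolerance


# Made-up production sample: group A selected 8 of 10, group B selected 5 of 10.
records = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 5 + [("B", False)] * 5
)

print(selection_rates(records))
print(fairness_alert(records, baseline_gap=0.10))
```

Comparing against the baseline gap, and not against zero, separates drift in production from disparities that were already known and assessed at procurement.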
When Things Go Wrong
When a third-party AI causes a problem, your incident response has to account for the vendor in the loop. NIST GenAI GV-6.2-003 requires incident response plans for third-party AI to define ownership clearly, communicate the plan to everyone who needs it, and be rehearsed regularly — not written for the first time while an incident is in progress.
| Element | Action | Vendor Dependency |
|---|---|---|
| Detection | Monitor for problems in third-party components via automated tooling and user feedback channels | Define what vendor telemetry you have access to under the contract |
| Containment | Be able to disable or bypass third-party AI and activate fallback procedures immediately | Test fallback activation before you need it; manual fallback may be the only option |
| Escalation | Know exactly how to reach vendor support for urgent issues, with documented contacts and SLAs | Verify escalation paths at contract execution, not during an incident |
| Communication | Coordinate incident communication with the vendor to avoid contradictory public statements | Clarify communication ownership in the contract; align on timelines for disclosure |
| Documentation | Track all third-party AI incidents with timestamps, decisions made, and outcomes | Document vendor response times and quality for use in periodic relationship reviews |
| Review | Incorporate vendor incidents into relationship reviews; poor incident response is a termination trigger | Review contract terms after every significant incident for adequacy |
Fallback Procedures
For any critical AI function, a tested fallback is non-negotiable. NIST GenAI GV-6.2-006 requires policies for rollover and fallback technologies — and explicitly acknowledges that fallback may mean manual processing. Design and test the fallback before you need it. If your system has no viable fallback, you've created an unassessed operational dependency.
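In code, a fallback usually takes the form of a wrapper around the vendor call with a kill switch and a manual-review path. The following is a minimal sketch under stated assumptions: `vendor_call` stands in for whatever client your vendor provides, `manual_queue` stands in for your manual process, and the caught exception types are examples only, since real client libraries raise their own errors.

```python
from typing import Callable, Optional


def score_with_fallback(
    payload: dict,
    vendor_call: Callable[[dict], float],
    manual_queue: list,
    ai_enabled: bool = True,
) -> Optional[float]:
    """Return the vendor's score, or route the case to manual processing.

    `ai_enabled=False` acts as a kill switch that bypasses the vendor entirely.
    Returns None when the case was sent to the manual queue.
    """
    if ai_enabled:
        try:
            return vendor_call(payload)
        except (TimeoutError, ConnectionError):
            # Vendor unreachable: fall through to the manual path.
            pass
    manual_queue.append(payload)
    return None


def flaky_vendor(payload: dict) -> float:
    """Stand-in for a vendor client that is currently failing."""
    raise TimeoutError("vendor did not respond")


def healthy_vendor(payload: dict) -> float:
    """Stand-in for a vendor client that responds normally."""
    return 0.7


queue: list = []
print(score_with_fallback({"id": 1}, flaky_vendor, queue))    # routed to manual queue
print(score_with_fallback({"id": 2}, healthy_vendor, queue))  # vendor score returned
print(queue)
```

Exercising the `ai_enabled=False` path in a drill is a cheap way to confirm the manual process can actually absorb the load before an incident forces the question.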
What the Regulations Actually Say
EU AI Act
Under EU AI Act Article 25, if you put your name or trademark on a third-party high-risk AI system, you assume provider obligations regardless of who built the underlying model. That matters for white-label AI products and enterprise software with embedded AI. Under Article 53, GPAI model providers must supply technical documentation to downstream providers who integrate their models into AI systems, and that documentation forms the foundation of your compliance. If an upstream GPAI provider is falling short, Article 89(2) gives downstream providers the right to lodge a complaint with the AI Office, but only if you've done enough due diligence to identify the problem.
NIST AI RMF
NIST AI RMF is a voluntary framework, and its guidance on third-party AI is spread across multiple functions, but the direction is consistent: you are accountable for the AI you deploy. GOVERN 6.1 calls for policies that address third-party risk, including transparency. GOVERN 6.2 calls for contingency plans for high-risk third-party failures. MAP 4.2 calls for documented internal controls for third-party components. MANAGE 3.1 calls for ongoing monitoring of third-party risks. Together, these add up to a genuine third-party governance program, not a one-time procurement checklist.
Your Responsibilities, Phase by Phase
During Planning
- Identify all third-party AI components in scope and map the full value chain from model provider to deployment.
- Include vendor due diligence as a formal project deliverable with its own timeline, owner, and acceptance criteria.
- Budget for evaluation, validation in your context, and ongoing monitoring — not just procurement costs.
- Define the requirements that third-party AI must meet before deployment, including performance, fairness, and documentation thresholds.
During Procurement
- Conduct AI-specific due diligence using the documentation and evaluation criteria above — not standard IT vendor assessment.
- Request and review model cards, datasheets, testing reports, and incident history before contract execution.
- Negotiate AI-specific contract terms covering change notification, audit rights, IP indemnification, and data handling.
- Establish acceptance criteria that must be validated in your deployment environment, not just accepted from vendor documentation.
During Implementation
- Validate that third-party AI meets your requirements in your specific context — vendor test results on different populations or use cases are insufficient.
- Test for bias and fairness in your deployment environment with your actual user population.
- Establish monitoring infrastructure for third-party components before go-live, not after.
- Document all third-party dependencies in the project's AI system documentation for governance and examination readiness.
Post-Deployment
- Monitor third-party AI performance and behavior against your defined thresholds on the cadence appropriate to the risk level.
- Track vendor updates and formally assess their impact before accepting them into production.
- Manage incidents involving third-party components using your defined response process, with vendor escalation paths tested.
- Conduct periodic vendor reviews on the schedule established at procurement; use incident history and SLA performance as inputs.
Questions Worth Asking
Use these questions across the procurement and ongoing monitoring lifecycle. A vendor's willingness and ability to answer them is itself a governance signal.
Training and Data
- What data was used to train this model, and how was it sourced and licensed?
- Were there data quality, representativeness, or coverage gaps that affect performance for specific populations or use cases?
- How do you handle personal data in training, and have you assessed compliance with applicable data protection law?
- Is our input data used to train or fine-tune the model? If so, under what conditions and with what controls?
Testing and Validation
- What testing was performed before release, and can you provide the full test reports?
- How do you test for fairness and bias, and what were the results across demographic groups?
- What are the known limitations of this model, and in which contexts is it not appropriate to use?
- Have you tested for adversarial vulnerabilities, data poisoning risks, and model memorization of training data?
Operations and Incident Response
- How and when do you notify customers of model changes that could affect outputs or behavior?
- What are your defined response times for critical incidents, and what remedies are available if you miss them?
- Can you provide your incident history for this model, including how issues were identified and resolved?
- What fallback or continuity options are available if your service is disrupted?
Compliance and Audit
- How do you support customers' obligations under the EU AI Act, including documentation for deployers?
- What audit rights do you provide, and have you been audited by third parties?
- How do you handle IP indemnification for claims arising from your training data?
- Is this system registered in the EU database for high-risk AI systems, or are you supervised by any sector regulator, and can you provide compliance documentation?
Right-Sizing This for Your Situation
Scale your vendor governance to match the risk. A foundation model API used for internal document summarization needs far less governance overhead than a third-party scoring model that influences credit or employment decisions. Match the rigor to the stakes.
You're procuring third-party AI without an established vendor governance process. Start with three things: a documentation request list for every vendor (model card, testing summary, incident history), five non-negotiable contract terms (change notification, data handling, audit rights, IP indemnification, termination rights), and a basic post-deployment monitoring checklist. You don't need a formal program yet — you need the fundamentals in place before you sign.
You're moving from ad hoc evaluation to a repeatable process. Build an approved vendor list with explicit entry criteria and a defined review schedule. Formalize your AI-specific evaluation rubric so every procurement uses the same criteria. Design a monitoring program with defined thresholds, not just periodic manual reviews. The goal is consistency — so that vendor governance isn't reinvented from scratch each time.
You're operating at enterprise scale with established governance functions. Third-party AI governance needs to integrate with your existing procurement, vendor management, legal, and compliance frameworks rather than sit alongside them. Map your EU AI Act value chain obligations, including what your GPAI model providers must supply and what your deployer obligations are. Make AI-specific risk assessment part of your enterprise vendor risk management program. Conduct annual full reassessments, not just SLA reviews.
The AI Governance Advisor can help you work through vendor evaluation criteria, contract term analysis, and monitoring program design for your specific deployment context.
Framework References
- EU AI Act (2024) — Articles 25, 53, 89. Value chain accountability, GPAI documentation obligations, and downstream provider complaint rights.
- NIST AI RMF 1.0 — GOVERN 6.1, GOVERN 6.2, MAP 4.2, MANAGE 3.1, MANAGE 3.2. Third-party transparency, contingency planning, internal controls, and ongoing monitoring.
- NIST AI 600-1 GenAI Profile (2024) — GV-6.1-007, GV-6.1-009, GV-6.2-003, GV-6.2-006, GV-6.2-007. Third-party inventories and approved provider lists, due diligence for GenAI acquisition, incident response, fallback procedures, and contract review.
- AIGP Body of Knowledge v1.0.0 — Domains III and IV. IP risks in training data, third-party governance programs, and contractual protections for deployers.
- PMI CPMAI Guide (2025) — Phases I, III, V, VI. Dependency mapping, vendor evaluation as a project deliverable, third-party validation, and ongoing monitoring of procured AI.
This article is part of AIPMO’s PM Practice series. See also: AI Risk Registers | Model Cards and Datasheets | Monitoring AI Systems in Production