Model Cards and Datasheets: Documentation That Matters

PM Takeaways

Model cards, dataset documentation, and system cards are not technical files for the data science team to manage. They are governance documents — the evidence that shows how the system was built, what it was tested against, and where it may not perform as expected. They support regulatory compliance, enable meaningful oversight, and protect the organization when decisions are challenged.
EU AI Act Article 11 requires technical documentation for high-risk AI to be completed before the system is placed on the market — and kept up to date throughout its life. A 10-year retention obligation means documentation is an ongoing cost, not a project deliverable that closes at go-live.
Documentation decisions must be made during development, not afterward. Once training is complete, there is no human-readable record of why certain data or design choices were made. If it wasn’t documented at the time, it often can’t be reconstructed.
NIST AI RMF MEASURE 2.11 calls for fairness and bias to be evaluated and the results documented. Aggregate accuracy metrics are not sufficient for this, because they can mask disparities between groups. If your model card reports only overall performance, it has a compliance gap.

Traditional software projects produce requirements docs, design specs, and user manuals. AI projects need all of those — and three more that don’t have equivalents in conventional IT: model cards, dataset documentation, and system cards.

These aren’t files for the data science team to archive. They are the governance record — evidence of how the system was built, what data it used, where it works, and where it may not. They’re what you’ll reach for during a regulatory review, a legal challenge, or a post-incident investigation. Your job as PM: make sure they exist, stay current, and have an owner.

Why AI Documentation Is Different

Traditional software documentation answers two questions: what does it do, and how do you use it? AI documentation needs to answer those — and several more: How was the system built? What data was used to train it? Where does it perform well, and where might it fail? Are there groups for whom performance is different? Who could be harmed if it gets things wrong?

This shift reflects a fundamental difference in how AI system behavior is produced. Traditional software behavior is determined by code you can inspect line by line. AI system behavior emerges from data and training processes that, once complete, leave no human-readable record of the decisions that shaped them. If you do not document those decisions during development — why this training dataset, why this preprocessing choice, what the model was tested against and what it was not — they cannot be reconstructed after the fact.

NIST AI RMF requires documentation of system functionality and trustworthiness throughout the full lifecycle. EU AI Act Article 11 makes this a legal requirement for high-risk systems — with a 10-year retention obligation. The record needs to outlast the development team that built it.

The Core Documents

Model Cards

A model card is a standardized disclosure document for a machine learning model. First proposed by Mitchell et al. (2019) and widely adopted in the ML community, a model card provides a concise, structured summary of what the model does, how it was built, how it performs, and where it should and should not be used. The model card concept maps closely to what EU AI Act Article 13 requires providers to include in instructions for use.

Section	Contents and Why It Matters
Model details	Name, version, model type, developers, release date, license. Establishes provenance and accountability. Required by EU AI Act Annex IV technical documentation.
Intended use	Primary use cases, intended user types, and explicitly out-of-scope uses. NIST AI RMF MAP 2.2 requires documentation of AI system knowledge limits and intended use sufficient to inform subsequent decision-making. Out-of-scope uses are as important as intended uses — they define the boundary of the organization’s accountability.
Training data	Data sources, dataset size, preprocessing steps, known limitations and gaps. EU AI Act Article 10 requires training data to be relevant, sufficiently representative, and examined for possible biases. The model card is where those assessments are disclosed.
Evaluation data	Test datasets and evaluation methodology. Which population did the test data represent? What was excluded? These choices determine what the reported performance metrics actually measure.
Performance metrics	Accuracy, precision, recall, and other relevant metrics — disaggregated by relevant subgroups. NIST MEASURE 2.11 requires that fairness and bias evaluations be documented with results. Aggregate metrics without disaggregation can hide systematic disparities in performance across demographic groups.
Ethical considerations	Identified risks, sensitive use cases, fairness assessments, and known potential for harm. Not a disclaimer — a risk disclosure that informs deployers’ own risk management and oversight decisions.
Limitations	Known failure modes, contexts where performance degrades, population groups underrepresented in training or evaluation data. EU AI Act Article 13(3)(b)(iii) requires disclosure of known or foreseeable circumstances that may lead to risks to health, safety, or fundamental rights.
Recommendations	Practical guidance for deployers and users on appropriate use, oversight requirements, and situations requiring extra scrutiny or override.

A model card that covers these sections serves three simultaneous functions: it is a technical reference for the development and operations teams, a compliance document for regulators and auditors, and a transparency disclosure for downstream deployers who need to understand what they are integrating into their own systems.
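
One way to make the eight sections checkable is to hold the model card as structured data and test it for gaps before review. The sketch below is illustrative only: the field names mirror the section table above, while `ModelCard`, `missing_sections`, and the example values are hypothetical and not drawn from any standard schema.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelCard:
    """Hypothetical container with one field per section in the table above."""
    model_details: str = ""
    intended_use: str = ""
    training_data: str = ""
    evaluation_data: str = ""
    performance_metrics: str = ""
    ethical_considerations: str = ""
    limitations: str = ""
    recommendations: str = ""


def missing_sections(card: ModelCard) -> list[str]:
    """Return the names of sections that are empty or whitespace-only."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


# Illustrative draft with only two sections filled in.
draft = ModelCard(
    model_details="credit-scoring-model v1.2, gradient-boosted trees",
    intended_use="Pre-screening of loan applications; not for final decisions",
)
gaps = missing_sections(draft)
print(f"{len(gaps)} sections still empty: {gaps}")
```

A check like this only confirms that a section has content. Whether the content is accurate and readable by its audience still needs a human reviewer.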

Datasheets for Datasets

A datasheet documents the dataset used to train or evaluate an AI system. The concept was formalized by Gebru et al. (2021) and is cited directly by NIST AI RMF. It reflects the recognition that data quality decisions made during collection and preprocessing shape model behavior in ways that are invisible in the trained system unless they are documented at source.

EU AI Act Article 10 requires detailed data governance documentation for high-risk AI systems: the design choices made in selecting the dataset, the collection processes and origin of data, the preprocessing operations applied, the assumptions made about what the data represents, the assessment of availability and suitability, the examination for possible biases that may affect health, safety, or fundamental rights, and the identification of data gaps. A datasheet is the structured format for capturing all of this.

Section	Contents and Why It Matters
Motivation	Why was this dataset created? Who created it, and who funded the creation? EU AI Act Article 10(2)(b) requires documentation of the original purpose of data collection — because repurposing data for AI training raises separate consent and appropriateness questions.
Composition	What is in the dataset? How many instances? What do instances represent? Are there sensitive attributes, demographic identifiers, or proxy variables present? Composition determines what the model can learn and what biases it may absorb.
Collection process	How was the data collected, over what timeframe, and by whom? Was informed consent obtained where required? Gaps in collection process documentation are gaps in the organization’s ability to defend training data decisions.
Preprocessing	What cleaning, filtering, labeling, or augmentation was done? Each step introduces choices that affect what the model learns. NIST MAP 2.3 calls for scientific integrity considerations, including those related to data collection and selection, to be identified and documented, which supports reproducibility.
Uses	What tasks is the dataset suitable for? What is it explicitly not suitable for? What populations or contexts was it not designed to represent? The out-of-scope uses section prevents the dataset from being reused inappropriately in future projects.
Distribution	How is the dataset shared? Under what license? Are there access restrictions? Who has access to sensitive or personal data within the dataset?
Maintenance	Who maintains the dataset? How can errors be reported? Will it be updated, and if so, on what cadence? Who is accountable if the dataset is found to contain errors or inappropriately sourced data after deployment?

Datasheets matter beyond the immediate project. A well-documented dataset can be safely reused in future systems; an undocumented dataset creates a compounding liability every time it is reused, because no one can assess whether its limitations are appropriate for the new use case.

System Cards

A system card documents the complete deployed AI system, not just the underlying model. Where a model card describes what the model can do in isolation, a system card describes how the model is actually used: the deployment context, the other components the model is integrated with, the human oversight mechanisms in place, how the system is monitored, and how incidents are handled.

System cards are particularly important for complex deployments where a foundation model or third-party model is one component among many. NIST AI RMF MAP 1.1 calls for the intended purposes and prospective deployment settings of the AI system to be understood and documented, which extends beyond the model itself. EU AI Act Article 11, read in conjunction with Annex IV, requires technical documentation that covers the system as deployed, not only the model as trained.

Section	Contents and Why It Matters
System overview	What the system does end-to-end, how it is deployed, who uses it, and what decisions or outputs it produces. This is the entry point for regulators, auditors, and new team members who need to understand the system without specialist knowledge.
Components	All models involved, data pipelines, pre- and post-processing logic, integration points with other systems, and human oversight mechanisms. Agentic AI systems with multiple interacting models require particular care: accountability must be traceable through the full chain of automated steps.
Intended use and deployment context	Approved use cases, deployment contexts, user types, and any geographic, demographic, or operational constraints. EU AI Act Annex IV requires documentation of the intended purpose as specified by the provider and the deployment context as implemented by the deployer.
Risk assessment summary	Identified risks, mitigations applied, and residual risks acknowledged. Should reference the full AI impact assessment or risk register, not replace it. Regulators reviewing the system card need to see that risk management was performed, not only that deployment was authorized.
Testing and evaluation	How the system as a whole was validated, including integration testing, adversarial testing, and red-team exercises. Model-level test results do not fully characterize system-level behavior; system cards document the full scope of test, evaluation, verification, and validation (TEVV).
Human oversight	How humans monitor the system, what override mechanisms exist, who is assigned oversight, and what the escalation path is. EU AI Act Article 27(1)(e) requires that the Fundamental Rights Impact Assessment include a description of the implementation of human oversight measures.
Incident reporting	How to report problems, how incidents are triaged and investigated, and how outcomes feed back into system documentation. EU AI Act Article 72 requires providers to establish and document a post-market monitoring system.

Regulatory Requirements

EU AI Act: Articles 11, 13, 18, and Annex IV

EU AI Act Article 11(1) states that technical documentation shall be drawn up before the high-risk AI system is placed on the market or put into service, and shall be kept up-to-date. The timing requirement is unambiguous: documentation is a pre-deployment gate, not a post-deployment deliverable.

Annex IV specifies the minimum content for technical documentation: a general description of the AI system and its intended purpose; design specifications and system elements; information on training methodology and datasets; metrics used to measure accuracy, robustness, and cybersecurity; risk management measures applied; human oversight provisions; and the post-market monitoring plan. Each Annex IV element is a documentation deliverable that must be assigned, resourced, and verified before go-live.
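
The "assigned, resourced, and verified" test can be expressed as a simple go-live gate. This is a minimal sketch under stated assumptions: the element labels are shorthand for the summary above rather than official Annex IV numbering, and `go_live_gate`, the evidence fields, and the example entries are hypothetical.

```python
# Shorthand labels for the Annex IV elements summarized above.
# These keys are illustrative, not the official Annex IV numbering.
ANNEX_IV_ELEMENTS = [
    "general_description_and_intended_purpose",
    "design_specifications_and_system_elements",
    "training_methodology_and_datasets",
    "performance_metrics",
    "risk_management_measures",
    "human_oversight_provisions",
    "post_market_monitoring_plan",
]


def go_live_gate(evidence: dict[str, dict]) -> list[str]:
    """Return the elements that block go-live.

    An element passes only if it has a document reference, a named owner,
    and a recorded verification.
    """
    blockers = []
    for element in ANNEX_IV_ELEMENTS:
        entry = evidence.get(element, {})
        if not (entry.get("document") and entry.get("owner") and entry.get("verified")):
            blockers.append(element)
    return blockers


# Hypothetical evidence index for a system approaching go-live.
evidence = {
    "general_description_and_intended_purpose": {
        "document": "system-card.pdf", "owner": "A. Rivera", "verified": True},
    "training_methodology_and_datasets": {
        "document": "datasheet.pdf", "owner": "J. Chen", "verified": False},
}
blockers = go_live_gate(evidence)
print("Go-live blocked by:", blockers if blockers else "nothing")
```

In this example the dataset documentation exists and has an owner but has not been verified, so it still blocks the gate alongside the five elements with no evidence at all.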

Article 18 establishes the retention obligation: providers must keep technical documentation at the disposal of national competent authorities for 10 years after the high-risk AI system has been placed on the market or put into service. The 10-year obligation means that documentation governance must outlast the project, the team, and in many cases the product.
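
For planning purposes, the retention window can be tracked per system with simple date arithmetic. The sketch below assumes the clock starts on a single recorded date (market placement or putting into service) and treats the period as exactly ten calendar years. `retention_end`, `retention_active`, and the dates are hypothetical, and the calculation is a planning aid, not a legal determination of the deadline.

```python
from datetime import date

RETENTION_YEARS = 10  # Article 18 retention period described above


def retention_end(placed_on_market: date, years: int = RETENTION_YEARS) -> date:
    """Return the date `years` after market placement.

    A 29 February start date falls back to 28 February when the target
    year is not a leap year.
    """
    target_year = placed_on_market.year + years
    try:
        return placed_on_market.replace(year=target_year)
    except ValueError:
        return placed_on_market.replace(year=target_year, day=28)


def retention_active(placed_on_market: date, today: date) -> bool:
    """True while documentation must still be kept available."""
    return today <= retention_end(placed_on_market)


go_live = date(2026, 9, 1)  # hypothetical go-live date
print("Keep documentation available until:", retention_end(go_live))
print("Still within retention on 2031-01-15:", retention_active(go_live, date(2031, 1, 15)))
```

Storing the start date alongside each system's documentation makes the deadline answerable even after the system is retired and the original team has moved on.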

GPAI model providers face parallel obligations under Annex XI, which requires technical documentation covering model architecture, training methodology, training data (including curation methods and bias detection measures), computational resources used, and known or estimated energy consumption. For organizations building on foundation models, this documentation must be available from the GPAI provider before integration — Article 53 requires GPAI providers to supply technical documentation to downstream providers so they can meet their own compliance obligations.

NIST AI RMF: MAP 2.2, MEASURE 2.9, MEASURE 2.11

NIST AI RMF MAP 2.2 requires that information about the AI system’s knowledge limits and how system output may be used and overseen by humans is documented, with sufficient information to assist relevant AI actors in making informed decisions and taking subsequent actions. This is the specification for model card content in NIST terms: documentation must be useful for the people making decisions with the system, not only for technical reviewers.

NIST AI RMF MEASURE 2.9 requires that the AI model be explained, validated, and documented, and that AI system output be interpreted within its context to inform responsible use and governance. MEASURE 2.11 calls for fairness and bias to be evaluated and the results documented. An aggregate figure cannot show how performance differs between groups, so disaggregated performance reporting is the practical minimum for meeting it, not an optional best practice.
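
A small worked example shows why an aggregate metric is not enough. The sketch below uses synthetic data and plain accuracy. The function name, the group labels, and the counts are all illustrative assumptions, and a real evaluation would use the metrics and subgroups appropriate to the system.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute overall and per-group accuracy.

    `records` is a non-empty iterable of (group, y_true, y_pred) tuples.
    Returns (overall_accuracy, {group: accuracy}).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Synthetic illustration: 90 cases in group A, 10 in group B.
records = (
    [("A", 1, 1)] * 88 + [("A", 1, 0)] * 2    # group A: 88 of 90 correct
    + [("B", 1, 1)] * 5 + [("B", 1, 0)] * 5   # group B: 5 of 10 correct
)
overall, per_group = accuracy_by_group(records)
print(f"overall: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"group {group}: {acc:.2f}")
```

With these synthetic counts the overall accuracy is 0.93, while accuracy for the smaller group is 0.50. A model card reporting only the first number would hide the second.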

NIST GOVERN 1.4 establishes the organizational-level expectation: policies and processes regarding public disclosure of AI use and risk management material — including model documentation and validation and testing results — should be established and regularly reviewed. Documentation is not a project-level artifact; it is an organizational governance commitment with defined disclosure expectations.

A Note on Sector-Specific Requirements

Financial services organizations using AI for credit decisions, anti-money-laundering, or fraud detection may face additional model risk management requirements that overlap with model card and system card obligations. Healthcare organizations deploying AI-enabled clinical decision tools face documentation requirements from medical device regulations that layer on top of EU AI Act requirements. PMs in regulated sectors should confirm applicable sector requirements with their compliance function and ensure AI documentation standards satisfy both.

The PM’s Role

You will not write these documents yourself. But you are responsible for ensuring they exist, are complete, reflect the current state of the system, and are owned by named individuals who will maintain them. Documentation that exists but is outdated, inaccessible, or unowned is not a governance asset; it is a liability.

During Planning

Include all three document types in the project scope. Model cards, datasheets, and system cards are project deliverables with defined content requirements. EU AI Act Article 11 requires that technical documentation exist before deployment — which means development timelines must budget for it.
Assign ownership before development begins. Who is responsible for the model card? The datasheet? The system card? Who reviews and approves each? Vacant ownership is the most common cause of outdated AI documentation.
Allocate time explicitly. Building a complete model card for a complex system is not a one-afternoon task. Time for documentation must appear in the project schedule, not be absorbed from contingency.
Establish the 10-year retention plan at the start. Who will maintain documentation access if the system is retired? If the organization restructures? If the original team moves on? These questions are easier to answer before the project closes than after.

During Development

Document data decisions at the time they are made. The dataset selection rationale, the preprocessing choices, and the bias assessment results can only be documented accurately while the decisions are fresh. A datasheet written from memory after training is a reconstruction, and an unreliable one.
Document model decisions at the time training is performed. Hyperparameter choices, training methodology, evaluation design — document as you go. NIST MAP 2.3 requires scientific integrity documentation that is structurally impossible to produce retroactively.
Use established templates and adapt them to context. Model Cards (Mitchell et al.), Datasheets for Datasets (Gebru et al.), and IBM FactSheets are well-established starting points. Standardized templates promote consistency and completeness across your organization.
Review iteratively. A model card written for an early prototype and not updated through fine-tuning, evaluation, and deployment iteration is not current documentation — it is a snapshot that misrepresents the delivered system.
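
Documenting decisions as they are made is easier when each decision has a fixed shape. The sketch below is one possible format, not an established standard: `DecisionRecord`, `append_decision`, and the example entry (including the figures in its rationale) are hypothetical, and an in-memory list stands in for whatever append-only store a team actually uses.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class DecisionRecord:
    """Hypothetical record of one data or model decision."""
    decision: str       # what was decided
    rationale: str      # why it was decided
    alternatives: list  # options considered and rejected
    decided_by: str     # named owner
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_decision(log_lines: list, record: DecisionRecord) -> None:
    """Append the record as one JSON line; refuse entries without a rationale."""
    if not record.rationale.strip():
        raise ValueError("A decision without a rationale cannot be logged")
    log_lines.append(json.dumps(asdict(record)))


log = []  # stand-in for an append-only file or table
append_decision(log, DecisionRecord(
    decision="Drop records with a missing income field",
    rationale="Field is required by the scoring features; 3% of rows affected",
    alternatives=["Impute median income"],
    decided_by="J. Chen",
))
print(log[0])
```

Rejecting entries with an empty rationale is the design choice that matters here: it forces the reasoning to be captured at the moment of the decision, when it is still known.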

At Deployment and Post-Deployment

Verify completeness against Annex IV for high-risk systems. Before go-live, confirm that every Annex IV element is addressed. If any element is absent, document why.
Establish operations handoff. Operations and monitoring teams need access to current documentation. A model card locked in a development repo that operations cannot access is not operational documentation.
Define and document update triggers. Model retraining, significant fine-tuning, changes to input data sources, changes to the deployment context, post-incident findings — each should trigger a documented review. EU AI Act Article 11(1) requires that documentation be kept up-to-date.
Update documentation when incidents occur. Post-incident findings often reveal limitations that were not documented or risks that were not anticipated. Documentation that does not incorporate operational experience is incomplete.
Maintain the 10-year clock. For high-risk AI systems, track the retention deadline for each system’s technical documentation. If systems are retired, ensure documentation access is preserved per Article 18.
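
Update triggers are easier to apply consistently when they are written down as a lookup instead of being left to judgment. The sketch below takes the trigger list from the steps above. The mapping from each trigger to the documents it affects is an assumption made for illustration, and `documents_to_review` is a hypothetical helper.

```python
# Trigger names follow the list above. Which documents each trigger
# affects is an illustrative assumption, to be set per organization.
UPDATE_TRIGGERS = {
    "model_retraining": ["model_card", "system_card"],
    "significant_fine_tuning": ["model_card", "system_card"],
    "input_data_source_change": ["datasheet", "model_card"],
    "deployment_context_change": ["system_card"],
    "post_incident_finding": ["model_card", "datasheet", "system_card"],
}


def documents_to_review(events: list[str]) -> list[str]:
    """Return the sorted, de-duplicated documents that the events put up for review.

    Unknown events raise an error so that new kinds of change are triaged
    explicitly and not silently ignored.
    """
    to_review = set()
    for event in events:
        if event not in UPDATE_TRIGGERS:
            raise KeyError(f"No documentation rule defined for event: {event}")
        to_review.update(UPDATE_TRIGGERS[event])
    return sorted(to_review)


needed = documents_to_review(["input_data_source_change", "deployment_context_change"])
print("Documents requiring review:", needed)
```

Hooking a lookup like this into change control means the question "does this change need a documentation review?" is answered the same way every time.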

Common Pitfalls

Pitfall	Consequence and Correction
Documentation as afterthought — written after training, testing, or deployment	Reconstructed documentation cannot capture why decisions were made. For EU AI Act Article 10 data governance compliance and NIST MAP 2.3 scientific integrity requirements, retroactive documentation is structurally insufficient. Correction: build documentation milestones into the project schedule at the point when decisions are made.
Technical documentation only — written for data scientists, inaccessible to others	EU AI Act Article 13 requires instructions for use that are comprehensible to deployers, not only to providers’ technical teams. A model card written in ML notation that a compliance officer or oversight person cannot interpret does not satisfy the transparency requirement. Correction: model cards must be readable by the intended audience, which includes non-technical stakeholders.
Static documents — accurate at training, not updated through deployment	EU AI Act Article 11(1) requires documentation to be kept up-to-date. A model card that reflects v1.0 of a system that has since been retrained twice is not current documentation. Correction: define update triggers, assign update ownership, and treat documentation currency as an ongoing operational requirement.
Missing the ‘why’ — documents describe what was done but not why	Audit and compliance investigations are primarily interested in decision rationale, not decision outcomes. A datasheet that says “dataset was filtered for quality” without explaining what criteria were applied, by whom, and why, provides no governance value. Correction: for every significant data and model decision, document the rationale explicitly.
Incomplete performance reporting — aggregate metrics only, no disaggregation	Aggregate accuracy metrics hide systematic disparities in performance across demographic groups. NIST MEASURE 2.11 requires that fairness and bias evaluations be documented with results. EU AI Act Article 13 requires performance information regarding specific persons or groups. Correction: require disaggregated performance reporting as a model card acceptance criterion, not an optional enhancement.
No retention plan — documentation inaccessible or deleted after project closes	EU AI Act Article 18 requires 10-year retention for high-risk AI system technical documentation. If documentation is stored only in a project repository that is decommissioned at project closure, the retention obligation is unmet. Correction: designate a long-term documentation repository and confirm access will persist regardless of project, team, or system lifecycle changes.

Right-Sizing for Your Situation

Documentation depth should match system risk and deployment stakes. A proof-of-concept internal tool does not require the same documentation as a production system making consequential decisions about individuals. But documentation should begin early regardless of scale — the decisions made in early stages are often the hardest to reconstruct.

Greenfield — Starting Out

The two decisions that matter most at this level: document as you go (not after), and assign an owner for each of the three document types before development starts. For lower-risk systems, a simplified model card covering intended use, training data summary, performance metrics with disaggregation, and known limitations is the minimum defensible baseline. If you’re in scope for EU AI Act high-risk requirements, work through the Annex IV checklist explicitly before go-live and document any elements you’re handling in a simplified format.

Emerging — Building Repeatability

Standardize templates across projects so documentation completeness is structural rather than dependent on individual initiative. The highest-value standardization is the update trigger list — define what changes to a model or deployment require documentation updates so teams don’t make that judgment call ad hoc. Connect documentation review to your existing change control process: if a change goes through change control, documentation review should be a mandatory step, not an optional one.

Established — Mature Programs

At this level the documentation is live evidence in regulatory compliance programs. Annex IV completeness should be a formal go-live gate, not a checklist item. The 10-year retention obligation needs to be managed as an organizational asset, not a project deliverable — which means documentation governance must be integrated into your records management system with clear accountability for access preservation when systems are retired, teams change, or the organization restructures. For organizations building on GPAI foundation models, Article 53 downstream documentation requirements need to be part of vendor due diligence, not an afterthought.

The AI Governance Advisor can help you structure AI documentation requirements for your specific system type, deployment context, and regulatory obligations — and generate a Model Card pre-populated for your model architecture and use case.

Free Template — AI Model Card

AIPMO’s AI Model Card template is a structured, fillable PDF covering all eight sections described in this article — model details, intended use, training data, evaluation data, performance metrics (with disaggregation guidance), ethical considerations, limitations, and recommendations. It is mapped to EU AI Act Article 13 instructions-for-use requirements and NIST MEASURE 2.9 and 2.11. Download free and adapt to your system, or use the AI Governance Advisor to generate a version pre-populated for your model type, use case, and regulatory context.


Framework References

EU AI Act (Regulation (EU) 2024/1689) — Article 11 (technical documentation required before market placement; must be kept up to date), Article 13 (transparency and instructions for use including performance limitations disaggregated by group), Article 18 (10-year documentation retention obligations), Annex IV (minimum technical documentation content), Annex XI (GPAI model technical documentation requirements).

NIST AI Risk Management Framework 1.0 (NIST AI 100-1, 2023) — MAP 2.2 (documentation of AI system knowledge limits and intended use sufficient to inform subsequent decision-making), MAP 2.3 (scientific integrity documentation including experimental design and data selection rationale), MEASURE 2.9 (model explanation, validation, and documentation), MEASURE 2.11 (fairness and bias evaluation documentation with disaggregated results), GOVERN 1.4 (organizational policies for public disclosure of AI documentation and testing results).

NIST AI 600-1 GenAI Profile (2024) — Documentation requirements for GenAI systems including provenance, limitations, and safety testing results. Extends core AI RMF documentation requirements for foundation model deployments.

PMI CPMAI Guide (2025) — Documentation as a phase gate requirement across the lifecycle; model cards and system cards as formal project deliverables with defined completion criteria.

This article is part of AIPMO’s PM Practice series. See also: The AI Project Charter | AI Risk Registers | AI Impact Assessments | The PM’s Guide to NIST AI RMF

To err is AI; to govern, human.

AIPMO.co · AI Governance, PM-first
