
Model Cards and Datasheets: Documentation That Matters

AI projects need documentation that doesn't exist in conventional IT: model cards, datasheets, and system cards. Under the EU AI Act, technical documentation must exist before deployment — and be retained for 10 years. Here's what each document must contain and why.

By AIPMO

 

PM Takeaways

       EU AI Act Article 11 requires that technical documentation for high-risk AI systems be drawn up before the system is placed on the market or put into service — not after. Retroactive documentation cannot capture why data and design decisions were made. The window to document those decisions is during development, not during a compliance review.

       EU AI Act Article 18 requires providers to retain technical documentation for 10 years after a high-risk AI system is placed on the market or put into service. This is a long-tail PM obligation: documentation maintenance must be resourced as an ongoing operational cost, not treated as a project deliverable that closes at go-live.

       NIST AI RMF MEASURE 2.9 requires that the AI model be explained, validated, and documented, and that system output be interpreted within context to inform responsible use. MEASURE 2.11 requires that fairness and bias evaluations be documented with results. Both require disaggregated performance reporting — aggregate accuracy metrics are not sufficient.

       A model card is not just a technical artifact for the data science team — EU AI Act Article 13 requires that instructions for use include characteristics, capabilities, and limitations of performance, including performance regarding specific persons or groups and circumstances that may affect accuracy or robustness. This is model card content, and it must be accessible to non-technical deployers.

       Documentation is only governance-valuable if it is kept current. EU AI Act Article 11(1) states that technical documentation ‘shall be kept up-to-date.’ A model card that reflects the system at initial training but not after subsequent updates or distribution shifts is a compliance gap, not a compliance asset.

Traditional software projects produce requirements documents, design specifications, and user manuals. AI projects need those too — plus documentation that does not exist in conventional IT: model cards, datasheets for datasets, and system cards.

These are not technical artifacts for the data science team to file and forget. They are governance documents — the evidentiary record that demonstrates how the system was built, what it was built on, where it performs as intended, and where it may not. They support regulatory compliance, enable meaningful human oversight, protect the organisation against claims of unexplained harm, and make it possible to audit decisions after the fact. As PM, your job is to ensure they exist, are current, and are owned. 

Why AI Documentation Is Different

Traditional software documentation answers: What does the system do? How do you use it? AI documentation must answer those questions too — and several others that have no equivalent in conventional software: How was it built? What data trained it? Where does it perform well? Where might it fail? For whom might performance be different? Who might be harmed?

This shift reflects a fundamental difference in how AI system behaviour is produced. Traditional software behaviour is determined by code you can inspect line by line. AI system behaviour emerges from data and training processes that, once complete, leave no human-readable record of the decisions that shaped them. If you do not document those decisions during development — why this training dataset, why this preprocessing choice, what the model was tested against and what it was not — they cannot be reconstructed after the fact. Documentation created during development may be the only way to understand, explain, and defend how the system works.

NIST AI RMF frames this explicitly: AI risk management requires documenting aspects of systems’ functionality and trustworthiness across the full lifecycle, with testing before deployment and regular documentation while in operation. EU AI Act Article 11 makes it a legal requirement for high-risk systems, with a 10-year retention obligation that ensures the record persists long after the original development team has moved on. 

The Core Documents

Model Cards

A model card is a standardised disclosure document for a machine learning model. First proposed by Mitchell et al. (2019) and widely adopted in the ML community, a model card provides a concise, structured summary of what the model does, how it was built, how it performs, and where it should and should not be used.

The model card concept maps closely to what EU AI Act Article 13 requires providers to include in instructions for use: characteristics, capabilities, and limitations of performance; accuracy metrics and known circumstances that may affect them; risks to health, safety, or fundamental rights under foreseeable misuse; performance disaggregated by specific persons or groups where relevant; and information enabling deployers to interpret the system’s output appropriately. The model card is the practical vehicle for delivering these requirements.

The core sections, and why each matters:

•       Model details: Name, version, model type, developers, release date, licence. Establishes provenance and accountability. Required by EU AI Act Annex IV technical documentation.

•       Intended use: Primary use cases, intended user types, and explicitly out-of-scope uses. NIST AI RMF MAP 2.2 requires that documentation of AI system knowledge limits and intended use be sufficient to inform subsequent decision-making. Out-of-scope uses are as important as intended uses — they define the boundary of the organisation’s accountability.

•       Training data: Data sources, dataset size, preprocessing steps, known limitations and gaps. EU AI Act Article 10 requires training data to be relevant, sufficiently representative, and assessed for possible biases. The model card is where those assessments are disclosed.

•       Evaluation data: Test datasets and evaluation methodology. Which population did the test data represent? What was excluded? These choices determine what the reported performance metrics actually measure.

•       Performance metrics: Accuracy, precision, recall, and other relevant metrics — disaggregated by relevant subgroups. NIST MEASURE 2.11 requires that fairness and bias evaluations be documented with results. Aggregate metrics without disaggregation can hide systematic disparities in performance across demographic groups.

•       Ethical considerations: Identified risks, sensitive use cases, fairness assessments, and known potential for harm. Not a disclaimer — a risk disclosure that informs deployers’ own risk management and oversight decisions.

•       Limitations: Known failure modes, contexts where performance degrades, population groups underrepresented in training or evaluation data. EU AI Act Article 13(3)(b)(iii) requires disclosure of known or foreseeable circumstances that may lead to risks to health, safety, or fundamental rights.

•       Recommendations: Practical guidance for deployers and users on appropriate use, oversight requirements, and situations requiring extra scrutiny or override.

A model card that covers these sections serves three simultaneous functions: it is a technical reference for the development and operations teams, a compliance document for regulators and auditors, and a transparency disclosure for downstream deployers who need to understand what they are integrating into their own systems.
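Because the model card has a fixed set of required sections, its completeness can be checked automatically rather than by eyeball. A minimal sketch in Python: the section names follow Mitchell et al. (2019), but the field layout, example values, and the check function are illustrative assumptions, not a standard schema.

```python
# Minimal model card skeleton with an automated completeness check.
# Section names follow Mitchell et al. (2019); everything else here
# (field layout, example values, function name) is illustrative.

REQUIRED_SECTIONS = [
    "model_details", "intended_use", "training_data", "evaluation_data",
    "performance_metrics", "ethical_considerations", "limitations",
    "recommendations",
]

def missing_sections(card: dict) -> list[str]:
    """Return required sections that are absent or empty."""
    return [s for s in REQUIRED_SECTIONS if not card.get(s)]

# Hypothetical, partially completed card for illustration.
card = {
    "model_details": {"name": "claims-triage", "version": "1.2.0"},
    "intended_use": {"primary": "triage support",
                     "out_of_scope": ["final decisions about individuals"]},
    "training_data": {"sources": ["internal claims 2019-2023"]},
    "performance_metrics": {},  # present but empty: flagged as incomplete
}

print(missing_sections(card))
```

A check like this makes "model card complete" a verifiable acceptance criterion rather than a judgment call at go-live.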

Datasheets for Datasets

A datasheet documents the dataset used to train or evaluate an AI system. The concept was formalised by Gebru et al. (2021), cited directly by NIST AI RMF, and reflects the recognition that data quality decisions made during collection and preprocessing directly shape model behaviour in ways that are invisible in the trained system unless documented at source.

EU AI Act Article 10 requires detailed data governance documentation for high-risk AI systems: the design choices made in selecting the dataset, the collection processes and origin of data, the preprocessing operations applied, the assumptions made about what the data represents, the assessment of availability and suitability, the examination for possible biases that may affect health, safety, or fundamental rights, and the identification of data gaps. A datasheet is the structured format for capturing and communicating all of this.

The datasheet sections, and why each matters:

•       Motivation: Why was this dataset created? Who created it, and who funded the creation? What purpose was it originally collected for? EU AI Act Article 10(2)(b) requires documentation of the original purpose of data collection — because repurposing data for AI training raises separate consent and appropriateness questions.

•       Composition: What is in the dataset? How many instances? What do instances represent? Are there sensitive attributes, demographic identifiers, or proxy variables present? Composition determines what the model can learn and what biases it may absorb.

•       Collection process: How was the data collected, over what timeframe, and by whom? Was informed consent obtained where required? Were collection methods standardised? Gaps in collection process documentation are gaps in the organisation’s ability to defend training data decisions.

•       Preprocessing: What cleaning, filtering, labelling, or augmentation was done? Each step introduces choices that affect what the model learns. NIST MAP 2.3 requires documentation of data preparation operations for scientific integrity and reproducibility.

•       Uses: What tasks is the dataset suitable for? What is it explicitly not suitable for? What populations or contexts was it not designed to represent? An explicit out-of-scope uses section prevents the dataset from being reused inappropriately in future projects.

•       Distribution: How is the dataset shared? Under what licence? Are there access restrictions? Who has access to sensitive or personal data within the dataset?

•       Maintenance: Who maintains the dataset? How can errors be reported? Will it be updated, and if so, on what cadence? Who is accountable if the dataset is found to contain errors or inappropriately sourced data after deployment?

Datasheets matter beyond the immediate project. A well-documented dataset can be safely reused in future systems; an undocumented dataset creates a compounding liability every time it is reused, because no one can assess whether its limitations are appropriate for the new use case.

System Cards

A system card documents the complete deployed AI system, not just the underlying model. Where a model card describes what the model can do in isolation, a system card describes how the model is actually used: the deployment context, the other components the model is integrated with, the human oversight mechanisms in place, how the system is monitored, and how incidents are handled.

System cards are particularly important for complex deployments where a foundation model or third-party model is one component among many, and where the deployment organisation has added additional layers — pre-processing, post-processing, routing, human review — that shape what the system actually does. NIST AI RMF MAP 3.1 requires documentation of the full socio-technical system context, not only the model. EU AI Act Article 11, read in conjunction with Annex IV, requires technical documentation that covers the system as deployed, not only the model as trained.

The system card sections, and why each matters:

•       System overview: What the system does end-to-end, how it is deployed, who uses it, and what decisions or outputs it produces. This is the entry point for regulators, auditors, and new team members who need to understand the system without specialist knowledge.

•       Components: All models involved, data pipelines, pre- and post-processing logic, integration points with other systems, and human oversight mechanisms. Agentic AI systems with multiple interacting models require particular care here: accountability must be traceable through the full chain of automated steps.

•       Intended use and deployment context: Approved use cases, deployment contexts, user types, and any geographic, demographic, or operational constraints on deployment. EU AI Act Annex IV requires documentation of the intended purpose as specified by the provider and the deployment context as implemented by the deployer.

•       Risk assessment summary: Identified risks, mitigations applied, and residual risks acknowledged. Should reference the full AI impact assessment or risk register, not replace it. Regulators reviewing the system card need to see that risk management was performed, not only that deployment was authorised.

•       Testing and evaluation: How the system as a whole was validated, including integration testing, adversarial testing, and red-team exercises. Model-level test results do not fully characterise system-level behaviour; system cards document the full TEVV scope.

•       Human oversight: How humans monitor the system, what override mechanisms exist, who is assigned oversight, and what the escalation path is. EU AI Act Article 27(1)(e) requires that the Fundamental Rights Impact Assessment include a description of the implementation of human oversight measures — the system card is where that description lives operationally.

•       Incident reporting: How to report problems, how incidents are triaged and investigated, and how outcomes feed back into system documentation. EU AI Act Article 72 requires providers to establish and document a post-market monitoring system. The system card is the operational interface to that requirement.

 

Regulatory Requirements

Documentation is not optional for many AI systems. For high-risk AI systems under the EU AI Act, documentation is mandatory, must exist before deployment, and must be maintained for a decade. For systems assessed and built using NIST AI RMF, documentation is the primary mechanism through which governance commitments are operationalised and made auditable.

EU AI Act: Articles 11, 13, 18, and Annex IV

EU AI Act Article 11(1) states that technical documentation shall be drawn up before the high-risk AI system is placed on the market or put into service, and shall be kept up-to-date. The timing requirement is unambiguous and has direct implications for project planning: documentation is a pre-deployment gate, not a post-deployment deliverable.

Annex IV specifies the minimum content for technical documentation. It includes: a general description of the AI system and its intended purpose; a description of design specifications and the elements of the system; information on the training methodology and datasets; the metrics used to measure accuracy, robustness, and cybersecurity; the risk management measures applied; the human oversight provisions; and the post-market monitoring plan. Each Annex IV element is a documentation deliverable that must be assigned, resourced, and verified before go-live.
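Treating each Annex IV element as a trackable deliverable can be made mechanical: maintain a status per element and gate go-live on all of them being verified. A sketch; the element list paraphrases the minimum content described above, and the status values and function name are illustrative assumptions, not a prescribed format.

```python
# Sketch: Annex IV elements as a pre-deployment documentation gate.
# The element list paraphrases the minimum content summarised above;
# the status tracking scheme itself is an illustrative assumption.

ANNEX_IV_ELEMENTS = [
    "general description and intended purpose",
    "design specifications and system elements",
    "training methodology and datasets",
    "accuracy, robustness and cybersecurity metrics",
    "risk management measures",
    "human oversight provisions",
    "post-market monitoring plan",
]

def deployment_gate(status: dict[str, str]) -> list[str]:
    """Return elements not yet verified; deployment should not proceed
    until this list is empty (or each gap has a documented rationale)."""
    return [e for e in ANNEX_IV_ELEMENTS if status.get(e) != "verified"]

status = {e: "verified" for e in ANNEX_IV_ELEMENTS}
status["post-market monitoring plan"] = "draft"
print(deployment_gate(status))  # → ['post-market monitoring plan']
```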

Article 13 requires that instructions for use include performance characteristics and limitations, disaggregated performance data for specific persons or groups where relevant, known risks to health, safety, or fundamental rights, and information enabling deployers to interpret system output. This is effectively a specification for model card content, framed as a disclosure obligation to downstream deployers.

Article 18 establishes the retention obligation: providers must keep technical documentation at the disposal of national competent authorities for 10 years after the high-risk AI system has been placed on the market or put into service. For organisations that operate high-risk AI systems and then change ownership, restructure, or cease activity, Article 18(2) requires Member States to define how documentation remains accessible. The 10-year obligation means that documentation governance must outlast the project, the team, and in many cases the product.

GPAI model providers face parallel obligations under Annex XI, which requires technical documentation covering the model architecture, training methodology, training data (including curation methods and bias detection measures), computational resources used, and known or estimated energy consumption. For organisations building on foundation models, this documentation must be available from the GPAI provider before integration — Article 53 requires GPAI providers to supply technical documentation to downstream providers so they can meet their own compliance obligations.

NIST AI RMF: MAP 2.2, MEASURE 2.9, MEASURE 2.11

NIST AI RMF MAP 2.2 requires that information about the AI system’s knowledge limits and how system output may be utilised and overseen by humans is documented, with sufficient information to assist relevant AI actors in making informed decisions and taking subsequent actions. This is the specification for model card content in NIST terms: documentation must be useful for the people making decisions with the system, not only for technical reviewers.

NIST AI RMF MEASURE 2.9 requires that the AI model be explained, validated, and documented, and that AI system output be interpreted within its context to inform responsible use and governance. NIST references model cards and datasheets directly as transparency tools that organisations should document and review. MEASURE 2.11 requires that fairness and bias evaluations be documented with results — disaggregated performance reporting is an explicit NIST requirement, not an optional best practice.
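Disaggregated reporting means computing the same metric per subgroup alongside the aggregate, so that a disparity cannot hide inside an overall average. A minimal sketch with illustrative data (the group labels and figures are invented for demonstration):

```python
# Sketch: per-group accuracy alongside the aggregate, showing how an
# acceptable-looking aggregate can mask a subgroup disparity.
# The records below are invented for illustration.
from collections import defaultdict

def accuracy_by_group(records: list[tuple[str, int, int]]) -> dict[str, float]:
    """records: (group, y_true, y_pred) triples. Returns accuracy per
    group plus the aggregate under the key 'overall'."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        for key in (group, "overall"):
            total[key] += 1
            correct[key] += int(y_true == y_pred)
    return {k: correct[k] / total[k] for k in total}

records = (
    [("group_a", 1, 1)] * 90 + [("group_a", 1, 0)] * 10 +  # 90% accurate
    [("group_b", 1, 1)] * 6  + [("group_b", 1, 0)] * 4     # 60% accurate
)
print(accuracy_by_group(records))
# overall ≈ 0.87 looks fine; group_b at 0.60 reveals the disparity
```

This is the arithmetic behind the MEASURE 2.11 requirement: the aggregate figure alone would pass a naive accuracy threshold while performance for the smaller group is materially worse.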

NIST GOVERN 1.4 establishes the organisational-level expectation: policies and processes regarding public disclosure of AI use and risk management material — including model documentation and validation and testing results — should be established and regularly reviewed. Documentation is not a project-level artefact; it is an organisational governance commitment with defined disclosure expectations.

A Note on Sector-Specific Requirements

Financial services organisations using AI for credit decisions, anti-money-laundering, or fraud detection may face additional model risk management requirements that overlap with model card and system card obligations — SR 11-7 guidance in the US, for example, establishes model documentation requirements that predate the AI governance frameworks but address similar transparency concerns. Healthcare organisations deploying AI-enabled clinical decision tools face documentation requirements from medical device regulations that layer on top of EU AI Act requirements for high-risk AI. PMs in regulated sectors should confirm applicable sector requirements with their compliance function and ensure AI documentation standards satisfy both. 

The PM’s Role

You will not write these documents yourself. But you are responsible for ensuring they exist, are complete, reflect the current state of the system, and are owned by named individuals who will maintain them. Documentation that exists but is outdated, inaccessible, or unsupported by anyone is not a governance asset — it is a liability that will be found wanting at exactly the wrong moment.

During Planning

•       Include all three document types in the project scope. Model cards, datasheets, and system cards are project deliverables with defined content requirements. They are not afterthoughts. EU AI Act Article 11 requires that technical documentation exist before deployment — which means development timelines must budget for it.

•       Assign ownership before development begins. Who is responsible for the model card? The datasheet? The system card? Who reviews and approves each? Who updates them when the system changes? These are named-individual questions, not role-category questions. Vacant ownership is the most common cause of outdated AI documentation.

•       Allocate time explicitly. Documentation takes effort. Building a complete model card for a complex system is not a one-afternoon task. Time for documentation must appear in the project schedule, not be absorbed from contingency or delivered at the cost of quality.

•       Establish the 10-year retention plan at the start. EU AI Act Article 18 requires 10-year retention. Who will maintain documentation access if the system is retired? If the organisation restructures? If the original team moves on? These questions are easier to answer before the project closes than after.

During Development

•       Document data decisions at the time they are made. The dataset selection rationale, the preprocessing choices, the bias assessment results — these can only be accurately documented while the decisions are fresh. A datasheet written after training, from memory, is reconstructed and unreliable. EU AI Act Article 10 documentation must reflect actual governance, not a post-hoc narrative.

•       Document model decisions at the time training is performed. Hyperparameter choices, training methodology, evaluation design — document as you go. NIST MAP 2.3 requires scientific integrity documentation, including experimental design and data selection rationale, which is structurally impossible to produce retroactively.

•       Use established templates and adapt them to context. Model Cards (Mitchell et al.), Datasheets for Datasets (Gebru et al.), and IBM FactSheets are well-established starting points. Standardised templates ensure completeness within your organisation. Adapt rather than invent.

•       Review iteratively. Documentation should evolve with the system. A model card written for an early prototype and not updated through fine-tuning, evaluation, and deployment iteration is not current documentation — it is a snapshot that misrepresents the delivered system.

At Deployment

•       Verify completeness against Annex IV for high-risk systems. Before go-live, confirm that every Annex IV element is addressed. If any element is absent, document why — for SMEs and start-ups the Act permits a simplified format, but the decision to use it must be explicit, not accidental.

•       Establish operations handoff. Operations and monitoring teams need access to current documentation to perform their functions. A model card locked in a development repo that operations cannot access is not operational documentation.

•       Define and document update triggers. What system changes require documentation updates? Model retraining, significant fine-tuning, changes to input data sources, changes to the deployment context, post-incident findings — each should trigger a documented review. EU AI Act Article 11(1) requires that documentation be kept up-to-date; the update triggers are the mechanism.
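The update triggers above can be recorded as an explicit mapping from change events to the documents they put under review, so that no retraining or context change slips past documentation. A sketch; the event names, document names, and conservative default are illustrative assumptions, not a standard taxonomy.

```python
# Sketch: map system change events to the documents they should put
# under review. Event and document names are illustrative assumptions.

UPDATE_TRIGGERS = {
    "model_retrained":            ["model card", "system card"],
    "fine_tuning_change":         ["model card"],
    "data_source_changed":        ["datasheet", "model card"],
    "deployment_context_changed": ["system card"],
    "incident_finding":           ["model card", "system card"],
}

def docs_to_review(events: list[str]) -> set[str]:
    """Union of documents needing review for the given change events;
    unknown events conservatively flag every document."""
    docs: set[str] = set()
    for event in events:
        docs |= set(UPDATE_TRIGGERS.get(
            event, ["model card", "datasheet", "system card"]))
    return docs

print(sorted(docs_to_review(["model_retrained", "data_source_changed"])))
# → ['datasheet', 'model card', 'system card']
```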

Post-Deployment

•       Schedule regular documentation reviews. EU AI Act Article 72 requires providers to actively and systematically collect and document data on performance of high-risk AI systems throughout their lifetime. Documentation review must be a scheduled operational activity, not an ad hoc response to incidents.

•       Update documentation when incidents occur. Post-incident findings often reveal limitations that were not documented or risks that were not anticipated. The system card and model card should be updated to reflect what was learned. Documentation that does not incorporate operational experience is incomplete.

•       Maintain the 10-year clock. For high-risk AI systems, track the retention deadline for each system’s technical documentation. If systems are retired, ensure documentation access is preserved per Article 18. If the organisation changes, ensure documentation obligations transfer with accountability. 
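Tracking the clock can be as simple as computing each system's deadline from its market-placement date. A sketch, assuming the ten-year period runs from the date the system is placed on the market or put into service; the helper name is illustrative, and the exact trigger date for a given system should be confirmed with the compliance function.

```python
# Sketch: compute the Article 18 retention deadline (10 years after the
# system is placed on the market or put into service). Helper name is
# illustrative; confirm the exact trigger date with legal/compliance.
from datetime import date

def retention_deadline(placed_on_market: date, years: int = 10) -> date:
    try:
        return placed_on_market.replace(year=placed_on_market.year + years)
    except ValueError:  # 29 Feb when the target year is not a leap year
        return placed_on_market.replace(year=placed_on_market.year + years,
                                        day=28)

print(retention_deadline(date(2026, 3, 15)))  # → 2036-03-15
print(retention_deadline(date(2024, 2, 29)))  # → 2034-02-28
```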

Common Pitfalls

These failure patterns recur across AI projects and carry concrete governance consequences.

•       Documentation as afterthought — written after training, after testing, after deployment. Reconstructed documentation cannot capture why decisions were made. It can describe what was done; it cannot reliably explain the reasoning. For EU AI Act Article 10 data governance compliance and NIST MAP 2.3 scientific integrity requirements, retroactive documentation is structurally insufficient. Correction: build documentation milestones into the project schedule at the point when decisions are made.

•       Technical documentation only — written for data scientists, inaccessible to others. EU AI Act Article 13 requires instructions for use that are comprehensible to deployers, not only to providers’ technical teams. A model card written in ML notation that a compliance officer or a human overseer cannot interpret does not satisfy the transparency requirement. Correction: model cards must be readable by the intended audience, which includes non-technical stakeholders.

•       Static documents — accurate at training, not updated through deployment. EU AI Act Article 11(1) requires documentation to be kept up-to-date. A model card that reflects v1.0 of a system that has since been retrained twice is not current documentation. Correction: define update triggers, assign update ownership, and treat documentation currency as an ongoing operational requirement.

•       Missing the ‘why’ — documents describe what was done but not why. Audit and compliance investigations are primarily interested in decision rationale, not decision outcomes. A datasheet that says ‘dataset was filtered for quality’ without explaining what criteria were applied, by whom, and why those criteria were selected, provides no governance value. Correction: for every significant data and model decision, document the rationale explicitly.

•       Incomplete performance reporting — aggregate metrics only, no disaggregation. Aggregate accuracy metrics hide systematic disparities in performance across demographic groups. NIST MEASURE 2.11 requires that fairness and bias evaluations be documented with results, which requires disaggregated reporting. EU AI Act Article 13 requires performance information regarding specific persons or groups. Correction: require disaggregated performance reporting as a model card acceptance criterion, not an optional enhancement.

•       No retention plan — documentation inaccessible or deleted after project closes. EU AI Act Article 18 requires 10-year retention for high-risk AI system technical documentation. If documentation is stored only in a project repository that is decommissioned at project closure, the retention obligation is unmet. Correction: designate a long-term documentation repository and confirm access will persist regardless of project, team, or system lifecycle changes.

 

Right-Sizing for Your Situation

Documentation depth should match system risk and deployment stakes. A proof-of-concept internal tool does not require the same documentation as a production system making consequential decisions about individuals. But documentation should begin early regardless of scale — the decisions made in early stages are often the hardest to reconstruct.

Greenfield — AI Documentation Playbook

For PMs without formal AI documentation standards. Simplified model card and datasheet templates that cover the essentials — including the Annex IV minimum elements for high-risk systems and the NIST MAP 2.2 and MEASURE 2.11 requirements. Includes guidance on how to document data decisions as they are made rather than after the fact.

Emerging — AI Documentation Playbook

For PMs building repeatable documentation processes. Full templates for model cards, datasheets, and system cards with section-by-section guidance, review workflows, and update trigger definitions. Includes the EU AI Act Article 13 instructions-for-use specification mapped to model card structure.

Established — AI Documentation Playbook

For PMs in organisations with formal governance. How to integrate AI documentation into existing documentation management and compliance systems — including EU AI Act Article 18 10-year retention planning, GPAI Annex XI downstream documentation requirements for organisations building on foundation models, and portfolio-level documentation governance for organisations running multiple high-risk AI systems.


 

Framework References

•       EU AI Act (Official Journal, 12 July 2024) — Article 11(1) (technical documentation must be drawn up before a high-risk AI system is placed on the market or put into service, and must be kept up-to-date; must contain at minimum the elements in Annex IV); Article 13 (instructions for use must include characteristics, capabilities, and limitations of performance; performance regarding specific persons or groups where applicable; known or foreseeable risks; information enabling deployers to interpret system outputs); Article 18 (providers must retain technical documentation for 10 years after the high-risk AI system is placed on the market or put into service); Annex IV (minimum technical documentation elements: general description, design specifications, training methodology and datasets, accuracy and robustness metrics, risk management measures, human oversight provisions, post-market monitoring plan); Article 10(2) (data governance documentation requirements: design choices, data origin and original collection purpose, preprocessing operations, bias assessment and mitigation measures, data gap identification); Article 53 and Annex XI (GPAI model providers must provide technical documentation to downstream providers, including training data description, curation methodologies, and known or estimated energy consumption)

•       NIST AI RMF 1.0 (NIST AI 100-1, 2023) — MAP 2.2 (documentation of AI system knowledge limits and intended use must be sufficient for relevant AI actors to make informed decisions; explicitly covers how system output may be utilised and overseen by humans); MAP 2.3 (scientific integrity and TEVV documentation, including experimental design, data collection and selection rationale, and construct validation — structured to support reproducibility and accountability); MEASURE 2.9 (the AI model must be explained, validated, and documented; AI system output must be interpreted within context to inform responsible use and governance); MEASURE 2.11 (fairness and bias evaluations must be documented with results; disaggregated reporting required); GOVERN 1.4 (policies and processes for public disclosure of AI use and risk management material, including model documentation and validation results, should be established and regularly reviewed; references model cards and datasheets as transparency tools)

•       Mitchell et al. (2019) — Model Cards for Model Reporting. Proceedings of the 2019 Conference on Fairness, Accountability, and Transparency (FAT* ’19). Original formulation of the model card concept: structured disclosure for ML model intended use, performance disaggregated by group, and ethical considerations. Referenced directly by NIST AI RMF.

•       Gebru et al. (2021) — Datasheets for Datasets. Communications of the ACM 64(12). Formalised structured dataset documentation covering motivation, composition, collection process, preprocessing, uses, distribution, and maintenance. Referenced directly by NIST AI RMF as a transparency tool.

•       NIST AI 600-1: Generative AI Profile (2024) — Documentation requirements for generative AI systems, including system card structure for complex deployments with multiple model components; transparency documentation for AI-generated content and synthetic data; provenance and watermarking documentation.

 

This article is part of AIPMO’s PM Practice series. See also: The AI Project Charter | AI Risk Registers | AI Impact Assessments