AI Governance & Compliance
As AI systems increasingly make or influence decisions that affect people's lives, governance and compliance have become critical responsibilities for every AI practitioner. This lesson covers the major frameworks, regulations, and practices you need to know.
Why Governance Matters
AI governance is not bureaucracy for its own sake. It protects the people affected by automated decisions, the organizations that build and deploy AI systems, and public trust in those systems.
NIST AI Risk Management Framework (AI RMF)
The NIST AI RMF (published January 2023) is the U.S. government's primary framework for managing AI risks. It's voluntary but widely adopted, especially in federal agencies.
Core Functions
The framework has four core functions:
| Function | Purpose | Key Activities |
|---|---|---|
| GOVERN | Establish culture and structure for AI risk management | Policies, roles, training, accountability |
| MAP | Understand the context and risks of an AI system | Identify stakeholders, assess impact, define risk tolerance |
| MEASURE | Assess and quantify identified risks | Bias testing, performance evaluation, red teaming |
| MANAGE | Treat, monitor, and communicate risks | Mitigations, monitoring, incident response, transparency |
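One way to make the four functions concrete is a lightweight risk register that ties each identified risk to an accountable owner, a measurement, and a mitigation. The sketch below is illustrative only — the class, field names, and example entry are our assumptions, not part of the NIST framework:

```python
from dataclasses import dataclass
from typing import List

# Illustrative risk-register entry mapped to the four NIST AI RMF functions.
# Field names and the example values are assumptions for this sketch.
@dataclass
class RiskEntry:
    risk_id: str
    description: str   # MAP: the risk identified in context
    measurement: str   # MEASURE: how the risk is quantified
    mitigation: str    # MANAGE: treatment and monitoring
    owner: str         # GOVERN: accountable role

register: List[RiskEntry] = [
    RiskEntry(
        risk_id="R-001",
        description="Model under-predicts eligibility for part-time workers",
        measurement="Quarterly four-fifths test, disaggregated by employment status",
        mitigation="Route low-confidence cases to human review",
        owner="AI Risk Officer",
    ),
]

for entry in register:
    print(f"{entry.risk_id}: {entry.description} (owner: {entry.owner})")
```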
Trustworthy AI Characteristics (NIST)
The framework defines seven characteristics of trustworthy AI: valid and reliable; safe; secure and resilient; accountable and transparent; explainable and interpretable; privacy-enhanced; and fair, with harmful bias managed.
EU AI Act
The EU AI Act (entered into force August 2024, with obligations phasing in from 2025 through 2027) is the world's first comprehensive AI regulation. It categorizes AI systems by risk level:
Risk Categories
| Risk Level | Description | Requirements | Examples |
|---|---|---|---|
| Unacceptable | Banned outright | Prohibited | Social scoring, real-time facial recognition (with exceptions), manipulative AI |
| High | Significant impact on safety or rights | Conformity assessment, registration, monitoring | Medical devices, credit scoring, hiring, law enforcement, critical infrastructure |
| Limited | Some transparency risks | Transparency obligations | Chatbots (must disclose they're AI), emotion recognition, deepfakes |
| Minimal | Low risk | No specific requirements | Spam filters, AI in video games |
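The tiering logic lends itself to a simple triage helper during system intake. The sketch below follows the table above, but the function and its keyword sets are illustrative only — classifying a real system requires legal analysis, not string matching:

```python
# Illustrative triage helper mapping a use case to an EU AI Act risk tier.
# The category sets are assumptions for this sketch, not legal guidance.
PROHIBITED = {"social_scoring", "manipulative_ai"}
HIGH_RISK = {"credit_scoring", "hiring", "medical_device",
             "law_enforcement", "critical_infrastructure"}
LIMITED_RISK = {"chatbot", "emotion_recognition", "deepfake"}

def risk_tier(use_case: str) -> str:
    if use_case in PROHIBITED:
        return "unacceptable: prohibited outright"
    if use_case in HIGH_RISK:
        return "high: conformity assessment, registration, monitoring"
    if use_case in LIMITED_RISK:
        return "limited: transparency obligations"
    return "minimal: no specific requirements"

print(risk_tier("hiring"))        # high-risk tier per the table above
print(risk_tier("chatbot"))       # limited tier
print(risk_tier("spam_filter"))   # minimal tier
```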
High-Risk System Requirements
1. Risk management system throughout the lifecycle
2. Data governance — training data must be relevant, representative, and error-free
3. Technical documentation — detailed description of the system
4. Record-keeping — automatic logging of system operations
5. Transparency — clear instructions for users
6. Human oversight — ability for humans to override or stop the system
7. Accuracy, robustness, and cybersecurity — appropriate for the risk level
AI Bill of Rights (White House OSTP)
The Blueprint for an AI Bill of Rights (October 2022) outlines five principles:
1. Safe and Effective Systems: You should be protected from unsafe or ineffective systems
2. Algorithmic Discrimination Protections: You should not face discrimination by algorithms
3. Data Privacy: You should be protected from abusive data practices
4. Notice and Explanation: You should know when an AI system is being used and understand how it affects you
5. Human Alternatives: You should be able to opt out and access a human alternative
While not legally binding, these principles guide federal agency AI policies and procurement.
Model Documentation
Model Cards
Model cards (Mitchell et al., 2019) provide structured documentation for trained models:
Model Card: [Model Name]
├── Model Details (name, version, type, date, authors)
├── Intended Use (primary use, out-of-scope uses)
├── Training Data (sources, size, preprocessing)
├── Evaluation Data (sources, selection rationale)
├── Metrics (overall + disaggregated by group)
├── Ethical Considerations (risks, mitigations)
├── Caveats and Recommendations
└── Quantitative Analyses (bias testing results)
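The structure above can be captured in code so a model card is generated alongside each training run rather than written after the fact. A minimal sketch — the sections follow Mitchell et al., but the class and its field names are ours, not a standard API:

```python
import json
from dataclasses import dataclass, asdict
from typing import Dict, List

# Minimal model-card sketch following the sections above.
# The class and field names are illustrative, not a standard API.
@dataclass
class ModelCard:
    model_details: Dict[str, str]
    intended_use: Dict[str, List[str]]
    training_data: str
    evaluation_data: str
    metrics: Dict[str, float]           # overall + disaggregated by group
    ethical_considerations: List[str]
    caveats: List[str]

    def to_markdown(self) -> str:
        lines = [f"# Model Card: {self.model_details.get('name', 'unknown')}"]
        for section, value in asdict(self).items():
            lines.append(f"\n## {section.replace('_', ' ').title()}")
            lines.append(json.dumps(value, indent=2))
        return "\n".join(lines)

card = ModelCard(
    model_details={"name": "benefits-eligibility", "version": "2.1.0"},
    intended_use={"primary": ["eligibility screening"],
                  "out_of_scope": ["final benefit denials"]},
    training_data="State benefits applications, 2019-2023",
    evaluation_data="Held-out 2023 applications",
    metrics={"accuracy_overall": 0.91, "accuracy_group_b": 0.88},
    ethical_considerations=["risk of disparate impact by employment status"],
    caveats=["not validated for households above 80% AMI"],
)
print(card.to_markdown())
```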
Datasheets for Datasets
Datasheets for datasets (Gebru et al., 2018) document the data that trains the models:
| Section | Key Questions |
|---|---|
| Motivation | Why was this dataset created? Who funded it? |
| Composition | What does each instance consist of? How many instances? |
| Collection | How was data collected? Who collected it? Was consent obtained? |
| Preprocessing | What cleaning/labeling was done? |
| Uses | What tasks has this dataset been used for? What shouldn't it be used for? |
| Distribution | How is the dataset distributed? Under what license? |
| Maintenance | Who maintains it? How can errors be reported? |
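A datasheet template can also be enforced programmatically, so unanswered questions are flagged before a dataset ships. The sketch below keys off the sections above; the dict structure and rendering function are our own illustration, not part of Gebru et al.'s proposal:

```python
# Illustrative datasheet template keyed to the sections above.
# The dict structure and render function are assumptions for this sketch.
DATASHEET_QUESTIONS = {
    "Motivation": ["Why was this dataset created?", "Who funded it?"],
    "Composition": ["What does each instance consist of?", "How many instances?"],
    "Collection": ["How was data collected?", "Was consent obtained?"],
    "Preprocessing": ["What cleaning/labeling was done?"],
    "Uses": ["What tasks has it been used for?", "What shouldn't it be used for?"],
    "Distribution": ["How is it distributed?", "Under what license?"],
    "Maintenance": ["Who maintains it?", "How can errors be reported?"],
}

def render_datasheet(answers: dict) -> str:
    """Render a markdown datasheet, flagging unanswered questions."""
    lines = []
    for section, questions in DATASHEET_QUESTIONS.items():
        lines.append(f"## {section}")
        for q in questions:
            lines.append(f"- **{q}** {answers.get(q, '_UNANSWERED_')}")
    return "\n".join(lines)

print(render_datasheet({
    "Why was this dataset created?": "To train benefits-eligibility models.",
    "Under what license?": "CC BY 4.0",
}))
```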
Audit Trails
Every AI system that makes consequential decisions needs an audit trail — a complete, tamper-evident record of every decision, including:
1. Which model produced the decision (name, version, code commit)
2. The inputs the model received
3. The prediction, confidence, and explanation
4. The action taken and any human review or override
Implementation Pattern
```python
import json
import hashlib
from datetime import datetime, timezone
from dataclasses import dataclass, asdict
from typing import Dict, Any

@dataclass
class AuditRecord:
    """Immutable audit record for an AI decision."""
    timestamp: str
    model_name: str
    model_version: str
    model_commit: str
    input_hash: str                 # Hash of input features (not PII)
    input_features: Dict[str, Any]  # Actual feature values
    prediction: Any
    confidence: float
    explanation: Dict[str, float]   # Feature importance/SHAP values
    action_taken: str
    human_reviewer: str = ""
    human_override: bool = False
    override_reason: str = ""

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

    def compute_record_hash(self) -> str:
        """Tamper-evident hash of the entire record."""
        record_str = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(record_str.encode()).hexdigest()

# Example: Log an AI decision
record = AuditRecord(
    timestamp=datetime.now(timezone.utc).isoformat(),
    model_name="benefits-eligibility",
    model_version="2.1.0",
    model_commit="a1b2c3d",
    input_hash=hashlib.sha256(b"features...").hexdigest()[:16],
    input_features={
        "income_pct_ami": 45.2,
        "household_size": 4,
        "current_housing": "renting",
        "employment_status": "employed_part_time",
    },
    prediction="likely_eligible",
    confidence=0.87,
    explanation={
        "income_pct_ami": 0.45,
        "household_size": 0.25,
        "employment_status": 0.20,
        "current_housing": 0.10,
    },
    action_taken="routed_to_caseworker_queue",
    human_reviewer="caseworker_j.smith",
)

print(record.to_json())
print(f"\nRecord hash: {record.compute_record_hash()}")
```
Bias Testing Requirements
Bias testing is not a one-time activity — it must be conducted before deployment, after every retraining or data change, and on a recurring schedule in production.
Common Fairness Metrics
| Metric | Definition | Threshold |
|---|---|---|
| Demographic Parity | P(positive) is the same across groups | Ratio ≥ 0.8 (four-fifths rule) |
| Equal Opportunity | True positive rate is the same across groups | Difference < 0.1 |
| Equalized Odds | TPR and FPR are the same across groups | Difference < 0.1 |
| Predictive Parity | Precision is the same across groups | Ratio ≥ 0.8 |
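Two of the metrics in the table can be computed directly from predictions, true labels, and group membership. A minimal sketch — the function names and toy data are ours:

```python
import numpy as np

def demographic_parity_ratio(preds: np.ndarray, groups: np.ndarray) -> float:
    """Ratio of lowest to highest positive-prediction rate across groups."""
    rates = [np.mean(preds[groups == g]) for g in np.unique(groups)]
    return min(rates) / max(rates)

def equal_opportunity_diff(preds: np.ndarray, labels: np.ndarray,
                           groups: np.ndarray) -> float:
    """Largest gap in true positive rate (recall) across groups."""
    tprs = []
    for g in np.unique(groups):
        mask = (groups == g) & (labels == 1)  # actual positives in this group
        tprs.append(np.mean(preds[mask]))
    return max(tprs) - min(tprs)

# Toy data: group A receives positives far more often than group B
preds  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
labels = np.array([1, 0, 1, 1, 1, 1, 0, 1])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])

print(f"Demographic parity ratio: {demographic_parity_ratio(preds, groups):.2f}")
print(f"Equal opportunity difference: "
      f"{equal_opportunity_diff(preds, labels, groups):.2f}")
```

Against the table's thresholds, this toy example fails both tests: the parity ratio is well under the 0.8 four-fifths floor, and the TPR gap exceeds 0.1.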
The Four-Fifths Rule
A selection rate for any group that is less than 80% (four-fifths) of the group with the highest selection rate is evidence of adverse impact. This rule comes from EEOC guidelines and is widely applied to AI systems.

```python
import numpy as np
from typing import Dict, Any

def four_fifths_test(
    predictions: np.ndarray,
    groups: np.ndarray,
    positive_label: int = 1
) -> Dict[str, Any]:
    """
    Run the four-fifths (80%) rule test for adverse impact.

    Args:
        predictions: Model predictions (0 or 1)
        groups: Group membership labels
        positive_label: What counts as a positive prediction

    Returns:
        Dict with selection rates, ratios, and pass/fail for each group
    """
    unique_groups = np.unique(groups)
    selection_rates = {}

    for group in unique_groups:
        mask = groups == group
        rate = np.mean(predictions[mask] == positive_label)
        selection_rates[group] = rate

    max_rate = max(selection_rates.values())

    results = {}
    for group, rate in selection_rates.items():
        ratio = rate / max_rate if max_rate > 0 else 0
        results[group] = {
            "selection_rate": round(rate, 4),
            "ratio_to_highest": round(ratio, 4),
            "passes_four_fifths": ratio >= 0.8,
        }

    return {
        "group_results": results,
        "highest_rate_group": max(selection_rates, key=selection_rates.get),
        "overall_pass": all(r["passes_four_fifths"] for r in results.values()),
    }

# Example usage
np.random.seed(42)
n = 1000
predictions = np.random.choice([0, 1], n, p=[0.4, 0.6])
groups = np.random.choice(["Group_A", "Group_B", "Group_C"], n)

# Introduce bias: Group_C gets fewer positive predictions
bias_mask = groups == "Group_C"
predictions[bias_mask] = np.random.choice([0, 1], bias_mask.sum(), p=[0.65, 0.35])

results = four_fifths_test(predictions, groups)

print("=== Four-Fifths Rule Test ===")
print(f"Highest rate group: {results['highest_rate_group']}")
print(f"Overall pass: {results['overall_pass']}\n")

for group, data in results["group_results"].items():
    status = "PASS" if data["passes_four_fifths"] else "FAIL"
    print(f"  {group}: rate={data['selection_rate']:.2%}, "
          f"ratio={data['ratio_to_highest']:.2%} [{status}]")
```
Section 508 Accessibility
Section 508 of the Rehabilitation Act requires federal agencies to make electronic and information technology accessible to people with disabilities. For AI systems, this means AI-driven interfaces must work with assistive technologies such as screen readers, notices and explanations must be available in accessible formats, and any human alternative must itself be accessible.
Responsible AI Is Not a Checklist
Frameworks and checklists are necessary but not sufficient. Responsible AI requires ongoing judgment, monitoring, and accountability after deployment, not a one-time sign-off.
Responsible AI Checklist for Government
A comprehensive checklist for deploying AI in government contexts: