
Bias & Fairness

Types of bias (selection, measurement, algorithmic), fairness metrics (demographic parity, equalized odds, calibration), bias detection (AIF360, Fairlearn), and mitigation strategies (pre/in/post-processing)

~45 min

Bias & Fairness in Machine Learning

Machine learning models can perpetuate and amplify societal biases present in training data. A hiring model trained on historical decisions might discriminate by gender; a recidivism prediction model might have different error rates by race. Understanding, detecting, and mitigating these biases is both an ethical imperative and, increasingly, a legal requirement.

This lesson covers the types of bias, mathematical fairness definitions, detection tools, and mitigation strategies.

The Impossibility Theorem

When base rates differ between groups, no imperfect classifier can simultaneously satisfy all three major fairness criteria (demographic parity, equalized odds, and calibration). Fairness therefore requires making explicit value judgments about which criterion matters most in each application context. There is no purely technical solution to fairness.
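A toy illustration of the tension (all numbers invented): a score that simply outputs each group's base rate is perfectly calibrated by construction, yet any single decision threshold then produces wildly different selection rates across groups, violating demographic parity.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
base_rate = {"A": 0.6, "B": 0.3}  # invented base rates for two groups

for group, p in base_rate.items():
    y = rng.binomial(1, p, n)          # true outcomes
    score = np.full(n, p)              # perfectly calibrated: score == P(Y=1)
    pred = (score >= 0.5).astype(int)  # one shared decision threshold
    print(f"Group {group}: P(Y=1 | score={p}) = {y.mean():.2f}, "
          f"selection rate = {pred.mean():.2f}")
```

Group A is selected at rate 1.0 and group B at rate 0.0, even though the score is calibrated in both groups: satisfying one criterion forces violating the other.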

Types of Bias

Data Bias

| Type | Description | Example |
|---|---|---|
| Selection bias | Training data is not representative | Medical data mostly from one demographic |
| Measurement bias | Features are measured differently across groups | Wealthier areas have more sensors |
| Label bias | Labels reflect historical discrimination | Historical hiring decisions that excluded women |
| Representation bias | Some groups are underrepresented | Few elderly users in tech product data |

Algorithmic Bias

| Type | Description | Example |
|---|---|---|
| Optimization bias | Model optimizes for majority group | Accuracy-maximizing model ignores minorities |
| Feature bias | Proxy features encode protected attributes | Zip code as a proxy for race |
| Feedback loops | Model predictions affect future data | Predictive policing increases arrests in targeted areas |
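The feedback-loop row can be made concrete with a toy simulation (all numbers invented): two areas have identical true incident rates, but patrol allocation is driven by previously observed arrests. The initial measurement gap never self-corrects, because the model's output determines which data gets collected.

```python
import numpy as np

true_rate = np.array([0.10, 0.10])   # identical true incident rates in both areas
arrests = np.array([12.0, 8.0])      # arbitrary initial measurement gap

for step in range(50):
    patrol = arrests / arrests.sum()              # allocate patrols by observed arrests
    arrests = arrests + 100 * patrol * true_rate  # more patrol -> more observed arrests

print(f"Patrol allocation after 50 rounds: {patrol}")  # stays ~[0.6, 0.4]
```

Despite equal underlying rates, the allocation stays locked at the initial 60/40 split: the system keeps "confirming" its own skewed measurements.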

Fairness Metrics

Group Fairness Metrics

Demographic Parity (Statistical Parity): The selection rate should be equal across groups. P(Y_hat=1 | A=0) = P(Y_hat=1 | A=1)

Equalized Odds: True positive rate and false positive rate should be equal across groups. P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1) AND P(Y_hat=1 | Y=0, A=0) = P(Y_hat=1 | Y=0, A=1)

Equal Opportunity: A relaxation of equalized odds — only requires equal true positive rates. P(Y_hat=1 | Y=1, A=0) = P(Y_hat=1 | Y=1, A=1)

Calibration: Among individuals who receive the same predicted probability p, the actual positive rate should be equal across groups (and match p). P(Y=1 | Y_hat=p, A=0) = P(Y=1 | Y_hat=p, A=1)

The Four-Fifths Rule

A practical guideline from US employment law: the selection rate for any protected group should be at least 80% of the rate for the group with the highest selection rate. Also known as the "80% rule" or "disparate impact ratio."

python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

np.random.seed(42)

# --- Create a biased dataset ---
# Simulate a hiring scenario with gender bias
n = 3000
gender = np.random.binomial(1, 0.5, n)  # 0=female, 1=male
education = np.random.normal(5, 1.5, n)
experience = np.random.normal(5, 2, n)

# Bias: males get a boost in hiring probability
score = 0.3 * education + 0.4 * experience + 0.8 * gender
noise = np.random.normal(0, 1, n)
hired = (score + noise > 4).astype(int)

X = np.column_stack([gender, education, experience])
y = hired
feature_names = ["gender", "education", "experience"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# --- Train model ---
gbc = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc.fit(X_train, y_train)
y_pred = gbc.predict(X_test)
y_proba = gbc.predict_proba(X_test)[:, 1]

print(f"Overall accuracy: {(y_pred == y_test).mean():.4f}")

# --- Compute fairness metrics ---
female_mask = X_test[:, 0] == 0
male_mask = X_test[:, 0] == 1

def fairness_metrics(y_true, y_pred, group_mask, group_name):
    """Compute fairness metrics for a subgroup."""
    cm = confusion_matrix(y_true[group_mask], y_pred[group_mask])
    tn, fp, fn, tp = cm.ravel()
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
    selection_rate = y_pred[group_mask].mean()
    accuracy = (y_pred[group_mask] == y_true[group_mask]).mean()
    return {
        "group": group_name,
        "n": group_mask.sum(),
        "selection_rate": selection_rate,
        "tpr": tpr,
        "fpr": fpr,
        "accuracy": accuracy,
    }

female_metrics = fairness_metrics(y_test, y_pred, female_mask, "Female")
male_metrics = fairness_metrics(y_test, y_pred, male_mask, "Male")

print("\n=== Fairness Metrics ===")
print(f"{'Metric':<20} {'Female':>10} {'Male':>10} {'Ratio':>10}")
print("-" * 55)
# Note: the 80% threshold formally applies only to selection_rate;
# it is shown for the other metrics purely for illustration.
for metric in ["selection_rate", "tpr", "fpr", "accuracy"]:
    f_val = female_metrics[metric]
    m_val = male_metrics[metric]
    ratio = f_val / m_val if m_val > 0 else float("inf")
    flag = " FAIL" if ratio < 0.8 else " PASS"
    print(f"{metric:<20} {f_val:>10.4f} {m_val:>10.4f} {ratio:>9.2f}{flag}")

# Demographic parity
dp_diff = abs(female_metrics["selection_rate"] - male_metrics["selection_rate"])
print(f"\nDemographic Parity Difference: {dp_diff:.4f}")

# Equalized odds
eo_tpr_diff = abs(female_metrics["tpr"] - male_metrics["tpr"])
eo_fpr_diff = abs(female_metrics["fpr"] - male_metrics["fpr"])
print(f"Equalized Odds (TPR gap): {eo_tpr_diff:.4f}")
print(f"Equalized Odds (FPR gap): {eo_fpr_diff:.4f}")

# Four-fifths rule
ratio_4_5 = female_metrics["selection_rate"] / male_metrics["selection_rate"]
print(f"\nFour-Fifths Rule: {ratio_4_5:.4f} "
      f"({'PASS' if ratio_4_5 >= 0.8 else 'FAIL - disparate impact detected'})")

Bias Mitigation Strategies

Pre-Processing (before training)

Modify the training data to remove bias:
  • Resampling: Over/under-sample to equalize group representation
  • Reweighting: Assign higher weights to underrepresented groups
  • Disparate Impact Remover: Transform features to remove correlation with protected attributes
  • Fair representation learning: Learn a latent representation that encodes task-relevant information but not protected attributes
In-Processing (during training)

Modify the learning algorithm:
  • Adversarial debiasing: Add an adversary that tries to predict the protected attribute from model outputs; the main model learns to fool the adversary
  • Fairness constraints: Add fairness metrics as constraints or regularization terms in the loss function
  • Exponentiated Gradient: Reduce fair classification to a sequence of cost-sensitive classification problems

Post-Processing (after training)

Modify the model's predictions:
  • Threshold adjustment: Use different decision thresholds for each group to equalize metrics
  • Calibrated equalized odds: Find the threshold combination that satisfies equalized odds while minimizing accuracy loss
  • Reject option classification: In the uncertainty region, favor the underprivileged group
python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

np.random.seed(42)

# Recreate biased dataset
n = 3000
gender = np.random.binomial(1, 0.5, n)
education = np.random.normal(5, 1.5, n)
experience = np.random.normal(5, 2, n)
score = 0.3 * education + 0.4 * experience + 0.8 * gender
noise = np.random.normal(0, 1, n)
hired = (score + noise > 4).astype(int)
X = np.column_stack([gender, education, experience])
y = hired

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# --- Mitigation 1: Remove protected attribute ---
print("=== Mitigation 1: Remove Gender Feature ===")
X_train_no_gender = X_train[:, 1:]  # Drop gender column
X_test_no_gender = X_test[:, 1:]

gbc_no_gender = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc_no_gender.fit(X_train_no_gender, y_train)
y_pred_ng = gbc_no_gender.predict(X_test_no_gender)

female = X_test[:, 0] == 0
male = X_test[:, 0] == 1
sr_f = y_pred_ng[female].mean()
sr_m = y_pred_ng[male].mean()
print(f"Selection rate: Female={sr_f:.4f}, Male={sr_m:.4f}, "
      f"Ratio={sr_f/sr_m:.4f}")
print(f"Accuracy: {(y_pred_ng == y_test).mean():.4f}")

# --- Mitigation 2: Reweighting ---
print("\n=== Mitigation 2: Sample Reweighting ===")
# Compute weights to balance group-label combinations
groups = X_train[:, 0]
weights = np.ones(len(y_train))

for g in [0, 1]:
    for label in [0, 1]:
        mask = (groups == g) & (y_train == label)
        expected = len(y_train) / 4
        actual = mask.sum()
        weights[mask] = expected / actual if actual > 0 else 1.0

gbc_reweight = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc_reweight.fit(X_train, y_train, sample_weight=weights)
y_pred_rw = gbc_reweight.predict(X_test)

sr_f_rw = y_pred_rw[female].mean()
sr_m_rw = y_pred_rw[male].mean()
print(f"Selection rate: Female={sr_f_rw:.4f}, Male={sr_m_rw:.4f}, "
      f"Ratio={sr_f_rw/sr_m_rw:.4f}")
print(f"Accuracy: {(y_pred_rw == y_test).mean():.4f}")

# --- Mitigation 3: Threshold adjustment (post-processing) ---
print("\n=== Mitigation 3: Threshold Adjustment ===")
gbc_full = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, random_state=42
)
gbc_full.fit(X_train, y_train)
y_proba = gbc_full.predict_proba(X_test)[:, 1]

# Find thresholds that equalize selection rates
target_rate = y_proba.mean()  # Use overall mean as target

best_thresh = {"female": 0.5, "male": 0.5}
for name, mask in [("female", female), ("male", male)]:
    for t in np.arange(0.1, 0.9, 0.01):
        sr = (y_proba[mask] >= t).mean()
        if abs(sr - target_rate) < abs(
            (y_proba[mask] >= best_thresh[name]).mean() - target_rate
        ):
            best_thresh[name] = t

y_pred_thresh = np.zeros(len(y_test), dtype=int)
y_pred_thresh[female] = (y_proba[female] >= best_thresh["female"]).astype(int)
y_pred_thresh[male] = (y_proba[male] >= best_thresh["male"]).astype(int)

sr_f_t = y_pred_thresh[female].mean()
sr_m_t = y_pred_thresh[male].mean()
print(f"Thresholds: Female={best_thresh['female']:.2f}, "
      f"Male={best_thresh['male']:.2f}")
print(f"Selection rate: Female={sr_f_t:.4f}, Male={sr_m_t:.4f}, "
      f"Ratio={sr_f_t/(sr_m_t+1e-8):.4f}")
print(f"Accuracy: {(y_pred_thresh == y_test).mean():.4f}")

# --- Summary ---
print("\n=== Comparison ===")
print(f"{'Method':<25} {'DP Ratio':>10} {'Accuracy':>10}")
print("-" * 45)
methods = [
    ("Baseline (with gender)", female, male, gbc_full.predict(X_test)),
    ("Remove gender", female, male, y_pred_ng),
    ("Reweighting", female, male, y_pred_rw),
    ("Threshold adjustment", female, male, y_pred_thresh),
]
for name, f_m, m_m, preds in methods:
    sr_f = preds[f_m].mean()
    sr_m = preds[m_m].mean()
    ratio = sr_f / sr_m if sr_m > 0 else 0
    acc = (preds == y_test).mean()
    flag = " *" if ratio >= 0.8 else ""
    print(f"{name:<25} {ratio:>10.4f} {acc:>10.4f}{flag}")

Removing the Protected Attribute Is Often Not Enough

Simply removing the protected attribute (e.g., gender) from the features does not eliminate bias. Other features (zip code, name, purchasing patterns) can serve as proxies for the protected attribute; this is called "redundant encoding." True bias mitigation requires measuring fairness metrics and applying systematic mitigation techniques.
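A minimal sketch of redundant encoding (dataset and coefficients invented): gender is dropped from the features, but a hypothetical correlated proxy lets the model reconstruct the disparity anyway.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 5000
gender = rng.binomial(1, 0.5, n)         # 0=female, 1=male
proxy = gender + rng.normal(0, 0.3, n)   # invented proxy feature (e.g., a zip-code signal)
skill = rng.normal(0, 1, n)
hired = ((0.5 * skill + 0.8 * gender + rng.normal(0, 1, n)) > 0.4).astype(int)

X = np.column_stack([proxy, skill])      # gender itself is NOT a feature
clf = GradientBoostingClassifier(random_state=0).fit(X, hired)
pred = clf.predict(X)

print(f"Selection rate, female: {pred[gender == 0].mean():.3f}")
print(f"Selection rate, male:   {pred[gender == 1].mean():.3f}")
```

Even though the model never sees gender, the selection-rate gap persists because the proxy carries nearly the same information.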

Fairness Toolkits

Fairlearn (Microsoft)

Python library for assessing and improving fairness. Provides:
  • MetricFrame: Compute any metric disaggregated by group
  • ThresholdOptimizer: Post-processing threshold adjustment
  • ExponentiatedGradient: In-processing constrained optimization
  • GridSearch: Find the fairness-accuracy Pareto frontier
AIF360 (IBM)

Comprehensive toolkit with 70+ fairness metrics and 10+ algorithms:
  • Pre-processing: Reweighting, Disparate Impact Remover, Optimized Preprocessing
  • In-processing: Adversarial Debiasing, Prejudice Remover
  • Post-processing: Equalized Odds, Calibrated Equalized Odds, Reject Option

Choosing a Strategy

1. Start with measurement: You cannot improve what you do not measure
2. Try post-processing first: It is the easiest and does not require retraining
3. Use pre-processing for data issues: If the problem is in the data, fix the data
4. Use in-processing for algorithmic issues: If the model itself introduces bias
5. Document and monitor: Fairness is not a one-time check; monitor in production
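Step 5 can be sketched as a minimal production check (function name and data are illustrative; the 0.8 cutoff follows the four-fifths rule above):

```python
import numpy as np

def demographic_parity_ratio(y_pred, groups):
    """Min/max selection-rate ratio across groups; 1.0 means perfect parity."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return min(rates) / max(rates)

# Hypothetical batch of production predictions
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
preds = np.array([1, 0, 1, 0, 1, 1, 1, 1])

ratio = demographic_parity_ratio(preds, groups)
print(f"Disparate impact ratio: {ratio:.2f}")  # 0.50
if ratio < 0.8:
    print("ALERT: below the four-fifths threshold -- investigate before proceeding")
```

Running a check like this on every scored batch turns fairness from a one-time audit into an ongoing monitoring signal.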