Privacy & Security in AI

Differential privacy, federated learning, model attacks, adversarial ML, secure aggregation, and data anonymization


AI systems are uniquely vulnerable to privacy and security threats. Unlike traditional software, ML models can memorize training data, be reverse-engineered to reveal private information, and be manipulated through adversarial inputs. This lesson covers the techniques for building privacy-preserving and secure AI systems.

The Privacy Paradox of ML

Machine learning models need large, rich datasets to perform well — but those datasets often contain sensitive personal information. Moreover, the trained model itself can leak information about its training data. Differential privacy, federated learning, and secure computation are the three pillars of privacy-preserving ML.

Differential Privacy

Differential privacy (DP) provides a mathematical guarantee that the output of a computation does not reveal whether any individual's data was included in the input.

The Core Idea

A mechanism M is epsilon-differentially private if for any two datasets D and D' that differ by one record, and any set of outputs S:

P[M(D) in S] <= e^epsilon * P[M(D') in S]

Understanding Epsilon

The privacy budget (epsilon) controls the privacy-utility trade-off:

| Epsilon  | Privacy Level       | Utility          |
|----------|---------------------|------------------|
| 0.1      | Very strong privacy | Lower accuracy   |
| 1.0      | Strong privacy      | Good accuracy    |
| 10.0     | Weak privacy        | High accuracy    |
| infinity | No privacy          | Maximum accuracy |

Common Mechanisms

1. Laplace Mechanism: Adds noise drawn from a Laplace distribution to numeric query results
2. Gaussian Mechanism: Adds Gaussian noise (requires the relaxed "approximate DP" guarantee)
3. Exponential Mechanism: For categorical outputs — selects results with probability proportional to their quality score
4. Randomized Response: For surveys — each respondent flips a coin to decide whether to answer truthfully

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float) -> float:
    """Add Laplace noise for differential privacy.

    Args:
        true_value: The actual query result
        sensitivity: Max change in output when one record changes
        epsilon: Privacy budget (lower = more private)

    Returns:
        Noisy result satisfying epsilon-differential privacy
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(0, scale)
    return true_value + noise

def gaussian_mechanism(true_value: float, sensitivity: float,
                       epsilon: float, delta: float = 1e-5) -> float:
    """Add Gaussian noise for (epsilon, delta)-differential privacy."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = np.random.normal(0, sigma)
    return true_value + noise

# Example: Computing average salary with differential privacy
np.random.seed(42)
salaries = np.array([50000, 65000, 72000, 48000, 95000,
                     61000, 58000, 83000, 71000, 55000])

true_mean = np.mean(salaries)
# Sensitivity must come from a priori bounds on the data, not from the
# data's own max/min (which would itself leak private information).
# Here we assume salaries are known in advance to lie in
# [40000, 100000], so the mean can change by at most this much:
sensitivity = (100000 - 40000) / len(salaries)

print(f"True mean salary: ${true_mean:,.0f}")
print(f"Sensitivity: ${sensitivity:,.0f}")
print()

# Compare different epsilon values
for eps in [0.1, 1.0, 5.0, 10.0]:
    noisy_results = [
        laplace_mechanism(true_mean, sensitivity, eps)
        for _ in range(1000)
    ]
    avg_noisy = np.mean(noisy_results)
    std_noisy = np.std(noisy_results)
    print(f"epsilon={eps:5.1f}: "
          f"mean=${avg_noisy:>10,.0f}, "
          f"std=${std_noisy:>10,.0f}")
```
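Randomized response, the fourth mechanism listed above, is simple enough to sketch end to end. The coin biases below are the classic 50/50 scheme (which satisfies epsilon = ln 3 differential privacy); the 30% true "yes" rate is an illustrative value, not from the lesson.

```python
import numpy as np

def randomized_response(truth: bool, rng: np.random.Generator) -> bool:
    """Flip a coin: heads, answer truthfully; tails, flip again
    and report that second coin instead."""
    if rng.random() < 0.5:       # first coin: answer truthfully
        return bool(truth)
    return rng.random() < 0.5    # second coin: random answer

rng = np.random.default_rng(0)
true_answers = rng.random(100_000) < 0.3   # 30%真 "yes" rate
reported = np.array([randomized_response(a, rng) for a in true_answers])

# P[report yes] = 0.5 * true_rate + 0.25, so the analyst can invert:
estimate = (reported.mean() - 0.25) / 0.5
print(f"Estimated true 'yes' rate: {estimate:.3f}")
```

No individual answer is trustworthy, yet the aggregate estimate recovers the population rate to within sampling error.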

Federated Learning

Federated learning trains ML models across multiple decentralized devices or organizations without sharing raw data. Each participant trains on their local data and shares only model updates (gradients).

How It Works

1. Central server sends the current model to all participants
2. Each participant trains the model on their local data
3. Participants send only model updates (gradients) back to the server
4. Server aggregates updates (e.g., FedAvg) to produce an improved global model
5. Repeat until convergence

Key Benefits

  • Data never leaves the device/organization
  • Enables collaboration between competitors or regulated entities
  • Naturally reduces data transfer costs

Challenges

  • Non-IID data: Each participant's data may have very different distributions
  • Communication overhead: Sending gradients for large models is expensive
  • Gradient attacks: Model updates can still leak information (see below)
```python
import numpy as np

def federated_averaging(client_models, client_sizes):
    """Federated Averaging (FedAvg) algorithm.

    Aggregates model parameters from multiple clients,
    weighted by the number of samples each client has.

    Args:
        client_models: List of model weight arrays (one per client)
        client_sizes: List of dataset sizes (one per client)

    Returns:
        Aggregated global model weights
    """
    total_samples = sum(client_sizes)
    # Weighted average of all client models
    global_model = np.zeros_like(client_models[0])
    for model, size in zip(client_models, client_sizes):
        weight = size / total_samples
        global_model += weight * model
    return global_model

def simulate_federated_learning(num_clients=5, num_rounds=10):
    """Simulate federated learning for a simple linear model."""
    np.random.seed(42)

    # True model: y = 3x + 2 + noise
    true_weights = np.array([3.0, 2.0])  # [slope, intercept]

    # Each client has different data (non-IID simulation)
    client_data = []
    client_sizes = []
    for i in range(num_clients):
        n = np.random.randint(50, 200)
        x = np.random.uniform(i * 2, i * 2 + 5, n)  # Different ranges
        y = 3 * x + 2 + np.random.normal(0, 1, n)
        client_data.append((x, y))
        client_sizes.append(n)

    # Initialize global model
    global_weights = np.array([0.0, 0.0])
    # Learning rate small enough to stay stable on the client with the
    # widest feature range (largest x values)
    lr = 0.005

    print("Federated Learning Simulation")
    print(f"True weights: {true_weights}")
    print(f"Clients: {num_clients}, Rounds: {num_rounds}")
    print(f"Client sizes: {client_sizes}\n")

    for round_num in range(num_rounds):
        client_models = []

        for i in range(num_clients):
            x, y = client_data[i]
            # Local training: 5 steps of gradient descent
            local_weights = global_weights.copy()
            for _ in range(5):
                preds = local_weights[0] * x + local_weights[1]
                errors = preds - y
                grad_w = 2 * np.mean(errors * x)
                grad_b = 2 * np.mean(errors)
                local_weights[0] -= lr * grad_w
                local_weights[1] -= lr * grad_b
            client_models.append(local_weights)

        # Aggregate with FedAvg
        global_weights = federated_averaging(
            client_models, client_sizes
        )

        if (round_num + 1) % 2 == 0:
            print(f"Round {round_num + 1:2d}: "
                  f"weights = [{global_weights[0]:.3f}, "
                  f"{global_weights[1]:.3f}]")

    print(f"\nFinal:  [{global_weights[0]:.3f}, {global_weights[1]:.3f}]")
    print(f"Target: [{true_weights[0]:.3f}, {true_weights[1]:.3f}]")

simulate_federated_learning()
```

Attacks on ML Models

Model Inversion Attacks

An attacker with access to a model's predictions reconstructs the training data. For example, given a facial recognition model, an attacker can generate approximate images of faces in the training set.

Membership Inference Attacks

An attacker determines whether a specific data point was in the training set. The key insight: models tend to be more confident on data they were trained on.
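That confidence gap can be demonstrated with a deliberately overfit toy model. The polynomial "victim" and the loss threshold below are illustrative choices, not a real attack pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Victim" model: a degree-7 polynomial that interpolates (memorizes)
# its 8 training points exactly.
x_train = rng.uniform(-1, 1, 8)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, 8)
coeffs = np.polyfit(x_train, y_train, 7)

# Fresh points from the same distribution, never seen in training.
x_out = rng.uniform(-1, 1, 200)
y_out = np.sin(3 * x_out) + rng.normal(0, 0.1, 200)

def sq_loss(x, y):
    return (np.polyval(coeffs, x) - y) ** 2

# Attack: flag a point as a training-set member if the model's
# loss on it is suspiciously low.
threshold = 1e-6
member_rate = np.mean(sq_loss(x_train, y_train) < threshold)
nonmember_rate = np.mean(sq_loss(x_out, y_out) < threshold)
print(f"Flagged as member: {member_rate:.0%} of training points, "
      f"{nonmember_rate:.0%} of held-out points")
```

The memorized points sit at near-zero loss, while held-out points carry at least the irreducible label noise, so a simple loss threshold separates them.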

Adversarial Examples

Carefully crafted inputs that look normal to humans but cause the model to make incorrect predictions. A tiny perturbation to an image can flip a classifier's prediction with high confidence.
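For a linear classifier the loss gradient with respect to the input is just the weight vector, so an FGSM-style perturbation can be sketched without any autodiff. The random weights and the epsilon value below are arbitrary illustrations:

```python
import numpy as np

# A toy linear "image classifier": score = w.x, class 1 if score > 0.
rng = np.random.default_rng(0)
w = rng.normal(0, 1, 784)      # weights for a 28x28 "image"

x = rng.uniform(0, 1, 784)     # a benign input
score = w @ x
label = int(score > 0)

# FGSM-style step: move every pixel by at most epsilon in the
# direction that pushes the score toward the opposite class.
epsilon = 0.05
x_adv = np.clip(x - epsilon * np.sign(w) * np.sign(score), 0, 1)
adv_score = w @ x_adv

print(f"Original score {score:+.2f} -> adversarial {adv_score:+.2f}")
print(f"Max per-pixel change: {np.abs(x_adv - x).max():.3f}")
print(f"Label flipped: {int(adv_score > 0) != label}")
```

Each pixel changes by at most 0.05, yet the perturbations all align with the weight vector, so their effect on the score accumulates across all 784 dimensions.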

Data Poisoning

An attacker injects malicious data into the training set to manipulate the model's behavior. This can create backdoors — e.g., a stop sign with a small sticker is classified as a speed limit sign.
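A tiny numeric illustration of poisoning, and of the robust aggregation defense listed in the table below: a single malicious update drags a mean-based aggregate arbitrarily far, while a median barely moves. The update values are made up for illustration.

```python
import numpy as np

# Honest client updates cluster around the true value (here, ~1.0);
# one poisoned client submits a huge value to drag the aggregate.
honest = np.array([0.9, 1.1, 1.0, 0.95, 1.05])
poisoned = np.append(honest, 100.0)

print(f"Mean aggregate:   {np.mean(poisoned):.2f}")   # pulled far off
print(f"Median aggregate: {np.median(poisoned):.2f}") # barely moves
```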

Defenses

| Attack               | Defense                                                                   |
|----------------------|---------------------------------------------------------------------------|
| Model inversion      | Differential privacy, output perturbation, limiting prediction confidence |
| Membership inference | Differential privacy, regularization, limiting query access               |
| Adversarial examples | Adversarial training, input preprocessing, certified defenses             |
| Data poisoning       | Data validation, anomaly detection, robust aggregation                    |

Secure Aggregation

In federated learning, secure aggregation ensures the server can compute the aggregate of client updates without seeing any individual client's update. This is achieved through cryptographic protocols (homomorphic encryption, secret sharing).
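One building block of these protocols, pairwise additive masking, can be sketched in a few lines. This is a simplified piece of the secret-sharing approach (real protocols also handle dropouts and key agreement); the update vectors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three clients, each holding a private update vector.
updates = [np.array([1.0, 2.0]), np.array([3.0, 1.0]), np.array([0.5, 0.5])]
n = len(updates)

# Each pair of clients i < j agrees on a shared random mask m_ij;
# client i adds it and client j subtracts it, so all masks cancel
# in the sum even though each masked vector looks random.
masks = {(i, j): rng.normal(0, 10, 2)
         for i in range(n) for j in range(i + 1, n)}

masked = []
for i in range(n):
    m = updates[i].copy()
    for j in range(n):
        if i < j:
            m += masks[(i, j)]
        elif j < i:
            m -= masks[(j, i)]
    masked.append(m)

# The server sees only masked vectors, yet recovers the exact sum.
server_sum = np.sum(masked, axis=0)
true_sum = np.sum(updates, axis=0)
print(f"Server aggregate: {server_sum}")
print(f"True aggregate:   {true_sum}")
```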

Data Anonymization Techniques

| Technique      | Description                                                                     | Limitation                              |
|----------------|---------------------------------------------------------------------------------|-----------------------------------------|
| k-anonymity    | Each record is indistinguishable from k-1 others                                | Vulnerable to attribute disclosure      |
| l-diversity    | Each group has at least l distinct sensitive values                             | Doesn't prevent probabilistic inference |
| t-closeness    | Distribution of sensitive values in each group is close to overall distribution | Computationally expensive               |
| Synthetic data | Generate artificial data that preserves statistical properties                  | May not capture rare patterns           |
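Checking k-anonymity reduces to finding the smallest quasi-identifier equivalence class. A minimal sketch, with made-up quasi-identifier tuples:

```python
from collections import Counter

# Quasi-identifier tuples (zip prefix, age bracket) for a toy dataset.
records = [
    ("021*", "20-29"), ("021*", "20-29"), ("021*", "20-29"),
    ("118*", "30-39"), ("118*", "30-39"),
]

def k_anonymity(quasi_ids):
    """Return the dataset's k: the size of the smallest group of
    records sharing identical quasi-identifier values."""
    return min(Counter(quasi_ids).values())

print(k_anonymity(records))  # smallest group has 2 records -> k = 2
```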

Anonymization Is Hard

Research has repeatedly shown that "anonymized" datasets can be re-identified. In one famous study, 87% of Americans could be uniquely identified from just zip code, gender, and birth date. True privacy requires mathematical guarantees like differential privacy, not just removing names and IDs.