Privacy & Security in AI

Differential privacy, federated learning, model attacks, adversarial ML, secure aggregation, and data anonymization


AI systems are uniquely vulnerable to privacy and security threats. Unlike traditional software, ML models can memorize training data, be reverse-engineered to reveal private information, and be manipulated through adversarial inputs. This lesson covers the techniques for building privacy-preserving and secure AI systems.

The Privacy Paradox of ML

Machine learning models need large, rich datasets to perform well — but those datasets often contain sensitive personal information. Moreover, the trained model itself can leak information about its training data. Differential privacy, federated learning, and secure computation are the three pillars of privacy-preserving ML.

Differential Privacy

Differential privacy (DP) provides a mathematical guarantee that the output of a computation does not reveal whether any individual's data was included in the input.

The Core Idea

A mechanism M is epsilon-differentially private if for any two datasets D and D' that differ by one record, and any set of outputs S:

P[M(D) in S] <= e^epsilon * P[M(D') in S]

Understanding Epsilon

The privacy budget (epsilon) controls the privacy-utility trade-off:

| Epsilon  | Privacy Level       | Utility          |
|----------|---------------------|------------------|
| 0.1      | Very strong privacy | Lower accuracy   |
| 1.0      | Strong privacy      | Good accuracy    |
| 10.0     | Weak privacy        | High accuracy    |
| infinity | No privacy          | Maximum accuracy |

Common Mechanisms

1. Laplace Mechanism: Adds noise drawn from a Laplace distribution to numeric query results
2. Gaussian Mechanism: Adds Gaussian noise (requires the relaxed "approximate DP" guarantee)
3. Exponential Mechanism: For categorical outputs — selects results with probability proportional to their quality score
4. Randomized Response: For surveys — each respondent flips a coin to decide whether to answer truthfully

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float) -> float:
    """Add Laplace noise for differential privacy.

    Args:
        true_value: The actual query result
        sensitivity: Max change in output when one record changes
        epsilon: Privacy budget (lower = more private)

    Returns:
        Noisy result satisfying epsilon-differential privacy
    """
    scale = sensitivity / epsilon
    noise = np.random.laplace(0, scale)
    return true_value + noise

def gaussian_mechanism(true_value: float, sensitivity: float,
                       epsilon: float, delta: float = 1e-5) -> float:
    """Add Gaussian noise for (epsilon, delta)-differential privacy."""
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = np.random.normal(0, sigma)
    return true_value + noise

# Example: Computing average salary with differential privacy
np.random.seed(42)
salaries = np.array([50000, 65000, 72000, 48000, 95000,
                     61000, 58000, 83000, 71000, 55000])

true_mean = np.mean(salaries)
# Sensitivity must come from a priori bounds on the data, not from the
# data's own max/min (which would itself leak private information).
# Here we assume salaries are known in advance to lie in
# [40000, 100000], so the mean can change by at most this much:
sensitivity = (100000 - 40000) / len(salaries)

print(f"True mean salary: ${true_mean:,.0f}")
print(f"Sensitivity: ${sensitivity:,.0f}")
print()

# Compare different epsilon values
for eps in [0.1, 1.0, 5.0, 10.0]:
    noisy_results = [
        laplace_mechanism(true_mean, sensitivity, eps)
        for _ in range(1000)
    ]
    avg_noisy = np.mean(noisy_results)
    std_noisy = np.std(noisy_results)
    print(f"epsilon={eps:5.1f}: "
          f"mean=${avg_noisy:>10,.0f}, "
          f"std=${std_noisy:>10,.0f}")
```
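Randomized response, the fourth mechanism listed above, is simple enough to sketch end to end. The coin biases below are the classic 50/50 scheme (which satisfies epsilon = ln 3 differential privacy); the 30% true "yes" rate is an illustrative value, not from the lesson.

```python
import numpy as np

def randomized_response(truth: bool, rng: np.random.Generator) -> bool:
    """Flip a coin: heads, answer truthfully; tails, flip again
    and report that second coin instead."""
    if rng.random() < 0.5:       # first coin: answer truthfully
        return bool(truth)
    return rng.random() < 0.5    # second coin: random answer

rng = np.random.default_rng(0)
true_answers = rng.random(100_000) < 0.3   # 30%真 "yes" rate
reported = np.array([randomized_response(a, rng) for a in true_answers])

# P[report yes] = 0.5 * true_rate + 0.25, so the analyst can invert:
estimate = (reported.mean() - 0.25) / 0.5
print(f"Estimated true 'yes' rate: {estimate:.3f}")
```

No individual answer is trustworthy, yet the aggregate estimate recovers the population rate to within sampling error.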

Federated Learning

Federated learning trains ML models across multiple decentralized devices or organizations without sharing raw data. Each participant trains on their local data and shares only model updates (gradients).

How It Works

1. Central server sends the current model to all participants
2. Each participant trains the model on their local data
3. Participants send only model updates (gradients) back to the server
4. Server aggregates updates (e.g., FedAvg) to produce an improved global model
5. Repeat until convergence

Key Benefits

  • Data never leaves the device/organization
  • Enables collaboration between competitors or regulated entities
  • Naturally reduces data transfer costs

Challenges

  • Non-IID data: Each participant's data may have very different distributions
  • Communication overhead: Sending gradients for large models is expensive
  • Gradient attacks: Model updates can still leak information (see below)
```python
import numpy as np

def federated_averaging(client_models, client_sizes):
    """Federated Averaging (FedAvg) algorithm.

    Aggregates model parameters from multiple clients,
    weighted by the number of samples each client has.

    Args:
        client_models: List of model weight arrays (one per client)
        client_sizes: List of dataset sizes (one per client)

    Returns:
        Aggregated global model weights
    """
    total_samples = sum(client_sizes)
    # Weighted average of all client models
    global_model = np.zeros_like(client_models[0])
    for model, size in zip(client_models, client_sizes):
        weight = size / total_samples
        global_model += weight * model
    return global_model

def simulate_federated_learning(num_clients=5, num_rounds=10):
    """Simulate federated learning for a simple linear model."""
    np.random.seed(42)

    # True model: y = 3x + 2 + noise
    true_weights = np.array([3.0, 2.0])  # [slope, intercept]

    # Each client has different data (non-IID simulation)
    client_data = []
    client_sizes = []
    for i in range(num_clients):
        n = np.random.randint(50, 200)
        x = np.random.uniform(i * 2, i * 2 + 5, n)  # Different ranges
        y = 3 * x + 2 + np.random.normal(0, 1, n)
        client_data.append((x, y))
        client_sizes.append(n)

    # Initialize global model
    global_weights = np.array([0.0, 0.0])
    # Learning rate small enough to stay stable on the client with the
    # widest feature range (largest x values)
    lr = 0.005

    print("Federated Learning Simulation")
    print(f"True weights: {true_weights}")
    print(f"Clients: {num_clients}, Rounds: {num_rounds}")
    print(f"Client sizes: {client_sizes}\n")

    for round_num in range(num_rounds):
        client_models = []

        for i in range(num_clients):
            x, y = client_data[i]
            # Local training: 5 steps of gradient descent
            local_weights = global_weights.copy()
            for _ in range(5):
                preds = local_weights[0] * x + local_weights[1]
                errors = preds - y
                grad_w = 2 * np.mean(errors * x)
                grad_b = 2 * np.mean(errors)
                local_weights[0] -= lr * grad_w
                local_weights[1] -= lr * grad_b
            client_models.append(local_weights)

        # Aggregate with FedAvg
        global_weights = federated_averaging(
            client_models, client_sizes
        )

        if (round_num + 1) % 2 == 0:
            print(f"Round {round_num + 1:2d}: "
                  f"weights = [{global_weights[0]:.3f}, "
                  f"{global_weights[1]:.3f}]")

    print(f"\nFinal:  [{global_weights[0]:.3f}, {global_weights[1]:.3f}]")
    print(f"Target: [{true_weights[0]:.3f}, {true_weights[1]:.3f}]")

simulate_federated_learning()
```

Attacks on ML Models

Model Inversion Attacks

An attacker with access to a model's predictions reconstructs the training data. For example, given a facial recognition model, an attacker can generate approximate images of faces in the training set.

Membership Inference Attacks

An attacker determines whether a specific data point was in the training set. The key insight: models tend to be more confident on data they were trained on.
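That confidence gap can be demonstrated with a deliberately overfit toy model. The polynomial "victim" and the loss threshold below are illustrative choices, not a real attack pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# "Victim" model: a degree-7 polynomial that interpolates (memorizes)
# its 8 training points exactly.
x_train = rng.uniform(-1, 1, 8)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.1, 8)
coeffs = np.polyfit(x_train, y_train, 7)

# Fresh points from the same distribution, never seen in training.
x_out = rng.uniform(-1, 1, 200)
y_out = np.sin(3 * x_out) + rng.normal(0, 0.1, 200)

def sq_loss(x, y):
    return (np.polyval(coeffs, x) - y) ** 2

# Attack: flag a point as a training-set member if the model's
# loss on it is suspiciously low.
threshold = 1e-6
member_rate = np.mean(sq_loss(x_train, y_train) < threshold)
nonmember_rate = np.mean(sq_loss(x_out, y_out) < threshold)
print(f"Flagged as member: {member_rate:.0%} of training points, "
      f"{nonmember_rate:.0%} of held-out points")
```

The memorized points sit at near-zero loss, while held-out points carry at least the irreducible label noise, so a simple loss threshold separates them.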

Adversarial Examples

Carefully crafted inputs that look normal to humans but cause the model to make incorrect predictions. A tiny perturbation to an image can flip a classifier's prediction with high confidence.
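For a linear classifier the loss gradient with respect to the input is just the weight vector, so an FGSM-style perturbation can be sketched without any autodiff. The random weights and the epsilon value below are arbitrary illustrations:

```python
import numpy as np

# A toy linear "image classifier": score = w.x, class 1 if score > 0.
rng = np.random.default_rng(0)
w = rng.normal(0, 1, 784)      # weights for a 28x28 "image"

x = rng.uniform(0, 1, 784)     # a benign input
score = w @ x
label = int(score > 0)

# FGSM-style step: move every pixel by at most epsilon in the
# direction that pushes the score toward the opposite class.
epsilon = 0.05
x_adv = np.clip(x - epsilon * np.sign(w) * np.sign(score), 0, 1)
adv_score = w @ x_adv

print(f"Original score {score:+.2f} -> adversarial {adv_score:+.2f}")
print(f"Max per-pixel change: {np.abs(x_adv - x).max():.3f}")
print(f"Label flipped: {int(adv_score > 0) != label}")
```

Each pixel changes by at most 0.05, yet the perturbations all align with the weight vector, so their effect on the score accumulates across all 784 dimensions.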

Data Poisoning

An attacker injects malicious data into the training set to manipulate the model's behavior. This can create backdoors — e.g., a stop sign with a small sticker is classified as a speed limit sign.
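A tiny numeric illustration of poisoning, and of the robust aggregation defense listed in the table below: a single malicious update drags a mean-based aggregate arbitrarily far, while a median barely moves. The update values are made up for illustration.

```python
import numpy as np

# Honest client updates cluster around the true value (here, ~1.0);
# one poisoned client submits a huge value to drag the aggregate.
honest = np.array([0.9, 1.1, 1.0, 0.95, 1.05])
poisoned = np.append(honest, 100.0)

print(f"Mean aggregate:   {np.mean(poisoned):.2f}")   # pulled far off
print(f"Median aggregate: {np.median(poisoned):.2f}") # barely moves
```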

Defenses

| Attack               | Defense                                                                   |
|----------------------|---------------------------------------------------------------------------|
| Model inversion      | Differential privacy, output perturbation, limiting prediction confidence |
| Membership inference | Differential privacy, regularization, limiting query access               |
| Adversarial examples | Adversarial training, input preprocessing, certified defenses             |
| Data poisoning       | Data validation, anomaly detection, robust aggregation                    |

Secure Aggregation

In federated learning, secure aggregation ensures the server can compute the aggregate of client updates without seeing any individual client's update. This is achieved through cryptographic protocols (homomorphic encryption, secret sharing).
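One building block of these protocols, pairwise additive masking, can be sketched in a few lines. This is a simplified piece of the secret-sharing approach (real protocols also handle dropouts and key agreement); the update vectors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three clients, each holding a private update vector.
updates = [np.array([1.0, 2.0]), np.array([3.0, 1.0]), np.array([0.5, 0.5])]
n = len(updates)

# Each pair of clients i < j agrees on a shared random mask m_ij;
# client i adds it and client j subtracts it, so all masks cancel
# in the sum even though each masked vector looks random.
masks = {(i, j): rng.normal(0, 10, 2)
         for i in range(n) for j in range(i + 1, n)}

masked = []
for i in range(n):
    m = updates[i].copy()
    for j in range(n):
        if i < j:
            m += masks[(i, j)]
        elif j < i:
            m -= masks[(j, i)]
    masked.append(m)

# The server sees only masked vectors, yet recovers the exact sum.
server_sum = np.sum(masked, axis=0)
true_sum = np.sum(updates, axis=0)
print(f"Server aggregate: {server_sum}")
print(f"True aggregate:   {true_sum}")
```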

Data Anonymization Techniques

| Technique      | Description                                                                     | Limitation                              |
|----------------|---------------------------------------------------------------------------------|-----------------------------------------|
| k-anonymity    | Each record is indistinguishable from k-1 others                                | Vulnerable to attribute disclosure      |
| l-diversity    | Each group has at least l distinct sensitive values                             | Doesn't prevent probabilistic inference |
| t-closeness    | Distribution of sensitive values in each group is close to overall distribution | Computationally expensive               |
| Synthetic data | Generate artificial data that preserves statistical properties                  | May not capture rare patterns           |
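Checking k-anonymity reduces to finding the smallest quasi-identifier equivalence class. A minimal sketch, with made-up quasi-identifier tuples:

```python
from collections import Counter

# Quasi-identifier tuples (zip prefix, age bracket) for a toy dataset.
records = [
    ("021*", "20-29"), ("021*", "20-29"), ("021*", "20-29"),
    ("118*", "30-39"), ("118*", "30-39"),
]

def k_anonymity(quasi_ids):
    """Return the dataset's k: the size of the smallest group of
    records sharing identical quasi-identifier values."""
    return min(Counter(quasi_ids).values())

print(k_anonymity(records))  # smallest group has 2 records -> k = 2
```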

Anonymization Is Hard

Research has repeatedly shown that "anonymized" datasets can be re-identified. In one famous study, 87% of Americans could be uniquely identified from just zip code, gender, and birth date. True privacy requires mathematical guarantees like differential privacy, not just removing names and IDs.