Linear & Logistic Regression
Linear models are the workhorses of machine learning. They are fast, interpretable, and surprisingly powerful. Understanding them deeply gives you a foundation for understanding all other models.
Linear Regression
Linear regression models the relationship between features and a continuous target as a weighted sum:
**y = w1\*x1 + w2\*x2 + ... + wn\*xn + b**

Where:

- y is the predicted value
- x1 ... xn are the input features
- w1 ... wn are the learned weights (coefficients)
- b is the bias (intercept)
Ordinary Least Squares (OLS)
The most common approach minimizes the Mean Squared Error (MSE):
MSE = (1/n) * sum((y_pred - y_actual)^2)
This is called Ordinary Least Squares because it minimizes the sum of squared residuals. There are two ways to solve it:
1. **Normal Equation**: Closed-form solution (fast for small datasets)
2. **Gradient Descent**: Iterative optimization (scales to large datasets)
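As a sketch of the first approach, the normal equation w = (XᵀX)⁻¹Xᵀy can be written directly in NumPy. This is a minimal illustration on hypothetical noiseless data; production solvers (including scikit-learn's) use more numerically stable routines such as `np.linalg.lstsq`:

```python
import numpy as np

# Noiseless synthetic data: y = 2*x1 - 3*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 5

# Append a column of ones so the bias b is learned as an extra weight
X_b = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equation: solve (X^T X) w = X^T y
w = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

print(w)  # approximately [2, -3, 5]
```

Because the data here is noiseless, the recovered weights match the generating coefficients up to floating-point error.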
Linear Regression in scikit-learn
```python
from sklearn.linear_model import LinearRegression
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np

# Generate synthetic regression data
X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_:.4f}")
print(f"MSE: {mse:.4f}")
print(f"R2 Score: {r2:.4f}")
```

Regularization: Ridge, Lasso, and ElasticNet
Plain linear regression can overfit when you have many features or correlated features. Regularization adds a penalty term to the cost function that discourages large coefficients.
Ridge Regression (L2 Regularization)
Adds the sum of squared coefficients to the cost:
**Cost = MSE + alpha \* sum(w^2)**
Lasso Regression (L1 Regularization)
Adds the sum of absolute coefficients to the cost:
**Cost = MSE + alpha \* sum(|w|)**
ElasticNet (L1 + L2 Combined)
**Cost = MSE + alpha \* (l1_ratio \* sum(|w|) + (1 - l1_ratio) \* sum(w^2))**
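To make the penalty terms concrete, here is the arithmetic for a small hypothetical weight vector, following the formulas above (note that scikit-learn's internal ElasticNet parameterization scales the terms slightly differently):

```python
import numpy as np

w = np.array([3.0, -0.5, 0.0])  # example coefficients

l1 = np.sum(np.abs(w))   # 3.5  -> Lasso penalty term
l2 = np.sum(w ** 2)      # 9.25 -> Ridge penalty term

# ElasticNet penalty with alpha=1.0, l1_ratio=0.5
alpha, l1_ratio = 1.0, 0.5
enet = alpha * (l1_ratio * l1 + (1 - l1_ratio) * l2)

print(l1, l2, enet)  # 3.5 9.25 6.375
```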
L1 vs L2 Regularization

The key practical difference: the L1 penalty can drive coefficients exactly to zero (performing implicit feature selection), while the L2 penalty only shrinks them smoothly toward zero.
```python
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
import numpy as np

# Compare regularization methods (reusing X_train/X_test from above)
models = {
    "Linear": LinearRegression(),
    "Ridge (L2)": Ridge(alpha=1.0),
    "Lasso (L1)": Lasso(alpha=1.0),
    "ElasticNet": ElasticNet(alpha=1.0, l1_ratio=0.5),
}

print(f"{'Model':<18} {'R2':>8} {'Non-zero coefs':>16}")
print("-" * 44)
for name, model in models.items():
    model.fit(X_train, y_train)
    r2 = model.score(X_test, y_test)
    non_zero = np.sum(np.abs(model.coef_) > 1e-6)
    print(f"{name:<18} {r2:>8.4f} {non_zero:>16}")
```

Logistic Regression
Despite its name, logistic regression is a classification algorithm. It uses the sigmoid function to map any real number to a probability between 0 and 1:
sigma(z) = 1 / (1 + e^(-z))
Where z = w\*x + b (just like linear regression). The sigmoid "squashes" the output:

- Large positive z -> probability near 1
- Large negative z -> probability near 0
- z = 0 -> probability exactly 0.5
The model predicts class 1 if the probability > 0.5, and class 0 otherwise.
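A quick NumPy sketch of this squashing behavior. Since sigma(z) > 0.5 exactly when z > 0, the 0.5 probability threshold corresponds to the sign of z:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
probs = sigmoid(z)

print(probs.round(4))           # values squashed into (0, 1)
preds = (probs > 0.5).astype(int)
print(preds)                    # [0 0 0 1 1]
```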
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train logistic regression
model = LogisticRegression(max_iter=5000, random_state=42)
model.fit(X_train, y_train)

# Predict probabilities and classes
y_prob = model.predict_proba(X_test)[:5]  # First 5 probabilities
y_pred = model.predict(X_test)

print("First 5 predicted probabilities [class 0, class 1]:")
for i, probs in enumerate(y_prob):
    print(f"  Sample {i}: [{probs[0]:.4f}, {probs[1]:.4f}] -> class {y_pred[i]}")

print(f"\nAccuracy: {accuracy_score(y_test, y_pred):.4f}")
print(f"\n{classification_report(y_test, y_pred)}")
```

Multi-Class Classification
For more than two classes, logistic regression extends via the softmax function:
P(class_k) = e^(z_k) / sum(e^(z_j) for all j)
Softmax ensures all class probabilities sum to 1.
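A minimal softmax sketch (using the standard max-subtraction trick for numerical stability, which does not change the result):

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating to avoid overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])  # raw scores (logits) for 3 classes
probs = softmax(z)

print(probs.round(4))  # largest logit gets the largest probability
print(probs.sum())     # 1.0
```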
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Multi-class with softmax: LogisticRegression uses the multinomial
# (softmax) formulation by default with the lbfgs solver
model = LogisticRegression(solver="lbfgs", max_iter=200, random_state=42)
model.fit(X_train, y_train)

# Predict probabilities for all 3 classes
sample_probs = model.predict_proba(X_test[:3])
class_names = load_iris().target_names

print("Predicted probabilities:")
for i, probs in enumerate(sample_probs):
    print(f"  Sample {i}: {dict(zip(class_names, probs.round(4)))}")

print(f"\nAccuracy: {model.score(X_test, y_test):.4f}")
```