Modern Forecasting Methods
The latest advances in time series forecasting leverage Transformer architectures, specialized neural network designs, and foundation models pre-trained on massive datasets.
Temporal Fusion Transformers (TFT)
TFT (Lim et al., Google, 2019) is an attention-based architecture designed specifically for multi-horizon forecasting. It combines recurrent layers for local processing, interpretable multi-head attention for long-range dependencies, gated residual networks, variable selection networks, and quantile outputs.
TFT excels when you have:
- Multiple related series with static metadata (e.g., store or product identifiers)
- Known future inputs such as holidays or planned promotions
- A need for interpretability (attention weights and variable-importance scores)
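To give a flavor of TFT's building blocks, here is a minimal NumPy sketch of its core unit, the gated residual network (GRN). The dimensions, initialization, and single-vector interface are illustrative simplifications, not the paper's exact configuration:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize a vector to zero mean and unit variance."""
    return (x - x.mean()) / (x.std() + eps)

def gated_residual_network(x, params):
    """
    Simplified TFT gated residual network:
    GRN(x) = LayerNorm(x + gate * linear(ELU(linear(x)))).
    The sigmoid gate lets the network suppress the nonlinear
    branch entirely when it is unhelpful.
    """
    W1, b1, W2, b2, Wg, bg = params
    eta = x @ W2 + b2
    eta = np.where(eta > 0, eta, np.exp(eta) - 1)   # ELU
    a = eta @ W1 + b1                               # linear branch
    g = 1.0 / (1.0 + np.exp(-(eta @ Wg + bg)))      # sigmoid gate
    return layer_norm(x + g * a)                    # gated skip + norm

# Demo with d = 8 hidden units (illustrative sizes)
rng = np.random.default_rng(0)
d = 8
params = (rng.normal(0, 0.1, (d, d)), np.zeros(d),
          rng.normal(0, 0.1, (d, d)), np.zeros(d),
          rng.normal(0, 0.1, (d, d)), np.zeros(d))
x = rng.normal(size=d)
out = gated_residual_network(x, params)
print(out.shape)  # same dimensionality as the input
```

The residual-plus-gate design is why TFT can be stacked deep while letting individual components shut themselves off per input.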
N-BEATS (Neural Basis Expansion)
N-BEATS (Oreshkin et al., 2019) is a pure deep learning architecture, built from stacks of fully connected blocks with backcast/forecast residual connections, that achieved state-of-the-art results on the M3 and M4 forecasting benchmarks.
Architecture
```python
import numpy as np

class NBEATSBlock:
    """A single N-BEATS block (simplified)."""

    def __init__(self, input_dim, hidden_dim, backcast_dim, forecast_dim):
        self.input_dim = input_dim
        self.backcast_dim = backcast_dim
        self.forecast_dim = forecast_dim

        scale = np.sqrt(2.0 / hidden_dim)

        # Fully connected layers
        self.W1 = np.random.randn(input_dim, hidden_dim) * scale
        self.b1 = np.zeros(hidden_dim)
        self.W2 = np.random.randn(hidden_dim, hidden_dim) * scale
        self.b2 = np.zeros(hidden_dim)

        # Backcast and forecast heads
        self.W_back = np.random.randn(hidden_dim, backcast_dim) * scale
        self.b_back = np.zeros(backcast_dim)
        self.W_fore = np.random.randn(hidden_dim, forecast_dim) * scale
        self.b_fore = np.zeros(forecast_dim)

    def forward(self, x):
        """Forward pass returning backcast and forecast."""
        h = np.maximum(0, x @ self.W1 + self.b1)  # ReLU
        h = np.maximum(0, h @ self.W2 + self.b2)
        backcast = h @ self.W_back + self.b_back
        forecast = h @ self.W_fore + self.b_fore
        return backcast, forecast


class SimpleNBEATS:
    """Simplified N-BEATS model."""

    def __init__(self, input_dim, forecast_dim, n_blocks=3, hidden_dim=64):
        self.blocks = [
            NBEATSBlock(input_dim, hidden_dim, input_dim, forecast_dim)
            for _ in range(n_blocks)
        ]

    def forward(self, x):
        """
        Process input through all blocks with residual learning.
        Each block sees the residual from previous blocks.
        """
        residual = x.copy()
        total_forecast = np.zeros(self.blocks[0].forecast_dim)

        for block in self.blocks:
            backcast, forecast = block.forward(residual)
            residual = residual - backcast              # Subtract what was explained
            total_forecast = total_forecast + forecast  # Add this block's contribution

        return total_forecast


# Demo
input_dim = 30     # Look-back window
forecast_dim = 10  # Forecast horizon

model = SimpleNBEATS(input_dim, forecast_dim, n_blocks=3, hidden_dim=32)

# Test with a sample input
x = np.random.randn(input_dim)
forecast = model.forward(x)
print(f"Input shape: {x.shape}")
print(f"Forecast shape: {forecast.shape}")
print(f"Forecast: {np.round(forecast, 3)}")
```

PatchTST (Patch Time Series Transformer)
PatchTST (Nie et al., 2023) adapts Vision Transformer ideas to time series:
1. Patching: Divide the time series into non-overlapping patches (e.g., 16 consecutive points = 1 patch)
2. Patch embedding: Project each patch to an embedding vector
3. Transformer encoder: Apply self-attention across patches
4. Channel independence: Process each variable independently
Attending over patches rather than individual time steps cuts the attention cost from O(n^2) to O((n/p)^2) for patch length p, and each token carries richer local context than a single point.
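The patching step itself is just a reshape plus a linear projection. A minimal NumPy sketch, with illustrative patch length, embedding size, and weights:

```python
import numpy as np

rng = np.random.default_rng(1)

# A univariate series of length 96, split into patches of 16 points
series = rng.normal(size=96)
patch_len = 16
n_patches = len(series) // patch_len      # 6 patches

patches = series.reshape(n_patches, patch_len)

# Linear patch embedding: each patch becomes one d-dimensional token
d_model = 32
W_embed = rng.normal(0, 0.1, (patch_len, d_model))
tokens = patches @ W_embed                # one token per patch

print(patches.shape)  # (6, 16)
print(tokens.shape)   # (6, 32)

# Self-attention now operates over 6 tokens instead of 96 points:
# roughly 6^2 = 36 versus 96^2 = 9216 attention scores.
```

In a multivariate setting, channel independence means this same pipeline runs once per variable with shared weights.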
Foundation Models for Time Series
Just as GPT revolutionized NLP, foundation models are emerging for time series:
TimeGPT (Nixtla)
A proprietary foundation model served via API, offering zero-shot forecasting and anomaly detection on arbitrary series.
Chronos (Amazon)
Scales and quantizes series values into discrete tokens and trains T5-family language models on the token sequences; forecasts are produced by sampling continuations.
Lag-Llama
An open-source decoder-only Transformer for probabilistic forecasting that builds its features from lagged values of the series.
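To make the "language model over time series" idea concrete, here is a toy sketch of Chronos-style value tokenization: mean-scale the series, then map each value into a fixed vocabulary of bins. In the real system the resulting token sequence is fed to a T5-style model; bin count and scaling bounds here are illustrative:

```python
import numpy as np

def tokenize(series, n_bins=128, low=-5.0, high=5.0):
    """Mean-scale the series, then map each value to a bin index (token)."""
    scale = np.mean(np.abs(series)) + 1e-8
    scaled = series / scale
    bins = np.linspace(low, high, n_bins + 1)
    tokens = np.clip(np.digitize(scaled, bins) - 1, 0, n_bins - 1)
    return tokens, scale, bins

def detokenize(tokens, scale, bins):
    """Map tokens back to real values via bin centers."""
    centers = (bins[:-1] + bins[1:]) / 2
    return centers[tokens] * scale

rng = np.random.default_rng(2)
series = np.sin(np.arange(64) / 6.0) * 3 + rng.normal(0, 0.1, 64)

tokens, scale, bins = tokenize(series)
recovered = detokenize(tokens, scale, bins)
print(tokens[:8])                          # integer token IDs
print(np.max(np.abs(series - recovered)))  # small quantization error
```

Once values are tokens, forecasting reduces to next-token prediction, which is exactly what pre-trained language models are good at.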
Probabilistic Forecasting
A point forecast hides uncertainty; predicting several quantiles with the pinball loss yields prediction intervals directly:
```python
import numpy as np

class QuantileForecaster:
    """
    Quantile regression for probabilistic forecasting.
    Predicts multiple quantiles to form prediction intervals.
    """

    def __init__(self, input_dim, quantiles=(0.1, 0.5, 0.9)):
        self.quantiles = quantiles
        self.models = {}

        for q in quantiles:
            # Separate linear model for each quantile
            self.models[q] = {
                'W': np.random.randn(input_dim, 1) * 0.01,
                'b': np.zeros(1),
            }

    def quantile_loss(self, y_true, y_pred, q):
        """Pinball loss for quantile regression."""
        errors = y_true - y_pred
        return np.mean(np.maximum(q * errors, (q - 1) * errors))

    def fit(self, X, y, epochs=200, lr=0.001):
        """Train each quantile model separately with gradient descent."""
        for q in self.quantiles:
            W = self.models[q]['W']
            b = self.models[q]['b']

            for _ in range(epochs):
                pred = (X @ W + b).flatten()
                errors = y - pred

                # Gradient of the pinball loss w.r.t. predictions
                grad = np.where(errors >= 0, -q, -(q - 1))
                grad_W = (X.T @ grad.reshape(-1, 1)) / len(y)
                grad_b = np.mean(grad)

                W -= lr * grad_W
                b -= lr * grad_b

            self.models[q]['W'] = W
            self.models[q]['b'] = b

    def predict(self, X):
        """Predict all quantiles."""
        predictions = {}
        for q in self.quantiles:
            W = self.models[q]['W']
            b = self.models[q]['b']
            predictions[q] = (X @ W + b).flatten()
        return predictions


# Demo: probabilistic forecast
np.random.seed(42)
n = 300
t = np.arange(n, dtype=float)
y = 2 * np.sin(t / 10) + t * 0.01 + np.random.randn(n) * 0.5

# Create features (lagged values)
window = 10
X = np.array([y[i:i+window] for i in range(n - window)])
targets = y[window:]

# Split
split = 250
X_train, X_test = X[:split], X[split:]
y_train, y_test = targets[:split], targets[split:]

# Train quantile forecaster
model = QuantileForecaster(input_dim=window, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9))
model.fit(X_train, y_train, epochs=300, lr=0.001)

# Predict
preds = model.predict(X_test)

# Evaluate coverage
p10, p50, p90 = preds[0.1], preds[0.5], preds[0.9]
coverage_80 = np.mean((y_test >= p10) & (y_test <= p90))
mae_median = np.mean(np.abs(y_test - p50))

print(f"80% interval coverage: {coverage_80*100:.1f}% (target: 80%)")
print(f"Median forecast MAE: {mae_median:.4f}")
print(f"Average interval width: {np.mean(p90 - p10):.4f}")
```

Choosing the Right Method
| Method | Best For | Data Requirement | Interpretability |
|---|---|---|---|
| ARIMA/SARIMA | Single series, clear patterns | Small-medium | High |
| Prophet | Business data with holidays | Medium | High |
| LSTM/GRU | Complex nonlinear patterns | Large | Low |
| TCN | Long sequences, need speed | Large | Low |
| N-BEATS | Pure forecasting benchmark | Large | Medium |
| TFT | Multi-series with metadata | Very large | Medium |
| PatchTST | Long-context multivariate | Large | Low |
| Foundation models | Zero/few-shot scenarios | Pre-trained | Low |