ML Infrastructure & Platforms
Production ML systems rely on specialized infrastructure beyond standard web services. In this lesson, you'll learn about the key infrastructure components that make ML systems reliable, consistent, and scalable: feature stores, model registries, vector databases, metadata stores, and infrastructure as code.
Feature Stores
A feature store is a centralized system for defining, computing, storing, and serving ML features. It solves one of the most common and insidious problems in production ML: training-serving skew.
The Training-Serving Skew Problem
Training time:
  raw_data → pandas → feature_engineering.py → model.fit()

Serving time:
  raw_data → Java backend → different_feature_code → model.predict()

Result: subtle bugs. Features computed differently at serving time → degraded model performance.
A feature store ensures the exact same feature definitions are used for both training and serving.
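To make the failure mode concrete, here is a toy sketch (hypothetical feature and window boundary, not from any real pipeline) of how two "equivalent" implementations can silently disagree:

```python
# Toy illustration of training-serving skew: the same feature
# ("average transaction amount over the last 30 days") implemented
# twice, with a subtle off-by-one in the window boundary.
import pandas as pd

txns = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0],
    "days_ago": [5, 25, 30],
})

# Training pipeline: window is "strictly less than 30 days ago"
train_value = txns.loc[txns["days_ago"] < 30, "amount"].mean()

# Serving reimplementation: window is "30 days ago or less"
serve_value = txns.loc[txns["days_ago"] <= 30, "amount"].mean()

print(train_value, serve_value)  # 15.0 20.0 — same "feature", different values
```

The model was trained on one distribution of values and is scored on another, and nothing crashes: the bug only shows up as quietly worse predictions.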
Feast (Feature Store)
Feast is the most popular open-source feature store. You define features declaratively in Python, and Feast serves the same definitions from an offline store (for training) and an online store (for low-latency inference):
```python
# feature_repo/features.py — Feast feature definitions

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

# Define the entity (the "who" for feature lookups)
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    description="Unique customer identifier"
)

# Define a data source
customer_transactions_source = FileSource(
    path="data/customer_transactions.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp"
)

# Define a feature view (a group of related features)
customer_transaction_features = FeatureView(
    name="customer_transactions",
    entities=[customer],
    ttl=timedelta(days=90),  # Features expire after 90 days
    schema=[
        Field(name="total_transactions_30d", dtype=Int64),
        Field(name="avg_transaction_amount_30d", dtype=Float32),
        Field(name="max_transaction_amount_30d", dtype=Float32),
        Field(name="transaction_count_7d", dtype=Int64),
        Field(name="unique_merchants_30d", dtype=Int64),
    ],
    source=customer_transactions_source,
    online=True,  # Materialize to online store for serving
)
```

```python
# Using Feast for training and serving
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# --- OFFLINE: Get features for training ---
# Point-in-time correct join — no data leakage!
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2024-01-15", "2024-01-15", "2024-01-15"
    ])
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_transactions:total_transactions_30d",
        "customer_transactions:avg_transaction_amount_30d",
        "customer_transactions:transaction_count_7d",
    ]
).to_df()

print("Training features:")
print(training_df)

# --- ONLINE: Get features for serving ---
# Materialize features to the online store first
store.materialize_incremental(end_date=datetime.now())

# Then retrieve for a single customer at serving time (low latency)
online_features = store.get_online_features(
    features=[
        "customer_transactions:total_transactions_30d",
        "customer_transactions:avg_transaction_amount_30d",
        "customer_transactions:transaction_count_7d",
    ],
    entity_rows=[{"customer_id": 1001}]
).to_dict()

print("\nOnline features for customer 1001:")
print(online_features)
```

Model Registries
A model registry is a central catalog for managing the lifecycle of trained models. It provides versioning, staging, approval workflows, and deployment tracking.
MLflow Model Registry
Developer trains model
│
▼
Register in MLflow ──► Version 1 (None)
│
▼
Promote to Staging ──► Version 1 (Staging)
│
▼
Run validation tests
│
▼
Promote to Production ──► Version 1 (Production)
│
▼ (new model trained)
Register new version ──► Version 2 (None)
│
▼
Version 1 still in Production
Version 2 in Staging for testing
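The "run validation tests" step in the diagram is often just a small gate script that compares the candidate's metrics against the current production baseline. A minimal sketch, with hypothetical metric names and thresholds:

```python
# Hypothetical validation gate run before promoting Staging → Production.
# Metric names and thresholds are illustrative, not from any real system.
def validate(metrics: dict, baseline: dict) -> tuple[bool, dict]:
    checks = {
        # Allow a small tolerance below the baseline accuracy
        "accuracy_ok": metrics["accuracy"] >= baseline["accuracy"] - 0.005,
        # Hard latency budget for serving
        "latency_ok": metrics["p99_latency_ms"] <= 200,
    }
    return all(checks.values()), checks

ok, checks = validate(
    metrics={"accuracy": 0.956, "p99_latency_ms": 145},
    baseline={"accuracy": 0.951},
)
print(ok)  # True → safe to transition to Production
```

If any check fails, the promotion simply doesn't happen and the previous production version keeps serving traffic.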
Vertex AI Model Registry (Google Cloud)
Vertex AI Model Registry is Google's managed equivalent: on top of versioning it offers model aliases, deployment to Vertex AI endpoints, and integration with Vertex AI's evaluation and monitoring tooling, so you trade flexibility for less operational work.
Back in open source, the MLflow promotion workflow looks like this in code:
```python
# MLflow Model Registry — promotion workflow
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a model from a training run
model_name = "fraud-detector"
run_id = "abc123def456"

# Register version 1
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/model",
    name=model_name
)
print(f"Registered {model_name} v{result.version}")

# Add description and tags
client.update_model_version(
    name=model_name,
    version=result.version,
    description="XGBoost fraud classifier trained on 2024-Q1 data"
)
client.set_model_version_tag(
    name=model_name, version=result.version,
    key="training_dataset", value="fraud_2024_q1"
)
client.set_model_version_tag(
    name=model_name, version=result.version,
    key="accuracy", value="0.956"
)

# Promote through stages (note: stages are deprecated in MLflow 2.9+
# in favor of model version aliases, but remain widely used)
client.transition_model_version_stage(
    name=model_name, version=result.version, stage="Staging"
)
print(f"Moved v{result.version} to Staging")

# After validation...
client.transition_model_version_stage(
    name=model_name, version=result.version, stage="Production"
)
print(f"Promoted v{result.version} to Production!")

# Load the production model for serving
prod_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
prediction = prod_model.predict(test_features)  # test_features: a DataFrame of feature columns
```

Vector Databases
A vector database stores and indexes high-dimensional vectors (embeddings) for fast similarity search, letting you retrieve items by meaning rather than by exact keyword match.
When and Why to Use a Vector Database
| Use Case | Why Vectors? |
|---|---|
| Semantic search | Query "budget travel tips" matches "affordable vacation ideas" |
| RAG (LLM context) | Find relevant documents to include in an LLM prompt |
| Recommendations | Find items with similar embeddings to user preferences |
| Deduplication | Find near-duplicate documents or images |
| Anomaly detection | Find data points far from any cluster |
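What makes all of these use cases work is that an embedding model maps semantically similar items to nearby vectors. A minimal sketch with made-up 4-dimensional toy embeddings (real models produce hundreds of dimensions):

```python
# Toy illustration of similarity search: rank documents by cosine
# similarity to a query vector. The embeddings here are hand-made
# 4-dim vectors purely for illustration.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = {
    "budget travel tips":        np.array([0.9, 0.8, 0.1, 0.0]),
    "affordable vacation ideas": np.array([0.8, 0.9, 0.2, 0.1]),
    "quarterly earnings report": np.array([0.1, 0.0, 0.9, 0.8]),
}

query = np.array([0.85, 0.85, 0.1, 0.05])  # embedding of "cheap holidays"

# Rank documents from most to least similar to the query
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # both travel documents rank above the earnings report
```

A vector database does exactly this ranking, but over millions of vectors, using approximate-nearest-neighbor indexes instead of a brute-force scan.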
Popular Vector Databases
| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Production RAG, scale without ops |
| Chroma | Open-source, lightweight | Prototyping, small-medium scale |
| pgvector | PostgreSQL extension | Already using Postgres, moderate scale |
| Weaviate | Open-source, full-featured | Hybrid search (vector + keyword) |
| Qdrant | Open-source, high-performance | Large-scale, filtering + search |
```python
# --- Chroma: Lightweight vector database ---
import chromadb

client = chromadb.Client()

# Create a collection
collection = client.create_collection(
    name="documents",
    metadata={"description": "Government policy documents"}
)

# Add documents (Chroma auto-embeds with a default model)
collection.add(
    documents=[
        "The housing assistance program provides subsidies for low-income families.",
        "Veterans are eligible for enhanced healthcare benefits.",
        "The SNAP program provides food assistance to qualifying households.",
        "Section 8 vouchers help families afford rental housing.",
        "Medicare covers hospital stays and medical services for seniors.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
    metadatas=[
        {"department": "housing", "year": 2024},
        {"department": "veterans", "year": 2024},
        {"department": "agriculture", "year": 2024},
        {"department": "housing", "year": 2024},
        {"department": "health", "year": 2024},
    ]
)

# Semantic search — finds relevant documents by meaning
results = collection.query(
    query_texts=["affordable housing for families"],
    n_results=3
)

print("Query: 'affordable housing for families'")
for doc, dist in zip(results['documents'][0], results['distances'][0]):
    print(f"  [{dist:.3f}] {doc}")

# Filter + search
results = collection.query(
    query_texts=["healthcare coverage"],
    n_results=2,
    where={"department": "health"}
)
```

```python
# --- pgvector: Vector search in PostgreSQL ---
# SQL to set up pgvector
"""
CREATE EXTENSION vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384),  -- 384-dim embeddings
    department TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create an index for fast similarity search
CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Insert a document with its embedding
INSERT INTO documents (content, embedding, department)
VALUES (
    'Housing assistance for low-income families',
    '[0.1, 0.2, 0.3, ...]'::vector,
    'housing'
);

-- Similarity search (cosine distance)
SELECT content, 1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
"""

# Python with psycopg2 and pgvector
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dim embeddings, matching vector(384)
model = SentenceTransformer("all-MiniLM-L6-v2")

conn = psycopg2.connect("dbname=mydb")
register_vector(conn)

cur = conn.cursor()

# Search for similar documents
query_embedding = model.encode("affordable housing programs")
cur.execute(
    "SELECT content, 1 - (embedding <=> %s) AS similarity "
    "FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (query_embedding, query_embedding)
)

for content, similarity in cur.fetchall():
    print(f"[{similarity:.3f}] {content}")
```

Metadata Stores
A metadata store tracks the lineage and provenance of every artifact in your ML system: which raw dataset produced which features, which features and hyperparameters produced which model, and which model version generated which predictions.
ML Metadata (MLMD), used by TFX, is the most common open-source metadata store. It records artifacts (datasets, models, metrics), executions (the pipeline steps that produced them), and the input/output relationships between the two, so any artifact can be traced back to its origins.
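MLMD's typed API is beyond the scope of this lesson, but the core idea — artifacts, executions, and the edges between them — can be sketched with nothing more than a relational store. This is a toy stand-in for illustration, not MLMD's actual schema:

```python
# Toy lineage store: artifacts, executions, and input/output events,
# mirroring the concepts a real metadata store (e.g. MLMD) records.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifacts  (id INTEGER PRIMARY KEY, kind TEXT, uri TEXT);
CREATE TABLE executions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (
    execution_id INTEGER,
    artifact_id  INTEGER,
    direction    TEXT      -- 'input' or 'output'
);
""")

# Record: a training run consumed a dataset and produced a model
conn.execute("INSERT INTO artifacts VALUES (1, 'dataset', 'gs://data/fraud_2024_q1')")
conn.execute("INSERT INTO artifacts VALUES (2, 'model', 'models:/fraud-detector/1')")
conn.execute("INSERT INTO executions VALUES (1, 'train_fraud_model')")
conn.execute("INSERT INTO events VALUES (1, 1, 'input')")
conn.execute("INSERT INTO events VALUES (1, 2, 'output')")

# Lineage query: which dataset produced this model?
row = conn.execute("""
    SELECT a_in.uri
    FROM events e_out
    JOIN events e_in    ON e_in.execution_id = e_out.execution_id
                       AND e_in.direction = 'input'
    JOIN artifacts a_in ON a_in.id = e_in.artifact_id
    WHERE e_out.artifact_id = 2 AND e_out.direction = 'output'
""").fetchone()
print(row[0])  # gs://data/fraud_2024_q1
```

Answering "which data trained the model currently in production?" becomes a graph traversal over these edges, which is exactly what you need during an incident or an audit.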
Infrastructure as Code for ML
Treat ML infrastructure like software infrastructure — define it in code, version it, and automate it.
```python
# Terraform-style infrastructure as code for ML (pseudocode)
# In practice, this would be HCL (.tf files) or Pulumi (Python)

ml_infrastructure = {
    "feature_store": {
        "provider": "feast",
        "offline_store": {"type": "bigquery", "project": "my-project"},
        "online_store": {"type": "redis", "host": "redis.internal:6379"},
        "registry": {"type": "gcs", "path": "gs://ml-registry/feast/"},
    },
    "model_registry": {
        "provider": "mlflow",
        "backend_store": "postgresql://mlflow:pass@db:5432/mlflow",
        "artifact_store": "gs://ml-artifacts/mlflow/",
    },
    "serving": {
        "provider": "kubernetes",
        "gpu_type": "nvidia-t4",
        "min_replicas": 2,
        "max_replicas": 20,
        "autoscaling": {
            "target_cpu": 70,
            "target_latency_ms": 200,
        },
    },
    "monitoring": {
        "prometheus": {"retention_days": 30},
        "grafana": {"dashboards": ["model-performance", "data-drift"]},
        "evidently": {"drift_check_schedule": "0 * * * *"},  # Hourly
    },
    "vector_database": {
        "provider": "pgvector",
        "host": "postgres.internal:5432",
        "index_type": "ivfflat",
        "dimensions": 384,
    },
}

# In a real project, this would be applied with:
#   terraform apply
# or
#   pulumi up
```