ML Infrastructure & Platforms
Production ML systems rely on specialized infrastructure beyond standard web services. In this lesson, you'll learn about the key infrastructure components that make ML systems reliable, consistent, and scalable: feature stores, model registries, vector databases, metadata stores, and infrastructure as code.
Feature Stores
A feature store is a centralized system for defining, computing, storing, and serving ML features. It solves one of the most common and insidious problems in production ML: training-serving skew.
The Training-Serving Skew Problem
Training time:
  raw_data → pandas → feature_engineering.py → model.fit()

Serving time:
  raw_data → Java backend → different_feature_code → model.predict()

Result: subtle bugs. Features computed differently at serving time → degraded model performance.
A feature store ensures the exact same feature definitions are used for both training and serving.
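To make the failure mode concrete, here is a toy sketch (hypothetical feature and window boundary, not from any real pipeline) of how two "equivalent" implementations can silently disagree:

```python
# Toy illustration of training-serving skew: the same feature
# ("average transaction amount over the last 30 days") implemented
# twice, with a subtle off-by-one in the window boundary.
import pandas as pd

txns = pd.DataFrame({
    "amount": [10.0, 20.0, 30.0],
    "days_ago": [5, 25, 30],
})

# Training pipeline: window is "strictly less than 30 days ago"
train_value = txns.loc[txns["days_ago"] < 30, "amount"].mean()

# Serving reimplementation: window is "30 days ago or less"
serve_value = txns.loc[txns["days_ago"] <= 30, "amount"].mean()

print(train_value, serve_value)  # 15.0 20.0 — same "feature", different values
```

The model was trained on one distribution of values and is scored on another, and nothing crashes: the bug only shows up as quietly worse predictions.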
Feast (Feature Store)
Feast is the most popular open-source feature store. You define features declaratively in Python, and Feast serves the same definitions from an offline store (for training) and an online store (for low-latency inference):
```python
# feature_repo/features.py — Feast feature definitions

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Float32, Int64
from datetime import timedelta

# Define the entity (the "who" for feature lookups)
customer = Entity(
    name="customer_id",
    join_keys=["customer_id"],
    description="Unique customer identifier"
)

# Define a data source
customer_transactions_source = FileSource(
    path="data/customer_transactions.parquet",
    timestamp_field="event_timestamp",
    created_timestamp_column="created_timestamp"
)

# Define a feature view (a group of related features)
customer_transaction_features = FeatureView(
    name="customer_transactions",
    entities=[customer],
    ttl=timedelta(days=90),  # Features expire after 90 days
    schema=[
        Field(name="total_transactions_30d", dtype=Int64),
        Field(name="avg_transaction_amount_30d", dtype=Float32),
        Field(name="max_transaction_amount_30d", dtype=Float32),
        Field(name="transaction_count_7d", dtype=Int64),
        Field(name="unique_merchants_30d", dtype=Int64),
    ],
    source=customer_transactions_source,
    online=True,  # Materialize to online store for serving
)
```

```python
# Using Feast for training and serving
from datetime import datetime

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo/")

# --- OFFLINE: Get features for training ---
# Point-in-time correct join — no data leakage!
entity_df = pd.DataFrame({
    "customer_id": [1001, 1002, 1003],
    "event_timestamp": pd.to_datetime([
        "2024-01-15", "2024-01-15", "2024-01-15"
    ])
})

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_transactions:total_transactions_30d",
        "customer_transactions:avg_transaction_amount_30d",
        "customer_transactions:transaction_count_7d",
    ]
).to_df()

print("Training features:")
print(training_df)

# --- ONLINE: Get features for serving ---
# Materialize features to the online store first
store.materialize_incremental(end_date=datetime.now())

# Then retrieve for a single customer at serving time (low latency)
online_features = store.get_online_features(
    features=[
        "customer_transactions:total_transactions_30d",
        "customer_transactions:avg_transaction_amount_30d",
        "customer_transactions:transaction_count_7d",
    ],
    entity_rows=[{"customer_id": 1001}]
).to_dict()

print("\nOnline features for customer 1001:")
print(online_features)
```

Model Registries
A model registry is a central catalog for managing the lifecycle of trained models. It provides versioning, staging, approval workflows, and deployment tracking.
MLflow Model Registry
Developer trains model
│
▼
Register in MLflow ──► Version 1 (None)
│
▼
Promote to Staging ──► Version 1 (Staging)
│
▼
Run validation tests
│
▼
Promote to Production ──► Version 1 (Production)
│
▼ (new model trained)
Register new version ──► Version 2 (None)
│
▼
Version 1 still in Production
Version 2 in Staging for testing
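The "run validation tests" step in the diagram is often just a small gate script that compares the candidate's metrics against the current production baseline. A minimal sketch, with hypothetical metric names and thresholds:

```python
# Hypothetical validation gate run before promoting Staging → Production.
# Metric names and thresholds are illustrative, not from any real system.
def validate(metrics: dict, baseline: dict) -> tuple[bool, dict]:
    checks = {
        # Allow a small tolerance below the baseline accuracy
        "accuracy_ok": metrics["accuracy"] >= baseline["accuracy"] - 0.005,
        # Hard latency budget for serving
        "latency_ok": metrics["p99_latency_ms"] <= 200,
    }
    return all(checks.values()), checks

ok, checks = validate(
    metrics={"accuracy": 0.956, "p99_latency_ms": 145},
    baseline={"accuracy": 0.951},
)
print(ok)  # True → safe to transition to Production
```

If any check fails, the promotion simply doesn't happen and the previous production version keeps serving traffic.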
Vertex AI Model Registry (Google Cloud)
Vertex AI Model Registry is Google's managed equivalent: on top of versioning it offers model aliases, deployment to Vertex AI endpoints, and integration with Vertex AI's evaluation and monitoring tooling, so you trade flexibility for less operational work.
Back in open source, the MLflow promotion workflow looks like this in code:
```python
# MLflow Model Registry — promotion workflow
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

# Register a model from a training run
model_name = "fraud-detector"
run_id = "abc123def456"

# Register version 1
result = mlflow.register_model(
    model_uri=f"runs:/{run_id}/model",
    name=model_name
)
print(f"Registered {model_name} v{result.version}")

# Add description and tags
client.update_model_version(
    name=model_name,
    version=result.version,
    description="XGBoost fraud classifier trained on 2024-Q1 data"
)
client.set_model_version_tag(
    name=model_name, version=result.version,
    key="training_dataset", value="fraud_2024_q1"
)
client.set_model_version_tag(
    name=model_name, version=result.version,
    key="accuracy", value="0.956"
)

# Promote through stages (note: stages are deprecated in MLflow 2.9+
# in favor of model version aliases, but remain widely used)
client.transition_model_version_stage(
    name=model_name, version=result.version, stage="Staging"
)
print(f"Moved v{result.version} to Staging")

# After validation...
client.transition_model_version_stage(
    name=model_name, version=result.version, stage="Production"
)
print(f"Promoted v{result.version} to Production!")

# Load the production model for serving
prod_model = mlflow.pyfunc.load_model(f"models:/{model_name}/Production")
prediction = prod_model.predict(test_features)  # test_features: a DataFrame of feature columns
```

Vector Databases
A vector database stores and indexes high-dimensional vectors (embeddings) for fast similarity search, letting you retrieve items by meaning rather than by exact keyword match.
When and Why to Use a Vector Database
| Use Case | Why Vectors? |
|---|---|
| Semantic search | Query "budget travel tips" matches "affordable vacation ideas" |
| RAG (LLM context) | Find relevant documents to include in an LLM prompt |
| Recommendations | Find items with similar embeddings to user preferences |
| Deduplication | Find near-duplicate documents or images |
| Anomaly detection | Find data points far from any cluster |
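What makes all of these use cases work is that an embedding model maps semantically similar items to nearby vectors. A minimal sketch with made-up 4-dimensional toy embeddings (real models produce hundreds of dimensions):

```python
# Toy illustration of similarity search: rank documents by cosine
# similarity to a query vector. The embeddings here are hand-made
# 4-dim vectors purely for illustration.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = {
    "budget travel tips":        np.array([0.9, 0.8, 0.1, 0.0]),
    "affordable vacation ideas": np.array([0.8, 0.9, 0.2, 0.1]),
    "quarterly earnings report": np.array([0.1, 0.0, 0.9, 0.8]),
}

query = np.array([0.85, 0.85, 0.1, 0.05])  # embedding of "cheap holidays"

# Rank documents from most to least similar to the query
ranked = sorted(docs, key=lambda d: cosine_similarity(query, docs[d]), reverse=True)
print(ranked)  # both travel documents rank above the earnings report
```

A vector database does exactly this ranking, but over millions of vectors, using approximate-nearest-neighbor indexes instead of a brute-force scan.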
Popular Vector Databases
| Database | Type | Best For |
|---|---|---|
| Pinecone | Managed cloud | Production RAG, scale without ops |
| Chroma | Open-source, lightweight | Prototyping, small-medium scale |
| pgvector | PostgreSQL extension | Already using Postgres, moderate scale |
| Weaviate | Open-source, full-featured | Hybrid search (vector + keyword) |
| Qdrant | Open-source, high-performance | Large-scale, filtering + search |
```python
# --- Chroma: Lightweight vector database ---
import chromadb

client = chromadb.Client()

# Create a collection
collection = client.create_collection(
    name="documents",
    metadata={"description": "Government policy documents"}
)

# Add documents (Chroma auto-embeds with a default model)
collection.add(
    documents=[
        "The housing assistance program provides subsidies for low-income families.",
        "Veterans are eligible for enhanced healthcare benefits.",
        "The SNAP program provides food assistance to qualifying households.",
        "Section 8 vouchers help families afford rental housing.",
        "Medicare covers hospital stays and medical services for seniors.",
    ],
    ids=["doc1", "doc2", "doc3", "doc4", "doc5"],
    metadatas=[
        {"department": "housing", "year": 2024},
        {"department": "veterans", "year": 2024},
        {"department": "agriculture", "year": 2024},
        {"department": "housing", "year": 2024},
        {"department": "health", "year": 2024},
    ]
)

# Semantic search — finds relevant documents by meaning
results = collection.query(
    query_texts=["affordable housing for families"],
    n_results=3
)

print("Query: 'affordable housing for families'")
for doc, dist in zip(results['documents'][0], results['distances'][0]):
    print(f"  [{dist:.3f}] {doc}")

# Filter + search
results = collection.query(
    query_texts=["healthcare coverage"],
    n_results=2,
    where={"department": "health"}
)
```

```python
# --- pgvector: Vector search in PostgreSQL ---
# SQL to set up pgvector
"""
CREATE EXTENSION vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding vector(384),  -- 384-dim embeddings
    department TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Create an index for fast similarity search
CREATE INDEX ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Insert a document with its embedding
INSERT INTO documents (content, embedding, department)
VALUES (
    'Housing assistance for low-income families',
    '[0.1, 0.2, 0.3, ...]'::vector,
    'housing'
);

-- Similarity search (cosine distance)
SELECT content, 1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 5;
"""

# Python with psycopg2 and pgvector
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 produces 384-dim embeddings, matching vector(384)
model = SentenceTransformer("all-MiniLM-L6-v2")

conn = psycopg2.connect("dbname=mydb")
register_vector(conn)

cur = conn.cursor()

# Search for similar documents
query_embedding = model.encode("affordable housing programs")
cur.execute(
    "SELECT content, 1 - (embedding <=> %s) AS similarity "
    "FROM documents ORDER BY embedding <=> %s LIMIT 5",
    (query_embedding, query_embedding)
)

for content, similarity in cur.fetchall():
    print(f"[{similarity:.3f}] {content}")
```

Metadata Stores
A metadata store tracks the lineage and provenance of every artifact in your ML system: which raw dataset produced which features, which features and hyperparameters produced which model, and which model version generated which predictions.
ML Metadata (MLMD), used by TFX, is the most common open-source metadata store. It records artifacts (datasets, models, metrics), executions (the pipeline steps that produced them), and the input/output relationships between the two, so any artifact can be traced back to its origins.
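MLMD's typed API is beyond the scope of this lesson, but the core idea — artifacts, executions, and the edges between them — can be sketched with nothing more than a relational store. This is a toy stand-in for illustration, not MLMD's actual schema:

```python
# Toy lineage store: artifacts, executions, and input/output events,
# mirroring the concepts a real metadata store (e.g. MLMD) records.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artifacts  (id INTEGER PRIMARY KEY, kind TEXT, uri TEXT);
CREATE TABLE executions (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (
    execution_id INTEGER,
    artifact_id  INTEGER,
    direction    TEXT      -- 'input' or 'output'
);
""")

# Record: a training run consumed a dataset and produced a model
conn.execute("INSERT INTO artifacts VALUES (1, 'dataset', 'gs://data/fraud_2024_q1')")
conn.execute("INSERT INTO artifacts VALUES (2, 'model', 'models:/fraud-detector/1')")
conn.execute("INSERT INTO executions VALUES (1, 'train_fraud_model')")
conn.execute("INSERT INTO events VALUES (1, 1, 'input')")
conn.execute("INSERT INTO events VALUES (1, 2, 'output')")

# Lineage query: which dataset produced this model?
row = conn.execute("""
    SELECT a_in.uri
    FROM events e_out
    JOIN events e_in    ON e_in.execution_id = e_out.execution_id
                       AND e_in.direction = 'input'
    JOIN artifacts a_in ON a_in.id = e_in.artifact_id
    WHERE e_out.artifact_id = 2 AND e_out.direction = 'output'
""").fetchone()
print(row[0])  # gs://data/fraud_2024_q1
```

Answering "which data trained the model currently in production?" becomes a graph traversal over these edges, which is exactly what you need during an incident or an audit.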
Infrastructure as Code for ML
Treat ML infrastructure like software infrastructure — define it in code, version it, and automate it.
```python
# Terraform-style infrastructure as code for ML (pseudocode)
# In practice, this would be HCL (.tf files) or Pulumi (Python)

ml_infrastructure = {
    "feature_store": {
        "provider": "feast",
        "offline_store": {"type": "bigquery", "project": "my-project"},
        "online_store": {"type": "redis", "host": "redis.internal:6379"},
        "registry": {"type": "gcs", "path": "gs://ml-registry/feast/"},
    },
    "model_registry": {
        "provider": "mlflow",
        "backend_store": "postgresql://mlflow:pass@db:5432/mlflow",
        "artifact_store": "gs://ml-artifacts/mlflow/",
    },
    "serving": {
        "provider": "kubernetes",
        "gpu_type": "nvidia-t4",
        "min_replicas": 2,
        "max_replicas": 20,
        "autoscaling": {
            "target_cpu": 70,
            "target_latency_ms": 200,
        },
    },
    "monitoring": {
        "prometheus": {"retention_days": 30},
        "grafana": {"dashboards": ["model-performance", "data-drift"]},
        "evidently": {"drift_check_schedule": "0 * * * *"},  # Hourly
    },
    "vector_database": {
        "provider": "pgvector",
        "host": "postgres.internal:5432",
        "index_type": "ivfflat",
        "dimensions": 384,
    },
}

# In a real project, this would be applied with:
#   terraform apply
# or
#   pulumi up
```