AI System Design Patterns

Architect end-to-end ML systems with proven design patterns

Building a production AI system is much more than training a model. It requires designing data pipelines, serving infrastructure, monitoring systems, and governance frameworks that all work together. In this lesson, you'll learn the design patterns and architectural decisions behind real-world AI systems.

The Hidden 90% of ML Systems

Only about 5-10% of a production ML system is the model code itself. The rest is data pipelines, feature engineering, serving infrastructure, monitoring, configuration, and testing. Most of the engineering effort goes into designing the system around the model.

Build vs Buy vs Fine-Tune

The first architectural decision is whether to build a custom model, buy an off-the-shelf solution, or fine-tune a pre-trained model.

| Factor | Build from Scratch | Fine-Tune Pre-Trained | Buy / API |
| --- | --- | --- | --- |
| Data required | Large (10K-1M+ labeled) | Moderate (100-10K labeled) | None or few-shot |
| Time to deploy | Months | Weeks | Days |
| Cost | High (compute + team) | Moderate | Per-request pricing |
| Customization | Full control | Moderate | Limited |
| Maintenance | Full responsibility | Moderate | Vendor handles |
| Data privacy | Data stays in-house | Data stays in-house | Data sent to vendor |
| Best when | Unique problem, large data, competitive edge | Good pre-trained base exists | Commodity task, fast time-to-market |
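The cost rows above can be turned into a rough breakeven estimate between per-request API pricing and a fixed self-hosting bill. Every number below is purely hypothetical:

```python
# Hypothetical numbers for a build-vs-buy breakeven estimate.
api_cost_per_request = 0.002      # vendor pricing, $/request
self_host_monthly = 3000.0        # GPU instances + ops overhead, $/month
requests_per_month = 5_000_000

# Monthly API bill at this volume (≈ $10,000 here).
api_monthly = api_cost_per_request * requests_per_month
print(f"API: ${api_monthly:,.0f}/mo vs self-host: ${self_host_monthly:,.0f}/mo")

# Volume at which the fixed self-hosting cost matches the API bill.
breakeven = self_host_monthly / api_cost_per_request
print(f"Self-hosting pays off above {breakeven:,.0f} requests/month")
```

The real decision also weighs the non-cost rows (privacy, maintenance, customization), but a quick breakeven like this often settles the commodity-task case on its own.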

Decision Framework

Is this a commodity task (translation, OCR, sentiment)?
  YES → Use an API / buy
  NO  → Does a strong pre-trained model exist for your domain?
        YES → Fine-tune it
        NO  → Build from scratch
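The decision tree above can be sketched as a small helper function. The `Problem` record and its field names are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Problem:
    is_commodity_task: bool        # e.g. translation, OCR, sentiment
    pretrained_base_exists: bool   # strong pre-trained model in your domain?

def choose_strategy(p: Problem) -> str:
    """Mirror the build / fine-tune / buy decision tree."""
    if p.is_commodity_task:
        return "buy/API"
    if p.pretrained_base_exists:
        return "fine-tune"
    return "build"

print(choose_strategy(Problem(is_commodity_task=True, pretrained_base_exists=False)))
```

In practice the branches are judgment calls rather than booleans, but encoding the default path keeps teams from reaching for "build from scratch" first.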

Inference Patterns: Online vs Batch vs Near-Real-Time

| Pattern | Latency | Throughput | Cost | Use Case |
| --- | --- | --- | --- | --- |
| Online (Real-Time) | <100 ms | Per-request | High (always-on) | Fraud detection, recommendations, chatbots |
| Batch | Hours | Millions at once | Low (scheduled) | Credit scoring, report generation, ETL |
| Near-Real-Time (Streaming) | Seconds to minutes | Continuous stream | Medium | Anomaly detection, IoT, live dashboards |

Online Inference Architecture

Client → Load Balancer → Model Server (GPU/CPU)
                              ↓
                        Feature Store (online) → cached features
                              ↓
                        Prediction → Response
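A toy sketch of this hot path, with plain dicts standing in for the online feature store, its cache, and the model server. All names, features, and scoring logic are hypothetical:

```python
import time

# Hypothetical online feature store and a hot-path cache in front of it.
FEATURE_STORE = {"user_42": {"txn_count_1h": 3, "avg_amount": 57.0}}
FEATURE_CACHE = {}

def get_features(entity_id: str) -> dict:
    """Check the cache first; fall back to the online feature store."""
    if entity_id not in FEATURE_CACHE:
        FEATURE_CACHE[entity_id] = FEATURE_STORE.get(entity_id, {})
    return FEATURE_CACHE[entity_id]

def model_predict(features: dict) -> float:
    """Placeholder scoring function standing in for a real model-server call."""
    return min(1.0, features.get("txn_count_1h", 0) * 0.1)

def handle_request(entity_id: str) -> dict:
    """One request through the hot path: features → prediction → response."""
    start = time.perf_counter()
    score = model_predict(get_features(entity_id))
    latency_ms = (time.perf_counter() - start) * 1000
    return {"score": score, "latency_ms": latency_ms}

print(handle_request("user_42"))
```

The design point the sketch illustrates: feature lookup, not model math, usually dominates online latency, which is why the cache sits directly in the request path.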

Batch Inference Architecture

Scheduler → Data Warehouse → Feature Pipeline → Model
                                                   ↓
                                            Prediction Store → Downstream Systems
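The same stages can be sketched as one scheduled batch run. The warehouse rows, feature transforms, and scoring rule below are all invented for illustration:

```python
# Toy "data warehouse" extract: one row per applicant (hypothetical schema).
warehouse_rows = [
    {"applicant_id": "A1", "income": 52000, "delinquencies": 0},
    {"applicant_id": "A2", "income": 31000, "delinquencies": 3},
]

def feature_pipeline(row: dict) -> dict:
    """Transform a raw warehouse row into model features."""
    return {"income_k": row["income"] / 1000, "delinquencies": row["delinquencies"]}

def model_score(features: dict) -> float:
    """Stand-in for a real credit-scoring model."""
    return max(0.0, 700 + features["income_k"] - 50 * features["delinquencies"])

def run_batch() -> list:
    """One scheduled run: extract → features → score → prediction store."""
    prediction_store = []
    for row in warehouse_rows:
        score = model_score(feature_pipeline(row))
        prediction_store.append({"applicant_id": row["applicant_id"], "score": score})
    return prediction_store

print(run_batch())
```

Because nothing here is latency-sensitive, the whole run can execute on cheap scheduled compute and write results once for downstream systems to read.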

Reference Architecture: Document Intelligence System

Let's design a complete system for classifying, extracting, and routing government documents.

                    ┌─────────────────────────────────────────────┐
                    │              Document Intelligence          │
                    ├─────────────────────────────────────────────┤
                    │                                             │
  Document Upload   │  ┌──────────┐    ┌──────────────┐          │
  ──────────────►   │  │   OCR    │───►│  Text Clean   │         │
                    │  └──────────┘    └──────┬───────┘          │
                    │                         │                   │
                    │              ┌──────────▼──────────┐        │
                    │              │   Feature Store      │       │
                    │              │  (embeddings, meta)  │       │
                    │              └──────────┬──────────┘        │
                    │                   ┌─────┴─────┐             │
                    │                   │           │             │
                    │          ┌────────▼──┐  ┌────▼────────┐    │
                    │          │ Classifier │  │  NER/Entity  │   │
                    │          │ (type/dept)│  │  Extraction  │   │
                    │          └────────┬──┘  └────┬────────┘    │
                    │                   └─────┬─────┘             │
                    │              ┌──────────▼──────────┐        │
                    │              │   Decision Engine    │       │
                    │              │  (routing + priority)│       │
                    │              └──────────┬──────────┘        │
                    │                         │                   │
                    │              ┌──────────▼──────────┐        │
                    │              │   Monitoring &       │       │
                    │              │   Audit Trail        │       │
                    │              └─────────────────────┘        │
                    └─────────────────────────────────────────────┘

Scaling layers in this architecture:

1. Data Layer: Scalable storage (S3/GCS), data versioning, partitioning
2. Training Layer: Distributed training, experiment tracking, model registry
3. Serving Layer: Auto-scaling model servers, load balancing, caching
4. Monitoring Layer: Drift detection, performance metrics, alerting
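As one concrete example from the monitoring layer, a minimal drift check can compare a live feature's mean against its training baseline with a simple z-test. The threshold and data below are illustrative, and production systems typically use richer tests (e.g. population stability index):

```python
import statistics

def mean_shift_alert(baseline: list, live: list, threshold: float = 3.0) -> bool:
    """Flag drift when the live mean deviates from the training baseline
    by more than `threshold` standard errors (a simple z-test sketch)."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    standard_error = sigma / len(live) ** 0.5
    z = abs(statistics.mean(live) - mu) / standard_error
    return z > threshold

baseline = [0.1 * i for i in range(100)]        # feature values at training time
drifted = [0.1 * i + 2.0 for i in range(100)]   # same feature, shifted in production
print(mean_shift_alert(baseline, drifted))       # shift triggers the alert
```

An alert like this would feed the alerting component of the monitoring layer, prompting retraining or investigation rather than acting automatically.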

AI Governance & Ethics for Government

Government AI systems have unique requirements beyond typical commercial applications. These are not optional — they are legal, ethical, and operational necessities.

Key Concerns

| Concern | Why It Matters | Example |
| --- | --- | --- |
| Bias & Fairness | Government decisions affect citizens' lives | A benefits-eligibility model that disadvantages certain demographics |
| Explainability | Decisions must be justifiable and auditable | A citizen denied a permit deserves to know why |
| Privacy | Government holds sensitive personal data | PII in training data must be protected (FISMA, FedRAMP) |
| Security | Models can be attacked (adversarial inputs, data poisoning) | An adversarial document crafted to fool a classifier |
| Compliance | Federal regulations require specific safeguards | NIST AI RMF, OMB AI guidance, Section 508 |
| Accessibility | Services must be accessible to all citizens | Section 508 requires AI-powered tools to be usable by people with disabilities |

The Government AI Checklist

Before deploying an AI system in a government context:

  • [ ] Bias testing across protected categories (race, gender, age, disability)
  • [ ] Explainability method implemented (SHAP, LIME, attention maps)
  • [ ] Privacy Impact Assessment (PIA) completed
  • [ ] Authority to Operate (ATO) obtained
  • [ ] Section 508 accessibility audit passed
  • [ ] Model card documented (purpose, data, metrics, limitations)
  • [ ] Audit trail enabled (who made what decision, when, based on what)
  • [ ] Human-in-the-loop for high-stakes decisions
  • [ ] Incident response plan for model failures
High-Stakes AI Requires Human Oversight

For government applications that affect people's rights, benefits, or liberty, always include a human-in-the-loop. The AI should recommend, not decide. A human reviewer should make the final call, with the AI's reasoning visible and auditable.
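The first checklist item, bias testing, can be illustrated with a minimal demographic-parity check: compare approval rates across protected groups and measure the gap. The group labels and decision data are invented, and real audits use multiple fairness metrics, not just this one:

```python
from collections import defaultdict

def approval_rates(decisions: list) -> dict:
    """Approval rate per group from (group, approved) pairs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def parity_gap(decisions: list) -> float:
    """Largest difference in approval rate between any two groups."""
    rates = approval_rates(decisions)
    return max(rates.values()) - min(rates.values())

# Hypothetical decisions: group A approved 80/100, group B approved 60/100.
decisions = [("A", 1)] * 80 + [("A", 0)] * 20 + [("B", 1)] * 60 + [("B", 0)] * 40
print(parity_gap(decisions))  # prints the gap between group approval rates
```

A gap this large would fail most fairness thresholds and, per the callout above, would route the affected decisions to a human reviewer while the model is investigated.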