AI System Design Patterns

Architect end-to-end ML systems with proven design patterns

Building a production AI system is much more than training a model. It requires designing data pipelines, serving infrastructure, monitoring systems, and governance frameworks that all work together. In this lesson, you'll learn the design patterns and architectural decisions behind real-world AI systems.

The Hidden 90% of ML Systems

Only about 5-10% of a production ML system is the model code itself. The rest is data pipelines, feature engineering, serving infrastructure, monitoring, configuration, and testing. Most of the engineering effort goes into designing the system around the model.

Build vs Buy vs Fine-Tune

The first architectural decision is whether to build a custom model, buy an off-the-shelf solution, or fine-tune a pre-trained model.

| Factor | Build from Scratch | Fine-Tune Pre-Trained | Buy / API |
| --- | --- | --- | --- |
| Data required | Large (10K-1M+ labeled) | Moderate (100-10K labeled) | None or few-shot |
| Time to deploy | Months | Weeks | Days |
| Cost | High (compute + team) | Moderate | Per-request pricing |
| Customization | Full control | Moderate | Limited |
| Maintenance | Full responsibility | Moderate | Vendor handles |
| Data privacy | Data stays in-house | Data stays in-house | Data sent to vendor |
| Best when | Unique problem, large data, competitive edge | Good pre-trained base exists | Commodity task, fast time-to-market |
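The cost rows above can be turned into a rough breakeven estimate between per-request API pricing and a fixed self-hosting bill. Every number below is purely hypothetical:

```python
# Hypothetical numbers for a build-vs-buy breakeven estimate.
api_cost_per_request = 0.002      # vendor pricing, $/request
self_host_monthly = 3000.0        # GPU instances + ops overhead, $/month
requests_per_month = 5_000_000

# Monthly API bill at this volume (≈ $10,000 here).
api_monthly = api_cost_per_request * requests_per_month
print(f"API: ${api_monthly:,.0f}/mo vs self-host: ${self_host_monthly:,.0f}/mo")

# Volume at which the fixed self-hosting cost matches the API bill.
breakeven = self_host_monthly / api_cost_per_request
print(f"Self-hosting pays off above {breakeven:,.0f} requests/month")
```

The real decision also weighs the non-cost rows (privacy, maintenance, customization), but a quick breakeven like this often settles the commodity-task case on its own.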

Decision Framework

Is this a commodity task (translation, OCR, sentiment)?
  YES → Use an API / buy
  NO  → Does a strong pre-trained model exist for your domain?
        YES → Fine-tune it
        NO  → Build from scratch
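The decision tree above can be sketched as a small helper function. The `Problem` record and its field names are made up for illustration:

```python
from dataclasses import dataclass

@dataclass
class Problem:
    is_commodity_task: bool        # e.g. translation, OCR, sentiment
    pretrained_base_exists: bool   # strong pre-trained model in your domain?

def choose_strategy(p: Problem) -> str:
    """Mirror the build / fine-tune / buy decision tree."""
    if p.is_commodity_task:
        return "buy/API"
    if p.pretrained_base_exists:
        return "fine-tune"
    return "build"

print(choose_strategy(Problem(is_commodity_task=True, pretrained_base_exists=False)))
```

In practice the branches are judgment calls rather than booleans, but encoding the default path keeps teams from reaching for "build from scratch" first.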

Inference Patterns: Online vs Batch vs Near-Real-Time

| Pattern | Latency | Throughput | Cost | Use Case |
| --- | --- | --- | --- | --- |
| Online (Real-Time) | <100 ms | Per-request | High (always-on) | Fraud detection, recommendations, chatbots |
| Batch | Hours | Millions at once | Low (scheduled) | Credit scoring, report generation, ETL |
| Near-Real-Time (Streaming) | Seconds to minutes | Continuous stream | Medium | Anomaly detection, IoT, live dashboards |

Online Inference Architecture

Client → Load Balancer → Model Server (GPU/CPU)
                              ↓
                        Feature Store (online) → cached features
                              ↓
                        Prediction → Response
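A toy sketch of this hot path, with plain dicts standing in for the online feature store, its cache, and the model server. All names, features, and scoring logic are hypothetical:

```python
import time

# Hypothetical online feature store and a hot-path cache in front of it.
FEATURE_STORE = {"user_42": {"txn_count_1h": 3, "avg_amount": 57.0}}
FEATURE_CACHE = {}

def get_features(entity_id: str) -> dict:
    """Check the cache first; fall back to the online feature store."""
    if entity_id not in FEATURE_CACHE:
        FEATURE_CACHE[entity_id] = FEATURE_STORE.get(entity_id, {})
    return FEATURE_CACHE[entity_id]

def model_predict(features: dict) -> float:
    """Placeholder scoring function standing in for a real model-server call."""
    return min(1.0, features.get("txn_count_1h", 0) * 0.1)

def handle_request(entity_id: str) -> dict:
    """One request through the hot path: features → prediction → response."""
    start = time.perf_counter()
    score = model_predict(get_features(entity_id))
    latency_ms = (time.perf_counter() - start) * 1000
    return {"score": score, "latency_ms": latency_ms}

print(handle_request("user_42"))
```

The design point the sketch illustrates: feature lookup, not model math, usually dominates online latency, which is why the cache sits directly in the request path.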

Batch Inference Architecture

Scheduler → Data Warehouse → Feature Pipeline → Model
                                                   ↓
                                            Prediction Store → Downstream Systems
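The same stages can be sketched as one scheduled batch run. The warehouse rows, feature transforms, and scoring rule below are all invented for illustration:

```python
# Toy "data warehouse" extract: one row per applicant (hypothetical schema).
warehouse_rows = [
    {"applicant_id": "A1", "income": 52000, "delinquencies": 0},
    {"applicant_id": "A2", "income": 31000, "delinquencies": 3},
]

def feature_pipeline(row: dict) -> dict:
    """Transform a raw warehouse row into model features."""
    return {"income_k": row["income"] / 1000, "delinquencies": row["delinquencies"]}

def model_score(features: dict) -> float:
    """Stand-in for a real credit-scoring model."""
    return max(0.0, 700 + features["income_k"] - 50 * features["delinquencies"])

def run_batch() -> list:
    """One scheduled run: extract → features → score → prediction store."""
    prediction_store = []
    for row in warehouse_rows:
        score = model_score(feature_pipeline(row))
        prediction_store.append({"applicant_id": row["applicant_id"], "score": score})
    return prediction_store

print(run_batch())
```

Because nothing here is latency-sensitive, the whole run can execute on cheap scheduled compute and write results once for downstream systems to read.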

Reference Architecture: Document Intelligence System

Let's design a complete system for classifying, extracting, and routing government documents.

                    ┌─────────────────────────────────────────────┐
                    │              Document Intelligence          │
                    ├─────────────────────────────────────────────┤
                    │                                             │
  Document Upload   │  ┌──────────┐    ┌──────────────┐          │
  ──────────────►   │  │   OCR    │───►│  Text Clean   │         │
                    │  └──────────┘    └──────┬───────┘          │
                    │                         │                   │
                    │              ┌──────────▼──────────┐        │
                    │              │   Feature Store      │       │
                    │              │  (embeddings, meta)  │       │
                    │              └──────────┬──────────┘        │
                    │                   ┌─────┴─────┐             │
                    │                   │           │             │
                    │          ┌────────▼──┐  ┌────▼────────┐    │
                    │          │ Classifier │  │  NER/Entity  │   │
                    │          │ (type/dept)│  │  Extraction  │   │
                    │          └────────┬──┘  └────┬────────┘    │
                    │                   └─────┬─────┘             │
                    │              ┌──────────▼──────────┐        │
                    │              │   Decision Engine    │       │
                    │              │  (routing + priority)│       │
                    │              └──────────┬──────────┘        │
                    │                         │                   │
                    │              ┌──────────▼──────────┐        │
                    │              │   Monitoring &       │       │
                    │              │   Audit Trail        │       │
                    │              └─────────────────────┘        │
                    └─────────────────────────────────────────────┘

Scaling layers in this architecture:

1. Data Layer: Scalable storage (S3/GCS), data versioning, partitioning
2. Training Layer: Distributed training, experiment tracking, model registry
3. Serving Layer: Auto-scaling model servers, load balancing, caching
4. Monitoring Layer: Drift detection, performance metrics, alerting
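As one concrete example from the monitoring layer, a minimal drift check can compare a live feature's mean against its training baseline with a simple z-test. The threshold and data below are illustrative, and production systems typically use richer tests (e.g. population stability index):

```python
import statistics

def mean_shift_alert(baseline: list, live: list, threshold: float = 3.0) -> bool:
    """Flag drift when the live mean deviates from the training baseline
    by more than `threshold` standard errors (a simple z-test sketch)."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    standard_error = sigma / len(live) ** 0.5
    z = abs(statistics.mean(live) - mu) / standard_error
    return z > threshold

baseline = [0.1 * i for i in range(100)]        # feature values at training time
drifted = [0.1 * i + 2.0 for i in range(100)]   # same feature, shifted in production
print(mean_shift_alert(baseline, drifted))       # shift triggers the alert
```

An alert like this would feed the alerting component of the monitoring layer, prompting retraining or investigation rather than acting automatically.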

AI Governance & Ethics for Government

Government AI systems have unique requirements beyond typical commercial applications. These are not optional — they are legal, ethical, and operational necessities.

Key Concerns

| Concern | Why It Matters | Example |
| --- | --- | --- |
| Bias & Fairness | Government decisions affect citizens' lives | A benefits-eligibility model that disadvantages certain demographics |
| Explainability | Decisions must be justifiable and auditable | A citizen denied a permit deserves to know why |
| Privacy | Government holds sensitive personal data | PII in training data must be protected (FISMA, FedRAMP) |
| Security | Models can be attacked (adversarial inputs, data poisoning) | An adversarial document crafted to fool a classifier |
| Compliance | Federal regulations require specific safeguards | NIST AI RMF, OMB AI guidance, Section 508 |
| Accessibility | Services must be accessible to all citizens | Section 508 requires AI-powered tools to be usable by people with disabilities |

The Government AI Checklist

Before deploying an AI system in a government context:

  • [ ] Bias testing across protected categories (race, gender, age, disability)
  • [ ] Explainability method implemented (SHAP, LIME, attention maps)
  • [ ] Privacy Impact Assessment (PIA) completed
  • [ ] Authority to Operate (ATO) obtained
  • [ ] Section 508 accessibility audit passed
  • [ ] Model card documented (purpose, data, metrics, limitations)
  • [ ] Audit trail enabled (who made what decision, when, based on what)
  • [ ] Human-in-the-loop for high-stakes decisions
  • [ ] Incident response plan for model failures
High-Stakes AI Requires Human Oversight

For government applications that affect people's rights, benefits, or liberty, always include a human-in-the-loop. The AI should recommend, not decide. A human reviewer should make the final call, with the AI's reasoning visible and auditable.
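The first checklist item, bias testing, can be illustrated with a minimal demographic-parity check: compare approval rates across protected groups and measure the gap. The group labels and decision data are invented, and real audits use multiple fairness metrics, not just this one:

```python
from collections import defaultdict

def approval_rates(decisions: list) -> dict:
    """Approval rate per group from (group, approved) pairs."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += ok
    return {g: approved[g] / totals[g] for g in totals}

def parity_gap(decisions: list) -> float:
    """Largest difference in approval rate between any two groups."""
    rates = approval_rates(decisions)
    return max(rates.values()) - min(rates.values())

# Hypothetical decisions: group A approved 80/100, group B approved 60/100.
decisions = [("A", 1)] * 80 + [("A", 0)] * 20 + [("B", 1)] * 60 + [("B", 0)] * 40
print(parity_gap(decisions))  # prints the gap between group approval rates
```

A gap this large would fail most fairness thresholds and, per the callout above, would route the affected decisions to a human reviewer while the model is investigated.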