
Model Deployment and Production

📚 Lesson 6 of 15 ⏱️ 80 min

Deploying ML models to production requires careful attention to scalability, reliability, and maintainability. Production systems must handle real-world traffic, sustain performance over time, and adapt to changing data. Deployment involves serving models via APIs, managing infrastructure, monitoring performance, and rolling out updates. Unlike research code, production ML demands robust software engineering practices.

Model versioning tracks different versions of a model, enabling rollback, comparison, and reproducibility. A version covers the model artifacts (weights, architecture), training code, data version, and hyperparameters. Tools such as MLflow, DVC, and model registries help manage versions, which is essential for debugging and compliance.
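To make this concrete, here is a minimal sketch of a file-based model registry. `ModelRegistry` is a hypothetical helper written for this lesson, not the API of MLflow or DVC; it just illustrates the core idea of storing each artifact alongside its hyperparameters and a content hash for integrity checks.

```python
import hashlib
import json
import pickle
from pathlib import Path

class ModelRegistry:
    """Toy file-based registry: stores each model artifact together
    with its hyperparameters and a content hash (illustrative only)."""

    def __init__(self, root="registry"):
        self.root = Path(root)
        self.root.mkdir(exist_ok=True)
        self.index = self.root / "index.json"
        if not self.index.exists():
            self.index.write_text("[]")

    def register(self, model, params):
        # Serialize the artifact and fingerprint it for integrity checks
        blob = pickle.dumps(model)
        digest = hashlib.sha256(blob).hexdigest()[:12]
        entries = json.loads(self.index.read_text())
        version = len(entries) + 1
        (self.root / f"v{version}.pkl").write_bytes(blob)
        entries.append({"version": version, "hash": digest, "params": params})
        self.index.write_text(json.dumps(entries, indent=2))
        return version

    def load(self, version):
        return pickle.loads((self.root / f"v{version}.pkl").read_bytes())

registry = ModelRegistry()
v1 = registry.register({"weights": [0.1, 0.2]}, {"lr": 0.01})
v2 = registry.register({"weights": [0.3, 0.4]}, {"lr": 0.001})
print(f"registered versions {v1} and {v2}")
```

Because every version is kept on disk with its hyperparameters, rollback is just `registry.load(v1)`; a real registry would add the training-code commit and data version as well.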

Model monitoring tracks model performance, data drift, and system health in production. Key signals include prediction accuracy, latency, throughput, error rates, and shifts in the input data distribution. Data drift occurs when production data diverges from the training data, silently degrading performance, so detecting it quickly is critical for maintaining production ML systems.
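One common way to check a single feature for drift is a two-sample Kolmogorov–Smirnov test between a training-time reference sample and a recent production batch. The sketch below uses synthetic data with a deliberate shift; the 0.01 significance threshold is an illustrative choice, not a standard.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Reference sample of one feature, captured at training time
training_feature = rng.normal(loc=0.0, scale=1.0, size=5000)

# Recent production batch whose distribution has shifted
production_feature = rng.normal(loc=0.5, scale=1.0, size=5000)

# KS test: small p-value means the two samples likely come
# from different distributions, i.e. the feature has drifted
statistic, p_value = ks_2samp(training_feature, production_feature)
drift_detected = p_value < 0.01
print(f"KS statistic={statistic:.3f}, p-value={p_value:.2e}, drift={drift_detected}")
```

In practice you would run a check like this per feature on a schedule and alert when drift is detected, since accuracy labels often arrive too late to catch problems directly.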

A/B testing compares model versions to determine which performs better in production. Traffic is split between versions and metrics (accuracy, business KPIs) are measured for each. Statistical significance testing then determines whether observed differences are real rather than noise, enabling data-driven decisions about model updates and safe rollouts.
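The significance check for a binary metric (e.g. prediction correct vs. incorrect) can be done with a two-proportion z-test. The counts below are made up for illustration; only the test statistic and p-value computation carry over to a real experiment.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical experiment: model A got 780/1000 predictions right,
# model B got 830/1000 on its share of the traffic
z, p = two_proportion_z_test(780, 1000, 830, 1000)
print(f"z={z:.2f}, p={p:.4f}")
```

If `p` falls below the pre-chosen significance level (commonly 0.05), the difference is unlikely to be noise and model B can be promoted; otherwise keep collecting traffic or keep model A.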

Containerization (e.g. Docker) packages a model and its dependencies into a portable container, ensuring consistency between development and production environments. Cloud platforms (AWS SageMaker, Google AI Platform, Azure ML) provide managed services for model deployment, scaling, and monitoring, abstracting away much of the infrastructure complexity.
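A Dockerfile for a service like the Flask exercise below might look like this sketch. The file names (`app.py`, `model.pkl`, `requirements.txt`) are assumptions matching the exercise, and gunicorn is one common choice of production WSGI server, not the only one.

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the serving code and the trained model artifact
COPY app.py model.pkl ./

EXPOSE 5000
# Serve with gunicorn rather than Flask's built-in dev server
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "app:app"]
```

Building with `docker build -t model-server .` and running with `docker run -p 5000:5000 model-server` gives the same environment on a laptop and in the cloud.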

Best practices include serving models behind APIs, implementing proper error handling and logging, designing for horizontal scalability, using canary deployments for gradual rollouts, monitoring performance continuously, and keeping rollback procedures ready. Production ML requires both ML expertise and software engineering skill.
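The canary-routing idea can be sketched in a few lines: hash each user id into a bucket so a fixed percentage of users is deterministically routed to the new model. `route_request` is a hypothetical helper for this lesson; real systems usually do this at the load balancer or service mesh.

```python
import hashlib

def route_request(user_id, canary_percent=10):
    """Deterministically route ~canary_percent% of users to the canary
    model by hashing the user id into a 0-99 bucket. The same user
    always sees the same version, keeping the experiment consistent."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_percent else "stable"

# Simulate 1000 users and measure the actual canary share
routes = [route_request(f"user-{i}") for i in range(1000)]
canary_share = routes.count("canary") / len(routes)
print(f"canary share: {canary_share:.1%}")
```

If the canary's monitored metrics hold up, `canary_percent` is gradually raised to 100; if they degrade, setting it to 0 is an instant rollback.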

Key Concepts

  • Production ML requires scalability, reliability, and maintainability.
  • Model versioning tracks model history and enables reproducibility.
  • Model monitoring detects performance issues and data drift.
  • A/B testing compares model versions safely.
  • Containerization and cloud platforms simplify deployment.

Learning Objectives

Master

  • Deploying ML models to production
  • Implementing model versioning and monitoring
  • Using A/B testing for model updates
  • Containerizing and deploying models on cloud platforms

Develop

  • Production ML thinking
  • Understanding ML system engineering
  • Designing reliable, scalable ML systems

Tips

  • Monitor models continuously in production—performance can degrade.
  • Use versioning for all model artifacts and code.
  • Implement A/B testing before fully deploying new models.
  • Use containerization for consistent deployments.

Common Pitfalls

  • Not monitoring models, missing performance degradation.
  • Not versioning models, making rollback impossible.
  • Deploying models without testing, causing production issues.
  • Not designing for scalability, causing system failures.

Summary

  • Production ML requires careful engineering for scalability and reliability.
  • Model versioning, monitoring, and A/B testing are essential.
  • Containerization and cloud platforms simplify deployment.
  • These practices turn trained models into reliable production systems.
  • Production ML requires both ML and software engineering expertise.

Exercise

Create a simple Flask API for serving a machine learning model.

from flask import Flask, request, jsonify
import pickle
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

app = Flask(__name__)

# Generate sample data and train a model
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Save the model (in production you would load a versioned artifact
# at startup rather than training inside the serving process)
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Get and validate the request payload
        data = request.get_json()
        features = np.array(data['features']).reshape(1, -1)
        if features.shape[1] != model.n_features_in_:
            return jsonify({'error': f'expected {model.n_features_in_} features',
                            'status': 'error'}), 400
        
        # Make prediction
        prediction = model.predict(features)[0]
        probability = model.predict_proba(features)[0].tolist()
        
        return jsonify({
            'prediction': int(prediction),
            'probability': probability,
            'status': 'success'
        })
    
    except Exception as e:
        return jsonify({
            'error': str(e),
            'status': 'error'
        }), 400

@app.route('/health', methods=['GET'])
def health():
    return jsonify({'status': 'healthy'})

if __name__ == '__main__':
    # debug=True is for local development only; use a production
    # WSGI server (e.g. gunicorn) when deploying
    app.run(debug=True, host='0.0.0.0', port=5000)

# Test the API — run this in a separate script or terminal while the
# server is running (the app.run() call above blocks this process)
import requests

test_data = {
    'features': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
}

response = requests.post('http://localhost:5000/predict', json=test_data)
print(response.json())
