Transfer Learning & Model Deployment · Page 2 of 2

Deployment & Production Considerations

Deploying Models in Production

Model Export Formats

PyTorch → ONNX → ONNX Runtime and other ONNX-compatible runtimes
TensorFlow → SavedModel → TensorFlow Serving
Keras → .h5 or SavedModel → TensorFlow/Keras tools (convert for other runtimes)

Quantization (Make Models Smaller)

Problem: Neural network weights are typically stored as 32-bit floats (4 bytes each).

Solution: Store them as 8-bit integers (1 byte each)!

Original: 100M parameters × 4 bytes = 400 MB
Quantized: 100M parameters × 1 byte = 100 MB  (4x smaller!)

Speed: Often 2-4x faster on mobile hardware with int8 support
Accuracy: Usually only a 0.5-1% drop, but measure it on your own validation set
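To make the idea concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration only: the function names and the sample weights are made up for this example, and real toolkits add calibration, per-channel scales, and quantized kernels that this sketch does not attempt.

```python
def quantize_symmetric_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from the integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale = quantize_symmetric_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage arithmetic from the text: 4 bytes vs 1 byte per parameter.
params = 100_000_000
size_fp32_mb = params * 4 / 1e6
size_int8_mb = params * 1 / 1e6

print(codes, f"max rounding error = {max_error:.4f}")
print(f"{size_fp32_mb:.0f} MB -> {size_int8_mb:.0f} MB")
```

The rounding error per weight stays within about half a quantization step, which is why the accuracy drop is usually small. The only extra storage is the one scale value per tensor.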

Batch Inference

Single prediction:

Input: One image → Model → Output
Latency: 100ms
Throughput: 1/100ms = 10 images/sec

Batch prediction:

Input: 32 images → Model → 32 outputs
Latency: 150ms (only 50% longer for 32x the work!)
Throughput: 32/150ms = 213 images/sec

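Huge efficiency gain: about 21x the throughput of single predictions, though every request now waits for the whole batch to finish.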
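The arithmetic above can be checked with a few lines of Python. This is just a sketch using the illustrative latencies from this section; the helper name is invented here, and real numbers depend on your model and hardware.

```python
def throughput_per_second(batch_size, latency_ms):
    """Items processed per second when one batch takes latency_ms."""
    return batch_size / (latency_ms / 1000.0)


single = throughput_per_second(1, 100)    # one image per 100 ms call
batched = throughput_per_second(32, 150)  # 32 images per 150 ms call
speedup = batched / single

print(f"single:  {single:.0f} images/sec")
print(f"batched: {batched:.0f} images/sec")
print(f"speedup: {speedup:.1f}x")
```

Batching trades latency for throughput, so it suits offline jobs and high-traffic services better than a single user waiting on one result.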

Monitoring in Production

Track:
- Accuracy on real data
- Inference latency
- Memory usage
- Error rates

Alert if:
- Accuracy drops (model drift)
- Latency spikes (resource issue)
- Error rate increases
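One way to wire up these alerts is a rolling window over recent requests. The sketch below is a hypothetical helper, not a specific monitoring library: the class name, the window size, and the default thresholds are all assumptions, and it presumes you eventually learn whether each prediction was correct.

```python
from collections import deque


class RollingMonitor:
    """Keep the most recent requests and report which alerts would fire."""

    def __init__(self, window=100, min_accuracy=0.95,
                 max_latency_ms=100.0, max_error_rate=0.01):
        self.records = deque(maxlen=window)  # old records drop off automatically
        self.min_accuracy = min_accuracy
        self.max_latency_ms = max_latency_ms
        self.max_error_rate = max_error_rate

    def record(self, correct, latency_ms, failed=False):
        self.records.append((correct, latency_ms, failed))

    def alerts(self):
        n = len(self.records)
        if n == 0:
            return []
        accuracy = sum(r[0] for r in self.records) / n
        mean_latency = sum(r[1] for r in self.records) / n
        error_rate = sum(r[2] for r in self.records) / n
        found = []
        if accuracy < self.min_accuracy:
            found.append("accuracy drop")
        if mean_latency > self.max_latency_ms:
            found.append("latency spike")
        if error_rate > self.max_error_rate:
            found.append("error rate increase")
        return found


monitor = RollingMonitor(window=4)
for _ in range(4):
    monitor.record(correct=True, latency_ms=80.0)
healthy = monitor.alerts()

# Four bad requests push the good ones out of the window.
for _ in range(4):
    monitor.record(correct=False, latency_ms=250.0, failed=True)
degraded = monitor.alerts()

print(healthy, degraded)
```

In practice the alert list would feed a paging or dashboard system, and memory usage would come from the host rather than from per-request records.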

Data Drift

Problem: Production data differs from the training data!

Training: 2020 data
Production: 2024 data (different distribution!)

The model's accuracy degrades over time.

Solution: Retrain periodically on new data
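To decide when retraining is due, you can compare a feature's distribution in production against training. The sketch below uses the Population Stability Index (PSI), which is one possible drift score and is not prescribed by this lesson. The function name, the bin count, and the synthetic data are assumptions, and the 0.25 cut-off mentioned afterwards is only a commonly quoted rule of thumb.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """PSI between a training sample (expected) and a production sample (actual)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1  # clamp out-of-range values
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Synthetic feature values, invented for illustration.
training = [i / 100 for i in range(100)]
production_same = list(training)
production_shifted = [0.5 + i / 100 for i in range(100)]

psi_same = population_stability_index(training, production_same)
psi_shifted = population_stability_index(training, production_shifted)

print(f"no drift: {psi_same:.3f}, shifted: {psi_shifted:.3f}")
```

A PSI near 0 means the two samples fill the bins in the same proportions. Values above roughly 0.25 are often treated as a signal to investigate or retrain.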

Deployment Platforms

Platform           | Use             | Notes
TensorFlow Serving | High-throughput | Google-maintained
TorchServe         | PyTorch models  | Easy setup
ONNX Runtime       | Any framework   | Lightweight
AWS SageMaker      | Managed service | Auto-scaling
Hugging Face       | NLP models      | One-click deploy

Production Checklist

✓ Model accuracy validated (>95%?)
✓ Tested on diverse data (edge cases?)
✓ Latency acceptable (<100ms?)
✓ Memory footprint reasonable (<100MB?)
✓ Quantized for mobile (if needed)
✓ Error handling implemented
✓ Monitoring set up
✓ Retraining pipeline ready
✓ Documentation complete
✓ Ethics/bias reviewed
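The measurable items on this checklist can be turned into an automated release gate. The sketch below is hypothetical: the function name, the metric keys, and the candidate's numbers are invented for illustration, and the default limits simply reuse the example thresholds from the checklist, which you should replace with your own targets.

```python
def release_blockers(metrics, min_accuracy=0.95,
                     max_latency_ms=100.0, max_size_mb=100.0):
    """Return the checklist items that the candidate model fails."""
    blockers = []
    if metrics["accuracy"] < min_accuracy:
        blockers.append("accuracy below target")
    if metrics["latency_ms"] > max_latency_ms:
        blockers.append("latency above budget")
    if metrics["size_mb"] > max_size_mb:
        blockers.append("model too large")
    return blockers


# Made-up measurements for a candidate model.
candidate = {"accuracy": 0.962, "latency_ms": 140.0, "size_mb": 88.0}
blockers = release_blockers(candidate)

print(blockers or "ready to ship")
```

Items such as documentation and the ethics review still need a human sign-off, so a gate like this covers only part of the list.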

Common Pitfalls

Pitfall                     | How to Avoid
Deploying without testing   | Comprehensive test suite
Not handling edge cases     | Anomaly detection layer
Forgetting to log decisions | Log predictions; enable model explainability
Ignoring model drift        | Monitor metrics continuously
Brittle preprocessing       | Robust, versioned pipeline

Ethics & Fairness

Before deployment, ask:

  • ✓ Does the model work equally well for all groups?
  • ✓ Are predictions explainable?
  • ✓ Are there unintended biases?
  • ✓ Is the data used ethically?
  • ✓ Can users understand why they were rejected/approved?
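The first question can be checked numerically by comparing accuracy across groups. The following is a minimal sketch under stated assumptions: the group labels and records are invented, accuracy is only one of several possible fairness measures, and what counts as an acceptable gap is a decision for your team.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        hits[group] += int(y_true == y_pred)
    return {group: hits[group] / totals[group] for group in totals}


# Invented evaluation records for two hypothetical groups.
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 1),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 0, 1),
]
per_group = accuracy_by_group(records)
gap = max(per_group.values()) - min(per_group.values())

print(per_group, f"gap = {gap:.2f}")
```

A large gap, as in this toy data, suggests the model serves one group noticeably worse and deserves a closer look before deployment.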