Production Machine Learning (14 May 2023)

For the past two weeks, I’ve been taking courses on Kaggle and Coursera. Here are some notes I think might be useful:

Adapting to Data: Different kinds of data changes

- Change in distribution
- Change in dependencies and change in ingested data
- Code smells
- Model not updated to new data (cold start problem)
  - Use dynamic training (retrain regularly on fresh data)
  - Understand the model's limits
- Roll back to an old model with model versioning
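The rollback item can be sketched as a tiny in-memory registry that keeps every trained version plus a history of which one was serving, so rolling back just means returning to the previous entry. The `ModelRegistry` class and its methods are hypothetical, made up for this illustration (real deployments would use a model registry or serving system), and the "models" are placeholder strings.

```python
class ModelRegistry:
    """Hypothetical in-memory registry: stores every version and tracks which one serves."""

    def __init__(self):
        self._versions = {}  # version number -> model object
        self._history = []   # versions in the order they were promoted to serving

    def register(self, model):
        """Store a newly trained model and return its version number."""
        version = len(self._versions) + 1
        self._versions[version] = model
        return version

    def promote(self, version):
        """Make a registered version the serving one."""
        if version not in self._versions:
            raise KeyError(f"unknown version {version}")
        self._history.append(version)

    @property
    def serving_version(self):
        return self._history[-1] if self._history else None

    def serving_model(self):
        version = self.serving_version
        return None if version is None else self._versions[version]

    def rollback(self):
        """Return to the previously promoted version; old versions are never deleted."""
        if len(self._history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self._history.pop()
        return self._history[-1]


registry = ModelRegistry()
registry.promote(registry.register("model-v1"))
registry.promote(registry.register("model-v2"))
print(registry.serving_model())  # model-v2
registry.rollback()
print(registry.serving_model())  # model-v1
```

Keeping old versions around (instead of overwriting them) is what makes the rollback cheap.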
- Concept drift: change in P(Y|X), a shift in the underlying relationship between model inputs and outputs
- Data drift: change in P(X), a shift in the distribution of the input data (called covariate shift when P(Y|X) stays the same)
- Prediction shift: change in P(Ŷ), a shift in the distribution of the model's predictions
- Output shift (label shift): change in P(Y), a shift in the distribution of the labels
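These definitions can be made concrete by estimating each distribution from two windows of labeled data (for example, training data vs. recent serving data) and measuring how far it moved. This is a minimal sketch assuming a single categorical feature `x`, a binary label `y`, and the L-infinity distance (the largest per-category probability difference), similar in spirit to the check TensorFlow Data Validation applies to categorical features. The helper names are made up for illustration. Prediction shift P(Ŷ) can be measured the same way by passing model predictions to `distribution`.

```python
from collections import Counter


def distribution(values):
    """Empirical probability of each distinct value, e.g. an estimate of P(X)."""
    counts = Counter(values)
    total = len(values)  # assumes a non-empty window
    return {value: count / total for value, count in counts.items()}


def linf_distance(p, q):
    """Largest absolute probability difference over all categories seen in p or q."""
    keys = set(p) | set(q)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def positive_rate_by_x(pairs):
    """Empirical P(Y=1 | X=x) for each x, from (x, y) pairs with y in {0, 1}."""
    totals = Counter(x for x, _ in pairs)
    positives = Counter(x for x, y in pairs if y == 1)
    return {x: positives[x] / totals[x] for x in totals}


def drift_report(reference, current):
    """Compare two windows of (x, y) pairs on P(X), P(Y), and P(Y|X)."""
    data_drift = linf_distance(distribution([x for x, _ in reference]),
                               distribution([x for x, _ in current]))
    label_shift = linf_distance(distribution([y for _, y in reference]),
                                distribution([y for _, y in current]))
    ref_cond = positive_rate_by_x(reference)
    cur_cond = positive_rate_by_x(current)
    shared = set(ref_cond) & set(cur_cond)  # only compare x values seen in both windows
    concept_drift = max((abs(ref_cond[x] - cur_cond[x]) for x in shared), default=0.0)
    return {"P(X)": data_drift, "P(Y)": label_shift, "P(Y|X)": concept_drift}


reference = [("a", 1), ("a", 0), ("b", 0), ("b", 0)]
current = [("a", 1), ("a", 1), ("a", 0), ("b", 1)]
print(drift_report(reference, current))  # {'P(X)': 0.25, 'P(Y)': 0.5, 'P(Y|X)': 1.0}
```

In practice each distance would be compared against a threshold chosen per feature before raising an alert or triggering retraining.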

Tuning Performance to reduce training time

| Constraint | Input/Output | CPU | Memory |
| --- | --- | --- | --- |
| Commonly occurs | Large inputs; input requires parsing; small models | Expensive computation; underpowered hardware | Large number of inputs; complex models |
| Take action | Store efficiently; parallelize reads; consider batch size | Train on faster accelerators; upgrade processor; run on TPU; simplify model | Add more memory; use fewer layers; reduce batch size |
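For an input/output-bound job, a `tf.data` pipeline can parse examples in parallel, batch them, and prefetch the next batch so the accelerator is not left waiting. This is a minimal sketch assuming each example arrives as a comma-separated string of floats; the `parse_line` helper and the in-memory data are made up for illustration. When reading from files, `tf.data.TFRecordDataset` also accepts a `num_parallel_reads` argument to parallelize the reads themselves.

```python
import tensorflow as tf


def parse_line(line):
    """Hypothetical parser: turn a string like "1,2,3" into a float32 vector."""
    fields = tf.strings.split(line, ",")
    return tf.strings.to_number(fields, out_type=tf.float32)


def make_pipeline(lines, batch_size):
    dataset = tf.data.Dataset.from_tensor_slices(lines)
    # Parse several examples at once; AUTOTUNE lets tf.data choose the parallelism.
    dataset = dataset.map(parse_line, num_parallel_calls=tf.data.AUTOTUNE)
    # Larger batches raise throughput but need more memory.
    dataset = dataset.batch(batch_size)
    # Prepare the next batch while the current one is being consumed.
    return dataset.prefetch(tf.data.AUTOTUNE)


lines = ["1,2,3", "4,5,6", "7,8,9"]
for batch in make_pipeline(lines, batch_size=2):
    print(batch.numpy())
```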

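tf.distribute.Strategy types

- MirroredStrategy: synchronous training on multiple GPUs on a single machine
- MultiWorkerMirroredStrategy: synchronous training across multiple machines (workers), each with one or more GPUs
- TPUStrategy: synchronous training on TPUs
- ParameterServerStrategy: asynchronous training where variables live on parameter servers and workers compute the updates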
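Choosing among these strategies often depends on where the job runs. The helper below is a hypothetical sketch, not a TensorFlow API: it assumes that a `TF_CONFIG` environment variable signals a multi-worker cluster (which is where `MultiWorkerMirroredStrategy` reads its cluster spec), uses `MirroredStrategy` when one machine has several GPUs, and otherwise falls back to the default strategy, which does no distribution.

```python
import os

import tensorflow as tf


def pick_strategy():
    """Hypothetical helper: choose a tf.distribute strategy from the environment."""
    if "TF_CONFIG" in os.environ:
        # Multi-worker training: each worker reads the cluster spec from TF_CONFIG.
        return tf.distribute.MultiWorkerMirroredStrategy()
    if len(tf.config.list_physical_devices("GPU")) > 1:
        # Several GPUs on one machine: mirror variables across them.
        return tf.distribute.MirroredStrategy()
    # Single device: the default strategy, which does no distribution.
    return tf.distribute.get_strategy()


strategy = pick_strategy()
print(type(strategy).__name__, strategy.num_replicas_in_sync)
```

TPUStrategy and ParameterServerStrategy are left out because they need extra cluster details (for TPUs, a `tf.distribute.cluster_resolver.TPUClusterResolver`).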

tf.distribute

Create a strategy object:

import tensorflow as tf
strategy = tf.distribute.MultiWorkerMirroredStrategy()

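Wrap the creation of the model and its variables within the strategy scope:

with strategy.scope():
    # Variables created in this scope are mirrored across all replicas.
    # create_model() stands for your own function that builds a tf.keras model.
    model = create_model()
    model.compile(
        loss='sparse_categorical_crossentropy',
        optimizer=tf.keras.optimizers.Adam(0.0001),
        metrics=['accuracy'])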

Scale the batch size by the number of replicas in the cluster:

per_replica_batch_size = 64
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync
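Putting the three steps together gives a runnable single-machine sketch. It swaps in `tf.distribute.MirroredStrategy` so it runs without a multi-worker cluster (with no GPUs it falls back to a single CPU replica), and the `create_model` architecture and the random training data are made up for illustration.

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()


def create_model():
    """Hypothetical small classifier: 4 input features, 3 classes."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        # Softmax output matches the plain 'sparse_categorical_crossentropy' loss,
        # which expects probabilities rather than logits.
        tf.keras.layers.Dense(3, activation="softmax"),
    ])


# Model weights created inside the scope are mirrored on every replica.
with strategy.scope():
    model = create_model()
    model.compile(
        loss="sparse_categorical_crossentropy",
        optimizer=tf.keras.optimizers.Adam(0.0001),
        metrics=["accuracy"])

# Each replica processes per_replica_batch_size examples per step.
per_replica_batch_size = 64
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

features = np.random.rand(256, 4).astype("float32")
labels = np.random.randint(0, 3, size=(256,))
# Batch with the global size; the strategy splits each batch across replicas.
dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(global_batch_size)

history = model.fit(dataset, epochs=1, verbose=0)
print(global_batch_size, history.history["loss"])
```

Batching with the global size keeps the per-replica batch fixed at 64 as replicas are added, so each device's memory use stays the same while total throughput grows.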

Readings: Designing High-performance ML Systems

In this module, you focus on either I/O performance or computational speed, depending on the model. For more information, see the following readings and videos.

● How to Evaluate the Performance of Your Machine Learning Model

● Best practices for performance and cost optimization for machine learning

● How To Improve Machine Learning Model Performance: Five Ways

● Distributed TensorFlow model training on Cloud AI Platform (TF Dev Summit ‘20)

● Distributed training with TensorFlow

● Speeding Up Neural Network Training with Data Echoing

● Machine Learning Performance Improvement Cheat Sheet

● Building a High-Performance Data Pipeline with Tensorflow 2.x

● AutoML Tables

Kubeflow

● Introduction to Kubeflow

● Orchestrating TFX Pipelines

● Introduction to Machine Learning Pipelines with Kubeflow

● Kubeflow — a machine learning toolkit for Kubernetes

TensorFlow Lite

● ML for Mobile and Edge Devices - TensorFlow Lite

● TensorFlow Lite Examples | Machine Learning Mobile Apps

● Optimize TensorFlow models for mobile and embedded devices

● The Essential Guide To Learn TensorFlow Mobile and Tensorflow Lite