AI DevTalks

Introducing MLflow

Ghislain Vaillant <ghislain.vaillant@cnrs.fr>

The Perfect Tweet

Reproducibility

ML axis of change

Introducing MLflow (1)

  • "A platform for the machine learning lifecycle"

  • Open-source project started by Databricks

  • Version 1.0 released in June 2019

  • Steady flow of updates since

Introducing MLflow (2)

  • Overview of the platform

  • Basics walkthrough

  • Going further

Overview

Challenges

  • Hard to keep track of experiments

  • Hard to reproduce code (and its runtime)

  • No standard way to package and deploy models

  • No central store to manage and version models

Components

MLflow components

Features and concepts

MLflow platform

Basics

The example

fidle

Sequence 1: Regression on the Boston Housing Price Dataset

Initial implementation

import numpy as np
from tensorflow import keras


def load_dataset():
    """Load Boston Housing and standardize features with training statistics."""
    (x_train, y_train), (x_test, y_test) = (
        keras.datasets.boston_housing.load_data(test_split=0.2)
    )

    # Standardize with the training-set mean/std only,
    # so no information leaks from the test set.
    x_mean = x_train.mean(axis=0)
    x_std = x_train.std(axis=0)

    x_train = (x_train - x_mean) / x_std
    x_test = (x_test - x_mean) / x_std

    return (x_train, y_train), (x_test, y_test)


def build_model(n_features):
    """Two hidden layers of 64 ReLU units and a single linear output."""
    model = keras.Sequential()
    model.add(keras.layers.Input(shape=(n_features,), name="I"))
    model.add(keras.layers.Dense(64, activation='relu', name='D1'))
    model.add(keras.layers.Dense(64, activation='relu', name='D2'))
    model.add(keras.layers.Dense(1, name='O'))
    model.compile(optimizer='rmsprop', loss='mse')
    return model


(x_train, y_train), (x_test, y_test) = load_dataset()
model = build_model(x_train.shape[1])
model.summary()  # prints the table itself (and returns None)

model.fit(
    x_train, y_train,
    epochs=60, batch_size=10,
    validation_data=(x_test, y_test)
)

model.evaluate(x_test, y_test)

# Features of a single, already standardized sample (13 values).
features = np.array([
    1.26425925, -0.48522739,  1.0436489 , -0.23112788,  1.37120745,
    -2.14308942,  1.13489104, -1.06802005,  1.71189006,  1.57042287,
    0.77859951,  0.14769795,  2.7585581
]).reshape(1, 13)

ground_truth = 10.4

predictions = model.predict(features)

print("Prediction : {:.2f} K$".format(predictions[0][0]))
print("Reality    : {:.2f} K$".format(ground_truth))

Training output

Model summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
D1 (Dense)                   (None, 64)                896
_________________________________________________________________
D2 (Dense)                   (None, 64)                4160
_________________________________________________________________
O (Dense)                    (None, 1)                 65
=================================================================
Total params: 5,121
Trainable params: 5,121
Non-trainable params: 0
_________________________________________________________________

Training logs:

Epoch 1/60
41/41 [==============================] - 1s 4ms/step - loss: 476.2540 - val_loss: 370.6863
Epoch 2/60
41/41 [==============================] - 0s 1ms/step - loss: 227.6013 - val_loss: 122.6836
Epoch 3/60
41/41 [==============================] - 0s 1ms/step - loss: 77.4421 - val_loss: 59.3098
...
Epoch 59/60
41/41 [==============================] - 0s 1ms/step - loss: 6.6998 - val_loss: 18.7970
Epoch 60/60
41/41 [==============================] - 0s 1ms/step - loss: 6.6098 - val_loss: 19.0598
4/4 [==============================] - 0s 775us/step - loss: 19.0598

Sample prediction:

Prediction : 10.26 K$
Reality    : 10.40 K$
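Because the model is compiled with loss='mse', the reported losses are mean squared errors, expressed in (K$)². A quick back-of-the-envelope reading of the final test loss, assuming the usual MSE definition over the n test samples:

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 = 19.0598\ (\mathrm{K\$})^2
\qquad\Rightarrow\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{19.0598} \approx 4.37\ \mathrm{K\$}
```

So a typical test prediction is off by roughly 4.4 K$, and the 0.14 K$ gap on the sample above is on the favourable side.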

Experiment tracking

  • Install mlflow:

conda install -c conda-forge mlflow
  • Enable autologging (before training the model):

import mlflow

mlflow.keras.autolog()
  • Profit!

Experiment visualization

$ mlflow ui

MLflow UI

MLflow UI default experiment

MLflow UI parameters

MLflow UI metrics

MLflow UI detailed metric

Going further

Model serving

With the CLI:

$ mlflow models serve -m ./mlruns/${EXP_ID}/${RUN_ID}/artifacts/model

With Docker:

$ mlflow models build-docker \
    -m ./mlruns/${EXP_ID}/${RUN_ID}/artifacts/model \
    -n docker-image-name

Both commands set up a fresh conda environment with the model's dependencies, plus a web server ready to serve predictions (packaged inside the Docker image in the second case).
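Once a server is running, predictions are requested over HTTP. The sketch below uses only the Python standard library and rests on a few assumptions: mlflow models serve listening on its default address (http://127.0.0.1:5000), the /invocations scoring endpoint, and the {"inputs": ...} JSON body accepted by recent (2.x) MLflow releases; older servers may expect a different format. For the Docker image, url should point at whatever host port was mapped to the container. The names SCORING_URL, build_payload and predict are hypothetical.

```python
import json
import urllib.request

# Assumption: default address of `mlflow models serve` plus its scoring endpoint.
SCORING_URL = "http://127.0.0.1:5000/invocations"


def build_payload(rows):
    """Wrap feature rows in a tensor-style JSON body (assumed MLflow 2.x format)."""
    return json.dumps({"inputs": [[float(x) for x in row] for row in rows]})


def predict(rows, url=SCORING_URL):
    """POST the rows to the scoring server and return its decoded JSON answer."""
    request = urllib.request.Request(
        url,
        data=build_payload(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For instance, predict([features[0].tolist()]) would send the standardized sample from the training example; the exact shape of the returned JSON also depends on the MLflow version.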

Deployment strategies (1)

MLflow no server

MLflow server locally

Deployment strategies (2)

MLflow server remote

Takeaways

  • Key concepts: tracking, projects, models and registry

  • Easy integration with most popular ML frameworks

  • Easy deployment locally or as a remote instance

  • Pick the components tailored to your needs

Towards MLOps

We are not all at this stage yet!

CD4ML

Questions