
🚧 Adds MLflow materializer #358

Draft · bryangalindo wants to merge 4 commits into main
Conversation

bryangalindo (Contributor)

🚧 WIP 🚧

Changes

How I tested this

Notes

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

bryangalindo (Contributor, Author) commented Sep 19, 2023

Model flavors can be found here or listed below (but `crate` seems to be missing?):

>>> import mlflow
>>> mlflow.__version__
'2.7.1'
>>> [attr for attr in dir(mlflow) if hasattr(getattr(mlflow, attr), 'log_model')]
[
    'catboost', 'diviner', 'fastai', 'gluon', 'h2o', 'johnsnowlabs', 'langchain', 
    'lightgbm', 'mleap', 'onnx', 'openai', 'paddle', 'pmdarima', 'prophet', 
    'pyfunc', 'pytorch', 'sentence_transformers', 'sklearn', 'spacy', 'spark', 
    'statsmodels', 'tensorflow', 'transformers', 'xgboost'
]

Top three flavors (probably): sklearn, tensorflow, pytorch. No hard data, just vibes.

bryangalindo (Contributor, Author) commented Sep 19, 2023

Example of the save/load flow for the sklearn model flavor, from the MLflow quickstart:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

import mlflow
from mlflow.models import infer_signature

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

with mlflow.start_run() as run:
    rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
    rf.fit(X_train, y_train)
    save_predictions = rf.predict(X_test)
    signature = infer_signature(X_test, save_predictions)
    # "model" is the artifact path; it becomes part of the runs:/ URI below
    mlflow.sklearn.log_model(rf, "model", signature=signature)
    run_id = run.info.run_id

# the model URI must include the artifact path ("model"), not just the run id
model = mlflow.sklearn.load_model(f"runs:/{run_id}/model")
load_predictions = model.predict(X_test)

# == on numpy arrays is elementwise; collapse to a single bool for the assert
assert np.array_equal(save_predictions, load_predictions)

disclaimer: I have not tested this
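One pitfall worth calling out in the snippet above: comparing NumPy arrays with `==` yields an elementwise boolean array, and truth-testing that array (as a bare `assert` does) raises a ValueError, so the comparison should be collapsed to a single bool with `np.array_equal` or `.all()`. A minimal illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 3.0])

# Elementwise comparison returns an array, not a single bool
elementwise = a == b  # array([ True,  True,  True])

# `assert a == b` would raise "The truth value of an array with more than
# one element is ambiguous" -- collapse to one bool instead:
assert np.array_equal(a, b)
assert (a == b).all()
```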


skrawcz (Collaborator) commented Sep 19, 2023

@bryangalindo we should come up with the Hamilton UX to help guide this. i.e. what's the API we want to expose for Hamilton?

bryangalindo (Contributor, Author) replied:

> @bryangalindo we should come up with the Hamilton UX to help guide this. i.e. what's the API we want to expose for Hamilton?

Ok let's chat during our sync. Thanks!

@bryangalindo bryangalindo changed the title Adds MLflow materializer → 🚧 Adds MLflow materializer Sep 19, 2023
bryangalindo (Contributor, Author) commented Sep 21, 2023

High-level tasks:

Analysis:

  • (1 hour) Create "hello, world!" version of load_model/log_model to understand mlflow (debug, print stmts, etc).
  • (30 min) Observe directories/files created from log_model (see hamilton/plugins/mlruns/0/0b9e9b23c3ef443ba638d23e4318b58e)
  • (3 hours) Get high-level understanding of Hamilton driver, see hamilton/driver.py.
  • (3 hours) Get high-level understanding of e.g., regressors
  • (1 hour) Read through files in examples/materialization
  • (15 min) Decide on what reader/writer type makes sense (e.g., MLflowRegressorReader/MLflowRegressorWriter)
  • (15 min) Decide on the applicable type (e.g., dataframe, classifiers, regressors)
  • (30 min) Decide what metadata to save from model (see hamilton/plugins/mlruns/0/0b9e9b23c3ef443ba638d23e4318b58e/artifacts/model)
  • (15 min) Discover kwargs for log_model and load_model.

Reader/Writer Development:

  • (2 hours) Write reader
  • (2 hours) Write writer
  • (1 hour) Write get metadata function
  • (1 hour) Write unit tests for get metadata function
  • (1 hour) Write unit tests for reader/writer

Materializer Development:

  • (30 min) Write data loader module (see examples/materialization/data_loaders.py)
  • (1 hour) Write model_training module (see examples/materialization/model_training.py)
  • (1 hour) Write run.py module (see examples/materialization/run.py)
  • (1 hour) Write jupyter notebook example (see examples/materialization/notebook.ipynb)
  • (5 min) Write requirements.txt
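To make the reader/writer tasks above concrete, here is a rough sketch of the shape such a writer might take, modeled on the dataclass pattern used by existing Hamilton savers. Everything here is an assumption, not the final API: the class name, field names, and metadata keys are invented, and the real class would subclass hamilton.io.data_adapters.DataSaver and call mlflow rather than the placeholders below.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass
class MLFlowModelSaver:
    """Hypothetical sketch only. A real implementation would subclass
    hamilton.io.data_adapters.DataSaver and register as a plugin."""

    artifact_path: str = "model"

    @classmethod
    def name(cls) -> str:
        # The string users would reference, e.g. to.mlflow(...)
        return "mlflow"

    def save_data(self, data: Any) -> Dict[str, Any]:
        # A real writer would call e.g. mlflow.sklearn.log_model(
        #     data, self.artifact_path, ...) inside an active run, then
        # return metadata (run id, model URI, flavor, ...) for lineage.
        run_id = "<run-id-from-mlflow>"  # placeholder, not a real run id
        return {
            "run_id": run_id,
            "model_uri": f"runs:/{run_id}/{self.artifact_path}",
        }
```

The returned metadata dict mirrors what other Hamilton savers do: the materializer surfaces it so downstream tooling can locate the logged artifact.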

elijahbenizzy (Collaborator) commented Sep 28, 2023

Hey @bryangalindo -- a thought on a feature that might be helpful. Here's an outline of what the API should look like -- the data saver/materialization implementation should support this.

from hamilton import driver
from hamilton.function_modifiers import source
from hamilton.io.materialization import to

dr = driver.Driver(...)
dr.materialize(
    to.mlflow(
        id="mlflow_save",
        dependencies=["my_cool_model"],
        model_input=source("training_data"),
        model_output=source("predictions"),
    )
)

Then the materializer would call infer_signature with the model_input and model_output -- these would be taken from nodes called training_data and predictions. The DAG would look like:

training_data -> mlflow_save
predictions -> mlflow_save

and possibly more connections. Does this make sense? This is all supported btw -- materializers can take in source/value type parameters, and if they're passed something that isn't a source or value, it will just resolve to a literal value.
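To illustrate the resolution behavior described here, a toy stand-in (not Hamilton's actual implementation -- the `source` dataclass and `resolve` function below are simplified local sketches of the assumed semantics):

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class source:
    """Toy stand-in for hamilton.function_modifiers.source."""
    node_name: str


def resolve(param: Any, node_outputs: Dict[str, Any]) -> Any:
    """A source(...) pulls the value of an upstream node;
    any other argument is treated as a literal value."""
    if isinstance(param, source):
        return node_outputs[param.node_name]
    return param


node_outputs = {"training_data": [[1.0], [2.0]], "predictions": [0.1, 0.2]}

resolve(source("predictions"), node_outputs)  # -> [0.1, 0.2]
resolve(42, node_outputs)                     # -> 42
```

So in the `to.mlflow(...)` sketch, `model_input=source("training_data")` would wire an edge from the `training_data` node, while a plain value would be passed through as-is.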
