MLflow autologging issue # 1618 #2092

dennisbader · 2023-11-24T12:35:10Z

Addresses #1618.

Summary

Draft PR for MLflow integration example making pytorch calls and activating tensorboard collection on Darts' s pytortch models to do autologging with MLflow and activate MLflow UI.

MLflow integration example making pytorch calls and activating tensorboard collection on Darts' s pytortch models to do autologging with MLflow and activate MLflow UI.

codecov-commenter · 2023-11-24T13:15:53Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.97%. Comparing base (6f13a2f) to head (ee55932).

❗ Current head ee55932 differs from pull request most recent head 3593454. Consider uploading reports for the commit 3593454 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2092      +/-   ##
==========================================
- Coverage   94.02%   93.97%   -0.06%     
==========================================
  Files         138      136       -2     
  Lines       14201    13647     -554     
==========================================
- Hits        13353    12825     -528     
+ Misses        848      822      -26

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

cargecla1 · 2023-12-05T11:48:01Z

Hello @dennisbader,

As a follow up, is there anything else that I need to do?
What is/are the next step/steps?

Cheers!

dennisbader · 2023-12-07T12:04:31Z

Hi @cargecla1. No, all good from your side 🚀
Right now our capacity is a bit limited before the holidays, so we'll most likely have time to review this early next year.
Thanks again for the contribution 💯

cargecla1 · 2023-12-08T04:04:19Z

Hi @cargecla1. No, all good from your side 🚀 Right now our capacity is a bit limited before the holidays, so we'll most likely have time to review this early next year. Thanks again for the contribution 💯

Hello @dennisbader,

Thanks for letting me know.
And thank you for supporting this review.

Merry Xmas!

cargecla1

I agree with this changes, there are as passed on.

turbotimon · 2024-01-29T13:11:31Z

docs/userguide/torch_forecasting_models.md

+
+with mlflow.start_run(nested=True) as run:
+
+    dataset: PandasDataset = mlflow.data.from_pandas(series, source="AirPassengersDataset")


IMHO should call series.pd_dataframe() because in my tests it worked only on DataFrames not Series:
dataset: PandasDataset = mlflow.data.from_pandas(series.pd_dataframe(), source="AirPassengersDataset")

Good suggestion

Added via commit c15eb1e
Thank you!

docs/userguide/torch_forecasting_models.md

turbotimon · 2024-01-29T13:17:26Z

docs/userguide/torch_forecasting_models.md

+
+# Registering model
+model_uri = f"runs:/{run.info.run_id}/darts-NBEATS"
+mlflow.register_model(model_uri=model_uri, name=model_name)


IMHO runs currently do not save models as artifacts becaus there is no call to log_model(). Therefore model_uri is not pointing to a valid model. Please correct me if I'm wrong, however, it did not work in my tests. If it works, can you pleas add an example how to load with loaded_model = mlflow.<flavor>.load_model(model_uri=model_uri) ?

I will need to have a look at this, my understanding is that it saves it both using mlflow.<model_flavor>.log_model() or mlflow.register_model(). Refer to https://mlflow.org/docs/latest/model-registry.html.

As i understand it you can only (kind of) promote a already saved model to a registered model..

Model
An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.

Registered Model
An MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions, aliases, tags, and other metadata.
https://mlflow.org/docs/latest/model-registry.html#concepts

As i understand it you can only (kind of) promote a already saved model to a registered model..

Model
An MLflow Model is created from an experiment or run that is logged with one of the model flavor’s mlflow.<model_flavor>.log_model() methods. Once logged, this model can then be registered with the Model Registry.
Registered Model
An MLflow Model can be registered with the Model Registry. A registered model has a unique name, contains versions, aliases, tags, and other metadata.
https://mlflow.org/docs/latest/model-registry.html#concepts

Hello @turbotimon ,
My plans was to share how to do the first part of issue # 1618. As it allows you to track the experiments (so one don't get lost if you are doing more of a few iterations), select the best one, see metrics progress, etc.
I will have a look at what you are suggesting. However, the normal mlflow.python.loadmodel or mlflow.pytorch.loadmodel doesn't seems to work because the way Darts is wrapping torch.nn modules. All test I have done so far didn't work to save and load the model. Only to record hyperparameters, other torchmetrics related information, dataset hash, etc. Thus, I propose to share this, and later we can add saving and loading models (but my last weekend research it is pointing out that the model.py files and common models may need to be rewritten to be compatible with MLFlow (I might be wrong).

Cheers!

@cargecla1 Yes, that's a good idea to split these two things

If using the mlflow model registry for darts completely fails, you could also mentioning the workaround I proposed in the issue. Which was manually saving/loading the model as an artifact. Something like:

dartsmodel.save("mymodel.pickle") mlflow.log_artifact("mymodel.pickle") # later, load artifact from mlflow and do e.g. a dartsmodel=RNNModel.load("mymodel.pickle")

Let me know if i can help anything!

Hello @turbotimon

After trying to save the Darts model using dartsmodel.save("mymodel.pickle") as suggested, I am getting the following error:

File ~.conda\envs\venv_darts\lib\site-packages\torch\serialization.py:653, in _save(obj, zip_file, pickle_module, pickle_protocol) 651 pickler = pickle_module.Pickler(data_buf, protocol=pickle_protocol) 652 pickler.persistent_id = persistent_id --> 653 pickler.dump(obj) 654 data_value = data_buf.getvalue() 655 zip_file.write_record('data.pkl', data_value, len(data_value))

AttributeError: Can't pickle local object 'LayerSummary._register_hook..hook'

Would you know why this is happening?

Cheers

Hi @cla-ra3426, sorry i missed your question. No idea.. but must have to do somehting with darts itself and not mlflow i suppose

@cargecla1 Yes, that's a good idea to split these two things
If using the mlflow model registry for darts completely fails, you could also mentioning the workaround I proposed in the issue. Which was manually saving/loading the model as an artifact. Something like:

dartsmodel.save("mymodel.pickle") mlflow.log_artifact("mymodel.pickle") # later, load artifact from mlflow and do e.g. a dartsmodel=RNNModel.load("mymodel.pickle")

Let me know if i can help anything!

Hello @turbotimon , @madtoinou , @dennisbader,

Is it possible to split this issue as discussed above? so we can share how to use, train, track, monitor and save the models using MLFlow? Just leaving loading the model for a later release?

Cheers

Hello gents (@turbotimon , @madtoinou , @dennisbader,)

Have you had time to consider my proposal above? "Is it possible to split this issue as discussed above? so we can share how to use, train, track, monitor and save the models using MLFlow? Just leaving loading the model for a later release?"

The only solution I have right now it is to train the model with MLFlow to track and monitor the model and retrain it with pure Darts to be able to save it and load to predict, Darts saving method "fails" when run inside a MLFlow run, and MLFlow log and save methods don't want to work with Darts either.

Thank you in advice for your consideration!

Cheers!

Hi @cargecla1,

Sorry for the delay, loading model is kind-of part of linked issue so it would probably be better to also include it in this PR. But if it too troublesome, we can treat it separately.

@cla-ra3426,

The error you're getting when trying to pickle the model is probably due to the (pytorch-lightning) callbacks. Can you try removing them prior to exporting the model?

I will test this example more thoroughly when I have more time, try to see if I can come up with a solution for the loading.

Hello @madtoinou

I will try this and see how it goes.

Thanks for this suggestion and for considering splitting the problem into two if above doesn't work.

Hello @madtoinou ,

I took your suggestion on board as per above, but that didn't fixed the issue with loading and predicting steps.

Refer to commit f798244

Adding first two suggestions raised in MLflow autologging issue # 1618 unit8co#2092 via comments.

Adding code to save pip requirements and conda environment required to run the model via MLFlow.

Adding a way to track and save model before registering it using MLFlow.

Expanding comment to create a new cell to log the artifact and register the model after a model run id has been created.

docs/userguide/torch_forecasting_models.md

madtoinou · 2024-04-03T12:34:27Z

docs/userguide/torch_forecasting_models.md

+    mlflow.log_param("model_type", "Darts_Pytorch_model")
+    mlflow.log_param("input_chunk_length", 24)
+    mlflow.log_param("output_chunk_length", 12)                   
+    mlflow.log_param("n_epochs", 500)
+    mlflow.log_param("model_name", 'NBEATS_MLflow')
+    mlflow.log_param("log_tensorboard", True)
+    mlflow.log_param("torch_metrics", "torchmetrics.regression.MeanAbsolutePercentageError()")
+    mlflow.log_param("nr_epochs_val_period", 1)
+    mlflow.log_param("pl_trainer_kwargs", "{callbacks: [loss_logger]}")    


Let's make this a bit more concise, with a list of parameters to log and access them in the model.model_params, model.pl_module_params or model.trainer_params attributes (potentially after fusing these dictionaries into one).

Hello @madtoinou
I made the code more concise as suggested. Refer to commit f798244

Hello @madtoinou , @dennisbader ,

I guess you may have been busy with the latest Darts release. Did you have time to look at my latest changes? Does this work to merge the first part of the MLFlow integration as previously discussed?

Cheers!

By the way, nicely done! I need to test the latest release speeds. :)

Updating table of content at the top of the file/page.

Making the log parameters method more concise as suggested.

madtoinou

Thank you for making the correction, I still have some minor comments.

madtoinou · 2024-04-23T08:07:16Z

docs/userguide/torch_forecasting_models.md

@@ -25,6 +25,7 @@ We assume that you already know about covariates in Darts. If you're new to the
    - [Callbacks](#callbacks)
      - [Early Stopping](#example-with-early-stopping)
      - [Custom Callback](#example-of-custom-callback-to-store-losses)
+      - [MLFlow: train, track and monitor](#example-with-mlflow-autologging)


I think that the indentation is off, can you shift it to the left once so that it's at the same level as "Callbacks"?

Hello @madtoinou ,

I fixed this via commit b839c67

docs/userguide/torch_forecasting_models.md

Fixed indentation that was off.

Adding suggestion, thanks! Co-authored-by: madtoinou <32447896+madtoinou@users.noreply.github.com>

Resolving torchmetrics issue.

Updating the MLFlow saving method including tracking uri.

cargecla1 · 2024-04-29T05:07:20Z

Thank you for making the correction, I still have some minor comments.

Hello @madtoinou ,

I just updated torchmetrics and saving model method to capture your latest suggestions with the **params. Hopefully this is work done now. :)

Cheers!

cargecla1 · 2024-05-11T05:42:48Z

Thank you for making the correction, I still have some minor comments.

Hello @madtoinou ,

I just updated torchmetrics and saving model method to capture your latest suggestions with the **params. Hopefully this is work done now. :)

Cheers!

Hello @madtoinou, @dennisbader ,

Any updates, comments from my latest updates/commits?

Cheers!

MLflow autologging issue # 1618

13d8113

MLflow integration example making pytorch calls and activating tensorboard collection on Darts' s pytortch models to do autologging with MLflow and activate MLflow UI.

dennisbader mentioned this pull request Nov 24, 2023

Integrate darts models with MLflow (logging, autologging, loading) #1618

Open

cargecla1 approved these changes Jan 26, 2024

View reviewed changes

turbotimon suggested changes Jan 29, 2024

View reviewed changes

cargecla1 added 5 commits February 21, 2024 18:26

Update torch_forecasting_models.md

c15eb1e

Adding first two suggestions raised in MLflow autologging issue # 1618 unit8co#2092 via comments.

Merge branch 'master' into feature/MLflow_1618

ad4796c

Saving pip and conda env

cb5dbaf

Adding code to save pip requirements and conda environment required to run the model via MLFlow.

Track and save model

22f75f5

Adding a way to track and save model before registering it using MLFlow.

Expanding comment

ee55932

Expanding comment to create a new cell to log the artifact and register the model after a model run id has been created.

madtoinou reviewed Apr 3, 2024

View reviewed changes

cargecla1 added 2 commits April 4, 2024 21:55

Updating table of content

ce635bc

Updating table of content at the top of the file/page.

Making log more concise

f798244

Making the log parameters method more concise as suggested.

madtoinou reviewed Apr 23, 2024

View reviewed changes

cargecla1 and others added 5 commits April 29, 2024 14:23

Indentation fix

b839c67

Fixed indentation that was off.

Update docs/userguide/torch_forecasting_models.md

5390383

Adding suggestion, thanks! Co-authored-by: madtoinou <32447896+madtoinou@users.noreply.github.com>

Merge branch 'master' into feature/MLflow_1618

f1a923b

Resolving torcmetrics

65dbd48

Resolving torchmetrics issue.

Update MLFlow saving method

3593454

Updating the MLFlow saving method including tracking uri.

cargecla1 mentioned this pull request May 14, 2024

Feat/neural prophet #1436

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MLflow autologging issue # 1618 #2092

MLflow autologging issue # 1618 #2092

dennisbader commented Nov 24, 2023 •

edited by madtoinou

codecov-commenter commented Nov 24, 2023 •

edited by codecov bot

cargecla1 commented Dec 5, 2023

dennisbader commented Dec 7, 2023

cargecla1 commented Dec 8, 2023

cargecla1 left a comment

turbotimon Jan 29, 2024

cargecla1 Feb 2, 2024

cargecla1 Feb 21, 2024

turbotimon Jan 29, 2024

cargecla1 Feb 2, 2024

turbotimon Feb 12, 2024

cargecla1 Feb 21, 2024

turbotimon Feb 21, 2024 •

edited

turbotimon Mar 19, 2024

cargecla1 Apr 3, 2024

madtoinou Apr 3, 2024

cargecla1 Apr 4, 2024

cargecla1 Apr 6, 2024

madtoinou Apr 3, 2024

cargecla1 Apr 6, 2024

cargecla1 Apr 19, 2024

madtoinou left a comment

madtoinou Apr 23, 2024

cargecla1 Apr 29, 2024

cargecla1 commented Apr 29, 2024

cargecla1 commented May 11, 2024


		with mlflow.start_run(nested=True) as run:

		dataset: PandasDataset = mlflow.data.from_pandas(series, source="AirPassengersDataset")

MLflow autologging issue # 1618 #2092

Are you sure you want to change the base?

MLflow autologging issue # 1618 #2092

Conversation

dennisbader commented Nov 24, 2023 • edited by madtoinou

Summary

codecov-commenter commented Nov 24, 2023 • edited by codecov bot

Codecov Report

cargecla1 commented Dec 5, 2023

dennisbader commented Dec 7, 2023

cargecla1 commented Dec 8, 2023

cargecla1 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

turbotimon Feb 21, 2024 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

madtoinou left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cargecla1 commented Apr 29, 2024

cargecla1 commented May 11, 2024

dennisbader commented Nov 24, 2023 •

edited by madtoinou

codecov-commenter commented Nov 24, 2023 •

edited by codecov bot

turbotimon Feb 21, 2024 •

edited