
Reduce memory usage! #50

Open
georgemilosh opened this issue Jan 27, 2023 · 12 comments
Labels: enhancement (New feature or request)
Milestone: xarray

Comments

@georgemilosh
Owner

Not many machines can hold all of our data in memory, so we need to find a way to deal with this. There are many materials online showing how this can be achieved in tensorflow (pipelines etc.), but the problem is that I still run into out-of-memory errors (the process simply gets killed).

georgemilosh added the enhancement (New feature or request) label Jan 27, 2023
georgemilosh changed the title from "When data is too big" to "Reduce memory usage!" Feb 3, 2023
georgemilosh added this to "To do" in Maintaining Climate-Learning repository via automation Feb 3, 2023
georgemilosh added this to the xarray milestone Feb 3, 2023
@georgemilosh
Owner Author

  • Use tensorflow pipelines that can accommodate our custom stratified k-fold cross-validation, which doesn't mix different years
  • When computing $A(t)$ we only need a small portion of the globe, so there is no need to load the full fields for that purpose
  • Balancing the folds should be possible based on this $A(t)$ alone, and perhaps we don't need to actually shuffle the data: we just need to tell tensorflow the mapping to the year labels in the pipelines
  • Consider tensorflow pipelines that train from disk rather than RAM (a sketch follows this list)
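
Not the actual implementation, just a minimal sketch of how such a pipeline could look, assuming one hypothetical `.npy` file per year for the fields and for the labels derived from $A(t)$; the helper names (`years_to_folds`, `load_year`, `make_dataset`) are made up:

```python
import numpy as np
import tensorflow as tf

N_FOLDS = 10
YEARS = np.arange(1000)   # year indices only, never the data itself

def years_to_folds(years, n_folds=N_FOLDS, seed=0):
    """Assign whole years to folds by permuting year indices,
    so that no fold mixes data from different years."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(years), n_folds)

def load_year(year):
    """Load a single year from disk (hypothetical file layout)."""
    x = np.load(f"data/fields_{int(year):04d}.npy")   # (time, lat, lon, channels)
    y = np.load(f"data/labels_{int(year):04d}.npy")   # (time,)
    return x.astype("float32"), y.astype("float32")

def _load_tf(year):
    x, y = tf.numpy_function(load_year, [year], (tf.float32, tf.float32))
    x = tf.ensure_shape(x, [None, None, None, None])
    y = tf.ensure_shape(y, [None])
    return x, y

def make_dataset(years, batch_size=128, shuffle=True):
    """Stream the selected years from disk instead of holding them in RAM."""
    ds = tf.data.Dataset.from_tensor_slices(years)
    if shuffle:
        ds = ds.shuffle(len(years))                  # shuffle years, not samples
    ds = ds.map(_load_tf, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.unbatch()                                # flatten years into samples
    if shuffle:
        ds = ds.shuffle(10 * batch_size)             # small in-memory buffer only
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

folds = years_to_folds(YEARS)
val_ds = make_dataset(folds[0], shuffle=False)
train_ds = make_dataset(np.concatenate(folds[1:]))
```

Shuffling here acts on year indices and on a small sample buffer only, so the fold assignment never mixes years and the full dataset is never held in memory at once.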

@georgemilosh
Owner Author

I should note that there are extra memory leaks when using the current tensorflow version (which is no longer tensorflow-gpu and is installed with pip rather than conda).

@AlessandroLovo
Collaborator

AlessandroLovo commented Mar 3, 2023

Check for easy improvements when loading the data, especially if we don't want to use the whole dataset, in particular inside the Plasim_Field object.
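
One possible easy win, sketched with xarray; the file path, variable name and region bounds are hypothetical, chunked opening assumes dask is available, and the actual Plasim_Field interface may differ:

```python
import xarray as xr

# Open lazily: nothing is read into RAM yet (chunks are only a recipe for dask).
ds = xr.open_dataset("Data/Plasim/tas.nc", chunks={"time": 365})

# Restrict to the region actually needed (e.g. to compute A(t)) before loading.
# The slice order depends on how the lat coordinate is stored.
field = ds["tas"].sel(lat=slice(75, 30), lon=slice(0, 45))

# Only this small subset is materialised in memory.
field = field.load()
```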

@georgemilosh
Owner Author

At the moment I create a separate subset dataset with cdo, and everything works much faster, but this feels ad hoc. Alternatively, we could always work with smaller datasets that get concatenated in Learn2_new.py when needed, but that is also a bit annoying.
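
If the smaller-datasets route is taken, the concatenation step could stay lazy; a minimal sketch assuming xarray with dask and a hypothetical file pattern, not the actual Learn2_new.py logic:

```python
import xarray as xr

# Combine hypothetical per-period files lazily; nothing is copied into RAM
# until .load() or an actual computation is triggered.
ds = xr.open_mfdataset("Data/Plasim/tas_*.nc", combine="by_coords", parallel=True)
```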

@AlessandroLovo
Collaborator

Yeah, indeed. I was planning to do a line-by-line evaluation of the code while monitoring the RAM, to see exactly when we load the data into memory and whether we can do something about it, but I cannot guarantee I'll have time for that.
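
For that, one option is line-by-line memory profiling with the memory_profiler package; a sketch, where `load_data` is just a hypothetical stand-in for the code to inspect:

```python
# pip install memory-profiler
from memory_profiler import profile

@profile            # prints per-line memory usage when the function runs
def load_data():
    import numpy as np
    # stand-in for the real data-loading code to be inspected
    fields = np.zeros((100, 22, 128, 1), dtype="float32")
    return fields

if __name__ == "__main__":
    load_data()
```

Running the script normally then prints a per-line memory report for the decorated function.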

@AlessandroLovo
Collaborator

I was just doing some runs and I noticed that now, with 1000 years of data, training a CNN uses 1.3 TB of virtual memory... Did you change anything? If so, it is not going in the right direction 😆

@AlessandroLovo
Collaborator

Found the reason: before, the network had ~200,000 trainable parameters... now it has ~8 million...

@AlessandroLovo
Collaborator

And that is because the MaxPool layers have disappeared

@georgemilosh
Owner Author

Yes, sorry, I forgot to re-implement them when I modified the way create_model works. I was using strided convolutions instead.
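
For context, a minimal sketch of the two downsampling options being discussed; the layer sizes and input shape are hypothetical, not the actual create_model configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(64, 128, 3))    # hypothetical (lat, lon, fields) shape

# Option 1: downsample with MaxPooling, as in the old behaviour
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)        # halves the spatial resolution

# Option 2: equivalent reduction with a strided convolution, no pooling layer
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)

# Without either, the feature map feeding the dense head stays at full
# resolution, which is where the jump in trainable parameters comes from.
```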

@georgemilosh
Owner Author

Although virtual memory normally doesn't matter.

@AlessandroLovo
Collaborator

I will re-implement the MaxPool for backward compatibility

@georgemilosh
Owner Author

My view is that we will need to implement tensorflow datasets. Operations such as shuffling (balancing) and the train/validation split should be done virtually, by permuting only indices; the dataset then has to know which portions of the full data to serve for the next batch. In pytorch it is quite easy to control how data is extracted, and I don't like that here we have to copy data around in memory. Normalization is another step, but it could be handled by a layer implementation rather than being done explicitly as we do now?
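
On the normalization point, a sketch of the layer-based option using tf.keras.layers.Normalization; `train_ds` refers to a tf.data pipeline like the hypothetical one sketched earlier, and the input shape is made up:

```python
import tensorflow as tf

# Adapt the layer once on (a sample of) the training data; the mean and
# variance then live inside the model instead of being applied to extra
# copies of the arrays in memory.
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(train_ds.map(lambda x, y: x))       # adapt() also accepts a tf.data pipeline

inputs = tf.keras.Input(shape=(64, 128, 3))    # hypothetical field shape
x = norm(inputs)
# ... rest of the network as before
```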
