
Reduce memory usage! #50

Open
georgemilosh opened this issue Jan 27, 2023 · 12 comments
Labels: enhancement (New feature or request)
Milestone: xarray

Comments

@georgemilosh
Owner

Not many machines can hold all of our data in memory, so we need to find a way to deal with this. There are many materials online showing how this can be achieved in tensorflow (pipelines etc.), but the problem is that I still run into out-of-memory errors (the process simply gets killed).

georgemilosh added the enhancement (New feature or request) label Jan 27, 2023
georgemilosh changed the title from "When data is too big" to "Reduce memory usage!" Feb 3, 2023
georgemilosh added this to "To do" in Maintaining Climate-Learning repository via automation Feb 3, 2023
georgemilosh added this to the xarray milestone Feb 3, 2023
@georgemilosh
Owner Author

  • Use tensorflow pipelines that can accommodate our custom stratified k-fold cross-validation, which doesn't mix different years
  • When computing $A(t)$ we only need a small portion of the globe, so there is no need to load the full fields for that purpose
  • Balancing the folds should be possible based on this $A(t)$ alone, and perhaps we don't need to actually shuffle the data: we just need to tell tensorflow the mapping to the year labels in the pipelines
  • Consider tensorflow pipelines that train from disk rather than RAM (a sketch follows this list)
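
Not the actual implementation, just a minimal sketch of how such a pipeline could look, assuming one hypothetical `.npy` file per year for the fields and for the labels derived from $A(t)$; the helper names (`years_to_folds`, `load_year`, `make_dataset`) are made up:

```python
import numpy as np
import tensorflow as tf

N_FOLDS = 10
YEARS = np.arange(1000)   # year indices only, never the data itself

def years_to_folds(years, n_folds=N_FOLDS, seed=0):
    """Assign whole years to folds by permuting year indices,
    so that no fold mixes data from different years."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(years), n_folds)

def load_year(year):
    """Load a single year from disk (hypothetical file layout)."""
    x = np.load(f"data/fields_{int(year):04d}.npy")   # (time, lat, lon, channels)
    y = np.load(f"data/labels_{int(year):04d}.npy")   # (time,)
    return x.astype("float32"), y.astype("float32")

def _load_tf(year):
    x, y = tf.numpy_function(load_year, [year], (tf.float32, tf.float32))
    x = tf.ensure_shape(x, [None, None, None, None])
    y = tf.ensure_shape(y, [None])
    return x, y

def make_dataset(years, batch_size=128, shuffle=True):
    """Stream the selected years from disk instead of holding them in RAM."""
    ds = tf.data.Dataset.from_tensor_slices(years)
    if shuffle:
        ds = ds.shuffle(len(years))                  # shuffle years, not samples
    ds = ds.map(_load_tf, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.unbatch()                                # flatten years into samples
    if shuffle:
        ds = ds.shuffle(10 * batch_size)             # small in-memory buffer only
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)

folds = years_to_folds(YEARS)
val_ds = make_dataset(folds[0], shuffle=False)
train_ds = make_dataset(np.concatenate(folds[1:]))
```

Shuffling here acts on year indices and on a small sample buffer only, so the fold assignment never mixes years and the full dataset is never held in memory at once.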

@georgemilosh
Owner Author

I should note that there are extra memory leaks when using the current tensorflow version (which is no longer tensorflow-gpu and is installed with pip rather than conda).

@AlessandroLovo
Collaborator

AlessandroLovo commented Mar 3, 2023

Check for easy improvements when loading the data, especially if we don't want to use the whole dataset, in particular inside the Plasim_Field object.
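
One possible easy win, sketched with xarray; the file path, variable name and region bounds are hypothetical, chunked opening assumes dask is available, and the actual Plasim_Field interface may differ:

```python
import xarray as xr

# Open lazily: nothing is read into RAM yet (chunks are only a recipe for dask).
ds = xr.open_dataset("Data/Plasim/tas.nc", chunks={"time": 365})

# Restrict to the region actually needed (e.g. to compute A(t)) before loading.
# The slice order depends on how the lat coordinate is stored.
field = ds["tas"].sel(lat=slice(75, 30), lon=slice(0, 45))

# Only this small subset is materialised in memory.
field = field.load()
```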

@georgemilosh
Owner Author

At the moment I create a separate subset dataset with cdo, and everything works much faster, but this feels ad hoc. Alternatively, we could always work with smaller datasets that get concatenated in Learn2_new.py when needed, but that is also a bit annoying.
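
If the smaller-datasets route is taken, the concatenation step could stay lazy; a minimal sketch assuming xarray with dask and a hypothetical file pattern, not the actual Learn2_new.py logic:

```python
import xarray as xr

# Combine hypothetical per-period files lazily; nothing is copied into RAM
# until .load() or an actual computation is triggered.
ds = xr.open_mfdataset("Data/Plasim/tas_*.nc", combine="by_coords", parallel=True)
```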

@AlessandroLovo
Collaborator

Yeah, indeed. I was planning to do a line-by-line evaluation of the code while monitoring the RAM, to see exactly when we load the data into memory and whether we can do something about it, but I cannot guarantee I'll have time for that.
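
For that, one option is line-by-line memory profiling with the memory_profiler package; a sketch, where `load_data` is just a hypothetical stand-in for the code to inspect:

```python
# pip install memory-profiler
from memory_profiler import profile

@profile            # prints per-line memory usage when the function runs
def load_data():
    import numpy as np
    # stand-in for the real data-loading code to be inspected
    fields = np.zeros((100, 22, 128, 1), dtype="float32")
    return fields

if __name__ == "__main__":
    load_data()
```

Running the script normally then prints a per-line memory report for the decorated function.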

@AlessandroLovo
Collaborator

I was just doing some runs and I noticed that now, with 1000 years of data, training a CNN uses 1.3 TB of virtual memory... Did you change anything? If so, it is not going in the right direction 😆

@AlessandroLovo
Collaborator

Found the reason: before, the network had ~200,000 trainable parameters... now it has ~8 million...

@AlessandroLovo
Collaborator

And that is because the MaxPool layers have disappeared

@georgemilosh
Owner Author

Yes, sorry, I forgot to re-implement them when I modified the way create_model works. I was using strided convolutions instead.
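
For context, a minimal sketch of the two downsampling options being discussed; the layer sizes and input shape are hypothetical, not the actual create_model configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(64, 128, 3))    # hypothetical (lat, lon, fields) shape

# Option 1: downsample with MaxPooling, as in the old behaviour
x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(pool_size=2)(x)        # halves the spatial resolution

# Option 2: equivalent reduction with a strided convolution, no pooling layer
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)

# Without either, the feature map feeding the dense head stays at full
# resolution, which is where the jump in trainable parameters comes from.
```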

@georgemilosh
Owner Author

Although virtual memory normally doesn't matter.

@AlessandroLovo
Collaborator

I will re-implement the MaxPool for backward compatibility

@georgemilosh
Owner Author

My view is that we will need to implement tensorflow datasets. Operations such as shuffling (balancing) and the train/validation split should be done virtually, by permuting only indices; the dataset then has to know which portions of the full data to serve for the next batch. In pytorch it is quite easy to control how data is extracted, and I don't like that here we have to copy data around in memory. Normalization is another step, but it could be handled by a layer implementation rather than being done explicitly as we do now?
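
On the normalization point, a sketch of the layer-based option using tf.keras.layers.Normalization; `train_ds` refers to a tf.data pipeline like the hypothetical one sketched earlier, and the input shape is made up:

```python
import tensorflow as tf

# Adapt the layer once on (a sample of) the training data; the mean and
# variance then live inside the model instead of being applied to extra
# copies of the arrays in memory.
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(train_ds.map(lambda x, y: x))       # adapt() also accepts a tf.data pipeline

inputs = tf.keras.Input(shape=(64, 128, 3))    # hypothetical field shape
x = norm(inputs)
# ... rest of the network as before
```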
