
Alternative paramscan function for dealing with exceeding the memory cap. #825

Open
johnabs opened this issue Jul 5, 2023 · 2 comments
Labels: data related with datacollection · enhancement (New feature or request)

Comments

@johnabs

johnabs commented Jul 5, 2023

Is your feature request related to a problem? Please describe.
The problem is that when I have particularly large models with substantial parameter spaces, paramscan requires me to pre-divide the data before passing it in so that I don't run out of memory. It would be great if this were all handled automatically behind the scenes when running experiments, if possible.

Describe the solution you'd like
I've already implemented a variant of the solution I'd like: the user specifies whether they expect the run to exceed the memory cap; if so, the parameter dictionary is split into a partition of some user-defined size, and otherwise it is left alone and run as usual (see the sketch below). It would be even better to determine this up front rather than relying on the user, but as of yet I'm unsure how to estimate memory consumption in general. With such an estimate, both the boolean check and the partition size of the parameter dict could be set automatically, minimizing the chance of the code crashing while also minimizing the number of writes to disk.
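A rough sketch of the idea (names like `chunked_paramscan`, `split_if_needed`, and `chunk_size` are placeholders, and the exact `paramscan` keyword/return conventions are assumed from memory, so they may need adjusting):

```julia
using Agents, CSV, DataFrames
using Base.Iterators: partition, product

# Assumes every value in `parameters` is a vector/range of values to scan.
function chunked_paramscan(parameters, initialize;
                           split_if_needed = true, chunk_size = 100, kwargs...)
    split_if_needed || return paramscan(parameters, initialize; kwargs...)
    # One scalar-valued dict per parameter combination; the dicts themselves
    # are cheap, it is the per-chunk results that stay bounded in memory.
    singles = vec([Dict(zip(keys(parameters), combo))
                   for combo in product(values(parameters)...)])
    for (i, chunk) in enumerate(partition(singles, chunk_size))
        # A dict of scalars should make paramscan run exactly one combination,
        # so each chunk holds at most `chunk_size` results before flushing.
        results = [paramscan(p, initialize; kwargs...) for p in chunk]
        CSV.write("adata_chunk_$i.csv", reduce(vcat, first.(results)))
        CSV.write("mdata_chunk_$i.csv", reduce(vcat, last.(results)))
    end
    return nothing
end
```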

Describe alternatives you've considered
The original method I used was pre-chunking the data, but since this wasn't necessary for every experiment, it typically resulted in more CSVs than I wanted. With this solution, the data is only chunked when needed. I don't know of a suitable alternative for sufficiently large models, or models with sufficiently large search spaces.

I have some code I can provide in a PR if this seems like it would be of value to the project; if not, feel free to close the issue and I'll keep the changes for my own use cases.

Best,
John

@Tortar
Member

Tortar commented Jul 5, 2023

Hi! If I'm understanding correctly, the problem in your case is that the list the dictionary expands into is too big (this is what happens behind the scenes in paramscan with the dictionary). If so, I think there is a simpler solution than what you propose: add the option to use a lazy iterator over the ranges in the dict instead of a list (or make that the default). I think this should be enough to solve the problem; let me know if I'm misunderstanding something. Either way, it seems like a good idea to do something about this!
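Something along these lines (a sketch only; `lazy_dict_list` is just an illustrative name, not the current internals):

```julia
using Base.Iterators: product

# Yield one Dict per parameter combination on demand, instead of collecting
# every combination into a Vector of Dicts up front.
lazy_dict_list(parameters) =
    (Dict(zip(keys(parameters), combo)) for combo in product(values(parameters)...))

# Usage: combinations are produced one at a time, so memory stays constant
# in the number of combinations.
params = Dict(:noise => 0.0:0.1:1.0, :speed => [1.0, 2.0, 5.0])
for p in lazy_dict_list(params)
    # run the model for this single combination, then let `p` be discarded
end
```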

@Tortar Tortar added the enhancement (New feature or request) and data related with datacollection labels on Jul 6, 2023
@Datseris
Member

Datseris commented Jul 7, 2023

Hm, I am also not sure whether I have understood the problem: is it that the number of generated dictionaries is too large, or that the final DataFrames occupy too much memory because they have too many columns with different parameters? Since you mentioned you already have a code solution, @johnabs, perhaps you can paste it here; that will elucidate things.
