WIP: Add DataFrames as a weak dependency #3441

odow · 2023-08-03T00:46:30Z

I started working on this, but I don't know whether I like it or not.

I think I prefer the approach of explicitly constructing a data frame and then adding a new column that is a vector of variables. It's more explicit and gives the user control over what to call the new column. The current approach would likely also be used naively to construct sparse sets, which doesn't fix the loop problem.
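As a rough sketch of the "explicit data frame plus a new column" approach described above, assuming JuMP and DataFrames are both installed: the index values and the column name `x` are made up for illustration, and this is not code from the PR.

```julia
using JuMP, DataFrames

model = Model()

# Hypothetical sparse index set, stored as rows of a data frame.
# Each row is one (i, j) pair for which a variable should exist.
df = DataFrame(i = [2, 3, 4], j = [1, 2, 1])

# Add one decision variable per row. The user chooses the column name.
df.x = @variable(model, x[1:nrow(df)])
```

The data frame stays the single source of truth for which index tuples exist, so later constraints can be built with joins and group-bys on `df` rather than nested loops over the full index ranges.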

codecov · 2023-08-03T01:01:42Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (a325eb6) 98.01% compared to head (9164ea7) 98.01%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #3441   +/-   ##
=======================================
  Coverage   98.01%   98.01%           
=======================================
  Files          36       38    +2     
  Lines        5039     5050   +11     
=======================================
+ Hits         4939     4950   +11     
  Misses        100      100

Files Changed	Coverage Δ
ext/JuMPDataFramesExt.jl	`100.00% <100.00%> (ø)`
ext/test_DataFrames.jl	`100.00% <100.00%> (ø)`

odow · 2023-08-11T02:43:06Z

ext/test_DataFrames.jl

+function test_dimension_data_variable()
+    model = Model()
+    @variable(model, x[i = 2:4, j = 1:2], container = DataFrame)
+    @variable(model, y[i = 2:4, j = 1:2; isodd(i+j)], container = DataFrame)


The more I think about this, the less I like it. I prefer the explicit "add a new column to an existing dataframe" approach to the "construct a dataframe from these indices" approach. The latter is just going to encourage the same nested loops as the GAMS example.

blegat · 2023-08-11T08:36:47Z

Ideally, we should parse the condition of the JuMP.Containers.NestedIterator into an expression tree of conditions joined by && and ||, and then put it into conjunctive normal form: c1 && c2 && ... && cm. Then, for each condition, we should check:

Which indices does it involve?
Is it an equality lhs == rhs?

If it involves only a subset of the indices, then we should filter with this condition earlier, and not in the most nested level of the for-loops.
If it is an equality, we could do the indexing ourselves (as in https://discourse.julialang.org/t/is-it-possible-to-incorporate-the-relational-algebra-technique-into-jump-in-the-future-to-expedite-model-generation/101343/4?u=blegat) or use DataFrames, but that's kind of a heavy dependency for just this.
Now, what's a bit tricky is that you could have [i = I, j = J, k = K, l = L; f(i, j, k) == g(j, k, l)]. That means you need to build the set of (i, j, k) and the set of (j, k, l), and then index them with a Dict to take the intersection. One could wonder whether it wouldn't be better to just do the 4-nested for-loop instead of doing a 3-nested for-loop twice. However, I think that unless K has only 2 elements, it is always better to do two 3-nested for-loops than one 4-nested for-loop, so doing this intersection with dictionaries is always better.
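The dictionary-based intersection described above could look roughly like the following. This is a minimal sketch in plain Julia, not JuMP's implementation: the index sets `I`, `J`, `K`, `L` and the functions `f` and `g` are hypothetical placeholders.

```julia
# Hypothetical index sets and functions, chosen only for illustration.
I, J, K, L = 1:10, 1:10, 1:10, 1:10
f(i, j, k) = i + j + k
g(j, k, l) = 2 * l + j - k

# Naive approach: every (i, j, k, l) combination is tested.
naive = [(i, j, k, l) for i in I, j in J, k in K, l in L if f(i, j, k) == g(j, k, l)]

# Step 1: a 3-nested loop over (j, k, l). Group the values of l by the
# join key (j, k, g(j, k, l)).
lookup = Dict{NTuple{3,Int},Vector{Int}}()
for j in J, k in K, l in L
    key = (j, k, g(j, k, l))
    push!(get!(() -> Int[], lookup, key), l)
end

# Step 2: a 3-nested loop over (i, j, k). Look up the matching values of l
# using the key (j, k, f(i, j, k)) instead of looping over all of L.
no_match = Int[]
joined = NTuple{4,Int}[]
for i in I, j in J, k in K
    for l in get(lookup, (j, k, f(i, j, k)), no_match)
        push!(joined, (i, j, k, l))
    end
end
```

Both approaches produce the same set of tuples, but the second one does work proportional to the sizes of the two 3-dimensional products plus the number of matches, rather than to the full 4-dimensional product.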

In some cases, it might be better to run the nested for-loops in an order different from the order of the indices given by the user, but that can be left as future work since I don't think the best order is easy to find. But detecting the two things I mentioned above and improving the iteration shouldn't be too complicated, and it would solve the issue in the GAMS example without the user having to do anything. The SparseAxisArray would automatically build itself efficiently.

odow · 2023-08-14T02:38:10Z

> detecting the two things I mentioned above and improving the iteration shouldn't be too complicated

This is getting a bit too magical. It also assumes that constructing the index sets and a dictionary solves the performance problem.

I don't think we need to "fix" the GAMS example. We need to encourage people to rethink how they view models. Nested for-loops are not the best way to conceptualize the IJKLM model.

blegat · 2023-08-14T08:35:25Z

> This is getting a bit too magical. It also assumes that constructing the index sets and a dictionary solves the performance problem.

It makes the complexity depend linearly on the number of tuples satisfying the condition. It might be surprising to the user that we build the SparseAxisArray so naively when they gave us, in the macro, all the information necessary to take its sparsity into account.

Following this PR, how would the user solve the IJKLM issue?

odow · 2023-08-14T08:37:13Z

> Following this PR, how would the user solve the IJKLM issue?

They wouldn't. That's why I said I don't really like this PR.

odow · 2023-08-14T19:36:00Z

Closing for now. But I'll leave #3438 open, and I'll add @blegat's comment to the issue. He has some CS things in mind.

Add DataFrames as a weak dependency

9164ea7

odow closed this Aug 14, 2023

odow deleted the od/dataframes branch August 14, 2023 19:36

odow mentioned this pull request Aug 14, 2023

Improve support for relational algebra #3438

Open
