New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Introduce alternative list sorting for scalar and order comparable elements #2609
base: nightly
Are you sure you want to change the base?
Conversation
That's a pretty cool PR! To make the review easier (1000 lines of code to read in one go is a lot), do you think we could merge
first or that would be incompatible? If we merge those three and then rebase this branch on top of nightly afterwards, that might make it more managable. Are there any conflict in the design philosophy between those and yours? |
All of the dependent PRs landed in today's nightly. Do you mind rebasing like Gabriel suggested? |
817160f
to
630e5f8
Compare
The branch is rebased, still quite a large PR :) What I can offer is to remove Radix sort from this PR and introduce it in the next one. |
@mzaks I'm in favour of removing Radix from this PR. I also think the Radix needs a lot more comments/design doc/implementation doc/reference, and |
Also, can we make functions other than |
My Idea was, that there is the Regarding the old std lib sort functions. I did a benchmark https://github.com/mzaks/mojo-sort/blob/main/benchmark.mojo and the old sort function performs worse through out the test. Radix sort performs incredible when the list size is higher. Hence why I added Radix sort from the get go, but now I removed it. I agree that it adds lots of complexity to this PR. |
I agree that we can and probably should expose them. I only suggested to use the prefix as indicator for unstable APIs (signature may change, they should live in
I knew that you did the benchmark, I don't question your result at all, and the Redix sort is a great addition. I meant keeping the old implementation to test the benchmarking infrastructure in the Mojo repo (CI etc.), and use the future PR which removes them as test bed. But, of course, these are merely my own views and suggests. As long as the team is ok with your approach, I'm fine with them too. Thank you for the great work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!!! Thanks so much for pushing on this.
I left a few minor comments that I'd like to see addressed before we land this.
stdlib/src/builtin/sort.mojo
Outdated
|
||
Args: | ||
v: Input vector to sort. | ||
list: The list of the order comparable elements which will be sorted inpace. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quibble s/inpace/in-place
0e93b77
to
9559893
Compare
The comments are addressed and the branch is rebased. |
Nice, thank you! Would you like to add a changelog entry for your awesome work here as well? Sorry I forgot to mention this yesterday. |
There are few places with typos / incosistency:
Everywhere should be : "in-place" |
We can use some more meaningful name than "D". "D" is both used here for DType and CollectionComparableElement Maybe just: "type"? |
stdlib/src/builtin/sort.mojo
Outdated
|
||
Args: | ||
v: Input integer vector to sort. | ||
list: The list of the order comparable elements which will be sorted inpace. | ||
low: Int value identifying the lowes index of the list section to be sorted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*lowest
07d2bf5
to
0770b9e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more nit. Changelog entry is not very consistent with the rest, but overall LGTM
I rephrased it a bit :) |
|
||
fn _quicksort[ | ||
type: CollectionElement, cmp_fn: fn (type, type) capturing -> Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question Previously, this quicksort function allowed specifying a comparator; we use this functionality internally for sorting by greater_than
comparator for example instead of the default less than comparator. Do you mind adding this level of flexibility and plumbing it through the various sorting algorithms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There a wrinkle with that, currently all sort algorithms are comparison based so it is not a problem. But Radix sort is not comparison based, hence it would mean we would have an issue to call it from the sort function, if sort function allows configurable comparison operator. I guess if the main motivation is for sorting in descending order, then we can introduce an ascending/descending parameter, instead of a sorting function. This should work for radix sort as well. If there are other motivations, then I guess we can go with operator and avoid calling radix sort from the sort function. We could still introduce it as a standalone function in the module though.
Let me know what you prefer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. We have a couple of use cases internally for calling quick_sort
with various comparators such as >=
, <=
, or ==
on the elements when doing the comparisons. I don't think we should go the ascending/descending route since it's not much more work to be generic over any comparison function. Perhaps we should add an explicit sort_by
function that allows passing this generic functor working on comparison of two ComparableCollectionElement
values? Something like https://doc.rust-lang.org/std/primitive.slice.html#method.sort_by for example. With this particularly, we can decide whether we want to push the function (and default it to a generic less than function for sorting purposes) in the sort
API or have it be a different named function (such as the sort_by
).
I think we should go in this general direction and then think about various things as follow-ups — do we want various sorting APIs or have a one-stop-shop like in C++ here? Of note, you mentioned non-comparison based sorting algorithms wouldn't be suitable for this sort of function signature.
What do you think is best? Also CC @laszlokindrat @rparolin in case you have thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is, how generic do we want the sort
to be.
The current internal functions signatures are:
[type: AnyRegType, cmp_fn: _cmp_fn_type](array: Pointer[type])
and
[type: CollectionElement, cmp_fn: fn (type, type) capturing -> Bool](array: UnsafePointer[type])
which are called from public signatures like:
sort(inout buff: Pointer[Int], len: Int):
fn sort[type: DType](inout buff: Pointer[Scalar[type]], len: Int):
fn sort(inout v: List[Int]):
fn sort[type: DType](inout v: List[Scalar[type]]):
and
fn sort[type: CollectionElement, cmp_fn: fn (type, type) capturing -> Bool](inout v: List[type]):
The state of the sort module actually changed a lot since my PR was opened so I think it is best to take a step back. I will make this PR less aggressive I will just introduce a new sort
function with following signature:
fn sort[type: ComparableCollectionElement](inout v: List[type]):
So the PR will introduce the initial concept of generic comparison sorting and then we can build upon that.
return right | ||
@always_inline | ||
@parameter | ||
fn _partition(low: Int, high: Int) -> Int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion Do you mind lifting this out of _quick_sort
as a free function? We use _partition
internally and it would be nice to preserve that level of flexibility for custom sorting things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JoeLoser I just wanted to lift the _partition
function to upper level, but then I looked through the previous code again and I noticed that the top level partition
and _partition
functions expect cmp_fn: _cmp_fn_type
as a parameter, which I think is quite specific for the builtin
sort module. Or rather, we need to decide if we want a complete set of functions which expect the cmp_fn: _cmp_fn_type
parameter. Currently I implemented a set of functions for Scalar
list elements (mainly because allows some performance optimisations) and ComparableCollectionElement
list elements.
If we want to keep
alias _cmp_fn_type = fn[type: AnyRegType] (type, type) capturing -> Bool
It will make sense to introduce a third set of functions just with this signature. To be honest I would be agains it as it increases maintenance complexity for a very specific use case, but I am ok with going along with it, if it important internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to my other comment, I think we should introduce a generic range partition that works on ComparableCollectionElement
I suppose. I'm imagining something similar to std::ranges::partition
from C++ as seen here. What do you think about that? I think partition as a general algorithm (outside of just the sorting use case) is useful and it'll evolve over time.
df7ef95
to
3227d3d
Compare
!sync |
…ements Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
Signed-off-by: Maxim Zaks <maxim.zaks@gmail.com>
3227d3d
to
a6cff31
Compare
This is a big PR which addresses #2390 and makes following PRs:
#2517
#2378
#2409
obsolete.
It replaces the recently introduced
builtin/sort
module. I suppose that the replacement should not be problematic as the legacy sort module was introduced due to discussion in #2390 and is not used anywhere in the code base as far as I can tell.