Use cython to speed-up wrap angle #16326
base: main
Conversation
Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.
@neutrinoceros How can I run the same benchmarks as you did in #16222?
you need to apply the […] label
Could you add it? I don't have the permissions to modify labels
Neither do I, actually 😅
The test failure looks unrelated at first glance... any idea? https://github.com/astropy/astropy/actions/runs/8798691509/job/24146244116?pr=16326
That is a merged PR, there is nothing to keep open. Was there a typo?
Sorry, I meant to tag this issue here: #13479, which is just the last two digits swapped. Funny that they turn out to be related issues / PRs.
Needs: […]
Thanks!
We can skip those tests on macOS or downgrade that job to older mac x86 runners. But we can discuss that in a different issue.
Re: […]
Hmm, is this related, or is the VM just being flaky?
The uncertainty is 50% of the value...
Now I am very confused... I also checked creating an […]
Scratch that, I must have fooled myself somehow with switching branches and editable installs...
    cdef dtype wrap_angle_floor
    wrap_angle_floor = wrap_angle - period

    if dtype is float or dtype is double:
Just out of curiosity as I haven't seen the cython ufunc stuff before, are these checks going to be done for every single value? Or does this get optimised out?
Checks on fused dtypes are essentially like if constexpr in C++: the function is compiled separately for each member of the fused type.
I wonder if the cython wrapping function could be more generally useful in other parts of astropy - for example for folding time series. Perhaps we should think about whether it can be written with more general variable names and placed in e.g. utils or similar, for use in different sub-packages? (only if that generalization does not impact performance)
I like that it makes the readonly case an explicit error, which is something I missed when I last worked on this.
I'm assuming that perf regressions aren't real (for now), so I'm ignoring this aspect in this review.
astropy/coordinates/angles/core.py
Outdated
    wrap_angle = wrap_angle.to_value(self.unit).astype(
        self_angle.dtype, copy=COPY_IF_NEEDED
    )
    if not self_angle.data.readonly:
I would prefer to write this logic in a flatter style, such as

    if self_angle.data.readonly and needs_wrapping(self_angle, wrap_angle, a360).any():
        raise ValueError(...)
    wrap_at(...)
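A runnable sketch of this flatter control flow on a plain NumPy array. Note that needs_wrapping and wrap_at_inplace are hypothetical names following the suggestion, not astropy API, and the writeable flag stands in for astropy's self_angle.data.readonly check:

```python
import numpy as np

def needs_wrapping(values, wrap_angle, period):
    # True where a value lies outside the half-open interval
    # [wrap_angle - period, wrap_angle)
    return (values < wrap_angle - period) | (values >= wrap_angle)

def wrap_at_inplace(values, wrap_angle, period=360.0):
    mask = needs_wrapping(values, wrap_angle, period)
    # flat control flow: fail fast only when a read-only array would
    # actually need to change
    if not values.flags.writeable and mask.any():
        raise ValueError("cannot wrap angles of a read-only array")
    if mask.any():
        values[mask] -= period * np.floor(
            (values[mask] - (wrap_angle - period)) / period
        )
    return values

angles = np.array([-10.0, 10.0, 370.0])
print(wrap_at_inplace(angles, 360.0))  # [350.  10.  10.]
```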
Nice suggestion, will change.
    np.import_array()

    ctypedef fused dtype:
dtype feels unexpected here since it has a different meaning in numpy, so I'd suggest using something more idiomatic in templates, like T:

    - ctypedef fused dtype:
    + ctypedef fused T:
This is how dtype is used in other astropy cython code, e.g. here: https://github.com/astropy/astropy/blob/main/astropy/stats/_stats.pyx
well the file you linked uses it to fuse numpy data types, which in my view is a little different. But in any case this is just a nit. If nobody else finds this confusing feel free to ignore this suggestion.
    if (angle >= wrap_angle_floor) and (angle < wrap_angle):
        return angle

    cdef int wraps
Is int guaranteed to be 4 bytes long in Cython? I'm worried that using platform-dependent-sized types might lead to very subtle errors in the distant future.
It is the native C type, but I really don't expect issues here. It would mean that a value wraps more often than 2 * pi * INT_MAX, which seems impossible...
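A quick back-of-the-envelope check of that bound, assuming a 32-bit C int:

```python
import numpy as np

# For the wrap counter to overflow, an angle would have to wrap more than
# INT_MAX times, i.e. exceed roughly 2 * pi * 2**31 radians.
bound_rad = 2 * np.pi * np.iinfo(np.int32).max
print(f"{bound_rad:.3e}")  # ~1.349e+10 radians
```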
Good point. I've been bitten by this kind of problem before, which is why I may have jumped on this one a bit quickly. Agreed that it's unlikely enough that it doesn't matter in practice!
    if NUMPY_LT_2_0:
        # Ensure ndim>=1 so that comparison is done using the angle dtype.
        self_angle = self_angle[np.newaxis]
    a360 = np.asarray(u.degree.to(self.unit, 360.0), dtype=self_angle.dtype)
I'm not 100% sure that np.asarray would avoid a copy in scenarios where it's not needed. Is this any better?

    - a360 = np.asarray(u.degree.to(self.unit, 360.0), dtype=self_angle.dtype)
    + a360 = u.degree.to_value(self.unit, 360.0).astype(self_angle.dtype, copy=COPY_IF_NEEDED)
It is the recommended way as of the numpy 2.0 migration guide: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword
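For reference, a small demonstration of the copy-if-needed semantics. Here COPY_IF_NEEDED is redefined locally as a stand-in for astropy's compat constant (assumed to be None on numpy >= 2 and False on older numpy):

```python
import numpy as np

# Stand-in for astropy's COPY_IF_NEEDED compat constant (assumption):
# numpy >= 2 uses copy=None for "copy only if needed", older numpy uses
# copy=False with the same meaning.
COPY_IF_NEEDED = None if np.lib.NumpyVersion(np.__version__) >= "2.0.0" else False

a = np.arange(3, dtype=np.float64)
same = a.astype(np.float64, copy=COPY_IF_NEEDED)  # dtype matches: no copy
cast = a.astype(np.float32, copy=COPY_IF_NEEDED)  # dtype differs: copies
print(same is a, cast is a)  # True False
```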
The asarray is needed to make it something that has astype in the first place. Also, units don't have to_value.
    In [7]: u.degree.to(u.rad, 360.0)
    Out[7]: 6.283185307179586

    In [8]: u.degree.to(u.rad, 360.0).astype(np.float32)
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Cell In[8], line 1
    ----> 1 u.degree.to(u.rad, 360.0).astype(np.float32)

    AttributeError: 'float' object has no attribute 'astype'
Also: this will always copy, since we convert a Python float to the desired dtype.
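Indeed, starting from a plain Python float there is no way around an allocation; a quick check with plain NumPy (no astropy needed):

```python
import numpy as np

x = 6.283185307179586  # a plain Python float, like the result of u.degree.to(...)
# converting a Python scalar always allocates a new 0-d array
arr = np.asarray(x, dtype=np.float32)
print(type(arr).__name__, arr.dtype, arr.ndim)  # ndarray float32 0
```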
oh my, 3 mistakes in one line of suggestion, this must be my new record 🎉
(seriously though, thanks for checking, and sorry I wasn't careful enough here)
Please add a […]
I'm no expert with […]
For fun: if we don't think we need the fused dtypes, we could also just use a plain C extension which compiles and runs faster: https://gist.github.com/astrofrog/136d15ebd650b4465060e314fccd7988 - it runs about twice as fast (as Cython) for large arrays and about 10x faster for small arrays. In practice we'd actually need to do the whole thing with the outer/inner loop to be able to deal with non-contiguous arrays, endianness, etc., but I just thought I'd try it out as an experiment (I did not actually write a single line of that module; it was an experiment with AI-generated code!). If this really is a bottleneck in coordinates, it might be worth writing the C ourselves - in a number of projects I've worked on before, I've found you can typically get 2x better performance when dealing with large arrays. To be clear though, we can still merge this PR with Cython; this is more of an "in future, if it's still a bottleneck" comment.
Thanks @astrofrog, I remember dabbling with writing a ufunc in C by hand a year or so ago, but ran into all the edge cases we test for.
I am happy to give it a try again, but I am no expert in writing C numpy code by hand. At the time I asked @mhvk if we could do a pair-programming session, but we never followed up on it.
I have looked at the cython implementations and find them quite simple, which is good, but they still make the code less clear and add some 100 kiB of binary library. I wonder whether the speed benefits are actually worth it for the larger question of wanting to speed up coordinate transformations, etc. In particular, from #13479, just initiating an […]
Description
This is a follow-up to #13510, #13497 and #16222 to further improve on #13479.
I now use cython.ufunc to get a large speedup: factors of 2 to 3 for larger arrays, and faster in all cases I tested compared to current main (which already includes the improvements of #16222). I'll open a similar PR for the checks in Latitude, which remain a bottleneck.
I'd also suggest keeping ~~#13497~~ #13479 open for the time being; it's still true that these functions are bottlenecks, even with the recent improvements.
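For readers who don't want to parse the Cython fragments above, here is a pure-Python sketch of the per-element kernel, reconstructed from the diff snippets in this thread; the floor-based wrap count is an assumption about the full implementation:

```python
import math

def wrap_angle_scalar(angle, wrap_angle, period=360.0):
    # fast path: already inside the half-open interval
    # [wrap_angle - period, wrap_angle)
    wrap_angle_floor = wrap_angle - period
    if wrap_angle_floor <= angle < wrap_angle:
        return angle
    # otherwise shift by a whole number of periods (the Cython version
    # stores this count in a C int, as discussed above)
    wraps = math.floor((angle - wrap_angle_floor) / period)
    return angle - wraps * period

print(wrap_angle_scalar(370.0, 360.0))  # 10.0
print(wrap_angle_scalar(-10.0, 360.0))  # 350.0
```

The actual PR compiles this kernel once per member of the fused dtype and exposes it as a NumPy ufunc, which is where the factor 2-3 speedup comes from.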