-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-39129 [Python] pa.array: add check for byte-swapped numpy arrays inside python objects #41549
GH-39129 [Python] pa.array: add check for byte-swapped numpy arrays inside python objects #41549
Conversation
ca290d8
to
13b248a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the contribution!
python/pyarrow/tests/test_array.py
Outdated
@@ -3896,3 +3896,26 @@ def test_list_view_slice(list_view_type): | |||
j = sliced_array.offsets[1].as_py() | |||
|
|||
assert sliced_array[0].as_py() == sliced_array.values[i:j].to_pylist() == [4] | |||
|
|||
|
|||
@pytest.mark.parametrize('numpy_dtype', ['>u2', '>i4', '>f8']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do have a nightly test runner on a big endian machine, so in that case this hardcoded test will fail?
So either we have to skip this test in that case, or rewrite the test to work in both cases (numpy has functionality to byteswap the data and change the dtype to the opposite byteorder, so to create non-native data regardless of the platform you are running the test on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it seems we no longer have such CI coverage because our s390x build was dropped when Travis CI support was dropped ..
But still, we have regularly reports from people testing on big endian machines, so at least adding a skipif
based on sys.byteorder
would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche thank you, it is a good point. I've fixed the test to always use non-native byte order
13b248a
to
f1b92d9
Compare
f1b92d9
to
baeafc1
Compare
baeafc1
to
e4c612b
Compare
@jorisvandenbossche could you please look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect, thank!
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit fd84ec0. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…rays inside python objects (apache#41549) ### What changes are included in this PR? This PR introduces a check to verify if the dtype of the input numpy array is byte-swapped. If it is, a not-implemented exception is raised. This precaution prevents the data from being cast incorrectly as if it were in the correct byte order, which would lead to wrong data values. ### Are these changes tested? I added a new test to check if not-implemented exception is raised - for both old (primitive types) and new (composed types) code. ### Are there any user-facing changes? No changes in API, but old code which gave incorrect results now would fail with a not-implemented exception * GitHub Issue: apache#39129 Authored-by: Konstantin Malanchev <hombit@gmail.com> Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
What changes are included in this PR?
This PR introduces a check to verify if the dtype of the input numpy array is byte-swapped. If it is, a not-implemented exception is raised. This precaution prevents the data from being cast incorrectly as if it were in the correct byte order, which would lead to wrong data values.
Are these changes tested?
I added a new test to check if not-implemented exception is raised - for both old (primitive types) and new (composed types) code.
Are there any user-facing changes?
No changes in API, but old code which gave incorrect results now would fail with a not-implemented exception