Regression: pandas.read_parquet hangs when using filprofiler 2022.09.0 #415

Open
kdebrab opened this issue Sep 20, 2022 · 4 comments
Labels
bug (Something isn't working), NEXT

Comments

kdebrab commented Sep 20, 2022

I hope the following is sufficient for reproducing the issue.

Writing with df.to_parquet works fine; the code hangs when reading the data back with pd.read_parquet. The parquet engine used is pyarrow. No error is raised; the Docker container simply hangs forever.

python: 3.10.7
OS: Linux
pandas: 1.4.4
numpy: 1.23.3
pyarrow: 9.0.0

Disabling filprofiler resolves the issue (I use the API, enabled conditionally via an environment variable, as documented in https://pythonspeed.com/fil/docs/api.html#using-the-python-api). Reverting to filprofiler 2022.06.0, with everything else exactly the same, also resolves the issue.
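
A minimal sketch of the setup described above might look like the following. It is an illustration only, not the script from this report: the environment variable name `USE_FIL`, the output directory `fil-result`, the helper names, and the synthetic DataFrame are all assumptions. The only Fil call used is `profile(callable, path)` from `filprofiler.api`, and the process still has to be launched under Fil as the linked API docs describe.

```python
import os
import tempfile
from pathlib import Path


def run_maybe_profiled(fn, env_var="USE_FIL", out_dir="fil-result"):
    """Call fn(); wrap it in Fil's profile() only when env_var is "1".

    env_var and out_dir are hypothetical names chosen for this sketch.
    """
    if os.environ.get(env_var) == "1":
        # Imported lazily so the script also runs where Fil is not installed.
        from filprofiler.api import profile

        return profile(fn, out_dir)
    return fn()


def parquet_round_trip():
    """Write a small synthetic DataFrame with pyarrow and read it back."""
    # Imported lazily so the helper above has no pandas dependency.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"a": np.arange(1000), "b": np.random.rand(1000)})
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.parquet"
        # Reported to work with both filprofiler versions.
        df.to_parquet(path, engine="pyarrow")
        # Reported to hang under filprofiler 2022.09.0.
        return pd.read_parquet(path, engine="pyarrow")
```

Calling `run_maybe_profiled(parquet_round_trip)` with `USE_FIL=1` set exercises the profiled path; without it, the round trip runs unprofiled. Synthetic data like this may not trigger the hang, so the original parquet file may still be needed.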

itamarst added the bug and NEXT labels Sep 20, 2022
@itamarst
Collaborator

Thanks for the detailed bug report. I will try to reproduce, and if I fail I will ask for more details.

@itamarst
Collaborator

Hi, I am unable to reproduce with a random parquet file I have lying around. Could you share a minimal reproducer if you can make one? Python script + parquet file, ideally.

@itamarst
Collaborator

@kdebrab just checking again, would love to get this fixed.

@itamarst
Collaborator

@kdebrab could you provide a reproducer?
