OSError: [Errno 24] Too many open files #6877
Comments
`ulimit -n 8192` can solve this problem
Would there be a systematic way to do this? The data loading is part of the MTEB library
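One systematic option (a sketch only — `raise_open_file_limit` is a hypothetical helper, not part of MTEB or `datasets`) is to raise the process's soft file-descriptor limit from Python with the standard-library `resource` module before loading:

```python
import resource

def raise_open_file_limit(target: int = 8192) -> int:
    """Raise the soft RLIMIT_NOFILE up to `target`, capped at the hard limit.

    Hypothetical helper; Unix-only (the `resource` module).
    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may not exceed its hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        soft = new_soft
    return soft
```

Calling this once before `load_dataset` would have the same effect as running `ulimit -n 8192` in the shell, but scoped to the current process only.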
I think we could modify the `_prepare_split_single` function
fix bug huggingface#6877 due to `f` becoming invalid after yield
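As a rough illustration of that failure mode (a minimal sketch with made-up names, not the actual `_prepare_split_single` code or the linked fix): a generator should not depend on an open file object `f` after a `yield`, because the handle may no longer be valid when the generator resumes. Finishing all I/O before yielding removes that dependency:

```python
def rows_from_files(paths):
    """Yield lines from many files without holding a handle open across yields.

    Sketch of the general pattern: all reads on `f` happen before any yield,
    so nothing can invalidate `f` while the generator is suspended.
    """
    for path in paths:
        with open(path) as f:
            lines = f.readlines()  # finish reading while f is guaranteed valid
        for line in lines:         # f is already closed here; yielding is safe
            yield line.rstrip("\n")
```

The trade-off is buffering one file's contents in memory; for formats streamed record by record, the equivalent move is to copy each record out of `f` before the `yield` rather than yielding objects that still reference the open handle.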
I fixed it with #6893, feel free to re-open if you're still having the issue :)
Thanks a lot!
Describe the bug
I am trying to load the 'default' subset of the following dataset which contains lots of files (828 per split): https://huggingface.co/datasets/mteb/biblenlp-corpus-mmteb
When trying to load it using the `load_dataset` function I get the error `OSError: [Errno 24] Too many open files`.

I looked up the maximum number of open files on my machine (Ubuntu 24.04) and it seems to be 1024, but even when I try to load a single split (`load_dataset('mteb/biblenlp-corpus-mmteb', split='train')`) I get the same error.

Steps to reproduce the bug
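The error class itself can be reproduced in isolation, without downloading the 828-file dataset — a hypothetical sketch (`hit_open_file_limit` is a made-up name) that lowers the process's soft file-descriptor limit via the standard-library `resource` module and opens temporary files until the OS refuses:

```python
import resource
import tempfile

def hit_open_file_limit(soft_limit: int = 128) -> OSError:
    """Lower the soft RLIMIT_NOFILE, open files until the OS refuses,
    and return the resulting OSError (EMFILE, errno 24 on Linux).

    Hypothetical sketch; restores the original limit before returning.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    low = soft_limit if hard == resource.RLIM_INFINITY else min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (low, hard))
    handles = []
    try:
        while True:
            handles.append(tempfile.TemporaryFile())  # one fd per iteration
    except OSError as err:
        return err
    finally:
        for h in handles:
            h.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

This is the same OS condition that loading hundreds of split files at once runs into when every file is held open simultaneously.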
Expected behavior
Load the dataset without error
Environment info
- `datasets` version: 2.19.0
- `huggingface_hub` version: 0.23.0
- `fsspec` version: 2024.3.1