The second solution also shows where the bug is. I suggest that the hashing functions always use only the original `data_files` argument, not the `DataFilesDict` that is built after connecting to the server.
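Here is a minimal sketch of the suggested fix. It assumes the cache key can be derived from a JSON-serializable form of the user's original `data_files` argument, computed without any I/O. The helper names `normalize_data_files` and `config_key` are hypothetical and are not part of the `datasets` API. The str/list-to-`"train"` mapping is assumed to mirror how `load_dataset()` assigns a bare `data_files` value to the train split.

```python
import hashlib
import json
from collections import abc
from typing import Dict, List, Mapping, Sequence, Union

# Hypothetical type for the user-supplied argument.
DataFiles = Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]


def normalize_data_files(data_files: DataFiles) -> Dict[str, List[str]]:
    """Turn the original argument into {split: [patterns]} without contacting any server."""
    # Assumption: a bare string or list is treated as the "train" split.
    if isinstance(data_files, str):
        return {"train": [data_files]}
    if isinstance(data_files, abc.Mapping):
        return {
            split: [files] if isinstance(files, str) else list(files)
            for split, files in data_files.items()
        }
    return {"train": list(data_files)}


def config_key(data_files: DataFiles) -> str:
    """Hash only the original argument, so the key is identical online and offline."""
    canonical = json.dumps(normalize_data_files(data_files), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

For example, `config_key("data/train.csv")` and `config_key({"train": ["data/train.csv"]})` yield the same key because both normalize to the same mapping. Sorting the JSON keys keeps the hash independent of dict insertion order.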
Just tested it. It doesn't work, because of exactly the problem I described above: the hash of the dataset config is different.
The only difference in the error is the reason it gives for not being able to connect to HuggingFace (now it's 'offline mode is enabled').
Describe the bug
I want to be able to use a cached dataset from HuggingFace even when I have no Internet connection (or when the HuggingFace servers are down, or my company has network issues).
The reason I can't use it: the `data_files` argument of `datasets.load_dataset()` gets updated from the server before the hash used for caching is calculated. As a result, when I run the same code with and without an Internet connection, I get different dataset configuration directory names.
Steps to reproduce the bug
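The mismatch described above can be modeled without any network access. This is a toy illustration, not the actual reproduction: `resolve` is a hypothetical stand-in for whatever `load_dataset()` does when it builds a `DataFilesDict`, and the resolved URL format and revision string are made up. The only point is that a key computed from resolved output depends on whether the server was reachable.

```python
import hashlib
import json
from typing import Dict, List


def resolve(data_files: Dict[str, List[str]], online: bool) -> Dict[str, List[str]]:
    """Hypothetical resolver: online it pins each file to a server revision,
    offline it can only return the patterns it was given."""
    if online:
        revision = "abc123"  # made-up commit id, as if returned by the server
        return {
            split: [f"hf://datasets/user/repo@{revision}/{f}" for f in files]
            for split, files in data_files.items()
        }
    return {split: list(files) for split, files in data_files.items()}


def hash_of(obj: Dict[str, List[str]]) -> str:
    """Short SHA-256 of a JSON-serializable mapping (keys sorted for stability)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


user_arg = {"train": ["data/train.csv"]}

# Key computed from the resolved files, as this issue describes.
online_key = hash_of(resolve(user_arg, online=True))
offline_key = hash_of(resolve(user_arg, online=False))
print(online_key == offline_key)  # False: two different config directory names
```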
Expected behavior
When running without an Internet connection, the loader should be able to load the dataset from the cache.
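One way the expected behavior could be structured, as a sketch under stated assumptions: the cache is modeled as a plain dict from config key to a cached directory path, `fetch_remote` is a hypothetical callable that raises `ConnectionError` when the server is unreachable, and the key is computed from the original `data_files` only. None of these names come from `datasets`, and the `/tmp/cache/` path is illustrative.

```python
import hashlib
import json
from typing import Callable, Dict, List

DataFiles = Dict[str, List[str]]


def config_key(data_files: DataFiles) -> str:
    """Cache key derived only from the user's argument (no network access)."""
    payload = json.dumps(data_files, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def load_with_fallback(
    data_files: DataFiles,
    cache: Dict[str, str],
    fetch_remote: Callable[[DataFiles], str],
) -> str:
    """Try the server first; if it is unreachable, reuse the cached config dir."""
    key = config_key(data_files)
    try:
        path = fetch_remote(data_files)
    except ConnectionError:
        if key in cache:
            return cache[key]
        raise  # nothing cached for this configuration, so surface the error
    cache[key] = path  # remember where this configuration was materialized
    return path


def online(data_files: DataFiles) -> str:
    # Hypothetical download that materializes files under a key-named directory.
    return "/tmp/cache/" + config_key(data_files)


def offline(_: DataFiles) -> str:
    raise ConnectionError("offline mode is enabled")


cache: Dict[str, str] = {}
files = {"train": ["data/train.csv"]}
first = load_with_fallback(files, cache, online)    # populates the cache
second = load_with_fallback(files, cache, offline)  # served from the cache
print(first == second)  # True
```

Because the key never depends on what the server returns, the offline call finds the entry written by the online call.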
Environment info
- `datasets` version: 2.19.0
- `huggingface_hub` version: 0.22.2
- `fsspec` version: 2023.12.2