Describe the bug
Using `Dataset.map(fn, batched=True)` allows resizing the dataset by returning a dict of lists, all of which must have the same length. If they do not, an error such as `pyarrow.lib.ArrowInvalid: Column 1 named x expected length 1 but got length 0` is raised.
This check does not apply when the function returns an empty list for a column that already exists in the dataset. In that case, no error is raised and the dataset is silently resized to 0 rows.
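The same-length rule can be expressed as a small pure-Python check. This is a minimal sketch, assuming a batch is a plain `dict` of lists; `batch_num_rows` is a hypothetical helper, not part of the `datasets` API, and it does not reflect how `datasets` implements the check internally.

```python
from typing import Any, Dict, List


def batch_num_rows(batch: Dict[str, List[Any]]) -> int:
    """Return the row count implied by a batched map output (hypothetical helper).

    Every column must have the same length; otherwise a ValueError is raised,
    which is the kind of failure this report expects from Dataset.map.
    """
    lengths = {name: len(values) for name, values in batch.items()}
    distinct = set(lengths.values())
    if len(distinct) > 1:
        raise ValueError(f"Columns have mismatched lengths: {lengths}")
    # An empty dict has no columns, so it implies zero rows.
    return distinct.pop() if distinct else 0


# The output returned by the reproduction's mapping function is rejected
# here instead of being treated as a 0-row batch.
try:
    batch_num_rows({"test": [], "y": [1]})
except ValueError as err:
    print(err)
```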
Steps to reproduce the bug
MWE:
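```python
import datasets

data = datasets.Dataset.from_dict({"test": [1]})

def mapping_fn(examples):
    # "test" (an existing column) comes back empty, "y" (a new column) has length 1
    return {"test": [], "y": [1]}

data = data.map(mapping_fn, batched=True)
print(len(data))  # prints 0 instead of raising an error
```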
Note that the error is raised correctly when returning `"x": []` (a column that does not exist in the dataset) instead of `"test": []`, and also when returning `"test": [1, 2]`.
Expected behavior
Expected an exception: `pyarrow.lib.ArrowInvalid: Column 1 named test expected length 1 but got length 0` or `pyarrow.lib.ArrowInvalid: Column 2 named y expected length 0 but got length 1`.
Any exception would be acceptable.
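Until the check is fixed, a user-side guard can turn the silent truncation into an exception. This is a minimal sketch, assuming the mapping function returns a plain dict of lists; `require_equal_lengths` is a hypothetical decorator, not something `datasets` provides.

```python
import functools
from typing import Any, Callable, Dict, List

Batch = Dict[str, List[Any]]


def require_equal_lengths(fn: Callable[..., Batch]) -> Callable[..., Batch]:
    """Wrap a batched mapping function so mismatched column lengths raise (hypothetical)."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Batch:
        out = fn(*args, **kwargs)
        lengths = {name: len(values) for name, values in out.items()}
        # More than one distinct length means the columns cannot form a table.
        if len(set(lengths.values())) > 1:
            raise ValueError(f"{fn.__name__} returned mismatched column lengths: {lengths}")
        return out

    return wrapper


@require_equal_lengths
def mapping_fn(examples: Batch) -> Batch:
    # Same output as in the reproduction: an empty existing column plus a length-1 new column.
    return {"test": [], "y": [1]}
```

With the decorator applied, `data.map(mapping_fn, batched=True)` should fail with this `ValueError` instead of returning an empty dataset. The guard only inspects the returned dict; it does not change how `datasets` combines that output with the existing columns.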
Environment info
- `datasets` version: 2.19.1
- `huggingface_hub` version: 0.22.2
- `fsspec` version: 2024.2.0