Batched mapping does not raise an error if values for an existing column are empty #6879

felix-schneider · 2024-05-07T11:02:40Z

Describe the bug

Using Dataset.map(fn, batched=True) allows resizing the dataset by returning a dict of lists, all of which must have the same length. If the lengths differ, an error like pyarrow.lib.ArrowInvalid: Column 1 named x expected length 1 but got length 0 is raised.

However, no error is raised if the function returns an empty list for a column that already exists in the dataset, even when the other returned columns are non-empty. In that case, the dataset is silently resized to 0 rows.

Steps to reproduce the bug

MWE:

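import datasets

data = datasets.Dataset.from_dict({"test": [1]})

def mapping_fn(examples):
    # "test" already exists in the dataset and comes back empty;
    # "y" is a new column with one value, so the lengths do not match.
    return {"test": [], "y": [1]}

# No exception is raised here despite the mismatched lengths.
data = data.map(mapping_fn, batched=True)
print(len(data))  # the dataset has been silently resized to 0 rows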

Note that the error is raised correctly when the empty list is returned for a new column instead ("x": []), and also when returning "test": [1, 2].

Expected behavior

Expected an exception: pyarrow.lib.ArrowInvalid: Column 1 named test expected length 1 but got length 0 or pyarrow.lib.ArrowInvalid: Column 2 named y expected length 0 but got length 1.

Any exception would be acceptable.
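
Until such an exception exists, a user-side guard is one possible stopgap. The following is a minimal sketch, not part of datasets: check_equal_lengths is a hypothetical helper name, and it assumes the mapping function returns a plain dict of lists (not a pyarrow Table or pandas DataFrame). It compares the lengths of all returned columns and raises a ValueError before the batch ever reaches the library.

```python
from typing import Any, Callable, Dict, List

Batch = Dict[str, List[Any]]


def check_equal_lengths(fn: Callable[..., Batch]) -> Callable[..., Batch]:
    """Hypothetical helper: reject batches whose columns differ in length."""

    def wrapper(*args: Any, **kwargs: Any) -> Batch:
        out = fn(*args, **kwargs)
        # Record the length of every returned column, existing or new.
        lengths = {name: len(values) for name, values in out.items()}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"Columns have different lengths: {lengths}")
        return out

    return wrapper


@check_equal_lengths
def mapping_fn(examples: Batch) -> Batch:
    # Same return value as in the MWE: "test" is empty, "y" has one value.
    return {"test": [], "y": [1]}
```

With this wrapper, data.map(mapping_fn, batched=True) should fail with the ValueError raised inside the function rather than silently producing an empty dataset. It only checks that the returned columns agree with each other, which is the condition violated in the MWE.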

Environment info

datasets version: 2.19.1
Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.31
Python version: 3.11.8
huggingface_hub version: 0.22.2
PyArrow version: 15.0.2
Pandas version: 2.2.1
fsspec version: 2024.2.0
