Iterating over a `Dataset` is more than 10x slower when a trivial transform is applied:
```python
import time
import numpy as np
from datasets import Dataset, Features, Array2D

# 1000 examples of an 800x800 array; uint8 matches the declared feature dtype
a = np.zeros((800, 800), dtype=np.uint8)
a = np.stack([a] * 1000)
features = Features({"a": Array2D(shape=(800, 800), dtype="uint8")})
ds1 = Dataset.from_dict({"a": a}, features=features).with_format("numpy")

def transform(batch):
    # trivial transform: return the batch unchanged
    return batch

ds2 = ds1.with_transform(transform)

# %time is an IPython/Jupyter magic
%time sum(1 for _ in ds1)
%time sum(1 for _ in ds2)
```
```
CPU times: user 472 ms, sys: 319 ms, total: 791 ms
Wall time: 794 ms
CPU times: user 9.32 s, sys: 443 ms, total: 9.76 s
Wall time: 9.78 s
```
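A possible contributor to the gap, stated as an assumption rather than a confirmed diagnosis: as far as I can tell, `with_transform` (like `set_transform`) replaces the `numpy` format set on `ds1` with a custom format, so the transform receives each 800x800 array as nested Python lists instead of a NumPy array. That is 800 * 800 = 640,000 list elements per example, or 640 million across the 1,000 examples. The stdlib-only sketch below mimics that per-example expansion from a flat `uint8` buffer so its cost can be timed without `datasets`. `to_nested_lists` and `time_per_example` are hypothetical helpers, not library functions.

```python
import time

ROWS, COLS = 800, 800  # shape of one example, as in the repro above


def to_nested_lists(buf: bytes, rows: int, cols: int) -> list:
    """Hypothetical helper: expand a flat uint8 buffer into a list of row lists,
    roughly what a Python-object (non-NumPy) format would hand to a transform."""
    # Iterating over a bytes slice yields ints in 0..255
    return [list(buf[r * cols:(r + 1) * cols]) for r in range(rows)]


def time_per_example(n_examples: int = 20) -> float:
    """Average wall time in seconds to expand one zero-filled example."""
    buf = bytes(ROWS * COLS)  # one example: 640,000 zero bytes
    start = time.perf_counter()
    for _ in range(n_examples):
        to_nested_lists(buf, ROWS, COLS)
    return (time.perf_counter() - start) / n_examples


if __name__ == "__main__":
    nested = to_nested_lists(bytes(ROWS * COLS), ROWS, COLS)
    print(len(nested), len(nested[0]), sum(len(r) for r in nested))  # 800 800 640000
    print(f"~{time_per_example() * 1e3:.1f} ms per example for the list conversion alone")
```

Multiplying the per-example time by 1,000 gives a rough estimate of the conversion overhead alone, which can be compared with the ~9 s gap in the timings above.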
In my real code I use `set_transform` to post-process the 2D array on the fly, and iteration slows down significantly even when the transform itself is trivial.
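If the post-processing only has to run on the fly, one workaround to try (a sketch, not a `datasets` feature) is to leave `ds1` in `numpy` format and apply the transform in a thin wrapper. `TransformedView` below is hypothetical; it assumes the base object supports `len()`, integer indexing and iteration, as `Dataset` does, and passes whatever the base returns straight to the transform. With `ds1[idx]` that is a single example (a dict of NumPy arrays) rather than a batch, so a batch-oriented transform may need adjusting.

```python
from typing import Any, Callable, Dict, Iterator

Example = Dict[str, Any]


class TransformedView:
    """Hypothetical wrapper that applies `transform` lazily on each access.

    The base dataset keeps its own format (e.g. NumPy), so the transform sees
    whatever the base returns instead of re-formatted Python objects.
    """

    def __init__(self, base: Any, transform: Callable[[Example], Example]):
        # `base` only needs __len__, __getitem__ and __iter__
        self.base = base
        self.transform = transform

    def __len__(self) -> int:
        return len(self.base)

    def __getitem__(self, idx: int) -> Example:
        # Random access, as a map-style data loader would use it
        return self.transform(self.base[idx])

    def __iter__(self) -> Iterator[Example]:
        # Sequential access: iterate the base directly and transform each example
        for example in self.base:
            yield self.transform(example)
```

With the names from the repro, usage would be `ds3 = TransformedView(ds1, transform)` followed by `sum(1 for _ in ds3)`. The trade-off is that `ds3` is not a `Dataset`, so methods such as `map` or `filter` still have to be called on `ds1`. If the post-processing does not need to be recomputed on every access, `ds1.map(transform, batched=True)` computes it once up front instead.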
Related issue: #5841
Steps to reproduce the bug
Use the code in the description to reproduce.
Expected behavior
A trivial custom transform like the one in the example should not slow down dataset iteration.
Environment info
- `datasets` version: 2.18.0
- `huggingface_hub` version: 0.20.2
- `fsspec` version: 2023.12.2