IterableDataset set_epoch is ignored when DataLoader persistent_workers=True #6673

Open
rwightman opened this issue Feb 16, 2024 · 0 comments · May be fixed by #6710
Labels: bug, streaming


rwightman commented Feb 16, 2024

Describe the bug

When persistent workers are enabled, the epoch set via the IterableDataset instance held by the training process is ignored by the workers, because each worker process holds its own disconnected copy of the dataset.
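A minimal sketch of the disconnect, using only the standard library. `EpochDataset` is a hypothetical stand-in, not the `datasets` implementation, and `copy.deepcopy` is assumed here to approximate the copy of the dataset that each worker process receives when it starts:

```python
import copy


class EpochDataset:
    """Hypothetical stand-in for an iterable dataset with a set_epoch() method."""

    def __init__(self):
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch


main_copy = EpochDataset()

# Assumption: a persistent worker gets its copy once, at startup,
# and keeps it for the lifetime of the DataLoader.
worker_copy = copy.deepcopy(main_copy)

# A later set_epoch() call in the training process only touches main_copy.
main_copy.set_epoch(5)

print(main_copy.epoch, worker_copy.epoch)  # prints "5 0"
```

Without persistent workers, the workers are recreated each time the DataLoader is iterated, so they pick up the current state of the dataset.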

PyTorch samplers for non-iterable datasets have a mechanism to sync this; datasets.IterableDataset does not.

In my own use of IterableDatasets, I usually track the epoch count in a multiprocessing.Value, which crosses process boundaries.
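A rough sketch of that workaround, assuming a shared-memory counter that the dataset reads each time iteration starts. `SharedEpoch` and `EpochShuffledIterable` are hypothetical names for illustration, not part of `datasets` or PyTorch, and the seeded shuffle only stands in for whatever epoch-dependent behavior the real dataset has:

```python
import multiprocessing as mp
import random


class SharedEpoch:
    """Hypothetical epoch counter backed by shared memory."""

    def __init__(self, epoch=0):
        # "i" is a signed int stored in shared memory with an attached lock.
        self._value = mp.Value("i", epoch)

    def set_value(self, epoch):
        with self._value.get_lock():
            self._value.value = epoch

    def get_value(self):
        with self._value.get_lock():
            return self._value.value


class EpochShuffledIterable:
    """Hypothetical iterable whose shuffle order depends on the shared epoch."""

    def __init__(self, items, seed=0):
        self.items = list(items)
        self.seed = seed
        self.shared_epoch = SharedEpoch()

    def set_epoch(self, epoch):
        # Called in the training process; writes to shared memory.
        self.shared_epoch.set_value(epoch)

    def __iter__(self):
        # Read the epoch at iteration time rather than caching it at
        # construction, so a long-lived worker sees the latest value.
        epoch = self.shared_epoch.get_value()
        order = list(self.items)
        random.Random(self.seed + epoch).shuffle(order)
        return iter(order)
```

The key design choice is that `__iter__` reads the shared value on every pass, so a worker that was started once still observes the epoch written later by the training process. Note that a `multiprocessing.Value` can only be handed to child processes while they are being started, which is when DataLoader workers receive the dataset.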

Steps to reproduce the bug

Use a streaming (iterable) dataset with the recommended pattern below, and set persistent_workers=True in the torch DataLoader.

for epoch in range(epochs):
    shuffled_dataset.set_epoch(epoch)
    for example in shuffled_dataset:
        ...

Expected behavior

When the canonical bit of code above is used with num_workers > 0 and persistent_workers=True, the epoch set via set_epoch() should be propagated to the IterableDataset instances in the worker processes.

Environment info

N/A

lhoestq added the bug and streaming labels on Feb 22, 2024
lhoestq linked a pull request on Mar 2, 2024 that will close this issue