Continue pre-training got RuntimeError: Failed processing /tmp/data #1413

Open
BestJiayi opened this issue May 13, 2024 · 4 comments
Labels
3rd party, bug (Something isn't working), pre-training

Comments

BestJiayi commented May 13, 2024

How can I solve this problem?

I downloaded pythia-160m from Hugging Face.
I downloaded my data following the official documentation.
The program is running in an NVIDIA Docker container.

I followed the official documentation to continue pre-training, but got an error: RuntimeError: Failed processing /tmp/data.

litgpt pretrain \
  --model_name pythia-160m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-160m \
  --initial_checkpoint_dir checkpoints/EleutherAI/pythia-160m \
  --data TextFiles \
  --data.train_data_path "custom_texts" \
  --out_dir out/custom_model

I got:
RuntimeError:
We found the following error Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 628, in _handle_data_chunk_recipe
    for item_data in item_data_or_generator:
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/functions.py", line 151, in _prepare_item_generator
    yield from self._fn(item_metadata) # type: ignore
  File "/usr/local/lib/python3.10/dist-packages/litgpt/data/text_files.py", line 124, in tokenize
    with open(filename, "r", encoding="utf-8") as file:
IsADirectoryError: [Errno 21] Is a directory: '/tmp/data'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 423, in run
    self._loop()
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 472, in _loop
    self._handle_data_chunk_recipe(index)
  File "/usr/local/lib/python3.10/dist-packages/litdata/processing/data_processor.py", line 638, in _handle_data_chunk_recipe
    raise RuntimeError(f"Failed processing {self.items[index]}") from e
RuntimeError: Failed processing /tmp/data
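
Editor's note: the innermost frame of the traceback shows `open()` being called on `/tmp/data`, which is a directory rather than a text file. The sketch below reproduces that failure mode in isolation, outside litgpt and litdata. The helper names `find_non_file_inputs` and `try_open` are hypothetical and not part of either library. It is meant only as a diagnostic aid for checking which inputs are not regular files, not as a fix for the underlying bug.

```python
import tempfile
from pathlib import Path


def find_non_file_inputs(paths):
    """Return the entries that are not regular files (hypothetical helper)."""
    return [str(p) for p in map(Path, paths) if not p.is_file()]


def try_open(path):
    """Mimic the open() call from the traceback.

    Returns the exception class name if opening or reading fails, else None.
    """
    try:
        with open(path, "r", encoding="utf-8") as file:
            file.read()
    except OSError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        text_file = Path(tmp) / "sample.txt"
        text_file.write_text("hello", encoding="utf-8")

        # One directory and one regular file, as candidate tokenizer inputs.
        items = [tmp, str(text_file)]
        print("Not regular files:", find_non_file_inputs(items))
        print("Opening the directory raised:", try_open(tmp))
```

On Linux, opening the directory this way raises `IsADirectoryError`, the same exception class as in the report. Other platforms may raise a different `OSError` subclass.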

@carmocca
Contributor

Same issue as in #1402

cc @awaelchli

@carmocca added the bug (Something isn't working), pre-training, and 3rd party labels on May 13, 2024
@BestJiayi
Author

@carmocca If I want to keep using litgpt for pre-training, what should I do? Should I wait until the bug is fixed before using litgpt? Thank you!

@carmocca
Contributor

Are you using Google Colab? You could try using https://lightning.ai while this gets fixed. It should work there without issues.

@BestJiayi
Author

Thank you. I am currently using litgpt on our company's GPU cluster. I will wait for the issue to be fixed before continuing to use it.

2 participants