The download process goes on forever #343
Comments
There is another issue about this that has been open for a year, which I can't
reproduce.
If you can figure out in which environment it happens, that would help.
…On Fri, Sep 1, 2023, 09:22 Gurbanov Novruz ***@***.***> wrote:
Hi! After downloading the files from laion2b-en with these parameters:
processes_count=32,
url_list=parquet_file,
resize_mode='no',
output_folder=output_dir,
output_format='webdataset', # Write the output as webdataset shards
input_format='parquet',
url_col="URL",
caption_col="TEXT",
number_sample_per_shard=50000,
distributor='multiprocessing',
)
all files will be downloaded (I think), but then the last iteration goes
on forever and I have to stop manually. Could you look at this please?
@rom1504 I am running the download inside a Docker container. A month ago, in the same Docker container, it worked seamlessly. But now, for some reason, it never stops. I am not an expert on Docker images, but if possible, maybe I can send you the image so you can run a container and try to download some files? (img2dataset is already installed.)
I think it would be useful if you could figure out which specific
Docker configs work and which ones don't.
@rom1504 Sorry, I didn't quite get what you mean. If the container is the same and the image is the same, what other configs should I check? If you have suggestions on what to check, I would appreciate it!
You can check any other environment that works and then try to compare.
Maybe you changed the host if not the container?
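One way to make that comparison concrete is to dump a small snapshot of each environment and diff the two. The sketch below is only an illustration: the helper names are hypothetical, and the package list beyond img2dataset is an assumption about what might matter here. Run it once where the download works and once where it hangs.

```python
import json
import platform
import sys
from importlib import metadata

# Packages to record. Only img2dataset comes from this thread; the rest are
# assumed to be relevant dependencies and can be changed freely.
DEFAULT_PACKAGES = ("img2dataset", "pyarrow", "webdataset", "fsspec")


def environment_snapshot(packages=DEFAULT_PACKAGES):
    """Collect interpreter, OS, and package versions for one environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
    }


def _flatten(snapshot):
    """Turn the nested snapshot into a flat dict with dotted package keys."""
    flat = {k: v for k, v in snapshot.items() if k != "packages"}
    flat.update({f"packages.{k}": v for k, v in snapshot["packages"].items()})
    return flat


def diff_snapshots(working, failing):
    """Return {key: (working_value, failing_value)} for every differing key."""
    a, b = _flatten(working), _flatten(failing)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


if __name__ == "__main__":
    # Save this output in each environment, then compare the two JSON files.
    print(json.dumps(environment_snapshot(), indent=2))
```

This only covers what is visible from inside the container. Host-level differences (kernel, Docker version, resource limits) would have to be compared separately.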
@rom1504 Interesting. I downloaded the files with number_sample_per_shard set to 10K, and the download and the process finished on time. I guess the function, or something else, cannot handle more samples per shard.
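For anyone who wants to try the same workaround, here is a minimal sketch of the call from the report with the smaller shard size. The helper names are hypothetical, and the input and output paths are placeholders supplied by the caller. Every other argument is copied from the parameters posted in this thread, and this is not a confirmed fix for the hang.

```python
def build_download_kwargs(parquet_file, output_dir, samples_per_shard=10_000):
    """Return the arguments from the report, with a configurable shard size."""
    return {
        "processes_count": 32,
        "url_list": parquet_file,
        "resize_mode": "no",
        "output_folder": output_dir,
        "output_format": "webdataset",
        "input_format": "parquet",
        "url_col": "URL",
        "caption_col": "TEXT",
        # 50000 was the value that reportedly never finished; 10000 did finish.
        "number_sample_per_shard": samples_per_shard,
        "distributor": "multiprocessing",
    }


def run_download(parquet_file, output_dir, samples_per_shard=10_000):
    """Start the download with the settings above (needs img2dataset installed)."""
    # Imported lazily so the kwargs helper can be inspected without the package.
    from img2dataset import download

    download(**build_download_kwargs(parquet_file, output_dir, samples_per_shard))
```

Running it with both 10_000 and 50_000 in the same container would show whether the shard size alone decides if the run finishes.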
Hi! After downloading the files from laion2b-en with these parameters:
all files will be downloaded (I think), but then the last iteration goes on forever and I have to stop manually. Could you look at this please?
P.S. I tried this function a month ago, and it worked seamlessly. But now, no matter what I do, and no matter how simple the parameters I define, it gets stuck.