GOOGLE COLLAB works well for 2 days, then breaks. Why? #2827
Comments
CUDA setup always fails.
It's working now, but only after I changed the Google Colab runtime from L4 to T4. Yesterday I used an A100 with no issues. Maybe the problem is on both sides: the Google GPU and the Colab notebook?
@TheLastBen I found the glitch: on an L4 GPU the notebook gives a CUDA SETUP error, while on an A100 or a T4 there is no error. The downside is that we pay for Google credits or Pro but cannot use the faster GPUs: the A100 is not always available and costs 11.30 credits per hour, compared to 4 credits per hour for the L4. So when the A100 is unavailable we end up paying only for more time rather than for a faster GPU, since the L4 gives the CUDA error. Are you aware of this issue?
I'm aware, I'll try to find a fix.
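Until a fix is available, a notebook cell run before training could show which GPU the runtime was assigned and compare it with the reports in this thread (L4 fails, A100 and T4 work). This is only a sketch: the function names `gpu_name` and `classify` are made up for illustration, the two tuples reflect nothing more than the comments above, and it assumes `nvidia-smi` is on the PATH of the Colab runtime.

```python
import shutil
import subprocess

# GPU models as reported by users in this thread; NOT an official compatibility list.
REPORTED_FAILING = ("L4",)
REPORTED_WORKING = ("A100", "T4")


def gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


def classify(name):
    """Classify a GPU name against the reports in this thread."""
    if name is None:
        return "no-gpu"
    # Split names such as "NVIDIA A100-SXM4-40GB" into whole tokens so that
    # "L4" does not accidentally match a different model like "L40S".
    tokens = name.replace("-", " ").split()
    if any(token in REPORTED_FAILING for token in tokens):
        return "reported-failing"
    if any(token in REPORTED_WORKING for token in tokens):
        return "reported-working"
    return "unknown"


if __name__ == "__main__":
    name = gpu_name()
    print(f"GPU: {name} -> {classify(name)}")
```

If the cell prints `reported-failing`, switching the runtime type before starting a long session avoids spending credits on a run that stops at the bitsandbytes import.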
I get a good model for a day or two, then on the next training run I get this:
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 803, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 535, in main
    import bitsandbytes as bnb
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/autograd/_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 41, in <module>
    lib = CUDALibrary_Singleton.get_instance().lib
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 37, in get_instance
    cls._instance.initialize()
  File "/usr/local/lib/python3.10/dist-packages/bitsandbytes/cextension.py", line 27, in initialize
    raise Exception('CUDA SETUP: Setup Failed!')
Exception: CUDA SETUP: Setup Failed!
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--image_captions_filename', '--train_only_unet', '--save_starting_step=500', '--save_n_steps=0', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/NicoleTEST768-TEXT4NXI', '--pretrained_model_name_or_path=/content/stable-diffusion-v1-5', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/NicoleTEST768-TEXT4NXI/instance_images', '--output_dir=/content/models/NicoleTEST768-TEXT4NXI', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/NicoleTEST768-TEXT4NXI/captions', '--instance_prompt=', '--seed=869457', '--resolution=768', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=2e-06', '--lr_scheduler=linear', '--lr_warmup_steps=0', '--max_train_steps=1500']' returned non-zero exit status 1.
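The first traceback shows that the run dies at `import bitsandbytes as bnb`, which raises a plain `Exception` when its CUDA setup fails. The second traceback is only `accelerate` reporting that the training script exited with status 1. One possible workaround, sketched below, is to probe the import before launching and fall back to the regular optimizer when it fails. The helper name `choose_optimizer_name` is hypothetical and not part of the notebook or of `train_dreambooth.py`. The sketch also assumes the script imports bitsandbytes only when `--use_8bit_adam` is passed, which the traceback alone does not confirm.

```python
import importlib


def choose_optimizer_name(prefer_8bit=True):
    """Return "AdamW8bit" if bitsandbytes imports cleanly, else "AdamW"."""
    if prefer_8bit:
        try:
            importlib.import_module("bitsandbytes")
            return "AdamW8bit"
        except Exception as exc:
            # The traceback above shows bitsandbytes raising a bare Exception
            # ('CUDA SETUP: Setup Failed!'), so catching ImportError alone
            # would not be enough here.
            print(f"bitsandbytes unavailable ({exc}); falling back to AdamW")
    return "AdamW"


if __name__ == "__main__":
    print(choose_optimizer_name())
```

If this returns `AdamW`, dropping `--use_8bit_adam` from the launch command should let training start under the assumption above, at the cost of more GPU memory for optimizer state.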