
Repos for Training and Finetuning (1 already available!) #48

Open
1 task done
kolabearafk opened this issue Mar 23, 2023 · 6 comments
Labels
enhancement New feature or request external

Comments

@kolabearafk

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Is there any released training code or published paper mentioning the training methods used for this model?

Proposed workflow

N/A

Additional information

No response

@kolabearafk kolabearafk added the enhancement New feature or request label Mar 23, 2023
@ExponentialML
Contributor

I can take a shot and see if this works with the implementations currently floating around.

If we limit training to the CrossAttention layers (finetuning the Pseudo Conv3D layers is tricky) and cap the resolution at 256x256, it may (this is a big if) fit in 24GB of VRAM.
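A minimal PyTorch sketch of that idea: freeze everything, then re-enable only the cross-attention parameters. The `attn2` name follows Diffusers' convention for cross-attention blocks and is an assumption about this model's module naming; the toy `Block` below only stands in for the real UNet.

```python
import torch
import torch.nn as nn

def freeze_except_cross_attention(model: nn.Module) -> int:
    """Freeze all parameters, then unfreeze only cross-attention ones.

    Assumes cross-attention modules are registered as `attn2`
    (Diffusers' convention; a guess for this particular UNet).
    Returns the number of trainable parameters.
    """
    for p in model.parameters():
        p.requires_grad_(False)
    trainable = 0
    for name, module in model.named_modules():
        if name.endswith("attn2"):
            for p in module.parameters():
                p.requires_grad_(True)
                trainable += p.numel()
    return trainable

# Toy stand-in with the same naming convention as the real blocks:
class Block(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn1 = nn.Linear(8, 8)  # self-attention: stays frozen
        self.attn2 = nn.Linear(8, 8)  # cross-attention: trained

model = nn.Sequential(Block(), Block())
n_trainable = freeze_except_cross_attention(model)
```

Since gradients (and optimizer state) are only kept for the unfrozen parameters, this is where most of the VRAM savings would come from.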

Also, I don't know whether they used a DDPM scheduler or the Gaussian Diffusion scheduler for training, as I don't know the corresponding paper for this implementation. It seems to be a mix of Video Diffusion and Make-A-Video.

Either way, the process should be very simple if we reference the training methods we have floating around.

  1. Add noise to video latents based on timestep.
  2. Forward through 3D conditional unet with the noisy latents.
  3. Calculate the loss between the model's prediction and the noise that was added.

I'm also curious whether, since the model was already trained on a sufficient amount of data, you may be able to fine-tune it in an unconditional way (no prompts, just video data).
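The three steps above can be sketched as a single DDPM-style training step in PyTorch. This is a hypothetical sketch, not the actual ModelScope code: the UNet call signature, the `(B, C, F, H, W)` video-latent layout, and the noise-prediction objective are all assumptions, and `TinyUNet` is just a stand-in so the example runs end to end.

```python
import torch
import torch.nn.functional as F

def training_step(unet, latents, text_emb, alphas_cumprod):
    b = latents.shape[0]
    # Step 1: add noise to the video latents at a random timestep.
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=latents.device)
    noise = torch.randn_like(latents)
    a = alphas_cumprod[t].view(b, 1, 1, 1, 1)  # broadcast over (B, C, F, H, W)
    noisy = a.sqrt() * latents + (1.0 - a).sqrt() * noise
    # Step 2: forward through the 3D conditional UNet.
    pred = unet(noisy, t, text_emb)
    # Step 3: loss between the model's prediction and the added noise.
    return F.mse_loss(pred, noise)

# Toy stand-in for the 3D conditional UNet (ignores t and the condition):
class TinyUNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv3d(4, 4, kernel_size=3, padding=1)

    def forward(self, x, t, cond):
        return self.conv(x)

# Standard linear beta schedule, as in DDPM.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

latents = torch.randn(2, 4, 3, 8, 8)  # batch of 2, 4 channels, 3 frames
loss = training_step(TinyUNet(), latents, torch.randn(2, 77, 16), alphas_cumprod)
```

In a real run, `loss.backward()` plus an optimizer step over the trainable parameters would complete the loop.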

@ExponentialML
Contributor

ExponentialML commented Mar 23, 2023

I created a repository for Text2Video finetuning here using the recent Diffusers addition. Let me know how it goes if you give it a shot!

https://github.com/ExponentialML/Text-To-Video-Finetuning

@kabachuha
Owner

kabachuha commented Mar 23, 2023

Incredible! @ExponentialML, I'll post it on Reddit if you don't mind?

Upd: posted here https://www.reddit.com/r/StableDiffusion/comments/11zhy1b/wake_up_samurai_modelscope_text2video_finetuning/

@kabachuha kabachuha changed the title [Feature Request]: Training Code Repos for Training and Finetuning (1 already available!) Mar 23, 2023
@kabachuha kabachuha pinned this issue Mar 23, 2023
@kolabearafk
Author

@ExponentialML Wow, truly amazing. Can't wait to try it. Thank you!

@ExponentialML
Contributor

@kabachuha Didn't realize you posted it. All good, thanks for doing it!

@23Rj20

23Rj20 commented Apr 10, 2024

@ExponentialML Hey, can you please look at this error? When finetuning, it is not able to locate the files even though they are present in that folder. Please look at this issue; I need an urgent fix.
[Screenshots attached: lorafileslocation, lorafileslocation2, loadinglora, errorgen, errorreason]

I have uploaded the necessary screenshots to understand the error.
@kabachuha Can you also take a look at this, please?
