[Help]: May I ask what the difference is between TTA and TTM? #190
Hi @rainbowjack, nice question!

tl;dr: text-to-audio (TTA) includes text-to-music (TTM). If you train a TTA model on text-music pairs, it effectively becomes a TTM model, but this may require a large amount of data because of the internal structure of a music piece (tempo, harmony, melody, etc.).

Theoretically, audio includes music. AudioLDM [1] denotes audio …

Currently, the Amphion framework includes a TTA model based on latent diffusion. If you have text-music data pairs, you can use them directly to train a TTM model. Note, however, that music generation models generally require vast amounts of data (340k hours for Noise2Music [4], 280k hours for MusicLM [5], 20k hours for MusicGen [6], and 46k hours for SingSong [7]), so if the results are not satisfactory, the cause is more likely the quality of the data than the model itself.

We are also developing a music generation framework, so stay tuned if you are interested :)

[1] AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Thank you very much! I just need to do some research on music synthesis or genre.
Is TTA included in TTM?