
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

This is the official website for the paper
"Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation"
from Microsoft Applied Science Group and UC Berkeley
by Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, and Somayeh Sojoudi.

[Preprint Paper]      [Project Homepage]      [Code]      [Model Checkpoints]      [Generation Examples]

Main Experiment Results

Our method reduces the computation of the core step of diffusion-based text-to-audio generation by a factor of 400, while incurring minimal performance degradation in terms of Fréchet Audio Distance (FAD), Fréchet Distance (FD), KL Divergence, and CLAP scores.
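The Fréchet-style metrics above compare the distribution of embeddings of generated audio against that of reference audio (FAD and FD differ mainly in the embedding network used). As a hedged illustration, here is a minimal NumPy/SciPy sketch of the Fréchet distance between two embedding sets; the function name and array shapes are our own, not from the paper's evaluation code:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussian fits of two embedding sets.

    Inputs are arrays of shape [n_samples, dim]; in the actual metrics,
    these would be audio embeddings (e.g., VGGish for FAD).
    """
    mu1, mu2 = feats_real.mean(0), feats_gen.mean(0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(s1 @ s2)        # matrix square root of the product
    if np.iscomplexobj(covmean):    # discard tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2 * covmean))
```

Lower is better: identical embedding distributions yield a distance near zero, and the distance grows as the generated distribution drifts from the reference.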

| Model | # queries (↓) | CLAP_T (↑) | CLAP_A (↑) | FAD (↓) | FD (↓) | KLD (↓) |
|---|---|---|---|---|---|---|
| Diffusion (Baseline) | 400 | 24.57 | 72.79 | 1.908 | 19.57 | 1.350 |
| Consistency + CLAP FT (Ours) | 1 | 24.69 | 72.54 | 2.406 | 20.97 | 1.358 |
| Consistency (Ours) | 1 | 22.50 | 72.30 | 2.575 | 22.08 | 1.354 |

This benchmark demonstrates how our single-step models compare with previous methods, most of which require hundreds of generation steps.
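The "# queries" column is where the 400x speedup comes from: a diffusion sampler queries the denoising network once per step, while a consistency model maps noise to the final latent in a single query. The following PyTorch sketch illustrates only this query-count difference; the model interface, latent shape, and the simplified update rule are illustrative stand-ins, not the paper's actual sampler:

```python
import torch

LATENT_SHAPE = (1, 8, 256, 16)  # illustrative latent shape, not the paper's

def diffusion_generate(model, text_emb, num_steps=400):
    """Iterative diffusion sampling: one network query per denoising step,
    so 400 steps means 400 queries. The update rule is a toy placeholder."""
    z = torch.randn(LATENT_SHAPE)
    for t in reversed(range(num_steps)):
        eps = model(z, t, text_emb)   # one network query per step
        z = z - eps / num_steps       # simplified denoising update
    return z

def consistency_generate(model, text_emb, num_steps=400):
    """Consistency model: the distilled network maps noise directly to the
    final latent, replacing the 400 queries above with a single one."""
    z = torch.randn(LATENT_SHAPE)
    return model(z, num_steps - 1, text_emb)  # one network query total
```

Distillation trains the consistency model so that its one-shot output matches what the teacher diffusion model would produce after its full sampling trajectory, which is why quality in the table degrades only slightly.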

Cite Our Work (BibTeX)

@article{bai2023accelerating,
  title={Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation},
  author={Bai, Yatong and Dang, Trung and Tran, Dung and Koishida, Kazuhito and Sojoudi, Somayeh},
  journal={arXiv preprint arXiv:2309.10740},
  year={2023}
}
