Visually aligned sound generation via sound-producing motion parsing [paper]

SPMNet

Overview

We propose to tame visually aligned sound generation by projecting the sound-producing motion onto a discriminative temporal visual embedding. This embedding distinguishes transient visual motion from complex background information, which leads to generated sounds with high temporal alignment. We refer to this model as SPMNet.
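The official code has not been released yet (see News below), so the following is only a minimal, hypothetical PyTorch sketch of the idea described above: per-frame visual features are projected onto a temporal embedding whose per-frame gating emphasizes sound-producing motion, and this motion-aware embedding would then condition a sound (spectrogram) decoder. All module and parameter names here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MotionParsingEncoder(nn.Module):
    """Illustrative sketch: project frame features to a motion-aware temporal embedding."""

    def __init__(self, feat_dim=2048, embed_dim=512):
        super().__init__()
        # Temporal 1D convolutions over the frame-feature sequence
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(embed_dim, embed_dim, kernel_size=3, padding=1),
        )
        # Per-frame gate that weights frames containing transient motion
        self.motion_gate = nn.Conv1d(embed_dim, 1, kernel_size=1)

    def forward(self, frame_feats):              # (B, T, feat_dim)
        x = frame_feats.transpose(1, 2)          # (B, feat_dim, T)
        h = self.temporal(x)                     # (B, embed_dim, T)
        w = torch.sigmoid(self.motion_gate(h))   # (B, 1, T) motion weights
        return (h * w).transpose(1, 2)           # (B, T, embed_dim) motion-aware embedding
```

In this sketch the returned embedding would serve as the conditioning signal for a downstream sound generator (e.g., a spectrogram decoder as in RegNet or SpecVQGAN); the gating is one plausible way to suppress background frames, not necessarily the mechanism used in the paper.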

News

Code, pre-trained models, and all demos will be released here. Watch this repository for the latest updates.

Demo

Dog

dog_1.mp4
dog_6.mp4

Drum

drum_1.mp4
drum_2.mp4

Firework

firework_1.mp4
firework_2.mp4

Please listen to the audio samples in our supplementary materials.

Citation

Our paper was accepted by Neurocomputing. Please use this BibTeX entry if you would like to cite our work:

@article{Ma2022VisuallyAS,
  title={Visually Aligned Sound Generation via Sound-Producing Motion Parsing},
  author={Xin Ma and Wei Zhong and Long Ye and Qin Zhang},
  journal={Neurocomputing},
  year={2022}
}

Acknowledgments

We acknowledge the following work:

  • The code base is built upon the RegNet repo.
  • Thanks to SpecVQGAN for their open-source efforts.
