About audio and video frames align #129

Open
zyhsuperman opened this issue Apr 21, 2024 · 3 comments

Comments

@zyhsuperman

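Thank you very much for sharing this.
I have a question that I have never quite figured out:
For models that use mel spectrograms, I understand how the mel features are aligned with each video frame.
But for models that use wav2vec2, how are the audio features aligned with each video frame?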

@Zejun-Yang
Owner

#131
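Thank you for your interest. Please refer to the code we shared in the issue linked above (#131): it uses audio_encoder to convert the audio features into chunks that correspond to the video frames.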
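
Editorial note for readers following along: the sketch below is not the code from #131 and not the project's audio_encoder. It only illustrates one common way to pair wav2vec2-style features with video frames. It assumes roughly 50 feature steps per second (wav2vec2's 20 ms stride on 16 kHz audio) and a 25 fps video, so each frame covers about 2 steps. The function name `align_features_to_frames` and the `context` parameter are hypothetical.

```python
def align_features_to_frames(features, fps=25.0, feat_rate=50.0, context=2):
    """Group per-step audio features into one window per video frame.

    features : list of per-step feature vectors (e.g. wav2vec2 hidden states)
    fps      : video frame rate (assumed 25 here)
    feat_rate: audio feature steps per second (about 50 for wav2vec2 at 16 kHz)
    context  : extra steps taken on each side of the frame's own span
    """
    if not features:
        return []
    steps_per_frame = feat_rate / fps
    num_frames = int(len(features) / steps_per_frame)
    last = len(features) - 1
    windows = []
    for i in range(num_frames):
        # Span of feature steps covered by frame i, widened by `context`.
        # With a non-integer steps_per_frame, window lengths can differ by one.
        start = int(round(i * steps_per_frame)) - context
        stop = int(round((i + 1) * steps_per_frame)) + context
        # Clamp indices so edge frames repeat the first or last feature.
        idx = [min(max(j, 0), last) for j in range(start, stop)]
        windows.append([features[j] for j in idx])
    return windows


# Two seconds of dummy "features" (integers stand in for feature vectors).
windows = align_features_to_frames(list(range(100)))
print(len(windows))   # 50 video frames
print(windows[0])     # [0, 0, 0, 1, 2, 3]
```

Each video frame then receives a fixed-size window of audio features centered on its time span. Whether the project uses windowing, interpolation, or something else is answered by the code in #131, not by this sketch.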

@zyhsuperman
Author

Thanks!

@zyhsuperman
Author

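After reading the code snippet you provided in issue #131, I have one more question:
I noticed that all of the data seems to share the same neutral_face, so could you explain how this neutral_face was obtained?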
