Awesome Audio Encoding

- Awesome Audio Encoding
  - Papers and Projects
  - References

Papers and Projects

- snac - hubertsiuzdak

  Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate · (huggingface)
- ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers, arXiv, 2404.19441, arxiv, pdf, citation: -1

  Yuzhe Gu, Enmao Diao · (efficient-speech-codec - yzGuu830)
- SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound, arXiv, 2405.00233, arxiv, pdf, citation: -1

  Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley · (haoheliu.github)
- PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders, arXiv, 2404.02702, arxiv, pdf, citation: -1

  Yu Pan, Lei Ma, Jianjun Zhao
- Amphion - open-mmlab

  Speech Codec with Attribute Factorization used for NaturalSpeech 3
- Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models, arXiv, 2402.12208, arxiv, pdf, citation: -1

  Shengpeng Ji, Minghui Fang, Ziyue Jiang, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao · (languagecodec - jishengpeng) · (languagecodec.github)
- funcodec - alibaba-damo-academy
- sonar - facebookresearch

  SONAR, a new multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
- High-Fidelity Audio Compression with Improved RVQGAN, arXiv, 2306.06546, arxiv, pdf, citation: -1

  Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar · (descript-audio-codec - descriptinc)
- SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models, arXiv, 2308.16692, arxiv, pdf, citation: -1

  Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu · (speechtokenizer - zhangxinfd)
- SoundStorm: Efficient Parallel Audio Generation, arXiv, 2305.09636, arxiv, pdf, citation: -1

  Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
- DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning, arXiv, 2305.10005, arxiv, pdf, citation: -1

  Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
- High Fidelity Neural Audio Compression, arXiv, 2210.13438, arxiv, pdf, citation: -1

  Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
- EnCodec
  - notebooks/use_encodec_w_transformers.ipynb at main · Vaibhavs10/notebooks · GitHub
  - code: GitHub - facebookresearch/encodec: State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
- SoundStream: An End-to-End Neural Audio Codec, arXiv, 2107.03312, arxiv, pdf, citation: -1

  Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi
  - GitHub - lucidrains/vector-quantize-pytorch: Vector Quantization, in Pytorch
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units, arXiv, 2106.07447, arxiv, pdf, citation: -1

  Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed
- wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations, arXiv, 2006.11477, arxiv, pdf, citation: -1

  Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

References

[Machine Learning 2023] Speech Foundation Models (taught by TA 張凱為) (1/2) - YouTube
[Machine Learning 2023] Speech Foundation Models (taught by TA 張凱為) (2/2) - YouTube
https://speech.ee.ntu.edu.tw/~hylee/ml/ml2023-course-data/張凱爲-x-機器學習-x-語音基石模型.pdf
