-
snac - hubertsiuzdak
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate · (huggingface)
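A multi-scale codec reaches a low bitrate partly by running its coarser quantizer levels at lower frame rates. This is a minimal sketch of the resulting bitrate arithmetic. It assumes one code per frame per level and a single shared codebook size. The frame rates and codebook size below are hypothetical illustration values, not SNAC's published configuration.

```python
import math


def bitrate_bps(frame_rates_hz, codebook_size):
    """Bits per second when each level emits one code per frame.

    frame_rates_hz: frames per second for each quantizer level (hypothetical values).
    codebook_size: number of entries per codebook; each code carries log2(size) bits.
    """
    bits_per_code = math.log2(codebook_size)
    return sum(rate * bits_per_code for rate in frame_rates_hz)


# Hypothetical 3-level setup: each level runs at roughly twice the rate of the previous one.
print(bitrate_bps([12, 23, 47], 4096))  # 984.0
# Same codebooks, but every level at the finest rate (a single-scale baseline).
print(bitrate_bps([47, 47, 47], 4096))  # 1692.0
```

Under these assumed numbers, halving the rate of the coarser levels cuts the bitrate by roughly 40% compared with running every level at the finest rate.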
-
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers,
arXiv, 2404.19441
Yuzhe Gu, Enmao Diao · (efficient-speech-codec - yzGuu830)
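Several codecs in this list quantize frame embeddings with residual vector quantization (RVQ). Each stage quantizes the error that the previous stages left behind. The sketch below uses plain NumPy with greedy nearest-neighbour search and fixed codebooks. It is not any specific paper's implementation: codebook learning, commitment losses, and the cross-scale variant in ESC are all omitted.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Greedy residual vector quantization.

    x: (T, D) array of frame embeddings.
    codebooks: list of (K, D) arrays, one per stage.
    Returns an integer array of shape (num_stages, T).
    """
    residual = x.copy()
    codes = []
    for cb in codebooks:
        # Squared Euclidean distance from each residual frame to each codeword: (T, K).
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        codes.append(idx)
        # The next stage only sees what this stage failed to represent.
        residual = residual - cb[idx]
    return np.stack(codes)


def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected codeword from every stage used."""
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))


rng = np.random.default_rng(0)
x = rng.normal(size=(10, 8))
# Each hypothetical codebook includes a zero vector, so adding a stage never increases the error.
cbs = [np.vstack([np.zeros((1, 8)), rng.normal(scale=s, size=(63, 8))]) for s in (1.0, 0.5, 0.25)]
codes = rvq_encode(x, cbs)
for n in (1, 2, 3):
    err = np.linalg.norm(x - rvq_decode(codes[:n], cbs[:n]))
    print(n, round(float(err), 3))
```

Dropping trailing stages at decode time is how RVQ-based codecs trade quality for bitrate: fewer stages mean fewer codes per frame.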
-
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound,
arXiv, 2405.00233
Haohe Liu, Xuenan Xu, Yi Yuan, Mengyue Wu, Wenwu Wang, Mark D. Plumbley · (haoheliu.github)
-
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders,
arXiv, 2404.02702
Yu Pan, Lei Ma, Jianjun Zhao
-
Amphion - open-mmlab
Speech Codec with Attribute Factorization used for NaturalSpeech 3
-
Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models,
arXiv, 2402.12208
Shengpeng Ji, Minghui Fang, Ziyue Jiang, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao · (languagecodec - jishengpeng) · (languagecodec.github)
-
funcodec - alibaba-damo-academy
-
sonar - facebookresearch
SONAR: a multilingual and multimodal fixed-size sentence embedding space, with a full suite of speech and text encoders and decoders.
-
High-Fidelity Audio Compression with Improved RVQGAN,
arXiv, 2306.06546
Rithesh Kumar, Prem Seetharaman, Alejandro Luebs, Ishaan Kumar, Kundan Kumar · (descript-audio-codec - descriptinc)
-
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models,
arXiv, 2308.16692
Xin Zhang, Dong Zhang, Shimin Li, Yaqian Zhou, Xipeng Qiu · (speechtokenizer - zhangxinfd)
-
SoundStorm: Efficient Parallel Audio Generation,
arXiv, 2305.09636
Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi
-
DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning,
arXiv, 2305.10005
Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
-
High Fidelity Neural Audio Compression,
arXiv, 2210.13438
Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
-
SoundStream: An End-to-End Neural Audio Codec,
arXiv, 2107.03312
Neil Zeghidour, Alejandro Luebs, Ahmed Omran, Jan Skoglund, Marco Tagliasacchi
-
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units,
arXiv, 2106.07447
Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed
-
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,
arXiv, 2006.11477
Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli