Issues regarding discrete WavLM and discrete HuBERT #2453

Open
anupsingh15 opened this issue Mar 6, 2024 · 4 comments
Labels
bug Something isn't working

Comments


anupsingh15 commented Mar 6, 2024

Describe the bug

Hi,
I am trying to extract tokens using the speechbrain.lobes.models.huggingface_transformers.discrete_wavlm and speechbrain.lobes.models.huggingface_transformers.discrete_hubert modules, but neither works because the model checkpoints are missing from the SpeechBrain repo on HF. Could you please let me know of any workaround to get discrete tokens using WavLM/HuBERT?

Expected behaviour

Successful load of pre-trained models

To Reproduce

No response

Environment Details

No response

Relevant Log Output

No response

Additional Context

No response

anupsingh15 added the bug label Mar 6, 2024
Chaanks (Collaborator) commented Mar 10, 2024

Hello @anupsingh15, you're right, the K-means HF repository is currently inaccessible due to an ongoing refactoring of the interface. One workaround is to train your own K-means model (see SpeechBrain's LibriSpeech quantization recipe). Alternatively, I have just uploaded a set of pre-trained K-means models to my HF account (repository) that you can use until the new interface is merged.
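
For anyone landing here before the new interface is merged, a minimal sketch of this workaround might look like the following: extract HuBERT features with transformers, then quantize them with a pre-trained scikit-learn k-means checkpoint downloaded from the Hub. The repo id, filename, and layer index below are placeholders (take them from the repository linked above), and the checkpoint is assumed to be a pickled scikit-learn k-means:

```python
# Hedged sketch of the workaround: HuBERT features -> k-means tokens.
# The repo_id, filename, and layer index are placeholders, not confirmed values.
import joblib
import numpy as np
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoFeatureExtractor, HubertModel

# Placeholder ids: substitute the k-means repo/file linked above.
ckpt_path = hf_hub_download(repo_id="<kmeans-repo>", filename="<kmeans-file>")
kmeans = joblib.load(ckpt_path)  # assumes a pickled sklearn KMeans/MiniBatchKMeans

extractor = AutoFeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

wav = np.random.randn(16000).astype(np.float32)  # stand-in for 1 s of 16 kHz audio
inputs = extractor(wav, sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    # hidden_states[7] = output of transformer layer 7 (example choice;
    # use the layer the k-means checkpoint was trained on).
    feats = model(**inputs, output_hidden_states=True).hidden_states[7]

tokens = kmeans.predict(feats.squeeze(0).numpy())  # one discrete token per frame
print(tokens[:10])
```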

anupsingh15 (Author) commented

Thanks @Chaanks. Do you plan to upload K-means models with 1024 cluster centroids for HuBERT and WavLM? I am training those K-means models myself, as you suggested; however, I run out of GPU memory due to limited resources.
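
In case it helps with the memory limits: k-means itself does not have to live on the GPU. One option (a stand-in technique, not necessarily what the SpeechBrain recipe does) is to extract SSL features batch by batch on the GPU, move each batch to CPU, and fit scikit-learn's MiniBatchKMeans incrementally with partial_fit, so neither the full feature matrix nor the clustering ever occupies GPU memory:

```python
# Sketch: fit a 1024-centroid k-means incrementally on CPU, so GPU memory
# is only needed for one batch of SSL features at a time. MiniBatchKMeans
# is a stand-in, not necessarily what the SpeechBrain recipe uses.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

kmeans = MiniBatchKMeans(n_clusters=1024, batch_size=10_000, random_state=0)

def feature_batches():
    # Hypothetical generator: in practice, yield (n_frames, feat_dim) arrays
    # of WavLM/HuBERT layer features, extracted on GPU in small batches and
    # moved to CPU numpy before clustering.
    for _ in range(100):
        yield np.random.randn(10_000, 768).astype(np.float32)

for feats in feature_batches():
    kmeans.partial_fit(feats)  # updates centroids without storing all features

print(kmeans.cluster_centers_.shape)  # (1024, 768)
```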

Adel-Moumen (Collaborator) commented

Any news on that, @Chaanks?

poonehmousavi (Collaborator) commented

We have uploaded models with 1000/2000 clusters for different layers to our own repo. We plan to move all the trained k-means models to the SpeechBrain repo once the refactoring is done. You can find the various trained K-means models here 👍🏻: https://huggingface.co/poonehmousavi/SSL_Quantization/
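
To see which layer/cluster combinations are currently available in that repo, one can list its files with a generic huggingface_hub call (the exact filenames are whatever the repo contains):

```python
# List the k-means checkpoints available in the linked repo.
from huggingface_hub import list_repo_files

for f in list_repo_files("poonehmousavi/SSL_Quantization"):
    print(f)
```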
