Hi! LibriSpeech char training!! #2385

scj0709 · 2024-02-02T09:54:32Z

Describe the bug

Hello!
I'm really impressed with your code. It's well-structured and a great GitHub repository.
As a result, I'd like to train the LibriSpeech Transformer model following your procedure. However, when I tried to train it with character tokens by setting the token type to 'char', I encountered the following error, which seems to be related to padding. [image]

Expected behaviour

Could you provide any solutions for this issue?

To Reproduce

No response

Environment Details

No response

Relevant Log Output

No response

Additional Context

No response

Adel-Moumen · 2024-04-10T14:42:15Z

Hello @scj0709,

Thanks for opening this issue.

Could you please share which YAML you were using when you ran into this error?

The issue is that the Transformer YAMLs in the LibriSpeech folder use "transformerlm", which was trained with a SentencePiece BPE tokenizer. The recipes use that exact same tokenizer, and therefore you cannot change the granularity of your tokenizer.

This is why I'm surprised that you ran into this issue. Do you mind sharing the YAML with me, please?
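To illustrate why the token granularity cannot be changed independently of the language model, here is a minimal sketch in plain Python. It does not use SpeechBrain or SentencePiece; the two toy vocabularies and the helper names (`encode`, `lm_view`, `check_compatible`) are hypothetical and only show that ids produced by a character vocabulary mean something different, or nothing at all, to a model trained on BPE ids.

```python
# Hypothetical toy vocabularies; real ones would come from the trained tokenizers.
bpe_vocab = ["<blank>", "<unk>", "▁the", "▁cat", "s", "▁sat"]          # what the LM saw in training
char_vocab = ["<blank>", "<unk>", "_", "a", "c", "e", "h", "s", "t"]  # what a 'char' tokenizer emits


def encode(pieces, vocab):
    """Map each piece to its index in vocab, falling back to <unk>."""
    index = {piece: i for i, piece in enumerate(vocab)}
    return [index.get(piece, index["<unk>"]) for piece in pieces]


def lm_view(ids, lm_vocab):
    """Show how a model trained on lm_vocab would interpret the given ids."""
    return [lm_vocab[i] if i < len(lm_vocab) else "<out of range>" for i in ids]


def check_compatible(asr_vocab, lm_vocab):
    """Return a list of problems; an empty list means the vocabularies match."""
    problems = []
    if len(asr_vocab) != len(lm_vocab):
        problems.append(f"size mismatch: {len(asr_vocab)} vs {len(lm_vocab)}")
    differing = [i for i, (a, b) in enumerate(zip(asr_vocab, lm_vocab)) if a != b]
    if differing:
        problems.append(f"ids with different meaning: {differing}")
    return problems


char_ids = encode(list("cat"), char_vocab)   # ids produced by the character vocabulary
misread = lm_view(char_ids, bpe_vocab)       # the same ids as the BPE-trained model reads them
problems = check_compatible(char_vocab, bpe_vocab)

print(char_ids)
print(misread)
print(problems)
```

A check of this kind, run before training or decoding, would flag the mismatch early instead of letting it surface later as an unrelated-looking error.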

Thanks and have a great day.

scj0709 added the bug label Feb 2, 2024

Adel-Moumen self-assigned this Apr 7, 2024

Adel-Moumen added this to the v1.0.1 milestone Apr 8, 2024

Adel-Moumen assigned asumagic Apr 8, 2024

asumagic removed this from the v1.0.1 milestone Apr 11, 2024
