Use CTC beam search decoder with subword encoding. #3750

Open
DomainFlag opened this issue Oct 12, 2022 · 0 comments


DomainFlag commented Oct 12, 2022

I'm using the provided scorer generator, generate_scorer_package. I'm also using a subword tokenizer (e.g., SentencePiece) to build a unigram model, so the number of output classes the decoder predicts over equals the size of that subword vocabulary. How can I adapt the scorer so that it supports subword units? Will the scorer work if I fill the alphabet file with the subword units? Or should I rely on a trick such as mapping each subword unit to a single character (e.g., via an ASCII table), re-encoding the corpus with that mapping, and building the alphabet from the same mapping? Thank you.
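
To make the last option concrete, here is a minimal sketch of the re-encoding idea. It is an illustration under assumptions, not a confirmed way to drive generate_scorer_package: the vocabulary `PIECES` and all function names are hypothetical, the pieces would in practice come from the trained tokenizer, and Unicode Private Use Area code points are swapped in for an ASCII table because printable ASCII is too small for a typical subword vocabulary.

```python
# Hypothetical subword vocabulary, listed in the same order as the
# acoustic model's output indices (assumption for this sketch).
PIECES = ["▁he", "llo", "▁wor", "ld", "▁"]

# Start of the Unicode Private Use Area; used instead of ASCII so that
# vocabularies with thousands of pieces still get unique symbols.
BASE = 0xE000


def build_mapping(pieces, base=BASE):
    """Assign each subword piece one unique single-character symbol."""
    return {piece: chr(base + i) for i, piece in enumerate(pieces)}


def encode_pieces(tokenized, mapping):
    """Re-encode a tokenized sentence as a string with one symbol per piece."""
    return "".join(mapping[piece] for piece in tokenized)


def decode_symbols(symbols, mapping):
    """Invert the encoding: turn a symbol string back into subword pieces."""
    inverse = {symbol: piece for piece, symbol in mapping.items()}
    return [inverse[symbol] for symbol in symbols]


def alphabet_lines(pieces, mapping):
    """Return one symbol per line, ordered like the model's output indices."""
    return [mapping[piece] for piece in pieces]


if __name__ == "__main__":
    mapping = build_mapping(PIECES)
    encoded = encode_pieces(["▁he", "llo", "▁wor", "ld"], mapping)
    # Round trip: the decoder output can be mapped back to pieces.
    print(decode_symbols(encoded, mapping))
```

The same mapping would have to be applied both to the corpus used for the language model and to the alphabet, so that symbol `i` always corresponds to output index `i`. Whether the scorer tooling accepts such an alphabet is the open question here.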
