
Inference with CUDA #969

Open
hieuhv94 opened this issue Apr 28, 2021 · 4 comments
@hieuhv94

hieuhv94 commented Apr 28, 2021

I ran the streaming inference code in a CUDA environment (flashlight and wav2letter built with CUDA), but the results were the same between CPU and CUDA.
So my question is: how do I run inference with CUDA?
Thanks!

@tlikhomanenko
Contributor

cc @vineelpratap

tlikhomanenko self-assigned this Apr 28, 2021
@hieuhv94
Author

hieuhv94 commented May 6, 2021

Any suggestions on this?

@vineelpratap
Contributor

Hey @hieuhv94, sorry, I didn't get the question.

For running inference with CUDA, you can use the fl_asr_decode binary, which does beam-search decoding with an LM.

If you meant the streaming inference from w2l@anywhere, we currently support only the CPU version.
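For reference, a decode invocation typically looks something like the sketch below. The paths and flag values here are placeholders, and the exact flag names can differ between flashlight/wav2letter versions, so please check the flashlight ASR app documentation for the options your build supports.

```sh
# Illustrative sketch only: paths/values are placeholders and flag names
# may differ across flashlight/wav2letter versions.
fl_asr_decode \
    --am=/path/to/acoustic_model.bin \
    --tokens=tokens.txt \
    --lexicon=lexicon.txt \
    --lm=/path/to/lm.bin \
    --datadir=/path/to/data \
    --test=test.lst \
    --uselexicon=true \
    --lmweight=2.0 \
    --wordscore=1.0 \
    --beamsize=500 \
    --show
```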

@hieuhv94
Author


Thanks for your comment, @vineelpratap.
Do you have any ideas for streaming inference with CUDA?
And would the performance of streaming inference with CUDA actually be better than on CPU? I ask because we have to copy many chunks to VRAM during streaming.
