WER Filtering takes too long? #80

Open
macabdul9 opened this issue Feb 5, 2024 · 3 comments

Comments

@macabdul9

Hi @sanchit-gandhi !

Currently, WER filtering is very slow with 8 workers, and going beyond 8 fails at `self.pid = os.fork()` with `OSError: [Errno 12] Cannot allocate memory`. It also doesn't seem to cache the filtered data, which makes it hard to run on large datasets (up to 1M segments). Is there a way to speed up the filtering process?

@sanchit-gandhi
Collaborator

Hey @macabdul9 - do you have the bash script, with its configuration, that you're using to reproduce this error? It would be super helpful to see which configuration you're using so that I can advise more appropriately here.

@sanchit-gandhi
Collaborator

Generally speaking, you should ensure that the number of workers is less than or equal to the number of CPUs on your device (you can check this with the bash command `lscpu`).
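
For reference, the same check can be done from Python before launching the job. This is only a minimal sketch: `available_cpus` and `cap_num_workers` are hypothetical helper names, not part of the project, and the capped value would still need to be passed to whatever worker argument your script exposes.

```python
import os


def available_cpus() -> int:
    """Return the number of CPUs this process is allowed to use."""
    try:
        # Linux only: respects CPU affinity masks (e.g. set via taskset).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Platforms without sched_getaffinity (e.g. macOS, Windows).
        return os.cpu_count() or 1


def cap_num_workers(requested: int) -> int:
    """Clamp a requested worker count to the range [1, available CPUs]."""
    return max(1, min(requested, available_cpus()))


if __name__ == "__main__":
    print(f"usable CPUs: {available_cpus()}")
    print(f"workers for a request of 16: {cap_num_workers(16)}")
```

Note that this only guards against oversubscribing CPUs. Since every forked worker also needs memory, a worker count below the CPU count can still fail if RAM is the limiting resource.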

@macabdul9
Author

I have replaced HF evaluate's WER metric with jiwer's (which I believe is the same) and that fixes the issue. So most likely it has something to do with multiprocessing. Thanks.
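
To illustrate what a per-segment WER filter computes, here is a rough, dependency-free sketch. It is not the project's code and not jiwer's implementation: `word_error_rate`, `keep_segment`, and the `threshold` value are hypothetical names and defaults chosen for this example, and no text normalization is applied. With jiwer installed, `jiwer.wer(reference, hypothesis)` would take the place of `word_error_rate`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only one previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def keep_segment(reference: str, hypothesis: str, threshold: float = 0.1) -> bool:
    """Keep a segment only if its WER does not exceed the (assumed) threshold."""
    return word_error_rate(reference, hypothesis) <= threshold


if __name__ == "__main__":
    pairs = [
        ("the cat sat on the mat", "the cat sat on the mat"),
        ("the cat sat on the mat", "the cat sat on a mat"),
    ]
    for ref, hyp in pairs:
        print(ref, "|", hyp, "|", keep_segment(ref, hyp))
```

Because each call works on a single pair of strings in the current process, a function like this can be applied per example without spawning extra processes inside the metric itself.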
