WER Filtering takes too long? #80

macabdul9 · 2024-02-05T07:03:54Z

Currently, WER filtering takes far too long with 8 workers, and going beyond 8 fails at self.pid = os.fork() with OSError: [Errno 12] Cannot allocate memory. The filtered data also doesn't seem to be cached, which makes it very hard to run on large datasets (up to 1M segments). Is there a way to speed up the filtering process?


sanchit-gandhi · 2024-02-05T11:54:20Z

Hey @macabdul9 - do you have a bash file configuration you're using to reproduce this error? It would be super helpful to see what configuration you're using so that we can advise more appropriately here.

sanchit-gandhi · 2024-03-28T17:39:18Z

Generally speaking, you should ensure that the number of workers is less than or equal to the number of CPUs on your device (you can check this with the bash command lscpu).
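For a check from inside Python rather than lscpu, something like the following may help. It is only a sketch: safe_num_workers is a hypothetical helper name, not something from the repo, and it only caps the worker count by CPU count. Each forked worker also needs memory, so staying under the CPU count does not by itself rule out the Errno 12 error above.

```python
import os


def safe_num_workers(requested: int) -> int:
    """Cap the requested worker count at the number of usable CPUs.

    Hypothetical helper for illustration only.
    """
    try:
        # Linux only: the set of CPUs this process is allowed to run on.
        available = len(os.sched_getaffinity(0))
    except AttributeError:
        # Other platforms: fall back to the total logical CPU count.
        available = os.cpu_count() or 1
    # Always return at least one worker.
    return max(1, min(requested, available))


if __name__ == "__main__":
    print(f"requested 8 workers, using {safe_num_workers(8)}")
```

The returned value could then be passed wherever the worker count is configured.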

macabdul9 · 2024-03-28T18:38:50Z

I have replaced HF evaluate's WER metric with jiwer's (which I believe is the same) and that fixes the issue. So most likely it has something to do with multiprocessing. Thanks.
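For anyone who wants to see what a per-sample WER check boils down to, here is a minimal single-process sketch in plain Python. It is not the repo's filtering code and not jiwer's implementation; word_error_rate, keep_sample, and the 0.1 threshold are all made up for illustration. With jiwer installed, jiwer.wer(reference, hypothesis) would take the place of word_error_rate.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def keep_sample(reference: str, hypothesis: str, threshold: float = 0.1) -> bool:
    """Keep a sample when its WER is at or below the (hypothetical) threshold."""
    return word_error_rate(reference, hypothesis) <= threshold


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
    print(keep_sample("the cat sat", "the cat sat"))
```

Because it scores one pair at a time and spawns no processes, a function like this can be called from a plain loop, which sidesteps the os.fork() memory error at the cost of parallelism.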
