"FileNotFoundError: KenLM binary file not found at : None" thrown when decoding without N-gram LM #9067

Open
aklemen opened this issue Apr 30, 2024 · 4 comments
Labels: bug (Something isn't working)

Comments

@aklemen

aklemen commented Apr 30, 2024

Describe the bug

I am trying to use an external LLM to rescore the beam search results from a Conformer-CTC model.

When I run eval_beamsearch_ngram_ctc.py to get the beam search results without passing an N-gram LM, I get the following error:

Traceback (most recent call last):
  File "/content/NeMo/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py", line 415, in main
    candidate_wer, candidate_cer = beam_search_eval(
  File "/content/NeMo/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py", line 196, in beam_search_eval
    _, beams_batch = decoding.ctc_decoder_predictions_tensor(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_decoding.py", line 319, in ctc_decoder_predictions_tensor
    hypotheses_list = self.decoding(
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py", line 166, in __call__
    return self.forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/core/classes/common.py", line 1098, in __call__
    outputs = wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py", line 280, in forward
    hypotheses = self.search_algorithm(prediction_tensor, out_len)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/submodules/ctc_beam_decoding.py", line 314, in default_beam_search
    raise FileNotFoundError(
FileNotFoundError: KenLM binary file not found at : None. Please set a valid path in the decoding config.

Steps/Code to reproduce bug

  1. Install decoders.
NEMO_PATH=<insert absolute path to NeMo directory>
cd $NEMO_PATH && bash scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh $NEMO_PATH
  2. Run the beam search with the following config:
python3 $NEMO_PATH/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py \
    nemo_model_file="<nemo CTC ASR model, e.g. stt_en_conformer_ctc_medium.nemo>" \
    input_manifest="<manifest json file>" \
    preds_output_folder="<output directory>" \
    decoding_mode=beamsearch \
    decoding_strategy="beam"

Expected behavior

I would expect no error to be raised, because BeamSearchDecoderWithLM already handles the case where no N-gram LM path is passed:

        # from  nemo/collections/asr/modules/beam_search_decoder.py
        if lm_path is not None:
            self.scorer = Scorer(alpha, beta, model_path=lm_path, vocabulary=vocab)
        else:
            self.scorer = None

When I removed the following KenLM file path check from nemo/collections/asr/parts/submodules/ctc_beam_decoding.py, decoding worked:

            # Check for filepath
            if self.kenlm_path is None or not os.path.exists(self.kenlm_path):
                raise FileNotFoundError(
                    f"KenLM binary file not found at : {self.kenlm_path}. "
                    f"Please set a valid path in the decoding config."
                )
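For illustration, here is a minimal sketch of a relaxed check that would match the behavior I expected: an unset path means "decode without an N-gram LM", while a path that is set but does not exist still raises. The helper name `resolve_kenlm_path` is hypothetical and not part of NeMo, and treating an empty string the same as None is an assumption on my side.

```python
import os
from typing import Optional


def resolve_kenlm_path(kenlm_path: Optional[str]) -> Optional[str]:
    """Hypothetical helper (not a NeMo API) sketching a relaxed path check.

    Returns None when no LM path is configured, so the caller can fall back
    to LM-free beam search. Raises only when a path was given but is missing.
    """
    # Assumption: both None and "" mean "no N-gram LM requested".
    if not kenlm_path:
        return None

    # A path was explicitly set, so a missing file is still an error.
    if not os.path.exists(kenlm_path):
        raise FileNotFoundError(
            f"KenLM binary file not found at : {kenlm_path}. "
            f"Please set a valid path in the decoding config."
        )

    return kenlm_path
```

With something like this, the value returned could be passed on as `lm_path`, and the existing `if lm_path is not None` branch in BeamSearchDecoderWithLM would decide whether a Scorer is created.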

Environment overview

  • Environment location: Google Colab
  • Method of NeMo install: python -m pip install git+https://github.com/NVIDIA/NeMo.git@v1.23.0#egg=nemo_toolkit[all]

Environment details

  • OS version: Ubuntu 22.04.4 LTS
  • PyTorch version: 2.2.1+cu121
  • Python version: 3.10

Additional context

GPU: T4

@aklemen aklemen added the bug Something isn't working label Apr 30, 2024
@nithinraok
Collaborator

Update: We found that this script needs a couple of code changes following the recent model and transcription refactoring. @karpov-nick is working on a fix.

@karpnv
Collaborator

karpnv commented May 17, 2024

There is work in progress in PR #8428.

@aklemen
Author

aklemen commented May 18, 2024

Thank you both!

@karpnv
Collaborator

karpnv commented May 20, 2024

You can try decoding without an N-gram LM on the branch karpnv/beamsearch with the following parameters:

python3 ./scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py \
model_path=./am_model.nemo  \
dataset_manifest=./manifest.json  \
preds_output_folder=/tmp   \
ctc_decoding.strategy=flashlight \
ctc_decoding.beam.nemo_kenlm_path="" \
ctc_decoding.beam.beam_size=[4]   \
ctc_decoding.beam.beam_beta=[0.5] 
