Memory Allocation Error during alignment (tools/nemo_forced_aligner/align.py) #9039

Open
Ara-Yeroyan opened this issue Apr 25, 2024 · 1 comment

Describe the bug

I am trying to run tools/nemo_forced_aligner/align.py on a very small toy dataset but get the following error:

return torch._C._nn.pad(input, pad, mode, value)
RuntimeError: [enforce fail at alloc_cpu.cpp:114] data. DefaultCPUAllocator: not enough memory: you tried to allocate 109086779069136 bytes.

The same code works on a relatively large dataset with short audio clips, whereas the current dataset contains audios of length
[14, 2, 20, , 2, 16, 19, 2, 14, 2, 7] minutes (batch_size=2).
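
For context, here is a rough back-of-the-envelope sketch (not code from NFA) of how a (frames x text tokens) alignment matrix grows with clip length; the 40 ms frame duration and 150 tokens-per-minute figures are illustrative assumptions only. Even this estimate stays well under a gigabyte for a 20-minute clip, while the allocator is being asked for roughly 109 TB, which suggests the tensor being padded is much larger than a single nominal alignment matrix.

    # Rough memory estimate for one (frames x tokens) float32 alignment matrix.
    # frame_ms and tokens_per_minute are illustrative assumptions, not values from NFA.
    def alignment_matrix_bytes(audio_minutes, frame_ms=40.0, tokens_per_minute=150, dtype_bytes=4):
        frames = int(audio_minutes * 60 * 1000 / frame_ms)   # acoustic frames in the clip
        tokens = int(audio_minutes * tokens_per_minute)       # rough transcript token count
        return frames * tokens * dtype_bytes

    for minutes in (0.25, 2, 14, 20):   # ~15 s clips vs. the long clips in this dataset
        print(f"{minutes} min -> {alignment_matrix_bytes(minutes) / 1e9:.3f} GB")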

Steps/Code to reproduce bug

python tools/nemo_forced_aligner/align.py

Params:
model_path=speech_to_text_ctc_bpe__checkpoint.nemo
manifest_filepath=metadata_small.json
output_dir=save_dir
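
Putting these together, the invocation is presumably a single command with key=value overrides, as in the Params above (batch_size=2 is taken from the description; the paths are the ones listed):

    python tools/nemo_forced_aligner/align.py model_path=speech_to_text_ctc_bpe__checkpoint.nemo manifest_filepath=metadata_small.json output_dir=save_dir batch_size=2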

Expected behavior

The alignments should be stored in save_dir as follows (see the layout sketch after this list):

  • ass - directory with corresponding tokens & words folders containing the .ass files
  • ctm - directory with corresponding tokens, segments & words folders containing the .ctm files
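
That is, a layout roughly like this (the per-utterance file names are hypothetical):

    save_dir/
        ass/
            tokens/    <- <utterance>.ass files
            words/     <- <utterance>.ass files
        ctm/
            tokens/    <- <utterance>.ctm files
            segments/  <- <utterance>.ctm files
            words/     <- <utterance>.ctm files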

Environment overview

  • Environment location: Local
  • Method of NeMo install: described in the NeMo docs

Environment details

If an NVIDIA docker image is used, you don't need to specify these.
Otherwise, please provide:

  • OS version: Windows 10 Pro (22H2) | OS build - 19045.4291
  • PyTorch version: torch 2.2.1+cu121
  • Python version: 3.10.12

Additional context

GPU model: GeForce RTX 3090
CUDA: 12.2

I have modified some torch and pytorch_lightning source code to make NeMo work on my Windows machine. However, those changes were mostly related to the multiprocessing strategy and other parallelization details, which should have nothing to do with this error, since I managed to run the aligner over a big dataset of many short audios (up to 15 seconds each).

@Ara-Yeroyan added the bug label on Apr 25, 2024
erastorgueva-nv (Collaborator) commented May 14, 2024

Hi, could you please share more of the error message and indicate where exactly in the NFA code the error happens?
Does the error also happen with batch_size=1?
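For reference, that would just mean rerunning the same command with the batch size override dropped to 1, e.g.:

    python tools/nemo_forced_aligner/align.py model_path=speech_to_text_ctc_bpe__checkpoint.nemo manifest_filepath=metadata_small.json output_dir=save_dir batch_size=1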
