"Couldn't find appropriate audio backend to handle URI" when training with WSJ03_mix Sepformer model #2287

Open
Legacy549 opened this issue Dec 2, 2023 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@Legacy549

Describe the bug

I am trying to "fine-tune" this separation model using some audio files I have prepared. I have followed the expected .csv file naming and column ordering, but the recipe cannot read my .wav files.

When running the "test" code for the model (not train) by using the audio provided I got a similar error that went away once I installed pysoundfile, but the issue still persists when training.

Expected behaviour

I expect it to run through all the required epochs and complete the training. I have all the necessary libraries, and I have ensured that CUDA is installed and up to date.

To Reproduce

sudo python3 train.py hparams/sepformer.yaml --data_folder=/home/karson/honors/Dataset

The "Dataset" contains the correct folders the model needs to look at.

Environment Details

I am using miniconda to set up my environment and have installed the latest version of speechbrain.

Relevant Log Output

torchvision is not available - cannot save figures
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/sepformer/1234
Creating a csv file for a custom dataset
speechbrain.core - Info: auto_mix_prec arg from hparam file is used
speechbrain.core - Info: noprogressbar arg from hparam file is used
speechbrain.core - 25.7M trainable parameters in Separation
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1
  0%|          | 0/4 [00:00<?, ?it/s]
speechbrain.core - Exception:
Traceback (most recent call last):
  File "/home/karson/speechbrain/recipes/WSJ0Mix/separation/train.py", line 625, in <module>
    separator.fit(
  File "/home/karson/speechbrain/speechbrain/core.py", line 1366, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File "/home/karson/speechbrain/speechbrain/core.py", line 1187, in _fit_train
    for batch in t:
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 694, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/karson/speechbrain/speechbrain/dataio/dataset.py", line 167, in __getitem__
    return self.pipeline.compute_outputs({"id": data_id, **data_point})
  File "/home/karson/speechbrain/speechbrain/utils/data_pipeline.py", line 464, in compute_outputs
    return self._compute(data, self._exec_order, self.output_mapping)
  File "/home/karson/speechbrain/speechbrain/utils/data_pipeline.py", line 496, in _compute
    values = item(*args)  # Call the DynamicItem to produce output
  File "/home/karson/speechbrain/speechbrain/utils/data_pipeline.py", line 72, in __call__
    return self.func(*args)
  File "/home/karson/speechbrain/recipes/WSJ0Mix/separation/train.py", line 472, in audio_pipeline_mix
    mix_sig = sb.dataio.dataio.read_audio(mix_wav)
  File "/home/karson/speechbrain/speechbrain/dataio/dataio.py", line 275, in read_audio
    audio, _ = torchaudio.load(waveforms_obj)
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_backend/utils.py", line 203, in load
    backend = dispatcher(uri, format, backend)
  File "/usr/local/lib/python3.10/dist-packages/torchaudio/_backend/utils.py", line 115, in dispatcher
    raise RuntimeError(f"Couldn't find appropriate backend to handle uri {uri} and format {format}.")
RuntimeError: Couldn't find appropriate backend to handle uri /home/karson/honors/Dataset/train/mixture/Baylor Maile Nathan RDPD.wav and format None.
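
One thing the log above suggests checking: the traceback loads packages from /usr/local/lib/python3.10/dist-packages, so a run started with sudo may be using the system interpreter rather than the conda environment, and a package installed inside the environment (such as pysoundfile) may not be visible to it. The sketch below is a hypothetical diagnostic, not part of the recipe. It reports which interpreter is running, whether `torchaudio` and `soundfile` are importable from it, and which backends torchaudio lists. It assumes `torchaudio.list_audio_backends()` is present, as in torchaudio 2.x, and falls back to an empty list otherwise.

```python
import importlib.util
import sys


def describe_audio_environment():
    """Report the running interpreter and the audio packages it can see."""
    info = {"executable": sys.executable}
    # find_spec returns None when a package cannot be imported from this interpreter.
    for name in ("torchaudio", "soundfile"):
        info[name] = importlib.util.find_spec(name) is not None

    backends = []
    if info["torchaudio"]:
        try:
            import torchaudio

            # Assumption: list_audio_backends() exists (torchaudio 2.x); guard otherwise.
            lister = getattr(torchaudio, "list_audio_backends", None)
            if lister is not None:
                backends = list(lister())
        except Exception:
            # A broken install should not crash the diagnostic itself.
            backends = []
    info["backends"] = backends
    return info


if __name__ == "__main__":
    for key, value in describe_audio_environment().items():
        print(f"{key}: {value}")
```

Saving this under any name and running it once with `python3` and once with `sudo python3` should show whether the two commands resolve to the same interpreter and the same set of backends.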

Additional Context

I am working on a project for a professor at my university in which I am looking into "fine-tuning" a speech separation model. I have learned quite a bit about AI, and any help would be much appreciated!

@Legacy549 Legacy549 added the bug Something isn't working label Dec 2, 2023
@Legacy549
Author

It appears the solution is to install sox_io using: apt-get install sox libsox-dev
This worked for me at least, however I am now getting a new error:

RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 1439552 but got size 1382016 for tensor number 1 in the list.

@asumagic
Collaborator

asumagic commented Dec 4, 2023

> It appears the solution is to install sox_io using: apt-get install sox libsox-dev
> This worked for me at least, however I am now getting a new error:
>
> RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 1439552 but got size 1382016 for tensor number 1 in the list.

Can you provide the backtrace to this error?

@Legacy549
Author

Thank you for responding! My professor and I think it might have to do with the "duration" set in my .csv files; we don't know whether it is in seconds or in audio samples. Here is the backtrace:

) karson@Le-Ubuntu-Laptop:~/speechbrain/recipes/WSJ0Mix/separation$ conda activate sb
(sb) karson@Le-Ubuntu-Laptop:~/speechbrain/recipes/WSJ0Mix/separation$ python3 train.py hparams/sepformer.yaml --data_folder=/home/karson/honors/Dataset
speechbrain.core - Beginning experiment!
speechbrain.core - Experiment folder: results/sepformer/1234
Creating a csv file for a custom dataset
speechbrain.core - Info: auto_mix_prec arg from hparam file is used
speechbrain.core - Info: noprogressbar arg from hparam file is used
speechbrain.core - 25.7M trainable parameters in Separation
speechbrain.utils.checkpoints - Would load a checkpoint here, but none found yet.
speechbrain.utils.epoch_loop - Going into epoch 1
  0%|          | 0/4 [00:00<?, ?it/s]
Mix tensor shape before processing: torch.Size([1, 70680])
  0%|          | 0/4 [00:00<?, ?it/s]
speechbrain.core - Exception:
Traceback (most recent call last):
  File "/home/karson/speechbrain/recipes/WSJ0Mix/separation/train.py", line 625, in <module>
    separator.fit(
  File "/home/karson/miniconda3/envs/sb/lib/python3.11/site-packages/speechbrain/core.py", line 1366, in fit
    self._fit_train(train_set=train_set, epoch=epoch, enable=enable)
  File "/home/karson/miniconda3/envs/sb/lib/python3.11/site-packages/speechbrain/core.py", line 1193, in _fit_train
    loss = self.fit_batch(batch)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/karson/speechbrain/recipes/WSJ0Mix/separation/train.py", line 109, in fit_batch
    predictions, targets = self.compute_forward(
                           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/karson/speechbrain/recipes/WSJ0Mix/separation/train.py", line 50, in compute_forward
    targets = torch.cat(
              ^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 2. Expected size 261143 but got size 250706 for tensor number 1 in the list.
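
On the seconds-versus-samples question: the thread does not settle which unit the recipe expects for "duration", so the sketch below only shows how the two are related for a given file, so that either value can be written to the .csv. It uses the standard-library `wave` module, which reads uncompressed PCM .wav files only; `wav_length` is a hypothetical helper name, not a SpeechBrain function.

```python
import wave


def wav_length(path):
    """Return (num_samples, sample_rate, duration_seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        num_samples = wav.getnframes()    # frames per channel
        sample_rate = wav.getframerate()  # frames per second
    # Duration in seconds is the sample count divided by the sample rate.
    return num_samples, sample_rate, num_samples / sample_rate
```

Calling `wav_length` on the mixture and on each source file of one .csv row also shows directly whether their sample counts agree.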

@asumagic
Collaborator

asumagic commented Dec 6, 2023

I am not too familiar with this task and dataset but it looks like the code is written with the assumption that all segments in a batch are of the same length. I don't really seem to be able to find code that tries to work around this issue so I'm assuming the segments in the original dataset are already of a fixed length...?
I'll mention this issue internally to see if anyone understands the situation better than me.

Either way, as a workaround, I think you could try truncating the different inputs within the batch to the shortest one.
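
A minimal sketch of that truncation workaround, written in plain Python so it runs without any dependencies. The helper names are hypothetical and this is not code from the recipe. Objects with a `shape` attribute (for example tensors) are assumed to keep time on their last axis and are sliced there; plain sequences are sliced directly. Note that cropping discards the trailing samples of the longer signals.

```python
def num_samples(signal):
    """Length along the time axis: last dimension for tensors, len() for lists."""
    shape = getattr(signal, "shape", None)
    return shape[-1] if shape is not None else len(signal)


def truncate_to_shortest(signals):
    """Crop every signal to the length of the shortest one."""
    if not signals:
        return []
    shortest = min(num_samples(s) for s in signals)
    # Tensors/arrays are sliced on the last axis; plain sequences are sliced directly.
    return [s[..., :shortest] if hasattr(s, "shape") else s[:shortest] for s in signals]


if __name__ == "__main__":
    # Lengths taken from the error message in this thread.
    s1 = [0.0] * 261143
    s2 = [0.0] * 250706
    cropped = truncate_to_shortest([s1, s2])
    print([len(c) for c in cropped])  # [250706, 250706]
```

If this approach is used, the mixture and its sources would presumably need to be cropped together so that they stay aligned.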

@ycemsubakan
Collaborator

ycemsubakan commented Dec 6, 2023

Hey! What is the length of the signal?
Actually, it should work with a signal of any length (if it is too long, of course, you'll run out of memory).

To answer @asumagic: no, the signals in WSJ0Mix are of variable length. So I think there is something else going on here.

Does the same signal work if you try it with the pretrained SepFormer? (You can try that using the model on Hugging Face.)

@Legacy549
Author

The same signals work both on Hugging Face and locally using the code snippet provided on Hugging Face. Could it be the formatting of my .csv files? This is for a project in which we try to improve the model with additional signals made by my group. I appreciate all of y'all's help with this.

@ycemsubakan ycemsubakan self-assigned this Dec 7, 2023
@ycemsubakan
Collaborator

> RuntimeError: Sizes of tensors must match except in dimension 3. Expected size 1439552 but got size 1382016 for tensor number 1 in the list.

Actually, thinking about it, it might be a length issue. We do support variable-length signals, but this might be due to the positional embeddings that we are adding. Could you put a breakpoint at the line below to see whether the error is raised there?

if use_positional_encoding:

@asumagic
Collaborator

asumagic commented Dec 7, 2023

Also, could you print the shapes of the tensors being concatenated just before the failing cat? I am wondering if this could be due to stereo audio.
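
As a complementary check that does not require editing train.py, the files themselves can be inspected on disk. The sketch below is a hypothetical helper built on the standard-library `wave` module (PCM .wav only). It flags any file that is not mono and any file whose frame count differs from the first file in the list. It assumes the recipe expects single-channel audio, which this thread does not confirm.

```python
import wave


def wav_layout(path):
    """Return (channels, frames_per_channel) for an uncompressed PCM .wav file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getnframes()


def report_layout_problems(paths):
    """List stereo files and files whose length differs from the first file."""
    problems = []
    reference_frames = None
    for path in paths:
        channels, frames = wav_layout(path)
        if channels != 1:
            # Assumption: mono input is expected.
            problems.append(f"{path}: {channels} channels (expected mono)")
        if reference_frames is None:
            reference_frames = frames
        elif frames != reference_frames:
            problems.append(f"{path}: {frames} frames (first file has {reference_frames})")
    return problems
```

Passing the mixture and source paths from a single .csv row would show whether that row mixes channel counts or lengths; an empty list means no problem was found by this check.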
