Inconsistent librosa versions PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech #1369

Open
xvdp opened this issue Jan 16, 2024 · 0 comments
Labels
bug Something isn't working

xvdp commented Jan 16, 2024

PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech

librosa is used across all the audio projects, although only for a few functions. The requirements files refer to different versions, and not all of the call syntax is consistent with the versions they require.

The main change in librosa > 0.7 is that many of the functions require keyword arguments; typically the only positional argument allowed is the data,
e.g. librosa.core.resample(y: 'np.ndarray', *, orig_sr: 'float', target_sr: 'float', ...)
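
To make the effect of the bare `*` concrete, here is a minimal sketch that does not import librosa at all. The `resample` below is a hypothetical stand-in that only copies the shape of the signature quoted above; its body is a naive nearest-sample lookup, not librosa's algorithm.

```python
# Hypothetical stand-in mirroring the signature shape quoted above:
# the data is positional, everything after the bare `*` is keyword-only.
def resample(y, *, orig_sr, target_sr):
    ratio = target_sr / orig_sr
    n_out = int(round(len(y) * ratio))
    # Nearest-sample lookup, only to keep the sketch runnable.
    return [y[min(int(i / ratio), len(y) - 1)] for i in range(n_out)]

samples = [0.0, 0.1, 0.2, 0.3]

# Old positional style: rejected by a keyword-only signature.
try:
    resample(samples, 4, 8)
    positional_ok = True
except TypeError:
    positional_ok = False

# New style: data positional, the rest passed by keyword.
upsampled = resample(samples, orig_sr=4, target_sr=8)
```

The positional call raises `TypeError`, which is the kind of failure the old-style calls listed in this issue would hit against a keyword-only signature.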

  1. The PyTorch/SpeechSynthesis/ project requirements ask for different versions:
  • PyTorch/SpeechSynthesis/Tacotron2/requirements.txt requires librosa (unpinned)
  • PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt librosa==0.7.0
  • PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt librosa==0.9.0
  • PyTorch/SpeechSynthesis/FastPitch/requirements.txt librosa==0.9.0

For consistency, they should all require the same version. All but one of the functions (listed below) can run on librosa 0.10.
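
A quick way to see how far the pins have drifted is to scan the requirements files. This is a rough sketch using only the standard library; the helper names `librosa_pin` and `collect_pins` are made up for illustration, and the regex only handles simple specifiers such as `librosa==0.9.0` or a bare `librosa`.

```python
import re
from pathlib import Path

# Matches a requirement line for librosa with an optional simple version specifier.
LIBROSA_RE = re.compile(r"^\s*librosa(?![\w.-])\s*([=<>!~]=?\s*[\w.*]+)?", re.IGNORECASE)

def librosa_pin(text):
    """Return the librosa specifier found in a requirements file body,
    "" if librosa is listed unpinned, or None if it is not listed."""
    for line in text.splitlines():
        line = line.split("#", 1)[0]          # drop trailing comments
        m = LIBROSA_RE.match(line)
        if m:
            return (m.group(1) or "").replace(" ", "")
    return None

def collect_pins(root):
    """Map each requirements*.txt under root to its librosa specifier."""
    pins = {}
    for path in sorted(Path(root).rglob("requirements*.txt")):
        pin = librosa_pin(path.read_text(errors="ignore"))
        if pin is not None:
            pins[str(path)] = pin
    return pins
```

Running `collect_pins(".")` from the repository root and checking whether `set(pins.values())` has more than one element would flag the inconsistency described here.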

  2. On the frameworks requiring the newer pytorch, some files use the old syntax.
  • PyTorch/SpeechSynthesis/FastPitch/hifigan/data_function.py line 72: librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
  • PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client/speech_ai_demo/utils/jasper/speech_utils.py lines 386 & 389: samples = librosa.core.resample(samples, sample_rate, target_sr) and librosa.effects.trim(samples, trim_db)

  • CUDA-Optimized/FastSpeech/generate.py uses the deprecated librosa.output.write_wav(path, wav, hp.sr); see librosa/librosa#1062

  • CUDA-Optimized/FastSpeech/tacotron2/audio_processing.py line 82 win_sq = librosa_util.pad_center(win_sq, n_fft)
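
For the `librosa.output.write_wav(path, wav, hp.sr)` call in generate.py, the replacement usually suggested is `soundfile.write`, as far as I know. To stay dependency-free, the sketch below uses the standard-library `wave` module instead; it assumes mono float samples in [-1.0, 1.0] and writes 16-bit PCM, which may not match what the project actually needs. The name `write_wav` here is just a local helper, not the librosa function.

```python
import os
import struct
import tempfile
import wave

def write_wav(path, samples, sr):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM (stdlib only)."""
    frames = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))          # clip to the valid range
        frames += struct.pack("<h", int(round(s * 32767)))
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)                           # 2 bytes per sample = 16 bit
        f.setframerate(int(sr))
        f.writeframes(bytes(frames))

# Round trip through a temporary file to check the header values.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "out.wav")
    write_wav(out_path, [0.0, 0.5, -0.5], 22050)
    with wave.open(out_path, "rb") as f:
        n_frames, frame_rate = f.getnframes(), f.getframerate()
```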

Several of those calls will fail on newer librosa. It is simple enough to clean up the code.
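
To find the remaining old-style calls, something like the following could help. It is a heuristic sketch built on the standard-library `ast` module: the watched names come from the calls listed in this issue, and the "more than one positional argument" threshold is an assumption that will not fit every function (some take no positional arguments at all).

```python
import ast

# Call names taken from this issue; extend as needed.
WATCHED = {"resample", "trim", "pad_center", "librosa_mel_fn"}

def positional_calls(source, watched=WATCHED, max_positional=1):
    """Return (lineno, name, n_positional) for watched calls that pass
    more than max_positional positional arguments."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `module.func(...)` and bare `func(...)` calls.
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
        if name in watched and len(node.args) > max_positional:
            hits.append((node.lineno, name, len(node.args)))
    return sorted(hits)

demo = (
    "samples = librosa.core.resample(samples, sample_rate, target_sr)\n"
    "samples = librosa.core.resample(samples, orig_sr=sample_rate, target_sr=target_sr)\n"
    "win_sq = librosa_util.pad_center(win_sq, n_fft)\n"
)
hits = positional_calls(demo)
```

On the demo string, lines 1 and 3 are reported and the keyword-form call on line 2 is not. The source is only parsed, never executed, so it can be run over files without librosa installed.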

Environment
  • Driver Version: 535.129.03
  • NVIDIA GeForce RTX 3080
  • GitHub repository cloned inside the docker image nvidia/cuda:12.1.0-devel-ubuntu22.04

xvdp added the bug label Jan 16, 2024
xvdp added a commit to xvdp/DeepLearningExamples that referenced this issue Jan 16, 2024
… latest librosa.

	modified:   CUDA-Optimized/FastSpeech/fastspeech/dataset/ljspeech_dataset.py
	modified:   CUDA-Optimized/FastSpeech/generate.py
	modified:   CUDA-Optimized/FastSpeech/tacotron2/audio_processing.py
	modified:   CUDA-Optimized/FastSpeech/tacotron2/layers.py
	modified:   Kaldi/SpeechRecognition/notebooks/Kaldi_TRTIS_inference_offline_demo.ipynb
	modified:   Kaldi/SpeechRecognition/notebooks/Kaldi_TRTIS_inference_online_demo.ipynb
	modified:   PyTorch/SpeechRecognition/Jasper/requirements.txt
	modified:   PyTorch/SpeechRecognition/QuartzNet/requirements.txt
	modified:   PyTorch/SpeechRecognition/wav2vec2/requirements.txt
	modified:   PyTorch/SpeechSynthesis/FastPitch/hifigan/data_function.py
	modified:   PyTorch/SpeechSynthesis/FastPitch/requirements.txt
	modified:   PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt
	modified:   PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client/speech_ai_demo/utils/jasper/speech_utils.py
	modified:   PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt