Inconsistent librosa versions PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech #1369

xvdp · 2024-01-16T16:25:16Z

PyTorch/SpeechSynthesis/All and CUDA-Optimized/FastSpeech

librosa is used throughout the audio projects, although only for a few functions. The requirements files refer to different versions, and not all of the call syntax is consistent with the versions 'required'.

The main change in librosa after 0.7 (enforced as an error from 0.10) is that many of the functions require keyword arguments; typically the only positional argument allowed is the data.
E.g. librosa.core.resample(y: 'np.ndarray', *, orig_sr: 'float', target_sr: 'float', ...), etc.
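
To illustrate what the bare `*` in that signature does, here is a minimal sketch using a stand-in function. The stand-in copies only the argument layout quoted above; its body is made up for illustration and is not librosa's implementation.

```python
def resample(y, *, orig_sr, target_sr):
    """Stand-in with the keyword-only layout quoted above (not librosa's code)."""
    return {"n_samples": len(y), "orig_sr": orig_sr, "target_sr": target_sr}


samples = [0.0, 0.1, 0.2]

# New style: the data is positional, everything after `*` is passed by keyword.
result = resample(samples, orig_sr=44100, target_sr=16000)

# Old style: extra positional arguments are rejected with a TypeError.
try:
    resample(samples, 44100, 16000)
    old_style_error = None
except TypeError as exc:
    old_style_error = str(exc)

print(result)
print(old_style_error)
```

Any call site that still passes the sample rates positionally fails the same way once the keyword-only signature is in effect.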

The PyTorch/SpeechSynthesis/ project requirements ask for:

PyTorch/SpeechSynthesis/Tacotron2/requirements.txt: librosa (no version pinned)
PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt: librosa==0.7.0
PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt: librosa==0.9.0
PyTorch/SpeechSynthesis/FastPitch/requirements.txt: librosa==0.9.0

For consistency, they should all require the same version. All but one of the functions listed below (the one marked with *) can run on librosa 0.10.
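
A quick way to see the mismatch is to group the requirement lines by their librosa specifier. The sketch below hard-codes the four lines listed in this issue instead of reading the files. The helper names are made up for illustration, and the regex only handles a bare `librosa` or a `librosa==X` pin.

```python
import re

# Requirement lines as listed in this issue (not read from disk).
REQUIREMENTS = {
    "PyTorch/SpeechSynthesis/Tacotron2/requirements.txt": "librosa",
    "PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt": "librosa==0.7.0",
    "PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt": "librosa==0.9.0",
    "PyTorch/SpeechSynthesis/FastPitch/requirements.txt": "librosa==0.9.0",
}


def librosa_spec(line):
    """Return '==X', 'unpinned', or None if the line is not a librosa requirement."""
    match = re.match(r"\s*librosa\s*(==\s*[\w.]+)?\s*$", line, re.IGNORECASE)
    if match is None:
        return None  # unrelated line, or a specifier this sketch does not handle
    return (match.group(1) or "unpinned").replace(" ", "")


def group_by_spec(requirements):
    """Map each librosa specifier to the requirement files that use it."""
    groups = {}
    for path, line in requirements.items():
        spec = librosa_spec(line)
        if spec is not None:
            groups.setdefault(spec, []).append(path)
    return groups


groups = group_by_spec(REQUIREMENTS)
consistent = len(groups) == 1
for spec, paths in sorted(groups.items()):
    print(spec, len(paths))
```

With the lines above this yields three distinct specifiers, so `consistent` is False.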

In the projects requiring the newer librosa, some files still use the old positional syntax.

PyTorch/SpeechSynthesis/FastPitch/hifigan/data_function.py, line 72: librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client/speech_ai_demo/utils/jasper/speech_utils.py, lines 386 & 389: samples = librosa.core.resample(samples, sample_rate, target_sr) and librosa.effects.trim(samples, trim_db)

* CUDA-Optimized/FastSpeech/generate.py uses the deprecated librosa.output.write_wav(path, wav, hp.sr); see librosa/librosa#1062.

CUDA-Optimized/FastSpeech/tacotron2/audio_processing.py, line 82: win_sq = librosa_util.pad_center(win_sq, n_fft)
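
For reference, the calls quoted above could be rewritten with keyword arguments roughly as follows. This is a sketch, not the repository's code: it assumes librosa >= 0.10 and soundfile are installed, the wrapper function names are made up for illustration, and the imports are done lazily inside each wrapper so the file loads even where those packages are missing.

```python
def mel_basis(sampling_rate, n_fft, num_mels, fmin, fmax):
    """Keyword form of librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)."""
    import librosa  # assumption: librosa >= 0.10

    return librosa.filters.mel(
        sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax
    )


def resample_and_trim(samples, sample_rate, target_sr, trim_db):
    """Keyword form of the resample and trim calls; only the data stays positional."""
    import librosa

    samples = librosa.resample(samples, orig_sr=sample_rate, target_sr=target_sr)
    trimmed, _ = librosa.effects.trim(samples, top_db=trim_db)  # returns (audio, index)
    return trimmed


def pad_window(win_sq, n_fft):
    """Keyword form of librosa_util.pad_center(win_sq, n_fft)."""
    import librosa

    return librosa.util.pad_center(win_sq, size=n_fft)


def write_wav(path, wav, sr):
    """Stand-in for the removed librosa.output.write_wav, using soundfile instead."""
    import soundfile as sf  # assumption: soundfile is installed

    sf.write(path, wav, sr)
```

In the actual files the wrappers are unnecessary; changing the argument lists at the call sites in the same way should be enough.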

Several of those calls will fail with the newer librosa. It is simple enough to clean up the code.
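
To find the remaining call sites, a small scan with the standard-library `ast` module can flag calls that pass more than one positional argument. This is only a heuristic sketch: the watch list is taken from the calls quoted in this issue, the one-positional-argument threshold follows the "only the data is positional" rule of thumb above, and names are matched without checking that they really come from librosa.

```python
import ast

# Function names taken from the calls quoted in this issue (hypothetical watch list).
WATCHED = {"resample", "trim", "pad_center", "librosa_mel_fn", "mel"}


def find_positional_calls(source, max_positional=1):
    """Return sorted (lineno, name, n_positional) for watched calls with too many positional args."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Attribute):
            name = func.attr  # e.g. librosa.core.resample -> "resample"
        elif isinstance(func, ast.Name):
            name = func.id  # e.g. librosa_mel_fn
        else:
            continue
        if name in WATCHED and len(node.args) > max_positional:
            hits.append((node.lineno, name, len(node.args)))
    return sorted(hits)


example = (
    "samples = librosa.core.resample(samples, sample_rate, target_sr)\n"
    "trimmed = librosa.effects.trim(samples, top_db=trim_db)\n"
    "basis = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)\n"
)
hits = find_positional_calls(example)
for lineno, name, count in hits:
    print(lineno, name, count)
```

On the example string this reports the `resample` and `librosa_mel_fn` calls and skips the `trim` call that already uses a keyword.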

Environment
* Driver Version: 535.129.03
* NVIDIA GeForce RTX 3080

Repository cloned from GitHub into the Docker image nvidia/cuda:12.1.0-devel-ubuntu22.04.


… latest librosa.
modified: CUDA-Optimized/FastSpeech/fastspeech/dataset/ljspeech_dataset.py
modified: CUDA-Optimized/FastSpeech/generate.py
modified: CUDA-Optimized/FastSpeech/tacotron2/audio_processing.py
modified: CUDA-Optimized/FastSpeech/tacotron2/layers.py
modified: Kaldi/SpeechRecognition/notebooks/Kaldi_TRTIS_inference_offline_demo.ipynb
modified: Kaldi/SpeechRecognition/notebooks/Kaldi_TRTIS_inference_online_demo.ipynb
modified: PyTorch/SpeechRecognition/Jasper/requirements.txt
modified: PyTorch/SpeechRecognition/QuartzNet/requirements.txt
modified: PyTorch/SpeechRecognition/wav2vec2/requirements.txt
modified: PyTorch/SpeechSynthesis/FastPitch/hifigan/data_function.py
modified: PyTorch/SpeechSynthesis/FastPitch/requirements.txt
modified: PyTorch/SpeechSynthesis/HiFiGAN/requirements.txt
modified: PyTorch/SpeechSynthesis/Tacotron2/notebooks/conversationalai/client/speech_ai_demo/utils/jasper/speech_utils.py
modified: PyTorch/SpeechSynthesis/Tacotron2/trtis_cpp/src/trt/requirements.txt

xvdp added the bug ("Something isn't working") label Jan 16, 2024

xvdp mentioned this issue Jan 16, 2024

https://github.com/NVIDIA/DeepLearningExamples/issues/1369 Updated De… #1370

Open
