Titanet-L Augmentation #9127

CreativeSelf0 · 2024-05-07T13:03:48Z

Is your feature request related to a problem? Please describe.

I want to train TitaNet with augmentation.
However, I do not know how to prepare the rir_noise_manifest.json file for training TitaNet-L with RIR noise augmentation.

Describe the solution you'd like

Provide a code snippet on how to add augmentation for TitaNet-L training.

CreativeSelf0 · 2024-05-07T14:03:15Z

I added the following to the augmentor, using this snippet from the online augmentation tutorial:

rir_data_path = f'{data_dir}/dataset'
!python {NEMO_ROOT}/scripts/dataset_processing/get_openslr_rir_data.py --data_root {rir_data_path}
rir_manifest_path = os.path.join(rir_data_path, 'processed', 'rir.json')
!head -n 3 {rir_manifest_path}
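The `head` call above only prints raw lines. A minimal Python sketch for sanity-checking the manifest before training could look like the following. It assumes the file follows the usual NeMo convention of one JSON object per line with an `audio_filepath` field and an optional `duration` field; the helper names `load_manifest` and `summarize` are hypothetical and not part of NeMo.

```python
import json


def load_manifest(path):
    """Read a JSON-lines manifest: one JSON object per non-empty line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                entries.append(json.loads(line))
    return entries


def summarize(entries):
    """Return (entry count, total duration in seconds, entries lacking audio_filepath)."""
    # Assumption: 'duration' is in seconds; entries without it count as 0.
    missing = [e for e in entries if "audio_filepath" not in e]
    total = sum(float(e.get("duration", 0.0)) for e in entries)
    return len(entries), total, missing
```

Calling `summarize(load_manifest(rir_manifest_path))` gives a quick check that the file parses and that every entry points at an audio file.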

Then, to use the augmentation, I applied the following:

audio_augmentations = dict(
    speed = dict(
        sr=16000,
        prob=0.3,
        resample_type='kaiser_fast',
        min_speed_rate=0.95,
        max_speed_rate=1.05,
    ),
    noise = dict(
        manifest_path=rir_manifest_path,
        prob=0.5,
        min_snr_db=0,
        max_snr_db=15,
    ),
)
finetune_config.model.train_ds.augmentor = audio_augmentations

Is this correct? Thanks, @okuchaiev.

nithinraok · 2024-05-08T17:54:31Z

Yes, the code looks fine to me. But for impulse responses you should use impulse perturbation, not noise perturbation.
A sample can be found here:

NeMo/examples/speaker_tasks/recognition/conf/titanet-small.yaml

Line 14 in 6442bb6

augmentor:
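As a rough sketch of that suggestion, the augmentor dict from the earlier comment could carry a separate `impulse` entry next to `noise`. This assumes the `impulse` key selects NeMo's impulse perturbation and that it accepts `manifest_path` and `prob`; check this against the titanet-small config and the installed NeMo version. The function name `build_augmentor`, both manifest arguments, and the `prob=0.5` value for `impulse` are illustrative and not taken from the thread.

```python
def build_augmentor(noise_manifest, impulse_manifest, sample_rate=16000):
    """Build an augmentor config with separate noise and impulse sections."""
    return {
        # Speed and noise settings copied from the snippet earlier in the thread.
        "speed": {
            "sr": sample_rate,
            "prob": 0.3,
            "resample_type": "kaiser_fast",
            "min_speed_rate": 0.95,
            "max_speed_rate": 1.05,
        },
        "noise": {
            "manifest_path": noise_manifest,
            "prob": 0.5,
            "min_snr_db": 0,
            "max_snr_db": 15,
        },
        # Assumption: 'impulse' convolves the audio with room impulse
        # responses listed in its own manifest; prob is illustrative.
        "impulse": {
            "manifest_path": impulse_manifest,
            "prob": 0.5,
        },
    }
```

The result can be assigned the same way as before, for example `finetune_config.model.train_ds.augmentor = build_augmentor(noise_manifest_path, rir_manifest_path)`, where `noise_manifest_path` is a hypothetical manifest of additive noise files.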

CreativeSelf0 · 2024-05-08T18:01:58Z

@nithinraok That's what I thought. However, the TitaNet-Large config uses noise instead of impulse, although it is stated that impulse perturbation is used. Does that mean the training mistakenly used the RIR corpora for noise perturbation instead of impulse perturbation?

NeMo/examples/speaker_tasks/recognition/conf/titanet-large.yaml

Lines 14 to 26 in 6442bb6

augmentor:
  noise:
    manifest_path: null
    prob: 0.5
    min_snr_db: 0
    max_snr_db: 15
  speed:
    prob: 0.3
    sr: *sample_rate
    resample_type: 'kaiser_fast'
    min_speed_rate: 0.95
    max_speed_rate: 1.05

The paper statement:

(just realized you are the first author x.x)
Thank you @nithinraok

nithinraok · 2024-05-08T18:24:36Z

I don't remember the details exactly, but as far as I remember, the RIR corpora also contain noise samples along with impulse responses. I have not added an impulse section to this config file, but one was added to the titanet-small config.
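If the corpus mixes noise recordings and impulse responses, one option is to split a combined manifest into two files so that each perturbation gets the right kind of audio. The sketch below is only an illustration: it assumes a JSON-lines manifest with an `audio_filepath` field and assumes noise files can be recognized by a substring of their path. The default marker `pointsource_noises` is a guess at the directory naming, so inspect the actual manifest before relying on it. `split_manifest` is a hypothetical helper, not a NeMo function.

```python
import json


def split_manifest(src, noise_out, impulse_out, noise_markers=("pointsource_noises",)):
    """Split a JSON-lines manifest into noise and impulse manifests.

    An entry goes to the noise manifest if its audio_filepath contains any
    marker; every other entry is treated as an impulse response.
    Returns (number of noise entries, number of impulse entries).
    """
    n_noise = n_impulse = 0
    with open(src, encoding="utf-8") as fin, \
            open(noise_out, "w", encoding="utf-8") as f_noise, \
            open(impulse_out, "w", encoding="utf-8") as f_impulse:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            path = entry["audio_filepath"]
            if any(marker in path for marker in noise_markers):
                f_noise.write(json.dumps(entry) + "\n")
                n_noise += 1
            else:
                f_impulse.write(json.dumps(entry) + "\n")
                n_impulse += 1
    return n_noise, n_impulse
```

The two output files can then be used as the `manifest_path` of the noise and impulse sections respectively.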

CreativeSelf0 assigned okuchaiev May 7, 2024
