
Lhotse support for espnet-ez by class inheritance #5772

Draft · wants to merge 3 commits into base: master
Conversation

popcornell (Contributor)
@Masao-Someki this is what I have in mind based on our discussion.

Is this fine? I will internally map the dataset to the structure of an HF dataset.
What about multi-channel audio support, however? Will we lose it this way?

Also, lhotse supports various augmentations that we would lose this way.
It is the simplest approach, but maybe not as flexible, IMO. Do you have other suggestions?
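To make the idea concrete, here is a minimal sketch of what "mapping the dataset to the structure of an HF dataset" could look like. The class name `LhotseDataset` and the returned field names (`"audio"`, `"text"`) are assumptions for illustration; in practice the cuts would come from a `lhotse.CutSet`, whose cuts expose `load_audio()` and a `supervisions` list.

```python
class LhotseDataset:
    """Map-style dataset over a sequence of Lhotse-style cuts.

    Each item is returned as a dict, mirroring the row structure of a
    HuggingFace dataset: {"audio": ..., "text": ...}. Field names are
    illustrative, not a fixed espnet-ez contract.
    """

    def __init__(self, cuts):
        # `cuts` would normally be a lhotse.CutSet; any indexable
        # sequence of cut-like objects works for this sketch.
        self.cuts = list(cuts)

    def __len__(self):
        return len(self.cuts)

    def __getitem__(self, idx):
        cut = self.cuts[idx]
        return {
            "audio": cut.load_audio(),          # waveform array
            "text": cut.supervisions[0].text,   # first supervision's transcript
        }
```

Note that multi-channel cuts would need a decision here: `load_audio()` on a multi-channel recording returns all channels, so the adapter either passes them through or selects one, which is exactly the support question raised above.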

@Masao-Someki (Contributor)
Thank you @popcornell, I think it is fine to just use lhotse as a dataset. As for the augmentations, you are right: we cannot take advantage of them this way.

I think we might be able to use the following code, which I've added to support a custom dataloader, to fully utilize the lhotse library:

espnet/espnetez/task.py

Lines 105 to 115 in 543f488

def build_iter_factory(
    cls,
    args: argparse.Namespace,
    distributed_option: DistributedOption,
    mode: str,
    kwargs: dict = None,
) -> AbsIterFactory:
    if mode == "train" and cls.train_dataloader is not None:
        return cls.train_dataloader
    elif mode == "valid" and cls.valid_dataloader is not None:
        return cls.valid_dataloader

We can utilize this feature by passing the dataloaders when constructing the Trainer class:

trainer = ez.Trainer(
        task=args.task,
        train_config=finetune_config,
        train_dataloader=<custom dataloader>,
        valid_dataloader=<custom dataloader>,
        data_info=data_info,
        output_dir=exp_dir,
        stats_dir=stats_dir,
        ngpu=0,
    )

A problem is that the given dataloader will also be used in the collect_stats stage. This is not preferable, but I don't have a good solution right now. Do you have any ideas on this point?

@popcornell (Contributor, Author) commented May 3, 2024
You are right.

It would be neat if we could use lhotse dataloaders, as this would also allow using sharded datasets for large-scale training.

Maybe it is fine to feed the augmented data to collect_stats?
After all, you want to normalize by the mean and variance of the training set.
I think current ESPnet, for example, at least feeds data that has been speed-perturbed, no?

I see that the Trainer takes a data_info argument, however.
What is it for?

@Masao-Someki (Contributor)
@popcornell

I think current ESPnet, for example, at least feeds data that has been speed-perturbed, no?

You are right, I overlooked this point! Then I think just passing the dataloader as above is enough!

I see that the Trainer takes a data_info argument, however.
What is it for?

My apologies for the confusion; it is not necessary!
