
Lhotse support for espnet-ez by class inheritance #5772

Draft · wants to merge 3 commits into base: master
Conversation

popcornell (Contributor)
@Masao-Someki this is what I have in mind based on our discussion.

Is this fine? I will internally map the dataset to the structure of an HF dataset.
What about multi-channel audio support, however? Will we lose it this way?

Also, lhotse supports various augmentations that we would lose this way.
It is the simplest approach, but maybe not as flexible, IMO. Do you have other suggestions?
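To make the idea concrete, here is a minimal sketch of what "mapping the dataset to the structure of an HF dataset" could look like. The class name `LhotseDataset` and the returned field names (`"audio"`, `"text"`) are assumptions for illustration; in practice the cuts would come from a `lhotse.CutSet`, whose cuts expose `load_audio()` and a `supervisions` list.

```python
class LhotseDataset:
    """Map-style dataset over a sequence of Lhotse-style cuts.

    Each item is returned as a dict, mirroring the row structure of a
    HuggingFace dataset: {"audio": ..., "text": ...}. Field names are
    illustrative, not a fixed espnet-ez contract.
    """

    def __init__(self, cuts):
        # `cuts` would normally be a lhotse.CutSet; any indexable
        # sequence of cut-like objects works for this sketch.
        self.cuts = list(cuts)

    def __len__(self):
        return len(self.cuts)

    def __getitem__(self, idx):
        cut = self.cuts[idx]
        return {
            "audio": cut.load_audio(),          # waveform array
            "text": cut.supervisions[0].text,   # first supervision's transcript
        }
```

Note that multi-channel cuts would need a decision here: `load_audio()` on a multi-channel recording returns all channels, so the adapter either passes them through or selects one, which is exactly the support question raised above.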

@Masao-Someki (Contributor)
Thank you @popcornell, I think it is fine to just use lhotse as a dataset. As for the augmentations, you are right: we cannot take advantage of them this way.

I think we might be able to use the following code, which I've added to support a custom dataloader, to fully utilize the lhotse library:

espnet/espnetez/task.py

Lines 105 to 115 in 543f488

def build_iter_factory(
    cls,
    args: argparse.Namespace,
    distributed_option: DistributedOption,
    mode: str,
    kwargs: dict = None,
) -> AbsIterFactory:
    if mode == "train" and cls.train_dataloader is not None:
        return cls.train_dataloader
    elif mode == "valid" and cls.valid_dataloader is not None:
        return cls.valid_dataloader

We can utilize this feature by passing the dataloaders when constructing the Trainer class:

trainer = ez.Trainer(
        task=args.task,
        train_config=finetune_config,
        train_dataloader=<custom dataloader>,
        valid_dataloader=<custom dataloader>,
        data_info=data_info,
        output_dir=exp_dir,
        stats_dir=stats_dir,
        ngpu=0,
    )

A problem is that the given dataloader will also be used in the collect_stats stage. This is not preferable, but I don't have a good solution right now. Do you have any ideas on this point?

@popcornell (Contributor, Author) commented May 3, 2024
You are right.

It would be neat if we could use lhotse dataloaders, as this would also allow using sharded datasets for large-scale training.

Maybe it is fine to feed the augmented data to collect_stats?
After all, you want to normalize by the mean and variance of the training set.
I think current ESPnet, for example, at least feeds data that has been speed-perturbed, no?

I see that the Trainer takes a data_info argument, however.
What is it for?

@Masao-Someki (Contributor)
@popcornell

I think current ESPnet, for example, at least feeds data that has been speed-perturbed, no?

You are right, I overlooked this point! Then I think just passing the dataloader as above is enough!

I see that the Trainer takes a data_info argument, however.
What is it for?

My apologies for the confusion; it is not necessary!
