What is the training config? #6

mkw18 · 2022-12-27T08:35:05Z

Hello, thanks for your work! I am trying to reproduce this work myself, but I can't reach the high performance with xP3 and mT0-xxl shown in the paper Crosslingual Generalization through Multitask Finetuning. Could you share the training details: how many steps did you train the model for, and what was your lr-decay-ratio? Could I get the config file to reproduce your results? Thank you very much!

Muennighoff · 2022-12-27T19:24:43Z

I've reached out to @adarob, who knows those details & has the config files - Will let you know if we can release them!

adarob · 2022-12-27T20:01:49Z

It's just the default T5X finetune configuration (https://github.com/google-research/t5x/blob/main/t5x/configs/runs/finetune.gin) with the following overrides:

BATCH_SIZE = 512
LOSS_NORMALIZING_FACTOR = 'AVERAGE_PER_SEQUENCE'
TASK_FEATURE_LENGTHS = {'inputs': 1024, 'targets': 1024}
train/utils.DatasetConfig.pack = False

Of course, you'd also need to set up the mixture if you're using SeqIO.

We only trained for 30k steps and picked the best checkpoint, which I believe was around 7k.
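
For anyone assembling this into a single file, here is a rough sketch of how the overrides above might sit in a gin file. Only the four override lines and the finetune.gin include come from this thread. The model include, the mixture name, the paths, and the TRAIN_STEPS value are assumptions for illustration: TRAIN_STEPS assumes that T5X counts steps from the pretrained checkpoint and that the checkpoint sits at step 1,000,000, so check both against your own checkpoint.

```gin
# Hypothetical finetune_mt0.gin: a sketch, not the authors' actual config file.
from __gin__ import dynamic_registration
from t5x import utils

# Assumption: model definition for mT5-XXL; swap in whatever model gin you use.
include 't5x/examples/t5/mt5/xxl.gin'
include 't5x/configs/runs/finetune.gin'

# Overrides quoted in this thread.
BATCH_SIZE = 512
LOSS_NORMALIZING_FACTOR = 'AVERAGE_PER_SEQUENCE'
TASK_FEATURE_LENGTHS = {'inputs': 1024, 'targets': 1024}
train/utils.DatasetConfig.pack = False

# Everything below is a placeholder, not from the thread.
MIXTURE_OR_TASK_NAME = 'my_xp3_mixture'               # hypothetical SeqIO mixture name
INITIAL_CHECKPOINT_PATH = '/path/to/mt5_xxl/checkpoint'  # placeholder path
MODEL_DIR = '/path/to/output_dir'                     # placeholder path
# Assumption: 1,000,000 pretraining steps + 30k finetuning steps.
TRAIN_STEPS = 1030000
```

Since the best checkpoint was reported at around 7k steps, it is probably worth saving and evaluating checkpoints well before the 30k mark rather than only at the end.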

mkw18 · 2022-12-28T07:59:29Z

Thanks a lot! I will try this.

Muennighoff mentioned this issue May 16, 2023

mT0-xxl finetuning #19

Open
