
Reproducing Experiment Results for Data Augmentation with TriviaQA #1393

Open
gsmoon97 opened this issue Oct 6, 2023 · 0 comments

gsmoon97 commented Oct 6, 2023

Hi,

I am in the process of reproducing the experiment results presented in the BERT paper. More specifically, I tried to improve the accuracy of the BERT-Large model on the SQuAD v1.1 dataset by first fine-tuning on TriviaQA and then fine-tuning on SQuAD. Unfortunately, I was unable to reproduce the results presented in the paper; instead, I saw a decline in accuracy after fine-tuning on TriviaQA, as shown below.

| Model | Exact Match | F1 |
| --- | --- | --- |
| BERT-Large (SQuAD v1.1 (2 epochs)) | 84.06 | 90.84 |
| BERT-Large (TriviaQA wiki (1 epoch) + SQuAD v1.1 (2 epochs)) | 83.53 | 90.35 |
| BERT-Large (TriviaQA web (1 epoch) + SQuAD v1.1 (2 epochs)) | 83.30 | 90.36 |

For reference, I used SQuAD v1.1 together with the Wikipedia and Web subsets of TriviaQA (one subset per run). The training hyperparameters are listed below.

- Batch size: 12
- Learning rate: 3e-5
- Number of training epochs: 2
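
To make the setup concrete, below is a minimal sketch of how I run the two fine-tuning stages, assuming the repo's `run_squad.py` and a TriviaQA file already converted to SQuAD JSON format; all paths, file names, and the checkpoint step number are placeholders, and I use 1 epoch for the TriviaQA stage and 2 for the SQuAD stage, as in the table above:

```python
# Sketch of the sequential fine-tuning setup (placeholder paths throughout).
import subprocess

BERT_DIR = "uncased_L-24_H-1024_A-16"  # pre-trained BERT-Large directory

def run_squad(init_checkpoint, train_file, epochs, output_dir):
    """Invoke run_squad.py with the hyperparameters listed above."""
    subprocess.run([
        "python", "run_squad.py",
        f"--vocab_file={BERT_DIR}/vocab.txt",
        f"--bert_config_file={BERT_DIR}/bert_config.json",
        f"--init_checkpoint={init_checkpoint}",
        "--do_train=True",
        f"--train_file={train_file}",
        "--train_batch_size=12",
        "--learning_rate=3e-5",
        f"--num_train_epochs={epochs}",
        "--max_seq_length=384",
        "--doc_stride=128",
        f"--output_dir={output_dir}",
    ], check=True)

# Stage 1: fine-tune on TriviaQA (wiki or web subset) for 1 epoch,
# starting from the original pre-trained BERT-Large weights.
run_squad(f"{BERT_DIR}/bert_model.ckpt",
          "triviaqa_wiki_squad_format.json",  # TriviaQA converted to SQuAD JSON
          1, "out_triviaqa")

# Stage 2: fine-tune on SQuAD v1.1 for 2 epochs, warm-started from the
# checkpoint written by stage 1 ("XXXX" stands for the final step number).
run_squad("out_triviaqa/model.ckpt-XXXX",
          "train-v1.1.json", 2, "out_squad")
```

The key point is that stage 2 is warm-started from the checkpoint written by stage 1 rather than from the original pre-trained weights.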

Could you help me check whether the above method is correct, and provide some guidance on how I can reproduce the results presented in the BERT paper?

Thank you for the great work; I would appreciate any help.
