
Reproducing Experiment Results for Data Augmentation with TriviaQA #1393

Open
gsmoon97 opened this issue Oct 6, 2023 · 0 comments

gsmoon97 commented Oct 6, 2023

Hi,

I am in the process of reproducing the experiment results presented in the BERT paper. More specifically, I tried to improve the accuracy of the BERT-Large model on the SQuAD v1.1 dataset by first fine-tuning on TriviaQA and then fine-tuning on SQuAD. Unfortunately, I was unable to reproduce the results presented in the paper; instead, I saw a decline in accuracy after fine-tuning on TriviaQA, as shown below.

| Model | Exact Match | F1 |
| --- | --- | --- |
| BERT-Large (SQuAD v1.1 (2 epochs)) | 84.06 | 90.84 |
| BERT-Large (TriviaQA wiki (1 epoch) + SQuAD v1.1 (2 epochs)) | 83.53 | 90.35 |
| BERT-Large (TriviaQA web (1 epoch) + SQuAD v1.1 (2 epochs)) | 83.30 | 90.36 |

For reference, I used SQuAD v1.1 together with the Wikipedia and Web subsets of TriviaQA (one subset per run). The training hyperparameters are listed below.

- Batch size: 12
- Learning rate: 3e-5
- Number of training epochs: 2
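
To make the setup concrete, below is a minimal sketch of how I run the two fine-tuning stages, assuming the repo's `run_squad.py` and a TriviaQA file already converted to SQuAD JSON format; all paths, file names, and the checkpoint step number are placeholders, and I use 1 epoch for the TriviaQA stage and 2 for the SQuAD stage, as in the table above:

```python
# Sketch of the sequential fine-tuning setup (placeholder paths throughout).
import subprocess

BERT_DIR = "uncased_L-24_H-1024_A-16"  # pre-trained BERT-Large directory

def run_squad(init_checkpoint, train_file, epochs, output_dir):
    """Invoke run_squad.py with the hyperparameters listed above."""
    subprocess.run([
        "python", "run_squad.py",
        f"--vocab_file={BERT_DIR}/vocab.txt",
        f"--bert_config_file={BERT_DIR}/bert_config.json",
        f"--init_checkpoint={init_checkpoint}",
        "--do_train=True",
        f"--train_file={train_file}",
        "--train_batch_size=12",
        "--learning_rate=3e-5",
        f"--num_train_epochs={epochs}",
        "--max_seq_length=384",
        "--doc_stride=128",
        f"--output_dir={output_dir}",
    ], check=True)

# Stage 1: fine-tune on TriviaQA (wiki or web subset) for 1 epoch,
# starting from the original pre-trained BERT-Large weights.
run_squad(f"{BERT_DIR}/bert_model.ckpt",
          "triviaqa_wiki_squad_format.json",  # TriviaQA converted to SQuAD JSON
          1, "out_triviaqa")

# Stage 2: fine-tune on SQuAD v1.1 for 2 epochs, warm-started from the
# checkpoint written by stage 1 ("XXXX" stands for the final step number).
run_squad("out_triviaqa/model.ckpt-XXXX",
          "train-v1.1.json", 2, "out_squad")
```

The key point is that stage 2 is warm-started from the checkpoint written by stage 1 rather than from the original pre-trained weights.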

Could you help me check whether the above method is correct, and provide some guidance on how I can reproduce the results presented in the BERT paper?

Thank you for the great work; I would appreciate any help.
