
Replication of Instructor #42

Open
aamir-s18 opened this issue May 28, 2023 · 15 comments

@aamir-s18

Hey, we are currently trying to replicate the INSTRUCTOR model. Issue #14 already asks about this, but could you please report the exact training setup for the released models?

I am also interested in the training loss. I did not reproduce your reported results by training for 100k steps, and it is unclear to me how only 40k steps were used when the paper says the model was trained on the MEDI dataset.

I would appreciate your help here :)

@hongjin-su
Collaborator

Hi, thanks a lot for your interest in the INSTRUCTOR model!

As the MEDI dataset contains a large volume of data, there is no need to train on all of it. In fact, since some sources in MEDI contain similar data, there may be an overfitting problem if training goes up to 100k steps.

For your reference, we use the following command in the training:

```bash
python train.py \
  --model_name_or_path sentence-transformers/gtr-t5-large \
  --output_dir {output_directory} \
  --cache_dir {cache_directory} \
  --max_source_length 512 \
  --num_train_epochs 10 \
  --save_steps 500 \
  --cl_temperature 0.01 \
  --warmup_ratio 0.1 \
  --learning_rate 2e-5 \
  --overwrite_output_dir
```

Feel free to add any further questions or comments!

@aamir-s18
Author

Hey,

But for your published model, what data exactly did you train it on?

Also, the loss and batch size are missing from your report. If you say 40k steps, for example, the number of samples seen differs a lot depending on the batch size. It would be great if you could report the exact training setup so that your work can be replicated and verified.

Thanks!

@hongjin-su
Collaborator

Hi, we train the model on the MEDI data, which you can download from https://drive.google.com/file/d/1vZ5c2oJNonGOvXzppNg5mHz24O6jcc52/view?usp=sharing. In our setting, we use a batch size of only 4.
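For anyone else replicating: a minimal sketch of inspecting the downloaded MEDI data, assuming the archive unpacks to a single JSON list of training examples (the file name medi-data.json and the list layout are assumptions about the release, not confirmed in this thread):

```python
import json

# Assumed file name after unpacking the Google Drive archive.
with open("medi-data.json") as f:
    medi = json.load(f)  # assumed: a JSON list of training examples

print(len(medi))       # total number of training examples
print(medi[0].keys())  # inspect the schema of one example
```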

@aamir-s18
Author

Hey,

Could you please report the loss as well? And does this mean you train on only 4 × 40k samples of the MEDI dataset, for a single epoch?

@hongjin-su
Collaborator

Hi,

  1. The training loss is in general between 0.4 and 0.5 for all three models.
  2. Yes. The MEDI data contains abundant sources, and some of them may be similar, so there is no need to train the model on all of the data.
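For concreteness, 40k steps at batch size 4 works out to 40,000 × 4 = 160,000 examples, i.e. a single partial pass over MEDI. Below is a minimal sketch of the temperature-scaled contrastive objective suggested by the --cl_temperature 0.01 flag above, assuming cosine similarity with in-batch negatives; this is a common InfoNCE-style formulation, not the authors' exact training code:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(query_emb: torch.Tensor,
                     pos_emb: torch.Tensor,
                     temperature: float = 0.01) -> torch.Tensor:
    """InfoNCE-style loss: each query's matching positive sits on the
    diagonal of the similarity matrix; the other in-batch positives
    act as negatives."""
    q = F.normalize(query_emb, dim=-1)  # unit-norm, so q @ p.T is cosine sim
    p = F.normalize(pos_emb, dim=-1)
    logits = q @ p.T / temperature      # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# With batch size 4: loss over 4 queries, each with 3 in-batch negatives.
loss = contrastive_loss(torch.randn(4, 768), torch.randn(4, 768))
```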

@yangjianxin1

A batch size of 4 is very small for contrastive learning; maybe it should be larger, such as 32 or 64?

@hongjin-su
Collaborator

Yes, the model would probably be better with a larger training batch size. However, due to the limits of our machine, we leave further scaling to future work!
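A common single-machine workaround is gradient accumulation (a standard Hugging Face TrainingArguments flag; whether train.py forwards it is an assumption here), but note that it only enlarges the averaged gradient, not the pool of in-batch negatives each step sees:

```bash
# Hypothetical: effective batch 32 for gradient averaging, but each step
# still only contrasts against the 4 examples in its own forward pass.
python train.py ... --per_device_train_batch_size 4 --gradient_accumulation_steps 8
```

Truly larger negative pools require a bigger per-device batch or techniques such as gradient caching.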

@iavinasoss

iavinasoss commented Jun 28, 2023

> Hi, we train the model on the MEDI data, which you can download from https://drive.google.com/file/d/1vZ5c2oJNonGOvXzppNg5mHz24O6jcc52/view?usp=sharing. In our setting, we use a batch size of only 4.

Hey, I had a small question: where can we change the batch size? I can't find an argument for it.

Thanks

@hongjin-su
Collaborator

Hi, you may change the batch size via the argument `per_device_train_batch_size`.
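Putting this together with the command quoted earlier, the batch size of 4 used above would be set like this (per_device_train_batch_size is a standard Hugging Face TrainingArguments flag):

```bash
python train.py \
  --model_name_or_path sentence-transformers/gtr-t5-large \
  --output_dir {output_directory} \
  --cache_dir {cache_directory} \
  --max_source_length 512 \
  --num_train_epochs 10 \
  --save_steps 500 \
  --cl_temperature 0.01 \
  --warmup_ratio 0.1 \
  --learning_rate 2e-5 \
  --per_device_train_batch_size 4 \
  --overwrite_output_dir
```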

@iavinasoss

Got it, thank you for the help.

@YihanWang617

YihanWang617 commented Jul 19, 2023

Hi, I am also trying to replicate your work. May I know how many GPUs you used in training?

@hongjin-su
Collaborator

Hi, we use only a single GPU in the training.
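(So the global batch size is per_device_train_batch_size × number of GPUs = 4 × 1 = 4, matching the batch size quoted earlier in the thread.)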

@EliverQ

EliverQ commented Jul 24, 2023

> Hey, we are currently trying to replicate the INSTRUCTOR model. Issue #14 already asks about this, but could you please report the exact training setup for the released models?
>
> I am also interested in the training loss. I did not reproduce your reported results by training for 100k steps, and it is unclear to me how only 40k steps were used when the paper says the model was trained on the MEDI dataset.
>
> I would appreciate your help here :)

Hey! I also ran into issues reproducing the results. Have you successfully replicated INSTRUCTOR's performance? Even with the exact same settings, I couldn't reproduce it. If you have succeeded, could you please give me some advice? Thank you very much.

@aamir-s18
Author

@EliverQ could you reach me via email at aamir.shakir [at] epfl.ch?

@YihanWang617

YihanWang617 commented Aug 27, 2023

Hi, I have the same issue and cannot replicate the results reported in the paper. Could the authors provide the exact training commands for the released checkpoints?
