Replication of Instructor #42
Hi, thanks a lot for your interest in the INSTRUCTOR model! As the MEDI dataset contains a large volume of data, there is no need to complete training on all of it. In fact, as some sources in MEDI may contain similar data, there may be an overfitting problem if training goes up to 100k steps. For your reference, we use the following command in training:
Feel free to add any further questions or comments! |
Hey, but for your published model, what data exactly did you train it on? Also, the loss and batch size are missing from your report. If you say 40k steps, for example, the number of samples seen differs a lot depending on the batch size. It would be great if you could report the exact training setup so others can replicate and verify your work. Thanks! |
Hi, we train the model on the MEDI data, which you can download from https://drive.google.com/file/d/1vZ5c2oJNonGOvXzppNg5mHz24O6jcc52/view?usp=sharing. In our setting, we only use a batch size of 4. |
Hey, could you please report the loss as well? So does this mean you only trained it on 4 × 40k data samples of the MEDI dataset, for a single epoch? |
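For a rough sense of scale, the numbers quoted in this thread (batch size 4, 40k steps) can be multiplied out; this is just a back-of-envelope sketch, and what fraction of an epoch it covers depends on the actual size of the MEDI release:

```python
# Back-of-envelope: total training pairs seen, assuming the values
# quoted in this thread (batch size 4, 40k steps).
batch_size = 4
steps = 40_000
samples_seen = batch_size * steps
print(samples_seen)  # 160000 pairs in total
```

Comparing `samples_seen` against the number of pairs in the MEDI download shows how far from one full epoch this training run is.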
Hi, a batch size of 4 is very small for contrastive learning; shouldn't it be larger, such as 32 or 64? |
Yes, the model would probably be better with a larger training batch size. However, due to the limits of our machine, we leave further scaling to future work! |
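The batch-size concern above comes from the in-batch-negatives formulation common in contrastive training: each query's negatives are simply the other examples in the batch, so a batch of 4 gives only 3 negatives per query. A minimal pure-Python sketch of this loss (not the authors' actual training code; the temperature value is an assumption):

```python
import math

def info_nce_loss(sim, temperature=0.05):
    """Mean in-batch-negatives contrastive loss.

    sim[i][j] is the similarity between query i and document j;
    the diagonal entries are the positive pairs, and the other
    entries in each row serve as negatives.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        m = max(logits)  # stabilise the log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # -log p(positive | query i)
    return total / n

# Batch size 4: each query sees only 3 in-batch negatives.
sim = [[1.0 if i == j else 0.2 for j in range(4)] for i in range(4)]
print(info_nce_loss(sim))
```

With a batch of 64, each query would see 63 in-batch negatives instead of 3, which is why larger batches typically help contrastive objectives.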
Hey, I had a small question: where can we change the batch_size? I can't find any argument for it. Thanks |
Hi, you may change the batch size via the argument |
Got it, Thank you for the help. |
Hi, I am also trying to replicate your work. May I know how many GPUs you use in training? |
Hi, we use only a single GPU in the training. |
Hey! I also encountered issues with reproducing the results. Have you successfully replicated the INSTRUCTOR's performance? Even though I used the exact same settings, I couldn't achieve success. If you have succeeded, could you please give me some advice? Thank you very much. |
@EliverQ could you hit me up through email aamir.shakir [at] epfl.ch |
Hi, I have the same issue and cannot replicate the results reported in the paper. Could the authors provide the exact training commands for the released checkpoints? |
Hey, we are currently trying to replicate the INSTRUCTOR model. Issue #14 already asks this, but please report the exact training setup for the models.
I am also interested in your model's loss. I didn't get your reported results by running the model for 100k steps, and it is not clear to me how only 40k steps suffices when your paper says the model was trained on the MEDI dataset.
I would appreciate your help here :)