fine-tune HKUNLP/instructor-embedding #74

Open
Atlantic8 opened this issue Aug 2, 2023 · 7 comments

Comments

@Atlantic8

Can we fine-tune with train.py, starting from the released model hkunlp/instructor-xl? If yes, could you please show me the shell script for training? Thanks.

@Atlantic8
Author

I only have training data in the format sentence1, sentence2, label, so I cannot construct training data in the format query=xxx, pos=[], neg=[].
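
A minimal sketch of how such pairs could be converted into the query/pos/neg layout, assuming label == 1 marks a positive pair and sampling random negatives from the other rows. The field names simply mirror the format mentioned above; the output file name and any instruction prefixes are placeholders that would need to match whatever train.py actually expects:

```python
import json
import random

# Labeled pairs in the original format: (sentence1, sentence2, label),
# where label == 1 is assumed to mark a positive (matching) pair.
rows = [
    ("how do I reset my password?", "Go to Settings and choose 'Reset password'.", 1),
    ("what is the refund policy?", "Refunds are issued within 14 days of purchase.", 1),
    ("where is the office located?", "The office is at 123 Main Street.", 1),
]

positives = [(s1, s2) for s1, s2, label in rows if label == 1]

examples = []
for s1, s2 in positives:
    # Random negative: any second sentence that does not belong to this pair.
    candidates = [other for _, other in positives if other != s2]
    neg = random.choice(candidates)
    examples.append({"query": s1, "pos": [s2], "neg": [neg]})

# Placeholder output path; the expected file name and schema depend on train.py.
with open("training_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```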

@Atlantic8
Author

Also, when I try to train with train.py using "--fp16 True --gradient_accumulation_steps 3", I run out of GPU memory. I was using an A100 40G. Why does training this model take so much GPU memory? Could you tell me the GPU hardware you used to train this model?

@Atlantic8
Author

btw, this model can be trained only when per_device_train_batch_size is set to 2
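
The settings mentioned so far would roughly correspond to a train.py invocation along these lines. This is only a sketch: apart from --fp16, --gradient_accumulation_steps, and --per_device_train_batch_size, which appear above, the argument names and values are assumptions based on standard HuggingFace Trainer options and should be checked against train.py:

```bash
python train.py \
  --model_name_or_path hkunlp/instructor-xl \
  --output_dir ./instructor-xl-finetuned \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 3 \
  --fp16 True \
  --learning_rate 2e-5 \
  --num_train_epochs 1 \
  --overwrite_output_dir
```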

@ashokrajab
Contributor

> could you tell me the GPU hardware you used to train this model?

@Atlantic8, this is an excerpt from the paper:

> We use the maximum batch size that fits the machine memory and run all our experiments on 40GB A100 GPUs.

@taziksh

taziksh commented Oct 24, 2023

> btw, this model can be trained only when per_device_train_batch_size is set to 2

What's your source for this? @Atlantic8

@hongjin-su
Collaborator

Hi, thanks a lot for your interest in the INSTRUCTOR!

  1. As the INSTRUCTOR model follows the same architecture as GTR models, the same training script should be applicable.
  2. If you have only paired sentences (I assume that they are positive pairs, e.g., question and answer), then using random negatives is probably the easiest way to construct the training data.
  3. For the xl model, the maximum length, gradient accumulation steps and batch size should depend on your machines.

Hope this helps!

@EricPaul03

> Hi, thanks a lot for your interest in the INSTRUCTOR!
>
> 1. As the INSTRUCTOR model follows the same architecture as GTR models, the same training script should be applicable.
> 2. If you have only paired sentences (I assume that they are positive pairs, e.g., question and answer), then using random negatives is probably the easiest way to construct the training data.
> 3. For the xl model, the maximum length, gradient accumulation steps and batch size should depend on your machines.
>
> Hope this helps!

So for custom data, do we need to construct data in the query=xxx, pos=[], neg=[] format (with randomly sampled negatives) before running the training?
