
Training Time of Reranker #782

Open · Impavidity opened this issue May 14, 2024 · 5 comments

Comments

@Impavidity commented May 14, 2024

Thanks for the great work and the open-source models. I am quite interested in the following questions:

  1. What is the total time it took to train the LLM rerankers, such as Gemma and MiniCPM, and on what kind of hardware?
  2. What are the maximum query/passage length and the batch size when training the LLM reranker?

Many thanks!

@545999961 (Collaborator) commented

We trained for 4 days on 8 * 40G A100 GPUs. During training, the total length of query plus passage was 1024, and the batch size was 128.
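For concreteness, the reported setup maps roughly onto standard Hugging Face `TrainingArguments` as sketched below. Only the hardware, sequence length, global batch size, and epoch count come from this thread; the learning rate, precision, and output path are placeholders, not values the maintainers stated.

```python
# Sketch of the reported setup with Hugging Face TrainingArguments.
# Grounded in the reply above: 8 x 40G A100, total query+passage length 1024,
# global batch size 128. Everything marked "assumption" is NOT from the thread.
from transformers import TrainingArguments

MAX_TOTAL_LEN = 1024  # total length of query + passage, as reported

args = TrainingArguments(
    output_dir="./reranker-output",      # placeholder path (assumption)
    per_device_train_batch_size=16,      # 16 x 8 GPUs = global batch size 128
    gradient_accumulation_steps=1,       # assumption: no accumulation stated
    num_train_epochs=2,                  # "1-2 epochs is enough" (reply below)
    learning_rate=1e-5,                  # assumption: not stated in the thread
    bf16=True,                           # assumption: typical on A100 GPUs
)
```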

@Impavidity (Author) commented

Thank you for your quick follow-up. Sorry, I have another question: how many epochs did you train on all the m3 + fever + quora data? Did you do any downsampling?

@545999961 (Collaborator) commented

Training for 1-2 epochs is enough.
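The reply does not say whether any downsampling was applied. For readers who want to cap an over-represented dataset themselves, a generic random downsampling sketch (not the authors' recipe) could look like this:

```python
# Generic per-dataset random downsampling; illustrative only, not the
# authors' recipe (the thread does not describe any downsampling step).
import random

def downsample(examples, max_size, seed=42):
    """Randomly keep at most max_size examples from one dataset."""
    if len(examples) <= max_size:
        return list(examples)
    return random.Random(seed).sample(list(examples), max_size)

# e.g., cap a hypothetical oversized "quora" split at 100k pairs:
# quora = downsample(quora, 100_000)
```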

@Impavidity (Author) commented May 17, 2024

Thanks for the reply. Sorry, I have a few more questions:

  1. For long-context examples (e.g., length > 1k tokens), do we decrease the batch size during training? If so, is this done automatically?
  2. During training, is left padding or right padding used (i.e., what is the tokenizer's padding_side)?

@545999961 (Collaborator) commented

  1. Long contexts are truncated, so there is no need to decrease the batch size.
  2. We follow the tokenizer's default padding side.
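To make both points concrete, here is a small sketch using the Hugging Face tokenizer API. The checkpoint name is just an example (bge-reranker-v2-gemma is one of the released LLM rerankers); the key points are truncation to the 1024-token total length and leaving padding_side at the tokenizer's default rather than overriding it.

```python
# Sketch: truncating query+passage pairs and keeping the default padding side.
# "BAAI/bge-reranker-v2-gemma" is used here only as an example checkpoint.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-gemma")

queries = ["what is a reranker?"]
passages = ["A reranker scores each query-passage pair and reorders candidates."]

# Pairs longer than 1024 tokens are truncated, so the batch size never
# has to shrink for long inputs.
batch = tok(
    queries, passages,
    truncation=True,
    max_length=1024,
    padding=True,              # uses tok.padding_side as loaded, not overridden
    return_tensors="pt",
)

print(tok.padding_side)        # whatever the checkpoint's tokenizer defines
```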
