Hello, I really appreciate your wonderful work. I have a question about the training details.
In issue #17 (comment), you said:
STEP1: For encoding-side alignment, 3x GPUs, batch size 18, takes ≈30 mins on 40k instances of text-X pairs.
STEP2: For decoding-side alignment, 3x GPUs, batch size 18, takes ≈3 hours on 180k instances of text-X pairs.
STEP3: For instruction tuning, 4x GPUs, batch size 4, takes ≈5 hours on 250k instances.
but I believe the number of instances in STEP1 is 4k, not 40k. In my own experiments, the estimated times and instance counts matched the specs in your answer only for STEP2 and STEP3. For STEP1, on the other hand, training on 40k instances took me $\approx$ 10 times longer than the time you reported.
Could you please double-check the number?
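To make the mismatch concrete, here is a minimal sketch comparing the throughput (instances per minute) implied by the numbers quoted above. The "4k hypothesis" entry is my conjecture, not a confirmed figure, and "my run" reflects my own ~10x longer STEP1 training time:

```python
# Throughput implied by each reported (instances, minutes) pair.
# Figures are taken from this issue; the 4k value is a hypothesis.
runs = {
    "STEP1 (reported)":      (40_000, 30),
    "STEP1 (4k hypothesis)": (4_000, 30),
    "STEP1 (my run)":        (40_000, 300),  # ~10x the reported 30 min
    "STEP2 (reported)":      (180_000, 180), # 3 hours
}

for name, (instances, minutes) in runs.items():
    print(f"{name}: {instances / minutes:.0f} instances/min")
```

The throughput of my STEP1 run matches what the reported 30 minutes would imply for a 4k dataset, which is why I suspect the instance count rather than the hardware setup.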
Thank you.