Hello, I really appreciate your wonderful work. I have a question about the training details.
In issue #17 (comment), you said:
STEP1: For encoding-side alignment, 3x GPUs, batch size 18, takes ≈30 mins on 40k instances of text-X pairs.
STEP2: For decoding-side alignment, 3x GPUs, batch size 18, takes ≈3 hours on 180k instances of text-X pairs.
STEP3: For instruction tuning, 4x GPUs, batch size 4, takes ≈5 hours on 250k instances.
but I believe the number of instances in STEP1 is 4k, not 40k. In my own experiments, the estimated times and instance counts matched the specs in your answer only for STEP2 and STEP3. For STEP1, on the other hand, training on 40k instances took me $\approx$ 10 times longer than the time you reported.
Could you please double-check the number?
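To make the mismatch concrete, here is a minimal sketch comparing the throughput (instances per minute) implied by the numbers quoted above. The "4k hypothesis" entry is my conjecture, not a confirmed figure, and "my run" reflects my own ~10x longer STEP1 training time:

```python
# Throughput implied by each reported (instances, minutes) pair.
# Figures are taken from this issue; the 4k value is a hypothesis.
runs = {
    "STEP1 (reported)":      (40_000, 30),
    "STEP1 (4k hypothesis)": (4_000, 30),
    "STEP1 (my run)":        (40_000, 300),  # ~10x the reported 30 min
    "STEP2 (reported)":      (180_000, 180), # 3 hours
}

for name, (instances, minutes) in runs.items():
    print(f"{name}: {instances / minutes:.0f} instances/min")
```

The throughput of my STEP1 run matches what the reported 30 minutes would imply for a 4k dataset, which is why I suspect the instance count rather than the hardware setup.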
Thank you.