
About training details #64

Open
mireiffe opened this issue Nov 6, 2023 · 0 comments


mireiffe commented Nov 6, 2023

Hello, I really appreciate your wonderful work. I have a question about the training details.

In issue #17 (comment), you said:

STEP1: For encoding-side alignment, 3x GPUs, batch size 18, takes ≈30 mins on 40k instances of text-X pairs.
STEP2: For decoding-side alignment, 3x GPUs, batch size 18, takes ≈3 hours on 180k instances of text-X pairs.
STEP3: For instruction tuning, 4x GPUs, batch size 4, takes ≈5 hours on 250k instances.

But I believe the number of instances in STEP1 is only 4k, not 40k. In my own experiments, the training time and number of instances match the figures in your answer fairly well, but ONLY for STEP2 and STEP3. For STEP1, on the other hand, it took me ≈10 times longer than your reported time to get through 40k instances.
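For reference, here is a back-of-the-envelope throughput check using only the approximate figures quoted above. The "10x" multiplier for STEP1 is my own rough estimate, and the helper `throughput` is just an illustrative name. If STEP1 really covered 40k instances in 30 minutes, it would be noticeably faster per GPU than STEP2, while the rate I observed matches what 4k instances in 30 minutes would imply.

```python
# Approximate figures quoted from issue #17:
# (label, num_gpus, batch_size, minutes, instances)
reported = [
    ("STEP1 (as quoted, 40k)", 3, 18, 30, 40_000),
    ("STEP1 (if 4k)", 3, 18, 30, 4_000),
    ("STEP2", 3, 18, 180, 180_000),
    ("STEP3", 4, 4, 300, 250_000),
]


def throughput(minutes: float, instances: int) -> float:
    """Instances processed per minute of wall-clock time."""
    return instances / minutes


for label, gpus, batch, minutes, n in reported:
    rate = throughput(minutes, n)
    print(f"{label:24s} {rate:8.1f} inst/min  {rate / gpus:7.1f} inst/min/GPU")

# My observation (rough): STEP1 on 40k instances took about 10x the quoted 30 minutes.
observed_minutes = 10 * 30
print(f"Observed STEP1 rate:     {throughput(observed_minutes, 40_000):8.1f} inst/min")
```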

Could you please double check the number?

Thank you.
