Minimum Requirement of GPU for Fine Tuning #687

Open

vashiegaran opened this issue Dec 14, 2023 · 4 comments

vashiegaran commented Dec 14, 2023

What are the minimum GPU requirements to fine-tune a small model like openlm-research/open_llama_3b and a bigger model like llama2-7b?

research4pan (Contributor) commented Dec 17, 2023

Thanks for your interest in LMFlow! For full fine-tuning of llama2-7b, at least a single 3090 GPU (24 GB of GPU memory) is required. In addition, RAM of roughly model-size × 16 GB is needed for offloading, e.g. about 112 GB of RAM for a 7B model; the RAM consumption is roughly halved when tuning a 3B model instead. Hope that answers your question 😄
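
For a rough sense of where those numbers come from: Adam-based full fine-tuning needs on the order of 16 bytes per parameter (weights, gradients, and optimizer states), which is the "model-size × 16 GB" rule of thumb above. A back-of-the-envelope sketch (the actual footprint depends on the offload configuration):

```python
# Back-of-the-envelope estimate of the offload RAM mentioned above.
# Assumes ~16 bytes per parameter for Adam-based full fine-tuning
# (weights + gradients + optimizer states); the real number depends
# on the DeepSpeed/offload configuration, so treat this as a rough guide.

def offload_ram_gb(params_in_billions: float, bytes_per_param: int = 16) -> float:
    """Approximate CPU RAM (in GB, 1 GB = 1e9 bytes) for offloaded training states."""
    return params_in_billions * bytes_per_param

for model, size_b in [("openlm-research/open_llama_3b", 3), ("llama2-7b", 7)]:
    print(f"{model}: ~{offload_ram_gb(size_b):.0f} GB RAM for offloading")
# open_llama_3b: ~48 GB, llama2-7b: ~112 GB
```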

xigua314 commented
@research4pan Thank you for your work. I would like to ask: if I use Text2Text data for full finetuning according to the script, will the full finetuning only learn to generate the output from the input, or will it also learn the internal grammatical knowledge of the input? If not, what dataset format or parameter settings should I use to achieve this?

research4pan (Contributor) commented
Thanks for your interest in LMFlow! If you are using text2text, the input context will not be counted towards the loss, i.e. the model only learns to generate the output and will not learn to generate the input.
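
For illustration, this is the usual way such input masking is done in Hugging Face-style causal-LM training: tokens belonging to the input get the label -100, which CrossEntropyLoss ignores. This is a sketch of the general idea, not LMFlow's exact code, and the tokenizer name is just an example:

```python
# Sketch of input masking for text2text-style training.
# Follows the common Hugging Face convention (label = -100 is ignored
# by CrossEntropyLoss); an illustration, not LMFlow's implementation.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b")

def build_example(input_text: str, output_text: str) -> dict:
    input_ids = tokenizer(input_text, add_special_tokens=False)["input_ids"]
    output_ids = tokenizer(output_text, add_special_tokens=False)["input_ids"]
    ids = input_ids + output_ids + [tokenizer.eos_token_id]
    # Loss is computed only on the output (and EOS); input tokens get -100.
    labels = [-100] * len(input_ids) + output_ids + [tokenizer.eos_token_id]
    return {"input_ids": ids, "labels": labels}
```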

You may use the "text2text" (https://optimalscale.github.io/LMFlow/examples/DATASETS.html#text2text) or "conversation" (https://optimalscale.github.io/LMFlow/examples/DATASETS.html#conversation) formats supported in LMFlow to achieve this. Thanks 😄
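
For reference, a minimal text2text dataset file in that format might look like the following (see the DATASETS page linked above for the full schema; the example instance is made up):

```python
# Writes a minimal "text2text" dataset file; check the DATASETS docs
# linked above for the authoritative schema.
import json

dataset = {
    "type": "text2text",
    "instances": [
        {
            "input": "Translate to French: Hello, how are you?",
            "output": "Bonjour, comment allez-vous ?",
        }
    ],
}

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(dataset, f, ensure_ascii=False, indent=2)
```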

xigua314 commented
@research4pan Thank you for the response. I may not have expressed myself clearly: I want the model to learn all of the text's grammatical knowledge, not just the mapping from input to output. How should I set up full finetuning for that? Additionally, should tokenization use the original model's tokenizer, and are there corresponding parameters that can be modified? For example, if I follow the full finetuning of GPT-2 as in the example, can I use the BERT-Chinese tokenizer? Lastly, could you please provide the latest QR code for the WeChat group? Every QR code I've tried is from last October and has expired. Thank you very much!
