
BAdam support #30308

Open
Theodotus1243 opened this issue Apr 18, 2024 · 2 comments · May be fixed by #30692
Labels
Feature request · optimization

Comments

@Theodotus1243

Feature request

https://arxiv.org/pdf/2404.02827.pdf

A memory-efficient optimizer similar to GaLore, but with fewer hyperparameters.

Motivation

hiyouga/LLaMA-Factory#3287

LLaMA-Factory already supports it, but having it directly in Transformers would be much more convenient.
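Until native support lands in Transformers, one workaround is to pass a custom optimizer to the Trainer through its `optimizers` argument. The sketch below assumes the BAdam package exposes a `BlockOptimizer` wrapper roughly as described in its README (https://github.com/Ledzy/BAdam); its exact name and signature are assumptions here, as is `train_dataset`.

```python
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from badam import BlockOptimizer  # assumption: wrapper name/signature per the BAdam README

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Wrap a standard Adam-style optimizer so that, at any time, only one block of
# parameters is updated (switching blocks every `switch_block_every` steps).
base_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
optimizer = BlockOptimizer(
    base_optimizer=base_optimizer,
    named_parameters_list=list(model.named_parameters()),
    switch_block_every=100,   # the single extra hyperparameter
    switch_mode="random",
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1),
    train_dataset=train_dataset,   # assumed to be prepared elsewhere
    optimizers=(optimizer, None),  # Trainer builds its default scheduler around it
)
trainer.train()
```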

Your contribution

@amyeroberts added the Feature request and optimization labels on Apr 18, 2024
@amyeroberts
Collaborator

cc @younesbelkada

@xiao-li-hub

@Theodotus1243 @amyeroberts @younesbelkada

Hi, thanks so much for acknowledging our BAdam optimizer (https://github.com/Ledzy/BAdam). Here's a brief overview of its features:

  1. Memory Efficiency: BAdam is a memory-efficient full-parameter finetuning method. We tune Llama 2-7B and Llama 3-8B using a single RTX 3090 with BAdam; see our GitHub page https://github.com/Ledzy/BAdam for detailed performance metrics. (A minimal sketch of the block-wise update idea appears after this list.)

  2. Simplicity in Hyperparameters: BAdam introduces only one additional hyperparameter, which can be set adaptively; see the "Hyperparameter Suggestion" section on our GitHub page: https://github.com/Ledzy/BAdam#hyperparameter-suggestion.

  3. Time Efficiency: Compared to LoRA and standard Adam, BAdam cuts the actual backward time roughly in half for the same number of epochs, because backpropagation only needs to reach the currently active block (the chain-rule property).

  4. Rapid Convergence: The algorithm converges quickly; we observe that a single epoch is often enough for instruction tuning (e.g., tuning Llama 3-8B on the Alpaca-GPT4 dataset).

So far, LLaMA-Factory has integrated our method. We believe BAdam offers a meaningful advance for the LLM community, and we are eager to support its integration into the Hugging Face Transformers library. Should you find these features compelling, we would be delighted to assist with the implementation.
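To make the memory and backward-time points concrete, here is a minimal, illustrative PyTorch sketch of a block-coordinate Adam loop in the spirit of BAdam. This is not the BAdam implementation: grouping parameters by top-level child module and switching blocks cyclically are simplifying assumptions.

```python
import torch

def block_coordinate_adam(model, data_loader, loss_fn, switch_every=100, lr=1e-5):
    # Group parameters per top-level child module; each group is one "block".
    blocks = [list(m.parameters()) for m in model.children()
              if sum(p.numel() for p in m.parameters()) > 0]

    step, active, optimizer = 0, 0, None
    for inputs, targets in data_loader:
        if step % switch_every == 0:
            # Freeze everything, then unfreeze only the active block and
            # (re)create Adam so optimizer state exists only for that block.
            for p in model.parameters():
                p.requires_grad_(False)
            for p in blocks[active]:
                p.requires_grad_(True)
            optimizer = torch.optim.Adam(blocks[active], lr=lr)
            active = (active + 1) % len(blocks)

        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        # Gradients are computed only for the active block, so backpropagation
        # stops once it reaches that block (the chain-rule saving).
        loss.backward()
        optimizer.step()
        step += 1
```

Because only the active block requires gradients, Adam's moment buffers exist only for that block, and autograd does not propagate gradients into the frozen blocks' parameters.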
