Hi, thanks so much for acknowledging our BAdam optimizer (https://github.com/Ledzy/BAdam). Here's a brief overview of its features:
Memory Efficiency: BAdam is a memory-efficient full-parameter finetuning method. We finetune Llama 2-7B and Llama 3-8B on a single RTX 3090 with BAdam; see our GitHub page https://github.com/Ledzy/BAdam for detailed performance metrics.
Simplicity of Hyperparameters: BAdam introduces only one additional hyperparameter (the block-switching frequency), and it can be set adaptively; see the "Hyperparameter Suggestion" section on our GitHub page: https://github.com/Ledzy/BAdam#hyperparameter-suggestion.
Time Efficiency: Compared to LoRA and standard Adam, BAdam roughly halves the wall-clock backward time for the same number of epochs, because backpropagation only needs to reach the currently active block (a consequence of the chain rule).
Rapid Convergence: The algorithm converges quickly; we observe that a single epoch is often enough for instruction tuning (e.g., tuning Llama 3-8B on the Alpaca-GPT4 dataset).
LLaMA-Factory has already integrated our method. We believe BAdam offers a significant advance for the LLM community, and we are eager to support its integration into the Hugging Face Transformers library. Should you find these features compelling, we would be delighted to assist with the implementation.
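For readers unfamiliar with the method, here is a minimal PyTorch sketch of the block-coordinate idea described above. This is an illustration only, not the BAdam package's actual API: the per-top-level-module block partitioning, the switch frequency `K`, and the helper names are all assumptions made for the example.

```python
import torch
from torch.optim import Adam

# Illustrative block-coordinate update (not the BAdam package's API):
# parameters are grouped into blocks, only the active block is trainable,
# and a fresh Adam state is created each time the active block switches.
# Optimizer states and gradients exist only for the active block, and autograd
# stops propagating toward the input once it has passed that block.

def make_blocks(model):
    """Group parameters into blocks, e.g. one block per top-level module (illustrative)."""
    blocks = {}
    for name, param in model.named_parameters():
        prefix = name.split(".")[0]
        blocks.setdefault(prefix, []).append(param)
    return list(blocks.values())

def train_block_coordinate(model, data_loader, loss_fn, num_epochs=1, K=100, lr=1e-5):
    blocks = make_blocks(model)
    active = 0
    # Freeze everything, then enable only the active block.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in blocks[active]:
        p.requires_grad_(True)
    optimizer = Adam(blocks[active], lr=lr)  # fresh Adam state for the active block

    step = 0
    for _ in range(num_epochs):
        for batch, target in data_loader:
            loss = loss_fn(model(batch), target)
            loss.backward()                    # only active-block params receive gradients
            optimizer.step()
            optimizer.zero_grad(set_to_none=True)
            step += 1
            if step % K == 0:                  # switch to the next block every K steps
                for p in blocks[active]:
                    p.requires_grad_(False)
                active = (active + 1) % len(blocks)
                for p in blocks[active]:
                    p.requires_grad_(True)
                optimizer = Adam(blocks[active], lr=lr)
```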
Feature request
https://arxiv.org/pdf/2404.02827.pdf
A memory-efficient optimizer like GaLore, but with fewer hyperparameters.
Motivation
hiyouga/LLaMA-Factory#3287
LLaMA-Factory already supports it, but having it in Transformers would be much more convenient.
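As a rough sketch of an interim setup (before any native Transformers support), one could wrap a standard optimizer with BAdam's `BlockOptimizer` and pass it to the existing `optimizers` argument of `Trainer`. The `BlockOptimizer` constructor arguments below follow the usage shown in the BAdam README and may differ from the current release; `train_dataset` is a placeholder for your tokenized dataset, and the model id is only an example.

```python
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
from badam import BlockOptimizer  # assumption: wrapper name as shown in the BAdam repo

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Wrap a standard AdamW optimizer with the block-wise wrapper.
base_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
optimizer = BlockOptimizer(
    base_optimizer=base_optimizer,
    named_parameters_list=list(model.named_parameters()),
    switch_block_every=100,  # K: number of Adam steps before switching blocks
    switch_mode="random",    # block update order
)

args = TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=1)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,   # placeholder: your tokenized dataset
    optimizers=(optimizer, None),  # pass the wrapped optimizer; Trainer creates the scheduler
)
trainer.train()
```

Native support in Transformers would remove the need for the extra wrapping step and expose the block-switching frequency as a regular training argument.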
Your contribution