When fine-tuning Mistral with LoRA, do you think FlashAttention2 helps speed up the process? If so, how significant is the speedup, and where does most of the acceleration come from?
Thanks for your interest in LMFlow! Theoretically I think it helps: FlashAttention improves the cache-friendliness of the attention operations, and it should also speed up the forward pass of the frozen base model during LoRA fine-tuning. However, we haven't run empirical tests on this yet, so it's an interesting topic to explore 😄
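For anyone who wants to try it, here is a minimal sketch (not LMFlow's own training pipeline) of loading Mistral with FlashAttention-2 enabled and attaching a LoRA adapter. It assumes transformers >= 4.36 (for the `attn_implementation` argument), peft, and flash-attn are installed; the model id and LoRA hyperparameters are just illustrative.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load the base model with the FlashAttention-2 kernel.
# FlashAttention-2 requires fp16 or bf16 weights/activations.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# Illustrative LoRA config; tune r / alpha / target modules for your setup.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# The frozen base weights and the LoRA branches both go through the same
# attention kernel, so the faster attention benefits the whole forward
# (and backward) pass, not just the trainable adapter parameters.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

In this setup the main saving would come from the attention computation itself, which FlashAttention fuses and makes more memory-efficient; the LoRA matrices are small, so their overhead is minor either way.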