Added attention layers to wrap for fsdp #30735

Open
wants to merge 1 commit into
base: main


@alvations alvations commented May 9, 2024

What does this PR do?

This PR defines the fsdp_transformer_layer_cls_to_wrap value in the Mistral config. This way, users can load the config to find out which value to use for FSDP, e.g.

from transformers import AutoConfig
c = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
c.fsdp_transformer_layer_cls_to_wrap

[out]:

MistralDecoderLayer
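
As a rough sketch of how the value could be consumed once it is exposed on the config, the helper below turns it into an fsdp_config dictionary. The helper name build_fsdp_config is hypothetical, and the config object is a stand-in (types.SimpleNamespace) rather than a real AutoConfig, so nothing is downloaded. The transformer_layer_cls_to_wrap key is assumed to match the Trainer's FSDP options and should be checked against the installed transformers version.

```python
from types import SimpleNamespace


def build_fsdp_config(config):
    """Build an FSDP config dict from a model config (hypothetical helper)."""
    layer_cls = getattr(config, "fsdp_transformer_layer_cls_to_wrap", None)
    if layer_cls is None:
        raise ValueError("config does not define fsdp_transformer_layer_cls_to_wrap")
    # Accept either a single class name or a list of class names.
    names = [layer_cls] if isinstance(layer_cls, str) else list(layer_cls)
    # Assumed key name for the Trainer's fsdp_config; verify for your version.
    return {"transformer_layer_cls_to_wrap": names}


# Stand-in for the config this PR would return for Mistral (not a real AutoConfig).
config = SimpleNamespace(fsdp_transformer_layer_cls_to_wrap="MistralDecoderLayer")
fsdp_config = build_fsdp_config(config)
print(fsdp_config)
```

The resulting dictionary could then be passed along the lines of TrainingArguments(fsdp="full_shard auto_wrap", fsdp_config=fsdp_config), assuming those arguments are available in the version in use.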

Context: Users have been asking when and which layer to wrap. There shouldn't be a need to load the model and go through its state_dict or model summary to figure it out.

Fixes: https://discuss.huggingface.co/t/accelerate-fsdp-config-prompts/21262/3

Currently, this information is also available through https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L809

Who can review?

Models:

Integrations:

Documentation: @stevhliu and @MKhalusova

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks for the PR!
It would make sense to always have _no_split_modules == fsdp_transformer_layer_cls_to_wrap by default, no?
This way, all models that properly define one (we usually define _no_split_modules) can benefit from this?

@pacman100 what's your take on this?
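
A minimal sketch of the default suggested in this comment, under stated assumptions: resolve_fsdp_wrap_classes is a hypothetical helper, and the Toy classes are stand-ins rather than real transformers models. An explicit fsdp_transformer_layer_cls_to_wrap on the config wins; otherwise the model class's _no_split_modules is used.

```python
from types import SimpleNamespace


class ToyPreTrainedModel:
    # Stand-in for a model base class that defines no split modules.
    _no_split_modules = None


class ToyMistralModel(ToyPreTrainedModel):
    # Toy value chosen for illustration only.
    _no_split_modules = ["MistralDecoderLayer"]


def resolve_fsdp_wrap_classes(config, model_cls):
    """Return layer class names to wrap, falling back to _no_split_modules."""
    explicit = getattr(config, "fsdp_transformer_layer_cls_to_wrap", None)
    if explicit:
        # An explicit setting takes precedence over the class-level default.
        return [explicit] if isinstance(explicit, str) else list(explicit)
    # Fall back to the class attribute; treat a missing or None value as empty.
    return list(getattr(model_cls, "_no_split_modules", None) or [])


default_names = resolve_fsdp_wrap_classes(SimpleNamespace(), ToyMistralModel)
override_names = resolve_fsdp_wrap_classes(
    SimpleNamespace(fsdp_transformer_layer_cls_to_wrap="CustomLayer"),
    ToyMistralModel,
)
print(default_names, override_names)
```

Because the fallback reads a class attribute, it would not require instantiating the model or loading weights.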
