Added attention layers to wrap for fsdp #30735

alvations · 2024-05-09T23:36:40Z

What does this PR do?

This PR defines the fsdp_transformer_layer_cls_to_wrap value in the Mistral config. This way, users can load the config to find out which value to use for FSDP, e.g.:

from transformers import AutoConfig
c = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
c.fsdp_transformer_layer_cls_to_wrap

[out]:

MistralDecoderLayer

Context: Users have been asking when and which layer to wrap. There shouldn't be a need to load the model and go through its state_dict or model summary to figure it out.

Fixes: https://discuss.huggingface.co/t/accelerate-fsdp-config-prompts/21262/3

Currently, this information is also available through https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/modeling_mistral.py#L809

Who can review?

Models:

text models: @ArthurZucker and @younesbelkada

Integrations:

deepspeed: HF Trainer/Accelerate: @pacman100

Documentation: @stevhliu and @MKhalusova

HuggingFaceDocBuilderDev · 2024-05-10T00:01:59Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

Thanks for the PR!
It would make sense to always have _no_split_modules == fsdp_transformer_layer_cls_to_wrap by default, no?
This way, all models that properly define one (usually we define _no_split_modules) can benefit from this?

@pacman100 what's your take on this?
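To make the suggested default concrete, here is a minimal sketch of the fallback logic, assuming the config attribute proposed in this PR. The helper `resolve_fsdp_wrap_classes` and the stand-in class `FakeMistralModel` are hypothetical and written only for illustration; they are not part of transformers or accelerate.

```python
from types import SimpleNamespace


def resolve_fsdp_wrap_classes(config, model_cls):
    """Hypothetical helper: pick the layer class names to wrap for FSDP.

    Prefers an explicit `fsdp_transformer_layer_cls_to_wrap` on the config
    (the attribute proposed in this PR) and otherwise falls back to the
    model class's `_no_split_modules`.
    """
    explicit = getattr(config, "fsdp_transformer_layer_cls_to_wrap", None)
    if explicit:
        # Accept either a single class name or a list of names.
        return [explicit] if isinstance(explicit, str) else list(explicit)
    return list(getattr(model_cls, "_no_split_modules", None) or [])


class FakeMistralModel:
    # Stand-in for a model class that defines `_no_split_modules`.
    _no_split_modules = ["MistralDecoderLayer"]


# Config without the new attribute: the fallback is used.
cfg_without = SimpleNamespace()
# Config with the attribute set explicitly: the explicit value wins.
cfg_with = SimpleNamespace(fsdp_transformer_layer_cls_to_wrap="MistralDecoderLayer")

print(resolve_fsdp_wrap_classes(cfg_without, FakeMistralModel))
print(resolve_fsdp_wrap_classes(cfg_with, FakeMistralModel))
```

With a fallback like this, a model would only need to define `_no_split_modules` once, and an explicit config value would remain available as an override.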

Added attention layers to wrap for fsdp

638c366

ArthurZucker reviewed May 10, 2024
