I am trying to fine-tune the Llama2 70B model on a dataset. With TP=4 and PP=8 it works fine, but with FSDP on 6 nodes it fails with the error below:
File "/opt/NeMo/nemo/collections/nlp/parts/nlp_overrides.py", line 670, in setup_environment
for p in self.model.parameters():
AttributeError: 'NoneType' object has no attribute 'parameters'
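The traceback says `self.model` is `None` when `setup_environment` iterates its parameters, i.e. FSDP setup runs before a model has been attached to the strategy. A minimal sketch of that failure mode (this is a hypothetical stand-in, not NeMo's actual `nlp_overrides.py` code):

```python
class StrategySketch:
    """Hypothetical stand-in for a Lightning strategy whose model is attached later."""

    def __init__(self, model=None):
        # The strategy's model attribute starts as None until setup attaches it.
        self.model = model

    def setup_environment(self):
        # Mirrors the failing loop: iterating parameters of a None model.
        # A defensive check turns the opaque AttributeError into a clear message.
        if self.model is None:
            raise RuntimeError(
                "FSDP setup ran before the model was attached to the strategy"
            )
        return list(self.model.parameters())


try:
    StrategySketch().setup_environment()
except RuntimeError as e:
    print(e)
```

Without the guard, the same situation surfaces as the `AttributeError: 'NoneType' object has no attribute 'parameters'` seen above.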
Steps/Code to reproduce bug
Converted the Llama2 70B base model checkpoint from Hugging Face to NeMo format.
Started training on 6 nodes with the config below.
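The config itself is not included in the report. For context, an FSDP run of this kind typically toggles overrides along these lines (the key names below are assumptions based on NeMo's `megatron_gpt_config.yaml` and may differ in your NeMo version):

```yaml
trainer:
  num_nodes: 6
  devices: 8                       # GPUs per node (assumption)
model:
  fsdp: true                       # enable FSDP sharding (assumed key name)
  tensor_model_parallel_size: 1    # FSDP run, so TP/PP reduced from 4/8 (assumption)
  pipeline_model_parallel_size: 1
```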
Unable to fine-tune Llama2 70B with FSDP
Expected behavior
Llama2 70B SFT completes with FSDP, just as the TP=4/PP=8 run does.
Environment details
Image:
nvcr.io/nvidia/nemo:24.03.01.framework
Using a Slurm cluster.