
Freezing layers has different behavior for different models #438

Open · 2 tasks
hjc3613 opened this issue Apr 18, 2024 · 2 comments

hjc3613 commented Apr 18, 2024

System Info

Dockerfile: (provided as a screenshot in the original issue)

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

When freezing the top n layers, the effect differs between models:
For Qwen, the frozen layers have no gradients, so GPU RAM usage drops compared to no freezing.
But for Qwen1.5 and wizard-8x22B, the frozen layers still seem to have gradients, so GPU RAM stays high; even when I freeze the top n-1 layers and fine-tune only the last layer, I get OOM when training on 3 nodes (24 × A800 in total).
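For reference, a minimal sketch of how top-layer freezing is commonly done in PyTorch, assuming a LLaMA/Qwen-style model whose decoder blocks live under `model.model.layers` and taking "top n" to mean the first n blocks. The checkpoint name, helper name, and layer count below are illustrative, not the exact llama-recipes implementation:

```python
import torch
from transformers import AutoModelForCausalLM

def freeze_layers(model, num_frozen: int):
    """Disable gradients for the first `num_frozen` decoder blocks."""
    for layer in model.model.layers[:num_frozen]:
        for param in layer.parameters():
            param.requires_grad = False

# Illustrative small checkpoint; any model exposing .model.layers works the same way.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B", torch_dtype=torch.bfloat16)
freeze_layers(model, num_frozen=12)

# Frozen parameters should contribute no gradient or optimizer-state memory.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```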

Error logs

CUDA out of memory

Expected behavior

Freezing the top layers should work for any model.

HamidShojanazeri (Contributor) commented

@hjc3613 Sorry for the inconvenience, this feature is not well tested, which is why we didn't mention it much. If you are interested, we would love to work with you on a PR to fix the issues.

hjc3613 (Author) commented Apr 19, 2024

@hjc3613 Sorry for the inconvenience, this feature is not well tested, which is why we didn't mention it much. If you are interested, we would love to work with you on a PR to fix the issues.

I think this feature is necessary: it can save a lot of GPU memory and, in some practical cases, give the same results as full-parameter training. I have tested this on my task: when freezing the top 40 layers (out of 80 in total), the results are the same as with no freezing, but using only 1 node (8 × 80 GB)!
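A quick single-process sanity check for whether freezing actually takes effect on a given model is to look at which layers have `.grad` populated after one backward pass. This is only a sketch with a small illustrative Qwen checkpoint (the real setup in this issue is much larger, and behavior under FSDP/multi-node training may differ):

```python
import torch
from transformers import AutoModelForCausalLM

# Small illustrative checkpoint, assuming a Qwen-style .model.layers layout.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")
for layer in model.model.layers[:12]:           # freeze the first 12 decoder blocks
    for p in layer.parameters():
        p.requires_grad = False

ids = torch.randint(0, model.config.vocab_size, (1, 16))
model(input_ids=ids, labels=ids).loss.backward()

# If freezing works, frozen blocks should report no gradients here;
# if they do report gradients, gradient memory is still being spent on them.
for i, layer in enumerate(model.model.layers):
    has_grad = any(p.grad is not None for p in layer.parameters())
    print(f"layer {i:02d}: grad populated = {has_grad}")
```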
