Save and Load sharded gptq checkpoint #364
base: main
Conversation
When can we use the updated code (i.e., loading sharded checkpoints quantized by AutoGPTQ, such as Qwen-VL-Chat-Int4)?
Ah yeah, I forgot to test this for 0.5.0. I will give it a test today and then maybe it can be included in 0.5.1, @fxmarty? Actually, never mind: I didn't notice it only includes saving, not loading as well. There's not much point in saving if it can't load, and Transformers can already do both, so we may as well use that until AutoGPTQ can do both.
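For reference, a minimal sketch of the Transformers path mentioned above. `save_pretrained`'s `max_shard_size` argument and automatic shard reassembly in `from_pretrained` are standard Transformers features; the model name and directories here are placeholders.

```python
from transformers import AutoModelForCausalLM

# Load a GPTQ-quantized model through Transformers
# (requires optimum and auto-gptq to be installed).
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-gptq-model",  # placeholder repo id
    device_map="auto",
)

# Saving re-shards the weights: no single file exceeds max_shard_size.
model.save_pretrained("sharded-dir", max_shard_size="10GB")

# from_pretrained transparently reassembles the shards on reload.
model = AutoModelForCausalLM.from_pretrained("sharded-dir", device_map="auto")
```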
I happened to test this when I made a small sharded model to test the loading. It seemed to work fine, but I didn't do any comprehensive testing.
Haven't had an issue with this branch.
The max file size for HF is 50 GB.
What does this PR do?

This PR adds support for saving and loading sharded GPTQ checkpoints.

Currently implemented:
- `save_quantized` method:
  - `max_shard_size` argument (defaults to `"10GB"`) to specify each weights file's maximum size.
  - `model_base_name` argument (defaults to `None`) so that users can specify the weights files' base name themselves.

A usage sketch follows below.
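A minimal sketch of how the new arguments would be used, assuming the `save_quantized` signature described above; `max_shard_size` and `model_base_name` come from this PR, everything else is existing AutoGPTQ usage, and the paths and shard size are placeholders.

```python
from auto_gptq import AutoGPTQForCausalLM

# Load an already-quantized (unsharded) GPTQ model.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/gptq-model",  # placeholder path
    device="cuda:0",
)

# Save it back out as a sharded checkpoint: no single weights file
# exceeds max_shard_size, and shards share the given base name.
model.save_quantized(
    "path/to/sharded-output",
    use_safetensors=True,
    max_shard_size="4GB",          # new in this PR; defaults to "10GB"
    model_base_name="gptq_model",  # new in this PR; defaults to None
)
```

Keeping `max_shard_size` below the Hub's 50 GB per-file limit (as noted in the conversation) is the main reason a shard-size cap is useful when uploading large quantized models.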