
Save and Load sharded gptq checkpoint #364

Open · wants to merge 3 commits into base: main

Conversation

@PanQiWei (Collaborator) commented Oct 7, 2023

What does this PR do?

This PR adds support for saving and loading sharded GPTQ checkpoints.

Currently implemented:

* save a sharded GPTQ checkpoint via the `save_quantized` method (usage sketch after this list):
  * remove the code that moves the model to CPU before saving weights.
  * add a `max_shard_size` argument (defaults to `"10GB"`) to cap each weights file's size.
  * add a `model_base_name` argument (defaults to `None`) so users can set the weights files' base name themselves.
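
A minimal sketch of how the new arguments would be used, assuming they land on `save_quantized` as described above (the model name, shard size, and base name below are placeholders, not values from the PR):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "facebook/opt-125m"  # placeholder model

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir)
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

# Quantize as usual with AutoGPTQ.
model = AutoGPTQForCausalLM.from_pretrained(
    pretrained_model_dir, BaseQuantizeConfig(bits=4, group_size=128)
)
model.quantize(examples)

# New in this PR: cap each shard's size and choose the weights files' base name.
model.save_quantized(
    "opt-125m-4bit-sharded",
    max_shard_size="2GB",                    # defaults to "10GB"
    model_base_name="gptq_model-4bit-128g",  # defaults to None
)
```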

@CrazyBrick commented:

> What does this PR do?
>
> This PR adds support for saving and loading sharded GPTQ checkpoints. […]

When will we be able to use the updated code to load sharded checkpoints quantized by AutoGPTQ, such as Qwen-VL-Chat-Int4?

@TheBloke (Contributor) commented Nov 9, 2023

Ah yeah, I forgot to test this for 0.5.0.

I will give it a test today, and then maybe it can be included in 0.5.1, @fxmarty?

Actually, never mind: I didn't notice that it only includes saving, not loading. There's not much point in saving if it can't load; Transformers can already do both, so we may as well use that until AutoGPTQ can do both.
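
For reference, a minimal sketch of the Transformers route mentioned above: with `optimum` and `auto-gptq` installed, `from_pretrained` already resolves sharded GPTQ checkpoints (the shard files plus the index JSON) transparently, and `save_pretrained` re-shards on save. The repo id below is a placeholder, not one from this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-user/some-model-GPTQ"  # hypothetical sharded GPTQ repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" (via accelerate) places the quantized weights across devices.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Saving back out re-shards through the standard Transformers API.
model.save_pretrained("local-copy", max_shard_size="10GB")
```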

@fxmarty (Collaborator) commented Nov 9, 2023

@TheBloke yeah, I'm already spending more time than I should maintaining the repo. Let's wait and see if @PanQiWei comes back.

@LaaZa (Contributor) commented Nov 11, 2023

I happened to test this when I made a small sharded model to check loading. It seemed to work fine, but I didn't do any comprehensive testing.

@ewof commented Dec 15, 2023

Haven't had an issue with this branch.

@ewof commented Apr 19, 2024

The max file size for the HF Hub is 50 GB.
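
If that per-file limit is the concern, the shard cap from this PR could simply be set below it (a hypothetical call, reusing the `save_quantized` signature described above):

```python
# Keep every shard under the Hub's 50 GB per-file limit.
model.save_quantized("sharded-out", max_shard_size="48GB")
```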
