Save and Load sharded gptq checkpoint #364
base: main
Conversation
When can we use the updated code (i.e., loading sharded checkpoints quantized by AutoGPTQ, such as Qwen-VL-Chat-Int4)?
Ah yeah, I forgot to test this for 0.5.0. I will give it a test today and then maybe it can be included in 0.5.1, @fxmarty? Actually, never mind: I didn't notice it only includes saving, not loading as well. There's not much point in saving if it can't load, and Transformers can already do both, so we may as well use that until AutoGPTQ can do both.
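For reference, a minimal sketch of the Transformers path mentioned above. `save_pretrained`'s `max_shard_size` argument and automatic shard reassembly in `from_pretrained` are standard Transformers features; the model name and directories here are placeholders.

```python
from transformers import AutoModelForCausalLM

# Load a GPTQ-quantized model through Transformers
# (requires optimum and auto-gptq to be installed).
model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-gptq-model",  # placeholder repo id
    device_map="auto",
)

# Saving re-shards the weights: no single file exceeds max_shard_size.
model.save_pretrained("sharded-dir", max_shard_size="10GB")

# from_pretrained transparently reassembles the shards on reload.
model = AutoModelForCausalLM.from_pretrained("sharded-dir", device_map="auto")
```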
I happened to test this when I made a small sharded model to test the loading. It seemed to work fine, but I didn't do any comprehensive testing.
Haven't had an issue with this branch.
The max file size for HF is 50 GB.
What does this PR do?

This PR adds support for saving and loading sharded GPTQ checkpoints.

Currently implemented:
- `save_quantized` method:
  - `max_shard_size` argument (defaults to `"10GB"`) to specify each weights file's maximum size.
  - `model_base_name` argument (defaults to `None`) so that users can specify the weights files' base name themselves.

A usage sketch follows below.
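A minimal sketch of how the new arguments would be used, assuming the `save_quantized` signature described above; `max_shard_size` and `model_base_name` come from this PR, everything else is existing AutoGPTQ usage, and the paths and shard size are placeholders.

```python
from auto_gptq import AutoGPTQForCausalLM

# Load an already-quantized (unsharded) GPTQ model.
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/gptq-model",  # placeholder path
    device="cuda:0",
)

# Save it back out as a sharded checkpoint: no single weights file
# exceeds max_shard_size, and shards share the given base name.
model.save_quantized(
    "path/to/sharded-output",
    use_safetensors=True,
    max_shard_size="4GB",          # new in this PR; defaults to "10GB"
    model_base_name="gptq_model",  # new in this PR; defaults to None
)
```

Keeping `max_shard_size` below the Hub's 50 GB per-file limit (as noted in the conversation) is the main reason a shard-size cap is useful when uploading large quantized models.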