GPTQ for RWKV #98

Draft: wants to merge 20 commits into base: main
Conversation

3outeille

This is a work in progress and serves as the main thread for any questions related to this topic.

@3outeille
Author

3outeille commented Apr 19, 2023

@BlinkDL Do I have to quantize blocks.1.att.* as well? (I am thinking of the key, value, and receptance weights.)

@BlinkDL
Owner

BlinkDL commented Apr 20, 2023

@3outeille Yes, do it for all matrix weights (ignore time_xxx).
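The selection rule above (quantize every matrix weight, skip the time_xxx parameters) can be sketched as a simple name-and-shape filter. This is only an illustration: the helper `should_quantize` is hypothetical, and the parameter names and shapes other than the `blocks.1.att.*` key, value, and receptance weights are assumptions, not taken from the thread.

```python
def should_quantize(name, shape):
    """Return True for 2-D matrix weights, skipping any time_* parameter."""
    if "time_" in name:
        # time_xxx parameters are left in full precision, per the advice above.
        return False
    # Only matrices (2-D tensors) are candidates; 1-D vectors are left alone.
    return len(shape) == 2


# Hypothetical parameter names and shapes, for illustration only.
params = {
    "blocks.1.att.key.weight": (768, 768),
    "blocks.1.att.value.weight": (768, 768),
    "blocks.1.att.receptance.weight": (768, 768),
    "blocks.1.att.time_decay": (768,),
    "blocks.1.att.time_mix_k": (1, 1, 768),
    "blocks.1.ffn.key.weight": (3072, 768),
    "blocks.1.ln1.weight": (768,),
}

to_quantize = [n for n, s in params.items() if should_quantize(n, s)]
print(to_quantize)
```

Note that a shape-only rule would also select any 2-D embedding or output-head matrix; whether those should be quantized is not addressed in this thread.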

@3outeille
Author

@BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?

@BlinkDL
Owner

BlinkDL commented Apr 25, 2023

> @BlinkDL Do you happen to have a reference perplexity measure (or any other metric) I can use as a baseline?

Use the LAMBADA ppl from https://github.com/BlinkDL/ChatRWKV/blob/main/v2/benchmark.py
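For readers unfamiliar with the metric, perplexity is conventionally the exponential of the negative mean log-probability the model assigns to the target tokens. The sketch below shows only that generic formula. The function name `perplexity` and the sample values are assumptions, and the exact LAMBADA evaluation protocol should be taken from the linked benchmark.py rather than from this example.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp(-mean(natural-log probability of each target token))."""
    if not token_logprobs:
        raise ValueError("need at least one log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Made-up example: the model gives each target token probability 0.5,
# so the perplexity comes out close to 2.
logprobs = [math.log(0.5)] * 4
ppl = perplexity(logprobs)
print(round(ppl, 2))
```

Comparing this number before and after quantization, on the same data and with the same tokenization, gives the baseline comparison asked about above.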

@meditans

Question: would we expect a huge improvement wrt perplexity if we did quantization-aware training?

@3outeille
Author

3outeille commented Apr 27, 2023

@meditans QAT would probably yield a huge improvement, but it implies re-training your model, whereas GPTQ is a post-training quantization strategy (no re-training involved).
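To make "post-training" concrete, here is a minimal sketch of the simplest post-training scheme, round-to-nearest uniform quantization of one weight row. This is not GPTQ itself (GPTQ additionally adjusts the remaining weights to compensate for the quantization error). It only shows that the weights are mapped to a low-bit grid without any training step. The function name `quantize_rtn` and the sample values are made up for illustration.

```python
def quantize_rtn(weights, bits=4):
    """Round-to-nearest uniform quantization of one weight row (no retraining)."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits - 1                      # e.g. 15 steps for 4 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Integer codes in [0, levels]; these are what would be stored.
    codes = [round((w - lo) / scale) for w in weights]
    # Dequantized values, used in place of the original weights at inference.
    dequant = [lo + c * scale for c in codes]
    return codes, dequant, scale


row = [-0.8, -0.1, 0.0, 0.3, 0.7]               # made-up weights
codes, dequant, scale = quantize_rtn(row, bits=4)
max_err = max(abs(w - d) for w, d in zip(row, dequant))
print(codes, max_err <= scale / 2 + 1e-9)
```

Each weight moves by at most half a quantization step. QAT would instead simulate this rounding during training so the model learns to tolerate it.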

3outeille force-pushed the quantize branch 3 times, most recently from f4584b4 to 76d937b, on April 28, 2023 20:29

@BlinkDL
Owner

BlinkDL commented May 8, 2023

How's it going? :) Are you on Discord?

@3outeille
Author

Yep, I sent a message on Discord in the quantization channel.

@Evilran

Evilran commented May 19, 2023

Hi. Is it available now?

@3outeille
Author

@Evilran Hi, making it work with ChatRWKV is too much of a hassle because it requires too many changes to the RWKV class, so the PR will not be accepted. However, I made it work with the HuggingFace version of RWKV, if you want: https://github.com/3outeille/GPTQ-for-RWKV

4 participants