Q4_K Quantization Scheme adaptation #6760

Answered by ikawrakow
wilderfield asked this question in Q&A
If you want to use y = s * (q - z) where q and z are both int4, you are basically looking at something similar to Q4_0 quantization (it is exactly Q4_0 if z = 8). The whole point of Q4_K is that the offset from zero is stored with better precision than an int4 z can provide. If you still want to try with Q4_K, you need to scale the quants up (hopefully your hardware can operate efficiently on int8_t's). I.e.,

y = s * q - m = s * (q - m/s) = s/8 * (8*q - 8*m/s)
=> use float scale s' = s/8 
=> compute y = s' ((q << 3) - z), where z = round(8*m/s)

This will work most of the time, but you need to be careful with overflow of 8*m/s (the q's are in 0...15, so 8*q is in the allowed range of a signed 8-bit integer, so …
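
To make the rescaling above concrete, here is a minimal Python sketch. The function names (`dequant_float`, `rescale_for_int8`, `dequant_int8`) are made up for illustration and are not llama.cpp functions. It also assumes that z is meant to be stored as a signed 8-bit integer, so it only reports whether round(8*m/s) fits in -128...127; it does not attempt to handle the overflow case.

```python
def dequant_float(q, s, m):
    """Reference form from the derivation: y = s*q - m."""
    return s * q - m


def rescale_for_int8(s, m):
    """Return (s', z, fits) with s' = s/8 and z = round(8*m/s).

    `fits` says whether z is within the signed 8-bit range; storing z
    as int8 is an assumption of this sketch.
    """
    s_prime = s / 8.0
    z = round(8.0 * m / s)
    fits = -128 <= z <= 127
    return s_prime, z, fits


def dequant_int8(q, s_prime, z):
    """Rescaled form: y = s' * ((q << 3) - z), with q in 0...15."""
    return s_prime * ((q << 3) - z)


if __name__ == "__main__":
    s, m = 0.3, 1.0  # hypothetical block scale and offset
    s_prime, z, fits = rescale_for_int8(s, m)
    print("s' =", s_prime, "z =", z, "fits in int8:", fits)
    for q in range(16):
        err = abs(dequant_int8(q, s_prime, z) - dequant_float(q, s, m))
        print(q, err)
```

Since z is rounded to the nearest integer, the rescaled result should differ from s*q - m by at most s'/2 = s/16 whenever z fits. When s is small relative to m (for example s = 0.1, m = 2.0 gives 8*m/s = 160), `fits` comes back False, which is the overflow case to watch for.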

Answer selected by wilderfield