Significant drop in recall for int8 scalar quantization using maximum_inner_product #13350
Comments
@benwtrent Are you aware of this recall issue with IP using SQ int8?
@naveentatikonda interesting results for sure. I tested with max-inner-product & CohereV2 and didn't see a drop like this. I will try and replicate. The
@benwtrent Thanks for your response. I'm not exactly sure about the version of it. But, this is the dataset. Using the link I shared in the description, we can download a 1 million vector dataset of this in hdf5 file format.
@naveentatikonda using lucene-util with scalar quantization, I get a recall@100 of

I am calculating the recall by gathering the true 100 nearest neighbors from the test queries over the training docs using max-inner-product. Then I compare the overlap with the 100 nearest neighbors found via scalar quantized HNSW. What is your flush buffer size? Maybe I need to kick off some more merges. Do you force-merge to a single segment?
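For reference, a minimal sketch of computing recall@k as that overlap; this is not the lucene-util code, and `trueTopK` / `approxTopK` are hypothetical per-query arrays of the top-k doc ids.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RecallAtK {
  // Average fraction of each query's true top-k doc ids that also show up
  // in the approximate (e.g. quantized HNSW) top-k results.
  static double recallAtK(List<int[]> trueTopK, List<int[]> approxTopK, int k) {
    double total = 0;
    for (int i = 0; i < trueTopK.size(); i++) {
      Set<Integer> truth = new HashSet<>();
      for (int docId : trueTopK.get(i)) {
        truth.add(docId);
      }
      int hits = 0;
      for (int docId : approxTopK.get(i)) {
        if (truth.contains(docId)) {
          hits++;
        }
      }
      total += (double) hits / k;
    }
    return total / trueTopK.size();
  }
}
```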
This looks much better. I will try to set it up and reproduce with lucene-util.
Sorry, I'm not sure how to set the flush buffer size, and I don't think we control this parameter from OpenSearch.
Yes, I was actually using a single-node cluster with 8 shards, force merged to a single segment.
OK, I ran it again, on my index where the flush was set at 28MB & force merged. This time I ran it over all 10k queries (previously it was just 1k, as calculating the true nearest neighbors takes significant time). Recall@100 is a steady:
@benwtrent Sorry for the delay in my response. Can you share more details about the dataset you used to get this recall? Is it a subset of this Cohere-wikipedia-22-12-en-embeddings dataset, i.e. is the training data the first million vectors and the query data the next 10k vectors? If possible, can you please share your dataset and the ground truth you generated through GitHub or Hugging Face? Also, have you force merged to 1 segment before running the search queries?
@naveentatikonda I used the dataset you linked. I simply downloaded the file. Ground truth is just the brute-force nearest neighbors. I used the "test" set as the queries (10k of them) and "train" (1M) for the docs when computing the true NN. Yes, I force merged. I imagine if I didn't, recall would actually be higher.
@benwtrent I tried to set up luceneutil but I was running into a ton of compilation errors when building it against the latest Lucene source code. I have a doubt about the existing quantization process of quantizing a float value in a vector.
Here, to bring it into the signed int8 range of [-128, 127], we are casting it to byte. But this changes the sign and magnitude for values above 127 (up to 255): 128 will be cast to -128, 129 to -127, and so on. Is this a right way of quantizing? For space types like inner product, the sign matters when we are computing the distance and score.
I have added a simple unit test to show that, because of this type casting, even if we dequantize the quantized vector we don't get the original vector back. Also, the corrective offset calculation matches the mathematical derivation in the documentation. But for max inner product I have a simple example where the results get reranked because of this corrective offset.
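A minimal standalone sketch of the sign flip described above; this is not the unit test from the comment, just an illustration of how a plain `(byte)` cast wraps values above 127.

```java
public class ByteCastSignFlip {
  public static void main(String[] args) {
    // Values in [128, 255] wrap into [-128, -1] when cast to a signed byte,
    // so both the sign and the magnitude change.
    int[] quantizedValues = {126, 127, 128, 129, 255};
    for (int q : quantizedValues) {
      byte b = (byte) q;
      System.out.println(q + " -> " + b); // 128 -> -128, 129 -> -127, 255 -> -1
    }
  }
}
```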
Scalar quantization in Lucene is by default 7 bits, meaning the range of values is actually 0-127.
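A simplified sketch of that 7-bit mapping; the `min`/`max` here are assumed stand-ins for the quantiles Lucene actually derives, so this is not the exact implementation.

```java
public class SevenBitQuantization {
  public static void main(String[] args) {
    float min = -6f, max = 6f;             // assumed quantiles, for illustration only
    float scale = 127f / (max - min);      // 7 bits -> quantized range [0, 127]
    float value = 3.0f;

    int quantized = Math.round((value - min) * scale);  // 95
    float dequantized = min + quantized / scale;         // ~2.98, close but not exact

    System.out.println(quantized + " " + dequantized);
  }
}
```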
For 7 bits, it makes sense. But the example I was referring to is for 8 bits.
@benwtrent To be on the same page, can you please confirm whether you used 7 bits or 8 bits in the experiment you ran above with the Cohere dataset using inner product to get that recall@100?
Thank you @naveentatikonda for the deep dive here and a nice unit test ... I couldn't follow all of the logic you described, but if we are indeed first normalizing a dimension's value
I used int7 for my experiments. While losing one bit of precision isn't the best, it works well. I explored adding an unsigned byte dot product, but that got rejected as too much code. I think for int8 to support all vector similarities, we need an unsigned dot product. But if we support signed int8, we should restrict it to [-127, 127], as there can be nice performance benefits if we can assume these ranges on various hardware. I haven't done the math on Euclidean to figure out if we need an unsigned byte version of that as well. I am out on vacation, but here is my old PR:

Maybe it's as simple as forcing int7 when the similarity used is max inner product and allowing signed int8 for everything else?
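For context, a hedged sketch of how a signed vs. an unsigned byte dot product would differ; this is not the code from the PR mentioned above, just the widening an unsigned variant would need.

```java
public class ByteDotProducts {
  public static void main(String[] args) {
    byte[] a = {(byte) 200, 10};   // 200 wraps to -56 when read as a signed byte
    byte[] b = {(byte) 150, 20};   // 150 wraps to -106
    System.out.println(signedDot(a, b) + " vs " + unsignedDot(a, b));
  }

  // Plain signed-byte dot product (what a[i] * b[i] gives in Java).
  static int signedDot(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Unsigned variant: widen each byte into [0, 255] before multiplying.
  static int unsignedDot(byte[] a, byte[] b) {
    int sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += Byte.toUnsignedInt(a[i]) * Byte.toUnsignedInt(b[i]);
    }
    return sum;
  }
}
```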
I am trying to understand one thing: does the corrective offset for dot product rectify issues with the sign shift that is caused by going from the signed domain [-x, +y] to the unsigned domain [0, 127]? Or is this handled elsewhere? For an overly simplified example, for a data set with:

```
# Query vectors
query_vector = [-5]
quantized_query_vector = [11]

# Index vectors
## Full precision
1: [-6]
2: [-2]
3: [3]
4: [6]

## Quantized
1: [0]
2: [42]
3: [95]
4: [127]

## Full precision ordering of query_vector . index_vectors:
1,2,3,4

## Quantized ordering of quantized_query_vector . quantized_index_vectors:
4,3,2,1
```

I think I might be missing something, but how is this accounted for?
@jmazanec15 this is accounted for in the corrections. Moving from signed to unsigned is still just a linear transformation; we are not manually flipping signs, but instead doing a full linear scale. Assuming your vectors are in the order that you provided:

Their calculated dot-product corrections are

For query vector

The overall quantile multiplier is

To calculate the corrected dot-product score

The raw dot-products (in order of the vectors)

The quantized dot-products with corrections

Of course, the quantized scores without corrections (thus not accounting for the linear shift and breaking max-inner-product score scaling)
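A hedged sketch of that correction applied to the 1-d example above, assuming min = -6 and max = 6 as the quantiles and using the dequantization identity dot(x̂, ŷ) = α²·qx·qy + min·α·(qx + qy) + min²; this is not the exact Lucene code. With the correction, the ordering matches full precision (1,2,3,4), while the raw quantized products give 4,3,2,1.

```java
public class CorrectedDotProduct {
  public static void main(String[] args) {
    float min = -6f, max = 6f;            // assumed quantiles for the toy data set
    float alpha = (max - min) / 127f;     // width of one quantization step

    int qQuery = 11;                      // quantized form of [-5]
    int[] qDocs = {0, 42, 95, 127};       // quantized forms of [-6], [-2], [3], [6]

    for (int i = 0; i < qDocs.length; i++) {
      int raw = qQuery * qDocs[i];        // uncorrected quantized dot product
      // Expanding (min + qx*alpha) * (min + qy*alpha) gives the corrected estimate:
      float corrected = alpha * alpha * raw + min * alpha * (qQuery + qDocs[i]) + min * min;
      System.out.println("doc " + (i + 1) + ": raw=" + raw + " corrected=" + corrected);
    }
    // corrected: ~29.8, ~10.1, ~-14.8, ~-29.8 -> ordering 1,2,3,4 (matches full precision)
    // raw:        0, 462, 1045, 1397          -> ordering 4,3,2,1 (wrong order from sign shift)
  }
}
```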
Description
While running some benchmarking tests with opensearch-benchmark on int8 scalar quantization using some of the standard datasets, I observed a significant drop in recall with the max inner product space type compared with other space types.
Here are some of those results
The cohere-768-IP dataset can be downloaded from this link. The L2 version of the Cohere dataset is generated from the same dataset by recomputing the ground truth. All other datasets are downloaded from here.
Version and environment details
No response