Computing vector distance functions (like vss_inner_product) incorrectly parses the data #122

Open
zen0wu opened this issue Feb 17, 2024 · 1 comment

zen0wu commented Feb 17, 2024

I'm dumping raw bytes into a vector column, but when sqlite-vss parses the BLOB into a vector, it checks whether the blob starts with v\x01 and, if so, treats those two bytes as a header.

The problem is that some of my vectors actually do start with these two bytes (as part of the data), so they are no longer parsed correctly.

if (header != VECTOR_BLOB_HEADER_BYTE) {
    *pzErrMsg = "Blob not well-formatted vector blob";
    return nullptr;
}
if (type != VECTOR_BLOB_HEADER_TYPE) {
    *pzErrMsg = "Blob type not right";
    return nullptr;
}

One thing I can do is prepend the header to every row, but that feels like a really bad solution, and it would be great if we could fix this.
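
To illustrate the collision, here is a minimal standalone sketch (not code from sqlite-vss). It assumes the header bytes are 0x76 ('v') and 0x01, as described in this issue; the names kHeaderByte, kHeaderType, looks_like_header, to_raw_bytes, float_from_bytes, and demo_collision are hypothetical. It shows that a perfectly ordinary float32 vector can serialize to bytes that begin with those two values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed header values, taken from the issue text ("v\x01"),
// not read from the sqlite-vss source.
constexpr std::uint8_t kHeaderByte = 0x76;  // 'v'
constexpr std::uint8_t kHeaderType = 0x01;

// Serialize floats as raw bytes: 4 bytes per float, native byte order.
std::vector<std::uint8_t> to_raw_bytes(const std::vector<float>& v) {
    std::vector<std::uint8_t> out(v.size() * sizeof(float));
    if (!out.empty()) {
        std::memcpy(out.data(), v.data(), out.size());
    }
    return out;
}

// Hypothetical stand-in for a "does this blob start with the header?" check.
bool looks_like_header(const std::vector<std::uint8_t>& blob) {
    return blob.size() >= 2 && blob[0] == kHeaderByte && blob[1] == kHeaderType;
}

// Build a float whose in-memory bytes are exactly b0..b3.
float float_from_bytes(std::uint8_t b0, std::uint8_t b1,
                       std::uint8_t b2, std::uint8_t b3) {
    const std::uint8_t bytes[4] = {b0, b1, b2, b3};
    float f;
    std::memcpy(&f, bytes, sizeof f);
    return f;
}

// Returns true when a plain two-element vector, serialized raw,
// would be mistaken for a headered blob.
bool demo_collision() {
    // On a little-endian machine these bytes encode a value close to 0.5.
    const float first = float_from_bytes(0x76, 0x01, 0x00, 0x3F);
    const std::vector<std::uint8_t> raw = to_raw_bytes({first, 1.0f});
    assert(raw.size() == 2 * sizeof(float));
    return looks_like_header(raw);
}
```

Calling demo_collision() from a main function returns true: the first two data bytes are indistinguishable from the assumed header, so any parser that sniffs for it has no way to tell the two cases apart.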

asg017 (Owner) commented Feb 17, 2024

Try vector_from_raw() instead. The vector_from_blob() function was a poor attempt at a new vector format, but it made things more complex for no reason (as you saw). vector_from_raw(), on the other hand, should handle raw blobs correctly (i.e. vectors stored as 4 bytes per float).

I'll likely deprecate vector_from_blob() in the next release.
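
For readers wondering what "raw" decoding amounts to, the sketch below is one way to interpret a blob as packed float32 values with no header at all. It is an illustration under that assumption, not the actual vector_from_raw() implementation; decode_raw is a hypothetical helper, and the only validation it does is checking that the length is a multiple of 4.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");

// Hypothetical helper: treat every byte of the blob as float data.
// Returns false if the blob length is not a whole number of floats.
bool decode_raw(const std::vector<std::uint8_t>& blob, std::vector<float>& out) {
    if (blob.size() % sizeof(float) != 0) {
        return false;
    }
    out.resize(blob.size() / sizeof(float));
    if (!blob.empty()) {
        // No header sniffing: the leading bytes are copied like any others.
        std::memcpy(out.data(), blob.data(), blob.size());
    }
    assert(out.size() * sizeof(float) == blob.size());
    return true;
}
```

Because nothing is stripped or inspected, a blob that happens to start with 0x76 0x01 decodes to the same number of floats as any other blob of that length.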
