Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass custom similarity function to similarityToQueryVector API #13187

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

shubhamvishu
Copy link
Contributor

Description

This PR allows passing a custom vector similarity function to DVS implementations of VectorSimilarityValuesSource as opposed to current behaviour which by default only allows to use the configured VectorSimilarityFunction associated with the KNN/vector field.

Tests : Added some unit tests

@msokolov
Copy link
Contributor

There is some discussion how to make similarities more pluggable #13182 that seems relevant. Part of the idea there is to accept ordinal values rather than arrays so that internally we can make use of directory memory access rather than having to materialize values on the heap. This issue could still fit into that framework I think, just providing a query vector but not a document vector? I guess we should think about whether this change would lock us in further to the current situation or not. Maybe we could go ahead with this and then change it later when that other issue lands? But it could make it more difficult to "plug in" implementations if they have to be reimplemented

@shubhamvishu
Copy link
Contributor Author

Thanks for the review @msokolov! The idea to make it pluggable seems relevant and interesting. Currently it was not possible to use any custom vector similarity function other than the configured one which for instance in case of DOT_PRODUCT similarity gives the scaled score and not the actual raw dot product. So by that logic I think this change would be good.

I guess we should think about whether this change would lock us in further to the current situation or not. Maybe we could go ahead with this and then change it later when that other issue lands? But it could make it more difficult to "plug in" implementations if they have to be reimplemented

Yes, we should definitely take that into account. Though I'm not sure if this change conflicts with or makes things difficult for the ongoing efforts to have pluggability (maybe @benwtrent would be interested in sharing his thoughts?).

@benwtrent
Copy link
Member

Though I'm not sure if this change conflicts with or makes things difficult for the ongoing efforts to have pluggability (maybe @benwtrent would be interested in sharing his thoughts?).

I would assume any change here would want to take advantage of any pluggable similarity. If so, this change would 100% be effected.

What makes this PR doubly worrying is that none of the interfaces are marked experimental. Meaning, if its released and we want to change it, we gotta be compatible, that doesn't seem wise.

@shubhamvishu
Copy link
Contributor Author

@benwtrent So should we instead wait for the pluggability support and discard this for now? or Is it possible to go forward with this?

What makes this PR doubly worrying is that none of the interfaces are marked experimental. Meaning, if its released and we want to change it, we gotta be compatible, that doesn't seem wise.

I'm not sure if we can mark them experimental at this point but if yes, maybe we could first mark these classes as experimental so that we don't have to worry about backward compatibility later?

Copy link

github-actions bot commented Apr 2, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 2, 2024
@benwtrent
Copy link
Member

I am sorry this work has stalled. But I have been iterating on #13200 for a week now. Its getting to a palatable place.

@github-actions github-actions bot removed the Stale label Apr 3, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants