Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure we return non-negative scores when scoring scalar dot-products #108522

Merged

Conversation

benwtrent
Copy link
Member

closes: #108339

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label May 10, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @benwtrent, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@benwtrent benwtrent added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 10, 2024
@benwtrent
Copy link
Member Author

@elasticmachine update branch

@ChrisHegarty
Copy link
Contributor

The scorer should be doing the same as that of the lucene one. Why is this necessary? Also, should the lucene one do same (in lucene itself)?

@ChrisHegarty
Copy link
Contributor

Since the individual vector values are positive, then is the negativity coming from the offsets or correction values? I think our tests should be updated to cover this.

@benwtrent
Copy link
Member Author

then is the negativity coming from the offsets or correction values? I think our tests should be updated to cover this.

This is being fixed because of a failing test. I could add a manual one where vectors are 0 & offset value is negative.

I also have a PR open in Lucene to correct this there. But, here, we have our own scorer implementation that needs handling.

One thing I should do here is do an assert >=0 on the scalar dot product to ensure we aren't allowing some sneaky bugs throught.

@benwtrent
Copy link
Member Author

@ChrisHegarty added a test & assertion that our dot product from the native impl is >=0

@benwtrent benwtrent removed the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 13, 2024
assertThat(scorer.score(0, 1), equalTo(expected));
assertThat((new VectorScorerSupplierAdapter(scorer)).scorer(0).score(1), equalTo(expected));
// cosine
expected = 0f; // TODO fix in Lucene: https://github.com/apache/lucene/pull/13356 luceneScore(COSINE, vec1, vec2, 1, -5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ha. I see this now. Thanks

@ChrisHegarty
Copy link
Contributor

I could add a manual one where vectors are 0 & offset value is negative.

Thanks, this is helpful.

I also have a PR open in Lucene to correct this there. But, here, we have our own scorer implementation that needs handling.

Ok cool. The point is that our scorer implementation should produce the same result as that of the Lucene implementation. I now see the separate Lucene PR, thanks.

One thing I should do here is do an assert >=0 on the scalar dot product to ensure we aren't allowing some sneaky bugs throught.

Yeah. Since asserts bloat the bytecode, an alternative is to put the assert in the tests that are closer to the implementation, so in JDKVectorLibraryTests.

@benwtrent benwtrent added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 13, 2024
@benwtrent
Copy link
Member Author

@elasticmachine update branch

@elasticsearchmachine elasticsearchmachine merged commit e352345 into elastic:main May 13, 2024
15 checks passed
@benwtrent benwtrent deleted the ensure-positive-scores-dot-product branch May 13, 2024 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Search/Vectors Vector search Team:Search Meta label for search team v8.15.0
Projects
None yet
5 participants