Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve LongRange execution for ranges that have min value as Long.MIN_VALUE and max value as Long.MAX_VALUE or min as 0, and max as 2^64-1 #13375

Open
gautamworah96 opened this issue May 16, 2024 · 7 comments

Comments

@gautamworah96
Copy link
Contributor

Description

At Amazon Search, we came across a lot of client queries that were specifying the min value as 0, and the max value as Long.MAX_VALUE.

We don't use the associated ValueSourceQuery query internally, but use a custom logic for filtering.
An improvement here would be to rewrite ValueSourceQuery to a MatchAllDocsQuery if we detect that the min and max value are the same as the min and max value of Long.

I think we will have to check for something like (minVal==0 and maxVal==2^64-1) or (minVal==Long.MIN_VALUE and maxVal==Long.MAX_VALUE) since we will have to cover both signed and unsigned cases.

We can implement similar logic for DoubleRange as well.

@timgrein
Copy link
Contributor

timgrein commented May 16, 2024

That sounds interesting, I can take a look at DoubleRange first and do so some benchmarking, shouldn't be too hard to expand it then to the other data types, if it shows promising results.

@gautamworah96
Copy link
Contributor Author

I can take a look at DoubleRange first and do so some benchmarking

I suspect that existing benchmarks won't show needle moving results if they don't have enough queries that can be optimized through this opto. The change is still a positive one tho. We may need to benchmark it on a query set that specifically has a lot of these "full range from 0->Long.MAX_VALUE" clauses.

@timgrein
Copy link
Contributor

timgrein commented May 17, 2024

I suspect that existing benchmarks won't show needle moving results if they don't have enough queries that can be optimized through this opto.

Yes, I'm gathering some specific numbers for this exact case. I'll also suspect that this rewriting should've a purely positive effect, especially considering that you've a max of 4 dimensions for FloatRange, DoubleRange etc., which should make detecting a query containing such a range very fast.

@msokolov
Copy link
Contributor

Is it valid to rewrite to match all docs though? Some docs may lack a value, and I didn't think we would match in that case.

@timgrein
Copy link
Contributor

timgrein commented May 19, 2024

I took a first pass at this and also noticed that MatchAllDocsQuery is semantically wrong as some docs might not have the field (also caught by the existing test cases). What I think could work is to check, whether the valueSource is of type FieldValuesSource and rewrite the query to a FieldExistsQuery if we encounter the maximum possible range (Long.MIN_VALUE and Long.MAX_VALUE) at the same time.

WDYT @msokolov ?

@msokolov
Copy link
Contributor

+1, that should work.

@timgrein
Copy link
Contributor

@msokolov Opened a PR for LongRange, if you want to take a look (I cannot request a reviewer...): #13383

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants