-
Notifications
You must be signed in to change notification settings - Fork 958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binary search all terms. #13192
base: main
Are you sure you want to change the base?
Binary search all terms. #13192
Conversation
Implemented binary search term non leaf ( |
@jpountz @mikemccand |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @vsop-479 -- it seems to get quite a bit more complex to add binary search to non-leaf blocks, and to non-allEqual
blocks!
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java
Outdated
Show resolved
Hide resolved
suffix = suffixLengthsReader.readVInt(); | ||
} else { | ||
// Handle subCode for non leaf block. | ||
postions = new int[entCount]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This many new allocations, at such a low level / hot spot (on each block load) seems risky / performance hurting?
I wonder how often we see allEqual
for non-leaf blocks when the field is indeed fixed-length terms? It might happen often, depending on how evenly distributed the tokens are across term space? A random UUID should be quite even, at least on initial indexing, but a predictable ID (just 0-padded incrementing int, like luceneutil) might be less so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, maybe it is less worth to implement this binary search to non-leaf blocks (both allEqual
and non-allEqual
?).
Can we just take this change to non-allEqual
leaf blocks? But it still need to allocate suffixes
and offsets
( assume We don't need positions
to set suffixLengthsReader
's position after searching a block).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assume We don't need positions to set suffixLengthsReader's position after searching a block.
Setting suffixLengthsReader
's position is necessary, so we need positions
.
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution! |
Base on #11888 (binary search
allEqual
term leaf), this change implemented binary search for all term leaf (allEqual
andunEqual
).I am still trying to do the same thing to
NonLeaf
term.