Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binary search all terms. #13192

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

vsop-479
Copy link
Contributor

@vsop-479 vsop-479 commented Mar 20, 2024

Base on #11888 (binary search allEqual term leaf), this change implemented binary search for all term leaf (allEqual and unEqual).
I am still trying to do the same thing to NonLeaf term.

@vsop-479
Copy link
Contributor Author

Implemented binary search term non leaf (allEqual and unEqual).

@vsop-479
Copy link
Contributor Author

@jpountz @mikemccand
Please take a look when you get a chance.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vsop-479 -- it seems to get quite a bit more complex to add binary search to non-leaf blocks, and to non-allEqual blocks!

suffix = suffixLengthsReader.readVInt();
} else {
// Handle subCode for non leaf block.
postions = new int[entCount];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This many new allocations, at such a low level / hot spot (on each block load) seems risky / performance hurting?

I wonder how often we see allEqual for non-leaf blocks when the field is indeed fixed-length terms? It might happen often, depending on how evenly distributed the tokens are across term space? A random UUID should be quite even, at least on initial indexing, but a predictable ID (just 0-padded incrementing int, like luceneutil) might be less so?

Copy link
Contributor Author

@vsop-479 vsop-479 Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe it is less worth to implement this binary search to non-leaf blocks (both allEqual and non-allEqual ?).

Can we just take this change to non-allEqual leaf blocks? But it still need to allocate suffixes and offsets ( assume We don't need positions to set suffixLengthsReader's position after searching a block).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assume We don't need positions to set suffixLengthsReader's position after searching a block.

Setting suffixLengthsReader's position is necessary, so we need positions.

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 12, 2024
@github-actions github-actions bot removed the Stale label Apr 27, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label May 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants