Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid SegmentTermsEnumFrame reload block. #13253

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

vsop-479
Copy link
Contributor

In SegmentTermsEnum.seekExact and SegmentTermsEnumFrame.seekCeil, if we seek a less target after seeking a greater target, we set nextEnt to -1 in SegmentTermsEnumFrame.rewind. Which result in Force reload a loaded block.

I think we could just set nextEnt to 0 to let seek from start, and temporarily set entCount to last nextEnt to reduce seeking range, and reset suffixesReader and suffixLengthsReader 's position. But don't need to reload the block again.

@vsop-479 vsop-479 marked this pull request as draft March 30, 2024 07:46
@vsop-479
Copy link
Contributor Author

vsop-479 commented Mar 30, 2024

Edit: Need more consider about multi floor block.

Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea @vsop-479! It would speed up cases where the caller seeks backwards a bit, remaining in the current frame.

@vsop-479 vsop-479 marked this pull request as ready for review April 12, 2024 09:22
@vsop-479
Copy link
Contributor Author

@mikemccand
Please take a look when you get a chance.

@mikemccand
Copy link
Member

This might be a needle-moving optimization for apps that reuse a single TermsEnum and seek randomly to terms, right? Because all up and down the stack of SegmentTermsEnumFrames we can rewindWithoutReload for many/most of them?

@vsop-479
Copy link
Contributor Author

This might be a needle-moving optimization for apps that reuse a single TermsEnum and seek randomly to terms, right?

Yes.

Because all up and down the stack of SegmentTermsEnumFrames we can rewindWithoutReload for many/most of them?

I just implemented rewindWithoutReload for non-floor block and first floor block, in current version. Since target may exists in more upper floor when it less than last term, but lastFrame is non-first floor.

Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label May 10, 2024
@mikemccand
Copy link
Member

Thanks @vsop-479 I will try to re-engage here soon!

@github-actions github-actions bot removed the Stale label May 11, 2024
Copy link
Member

@mikemccand mikemccand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for being patient @vsop-479 -- I keep trying to review/understand this change and keep struggling :)

I left some superficial-ish comments.


// We got lastFrame by comparing target and term, and target less than last seeked term in
// currentFrame. If lastFrame's fp is same with currentFrame's fp, we can reduce entCount to
// nextEnt.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change to we can rewind without reloading? We don't seem to reduce entCount anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't seem to reduce entCount anymore?

We can reduce entCount to nextEnt if we finally seek the same block.
This has been already implemented in:


// We still stay on withOutReload frame, reduce entCount to nextEnt.
        int origEntCount = -1;
        if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
          origEntCount = currentFrame.entCount;
          currentFrame.entCount = origNextEnt;
        }

// Only rewindWithoutReload for non-floor block or first floor block.
// TODO: We need currentFrame's first entry to judge whether we can rewindWithoutReload for
// non-first floor blocks.
if (currentFrame.fp != currentFrame.fpOrig
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add comment for each of these reasons to still reload the frame?

E.g.:

currentFrame.fp != currentFrame.fpOrig // this is a floor multi-block and not the first one`` currentFrame.entCount == 0 // why does this happen?`

etc.?

I don't understand when/why the 2nd two conditions occur.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currentFrame.entCount == 0 // why does this happen?`

We push frame to stack, and load it when we need to seek it. so we may get a unloaded frame from stack.

Copy link
Contributor Author

@vsop-479 vsop-479 May 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe currentFrame.nextEnt == -1 is enough for both unloaded block, I will confirm that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add comment for each of these reasons to still reload the frame?

Added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe currentFrame.nextEnt == -1 is enough for both unloaded block, I will confirm that.

We may meet a block nextEnt is -1 and entCount is not 0( set by SegmentTermsEnumFrame.scanToFloorFrame), but we should not meet a block entCount is 0 and nextEnt is not -1.

So currentFrame.nextEnt == -1 is enough for unloaded block.

currentFrame.rewind();
} else {
// Prepare to reduce entCount.
if (currentIsLast && currentFrame.isLeafBlock) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused why there are two cases under the rewindWithoutReload case. One case you can roll back entCount temporarily, because you know the target term is before the current term? And in the 2nd case, you don't know that, but you do know the target term is in this current block? But then why are we even rewinding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case you can roll back entCount temporarily, because you know the target term is before the current term?

Yes.

but you do know the target term is in this current block? But then why are we even rewinding?

We get lastFrame from stack[0] firstly, and compare target to last seek'd term to update lastFrame to reuse seek state, and assign it to currentFrame. This currentFrame only got by target and last seek'd term 's common prefix, we still need to continue walking the index to seek target term's block.

I think we need rewinding to set frame's state (nextEnt, entCount, etc.) to prepare for seeking. Unlike rewind, rewindWithoutReload can keep loaded block's data (same FP).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewindWithoutReload is called when currentFrame's is loaded and fp equals fpOrig.
Doing this can avoid reload a loaded block, when we finally need seek it, and it is still in stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants