
Disjunction as CompetitiveIterator for numeric dynamic pruning #13221

Merged · 15 commits · May 20, 2024

Conversation

@gf2121 gf2121 commented Mar 26, 2024

This PR proposes a new way to do numeric dynamic pruning, with the following changes:

  • Instead of complex sampling and estimating point counts to judge whether to build the competitive iterator, this patch proposes to find the threshold value. That is, we find the value that is 'N docs away from' the top value, taking advantage of the fact that the top value should be final in LeafComparators, as the following picture shows.

[image: illustration of the threshold value 'N docs away from' the top value]

  • Instead of building and rebuilding the competitive iterator when the bottom value gets more competitive, this patch proposes to build the competitive iterator as a disjunction of small DISIs. Each small DISI maintains its most competitive value and gets discarded when that value is no longer competitive, like what we did in TermOrdValComparator. This helps us intersect the tree only once and update the competitive iterator more frequently.

  • For simplification, I tweaked the bytes into comparable long values, e.g. maxValueAsBytes -> maxValueAsLong.
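A toy sketch of the disjunction idea described above (illustrative only; `ToyCompetitiveDisjunction` and `Clause` are made-up names, not Lucene's actual classes): each clause remembers the most competitive value among its docs, and when the bottom of the queue improves, whole clauses are discarded instead of rebuilding the iterator.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: the competitive iterator is a disjunction of small clauses,
// each tracking the most competitive value among its docs.
class ToyCompetitiveDisjunction {
  static class Clause {
    final int[] docs;                // sorted doc IDs in this block
    final long mostCompetitiveValue; // best value any doc in the block holds
    Clause(int[] docs, long mostCompetitiveValue) {
      this.docs = docs;
      this.mostCompetitiveValue = mostCompetitiveValue;
    }
  }

  final List<Clause> clauses = new ArrayList<>();

  // For an ascending sort, a clause stays only if its best value can still
  // beat the current bottom; otherwise the whole clause is dropped.
  void updateBottom(long bottomValue) {
    clauses.removeIf(c -> c.mostCompetitiveValue >= bottomValue);
  }

  long cost() {
    long cost = 0;
    for (Clause c : clauses) cost += c.docs.length;
    return cost;
  }
}
```

Because clauses only ever get removed, the BKD tree is intersected once up front, and tightening the bottom value is a cheap list-filtering operation.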

Benchmark

Here is a result based on wikimedium10m (the baseline contains #13199):

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      176.84      (5.0%)      303.31      (6.3%)   71.5% (  57% -   87%) 0.000
           HighTermDayOfYearSort      454.72      (3.2%)      791.09      (7.9%)   74.0% (  60% -   87%) 0.000

- private static long estimatePointCount(
+ public static long estimatePointCount(
gf2121 (author):
I had to make this public again to get an accurate point count size in intersectThresholdValue.

@jpountz jpountz left a comment

This is a great idea. So when we get a more competitive bottom value, we can simply subtract from our existing competitive iterator instead of recomputing this competitive iterator entirely. A benefit of this approach is that we no longer have to wait for the estimated point count to decrease by 8x before updating the competitive iterator, we can do it more often and save evaluating more docs.

A downside is that this approach may be less memory-efficient, since we store competitive docs as integers, never as a bit set like today. But we may be able to work around it by just starting dynamic pruning a bit later, when we have fewer docs to store.

It's a bit surprising to me that we are configuring the disjunction by the number of docs on each clause (MIN_BLOCK_DISI_LENGTH) as this could create disjunctions with many many clauses (which is something I don't like about TermOrdValComparator). Maybe we could configure it via a target number of clauses instead, e.g. something like min(128, leadCost / 512)?

I wonder how this approach compares with the current one in the worst-case scenario when documents get collected in competitive order, e.g. increasing order for a descending sort. I don't have a good intuition for whether it would perform better or worse, do you know?
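The clause-count suggestion above could be sketched as follows (a sketch under the reviewer's numbers; `targetClauseCount` and `blockLength` are hypothetical names, not Lucene's API):

```java
class ClauseSizing {
  // Target clause count per the reviewer's sketch: min(128, leadCost / 512),
  // floored at 1 so there is always at least one clause.
  static int targetClauseCount(long leadCost) {
    return (int) Math.min(128L, Math.max(1L, leadCost / 512));
  }

  // The per-clause block length then follows from how many docs we store,
  // so the disjunction never explodes into thousands of clauses.
  static int blockLength(long docsToStore, long leadCost) {
    return (int) Math.max(1L, docsToStore / targetClauseCount(leadCost));
  }
}
```

Capping the clause count (rather than the per-clause doc count) trades slightly coarser pruning granularity for a bounded `advance` cost on the disjunction.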

@gf2121 commented Mar 27, 2024

> A downside is that this approach may be less memory-efficient, since we store competitive docs as integers, never as a bit set like today. But we may be able to work around it by just starting dynamic pruning a bit later, when we have fewer docs to store.

+1 for the memory concern. I'd limit the doc count threshold to maxDoc >> 5 so that it won't allocate more than maxDoc bits of memory.
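A quick arithmetic check of that bound (the `maxStoredBits` helper is hypothetical, just to show the accounting): storing at most maxDoc >> 5 doc IDs as 32-bit ints uses at most maxDoc bits, no more than a full bit set over the segment would.

```java
class MemoryBound {
  // Each stored doc ID is a 32-bit int, so (maxDoc / 32) ints occupy
  // at most maxDoc bits of memory.
  static long maxStoredBits(int maxDoc) {
    long maxStoredDocs = maxDoc >> 5;     // the cap from the comment above
    return maxStoredDocs * Integer.SIZE;  // 32 bits per stored doc ID
  }
}
```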

> It's a bit surprising to me that we are configuring the disjunction by the number of docs on each clause (MIN_BLOCK_DISI_LENGTH) as this could create disjunctions with many many clauses (which is something I don't like about TermOrdValComparator). Maybe we could configure it via a target number of clauses instead, e.g. something like min(128, leadCost / 512)?

More disjunction clauses mean we can update the competitive iterator more often, but they also introduce more overhead in CompetitiveIterator#advance. I like the idea of configuring by the number of disjunction clauses; will do.

> I wonder how this approach compares with the current one in the worst-case scenario when documents get collected in competitive order, e.g. increasing order for a descending sort. I don't have a good intuition for whether it would perform better or worse, do you know?

In the worst case we cannot skip any docs in the first segment, so I think the overhead can be considered from the following perspectives:

  • Update-bottom overhead: the current logic needs to estimate the point count over several rounds, while this patch computes the threshold value once. This patch wins.

  • CompetitiveIterator iteration overhead: the current CompetitiveIterator builds an array/bitset iterator, so it has a cheap advance, while this patch builds a disjunction of small arrays. Main wins.

  • CompetitiveIterator building overhead:

    • For low/medium-cardinality fields, this patch can win because we are building small DISIs, so we can skip the sort for a leaf block when its docs are already in order.
    • For high-cardinality fields, if the current logic kicks in the bitset case, main wins because building a bitset can be cheaper than sorting.
    • For high-cardinality fields, if the current logic does not kick in the bitset case, they are equal since we are using an O(n) LSBRadixSorter to sort these docs.

In general, for the worst case, I think this patch may speed up medium-cardinality fields but slow down high-cardinality fields that kick in the bitset case. I can play with luceneutil to verify this guess.
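The O(n) sort referred to above is a least-significant-byte-first radix sort; a minimal sketch of the technique over non-negative ints (simplified, not Lucene's LSBRadixSorter):

```java
class LsbRadixSort {
  // Sort non-negative ints with four stable counting-sort passes,
  // one per byte, from least to most significant.
  static void sort(int[] a) {
    int[] buf = new int[a.length];
    for (int shift = 0; shift < 32; shift += 8) {
      int[] counts = new int[257];
      // histogram of this byte, offset by one so prefix sums give starts
      for (int v : a) counts[((v >>> shift) & 0xFF) + 1]++;
      for (int i = 1; i < counts.length; i++) counts[i] += counts[i - 1];
      // stable scatter into the buffer, then copy back for the next pass
      for (int v : a) buf[counts[(v >>> shift) & 0xFF]++] = v;
      System.arraycopy(buf, 0, a, 0, a.length);
    }
  }
}
```

Each pass is a linear scan, so total work is O(n) with a small constant, which is why sorting doc IDs this way can compete with building a bitset.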

@gf2121 commented Mar 27, 2024

I separately built two wikimedium10m indices, force-merged and reverse-sorted by dayOfYear/lastMod; the results follow.

wikimedium10m.lucene_baseline.Lucene99.dvfields.sort=dayOfYear:int.nd10M

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
           HighTermDayOfYearSort      493.06      (1.3%)     1252.27      (4.5%)  154.0% ( 146% -  162%) 0.000

wikimedium10m.lucene_baseline.Lucene99.dvfields.sort=lastMod:long.nd10M

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      369.84      (3.1%)      357.03      (3.7%)   -3.5% (  -9% -    3%) 0.001

@gf2121 commented Mar 27, 2024

I ran 10 rounds on a wikimediumall index (normal index, no index sorting / force merge). Results look positive in general, but we do see a slight regression for the high-cardinality field in several rounds (3/10).

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      377.08      (4.3%)      364.56      (5.6%)   -3.3% ( -12% -    6%) 0.036
           HighTermDayOfYearSort      620.13      (4.1%)      783.98      (7.8%)   26.4% (  13% -   39%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      334.03      (8.0%)      357.86      (5.5%)    7.1% (  -5% -   22%) 0.001
           HighTermDayOfYearSort      565.44      (6.3%)     1043.55      (6.3%)   84.6% (  67% -  103%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      434.01      (4.1%)      491.14      (4.3%)   13.2% (   4% -   22%) 0.000
           HighTermDayOfYearSort      606.48      (3.6%)      914.42      (3.3%)   50.8% (  42% -   59%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      467.90      (5.1%)      498.69      (4.5%)    6.6% (  -2% -   17%) 0.000
           HighTermDayOfYearSort      784.73      (5.2%)     1015.12      (2.6%)   29.4% (  20% -   39%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      347.23      (4.3%)      319.38      (1.9%)   -8.0% ( -13% -   -1%) 0.000
           HighTermDayOfYearSort      703.03      (3.5%)      962.18      (1.9%)   36.9% (  30% -   43%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      348.83      (6.9%)      380.95      (5.0%)    9.2% (  -2% -   22%) 0.000
           HighTermDayOfYearSort      474.03      (5.4%)     1071.19      (6.0%)  126.0% ( 108% -  145%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      364.28      (7.6%)      350.49      (7.5%)   -3.8% ( -17% -   12%) 0.111
           HighTermDayOfYearSort      761.71      (5.8%)     1090.74      (9.0%)   43.2% (  26% -   61%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      351.76      (8.7%)      381.25     (10.4%)    8.4% (  -9% -   30%) 0.006
           HighTermDayOfYearSort      599.32      (6.5%)      791.83      (9.2%)   32.1% (  15% -   51%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      313.74      (4.3%)      319.89      (4.7%)    2.0% (  -6% -   11%) 0.168
           HighTermDayOfYearSort      642.69      (5.6%)      836.24      (5.0%)   30.1% (  18% -   43%) 0.000
           
                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                      TermDTSort      421.29      (4.3%)      498.57      (3.2%)   18.3% (  10% -   27%) 0.000
           HighTermDayOfYearSort      544.96      (3.1%)      865.97      (2.2%)   58.9% (  51% -   66%) 0.000

@jpountz jpountz left a comment

I left some questions; this code is a bit complex (the existing one, not only your change!), so it's taking me a bit of time to fully understand how it works.

@gf2121 gf2121 requested a review from jpountz April 16, 2024 11:47

github-actions bot commented May 1, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label May 1, 2024
@jpountz jpountz left a comment

This is a fascinating optimization! I left some comments/questions but this looks generally correct to me.

leadCost = Math.min(leadCost, competitiveIterator.cost());
}
long threshold = Math.min(leadCost >> 3, maxDoc >> 5);
thresholdAsLong = intersectThresholdValue(threshold);
jpountz:
What if thresholdAsLong was -1 here, would we then run the computation again later? Do we need to make thresholdAsLong an Optional, or at least a nullable Long?

gf2121 (author):
Nice catch!

private void tryReduceDisjunctionClause(CompetitiveIterator iter) {
int originalSize = iter.disis.size();
Predicate<DisiAndMostCompetitiveValue> isCompetitive =
d -> d.mostCompetitiveValue <= maxValueAsLong && d.mostCompetitiveValue >= minValueAsLong;
jpountz:
it's a bit awkward to define this predicate which is then only used once, could you inline it in the below loop?

}
long mostCompetitiveValue =
reverse == false
? Math.max(blockMinValue, minValueAsLong)
jpountz:
I'm not sure why we need to take the max with minValueAsLong?

gf2121 (author):
The most left leaf block will kick in visitDocValues and only docs with values between minValueAsLong and blockMaxValue will be consumed.

jpountz:
So blockMinValue should already be greater than or equal to minValueAsLong, shouldn't it?

gf2121 (author):
Hmmm... forgive me if I misunderstand something, but I think blockMinValue can be smaller than minValueAsLong when the leaf block has a CELL_CROSSES_QUERY relation with minValueAsLong?

private int minBlockLength() {
// bottom value can be much more competitive than thresholdAsLong, recompute the cost.
int cost =
Math.toIntExact(
jpountz:
can we actually safely cast to an int? the field may be multi-valued and have more than Integer.MAX_VALUE values?

FWIW I'd be ok with throwing an exception if there are more than MAX_DISJUNCTION_CLAUSE*Integer.MAX_VALUE values or something like that, so that minBlockLength can still be stored as an integer.

gf2121 (author):
> can we actually safely cast to an int? the field may be multi-valued and have more than Integer.MAX_VALUE values?

Nice catch! I thought the count of points between thresholdValueAsLong and topValue was limited to maxDoc >> 5 (link), so the cast would be safe. But if very many points (more than Integer.MAX_VALUE - maxDoc / 128) have a value equal to thresholdValueAsLong, the cast is not safe.

> FWIW I'd be ok with throwing an exception if there are more than MAX_DISJUNCTION_CLAUSE*Integer.MAX_VALUE values or something like that, so that minBlockLength can still be stored as an integer.

+1. BTW, I wonder if we should catch all exceptions when trying dynamic pruning and fall back to the normal logic?

jpountz:
This sounds too fragile to me, I'd rather check very specific conditions.
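One such specific condition is checked by `Math.toIntExact`, which throws `ArithmeticException` on overflow rather than silently truncating like a plain `(int)` cast; a small sketch (the `safeCost` helper is a hypothetical name):

```java
class CastCheck {
  // Throws ArithmeticException if pointCount does not fit in an int,
  // instead of silently wrapping like (int) pointCount would.
  static int safeCost(long pointCount) {
    return Math.toIntExact(pointCount);
  }
}
```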

@github-actions github-actions bot removed the Stale label May 15, 2024
@jpountz jpountz left a comment

This looks good to me, this is a fascinating approach.

FYI we'll need to add // nightly-benchmarks-results-changed // to the commit message to set expectations with nightly benchmarks that sorting tasks return a different hit count with this change. cc @mikemccand

@mikemccand (Member) commented:

> FYI we'll need to add // nightly-benchmarks-results-changed // to the commit message to set expectations with nightly benchmarks that sorting tasks return a different hit count with this change. cc @mikemccand

I am so excited to see if this (nightly benchmarks auto-regolding) finally works :) It so rarely gets tested!

@gf2121 gf2121 merged commit 1ee4f8a into apache:main May 20, 2024
3 checks passed
@jpountz commented May 21, 2024

> I am so excited to see if this (nightly benchmarks auto-regolding) finally works

Looks like it worked! https://people.apache.org/~mikemccand/lucenebench/TermDayOfYearSort.html
