Multi range traversal for numeric range aggregations #13335

jainankitk · 2024-05-01T23:18:37Z

Description

In OpenSearch, we have introduced multi-range traversal, which collects the matching document count for every range in a single tree traversal (opensearch-project/OpenSearch#13317). This has significantly improved the performance of numeric aggregations in OpenSearch. I am wondering whether other use cases could benefit from it and whether the change should be included in Lucene.

Constraints:

The ranges are non-overlapping and in increasing order. For example, in (a1,b1),(a2,b2)...(an,bn), it is assumed that ai < aj for all i < j and bi < a(i+1) for every i.
The field is 1-dimensional, which guarantees that the points are stored in increasing order, and the segment does not have any deletions.
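
To illustrate why these constraints allow a single pass, here is a minimal sketch in plain Java. It is not the OpenSearch or Lucene implementation: the class name `MultiRangeCount`, the method `countPerRange`, the inclusive bounds, and the flat sorted array standing in for the points tree are all assumptions made for illustration. Because both the values and the ranges are in increasing order, a range that has been passed never needs to be revisited.

```java
import java.util.Arrays;

/** Hypothetical sketch; a sorted array stands in for the 1-dimensional points tree. */
class MultiRangeCount {
    /**
     * Counts how many of the sorted values fall into each range.
     * lower[i] and upper[i] are treated as inclusive bounds (an assumption);
     * ranges must be in increasing order and non-overlapping.
     */
    static long[] countPerRange(long[] sortedValues, long[] lower, long[] upper) {
        long[] counts = new long[lower.length];
        int r = 0; // index of the current candidate range
        for (long v : sortedValues) {
            // Skip ranges that end before this value; later values cannot match them either.
            while (r < lower.length && upper[r] < v) {
                r++;
            }
            if (r == lower.length) {
                break; // every range has been passed
            }
            if (v >= lower[r]) {
                counts[r]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {1, 3, 5, 7, 9, 11, 13};
        long[] lower = {2, 8};
        long[] upper = {6, 12};
        System.out.println(Arrays.toString(countPerRange(values, lower, upper))); // prints [2, 2]
    }
}
```

This sketch visits every value, so it is linear in the number of points. A traversal over an actual tree could presumably add whole subtrees to a range's count when a subtree lies entirely inside that range, which is where the sub-linear behaviour would come from.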

jainankitk · 2024-05-02T00:33:13Z

Looks related to #9814, but there are differences between the two. As discussed offline with @bowenlan and @rishabhmaurya, MultiRangeQuery only tells you what matches any of your ranges, not what matches each individual range.

jainankitk · 2024-05-16T18:51:56Z

@jpountz - Thoughts?

mikemccand · 2024-05-17T18:46:52Z

I like this optimization. Maybe it best fits in Lucene's facet module? But, I don't think our facet impls today ever use points, directly, to do counting/aggregation -- it's a two-step process of first collecting into a bitset holding the matched docs, and, second, iterating those docs, looking up doc values or facet ords (also from doc values), and counting/aggregating from there. But in the browse-only cases where a query just wants counts of ranges across all docs in the index, this opto should be a crazy fast way to achieve it when there are no deletions. Even when there are deletions, this opto could visit all docs and check the live docs and count/aggregate accordingly? The time is no longer sub-linear, but it'd still be faster than the two-phase approach that Lucene's facets use today?

@stefanvodita / @Shradha26 WDYT?
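
The deletion-aware variant suggested above could look roughly like the following sketch. It is only an illustration under stated assumptions: `LiveDocsRangeCount`, `countLive`, the parallel `docIds` array, and the use of `java.util.BitSet` for live docs are hypothetical stand-ins, not Lucene APIs. Every point is visited and checked against the live docs, so the pass is linear rather than sub-linear.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical sketch; sorted arrays stand in for points, a BitSet stands in for live docs. */
class LiveDocsRangeCount {
    /**
     * sortedValues[i] is the value of document docIds[i]; values are in increasing order.
     * A document is counted only if its bit is set in liveDocs.
     * Bounds are treated as inclusive (an assumption).
     */
    static long[] countLive(long[] sortedValues, int[] docIds, BitSet liveDocs,
                            long[] lower, long[] upper) {
        long[] counts = new long[lower.length];
        int r = 0; // index of the current candidate range
        for (int i = 0; i < sortedValues.length && r < lower.length; i++) {
            long v = sortedValues[i];
            // Skip ranges that end before this value.
            while (r < lower.length && upper[r] < v) {
                r++;
            }
            // Count the value only if it is inside the range and its document is not deleted.
            if (r < lower.length && v >= lower[r] && liveDocs.get(docIds[i])) {
                counts[r]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] values = {1, 3, 5, 9, 11};
        int[] docIds = {4, 0, 2, 1, 3};
        BitSet live = new BitSet(5);
        live.set(0, 5);  // mark docs 0..4 as live
        live.clear(2);   // doc 2 (value 5) is deleted
        System.out.println(Arrays.toString(
            countLive(values, docIds, live, new long[]{2, 8}, new long[]{6, 12}))); // prints [1, 2]
    }
}
```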

stefanvodita · 2024-05-20T12:11:54Z

It's true that we have this two-step process for aggregations (incl. counts) and that it's not always the most efficient solution.
+1 to try out this optimisation, sounds promising!

jainankitk added the type:enhancement label May 1, 2024

mikemccand mentioned this issue May 17, 2024

[DISCUSS] Identifying Gaps in Lucene’s Faceting #12553

Open

jainankitk mentioned this issue May 20, 2024

[RFC] Pre Compute Aggregations with Star Tree index opensearch-project/OpenSearch#12498

Open
