facet_by's total_values is returning max_facet_value params value or default value when not present in params #1698

Open
shiyanshirani opened this issue Apr 30, 2024 · 2 comments

Comments

@shiyanshirani

shiyanshirani commented Apr 30, 2024

Description

We have a homepage that shows the total count of facet values. In v0.24.1 we got this value from stats.total_values:

{    # request
	"q": "*",
	"facet_by": "name",
	"max_facet_value": 12
}
{ # response
    "facet_counts": [
        {
            "counts": [
                {
                    "count": 10502,
                    "highlighted": "A",
                    "value": "A"
                }
            ],
            "field_name": "manufacturer_name",
            "stats": {
                "total_values": 1431   <-- this is the total count of facet values
            }
        }
    ],
}

In v26.0 the response is:

{
    "facet_counts": [
        {
            "counts": [
                {
                    "count": 456802,
                    "highlighted": "A",
                    "value": "A"
                },
                {
                    "count": 401461,
                    "highlighted": "B",
                    "value": "B"
                },
		[... max_facet_value objects]
            ],
            "field_name": "name",
            "sampled": false,
            "stats": {
                "total_values": 10 <-- this is the max_facet_values value sent in the request, or the default of 10 when it is not set
            }
        }
    ],
}

Is this an intentional change? I have no other way to fetch the total count of facet values without searching.
I tried this Stack Overflow answer, but it resulted in an unacceptable response time.

Typesense Version: v26
OS: Ubuntu/MacOS

@kishorenc
Member

We introduced a different way to do faceting in v26 which might have caused a regression here. Will investigate and update.

@kishorenc
Member

Typesense supports two strategies for efficient faceting, and has built-in heuristics that pick the right strategy for you by default. In the 27.0.rc11 RC build, we introduced a new query parameter called facet_strategy that lets you configure the faceting behavior.

To fix the issue you are facing, please send this additional search parameter:

"facet_strategy": "exhaustive"

This forces Typesense to compute facets exhaustively, which makes total_values exact.
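As a minimal sketch, the search parameters from the original report with the extra parameter added might be built like this in Python. The helper name with_exhaustive_facets is hypothetical, and the request uses the max_facet_values spelling from the explanation in this comment; no client call is made here.

```python
import json


def with_exhaustive_facets(params):
    """Return a copy of the search parameters with facet_strategy set to exhaustive."""
    updated = dict(params)  # copy, so the caller's dict is not mutated
    updated["facet_strategy"] = "exhaustive"
    return updated


# Parameters modeled on the request in the issue description.
search_params = with_exhaustive_facets(
    {"q": "*", "facet_by": "name", "max_facet_values": 12}
)

# Serialize to the JSON body that would be sent with the search request.
print(json.dumps(search_params, indent=2))
```

The resulting dictionary can then be passed to whichever client or HTTP library you already use for search requests.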

Additional details

If you find that faceting is slow for your query patterns and the shape of your dataset, you can use the facet_strategy parameter to override the default strategy and fine-tune performance.

The valid values for this parameter are exhaustive, top_values and automatic.

exhaustive - in this strategy, once we have the list of matching documents, we simply iterate through each document’s facet_by fields and sum up the number of documents for each unique facet value. This is effective when the number of documents is small (fewer than a few tens of thousands of docs) and/or when the number of facet values requested (as defined by max_facet_values) is large.

top_values - in this strategy, once we have the list of matching documents, we look up each facet field’s value in a reverse index that stores a mapping of {facet_field_value => [list of all documents that have this value]}. We then find the intersection of these two lists of documents (the list of matching documents and the list of all documents that have this facet field value), and the length of the intersected list gives us the facet count. This strategy is efficient when there is a large number of hits, since we only have to do intersections on the top facet values (the values that have the largest number of documents in the reverse index). However, if the number of facet values to fetch (as configured by max_facet_values) is sufficiently large and the number of hits is small, this strategy becomes less efficient than the exhaustive strategy. Another downside of this approach is that it will not return an exact count for total_values in the facet stats, because we only consider a limited number of facet values for the facet count intersections.

automatic - Typesense picks a strategy based on the heuristics described above. This is the default value for this parameter.
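The difference between the two explicit strategies, and why total_values can collapse to max_facet_values, can be illustrated with a toy sketch in Python. This is not Typesense's implementation: the function names, the in-memory dict of documents, and the way the reverse index is built on the fly are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def exhaustive_facets(docs, matching_ids, field):
    """Walk every matching document and tally each facet value it carries."""
    counts = Counter(docs[doc_id][field] for doc_id in matching_ids)
    return {"counts": dict(counts), "total_values": len(counts)}


def top_values_facets(docs, matching_ids, field, max_facet_values):
    """Intersect the hits with the posting lists of only the most common values."""
    # Reverse index: facet value -> set of ids of all documents holding it.
    reverse_index = defaultdict(set)
    for doc_id, doc in docs.items():
        reverse_index[doc[field]].add(doc_id)
    # Only the values with the longest posting lists are considered.
    top = sorted(reverse_index, key=lambda v: len(reverse_index[v]), reverse=True)
    top = top[:max_facet_values]
    hits = set(matching_ids)
    counts = {v: len(reverse_index[v] & hits) for v in top}
    counts = {v: c for v, c in counts.items() if c > 0}
    # Values outside the top list were never examined, so this is not exact.
    return {"counts": counts, "total_values": len(counts)}


# Toy collection: 7 documents with 4 distinct values of "name".
docs = {
    1: {"name": "A"}, 2: {"name": "A"}, 3: {"name": "A"},
    4: {"name": "B"}, 5: {"name": "B"},
    6: {"name": "C"}, 7: {"name": "D"},
}
all_ids = list(docs)  # a q=* style query matches everything

print(exhaustive_facets(docs, all_ids, "name")["total_values"])     # 4
print(top_values_facets(docs, all_ids, "name", 2)["total_values"])  # 2
```

In the sketch, the top_values variant reports a total_values equal to the number of values it was allowed to examine, which mirrors the behavior reported in this issue.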

You can specify a strategy for all facet fields in the query via:

"facet_strategy": "exhaustive"

or you can specify a different strategy for each field by using a comma-separated list of strategies that matches the order of field names in facet_by. For example, if you have facet_by: field1, field2, field3 and facet_strategy: automatic, exhaustive, top_values, field1 will use the automatic mode, field2 will use the exhaustive mode and field3 will use the top_values mode.
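The positional pairing can be sketched in Python as follows. The helper strategies_by_field is hypothetical, and raising an error on a length mismatch is an assumption: this thread does not say how Typesense handles that case.

```python
VALID_STRATEGIES = {"exhaustive", "top_values", "automatic"}


def strategies_by_field(facet_by, facet_strategy="automatic"):
    """Pair each facet_by field with its strategy, by position."""
    fields = [f.strip() for f in facet_by.split(",")]
    strategies = [s.strip() for s in facet_strategy.split(",")]
    unknown = set(strategies) - VALID_STRATEGIES
    if unknown:
        raise ValueError(f"unknown strategies: {sorted(unknown)}")
    # A single strategy applies to every facet field in the query.
    if len(strategies) == 1:
        strategies = strategies * len(fields)
    # Assumption: a mismatched list length is treated as an error here.
    if len(strategies) != len(fields):
        raise ValueError("facet_strategy must have one entry per facet_by field")
    return dict(zip(fields, strategies))


mapping = strategies_by_field(
    "field1, field2, field3", "automatic, exhaustive, top_values"
)
print(mapping)
```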
