
Support for term alternatives in phrase. #2281

Open
npatsakula opened this issue Dec 16, 2023 · 8 comments

Comments

@npatsakula

npatsakula commented Dec 16, 2023

Hello! First of all, thank you for your awesome work on tantivy; it's great!

Motivation

I'm trying to use quickwit/tantivy for unstructured log search and got stuck migrating from Sphinx/Manticore.

Using Sphinx query syntax I can write something like:

level NEAR/0 (info | warn | error)

But with tantivy I need to expand this expression by hand:

"level info" OR "level warn" OR "level error"

With more than two elements (in the alternative or in the phrase), writing and reading the query becomes much harder :(
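As a stopgap, the expansion can at least be generated instead of typed by hand. A minimal sketch, assuming a hypothetical helper `expand_phrase` (not a tantivy API) that takes one list of alternatives per phrase position and emits the OR of plain quoted phrases that tantivy's query parser accepts. It produces one phrase per combination, i.e. the product of the alternative counts, so it only hides the boilerplate and does not reduce the work done at query time.

```rust
/// Hypothetical helper (not part of tantivy): expands a phrase whose
/// positions may hold several alternatives into an OR of plain phrases.
fn expand_phrase(positions: &[&[&str]]) -> String {
    // Start from a single empty phrase and extend it position by position,
    // building the cartesian product of all alternatives.
    let mut phrases: Vec<Vec<&str>> = vec![Vec::new()];
    for alternatives in positions {
        let mut next = Vec::with_capacity(phrases.len() * alternatives.len());
        for prefix in &phrases {
            for &term in alternatives.iter() {
                let mut phrase = prefix.clone();
                phrase.push(term);
                next.push(phrase);
            }
        }
        phrases = next;
    }
    // Quote each phrase and join them with OR.
    phrases
        .iter()
        .map(|p| format!("\"{}\"", p.join(" ")))
        .collect::<Vec<_>>()
        .join(" OR ")
}
```

For the query above, `[&["level"], &["info", "warn", "error"]]` expands to `"level info" OR "level warn" OR "level error"`.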

Solution

It would be great to have some syntactic sugar, or a low-level Rust API (I can parse the query myself), to avoid this boilerplate. I would also be happy to implement this feature myself, but I will need some help with query evaluation.

@PSeitz
Contributor

PSeitz commented Dec 17, 2023

We support the SQL-like IN operator:
level: IN [info warn error]

You may also want to look at quickwit, which is built on tantivy and targets log search.

@npatsakula
Author

We support the sql like in parameter level: IN [info warn error]

You may also have a look at quickwit, which is built on tantivy for log search.

Hello, @PSeitz! Yes, but I work with unstructured logs (text logs in different formats from about a hundred sources): the level is part of the message body :(

@PSeitz
Contributor

PSeitz commented Dec 17, 2023

That use case and syntax seem rather niche, so I'm not sure we would want to add that to the query parser. Maybe @fulmicoton has an opinion on this.

@adamreichold
Contributor

Not a targeted solution, but wouldn't a RegexQuery be able to handle this? As an aside, I would actually be really interested in exposing regex queries via the query parser to make it easier for people to experiment with them, but I suspect the quoting/escaping will be somewhat messy.

@npatsakula
Author

That use case and syntax seems rather niche

This use case is one of the main reasons why the Elasticsearch Span API exists :)

Maybe I oversimplified the example, but analytical queries (e.g. incident detection) can consist of dozens of terms/alternatives.

we would want to add that in the query parser

Personally, I'm not sure about this either (because of the added complexity for the average user). But it would be great to be able to express this through some IR (like Elastic did with the Span API).
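One way to read "some IR" is a small tree of span-style nodes that a caller builds directly instead of going through the query parser. The sketch below is a hypothetical Rust model loosely following Elasticsearch's `span_term`, `span_or` and `span_near` (with `slop`, in-order only here); `SpanQuery`, `spans` and `extend_near` are illustrative names, not tantivy APIs. It naively scans a single tokenized document with an exhaustive depth-first search, which is fine for a sketch but not how a production engine would evaluate it. Read as the adjacency that the expanded OR query expresses, `level NEAR/0 (info | warn | error)` becomes a `Near` with slop 0 over a `Term` and an `Or`.

```rust
/// Hypothetical query IR, loosely modelled on Elasticsearch span queries.
enum SpanQuery {
    Term(&'static str),
    Or(Vec<SpanQuery>),
    /// Clauses must match in order; `slop` is the total number of extra
    /// tokens allowed between consecutive clauses.
    Near { clauses: Vec<SpanQuery>, slop: usize },
}

/// Returns every (start, end) token range, end exclusive, matched by `query`.
fn spans(query: &SpanQuery, tokens: &[&str]) -> Vec<(usize, usize)> {
    let mut out: Vec<(usize, usize)> = match query {
        SpanQuery::Term(t) => tokens
            .iter()
            .enumerate()
            .filter(|(_, tok)| **tok == *t)
            .map(|(i, _)| (i, i + 1))
            .collect(),
        // An alternative matches wherever any of its clauses matches.
        SpanQuery::Or(clauses) => clauses.iter().flat_map(|c| spans(c, tokens)).collect(),
        // Sub-queries are evaluated first, so nested spans work the same way.
        SpanQuery::Near { clauses, slop } => {
            let per_clause: Vec<_> = clauses.iter().map(|c| spans(c, tokens)).collect();
            let mut found = Vec::new();
            extend_near(&per_clause, 0, None, *slop, &mut found);
            found
        }
    };
    out.sort_unstable();
    out.dedup();
    out
}

/// Depth-first search over clause matches. `prev` holds (start of the whole
/// match, end of the previous clause); `slop_left` is the remaining gap budget.
fn extend_near(
    per_clause: &[Vec<(usize, usize)>],
    idx: usize,
    prev: Option<(usize, usize)>,
    slop_left: usize,
    found: &mut Vec<(usize, usize)>,
) {
    if idx == per_clause.len() {
        if let Some(m) = prev {
            found.push(m);
        }
        return;
    }
    for &(s, e) in &per_clause[idx] {
        match prev {
            // First clause: any of its matches can open a span.
            None => extend_near(per_clause, idx + 1, Some((s, e)), slop_left, found),
            // Later clauses must start at or after the previous end, within budget.
            Some((start, prev_end)) if s >= prev_end && s - prev_end <= slop_left => {
                extend_near(per_clause, idx + 1, Some((start, e)), slop_left - (s - prev_end), found)
            }
            Some(_) => {}
        }
    }
}
```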

Also, I should mention that parsing is not the main problem here: if we expand a non-trivial phrase $p = (p_1, \dots, p_n)$, where $p_i$ is the set of alternatives at position $i$, into an alternative of trivial phrases, we get $\prod_{i=1}^{n} |p_i|$ sub-queries. That's why this issue requires changes to query execution as well :(
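Worked through for the example at the top of the issue, and for a hypothetical phrase of four positions with five alternatives each, the number of expanded sub-queries is:

```latex
N = \prod_{i=1}^{n} |p_i|
\quad\text{e.g.}\quad
p = \bigl(\{\mathit{level}\},\ \{\mathit{info}, \mathit{warn}, \mathit{error}\}\bigr)
\;\Rightarrow\; N = 1 \cdot 3 = 3,
\qquad
n = 4,\ |p_i| = 5 \;\Rightarrow\; N = 5^4 = 625.
```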

be able to handle this

Hello, @adamreichold! Yes, it works.

@PSeitz
Contributor

PSeitz commented Dec 17, 2023

So if I understand correctly, you would want a PhraseQuery that supports multiple alternative terms at a single position. That's not currently supported and would probably make sense to add.
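Such a query can avoid the product blowup if, at each phrase offset, the position lists of all alternatives are merged before the usual phrase intersection (similar in spirit to Lucene's MultiPhraseQuery). A minimal in-memory sketch over one document's term-to-positions map; `multi_phrase_starts` and `positions_of` are hypothetical names, and a real implementation would work on tantivy's postings rather than a `HashMap`. The cost grows with the sum of the alternatives' position lists instead of the number of combinations.

```rust
use std::collections::{BTreeSet, HashMap};

/// Builds a term -> positions map for one tokenized document (illustrative only).
fn positions_of<'a>(tokens: &[&'a str]) -> HashMap<&'a str, Vec<u32>> {
    let mut map: HashMap<&'a str, Vec<u32>> = HashMap::new();
    for (pos, &tok) in tokens.iter().enumerate() {
        map.entry(tok).or_default().push(pos as u32);
    }
    map
}

/// Hypothetical multi-term phrase match: each phrase offset accepts any of
/// several terms. Returns the sorted start positions where the phrase matches.
fn multi_phrase_starts<'a>(
    positions: &HashMap<&'a str, Vec<u32>>,
    phrase: &[Vec<&'a str>],
) -> Vec<u32> {
    let mut candidates: Option<BTreeSet<u32>> = None;
    for (offset, alternatives) in phrase.iter().enumerate() {
        let offset = offset as u32;
        // Union the positions of all alternatives at this offset, shifted
        // back so that every entry is a candidate phrase start.
        let starts: BTreeSet<u32> = alternatives
            .iter()
            .filter_map(|term| positions.get(term))
            .flatten()
            .filter(|&&pos| pos >= offset)
            .map(|&pos| pos - offset)
            .collect();
        // Keep only starts that also matched every earlier offset.
        candidates = Some(match candidates {
            None => starts,
            Some(prev) => prev.intersection(&starts).copied().collect(),
        });
    }
    candidates.map(|s| s.into_iter().collect()).unwrap_or_default()
}
```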

@fulmicoton
Collaborator

@npatsakula is it for quickwit or tantivy?

@npatsakula
Author

@fulmicoton, the ideal option would be to support the ES Span API in quickwit, but it is quite tricky and requires a lot of work (e.g. slop evaluation for non-trivial sub-queries). I assumed (maybe mistakenly) that a change of that size and complexity would be unwelcome in quickwit's mainline, so I opened a smaller issue for tantivy.


4 participants