Use skip index for labels #100

Open
R-omk opened this issue Feb 1, 2022 · 5 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments


R-omk commented Feb 1, 2022

For labels, the 'tokenbf_v1' index type is suitable.
Before extracting a field from the JSON object, the query needs to check for the presence of the key or the value with the hasToken function.
Note also that tokens used for indexing must not contain word breaks (separator characters).
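To make the tokenbf_v1 / hasToken idea concrete, here is a minimal Python sketch of how a token bloom filter skip index behaves. It is not ClickHouse's implementation: the tokenizer (maximal runs of ASCII letters and digits), the SHA-256-based hashing, and the names TokenBloomFilter, tokenize and has_token are all assumptions made for illustration. It shows that a bloom filter can only answer "definitely absent" or "maybe present", which is why an exact hasToken-style check still has to run afterwards, and why a needle such as web-01 that spans a separator cannot be matched as a single token.

```python
import hashlib
import re

# Assumed, simplified tokenizer: a token is a maximal run of ASCII letters/digits;
# every other character acts as a separator (ASCII only, for illustration).
_TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    return _TOKEN_RE.findall(text)


class TokenBloomFilter:
    """Hypothetical per-granule bloom filter over tokens (not ClickHouse code)."""

    def __init__(self, size_bytes: int = 256, num_hashes: int = 3, seed: int = 0):
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        self.seed = seed
        self.bits = bytearray(size_bytes)

    def _positions(self, token: str):
        # Derive num_hashes bit positions per token from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{self.seed}:{i}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add_text(self, text: str) -> None:
        # Index every token found in the text (done once, at insert time).
        for token in tokenize(text):
            for pos in self._positions(token):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, token: str) -> bool:
        # False: the granule can be skipped. True: "maybe", the granule must be read.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(token))


def has_token(haystack: str, needle: str) -> bool:
    """Exact hasToken-style check, applied to rows the index did not skip."""
    if tokenize(needle) != [needle]:
        raise ValueError(f"needle {needle!r} must be a single token without separators")
    return needle in tokenize(haystack)


if __name__ == "__main__":
    labels = '{"job":"nginx","host":"web-01"}'
    granule = TokenBloomFilter()
    granule.add_text(labels)

    print(granule.may_contain("nginx"))  # True: the token was indexed
    print(has_token(labels, "nginx"))    # True
    print(has_token(labels, "web"))      # True: "web-01" is split into "web" and "01"
    try:
        has_token(labels, "web-01")      # contains a separator, so it is rejected
    except ValueError as exc:
        print(exc)
```

In a real table the same idea would presumably be expressed as a tokenbf_v1 index on the labels column (column name assumed) plus a hasToken(...) predicate in the WHERE clause. The bloom filter size and number of hash functions trade index size against false-positive rate, which is where the index size and insert cost concerns in this thread come from.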

lmangani added the enhancement and help wanted labels on Feb 6, 2022
adubovikov (Collaborator) commented

Hi @R-omk,

Sorry for the late reply. Yes, this is a good idea; the problem is that the index will be very large and inserts will be somewhat slower.
Let us experiment with it and make it optional.

R-omk (Author) commented Feb 8, 2022

> let us play around it and make it optional.

Yes, it's a good decision.

I also noticed that a minmax index could be used for the timestamp field; you could open a separate issue for that or include it as part of this one.
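A minmax skip index keeps only the minimum and maximum value of a column for each granule, so a time-range query can skip every granule whose [min, max] interval does not overlap the requested range. The Python sketch below models that pruning step; the function names and the list-of-lists "granules" are hypothetical stand-ins, not ClickHouse internals. Such an index presumably helps most when the timestamp is not already the leading column of the table's sorting key, since the primary index already prunes on leading key columns.

```python
from typing import List, Sequence, Tuple


def build_minmax_index(granules: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Store (min, max) of the timestamp column for each granule (computed at insert)."""
    return [(min(g), max(g)) for g in granules]


def granules_to_read(index: Sequence[Tuple[int, int]], start: int, end: int) -> List[int]:
    """Return indices of granules whose [min, max] overlaps the query range [start, end]."""
    return [i for i, (lo, hi) in enumerate(index) if hi >= start and lo <= end]


if __name__ == "__main__":
    # Timestamps in seconds; each inner list stands in for one granule of rows.
    granules = [
        [100, 105, 110],
        [200, 190, 210],  # rows need not be sorted within a granule
        [300, 320, 310],
    ]
    index = build_minmax_index(granules)
    print(index)                              # [(100, 110), (190, 210), (300, 320)]
    print(granules_to_read(index, 205, 305))  # [1, 2]: granule 0 is skipped
```

Because logs usually arrive in roughly time order, each granule's interval tends to be narrow, which is what makes this kind of pruning effective.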

R-omk (Author) commented Feb 8, 2022

> the problem is that size of this index will be huge and insert will be a bit slower

By the way, this fits the general logging recommendation that the number of unique label combinations should be kept small. So this is not a big deal: a new label entry is actually added only once per partition (for example, once per day).

adubovikov (Collaborator) commented

Yes, but let's think big and expect a lot of unique tags from different sources :-), e.g. voice or IoT data that includes IP addresses and hostnames.

adubovikov (Collaborator) commented

But you are right: updates/inserts should (MUST) not be frequent.
