Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding single field (extending fieldlist) to document #170

Open
KeKs0r opened this issue Jul 26, 2022 · 4 comments · May be fixed by #187
Open

Adding single field (extending fieldlist) to document #170

KeKs0r opened this issue Jul 26, 2022 · 4 comments · May be fixed by #187

Comments

@KeKs0r
Copy link

KeKs0r commented Jul 26, 2022

Hi Luca,

I mentioned this single use case in another ticket, where we discussed several things: #106

I wanted to create a dedicated issue for this specific case.
I am powering a filter/search of a table with minisearch. And users are able to add and remove columns from that table. the recreation of the index is quite the performance hit on that page and I am trying to make reindexing more performant on data changes. One of these scenarios is adding of a column. Currently I am creating a complete new index, just to add or remove a single field.

It would be great, if there is an api to add or remove a single field from the index. Or event doing it for every individual document with the current value of the field would be fine.

@lucaong
Copy link
Owner

lucaong commented Jul 28, 2022

Hi @KeKs0r ,
thanks for opening the issue. Let me think about it and come back to you.

@lucaong
Copy link
Owner

lucaong commented Jul 28, 2022

In short, the main challenge is that MiniSearch internally uses short document IDs for various efficiency reasons. Currently, it does not keep a mapping from original ID to short ID (only the opposite way around), so it doesn't know how to add a field to an existing document identified by ID.

This can probably be implemented, making it possible to add or remove some fields to/from an existing document, but it should be done in a way that does not make the data structures much larger.

Note that updating in place instead, either whole document or just a field, will not possible. One has to first remove the old document (or field, if we implement this), then add the new one. While it can be cumbersome for the application developer to keep the old document around so it can be deleted, the alternative is worse: MiniSearch would have to copy each indexed field in each document (just referencing won't work, because the document can change in place), or at least keep the list of processed tokens for each indexed field, so it can de-index them upon removal/update. This would make MiniSearch use a lot more memory and be slower to index for everyone, even those who don't use this feature. It is also possible for application developers to implement it more efficiently, even though a bit cumbersome, by implementing some "copy on write" mechanism on the documents.

@KeKs0r
Copy link
Author

KeKs0r commented Jul 29, 2022

I think keeping the "old" document around in this case is not an issue. This is already the case for "data changes" anyways.
There are 2 changes in regards to Columns

  1. Adding a column: purely additive, and maybe we find a way to just extend the index with some values for existing document Ids. Here we just have the issue you described with the original ID -> short ID mapping
  2. Removing a Column: Maybe this does not have the issue with the ID mapping, because we want to remove all values for the column. So we don't know the original ID, just wether the index came from a certain field. (Which I am not sure if that is possible)

Is the original ID -> short ID function a hash or is it random?
Also if it has the mapping shortID -> original ID, it is maybe not as efficient, but maybe its fine to do a full lookup of ids, only for this use case.
This way, it would not effect memory consumption for all other use cases.

@lucaong
Copy link
Owner

lucaong commented Nov 23, 2022

The new version v6.0.0-beta.1 includes changes that would make this feature possible in the near future (at least adding a field). I will consider this for 6.1.0.

@lucaong lucaong linked a pull request Nov 27, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants