Add an example application about how to properly deal with stale documents on the vector database #612

eolivelli · 2023-10-18T12:13:50Z

None of the example applications that we currently have show how to deal with these two common issues:

Shorter pages

When you re-index a website, the new version of a page may be shorter and therefore produce fewer chunks.
Overwriting the chunks with the lower ids works, but the old chunks with higher ids stay in the database.
We need to show how to remove these stale chunks.
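
A minimal sketch of the idea, assuming chunks are keyed by (page name, chunk id). The in-memory dict and the `reindex_page` helper are hypothetical stand-ins for illustration, not the API of any real vector database or of the example applications:

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a vector database table,
# keyed by (page name, chunk id). Not a real database client.
ChunkStore = Dict[Tuple[str, int], str]


def reindex_page(store: ChunkStore, page: str, chunks: List[str]) -> int:
    """Upsert the new chunks for `page`, then delete leftover higher ids.

    Returns the number of stale chunks that were removed.
    """
    # Overwrite (or create) chunks 0..len(chunks)-1.
    for chunk_id, text in enumerate(chunks):
        store[(page, chunk_id)] = text

    # Any chunk of the same page with an id past the new length is stale.
    stale = [key for key in store if key[0] == page and key[1] >= len(chunks)]
    for key in stale:
        del store[key]
    return len(stale)


store: ChunkStore = {}
reindex_page(store, "docs/intro.html", ["part 1", "part 2", "part 3"])
# The page got shorter: only two chunks this time.
removed = reindex_page(store, "docs/intro.html", ["new part 1", "new part 2"])
```

With a real database the same step would typically be a delete filtered on the page name and on chunk ids greater than or equal to the new chunk count.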

Pages that disappeared

This is trickier. When you know that you are re-indexing the whole corpus of documents (for instance a whole website), you should drop the documents that are no longer available. Otherwise you risk serving outdated documents, or duplicate content when a page has been renamed.
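
One possible approach, sketched under the assumption that the crawler can report the full set of pages it saw during a complete run. The dict-based store and the `drop_missing_pages` helper are hypothetical and only illustrate the logic:

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical in-memory stand-in for a vector database table,
# keyed by (page name, chunk id). Not a real database client.
CorpusStore = Dict[Tuple[str, int], str]


def drop_missing_pages(store: CorpusStore, seen_pages: Iterable[str]) -> Set[str]:
    """Delete every chunk whose page was not seen in the latest full crawl.

    Returns the names of the pages that were dropped.
    """
    seen = set(seen_pages)
    stale_keys = [key for key in store if key[0] not in seen]
    for key in stale_keys:
        del store[key]
    return {page for page, _ in stale_keys}


corpus: CorpusStore = {
    ("old-name.html", 0): "same content, chunk 0",
    ("old-name.html", 1): "same content, chunk 1",
    ("new-name.html", 0): "same content, chunk 0",
    ("index.html", 0): "home page",
}
# The latest full crawl no longer found old-name.html (it was renamed).
dropped = drop_missing_pages(corpus, ["new-name.html", "index.html"])
```

This is only safe after a crawl that covered the whole corpus and finished successfully; after a partial or failed run it would delete pages that still exist.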

eolivelli · 2023-10-20T08:30:35Z

The first part has been delivered in the 0.3.0 release.

eolivelli added this to the 0.3.0 milestone Oct 20, 2023

eolivelli closed this as completed Oct 20, 2023

eolivelli reopened this Oct 20, 2023

eolivelli removed this from the 0.3.0 milestone Oct 20, 2023
