Vector Embedding Database not persisting when using YAML configuration #600

vnguye65 · 2023-11-15T04:05:16Z

I have the following workflow configuration with subindices for two different datasets.

workflow.yaml

writable: true
path: vector-database

embeddings:
  content: true
  defaults: false
  indexes:
    document:
      path: sentence-transformers/multi-qa-mpnet-base-dot-v1
      tokenize: true
      columns:
        text: document
    csv:
      path: sentence-transformers/multi-qa-mpnet-base-dot-v1
      tokenize: true
      columns:
        text: csv

After adding the data, app.count() returns 1. However, the data doesn't persist when the session is refreshed: app.count() returns 0 when run in a separate environment.

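from txtai.app import Application

# Load the application from the YAML configuration
app = Application("search-workflow.yaml")

# Queue one record with a field for each subindex
app.add([{"document": "dummy data 1", "csv": "dummy data 2"}])

# Upsert the queued records into the embeddings index
app.upsert()

# Returns 1 in this session
app.count()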

@davidmezzetti Could you please confirm if I am missing anything in the code and suggest what we could do to persist data?

vnguye65 · 2023-11-15T21:50:06Z

I'm using this search workflow as an intermediate step in another workflow in a RAG solution. The search workflow is not able to find the data in the vector database to retrieve relevant texts.

davidmezzetti · 2023-11-17T17:49:02Z

Sorry for the delayed response. I will try your config and let you know.

davidmezzetti · 2023-11-24T12:49:27Z

I ran the above configuration and it saves content to the vector-database directory.

Can you share more about the search workflow? Once an embeddings database is loaded, it doesn't automatically refresh. You would need a way to reload the read-only search index when new data is loaded. Could this be the issue?

vnguye65 · 2023-12-01T16:03:21Z

Sorry for the delay. Yes, that seems to be the issue. It looks like the read-only search index doesn't automatically reload.

davidmezzetti · 2023-12-02T00:49:44Z

Ok, thank you for confirming. I can think about a method to autodetect index changes and force a refresh. I think that would be a good addition.
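
As an illustration of what detecting index changes could look like on the reader side, here is a minimal sketch, not a txtai feature. It watches file modification times under the index directory and rebuilds an object when they change. `latest_mtime` and `ReloadOnChange` are hypothetical names, and the `loader` callable is an assumption standing in for whatever creates the read-only instance.

```python
import os
from typing import Any, Callable


def latest_mtime(path: str) -> float:
    """Newest modification time of any file under path (0.0 if none exist)."""
    newest = 0.0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                newest = max(newest, os.path.getmtime(os.path.join(root, name)))
            except OSError:
                # File disappeared between listing and stat; ignore it
                pass
    return newest


class ReloadOnChange:
    """Hypothetical helper: rebuilds an object when files under `path` change."""

    def __init__(self, path: str, loader: Callable[[], Any]):
        self.path = path
        self.loader = loader
        # Record the timestamp before loading so a write during load is not missed
        self.stamp = latest_mtime(path)
        self.instance = loader()

    def get(self) -> Any:
        """Return the current instance, reloading first if the index changed."""
        stamp = latest_mtime(self.path)
        if stamp != self.stamp:
            self.stamp = stamp
            self.instance = self.loader()
        return self.instance
```

With the configuration from this issue, the loader might be `lambda: Application("search-workflow.yaml")` and the path `vector-database`, with the search process calling `get()` before each query. That pairing is an assumption and has not been tested against txtai here.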

davidmezzetti · 2023-12-02T00:52:06Z

One idea to consider in the meantime: can you trigger a reload in the read-only process whenever you update your index?
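
One way such a trigger could work across processes is a small marker file that the writer replaces after each update and the reader compares against the last value it saw. This is only a sketch under that assumption: `signal_update` and `read_token` are hypothetical helpers, not part of txtai, and the marker location is up to you.

```python
import os
import tempfile
import uuid
from typing import Optional


def signal_update(marker: str) -> str:
    """Writer side: atomically replace the marker file with a fresh token."""
    token = uuid.uuid4().hex
    directory = os.path.dirname(os.path.abspath(marker))
    # Write to a temporary file first so readers never see a partial token
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    os.replace(tmp, marker)
    return token


def read_token(marker: str) -> Optional[str]:
    """Reader side: current token, or None if the writer has never signalled."""
    try:
        with open(marker) as handle:
            return handle.read()
    except FileNotFoundError:
        return None
```

The writing process would call `signal_update(...)` right after `app.upsert()`. The read-only process would keep the last token it saw and recreate its `Application` when `read_token(...)` returns a different value.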

jbouder · 2023-12-12T22:16:30Z

Not sure if it's the same thing, but with the same configuration as above, I'm seeing situations where the index directories are not always created. I also noticed that when I try to call index, it eventually errors. Should the index directories be created when txtai starts up, or not until index is called?
