Very slow for moderate number of embeddings #106

Open
Nintorac opened this issue Oct 24, 2023 · 2 comments

@Nintorac

Here is a plot of how ingest time scales with the number of embeddings. If I put both axes on a log scale, it looks approximately linear.

I also noticed that there only seems to be a single thread running for the entire duration of the ingest.

I am using embeddings with dimension 2560.

[image: ingest time versus number of embeddings]

I am using Python and installed sqlite-vss via pip, if that makes a difference.
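A straight line on log-log axes suggests a power law, time ≈ c · n^k, where the slope of the line is the exponent k. The sketch below estimates that slope with an ordinary least-squares fit on the logged values. The timings here are synthetic numbers generated from a known quadratic, not measurements from this issue, and `loglog_slope` is a hypothetical helper name.

```python
import math

def loglog_slope(ns, ts):
    """Least-squares fit of log(t) = slope * log(n) + intercept."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(t) for t in ts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic data only: t = 1e-6 * n^2, so the fitted slope should be close to 2.
ns = [1000, 2000, 4000, 8000, 16000]
ts = [1e-6 * n ** 2 for n in ns]

slope, intercept = loglog_slope(ns, ts)
print(f"estimated exponent k = {slope:.3f}")
```

A slope near 1 would mean ingest time grows linearly with the number of vectors. A slope noticeably above 1 would point to superlinear cost, for example work that is repeated on every insert.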

@asg017
Owner

asg017 commented Dec 8, 2023

Do you happen to have the code you used to ingest embeddings into sqlite-vss? It shouldn't take 30 minutes to insert 30k vectors. I suspect there are a number of fixes that could make it much faster, including:

  • Insert all vectors in one transaction (surround with BEGIN and COMMIT)
  • Avoid execute() and prefer executemany() if in Python
  • Insert vectors all in one go (depends on the source of your vectors)

It also depends on whether you're using a custom factory or not, so any example code would be great!
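The first two suggestions above can be sketched roughly as follows, using only Python's standard `sqlite3` module. This is a minimal sketch under assumptions: the plain `embeddings` table is a stand-in so the example runs without the extension loaded, and the table name, column name, `pack_vector`, and `bulk_insert` are all hypothetical. With sqlite-vss the insert target would presumably be the vss virtual table instead.

```python
import sqlite3
import struct

def pack_vector(vec):
    """Serialize a sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)

def bulk_insert(conn, rows):
    """Insert (rowid, vector) pairs in one transaction with executemany()."""
    # "with conn" commits once at the end, or rolls back on an exception,
    # instead of committing after every single row.
    with conn:
        conn.executemany(
            "INSERT INTO embeddings(rowid, vector) VALUES (?, ?)",
            ((rowid, pack_vector(vec)) for rowid, vec in rows),
        )

conn = sqlite3.connect(":memory:")
# Stand-in table (assumption): a plain table rather than a vss virtual table.
conn.execute("CREATE TABLE embeddings(vector BLOB)")

# Hypothetical data: 1000 small vectors of dimension 8.
rows = [(i, [float(i)] * 8) for i in range(1, 1001)]
bulk_insert(conn, rows)

count = conn.execute("SELECT count(*) FROM embeddings").fetchone()[0]
print(count)
```

Batching everything into one transaction avoids paying the commit cost per row, and `executemany()` keeps the loop over rows out of per-statement Python calls.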

@Nintorac
Author

I have lost the code, sorry. If I remember right, this was to create the index after all the data had been inserted.
