perf: small fix for the insert-method of pgvector #1004

ArneJanning · 2024-02-14T17:14:58Z

Please describe the purpose of this pull request.
Thankfully, @sarahwooders provided a fix for #988, but it assumed a fixed chunk size of 1,000 for pg8000, which slows down the ingestion process significantly and is not future-proof.

So I added a small method to get the optimal chunk size based on the number of columns in the database.
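A minimal sketch of the idea, not the code in this PR. It assumes, as the struct.error in #988 suggests, that pg8000 packs the number of bind parameters into a signed 16-bit integer, so one statement can carry at most 32,767 parameters. Each inserted row uses one parameter per column, so the largest safe chunk is that limit divided by the column count. The constant MAX_PG8000_PARAMS and this signature are hypothetical.

```python
# Hypothetical sketch, not the code merged in this PR.
# Assumption: pg8000 encodes the bind-parameter count as a signed 16-bit int.
MAX_PG8000_PARAMS = 32767


def get_optimal_chunk_size(num_columns: int, max_params: int = MAX_PG8000_PARAMS) -> int:
    """Return the most rows per INSERT whose parameters fit under max_params."""
    if num_columns <= 0:
        raise ValueError("num_columns must be positive")
    chunk_size = max_params // num_columns
    if chunk_size == 0:
        raise ValueError("a single row already exceeds the parameter limit")
    return chunk_size


# A 10-column table allows 3276 rows per statement instead of a fixed 1,000.
print(get_optimal_chunk_size(10))  # 3276
```

Compared with a fixed 1,000, this adapts to the table width: narrow tables get larger batches (fewer round trips), while wide tables still stay under the limit.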

How to test
memgpt load directory --name <some_name> --input-files=<very_large_input_files>

Have you tested this PR?
I tested it with several very large PDF and TXT files (around 1,000 pages of scientific content).

Related issues or PRs
#988

Is your PR over 500 lines of code?
No.

sarahwooders

Thanks for adding this!

A minor nit: could you please set self.insert_chunk_size = self.get_optimal_chunk_size() in the class initialization, so we can avoid calling the function on every insert?
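The suggestion above, sketched under assumptions: ChunkedInserter is a hypothetical stand-in for the pgvector storage connector, and 32,767 is the pg8000 bind-parameter bound implied by #988. The chunk size is computed once in __init__ and reused by every insert call.

```python
# Hypothetical sketch: ChunkedInserter stands in for the pgvector connector.
from typing import Any, List, Sequence

MAX_PG8000_PARAMS = 32767  # assumed pg8000 bind-parameter limit


class ChunkedInserter:
    def __init__(self, columns: Sequence[str]):
        self.columns = list(columns)
        # Computed once at construction instead of on every insert call.
        self.insert_chunk_size = self.get_optimal_chunk_size()
        self.executed_batches: List[List[Any]] = []

    def get_optimal_chunk_size(self) -> int:
        return MAX_PG8000_PARAMS // len(self.columns)

    def insert_many(self, rows: Sequence[Sequence[Any]]) -> None:
        for start in range(0, len(rows), self.insert_chunk_size):
            chunk = rows[start:start + self.insert_chunk_size]
            # A real connector would run one multi-row INSERT here;
            # this sketch only records the flattened parameters.
            self.executed_batches.append([value for row in chunk for value in row])


inserter = ChunkedInserter(["id", "text", "embedding"])
inserter.insert_many([(i, f"t{i}", [0.0]) for i in range(25_000)])
print(inserter.insert_chunk_size, len(inserter.executed_batches))  # 10922 3
```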

Also, would you be able to write a test to ensure that the chunking is working properly? Unfortunately, all our tests use very small data, so I don't think this functionality will be covered. Maybe you could generate a long list of Passage objects and insert them?
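One possible shape for such a test, written pytest-style. It is a sketch under assumptions: FakePassage is a hypothetical stand-in for MemGPT's Passage with illustrative fields, insert_in_chunks is a hypothetical helper, and instead of touching a database it captures each batch's parameter list so the test can check that no batch exceeds the assumed 32,767-parameter limit and that no rows are dropped.

```python
from dataclasses import dataclass, fields
from typing import Callable, List, Sequence

MAX_PG8000_PARAMS = 32767  # assumed pg8000 bind-parameter limit


@dataclass
class FakePassage:
    # Hypothetical stand-in for MemGPT's Passage; the fields are illustrative.
    id: int
    text: str
    doc_id: str
    embedding: tuple


def insert_in_chunks(rows: Sequence[FakePassage], execute: Callable[[list], None]) -> None:
    column_names = [f.name for f in fields(FakePassage)]
    chunk_size = MAX_PG8000_PARAMS // len(column_names)
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # Flatten the chunk into the parameter list one INSERT would bind.
        execute([getattr(row, name) for row in chunk for name in column_names])


def test_large_insert_is_chunked() -> None:
    passages = [FakePassage(i, f"text {i}", "doc-1", (0.0,)) for i in range(20_000)]
    batches: List[list] = []
    insert_in_chunks(passages, batches.append)  # capture instead of executing SQL
    assert len(batches) > 1  # the data really was split
    assert all(len(batch) <= MAX_PG8000_PARAMS for batch in batches)
    assert sum(len(batch) for batch in batches) == len(passages) * 4  # nothing dropped
```

A real test would swap the capturing callback for the actual pgvector connector against a test database.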

Also, could you please just run the formatter with poetry run black --check . -l 140?

small fix for the insert-method of pgvector

f7edc9e

ArneJanning mentioned this pull request Feb 14, 2024

While storing vectors into pgvector: "struct.error: 'h' format requires -32768 <= number <= 32767" #988 (Closed)
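The error in #988 comes from Python's struct module refusing to pack a value outside the signed 16-bit range. A minimal reproduction, assuming pg8000 encodes the parameter count as a network-order signed short ("!h"); pack_param_count is a hypothetical helper, and the 33-column table is hypothetical too, chosen to show how a fixed 1,000-row chunk overflows once a table is wide enough.

```python
import struct


def pack_param_count(n: int) -> bytes:
    # Assumption: mirrors how pg8000 is believed to encode the parameter
    # count, as a network-order signed 16-bit integer ("!h").
    return struct.pack("!h", n)


ok = pack_param_count(1000 * 10)  # 1,000 rows x 10 columns = 10,000: fits

try:
    pack_param_count(1000 * 33)  # 1,000 rows x 33 columns = 33,000: too many
    overflow_error = None
except struct.error as exc:
    overflow_error = str(exc)

print(len(ok), overflow_error)
```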

sarahwooders self-requested a review February 14, 2024 18:14

sarahwooders changed the title ~~small fix for the insert-method of pgvector~~ perf: small fix for the insert-method of pgvector Feb 14, 2024

sarahwooders requested changes Feb 14, 2024
