I'm using Chroma in a Python chat-style app to store what could be considered entities and to do RAG over a few hundred documents. This data is mostly static: it changes rarely, and only slightly when it does (think a few new entities/keywords per hour and/or a couple more RAG articles per day). However, every time I run the import scripts, even at 1-minute intervals, the SQLite DB grows by 50-100%. For example:
run 1: from an empty DB to 35 MB
run 2 (a few minutes later): 62 MB
run 3 (a few minutes later): 89 MB
run 4 (a few minutes later): 113 MB
I haven't diffed the data because it comes from multiple sources, but I expect it was 99.99% identical on every import.
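One way to check the 99.99% estimate, and to avoid rewriting records that did not change, is to fingerprint each document and compare against fingerprints kept from the previous run. This is a minimal pure-Python sketch under stated assumptions: `content_hash` and `plan_import` are hypothetical helpers, and it assumes each record has a stable ID and that its hash is stored somewhere (for example in the record's metadata under a hypothetical `content_hash` key). Chroma does not do this itself.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text (SHA-256 hex digest)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_import(stored: dict[str, str], incoming: dict[str, str]) -> dict[str, list[str]]:
    """Classify an import run against what is already stored.

    stored:   id -> content hash saved by a previous run (hypothetical metadata key)
    incoming: id -> document text produced by the current run
    Returns the ids that are new, changed, unchanged, or no longer present.
    """
    plan: dict[str, list[str]] = {"new": [], "changed": [], "unchanged": [], "removed": []}
    for doc_id, text in incoming.items():
        if doc_id not in stored:
            plan["new"].append(doc_id)
        elif stored[doc_id] != content_hash(text):
            plan["changed"].append(doc_id)
        else:
            plan["unchanged"].append(doc_id)
    # Ids that existed before but were not produced by this run.
    plan["removed"] = [doc_id for doc_id in stored if doc_id not in incoming]
    return plan


# Usage: "a" is identical, "b" was edited, "c" is brand new.
stored = {"a": content_hash("alpha"), "b": content_hash("beta")}
incoming = {"a": "alpha", "b": "beta v2", "c": "gamma"}
print(plan_import(stored, incoming))
# {'new': ['c'], 'changed': ['b'], 'unchanged': ['a'], 'removed': []}
```

Under this approach, only the `new` and `changed` ids would be passed to the collection's `upsert`. The stored hashes could be read back with `collection.get(ids=..., include=["metadatas"])`. Whether this slows the growth depends on what the import scripts currently write; the sketch assumes every run re-sends every record.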
The problem is that the DB grows very fast (it reached 3 GB in production after a few days), and at that size Chroma becomes unusable: it saturates all CPU cores and never returns the data.
PS: judging by the expansion rate, each run seems to add roughly the initial 35 MB.
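The four sizes reported above can be turned into per-run growth figures. This is plain arithmetic on the numbers in the report and nothing more.

```python
# Database file sizes (MB) after each import run, as reported above.
sizes_mb = [35, 62, 89, 113]

# Absolute growth contributed by each run after the first.
deltas = [after - before for before, after in zip(sizes_mb, sizes_mb[1:])]
mean_delta = sum(deltas) / len(deltas)

# Growth of each run relative to the size before it, in percent.
pct = [round(100 * d / before, 1) for d, before in zip(deltas, sizes_mb)]

print(deltas)      # [27, 27, 24]
print(mean_delta)  # 26.0
print(pct)         # [77.1, 43.5, 27.0]
```

Each later run adds about 24-27 MB, somewhat less than the initial 35 MB but of the same order, while the relative growth falls from about 77% to 27%. A roughly constant absolute increase is consistent with each run appending a similar amount of data rather than replacing it, though the numbers alone do not prove that.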
I suspect most of the expansion here is coming from the WAL (write-ahead log). Unfortunately, we don't have first-party support for cleaning the WAL right now, but @tazarov has some community-supported tools.
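To check the WAL theory before reaching for other tools, one could count rows per table in the persisted SQLite file. This is a sketch under assumptions that should be verified against your own file: that the database is the default `chroma.sqlite3` inside the persist directory, and that in 0.4.x Chroma keeps its WAL in a table named `embeddings_queue`. The script opens the file read-only and does not clean or change anything.

```python
import sqlite3
import sys


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every non-internal table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts: dict[str, int] = {}
    for name in names:
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + name.replace('"', '""') + '"'
        try:
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        except sqlite3.OperationalError:
            # e.g. a virtual table whose module (such as FTS5) is missing from this build
            continue
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_rows.py /path/to/persist_dir/chroma.sqlite3  (assumed file name)
    db = sqlite3.connect(f"file:{sys.argv[1]}?mode=ro", uri=True)  # read-only
    try:
        for table, n in sorted(table_row_counts(db).items(), key=lambda kv: -kv[1]):
            print(f"{n:>12}  {table}")
    finally:
        db.close()
```

If `embeddings_queue` turns out to hold several times as many rows as `embeddings`, that would fit the suspicion that repeated imports accumulate in the WAL. It is a diagnostic only, not a fix.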
Versions
chromadb 0.4.24
python 3.10.10
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.9.2009 (Core)
Release: 7.9.2009
Relevant log output
No response