Randomness of Milvus Collection #32733
I am working with the LangChain integration of Milvus and wanted to ask about how the Milvus HNSW algorithm works. Is there any degree of randomness introduced at any point of graph construction or search? Here is a general overview of the collection generation process that I am using:
I am not using any randomization in the insert process, and I have used the same embedding model and question set for all tests, yet I seem to get different search results each time I create a new collection. Does anyone know what the problem might be?
Replies: 1 comment 10 replies
Some questions:
I mean that two separate full test runs might end up with different segment sizes.
Let's say we want to test 1M entities with different index types.
Process 1: create collection A, insert the data batch by batch with 10,000 entities per batch, create the index, then test search.
Process 2: create collection B, insert the data batch by batch with 500 entities per batch, create the index, then test search.
Assume the total data size is 500MB.
Process 1 might produce these segments: 130MB + 130MB + 130MB + 110MB.
Process 2 might produce these segments: 120MB + 120MB + 120MB + 140MB.
The data is distributed differently across segments. Since each segment is indexed separately, the search results might differ slightly.
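To illustrate the point, here is a toy sketch of how the batch size alone can shift segment boundaries. The sealing rule (seal the growing segment once it reaches a threshold, checked after each batch) and the `seal_rows` value are assumptions made up for this example; they are not Milvus's actual sealing policy, and `simulate_segments` is a hypothetical helper, not a Milvus API.

```python
def simulate_segments(total_rows, batch_rows, seal_rows):
    """Toy model: append batches to a growing segment and seal it
    once it holds at least `seal_rows` rows (checked after each batch).
    This is an illustrative assumption, not Milvus's real policy."""
    segments = []
    current = 0
    remaining = total_rows
    while remaining > 0:
        n = min(batch_rows, remaining)  # the last batch may be smaller
        current += n
        remaining -= n
        if current >= seal_rows:
            segments.append(current)    # seal the segment
            current = 0
    if current:
        segments.append(current)        # leftover rows form the final segment
    return segments


# Same 1M rows, same hypothetical threshold, different batch sizes.
process_1 = simulate_segments(1_000_000, batch_rows=10_000, seal_rows=255_000)
process_2 = simulate_segments(1_000_000, batch_rows=500, seal_rows=255_000)

print(process_1)
print(process_2)
```

In this model both runs store exactly the same rows, but the rows land in differently sized segments, so an approximate index built per segment sees a different subset of the data in each run.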
If you test against the same collection, with no additional data inserted, then yes, …