faiss index and retriever not able to save #3429

kashiftriffort · 2024-05-13T06:29:46Z

I am using a faiss index to store embeddings and text. The embeddings are being created, and we are saving them chunk by chunk. However, I am unable to save the embeddings and the text retriever to some kind of numpy or pickle file. Also, the index response is slow. Below is the code.

```python
import numpy as np
import faiss

# create_embedding, create_list_of_embedding and g2m_packages_documents
# are not shown in the issue.

def generate_text_embedding_pair(list_of_texts, list_of_embedding):
    # pair each chunk text with its embedding vector
    text_embedding_pairs = list(zip(list_of_texts, list_of_embedding))
    return text_embedding_pairs

def generate_array(list_of_embedding):
    # faiss works on 2-D float32 arrays of shape (n, 1024)
    embedding_array = np.array(list_of_embedding, dtype="float32")
    print(embedding_array.shape)
    return embedding_array


g2m_packages_documents_quantizer = faiss.IndexFlatL2(1024)
g2m_packages_documents_retriever = faiss.IndexIVFFlat(g2m_packages_documents_quantizer, 1024, 50)

g2m_packages_documents_pair = []

def getembedding(list_of_chunks):
    # IVF index: 1024-dim vectors, 50 cells, flat L2 coarse quantizer
    quantizer = faiss.IndexFlatL2(1024)
    retriever = faiss.IndexIVFFlat(quantizer, 1024, 50)

    text_pair = []
    list_of_texts = [chunk.page_content for chunk in list_of_chunks]
    # embed the texts in batches of 512
    for i in range(0, len(list_of_texts), 512):
        chunk_batch = list_of_texts[i:i + 512]
        embedding_response = create_embedding(chunk_batch)
        list_of_embedding = create_list_of_embedding(embedding_response)
        text_embedding_pairs = generate_text_embedding_pair(chunk_batch, list_of_embedding)
        text_pair.append(text_embedding_pairs)  # one nested list per batch
        embedding_array = generate_array(list_of_embedding)
        print(retriever.is_trained)
        retriever.train(embedding_array)  # train is called once per batch
        retriever.add(embedding_array)
        print(retriever.ntotal)
    return retriever, text_pair

g2m_packages_documents_retriever, g2m_packages_documents_pair = getembedding(g2m_packages_documents)
```


mlomeli1 · 2024-05-13T15:46:59Z

Currently, you are not saving the index anywhere in your code, so this is not reproducible. Furthermore, the rest of the code is not executable. I would recommend two things: i) create a fully executable, reproducible toy example where you use faiss.write_index(<INDEX_OBJECT>,<INDEX_PATH>) to understand how this method works; ii) try to integrate it into your code. You can still post here, but please be sure to paste the full code; otherwise we cannot help much.

kashiftriffort changed the title ~~faiss index is slow in notebook~~ faiss index and retriever not able to save May 13, 2024

mlomeli1 added the cant-repro label May 13, 2024

mlomeli1 closed this as completed May 24, 2024
