faiss index and retriever not able to save #3429

Closed
kashiftriffort opened this issue May 13, 2024 · 1 comment

Comments

@kashiftriffort

I am using a Faiss index to store embeddings alongside their source text. The embeddings are created and added to the index in batches of chunks. However, I am unable to save the index (retriever) and the text/embedding pairs to disk, for example as a NumPy file or a pickle. The index is also slow to respond. The code is below.

import faiss
import numpy as np

# Note: create_embedding, create_list_of_embedding and g2m_packages_documents
# are defined elsewhere and are not shown in this snippet.

def generate_text_embedding_pair(list_of_texts, list_of_embedding):
    # Pair each text with its embedding, preserving order.
    return list(zip(list_of_texts, list_of_embedding))

def generate_array(list_of_embedding):
    # Faiss expects a 2-D float32 array of shape (n, d).
    embedding_array = np.array(list_of_embedding, dtype="float32")
    print(embedding_array.shape)
    return embedding_array


def getembedding(list_of_chunks):
    # IVF index over 1024-dimensional vectors with 50 inverted lists.
    quantizer = faiss.IndexFlatL2(1024)
    retriever = faiss.IndexIVFFlat(quantizer, 1024, 50)

    text_pair = []
    list_of_texts = [chunk.page_content for chunk in list_of_chunks]
    for i in range(0, len(list_of_texts), 512):
        chunk_batch = list_of_texts[i:i + 512]
        embedding_response = create_embedding(chunk_batch)
        list_of_embedding = create_list_of_embedding(embedding_response)
        # extend (not append) keeps text_pair flat, so position i matches Faiss id i.
        text_pair.extend(generate_text_embedding_pair(chunk_batch, list_of_embedding))
        embedding_array = generate_array(list_of_embedding)
        # Train the coarse quantizer once, on the first batch; that batch
        # must contain at least as many vectors as there are lists (50).
        if not retriever.is_trained:
            retriever.train(embedding_array)
        retriever.add(embedding_array)
        print(retriever.ntotal)
    return retriever, text_pair

g2m_packages_documents_retriever, g2m_packages_documents_pair = getembedding(g2m_packages_documents)
@kashiftriffort changed the title from "faiss index is slow in notebook" to "faiss index and retriever not able to save" on May 13, 2024
@mlomeli1
Contributor

Currently, you are not saving the index anywhere in your code, so the problem is not reproducible. The rest of the code is also not executable as posted. I would recommend two things: i) create a fully executable, reproducible toy example that uses faiss.write_index(<INDEX_OBJECT>, <INDEX_PATH>), to understand how this method works; ii) try to integrate it into your code. You can still post here, but please be sure to paste the full code; otherwise we cannot help much.
