Cannot debug similarity search #3378

Closed
ssdidis opened this issue Apr 20, 2024 · 1 comment
ssdidis commented Apr 20, 2024

I am trying to build a similarity search in Python, but I cannot debug this function:

def perform_similarity_search(query_text, index, embeddings, top_k=5):
    """Perform similarity search in the FAISS index for a given query text."""
    # Use the embeddings object to embed the query_text into a vector.
    # Ensure the text is passed as a list and the result is accessed correctly.
    query_vector = embeddings.encode([query_text])
    # Reshape the query_vector for compatibility with FAISS search method if necessary.
    # FAISS expects the query vector to be a 2D array.
    if len(query_vector.shape) == 1:
        query_vector = query_vector.reshape(1, -1)

    # Search the index using the reshaped query_vector.
    distances, indices = index.search(query_vector, top_k)  # Search the index for the top_k closest vectors
    return distances, indices

def run_indexing_pipeline():
    documents = fetch_documents(documents_dir)
    text_chunks = divide_documents_into_text_chunks(documents)
    embeddings_model = prepare_embeddings()
    faiss_index = build_and_store_faiss_index(text_chunks, embeddings_model, faiss_db_path)

    # Example query for testing purposes
    query = "Enter some example text here"
    distances, indices = perform_similarity_search(query, faiss_index, embeddings_model)
    print("Distances:", distances)
    print("Indices:", indices)

ERRORS:
Traceback (most recent call last):
  File "/home/ubuntu/new_d.py", line 61, in <module>
    run_indexing_pipeline()
  File "/home/ubuntu/new_d.py", line 56, in run_indexing_pipeline
    distances, indices = perform_similarity_search(query, faiss_index, embeddings_model)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ubuntu/new_d.py", line 36, in perform_similarity_search
    query_vector = embeddings.encode([query_text])

This is the error being shown. Please let me know how I can correct it.

mlomeli1 (Contributor) commented

It looks like, in this pipeline, the function build_and_store_faiss_index() is a wrapper that calls the Faiss library. The rest of the functions, however, are either user-defined or come from some other library; I can't really tell, because your code is not reproducible. Your traceback points to embeddings.encode([query_text]), which is probably not a Faiss call, since core Faiss does not support embedding text. @ssdidis, this is out of scope for us.
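
For anyone landing here with the same traceback, one way to act on the point above is to test the embedding step on its own, separately from the Faiss call. The sketch below is only an illustration under assumptions, since the thread does not show what prepare_embeddings() returns or what the final error message was. It assumes the embeddings object exposes either encode() (as sentence-transformers models do) or embed_query() (as LangChain embedding classes do). The names embed_query_as_matrix and StubEmbedder are hypothetical and are not part of Faiss or of the original code.

```python
import numpy as np


def embed_query_as_matrix(query_text, embeddings):
    """Return the query embedding as a float32 array of shape (1, d)."""
    if hasattr(embeddings, "encode"):
        # Assumed sentence-transformers style: list of texts in, 2D array out.
        vector = embeddings.encode([query_text])
    elif hasattr(embeddings, "embed_query"):
        # Assumed LangChain style: one text in, list of floats out.
        vector = embeddings.embed_query(query_text)
    else:
        raise TypeError(
            f"{type(embeddings).__name__} has neither encode() nor embed_query()"
        )
    # Faiss expects a C-contiguous float32 matrix with one row per query.
    matrix = np.ascontiguousarray(vector, dtype=np.float32)
    if matrix.ndim == 1:
        matrix = matrix.reshape(1, -1)
    return matrix


class StubEmbedder:
    """Hypothetical stand-in for a real model, used only to exercise the helper."""

    def embed_query(self, text):
        return [float(len(text)), 1.0, 0.0]


query_matrix = embed_query_as_matrix("hello", StubEmbedder())
print(query_matrix.shape, query_matrix.dtype)
```

If the helper succeeds with the real embeddings object, the resulting matrix can be passed to index.search(query_matrix, top_k). If it raises, the problem lies in the embedding step rather than in Faiss.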
