Vectorstore/lancedb #889

Open: wants to merge 5 commits into base: main


akashAD98

  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
    Added LanceDB as a vector store.

  • Why was this change needed? (You can also link to an open issue here)
    LanceDB is a serverless vector database for AI applications. It makes it easy to add long-term memory to LLM apps.


vercel bot commented Mar 24, 2024

@akashAD98 is attempting to deploy a commit to the Arc53 Team on Vercel.

A member of the Team first needs to authorize it.


codecov bot commented Mar 24, 2024

Codecov Report

Attention: Patch coverage is 52.63158%, with 9 lines in your changes missing coverage. Please review.

Project coverage is 20.19%. Comparing base (3c49206) to head (8c4f96d).
Report is 30 commits behind head on main.

❗ Current head 8c4f96d differs from pull request most recent head 187d7be. Consider uploading reports for the commit 187d7be to get more accurate results

Files | Patch % | Lines
application/vectorstore/lancedb.py | 50.00% | 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #889      +/-   ##
==========================================
+ Coverage   20.00%   20.19%   +0.18%     
==========================================
  Files          72       73       +1     
  Lines        3264     3283      +19     
==========================================
+ Hits          653      663      +10     
- Misses       2611     2620       +9     

☔ View full report in Codecov by Sentry.


vercel bot commented Mar 27, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Updated (UTC)
docs-gpt | ✅ Ready | Mar 27, 2024 7:12pm

@akashAD98
Author

akashAD98 commented Mar 28, 2024

Any chance of merging this? @ajaythapliyal @dartpain

@siiddhantt
Collaborator

Thanks @akashAD98 for your contribution and the effort you've put into this PR. I've taken a look at the code, and I have a few points I think we should address before merging:

  • Error Handling for docs_init: It's crucial to handle cases where docs_init is not provided. Without proper initialisation, the object won't be instantiated correctly, leading to potential issues with other methods dependent on it.
  • Missing Configuration Variables from Settings: It seems like there's a gap in providing necessary configuration variables from settings. This could limit the flexibility and usability of the module.
  • Implementation of delete_index method.
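
A minimal sketch of the first point, not the PR's actual code: the class name, the exception name, and the in-memory list standing in for the LanceDB index are all hypothetical. It only illustrates failing fast in the constructor when docs_init is missing or empty, so later methods never see a half-built object.

```python
class StoreNotInitializedError(ValueError):
    """Hypothetical error raised when no documents are given to build the index."""


class GuardedStore:
    """Hypothetical stand-in for the LanceDB wrapper discussed in this PR."""

    def __init__(self, docs_init=None):
        # Fail fast: without documents there is nothing to build the index from.
        if docs_init is None:
            raise StoreNotInitializedError("docs_init is required to build the index")
        docs = list(docs_init)
        if not docs:
            raise StoreNotInitializedError("docs_init must contain at least one document")
        # A plain list stands in for the real vector index (assumption for the sketch).
        self.docsearch = docs

    def search(self, query, k=2):
        # Naive substring match, only here to make the sketch runnable.
        return [doc for doc in self.docsearch if query in doc][:k]
```

With this shape, a missing docs_init surfaces at construction time rather than as an attribute error inside search or add_texts.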


Overall, addressing these points will enhance the robustness and usability of the module. Looking forward to your updates!

@akashAD98
Author

akashAD98 commented Apr 4, 2024

Thanks for the reply and review, @siiddhantt.
1. Done.
2. I'm already importing settings, and no other configuration is needed.
3. LanceDB doesn't support delete methods as of now, at least in its LangChain integration when I checked.

@akashAD98
Author

@siiddhantt @dartpain

@dartpain
Contributor

My concern is how we can pass the URI to make this work, for example:

import lancedb

uri = "data/sample-lancedb"
db = lancedb.connect(uri)

Check out the quickstart
https://lancedb.github.io/lancedb/basic/#installation
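
One way the URI could be threaded through configuration is sketched below. The setting names LANCEDB_URI and LANCEDB_TABLE, the LanceSettings class, and the open_connection helper are assumptions made for illustration and do not exist in the project; the connect function is injected so the sketch runs without LanceDB installed, and real code would pass lancedb.connect.

```python
import os
from dataclasses import dataclass, field


@dataclass
class LanceSettings:
    # Hypothetical setting names; the defaults mirror the quickstart example above.
    LANCEDB_URI: str = field(
        default_factory=lambda: os.getenv("LANCEDB_URI", "data/sample-lancedb")
    )
    LANCEDB_TABLE: str = field(
        default_factory=lambda: os.getenv("LANCEDB_TABLE", "vectorstore")
    )


def open_connection(settings, connect):
    """Open a connection using the configured URI.

    `connect` is any callable taking a URI; in real code it would be lancedb.connect.
    """
    return connect(settings.LANCEDB_URI)
```

Keeping the URI in settings means a local path and a hosted URI differ only by an environment variable, not by code.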

@akashAD98
Author

akashAD98 commented May 5, 2024

We can also pass the URI, e.g. uri="/tmp/lancedb":

from langchain_community.vectorstores import LanceDB
from application.vectorstore.base import BaseVectorStore
from application.core.settings import settings

class LancedbStore(BaseVectorStore):
    def __init__(self, uri, embeddings_key):
        super().__init__()
        self.uri = uri
        self.embeddings_key = embeddings_key
        self.docsearch = None  
        
        # Initialize the embeddings using the provided key
        embeddings = self._get_embeddings(settings.EMBEDDINGS_NAME, self.embeddings_key)
        
        # Initialize LanceDB with the appropriate URI and embeddings
        self.docsearch = LanceDB(
            uri=self.uri,
            embedding=embeddings,
            api_key=settings.LANCE_API_KEY,  # Assuming API Key is managed in settings
            region=settings.LANCE_REGION    # Assuming Region is managed in settings
        )
    
    def search(self, query, k=5, **kwargs):
        # Perform a similarity search using LanceDB
        if self.docsearch:
            return self.docsearch.similarity_search(query=query, k=k, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")
    
    def add_texts(self, texts, metadatas=None, ids=None, **kwargs):
        # Add texts to the LanceDB instance
        if self.docsearch:
            return self.docsearch.add_texts(texts, metadatas=metadatas, ids=ids, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")
    
    def delete(self, ids=None, delete_all=False, filter=None, drop_columns=None, name=None, **kwargs):
        # Delete documents from the LanceDB instance
        if self.docsearch:
            self.docsearch.delete(ids=ids, delete_all=delete_all, filter=filter, drop_columns=drop_columns, name=name, **kwargs)
        else:
            raise ValueError("LanceDB instance is not initialized.")
    
    def save_local(self, *args, **kwargs):
        # Currently, it's just a placeholder as LanceDB operations are handled internally
        pass


Could it be done like this? I haven't tested it; this is just the approach. @dartpain
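
Since the proposal above is untested, a small sketch of how the wrapper's delegation could be exercised without a LanceDB install may help. Both classes here are hypothetical: LancedbStoreSketch is a trimmed-down variant that takes its backend as an argument, and RecordingBackend is a stub that imitates only the two methods used. The sketch also folds the repeated "is it initialized?" check into one helper.

```python
class LancedbStoreSketch:
    """Hypothetical, trimmed-down variant of the class proposed above."""

    def __init__(self, backend=None):
        # In real code this would be the LangChain LanceDB instance.
        self.docsearch = backend

    def _require_backend(self):
        # Single place for the initialization check used by every method.
        if self.docsearch is None:
            raise ValueError("LanceDB instance is not initialized.")
        return self.docsearch

    def search(self, query, k=5, **kwargs):
        return self._require_backend().similarity_search(query=query, k=k, **kwargs)

    def add_texts(self, texts, metadatas=None, ids=None, **kwargs):
        return self._require_backend().add_texts(
            texts, metadatas=metadatas, ids=ids, **kwargs
        )


class RecordingBackend:
    """Stub backend so the wrapper can be tested without LanceDB."""

    def __init__(self):
        self.texts = []

    def add_texts(self, texts, metadatas=None, ids=None, **kwargs):
        self.texts.extend(texts)
        # Return caller-supplied ids, or positional ids as strings.
        return ids or [str(i) for i in range(len(texts))]

    def similarity_search(self, query, k=5, **kwargs):
        # Substring match stands in for vector similarity.
        return [text for text in self.texts if query in text][:k]
```

Injecting the backend keeps the wrapper's logic testable in CI; the real constructor arguments (uri, api_key, region) would still need a check against the installed langchain_community version.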

Labels
application Application
Projects
Status: In Progress
3 participants