Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] feat: add support for LanceDB as vector db provider #210

Closed
wants to merge 21 commits into from

Conversation

sp6370
Copy link
Contributor

@sp6370 sp6370 commented Mar 26, 2024

Closes #238

Ellipsis 🚀 This PR description was created by Ellipsis for commit 0d085e0.

Summary:

This PR adds support for LanceDB as a vector database provider by updating dependencies, modifying the list of supported providers, and adding a new LanceDB class.

Key points:

  • Added lancedb as an optional dependency in pyproject.toml
  • Included lancedb in the list of supported providers in VectorDBProvider class in r2r/core/providers/vector_db.py
  • Added a new file r2r/vector_dbs/lancedb/base.py defining the LanceDB class that extends VectorDBProvider

Generated with ❤️ by ellipsis.dev

Copy link

vercel bot commented Mar 26, 2024

@sp6370 is attempting to deploy a commit to the Sciphi-Team Team on Vercel.

A member of the Team first needs to authorize it.

@sp6370 sp6370 marked this pull request as draft March 26, 2024 12:26
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested.

  • Reviewed the entire pull request up to 0d085e0
  • Looked at 98 lines of code in 3 files
  • Took 1 minute and 40 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 0 additional comments because they didn't meet confidence threshold of 50%.

Workflow ID: wflow_5hPttOs1lwtkvqhj


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

) -> None:
self.collection_name = collection_name
try:
import pyarrow # TODO ADD Dependency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pyarrow package is imported here but it is not listed as a dependency in the pyproject.toml file. Please add it to the list of dependencies to ensure it is installed in the environment where this code is run.

except Exception:
# TODO - Handle more appropriately - create collection fails when it already exists
# https://lancedb.github.io/lancedb/python/python/#lancedb.db.DBConnection.create_table
print(Exception)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling here is not providing useful information for debugging. Instead of printing the Exception class, print the actual exception instance e to get more detailed information about what went wrong.

# TODO - Handle more appropriately - create collection fails when it already exists
# https://lancedb.github.io/lancedb/python/python/#lancedb.db.DBConnection.create_table
print(Exception)
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling here could lead to silent failures which are hard to debug. Instead of simply passing when an exception occurs, consider logging the exception or re-raising it after logging.

@jcllobet
Copy link
Contributor

Interesting! Sharing the specs:

  • Integrate LanceDB and local LLaMA support (gguf / OLLaMA) to enable running pipelines fully locally without relying on cloud services. Demonstrate setting this up and running everything locally.
  • Since LanceDB just launched lanceDB cloud we could also support that.

@sp6370

This comment was marked as resolved.

@AyushExel
Copy link

@sp6370 sounds good

@sp6370

This comment was marked as resolved.

@AyushExel
Copy link

please add @raghavdixit99 too.. He owns integrations and has some ideas for this integration as well

pyarrow.field(
"vector", pyarrow.list_(pyarrow.float32(), dimension)
),
# TODO Handle storing metadata
Copy link
Contributor Author

@sp6370 sp6370 Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AyushExel I need a column in the table to store metadata information associated with the vector.

Metadata has the following type:

MetadataValues = Union[str, int, float, bool, List[str]]
Metadata = Dict[str, MetadataValues]

How can I achieve this with Pyarrow/LanceDB?

@sp6370 sp6370 closed this May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Integrate LanceDB
3 participants