Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added voyager in backend #1846

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

darshi1337
Copy link

See issue #1824.

@JoanFM
Copy link
Member

JoanFM commented Jan 12, 2024

Hello @darshi1337 ,

We need for you to sign the DCO agreement by signing off your commits

@JoanFM
Copy link
Member

JoanFM commented Jan 12, 2024

Thanks for the contribution,

since it seems to be a work in progress (no tests, print statements, etc...) I will put it as a Draft.

@JoanFM JoanFM marked this pull request as draft January 12, 2024 07:36
@darshi1337
Copy link
Author

Ohh forgot to ask something
Anything you think that I should add or improve before writing print statements and tests

Signed-off-by: darshi1337 <priyadarshi.annupam.mec22@itbhu.ac.in>
Copy link
Member

@JoanFM JoanFM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see any obvious missing thing. Just to clarify what is possible and not possible with the backend.

docarray/index/backends/voyager.py Show resolved Hide resolved
if columns_str:
columns_str = ', ' + columns_str

query = f'CREATE TABLE IF NOT EXISTS docs (doc_id INTEGER PRIMARY KEY, data BLOB{columns_str})'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this make that column data will not be filterable at all?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL query provided creates a table named docs with a data column defined as BLOB (Binary Large Object). The BLOB data type is generally used for storing binary data, and it does not inherently support filtering or indexing.

If the intention is to make the data column filterable, I might want to consider using a different data type based on the nature of the data being stored. For example, if the data column contains text-based information, changing the data type to TEXT could be more appropriate.

Here's an example modification to the query:

if columns_str:
    columns_str = ', ' + columns_str

query = f'CREATE TABLE IF NOT EXISTS docs (doc_id INTEGER PRIMARY KEY, data TEXT{columns_str})'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what is done in HNSWDocumentIndex I believe

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes like something similar

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good indeed to have it

Copy link

codecov bot commented Jan 12, 2024

Codecov Report

Attention: 116 lines in your changes are missing coverage. Please review.

Comparison is base (104b403) 85.06% compared to head (875d2c3) 84.01%.

Files Patch % Lines
docarray/index/backends/voyager.py 0.00% 116 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1846      +/-   ##
==========================================
- Coverage   85.06%   84.01%   -1.06%     
==========================================
  Files         136      137       +1     
  Lines        9260     9376     +116     
==========================================
  Hits         7877     7877              
- Misses       1383     1499     +116     
Flag Coverage Δ
docarray 84.01% <0.00%> (-1.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

if columns_str:
columns_str = ', ' + columns_str

query = f'CREATE TABLE IF NOT EXISTS docs (doc_id INTEGER PRIMARY KEY, data BLOB{columns_str})'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is what is done in HNSWDocumentIndex I believe

docarray/index/backends/voyager.py Show resolved Hide resolved
@JoanFM
Copy link
Member

JoanFM commented Feb 8, 2024

Hello @darshi1337 ,

is there any progress to be expected here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants