Add Support for Weaviate #74

Open
13 tasks
dhruv-anand-aintech opened this issue Apr 16, 2024 · 13 comments

Comments

@dhruv-anand-aintech
Member

dhruv-anand-aintech commented Apr 16, 2024

Follow the guidelines at https://github.com/AI-Northstar-Tech/vector-io#adding-a-new-vector-database to implement support for Weaviate in vector-io.

Join the Discord server for the library at https://discord.gg/RZbXha62Fg, and ask any questions on the #vector-io-dev channel.

Checklist of features for completion

  • Add mapping of distance metric names
  • Support local and cloud instances
  • Automatically create Python classes for index being exported
  • Export
    • Get all indexes by default
    • Option to Specify index names to export
    • DB-specific command line options (make_parser)
    • Allow terminal input for each option above (via input() in Python, in export_vdb)
    • Handle multiple vectors per row
  • Import
    • DB-specific command line options (make_parser)
    • Handle multiple vectors per row
    • Allow terminal input for each option above (via input() in Python, in import_vdb)
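
For the "Add mapping of distance metric names" item, a minimal sketch of a two-way mapping. The vector-io side names used here (`cosine`, `euclidean`, `dot`, `hamming`, `manhattan`) are assumptions and should be swapped for whatever canonical names vector-io actually uses; the Weaviate side strings are the distance names Weaviate accepts in a collection's vector index configuration. `to_weaviate_distance` and `from_weaviate_distance` are hypothetical helpers.

```python
# Hypothetical mapping between vector-io's (assumed) canonical metric names
# and the distance names Weaviate accepts in a vector index configuration.
VDF_TO_WEAVIATE_DISTANCE = {
    "cosine": "cosine",
    "euclidean": "l2-squared",  # Weaviate uses squared L2; ranking is unchanged
    "dot": "dot",               # Weaviate's "dot" is the negated dot product
    "hamming": "hamming",
    "manhattan": "manhattan",
}

# Reverse direction, for exporting an existing Weaviate collection.
WEAVIATE_TO_VDF_DISTANCE = {v: k for k, v in VDF_TO_WEAVIATE_DISTANCE.items()}


def to_weaviate_distance(vdf_metric: str) -> str:
    """Translate an (assumed) vector-io metric name into a Weaviate distance name."""
    try:
        return VDF_TO_WEAVIATE_DISTANCE[vdf_metric.lower()]
    except KeyError:
        raise ValueError(f"Unsupported distance metric for Weaviate: {vdf_metric!r}") from None


def from_weaviate_distance(weaviate_metric: str) -> str:
    """Translate a Weaviate distance name back into the (assumed) vector-io name."""
    try:
        return WEAVIATE_TO_VDF_DISTANCE[weaviate_metric.lower()]
    except KeyError:
        raise ValueError(f"Unknown Weaviate distance metric: {weaviate_metric!r}") from None


print(to_weaviate_distance("Euclidean"))  # l2-squared
print(from_weaviate_distance("dot"))      # dot
```

Because `l2-squared` is the square of the Euclidean distance and Weaviate's `dot` is the negated dot product, nearest-neighbour rankings carry over, but raw distance values will not match the other system's numbers one-to-one.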
@dhruv-anand-aintech
Member Author

/bounty $50

algora-pbc bot commented Apr 16, 2024

💎 $50 bounty created by AI-Northstar-Tech
🙋 If you start working on this, comment /attempt #74 along with your implementation plan
👉 To claim this bounty, submit a pull request that includes the text /claim #74 somewhere in its body
📝 Before proceeding, please make sure you can receive payouts in your country
💵 Payment arrives in your account 2-5 days after the bounty is rewarded
💯 You keep 100% of the bounty award
🙏 Thank you for contributing to AI-Northstar-Tech/vector-io!

Attempts (start times in GMT+0):
🟢 @abhishek818, started Apr 21, 2024, 4:40:48 PM, solution: WIP
🟢 @emekaokoli19, started May 2, 2024, 8:24:16 AM, solution: #88

@abhishek818

abhishek818 commented Apr 21, 2024

/attempt #74

@dhruv-anand-aintech
Member Author

@abhishek818 have you made some progress on this task?

@abhishek818

Yes, I will raise a PR by tomorrow. I got busy with some other tasks.

@dhruv-anand-aintech
Member Author

@abhishek818 can you message me on Linkedin (https://www.linkedin.com/in/dhruv-anand-ainorthstartech/)? Thanks

@emekaokoli19

emekaokoli19 commented Apr 26, 2024

Hi @dhruv-anand-aintech, is this issue still open?

@dhruv-anand-aintech
Member Author

Hi @emekaokoli19, I'm waiting for a response from @abhishek818 on this one. But I have several other tasks that might interest you. Please ping me on LinkedIn and we can chat.

@abhishek818

abhishek818 commented Apr 26, 2024

Go ahead with this issue, @emekaokoli19; for now I'm stuck on several other tasks.

@abhishek818

abhishek818 commented Apr 26, 2024

@emekaokoli19, attaching my half-hearted attempt here (which I left unfinished) to give you a little head start (refer to the Python client for Weaviate):
Import script:

import json
from dotenv import load_dotenv
import numpy as np
from tqdm import tqdm
from grpc import RpcError
from typing import Any, Dict, List
from PIL import Image


import concurrent.futures

import weaviate
from weaviate.connect import ConnectionParams
from weaviate.classes.init import AdditionalConfig, Timeout
import os

from vdf_io.names import DBNames
from vdf_io.util import (
    expand_shorthand_path,
    get_qdrant_id_from_id,
    set_arg_from_input,
    set_arg_from_password,
)
from vdf_io.import_vdf.vdf_import_cls import ImportVDB
from vdf_io.meta_types import NamespaceMeta

load_dotenv()


class ImportWeaviate(ImportVDB):
    DB_NAME_SLUG = DBNames.WEAVIATE

    @classmethod
    def import_vdb(cls, args):
        """
        Import data to Weaviate
        """
        set_arg_from_input(
            args,
            "url",
            "Enter the URL of Weaviate instance (default: 'http://localhost:8099'): ",
            str,
            "http://localhost:8099",
        )
        set_arg_from_input(
            args,
            "http_secure",
            "Whether to use a secure channel for the http port. (default: False): ",
            bool,
            False,
        )
        set_arg_from_input(
            args,
            "grpc_host",
            "Enter the host to use for the gRPC connection (default: 'localhost'): ",
            str,
            "localhost",
        )
        set_arg_from_input(
            args,
            "grpc_port",
            "Enter the port to use for the gRPC connection (default: 50052): ",
            int,
            50052,
        )
        set_arg_from_input(
            args,
            "grpc_secure",
            "Whether to use a secure channel for the underlying gRPC API. (default: False): ",
            bool,
            False,
        )
        set_arg_from_input(
            args,
            "timeout_init",
            "(Optional) Enter the timeout for connection initialization (value in secs, default: 10): ",
            int,
            10,
        )
        set_arg_from_input(
            args,
            "timeout_query",
            "(Optional) Enter the timeout for query (value in secs, default: 45): ",
            int,
            45,
        )
        set_arg_from_input(
            args,
            "timeout_insert",
            "(Optional) Enter the timeout for data insertion (value in secs, default: 120): ",
            int,
            120,
        )
        set_arg_from_password(
            args,
            "weaviate_api_key",
            "Enter your Weaviate API key: ",
            "WEAVIATE_API_KEY"
        )
        set_arg_from_password(
            args,
            "openai_api_key",
            "(Optional) Enter your Open AI API key: ",
            "OPEN_AI_API_KEY"
        )
        weaviate_import = ImportWeaviate(args)
        weaviate_import.upsert_data()
        return weaviate_import

    @classmethod
    def make_parser(cls, subparsers):
        parser_weaviate = subparsers.add_parser(
            DBNames.WEAVIATE, help="Import data to Weaviate"
        )
        parser_weaviate.add_argument(
            "-u",
            "--url",
            type=str,
            help="Weaviate instance url",
            default="http://localhost:8099",
        )
        # store_true rather than type=bool: with type=bool, "--http_secure False" parses as True
        parser_weaviate.add_argument(
            "--http_secure",
            action="store_true",
            help="Use a secure channel for the HTTP port",
        )
        parser_weaviate.add_argument(
            "--grpc_host",
            type=str,
            help="Host for the gRPC connection",
            default="localhost",
        )
        parser_weaviate.add_argument(
            "--grpc_port",
            type=int,
            help="Port for the gRPC connection",
            default=50052,
        )
        parser_weaviate.add_argument(
            "--grpc_secure",
            action="store_true",
            help="Use a secure channel for the underlying gRPC API",
        )
        parser_weaviate.add_argument(
            "--timeout_init",
            type=int,
            help="Timeout for connection initialization (in secs)",
            default=10,
        )
        parser_weaviate.add_argument(
            "--timeout_query",
            type=int,
            help="Timeout for query (in secs)",
            default=45,
        )
        parser_weaviate.add_argument(
            "--timeout_insert",
            type=int,
            help="Timeout for data insertion (in secs)",
            default=120,
        )

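    def __init__(self, args):
        # call super class constructor
        super().__init__(args)
        # Local import keeps this method self-contained; split --url into the
        # HTTP host/port/scheme that ConnectionParams.from_params expects
        from urllib.parse import urlparse

        parsed_url = urlparse(args["url"])
        http_secure = bool(args.get("http_secure")) or parsed_url.scheme == "https"
        weaviate_api_key = args.get("weaviate_api_key")
        openai_api_key = args.get("openai_api_key")
        self.client = weaviate.WeaviateClient(
            connection_params=ConnectionParams.from_params(
                http_host=parsed_url.hostname,
                http_port=parsed_url.port or (443 if http_secure else 80),
                http_secure=http_secure,
                grpc_host=args["grpc_host"],
                grpc_port=args["grpc_port"],
                grpc_secure=args["grpc_secure"],
            ),
            # Authenticate only when a key was supplied (local instances often run without auth)
            auth_client_secret=(
                weaviate.auth.AuthApiKey(weaviate_api_key) if weaviate_api_key else None
            ),
            # Send the OpenAI header only when a key was supplied
            additional_headers=(
                {"X-OpenAI-Api-Key": openai_api_key} if openai_api_key else None
            ),
            additional_config=AdditionalConfig(
                timeout=Timeout(
                    init=args["timeout_init"],
                    query=args["timeout_query"],
                    insert=args["timeout_insert"],
                ),
            ),
        )
        # A WeaviateClient constructed directly must be connected explicitly
        self.client.connect()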

    def upsert_data(self):
        max_hit = False
        total_imported_count = 0
        # self.vdf_meta["indexes"] is a dict: index name -> list of namespace metadata
        index_meta: List[NamespaceMeta]
        for index_name, index_meta in tqdm(
            self.vdf_meta["indexes"].items(), desc="Importing indexes"
        ):
            ...  # rest of the import loop was left unfinished
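
The `--url` option packs the HTTP scheme, host and port into one string, while the v4 Python client's `ConnectionParams.from_params` takes them separately (`http_host`, `http_port`, `http_secure`, alongside `grpc_host`, `grpc_port`, `grpc_secure`). A minimal stand-alone sketch of that split using only the standard library; `split_http_url` is a hypothetical name, and falling back to port 443 or 80 when the URL has no explicit port is an assumption:

```python
from typing import Tuple
from urllib.parse import urlparse


def split_http_url(url: str) -> Tuple[str, int, bool]:
    """Return (host, port, secure) for an http(s) URL such as 'http://localhost:8099'.

    Hypothetical helper: defaulting a missing port to 443 (https) or 80 (http)
    is an assumption, not something the Weaviate client prescribes.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Expected an http(s) URL with a host, got {url!r}")
    secure = parsed.scheme == "https"
    port = parsed.port or (443 if secure else 80)
    return parsed.hostname, port, secure


print(split_http_url("http://localhost:8099"))  # ('localhost', 8099, False)
```

The three returned values map onto `http_host`, `http_port` and `http_secure`; the gRPC settings come from the script's separate `--grpc_*` options.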

@emekaokoli19

Thank you, @abhishek818. I will try to complete it.

@emekaokoli19

emekaokoli19 commented May 2, 2024

/attempt #74

algora-pbc bot commented May 2, 2024

💡 @emekaokoli19 submitted a pull request that claims the bounty. You can visit your bounty board to reward.
