Add support for different knowledge retrieval methods #2

Open
transitive-bullshit opened this issue Nov 15, 2023 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

Owner

transitive-bullshit commented Nov 15, 2023

This is for the built-in retrieval tool.

Currently, the knowledge retrieval implementation is very naive: it simply returns the full contents of every attached file (source).

The current implementation also only supports text file types like text/plain and markdown, as no preprocessing or conversion is done at the moment.
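For reference, the naive behavior described above can be sketched roughly as follows. This is an illustration only, not the project's actual code: `AttachedFile`, `SUPPORTED_TYPES`, and `naiveRetrieval` are hypothetical names, and silently skipping unsupported file types is an assumption about how they are handled.

```typescript
// Hypothetical shape of an attached file; not the project's real type.
interface AttachedFile {
  id: string;
  mimeType: string;
  content: string;
}

// Assumption: only plain text and markdown are usable without conversion.
const SUPPORTED_TYPES = new Set(["text/plain", "text/markdown"]);

// Naive retrieval: ignores the query entirely and returns the full contents
// of every supported file, joined into one string.
function naiveRetrieval(_query: string, files: AttachedFile[]): string {
  return files
    .filter((file) => SUPPORTED_TYPES.has(file.mimeType))
    .map((file) => file.content)
    .join("\n\n");
}
```

The obvious downside is that the returned context grows with the total size of the attached files, regardless of what was asked.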

It shouldn't be too hard to add support for more legit knowledge retrieval approaches, which would require:

  • processForFileAssistant - File ingestion pre-processing for files marked with purpose: 'assistants'

    • converting non-text files to a common format like markdown (this is probably the hardest step to do well across all of the most common file types)
    • chunking files
    • embedding chunks
    • storing embeddings to an external vector store; make sure to store the file_id each chunk comes from for filtering purposes
  • retrievalTool - Performs knowledge retrieval for a given query on a set of file_ids for RAG.

    • embed query
    • semantic search over vector store filtering by the given file_ids
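The two pieces above could look something like the sketch below. It is a minimal, self-contained illustration under loud assumptions: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, `store` is an in-memory array standing in for an external vector store, `chunkText` uses fixed-size word windows, and the signatures of `processForFileAssistant` and `retrievalTool` are guesses for illustration, not the project's actual API. It also assumes the file has already been converted to markdown.

```typescript
// Hypothetical record stored per chunk; file_id is kept for filtering.
interface Chunk {
  fileId: string;
  text: string;
  embedding: number[];
}

const DIMS = 64;

// Toy stand-in for a real embedding model: hashed bag-of-words, L2-normalized.
function embed(text: string): number[] {
  const vec = new Array<number>(DIMS).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let hash = 0;
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) >>> 0;
    }
    const slot = hash % DIMS;
    vec[slot] = (vec[slot] ?? 0) + 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0)) || 1;
  return vec.map((x) => x / norm);
}

// Simplest possible chunking: fixed-size windows of whitespace-separated words.
function chunkText(text: string, maxWords = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}

// In-memory stand-in for an external vector store.
const store: Chunk[] = [];

// Ingestion: chunk, embed, and store each chunk with its source file_id.
function processForFileAssistant(fileId: string, markdown: string): void {
  for (const text of chunkText(markdown)) {
    store.push({ fileId, text, embedding: embed(text) });
  }
}

// Retrieval: embed the query, keep only chunks from the given file_ids,
// and rank them by cosine similarity (a dot product, since vectors are normalized).
function retrievalTool(query: string, fileIds: string[], topK = 3): Chunk[] {
  const queryVec = embed(query);
  const allowed = new Set(fileIds);
  return store
    .filter((chunk) => allowed.has(chunk.fileId))
    .map((chunk) => ({
      chunk,
      score: chunk.embedding.reduce((sum, x, i) => sum + x * (queryVec[i] ?? 0), 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((result) => result.chunk);
}
```

For example, after calling `processForFileAssistant("file_a", ...)` and `processForFileAssistant("file_b", ...)`, a call like `retrievalTool("some query", ["file_b"])` only ever returns chunks whose `fileId` is `"file_b"`. Swapping `embed` and `store` for a real embedding model and vector store would leave the overall flow unchanged.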

Integrations here with LangChain and/or LlamaIndex would be great for their flexibility, but we could also KISS and roll our own using https://github.com/dexaai/dexter
