Generate and store image embeddings in search index #748

Closed
7 tasks done
Tracked by #323
adamdougal opened this issue Apr 24, 2024 · 4 comments
Labels
subtask A subtask

@adamdougal
Collaborator

adamdougal commented Apr 24, 2024

Required by #323

Description

Generate image embeddings using computer vision and store them in a modified search index.

Tasks

@adamdougal
Collaborator Author

Update 26th April:

  • I'm investigating what changes the app code and index need in order to store and query an additional image embedding
  • I've hit a problem: LangChain does not appear to support Azure AI Computer Vision, nor does it appear able to query AzureSearch for more than one vector embedding.
  • One option that might work is to have two separate indexes and combine the search results in the application code. I'm not 100% sure yet whether this is viable
  • Another option is to remove the use of LangChain in the QuestionAnswerTool and AzureSearchHelper
    • We would still keep the LangChain orchestration option, but it would only be used in the LangChainAgent
    • This has also been done (for now) in the azure-sample-openai-demo repo
    • Benefits:
      • This would give us full control over how we query and store data in Azure AI Search
      • We wouldn't be blocked on LangChain updates to use the latest features from OpenAI and Azure
      • Potentially simpler code base
    • Downsides:
      • Potentially significant breaking changes required

My next steps are to investigate whether removing LangChain makes it easier to integrate vision, and to see what else it might affect. If all goes well, I'll raise an ADR to discuss further with the team.
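
For the two-index option mentioned above, combining the results in application code could look roughly like the sketch below. It uses reciprocal rank fusion (RRF), a common way to merge ranked lists. This is an illustration only: the function name, the document IDs and the choice of RRF are assumptions, not something decided in this issue.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document ID for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a text index and an image index.
text_results = ["a", "b", "c"]
image_results = ["b", "d"]
merged = reciprocal_rank_fusion([text_results, image_results])
```

A document that appears in both result lists (here `"b"`) accumulates score from each, so it tends to rise to the top of the merged ranking.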

adamdougal added a commit that referenced this issue Apr 30, 2024
- To enable storing and querying image embeddings

Required by #748
github-merge-queue bot pushed a commit that referenced this issue Apr 30, 2024
* Raise ADR proposing to remove Langchain from tools

- To enable storing and querying image embeddings

Required by #748

* Add note about what we lose by removing langchain

* Optimism
@adamdougal
Collaborator Author

Update 1st May:

  • We are proceeding with removing LangChain from the tools
  • Before starting to make this change, I am going to expand the current functional tests to ensure we are testing index creation. This will help ensure no unexpected changes are being made to the index.
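
One possible shape for such a check is to compare the index definition that gets created against an expected snapshot. The sketch below assumes the definition is available as a dict with a `fields` list, similar to what the Azure AI Search REST API returns. The helper name `diff_index_fields` and the sample field names are hypothetical, and the real functional tests may work differently.

```python
def diff_index_fields(expected: dict, actual: dict) -> dict:
    """Report fields that are missing, unexpected, or changed in `actual`."""
    expected_fields = {field["name"]: field for field in expected["fields"]}
    actual_fields = {field["name"]: field for field in actual["fields"]}
    shared = expected_fields.keys() & actual_fields.keys()
    return {
        "missing": sorted(expected_fields.keys() - actual_fields.keys()),
        "unexpected": sorted(actual_fields.keys() - expected_fields.keys()),
        "changed": sorted(
            name for name in shared if expected_fields[name] != actual_fields[name]
        ),
    }


# Hypothetical snapshot and created index, for illustration only.
expected_index = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": True},
    ]
}
created_index = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "content", "type": "Edm.String", "searchable": False},
        {"name": "image_vector", "type": "Collection(Edm.Single)"},
    ]
}
differences = diff_index_fields(expected_index, created_index)
```

An empty result in all three lists would indicate the index was created exactly as the snapshot describes.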

adamdougal added a commit that referenced this issue May 2, 2024
- In preparation for changing the tools we use to create it
- This ensures that we don't make any unintended changes to the index

Required by #748
github-merge-queue bot pushed a commit that referenced this issue May 3, 2024
- In preparation for changing the tools we use to create it
- This ensures that we don't make any unintended changes to the index

Required by #748
adamdougal added a commit that referenced this issue May 7, 2024
- This is preparation for adding an additional image vector field for
  advanced image processing
- I've tried to keep changes to a minimum
- Still using langchain in the question and answer tool for now
- Increased unit test coverage

Required by #748
adamdougal added a commit that referenced this issue May 9, 2024
- This is preparation for adding an additional image vector field for
  advanced image processing
- I've tried to keep changes to a minimum
- Still using langchain in the question and answer tool for now
- Increased unit test coverage

Required by #748
@adamdougal
Collaborator Author

adamdougal commented May 13, 2024

13th May:

Mini code review:

  • Include the original exception when catching and re-raising
  • Move inline patches to decorators (annotations)
  • Make sure new files are PEP 8 compliant
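
The first two review points can be illustrated with a short sketch. It assumes Python's `raise ... from ...` for keeping the original exception, and `unittest.mock.patch` used as a decorator in place of an inline `with patch(...)` block. `EmbeddingError`, `parse_embedding` and `chained_cause` are hypothetical names, not code from the PR.

```python
import json
from unittest.mock import patch


class EmbeddingError(Exception):
    """Hypothetical domain error used only for this illustration."""


def parse_embedding(raw: str) -> list:
    try:
        return json.loads(raw)["vector"]
    except (KeyError, ValueError) as error:
        # "from error" stores the original exception on __cause__, so the
        # traceback shows both the wrapper and the underlying failure.
        raise EmbeddingError("Could not parse embedding response") from error


# The patch is declared as a decorator rather than inline in the body.
# The mock object is passed in as the first argument.
@patch("json.loads", side_effect=ValueError("malformed response"))
def chained_cause(loads_mock):
    try:
        parse_embedding("{}")
    except EmbeddingError as error:
        return error.__cause__
    return None
```

Calling `chained_cause()` returns the original `ValueError`, which shows that the cause survives the re-raise.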

Get wider feedback:

  • The use of pytest-httpserver in unit tests

@adamdougal
Collaborator Author

14th May Update:

  • A PR has been raised that adds the call to Computer Vision to generate image embeddings
  • Next steps are to take these embeddings and store them in a modified index
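
As a rough sketch of what a call to generate an image embedding might involve, the example below builds a request for the Azure AI Vision `vectorizeImage` REST operation using only the standard library. The API version, the `modelVersion=latest` parameter, the endpoint and the function names are assumptions for illustration. The PR itself may use a different client or API version.

```python
import json
import urllib.request

# Assumption: a preview API version of the image retrieval API. Check which
# version your Azure AI Vision resource supports before relying on this.
API_VERSION = "2023-02-01-preview"


def build_vectorize_image_request(
    endpoint: str, api_key: str, image_url: str
) -> urllib.request.Request:
    """Build (but do not send) a vectorizeImage request for an image URL."""
    url = (
        f"{endpoint.rstrip('/')}/computervision/retrieval:vectorizeImage"
        f"?api-version={API_VERSION}&modelVersion=latest"
    )
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_vector(response_body: bytes) -> list:
    """Extract the embedding from a response shaped like {"vector": [...]}."""
    return json.loads(response_body)["vector"]
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` and handing the response body to `parse_vector`. That step is left out here because it needs a real endpoint and key.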

adamdougal added a commit that referenced this issue May 15, 2024
- hard coded the vector search dimensions for now
  - once we have implemented the `vectorizeText` call when searching, we
    could do the same thing we've done for GPT embeddings

Required by #748
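
To make the hard-coded dimensions concrete, the sketch below shows one way an additional image vector field could be described and guarded. It assumes the field is expressed in the shape the Azure AI Search REST API uses for vector fields, and that the image embeddings have 1024 dimensions. The field name, profile name and helper functions are hypothetical and are not taken from the commit.

```python
# Assumption: the image embedding model returns 1024-dimensional vectors.
IMAGE_VECTOR_DIMENSIONS = 1024


def image_vector_field(
    name: str = "image_vector", profile: str = "image-vector-profile"
) -> dict:
    """Describe a vector field with a hard-coded dimension count."""
    return {
        "name": name,
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "dimensions": IMAGE_VECTOR_DIMENSIONS,
        "vectorSearchProfile": profile,
    }


def validate_image_vector(vector: list) -> list:
    """Fail early if an embedding does not match the index field's size."""
    if len(vector) != IMAGE_VECTOR_DIMENSIONS:
        raise ValueError(
            f"Expected {IMAGE_VECTOR_DIMENSIONS} dimensions, got {len(vector)}"
        )
    return vector
```

Keeping the dimension count in a single constant means only one place needs to change if it is later derived at runtime, as the commit message suggests might happen once `vectorizeText` is in use.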