Generate and store image embeddings in search index #748

adamdougal · 2024-04-24T07:24:05Z

Required by #323

Description

Generate image embeddings using computer vision and store them in a modified search index.

Tasks

Investigate what needs to change with the search index and how to incorporate that into CWYDSA
- POC: POC: Stop using langchain for Azure Search and OpenAI calls #786
- ADR: Raise ADR proposing to remove Langchain from tools #788
Add functional test for current index creation - Modify functional tests to include index creation #802
Stop using Langchain for creating, populating and searching index - refactor: Remove langchain from index operations #827
Generate image embeddings using computer vision - feat: Generate embeddings for images #892
Store image embeddings in search index - feat: Store image embeddings in search index #921

adamdougal · 2024-04-26T12:57:35Z

Update 26th April:

I'm investigating what changes will be required to the app code and the index to store and query an additional image embedding.
I've hit a problem: LangChain does not appear to support Azure AI Computer Vision, nor does it appear able to query AzureSearch with more than one vector embedding.
One option that might work is to have two separate indexes and combine the search results in the application code. I'm not 100% sure yet whether this is viable.
Another option is to remove the use of LangChain in the QuestionAnswerTool and AzureSearchHelper
- We would still keep the LangChain orchestration option, but it would only be used in the LangChainAgent
- This has also been done (for now) in the azure-sample-openai-demo repo
- Benefits:
  - This would give us full control over how we query and store data in Azure AI Search
  - We wouldn't be blocked on LangChain updates to use the latest features from OpenAI and Azure
  - Potentially simpler code base
- Downsides:
  - Potentially significant breaking changes required

My next steps are to investigate whether removing LangChain allows easier integration of vision, as well as seeing what else it might affect. If all goes well, I'll raise an ADR to discuss further with the team.
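
For the two-index option, the application would have to merge two ranked result lists. A minimal sketch of one way to do that, using reciprocal rank fusion. This is not an approach the issue settled on; the function name, the document ids and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1. Scores are summed across lists.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest combined score first; ties are broken by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a text index and an image index.
text_hits = ["doc-a", "doc-b", "doc-c"]
image_hits = ["doc-b", "doc-d", "doc-a"]
merged = reciprocal_rank_fusion([text_hits, image_hits])
```

Documents that appear in both lists rise to the top, which is the behaviour a combined text and image search would want. The cost is two round trips to the search service per query.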

- To enable storing and querying image embeddings

Required by #748

* Raise ADR proposing to remove Langchain from tools
  - To enable storing and querying image embeddings
  - Required by #748
* Add note about what we lose by removing langchain
* Optimism

adamdougal · 2024-05-01T08:53:44Z

Update 1st May:

We are proceeding with removing LangChain from the tools
Before starting to make this change, I am going to expand the current functional tests to ensure we are testing index creation. This will help ensure no unexpected changes are being made to the index.
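
A minimal sketch of the kind of check such a test could make: compare the index definition the app sends against an expected one and report any drift. How the repository's functional tests actually do this is not shown in this thread; the function name and the dict shape (`{"fields": [{"name": ..., "type": ...}]}`) are assumptions for illustration.

```python
def diff_index_fields(expected, actual):
    """Return readable differences between two index definitions.

    Both arguments are dicts with a "fields" list of {"name": ..., ...} dicts.
    """
    expected_fields = {field["name"]: field for field in expected["fields"]}
    actual_fields = {field["name"]: field for field in actual["fields"]}
    problems = []
    for name in sorted(expected_fields.keys() - actual_fields.keys()):
        problems.append(f"missing field: {name}")
    for name in sorted(actual_fields.keys() - expected_fields.keys()):
        problems.append(f"unexpected field: {name}")
    for name in sorted(expected_fields.keys() & actual_fields.keys()):
        if expected_fields[name] != actual_fields[name]:
            problems.append(f"changed field: {name}")
    return problems


# Hypothetical definitions: the second one has gained an extra field.
expected_index = {
    "fields": [
        {"name": "id", "type": "Edm.String"},
        {"name": "content", "type": "Edm.String"},
    ]
}
actual_index = {
    "fields": [
        {"name": "id", "type": "Edm.String"},
        {"name": "content", "type": "Edm.String"},
        {"name": "image_vector", "type": "Collection(Edm.Single)"},
    ]
}
problems = diff_index_fields(expected_index, actual_index)
```

A test would assert that `problems` is empty, so any unintended change to the index fails loudly while the tooling underneath is swapped out.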

- In preparation for changing the tools we use to create it
- This ensures that we don't make any unintended changes to the index

Required by #748

- This is preparation for adding an additional image vector field for advanced image processing
- I've tried to keep changes to a minimum
- Still using langchain in the question and answer tool for now
- Increased unit test coverage

Required by #748

adamdougal · 2024-05-13T10:33:07Z

13th May:

Mini code review:

Include original exception when catching and throwing
Move inline patches to annotations
Make sure new files are pep8 compliant
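
A short sketch of the first two review points, assuming "annotations" means `@patch` decorators. The class and function names are hypothetical and do not come from the PR under review.

```python
from unittest.mock import patch


class IndexCreationError(Exception):
    """Hypothetical domain error, used only for this illustration."""


class Settings:
    """Hypothetical settings source."""

    @staticmethod
    def raw_dimensions():
        return "1536"


def configured_dimensions():
    raw = Settings.raw_dimensions()
    try:
        return int(raw)
    except ValueError as e:
        # "from e" keeps the original exception as __cause__ in the traceback.
        raise IndexCreationError(f"invalid dimensions: {raw!r}") from e


# The patch is declared as a decorator instead of an inline "with patch(...)" block.
@patch.object(Settings, "raw_dimensions", return_value="not-a-number")
def original_exception_is_kept(raw_dimensions_mock):
    try:
        configured_dimensions()
    except IndexCreationError as error:
        return isinstance(error.__cause__, ValueError)
    return False
```

With the decorator form, the mock arrives as a function argument and the patch is undone automatically when the function returns.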

Get wider feedback:

The use of pytest-httpserver in unit tests

adamdougal · 2024-05-14T12:25:17Z

14th May Update:

A PR has been raised to add the call to computer vision to generate embeddings of the images
Next steps are to take these embeddings and store them in a modified index
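
A minimal sketch of what a call to computer vision for an image embedding could look like, using only the standard library and building the request without sending it. The PR's actual code is not shown in this thread. The URL path, the `api-version` and `model-version` values, and the `vector` response key are my understanding of the Azure AI Vision image retrieval REST API and should be checked against its documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_vectorize_image_request(endpoint, key, image_url,
                                  api_version="2024-02-01",
                                  model_version="2023-04-15"):
    """Build (but do not send) a POST request asking for an image embedding."""
    # Version values are assumptions; confirm them against the service docs.
    query = urllib.parse.urlencode(
        {"api-version": api_version, "model-version": model_version}
    )
    url = f"{endpoint.rstrip('/')}/computervision/retrieval:vectorizeImage?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_image_embedding(response_body):
    """Extract the embedding from a JSON response body (assumed key: "vector")."""
    return json.loads(response_body)["vector"]


request = build_vectorize_image_request(
    "https://example.cognitiveservices.azure.com/",  # placeholder endpoint
    "placeholder-key",
    "https://example.com/cat.png",
)
```

Sending the request would be `urllib.request.urlopen(request)`, with the response body passed to `parse_image_embedding`.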

- hard coded the vector search dimensions for now
- once we have implemented the `vectorizeText` call when searching, we could do the same thing we've done for GPT embeddings

Required by #748
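
A sketch of what a hard-coded dimension could look like in an index field definition, written as the JSON-style dict that the Azure AI Search REST API accepts for vector fields. The field name, the profile name and the value 1024 are assumptions for illustration, not values taken from the PR.

```python
# Assumed size of the image embedding; hard-coded, as described above.
IMAGE_VECTOR_DIMENSIONS = 1024


def image_vector_field(name="image_vector", profile="default-profile"):
    """Return a vector field definition for the index (hypothetical names)."""
    return {
        "name": name,
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "dimensions": IMAGE_VECTOR_DIMENSIONS,
        "vectorSearchProfile": profile,
    }


def validate_embedding(embedding):
    """Reject embeddings whose length does not match the index field."""
    if len(embedding) != IMAGE_VECTOR_DIMENSIONS:
        raise ValueError(
            f"expected {IMAGE_VECTOR_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return embedding


field = image_vector_field()
```

Checking the length before upload turns a dimension mismatch into a clear error in the app rather than a rejected document at indexing time.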

adamdougal mentioned this issue Apr 24, 2024

Include GPT-4 V model to be able to search for images and embedding images. #323

Open

12 tasks

adamdougal added the subtask label Apr 24, 2024

adamdougal self-assigned this Apr 24, 2024

adamdougal mentioned this issue Apr 30, 2024

POC: Stop using langchain for Azure Search and OpenAI calls #786

Closed

adamdougal added a commit that referenced this issue Apr 30, 2024

Raise ADR proposing to remove Langchain from tools

df6f8d6

adamdougal mentioned this issue Apr 30, 2024

Raise ADR proposing to remove Langchain from tools #788

Merged

adamdougal added a commit that referenced this issue May 2, 2024

Modify functional tests to include index creation

18fbe22

adamdougal mentioned this issue May 2, 2024

Modify functional tests to include index creation #802

Merged

github-merge-queue bot pushed a commit that referenced this issue May 3, 2024

Modify functional tests to include index creation (#802)

3705186

adamdougal mentioned this issue May 7, 2024

refactor: Remove langchain from index operations #827

Merged

3 tasks

adamdougal assigned cecheta May 9, 2024

cecheta mentioned this issue May 10, 2024

test: Add functional tests for batch_push_results #873

Merged

2 tasks

cecheta mentioned this issue May 13, 2024

feat: Generate embeddings for images #892

Merged

2 tasks

adamdougal mentioned this issue May 15, 2024

feat: Store image embeddings in search index #921

Merged

adamdougal closed this as completed May 16, 2024