This sample shows how to build vector similarity search on Azure Cosmos DB for PostgreSQL using the pgvector extension and the multi-modal embeddings APIs of Azure AI Vision.

sfoteini/vector-search-azure-cosmos-db-postgresql

Image similarity search on Azure Cosmos DB for PostgreSQL with pgvector

This project demonstrates how to build an image similarity search application that uses Azure Cosmos DB for PostgreSQL as a vector database and Azure AI Vision to generate embeddings. It is a starting point for developing more sophisticated vector search solutions.

In this sample application, we will explore image similarity search on Azure Cosmos DB for PostgreSQL using the SemArt Dataset. This dataset contains approximately 21k paintings gathered from the Web Gallery of Art. Each painting comes with various attributes, like a title, description, and the name of the artist.

Prerequisites

Before you start, ensure that you have the following prerequisites installed and configured:

Set up your working environment

Before running the Python scripts and Jupyter Notebooks, you should:

  1. Clone this repository to have it available locally.

  2. Download the SemArt Dataset into the semart_dataset directory.

  3. Create a virtual environment and activate it.

  4. Install the required Python packages using the following command:

    pip install -r requirements.txt

  5. Generate a .env file by using the provided .env.sample file from this repository.
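
Once the .env file exists, the scripts need its values at run time. The following is a minimal, standard-library-only sketch of how such a file could be read. It is not the loader used by this repository (the notebooks may rely on a package such as python-dotenv), and the helper names `load_env_file` and `apply_env` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict.

    Blank lines, comment lines starting with '#', and lines without '='
    are skipped. Quotes surrounding a value are removed.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def apply_env(values):
    """Copy parsed values into os.environ without overwriting existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Calling `apply_env(load_env_file())` at the top of a script makes the settings available through `os.environ`. The actual variable names are the ones listed in .env.sample.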

How to use the samples

Data processing

| Sample | Description |
| --- | --- |
| Data Preprocessing | Cleans up the SemArt Dataset and creates the final dataset that is utilized in our application. |
| Embeddings Generation | Generates vector embeddings for the images in the dataset using the Azure AI Vision Vectorize Image API and creates the final dataset that is utilized in the image search application. |
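
To make the embeddings step concrete, here is a rough sketch of how a request to the Vectorize Image API could be assembled with the Python standard library. It only builds the request and parses a response body, so it runs without network access. The API version string, the `modelVersion` parameter, and the response shape (`{"vector": [...]}`) are assumptions based on the preview REST API and should be checked against the current Azure AI Vision documentation. The function names are hypothetical and are not taken from this repository.

```python
import json
import urllib.parse
import urllib.request

# Assumption: preview API version; verify against the current service docs.
API_VERSION = "2023-02-01-preview"


def build_vectorize_image_request(endpoint, key, image_url):
    """Build (but do not send) a POST request that asks for an image embedding."""
    query = urllib.parse.urlencode(
        {"api-version": API_VERSION, "modelVersion": "latest"}
    )
    url = f"{endpoint.rstrip('/')}/computervision/retrieval:vectorizeImage?{query}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_embedding(response_body):
    """Extract the embedding from a JSON body assumed to look like {"vector": [...]}."""
    return json.loads(response_body)["vector"]
```

A request built this way could be sent with `urllib.request.urlopen(request)`, and the bytes returned by `.read()` passed to `parse_embedding`. Production code would also need retries and rate-limit handling, which this sketch leaves out.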

Data upload and index creation

| Sample | Description |
| --- | --- |
| Upload images to Azure Blob Storage | Creates an Azure Blob Storage container and uploads the paintings' images. |
| Insert data to Azure Cosmos DB for PostgreSQL | Creates a table in the Azure Cosmos DB for PostgreSQL cluster and populates it with data from the dataset. |
| Insert data to Azure Cosmos DB for PostgreSQL and create IVFFlat index | Creates a table in the Azure Cosmos DB for PostgreSQL cluster, populates it with data from the dataset, and creates an IVFFlat index. |
| Insert data to Azure Cosmos DB for PostgreSQL and create HNSW index | Creates a table in the Azure Cosmos DB for PostgreSQL cluster, populates it with data from the dataset, and creates an HNSW index. |
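
The table and index creation steps come down to a handful of pgvector SQL statements. The sketch below generates them as strings so they can be inspected or passed to a PostgreSQL driver. The table name `paintings`, the column names, the embedding dimension of 1024, and the index parameters (`lists`, `m`, `ef_construction`) are illustrative assumptions, not the values used in the notebooks.

```python
# Assumption: embedding dimension; adjust it to match the vectors you generated.
EMBEDDING_DIM = 1024


def create_table_sql(table="paintings", dim=EMBEDDING_DIM):
    """Return SQL that enables pgvector and creates a table with a vector column."""
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        "    image_file TEXT PRIMARY KEY,\n"
        "    title TEXT,\n"
        "    author TEXT,\n"
        f"    vector vector({dim})\n"
        ");"
    )


def create_ivfflat_index_sql(table="paintings", lists=100):
    """Return SQL for an IVFFlat index that uses cosine distance."""
    return (
        f"CREATE INDEX ON {table} "
        f"USING ivfflat (vector vector_cosine_ops) WITH (lists = {lists});"
    )


def create_hnsw_index_sql(table="paintings", m=16, ef_construction=64):
    """Return SQL for an HNSW index that uses cosine distance."""
    return (
        f"CREATE INDEX ON {table} "
        f"USING hnsw (vector vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


if __name__ == "__main__":
    print(create_table_sql())
    print(create_ivfflat_index_sql())
    print(create_hnsw_index_sql())
```

An IVFFlat index is usually created after the table has been populated, because its clusters are computed from the existing rows. An HNSW index can be built on an empty table, at the cost of slower inserts.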

Vector search

| Sample | Description |
| --- | --- |
| Exact nearest neighbor search with pgvector | Demonstrates text-to-image and image-to-image search approaches, along with a simple method for metadata filtering. |
| Approximate Nearest Neighbor Search with IVFFlat Index | Demonstrates text-to-image and image-to-image search approaches utilizing the IVFFlat index and compares the results with those retrieved through exact search. |
| Approximate Nearest Neighbor Search with HNSW Index | Demonstrates text-to-image and image-to-image search approaches utilizing the HNSW index and compares the results with those retrieved through exact search. |
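
Comparing approximate results with exact ones amounts to measuring how many of the true nearest neighbors an index returns. The sketch below shows the idea in plain Python: a brute-force cosine-distance ranking stands in for exact search, and a small recall function compares two result lists. The query string shows how the same ranking could be expressed with pgvector's cosine distance operator `<=>`. The table and column names are hypothetical, and the notebooks may measure quality differently.

```python
import math

# Illustrative pgvector query (hypothetical table and column names);
# the %s placeholders are for a driver such as psycopg.
EXACT_SEARCH_SQL = (
    "SELECT image_file FROM paintings "
    "ORDER BY vector <=> %s::vector LIMIT %s;"
)


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Return the ids of the k vectors closest to the query, nearest first.

    `vectors` maps an id to its embedding; every item is scored (brute force).
    """
    ranked = sorted(vectors, key=lambda item_id: cosine_distance(query, vectors[item_id]))
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact results that also appear in the approximate results."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)
```

With pgvector, the trade-off between recall and speed is tuned per session through `ivfflat.probes` for IVFFlat and `hnsw.ef_search` for HNSW. Raising either value typically brings recall closer to the exact results at the cost of slower queries.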

Resources

Blog Posts

| Title | Summary |
| --- | --- |
| Use the Azure AI Vision multi-modal embeddings API for image retrieval | Explore the basics of vector search and generate vector embeddings for images and text using the Azure AI Vision multi-modal embeddings APIs. |
| Generate embeddings with Azure AI Vision multi-modal embeddings API | Discover the art of generating vector embeddings for paintings’ images using the Azure AI Vision multi-modal embeddings APIs in Python. |
| Store embeddings in Azure Cosmos DB for PostgreSQL with pgvector | Learn how to configure Azure Cosmos DB for PostgreSQL as a vector database and insert embeddings into a table using the pgvector extension. |
| Use pgvector for searching images on Azure Cosmos DB for PostgreSQL | Learn how to write SQL queries to search for and identify images that are semantically similar to a reference image or text prompt using pgvector. |
| Use IVFFlat index on Azure Cosmos DB for PostgreSQL for similarity search | Explore vector similarity search using the Inverted File with Flat Compression (IVFFlat) index of pgvector on Azure Cosmos DB for PostgreSQL. |
| Use HNSW index on Azure Cosmos DB for PostgreSQL for similarity search | Explore vector similarity search using the Hierarchical Navigable Small World (HNSW) index of pgvector on Azure Cosmos DB for PostgreSQL. |

References

Feel free to experiment with the project and modify the code to meet your specific use cases and requirements!
