IMAGE SEARCH APP

Project Intro

The advent of the internet revolutionized the way we access information through potent search engines such as Google, Bing, and Yandex. With just a few keywords, we can swiftly locate web pages pertinent to our queries. As technology, particularly AI, advances, many search engines now facilitate online image searches.

Various techniques for image searching have emerged, including:

Image search by metadata: Here, the search is not based on the image itself but on the metadata that accompanies it (keywords, text, filename, date, etc.).
Image search based on image content: This approach uses state-of-the-art computer vision techniques to extract shape, colour, and other relevant features from an image. This is the technique we are going to use.

In this project, we will use a pre-trained Convolutional Neural Network (CNN) to extract valuable features from the images. This methodology, a key component of content-based image search, provides the following benefits:

CNNs are robust: CNNs have proven to be very effective at extracting key features from an image.
CNNs can reduce dimension: Since not every pixel holds significant information, the CNN output is a condensed, relevant representation of the image, often called a feature map, embedding, or vector. This condensed representation usually has fewer dimensions than the raw image.

In summary, this study aims to answer the following question: if two images are similar, are their associated embeddings similar as well?
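One common way to quantify how "similar" two embeddings are is cosine similarity. The snippet below is a minimal sketch of that measure in plain Python. The three 4-dimensional vectors are made-up toy values, not real VGG-16 outputs, and the function name `cosine_similarity` is only illustrative.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" (assumed values for illustration only).
bird_a = [0.9, 0.1, 0.0, 0.3]
bird_b = [0.8, 0.2, 0.1, 0.4]
truck = [0.0, 0.9, 0.7, 0.1]

sim_birds = cosine_similarity(bird_a, bird_b)
sim_bird_truck = cosine_similarity(bird_a, truck)
print(f"bird vs bird:  {sim_birds:.3f}")
print(f"bird vs truck: {sim_bird_truck:.3f}")
```

If the embeddings capture image content well, the score for two images of the same category should be noticeably higher than the score across categories.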

Technologies / Frameworks used

Project Description

For this project, we used CIFAR-10, a freely available dataset comprising 60,000 color images, each measuring 32x32 pixels. These images belong to 10 distinct categories: Airplane, Automobile, Bird, Cat, Deer, Dog, Frog, Horse, Ship, and Truck. To obtain their corresponding embeddings, we applied a pre-trained CNN model, specifically VGG-16, to extract essential features. The resulting vector is 512-dimensional. Within Pinecone, we created an index named "images" with a dimension of 512, where all these vectors are stored.

The idea behind this project is to find out whether similar images (of birds, for example) have similar embeddings. To do so, we uploaded the embeddings of 50,000 of the 60,000 images to a Pinecone index. This split ensures that the remaining images are entirely new and distinct from those already stored as vectors in Pinecone. Note that this partitioning is already done by the CIFAR-10 dataset, which ships as train and test batches containing serialized versions of the original image arrays.
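For readers unfamiliar with the serialized CIFAR-10 format, here is a hedged sketch of how one batch file could be read. It assumes the layout documented for the Python version of the dataset: each batch is a pickled dictionary whose `data` entry holds one flat row of 3,072 values per image (all red values, then green, then blue). The helper names `load_batch` and `row_to_image` are hypothetical and are not taken from this repository.

```python
import pickle

def load_batch(path):
    """Unpickle one CIFAR-10 batch file and return (rows, labels).

    The real batch files store the rows as a NumPy array, so NumPy must
    be installed for pickle to load them. Only unpickle trusted files.
    """
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")
    return batch[b"data"], batch[b"labels"]

def row_to_image(row, size=32):
    """Turn a flat row (red plane, green plane, blue plane) into rows of RGB pixels."""
    plane = size * size
    return [
        [
            (row[r * size + c], row[plane + r * size + c], row[2 * plane + r * size + c])
            for c in range(size)
        ]
        for r in range(size)
    ]

# Tiny 2x2 demo row: values 0-3 are red, 4-7 green, 8-11 blue.
demo = row_to_image(list(range(12)), size=2)
print(demo[0][0])  # (0, 4, 8)
```

The per-pixel reshaping is written out in plain Python for clarity; with NumPy available, a reshape followed by a transpose would do the same job far faster.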

Working principle

The picture below describes the whole process of storing the embeddings in a Pinecone index: reading the images, applying a pre-trained VGG16 neural network to generate 512-dimensional embeddings, and upserting (i.e. storing) them in the index.
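The last step, upserting, is usually done in chunks rather than one vector at a time. The sketch below shows one possible way to batch (id, vector) pairs. It is not the code from `insert_data.py`: the batch size of 100, the `img-N` ids, and the `upsert_fn` callable are all assumptions. With a Pinecone client, `upsert_fn` might wrap something like `index.upsert(vectors=batch)`, but check the client version you have installed.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield successive lists of at most batch_size items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

def upsert_in_batches(ids, vectors, upsert_fn, batch_size=100):
    """Send (id, vector) pairs to upsert_fn in chunks; return how many were sent."""
    total = 0
    for batch in batched(zip(ids, vectors), batch_size):
        upsert_fn(batch)  # hypothetical hook for the real index call
        total += len(batch)
    return total

# Stand-in for a real index: just record each batch that would be sent.
sent_batches = []
count = upsert_in_batches(
    ids=[f"img-{i}" for i in range(250)],
    vectors=[[0.0] * 512 for _ in range(250)],
    upsert_fn=sent_batches.append,
)
print(count, [len(b) for b in sent_batches])  # 250 [100, 100, 50]
```

Passing the upsert call in as a function keeps the batching logic testable without an API key or network access.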

Running time

In this project, we handle 50 thousand images, which poses some computational challenges, especially when reading the images, unpickling them (we downloaded the serialized version of the CIFAR-10 dataset), and extracting features via a CNN. We tried to leverage parallel computing so that the code runs as fast as possible on multiple CPU cores via multithreading.
Note: If possible, run this project in a GPU-powered environment for faster computations.
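As a rough illustration of the multithreading idea, the sketch below maps a per-batch function over the five CIFAR-10 training batch files with a thread pool. `process_batch` is a hypothetical placeholder for the real work (reading, unpickling, feature extraction), and the worker count of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch_name):
    """Placeholder for the per-batch work (read, unpickle, extract features)."""
    return f"{batch_name}: done"

def process_all(batch_names, max_workers=4):
    """Run process_batch over all batches on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_batch, batch_names))

# CIFAR-10 training batches are named data_batch_1 ... data_batch_5.
batch_names = [f"data_batch_{i}" for i in range(1, 6)]
results = process_all(batch_names)
print(results)
```

Threads mainly help when the work spends its time in file I/O or in native code that releases Python's global interpreter lock; for pure-Python CPU work, a process pool may be the better fit.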

Getting Started

1. Create a Pinecone account for free here.
2. Get the API key and environment associated with your Pinecone account.
3. Clone this repo (for help see this tutorial).
4. Create a virtual environment in the project folder (for help see this tutorial).
5. Run the following command to install the necessary packages.

For Linux users:

pip3 install -r requirements.txt

For Windows users:

pip install -r requirements.txt

6. Launch the image insertion script using the following command.

python insert_data.py -key <API_KEY>  -env <ENV>  -metric <METRIC>

Replace <API_KEY> and <ENV> with the values from your Pinecone account, then wait for the script to finish.

7. Launch the app using the following command.

streamlit run app.py -- -key <API_KEY> -env <ENV>

Once everything is done, you should see something like this:
