ViT vs. CNN on Elephants
Image Binary Classification on African vs. Asian Elephants Dataset

Deep Learning Course Final Project

ECE 046211, Technion, 2024

Elephant Image Classification: ViT vs CNN

This project evaluates Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for image classification, specifically distinguishing between Asian and African elephants. It compares MobileNet, ResNet, ViT-b-16 and DINOv2. Leveraging transfer learning with pre-trained models, we aim to classify images of these majestic creatures accurately. By using a publicly available dataset, this project contributes to elephant conservation and research efforts.
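The training code itself lives in the notebooks; as a rough illustration of the transfer-learning setup described above, here is a minimal sketch assuming a torchvision ResNet-50 backbone with a new two-class head (the model choice, layer freezing, and hyperparameters are illustrative, not taken from the repository):

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative transfer-learning setup (not the repository's exact code):
# load an ImageNet-pretrained backbone and replace its classifier head
# with a two-class (Asian vs. African elephant) output layer.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pretrained feature extractor so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)  # new, trainable head

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

The same pattern applies to the ViT backbones by swapping the model constructor and replacing the corresponding classification head.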

Key Findings

  • ViT models: Outperformed the CNN models, with self-supervised ViT models achieving the highest accuracy, highlighting their effectiveness at capturing spatial relationships.
  • Importance of pretraining dataset size: Achieving good results with a ViT on a small dataset requires a model pretrained on a very large dataset such as ImageNet.

Results

The following images show the performance metrics and visualizations from the experiments conducted in this project for the best CNN model (ResNet) and the best ViT model (DINOv2):

Image Human Classification Tool

Human Classification Demo

CNN Results - Activation Map
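
The repository lists pytorch_grad_cam (see Prerequisites) for visualizing CNN activation maps. As a minimal sketch of how such a map can be produced, assuming a recent version of that package and a ResNet-style model (the input image path, preprocessing, and target layer are illustrative, not the repository's exact code):

```python
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Illustrative Grad-CAM visualization (not the repository's exact code).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
target_layers = [model.layer4[-1]]  # last convolutional block of a ResNet

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
img = Image.open("elephant.jpg").convert("RGB")   # hypothetical input image
rgb = np.array(img.resize((224, 224))) / 255.0    # float RGB in [0, 1]
input_tensor = preprocess(img).unsqueeze(0)

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor)[0]  # heatmap for the first image
overlay = show_cam_on_image(rgb.astype(np.float32), grayscale_cam, use_rgb=True)
```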

Conclusion

Overall, the project provides valuable insights into the performance of different models and underscores the importance of factors such as model architecture, dataset size, and pretraining strategy in image classification tasks. Specifically, when dealing with small datasets, a self-supervised ViT pretrained on a large dataset outperforms the other methods.

Prerequisites

| Library | Why |
| --- | --- |
| matplotlib | Plotting and visualization |
| time | Time-related functions |
| os | Operating system interface |
| copy | Shallow and deep copy operations |
| PIL | Python Imaging Library for image processing |
| cv2 | OpenCV library for computer vision tasks |
| pandas | Data manipulation and analysis |
| torch | Deep learning framework |
| torchvision | Datasets and transformations for vision tasks |
| sklearn | Machine learning library |
| IPython | Displaying images in IPython |
| kornia | Differentiable computer vision library for PyTorch |
| pytorch_grad_cam | Package for visualizing convolutional neural network activation maps |
| tkinter | GUI toolkit for Python |
| datetime | Date and time manipulation |
| random | Random number generation |

Datasets

| Dataset | Notes | Link |
| --- | --- | --- |
| Asian vs. African Elephant Image Classification | The dataset is not included in this repository; please download it from the link. | Kaggle |
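
After downloading and extracting the dataset from Kaggle, it can be loaded with torchvision's ImageFolder, assuming the images are arranged in one folder per class. A minimal sketch (the "data/train" and "data/test" paths are hypothetical and depend on how the archive is unpacked):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative loading of the downloaded Kaggle dataset; adjust the paths
# to match the extracted directory layout.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=transform)
test_set = datasets.ImageFolder("data/test", transform=transform)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)

print(train_set.classes)  # class names are inferred from the folder names
```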

Repository Organization

| File name | Content |
| --- | --- |
| /checkpoints | Directory for trained model checkpoints and training histories |
| /assets | Directory for assets (GIFs, images, etc.) |
| /docs | Various documentation files |
| /notebooks | Jupyter notebooks used for training and evaluation |
| /logs | Human classification results per person |
| /tools/human_classification.py | tkinter-based interactive GUI to collect data on human classification performance on the African vs. Asian dataset |
| requirements.txt | Requirements file for pip |

Acknowledgments

This project is a part of the ECE 046211 Deep Learning course at the Technion. We would like to express our gratitude to Tal Daniel and Prof. Daniel Soudry for their guidance and support throughout this project and the course.

License

This project is licensed under the MIT License.
