ViT vs. CNN on Elephants
Image Binary Classification on African vs. Asian Elephants Dataset

Deep Learning Course Final Project

ECE 046211, Technion, 2024

Elephant Image Classification: ViT vs CNN

This project evaluates Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) for image classification, specifically distinguishing between Asian and African elephants. It compares MobileNet, ResNet, ViT-b-16 and DINOv2. Leveraging transfer learning with pre-trained models, we aim to classify images of these majestic creatures accurately. By using a publicly available dataset, this project contributes to elephant conservation and research efforts.
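The training code itself lives in the notebooks; as a rough illustration of the transfer-learning setup described above, here is a minimal sketch assuming a torchvision ResNet-50 backbone with a new two-class head (the model choice, layer freezing, and hyperparameters are illustrative, not taken from the repository):

```python
import torch
import torch.nn as nn
from torchvision import models

# Illustrative transfer-learning setup (not the repository's exact code):
# load an ImageNet-pretrained backbone and replace its classifier head
# with a two-class (Asian vs. African elephant) output layer.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# Freeze the pretrained feature extractor so only the new head is trained.
for param in model.parameters():
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 2)  # new, trainable head

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```

The same pattern applies to the ViT backbones by swapping the model constructor and replacing the corresponding classification head.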

Key Findings

  • ViT models: Outperformed the CNN models, with self-supervised ViT models achieving the highest accuracy, highlighting their effectiveness at capturing spatial relationships.
  • Importance of pretraining dataset size: Achieving good results with a ViT on a small dataset requires a model pretrained on a very large dataset such as ImageNet.

Results

The following images show the performance metrics and visualizations from the experiments conducted in this project for the best CNN model (ResNet) and the best ViT model (DINOv2):

Image Human Classification Tool

Human Classification Demo

CNN Results - Activation Map
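
The repository lists pytorch_grad_cam (see Prerequisites) for visualizing CNN activation maps. As a minimal sketch of how such a map can be produced, assuming a recent version of that package and a ResNet-style model (the input image path, preprocessing, and target layer are illustrative, not the repository's exact code):

```python
import numpy as np
import torch
from PIL import Image
from torchvision import models, transforms
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# Illustrative Grad-CAM visualization (not the repository's exact code).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
target_layers = [model.layer4[-1]]  # last convolutional block of a ResNet

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
img = Image.open("elephant.jpg").convert("RGB")   # hypothetical input image
rgb = np.array(img.resize((224, 224))) / 255.0    # float RGB in [0, 1]
input_tensor = preprocess(img).unsqueeze(0)

cam = GradCAM(model=model, target_layers=target_layers)
grayscale_cam = cam(input_tensor=input_tensor)[0]  # heatmap for the first image
overlay = show_cam_on_image(rgb.astype(np.float32), grayscale_cam, use_rgb=True)
```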

Conclusion

Overall, the project provides valuable insights into the performance of different models and underscores the importance of factors such as model architecture, dataset size, and pretraining strategy in image classification tasks. Specifically, when dealing with small datasets, a self-supervised ViT pretrained on a large dataset outperforms the other methods.

Prerequisites

| Library | Why |
| --- | --- |
| matplotlib | Plotting and visualization |
| time | Time-related functions |
| os | Operating system interface |
| copy | Shallow and deep copy operations |
| PIL | Python Imaging Library for image processing |
| cv2 | OpenCV library for computer vision tasks |
| pandas | Data manipulation and analysis |
| torch | Deep learning framework |
| torchvision | Datasets and transformations for vision tasks |
| sklearn | Machine learning library |
| IPython | Displaying images in IPython |
| kornia | Differentiable computer vision library for PyTorch |
| pytorch_grad_cam | Package for visualizing convolutional neural network activation maps |
| tkinter | GUI toolkit for Python |
| datetime | Date and time manipulation |
| random | Random number generation |

Datasets

| Dataset | Notes | Link |
| --- | --- | --- |
| Asian vs. African Elephant Image Classification | The dataset is not included in this repository; please download it from the link. | Kaggle |
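
After downloading and extracting the dataset from Kaggle, it can be loaded with torchvision's ImageFolder, assuming the images are arranged in one folder per class. A minimal sketch (the "data/train" and "data/test" paths are hypothetical and depend on how the archive is unpacked):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Illustrative loading of the downloaded Kaggle dataset; adjust the paths
# to match the extracted directory layout.
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=transform)
test_set = datasets.ImageFolder("data/test", transform=transform)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
test_loader = DataLoader(test_set, batch_size=32, shuffle=False)

print(train_set.classes)  # class names are inferred from the folder names
```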

Repository Organization

| File name | Content |
| --- | --- |
| /checkpoints | Directory for trained model checkpoints and training histories |
| /assets | Directory for assets (GIFs, images, etc.) |
| /docs | Various documentation files |
| /notebooks | Jupyter notebooks used for training and evaluation |
| /logs | Human classification results per person |
| /tools/human_classification.py | tkinter-based interactive GUI to collect data on human classification performance on the African vs. Asian dataset |
| requirements.txt | Requirements file for pip |

Acknowledgments

This project is a part of the ECE 046211 Deep Learning course at the Technion. We would like to express our gratitude to Tal Daniel and Prof. Daniel Soudry for their guidance and support throughout this project and the course.

License

This project is licensed under the MIT License.
