# Image Classification Using Swin Transformer With RandAugment, CutMix, and MixUp


In this project, we explore three Swin Transformer variants: one fine-tuned without augmentation, one fine-tuned with augmentation, and one trained from scratch (i.e., without pre-trained weights). The augmentation combines RandAugment, CutMix, and MixUp. The goal is to observe the effect of augmentation and of pre-trained weights (transfer learning) on an imbalanced dataset, Caltech-256. The dataset is split per category with a ratio of 81:9:10 for the training, validation, and testing sets. For the from-scratch model, each category is truncated to 100 instances. Applying augmentation and pre-trained weights clearly boosts performance; in particular, the pre-trained weights dramatically improve both top-1 and top-5 accuracy.
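
Below is a minimal sketch of how such a pipeline can be wired up with torchvision. This is not the repo's exact code: the class count (257, the 256 Caltech-256 categories plus the clutter category) and the hyperparameter values are assumptions, and it assumes the `torchvision.transforms.v2` API (torchvision ≥ 0.16).

```python
# Sketch only: a pre-trained Swin-T, RandAugment on individual images,
# and CutMix/MixUp applied batch-wise. Values below are assumptions.
import torch
import torch.nn as nn
from torchvision import models
from torchvision.transforms import v2

NUM_CLASSES = 257  # Caltech-256: 256 object categories + clutter (assumed)

# Per-image transforms: RandAugment runs before tensor conversion/normalization.
train_transform = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandAugment(num_ops=2, magnitude=9),
    v2.ToImage(),
    v2.ToDtype(torch.float32, scale=True),
    v2.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Batch-level augmentation: randomly pick CutMix or MixUp for each batch.
cutmix_or_mixup = v2.RandomChoice([
    v2.CutMix(num_classes=NUM_CLASSES),
    v2.MixUp(num_classes=NUM_CLASSES),
])

# Pre-trained Swin-T with a fresh classification head for Caltech-256.
model = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, NUM_CLASSES)

criterion = nn.CrossEntropyLoss()  # accepts the soft labels CutMix/MixUp emit

def training_step(images, labels):
    # images: (B, 3, 224, 224) float tensor; labels: (B,) int64 class indices.
    images, labels = cutmix_or_mixup(images, labels)  # labels -> (B, 257) soft
    logits = model(images)
    return criterion(logits, labels)
```

The from-scratch variant would simply pass `weights=None` to `models.swin_t`; the no-augmentation variant would drop `RandAugment` and the `cutmix_or_mixup` step.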

## Experiment

Check out this notebook to see the full implementation.

## Result

### Quantitative Result

The table below quantitatively compares the three Swin Transformer models: without augmentation, with augmentation, and from scratch.

| Model           | Loss  | Top-1 Acc. | Top-5 Acc. |
| --------------- | ----- | ---------- | ---------- |
| No Augmentation | 0.369 | 90.17%     | 97.68%     |
| Augmentation    | 0.347 | 91.57%     | 98.75%     |
| From Scratch    | 4.544 | 11.58%     | 27.09%     |
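
For reference, top-k accuracy counts a prediction as correct when the true label appears among the k highest-scoring classes. A minimal sketch of the computation (function and variable names here are illustrative, not from the repo):

```python
import torch

def topk_accuracy(logits, targets, k=1):
    # logits: (B, num_classes) model outputs; targets: (B,) int64 true labels.
    topk = logits.topk(k, dim=1).indices          # (B, k) predicted classes
    hits = (topk == targets.unsqueeze(1)).any(1)  # true if label is in top-k
    return hits.float().mean().item()

# Usage: accumulate over the test loader, e.g.
# top1 = topk_accuracy(logits, targets, k=1)
# top5 = topk_accuracy(logits, targets, k=5)
```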

### Validation Accuracy and Loss Curve

acc_curve
Accuracy curves of the models on the validation set.

loss_curve
Loss curves of the models on the validation set.

### Qualitative Result

The following collated pictures illustrate the prediction quality of the three models.

no_aug_qualitative
The prediction result of Swin Transformer without augmentation.

aug_qualitative
The prediction result of Swin Transformer with augmentation.

scratch_qualitative
The prediction result of Swin Transformer trained from scratch (no pre-trained weights).

## Credit