Convolutional Neural Network - EMNIST Balanced Dataset

(using plaidML and Metal)

Description

This is a simple convolutional neural network (CNN) trained on the EMNIST Balanced dataset. It was built to test the performance of an environment that uses PlaidML and Metal. PlaidML is a software framework that enables Keras to run computations on a GPU using OpenCL instead of CUDA. ¹

Early stopping was added to halt training once model performance stops improving on the validation dataset. This helps avoid both overfitting (from too many training epochs) and underfitting (from too few).
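The stopping rule described above can be sketched in plain Python. This is only an illustration of the idea, not the notebook's code: in Keras the same behaviour is normally obtained with the `EarlyStopping` callback and its `monitor` and `patience` arguments. The function name, the `patience` value and the loss history below are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs. If that never
    happens, the last epoch is returned.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses: improvement stalls after epoch 4.
history = [0.90, 0.62, 0.51, 0.48, 0.49, 0.50, 0.48, 0.52]
stopped_at = early_stopping_epoch(history, patience=3)
print(stopped_at)
```

A larger `patience` tolerates noisy validation curves at the cost of a few extra epochs; a smaller one stops sooner but may cut training short.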

CNN Architecture

Convolutional (Conv2D)

Pooling (MaxPooling)

Convolutional (Conv2D)

Pooling (MaxPooling)

Flattening

Dense (ReLU)

Dropout

Dense (SoftMax)
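To make the layer stack above more concrete, the sketch below traces the tensor shape through each layer for a 28 x 28 grayscale input. The README does not state the hyperparameters, so the filter counts (32 and 64), the 3 x 3 kernels, the 2 x 2 pooling, the unpadded convolutions and the 128 dense units are assumptions chosen for illustration. Only the layer order and the 47 output classes come from this document.

```python
def conv2d_out(size, kernel, stride=1):
    """Output width/height of an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size=28, channels=1, filters=(32, 64), kernel=3,
                 pool=2, dense_units=128, num_classes=47):
    """Return a list of (layer_name, output_shape) for the architecture."""
    shapes = [("Input", (input_size, input_size, channels))]
    size = input_size
    for n_filters in filters:
        size = conv2d_out(size, kernel)            # Conv2D shrinks each side
        shapes.append(("Conv2D", (size, size, n_filters)))
        size = conv2d_out(size, pool, stride=pool)  # pooling downsamples
        shapes.append(("MaxPooling", (size, size, n_filters)))
    shapes.append(("Flatten", (size * size * filters[-1],)))
    shapes.append(("Dense (ReLU)", (dense_units,)))
    shapes.append(("Dropout", (dense_units,)))      # dropout keeps the shape
    shapes.append(("Dense (SoftMax)", (num_classes,)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(f"{name:16s} {shape}")
```

Under these assumed settings the two convolution and pooling stages reduce the image from 28 x 28 to 5 x 5, so the flattened vector fed to the first dense layer has 5 * 5 * 64 = 1600 values.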

Hardware

iMac Pro
- 10-core Intel Xeon Processor
- 128 GB RAM (2666 MHz DDR4)
- Radeon Pro Vega 56 8 GB

Software Environment

PyCharm for Anaconda
- Conda virtual environment
  - Python 3.7
  - plaidML
  - See environment.yml for package list
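For readers recreating the environment, the fragment below shows the usual way to point Keras at PlaidML, as described in the PlaidML documentation: the backend is selected through the `KERAS_BACKEND` environment variable before Keras is imported. Whether the notebook does exactly this is an assumption. The target device (for example a Metal device) is typically chosen beforehand by running the `plaidml-setup` command.

```python
import os

# PlaidML must be selected as the Keras backend BEFORE Keras is imported,
# otherwise Keras falls back to its default backend.
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

# Uncomment in an environment where the PlaidML Keras package is installed:
# import keras
```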

Dataset

The EMNIST dataset is a set of handwritten letters and digits derived from the NIST Special Database 19 and converted to a 28x28 pixel image format and a dataset structure that directly matches the MNIST dataset. Further information on the dataset contents and conversion process can be found in the paper available at https://arxiv.org/abs/1702.05373v1.

Format

The dataset provides six different splits, each in two formats:

Binary (see emnistsourcefiles.zip)
CSV (combined labels and images)
- Each row is a separate image
- 785 columns
- First column = class_label (see mappings.txt for class label definitions)
- Each column after represents one pixel value (784 total for a 28 x 28 image)
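The CSV layout above (one label column followed by 784 pixel columns) can be parsed with the standard library alone. The sketch below is a minimal illustration, not the notebook's loading code. It assumes the file has no header row and that pixels are stored row by row; the function names are hypothetical, and a synthetic one-row file stands in for a real split.

```python
import csv
import io


def parse_emnist_row(row, side=28):
    """Split one CSV row into (class_label, image), image being side x side."""
    values = [int(v) for v in row]
    expected = 1 + side * side  # 1 label column + 784 pixel columns
    if len(values) != expected:
        raise ValueError(f"expected {expected} columns, got {len(values)}")
    label, pixels = values[0], values[1:]
    # Reshape the flat pixel list into `side` rows of `side` values each.
    image = [pixels[r * side:(r + 1) * side] for r in range(side)]
    return label, image


def read_emnist_csv(handle):
    """Yield (class_label, image) pairs from an open CSV file handle."""
    for row in csv.reader(handle):
        if row:  # skip blank lines
            yield parse_emnist_row(row)


# Synthetic one-row file: label 5, all-black image with one white pixel.
fake_row = ",".join(["5"] + ["0"] * 783 + ["255"])
samples = list(read_emnist_csv(io.StringIO(fake_row + "\n")))
label, image = samples[0]
print(label, len(image), len(image[0]), image[27][27])
```

The integer label is an index into the class definitions in mappings.txt, not the character itself.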

EMNIST Balanced Dataset

The EMNIST Balanced dataset is meant to address the balance issues in the ByClass and ByMerge datasets. It is derived from the ByMerge dataset to reduce misclassification errors caused by confusion between uppercase and lowercase letters, and it has an equal number of samples per class. It is intended to be the most broadly applicable of the splits.

train: 112,800
test: 18,800
total: 131,600
classes: 47 (balanced)
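As a quick sanity check on the figures above, the snippet below derives the per-class sample counts, assuming the class balance holds within both the training and the test split.

```python
TRAIN, TEST, CLASSES = 112_800, 18_800, 47

total = TRAIN + TEST
# divmod returns (quotient, remainder); a zero remainder means the split
# divides evenly across the 47 classes.
train_per_class, train_rem = divmod(TRAIN, CLASSES)
test_per_class, test_rem = divmod(TEST, CLASSES)

print(total, train_per_class, test_per_class)
```

Both remainders are zero, giving 2,400 training and 400 test samples per class.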

References:

Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters.
Crawford, C. (n.d.). EMNIST (Extended MNIST) Dataset.
Di Sipio, R. (2019). GPU-Accelerated Machine Learning on MacOS.
Jindal, A. (n.d.). EMNIST using Keras CNN.
Ollis, N. (2019). macOS Machine Learning in 2019.

Footnotes

1 GPU-Accelerated Machine Learning on MacOS
