E2E Machine Learning Pipeline

An automated tool for binary and multi-class classification and hyperparameter optimization on stationary and streaming datasets. It trains different models for traditional batch datasets (KNN, DT, Random Forests, SVM, Bagging, Boosting, etc.) and for streaming datasets (Hoeffding Tree classifier, SAM-KNN, Adaptive Hoeffding Trees, Adaptive Random Forests, OzaBag, OzaBoost, etc.), then generates metric dumps and performance evaluation graphs (ROC curves) comparing the best models. Hypothesis testing with the Friedman test and the Nemenyi post-hoc test is also supported for statistical comparison of algorithms.
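
As a rough illustration of the hypothesis-testing step, the sketch below computes the Friedman chi-square statistic and the Nemenyi critical difference from a table of per-dataset scores. It is not the repository's implementation: the function names (`average_ranks`, `friedman_statistic`, `nemenyi_critical_difference`) and the accuracy values in the usage section are made up for illustration, the statistic omits the tie correction, and the critical values are the commonly tabulated ones for alpha = 0.05.

```python
import math

# Commonly tabulated Nemenyi critical values q_alpha at alpha = 0.05,
# keyed by the number of compared algorithms k (assumption: k <= 10).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def average_ranks(scores):
    """Rank algorithms on each dataset (rank 1 = highest score), then average.

    scores[i][j] is the score of algorithm j on dataset i.
    Tied scores share the mean of the ranks they span.
    """
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        pos = 0
        while pos < k:
            end = pos
            while end + 1 < k and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of 1-based ranks pos+1 .. end+1
            for j in order[pos:end + 1]:
                totals[j] += shared
            pos = end + 1
    return [t / n for t in totals]


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction)."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * sum(r * r for r in ranks) - 3 * n * (k + 1)


def nemenyi_critical_difference(n_datasets, n_algorithms):
    """Two algorithms differ significantly if their average ranks
    differ by at least this amount (alpha = 0.05)."""
    q = Q_ALPHA_005[n_algorithms]
    return q * math.sqrt(n_algorithms * (n_algorithms + 1) / (6 * n_datasets))


if __name__ == "__main__":
    # Illustrative accuracies only: rows are datasets, columns are algorithms.
    scores = [
        [0.91, 0.88, 0.80],
        [0.85, 0.83, 0.79],
        [0.78, 0.74, 0.70],
        [0.93, 0.90, 0.86],
    ]
    print("average ranks:", average_ranks(scores))
    print("Friedman statistic:", friedman_statistic(scores))
    print("Nemenyi CD:", nemenyi_critical_difference(len(scores), len(scores[0])))
```

The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom. Only if that test rejects the null hypothesis is the Nemenyi critical difference used to decide which pairs of algorithms differ.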

File Structure

├───configs
│   │───model_hparams.py
│
├───data
│   │───drug_consumption.data
│
├───dataset
│   │───dataset_base.py
│   │───feature_select.py
│
├───driver
│   │───driver.py
│   │───driver_online.py
│
├───models
│   │───models.py
│
├───output
│   ├───run_20221001-124242
│       ...
│       ...
│       ...
└───utils
    │───plot_results.py
    │───scoring.py

File descriptions

model_hparams.py: Hyperparameter combinations for each model can be specified here.
drug_consumption.data: The dataset (all datasets are stored under the data folder).
driver.py: Default entry point of the program (batch models).
driver_online.py: Entry point of the program for online models.
models.py: Model classes and definitions.
plot_results.py: Utility to plot ROC curves.
scoring.py: Utility to compute metrics such as G-mean, F-score, and AUC.
dataset.py: Dataset class, used for preparing train test splits and pre-processing data.
feature_select.py: Feature selection algorithms that reduce the feature set based on statistical tests.
output: Directory where run dumps are generated, containing model evaluations and performance visualizations (ROC plots and confusion matrices).
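
For readers unfamiliar with the G-mean metric mentioned for scoring.py, here is a minimal sketch under one common definition: the geometric mean of the per-class recalls, which in the binary case reduces to sqrt(sensitivity * specificity). The function name `g_mean` is hypothetical, and this is not necessarily how scoring.py computes it.

```python
import math
from collections import Counter


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition of G-mean).

    Returns 0.0 as soon as one class is never predicted correctly, which is
    why the metric is popular for imbalanced classification problems.
    """
    support = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    # Illustrative labels only.
    y_true = [0, 0, 1, 1]
    y_pred = [0, 1, 1, 1]
    print("G-mean:", g_mean(y_true, y_pred))
```

Unlike plain accuracy, this score cannot be inflated by predicting only the majority class, since a recall of zero on any class drives the product to zero.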

Run cmd

Batch-based Models

# Navigate to the root directory
>> python ./driver/driver.py

Online Models

# Navigate to the root directory
>> python ./driver/driver_online.py
