tonio73/data-science

Learning data science step by step

Most of the examples in Internet tutorials either use powerful libraries (Scikit Learn, Keras...), complex models (neural nets), or data samples with many features.

In this collection of workbooks, I want to start from simple examples and raw Python code, and then progressively increase the complexity of the datasets and use more advanced techniques and libraries.

Most datasets are generated on purpose, so that their parameters can be adjusted to fit the demonstration.

The notebooks are Jupyter notebooks using Python 3.7

To read or edit the notebooks you may:

Do not get confused

Linear regression

Let's start from a simple univariate example and then progressively add more complexity:

  • Univariate function approximation with linear regression,
    • Closed form, with NumPy, SciPy or Scikit Learn, as well as gradient descent and stochastic gradient descent (HTML / Jupyter)
    • Using TensorFlow (HTML / Jupyter)
  • Bivariate function approximation with linear regression,
    • Closed form, using Scikit Learn, (stochastic) gradient descent, adding a regularizer (HTML / Jupyter)
    • Using Keras, single perceptron linear regression, two-layer model (HTML / Jupyter)
    • Model confidence and quality evaluation in the Gaussian model case (HTML / Jupyter)
  • Feature engineering or feature learning with linear regression (HTML / Jupyter)
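As a taste of the closed-form approach covered in these notebooks, here is a minimal sketch (not the notebooks' own code) of a univariate least-squares fit on a dataset generated on purpose, using NumPy only; the slope, intercept and noise level are illustrative choices:

```python
import numpy as np

# Generate a univariate dataset on purpose, as in the notebooks:
# y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)

# Closed-form least squares: add a bias column and solve in one shot
X = np.column_stack([x, np.ones_like(x)])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = theta
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The same fit can be reproduced with gradient descent, which is what the notebooks then generalize to more features and to stochastic updates.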

Classification

Binary classification with parametric models

  • Univariate function as a boundary on two-class data, approximated with logistic regression,
  • Bivariate parametric function as a boundary, approximated with logistic regression,
    • Homemade, and using Scikit Learn (HTML / Jupyter)
    • Using TensorFlow (HTML / Jupyter)
    • Using Keras, adding regularizers and finally a two-layer neural net (HTML / Jupyter)
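In the spirit of the "homemade" notebooks, the core of logistic regression fits in a few lines of NumPy: a sigmoid, a cross-entropy gradient, and batch gradient descent. This sketch (dataset, learning rate and iteration count are illustrative assumptions, not the notebooks' values) separates two classes with a linear boundary in the plane:

```python
import numpy as np

# Homemade logistic regression with batch gradient descent on a
# generated two-class dataset (linear boundary in the plane)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 1] > 0.5 * X[:, 0]).astype(float)

Xb = np.column_stack([X, np.ones(len(X))])  # add a bias term
w = np.zeros(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Gradient of the cross-entropy loss is Xb.T @ (p - y) / n
lr = 0.5
for _ in range(2000):
    p = sigmoid(Xb @ w)
    w -= lr * Xb.T @ (p - y) / len(y)

accuracy = np.mean((sigmoid(Xb @ w) > 0.5) == (y == 1))
```

The Scikit Learn, TensorFlow and Keras versions in the notebooks optimize the same loss, with regularizers and extra layers added on top.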

Binary classification with non-parametric models

  • Bivariate with K Nearest Neighbors (KNN), homemade, and using Scikit Learn (HTML / Jupyter)
  • Non-linear problem solving with Support Vector Machine (SVM) (HTML / Jupyter)

Multi-class classification with regression or neural networks

  • Two features to separate the 2D plane into 3 or more categories
    • Using Keras on a linearly separable problem (Czech flag) and a non-linearly separable problem (Norway flag), using 2- and 3-layer neural nets to handle the second problem (HTML / Jupyter)
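Before reaching for Keras, the linearly separable multi-class case can be handled by softmax (multinomial logistic) regression alone. A NumPy sketch on an illustrative 3-class problem, three vertical bands of the plane, loosely evoking the stripes of a flag (the band edges and hyperparameters are my own assumptions):

```python
import numpy as np

# Softmax regression on 3 classes: vertical bands of the plane
rng = np.random.default_rng(0)
X = rng.uniform(-1.5, 1.5, size=(300, 2))
y = np.digitize(X[:, 0], bins=[-0.5, 0.5])  # class 0, 1 or 2

Xb = np.column_stack([X, np.ones(len(X))])
W = np.zeros((3, 3))  # one weight row per class

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

Y = np.eye(3)[y]  # one-hot targets
lr = 0.5
for _ in range(2000):
    P = softmax(Xb @ W.T)
    W -= lr * (P - Y).T @ Xb / len(y)

accuracy = np.mean(softmax(Xb @ W.T).argmax(axis=1) == y)
```

The Norway-flag-like problem is not separable by such linear boundaries, which is exactly why the notebook adds hidden layers.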

Multi-class classification with non-parametric models

  • Multi-class classification using decision trees (HTML / Jupyter)
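A decision tree is built by recursively applying one simple operation: find the feature and threshold whose split minimizes an impurity measure such as Gini. A sketch of that single step, a decision stump, on a generated dataset (the data and the exhaustive threshold search are illustrative, not the notebook's implementation):

```python
import numpy as np

# A decision stump (one-split tree): the building block of a decision tree
def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    best = (None, None, np.inf)  # (feature, threshold, weighted impurity)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left = y[X[:, feature] <= threshold]
            right = y[X[:, feature] > threshold]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (feature, threshold, score)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 2))
y = (X[:, 1] > 0.6).astype(int)  # class decided by feature 1 only

feature, threshold, score = best_split(X, y)
```

A full tree recurses on the two sides of the best split until the leaves are pure or a depth limit is reached; Scikit Learn's DecisionTreeClassifier handles multi-class impurity the same way.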

Deep learning

Convolutional neural networks (CNN)

  • Introduction to CNN as an image filter
    • Part 1 - Horizontal edge detector using simple 1-2 layer neural nets (HTML / Jupyter)
    • Coming soon: Part 2 - Combined horizontal-vertical edge detector using multiple convolutional units
  • CNN versus Dense comparison on MNIST
    • Part 1 - Design and performance comparison (HTML / Jupyter)
    • Part 2 - Visualization with UMAP (HTML / Jupyter)
    • Coming soon: Part 3 - Resilience to geometric transformations
  • Interpretability
    • Activation maps on CIFAR-10 (HTML / Jupyter)
    • Saliency maps on CIFAR-10 (HTML / Jupyter)
    • Saliency maps on ImageNet (subset) with ResNet50 (HTML / Jupyter) (work in progress)
    • CNN as a graph using NetworkX, extracting centrality values (HTML / Jupyter) (work in progress)
  • Other CNNs
    • Fashion MNIST CNN with Data Augmentation (HTML / Jupyter)
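The "CNN as an image filter" idea from part 1 can be seen without any deep learning library: a horizontal edge detector is just a 2-row convolution kernel slid over the image. A NumPy sketch (the toy image and kernel values are illustrative):

```python
import numpy as np

# Horizontal edge detection as a single convolution filter:
# the vertical difference kernel responds where brightness changes
kernel = np.array([[1.0], [-1.0]])

def conv2d_valid(image, kernel):
    # Naive "valid" 2D convolution (no padding, stride 1)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: top half dark (0), bottom half bright (1)
image = np.zeros((6, 6))
image[3:, :] = 1.0

edges = conv2d_valid(image, kernel)
# Non-zero responses appear only on the row where brightness changes
```

A convolutional layer learns such kernels from data instead of hard-coding them, and stacks many of them per layer.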

Generative networks (VAE, GAN)

  • Generative Adversarial Networks (GAN), the basics on MNIST, with TensorFlow 2 / Keras and TensorFlow Datasets
    • Original GAN using Dense layers (HTML / Jupyter)
    • GAN with convolutions (DCGAN) (HTML / Jupyter)
    • GAN with convolutions (DCGAN), no Dense layer on the generator path (HTML / Jupyter)
    • GAN and Bayesian network on ski outing reports and prediction of global warming impact on skiing in the Alps (HTML / Jupyter)
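Underneath all GAN variants sits the same adversarial objective: the discriminator D maximizes log D(x) + log(1 - D(G(z))), while the generator G (in the common non-saturating form) maximizes log D(G(z)). A minimal NumPy sketch of just the two loss terms, independent of any network architecture:

```python
import numpy as np

# GAN losses on discriminator outputs (probabilities in (0, 1)):
# d_real = D(x) on real samples, d_fake = D(G(z)) on generated samples
def discriminator_loss(d_real, d_fake):
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss used in practice
    return -np.mean(np.log(d_fake))

# A confident, correct discriminator has a low loss...
low = discriminator_loss(np.array([0.99]), np.array([0.01]))
# ...a fooled one (outputs 0.5 everywhere) has a higher loss
high = discriminator_loss(np.array([0.5]), np.array([0.5]))
```

Training alternates gradient steps on these two losses; the Dense and convolutional (DCGAN) notebooks differ only in the architectures of G and D.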

Natural Language Processing (NLP)

Reading list

Books

  • Deep Learning - I. Goodfellow, Y. Bengio, A. Courville, The MIT Press.
    • Very good overview of machine learning and its extension to deep learning
  • An Introduction to Statistical Learning with Applications in R - G. James, D. Witten, T. Hastie, R. Tibshirani.
    • Traditional machine learning, including regression, clustering, SVM...

Nice notebooks

Tutorials and courses

Papers

Data / model sources

Word embeddings & analysis