elisiojsj/Data-Science

Data-Science

Examples of data science analysis and model building

An end-to-end example of a data science analysis. The main topics covered are:

  • Cleaning missing data.
  • Handling outliers.
  • Visualizing patterns in the data.
  • Correlation analysis.
  • Prediction models built with Linear Regression, Random Forest, and XGBoost.
  • XGBoost tuning with Optuna.
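
The first two topics above can be illustrated with a minimal sketch. The README does not say which techniques the notebook uses, so median imputation and the 1.5 × IQR clipping rule below are assumptions chosen because they are common defaults. The data is synthetic, and NumPy is used instead of pandas only to keep the example dependency-light.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is repeatable

# Hypothetical numeric feature: synthetic data, not taken from the repository.
values = rng.normal(loc=50.0, scale=5.0, size=200)
values[:10] = np.nan     # inject missing entries
values[10:13] = 500.0    # inject gross outliers

# Step 1 (assumed technique): fill missing entries with the median of the
# observed values, which is less sensitive to outliers than the mean.
median = np.nanmedian(values)
filled = np.where(np.isnan(values), median, values)

# Step 2 (assumed technique): clip to the interval
# [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1, q3 = np.percentile(filled, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
cleaned = np.clip(filled, lower, upper)

print(f"missing before: {int(np.isnan(values).sum())}, "
      f"after: {int(np.isnan(cleaned).sum())}")
print(f"max before: {np.nanmax(values):.1f}, after: {cleaned.max():.1f}")
```

Clipping keeps every row, which matters when the dataset is small. Dropping the flagged rows is the usual alternative when the outliers are known to be data-entry errors.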

Correlation


A probabilistic programming approach to Galton's famous hereditary stature dataset, the basis for the paper that popularized the concept of regression.

  • Analysis of the dataset
  • Bayesian Hierarchical Linear Regression using PyMC3.
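
For readers unfamiliar with the term, a hierarchical linear regression on this kind of data might be written as follows. This is only a sketch of one plausible structure, not the notebook's actual model. The symbols are assumptions introduced here: $h_i$ is the adult height of child $i$, $m_i$ the mid-parent height, and $f[i]$ the family that child $i$ belongs to. The priors on $\bar{\alpha}$, $\beta$, $\sigma$, and $\tau$ are deliberately left unspecified.

```latex
\begin{aligned}
h_i      &\sim \mathcal{N}(\mu_i,\ \sigma) \\
\mu_i    &= \alpha_{f[i]} + \beta\, m_i \\
\alpha_f &\sim \mathcal{N}(\bar{\alpha},\ \tau)
\end{aligned}
```

The family-level intercepts $\alpha_f$ share a common distribution, so families with few children are pulled toward the overall mean $\bar{\alpha}$ (partial pooling).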

Galton


Statistical Rethinking - Homework

Homework from Statistical Rethinking, a probabilistic programming course and book, solved using PyMC3.


A simple simulation of a spurious association, modeled with probabilistic programming.
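
A minimal sketch of what such a simulation can look like, using plain NumPy least squares rather than the PyMC3 model in the notebook. The variable names, sample size, and unit effect sizes are assumptions made for illustration. `x_real` causes both `y` and `x_spur`, so `x_spur` predicts `y` on its own but adds nothing once `x_real` is in the regression.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (an intercept is fitted, then dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5_000                       # hypothetical sample size

x_real = rng.normal(size=n)      # true cause of y
x_spur = rng.normal(loc=x_real)  # driven by x_real, has no effect on y
y = rng.normal(loc=x_real)       # depends on x_real only

# Regressing y on x_spur alone shows an association (about 0.5 in expectation).
slope_alone = ols_slopes(y, x_spur)[0]

# With both predictors, the x_spur slope shrinks toward 0
# while the x_real slope stays near 1.
slope_real, slope_spur = ols_slopes(y, x_real, x_spur)

print(f"y ~ x_spur:          slope = {slope_alone:.2f}")
print(f"y ~ x_real + x_spur: x_real = {slope_real:.2f}, x_spur = {slope_spur:.2f}")
```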

Spurious


A simple simulation of a masking relationship, modeled with probabilistic programming.
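
As a rough illustration, masking can be reproduced with two correlated predictors that push the outcome in opposite directions. The sketch below uses NumPy least squares instead of PyMC3. The correlation `rho = 0.7`, the unit effect sizes, and all variable names are assumptions, not values taken from the notebook.

```python
import numpy as np

def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (an intercept is fitted, then dropped)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 5_000                       # hypothetical sample size
rho = 0.7                       # assumed correlation between the two predictors

x_pos = rng.normal(size=n)
x_neg = rng.normal(loc=rho * x_pos, scale=np.sqrt(1 - rho**2))
y = rng.normal(loc=x_pos - x_neg)  # opposite effects of equal size

# One predictor at a time: each effect is partly hidden by the other
# (about +0.3 and -0.3 in expectation with these settings).
alone_pos = ols_slopes(y, x_pos)[0]
alone_neg = ols_slopes(y, x_neg)[0]

# Both predictors together: the full effects (+1 and -1) reappear.
both_pos, both_neg = ols_slopes(y, x_pos, x_neg)

print(f"alone:    x_pos = {alone_pos:+.2f}, x_neg = {alone_neg:+.2f}")
print(f"together: x_pos = {both_pos:+.2f}, x_neg = {both_neg:+.2f}")
```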

Masking


Simulation of the four elemental variable relations: the fork, the pipe, the collider, and the descendant.
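
The sketch below shows one way to simulate these four structures and check how the association between `x` and `y` changes when a third variable is controlled for. It uses NumPy least squares rather than the notebook's own code. The variable names, unit effect sizes, and sample size are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
n = 10_000                      # hypothetical sample size

def slope_on_x(x, y, control=None):
    """Slope of y on x; if `control` is given it is added as a second predictor."""
    cols = [np.ones(n), x] if control is None else [np.ones(n), x, control]
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return coef[1]

results = {}

# Fork: x <- z -> y. The common cause z creates the x-y association.
z = rng.normal(size=n)
x, y = rng.normal(loc=z), rng.normal(loc=z)
results["fork"] = (slope_on_x(x, y), slope_on_x(x, y, control=z))

# Pipe: x -> z -> y. The mediator z carries the whole effect of x.
x = rng.normal(size=n)
z = rng.normal(loc=x)
y = rng.normal(loc=z)
results["pipe"] = (slope_on_x(x, y), slope_on_x(x, y, control=z))

# Collider: x -> z <- y. x and y are independent until z is controlled for.
x, y = rng.normal(size=n), rng.normal(size=n)
z = rng.normal(loc=x + y)
results["collider"] = (slope_on_x(x, y), slope_on_x(x, y, control=z))

# Descendant: d is a child of the collider z. Controlling for d acts like
# a weaker version of controlling for z.
d = rng.normal(loc=z)
results["descendant"] = (slope_on_x(x, y), slope_on_x(x, y, control=d))

for name, (marginal, conditional) in results.items():
    print(f"{name:<10} y ~ x: {marginal:+.2f}   with control: {conditional:+.2f}")
```

In the fork and the pipe, controlling for `z` removes the association. In the collider and the descendant, controlling for the third variable creates an association that was not there before.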


Software Used

Python version: 3.7.4

Packages: pandas, scikit-learn (imported as sklearn), seaborn, matplotlib, optuna, xgboost, statsmodels, pymc3.