Examples of data science analysis and model building
An end-to-end example of a data science analysis. The main topics of this work are:
- Cleaning missing data.
- Handling outliers.
- Visualizing data patterns.
- Correlation analysis.
- Prediction models with Linear Regression, Random Forest, and XGBoost.
- XGBoost tuning with Optuna.
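The first two steps in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the notebook: it uses plain NumPy, imputes with the median, and clips outliers with the 1.5 x IQR rule. The helper names `impute_median` and `clip_iqr` and the sample values are hypothetical.

```python
import numpy as np


def impute_median(values):
    """Replace NaN entries with the median of the observed entries."""
    values = np.asarray(values, dtype=float)
    median = np.nanmedian(values)
    return np.where(np.isnan(values), median, values)


def clip_iqr(values, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return np.clip(values, q1 - k * iqr, q3 + k * iqr)


# Hypothetical column with one missing value and one extreme value.
raw = np.array([10.0, 12.0, np.nan, 11.0, 13.0, 250.0, 12.0])
cleaned = clip_iqr(impute_median(raw))
print(cleaned)
```

Clipping keeps every row, which is convenient for small datasets; dropping the flagged rows is an equally common choice.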
A probabilistic programming approach to the famous Galton hereditary stature dataset, which is the basis for the paper that popularized the concept of regression.
- Analysis of the dataset.
- Bayesian Hierarchical Linear Regression using PyMC3.
Homework solutions for Statistical Rethinking, a probabilistic programming course and book, solved using PyMC3.
- Homework week 2
- Homework week 3
- Homework week 4
- Homework week 5
- Homework week 6
- Homework week 8-pt1
- Homework week 8-pt2
- Homework week 8-pt3
Simple simulation of spurious association modeled with probabilistic programming.
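A rough sketch of what a spurious association looks like in simulated data. It assumes the usual setup where one real cause drives both the outcome and a second, spurious predictor, and it swaps in ordinary least squares with NumPy in place of a PyMC3 model to keep the example short. The variable names, the seed, and the `ols` helper are hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """Least-squares coefficients [intercept, slope_1, ...] of y on the predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000
x_real = rng.normal(size=n)       # the true cause
x_spur = rng.normal(loc=x_real)   # correlated with the cause, no effect on y
y = rng.normal(loc=x_real)        # outcome depends on x_real only

slope_alone = ols(y, x_spur)[1]                   # x_spur as the only predictor
_, beta_spur, beta_real = ols(y, x_spur, x_real)  # both predictors together

print(f"x_spur alone: {slope_alone:.2f}")
print(f"x_spur with x_real in the model: {beta_spur:.2f}")
print(f"x_real: {beta_real:.2f}")
```

On its own, `x_spur` should show a clear slope (near 0.5 under these assumptions); once `x_real` enters the model, its coefficient should shrink toward zero.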
Simple simulation of masking relationship modeled with probabilistic programming.
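A rough sketch of a masking relationship, assuming two positively correlated predictors that push the outcome in opposite directions. As a simplification it uses ordinary least squares with NumPy in place of a PyMC3 model. The correlation `rho`, the variable names, and the `ols` helper are hypothetical.

```python
import numpy as np


def ols(y, *predictors):
    """Least-squares coefficients [intercept, slope_1, ...] of y on the predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000
rho = 0.7  # assumed correlation between the two predictors

x_pos = rng.normal(size=n)
x_neg = rho * x_pos + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = x_pos - x_neg + rng.normal(size=n)  # opposite-signed effects of equal size

alone_pos = ols(y, x_pos)[1]
alone_neg = ols(y, x_neg)[1]
_, both_pos, both_neg = ols(y, x_pos, x_neg)

print(f"x_pos alone: {alone_pos:.2f}, x_neg alone: {alone_neg:.2f}")
print(f"together: x_pos {both_pos:.2f}, x_neg {both_neg:.2f}")
```

Each predictor looks weak when used alone because the other one partly cancels it; fitting both together should recover effects close to +1 and -1.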
Simulation of the four elemental variable relations: the fork, the pipe, the collider, and the descendant.
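The four relations can be sketched with linear Gaussian simulations, comparing the coefficient of X on Y before and after conditioning on a third variable. This is an illustration under assumptions, not the notebook's code: it uses ordinary least squares with NumPy in place of PyMC3, unit effect sizes, and a descendant attached to a collider. The `slope_on_x` and `noise` helpers are hypothetical.

```python
import numpy as np


def slope_on_x(y, x, *controls):
    """OLS coefficient of x when regressing y on x plus optional controls."""
    X = np.column_stack([np.ones_like(y), x, *controls])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


rng = np.random.default_rng(1)
n = 5000


def noise():
    """Independent standard normal draws, one per observation."""
    return rng.normal(size=n)


# Fork: X <- Z -> Y. Z is a common cause; conditioning on Z removes the association.
z = noise()
x = z + noise()
y = z + noise()
fork = (slope_on_x(y, x), slope_on_x(y, x, z))

# Pipe: X -> Z -> Y. Z mediates; conditioning on Z blocks the path.
x = noise()
z = x + noise()
y = z + noise()
pipe = (slope_on_x(y, x), slope_on_x(y, x, z))

# Collider: X -> Z <- Y. X and Y are independent; conditioning on Z creates an association.
x = noise()
y = noise()
z = x + y + noise()
collider = (slope_on_x(y, x), slope_on_x(y, x, z))

# Descendant: D is a child of the collider Z; conditioning on D acts like a
# weaker version of conditioning on Z.
d = z + noise()
descendant = (slope_on_x(y, x), slope_on_x(y, x, d))

results = {"fork": fork, "pipe": pipe, "collider": collider, "descendant": descendant}
for name, (before, after) in results.items():
    print(f"{name:10s} X coefficient: {before:+.2f} unadjusted, {after:+.2f} adjusted")
```

Under these assumptions, adjusting for Z should remove the association in the fork and the pipe, while adjusting for Z or D should introduce a negative association in the collider and descendant cases, weaker for the descendant.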
Python version: 3.7.4
Packages: pandas, sklearn, seaborn, matplotlib, optuna, xgboost, statsmodels, pymc3.