GitHub - charlotte0408/Research-about-ML-regularized-algorithms

Research Project about Machine Learning

Brief Introduction

This project was conducted by Professor Amanda Montoya and me, and it was invited for a poster presentation at the 2019 Symposium on Data Science & Statistics (SDSS). The poster and presentation slides are included in this repository; please feel free to check them out.

Title

Comparing Performance of Lasso, Group Lasso, and Linear Regression with Categorical Predictors

Abstract

Machine learning is frequently used to train models and predict outcomes across scientific fields. Lasso is a method that performs both variable selection and regularization, and it is often regarded as an advanced version of linear regression. Practitioners often use lasso in the same way as linear regression, assuming the two share the same properties. For models with categorical predictors, group lasso has been suggested as an alternative to lasso that better aligns with the properties of linear regression. The goal of my project is to show that linear regression, lasso, and group lasso have distinct pros and cons and should be treated accordingly. By analyzing wage data with 6 variables comprising 20 categories in total, we found that lasso predicted better than group lasso, which in turn predicted better than linear regression. We also analyzed how the choice of coding strategy affects the predicted results. Linear regression is not affected by the choice of coding strategy. Lasso, however, selects different variables depending on how the categorical predictors are coded. Group lasso resolves the coding-strategy issue, but it can lead to overfitting. Using Monte Carlo simulation, we created a categorical predictor with one dominant category and several non-predictive categories. When there are few non-predictive categories, group lasso is more likely than lasso to include this categorical variable; as the number of non-predictive categories increases, group lasso becomes less likely than lasso to include it. Researchers primarily focus on the similarities between linear regression and lasso but pay little attention to their differing properties, particularly when categorical predictors are involved. This project demonstrates that, when using lasso, the effect of the coding strategy should be considered, and that group lasso should be avoided when a dominant category is expected.
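The coding-strategy effect described in the abstract can be illustrated with a small sketch. The repository's analyses are written in R Markdown; the Python below is not taken from them. It assumes a hypothetical noise-free, balanced design with a single four-level categorical predictor, and the helpers (`dummy_coding`, `effect_coding`, `ols_fitted`, `lasso_cd`) are illustrative names, not the authors' code. Lasso here is fit by cyclic coordinate descent with an unpenalized intercept, minimizing (1/(2n))·RSS + λ·‖b‖₁.

```python
import numpy as np

# Hypothetical data: one categorical predictor with levels A-D, 25 rows per
# level, no noise. Only level D differs from the others (mean 4 vs. 0).
levels = np.repeat(np.array(["A", "B", "C", "D"]), 25)
level_means = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 4.0}
y = np.array([level_means[l] for l in levels])


def dummy_coding(levels, reference="A"):
    """Reference (treatment) coding: one 0/1 column per non-reference level."""
    others = [l for l in sorted(set(levels)) if l != reference]
    return np.column_stack([(levels == l).astype(float) for l in others])


def effect_coding(levels, last="D"):
    """Effect (sum-to-zero) coding: the last level is -1 in every column."""
    others = [l for l in sorted(set(levels)) if l != last]
    cols = []
    for l in others:
        col = (levels == l).astype(float)
        col[levels == last] = -1.0
        cols.append(col)
    return np.column_stack(cols)


def ols_fitted(X, y):
    """Ordinary least squares with an intercept; returns fitted values."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ beta


def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; returns 0 when |rho| <= lam."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=10_000, tol=1e-12):
    """Lasso by cyclic coordinate descent (sketch, not a library API).

    Minimizes (1 / (2n)) * ||y - b0 - X b||^2 + lam * ||b||_1,
    leaving the intercept b0 unpenalized.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering absorbs the intercept
    z = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Partial residual with predictor j's contribution added back.
            r_j = yc - Xc @ b + Xc[:, j] * b[j]
            new_bj = soft_threshold(Xc[:, j] @ r_j / n, lam) / z[j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    b0 = y_mean - x_mean @ b
    return b0, b


X_dummy, X_effect = dummy_coding(levels), effect_coding(levels)

# Linear regression: both codings give the same fitted values.
print(np.allclose(ols_fitted(X_dummy, y), ols_fitted(X_effect, y)))  # True

# Lasso with the same penalty keeps different columns.
_, b_dummy = lasso_cd(X_dummy, y, lam=0.3)
_, b_effect = lasso_cd(X_effect, y, lam=0.3)
print(np.flatnonzero(np.abs(b_dummy) > 1e-8))   # [2]
print(np.flatnonzero(np.abs(b_effect) > 1e-8))  # [0 1 2]
```

With reference level A, only the D indicator carries signal, so lasso keeps a single column (coefficient 2.4 at λ = 0.3). Under effect coding, every column has a nonzero true coefficient, so lasso keeps all three (each shrunk to -0.7). Linear regression is unaffected because both codings, together with the intercept, span the same column space.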

Folders and files

1_28 determining value.Rmd
1_28 without intercept.Rmd
Coding strategy(complete).Rmd
Coding strategy.Rmd
GroupLasso_04012019.pptx
Linear_vs_GGlasso.Rmd
Linear_vs_Lasso.Rmd
README.md
Simulation_MSE(separate).Rmd
Simulation_MSE.Rmd
poster_sdss_final.pdf
predict linear model.Rmd
wage_data_prediction.Rmd
