This repo contains the raw data and code to build a gender classifier based on first names. It also includes the link, files, and code for a Shiny app where you can use the model for inference.

DavidSolan0/gender_name_clf

Goal

This repo contains the raw data and code to build a gender classifier based on first names. It also includes the link, files, and code for a Shiny app where you can use the model for inference.

Folders

clf

In the clf folder, you will find the data used to train and test the model, along with the R code for four classifiers and their hyperparameter tuning using 3-fold cross-validation.
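As an illustration of tuning with three folds, here is a minimal sketch that assumes the e1071 package and a toy two-column feature matrix standing in for the name embeddings. The grid values, the seed, and the variable names are hypothetical and are not taken from the repo's code.

```r
library(e1071)  # assumed package; provides svm(), tune.svm() and tune.control()

set.seed(42)

# Toy stand-in for the embedding features: 60 rows, 2 numeric columns, 2 classes.
n <- 60
x <- rbind(matrix(rnorm(n, mean = -1), ncol = 2),
           matrix(rnorm(n, mean =  1), ncol = 2))
y <- factor(rep(c("F", "M"), each = n / 2))

# Grid search over cost and gamma, scored with 3-fold cross-validation.
tuned <- tune.svm(
  x, y,
  kernel = "radial",
  cost   = c(0.1, 1, 10),     # hypothetical grid
  gamma  = c(0.01, 0.1, 1),   # hypothetical grid
  tunecontrol = tune.control(sampling = "cross", cross = 3)
)

tuned$best.parameters   # cost/gamma pair with the lowest CV error
tuned$best.performance  # the corresponding cross-validated error rate
```

The same pattern should carry over to the other classifiers by swapping the model function and its grid.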

Preprocessing consists only of lowercasing the text and removing punctuation. I used GloVe embeddings from the textdata package. Based on the metrics, I picked an SVM as the best classifier: it reaches an AUC of 0.84 and an accuracy of 0.80. Below you can take a look at the ROC curve.

Figure: ROC curve of the SVM classifier.
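The preprocessing described above (lowercasing and removing punctuation) can be sketched in base R as follows. The helper name clean_name is hypothetical and the repo's own code may differ.

```r
# Hypothetical helper: lowercase a name and strip punctuation characters.
clean_name <- function(x) {
  x <- tolower(x)
  gsub("[[:punct:]]", "", x)
}

clean_name(c("Mary-Ann", "O'Neil", "DAVID."))
# "maryann" "oneil" "david"
```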

It is worth mentioning that the XGB classifier has similar metrics, with an AUC of 0.838 and an accuracy of 0.793, followed by Random Forest with an AUC of 0.793 and an accuracy of 0.762. The naive Bayes classifier comes last, with an AUC of 0.733 and an accuracy of 0.561.
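For readers who want to reproduce this kind of comparison, the sketch below computes AUC and accuracy in base R from true labels and predicted scores. The helper names, the 0.5 threshold, and the toy labels and scores are assumptions for illustration and are not the repo's results.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
auc_score <- function(labels, scores) {
  pos   <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r     <- rank(scores)  # average ranks, so tied scores are handled
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: accuracy after thresholding the scores.
accuracy_score <- function(labels, scores, threshold = 0.5) {
  mean((scores >= threshold) == (labels == 1))
}

# Toy data, not taken from the repo.
labels <- c(1, 1, 1, 0, 0, 0)
scores <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.1)

auc_score(labels, scores)       # 8/9, about 0.889
accuracy_score(labels, scores)  # 4/6, about 0.667
```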

shiny_app

This folder includes the UI and server code to deploy a Shiny app with the model. Remember that you have to save the model as a .rds file in this folder for the app to work on your machine.
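Saving and reloading a model as a .rds file can be sketched with base R's saveRDS and readRDS. The glm fit on the built-in mtcars data is only a stand-in for the trained classifier, and the path and the file name model.rds are assumptions.

```r
# Stand-in model: any fitted R object is serialized the same way.
model <- glm(am ~ wt, data = mtcars, family = binomial)

# Hypothetical location; in practice this would be the shiny_app folder.
app_dir <- file.path(tempdir(), "shiny_app")
dir.create(app_dir, showWarnings = FALSE)
model_path <- file.path(app_dir, "model.rds")

# Write the fitted model to disk.
saveRDS(model, model_path)

# The server code would typically read the model back once at startup.
restored <- readRDS(model_path)
predict(restored, newdata = data.frame(wt = 3), type = "response")
```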

Discussion

Future work could include trying embeddings of different dimensions and other classifiers.

DATA WAS TAKEN FROM HERE
