yurayli/kaggle-redhat

Code for Red Hat Kaggle competition

Competition page: https://www.kaggle.com/c/predicting-red-hat-business-value

features: almost all features are categorical, and some have high cardinality; categorical features are one-hot encoded; 'char_10' is discarded from the training data; the cross-validation set is split by 'people_id'
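Splitting the cross-validation set by 'people_id' presumably keeps all rows of one person on the same side of each fold, so the validation score is not inflated by rows from people already seen in training. A minimal sketch of such a group-wise split using only the standard library follows; the function name `group_kfold_indices` and the toy IDs are hypothetical and not taken from the repository.

```python
import random


def group_kfold_indices(groups, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs where no group spans both sides."""
    unique_groups = sorted(set(groups))
    random.Random(seed).shuffle(unique_groups)
    # Assign each distinct group to a fold in round-robin order.
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        train_idx, valid_idx = [], []
        for row, g in enumerate(groups):
            (valid_idx if fold_of_group[g] == fold else train_idx).append(row)
        yield train_idx, valid_idx


# Hypothetical toy column of people_id values, one per activity row.
people_id = ["ppl_1", "ppl_1", "ppl_2", "ppl_3", "ppl_3", "ppl_3", "ppl_4"]
splits = list(group_kfold_indices(people_id, n_splits=2))
```

Libraries such as scikit-learn provide an equivalent splitter (`GroupKFold`); whether the repository uses it is not stated here.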

xgboost gbtree

model: performs better without one-hot encoding.
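The note above does not say which encoding replaced one-hot for the tree model. One common alternative, shown here purely as an assumption, is to map each category to an integer code and let the trees split on that code. The helper `label_encode` and the sample values are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    code_of = {v: i for i, v in enumerate(sorted(set(values)))}
    return [code_of[v] for v in values], code_of


# Hypothetical values for a single categorical column.
char_col = ["type 3", "type 1", "type 3", "type 2"]
codes, code_of = label_encode(char_col)
```

Each column stays a single feature this way, which keeps the input narrow even when a column has many distinct levels.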

Performance:

| | CV score | Public | Private |
|---|---|---|---|
| No leak | 0.947468 | 0.953907 | 0.953896 |
| With leak | N/A | 0.990610 | 0.990595 |

xgboost gblinear

model: trained on sparse data.
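One-hot encoding high-cardinality columns yields rows that are almost entirely zeros, so only the positions of the non-zero entries need to be stored. The sketch below illustrates that idea with plain Python lists; it is not the repository's code, and `one_hot_sparse`, the column name 'char_1', and the sample values are assumptions for illustration. A real pipeline would more likely build a SciPy sparse matrix.

```python
def one_hot_sparse(rows, columns):
    """Encode categorical rows as sorted lists of active column indices.

    Each (column, value) pair seen in the data gets one output index, so a
    row with k categorical columns has exactly k non-zero entries.
    """
    index_of = {}
    encoded = []
    for row in rows:
        active = []
        for col in columns:
            key = (col, row[col])
            if key not in index_of:
                index_of[key] = len(index_of)
            active.append(index_of[key])
        encoded.append(sorted(active))
    return encoded, index_of


# Hypothetical toy rows; the values are made up.
rows = [
    {"char_1": "type 2", "group_1": "group 17304"},
    {"char_1": "type 2", "group_1": "group 8688"},
    {"char_1": "type 5", "group_1": "group 17304"},
]
encoded, index_of = one_hot_sparse(rows, ["char_1", "group_1"])
```

Here the three rows share four one-hot columns in total, and each row stores only two indices instead of a dense vector of length four.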

Performance:

| | CV score | Public | Private |
|---|---|---|---|
| No leak | 0.979611 | 0.980765 | 0.980584 |
| With leak | N/A | 0.990158 | 0.990171 |

neural net

model: embedding layer on 'group_1', followed by batch normalization and dropout
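An embedding layer replaces a one-hot vector with a lookup into a table of small dense vectors, one per category, which are learned during training. The sketch below shows only the lookup step in plain Python. It omits training, batch normalization, and dropout, and it makes no claim about which framework the repository uses. The class name `Embedding`, the dimension, and the sample values are hypothetical.

```python
import random


class Embedding:
    """Lookup table mapping each category to a dense vector."""

    def __init__(self, categories, dim, seed=0):
        rng = random.Random(seed)
        self.index_of = {c: i for i, c in enumerate(sorted(set(categories)))}
        # Small random initial weights; a deep learning framework would
        # update these by backpropagation during training.
        self.weights = [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
                        for _ in self.index_of]

    def __call__(self, category):
        return self.weights[self.index_of[category]]


# Hypothetical 'group_1' values.
group_1 = ["group 17304", "group 8688", "group 17304"]
embed = Embedding(group_1, dim=4)
vectors = [embed(g) for g in group_1]
```

Rows with the same 'group_1' value receive the same vector, so the table size grows with the number of distinct categories rather than with the number of rows.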

Performance:

| | CV score | Public | Private |
|---|---|---|---|
| No leak | 0.985189 | 0.988611 | 0.988523 |
| With leak | N/A | 0.990979 | 0.990986 |

ensembling

average of 6 neural nets (nn) + gblinear (gbl) + 3 gbtree (gbt) models
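Averaging here presumably means taking the mean of the predicted probabilities from each model, row by row. A minimal sketch of an unweighted average follows; whether the repository weights the models equally is an assumption, and the function name and sample predictions are hypothetical.

```python
def average_predictions(prediction_sets):
    """Return the element-wise mean of several equal-length prediction lists."""
    n_models = len(prediction_sets)
    n_rows = len(prediction_sets[0])
    if any(len(p) != n_rows for p in prediction_sets):
        raise ValueError("all models must predict the same rows")
    return [sum(p[i] for p in prediction_sets) / n_models for i in range(n_rows)]


# Hypothetical probabilities from three models on the same four test rows.
nn_pred = [0.90, 0.20, 0.60, 0.10]
gbl_pred = [0.80, 0.30, 0.50, 0.20]
gbt_pred = [0.70, 0.10, 0.70, 0.30]
blend = average_predictions([nn_pred, gbl_pred, gbt_pred])
```

Passing a model's predictions more than once, or using several trained copies of one model type, gives that type proportionally more weight in the mean.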

Best performance: (~22% of leaderboard)

| | Public | Private |
|---|---|---|
| No leak | 0.987725 | 0.987664 |
| With leak | 0.991087 | 0.991075 |