Competition page: https://www.kaggle.com/c/predicting-red-hat-business-value
Features:
- Almost all features are categorical, and some have high cardinality.
- Categorical features are one-hot encoded.
- 'char_10' is discarded from the training data.
- The cross-validation set is split by 'people_id', so no person appears in both the training and the validation part.

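
The following is a minimal sketch of the preprocessing steps listed above. It assumes the data has been loaded as a list of dicts, one per activity row. The toy rows, the helper names `fit_one_hot`, `transform_one_hot` and `split_by_group`, and the 20% validation fraction are hypothetical. Apart from 'people_id', 'group_1' and 'char_10', the column names are only illustrative. Each row becomes the sorted list of its active one-hot column indices. The split keeps every 'people_id' entirely on one side.

```python
import random


def fit_one_hot(rows, columns):
    """Map every (column, value) pair seen in `rows` to a one-hot column index."""
    vocab = {}
    for row in rows:
        for col in columns:
            key = (col, row[col])
            if key not in vocab:
                vocab[key] = len(vocab)
    return vocab


def transform_one_hot(rows, columns, vocab):
    """Return, per row, the sorted indices of its active one-hot columns.
    Values not seen when fitting are silently dropped."""
    encoded = []
    for row in rows:
        idx = [vocab[(c, row[c])] for c in columns if (c, row[c]) in vocab]
        encoded.append(sorted(idx))
    return encoded


def split_by_group(rows, group_key, valid_fraction=0.2, seed=0):
    """Split row indices so that each group (e.g. one people_id) lands
    entirely in either the training part or the validation part."""
    groups = sorted({row[group_key] for row in rows})
    random.Random(seed).shuffle(groups)
    n_valid = max(1, int(len(groups) * valid_fraction))
    valid_groups = set(groups[:n_valid])
    train_idx = [i for i, r in enumerate(rows) if r[group_key] not in valid_groups]
    valid_idx = [i for i, r in enumerate(rows) if r[group_key] in valid_groups]
    return train_idx, valid_idx


# Hypothetical toy rows standing in for the competition data.
rows = [
    {"people_id": "ppl_1", "group_1": "group 17304", "char_1": "type 2", "char_10": "type 76"},
    {"people_id": "ppl_1", "group_1": "group 17304", "char_1": "type 5", "char_10": "type 1"},
    {"people_id": "ppl_2", "group_1": "group 8688", "char_1": "type 2", "char_10": "type 76"},
    {"people_id": "ppl_3", "group_1": "group 8688", "char_1": "type 3", "char_10": "type 9"},
]
# Drop the id column and 'char_10' before encoding.
columns = [c for c in rows[0] if c not in ("people_id", "char_10")]
vocab = fit_one_hot(rows, columns)
X = transform_one_hot(rows, columns, vocab)
train_idx, valid_idx = split_by_group(rows, "people_id")
```

Splitting by group rather than by row prevents the validation score from benefiting from activities of people already seen in training. Presumably that is the reason the split is made by 'people_id'.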
Model: performs better without one-hot encoding.
Performance:

| | CV score | Public | Private |
|---|---|---|---|
| No leak | 0.947468 | 0.953907 | 0.953896 |
| With leak | N/A | 0.990610 | 0.990595 |

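
The README does not say which model this is, or which encoding replaced one-hot for it. One common option, assumed in this sketch, is integer (label) encoding. It keeps one column per feature instead of one column per level, which matters for high-cardinality features. The helper names `fit_label_encoder` and `label_encode` and the reserved code `-1` for unseen values are hypothetical choices.

```python
def fit_label_encoder(values):
    """Assign an integer code to each distinct value, in order of first appearance."""
    codes = {}
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
    return codes


def label_encode(values, codes, unknown=-1):
    """Replace each value by its integer code; values unseen at fit time get `unknown`."""
    return [codes.get(v, unknown) for v in values]


train_col = ["type 2", "type 5", "type 2", "type 3"]  # hypothetical column values
test_col = ["type 5", "type 9"]
codes = fit_label_encoder(train_col)
train_codes = label_encode(train_col, codes)  # [0, 1, 0, 2]
test_codes = label_encode(test_col, codes)    # [1, -1]
```

The integer codes impose an arbitrary ordering on the categories. Tree-based learners can cope with this by splitting repeatedly, while linear models usually cannot, so the encoding choice tends to depend on the model family.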
Model: trained on sparse data.
Performance:

| | CV score | Public | Private |
|---|---|---|---|
| No leak | 0.979611 | 0.980765 | 0.980584 |
| With leak | N/A | 0.990158 | 0.990171 |

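
The README does not name the model that uses sparse data. This sketch assumes a logistic regression trained with plain SGD on one-hot rows stored in CSR form. The `data`, `indices` and `indptr` arrays follow the same layout as `scipy.sparse.csr_matrix`, but here they are written in pure Python so the mechanics are visible. The helper names, learning rate, epoch count and toy data are hypothetical. The key point is that each update touches only the weights of a row's non-zero columns.

```python
import math


def to_csr(index_lists):
    """Build CSR arrays (data, indices, indptr) for binary one-hot rows."""
    data, indices, indptr = [], [], [0]
    for idx in index_lists:
        indices.extend(idx)
        data.extend([1.0] * len(idx))
        indptr.append(len(indices))
    return data, indices, indptr


def sparse_row_dot(csr, row, weights):
    """Dot product of one CSR row with a dense weight vector."""
    data, indices, indptr = csr
    return sum(data[k] * weights[indices[k]] for k in range(indptr[row], indptr[row + 1]))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(csr, y, n_features, lr=0.1, epochs=50):
    """Plain SGD on log loss; each step updates only the non-zero columns of the row."""
    data, indices, indptr = csr
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for i in range(len(y)):
            p = sigmoid(sparse_row_dot(csr, i, w) + b)
            g = p - y[i]  # gradient of the log loss with respect to the logit
            for k in range(indptr[i], indptr[i + 1]):
                w[indices[k]] -= lr * g * data[k]
            b -= lr * g
    return w, b


X = [[0, 1], [0, 2], [1, 3], [3, 4]]  # hypothetical one-hot column indices per row
y = [1, 1, 0, 0]
csr = to_csr(X)
w, b = sgd_logistic(csr, y, n_features=5)
probs = [sigmoid(sparse_row_dot(csr, i, w) + b) for i in range(len(y))]
```

With millions of one-hot columns, this is what keeps the approach tractable: the work per row scales with the number of active categories, not with the total width of the matrix.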
Model: neural network with an embedding layer on 'group_1', plus Batch Normalization and Dropout.
Performance:

| | CV score | Public | Private |
|---|---|---|---|
| No leak | 0.985189 | 0.988611 | 0.988523 |
| With leak | N/A | 0.990979 | 0.990986 |

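
The README gives no architecture details, layer sizes or framework. This is only a hypothetical pure-Python forward pass through the three named pieces. First, an embedding lookup on label-encoded 'group_1' values. Second, training-mode batch normalization. Third, inverted dropout. In a framework such as Keras, these would correspond to the `Embedding`, `BatchNormalization` and `Dropout` layers. The table size, embedding width, dropout rate and batch below are made up.

```python
import math
import random


def embed(codes, table):
    """Embedding lookup: each integer code selects one row of `table`."""
    return [list(table[c]) for c in codes]


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode batch normalization: normalize each feature with the
    batch mean and variance, then scale by gamma and shift by beta."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        for i in range(n):
            out[i][j] = gamma[j] * (batch[i][j] - mean) / math.sqrt(var + eps) + beta[j]
    return out


def dropout(batch, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate` and scale the
    survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0.0:
        return [list(row) for row in batch]
    keep = 1.0 - rate
    return [[x / keep if rng.random() < keep else 0.0 for x in row] for row in batch]


rng = random.Random(0)
n_groups, emb_dim = 10, 4  # hypothetical sizes; the real 'group_1' has many more levels
table = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)] for _ in range(n_groups)]
group_codes = [3, 7, 3, 1]  # label-encoded 'group_1' values for a batch of 4 rows
h = embed(group_codes, table)
h = batch_norm(h, gamma=[1.0] * emb_dim, beta=[0.0] * emb_dim)
h = dropout(h, rate=0.5, rng=rng)
```

At prediction time, batch normalization would normally use running averages of the mean and variance collected during training, and dropout would be switched off (`training=False` here). The embedding gives the network a dense, learned vector per 'group_1' level instead of a very wide one-hot block.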
Ensemble: average of 6 nn + 1 gbl + 3 gbt models.
Best performance (roughly the top 22% of the leaderboard):

| | Public | Private |
|---|---|---|
| No leak | 0.987725 | 0.987664 |
| With leak | 0.991087 | 0.991075 |
