Full stack machine learning - a scikit model in action
An attempt to cluster the data using the t-SNE algorithm (n_components=3, perplexity=50, n_iter=300).
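A t-SNE run with those parameters might look roughly like the sketch below. It is not the original notebook code: the data comes from `make_classification` as a stand-in for the patient dataset, and the sample count, feature count, and `random_state` are assumptions. Newer scikit-learn releases call the iteration cap `max_iter` instead of `n_iter`, so the sketch picks whichever name the installed version accepts.

```python
import inspect

from sklearn.datasets import make_classification
from sklearn.manifold import TSNE

# Synthetic stand-in for the patient data (assumption: 200 rows, 10 features).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The iteration cap is named max_iter in newer scikit-learn and n_iter in older ones.
tsne_params = inspect.signature(TSNE.__init__).parameters
iter_kw = "max_iter" if "max_iter" in tsne_params else "n_iter"

# Same settings as in the text: 3 output dimensions, perplexity 50, 300 iterations.
tsne = TSNE(n_components=3, perplexity=50, random_state=0, **{iter_kw: 300})
embedding = tsne.fit_transform(X)

print(embedding.shape)
```

Each row of `embedding` is one sample projected into 3 dimensions, which can then be drawn as a 3D scatter plot colored by the label `y`.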
Recall is the better metric here. Recall (R) is the number of true positives (T_p) divided by the number of true positives plus the number of false negatives (F_n), that is R = T_p / (T_p + F_n). Every false negative lowers the score, which is exactly what we want.
Thanks to that, we will recognize sick patients (1), even if it sometimes means notifying a healthy patient (0) that they might be sick.
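To make the trade-off concrete, here is a small pure-Python sketch that computes recall and precision by hand. The labels are made up for illustration (1 = sick, 0 = healthy) and the helper names are hypothetical, not taken from the project.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives for class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def recall(tp, fn):
    """R = T_p / (T_p + F_n); returns 0.0 when there are no positive samples."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """P = T_p / (T_p + F_p); returns 0.0 when nothing was predicted positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Made-up example: all 4 sick patients are caught, 2 healthy ones are flagged too.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 1, 0, 0]

tp, fp, fn = confusion_counts(y_true, y_pred)
r = recall(tp, fn)
p = precision(tp, fp)
print(f"recall={r:.3f} precision={p:.3f}")
```

In this toy case recall stays perfect because no sick patient is missed, while precision drops because two healthy patients were flagged. That is the price the text above accepts.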
In short:
- It's possible to build a model which predicts health with 0.965667 recall and a 0.973480 F1 score
- The best-performing machine learning algorithm is the Multi-layer Perceptron classifier
- Tuning its hyperparameters is possible, but it improves the F1 score by only 0.1%
- The model is robust, as shown by cross-validation and by running it on a bigger test sample
- The dataset needs some feature engineering, though
- We may play around more with dimensionality reduction by trying alternatives to PCA
- It's not obvious which score function to choose, although the harmonic mean of precision and recall (the F1 score) should be enough
- There may be a slight risk of over-fitting, but I'd need more of your data to verify this
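
As a rough illustration of the robustness check mentioned above, the sketch below cross-validates a Multi-layer Perceptron on recall and F1 (the harmonic mean of precision and recall). It is a minimal sketch under assumptions, not the original pipeline: the data is synthetic, and the scaler, `hidden_layer_sizes`, `max_iter`, the fold count, and `random_state` are illustrative choices, so the scores will not match the numbers reported here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the patient dataset (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Scaling the inputs first usually helps an MLP converge; the layer size is illustrative.
model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)

# 5-fold cross-validation, scoring each fold on recall and on F1.
scores = cross_validate(model, X, y, cv=5, scoring=["recall", "f1"])

mean_recall = scores["test_recall"].mean()
mean_f1 = scores["test_f1"].mean()
print(f"recall: {mean_recall:.3f}  F1: {mean_f1:.3f}")
```

A large spread between folds, or a big gap between these scores and the score on the training data, would be one hint of the over-fitting risk noted in the last bullet.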