
Precision-recall description improvement #18719

Open

daniel-yj-yang opened this issue Nov 1, 2020 · 3 comments · May be fixed by #28967

Comments

@daniel-yj-yang

Describe the issue linked to the documentation

https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html

It says, "high precision relates to a low false positive rate" and some a few places it links these two together, e.g., "false positives, decreasing precision."

Suggest a potential alternative/fix

"Precision = 1 - false discovery rate" and "Specificity = 1 - false positive rate"

Thus, the term "false discovery rate" should be emphasized, and "false positive rate" should be deemphasized when talking about high precision.

@NicolasHug
Member

I'm not sure what's so wrong with the current version, but feel free to open a PR so we can do a more tangible review @daniel-yj-yang

@kushwahvikram15

@NicolasHug I also agree with you. I didn't find any issue with the current version.

@jnothman
Member

I think you're technically right, @daniel-yj-yang, for those who have been trained to use terms like "false discovery rate". This is true of much of the medical community, but unfortunately not of much of the machine learning community. The problem here is that a technical term is inadvertently being used: an increase in false positives will indeed decrease precision if the number of true positives remains constant, and indeed the counts of "false positives" and "false negatives" are all that differ between the formulas for P & R.

The reason for the difference between FPR and FDR is that the denominator of FDR depends on the estimator, whereas the denominator of FPR depends only on the ground truth. It is an important difference, but one that might not be easily drawn out in the context of this example. In any case, an attempt to improve the wording so that it avoids misuse of jargon would be helpful.
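
A small numeric sketch of that distinction (all counts below are invented): with true positives held fixed, every extra false positive lowers precision (raises FDR), and FDR's denominator moves with the estimator's predictions while FPR's denominator is fixed by the ground truth.

```python
# Invented counts: true positives and actual negatives are held fixed.
tp = 50        # true positives (constant across the comparison)
n_neg = 1000   # actual negatives in the ground truth (constant)

for fp in (5, 25, 100):
    precision = tp / (tp + fp)
    fdr = fp / (tp + fp)   # denominator tp + fp is set by the estimator
    fpr = fp / n_neg       # denominator is set by the ground truth alone
    print(f"fp={fp:3d}  precision={precision:.3f}  FDR={fdr:.3f}  FPR={fpr:.3f}")
```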

@lucyleeow linked a pull request May 7, 2024 that will close this issue