How certain are we that cleanlab can find errors on the dataset? #696

Heya @MocktaiLEngineer, great question.

Obvious answer

If your model's performance on a perfect version of your dataset (no outliers, no label issues, etc.) were only 50%, you shouldn't expect cleanlab to boost you beyond that.

Rule of thumb answer

How accurately cleanlab finds issues depends on both the accuracy of your model and the amount of error in your dataset. For example, if a model's error rate on your dataset is 30% (70% accuracy) and your dataset contains 20% errors, you might expect the accuracy of the errors/issues found by cleanlab to be something like 100% - (30% + 20%) = 50%. This has some minimal theoretical justification in the theory section of this paper, but is largely an empirical …
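To make the arithmetic in that rule of thumb concrete, here is a minimal sketch in Python. The function name `rule_of_thumb_issue_accuracy` is hypothetical (it is not part of cleanlab), and clamping the result at zero is my own assumption for the case where the two rates sum to more than 100%. Treat the output as a rough guess, not a guarantee.

```python
def rule_of_thumb_issue_accuracy(model_error_rate: float,
                                 dataset_error_rate: float) -> float:
    """Rough heuristic: 1 - (model error rate + dataset error rate).

    Both arguments are fractions in [0, 1]. This helper is illustrative
    only and is NOT a cleanlab API.
    """
    for name, rate in (("model_error_rate", model_error_rate),
                       ("dataset_error_rate", dataset_error_rate)):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {rate}")

    # Assumption: clamp at 0 so the estimate never goes negative when
    # the two error rates add up to more than 100%.
    return max(0.0, 1.0 - (model_error_rate + dataset_error_rate))


# The example from the answer: 30% model error, 20% dataset error.
estimate = rule_of_thumb_issue_accuracy(0.30, 0.20)
print(f"{estimate:.0%}")  # prints "50%"
```

Plugging in your own model's held-out error rate and a rough guess of the noise level in your labels gives a ballpark for how much of what cleanlab flags you might expect to be genuine.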

Answer selected by cgnorthcutt