New feature: extend package to gold (verified) labels #1004

jwmueller · 2024-02-13T02:59:25Z

We want to extend some of the core methods in this package, e.g.:

cleanlab.filter.find_label_issues

cleanlab.multiannotator.get_label_quality_multiannotator

to be more useful in settings where there are some gold labels available.

The gold labels have already been verified as correct (say, by an expert) and can be specified by the user via an optional verified_labels argument. This could be, say, a sparse array that only has entries at indices i corresponding to datapoints whose ground-truth label has been verified, with that label stored as verified_labels[i].

Most of the time, users will probably use verified_labels only to confirm which given labels are correct. Occasionally they may also use this argument to indicate which given labels are wrong, by specifying the correct label for those datapoints. For datapoints i that are verified as mislabeled but for which no correct label exists, we could allow verified_labels[i] to be, say, a missing value.
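As a rough illustration of the cases above, here is a minimal sketch of one possible representation. It is not an existing cleanlab API: the helper name verified_error_mask is hypothetical, and a plain dict (index to gold class, with None as the missing value) stands in for the sparse array.

```python
import numpy as np

def verified_error_mask(labels, verified_labels):
    """Hypothetical helper: split datapoints into verified / verified-as-error.

    verified_labels is assumed to be a dict {index: gold_class or None}:
      - index absent           -> label was never verified
      - gold == given label    -> verified correct
      - gold != given label    -> verified mislabeled, correction supplied
      - gold is None           -> verified mislabeled, no correct label exists
    """
    labels = np.asarray(labels)
    is_verified = np.zeros(len(labels), dtype=bool)
    is_error = np.zeros(len(labels), dtype=bool)
    for i, gold in verified_labels.items():
        is_verified[i] = True
        is_error[i] = gold is None or gold != labels[i]
    return is_verified, is_error

labels = np.array([0, 1, 2, 1, 0])
# Index 0 confirmed correct, index 2 corrected to class 1,
# index 3 verified wrong with no valid class.
verified_labels = {0: 0, 2: 1, 3: None}
is_verified, is_error = verified_error_mask(labels, verified_labels)
```

A dict keeps "unverified" and "verified but no correct label" distinct, which a dense array with a single sentinel value would not.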

What can be done with these gold labels?

We can tune the hyperparameters of all cleanlab arguments so that the returned label quality scores and issues align as closely as possible with the gold/verified information. This differs from the cleanlab argument hyperparameter tuning done in this example, which instead aims to maximize the predictive accuracy of an ML model.
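A minimal sketch of what such tuning could look like, under loud assumptions: flag_issues is a toy stand-in detector (not a cleanlab function) with a single threshold hyperparameter, is_verified and is_error are hypothetical boolean masks derived from the gold labels, and "agreement on the verified subset" is just one possible objective.

```python
import numpy as np

def flag_issues(labels, pred_probs, threshold):
    """Toy stand-in detector: flag a label when the model's predicted
    probability for the given class (self-confidence) is below threshold."""
    self_conf = pred_probs[np.arange(len(labels)), labels]
    return self_conf < threshold

def tune_threshold(labels, pred_probs, is_verified, is_error, candidates):
    """Pick the candidate whose flags agree most often with the gold
    information, evaluated only on the verified datapoints."""
    best, best_agreement = None, -1.0
    for t in candidates:
        flagged = flag_issues(labels, pred_probs, t)
        agreement = float((flagged[is_verified] == is_error[is_verified]).mean())
        if agreement > best_agreement:
            best, best_agreement = t, agreement
    return best, best_agreement

labels = np.array([0, 1, 1, 0, 1])
pred_probs = np.array([
    [0.90, 0.10],
    [0.70, 0.30],
    [0.20, 0.80],
    [0.45, 0.55],
    [0.40, 0.60],
])
# Hypothetical gold information: index 2 is unverified, index 1 is a known error.
is_verified = np.array([True, True, False, True, True])
is_error = np.array([False, True, False, False, False])

best, agreement = tune_threshold(
    labels, pred_probs, is_verified, is_error, candidates=[0.2, 0.4, 0.5, 0.7]
)
```

In a real extension the grid would presumably range over actual cleanlab arguments rather than this toy threshold.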

Here we are interested in maximizing label error detection performance with respect to the gold labels. For instance, when verified_labels only contains verifications of given labels that are correct, we can minimize the false positive rate of label error detection. If verified_labels contains verifications of some labels that are correct and some that are incorrect, then we can optimize most label error detection metrics of interest, such as AUROC, AUPRC, precision@k, etc.
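The two situations can be sketched as follows. This is only an illustration: the function names and the is_verified / is_error masks are hypothetical, and the scores are assumed to follow the convention that a lower label quality score means a more likely label error.

```python
import numpy as np

def false_positive_rate(flagged, is_verified, is_error):
    """Fraction of verified-correct labels that were flagged as issues.
    Usable even when only correct labels have been verified."""
    verified_correct = is_verified & ~is_error
    if verified_correct.sum() == 0:
        return float("nan")
    return float(flagged[verified_correct].mean())

def auroc(quality_scores, is_verified, is_error):
    """AUROC on the verified subset: probability that a verified error gets a
    lower quality score than a verified-correct label (ties count as 0.5).
    Needs both verified errors and verified-correct labels."""
    err = quality_scores[is_verified & is_error]
    ok = quality_scores[is_verified & ~is_error]
    if len(err) == 0 or len(ok) == 0:
        return float("nan")
    diff = ok[None, :] - err[:, None]  # all (error, correct) pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

quality_scores = np.array([0.9, 0.5, 0.8, 0.45, 0.6, 0.1])
is_verified = np.array([True, True, False, True, True, True])
is_error = np.array([False, True, False, False, False, True])
flagged = quality_scores < 0.5  # hypothetical decision rule

fpr = false_positive_rate(flagged, is_verified, is_error)
auc = auroc(quality_scores, is_verified, is_error)
```

Note that the false positive rate alone is trivially minimized by flagging nothing, so in practice it would need to be paired with some constraint, such as a minimum number of flagged issues.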


jwmueller added the needs triage label Feb 13, 2024

jwmueller added the help-wanted (We need your help to add this, but it may be more challenging than a "good first issue") label Apr 11, 2024
