How to get the most probable labels for datapoints that have label issues #405
If a datapoint is flagged as having a label issue, how can I get the most probable label for that datapoint?
Replies: 2 comments
Good question, @luo3300612! By default, find_label_issues returns a boolean mask rather than indices. You can use that mask to select the predicted probabilities of the examples with label issues, then take the argmax of each row:
from cleanlab.filter import find_label_issues
import numpy as np
labels = [0, 0, 1, 1, 2, 2]
pred_probs = np.array([
[0.9, 0.025, 0.075],
[0.75, 0.15, 0.1],
[0.1, 0.05, 0.85], # Predicted label is 2
[0.3, 0.5, 0.2],
[0.05, 0.9, 0.05], # Predicted label is 1
[0.1, 0.1, 0.8],
])
issues = find_label_issues(labels, pred_probs)
# array([False, False, True, False, True, False])
np.argmax(pred_probs[issues], axis=1)
# array([2, 1])
The same thing can be done with indices:
issue_indices = find_label_issues(
labels,
pred_probs,
return_indices_ranked_by="self_confidence"
)
# array([2, 4])
np.argmax(pred_probs[issue_indices], axis=1)
# array([2, 1])
Note, however, that the indices will be sorted by their associated label quality scores.
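If you want a full array of suggested labels rather than only the labels of the flagged examples, one option is to combine the mask with the original labels. The following is a minimal sketch using only NumPy: the mask is hard-coded from the output quoted above so the snippet runs without cleanlab installed, and the name `suggested_labels` is just an illustrative choice.

```python
import numpy as np

labels = np.array([0, 0, 1, 1, 2, 2])
pred_probs = np.array([
    [0.9, 0.025, 0.075],
    [0.75, 0.15, 0.1],
    [0.1, 0.05, 0.85],
    [0.3, 0.5, 0.2],
    [0.05, 0.9, 0.05],
    [0.1, 0.1, 0.8],
])

# Assumption: this mask is copied from the find_label_issues output shown
# in the thread; in practice it would come from find_label_issues itself.
issues = np.array([False, False, True, False, True, False])

# Most probable class for every example.
predicted = pred_probs.argmax(axis=1)

# Keep the given label where no issue was flagged, otherwise use the prediction.
suggested_labels = np.where(issues, predicted, labels)
print(suggested_labels)  # [0 0 2 1 1 2]

# Model confidence in the suggested label for the flagged examples only.
flagged_confidence = pred_probs.max(axis=1)[issues]
print(flagged_confidence)  # [0.85 0.9 ]
```

Whether to overwrite a label automatically is a judgment call; the confidence values can help decide which flagged examples deserve a manual look first.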
Hi @luo3300612
You can get the predictions more easily than in @elisno's answer by using
from cleanlab.classification import CleanLearning
Then just call
CleanLearning().find_label_issues(data, labels, pred_probs=pred_probs)  # pred_probs is passed by keyword
That will return a data frame with everything you need, including the predicted label for each example.
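As a rough sketch of how such a data frame could be used, the snippet below filters the flagged rows and reads off their predicted labels. The frame here is a hand-built stand-in so the example runs without cleanlab: the column names (`is_label_issue`, `label_quality`, `given_label`, `predicted_label`) are an assumption based on the cleanlab 2.x documentation, and the rows are illustrative values derived from the toy example earlier in this thread, not real cleanlab output.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by
# CleanLearning().find_label_issues(...). Column names are assumed;
# label_quality is set to the probability assigned to the given label.
label_issues = pd.DataFrame({
    "is_label_issue": [False, False, True, False, True, False],
    "label_quality": [0.9, 0.75, 0.05, 0.5, 0.05, 0.8],
    "given_label": [0, 0, 1, 1, 2, 2],
    "predicted_label": [0, 0, 2, 1, 1, 2],
})

# Keep only the rows flagged as label issues.
flagged = label_issues[label_issues["is_label_issue"]]

# Most probable label for each flagged example, indexed by row position.
most_probable = flagged["predicted_label"].to_numpy()
print(flagged[["given_label", "predicted_label"]])
print(most_probable)  # [2 1]
```

Sorting `flagged` by `label_quality` in ascending order would put the most suspicious examples first, assuming lower scores mean lower label quality.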
If your goal is to improve or correct the labels, we created a tool that does this automatically, called Cleanlab Studio (https://Cleanlab.ai/studio). It provides a nice interface for obtaining a much more accurately labeled dataset.