CleanLab with Pseudo Labels #84
-
Hi @filippoBUO, apologies for the late response. I wanted to check whether you're still interested in this topic! Your pseudolabeling use case sounds very interesting, and cleanlab's label-error identification algorithms definitely share ideas with pseudolabeling. If possible, could you share a minimal code example to help make the discussion more precise? In particular, I'm not sure exactly how you are artificially injecting noise during the pseudolabeling process. If you inject the noise at the end, by randomly flipping some pseudolabels to other, incorrect labels, I would expect cleanlab to discover these noisy examples easily.
-
Hi!
I'd like to try this library in a self-learning scenario. At the moment I have a very large dataset with very few labeled samples: say 10% of the samples are labeled and 90% are unlabeled. The goal of self-learning is to label the entire unlabeled part iteratively. A chosen classifier is trained on the initial 10% of labeled data and predicts labels for the unlabeled part. The K most confident predictions are added to the labeled training set, and the process repeats until all samples have been labeled. At the end I have a vector of pseudo labels that is noisy.
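For concreteness, the loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the nearest-centroid classifier, the margin-based confidence score, the synthetic two-blob data, and all function names (`fit_centroids`, `predict_with_confidence`, `self_train`) are hypothetical stand-ins, not the classifier or data from the actual experiment.

```python
import random

def fit_centroids(X, y):
    """Toy classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_with_confidence(centroids, x):
    """Return (label, confidence), where confidence is the gap between
    the squared distances to the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, mu)), c) for c, mu in centroids.items()
    )
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin

def self_train(X_lab, y_lab, X_unlab, k):
    """Move the k most confident predictions into the training set each
    round, until every unlabeled sample has a pseudo label."""
    X_train, y_train = list(X_lab), list(y_lab)
    remaining = list(range(len(X_unlab)))
    pseudo = [None] * len(X_unlab)
    while remaining:
        centroids = fit_centroids(X_train, y_train)
        scored = [(predict_with_confidence(centroids, X_unlab[i]), i) for i in remaining]
        scored.sort(key=lambda t: t[0][1], reverse=True)  # most confident first
        chosen = set()
        for (label, _), i in scored[:k]:
            pseudo[i] = label
            X_train.append(X_unlab[i])
            y_train.append(label)
            chosen.add(i)
        remaining = [i for i in remaining if i not in chosen]
    return pseudo

# Synthetic data (assumption): two overlapping 2-D Gaussian blobs.
rng = random.Random(0)

def blob(center, n):
    return [[rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)] for _ in range(n)]

X = blob((0.0, 0.0), 100) + blob((3.0, 3.0), 100)
y_true = [0] * 100 + [1] * 100
order = list(range(200))
rng.shuffle(order)
labeled, unlabeled = order[:20], order[20:]  # 10% labeled, 90% unlabeled

pseudo = self_train(
    [X[i] for i in labeled], [y_true[i] for i in labeled],
    [X[i] for i in unlabeled], k=30,
)
n_wrong = sum(p != y_true[i] for p, i in zip(pseudo, unlabeled))
print(f"{n_wrong} of {len(pseudo)} pseudo labels disagree with the true labels")
```

Note that every wrong pseudo label here is a mistake made by the classifier itself, so the noise depends on where each sample lies in feature space.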
My idea was to build a self-learning algorithm using one of the two options:
What I found is that in this setting CleanLab does not seem to work as well as it does when I generate a transition matrix and inject the noise artificially.
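As a point of comparison, injecting noise from a transition matrix might look like the sketch below. The matrix values, the three-class setup, and the helper name `inject_noise` are illustrative assumptions, not the ones used in the experiment. Here `transition[i][j]` is taken to be the probability that a sample whose true class is `i` ends up with observed label `j`.

```python
import random

def inject_noise(labels, transition, seed=0):
    """Resample each label from the row of `transition` for its true class.

    transition[i][j] = assumed probability that true class i is observed as j.
    """
    rng = random.Random(seed)
    classes = list(range(len(transition)))
    for row in transition:
        # Each row must be a probability distribution over observed labels.
        assert abs(sum(row) - 1.0) < 1e-9
    return [rng.choices(classes, weights=transition[y])[0] for y in labels]

# Hypothetical 3-class transition matrix (rows: true class, columns: noisy label).
transition = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.0, 0.3, 0.7],
]

clean = [i % 3 for i in range(3000)]
noisy = inject_noise(clean, transition)
flipped = [i for i, (c, n) in enumerate(zip(clean, noisy)) if c != n]
print(f"{len(flipped) / len(clean):.1%} of labels were flipped")
```

Noise generated this way depends only on the true class of a sample, never on its features.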
Even in the extreme case where I label all the unlabeled samples at once in a single iteration and then stop, CleanLab is not able to find the noise.
Am I missing something?
Thank you a lot and congratulations for this project!