All real/authentic audio files in the 'real' subfolder are classified as 'fake' with the pre-trained model #10

Open
irdance opened this issue Jun 15, 2020 · 7 comments

irdance commented Jun 15, 2020

Hi, when I ran the inference.py file on all the audio files in the 'real' subfolder, it misclassified them as 'fake'. I just wanted to check whether the pre-trained model is the correct one?

@headcrabz

Yeah, I got the same thing; the pre-trained model seems to be inaccurate.

@ranasac19878 (Collaborator)

Hi guys,
Thanks for pointing out this issue. Currently, the pretrained model only works for audio files that are in the distribution of the input data it was trained on. We deliberately provided hard, out-of-distribution audio files in the 'real' and 'fake' subfolders to show that this work is still in progress. It is very hard to train a generalized model that will work on arbitrary audio files out of the blue. It would be great if you could contribute some ideas.

Thanks,
Sachin

irdance (Author) commented Jul 6, 2020

Thanks @ranasac19878. Just for clarity, was the pre-trained model trained on the test dataset? The model does quite well on the test dataset, and the test set does contain 'out of distribution' audio files, since some of the fake audio files in it were generated by different deepfake audio models.

My hunch is that the variety of accents in the dataset (train + test) is limited, and the model therefore may not work well with different accents.

@ranasac19878
Copy link
Collaborator

@irdance, the model was not trained on the test dataset, but the test set was used as a validation set to tune the hyperparameters of the neural network. That is not technically correct, but it was otherwise very difficult to get a model to perform well on the test set, since the distribution of the test set differs from that of the validation set.

Going forward, we will be working to make the model more resilient using adversarial training and other data augmentation techniques.
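
To make "data augmentation" concrete, here is a rough sketch of the kind of perturbations we have in mind; librosa and the specific parameter values are only illustrative assumptions, not the project's actual code:

```python
# Illustrative augmentation sketch (not project code): broaden the training
# distribution by perturbing each waveform in a few simple ways.
import numpy as np
import librosa

def augment(y, sr):
    """Return a few perturbed copies of a waveform."""
    noisy = y + 0.005 * np.random.randn(len(y))                  # additive Gaussian noise
    stretched = librosa.effects.time_stretch(y, rate=1.1)        # mild tempo change
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)   # pitch up two semitones
    return [noisy, stretched, shifted]

y, sr = librosa.load("example.wav", sr=16000)                    # hypothetical input clip
variants = augment(y, sr)
```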

Yes, speech accent is definitely an indication of a distribution difference, but there may be other small differences in the distribution, like the number of pauses, the time between pauses, etc., that the model might have overfitted on given the training data.
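
For instance, the pause statistics I mean could be measured along these lines (again only a sketch; librosa and the top_db value are assumptions, not anything the model actually computes):

```python
# Rough illustration (not project code) of pause statistics a model could
# silently overfit on: count pauses and measure the gaps between voiced regions.
import librosa

y, sr = librosa.load("example.wav", sr=16000)    # hypothetical input clip
voiced = librosa.effects.split(y, top_db=30)     # (start, end) sample indices of non-silent regions

num_pauses = max(len(voiced) - 1, 0)
pause_durations = [(voiced[i + 1][0] - voiced[i][1]) / sr for i in range(num_pauses)]
print(num_pauses, pause_durations)
```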

Sachin

thaya-k commented Oct 26, 2020

Hi, I placed all my audio files (both natural and synthesized; 280 in total) in the path "/data/inference_data/unlabeled" and used the pre-trained model for classification. Since I am using the terminal (Ubuntu), I can't see the "print out with information on predictions of the model, the accuracy of the model on your provided data." However, the result shows likelihood values (correct me if I'm wrong) with the sentence "The probability of the clip being real is: 0.00%". How should I interpret the results?
P.S. I have attached the results in graph form with the likelihood values.

[Attachments: terminal screenshot and a graph of the likelihood values]

@ranasac19878 (Collaborator)

Hi Thaya,

Thanks for the info. Currently, the pretrained model works well only for data it was trained/validated on. If the data distribution changes, this model will always default its prediction to 'fake', since the original data had a 1:9 ratio of real to fake audio clips. We are working on training another model that will work for out-of-distribution audio clips in the coming months.

The likelihood value is the model's propensity score for a clip being real or not.
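
In other words (a purely illustrative sketch; the 0.5 threshold and the function name are assumptions, not the repo's code), the printed probability is turned into a label roughly like this, and on out-of-distribution clips the score collapses toward the training prior of roughly 10% real, so everything ends up labeled 'fake':

```python
# Illustrative only: how a "probability of the clip being real" maps to a label.
def label_from_score(p_real, threshold=0.5):
    """Convert the model's propensity score into a hard label."""
    return "real" if p_real >= threshold else "fake"

# With a 1:9 real-to-fake training ratio, an out-of-distribution clip tends to
# score near the prior (~0.1) or lower, so it falls below the threshold.
print(label_from_score(0.10))  # -> fake
print(label_from_score(0.00))  # -> fake (matches "The probability of the clip being real is: 0.00%")
```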

Thanks,
Sachin

yzslry commented Apr 10, 2022

The link to download the ASV data in this project seems to be invalid. Could you provide the data or an updated link in the project?
