
bart-large-mnli multi_class does not agree with Python version #101

Open
jimidle opened this issue Jul 6, 2021 · 7 comments

Comments

jimidle commented Jul 6, 2021

If you convert facebook/bart-large-mnli, use it to evaluate the demo text at Hugging Face, and compare against a local Python setup for verification, we find that:

  • the online demo card and the local Python setup agree on the label scores
  • the label probabilities given back by the Go version are vastly different
  • the Python version takes roughly 16 seconds on my local machine, but the Spago version takes 37 seconds; this is a Mac and there is no GPU available

Python code is:

    from transformers import pipeline

    text = "A new model offers an explanation for how the Galilean satellites formed around the solar system’s " \
           "largest world. Konstantin Batygin did not set out to solve one of the solar system’s most puzzling " \
           "mysteries when he went for a run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, " \
           "best known for his contributions to the search for the solar system’s missing “Planet Nine,” spotted a " \
           "beer bottle. At a steep, 20 degree grade, he wondered why it wasn’t rolling down the hill. He realized " \
           "there was a breeze at his back holding the bottle in place. Then he had a thought that would only pop " \
           "into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.” Europa is one of " \
           "Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal, " \
           "Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory " \
           "in France, present a theory explaining how some moons form around gas giants like Jupiter and Saturn, " \
           "suggesting that millimeter-sized grains of hail produced during the solar system’s formation became " \
           "trapped around these massive worlds, taking shape one at a time into the potentially habitable moons we " \
           "know today. "
    cc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    labels = ['space & cosmos', 'scientific discovery', 'microbiology', 'robots', 'archeology']
    r = cc(text, labels, multi_class=True)  # multi_class is the older name for multi_label

Go code, with the same text and classes, is:

    bartModel, err := zsc.LoadModel(bartDir)
    if err != nil {
        log.Fatal(err) // handle the load error
    }
    result, err := bartModel.Classify(c.InputText, "", classes, true)

Similarly, using the model valhalla/distilbart-mnli-12-3 also gives wildly different results from the online Hugging Face demo, using the same text and label set as above.

So, is there something else I need to do, or is the zsc code not working? My Go code is essentially just like the zsc demo code.

@abishekmuthian

I didn't read this issue properly and might have opened a duplicate - #103 .

Have you found any further information regarding this since you've opened the issue?

@matteo-grella
Member

Hello both of you, @jimidle and @abishekmuthian. I apologize for taking so long to respond!

I somehow missed the GitHub notifications, and @marco-nicola was sure I had already dealt with your request.

In fact, this is not desired behavior. Here we probably have either a bug or a serious loss of accuracy during the various steps of the forward pass.

We need to investigate, probably by comparing intermediate values against the Python implementation. Would either of you be able to assist?

@abishekmuthian

Thanks for getting back, @matteo-grella. GitHub seems to remove notifications from the site once it sends the email, so the emails might not be reaching your inbox.

It does indeed seem to be a serious bug. I can test the project if there are any new patches, but I'm afraid I cannot help with active debugging immediately due to time constraints.

@matteo-grella
Member

Hey @abishekmuthian,

It was a simple True flag that was supposed to be False, and it confused us both!

With spaGO, the output was being forced into a probability distribution (the scores must sum to 1), whereas with Python's multi-label mode each label is scored independently, so the output is unconstrained.
In fact, the scores you reported here do not sum to 1.
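To make the difference concrete, here is a minimal NumPy sketch (the logits are made-up illustrative values, not from either model) of the two scoring modes used by NLI-based zero-shot classification: single-label mode applies one softmax across all labels' entailment logits, while multi-label mode scores each label independently from its own contradiction/entailment pair.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical NLI logits for 3 candidate labels.
# Columns: [contradiction, entailment] (neutral omitted for brevity).
logits = np.array([
    [-2.0,  3.0],
    [ 1.0,  2.5],
    [ 2.0, -1.0],
])

# Single-label mode (multiClass:true in spaGO, multi_label=False in Python):
# one softmax across all labels' entailment logits -> scores sum to 1.
single_label_scores = softmax(logits[:, 1])

# Multi-label mode (multi_label=True in Python): each label gets its own
# softmax over [contradiction, entailment] -> scores need not sum to 1.
multi_label_scores = softmax(logits, axis=1)[:, 1]
```

This is why the Python scores reported above can exceed a total of 1 while the spaGO scores cannot: the two modes answer different questions, and comparing them directly will always show "vastly different" probabilities.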

Let me know

Matteo

@abishekmuthian

I see. Glad the issue was simple to identify and fix. I will run some more tests this weekend and get back.

abishekmuthian commented Feb 13, 2022

@matteo-grella

Got back to testing spaGO.

If I understand your last comment correctly, the multiClass flag in spaGO is the negation of Python's multi_label flag, i.e. it needs to be false to enable multi-label behavior.

Setting it to false does make the confidence values sum to ~1, but the results still don't match Python's (#103).

Model: spago/valhalla/distilbart-mnli-12-3

Text: "PalmOS on Raspberry Pi"

Spago multiClass:false

0. tech [0.86]
1. startup [0.05]
2. legal [0.05]
3. business [0.03]

Spago multiClass:true

0. tech [0.89]
1. startup [0.02]
2. legal [0.01]
3. business [0.00]

Python multi_label:true

0. tech [0.99]
1. legal [0.77]
2. startup [0.05]
3. business [0.00]

@matteo-grella
Member

@marco-nicola
