
bart-large-mnli multi_class does not agree with Python version #101

Open
jimidle opened this issue Jul 6, 2021 · 7 comments

Comments

jimidle commented Jul 6, 2021

If you convert facebook/bart-large-mnli, use it to evaluate the demo text at Hugging Face, and compare against a local Python setup for verification, we find that:

  • the online demo card and the local Python setup agree on the label scores
  • the label probabilities given back by the Go version are vastly different
  • the Python version takes roughly 16 seconds on my local machine, but the Spago version takes 37 seconds; this is a Mac and there is no GPU available

Python code is:

    from transformers import pipeline

    text = "A new model offers an explanation for how the Galilean satellites formed around the solar system’s " \
           "largest world. Konstantin Batygin did not set out to solve one of the solar system’s most puzzling " \
           "mysteries when he went for a run up a hill in Nice, France. Dr. Batygin, a Caltech researcher, " \
           "best known for his contributions to the search for the solar system’s missing “Planet Nine,” spotted a " \
           "beer bottle. At a steep, 20 degree grade, he wondered why it wasn’t rolling down the hill. He realized " \
           "there was a breeze at his back holding the bottle in place. Then he had a thought that would only pop " \
           "into the mind of a theoretical astrophysicist: “Oh! This is how Europa formed.” Europa is one of " \
           "Jupiter’s four large Galilean moons. And in a paper published Monday in the Astrophysical Journal, " \
           "Dr. Batygin and a co-author, Alessandro Morbidelli, a planetary scientist at the Côte d’Azur Observatory " \
           "in France, present a theory explaining how some moons form around gas giants like Jupiter and Saturn, " \
           "suggesting that millimeter-sized grains of hail produced during the solar system’s formation became " \
           "trapped around these massive worlds, taking shape one at a time into the potentially habitable moons we " \
           "know today. "
    cc = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    labels = ['space & cosmos', 'scientific discovery', 'microbiology', 'robots', 'archeology']
    r = cc(text, labels, multi_class=True)  # multi_class is the older name for multi_label

Go code, with the same text and classes, is:

    bartModel, err := zsc.LoadModel(bartDir)
    if err != nil {
        log.Fatal(err) // handle the load error
    }
    result, err := bartModel.Classify(c.InputText, "", classes, true)

Similarly, using the model valhalla/distilbart-mnli-12-3 also gives wildly different results from the online Hugging Face demo, using the same text and label set as above.

So, is there something else I need to do, or is the zsc code not working? My Go code is essentially just like the zsc demo code.

@abishekmuthian

I didn't read this issue properly and might have opened a duplicate - #103 .

Have you found any further information regarding this since you've opened the issue?

@matteo-grella
Member

Hello both of you, @jimidle and @abishekmuthian. I apologize for taking so long to respond!

I somehow missed the GitHub notifications, and @marco-nicola was sure I had already dealt with your request.

In fact, this is not desired behavior. Here we probably have either a bug or a serious loss of accuracy during the various steps of the forward pass.

We need to investigate, probably by comparing intermediate values against the Python implementation. Would either of you be able to assist?

@abishekmuthian

Thanks for getting back, @matteo-grella. GitHub seems to remove notifications from the site once it sends the email, so the emails might not be reaching your inbox.

It does indeed seem to be a serious bug. I can test the project if there are any new patches, but I'm afraid I cannot help with active debugging immediately due to time constraints.

@matteo-grella
Member

Hey @abishekmuthian,

It was a simple True flag that was supposed to be False, and it confused us both!

With spaGO, the output was being forced into a probability distribution (the scores must sum to 1), whereas with Python's multi-label mode each label is scored independently, so the output is unconstrained.
In fact, the scores you reported here do not sum to 1.
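To make the difference concrete, here is a minimal NumPy sketch (the logits are made-up illustrative values, not from either model) of the two scoring modes used by NLI-based zero-shot classification: single-label mode applies one softmax across all labels' entailment logits, while multi-label mode scores each label independently from its own contradiction/entailment pair.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical NLI logits for 3 candidate labels.
# Columns: [contradiction, entailment] (neutral omitted for brevity).
logits = np.array([
    [-2.0,  3.0],
    [ 1.0,  2.5],
    [ 2.0, -1.0],
])

# Single-label mode (multiClass:true in spaGO, multi_label=False in Python):
# one softmax across all labels' entailment logits -> scores sum to 1.
single_label_scores = softmax(logits[:, 1])

# Multi-label mode (multi_label=True in Python): each label gets its own
# softmax over [contradiction, entailment] -> scores need not sum to 1.
multi_label_scores = softmax(logits, axis=1)[:, 1]
```

This is why the Python scores reported above can exceed a total of 1 while the spaGO scores cannot: the two modes answer different questions, and comparing them directly will always show "vastly different" probabilities.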

Let me know

Matteo

@abishekmuthian

I see. Glad the issue was simple to identify and fix. I will run some more tests this weekend and get back.

abishekmuthian commented Feb 13, 2022

@matteo-grella

Got back to testing spaGO.

If I understand your last comment correctly, the multiClass flag in spaGO is the negation of Python's multi_label flag, i.e. it needs to be false to enable multi-label behavior.

Setting it to false does make the confidence values sum to ~1, but the results still don't match Python's (#103).

Model: spago/valhalla/distilbart-mnli-12-3

Text: "PalmOS on Raspberry Pi"

Spago multiClass:false

0. tech [0.86]
1. startup [0.05]
2. legal [0.05]
3. business [0.03]

Spago multiClass:true

0. tech [0.89]
1. startup [0.02]
2. legal [0.01]
3. business [0.00]

Python multi_label:true

0. tech [0.99]
1. legal [0.77]
2. startup [0.05]
3. business [0.00]

@matteo-grella
Member

@marco-nicola
