recent version of Transformers seems to mess with forward/__call__. Breaks patching loss function #30753
Comments
Hey! Thanks for opening the issue.

Hi @ArthurZucker, I just tried again on 4.40.1 and 4.40.2 with fresh envs/installs and still see the error.

Just to make sure, I also installed from source (4.41.0.dev0) and still get the error.

I am unable to reproduce. Can you share a google colab with this? 🤗

Seems like it is specific to using
Yep, I was running on a single device. |
System Info
I updated to a recent version of transformers for various models/bugs and believe something in transformers is breaking the ability to patch/wrap a forward that takes in `labels`. I am completely at a loss as to where it could be happening, but it seems to affect many different models that use `transformers.modeling_utils.PreTrainedModel`, `transformers.modeling_utils.ModuleUtilsMixin`, or `transformers.integrations.peft.PeftAdapterMixin`. A similar but dumber example without transformers that I tried does not have this issue.

This code is a weird (in the sense that I am not sure this is the best way to do something like this) and simplified reproduction, but the general idea is: if I pass in labels, I do not want to pass the labels through to the forward of the original model (for instance, if you want to pass kwargs for weights/reduction/etc. to CrossEntropyLoss). If I instead pass labels to forward and just overwrite the `outputs.loss` value, it works, but there are various reasons you may not want to do this (it computes the loss twice, the labels may be intended for a different loss function that won't work with the original model's loss, etc.).

Including a minimal reproduction below; I didn't see any other recent issues reporting something similar. I am also really hoping this isn't user error, but I believe something like this worked fine with a transformers version from a few months ago.
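The transformers-free analogue mentioned above might look like the following sketch. `ToyModel`, `Output`, and `patch_loss` are hypothetical stand-ins (no torch or transformers dependency); the point is only the pattern: wrap the instance's `forward`, keep `labels` out of the original call, and compute the loss with a custom function.

```python
# Toy sketch of the patching pattern described above.
# ToyModel / Output / patch_loss are made-up names, not transformers APIs.

class Output:
    def __init__(self, loss, logits):
        self.loss = loss
        self.logits = logits

class ToyModel:
    def forward(self, inputs, labels=None):
        logits = [x * 2 for x in inputs]
        loss = None
        if labels is not None:
            # the model's built-in "loss": mean absolute error
            loss = sum(abs(l, ) if False else abs(l - y) for l, y in zip(logits, labels)) / len(labels)
        return Output(loss, logits)

    def __call__(self, *args, **kwargs):
        # Dispatch through the *instance* attribute, so an instance-level
        # patch of `forward` takes effect.
        return self.forward(*args, **kwargs)

def patch_loss(model, loss_fn):
    original_forward = model.forward
    def wrapped(inputs, labels=None, **kwargs):
        # Do NOT forward the labels; compute the loss ourselves instead.
        outputs = original_forward(inputs, **kwargs)
        if labels is not None:
            outputs.loss = loss_fn(outputs.logits, labels)
        return outputs
    model.forward = wrapped  # instance-level monkey patch
    return model

# custom loss: sum of squared errors instead of the built-in one
model = patch_loss(
    ToyModel(),
    lambda logits, labels: sum((l - y) ** 2 for l, y in zip(logits, labels)),
)
out = model([1.0, 2.0], labels=[2.0, 5.0])
print(out.loss)  # 1.0: logits are [2.0, 4.0], so (2-2)**2 + (4-5)**2 = 1.0
```

In this toy version the patch works because `__call__` resolves `self.forward` through the instance; the report above suggests the real models no longer behave this way.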
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
Running this, you would expect the loss to be a single tensor value.
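One plausible mechanism for the symptom in the title (an assumption on my part, not confirmed against the transformers source): if `__call__` looks up `forward` on the class rather than on the instance, an instance-level patch is silently bypassed. A minimal illustration, with hypothetical `InstanceDispatch`/`ClassDispatch` classes:

```python
# How the lookup path inside __call__ decides whether an instance-level
# forward patch takes effect. Both classes are illustrative only.

class InstanceDispatch:
    def forward(self, x):
        return x + 1
    def __call__(self, x):
        return self.forward(x)  # instance attribute wins if patched

class ClassDispatch:
    def forward(self, x):
        return x + 1
    def __call__(self, x):
        # looks up forward on the class, so an instance patch is ignored
        return type(self).forward(self, x)

def patch(obj):
    obj.forward = lambda x: -x  # instance-level monkey patch

a, b = InstanceDispatch(), ClassDispatch()
patch(a)
patch(b)
print(a(3))  # -3: the patch is used
print(b(3))  # 4: the patch is silently bypassed
```

If recent versions route `__call__` through something like the second form (or through a wrapper captured before patching), that would explain why overwriting `model.forward` stopped having any effect.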
Expected behavior

The `outputs.loss` of the model should be similar to the original `outputs.loss`, which is a single tensor value.