In the paper:
"In order to stay as close as possible to the original Transformer model, we made use of an additional
[class] token, which is taken as image representation. The output of this token is then transformed into a class prediction via a small multi-layer perceptron (MLP) with tanh as non-linearity
in the single hidden layer."
Relevant code: vit-pytorch/vit_pytorch/vit.py, lines 110 to 113 at commit 5699ed7
Should there be a Tanh() function applied after the linear layer?
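
For concreteness, here is a minimal sketch of a head that matches the quoted description: one hidden linear layer, a tanh non-linearity, then a linear projection to class logits. The class name `TanhMLPHead`, the `hidden_dim` parameter, and the sizes used are my own assumptions for illustration. They are not taken from the paper or from this repository, and this is not meant to reproduce the lines referenced above.

```python
import torch
from torch import nn


class TanhMLPHead(nn.Module):
    """Hypothetical classification head: one hidden layer with tanh."""

    def __init__(self, dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),          # the single hidden layer
            nn.Tanh(),                           # tanh non-linearity from the quoted passage
            nn.Linear(hidden_dim, num_classes),  # projection to class logits
        )

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        # cls_token: (batch, dim), the encoder output at the [class] token position
        return self.net(cls_token)


# Assumed sizes, chosen only for the example.
head = TanhMLPHead(dim=1024, hidden_dim=1024, num_classes=1000)
logits = head(torch.randn(2, 1024))
print(logits.shape)  # torch.Size([2, 1000])
```

Note that in this reading the tanh sits between two linear layers rather than after the final one, so the logits stay unbounded.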