Apply Tanh activation function to ViT - MLP Head #255

joeycouse · 2023-02-21T17:08:01Z

In the paper:

"In order to stay as close as possible to the original Transformer model, we made use of an additional [class] token, which is taken as image representation. The output of this token is then transformed into a class prediction via a small multi-layer perceptron (MLP) with tanh as non-linearity in the single hidden layer."

vit-pytorch/vit_pytorch/vit.py

Lines 110 to 113 in 5699ed7

    
    self.mlp_head = nn.Sequential(
        nn.LayerNorm(dim),
        nn.Linear(dim, num_classes)
    )

Should there be a Tanh() function applied after the linear layer?
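
For concreteness, here is a minimal sketch of how I read the quoted passage: the tanh sits inside a single hidden layer, followed by a second linear layer that produces the logits, rather than being applied to the class outputs themselves. The class name `TanhMLPHead` and the `hidden_dim` parameter are my own assumptions for illustration and are not part of vit-pytorch; the leading `nn.LayerNorm` is kept only because the current head has one.

```python
import torch
from torch import nn


class TanhMLPHead(nn.Module):
    """Hypothetical head (not in vit-pytorch): LayerNorm -> Linear -> Tanh -> Linear."""

    def __init__(self, dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),                   # kept from the existing mlp_head
            nn.Linear(dim, hidden_dim),          # the single hidden layer (width is an assumption)
            nn.Tanh(),                           # tanh as the hidden-layer non-linearity
            nn.Linear(hidden_dim, num_classes),  # output logits, left unbounded
        )

    def forward(self, cls_token: torch.Tensor) -> torch.Tensor:
        # cls_token: (batch, dim) representation of the [class] token
        return self.net(cls_token)


head = TanhMLPHead(dim=1024, hidden_dim=1024, num_classes=10)
logits = head(torch.randn(2, 1024))
print(logits.shape)  # torch.Size([2, 10])
```

Placing `nn.Tanh()` after the final linear layer instead would squash the logits into [-1, 1] before the softmax, which does not seem to be what the paper describes. The paper also says the head is an MLP with one hidden layer at pre-training time and a single linear layer at fine-tuning time, so the current `mlp_head` may simply correspond to the fine-tuning variant.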
