Layers in Swin Transformer #323
Comments
Please reply if anyone knows. @zeliu98 @ancientmooner, please clarify my doubt. Regards
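The architecture has four stages, and each stage is built from Swin Transformer blocks that come in pairs: a W-MSA block followed by an SW-MSA block. In my understanding, the given layer counts are the number of blocks in each stage, so 2 layers means one W-MSA/SW-MSA pair and 6 layers means that pair repeated three times.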
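To make the layer counts concrete, here is a minimal sketch that expands a depth list into the sequence of blocks in each stage. The function name `swin_block_schedule` is hypothetical, and the rule "even-indexed blocks use regular windows, odd-indexed blocks shift by half a window" as well as the default window size of 7 reflect my reading of the reference implementation, so treat them as assumptions rather than the authors' exact code.

```python
def swin_block_schedule(depths=(2, 2, 6, 2), window_size=7):
    """Return, for each stage, a list of (attention_type, shift_size) per block."""
    schedule = []
    for depth in depths:
        stage = []
        for i in range(depth):
            # Assumed rule: even-indexed blocks use regular windows (shift 0),
            # odd-indexed blocks use windows shifted by half the window size.
            shift = 0 if i % 2 == 0 else window_size // 2
            stage.append(("W-MSA" if shift == 0 else "SW-MSA", shift))
        schedule.append(stage)
    return schedule


for stage_idx, stage in enumerate(swin_block_schedule(), start=1):
    print(f"stage {stage_idx}:", " -> ".join(name for name, _ in stage))
```

Under these assumptions, stage 1 is W-MSA -> SW-MSA, and stage 3 is the same pair repeated three times, each block taking the previous block's output as its input.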
Why is the number of block repetitions highest in the third stage? Other Swin variants use [2, 2, 18, 2]. Can this logic be generalised to other modalities?
To enlarge the receptive field, the window partition alternates between regular and shifted windows on every other block inside a stage. This way, information is gradually exchanged between tokens that the regular partition places in different windows. This is why the number of blocks in a stage is always divisible by 2.
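As a rough illustration of why alternating the partition lets neighbouring windows exchange information, here is a 1-D toy sketch. Swin works on 2-D feature maps and additionally masks attention between tokens that only become neighbours through the cyclic wrap-around. The helper `partition` and the sizes used are hypothetical and chosen only for readability.

```python
def partition(tokens, window_size, shift=0):
    """Cyclically shift a 1-D token list left by `shift`, then split into windows."""
    shifted = tokens[shift:] + tokens[:shift]
    return [shifted[i:i + window_size] for i in range(0, len(shifted), window_size)]


tokens = list(range(8))

# Regular partition: tokens 3 and 4 sit in different windows and cannot attend to each other.
regular = partition(tokens, window_size=4)

# Shifted partition (half a window): tokens 3 and 4 now share a window.
shifted = partition(tokens, window_size=4, shift=2)

print("regular:", regular)
print("shifted:", shifted)
```

Stacking a regular block and a shifted block therefore lets a token indirectly see tokens from the adjacent window, and repeating the pair widens that reach further.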
I have a question about layers in Swin Transformer. The Swin-T architecture is described as having 2, 2, 6, 2 layers at stages 1, 2, 3 and 4.
What does it mean to have 2 layers at the 1st stage and 6 layers at the 3rd stage?
Although there are 2 successive Swin Transformer blocks, I am confused by the term "layers".
Does it mean that at layer 1 the W-MSA block is executed and its output is given to the SW-MSA block? What happens next? What about layer 2: is the W-MSA block executed again on the output of the SW-MSA block?
@zeliu98 @ancientmooner Please help. Others can also give their views.
Thank you.