
Question about Abstractor's FFN and Attention #219

Open
jp1924 opened this issue Apr 17, 2024 · 0 comments
jp1924 commented Apr 17, 2024

@LukeForeverYoung @MAGAer13
First of all, thanks for your great work.

I have a question regarding the Feed Forward Network (FFN) of the Abstractor and the forward method of MplugOwlVisualAbstractorAttention.

From issue #10, I learned that the Abstractor uses an FFN that applies Llama's SwiGLU.
However, mPLUG-Owl uses LayerNorm instead of Llama's RMSNorm.
Is there a reason for this change? Is LayerNorm used instead of RMSNorm because the Abstractor is a module for processing images?
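
For clarity, here is a minimal sketch (not the repository code) of the two pre-norm choices I am asking about. The dimensions and the pre-norm + SwiGLU pairing are my assumptions for illustration only.

# Sketch: Llama-style RMSNorm vs LayerNorm in front of a SwiGLU FFN.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Llama-style RMSNorm: scale by the RMS only, no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLUFFN(nn.Module):
    """SwiGLU FFN: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

dim, hidden = 1024, 2816          # assumed sizes, for illustration
x = torch.randn(2, 64, dim)

# Llama pairing: RMSNorm before the SwiGLU FFN
llama_style = nn.Sequential(RMSNorm(dim), SwiGLUFFN(dim, hidden))

# Abstractor pairing as I read it: LayerNorm before the SwiGLU FFN
abstractor_style = nn.Sequential(nn.LayerNorm(dim), SwiGLUFFN(dim, hidden))

print(llama_style(x).shape, abstractor_style(x).shape)  # both (2, 64, 1024)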

Also, as far as I know, MplugOwlVisualAbstractorAttention is designed based on the Q-Former from BLIP-2.

# HACK we apply norm on q and k
hidden_states = self.norm1(hidden_states)
encoder_hidden_states = self.normk(encoder_hidden_states)

However, the snippet above, from the forward method of MplugOwlVisualAbstractorAttention, does not exist in the Q-Former. Was there a problem in the implementation that required this addition?
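
To make the comparison concrete, here is a minimal sketch of the difference as I understand it, using nn.MultiheadAttention as a stand-in for the actual attention module. The names norm1 and normk follow the snippet above; the shapes and everything else are assumptions.

# Sketch: cross-attention with and without the extra norms on q and k.
import torch
import torch.nn as nn

dim, heads = 1024, 16
norm1 = nn.LayerNorm(dim)   # applied to the query tokens (q)
normk = nn.LayerNorm(dim)   # applied to the visual encoder states (k/v)
attn = nn.MultiheadAttention(dim, heads, batch_first=True)

queries = torch.randn(2, 64, dim)                  # learnable query tokens
encoder_hidden_states = torch.randn(2, 257, dim)   # ViT patch features

# With the extra norms, as in the quoted "HACK":
q = norm1(queries)
kv = normk(encoder_hidden_states)
out_with_norm, _ = attn(q, kv, kv)

# Without them, closer to the original BLIP-2 Q-Former cross-attention:
out_without_norm, _ = attn(queries, encoder_hidden_states, encoder_hidden_states)

print(out_with_norm.shape, out_without_norm.shape)  # both (2, 64, 1024)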
