
Question about Abstractor's FFN and Attention #219

Open
jp1924 opened this issue Apr 17, 2024 · 0 comments
jp1924 commented Apr 17, 2024

@LukeForeverYoung @MAGAer13
First of all, thanks for your great work.

I have a question regarding the Feed Forward Network (FFN) of the Abstractor and the forward method of MplugOwlVisualAbstractorAttention.

From issue #10, I learned that the Abstractor uses an FFN that applies Llama's SwiGLU.
However, mPLUG-Owl uses LayerNorm instead of Llama's RMSNorm.
Is there a reason for this change? Is LayerNorm used instead of RMSNorm because the Abstractor is a module for processing images?
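
For clarity, here is a minimal sketch (not the repository code) of the two pre-norm choices I am asking about. The dimensions and the pre-norm + SwiGLU pairing are my assumptions for illustration only.

# Sketch: Llama-style RMSNorm vs LayerNorm in front of a SwiGLU FFN.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Llama-style RMSNorm: scale by the RMS only, no mean subtraction, no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = x.pow(2).mean(-1, keepdim=True).add(self.eps).rsqrt()
        return self.weight * (x * rms)

class SwiGLUFFN(nn.Module):
    """SwiGLU FFN: down_proj(silu(gate_proj(x)) * up_proj(x))."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden, bias=False)
        self.up_proj = nn.Linear(dim, hidden, bias=False)
        self.down_proj = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

dim, hidden = 1024, 2816          # assumed sizes, for illustration
x = torch.randn(2, 64, dim)

# Llama pairing: RMSNorm before the SwiGLU FFN
llama_style = nn.Sequential(RMSNorm(dim), SwiGLUFFN(dim, hidden))

# Abstractor pairing as I read it: LayerNorm before the SwiGLU FFN
abstractor_style = nn.Sequential(nn.LayerNorm(dim), SwiGLUFFN(dim, hidden))

print(llama_style(x).shape, abstractor_style(x).shape)  # both (2, 64, 1024)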

Also, as far as I know, MplugOwlVisualAbstractorAttention is designed based on the Q-Former from BLIP-2.

# HACK we apply norm on q and k
hidden_states = self.norm1(hidden_states)
encoder_hidden_states = self.normk(encoder_hidden_states)

However, the snippet above, from the forward method of MplugOwlVisualAbstractorAttention, does not exist in the Q-Former. Was there a problem in the implementation that required this addition?
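
To make the comparison concrete, here is a minimal sketch of the difference as I understand it, using nn.MultiheadAttention as a stand-in for the actual attention module. The names norm1 and normk follow the snippet above; the shapes and everything else are assumptions.

# Sketch: cross-attention with and without the extra norms on q and k.
import torch
import torch.nn as nn

dim, heads = 1024, 16
norm1 = nn.LayerNorm(dim)   # applied to the query tokens (q)
normk = nn.LayerNorm(dim)   # applied to the visual encoder states (k/v)
attn = nn.MultiheadAttention(dim, heads, batch_first=True)

queries = torch.randn(2, 64, dim)                  # learnable query tokens
encoder_hidden_states = torch.randn(2, 257, dim)   # ViT patch features

# With the extra norms, as in the quoted "HACK":
q = norm1(queries)
kv = normk(encoder_hidden_states)
out_with_norm, _ = attn(q, kv, kv)

# Without them, closer to the original BLIP-2 Q-Former cross-attention:
out_without_norm, _ = attn(queries, encoder_hidden_states, encoder_hidden_states)

print(out_with_norm.shape, out_without_norm.shape)  # both (2, 64, 1024)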
