Is there any particular reason why the bias term is kept as False in the projection layers? #807

Open
Akshay1-6180 opened this issue Jan 30, 2024 · 2 comments


@Akshay1-6180

In the code in https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/hf_model.py, the bias in the projection head is set to False. I feel it shouldn't matter, and keeping it as True would make it better. Is there any particular reason this was kept as False?


        d_model = getattr(self.config, arch_dict[self.config.model_type]["config_names"]["width"])
        if (d_model == output_dim) and (proj_type is None):  # do we always need a proj?
            self.proj = nn.Identity()
        elif proj_type == 'linear':
            self.proj = nn.Linear(d_model, output_dim, bias=False)
        elif proj_type == 'mlp':
            hidden_size = (d_model + output_dim) // 2
            self.proj = nn.Sequential(
                nn.Linear(d_model, hidden_size, bias=False),
                nn.GELU(),
                nn.Linear(hidden_size, output_dim, bias=False),
            )
@rwightman
Collaborator

@Akshay1-6180 the original OpenAI CLIP model has no bias on the final vision and text tower projections, so this was done to stick closer to that... but there's no reason it wouldn't work, or it could possibly be better in some cases...

In the timm vision adapter there's a config value for the bias

if proj == 'linear':
    head_layers['drop'] = nn.Dropout(drop)
    head_layers['proj'] = nn.Linear(prev_chs, embed_dim, bias=proj_bias)
elif proj == 'mlp':
    head_layers['mlp'] = Mlp(prev_chs, 2 * embed_dim, embed_dim, drop=(drop, 0), bias=(True, proj_bias))

self.head = nn.Sequential(head_layers)
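
For reference, a minimal (untested) sketch of setting that from the config side, assuming CLIP / CLIPVisionCfg / CLIPTextCfg in model.py can be constructed directly with keyword args as below:

from open_clip.model import CLIP, CLIPVisionCfg, CLIPTextCfg

# Untested sketch: enable the projection bias on the timm vision tower and the
# native text tower via the existing config fields shown in this thread.
vision_cfg = CLIPVisionCfg(
    timm_model_name='vit_base_patch16_224',  # any timm backbone
    timm_proj='linear',
    timm_proj_bias=True,  # bias on the final vision projection
    image_size=224,
)
text_cfg = CLIPTextCfg(
    context_length=77,
    vocab_size=49408,
    width=512,
    heads=8,
    layers=12,
    proj_bias=True,  # bias on the text tower projection (non-HF path)
)
model = CLIP(embed_dim=512, vision_cfg=vision_cfg, text_cfg=text_cfg)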

An hf_proj_bias could be added alongside the existing fields in the config...

    timm_proj_bias: bool = False  # enable bias final projection
    timm_drop: float = 0.  # head dropout
    timm_drop_path: Optional[float] = None  # backbone stochastic depth


@dataclass
class CLIPTextCfg:
    context_length: int = 77
    vocab_size: int = 49408
    hf_tokenizer_name: Optional[str] = None
    tokenizer_kwargs: Optional[dict] = None
    width: int = 512
    heads: int = 8
    layers: int = 12
    mlp_ratio: float = 4.0
    ls_init_value: Optional[float] = None  # layer scale initial value
    embed_cls: bool = False
    pad_id: int = 0
    no_causal_mask: bool = False  # disable causal masking
    final_ln_after_pool: bool = False  # apply final LayerNorm after pooling
    pool_type: str = 'argmax'
    proj_bias: bool = False
    output_tokens: bool = False
    act_kwargs: dict = None
    norm_kwargs: dict = None

    # HuggingFace specific text tower config
    hf_model_name: Optional[str] = None
    hf_model_pretrained: bool = True
    hf_proj_type: str = 'mlp'
    hf_pooler_type: str = 'mean_pooler'  # attentional pooling for HF models
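
Wiring it through would roughly mean adding an hf_proj_bias field next to hf_proj_type and passing it down to where the projection is built in hf_model.py. As a rough illustration only (the build_text_proj helper and its proj_bias argument below are hypothetical, nothing in main exposes them yet):

import torch.nn as nn

def build_text_proj(d_model: int, output_dim: int, proj_type=None, proj_bias: bool = False) -> nn.Module:
    # Hypothetical helper mirroring the HFTextEncoder proj logic quoted at the
    # top of the issue, with the bias exposed instead of hard-coded to False.
    if d_model == output_dim and proj_type is None:
        return nn.Identity()
    if proj_type == 'linear':
        return nn.Linear(d_model, output_dim, bias=proj_bias)
    if proj_type == 'mlp':
        hidden_size = (d_model + output_dim) // 2
        return nn.Sequential(
            nn.Linear(d_model, hidden_size, bias=proj_bias),
            nn.GELU(),
            nn.Linear(hidden_size, output_dim, bias=proj_bias),
        )
    raise ValueError(f'Unknown proj_type: {proj_type}')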

@Akshay1-6180
Author

Akshay1-6180 commented Feb 4, 2024

@rwightman Thanks for the detailed reply. I guess the OpenAI team, through empirical analysis, might have seen that there is no difference in adding a bias or making the MLP layer denser, and gone for the simplest option for interpretability (Occam's razor).
