Is there any particular reason why the bias term is kept as False in the projection layers? #807

Open
Akshay1-6180 opened this issue Jan 30, 2024 · 2 comments


@Akshay1-6180

In the code in https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/hf_model.py, the bias in the projection head is set to False. I feel it shouldn't matter, and keeping it as True would make it better. Is there any particular reason this was kept as False?


        d_model = getattr(self.config, arch_dict[self.config.model_type]["config_names"]["width"])
        if (d_model == output_dim) and (proj_type is None):  # do we always need a proj?
            self.proj = nn.Identity()
        elif proj_type == 'linear':
            self.proj = nn.Linear(d_model, output_dim, bias=False)
        elif proj_type == 'mlp':
            hidden_size = (d_model + output_dim) // 2
            self.proj = nn.Sequential(
                nn.Linear(d_model, hidden_size, bias=False),
                nn.GELU(),
                nn.Linear(hidden_size, output_dim, bias=False),
            )
@rwightman
Collaborator

@Akshay1-6180 the original OpenAI CLIP model has no bias on the final vision and text tower projections, so this was done to stick closer to that... but there's no reason it wouldn't work, or it could possibly be better in some cases...

In the timm vision adapter there's a config value for the bias

if proj == 'linear':
    head_layers['drop'] = nn.Dropout(drop)
    head_layers['proj'] = nn.Linear(prev_chs, embed_dim, bias=proj_bias)
elif proj == 'mlp':
    head_layers['mlp'] = Mlp(prev_chs, 2 * embed_dim, embed_dim, drop=(drop, 0), bias=(True, proj_bias))

self.head = nn.Sequential(head_layers)
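
For reference, a minimal (untested) sketch of setting that from the config side, assuming CLIP / CLIPVisionCfg / CLIPTextCfg in model.py can be constructed directly with keyword args as below:

from open_clip.model import CLIP, CLIPVisionCfg, CLIPTextCfg

# Untested sketch: enable the projection bias on the timm vision tower and the
# native text tower via the existing config fields shown in this thread.
vision_cfg = CLIPVisionCfg(
    timm_model_name='vit_base_patch16_224',  # any timm backbone
    timm_proj='linear',
    timm_proj_bias=True,  # bias on the final vision projection
    image_size=224,
)
text_cfg = CLIPTextCfg(
    context_length=77,
    vocab_size=49408,
    width=512,
    heads=8,
    layers=12,
    proj_bias=True,  # bias on the text tower projection (non-HF path)
)
model = CLIP(embed_dim=512, vision_cfg=vision_cfg, text_cfg=text_cfg)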

An hf_proj_bias could be added alongside the existing fields in the config...

    timm_proj_bias: bool = False  # enable bias final projection
    timm_drop: float = 0.  # head dropout
    timm_drop_path: Optional[float] = None  # backbone stochastic depth


@dataclass
class CLIPTextCfg:
    context_length: int = 77
    vocab_size: int = 49408
    hf_tokenizer_name: Optional[str] = None
    tokenizer_kwargs: Optional[dict] = None
    width: int = 512
    heads: int = 8
    layers: int = 12
    mlp_ratio: float = 4.0
    ls_init_value: Optional[float] = None  # layer scale initial value
    embed_cls: bool = False
    pad_id: int = 0
    no_causal_mask: bool = False  # disable causal masking
    final_ln_after_pool: bool = False  # apply final LayerNorm after pooling
    pool_type: str = 'argmax'
    proj_bias: bool = False
    output_tokens: bool = False
    act_kwargs: dict = None
    norm_kwargs: dict = None

    # HuggingFace specific text tower config
    hf_model_name: Optional[str] = None
    hf_model_pretrained: bool = True
    hf_proj_type: str = 'mlp'
    hf_pooler_type: str = 'mean_pooler'  # attentional pooling for HF models
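
Wiring it through would roughly mean adding an hf_proj_bias field next to hf_proj_type and passing it down to where the projection is built in hf_model.py. As a rough illustration only (the build_text_proj helper and its proj_bias argument below are hypothetical, nothing in main exposes them yet):

import torch.nn as nn

def build_text_proj(d_model: int, output_dim: int, proj_type=None, proj_bias: bool = False) -> nn.Module:
    # Hypothetical helper mirroring the HFTextEncoder proj logic quoted at the
    # top of the issue, with the bias exposed instead of hard-coded to False.
    if d_model == output_dim and proj_type is None:
        return nn.Identity()
    if proj_type == 'linear':
        return nn.Linear(d_model, output_dim, bias=proj_bias)
    if proj_type == 'mlp':
        hidden_size = (d_model + output_dim) // 2
        return nn.Sequential(
            nn.Linear(d_model, hidden_size, bias=proj_bias),
            nn.GELU(),
            nn.Linear(hidden_size, output_dim, bias=proj_bias),
        )
    raise ValueError(f'Unknown proj_type: {proj_type}')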

@Akshay1-6180
Author

Akshay1-6180 commented Feb 4, 2024

@rwightman Thanks for the detailed reply. I guess the OpenAI team, through empirical analysis, might have seen that there is no difference in adding a bias or making the MLP layer denser, and gone for the simplest option for interpretability (Occam's razor).
