How does logit_scale vary during training? I noticed that in my case it starts at 14.28 (1/0.07), then decreases steadily, and by the end of training it reaches 1
#815 · Open · Akshay1-6180 opened this issue Feb 13, 2024 · 2 comments
I am running CLIP on my own dataset and noticed that the logit_scale converges to 1.
Is this expected behavior? I noticed that the loss becomes constant around the same time.
I know that a higher logit_scale amplifies differences between the logits, making the softmax output distribution sharper and the model more confident in its most likely predictions. Does the model lowering the logit_scale mean it is becoming less confident, or that it is getting confused?
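The sharpening effect described above is easy to see numerically. A minimal numpy sketch (the similarity values here are hypothetical, chosen only for illustration): multiplying the same cosine similarities by a larger scale before the softmax concentrates the probability mass on the top match.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical cosine similarities between one image embedding
# and three candidate text embeddings.
sims = np.array([0.30, 0.25, 0.10])

# CLIP computes softmax(logit_scale * sims); compare three scales.
probs_low  = softmax(1.0   * sims)   # logit_scale collapsed to 1
probs_init = softmax(14.28 * sims)   # the 1/0.07 initialisation
probs_high = softmax(100.0 * sims)   # the upper clip used by CLIP

print(probs_low.round(3), probs_init.round(3), probs_high.round(3))
```

With scale 1 the distribution is close to uniform, so the contrastive loss gradient is weak; at the initial 14.28 the top match already dominates, which is why a scale that drifts down to 1 coincides with a flat, near-constant loss.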
Reducing the learning rate resolves this, but logit_scale then converges toward a value lower than 14 (mostly between 6 and 8). I'm not sure what conclusion to draw from that.
I use the AdamW optimizer with a ViT-B vision model and a BERT text encoder; weight_decay is 0.1, eps is 1e-8, betas=[0.9, 0.999].
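For reference, the setup being discussed: in CLIP, logit_scale is a learnable scalar (stored as a log-temperature) initialised to log(1/0.07), so its exponent starts at the 14.28 observed at the start of training, and the CLIP paper clips the effective scale at 100 to stabilise training. A minimal sketch of that head (not the full model; `ContrastiveHead` is a name introduced here for illustration):

```python
import math
import torch
import torch.nn as nn

class ContrastiveHead(nn.Module):
    """Minimal sketch of CLIP's learnable temperature, not a full CLIP model."""
    def __init__(self):
        super().__init__()
        # Log-temperature initialised to log(1/0.07); exp() starts at ~14.28.
        self.logit_scale = nn.Parameter(torch.ones([]) * math.log(1 / 0.07))

    def forward(self, image_emb, text_emb):
        # L2-normalise so the matrix product gives cosine similarities.
        image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        return self.logit_scale.exp() * image_emb @ text_emb.t()

head = ContrastiveHead()
logits = head(torch.randn(4, 16), torch.randn(4, 16))

# Training loops commonly clamp the log-scale after each optimizer step so
# the effective scale stays in [1, 100], mirroring the clip in the CLIP paper:
with torch.no_grad():
    head.logit_scale.clamp_(0, math.log(100))
```

Note that nothing prevents the optimizer from driving logit_scale *down*; only the upper bound is clipped. If your loss plateaus while the scale sinks to 1, adding a lower clamp (or freezing logit_scale for a warmup period) is a common diagnostic experiment.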