How does logit_scale vary during training? I noticed that in my case it starts at 14.28 (1/0.07), then decreases steadily, and by the end of training it reaches 1
#815 · Open · Akshay1-6180 opened this issue Feb 13, 2024 · 2 comments
I am running CLIP on my own dataset and noticed that the logit_scale converges to 1.
Is this expected behavior? I noticed that the loss becomes constant around the same time.
I know that a higher logit_scale amplifies differences between the logits, making the softmax output distribution sharper and the model more confident in its most likely predictions. Does the model lowering the logit_scale mean it is becoming less confident, or that it is getting confused?
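The sharpening effect described above is easy to see numerically. A minimal numpy sketch (the similarity values here are hypothetical, chosen only for illustration): multiplying the same cosine similarities by a larger scale before the softmax concentrates the probability mass on the top match.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical cosine similarities between one image embedding
# and three candidate text embeddings.
sims = np.array([0.30, 0.25, 0.10])

# CLIP computes softmax(logit_scale * sims); compare three scales.
probs_low  = softmax(1.0   * sims)   # logit_scale collapsed to 1
probs_init = softmax(14.28 * sims)   # the 1/0.07 initialisation
probs_high = softmax(100.0 * sims)   # the upper clip used by CLIP

print(probs_low.round(3), probs_init.round(3), probs_high.round(3))
```

With scale 1 the distribution is close to uniform, so the contrastive loss gradient is weak; at the initial 14.28 the top match already dominates, which is why a scale that drifts down to 1 coincides with a flat, near-constant loss.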
Reducing the learning rate resolves this, but logit_scale then converges toward a value lower than 14 (mostly between 6 and 8). I'm not sure what conclusion to draw from that.
I use the AdamW optimizer with a ViT-B vision model and a BERT text encoder; weight_decay is 0.1, eps is 1e-8, betas=[0.9, 0.999].
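For reference, the setup being discussed: in CLIP, logit_scale is a learnable scalar (stored as a log-temperature) initialised to log(1/0.07), so its exponent starts at the 14.28 observed at the start of training, and the CLIP paper clips the effective scale at 100 to stabilise training. A minimal sketch of that head (not the full model; `ContrastiveHead` is a name introduced here for illustration):

```python
import math
import torch
import torch.nn as nn

class ContrastiveHead(nn.Module):
    """Minimal sketch of CLIP's learnable temperature, not a full CLIP model."""
    def __init__(self):
        super().__init__()
        # Log-temperature initialised to log(1/0.07); exp() starts at ~14.28.
        self.logit_scale = nn.Parameter(torch.ones([]) * math.log(1 / 0.07))

    def forward(self, image_emb, text_emb):
        # L2-normalise so the matrix product gives cosine similarities.
        image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        return self.logit_scale.exp() * image_emb @ text_emb.t()

head = ContrastiveHead()
logits = head(torch.randn(4, 16), torch.randn(4, 16))

# Training loops commonly clamp the log-scale after each optimizer step so
# the effective scale stays in [1, 100], mirroring the clip in the CLIP paper:
with torch.no_grad():
    head.logit_scale.clamp_(0, math.log(100))
```

Note that nothing prevents the optimizer from driving logit_scale *down*; only the upper bound is clipped. If your loss plateaus while the scale sinks to 1, adding a lower clamp (or freezing logit_scale for a warmup period) is a common diagnostic experiment.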