
How does the logit_scale vary during training? I noticed that in my case it starts from 14.28 (1/0.07), then just goes down, and towards the end of training it reaches 1 #815

Open
Akshay1-6180 opened this issue Feb 13, 2024 · 2 comments

Comments

@Akshay1-6180

[Two screenshots attached: Screenshot 2024-02-13 at 1.31.12 PM, Screenshot 2024-02-13 at 1.31.25 PM]

I am running CLIP on my own dataset and noticed that the logit_scale converges to 1.
Is this good behavior to expect? I noticed that the loss becomes constant during this time.
I know that a higher logit_scale amplifies differences between the logits, making the softmax output distribution sharper and thus making the model more confident in its most likely predictions. Does the model lowering the logit_scale mean it is becoming less confident, or that it is getting confused?
Reducing the learning rate resolves this issue, but then the logit_scale starts converging towards a value lower than 14 (mostly between 6 and 8). Not sure what conclusion I can draw from this.
I use the AdamW optimizer with a ViT-B vision model and a BERT text encoder; weight decay is 0.1, eps is 1e-8, betas=[0.9, 0.999].
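For context, here is a minimal sketch (not the actual open_clip code) of how the learned logit_scale enters the contrastive logits, together with the AdamW configuration described above; the learning rate is a placeholder since it is not stated in this issue, and in a real run you would pass the model's parameters rather than just the temperature.

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# CLIP learns the temperature in log space, initialized so that
# exp(logit_scale) = 1 / 0.07 ≈ 14.28 -- the starting value seen in the plots.
# The original CLIP implementation also clips the scale so it never exceeds 100.
logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / 0.07))

def clip_loss(image_features, text_features, logit_scale):
    """Symmetric contrastive loss over a batch of paired image/text embeddings."""
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    # A larger scale sharpens the softmax over the cosine-similarity logits,
    # i.e. makes the model more confident; a scale of 1 leaves them unscaled.
    scale = logit_scale.exp()
    logits_per_image = scale * image_features @ text_features.t()
    logits_per_text = logits_per_image.t()
    labels = torch.arange(image_features.shape[0], device=image_features.device)
    return (F.cross_entropy(logits_per_image, labels) +
            F.cross_entropy(logits_per_text, labels)) / 2

# Optimizer with the hyperparameters quoted above; lr is a placeholder.
optimizer = torch.optim.AdamW([logit_scale], lr=1e-4, betas=(0.9, 0.999),
                              eps=1e-8, weight_decay=0.1)
```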

@rom1504 (Collaborator) commented Feb 13, 2024

https://wandb.ai/rom1504/open-clip/reports/xlm-roberta-base-B-32--VmlldzoyOTQ5OTE2 — here is a fairly normal CLIP run.

You should see logit scale going to 1, loss decreasing, lr decreasing and accuracy increasing, all fairly in sync.

@Akshay1-6180 (Author)

Thanks for the logs @rom1504. Isn't the logit_scale going towards 100 as the loss decreases in this case, not to 1?
