The Adam in 5-1.Transformer should be replaced by SGD #76

Open
Cheng0829 opened this issue Sep 16, 2022 · 0 comments

Comments

@Cheng0829

Line 202:
optimizer = optim.Adam(model.parameters(), lr=0.001)

In my runs, Adam performs poorly here: the cost is 1.6 after 10 epochs and is still 1.6 after 100 or 1000 epochs, so the loss is not decreasing at all.
I suggest replacing Adam with SGD, that is, optimizer = optim.SGD(model.parameters(), lr=0.001)

Here is the training output after switching to SGD:

Epoch: 0100 cost = 0.047965
Epoch: 0200 cost = 0.020129
Epoch: 0300 cost = 0.012563
Epoch: 0400 cost = 0.009101
Epoch: 0500 cost = 0.007131
Epoch: 0600 cost = 0.005862
Epoch: 0700 cost = 0.004978
Epoch: 0800 cost = 0.004325
Epoch: 0900 cost = 0.003823
Epoch: 1000 cost = 0.003426
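
For anyone reproducing this comparison, note that the same lr=0.001 does not give the two optimizers the same step size. The sketch below is plain Python, not the repository's code. It applies the standard SGD and Adam update rules to a single scalar parameter and reports how far the first update moves it. The helper names (sgd_step, adam_step, first_step_sizes) are hypothetical, and the Adam hyperparameters are assumed to be the PyTorch defaults (betas 0.9 and 0.999, eps 1e-8).

```python
import math


def sgd_step(param, grad, lr):
    """Plain SGD: the step is proportional to the gradient."""
    return param - lr * grad


def adam_step(param, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; `state` is (step, m, v)."""
    step, m, v = state
    step += 1
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** step)             # bias-corrected first moment
    v_hat = v / (1 - beta2 ** step)             # bias-corrected second moment
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, (step, m, v)


def first_step_sizes(grad, lr=0.001):
    """Return (sgd_move, adam_move): distance one update moves a parameter at 0."""
    sgd_move = abs(sgd_step(0.0, grad, lr))
    adam_param, _ = adam_step(0.0, grad, (0, 0.0, 0.0), lr)
    return sgd_move, abs(adam_param)


if __name__ == "__main__":
    for grad in (0.01, 1.0, 100.0):
        sgd_move, adam_move = first_step_sizes(grad)
        print(f"grad={grad:g}  SGD step={sgd_move:.6g}  Adam step={adam_move:.6g}")
```

On the first update, Adam moves the parameter by roughly lr whatever the gradient's magnitude, while the SGD step scales with the gradient. Switching optimizers at a fixed lr therefore also changes the effective step size, so it may be worth trying a few learning rates for each optimizer before concluding which one suits this model.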