optimize clip_by_norm #183

Open
wants to merge 1 commit into
base: develop
from

Conversation

zhangting2020
Contributor

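Optimizations:

  • mul + set_value: set_value triggers a large number of memcpy calls, so it is replaced with an in-place operation.
  • Redundant clip_norm computation: added need_grad_norm so that the value is computed only when TensorBoard needs to observe it. Otherwise it introduces a large number of norm operators.
  • Redundant cast: in the PR code, paddle_dtype prints as "float32", but tensor.dtype actually returns paddle.float32, so the comparison fails and a pointless cast is inserted. In addition, under O2 the gradients are fp16, and the original code converted clip_coef_clamped to fp16 for every gradient. The conversion only needs to happen once and the result can be reused for all other gradients, so the variable clip_coef_clamped_low_precison was added.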
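
The three changes above can be sketched roughly as follows. This is a NumPy stand-in, not the Paddle code from this PR: the function names `global_l2_norm` and `clip_grads_` are hypothetical, NumPy's `*=` plays the role of the in-place multiply, and the sketch assumes that the norm gated by `need_grad_norm` is one recomputed purely for logging.

```python
import numpy as np


def global_l2_norm(grads):
    """Global L2 norm over all gradients, accumulated in float32."""
    total_sq = sum(float(np.sum(np.square(g.astype(np.float32)))) for g in grads)
    return np.sqrt(np.float32(total_sq))


def clip_grads_(grads, max_norm, need_grad_norm=False, eps=1e-6):
    """Clip gradients in place (hypothetical helper, not the PR's function).

    Returns the post-clip norm only when need_grad_norm is True, so the extra
    norm pass is skipped when nothing is being logged.
    """
    total_norm = global_l2_norm(grads)
    clip_coef = np.float32(max_norm) / (total_norm + np.float32(eps))
    clip_coef_clamped = np.minimum(clip_coef, np.float32(1.0))

    # Cast the coefficient to fp16 once and reuse it for every fp16 gradient
    # (variable name spelled as in the PR description).
    clip_coef_clamped_low_precison = np.float16(clip_coef_clamped)

    for g in grads:
        # Compare against the dtype object, not against a printed string.
        if g.dtype == np.float16:
            g *= clip_coef_clamped_low_precison  # in place, no copy back
        else:
            g *= clip_coef_clamped

    if need_grad_norm:
        # Assumption: this extra pass exists only for logging.
        return float(global_l2_norm(grads))
    return None


grads = [
    np.array([3.0, 4.0], dtype=np.float32),
    np.array([0.0, 12.0], dtype=np.float16),
]
logged_norm = clip_grads_(grads, max_norm=1.0, need_grad_norm=True)
```

With these inputs the global norm before clipping is 13, so after the call `logged_norm` is close to 1, up to fp16 rounding, and each array keeps its original dtype.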

Results:

  • O1: 0.739 steps/s -> 2.194 steps/s (about 2.97x)
  • O2: 1.067 steps/s -> 2.655 steps/s (about 2.49x)

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


zhangting_2017@163.com does not seem to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
