
flash_attn version discussion #148

Open
zhanghang-official opened this issue Mar 15, 2024 · 3 comments

@zhanghang-official

The performance of our reproduced model is 4-5 pp below the released model on MVBench, and we suspect a flash_attn version mismatch may be the cause.
The release was built with flash_attn==1.0.4. Installing flash_attn==1.0.4 fails on our machines, but flash_attn==2.4.2 installs cleanly. Since flash_attn 2.4.2 is a complete rewrite relative to 1.0.4, we would like to know whether upgrading flash_attn can affect model performance, and whether your team has trained and evaluated the model with flash_attn==2.4.2.
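For reference, a minimal sketch (not from this repo) of logging the versions that typically matter when comparing such runs side by side:

```python
# Minimal environment check: record the exact torch/CUDA/flash-attn versions
# next to benchmark numbers so version mismatches are visible at a glance.
from importlib.metadata import version

import torch

print("torch      :", torch.__version__)
print("cuda       :", torch.version.cuda)       # CUDA version torch was built against
print("flash-attn :", version("flash-attn"))    # installed flash-attn distribution
```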

@Andy1621
Collaborator

We haven't tested different flash-attn versions, but I don't think it has much impact. Could you evaluate the released open-source model directly and check whether its results are close? Are there any differences in training data or hyperparameters? Are the gaps on other benchmarks also large?

@zhanghang-official
Author

[screenshot of the relevant training code]
I checked the code, and your training indeed uses the v2 flash_attn. Could you confirm exactly which flash_attn version it was?
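Incidentally, the major version can often be read off the code itself: the 2.x line renamed the variable-length entry points (flash_attn_unpadded_func became flash_attn_varlen_func). A small illustrative probe, assuming the standard flash_attn.flash_attn_interface module layout (the helper is not part of this repository):

```python
# Illustrative helper: infer the installed flash-attn major version from
# which public entry points the interface module exposes.
import importlib

def flash_attn_major_version() -> int:
    iface = importlib.import_module("flash_attn.flash_attn_interface")
    if hasattr(iface, "flash_attn_varlen_func"):    # name used by the 2.x line
        return 2
    if hasattr(iface, "flash_attn_unpadded_func"):  # name used by the 1.x line
        return 1
    raise RuntimeError("unrecognized flash-attn interface")

print(flash_attn_major_version())
```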

@Andy1621
Collaborator

flash-attn 2.1.1
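
For anyone pinning the same setup, that would be `pip install flash-attn==2.1.1`; note that flash-attn builds against a specific PyTorch/CUDA combination, so the exact install steps can still vary by machine.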
