Our reproduced model underperforms the released model by 4–5 pp on MVBench, and we suspect a flash_attn version mismatch may be the cause. The release used flash_attn==1.0.4; installing flash_attn==1.0.4 fails on our machines, but flash_attn==2.4.2 installs without problems. Since flash_attn==2.4.2 is a complete rewrite relative to flash_attn==1.0.4, we'd like to know whether upgrading flash_attn can affect model performance, and whether your team has trained and evaluated the model with flash_attn==2.4.2.
We haven't tested different flash-attn versions, but I don't expect it to matter much. Could you evaluate the released model directly and check whether its results match ours? Are there any differences in training data or hyperparameters? Are the gaps on other benchmarks also this large?
I checked the code, and your training does in fact use the v2 flash_attn. Could you confirm which exact flash_attn version was used?
flash-attn 2.1.1
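Since the discrepancy hinges on which flash-attn is actually installed (v1 and v2 are incompatible rewrites), a quick programmatic check avoids guessing from `pip list`. This is a minimal sketch using only the standard library; the package name `flash_attn` is the distribution name used on PyPI:

```python
from importlib import metadata


def flash_attn_major(pkg="flash_attn"):
    """Return the installed flash-attn major version, or None if not installed."""
    try:
        return int(metadata.version(pkg).split(".")[0])
    except metadata.PackageNotFoundError:
        return None


# Major-version comparison is what matters here: 1.x and 2.x
# (e.g. 1.0.4 vs 2.4.2) have different APIs and kernels.
released = int("1.0.4".split(".")[0])   # version used for the released model
reproduced = int("2.4.2".split(".")[0]) # version used in the reproduction
print("major versions differ:", released != reproduced)
```

Running this in both the training and evaluation environments makes it easy to confirm the thread's conclusion (training actually used a 2.x release, specifically 2.1.1) before attributing a 4–5 pp gap to the attention kernel.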