
DeepSpeed Sparse Attention is Broken #863

Open
dashstander opened this issue Mar 29, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@dashstander
Contributor

SparseAttention relies on Triton for its kernels. GPT-NeoX currently pins triton==0.4.2 as a dependency, which is behind the version DeepSpeed expects (1.0.0), and far behind the version we would like to use, 2.0.0.dev20221202, which is required for the new Triton features.

Current NeoX and DeepSpeed code cannot use Sparse Attention with any of these versions.
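
A quick way to see the breakage is to try loading the sparse-attention ops under whatever Triton happens to be installed. This is just a sketch; the import paths follow DeepSpeed's sparse-attention tutorial (`SparseSelfAttention`, `FixedSparsityConfig`), and the failure may show up at import, at construction, or only on the first forward pass depending on the version pairing:

```python
# Sketch: check whether the installed Triton / DeepSpeed pairing can load
# the sparse-attention ops at all. Import paths follow the DeepSpeed
# sparse-attention tutorial and may differ between DeepSpeed releases.
import triton
import deepspeed

print(f"triton=={triton.__version__}, deepspeed=={deepspeed.__version__}")

try:
    from deepspeed.ops.sparse_attention import FixedSparsityConfig, SparseSelfAttention

    # Constructing the module exercises the Triton-backed code paths; with a
    # mismatched Triton version this typically fails here or on the first
    # forward pass.
    config = FixedSparsityConfig(num_heads=8)
    attn = SparseSelfAttention(sparsity_config=config)
    print("Sparse attention modules constructed successfully.")
except Exception as exc:
    print(f"Sparse attention unusable with this Triton/DeepSpeed pairing: {exc}")
```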

@dashstander dashstander added the bug Something isn't working label Mar 29, 2023
@StellaAthena
Member

Is this a “real” issue or can we just change the required version to obtain support?

@dashstander
Contributor Author

I tried a range of versions (along with a handful of easy code changes) and nothing worked right away.

With an updated Triton version it probably wouldn't take much effort to fix, but this came up at the tail end of adding support for the new Triton Flash Attention, so @Quentin-Anthony advised splitting it out into its own issue. To be clear, the problem isn't introduced by Triton Flash Attention: DeepSpeed updated its Triton requirement without us, and simply bumping our pinned version isn't enough to put things right.
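
If I understand the history right, DeepSpeed's sparse-attention kernels were originally written against Triton's pre-1.0 string-based DSL, whereas Triton 2.x only supports the `@triton.jit` Python DSL, so the kernels themselves would need porting rather than just a version bump. Purely as an illustration of the target API (a generic vector-add in the 2.x style, not DeepSpeed code):

```python
# Illustrative Triton 2.x kernel: the @triton.jit Python DSL that ported
# kernels would need to target. Not taken from GPT-NeoX or DeepSpeed.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```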

@dashstander dashstander self-assigned this Oct 3, 2023