
DeepSpeed Sparse Attention is Broken #863

Open
dashstander opened this issue Mar 29, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@dashstander
Contributor

SparseAttention relies on Triton for its kernels. GPT-NeoX currently pins triton==0.4.2 as a dependency, which is behind the version DeepSpeed expects (1.0.0), and far behind the version we would like to use, 2.0.0.dev20221202, which is required for the new Triton features.

Current NeoX and DeepSpeed code cannot use Sparse Attention with any of these versions.
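
A quick way to see the breakage is to try loading the sparse-attention ops under whatever Triton happens to be installed. This is just a sketch; the import paths follow DeepSpeed's sparse-attention tutorial (`SparseSelfAttention`, `FixedSparsityConfig`), and the failure may show up at import, at construction, or only on the first forward pass depending on the version pairing:

```python
# Sketch: check whether the installed Triton / DeepSpeed pairing can load
# the sparse-attention ops at all. Import paths follow the DeepSpeed
# sparse-attention tutorial and may differ between DeepSpeed releases.
import triton
import deepspeed

print(f"triton=={triton.__version__}, deepspeed=={deepspeed.__version__}")

try:
    from deepspeed.ops.sparse_attention import FixedSparsityConfig, SparseSelfAttention

    # Constructing the module exercises the Triton-backed code paths; with a
    # mismatched Triton version this typically fails here or on the first
    # forward pass.
    config = FixedSparsityConfig(num_heads=8)
    attn = SparseSelfAttention(sparsity_config=config)
    print("Sparse attention modules constructed successfully.")
except Exception as exc:
    print(f"Sparse attention unusable with this Triton/DeepSpeed pairing: {exc}")
```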

@dashstander dashstander added the bug Something isn't working label Mar 29, 2023
@StellaAthena
Member

Is this a “real” issue or can we just change the required version to obtain support?

@dashstander
Contributor Author

I tried a range of versions (along with a handful of easy code changes) and nothing worked right away.

With an updated Triton version it probably wouldn't take much effort to fix, but this came up at the tail end of adding support for the new Triton Flash Attention, so @Quentin-Anthony advised splitting it out into its own issue. To be clear, the problem isn't introduced by Triton Flash Attention: DeepSpeed updated its Triton requirement without us, and simply bumping our pinned version isn't enough to put things right.
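
If I understand the history right, DeepSpeed's sparse-attention kernels were originally written against Triton's pre-1.0 string-based DSL, whereas Triton 2.x only supports the `@triton.jit` Python DSL, so the kernels themselves would need porting rather than just a version bump. Purely as an illustration of the target API (a generic vector-add in the 2.x style, not DeepSpeed code):

```python
# Illustrative Triton 2.x kernel: the @triton.jit Python DSL that ported
# kernels would need to target. Not taken from GPT-NeoX or DeepSpeed.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)
```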

@dashstander dashstander self-assigned this Oct 3, 2023