SparseAttention relies on Triton for specific kernels. GPT-NeoX currently pins triton==0.4.2, which is behind the 1.0.0 that DeepSpeed builds against, and far behind the version of Triton we would like to use, 2.0.0.dev20221202, which is required for new Triton features.
Current NeoX and DeepSpeed code cannot use Sparse Attention with any of these versions.
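The version gap can be illustrated with a short standard-library sketch (the `numeric_prefix` helper here is hypothetical, for illustration only, and is not part of either codebase):

```python
# A minimal sketch of the version mismatch described above. The three
# version strings come from the issue text; the crude parser below is
# illustrative and not a full PEP 440 implementation.

def numeric_prefix(version):
    """Return the leading numeric components, ignoring suffixes like 'dev20221202'."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

pinned = numeric_prefix("0.4.2")               # GPT-NeoX's pin
deepspeed = numeric_prefix("1.0.0")            # what DeepSpeed builds against
desired = numeric_prefix("2.0.0.dev20221202")  # version with the needed features

# The pin trails both of the other versions, which is why simply
# installing one of them is not enough to make Sparse Attention work.
assert pinned < deepspeed < desired
```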
I tried a range of versions (including with a handful of easy changes to the code) and nothing worked right away.
With an updated Triton version it probably wouldn't take much effort to fix, but this came up at the tail end of adding support for the new Triton Flash Attention, so @Quentin-Anthony advised separating it out as its own issue. To be clear, the issue isn't introduced by Triton Flash Attention: DeepSpeed updated without us, and now just bumping up the version isn't quite enough to put things right.