Equivalent of cuSparse - start with sparse matvec #208
Comments
Metal's Performance Shaders library does not support sparse arrays. Apple Accelerate does, but that's for the CPU. Maybe that's good enough, though (with the memory being unified anyway)? A generic implementation would be nice, but I don't have much experience with sparse algorithms. What operations would be important?
I guess I am trying to figure out what the right programming model to keep in mind here would be. Getting a fast sparse matvec (and getting Conjugate Gradient working), followed by a fast matmul, would be a good starting point to explore what is possible. I'll experiment with a few things and see how far I can get.
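To illustrate why a sparse matvec alone gets Conjugate Gradient working: CG only ever touches the matrix through `A * p`. Below is a minimal, unpreconditioned CG sketch in plain Julia (CPU `SparseArrays`, not Metal.jl API) — the same loop would run on a GPU array type once a sparse matvec exists for it.

```julia
using LinearAlgebra, SparseArrays

# Unpreconditioned Conjugate Gradient. The only operation required of A
# is a matvec, so a GPU sparse matvec is sufficient to support it.
function cg(A, b; tol = 1e-8, maxiter = 200)
    x = zero(b)
    r = b - A * x          # initial residual
    p = copy(r)            # initial search direction
    rs = dot(r, r)
    for _ in 1:maxiter
        Ap = A * p                     # the sparse matvec
        α = rs / dot(p, Ap)
        x .+= α .* p
        r .-= α .* Ap
        rs_new = dot(r, r)
        sqrt(rs_new) < tol && break
        p .= r .+ (rs_new / rs) .* p   # conjugate direction update
        rs = rs_new
    end
    return x
end

# Small SPD example: 1-D Laplacian.
n = 100
A = spdiagm(-1 => -ones(n - 1), 0 => 2ones(n), 1 => -ones(n - 1))
b = ones(n)
x = cg(A, b)
```

Everything else in the loop is dense vector arithmetic that Metal.jl already handles.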
There are some native kernels I wrote in CUDA.jl, https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/broadcast.jl, which use row/column iterators that 'zip' the multiple inputs. Thus, they parallelize across one dimension of the sparse input. Multiplication is much more difficult though, as there isn't a straightforward dimension to accelerate over. (The crux of the issue is that we cannot have an efficient …)
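The "parallelize across one dimension" idea can be sketched on the CPU: with two CSC matrices sharing a sparsity pattern, each column is independent, so each column maps to one GPU thread. This is a simplified illustration with a hypothetical helper name, not the actual CUDA.jl code (which also handles mismatched patterns).

```julia
using SparseArrays

# Broadcast f over two CSC matrices with an identical sparsity pattern.
# The outer loop over columns is the dimension a GPU kernel would
# parallelize: one thread per column.
function bcast_cols(f, A::SparseMatrixCSC, B::SparseMatrixCSC)
    @assert A.colptr == B.colptr && A.rowval == B.rowval
    C = copy(A)
    for j in 1:size(A, 2)                       # parallel dimension
        for k in A.colptr[j]:(A.colptr[j+1] - 1)
            C.nzval[k] = f(A.nzval[k], B.nzval[k])
        end
    end
    return C
end
```

Matmul lacks such an obvious independent dimension, which is why it's the harder case.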
Also note that those CUDA.jl kernels are ideally suited to be ported to GPUArrays using KA.jl, once we start doing that, as they don't use any advanced CUDA features.
After that, a sparse matmul is valuable. Since CUDA uses CSR, perhaps we could just use that for Metal.jl as well.
I believe there is currently no sparse matrix capability in Metal.jl. What is the easiest way to get some basic things working?
Perhaps a bigger question is whether we can have a generic sparse matrix implementation that can work on all our GPU backends.
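One way such a generic implementation could look: parameterize the storage vectors of the sparse type, so the same struct holds plain `Array`s on the CPU, `MtlArray`s under Metal.jl, or `CuArray`s under CUDA.jl, with KA.jl kernels written against the abstract type. The name below is hypothetical, not GPUArrays.jl API.

```julia
# Backend-generic CSC container: the index and value vectors can be any
# AbstractVector, so any GPU backend's array type plugs in unchanged.
struct GenericCSC{Tv,Ti,V<:AbstractVector{Tv},I<:AbstractVector{Ti}}
    m::Int
    n::Int
    colptr::I   # length n+1
    rowval::I
    nzval::V
end

# CPU instance; swapping the three vectors for MtlArrays would give a
# Metal-backed matrix with the same structure.
G = GenericCSC(2, 2, [1, 2, 3], [1, 2], [1.0, 2.0])
```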