Compilation failure due to high register usage #214
Does Metal support register spilling? What happens when you exceed the total number of registers available (255 on CUDA, IIRC)?
Actually, this isn't a launch failure, it's a compilation failure, so it has nothing to do with the launch configuration. It also means that there's a hard limit on how many registers a kernel can use. However, there's no way to query either that limit or the number of registers a kernel uses, so I guess we can't do anything about this...
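For comparison, CUDA.jl does let you introspect a compiled kernel's register usage via `CUDA.registers`; a sketch (the `dummy` kernel here is just an illustration, and Metal.jl has no equivalent as far as I know):

```julia
using CUDA

function dummy(a)
    i = threadIdx().x
    @inbounds a[i] += 1
    return
end

a = CUDA.zeros(Float32, 32)
# compile without launching, then inspect the compiled kernel
k = @cuda launch=false dummy(a)
CUDA.registers(k)  # registers per thread used by the compiled kernel
```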
I think it can spill, though. Dummy kernel:

```julia
using Metal

function kernel(a::AbstractArray{<:NTuple{N, T}}) where {N, T}
    i = thread_position_in_grid_1d()
    @inbounds begin
        # load a large tuple
        x = a[i]
        # force all of the tuple to be available
        s = zero(T)
        for j in 1:N
            s += x[j]
        end
        y = let s = s
            ntuple(j -> x[j] + s, Val(N))
        end
        # write back out
        a[i] = y
    end
    return
end

function main(N=1)
    x = MtlArray{NTuple{N, Int}}(undef, 1)
    @metal threads=length(x) kernel(x)
end
```

If I have it load a large amount of data (1000 elements), the generated code starts with:
i.e. loading device memory into registers, and spilling it immediately afterwards. Computing the sum and storing the resulting tuple then consists of a sequence of:
As seen on DiffEqGPU.jl:
It's interesting because, IIUC, the dynamic workgroup size setting there should have used `maxTotalThreadsPerThreadgroup`, which in the case of CUDA takes register usage into account. Maybe there are additional limits we need to respect with Metal?
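A sketch of what querying that limit might look like with Metal.jl, assuming the compiled kernel object exposes its `MTLComputePipelineState` as a `pipeline` field (that field name is an assumption on my part):

```julia
using Metal

function dummy(a)
    i = thread_position_in_grid_1d()
    @inbounds a[i] += 1
    return
end

a = MtlArray(zeros(Float32, 32))
# compile without launching, then inspect the pipeline state
k = @metal launch=false dummy(a)
# Metal shrinks this value when a kernel needs many registers per
# thread, so it could serve as an indirect signal of register pressure
k.pipeline.maxTotalThreadsPerThreadgroup
```

Even if that property does account for register pressure, it wouldn't help with this particular issue, since the failure here happens at compilation time rather than at launch.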