Dear developers,
I am a computer architecture PhD student, and I would like to use flashinfer to profile individual computation steps, such as the dense layers or the attention layer, rather than a whole kernel, similar to the experiments in https://le.qun.ch/en/blog/2023/05/13/transformer-batching/. However, when I look at code such as 'python/csrc/single_decode.cu', the matrix multiplication does not seem to be included in it.
I am not familiar with CUDA code, but I am trying to learn it. Can I use flashinfer for this kind of profiling? Could you please give me some advice? Thank you.