
Can I profile only the dense layer or attention layer in flashinfer, rather than the whole kernel? #139

Open
yintao-he opened this issue Feb 27, 2024 · 0 comments

Dear developers,

I am a computer architecture PhD student, and I would like to use flashinfer to profile the detailed computation of individual layers, such as the dense layers or the attention layer, rather than the model as a whole, similar to the experiments in https://le.qun.ch/en/blog/2023/05/13/transformer-batching/. However, when I look at code such as `python/csrc/single_decode.cu`, it seems the matrix multiplication (dense) part is not included there.
I am not very familiar with CUDA code, but I am trying to do this. Can I use flashinfer for this? Could you please give me some advice? Thank you.
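
A minimal sketch of how such per-layer timing could look from Python, assuming the flashinfer package exposes `single_decode_with_kv_cache` for the decode attention kernel; the tensor shapes, iteration counts, and dtype below are illustrative assumptions, not taken from the issue:

```python
import torch
import flashinfer  # assumption: the Python bindings expose single_decode_with_kv_cache

# Illustrative shapes (not from the issue): one decode-step query against a KV cache.
num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 4096
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Time only the attention kernel with CUDA events, isolated from the rest of the model.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
for _ in range(10):  # warmup, so one-time setup cost doesn't skew the measurement
    flashinfer.single_decode_with_kv_cache(q, k, v)
torch.cuda.synchronize()
start.record()
for _ in range(100):
    flashinfer.single_decode_with_kv_cache(q, k, v)
end.record()
torch.cuda.synchronize()
print(f"decode attention kernel: {start.elapsed_time(end) / 100:.3f} ms/call")
```

The dense (QKV/output projection) layers are ordinary GEMMs that live outside flashinfer's attention kernels, so under the same approach they would be timed separately, around the corresponding `torch.matmul` or `nn.Linear` calls, rather than inside `single_decode.cu`.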

@yzh119 yzh119 self-assigned this Feb 27, 2024