
Can I profile only the dense layer or attention layer in flashinfer, rather than the whole kernel? #139

Open
yintao-he opened this issue Feb 27, 2024 · 0 comments

Dear developers,

I am a computer architecture PhD student, and I would like to use flashinfer to profile the detailed computation of individual layers, such as the dense layers or the attention layer, rather than the model as a whole, similar to the experiments in https://le.qun.ch/en/blog/2023/05/13/transformer-batching/. However, when I look at code such as `python/csrc/single_decode.cu`, it seems the matrix multiplication (dense) part is not included there.
I am not very familiar with CUDA code, but I am trying to do this. Can I use flashinfer for this? Could you please give me some advice? Thank you.
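
A minimal sketch of how such per-layer timing could look from Python, assuming the flashinfer package exposes `single_decode_with_kv_cache` for the decode attention kernel; the tensor shapes, iteration counts, and dtype below are illustrative assumptions, not taken from the issue:

```python
import torch
import flashinfer  # assumption: the Python bindings expose single_decode_with_kv_cache

# Illustrative shapes (not from the issue): one decode-step query against a KV cache.
num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 4096
q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

# Time only the attention kernel with CUDA events, isolated from the rest of the model.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
for _ in range(10):  # warmup, so one-time setup cost doesn't skew the measurement
    flashinfer.single_decode_with_kv_cache(q, k, v)
torch.cuda.synchronize()
start.record()
for _ in range(100):
    flashinfer.single_decode_with_kv_cache(q, k, v)
end.record()
torch.cuda.synchronize()
print(f"decode attention kernel: {start.elapsed_time(end) / 100:.3f} ms/call")
```

The dense (QKV/output projection) layers are ordinary GEMMs that live outside flashinfer's attention kernels, so under the same approach they would be timed separately, around the corresponding `torch.matmul` or `nn.Linear` calls, rather than inside `single_decode.cu`.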

@yzh119 yzh119 self-assigned this Feb 27, 2024