
Why AXPY? #185

Closed
richardweii opened this issue May 13, 2024 · 2 comments
Labels
question Further information is requested

Comments

@richardweii

Many thanks for the PowerInfer work and for open-sourcing it. I am very interested in the implementation details, and even after reading the code carefully there are a few things I still don't fully understand.

Why is the sparse matrix computation in the down-projection layer implemented with an AXPY operator, while the up-projection layer uses an ordinary vec_dot-style matrix multiplication? Is there a particular design consideration behind this?

@richardweii richardweii added the question Further information is requested label May 13, 2024
@YixinSong-e
Collaborator

This is because neuron activation in the down-projection layer is column-wise, which makes loading the down-projection weights discontiguous. During the decoding phase, LLM inference is memory-bandwidth bound, so the implementation goal is to keep loads as contiguous as possible and to skip unnecessary memory accesses. On the GPU, shared memory can first load discontiguously from HBM and then compute, but the CPU cache cannot be controlled explicitly. AXPY guarantees contiguous memory access on both CPU and GPU, which is why we designed AXPY.
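To make the access-pattern argument concrete, here is a minimal sketch (not PowerInfer's actual kernel) contrasting the two formulations for a sparse down projection. It assumes the weights are stored so that each neuron's column is contiguous in memory; `active` is a hypothetical array holding the indices of the activated neurons.

```c
#include <stddef.h>

// vec_dot style: one dot product per output element.
// For a fixed output row i, skipping inactive neurons makes the
// accesses to w strided and discontiguous (jumps of ~d_out floats).
void down_proj_vec_dot(float *y, const float *w, const float *x,
                       const int *active, size_t n_active, size_t d_out) {
    for (size_t i = 0; i < d_out; ++i) {
        float sum = 0.0f;
        for (size_t k = 0; k < n_active; ++k) {
            size_t j = (size_t) active[k];
            sum += w[j * d_out + i] * x[j];  // strided load per active neuron
        }
        y[i] = sum;
    }
}

// AXPY style: for each active neuron j, y += x[j] * column_j(W).
// The inner loop streams one contiguous column of w, so every load
// is sequential on the CPU and coalesced on the GPU.
void down_proj_axpy(float *y, const float *w, const float *x,
                    const int *active, size_t n_active, size_t d_out) {
    for (size_t i = 0; i < d_out; ++i) y[i] = 0.0f;
    for (size_t k = 0; k < n_active; ++k) {
        size_t j = (size_t) active[k];
        const float *col = w + j * d_out;    // contiguous column of W
        const float  a   = x[j];
        for (size_t i = 0; i < d_out; ++i) {
            y[i] += a * col[i];              // sequential, cache-friendly
        }
    }
}
```

In the AXPY form, the sparsity decision is made once per active neuron and the inner loop is a pure streaming update, which is why it maps well onto both CPU caches and GPU memory coalescing without needing explicit staging into shared memory.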

@richardweii
Author

A very nice design idea, thanks a lot for the explanation!
