Many thanks for the PowerInfer work and for open-sourcing it. I'm very interested in the concrete implementation, and after reading the code carefully there are still a few things I don't fully understand.
Why is the sparse matrix in the down-projection layer computed with an AXPY operator, while the up-projection layer uses an ordinary vec_dot-style matrix multiplication? Is there a particular design consideration behind this?
This comes from the fact that neuron activation in the down-projection layer is column-wise, which makes loading the down-projection weights non-contiguous. Since LLM inference in the decoding phase is memory-bandwidth bound, the design goal is to keep loads as contiguous as possible and to skip unnecessary memory accesses. On a GPU, shared memory allows you to first load data non-contiguously from HBM and then compute, but a CPU's cache cannot be controlled explicitly. AXPY guarantees contiguous memory access on both CPU and GPU, which is why it was chosen.
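To make the access-pattern argument concrete, here is a minimal CPU sketch, not PowerInfer's actual kernel: the function names, the column-contiguous weight layout, and the `active` index list are all illustrative assumptions. The point is that the AXPY form streams one contiguous weight column per active neuron, while the vec_dot form streams one contiguous weight row per active output, so each layer gets sequential loads and inactive neurons are skipped entirely.

```c
#include <stddef.h>

/* Hypothetical AXPY-style down projection (illustrative, not the real
   PowerInfer kernel). Weights are stored so that each neuron's column
   is contiguous: column j occupies W_cols[j*n_embd .. j*n_embd+n_embd-1]. */
void down_proj_axpy(const float *W_cols,  /* n_ff columns, each of length n_embd */
                    const float *act,     /* FFN activations, length n_ff */
                    const int   *active,  /* indices of predicted-active neurons */
                    int n_active, int n_embd, float *out /* length n_embd */)
{
    for (int i = 0; i < n_embd; i++) out[i] = 0.0f;
    for (int k = 0; k < n_active; k++) {
        int j = active[k];
        const float *col = W_cols + (size_t)j * n_embd;  /* contiguous load */
        float a = act[j];
        for (int i = 0; i < n_embd; i++)
            out[i] += a * col[i];  /* y += a * x: the AXPY accumulation */
    }
}

/* Hypothetical vec_dot-style up projection: each active output element
   is a dot product over a contiguous row, so row-major access is already
   sequential and inactive rows are simply never touched. */
void up_proj_vec_dot(const float *W_rows,  /* n_ff rows, each of length n_embd */
                     const float *x,       /* input, length n_embd */
                     const int   *active, int n_active, int n_embd,
                     float *out /* length n_ff; only active entries written */)
{
    for (int k = 0; k < n_active; k++) {
        int j = active[k];
        const float *row = W_rows + (size_t)j * n_embd;  /* contiguous load */
        float s = 0.0f;
        for (int i = 0; i < n_embd; i++)
            s += row[i] * x[i];
        out[j] = s;
    }
}
```

In the down projection, doing it vec_dot-style instead would read one element per active column from every row, a strided, cache-unfriendly pattern; the AXPY form turns the same work into long sequential streams.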
A very nice piece of design thinking, thanks a lot for the explanation!