Question about anomalous results measured on A100 GPUs #184

Open
bulaikexiansheng opened this issue May 4, 2024 · 1 comment
Labels
question Further information is requested

Comments

@bulaikexiansheng

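Hello! I reproduced the results on a 3090, a 4090, and an A100-80G. The details of the reproduction are as follows.

Model used: PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF

Analyzing the output gives the metrics shown in the bar charts below: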

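With the output length limited to 128:
[Figure: bar chart, image "限制一张卡可见_avg_4090_a100_3090" (only one GPU visible; averages for 4090, A100, 3090)]

With the output length limited to 256:
[Figure: bar chart, image "限制一张卡可见_avg_4090_a100_3090" (only one GPU visible; averages for 4090, A100, 3090)]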

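The two charts show that the 3090 and 4090 behave as expected, since the 4090 has more compute capability than the 3090. The A100 results, however, look anomalous.

So I have the following questions:

  1. The A100 has higher bandwidth than the 4090 and 3090, so why does it have the worst load_time?
  2. The A100 has more compute capability than the 4090 and 3090, so why is its token generation rate slower than theirs?
  3. Apart from the A100, is anything else in the charts above wrong?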

On all machines, my inference command was:
./build/bin/main -m ./PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF/llama2-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time"

Is this anomaly caused by the environment? I see the same behavior on both the A100-40G and the A100-80G.

@bulaikexiansheng bulaikexiansheng added the question Further information is requested label May 4, 2024
@bulaikexiansheng
Author

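To add some detail about my experimental procedure: on each machine I ran the command ./build/bin/main -m ./PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF/llama2-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time" 10 times, with 5 seconds between runs, and took the average. One more question: in principle, load_time should not fluctuate this much on the A100, whatever the generated sequence length is.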
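For anyone who wants to repeat the procedure, here is a minimal sketch of the "10 runs, 5 seconds apart, then average" loop in Python. It is not the script used for the charts. The helper names `run_repeated` and `summarize` are hypothetical, and `MODEL_PATH` is an assumed local path that you should adjust. It records only the wall-clock time of each whole run (load plus generation), so it does not reproduce the tool's own load_time metric. It also reports the standard deviation, which makes run-to-run fluctuation visible alongside the mean.

```python
import statistics
import subprocess
import time

# Assumed local path to the model file; adjust to your setup.
MODEL_PATH = "./PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF/llama2-70b-relu.q4.powerinfer.gguf"


def run_repeated(cmd, runs=10, interval_s=5.0):
    """Run `cmd` `runs` times, sleeping `interval_s` seconds between runs.

    Returns the wall-clock duration of each run in seconds.
    """
    durations = []
    for i in range(runs):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(cmd, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
        if i < runs - 1:
            time.sleep(interval_s)
    return durations


def summarize(values):
    """Return mean, sample standard deviation, min, and max of `values`."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


if __name__ == "__main__":
    cmd = [
        "./build/bin/main",
        "-m", MODEL_PATH,
        "-n", "128",
        "-t", "8",
        "-p", "Once upon a time",
    ]
    print(summarize(run_repeated(cmd, runs=10, interval_s=5.0)))
```

Reporting the spread together with the mean shows whether a slow average comes from every run being slow or from a few outlier runs.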
