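Hello, authors! I reproduced the results on a 3090, a 4090, and an A100-80G. The details of the reproduction are as follows.
Model used: PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF
Analyzing the program output gives the metrics shown in the bar charts below:
With the output length limited to 128:
With the output length limited to 256:
The two charts show that the 3090 and 4090 behave as expected, since the 4090 has more compute capability than the 3090, but the A100 results look anomalous.
So I have the following questions: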
On all machines, my inference command is:
./build/bin/main -m ./PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF/llama2-70b-relu.q4.powerinfer.gguf -n 128 -t 8 -p "Once upon a time"
Is this anomaly caused by the environment? I see the same behavior on both the A100-40G and the A100-80G.
To add some detail on my experimental procedure: on each machine I ran the command above 10 times, 5 seconds apart, and took the average. One more question: in principle, load_time should not fluctuate this much on the A100, regardless of the generated sequence length.
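For anyone who wants to repeat the measurement, the procedure described here (10 runs, 5 seconds apart, then averaging) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the binary prints llama.cpp-style timing lines such as `llama_print_timings: load time = 1234.56 ms`, which is not confirmed in this thread, and `parse_timings`, `summarize`, and `benchmark` are hypothetical helper names. It also reports the standard deviation per metric, which makes run-to-run fluctuation in load time visible instead of hiding it in the mean.

```python
import re
import statistics
import subprocess
import time

# ASSUMPTION: timing lines look like llama.cpp's, e.g.
#   llama_print_timings:        load time =   1234.56 ms
# Adjust the pattern if your build prints timings differently.
TIMING_RE = re.compile(
    r"^\S*print_timings:\s+(.+?)\s+=\s+([\d.]+)\s+ms", re.MULTILINE
)


def parse_timings(log_text):
    """Return {metric name: milliseconds} for every timing line found."""
    return {name.strip(): float(ms) for name, ms in TIMING_RE.findall(log_text)}


def summarize(runs):
    """Return {metric: (mean, sample standard deviation)} across runs."""
    summary = {}
    for metric in runs[0]:
        values = [run[metric] for run in runs if metric in run]
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[metric] = (statistics.mean(values), spread)
    return summary


def benchmark(cmd, repeats=10, pause_s=5.0):
    """Run cmd `repeats` times, sleeping `pause_s` seconds between runs."""
    runs = []
    for i in range(repeats):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        # Timings may go to stderr, so scan both streams.
        runs.append(parse_timings(proc.stdout + proc.stderr))
        if i < repeats - 1:
            time.sleep(pause_s)
    return summarize(runs)


# Example call (the model path is a placeholder; point it at your own file):
# stats = benchmark([
#     "./build/bin/main",
#     "-m", "./PowerInfer/ReluLLaMA-70B-PowerInfer-GGUF/llama2-70b-relu.q4.powerinfer.gguf",
#     "-n", "128", "-t", "8", "-p", "Once upon a time",
# ])
# for metric, (mean_ms, std_ms) in stats.items():
#     print(f"{metric}: {mean_ms:.2f} ms (std {std_ms:.2f} ms)")
```

A large standard deviation on the load time for one machine but not the others would suggest comparing the per-run values rather than only the averages.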