How is the token generation rate calculated? #150
Comments
As described in our paper, the method we use to evaluate inference speed is the total number of tokens in the prompt and the output, divided by the total time of the prompt phase and the generation phase. Additionally, to keep the number of output tokens fixed, we use … From your example, the correct result is (79 + 253) / (2031.23/1000 + 10300.94/1000) = 26.92 tokens/s.
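The calculation described above can be sketched as follows, with the token counts and times taken from the `llama_print_timings` output quoted in this thread:

```python
# Throughput = (prompt tokens + generated tokens) / (prompt time + generation time)
prompt_tokens = 79        # from "prompt eval time ... / 79 tokens"
gen_tokens = 253          # from "eval time ... / 253 runs"
prompt_ms = 2031.23       # prompt eval time in ms
gen_ms = 10300.94         # eval (generation) time in ms

throughput = (prompt_tokens + gen_tokens) / ((prompt_ms + gen_ms) / 1000)
print(f"{throughput:.2f} tokens/s")  # -> 26.92 tokens/s
```

Note that sample time is excluded: the paper's metric counts only the prompt and generation phases.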
Thank you for the explanation! Could you provide the code used for the GitHub evaluation of Falcon 40B and LLaMA 70B on the 4090? My calculation is as follows: for falcon-40b-512 I get 590 / (1747.90 / 1000 + 20435.84 / 1000) = 26.60
The evaluation code used in the paper was developed in an internal codebase and differs slightly from the open-source version, so we cannot provide it directly. The current open-source code, without requiring much manual intervention in offloading details, already matches the performance that the internal version reaches after complex manual configuration. If you want to push performance further on the current setup, you can consider using a value slightly above the hardware limit for …
Hello! Thank you for your excellent work! While reproducing it, I tried to compute the generation rate for each model (specifically the Falcon-40B from your Evaluation). The command is:
./build/bin/main -m /data/models/falcon-40b-relu-powerinfer/falcon-40b-relu.q4.powerinfer.gguf -n 512 -t 8 -p "In the depths of twilight, where shadows dance with whispers, ancient secrets stir beneath the surface, beckoning the curious to unravel mysteries that linger within the fabric of time and space, awaiting discovery and enlightenment. The moon casts its gentle glow, illuminating pathways obscured by darkness, guiding intrepid souls on a journey towards the unknown, where truths and wonders intertwine in the eternal dance of existence." --vram-budget 22
I computed the rate for Falcon-40B with the following method:
llama_print_timings: load time = 13806.71 ms
llama_print_timings: sample time = 240.38 ms / 254 runs ( 0.95 ms per token, 1056.67 tokens per second)
llama_print_timings: prompt eval time = 2031.23 ms / 79 tokens ( 25.71 ms per token, 38.89 tokens per second)
llama_print_timings: eval time = 10300.94 ms / 253 runs ( 40.72 ms per token, 24.56 tokens per second)
llama_print_timings: total time = 12701.22 ms
Log end
1000 / (0.95 + 25.71 + 40.72) = 1000 / 67.38 = 14.84 tokens/s
Is this calculation correct? If not, what calculation method does the paper use? Looking forward to your reply!
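For reference, the relevant figures can also be pulled out of the `llama_print_timings` log programmatically. Below is a rough sketch that parses the two phase-timing lines (using the format shown in the log above; the regex is an assumption based on that output) and computes end-to-end throughput as total tokens over total time:

```python
import re

# Two timing lines in the format printed above (whitespace-padded columns).
log = """\
llama_print_timings: prompt eval time =    2031.23 ms /    79 tokens
llama_print_timings:        eval time =   10300.94 ms /   253 runs
"""

total_ms = 0.0
total_tokens = 0
for line in log.splitlines():
    # Capture "<time> ms / <count> tokens|runs" from each timing line.
    m = re.search(r"=\s*([\d.]+) ms /\s*(\d+) (?:tokens|runs)", line)
    if m:
        total_ms += float(m.group(1))
        total_tokens += int(m.group(2))

print(f"{total_tokens / (total_ms / 1000):.2f} tokens/s")  # -> 26.92 tokens/s
```

This sums per-phase wall-clock times and divides the total token count by them, rather than summing the per-token latencies of different phases.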