[Feature] When evaluating with vLLM, how can I achieve HF-style multi-GPU data parallelism? #1002
Comments
@liushz Thank you for your response; I appreciate the clarification. However, the parameter in your reply sets tensor parallelism in vLLM. My intention is to load a full copy of the model onto each of the eight GPUs and distribute tasks across them in parallel, which should in theory yield roughly an eightfold speedup in evaluation.
hi @liushz, I also want to know how to achieve data parallelism in vLLM when evaluating.
Please try
@tonysy Could you possibly offer a quick example? I'm quite unsure how to use it. Many thanks for your assistance.
@IcyFeather233 Thank you 😂. I understand that tensor_parallel_size can be set to the number of GPUs (2, 4, 8) to get sharded model parallelism. What I meant is keeping tensor_parallel_size at 1 while every GPU loads a full copy of the model, then using data parallelism so the different data of a single task are evaluated simultaneously. I recently got this working using NumWorkerPartitioner; the key parameter configuration is below for anyone who needs it. @darrenglow. Thanks as well to @tonysy; it would be great if this could be added to the documentation soon.
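The exact configuration referred to in this comment did not survive extraction. As a reference, a sketch of what a NumWorkerPartitioner-based OpenCompass setup for 8-way data parallelism typically looks like is given below; the model `abbr` and `path` are placeholders, and class names should be checked against your installed `opencompass` version:

```python
from opencompass.models import VLLM
from opencompass.partitioners import NumWorkerPartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

models = [
    dict(
        type=VLLM,
        abbr='my-model-vllm',           # placeholder name
        path='path/to/model',           # placeholder path
        model_kwargs=dict(tensor_parallel_size=1),  # no model sharding
        max_out_len=100,
        batch_size=32,
        run_cfg=dict(num_gpus=1, num_procs=1),      # one GPU per worker
    )
]

# Split each dataset into 8 shards and run them on 8 workers in parallel,
# so each GPU holds a full model copy and evaluates its own shard.
infer = dict(
    partitioner=dict(type=NumWorkerPartitioner, num_worker=8),
    runner=dict(
        type=LocalRunner,
        max_num_workers=8,
        task=dict(type=OpenICLInferTask),
    ),
)
```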
@noforit This is how I configured it, but still only one GPU is running. Could you help me figure out why?
@IcyFeather233 I know what you mean,
@Zbaoli Comparing your parameters with mine, yours is missing a
@noforit Thanks for your reply, but in the models configuration I did add
@Zbaoli Strange 😂. What about adding CUDA_VISIBLE_DEVICES before launching the program?
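For reference, exposing all eight GPUs to the process as suggested here would look something like the following; the entry script and config path are placeholders for your own setup:

```shell
# Make GPUs 0-7 visible to the evaluation process before it starts
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python run.py configs/eval_my_config.py
```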
After using NumWorkerPartitioner here, the dataset is split into 8 parts, but the final summary cannot aggregate the metric results of the split shards back together. Do you run into this as well?
A question: doesn't the SizePartitioner provided by opencompass already split the dataset? Or is NumWorkerPartitioner's partitioning somehow more efficient?
SizePartitioner and NumWorkerPartitioner are two different splitting strategies: one splits by a given chunk size, the other splits by the number of workers (GPUs).
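The distinction between the two strategies can be sketched in plain Python. This is illustrative only; the real OpenCompass partitioners operate on task configs and handle bookkeeping beyond simple index slicing:

```python
def size_partition(samples, max_size):
    """Split by a fixed chunk size, in the spirit of SizePartitioner:
    the chunk size is fixed, so the number of chunks depends on the data."""
    return [samples[i:i + max_size] for i in range(0, len(samples), max_size)]


def num_worker_partition(samples, num_worker):
    """Split into num_worker roughly equal chunks, in the spirit of
    NumWorkerPartitioner: the chunk count is fixed to match the GPU count."""
    base, extra = divmod(len(samples), num_worker)
    chunks, start = [], 0
    for i in range(num_worker):
        end = start + base + (1 if i < extra else 0)
        chunks.append(samples[start:end])
        start = end
    return chunks


data = list(range(10))
print(size_partition(data, 4))        # → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(num_worker_partition(data, 8))  # → 8 chunks, one per worker/GPU
```

With a fixed worker count, every GPU gets exactly one shard, which is why NumWorkerPartitioner maps cleanly onto "one full model copy per GPU" data parallelism.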
Describe the feature
When evaluating, my model type is vllm, with the following parameters:
But GPU usage shows only one GPU being used for the evaluation task.
I would like the task to be split into several parts and evaluated on 8 GPUs in parallel. Could this feature be added, or is it already possible? An explanation would be much appreciated. Many thanks!
By comparison, if I set the model type to HF, this effect is achieved automatically.
Are you willing to implement this feature yourself?