[Bug] Parallel GPU Memory Capacity unbalance #1563

ruifengma · 2024-05-09T02:12:54Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.

Describe the bug

I'm trying to deploy InternVL-Chat 1.5. When I use the official repo's (InternVL-Chat 1.5) controller-worker-gradio web server architecture to deploy the model, it takes around 32GB/46GB and 33GB/46GB on two A40 GPUs. But when I use lmdeploy, it takes 43G/46G and 32G/46G on the same two A40 GPUs.

Reproduction

CUDA_VISIBLE_DEVICES=0,1 lmdeploy serve api_server /home/models/InternVL-Chat-V1-5/ --model-name InternVL-Chat-V1-5 --server-port 23333 --tp 2

Environment

OS:CentOS 7
Python=3.10
CUDA=12.1
Torch=2.2.1 (via pip)

Error traceback

No response

ruifengma · 2024-05-09T02:16:28Z

This actually reduces the capacity for high concurrency, because the GPU sitting at 43G/46G can easily run out of memory (OOM) with more than 2 requests.

irexyc · 2024-05-09T02:47:32Z

Currently, the vision model is not load-balanced across the GPUs. We are working on this, and it should be fixed in the next release.

For now, you can change this value to 1 and see if it prevents OOM. Using a smaller value of cache-max-entry-count will also reduce memory usage.
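
To make the second suggestion concrete, here is a minimal sketch that rebuilds the command from the Reproduction section with a lower cache ratio. The helper name `build_serve_command` is hypothetical, and the flag spelling `--cache-max-entry-count` is assumed from the option name mentioned above; check `lmdeploy serve api_server --help` for your installed version.

```python
import shlex


def build_serve_command(model_path, model_name, port=23333, tp=2,
                        cache_max_entry_count=0.6):
    """Assemble the api_server command line as an argument list.

    Hypothetical helper for illustration only. It mirrors the command in
    the Reproduction section and appends the cache ratio option.
    """
    return [
        "lmdeploy", "serve", "api_server", model_path,
        "--model-name", model_name,
        "--server-port", str(port),
        "--tp", str(tp),
        # Assumed flag spelling, taken from the option name in this thread.
        "--cache-max-entry-count", str(cache_max_entry_count),
    ]


cmd = build_serve_command("/home/models/InternVL-Chat-V1-5/",
                          "InternVL-Chat-V1-5")
# Print a shell-ready line; the GPU selection matches the original report.
print("CUDA_VISIBLE_DEVICES=0,1 " + shlex.join(cmd))
```

The list form can be passed to `subprocess.run` directly if you prefer to launch the server from Python rather than paste the printed line into a shell.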

ruifengma · 2024-05-09T03:25:24Z

Thanks @irexyc for the advice. I lowered cache-max-entry-count from the default 0.8 to 0.6, and GPU memory usage dropped to 40G/46G and 29G/46G. Anyway, thanks for the excellent work and advice. When is the next version planned for release? Looking forward to that : )
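
As a rough reading of the numbers reported in this thread, the sketch below fits a straight line through the two measurements per GPU. It assumes memory usage grows linearly with cache-max-entry-count and that the constant term is everything that does not scale with the cache (weights and other fixed allocations). Both are assumptions made for illustration, not something lmdeploy documents here, and `fit_line` is a hypothetical helper.

```python
def fit_line(r1, m1, r2, m2):
    """Return (slope, intercept) of the line through (r1, m1) and (r2, m2)."""
    slope = (m1 - m2) / (r1 - r2)
    intercept = m1 - slope * r1
    return slope, intercept


# Reported usage in GB at cache-max-entry-count 0.8 and 0.6.
slope0, fixed0 = fit_line(0.8, 43, 0.6, 40)  # first GPU
slope1, fixed1 = fit_line(0.8, 32, 0.6, 29)  # second GPU

print(f"GPU 0: {slope0:.1f} GB per unit ratio, {fixed0:.1f} GB fixed")
print(f"GPU 1: {slope1:.1f} GB per unit ratio, {fixed1:.1f} GB fixed")
print(f"Fixed-cost gap between GPUs: {fixed0 - fixed1:.1f} GB")
```

Under these assumptions both GPUs give up the same amount per step of the ratio, while the first GPU carries about 11 GB more fixed cost. That matches the explanation that the vision model is not balanced across GPUs, and it suggests that lowering the cache ratio trades away cache capacity without closing the gap.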

lvhan028 assigned irexyc May 9, 2024

This was referenced May 13, 2024

[Feature] Support for LLaVA-NeXT Qwen1.5-110, Qwen1.5-72B, LLaMA3-8B #1583

Closed

[Bug] When deploying InternVL-Chat-V1-5 on multiple GPUs, OutOfMemory occurs even when there is enough GPU memory. #1555

Closed

[Help] With 4 T4 cards, which version can be used? Can it be used at all? #1542

Closed

irexyc mentioned this issue May 14, 2024

Balance vision model weights on multi gpus #1591

Merged

2 tasks

irexyc closed this as completed May 23, 2024
