Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
Describe the bug
I'm trying to deploy InternVL-Chat 1.5. When I use the official repo's (InternVL-Chat 1.5) controller-worker-gradio web-server architecture to deploy the model, it takes around 32 GB/46 GB and 33 GB/46 GB on two A40 GPUs. But when I use lmdeploy, it takes 43 GB/46 GB and 32 GB/46 GB on the same two A40 GPUs.
Currently the vision model is not load-balanced across GPUs; we are working on this, and it should be fixed in the next release.
For now you can change this value to 1 and see if that prevents OOM. Using a smaller value of cache-max-entry-count will also reduce memory usage.
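Until that fix lands, the memory pressure can be reduced from the CLI. A minimal sketch reusing the reproduction command from this issue; the 0.5 value is only an illustration, not a recommendation (the default is 0.8):

```shell
# Illustrative: shrink the k/v cache's share of free GPU memory.
# 0.5 is an example value; the default for --cache-max-entry-count is 0.8.
CUDA_VISIBLE_DEVICES=0,1 lmdeploy serve api_server /home/models/InternVL-Chat-V1-5/ \
    --model-name InternVL-Chat-V1-5 \
    --server-port 23333 \
    --tp 2 \
    --cache-max-entry-count 0.5
```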
Thanks @irexyc for the advice. I lowered cache-max-entry-count from the default 0.8 to 0.6, and the corresponding GPU memory usage dropped to 40G/46G and 29G/46G. Anyway, thanks for the excellent work and advice. When do you plan to release the next version? Looking forward to it : )
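For intuition on why this helps: in lmdeploy's TurboMind engine, cache-max-entry-count is the ratio of free GPU memory (after the weights are loaded) handed to the k/v cache. A rough back-of-the-envelope sketch; the 15 GB free-memory figure is an assumption for illustration, not a measurement from the A40s above:

```python
def kv_cache_budget(free_gb: float, cache_max_entry_count: float) -> float:
    """GB of currently free GPU memory reserved for the k/v cache."""
    return free_gb * cache_max_entry_count

# Suppose ~15 GB is free on GPU 0 after loading weights (illustrative number).
free_after_weights = 15.0
print(kv_cache_budget(free_after_weights, 0.8))  # default -> 12.0
print(kv_cache_budget(free_after_weights, 0.6))  # lowered -> 9.0
# Dropping 0.8 -> 0.6 releases free_gb * 0.2, about 3 GB here, which matches
# the ~3 GB reduction (43G -> 40G) reported in the thread.
```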
Reproduction
CUDA_VISIBLE_DEVICES=0,1 lmdeploy serve api_server /home/models/InternVL-Chat-V1-5/ --model-name InternVL-Chat-V1-5 --server-port 23333 --tp 2
Environment
Error traceback
No response