[Bug] Does 0.4.0 still not support Qwen1.5 110B? #1536

Open
starsliao opened this issue Apr 30, 2024 · 7 comments
Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

I'm running the Qwen1.5-110B model with version 0.4.0 and keep hitting out-of-memory errors; --quant-policy 4 doesn't help either. Is it just not supported yet?
With vLLM I can run Qwen1.5-110B in both FP16 and GPTQ-Int4 without any problem.
My machine has 8x V100 32G.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.32 GiB. GPU 7 has a total capacity of 31.73 GiB of which 848.44 MiB is free. Including non-PyTorch memory, this process has 30.90 GiB memory in use. Of the allocated memory 0 bytes is allocated by PyTorch, and 0 bytes is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Reproduction

lmdeploy serve api_server /Qwen1.5-110B-Chat --server-port 23333 --model-name qwen110b --tp 8 --log-level INFO --quant-policy 4

Environment

LMDeploy: 0.4.0+
transformers: 4.40.1
gradio: Not Found
fastapi: 0.110.3
pydantic: 2.7.1
triton: 2.2.0

Error traceback

No response

@lvhan028
Collaborator

It is supported.

@lvhan028
Collaborator

Please take a look at the notes on memory allocation in the pipeline.md documentation.
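
(For readers landing here, a minimal sketch of the knob that document describes: the TurboMind k/v cache budget is controlled by cache_max_entry_count, which can be lowered either through the --cache-max-entry-count flag of api_server or through TurbomindEngineConfig when using the Python pipeline. The concrete values below are illustrative assumptions, not a configuration verified on 8x V100 32G.)

```python
# Illustrative sketch only: reduce the k/v cache share so more GPU memory is
# left for the model weights. The ratio and session length are assumptions.
from lmdeploy import pipeline, TurbomindEngineConfig

engine_config = TurbomindEngineConfig(
    tp=8,                       # tensor parallelism over 8 GPUs
    cache_max_entry_count=0.2,  # fraction of GPU memory given to the k/v cache
    session_len=8192,           # maximum context length per session
)

pipe = pipeline('/Qwen1.5-110B-Chat', backend_config=engine_config)
print(pipe(['Hello!']))
```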

@starsliao
Author

> Please take a look at the notes on memory allocation in the pipeline.md documentation.

I adjusted the memory and token settings and ran the command below, but it still fails.

lmdeploy serve api_server /Qwen1.5-110B-Chat --server-port 23333 --model-name qwen110b --tp 8 --log-level INFO --cache-max-entry-count 0.01 --session-len 2000

With vLLM at 8K tokens, the following command runs fine:

python -m vllm.entrypoints.openai.api_server --served-model-name qwen110b --model /Qwen1.5-110B-Chat/ --dtype=float16 --tensor-parallel-size=8 --gpu-memory-utilization=0.99 --max-model-len=8000 --block-size=32

Does lmdeploy consume more memory?

@lzhangzz
Collaborator

lzhangzz commented May 1, 2024

At the moment the embedding table is not split across the TP ranks, and Qwen's vocabulary is especially large, so the impact is quite noticeable.

I'll look into adding that.
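
(A back-of-the-envelope estimate of why the replicated table hurts; the vocabulary and hidden sizes below are assumptions based on the published Qwen1.5 configuration, not numbers confirmed in this thread.)

```python
# Rough estimate of the memory taken by an unsharded vocabulary table.
# Assumed shapes for Qwen1.5-110B: vocab ~152k, hidden size 8192, fp16 weights.
vocab_size = 152_064
hidden_size = 8_192
bytes_per_param = 2  # fp16

embedding_gib = vocab_size * hidden_size * bytes_per_param / 1024**3
print(f"input embedding:     {embedding_gib:.2f} GiB")      # ~2.3 GiB
print(f"embedding + lm_head: {2 * embedding_gib:.2f} GiB")  # ~4.6 GiB

# Replicated on every rank, that is ~4.6 GiB on each 32 GiB V100; split across
# the 8 TP ranks it would be closer to 0.6 GiB per GPU.
```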

@starsliao
Author

> At the moment the embedding table is not split across the TP ranks, and Qwen's vocabulary is especially large, so the impact is quite noticeable.
>
> I'll look into adding that.

OK, thanks.

@starsliao
Author

> At the moment the embedding table is not split across the TP ranks, and Qwen's vocabulary is especially large, so the impact is quite noticeable.
>
> I'll look into adding that.

Folks, with 8x V100 32G, is there any hope of running Qwen1.5-110B? This framework's performance is really strong: running 72B it beats vLLM by a wide margin, it's fast, and it can fill the full 32K tokens. Really looking forward to it.

@lzhangzz
Collaborator

lzhangzz commented May 8, 2024

It will be supported, but not that quickly; probably about two weeks from now.
