
[Bug] Quantizing Llama 3 70B with AWQ on RTX 3090 runs out of GPU memory #1562

Closed
1 of 2 tasks
lg123666 opened this issue May 8, 2024 · 13 comments

Comments

@lg123666

lg123666 commented May 8, 2024

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

On an RTX 3090 with 24 GB of VRAM, quantizing Llama 3 70B with lmdeploy lite AWQ runs out of GPU memory at layer 79. As suggested, I added PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
nvidia-smi shows that only the first GPU is being used. Is it possible to use multiple GPUs?
[screenshot: out-of-memory error]

Reproduction

lmdeploy lite auto_awq /Meta-Llama-3-70B-Instruct --work-dir /Meta-Llama-3-70B-Instruct-4bit
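For reference, a minimal Python sketch of the reproduction above together with the PYTORCH_CUDA_ALLOC_CONF workaround mentioned in the description; it assumes the lmdeploy CLI is installed and on PATH and simply wraps the same command:

```python
import os
import subprocess

# Workaround from the description above: let PyTorch's CUDA caching
# allocator use expandable segments before launching the quantization job.
env = dict(os.environ, PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True")

# Same reproduction command as above, run as a subprocess.
subprocess.run(
    [
        "lmdeploy", "lite", "auto_awq",
        "/Meta-Llama-3-70B-Instruct",
        "--work-dir", "/Meta-Llama-3-70B-Instruct-4bit",
    ],
    env=env,
    check=True,
)
```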

Environment

sys.platform: linux
Python: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda-11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
GCC: gcc (GCC) 9.3.0
PyTorch: 2.2.2+cu118
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.8
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.4
    - Built with CuDNN 8.7
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

LMDeploy: 0.4.0+
transformers: 4.40.1
gradio: Not Found
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.2.0

Error traceback

No response

@AllentDan
Collaborator

You could try setting calib_seqlen to a smaller value.

@lg123666
Author

lg123666 commented May 9, 2024

You could try setting calib_seqlen to a smaller value.

Thanks a lot! lmdeploy lite auto_awq /Meta-Llama-3-70B-Instruct --work-dir /Meta-Llama-3-70B-Instruct-4bit --calib-seqlen 1024 completed successfully. By the way, roughly how much accuracy does AWQ W4A16 quantization lose? Does lmdeploy provide a tool to measure PPL directly?

@AllentDan
Collaborator

You can use OpenCompass: https://github.com/open-compass/opencompass/blob/main/docs/zh_cn/advanced_guides/evaluation_turbomind.md
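For reference, the linked guide drives the evaluation from a Python config file whose model entry looks roughly like the sketch below. The field names (engine_config, gen_config, etc.) follow that guide but may differ between OpenCompass versions, so treat it as an illustration rather than a verified config; the path points at the AWQ work dir from this thread.

```python
# Illustrative OpenCompass model entry for a turbomind-backed model,
# loosely following the linked guide; field names and values may need
# adjusting for your OpenCompass / lmdeploy versions.
from opencompass.models.turbomind import TurboMindModel

models = [
    dict(
        type=TurboMindModel,
        abbr='llama-3-70b-instruct-4bit-turbomind',
        path='/Meta-Llama-3-70B-Instruct-4bit',   # AWQ work dir from above
        engine_config=dict(session_len=2048,
                           max_batch_size=8,
                           model_format='awq'),   # assumed: tells turbomind the weights are AWQ
        gen_config=dict(top_k=1, top_p=0.8,
                        temperature=1.0,
                        max_new_tokens=100),
        max_out_len=100,
        max_seq_len=2048,
        batch_size=8,
        run_cfg=dict(num_gpus=1, num_procs=1),
    )
]
```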

@lvhan028
Collaborator

lvhan028 commented May 9, 2024

https://lmdeploy.readthedocs.io/en/latest/advance/long_context.html#perplexity
You can refer to the document above to compute PPL.
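For context, that page computes PPL through lmdeploy's pipeline API; a minimal sketch along those lines is below. The model path and sample text are placeholders, and it assumes the pipeline exposes get_ppl as described in that document.

```python
# Minimal PPL sketch following the linked lmdeploy document; the model path
# and sample text are placeholders, and pipe.get_ppl is assumed to behave
# as described there (token ids in, perplexity/loss values out).
from transformers import AutoTokenizer
from lmdeploy import TurbomindEngineConfig, pipeline

model_path = '/Meta-Llama-3-70B-Instruct-4bit'
pipe = pipeline(model_path,
                backend_config=TurbomindEngineConfig(model_format='awq'))
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

text = 'The quick brown fox jumps over the lazy dog.'
input_ids = tokenizer.encode(text)
print(pipe.get_ppl(input_ids))
```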

@lg123666
Author

lg123666 commented May 9, 2024

You could try setting calib_seqlen to a smaller value.

After AWQ quantization, the Llama 3 70B model outputs gibberish, while the 8B model's output is fine. What could be the cause?

[screenshot: 70B model output]

@AllentDan
Collaborator

The quantization most likely failed. lmdeploy's quantization currently cannot search for the clipping ratio; you could use auto_awq, or wait for the next lmdeploy release and try again.

@lvhan028
Collaborator

lvhan028 commented May 9, 2024

@AllentDan Based on your recent work, is the result normal when quantizing llama3-70b?

@AllentDan
Collaborator

AllentDan commented May 9, 2024

@AllentDan Based on your recent work, is the result normal when quantizing llama3-70b?

Earlier versions could already quantize llava3_llama70b, and chat worked fine. The weights may have changed after fine-tuning with xtuner. I haven't run scale search on a 70B model yet; it is very time-consuming.

@lg123666
Author

lg123666 commented May 9, 2024

The quantization most likely failed. lmdeploy's quantization currently cannot search for the clipping ratio; you could use auto_awq, or wait for the next lmdeploy release and try again.

I am already using auto_awq here.

@lg123666
Author

lg123666 commented May 9, 2024

@AllentDan Based on your recent work, is the result normal when quantizing llama3-70b?

Earlier versions could already quantize llava3_llama70b, and chat worked fine. The weights may have changed after fine-tuning with xtuner. I haven't run scale search on a 70B model yet; it is very time-consuming.

No fine-tuning was done; I am using the original weights. Is the scale used by auto_awq here pre-generated?

@AllentDan
Collaborator

I was referring to this project: https://github.com/casper-hansen/AutoAWQ
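For reference, quantizing with that project looks roughly like the sketch below, adapted from the AutoAWQ README; the paths and quantization settings are illustrative and have not been verified on the 70B model.

```python
# Rough AutoAWQ usage sketch adapted from that project's README; the paths
# and quant settings are illustrative, not values verified on 70B.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = '/Meta-Llama-3-70B-Instruct'
quant_path = '/Meta-Llama-3-70B-Instruct-autoawq-4bit'
quant_config = {'zero_point': True, 'q_group_size': 128, 'w_bit': 4, 'version': 'GEMM'}

# Load the FP16 model and its tokenizer, run AWQ calibration/quantization,
# then save the quantized weights alongside the tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```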


This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

@github-actions github-actions bot added the Stale label May 21, 2024

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) May 26, 2024