
[Bug] Neither the pipeline API nor lmdeploy chat can generate results for llama3 and Chinese-LLaMA-Alpaca-3 (llama2 works fine) #1538

Open

zhanghui-china opened this issue May 1, 2024 · 27 comments
Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.

Describe the bug

start_chinese_llama3.py

import lmdeploy
from lmdeploy import pipeline
pipe = lmdeploy.pipeline("/home/zhanghui/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct", model_name='llama3')

response = pipe(['你是谁'])
print(response)

start_meta_llama3.py

from lmdeploy import pipeline

pipe = pipeline('/home/zhanghui/models/LLM-Research/Meta-Llama-3-8B-Instruct', model_name='llama3')

response = pipe([
    'Who are you?',
    'Hello!'
])

print(response)

start_chinese_llama2.py

import lmdeploy
from lmdeploy import pipeline
pipe = lmdeploy.pipeline("/mnt/d/models/chinese-alpaca-2-7b-hf", model_name='llama2')

response = pipe(['你是谁'])
print(response)

start_meta_llama2.py

import lmdeploy
from lmdeploy import pipeline
pipe = lmdeploy.pipeline("/mnt/d/models/meta-llama/Llama-2-7b-chat-hf", model_name='llama2')

response = pipe(['你是谁'])
print(response)

Reproduction

llama3:
python start_chinese_llama3.py

[screenshot]

python start_meta_llama3.py

[screenshot]

llama2:
python start_chinese_llama2.py

[screenshot]

python start_meta_llama2.py

[screenshot]

Environment

(lmdeploy0430) zhanghui@zhanghui:~/Chinese-LLaMA-Alpaca-3$  lmdeploy check_env
sys.platform: linux
Python: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.7  (built against CUDA 11.8)
    - Built with CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

LMDeploy: 0.4.0+1e8d7b8
transformers: 4.38.2
gradio: Not Found
fastapi: 0.110.3
pydantic: 2.7.1
triton: 2.1.0

Error traceback

No response

@zhanghui-china changed the title from "[Bug] Neither the pipeline API nor lmdeploy chat can generate results for llama3 and Chinese-LLaMA-Alpaca-2 (llama2 works fine)" to "[Bug] Neither the pipeline API nor lmdeploy chat can generate results for llama3 and Chinese-LLaMA-Alpaca-3 (llama2 works fine)" on May 1, 2024
@lvhan028 (Collaborator) commented May 1, 2024

Have you tried the official llama3-8b-instruct?

@zhanghui-china (Author)

> Have you tried the official llama3-8b-instruct?

What I downloaded should be the official one from modelscope: https://www.modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary

This link is the officially verified one.

@zhanghui-china (Author)

It has 50,000 downloads!
[screenshot]

@lvhan028 (Collaborator) commented May 3, 2024

I cannot reproduce your issue

@lvhan028 (Collaborator) commented May 3, 2024

Please pass log_level=INFO to the pipeline and run it again. Let's check what the log says.
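
For reference, a minimal sketch of the suggested run, assuming the log_level keyword of pipeline() in lmdeploy 0.4.x and reusing the model path from the scripts above:

from lmdeploy import pipeline

# Same repro as start_meta_llama3.py, with INFO-level engine logging enabled
# so the engine prints its block-size/block-count and request logs.
pipe = pipeline('/home/zhanghui/models/LLM-Research/Meta-Llama-3-8B-Instruct',
                model_name='llama3',
                log_level='INFO')

response = pipe(['Who are you?'])
print(response)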

@zhanghui-china (Author) commented May 4, 2024

I downloaded the weight files from the huggingface website.

import os

os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

command_str = 'huggingface-cli download --token hf_Bxxx --resume-download meta-llama/Meta-Llama-3-8B-Instruct --local-dir ' + os.environ.get('HOME') + '/models/meta-llama/Meta-Llama-3-8B-Instruct1'

os.system(command_str)

The download result is as follows:
[screenshots]

Enter the lmdeploy 0.4.0 environment:
conda deactivate
conda activate lmdeploy040
lmdeploy chat /home/zhanghui/models/meta-llama/Meta-Llama-3-8B-Instruct1 --model-name llama3

[screenshots]

@zhanghui-china (Author) commented May 4, 2024

Modified start_meta_llama3.py:

from lmdeploy import pipeline

#pipe = pipeline('/home/zhanghui/models/LLM-Research/Meta-Llama-3-8B-Instruct', model_name='llama3')
pipe = pipeline('/home/zhanghui/models/meta-llama/Meta-Llama-3-8B-Instruct1', model_name='llama3')

response = pipe([
    'Who are you?',
    'Hello!'
])

print(response)

@zhanghui-china (Author)

[screenshot]

@zhanghui-china (Author)

[screenshot]
After adding the environment variable, it seems to behave the same.

@zhanghui-china (Author)

> I cannot reproduce your issue

I am on a WSL environment. Personally, I suspect that lmdeploy's conversion has a problem under WSL.

@lvhan028 (Collaborator) commented May 6, 2024

@irexyc may follow up on this issue

@lvhan028 (Collaborator) commented May 6, 2024

@lzhangzz FYI

@liaoduoduo

Hi, I have run into this problem too. Has it been solved?

@lvhan028 (Collaborator) commented May 9, 2024

> Hi, I have run into this problem too. Has it been solved?

Are you also using a WSL environment?

@liaoduoduo

> Are you also using a WSL environment?

I am using a Windows environment.

@lvhan028 (Collaborator) commented May 9, 2024

On bare metal?

@liaoduoduo

> On bare metal?

Yes, running directly on bare metal. llama2 works fine. The GPU is a 4080.

@lvhan028 (Collaborator) commented May 9, 2024

Then this does not feel like the problem we were discussing; the cause may lie elsewhere. @irexyc @lzhangzz
Could you please set log-level to INFO and post the resulting logs?
We do not have a Windows machine with a 24G GPU on hand, so debugging this problem is rather tricky.

@liaoduoduo

> Could you please set log-level to INFO and post the resulting logs?

OK. Below are my logs; the inference output is empty.
[screenshot: PixPin_2024-05-09_13-12-12]
[screenshot: PixPin_2024-05-09_13-12-42]

@liaoduoduo

> Could you please set log-level to INFO and post the resulting logs?

When I load a llama2 model, by contrast, inference works normally.
[screenshot: PixPin_2024-05-09_13-17-21]

@liaoduoduo

Environment
sys.platform: win32
Python: 3.10.14 | packaged by Anaconda, Inc. | (main, Mar 21 2024, 16:20:14) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: n/a
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:

  • C++ Version: 201703
  • MSVC 192930151
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  • OpenMP 2019
  • LAPACK is enabled (usually provided by MKL)
  • CPU capability usage: AVX2
  • CUDA Runtime 12.1
  • NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.8.1 (built against CUDA 12.0)
  • Magma 2.5.4
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.8.1, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.17.2+cu121
LMDeploy: 0.4.1+unknown
transformers: 4.40.0
gradio: Not Found
fastapi: 0.110.3
pydantic: 2.7.1
triton: Not Found

@lvhan028 (Collaborator) commented May 9, 2024

> OK. Below are my logs; the inference output is empty.

The log shows "invalid infer request" with code 6, which means "too long".
Is the log complete? There should also be some lines about block size and block count.
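
If the engine can only allocate a few KV-cache blocks, a request may be rejected as "too long" instead of producing output. A minimal sketch of lowering the memory footprint from the Python side, assuming the session_len and cache_max_entry_count fields of TurbomindEngineConfig in lmdeploy 0.4.x (the path is a placeholder and the values are illustrative):

from lmdeploy import pipeline, TurbomindEngineConfig

# Reserve a smaller share of VRAM for the KV cache and cap the context
# length so the engine can fit enough blocks on a small GPU.
backend_config = TurbomindEngineConfig(
    session_len=4096,           # maximum tokens per session
    cache_max_entry_count=0.4,  # fraction of GPU memory for the KV cache
)
pipe = pipeline('/path/to/Meta-Llama-3-8B-Instruct',  # placeholder path
                model_name='llama3',
                backend_config=backend_config,
                log_level='INFO')
print(pipe(['Who are you?']))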

@liaoduoduo

> The log shows "invalid infer request" with code 6, which means "too long". Is the log complete? There should also be some lines about block size and block count.

Here is the log from just now:
[screenshot: PixPin_2024-05-09_15-08-36]

It was probably caused by insufficient GPU memory. After switching to a V100 with 32G of memory for testing, inference works normally.

@lvhan028 (Collaborator) commented May 9, 2024

@zhanghui-china Could you check whether it is a similar problem on your side?

@lvhan028 (Collaborator) commented May 9, 2024

@liaoduoduo How much memory does your 4080 GPU have?

@liaoduoduo

@lvhan028 It is 16GB.

@lvhan028 (Collaborator) commented May 9, 2024

16G is not enough. By rights, LMDeploy should report an OOM error.
@irexyc lmdeploy always feels a bit rough around the edges on Windows; please follow up.
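
As a rough sanity check, the fp16 weights of an 8B-parameter model alone nearly fill a 16 GB card before a single KV-cache block is allocated; a minimal sketch of the arithmetic:

# Back-of-envelope VRAM estimate for Llama-3-8B in half precision.
params = 8e9          # ~8 billion parameters
bytes_per_param = 2   # fp16 / bf16
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~14.9 GB of a 16 GB card
# Almost nothing is left for KV-cache blocks, which is consistent with the
# "invalid infer request ... too long" rejection seen in the logs above.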
