
[Bug] Neither the pipeline API nor lmdeploy chat can generate results for llama3 and Chinese-LLaMA-Alpaca-3 (llama2 works fine) #1538

Open

zhanghui-china opened this issue May 1, 2024 · 27 comments
Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.

Describe the bug

start_chinese_llama3.py

import lmdeploy
from lmdeploy import pipeline
pipe = lmdeploy.pipeline("/home/zhanghui/models/ChineseAlpacaGroup/llama-3-chinese-8b-instruct", model_name='llama3')

response = pipe(['你是谁'])
print(response)

start_meta_llama3.py

from lmdeploy import pipeline

pipe = pipeline('/home/zhanghui/models/LLM-Research/Meta-Llama-3-8B-Instruct', model_name='llama3')

response = pipe([
    'Who are you?',
    'Hello!'
])

print(response)

start_chinese_llama2.py

import lmdeploy
from lmdeploy import pipeline
pipe = lmdeploy.pipeline("/mnt/d/models/chinese-alpaca-2-7b-hf", model_name='llama2')

response = pipe(['你是谁'])
print(response)

start_meta_llama2.py

import lmdeploy
from lmdeploy import pipeline
pipe = lmdeploy.pipeline("/mnt/d/models/meta-llama/Llama-2-7b-chat-hf", model_name='llama2')

response = pipe(['你是谁'])
print(response)

Reproduction

llama3:
python start_chinese_llama3.py

[screenshot]

python start_meta_llama3.py

[screenshot]

llama2:
python start_chinese_llama2.py

[screenshot]

python start_meta_llama2.py

[screenshot]

Environment

(lmdeploy0430) zhanghui@zhanghui:~/Chinese-LLaMA-Alpaca-3$  lmdeploy check_env
sys.platform: linux
Python: 3.10.14 (main, Mar 21 2024, 16:24:04) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 9.5.0-1ubuntu1~22.04) 9.5.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.7  (built against CUDA 11.8)
    - Built with CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

LMDeploy: 0.4.0+1e8d7b8
transformers: 4.38.2
gradio: Not Found
fastapi: 0.110.3
pydantic: 2.7.1
triton: 2.1.0

Error traceback

No response

@zhanghui-china changed the title from "[Bug] Neither the pipeline API nor lmdeploy chat can generate results for llama3 and Chinese-LLaMA-Alpaca-2 (llama2 works fine)" to "[Bug] Neither the pipeline API nor lmdeploy chat can generate results for llama3 and Chinese-LLaMA-Alpaca-3 (llama2 works fine)" on May 1, 2024
@lvhan028 (Collaborator) commented May 1, 2024

Have you tried the official llama3-8b-instruct?

@zhanghui-china (Author)

> Have you tried the official llama3-8b-instruct?

What I downloaded should be the official one from modelscope: https://www.modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary

This link is the officially verified one.

@zhanghui-china (Author)

It has 50,000 downloads!
[screenshot]

@lvhan028 (Collaborator) commented May 3, 2024

I cannot reproduce your issue

@lvhan028 (Collaborator) commented May 3, 2024

Please pass log_level=INFO to the pipeline and run it again. Let's check what the log says.
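
For reference, a minimal sketch of the suggested run, assuming the log_level keyword of pipeline() in lmdeploy 0.4.x and reusing the model path from the scripts above:

from lmdeploy import pipeline

# Same repro as start_meta_llama3.py, with INFO-level engine logging enabled
# so the engine prints its block-size/block-count and request logs.
pipe = pipeline('/home/zhanghui/models/LLM-Research/Meta-Llama-3-8B-Instruct',
                model_name='llama3',
                log_level='INFO')

response = pipe(['Who are you?'])
print(response)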

@zhanghui-china (Author) commented May 4, 2024

I downloaded the weight files from the huggingface website.

import os

os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

command_str = 'huggingface-cli download --token hf_Bxxx --resume-download meta-llama/Meta-Llama-3-8B-Instruct --local-dir ' + os.environ.get('HOME') + '/models/meta-llama/Meta-Llama-3-8B-Instruct1'

os.system(command_str)

The download result is as follows:
[screenshots]

Enter the lmdeploy 0.4.0 environment:
conda deactivate
conda activate lmdeploy040
lmdeploy chat /home/zhanghui/models/meta-llama/Meta-Llama-3-8B-Instruct1 --model-name llama3

[screenshots]

@zhanghui-china (Author) commented May 4, 2024

Modified start_meta_llama3.py:

from lmdeploy import pipeline

#pipe = pipeline('/home/zhanghui/models/LLM-Research/Meta-Llama-3-8B-Instruct', model_name='llama3')
pipe = pipeline('/home/zhanghui/models/meta-llama/Meta-Llama-3-8B-Instruct1', model_name='llama3')

response = pipe([
    'Who are you?',
    'Hello!'
])

print(response)

@zhanghui-china (Author)

[screenshot]

@zhanghui-china (Author)

[screenshot]
After adding the environment variable, it seems to behave the same.

@zhanghui-china (Author)

> I cannot reproduce your issue

I am on a WSL environment. Personally, I suspect that lmdeploy's conversion has a problem under WSL.

@lvhan028 (Collaborator) commented May 6, 2024

@irexyc may follow up on this issue

@lvhan028 (Collaborator) commented May 6, 2024

@lzhangzz FYI

@liaoduoduo

Hi, I have run into this problem too. Has it been solved?

@lvhan028 (Collaborator) commented May 9, 2024

> Hi, I have run into this problem too. Has it been solved?

Are you also using a WSL environment?

@liaoduoduo

> Are you also using a WSL environment?

I am using a Windows environment.

@lvhan028 (Collaborator) commented May 9, 2024

On bare metal?

@liaoduoduo

> On bare metal?

Yes, running directly on bare metal. llama2 works fine. The GPU is a 4080.

@lvhan028 (Collaborator) commented May 9, 2024

Then this does not feel like the problem we were discussing; the cause may lie elsewhere. @irexyc @lzhangzz
Could you please set log-level to INFO and post the resulting logs?
We do not have a Windows machine with a 24G GPU on hand, so debugging this problem is rather tricky.

@liaoduoduo

> Could you please set log-level to INFO and post the resulting logs?

OK. Below are my logs; the inference output is empty.
[screenshot: PixPin_2024-05-09_13-12-12]
[screenshot: PixPin_2024-05-09_13-12-42]

@liaoduoduo

> Could you please set log-level to INFO and post the resulting logs?

When I load a llama2 model, by contrast, inference works normally.
[screenshot: PixPin_2024-05-09_13-17-21]

@liaoduoduo

Environment
sys.platform: win32
Python: 3.10.14 | packaged by Anaconda, Inc. | (main, Mar 21 2024, 16:20:14) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 4080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
NVCC: Cuda compilation tools, release 12.4, V12.4.131
GCC: n/a
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:

  • C++ Version: 201703
  • MSVC 192930151
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  • OpenMP 2019
  • LAPACK is enabled (usually provided by MKL)
  • CPU capability usage: AVX2
  • CUDA Runtime 12.1
  • NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  • CuDNN 8.8.1 (built against CUDA 12.0)
  • Magma 2.5.4
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.8.1, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

TorchVision: 0.17.2+cu121
LMDeploy: 0.4.1+unknown
transformers: 4.40.0
gradio: Not Found
fastapi: 0.110.3
pydantic: 2.7.1
triton: Not Found

@lvhan028 (Collaborator) commented May 9, 2024

> OK. Below are my logs; the inference output is empty.

The log shows "invalid infer request" with code 6, which means "too long".
Is the log complete? There should also be some lines about block size and block count.
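
If the engine can only allocate a few KV-cache blocks, a request may be rejected as "too long" instead of producing output. A minimal sketch of lowering the memory footprint from the Python side, assuming the session_len and cache_max_entry_count fields of TurbomindEngineConfig in lmdeploy 0.4.x (the path is a placeholder and the values are illustrative):

from lmdeploy import pipeline, TurbomindEngineConfig

# Reserve a smaller share of VRAM for the KV cache and cap the context
# length so the engine can fit enough blocks on a small GPU.
backend_config = TurbomindEngineConfig(
    session_len=4096,           # maximum tokens per session
    cache_max_entry_count=0.4,  # fraction of GPU memory for the KV cache
)
pipe = pipeline('/path/to/Meta-Llama-3-8B-Instruct',  # placeholder path
                model_name='llama3',
                backend_config=backend_config,
                log_level='INFO')
print(pipe(['Who are you?']))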

@liaoduoduo

> The log shows "invalid infer request" with code 6, which means "too long". Is the log complete? There should also be some lines about block size and block count.

Here is the log from just now:
[screenshot: PixPin_2024-05-09_15-08-36]

It was probably caused by insufficient GPU memory. After switching to a V100 with 32G of memory for testing, inference works normally.

@lvhan028 (Collaborator) commented May 9, 2024

@zhanghui-china Could you check whether it is a similar problem on your side?

@lvhan028 (Collaborator) commented May 9, 2024

@liaoduoduo How much memory does your 4080 GPU have?

@liaoduoduo

@lvhan028 It is 16GB.

@lvhan028 (Collaborator) commented May 9, 2024

16G is not enough. By rights, LMDeploy should report an OOM error.
@irexyc lmdeploy always feels a bit rough around the edges on Windows; please follow up.
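
As a rough sanity check, the fp16 weights of an 8B-parameter model alone nearly fill a 16 GB card before a single KV-cache block is allocated; a minimal sketch of the arithmetic:

# Back-of-envelope VRAM estimate for Llama-3-8B in half precision.
params = 8e9          # ~8 billion parameters
bytes_per_param = 2   # fp16 / bf16
weights_gb = params * bytes_per_param / 1024**3
print(f"weights alone: ~{weights_gb:.1f} GB")  # ~14.9 GB of a 16 GB card
# Almost nothing is left for KV-cache blocks, which is consistent with the
# "invalid infer request ... too long" rejection seen in the logs above.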
