
Deploying the CodeQwen1.5 model with turbomind degrades inference quality #1580

Open
Lanyu123 opened this issue May 10, 2024 · 5 comments
Lanyu123 commented May 10, 2024

Local environment:
cuda: 11.8
python: 3.11
transformers: 4.39.3
lmdeploy: 0.4.0
torch: 2.2.1


Problem description:
I launched a server for the CodeQwen1.5-7B model with the command below. During testing, I found that the replies it generates are visibly worse than those produced by loading the model locally with huggingface transformers for inference: the generated replies frequently fall into repetitive loops.
Launch command:
lmdeploy serve api_server ./CodeQwen1.5-7B/ --server-name 0.0.0.0 --server-port 8001 --session-len 4096 --max-batch-size 10 --tp 2


A comparative example follows:
Input:

prompt = """
def bubble_sort(nums):
    #获取数组的长度"""

Method 1: load the model locally with huggingface transformers and run inference
Result:

    length = len(nums)
    # 遍历数组的长度
    for i in range(length-1):
        # 遍历当前元素到最后一个
        for j in range(length-i-1):
            # 如果当前元素大于后面的元素, 进行交换
            if nums[j] > nums[j+1]:
                nums[j], nums[j+1] = nums[j+1], nums[j]
    return nums

arr = [6, 7, 2, 9, 11, 1, 8]
bubble_sort(arr)
print(arr)
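For reference, the transformers-generated completion above is a correct bubble sort. A quick sanity check (hypothetical test code, not part of the issue) confirms it sorts in place:

```python
def bubble_sort(nums):
    # length of the array (as in the transformers completion above)
    length = len(nums)
    for i in range(length - 1):
        for j in range(length - i - 1):
            # swap adjacent elements that are out of order
            if nums[j] > nums[j + 1]:
                nums[j], nums[j + 1] = nums[j + 1], nums[j]
    return nums

arr = [6, 7, 2, 9, 11, 1, 8]
bubble_sort(arr)
assert arr == [1, 2, 6, 7, 8, 9, 11]  # sorted in place
```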

Method 2: deploy the server with the lmdeploy serve api_server command and call it with APIClient for inference, with the following inference parameters:

for item in api_client.completions_v1(
    prompt=prompt,
    model="qwen",
    max_tokens=128,
    temperature=0.8,
    top_p=0.85,
    top_k=1,
    stream=False,
    session_id=-1,
    ignore_eos=False,
    n=1,
):
    ........

Result:

    length = len(nums)
    # 遍历数组
    for i in range(length):
        #遍历数组
        for j in range(length):
            # 获取数组
            if j < length:
                 # 获取数组
                 if j < length:
                     #获取数组
                 if j < length:
                     #获取数组
                 if j < length:
                     #获取数组
.......

I also tested the codellama-7b model in the same environment, with the same command and the same calling method, and it does not fall into repetitive loops the way CodeQwen1.5-7b does. I ran multiple examples, and the results were mostly similar to the above.
That is the problem description. Looking forward to a reply, thanks~

@lzhangzz
Collaborator

lzhangzz commented May 11, 2024

Could you try it without TP?

@Lanyu123
Author

Could you try it without TP?

The same repetition problem occurs on a single GPU; the inference result is the same as the output of method 2.

@lzhangzz
Collaborator

How about aligning the sampling parameters with the official ones? There is no need to set top_k:

  "repetition_penalty": 1.0,
  "temperature": 1.0,
  "top_p": 0.95,
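As an aside on why dropping top_k matters: with top_k=1 (as in the original call), sampling collapses to greedy decoding regardless of temperature or top_p, which makes repetitive loops fully deterministic once they start. A toy sketch of how these knobs interact (not lmdeploy's actual implementation; the helper name is hypothetical):

```python
import math
import random

def sample_next(logits, temperature=1.0, top_k=0, top_p=1.0, rng=None):
    """Toy next-token sampler: temperature scaling, then top-k cut,
    then nucleus (top-p) cut, then sampling from the kept mass."""
    rng = rng or random.Random(0)
    # temperature scaling followed by softmax
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(
        ((i, e / total) for i, e in enumerate(exps)),
        key=lambda p: p[1], reverse=True,
    )
    # top-k: keep only the k most likely tokens (0 = disabled)
    if top_k > 0:
        probs = probs[:top_k]
    # top-p: keep the smallest prefix whose cumulative mass reaches top_p
    kept, mass = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # renormalise the kept mass and sample from it
    z = sum(p for _, p in kept)
    r = rng.random() * z
    for tok, p in kept:
        r -= p
        if r <= 0:
            return tok
    return kept[-1][0]

logits = [2.0, 1.0, 0.5, -1.0]
# top_k=1 always picks the argmax, whatever temperature/top_p are
assert all(sample_next(logits, temperature=0.8, top_k=1) == 0 for _ in range(20))
```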

@Lanyu123
Author

Lanyu123 commented May 13, 2024

Still no luck. I tried the parameters above and got the following code:

    n = len(nums)
    for i in range(n):
        # 从前往n, 每次减少
        for j in range(n):
            if nums[i] < nums[n]:
            num[i] < num[n]
            # 交换
            t = nums[i]
            nums[i] = num[i]
# 获取数组
# [i]
[n] = num[i]
#
# [i] < [i]
n= [i]
[n] = 

As you can see, the code logic is incoherent and essentially unusable.
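When comparing backends across many prompts, the looping behaviour seen above can be quantified with a simple duplicate-line ratio (a hypothetical helper, not part of lmdeploy):

```python
from collections import Counter

def repetition_ratio(text: str) -> float:
    """Fraction of non-empty lines that duplicate an earlier line.

    A rough heuristic for degenerate loops: a healthy completion
    stays near 0, while looping output climbs towards 1.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    counts = Counter(lines)
    duplicates = sum(c - 1 for c in counts.values())
    return duplicates / len(lines)

# a pattern like the broken completion above scores high
looping = "\n".join(["if j < length:", "#获取数组"] * 4)
assert repetition_ratio(looping) == 0.75
```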

@lvhan028 lvhan028 self-assigned this May 13, 2024
@lvhan028
Collaborator

Sorry, we currently don't have the bandwidth to follow up on this issue; we are logging it for now. We will come back to it after the sprint for the next release.
