Error when using Qwen-Audio together with VAD #1728

Open
zhangyucha0 opened this issue May 14, 2024 · 1 comment
@zhangyucha0

🐛 Bug

Running qwen-audio together with vad fails with the error below.

To Reproduce

  1. Run cmd python qwen_demo.py
  2. See error
2024-05-14 11:09:35,110 - modelscope - INFO - PyTorch version 2.3.0 Found.
2024-05-14 11:09:35,110 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-05-14 11:09:35,135 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 7f17021ca099dd6760d43c7a9e69c36a and a total number of 976 components indexed
Detect model requirements, begin to install it: /root/.cache/modelscope/hub/Qwen/Qwen-Audio/requirements.txt
install model requirements successfully
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Try importing flash-attention for faster inference...
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 13.09it/s]
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.
2024-05-14 11:09:42,213 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-05-14 11:09:42,213 - modelscope - INFO - Use user-specified model revision: master
ckpt: /root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
rtf_avg: 0.019: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.60it/s]
  0%|                                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/audio.py", line 91, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File "/root/miniconda3/envs/funasr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', 'tensor([-0.0001, -0.0002,  0.0007,  ...,  0.0000,  0.0000,  0.0000])', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "qwen_demo.py", line 18, in <module>
    res = model.generate(input=audio_in, prompt=prompt, batch_size_s=0,)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 248, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
    results = self.inference(
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 285, in inference
    res = model.inference(**batch, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/qwen_audio/model.py", line 66, in inference
    audio_info = self.tokenizer.process_audio(query)
  File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/tokenization_qwen.py", line 556, in process_audio
    audio = load_audio(audio_path)
  File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/audio.py", line 93, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
tensor([-0.0001, -0.0002,  0.0007,  ...,  0.0000,  0.0000,  0.0000]): No such file or directory

  0%|                                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
  0%|                                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
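Reading the trace, the proximate cause appears to be this: inference_with_vad cuts the input into waveform tensors and passes each one straight to Qwen-Audio's tokenizer, whose process_audio -> load_audio path treats its argument as a file name and shells out to ffmpeg. The tensor's string repr therefore lands on the ffmpeg command line as a nonexistent input file. Below is a minimal sketch of a tolerant loader; the load_audio_or_tensor name and the soundfile fallback are my assumptions, not part of the Qwen-Audio remote code.

import numpy as np
import soundfile as sf
import torch

def load_audio_or_tensor(audio, sr: int = 16000) -> np.ndarray:
    """Hypothetical shim: return a float32 waveform whether `audio` is the
    file path Qwen-Audio's loader expects or the tensor the VAD stage emits."""
    if isinstance(audio, torch.Tensor):
        # VAD segments are already in-memory 16 kHz float waveforms; no ffmpeg needed.
        return audio.detach().cpu().numpy().astype(np.float32)
    if isinstance(audio, np.ndarray):
        return audio.astype(np.float32)
    # Plain file path: read it directly (assumes a 16 kHz mono wav here; the
    # original load_audio invokes ffmpeg so it can decode and resample anything).
    wav, _ = sf.read(audio, dtype="float32")
    return wav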

Code sample

qwen_demo.py

#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright FunASR (https://github.com/alibaba-damo-academy/FunASR). All Rights Reserved.
#  MIT License  (https://opensource.org/licenses/MIT)

# To install requirements: pip3 install -U "funasr[llm]"

from funasr import AutoModel

model = AutoModel(
    model="Qwen-Audio",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_kwargs={"max_single_segment_time": 30000},
)

audio_in = "asr_example_zh.wav"
prompt = "<|startoftranscription|><|zh|><|transcribe|><|zh|><|notimestamps|><|wo_itn|>"

res = model.generate(input=audio_in, prompt=prompt, batch_size_s=0)
print(res)
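
Until the VAD integration is fixed, one workaround is to sidestep inference_with_vad entirely: run the VAD model as its own AutoModel, write each detected segment to a temporary wav, and hand Qwen-Audio real file paths. The sketch below assumes asr_example_zh.wav is 16 kHz mono and that the VAD result carries millisecond [start, end] pairs under the "value" key, as documented for FunASR's fsmn-vad.

import os
import tempfile

import soundfile as sf
from funasr import AutoModel

vad = AutoModel(model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch")
qwen = AutoModel(model="Qwen-Audio")

audio_in = "asr_example_zh.wav"
prompt = "<|startoftranscription|><|zh|><|transcribe|><|zh|><|notimestamps|><|wo_itn|>"

wav, sr = sf.read(audio_in)                           # assumed 16 kHz mono
segments = vad.generate(input=audio_in)[0]["value"]   # [[start_ms, end_ms], ...]

results = []
for start_ms, end_ms in segments:
    chunk = wav[int(start_ms * sr / 1000) : int(end_ms * sr / 1000)]
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        tmp_path = f.name
    sf.write(tmp_path, chunk, sr)                     # Qwen-Audio gets a real file path
    results.append(qwen.generate(input=tmp_path, prompt=prompt))
    os.remove(tmp_path)

print(results)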

Environment

  • OS: Ubuntu 20.04.6 LTS
  • FunASR Version: 1.0.26
  • ModelScope Version: 1.14.0
  • PyTorch Version: 2.3.0
  • How you installed funasr: pip
  • Python version: 3.8.19
  • GPU: 4090
  • CUDA/cuDNN version: cuda11.8
@zhangyucha0 zhangyucha0 added the bug Something isn't working label May 14, 2024
@LauraGPT LauraGPT self-assigned this May 15, 2024
@LauraGPT
Collaborator

Ongoing.
