Error when using Qwen-Audio together with VAD #1728

Open
zhangyucha0 opened this issue May 14, 2024 · 1 comment
@zhangyucha0

🐛 Bug

Running qwen-audio together with vad fails with the error below.

To Reproduce

  1. Run cmd python qwen_demo.py
  2. See error
2024-05-14 11:09:35,110 - modelscope - INFO - PyTorch version 2.3.0 Found.
2024-05-14 11:09:35,110 - modelscope - INFO - Loading ast index from /root/.cache/modelscope/ast_indexer
2024-05-14 11:09:35,135 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 7f17021ca099dd6760d43c7a9e69c36a and a total number of 976 components indexed
Detect model requirements, begin to install it: /root/.cache/modelscope/hub/Qwen/Qwen-Audio/requirements.txt
install model requirements successfully
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Try importing flash-attention for faster inference...
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
WARNING:transformers_modules.Qwen-Audio.modeling_qwen:Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 13.09it/s]
audio_start_id: 155163, audio_end_id: 155164, audio_pad_id: 151851.
2024-05-14 11:09:42,213 - modelscope - WARNING - Using the master branch is fragile, please use it with caution!
2024-05-14 11:09:42,213 - modelscope - INFO - Use user-specified model revision: master
ckpt: /root/.cache/modelscope/hub/iic/speech_fsmn_vad_zh-cn-16k-common-pytorch/model.pt
rtf_avg: 0.019: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.60it/s]
  0%|                                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/audio.py", line 91, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
  File "/root/miniconda3/envs/funasr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', 'tensor([-0.0001, -0.0002,  0.0007,  ...,  0.0000,  0.0000,  0.0000])', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "qwen_demo.py", line 18, in <module>
    res = model.generate(input=audio_in, prompt=prompt, batch_size_s=0,)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 248, in generate
    return self.inference_with_vad(input, input_len=input_len, **cfg)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 394, in inference_with_vad
    results = self.inference(
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/auto/auto_model.py", line 285, in inference
    res = model.inference(**batch, **kwargs)
  File "/root/miniconda3/envs/funasr/lib/python3.8/site-packages/funasr/models/qwen_audio/model.py", line 66, in inference
    audio_info = self.tokenizer.process_audio(query)
  File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/tokenization_qwen.py", line 556, in process_audio
    audio = load_audio(audio_path)
  File "/root/.cache/huggingface/modules/transformers_modules/Qwen-Audio/audio.py", line 93, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 4.2.7-0ubuntu0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 9 (Ubuntu 9.4.0-1ubuntu1~20.04.1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 31.100 / 56. 31.100
  libavcodec     58. 54.100 / 58. 54.100
  libavformat    58. 29.100 / 58. 29.100
  libavdevice    58.  8.100 / 58.  8.100
  libavfilter     7. 57.100 /  7. 57.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  5.100 /  5.  5.100
  libswresample   3.  5.100 /  3.  5.100
  libpostproc    55.  5.100 / 55.  5.100
tensor([-0.0001, -0.0002,  0.0007,  ...,  0.0000,  0.0000,  0.0000]): No such file or directory

  0%|                                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
  0%|                                                                                                                                                                               | 0/1 [00:00<?, ?it/s]
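Reading the trace, the proximate cause appears to be this: inference_with_vad cuts the input into waveform tensors and passes each one straight to Qwen-Audio's tokenizer, whose process_audio -> load_audio path treats its argument as a file name and shells out to ffmpeg. The tensor's string repr therefore lands on the ffmpeg command line as a nonexistent input file. Below is a minimal sketch of a tolerant loader; the load_audio_or_tensor name and the soundfile fallback are my assumptions, not part of the Qwen-Audio remote code.

import numpy as np
import soundfile as sf
import torch

def load_audio_or_tensor(audio, sr: int = 16000) -> np.ndarray:
    """Hypothetical shim: return a float32 waveform whether `audio` is the
    file path Qwen-Audio's loader expects or the tensor the VAD stage emits."""
    if isinstance(audio, torch.Tensor):
        # VAD segments are already in-memory 16 kHz float waveforms; no ffmpeg needed.
        return audio.detach().cpu().numpy().astype(np.float32)
    if isinstance(audio, np.ndarray):
        return audio.astype(np.float32)
    # Plain file path: read it directly (assumes a 16 kHz mono wav here; the
    # original load_audio invokes ffmpeg so it can decode and resample anything).
    wav, _ = sf.read(audio, dtype="float32")
    return wav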

Code sample

qwen_demo.py

#!/usr/bin/env python3
# -*- encoding: utf-8 -*-
# Copyright FunASR (https://github.com/alibaba-damo-academy/FunASR). All Rights Reserved.
#  MIT License  (https://opensource.org/licenses/MIT)

# To install requirements: pip3 install -U "funasr[llm]"

from funasr import AutoModel

model = AutoModel(
    model="Qwen-Audio",
    vad_model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch",
    vad_kwargs={"max_single_segment_time": 30000},
)

audio_in = "asr_example_zh.wav"
prompt = "<|startoftranscription|><|zh|><|transcribe|><|zh|><|notimestamps|><|wo_itn|>"

res = model.generate(input=audio_in, prompt=prompt, batch_size_s=0)
print(res)
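
Until the VAD integration is fixed, one workaround is to sidestep inference_with_vad entirely: run the VAD model as its own AutoModel, write each detected segment to a temporary wav, and hand Qwen-Audio real file paths. The sketch below assumes asr_example_zh.wav is 16 kHz mono and that the VAD result carries millisecond [start, end] pairs under the "value" key, as documented for FunASR's fsmn-vad.

import os
import tempfile

import soundfile as sf
from funasr import AutoModel

vad = AutoModel(model="iic/speech_fsmn_vad_zh-cn-16k-common-pytorch")
qwen = AutoModel(model="Qwen-Audio")

audio_in = "asr_example_zh.wav"
prompt = "<|startoftranscription|><|zh|><|transcribe|><|zh|><|notimestamps|><|wo_itn|>"

wav, sr = sf.read(audio_in)                           # assumed 16 kHz mono
segments = vad.generate(input=audio_in)[0]["value"]   # [[start_ms, end_ms], ...]

results = []
for start_ms, end_ms in segments:
    chunk = wav[int(start_ms * sr / 1000) : int(end_ms * sr / 1000)]
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        tmp_path = f.name
    sf.write(tmp_path, chunk, sr)                     # Qwen-Audio gets a real file path
    results.append(qwen.generate(input=tmp_path, prompt=prompt))
    os.remove(tmp_path)

print(results)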

Environment

  • OS: Ubuntu 20.04.6 LTS
  • FunASR Version: 1.0.26
  • ModelScope Version: 1.14.0
  • PyTorch Version: 2.3.0
  • How you installed funasr: pip
  • Python version: 3.8.19
  • GPU: 4090
  • CUDA/cuDNN version: cuda11.8
@zhangyucha0 zhangyucha0 added the bug Something isn't working label May 14, 2024
@LauraGPT LauraGPT self-assigned this May 15, 2024
@LauraGPT
Collaborator

Ongoing.
