
XPU inference failed #78

Open

doubtfire009 opened this issue Mar 31, 2024 · 1 comment


doubtfire009 commented Mar 31, 2024

My code is based on bigdl-llm:

```python
from langchain import LLMChain, PromptTemplate
from bigdl.llm.langchain.llms import TransformersLLM
from langchain.memory import ConversationBufferWindowMemory

chatglm3_6b = 'D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b'

llm_model_path = chatglm3_6b  # path to the huggingface LLM model

CHATGLM_V3_PROMPT_TEMPLATE = "问:{prompt}\n\n答:"

prompt = PromptTemplate(input_variables=["history", "human_input"], template=CHATGLM_V3_PROMPT_TEMPLATE)
max_new_tokens = 128

llm = TransformersLLM.from_model_id(
    model_id=llm_model_path,
    model_kwargs={"trust_remote_code": True, "temperature": 0},
)

llm_chain = LLMChain(
    llm=llm,
    prompt=prompt,
    verbose=True,
    llm_kwargs={"max_new_tokens": max_new_tokens},
    memory=ConversationBufferWindowMemory(k=2),
)

VICUNA_PROMPT_TEMPLATE = "USER: {prompt}\nASSISTANT:"

# prompts: "讲一个笑话" = "tell a joke", "作一首诗" = "write a poem"
llm_result = llm.generate([VICUNA_PROMPT_TEMPLATE.format(prompt="讲一个笑话"), VICUNA_PROMPT_TEMPLATE.format(prompt="作一首诗")] * 3)

print("-" * 20 + "number of generations" + "-" * 20)
print(len(llm_result.generations))
print("-" * 20 + "the first generation" + "-" * 20)
print(llm_result.generations[0][0].text)
```

but it returns:

```
Traceback (most recent call last):
  File "D:\AI_projects\ipex-samples\main-bigdl.py", line 32, in <module>
    llm_result = llm.generate([VICUNA_PROMPT_TEMPLATE.format(prompt="讲一个笑话"), VICUNA_PROMPT_TEMPLATE.format(prompt="作一首诗")]*3)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 741, in generate
    output = self._generate_helper(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 605, in _generate_helper
    raise e
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 592, in _generate_helper
    self._generate(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\langchain_core\language_models\llms.py", line 1177, in _generate
    self._call(prompt, stop=stop, run_manager=run_manager, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\langchain\llms\transformersllm.py", line 248, in _call
    output = self.model.generate(input_ids, streamer=streamer,
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\transformers\generation\utils.py", line 1538, in generate
    return self.greedy_search(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\transformers\generation\utils.py", line 2362, in greedy_search
    outputs = self(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 941, in forward
    transformer_outputs = self.transformer(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 167, in chatglm2_model_forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 641, in forward
    layer_ret = layer(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\conte/.cache\huggingface\modules\transformers_modules\chatglm3-6b\modeling_chatglm.py", line 544, in forward
    attention_output, kv_cache = self.self_attention(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 191, in chatglm2_attention_forward
    return forward_function(
  File "C:\ProgramData\anaconda3\envs\llm_39\lib\site-packages\bigdl\llm\transformers\models\chatglm2.py", line 377, in chatglm2_attention_forward_8eb45c
    query_layer = apply_rotary_pos_emb_chatglm(query_layer, rotary_pos_emb)
NotImplementedError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Could not run 'torch_ipex::mul_add' with arguments from the 'CPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'torch_ipex::mul_add' is only available for these backends: [XPU, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, AutogradMeta, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

XPU: registered at C:/jenkins/workspace/IPEX-GPU-ARC770-windows/frameworks.ai.pytorch.ipex-gpu/csrc/gpu/aten/operators/TripleOps.cpp:521 [kernel]
BackendSelect: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:153 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\DynamicLayer.cpp:498 [backend fallback]
Functionalize: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\FunctionalizeFallbackKernel.cpp:290 [backend fallback]
Named: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\native\NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:86 [backend fallback]
AutogradOther: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:53 [backend fallback]
AutogradCPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:57 [backend fallback]
AutogradCUDA: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:65 [backend fallback]
AutogradXLA: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:69 [backend fallback]
AutogradMPS: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:77 [backend fallback]
AutogradXPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:61 [backend fallback]
AutogradHPU: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:90 [backend fallback]
AutogradLazy: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:73 [backend fallback]
AutogradMeta: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\VariableFallbackKernel.cpp:81 [backend fallback]
Tracer: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\torch\csrc\autograd\TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\autocast_mode.cpp:382 [backend fallback]
AutocastXPU: registered at C:/jenkins/workspace/IPEX-GPU-ARC770-windows/frameworks.ai.pytorch.ipex-gpu/csrc/gpu/aten/operators/TripleOps.cpp:521 [kernel]
AutocastCUDA: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\autocast_mode.cpp:249 [backend fallback]
FuncTorchBatched: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\LegacyBatchingRegistrations.cpp:710 [backend fallback]
FuncTorchVmapMode: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\LegacyBatchingRegistrations.cpp:1075 [backend fallback]
VmapMode: fallthrough registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\TensorWrapper.cpp:203 [backend fallback]
PythonTLSSnapshot: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:161 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\functorch\DynamicLayer.cpp:494 [backend fallback]
PreDispatch: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:165 [backend fallback]
PythonDispatcher: registered at C:\jenkins\workspace\IPEX-GPU-ARC770-windows\frameworks.ai.pytorch.private-gpu\aten\src\ATen\core\PythonFallbackKernel.cpp:157 [backend fallback]
```

My pip list is:

```
Package Version
accelerate 0.21.0
aiofiles 23.2.1
aiohttp 3.9.3
aiosignal 1.3.1
altair 5.3.0
annotated-types 0.6.0
antlr4-python3-runtime 4.9.3
anyio 4.3.0
arxiv 2.1.0
async-timeout 4.0.3
attrs 23.2.0
backoff 2.2.1
beautifulsoup4 4.12.3
bigdl-core-xe-21 2.5.0b20240324
bigdl-llm 2.5.0b20240330
blinker 1.7.0
blis 0.7.11
Brotli 1.1.0
cachetools 5.3.3
catalogue 2.0.10
certifi 2024.2.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
cloudpathlib 0.16.0
colorama 0.4.6
coloredlogs 15.0.1
confection 0.1.4
contourpy 1.2.0
cryptography 42.0.5
cycler 0.12.1
cymem 2.0.8
dataclasses-json 0.6.4
deepdiff 6.7.1
Deprecated 1.2.14
deprecation 2.1.0
distro 1.9.0
duckduckgo-search 3.9.9
effdet 0.4.1
einops 0.7.0
emoji 2.11.0
et-xmlfile 1.1.0
exceptiongroup 1.2.0
faiss-cpu 1.7.4
fastapi 0.109.0
feedparser 6.0.10
filelock 3.13.3
filetype 1.2.0
flatbuffers 24.3.25
fonttools 4.50.0
frozenlist 1.4.1
fschat 0.2.35
fsspec 2024.3.1
gitdb 4.0.11
GitPython 3.1.43
greenlet 3.0.3
h11 0.14.0
h2 4.1.0
hpack 4.0.0
httpcore 1.0.5
httpx 0.26.0
httpx-sse 0.4.0
huggingface-hub 0.22.2
humanfriendly 10.0
hyperframe 6.0.1
idna 3.6
importlib_metadata 7.1.0
importlib_resources 6.4.0
iniconfig 2.0.0
intel-extension-for-pytorch 2.1.10+xpu
intel-openmp 2024.1.0
iopath 0.1.10
Jinja2 3.1.3
joblib 1.3.2
jsonpatch 1.33
jsonpath-python 1.0.6
jsonpointer 2.4
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
kiwisolver 1.4.5
langchain 0.0.354
langchain-community 0.0.20
langchain-core 0.1.23
langchain-experimental 0.0.47
langcodes 3.3.0
langdetect 1.0.9
langsmith 0.0.87
layoutparser 0.3.4
llama-index 0.9.35
lxml 5.2.0
Markdown 3.6
markdown-it-py 3.0.0
markdown2 2.4.13
markdownify 0.11.6
MarkupSafe 2.1.5
marshmallow 3.21.1
matplotlib 3.8.3
mdurl 0.1.2
metaphor-python 0.1.23
mpmath 1.3.0
msg-parser 1.2.0
multidict 6.0.5
murmurhash 1.0.10
mypy-extensions 1.0.0
nest-asyncio 1.6.0
networkx 3.2.1
nh3 0.2.17
nltk 3.8.1
numexpr 2.8.6
numpy 1.26.4
olefile 0.47
omegaconf 2.3.0
onnx 1.16.0
onnxruntime 1.15.1
openai 1.9.0
opencv-python 4.9.0.80
openpyxl 3.1.2
ordered-set 4.1.0
packaging 23.2
pandas 2.0.3
pathlib 1.0.1
pdf2image 1.17.0
pdfminer.six 20231228
pdfplumber 0.11.0
pikepdf 8.4.1
Pillow 9.5.0
pillow_heif 0.15.0
pip 23.3.1
pluggy 1.4.0
portalocker 2.8.2
preshed 3.0.9
prompt-toolkit 3.0.43
protobuf 4.25.3
psutil 5.9.8
py-cpuinfo 9.0.0
pyarrow 15.0.2
pyclipper 1.3.0.post5
pycocotools 2.0.7
pycparser 2.22
pydantic 1.10.13
pydantic_core 2.16.3
pydeck 0.8.1b0
Pygments 2.17.2
PyJWT 2.8.0
pylibjpeg-libjpeg 2.1.0
PyMuPDF 1.23.16
PyMuPDFb 1.23.9
pypandoc 1.13
pyparsing 3.1.2
pypdf 4.1.0
pypdfium2 4.28.0
pyreadline3 3.4.1
pytesseract 0.3.10
pytest 7.4.3
python-dateutil 2.9.0.post0
python-decouple 3.8
python-docx 1.1.0
python-iso639 2024.2.7
python-magic 0.4.27
python-magic-bin 0.4.14
python-multipart 0.0.9
python-pptx 0.6.23
pytz 2024.1
pywin32 306
PyYAML 6.0.1
rapidfuzz 3.7.0
rapidocr-onnxruntime 1.3.8
referencing 0.34.0
regex 2023.12.25
requests 2.31.0
rich 13.7.1
rpds-py 0.18.0
safetensors 0.4.2
scikit-learn 1.4.1.post1
scipy 1.12.0
sentence-transformers 2.2.2
sentencepiece 0.2.0
setuptools 68.2.2
sgmllib3k 1.0.0
shapely 2.0.3
shortuuid 1.0.13
simplejson 3.19.2
six 1.16.0
smart-open 6.4.0
smmap 5.0.1
sniffio 1.3.1
socksio 1.0.0
soupsieve 2.5
spacy 3.7.2
spacy-legacy 3.0.12
spacy-loggers 1.0.5
SQLAlchemy 2.0.25
srsly 2.4.8
sse-starlette 1.8.2
starlette 0.35.0
streamlit 1.30.0
streamlit-aggrid 0.3.4.post3
streamlit-antd-components 0.3.1
streamlit-chatbox 1.1.11
streamlit-feedback 0.1.3
streamlit-modal 0.1.0
streamlit-option-menu 0.3.12
strsimpy 0.2.1
svgwrite 1.4.3
sympy 1.12
tabulate 0.9.0
tenacity 8.2.3
thinc 8.2.3
threadpoolctl 3.4.0
tiktoken 0.5.2
timm 0.9.16
tokenizers 0.13.3
toml 0.10.2
tomli 2.0.1
toolz 0.12.1
torch 2.1.0a0+cxx11.abi
torchaudio 2.1.2
torchvision 0.16.0a0+cxx11.abi
tornado 6.4
tqdm 4.66.1
transformers 4.31.0
transformers-stream-generator 0.0.4
typer 0.9.4
typing_extensions 4.10.0
typing-inspect 0.9.0
tzdata 2024.1
tzlocal 5.2
unstructured 0.12.5
unstructured-client 0.22.0
unstructured-inference 0.7.23
unstructured.pytesseract 0.3.12
urllib3 2.2.1
uvicorn 0.29.0
validators 0.24.0
wasabi 1.1.2
watchdog 3.0.0
wavedrom 2.0.3.post3
wcwidth 0.2.13
weasel 0.3.4
websockets 12.0
wheel 0.41.2
wrapt 1.16.0
xformers 0.0.23.post1
xlrd 2.0.1
XlsxWriter 3.2.0
yarl 1.9.4
youtube-search 2.1.2
zipp 3.18.1
```

Hope you can help me with this!

Oscilloscope98 (Collaborator) commented

Hi @doubtfire009,

It seems like you are using an XPU environment but running your code on the CPU.
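
You can confirm this by checking where the model weights actually live (a minimal diagnostic sketch; `llm.model` is the underlying Hugging Face model, as your traceback shows):

```python
# 'cpu' here explains the failure: the fused torch_ipex::mul_add kernel
# in the error message is only registered for the XPU backend.
print(next(llm.model.parameters()).device)  # prints 'cpu' in your run; GPU inference needs 'xpu:0'
```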

Would you mind changing your code to:

```python
llm = TransformersLLM.from_model_id(
    model_id=llm_model_path,
    model_kwargs={"trust_remote_code": True, "temperature": 0},
    device_map='xpu',
)
```

and try again? :)
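
For reference, here is an end-to-end sketch of the fix (a minimal sketch, assuming the same model path as above; `torch.xpu` becomes available once `intel_extension_for_pytorch` is imported):

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the XPU backend with PyTorch
from bigdl.llm.langchain.llms import TransformersLLM

# Make sure an Intel GPU is actually visible before loading the model
assert torch.xpu.is_available(), "No XPU device found; check your IPEX/driver setup"

llm = TransformersLLM.from_model_id(
    model_id='D:/AI_projects/Langchain-Chatchat/llm_model/THUDM/chatglm3-6b',
    model_kwargs={"trust_remote_code": True, "temperature": 0},
    device_map='xpu',  # place the model on the Intel GPU instead of the CPU
)

print(next(llm.model.parameters()).device)  # should now report 'xpu:0'
```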

For more details on running our optimizations with LangChain on Intel GPUs, you can refer to here.
