
streamlit iGPU - RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error) #10778

JamieVC opened this issue Apr 17, 2024 · 1 comment

JamieVC commented Apr 17, 2024

I followed the installation guide https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html
in order to run IPEX-LLM on the iGPU of a Meteor Lake machine running Windows.

--- Steps to set up the environment ---
(llm) PS C:\Users\S54 PR> pip install --pre --upgrade ipex-llm[xpu]
(llm) PS C:\Users\S54 PR> pip install streamlit streamlit_chat
(llm) PS C:\Users\S54 PR> New-Item -Path Env:\BIGDL_LLM_XMX_DISABLED -Value '1'
(llm) PS C:\Users\S54 PR> New-Item -Path Env:\SYCL_CACHE_PERSISTENT -Value '1'

--- Works when run from the official IPEX-LLM demo ---
It works well when I run demo_ipexllm.py,
which comes from https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html

(screenshot of the successful demo run)

--- Error when called from a Streamlit web app ---

I made a Streamlit web app, attached as chat_streamlit_20240416.zip.
When I run it on Windows, I get the error message below. Would you have any suggestions to resolve the issue?
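
For context, the generation path in the app roughly follows the pattern below. This is a simplified sketch; the actual code is in the attached zip, and the loading arguments shown here are taken from the quickstart, so treat them as approximate.

from threading import Thread
from transformers import AutoTokenizer, TextIteratorStreamer
from ipex_llm.transformers import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# load_in_4bit=True converts the weights to sym_int4, matching the log below
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             load_in_4bit=True,
                                             cpu_embedding=True)
model = model.to('xpu')

prompt = "[INST] <<SYS>> ...system prompt... <</SYS>> what is human? [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to('xpu')

# generation runs in a background thread and streams tokens back to the UI
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
t = Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=512))
t.start()
response = ""
for new_text in streamer:
    response += new_text  # the app writes this into an st.empty() placeholder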

(llm) PS C:\source\ipex\demo_igpu> streamlit run chat_streamlit.py -- --model_id "meta-llama/Llama-2-7b-chat-hf"

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8505/
Network URL: http://10.174.192.123:8505/

C:\miniconda3\envs\llm\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\miniconda3\envs\llm\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
2024-04-16 17:39:57,163 - INFO - intel_extension_for_pytorch auto imported
LlamaModel()
Loading models...
Loading models...
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.33it/s]
2024-04-16 17:40:00,833 - INFO - Converting the current model to sym_int4 format......
Successfully loaded Tokenizer and optimized Model!
Configuration...
Configuration...
question:
question:
time taken 0.00048689999857742805
time taken 0.0002223999999841908
Loading models...
Configuration...
question: what is human?
time taken 0.0009090999992622528
Loading models...
Configuration...
question: what is human?
Send!
next_answer
Preparing the response
st empty
build_inputs()
prompt: [INST] <>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<>

what is human? [/INST]
generate_iterate()
TextIteratorStreamer()
Thread()
t.start()
C:\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\models\llama.py:238: UserWarning: Passing padding_mask is deprecated and will be removed in v4.37.Please make sure use attention_mask instead.`
warnings.warn(
Exception in thread Thread-17 (generate):
Traceback (most recent call last):
File "C:\miniconda3\envs\llm\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "C:\miniconda3\envs\llm\Lib\threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\transformers\generation\utils.py", line 1588, in generate
return self.sample(
^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\transformers\generation\utils.py", line 2642, in sample
outputs = self(
^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 806, in forward
outputs = self.model(
^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 1980, in llama_model_forward
layer_outputs = decoder_layer(
^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 248, in llama_decoder_forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 334, in llama_attention_forward_4_31
return forward_function(
^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 399, in llama_attention_forward_4_31_quantized
query_states = self.q_proj(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\miniconda3\envs\llm\Lib\site-packages\ipex_llm\transformers\low_bit_linear.py", line 685, in forward
result = linear_q4_0.forward_new(x_2d, self.weight.data, self.weight.qtype,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Native API failed. Native API returns: -999 (Unknown PI error) -999 (Unknown PI error)

Thanks
Jamie

Oscilloscope98 (Contributor) commented Apr 17, 2024

Hi @JamieVC,

On our machine (an MTL iGPU with 16GB of memory), we could not reproduce your issue.

However, Native API returns: -999 usually indicates an out-of-memory (OOM) error. There are several things you can do to reduce iGPU memory usage, especially if your iGPU only has 8GB of memory (a combined example follows the list):

  1. Restart your machine to release idle iGPU memory.
  2. Set cpu_embedding=True in the from_pretrained function (it seems you have already done this one).
  3. Set IPEX_LLM_LOW_MEM=1 in your environment.
  4. Use model = model.half().to('xpu') instead of model = model.to('xpu').
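
Putting items 2-4 together, the model-loading code would look roughly like this (a sketch assuming the standard ipex_llm.transformers loading path; adjust it to match your actual script):

import os
os.environ["IPEX_LLM_LOW_MEM"] = "1"  # item 3: must be set before the model is loaded

from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    load_in_4bit=True,
    cpu_embedding=True,               # item 2: keep the embedding table on the CPU
)
model = model.half().to('xpu')        # item 4: fp16 instead of fp32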

Please let us know if you run into any further problems :)
