
Flex 170 GPU: ollama unable to detect GPU, and sycl-ls also not detecting it #10801

Open
shailesh837 opened this issue Apr 18, 2024 · 3 comments

Comments

@shailesh837

I am testing an older ollama binary with the llama2:7b model on an Intel Flex 170 GPU.
I have followed the new ipex-llm documentation for the driver install and the remaining steps, but when I run ollama serve it does not detect the GPU, and I also see a dll error when ollama serve starts.
Attaching the ollama serve log to this issue:
ollama_serve_flex_170.log

spandey2@IMU-NEX-EMR1-SUT:~/LLM_SceneScape_ChatBot$ sudo xpu-smi discovery
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Data Center GPU Flex 170                                       |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-0000-dc8d-d308092ec026                                       |
|           | PCI BDF Address: 0000:29:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+

spandey2@IMU-NEX-EMR1-SUT:~/LLM_SceneScape_ChatBot$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, INTEL(R) XEON(R) PLATINUM 8580 OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]

Modelfile:
FROM llama2:latest
PARAMETER num_gpu 999
PARAMETER temperature 0
PARAMETER num_ctx 4096
PARAMETER use_mmap false
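
(For reference, a Modelfile like this would typically be registered and run with the standard Ollama CLI as below; the model name flex-llama2 is just an illustrative placeholder.)

# build a named model from the Modelfile above
ollama create flex-llama2 -f ./Modelfile
# run it; num_gpu 999 requests that all layers be offloaded to the GPU
ollama run flex-llama2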

Ollama serve log:
time=2024-04-18T23:44:00.314+02:00 level=INFO source=gpu.go:285 msg="Searching for GPU management library libze_intel_gpu.so"
time=2024-04-18T23:44:00.315+02:00 level=INFO source=gpu.go:331 msg="Discovered GPU libraries: [/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.27642.40]"
time=2024-04-18T23:44:00.345+02:00 level=INFO source=gpu.go:377 msg="Unable to load oneAPI management library /usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1.3.27642.40: oneapi vram init failure: 2013265921"
time=2024-04-18T23:44:00.345+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-18T23:44:00.345+02:00 level=INFO source=routes.go:1044 msg="no GPU detected"
[GIN] 2024/04/18 - 23:45:10 | 200 | 120.449µs | 127.0.0.1 | HEAD "/"
[GIN] 2024/04/18 - 23:45:10 | 200 | 1.357185ms | 127.0.0.1 | GET "/api/tags"
time=2024-04-18T23:46:49.533+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-18T23:46:49.533+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-18T23:46:49.533+02:00 level=INFO source=llm.go:77 msg="GPU not available, falling back to CPU"
loading library /tmp/ollama1871007333/cpu_avx2/libext_server.so

llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.11 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU buffer size = 3647.87 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 2048.00 MiB
llama_new_context_with_model: KV self size = 2048.00 MiB, K (f16): 1024.00 MiB, V (f16): 1024.00 MiB
llama_new_context_with_model: CPU input buffer size = 16.02 MiB
llama_new_context_with_model: CPU compute buffer size = 308.00 MiB

sudo xpu-smi stats -d 0
IMU-NEX-EMR1-SUT: Thu Apr 18 23:49:41 2024

+-----------------------------+--------------------------------------------------------------------+
| Device ID | 0 |
+-----------------------------+--------------------------------------------------------------------+
| GPU Utilization (%) | 0 |
| EU Array Active (%) | N/A |
| EU Array Stall (%) | N/A |
| EU Array Idle (%) | N/A |
| | |
| Compute Engine Util (%) | 0; Engine 0: 0, Engine 1: 0, Engine 2: 0, Engine 3: 0 |
| Render Engine Util (%) | 0; Engine 0: 0 |
| Media Engine Util (%) | 0 |
| Decoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Encoder Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| Copy Engine Util (%) | 0; Engine 0: 0 |
| Media EM Engine Util (%) | Engine 0: 0, Engine 1: 0 |
| 3D Engine Util (%) | N/A |
+-----------------------------+--------------------------------------------------------------------+
| Reset | N/A |
| Programming Errors | N/A |
| Driver Errors | N/A |
| Cache Errors Correctable | N/A |
| Cache Errors Uncorrectable | N/A |
| Mem Errors Correctable | N/A |
| Mem Errors Uncorrectable | N/A |
+-----------------------------+--------------------------------------------------------------------+
| GPU Power (W) | 43 |
| GPU Frequency (MHz) | 2050 |
| Media Engine Freq (MHz) | 1025 |
| GPU Core Temperature (C) | 87 |
| GPU Memory Temperature (C) | N/A |
| GPU Memory Read (kB/s) | N/A |
| GPU Memory Write (kB/s)     | N/A                                                                |
| GPU Memory Bandwidth (%) | 0 |
| GPU Memory Used (MiB) | 31 |
| GPU Memory Util (%) | 0 |
| Xe Link Throughput (kB/s) | N/A |
+-----------------------------+--------------------------------------------------------------------+

@sgwhat (Contributor) commented Apr 19, 2024

Hi Shailesh, I don't see any GPU device in your sycl-ls output. Could you please check your oneAPI installation, and remember to run source /opt/intel/oneapi/setvars.sh?

spandey2@IMU-NEX-EMR1-SUT:~/LLM_SceneScape_ChatBot$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2 [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, INTEL(R) XEON(R) PLATINUM 8580 OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
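
With a working driver and oneAPI setup, sycl-ls should additionally list the Flex 170 under the Level Zero (and usually the OpenCL GPU) backend. A rough illustration of the missing entries is below; the exact backend labels and version strings are assumptions that vary by driver release:

source /opt/intel/oneapi/setvars.sh
sycl-ls
# should additionally show lines resembling (illustrative only):
# [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Flex 170 1.3 [1.3.27642]
# [opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Flex 170 3.0 [23.35.27642]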

@shailesh837 (Author)

The issue was that libllama_bigdl_core.so was missing from the /usr/lib folder. But there are two important issues we are still seeing:
a) Why do we need to create a Modelfile with the parameters below?
FROM llama2:latest
PARAMETER num_gpu 999
PARAMETER temperature 0
PARAMETER num_ctx 4096
PARAMETER use_mmap false

The older version of ollama worked without a Modelfile and without setting num_gpu; it ran all layers on the GPU.

b) Why does loading the model with ollama serve take 50 seconds, and a response take 20 seconds?
The same Modelfile and the same ollama serve on an Arc A770 GPU [16 GB GPU memory, the same as the Flex 170] loads in 10 seconds or less, and a response takes 1-2 seconds.

@sgwhat (Contributor) commented Apr 22, 2024

Hi @shailesh837,

  1. You may switch to our latest release version of Ollama with the command pip install --pre --upgrade ipex-llm[cpp]. In this version, libllama_bigdl_core.so is no longer required. (A consolidated sketch of the upgrade steps follows this list.)
  2. In the latest version of Ollama, it is no longer necessary to set PARAMETER num_gpu 999 in the Modelfile. The usage otherwise remains the same as in previous versions.
  3. Please ensure that Ollama is running on a GPU device. We will investigate the causes of the reduced performance on the Flex 170.
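
For reference, the full upgrade-and-relaunch flow might look roughly like the sketch below, based on the ipex-llm Ollama quickstart. The init-ollama helper, the working-directory name, and the environment variables are taken from that documentation as I understand it and may differ between releases, so treat this as illustrative rather than authoritative:

# upgrade to the latest ipex-llm Ollama build (libllama_bigdl_core.so no longer needed)
pip install --pre --upgrade ipex-llm[cpp]

# make the Level Zero GPU runtime visible to the server
source /opt/intel/oneapi/setvars.sh

# create the ollama symlinks in a working directory (helper shipped with ipex-llm[cpp])
mkdir -p ~/ollama-ipex && cd ~/ollama-ipex
init-ollama

# request all layers on the GPU and enable sysman-based device discovery
export OLLAMA_NUM_GPU=999
export ZES_ENABLE_SYSMAN=1
./ollama serve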
