
Crash in hipDriverGetVersion on windows #4094

Closed
ggjk616 opened this issue May 2, 2024 · 7 comments
Assignees
Labels
amd (Issues relating to AMD GPUs and ROCm), bug (Something isn't working), windows

Comments

@ggjk616

ggjk616 commented May 2, 2024

What is the issue?

Can you help me? In the documentation, I noticed the following statement: "You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to bypass autodetection, so for example, if you have a CUDA card, but want to force the CPU LLM library with AVX2 vector support, use:
OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve"
But after setting OLLAMA_LLM_LIBRARY="cpu_avx2", the program still detects my GPU when loading the model, resulting in an error: Error: Post "https://127.0.0.1:11434/api/chat": read tcp 127.0.0.1:56915->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host.

OS

Windows

GPU

AMD

CPU

Intel

Ollama version

No response

@ggjk616 ggjk616 added the bug Something isn't working label May 2, 2024
@dhiltgen
Collaborator

dhiltgen commented May 2, 2024

Ollama uses a client-server architecture. My suspicion is that you're setting this flag in the client, not the server.

On Windows, you typically set this as a system environment variable. See https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-windows

That said, it shouldn't crash when running on the GPU. Can you share the server log for your crash scenario? https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md#how-to-troubleshoot-issues
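To illustrate the client/server point above, here's a minimal sketch (not Ollama's actual startup code) of why the variable has to be visible to the server process: the server reads its own environment at startup, so a value exported only in the client's shell never reaches it.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// The server-side process checks its own environment at startup.
	// Setting OLLAMA_LLM_LIBRARY in a different shell (the client) has no effect here.
	if lib := os.Getenv("OLLAMA_LLM_LIBRARY"); lib != "" {
		fmt.Println("forcing LLM library:", lib) // e.g. cpu_avx2
	} else {
		fmt.Println("no override set; autodetecting GPU libraries")
	}
}
```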

@dhiltgen dhiltgen self-assigned this May 2, 2024
@ggjk616
Author

ggjk616 commented May 3, 2024


In fact, the server crash was caused by my old GPU (Radeon 520). I noticed that when I did not disable it, as soon as I used the command ollama run models_name, the server would crash with the error: Error: Post "https://127.0.0.1:11434/api/chat": read tcp 127.0.0.1:56915->127.0.0.1:11434: wsarecv: An existing connection was forcibly closed by the remote host.
This is the log information when the problem occurs:

time=2024-05-01T10:06:51.721+02:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library nvml.dll"
time=2024-05-01T10:06:51.734+02:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries:[]"
time=2024-05-01T10:06:51.734+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
Exception 0xc0000005 0x8 0x256f1f30780 0x256f1f30780
PC= 0x256f1f30780
signal arrived during external code execution

When I manually disable it (Radeon 520), Ollama can successfully load the model and run it on the CPU. The relevant log information is as follows:

time=2024-05-01T10:07:59.188+02:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library nvml.dll"
time=2024-05-01T10:07:59.198+02:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries:[]"
time=2024-05-01T10:07:59.198+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-05-01T10:07:59.235+02:00 level=INFO source=amd_windows.go:40 msg="AMD Driver:324007"
time=2024-05-01T10:07:59.235+02:00 level=INFO source=amd_hip_windows.go:97 msg="AMD ROCm reports no devices found"

So after reading the documentation, I thought that by setting the environment variable OLLAMA_LLM_LIBRARY I could force the CPU AVX2 library and skip GPU detection, and I would then be able to load the model smoothly, but it did not seem to work. Could you please tell me where I went wrong?

@dhiltgen
Collaborator

dhiltgen commented May 4, 2024

You didn't mention what version you were running. On versions before 0.1.33 we didn't handle spaces and quotes on the OLLAMA_LLM_LIBRARY variable properly, so it's possible you included quotes and that could explain why it isn't working. Please give 0.1.33 a try, and if it still isn't respecting your OLLAMA_LLM_LIBRARY setting, share your server.log so we can see more details. It may also be helpful to set OLLAMA_DEBUG=1.
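To illustrate the kind of problem stray quotes can cause, here's a rough, hypothetical sketch (not the actual Ollama change, and cleanedEnv is an invented helper) of reading the variable defensively, trimming whitespace and quote characters so a value copied as "cpu_avx2" with the quotes included still matches:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// cleanedEnv reads an environment variable and strips surrounding whitespace
// and quote characters that are sometimes copied along with a documented
// value such as "cpu_avx2".
func cleanedEnv(key string) string {
	v := strings.TrimSpace(os.Getenv(key))
	return strings.Trim(v, `"'`)
}

func main() {
	fmt.Println(cleanedEnv("OLLAMA_LLM_LIBRARY"))
}
```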

@ggjk616
Author

ggjk616 commented May 5, 2024

I have updated to the latest version; here is the relevant log information:
serves.log.txt

@dhiltgen
Collaborator

dhiltgen commented May 5, 2024

Thanks for the server log, @ggjk616.

It looks like we're crashing while trying to call an AMD driver API to check the version via hipDriverGetVersion:

Exception 0xc0000005 0x8 0x142a70baf60 0x142a70baf60
PC=0x142a70baf60
signal arrived during external code execution

runtime.cgocall(0x923c20, 0x20d66c0)
	runtime/cgocall.go:157 +0x3e fp=0xc00051d140 sp=0xc00051d108 pc=0x8b92fe
syscall.SyscallN(0x7fffe5b8dd00?, {0xc00019f710?, 0x1?, 0x7fffe5880000?})
	runtime/syscall_windows.go:544 +0x107 fp=0xc00051d1b8 sp=0xc00051d140 pc=0x91f147
github.com/ollama/ollama/gpu.(*HipLib).AMDDriverVersion(0xc0000feab0)
	github.com/ollama/ollama/gpu/amd_hip_windows.go:82 +0x69 fp=0xc00051d228 sp=0xc00051d1b8 pc=0xd4f489
github.com/ollama/ollama/gpu.AMDGetGPUInfo()

Can you share some more information about your system? Which version of Windows? Home/Pro? Is your AMD Driver up to date? Do other GPU apps work correctly on your GPU?
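For context on where this faults, here is a rough sketch (the DLL name is an assumption; this is not the actual gpu/amd_hip_windows.go code) of calling hipDriverGetVersion from Go through a dynamically loaded HIP runtime. The crash happens inside that external call, which is why a bad driver can take the whole server down:

```go
//go:build windows

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

func main() {
	// Load the HIP runtime shipped with the AMD driver (DLL name assumed here).
	hip := syscall.NewLazyDLL("amdhip64.dll")
	getVersion := hip.NewProc("hipDriverGetVersion")

	var version int32
	// This call jumps into driver code. With a broken or outdated driver it can
	// fault with 0xc0000005, which the Go runtime reports as "signal arrived
	// during external code execution", as in the log above.
	ret, _, _ := getVersion.Call(uintptr(unsafe.Pointer(&version)))
	if ret != 0 { // hipSuccess == 0
		fmt.Println("hipDriverGetVersion failed with code", ret)
		return
	}
	fmt.Println("HIP driver version:", version)
}
```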

@dhiltgen dhiltgen added the amd Issues relating to AMD GPUs and ROCm label May 5, 2024
@dhiltgen dhiltgen changed the title from "Is there a problem with the document?" to "Crash in hipDriverGetVersion on windows" May 5, 2024
@ggjk616
Author

ggjk616 commented May 6, 2024

After reading your reply, I checked my drivers and indeed the issue was caused by the drivers not being the latest version. After updating the drivers, I was able to load the model without any issues. Thank you very much for your help! However, I'm still a bit curious as to why setting the OLLAMA_LLM_LIBRARY environment variable didn't work. You can now close this issue, and once again, thank you for your assistance!

@dhiltgen
Collaborator

dhiltgen commented May 6, 2024

The next release should have better parsing of quotes and spaces around our env vars.

@dhiltgen dhiltgen closed this as completed May 6, 2024