35b-beta-long (command-r finetune) iquants will not offload into gpu #842

lemon07r · 2024-05-12T03:34:58Z

Model in question: https://huggingface.co/bartowski/35b-beta-long-GGUF
The regular K-quants from the same repo (tested with Q4_K_M) offload to the GPU just fine. Tested on a 6900 XT using Vulkan. None of the IQ4 or IQ3 quants would load for me with GPU offloading, but they work fine with CPU-only inference (CLBlast).

EDIT - Here's the last message I see on screen before it crashes:

GGML_ASSERT: ggml-vulkan.cpp:2940: !qx_needs_dequant || to_fp16_vk_0 != nullptr
