CudaMalloc failed: out of memory with TinyLlama-1.1B #372
Comments
Try a smaller version of TinyLlama, Q8 instead of F32: TinyLlama-1.1B-Chat-v1.0.Q8_0.llamafile
Can you try llamafile-0.8.1, which was just released, and tell me if it works?
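Some rough arithmetic on why the quantization format matters on this card (the GeForce GTX 1050 Ti has 4 GB of VRAM). This is a sketch that counts weight storage only, ignoring the KV cache and compute buffers; the ~8.5 bits/weight figure for Q8_0 is an approximation covering the 8-bit values plus per-block scales.

```python
# Approximate weight-storage size of TinyLlama-1.1B per format.
# Assumes ~1.1e9 parameters; Q8_0 is taken as ~8.5 bits/weight
# (8-bit values plus a per-block fp16 scale). Runtime buffers
# such as the KV cache are not included.
PARAMS = 1.1e9
BITS_PER_WEIGHT = {"F32": 32, "F16": 16, "Q8_0": 8.5}

for fmt, bits in BITS_PER_WEIGHT.items():
    gib = PARAMS * bits / 8 / 2**30
    print(f"{fmt}: ~{gib:.2f} GiB")
```

By this estimate F32 weights alone (~4.1 GiB) already exceed the 4 GB of VRAM, while F16 (~2.0 GiB) and Q8_0 (~1.1 GiB) leave headroom for the KV cache and CUDA buffers, which is consistent with the suggestion to try the Q8_0 llamafile.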
Mea culpa: above, I did get a model working with a lower quantization format. So I downloaded several models, rebooted my machine, and tested again. The model that was working for me this morning (Model/TinyLlama-1.1B-Chat-v1.0.F16.llamafile) now crashes with SIGSEGV every time. The SIGSEGV issue has been reported in #378
I am trying to get TinyLlama working on the GPU with:
But it seems impossible to allocate 66.50 MB of memory on my card, even right after booting the machine, before any other use of the GPU.
Here is the error:
I have CUDA in this version:
Here are the specs of my machine:
System:   Kernel: 6.6.26-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
          Desktop: GNOME v: 45.4 tk: GTK v: 3.24.41 Distro: Manjaro base: Arch Linux
Machine:  Type: Laptop System: HP product: HP Pavilion Gaming Laptop 15-cx0xxx
Memory:   System RAM: total: 32 GiB available: 31.24 GiB used: 4.16 GiB (13.3%)
CPU:      Info: model: Intel Core i7-8750H bits: 64 type: MT MCP arch: Coffee Lake gen: core 8 level: v3
Graphics: Device-2: NVIDIA GP107M [GeForce GTX 1050 Ti Mobile] vendor: Hewlett-Packard
          driver: nvidia v: 550.67 alternate: nouveau,nvidia_drm non-free: 545.xx+
          status: current (as of 2024-04; EOL~2026-12-xx) arch: Pascal code: GP10x
          process: TSMC 16nm built: 2016-2021 pcie: gen: 1 speed: 2.5 GT/s lanes: 16
          link-max: gen: 3 speed: 8 GT/s bus-ID: 01:00.0 chip-ID: 10de:1c8c class-ID: 0300
Is there a way to solve that?
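One way to narrow this down is to check how much VRAM is actually free before the model loads, and to offload fewer layers to the GPU. This is a sketch: nvidia-smi ships with the NVIDIA driver, and llamafile exposes llama.cpp's -ngl option for limiting GPU offload; the layer count shown is illustrative, not a recommendation.

```shell
# Report total/used/free VRAM before loading the model.
# Guarded so the snippet still runs on machines without the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
  VRAM_REPORT="$(nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv)"
else
  VRAM_REPORT="nvidia-smi not available on this machine"
fi
echo "$VRAM_REPORT"

# If free VRAM is tight, offload only part of the model with -ngl
# (illustrative layer count; adjust until allocation succeeds):
# ./TinyLlama-1.1B-Chat-v1.0.F16.llamafile -ngl 10
```

If another process (e.g. the GNOME compositor) already holds most of the 4 GB, even a 66.50 MB allocation can fail; the nvidia-smi output will show which processes are occupying VRAM.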