stupid question, any way to avoid running out of memory? #22

Open
jmugan opened this issue May 12, 2024 · 2 comments

Comments

@jmugan

jmugan commented May 12, 2024

If I put an input of 17,000 tokens into model.generate(x, temperature), I get

libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 19081554496 bytes which is greater than the maximum allowed buffer size of 17179869184 bytes.

I guess it is trying to use the Mac GPU? Or, if it's regular memory, it can't swap? I can run this Llama 3 8B Instruct with regular Transformers; it is just really slow.

There's no flag for use_swap=True or anything like that, right?
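
For what it's worth, the numbers in the traceback are consistent with the full attention-score matrix for the prompt being materialized in a single Metal buffer, and the 17179869184-byte cap is exactly 16 GiB. The back-of-the-envelope check below is an assumption, not something read out of mlx-llm: the 17,267-token length and fp16 scores are inferred because they reproduce the reported allocation exactly.

```python
# Rough estimate only; seq_len and fp16 are assumptions that happen to match
# the reported allocation, not values taken from the library.
heads = 32               # attention heads in Llama 3 8B
seq_len = 17_267         # inferred prompt length (~17,000 tokens)
bytes_per_elem = 2       # fp16

scores_bytes = heads * seq_len * seq_len * bytes_per_elem
print(f"{scores_bytes:,} bytes = {scores_bytes / 2**30:.2f} GiB")  # 19,081,554,496 bytes ≈ 17.77 GiB
print(f"Reported buffer cap: {17_179_869_184 / 2**30:.0f} GiB")    # exactly 16 GiB
```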

@riccardomusmeci
Owner

As far as I remember, Apple Silicon Macs share memory between the CPU and GPU, which is why you can do a lot even on 8 GB Macs.

I don't know about memory swap in MLX, but given the Apple Silicon architecture, I'd guess it isn't supported.

What Mac are you using? RAM is very important.
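
As a side note on unified memory: MLX does expose helpers for inspecting and capping GPU memory use. Exactly where they live (`mx.metal` vs. the top-level `mx` namespace) depends on the MLX version, and capping the cache will not lift the per-buffer Metal limit in the traceback above, so treat this as a sketch rather than a fix:

```python
import mlx.core as mx

# Inspect how much unified memory MLX is holding (names per the mx.metal
# namespace in 2024-era MLX; newer versions expose them at the top level).
print("active:", mx.metal.get_active_memory() / 2**30, "GiB")
print("peak:  ", mx.metal.get_peak_memory() / 2**30, "GiB")
print("cache: ", mx.metal.get_cache_memory() / 2**30, "GiB")

# Optionally cap the buffer cache and the overall memory limit (in bytes).
# This eases memory pressure, but does not help with a single allocation
# that already exceeds the device's maximum Metal buffer size.
mx.metal.set_cache_limit(4 * 2**30)
mx.metal.set_memory_limit(20 * 2**30)
```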

@jmugan
Author

jmugan commented May 12, 2024

I've got an M1 with 32 GB of RAM, but I guess the context length is so long that even that isn't enough, at least when using MLX.
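
If it is the prompt-length-squared attention buffer that blows past the 16 GiB cap, one workaround is to prefill the prompt in chunks so each forward pass only builds a chunk_len × seen_len score matrix. The sketch below is generic and not mlx-llm's actual API; the `model(chunk, cache=cache)` signature is a hypothetical you would need to adapt to how the library's model and KV cache are invoked.

```python
import mlx.core as mx

def chunked_prefill(model, prompt_tokens, cache, chunk_size=512):
    """Feed the prompt through the model in chunks, reusing the KV cache.

    NOTE: model(chunk, cache=cache) is a hypothetical signature -- adapt it
    to however mlx-llm actually calls the model with a KV cache.
    """
    logits = None
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = mx.array(prompt_tokens[start:start + chunk_size])[None]  # shape [1, chunk_len]
        logits = model(chunk, cache=cache)  # scores are chunk_len x seen_len, not L x L
        mx.eval(logits)                     # evaluate now so buffers stay small
    return logits[:, -1, :]                 # logits for the last prompt token
```

With chunk_size=512, the largest score buffer for a ~17,000-token prompt is roughly 32 × 512 × 17,000 × 2 bytes ≈ 0.5 GiB instead of ~18 GiB, which stays well under the buffer cap.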
