stupid question, any way to avoid running out of memory? #22

Open
jmugan opened this issue May 12, 2024 · 2 comments

Comments

@jmugan

jmugan commented May 12, 2024

If I put an input of 17,000 tokens into model.generate(x, temperature), I get

libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 19081554496 bytes which is greater than the maximum allowed buffer size of 17179869184 bytes.

I guess it is trying to use the Mac GPU? Or, if it's regular memory, it can't swap? I can run this Llama 3 8B Instruct with regular Transformers; it is just really slow.

There's no flag for use_swap=True or anything like that, right?
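
For what it's worth, the numbers in the traceback are consistent with the full attention-score matrix for the prompt being materialized in a single Metal buffer, and the 17179869184-byte cap is exactly 16 GiB. The back-of-the-envelope check below is an assumption, not something read out of mlx-llm: the 17,267-token length and fp16 scores are inferred because they reproduce the reported allocation exactly.

```python
# Rough estimate only; seq_len and fp16 are assumptions that happen to match
# the reported allocation, not values taken from the library.
heads = 32               # attention heads in Llama 3 8B
seq_len = 17_267         # inferred prompt length (~17,000 tokens)
bytes_per_elem = 2       # fp16

scores_bytes = heads * seq_len * seq_len * bytes_per_elem
print(f"{scores_bytes:,} bytes = {scores_bytes / 2**30:.2f} GiB")  # 19,081,554,496 bytes ≈ 17.77 GiB
print(f"Reported buffer cap: {17_179_869_184 / 2**30:.0f} GiB")    # exactly 16 GiB
```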

@riccardomusmeci
Owner

As far as I remember, Apple Silicon Macs share memory between the CPU and GPU, which is why you can do a lot even on 8 GB Macs.

I don't know about memory swap in MLX, but given the Apple Silicon architecture, I'd guess it isn't supported.

What Mac are you using? RAM is very important.
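
As a side note on unified memory: MLX does expose helpers for inspecting and capping GPU memory use. Exactly where they live (`mx.metal` vs. the top-level `mx` namespace) depends on the MLX version, and capping the cache will not lift the per-buffer Metal limit in the traceback above, so treat this as a sketch rather than a fix:

```python
import mlx.core as mx

# Inspect how much unified memory MLX is holding (names per the mx.metal
# namespace in 2024-era MLX; newer versions expose them at the top level).
print("active:", mx.metal.get_active_memory() / 2**30, "GiB")
print("peak:  ", mx.metal.get_peak_memory() / 2**30, "GiB")
print("cache: ", mx.metal.get_cache_memory() / 2**30, "GiB")

# Optionally cap the buffer cache and the overall memory limit (in bytes).
# This eases memory pressure, but does not help with a single allocation
# that already exceeds the device's maximum Metal buffer size.
mx.metal.set_cache_limit(4 * 2**30)
mx.metal.set_memory_limit(20 * 2**30)
```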

@jmugan
Author

jmugan commented May 12, 2024

I've got an M1 with 32 GB of RAM, but I guess the context length is so long that even that isn't enough, at least when using MLX.
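
If it is the prompt-length-squared attention buffer that blows past the 16 GiB cap, one workaround is to prefill the prompt in chunks so each forward pass only builds a chunk_len × seen_len score matrix. The sketch below is generic and not mlx-llm's actual API; the `model(chunk, cache=cache)` signature is a hypothetical you would need to adapt to how the library's model and KV cache are invoked.

```python
import mlx.core as mx

def chunked_prefill(model, prompt_tokens, cache, chunk_size=512):
    """Feed the prompt through the model in chunks, reusing the KV cache.

    NOTE: model(chunk, cache=cache) is a hypothetical signature -- adapt it
    to however mlx-llm actually calls the model with a KV cache.
    """
    logits = None
    for start in range(0, len(prompt_tokens), chunk_size):
        chunk = mx.array(prompt_tokens[start:start + chunk_size])[None]  # shape [1, chunk_len]
        logits = model(chunk, cache=cache)  # scores are chunk_len x seen_len, not L x L
        mx.eval(logits)                     # evaluate now so buffers stay small
    return logits[:, -1, :]                 # logits for the last prompt token
```

With chunk_size=512, the largest score buffer for a ~17,000-token prompt is roughly 32 × 512 × 17,000 × 2 bytes ≈ 0.5 GiB instead of ~18 GiB, which stays well under the buffer cap.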
