If I put an input of 17,000 tokens into `model.generate(x, temperature)`, I get:

```
libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 19081554496 bytes which is greater than the maximum allowed buffer size of 17179869184 bytes.
```

I guess it is trying to use the Mac GPU? Or, if it's using regular memory, it can't swap? I can run this Llama 3 8B Instruct with regular Transformers; it is just really slow.

There's no flag like `use_swap=True` or anything like that, right?
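For context, a rough back-of-envelope suggests a single attention-related buffer at this sequence length can exceed the 16 GiB limit from the error message. This is only a sketch under assumptions (fp16 attention scores, Llama 3 8B's 32 heads, the full score matrix materialized at once); the actual allocation the error reports (~19.08 GB) is somewhat larger, so the real buffer likely includes extra padding or other tensors.

```python
# Back-of-envelope estimate (assumptions: fp16 scores, 32 attention heads,
# the whole 17,000 x 17,000 attention score matrix allocated as one buffer).
n_tokens = 17_000
n_heads = 32          # Llama 3 8B uses 32 attention heads
bytes_per_elem = 2    # fp16

scores_bytes = n_tokens ** 2 * bytes_per_elem * n_heads
max_metal_buffer = 17_179_869_184  # 16 GiB, the limit from the error message

print(scores_bytes)                      # 18496000000 (~17.2 GiB)
print(scores_bytes > max_metal_buffer)   # True: one buffer already exceeds the limit
```

If that's roughly what's happening, it would explain why regular Transformers on CPU can still run it (system RAM can swap) while a single GPU buffer can't.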