To better assist you, could you please clarify the context? For example, what are your hardware specs, and which model do you want to use?
In general, PowerInfer is designed to automatically offload model weights to VRAM so that the GPU is utilized as much as possible. If you're looking to further restrict VRAM usage, you can use the --vram-budget parameter to specify your VRAM limit. You can refer to our inference README for examples.
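For illustration, an invocation might look something like the sketch below. The binary name, model path, and prompt flags here are placeholders assumed from a typical llama.cpp-style CLI; only --vram-budget is the parameter described above, and the exact command syntax is documented in the inference README.

```bash
# Hypothetical example: cap VRAM usage at roughly 8 GB during inference.
# The model path and prompt below are placeholders; --vram-budget limits how
# much VRAM PowerInfer will use when offloading weights to the GPU.
./build/bin/main \
  -m ./path/to/your-model.powerinfer.gguf \
  -n 128 \
  -p "Once upon a time" \
  --vram-budget 8
```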
The 7B model uses more than 12 GB of RAM. Could you provide one that's around 3B parameters, or a 7B model quantized to Q4_0 GGUF or something similar?