possible to do one that can fit into 7GB vram? #141

Open
sprappcom opened this issue Feb 4, 2024 · 2 comments
Labels
question Further information is requested

Comments

@sprappcom

The 7B model uses more than 12 GB of RAM. Could you provide one with around 3B parameters, or a 7B variant quantized to Q4_0 GGUF or something similar?

sprappcom added the question label on Feb 4, 2024
@hodlen
Collaborator

hodlen commented Feb 5, 2024

To better assist you, could you please clarify the context? For example, what are your hardware specs, and which model do you want to use?

In general, PowerInfer is designed to automatically offload model weights to VRAM so that the GPU is utilized as much as possible. If you're looking to further restrict VRAM usage, consider using the --vram-budget parameter to specify your VRAM limit. You can refer to our inference README for examples.
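For reference, here is a minimal sketch of launching a PowerInfer run with a capped VRAM budget. The only flag taken from this thread is --vram-budget; the binary path, model path, prompt, and the remaining llama.cpp-style options are placeholders and assumptions, so check the inference README for the exact invocation on your setup.

```python
import subprocess

# Hypothetical paths -- adjust to your local build and model files.
POWERINFER_BIN = "./build/bin/main"  # PowerInfer's llama.cpp-style CLI
MODEL_PATH = "./models/llama-7b-relu.powerinfer.gguf"  # placeholder model file

# Cap GPU memory so the run fits on a ~7 GB card; weights that do not fit
# into the budget stay in system RAM and run on the CPU.
cmd = [
    POWERINFER_BIN,
    "-m", MODEL_PATH,
    "-p", "Once upon a time",   # example prompt
    "-n", "128",                # number of tokens to generate
    "-t", "8",                  # CPU threads
    "--vram-budget", "7",       # VRAM budget in GiB (flag mentioned above)
]

subprocess.run(cmd, check=True)
```

Setting the budget slightly below the physically available VRAM (e.g. 6 instead of 7 GiB) may leave headroom for the KV cache and any memory reserved by the display.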

@sprappcom
Author

The speed-up isn't obvious to me, and the generation quality isn't ideal at this stage.

Maybe I'll wait for Mistral 7B. I hope to see this go mainstream.

P.S.: I'm testing on a 4060 laptop GPU with 7 GB of usable VRAM; it has 8 GB, but 1 GB seems to be reserved for the display.
