Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Peak gpu memory use not scale linearly with the percentage of gpu usage of weight #108

Open
frankxyy opened this issue Apr 17, 2023 · 0 comments

Comments

@frankxyy
Copy link

frankxyy commented Apr 17, 2023

command 1:
python -m flexgen.flex_opt --model facebook/opt-30b --path _DUMMY_ --prompt-len 20 --gen-len 15 --percent 25 75 60 40 0 100 --gpu-batch-size 1 --num-gpu-batches 2 --cpu-cache-compute --debug fewer_batch
peak gpu mem: 6.0679 GB

command 2:
python -m flexgen.flex_opt --model facebook/opt-30b --path _DUMMY_ --prompt-len 20 --gen-len 15 --percent 30 70 60 40 0 100 --gpu-batch-size 1 --num-gpu-batches 2 --cpu-cache-compute --debug fewer_batch
gpu oom

The only difference of command 2 from command 1 is the percentage of gpu usage of weight to increase from 25% to 30%.

The capacity of my gpu is 24 GB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant