GPU memory overflow in ROCm environment #131

Open
labeldock opened this issue Apr 4, 2024 · 3 comments
Labels: enhancement (New feature or request)

I'm looking for an option to release GPU memory after Whisper tasks. Sometimes my PC shuts down due to overflowing GPU memory.
Running on Debian (Bookworm) + Docker (rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2) + RX 6800

I am aware that there is no official support for Linux or Docker environments.
However, I believe that supporting these options would definitely have a positive impact in the future.

  • Free GPU memory immediately after GENERATE SUBTITLE is finished
  • Free GPU memory if it is not used for a certain period after GENERATE SUBTITLE is finished

I have no experience with Python, PyTorch, etc., so my ability to interpret the project is limited.
If I can determine whether the feature is feasible to implement, I will try to contribute in any way I can.
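
In rough pseudocode terms, what I'm hoping for is something like the sketch below (purely illustrative, since I don't know the project's internals; whisper_model, release_gpu_memory, schedule_idle_release, and IDLE_TIMEOUT_SEC are hypothetical names, not part of this project):

import gc
import threading

import torch

whisper_model = None      # hypothetical global that would hold the loaded model
unload_timer = None
IDLE_TIMEOUT_SEC = 300    # hypothetical idle period (5 minutes) before unloading

def release_gpu_memory():
    # Option 1: free GPU memory right after GENERATE SUBTITLE finishes.
    global whisper_model
    whisper_model = None          # drop the reference to the model's tensors
    gc.collect()                  # let Python actually collect them
    torch.cuda.empty_cache()      # return the now-unused cached blocks to the driver

def schedule_idle_release():
    # Option 2: free GPU memory only after a period of inactivity.
    global unload_timer
    if unload_timer is not None:
        unload_timer.cancel()     # restart the countdown whenever a new task runs
    unload_timer = threading.Timer(IDLE_TIMEOUT_SEC, release_gpu_memory)
    unload_timer.daemon = True
    unload_timer.start()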

labeldock added the enhancement (New feature or request) label on Apr 4, 2024
@jhj0517 (Owner) commented Apr 7, 2024

Hi! We've attempted to address this in #15.
Right now, we're calling torch.cuda.empty_cache() after each transcription.
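
In other words, the flow after each job is roughly the following (a simplified sketch, not the project's actual code; model.transcribe stands in for whichever transcription call is used):

import torch

def run_transcription(model, audio_path):
    # Hypothetical wrapper: run one transcription job.
    result = model.transcribe(audio_path)
    # Afterwards, release PyTorch's cached-but-unused GPU memory.
    # Note: this cannot free memory still held by live tensors (e.g. the loaded model weights).
    torch.cuda.empty_cache()
    return result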

If anyone has an idea or a PR for a better solution, it would be very much appreciated!

@labeldock (Author) commented Apr 9, 2024

@jhj0517
Thank you for your response. Unfortunately, in my environment the memory does not appear to be fully released. I have a follow-up question, to understand whether this is an issue specific to the Docker or ROCm environment.

Checking the documentation at https://pytorch.org/docs/stable/notes/cuda.html#memory-management, there is a section stating that with empty_cache(), "occupied GPU memory by tensors will not be freed." How large is the GPU memory occupied by tensors? Is this phenomenon also present in Nvidia environments?
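
If it helps, the memory-management page linked above suggests these two quantities can be inspected from PyTorch itself (a small sketch based on that page; I have not run this myself, and torch.cuda here also covers ROCm/HIP builds):

import torch

def report_gpu_memory(tag=""):
    # Memory currently held by live tensors; empty_cache() cannot free this part.
    allocated_mib = torch.cuda.memory_allocated() / 1024**2
    # Memory reserved by PyTorch's caching allocator; the unused part is what empty_cache() releases.
    reserved_mib = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag} allocated={allocated_mib:.0f} MiB, reserved={reserved_mib:.0f} MiB")

Calling something like this before and after a transcription should show whether the leftover memory is still "allocated" (the model is still loaded) or merely "reserved" cache.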

Below are the details of my testing:

[Screenshot 2024-04-09 151629]

This is the baseline state. In the screenshot, 'Graphics pipe' shows the energy currently in use, and 'VRAM' shows the GPU memory. Other applications are running, occupying 4607M of graphics memory.

[Screenshot 2024-04-09 151744]

[Screenshot 2024-04-09 152013]

This is during the execution of large-v3; 11569M and 5466M were observed.

[Screenshot 2024-04-09 151947]

Some time after the execution of large-v3 ended, 11116M was observed.

[Screenshot 2024-04-09 152031]

This is during the execution with the medium model. It shows 8238M of memory in use.

[Screenshot 2024-04-09 152122]

This is just after the execution of the medium model has ended. 7838M was observed.

[Screenshot 2024-04-09 153126]

The whisper-webui process has been terminated. The memory has returned to its initial state.

labeldock changed the title from "GPU memory overflow" to "GPU memory overflow in ROCm environment" on Apr 9, 2024
@jhj0517 (Owner) commented Apr 9, 2024

Thanks for sharing your experience!
According to this, "occupied GPU memory by tensors will not be freed" is normal behavior, because empty_cache() only frees cached GPU memory that is no longer referenced by any tensor.

Here's someone's experience running this web UI on an AMD GPU:

According to this, faster-whisper does not work with ROCm.
So if you encounter any error while running this web UI, disabling faster-whisper could be helpful:

python app.py --disable_faster_whisper
