GPU memory overflow in ROCm environment #131

Open
labeldock opened this issue Apr 4, 2024 · 3 comments
Labels: enhancement (New feature or request)

I'm looking for an option to release GPU memory after Whisper tasks. Sometimes my PC shuts down due to overflowing GPU memory.
Running on Debian (Bookworm) + Docker (rocm/pytorch:rocm6.0.2_ubuntu22.04_py3.10_pytorch_2.1.2) + RX 6800

I am aware that there is no official support for Linux or Docker environments.
However, I believe that supporting these options would definitely have a positive impact in the future.

  • Free GPU memory immediately after GENERATE SUBTITLE is finished
  • Free GPU memory if it is not used for a certain period after GENERATE SUBTITLE is finished

I have no experience with Python, PyTorch, etc., so my ability to interpret the project is limited.
If I can determine whether the feature is feasible to implement, I will try to contribute in any way I can.
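
In rough pseudocode terms, what I'm hoping for is something like the sketch below (purely illustrative, since I don't know the project's internals; whisper_model, release_gpu_memory, schedule_idle_release, and IDLE_TIMEOUT_SEC are hypothetical names, not part of this project):

import gc
import threading

import torch

whisper_model = None      # hypothetical global that would hold the loaded model
unload_timer = None
IDLE_TIMEOUT_SEC = 300    # hypothetical idle period (5 minutes) before unloading

def release_gpu_memory():
    # Option 1: free GPU memory right after GENERATE SUBTITLE finishes.
    global whisper_model
    whisper_model = None          # drop the reference to the model's tensors
    gc.collect()                  # let Python actually collect them
    torch.cuda.empty_cache()      # return the now-unused cached blocks to the driver

def schedule_idle_release():
    # Option 2: free GPU memory only after a period of inactivity.
    global unload_timer
    if unload_timer is not None:
        unload_timer.cancel()     # restart the countdown whenever a new task runs
    unload_timer = threading.Timer(IDLE_TIMEOUT_SEC, release_gpu_memory)
    unload_timer.daemon = True
    unload_timer.start()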

labeldock added the enhancement (New feature or request) label on Apr 4, 2024
@jhj0517 (Owner) commented Apr 7, 2024

Hi! We've attempted to address this in #15.
Right now, we're calling torch.cuda.empty_cache() after each transcription.
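
In other words, the flow after each job is roughly the following (a simplified sketch, not the project's actual code; model.transcribe stands in for whichever transcription call is used):

import torch

def run_transcription(model, audio_path):
    # Hypothetical wrapper: run one transcription job.
    result = model.transcribe(audio_path)
    # Afterwards, release PyTorch's cached-but-unused GPU memory.
    # Note: this cannot free memory still held by live tensors (e.g. the loaded model weights).
    torch.cuda.empty_cache()
    return result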

If anyone has an idea or a PR for a better solution, it would be very much appreciated!

@labeldock (Author) commented Apr 9, 2024

@jhj0517
Thank you for your response. Unfortunately, in my environment the memory does not appear to be fully released. I have a follow-up question, to understand whether this is an issue specific to the Docker or ROCm environment.

Checking the documentation at https://pytorch.org/docs/stable/notes/cuda.html#memory-management, there is a section stating that with empty_cache(), "occupied GPU memory by tensors will not be freed." How large is the GPU memory occupied by tensors? Is this phenomenon also present in Nvidia environments?
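
If it helps, the memory-management page linked above suggests these two quantities can be inspected from PyTorch itself (a small sketch based on that page; I have not run this myself, and torch.cuda here also covers ROCm/HIP builds):

import torch

def report_gpu_memory(tag=""):
    # Memory currently held by live tensors; empty_cache() cannot free this part.
    allocated_mib = torch.cuda.memory_allocated() / 1024**2
    # Memory reserved by PyTorch's caching allocator; the unused part is what empty_cache() releases.
    reserved_mib = torch.cuda.memory_reserved() / 1024**2
    print(f"{tag} allocated={allocated_mib:.0f} MiB, reserved={reserved_mib:.0f} MiB")

Calling something like this before and after a transcription should show whether the leftover memory is still "allocated" (the model is still loaded) or merely "reserved" cache.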

Below are the details of my testing:

[Screenshot 2024-04-09 151629]

This is the baseline state. In the screenshot, 'Graphics pipe' shows the energy currently in use, and 'VRAM' shows the GPU memory. Other applications are running, occupying 4607M of graphics memory.

[Screenshot 2024-04-09 151744]

[Screenshot 2024-04-09 152013]

This is during the execution of large-v3; 11569M and 5466M were observed.

[Screenshot 2024-04-09 151947]

Some time after the execution of large-v3 ended, 11116M was observed.

[Screenshot 2024-04-09 152031]

This is during the execution with the medium model. It shows 8238M of memory in use.

[Screenshot 2024-04-09 152122]

This is just after the execution of the medium model has ended. 7838M was observed.

[Screenshot 2024-04-09 153126]

The whisper-webui process has been terminated. The memory has returned to its initial state.

labeldock changed the title from "GPU memory overflow" to "GPU memory overflow in ROCm environment" on Apr 9, 2024
@jhj0517 (Owner) commented Apr 9, 2024

Thanks for sharing your experience!
According to this, "occupied GPU memory by tensors will not be freed" is normal behavior, because empty_cache() only frees cached GPU memory that is no longer referenced by any tensor.

Here's someone's experience running this web UI on an AMD GPU:

According to this, faster-whisper does not work with ROCm.
So if you encounter any error while running this web UI, disabling faster-whisper could be helpful:

python app.py --disable_faster_whisper
