
Question about "CUDA failed with error out of memory" error #809

Open
Jorman opened this issue Apr 25, 2024 · 2 comments


Jorman commented Apr 25, 2024

Hi, I recently changed servers, and the new one gives me an Intel 10th-gen i5 iGPU and an NVIDIA GTX 1650.
I am installing openai-whisper with the faster_whisper engine; this is the docker-compose:

services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: Openai-Whisper
    environment:
      - ASR_MODEL=large
      - ASR_ENGINE=faster_whisper
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - 9000:9000
    restart: unless-stopped

I was hoping to use the large model, but I think it is too big: I get the error "CUDA failed with error out of memory", so it seems the whole model is loaded into the video card's RAM. Maybe what I am asking is impossible, but I don't know how the whole system works, and I don't see any mounted volumes. So I ask, hoping the question is not too stupid: can't you download the model locally and use it without loading it all into memory? Or is there a way to share system RAM with the video card's RAM when needed?

J
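For context: the GTX 1650 has 4 GB of VRAM, and the large model in faster_whisper's default float16 precision generally needs more than that, which matches the out-of-memory error above. A minimal sketch of the same compose file with a smaller model that is more likely to fit in 4 GB (medium here is illustrative, not a tested recommendation):

services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    container_name: Openai-Whisper
    environment:
      # medium (or small) is more likely to stay within the 1650's 4 GB of VRAM
      - ASR_MODEL=medium
      - ASR_ENGINE=faster_whisper
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - 9000:9000
    restart: unless-stopped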

Purfview (Contributor) commented Apr 26, 2024

Use a smaller model or use device="cpu".

can't you download the model locally and use it without loading it all into memory?

You can't.
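Spelled out as configuration, Purfview's two options could look like the sketch below. device="cpu" is the faster_whisper library parameter; whether this webservice exposes it through an ASR_DEVICE environment variable is an assumption to verify against the project's README:

services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest-gpu
    environment:
      # option 1: a smaller model that fits the available VRAM
      - ASR_MODEL=small
      - ASR_ENGINE=faster_whisper
      # option 2: run inference on the CPU instead of the GPU
      # (ASR_DEVICE is an assumed variable name; check the project's README)
      # - ASR_DEVICE=cpu
    ports:
      - 9000:9000
    restart: unless-stopped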


Jorman commented Apr 26, 2024

Thank you @Purfview for your reply.
OK for the CPU, but what is the difference if I run this docker-compose instead?

services:
  whisperasr:
    image: onerahmet/openai-whisper-asr-webservice:latest
    environment:
      - ASR_MODEL=small
      - ASR_ENGINE=faster_whisper
    ports:
      - 9000:9000
    restart: unless-stopped

Is it the same, or is it better to stay with the GPU image and explicitly set CPU mode?
