Hi,
I'm facing a problem with nvtop in a Slurm interactive session. I get an interactive session on a machine with two GPUs. Slurm controls access to them, and in this particular case I'm requesting just one. I have verified that I can use the GPU for computation, and nvidia-smi detects it (it shows only one GPU, because that is all Slurm is giving me access to), but as you can see below, nvtop says there is no GPU to monitor. I have no idea what could be going on here. Any ideas on how to debug this issue, or things I could try?
Thanks,
$ nvidia-smi
Tue Jan 9 07:20:01 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06 Driver Version: 545.23.06 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P100-PCIE-12GB On | 00000000:02:00.0 Off | 0 |
| N/A 39C P0 25W / 250W | 0MiB / 12288MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
$ nvtop
No GPU to monitor.
Hello,
I don't know how Slurm allocates the GPUs, but could you check whether the library libnvidia-ml.so is available?
That is the library nvtop uses to get GPU information. nvidia-smi either queries the driver directly or is statically linked against this library, and hence works without it.
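As a starting point, one way to check this from inside the Slurm session (a sketch, assuming a typical glibc-based Linux setup where `ldconfig` and `ldd` are available) is:

```shell
# Check whether the NVML shared library (libnvidia-ml.so) is visible to
# the dynamic linker; nvtop needs it to enumerate NVIDIA GPUs.
ldconfig -p | grep libnvidia-ml || echo "libnvidia-ml.so not in linker cache"

# Also inspect which NVIDIA libraries the nvtop binary resolves at load time:
ldd "$(command -v nvtop)" | grep -i nvidia || echo "no NVIDIA libraries resolved"
```

If the library only exists under a non-standard path, checking `LD_LIBRARY_PATH` inside and outside the Slurm allocation may also be informative, since Slurm prolog scripts can alter the environment.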