I know that nvidia-smi generates an overview like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P4000 Off | 0000:01:00.0 Off | N/A |
| N/A 43C P0 26W / N/A | 227MiB / 8114MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1724 G /usr/bin/X 219MiB |
| 0 8074 G qtcreator 6MiB |
+-----------------------------------------------------------------------------+
However, I'd like a per-process breakdown of these parameters (e.g. GPU utilization, used memory). I can't find a corresponding query option, but then again I can't imagine that such a basic function isn't implemented. Hence:
Is there an easy way to display the GPU parameters for each process?
I don't think you'll get any closer than nvidia-smi pmon:
# gpu pid type sm mem enc dec fb command
# Idx # C/G % % % % MB name
0 1750 G 1 0 0 0 179 X
0 3734 G 0 0 0 0 7 qtcreator
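If you need these per-process figures programmatically, one option (a sketch, not an official API) is to capture the output of `nvidia-smi pmon -c 1` and parse it. The parser below assumes the column layout shown in the sample above and that process names contain no spaces; check the header line your driver version prints before relying on it:

```python
def parse_pmon(output):
    """Parse `nvidia-smi pmon`-style text into per-process dicts.

    Assumes the two-line '#' header shown above and whitespace-separated
    columns; assumes the command name contains no spaces.
    """
    fields = ["gpu", "pid", "type", "sm", "mem", "enc", "dec", "fb", "command"]
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip the two header lines
        row = dict(zip(fields, line.split()))
        # convert numeric columns; pmon prints '-' when no data is available
        for key in ("gpu", "pid", "sm", "mem", "enc", "dec", "fb"):
            row[key] = None if row[key] == "-" else int(row[key])
        rows.append(row)
    return rows

# Sample taken from the pmon output above
sample = """\
# gpu   pid  type  sm  mem  enc  dec  fb   command
# Idx     #  C/G    %    %    %    %  MB   name
    0  1750   G     1    0    0    0  179  X
    0  3734   G     0    0    0    0    7  qtcreator
"""
procs = parse_pmon(sample)
```

In practice you would feed it `subprocess.check_output(["nvidia-smi", "pmon", "-c", "1"], text=True)` instead of the sample string.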
I'm connected to Google Colab through SSH (using this method). I get the following error when trying to use the GPU.
python lstm_example.py
Num GPUs Available: 1
(25000,)
(25000,)
2022-03-21 12:43:53.301917: W tensorflow/stream_executor/cuda/cuda_driver.cc:374] A non-primary context 0x559ed434d210 for device 0 exists before initializing the StreamExecutor. The primary context is now 0. We haven't verified StreamExecutor works with that.
2022-03-21 12:43:53.302331: F tensorflow/core/platform/statusor.cc:33] Attempting to fetch value instead of handling error INTERNAL: failed initializing StreamExecutor for CUDA device ordinal 0: INTERNAL: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_UNKNOWN: unknown error
Aborted (core dumped)
GPU info
nvidia-smi
Mon Mar 21 13:00:24 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.54 Driver Version: 460.32.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 50C P0 59W / 149W | Function Not Found | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
/usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
I've added the following lines:
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]="0"
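One thing worth double-checking: these assignments only take effect if they run before TensorFlow first initializes the CUDA runtime, i.e. before the `import` (or at least before the first GPU-touching call). A minimal ordering sketch:

```python
import os

# These must be set before TensorFlow initializes CUDA, since the CUDA
# runtime reads CUDA_VISIBLE_DEVICES once, at initialization time.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # import TF only after the variables are set
```

If the variables are set after TensorFlow has already initialized the driver, they are silently ignored, which can make a script behave differently from a notebook where the import order happens to differ.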
The same code works when run in a notebook cell. I also notice that Memory_Usage is available when running nvidia-smi from a notebook and the CUDA version used is different (11.2).
Tue Mar 22 10:52:20 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 43C P8 31W / 149W | 3MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
I am using this code (please excuse its messiness) to run on my CPU. I have a custom RL environment that I created myself, and I am using a DQN agent.
But when I run this code on a GPU, it barely utilizes the GPU, and it is in fact slower than on my CPU.
This is the output of nvidia-smi. As you can see, my process is running on the GPUs, but it is much slower than I would expect.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN Xp Off | 00000000:00:05.0 Off | N/A |
| 23% 37C P2 60W / 250W | 11619MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN Xp Off | 00000000:00:06.0 Off | N/A |
| 23% 29C P8 9W / 250W | 157MiB / 12196MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 25540 C python3 11609MiB |
| 1 25540 C python3 147MiB |
+-----------------------------------------------------------------------------+
Can anyone point out what I can do to change my code to make better use of the GPU?
PS: Note that I have two GPUs and my process runs on both of them. Even if I restrict it to a single GPU, the GPU is still barely utilized and the run is slower than on the CPU, so having two GPUs is not the issue.
I am currently running a few TensorFlow training jobs on GPUs and am trying to export models from one such job. I have set
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
both in code and in the terminal. I have also removed all mentions of GPU devices from the training code, and moved graph.pbtxt out of the way. I used inspect_checkpoint.py to verify that the model checkpoint keys contain no mention of GPUs either. I have also set
session_config = tf.ConfigProto(
    device_count={'GPU': 0 if export else config.num_gpus},
    allow_soft_placement=True,
    gpu_options=None if export else tf.GPUOptions(allow_growth=True))
Still I am getting the following error message towards the end of export:
2018-09-15 03:20:30.597742: E
tensorflow/core/common_runtime/direct_session.cc:158] Internal: failed
initializing StreamExecutor for CUDA device ordinal 0: Internal: failed
call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory
reported: 16936861696
nvidia-smi | head -20
Sat Sep 15 03:25:28 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-PCIE... Off | 00000000:02:00.0 Off | 0 |
| N/A 35C P0 50W / 250W | 15800MiB / 16152MiB | 89% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-PCIE... Off | 00000000:04:00.0 Off | 0 |
| N/A 35C P0 37W / 250W | 0MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-PCIE... Off | 00000000:83:00.0 Off | 0 |
| N/A 36C P0 37W / 250W | 0MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-PCIE... Off | 00000000:84:00.0 Off | 0 |
| N/A 38C P0 39W / 250W | 0MiB / 16152MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
I have access to a cluster that's run by Slurm, in which each node has 4 GPUs.
I have code that needs 8 GPUs.
So the question is: how can I request 8 GPUs on a cluster where each node has only 4 GPUs?
So this is the job that I tried to submit via sbatch:
#!/bin/bash
#SBATCH --gres=gpu:8
#SBATCH --nodes=2
#SBATCH --mem=16000M
#SBATCH --time=0-01:00
But then I get the following error:
sbatch: error: Batch job submission failed: Requested node configuration is not available
Then I changed the settings to this and submitted again:
#!/bin/bash
#SBATCH --gres=gpu:4
#SBATCH --nodes=2
#SBATCH --mem=16000M
#SBATCH --time=0-01:00
nvidia-smi
and the result shows only 4 GPUs, not 8.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66 Driver Version: 375.66 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... Off | 0000:03:00.0 Off | 0 |
| N/A 32C P0 31W / 250W | 0MiB / 12193MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... Off | 0000:04:00.0 Off | 0 |
| N/A 37C P0 29W / 250W | 0MiB / 12193MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla P100-PCIE... Off | 0000:82:00.0 Off | 0 |
| N/A 35C P0 28W / 250W | 0MiB / 12193MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla P100-PCIE... Off | 0000:83:00.0 Off | 0 |
| N/A 33C P0 26W / 250W | 0MiB / 12193MiB | 4% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Thanks.
Slurm does not support what you need. It can only assign GPUs per node to your job, not GPUs per cluster.
So, unlike CPUs and other consumable resources, GPUs are not consumable and are bound to the node that hosts them.
If you are interested in this topic, there is a research effort to turn GPUs into consumable resources; check this paper.
There you'll find how to do it using GPU virtualization technologies.
Job script: you are requesting 2 nodes, each with 4 GPUs, so a total of 8 GPUs are assigned to you. You are then running nvidia-smi, which is aware of neither Slurm nor MPI: it runs only on the first node assigned to you, so it shows only 4 GPUs. The result is normal.
If you run a GPU-based engineering application such as Ansys HFSS or CST, it can use all 8 GPUs.
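To confirm that both nodes actually received 4 GPUs each, one option is to launch nvidia-smi through srun so it runs once per node. A sketch of such a job script (adjust the options to your cluster's conventions; `--ntasks-per-node` here is only for the diagnostic step):

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:4          # 4 GPUs per node -> 8 GPUs in total
#SBATCH --ntasks-per-node=1
#SBATCH --mem=16000M
#SBATCH --time=0-01:00

# srun starts one task on each allocated node, so each node
# prints its own view of its 4 GPUs
srun nvidia-smi
```

Your application itself must then be launched in a way that spans both nodes (e.g. via srun or MPI); a plain command in the batch script runs on the first node only.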
I just built TensorFlow v1.0 and I am trying to run the MNIST test just to see if it's working. It seems to work, but I am observing weird behaviour.
My system has two Tesla P100s, and nvidia-smi shows the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 361.107 Driver Version: 361.107 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-SXM2... Off | 0002:01:00.0 Off | 0 |
| N/A 34C P0 114W / 300W | 15063MiB / 16280MiB | 51% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-SXM2... Off | 0006:01:00.0 Off | 0 |
| N/A 27C P0 35W / 300W | 14941MiB / 16280MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 67288 C python3 15061MiB |
| 1 67288 C python3 14939MiB |
+-----------------------------------------------------------------------------+
As shown, python3 consumed all the memory on both GPUs, but the computational load is placed only on the first one.
By exporting CUDA_VISIBLE_DEVICES I can limit which GPU is used, but it doesn't affect the computation time, so there is no gain from adding the second GPU.
Single GPU:
real 2m23.496s
user 4m26.597s
sys 0m12.587s
Two GPUs:
real 2m18.165s
user 4m18.625s
sys 0m12.958s
So the question is: how do I load both GPUs?
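For context on why nothing speeds up automatically: TensorFlow 1.x does not split a model across GPUs on its own; you have to build one "tower" per GPU, feed each tower a shard of the batch, and average the gradients (classic data parallelism). Leaving the TF-specific tower code aside, the batch-sharding step looks like this (a generic sketch; `shard_batch` and `n_devices` are illustrative names, not TF API):

```python
def shard_batch(batch, n_devices):
    """Split a batch (a list of examples) into n_devices near-equal shards,
    one per GPU tower. Each tower computes gradients on its shard, and the
    gradients are averaged before the weight update."""
    base, extra = divmod(len(batch), n_devices)
    shards, start = [], 0
    for i in range(n_devices):
        size = base + (1 if i < extra else 0)  # spread the remainder
        shards.append(batch[start:start + size])
        start += size
    return shards

# Example: 5 examples over 2 GPUs -> shards of size 3 and 2
shards = shard_batch([1, 2, 3, 4, 5], 2)
```

The memory behaviour you see is separate: by default TF1 grabs nearly all memory on every visible GPU regardless of where the compute runs, which is why both cards show ~15 GiB used while only GPU 0 shows load.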