What is the difference between nvidia-smi exclusive mode and export CUDA_VISIBLE_DEVICES?

I read the NVIDIA MPS docs but still can't understand the exact difference between
nvidia-smi -i 0 -c 3 and export CUDA_VISIBLE_DEVICES=0.
Can someone explain the difference?
Thank you.
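For context, the two commands operate at different levels: compute mode is a driver-side, device-wide policy set with nvidia-smi, while CUDA_VISIBLE_DEVICES is a per-process environment filter. A minimal sketch of the contrast:

```shell
# Compute mode is a device-wide driver setting (requires admin rights
# and affects every process on the machine):
#
#   nvidia-smi -i 0 -c 3    # 3 = EXCLUSIVE_PROCESS: at most one process
#                           # may hold a CUDA context on GPU 0
#
# CUDA_VISIBLE_DEVICES, by contrast, only filters which devices this
# shell and its child processes can enumerate; other users' processes
# are unaffected:
export CUDA_VISIBLE_DEVICES=0   # children of this shell see only GPU 0
```

So exclusive mode enforces single-process access at the driver level, while CUDA_VISIBLE_DEVICES merely hides devices from one process tree.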

Related

How to select a GPU with a minimum of 20 GB GPU memory in qsub/PBS (for TensorFlow 2.0)?

On a node of our cluster we have GPUs, some of which are already in use by other users. I am submitting a job with qsub that runs a Jupyter notebook on one GPU.
#!/bin/sh
#PBS -N jupyter_gpu
#PBS -q long
##PBS -j oe
#PBS -m bae
#PBS -l nodes=anodeX:ppn=16:gpus=3
jupyter-notebook --port=111 --ip=anodeX
However, I find that qsub assigns me a GPU that is already in use (the available memory shown is pretty low), so my code fails with an out-of-memory error. If I ask for more GPUs (say 3), the code runs fine only if GPU:0 has sufficient memory. I am struggling to understand what is happening.
Is there a way to request gpu-memory in qsub?
Note that #PBS -l mem=20gb requests only CPU memory. I am using TensorFlow 2.9.1.
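As a practical stop-gap (not a real qsub resource request), the job script itself can pick the GPU with the most free memory and restrict the process to it via CUDA_VISIBLE_DEVICES. A sketch, assuming an nvidia-smi recent enough to support --query-gpu:

```shell
# Pick the GPU index with the most free memory and expose only that
# device to the job. The --query-gpu flags assume a reasonably recent
# nvidia-smi; the parsing is plain POSIX shell.
free_gpu=$(nvidia-smi --query-gpu=index,memory.free \
                      --format=csv,noheader,nounits \
             | sort -t, -k2 -rn | head -n1 | cut -d, -f1)
export CUDA_VISIBLE_DEVICES=$free_gpu
```

This only guards against picking a busy GPU at job start; it does not reserve the memory, so a later job can still land on the same device.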

shuf generates "Bad file descriptor" error on NFS, but only when run as a background process

Here is an interesting mystery ...
This code ...
shuf $TRAINING_UNSHUFFLED > $TRAINING_SHUFFLED
wc -l $TRAINING_UNSHUFFLED
wc -l $TRAINING_SHUFFLED
shuf $VALIDATION_UNSHUFFLED > $VALIDATION_SHUFFLED
wc -l $VALIDATION_UNSHUFFLED
wc -l $VALIDATION_SHUFFLED
generates this error ...
shuf: read error: Bad file descriptor
8122 /nfs/digits/datasets/com-aosvapps-distracted-driving3/databases/TrainImagePathsAndLabels_AlpineTest1.csv
0 /nfs/digits/datasets/com-aosvapps-distracted-driving3/databases/TrainImagePathsAndLabels_AlpineTest1_Shuffled.csv
shuf: read error: Bad file descriptor
882 /nfs/digits/datasets/com-aosvapps-distracted-driving3/databases/ValImagePathsAndLabels_AlpineTest1.csv
0 /nfs/digits/datasets/com-aosvapps-distracted-driving3/databases/ValImagePathsAndLabels_AlpineTest1_Shuffled.csv
but ONLY when I run it as a background job like so ...
tf2$nohup ./shuffle.sh >> /tmp/shuffle.log 2>&1 0>&- &
[1] 6897
When I run it directly in an interactive shell, it seems to work fine.
tf2$./shuffle.sh > /tmp/shuffle.log
I am guessing that this has something to do with the fact that both the input and output files reside on an NFS share on a different AWS EC2 instance.
The severing of stdin, stdout and stderr from the terminal in the background example is suspicious. This is done so that the process will not die when the terminal session is closed. I have many other commands that read and write files on this share without any problems at all; only the shuf command is being difficult.
I am curious as to what might be causing this and if it is fixable without seeking an alternative to shuf?
I am using shuf (GNU coreutils) 8.21 on Ubuntu 14.04.5 LTS.
tf2$which shuf
/usr/bin/shuf
tf2$shuf --version
shuf (GNU coreutils) 8.21
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Paul Eggert.
tf2$lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.5 LTS
Release: 14.04
Codename: trusty
UPDATE: eliminating the severing of stdin makes the problem go away.
I.e., if instead of doing this ...
$nohup ./shuffle.sh > /tmp/shuffle.log 2>&1 0>&- &
I do this ...
$nohup ./shuffle.sh > /tmp/shuffle.log 2>&1 &
the "Bad file descriptor" error goes away.
However, the severing of stdin/stdout/stderr is there to ensure that killing the terminal session will not kill the process, so this solution is not entirely satisfactory.
Furthermore, it only seems to be necessary for shuf. None of the other commands that read files from this file system cause any errors.
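A commonly used middle ground is to redirect stdin from /dev/null instead of closing it with 0>&-: the job still survives the terminal going away, but every child keeps a valid descriptor 0. A self-contained sketch (demo.sh is a stand-in for the shuffle.sh of the question):

```shell
# Stand-in for shuffle.sh: checks that fd 0 is open, as shuf needs.
cat > /tmp/demo.sh <<'EOF'
#!/bin/sh
head -c 1 /dev/stdin > /dev/null && echo "stdin is open"
EOF
chmod +x /tmp/demo.sh

# Redirect stdin from /dev/null rather than severing it with 0>&-:
nohup /tmp/demo.sh > /tmp/demo.log 2>&1 < /dev/null &
wait
cat /tmp/demo.log
```

With 0>&- the helper's open of /dev/stdin would fail; with < /dev/null it succeeds and reads EOF, which is exactly what shuf can tolerate.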
This turned out to be a bug in glibc.
The details are here:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=25029
The work-around is simple:
instead of
shuf $TRAINING_UNSHUFFLED > $TRAINING_SHUFFLED
do
shuf < $TRAINING_UNSHUFFLED > $TRAINING_SHUFFLED
Thanks to Pádraig Brady on the coreutils team.

Failed to configure DPDK in OVS: DPDK support not built

I am installing DPDK in Open vSwitch (OVS).
https://github.com/openvswitch/ovs/blob/master/INSTALL.DPDK.md
The problem occurs when I run these two commands:
export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
sudo ovs-vswitchd --dpdk -c 0x1 -n 4 -- unix:$DB_SOCK --pidfile --detach
I got error:
ovs-vswitchd: DPDK support not built into this copy of Open vSwitch.
Could anyone please explain how to fix this problem?
Thanks in advance for your help!
You need to compile OVS against DPDK, with DPDK support enabled (--with-dpdk):
export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/
./configure --with-dpdk=$DPDK_BUILD
make
make install
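Fleshing out the fix: DPDK has to be built first, and the path handed to --with-dpdk must point at that build. A sketch for the legacy make-based DPDK build system; all paths here are example values:

```shell
# 1. Build DPDK (x86_64-native-linuxapp-gcc is the common gcc target):
export DPDK_DIR=/usr/src/dpdk
export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc
cd $DPDK_DIR
make install T=x86_64-native-linuxapp-gcc

# 2. Rebuild Open vSwitch against that DPDK build:
cd /usr/src/ovs
./configure --with-dpdk=$DPDK_BUILD
make
sudo make install
```

After reinstalling, restart ovs-vswitchd; the "DPDK support not built" error means the currently installed binary was configured without --with-dpdk.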

Tesla GPU Usage

My machine has three GPUs connected, i.e. Tesla M2090 cards, and I want to get their usage. There is a tool called NVIDIA SMI which shows GPU usage, but when I tried to run it with the option nvidia-smi.exe -d I could not get what I wanted (I want to know memory and GPU utilization). Please help.
Driver version : 275.65
OS: Windows Server 2008 R2
Yes, I got it by following
https://developer.nvidia.com/sites/default/files/akamai/cuda/files/CUDADownloads/NVML_cuda5/nvidia-smi.4.304.pdf
I used the following command:
nvidia-smi.exe -q -d MEMORY,UTILIZATION
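On later driver/tool versions the same information is also available in a script-friendly form; the flags below assume a newer nvidia-smi than the 275-era one in the question:

```shell
# CSV output is easier to log or parse than the human-readable report:
nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
           --format=csv
```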

Compiling SSH using the Intel compiler

Do you think it's possible to compile SSH using the Intel compiler? I don't really know where to start and there's not much info on Google, so I thought I'd ask the community.
I really want to take advantage of the compression performance improvements. My idea is to set up an unencrypted SSH tunnel (but with maximum compression) as follows:
ssh -N -g -f -C -o CompressionLevel=9 -o Cipher=none eamorr@172.16.1.218 -L 6999:172.16.1.218:3129
Any advice greatly appreciated,
Build instructions for OpenSSH can be found here: http://unixwiz.net/techtips/openssh.html.
When you do the ./configure step, you'll want to run something like ./configure CC=icc CXX=icpc in order to use the Intel compiler rather than gcc.
If you've done it right, then when you subsequently run make you should see that the compile lines start with icc ... or icpc ... rather than gcc ... or g++ ....
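Putting the answer's steps together as one sequence (the source directory name is a placeholder, and icc/icpc are assumed to be on PATH):

```shell
# Configure the OpenSSH source tree to build with the Intel compilers
# instead of gcc/g++:
cd openssh-src        # placeholder for the unpacked source directory
./configure CC=icc CXX=icpc
make
# Spot-check: the build log should show compile lines starting with "icc".
```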