How to select a gpu with minimum gpu-memory of 20GB in qsub/PBS (for tensorflow2.0)?

One node of our cluster has several GPUs, some of which are already in use by someone else. I am submitting a job with qsub that runs a Jupyter notebook on one GPU.
#!/bin/sh
#PBS -N jupyter_gpu
#PBS -q long
##PBS -j oe
#PBS -m bae
#PBS -l nodes=anodeX:ppn=16:gpus=3
jupyter-notebook --port=111 --ip=anodeX
However, I find that qsub allocates the GPU that is already in use (the available memory shown is pretty low), so my code fails with an out-of-memory error. If I ask for more GPUs (say 3), the code runs fine only if GPU:0 has sufficient memory. I am struggling to understand what is happening.
Is there a way to request gpu-memory in qsub?
Note that #PBS -l mem=20gb requests only CPU (host) memory. I am using tensorflow 2.9.1.
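I am not aware of a standard PBS resource for free GPU memory, so one possible workaround is to choose the device inside the job. By default TensorFlow places work on GPU:0 of the visible devices, which would match the behaviour described above. The sketch below assumes nvidia-smi is available on the node; the function names (parse_free_memory, pick_gpu, select_gpu) and the 20 GiB threshold are illustrative, not part of PBS or TensorFlow.

```python
import os
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, free_MiB' lines into a list of (index, free_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, free_mib = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(free_mib)))
    return gpus


def pick_gpu(gpus, min_free_mib):
    """Return the index of the GPU with the most free memory, or None if none qualifies."""
    candidates = [(free, index) for index, free in gpus if free >= min_free_mib]
    if not candidates:
        return None
    return max(candidates)[1]


def select_gpu(min_free_mib=20 * 1024):
    """Expose only one sufficiently free GPU to CUDA (hypothetical helper)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    index = pick_gpu(parse_free_memory(out), min_free_mib)
    if index is None:
        raise RuntimeError("no GPU with enough free memory on this node")
    # Make CUDA number devices the same way nvidia-smi does.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index
```

select_gpu() would have to run before TensorFlow is imported (for example in the first notebook cell), because the environment variables are only read when CUDA is initialised. Another job can still take memory on the same GPU afterwards, so this only helps at start-up.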

Related

Inconsistent performance of GPU subclusters

I'm running my MATLAB code on subclusters provided by my school. One subcluster named 'G' uses Nvidia A100 GPU cards and has 12 nodes (G[000-011]) with 128 cores per node.
Whenever I run my code on G[005] or G[006], it finishes in just 2 hours. Strangely, when I run it on any other node (i.e. G[000-004, 007-011]), the computation becomes extremely slow (> 4 hours). Since all the nodes should be using the same hardware, I have no idea what is causing this difference.
Does anyone have an idea what is going on? Below is my SLURM job submission file.
Note that I have already consulted the support center at my school, but they have no idea about this problem yet either, so I thought I could get some help here.
#!/bin/sh -l
#SBATCH -A standby
#SBATCH -N 1
#SBATCH -G 1
#SBATCH -n 12
#SBATCH -t 4:00:00
#SBATCH --constraint="C|G|I|J"
#SBATCH --output=slurm-%j-%N.out
/usr/bin/sacct -j "$SLURM_JOBID" --batch-script
/usr/bin/sacct -j "$SLURM_JOBID" --format=NodeList,JobID
echo "------------------------"
cd ..
module load matlab/R2022a
matlab -batch "myfuncion(0,0,0)"

Stuck at training model with CPU

As the example points out:
docker run -it -p 8500:8500 --gpus all tensorflow/serving:latest-devel
should train the MNIST model. However, I want to use an Intel CPU for training, not a GPU. No luck: it gets stuck at "Training model...".
Here is the command I used:
docker run -it -p 8500:8500 tensorflow/serving:latest-devel

I found out that it downloads resources first, which sometimes requires a proxy.

SLURM overcommitting GPU

How can one run multiple jobs in parallel on one GPU? One option that works is to run a script that spawns child processes. But is there also a way to do it with SLURM itself? I tried
#!/usr/bin/env bash
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --overcommit
srun python script1.py &
srun python script2.py &
wait
But that still runs them sequentially.
EDIT: We still want to allocate resources exclusively, i.e. one SBATCH job should allocate a whole GPU for itself. The question is whether there is an easy way to start multiple scripts in parallel within the SBATCH job, without having to set up a multiprocessing environment.
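For reference, the "script that spawns child processes" option mentioned above can be quite small. This is only a sketch of that workaround, not a SLURM feature: run_in_parallel is a hypothetical helper, and it assumes the job script calls this launcher once so that all children inherit the single allocated GPU.

```python
import subprocess
import sys


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as the commands.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example use inside the launcher (script names taken from the question):
#   codes = run_in_parallel([[sys.executable, "script1.py"],
#                            [sys.executable, "script2.py"]])
#   sys.exit(0 if all(code == 0 for code in codes) else 1)
```

Both scripts then share the GPU memory, so each one has to fit alongside the other.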

Can ARM qemu system emulator boot from card image without kernel param?

I've seen a lot of examples of how to run a QEMU ARM board emulator. In every case, besides the SD card image parameter, QEMU was also supplied with a kernel parameter, i.e.:
# the -kernel option below is the kernel parameter in question
qemu-system-arm -M versatilepb \
-kernel vmlinuz-2.6.18-6-versatile \
-initrd initrd.gz \
-hda hda.img -append "root=/dev/ram"
I am playing with bootloaders and want to create my own bootable SD card, but I don't have a real board yet and want to learn with an emulated one. However, if run as described above, QEMU skips the bootloader stage and runs the kernel directly.
So what should I do to emulate a full boot sequence on QEMU so that it executes the bootloader? Should I get a ROM dump and pass it as a -bios param?

You can do that by feeding QEMU a U-Boot image. I have never used a ROM dump.
QEMU BOOT SEQUENCE:
On real, physical boards the boot process usually involves a non-volatile memory (e.g. a Flash) containing a boot-loader and the operating system. On power on, the core loads and runs the boot-loader, that in turn loads and runs the operating system.
QEMU has the possibility to emulate Flash memory on many platforms, but not on the VersatilePB. There are patches and procedures available that can add flash support, but for now I wanted to leave QEMU as it is.
QEMU can load a Linux kernel using the -kernel and -initrd options; at a low level, these options have the effect of loading two binary files into the emulated memory: the kernel binary at address 0x10000 (64KiB) and the ramdisk binary at address 0x800000 (8MiB).
Then QEMU prepares the kernel arguments and jumps at 0x10000 (64KiB) to execute Linux. I wanted to recreate this same situation using U-Boot, and to keep the situation similar to a real one I wanted to create a single binary image containing the whole system, just like having a Flash on board. The -kernel option in QEMU will be used to load the Flash binary into the emulated memory, and this means the starting address of the binary image will be 0x10000 (64KiB).
This example is based on the ARM versatilepb board
* download u-boot-xxx.x source tree and extract it
* cd into the source tree directory and build it
make CROSS_COMPILE=arm-none-eabi- versatilepb_config
make CROSS_COMPILE=arm-none-eabi- all
Creating the Flash image
mkimage -A arm -C none -O linux -T kernel -d zImage -a 0x00010000 -e 0x00010000 zImage.uimg
mkimage -A arm -C none -O linux -T ramdisk -d rootfs.img.gz -a 0x00800000 -e 0x00800000 rootfs.uimg
dd if=/dev/zero of=flash.bin bs=1 count=6M
dd if=u-boot.bin of=flash.bin conv=notrunc bs=1
dd if=zImage.uimg of=flash.bin conv=notrunc bs=1 seek=2M
dd if=rootfs.uimg of=flash.bin conv=notrunc bs=1 seek=4M
Booting Linux
To boot Linux we can finally call:
qemu-system-arm -M versatilepb -m 128M -kernel flash.bin -serial stdio
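To make the layout produced by the dd commands above explicit, here is a rough Python equivalent. It is a sketch only: build_flash_image, address_in_memory and the constants are illustrative names, and the file names are the ones used in the commands above. Because -kernel loads the image at 0x10000, each part ends up in memory at 0x10000 plus its offset in flash.bin.

```python
MIB = 1024 * 1024
LOAD_ADDRESS = 0x10000  # where QEMU's -kernel option places the image


def build_flash_image(parts, size, out_path):
    """Write each (path, offset) file at its offset in a zero-filled image."""
    image = bytearray(size)
    for path, offset in parts:
        with open(path, "rb") as f:
            data = f.read()
        if offset + len(data) > size:
            raise ValueError(f"{path} does not fit in the image")
        image[offset:offset + len(data)] = data
    with open(out_path, "wb") as f:
        f.write(image)


def address_in_memory(offset):
    """Address at which a part at this flash offset appears once loaded."""
    return LOAD_ADDRESS + offset


# Same layout as the dd commands: U-Boot at 0, kernel at 2 MiB, ramdisk at 4 MiB.
# build_flash_image(
#     [("u-boot.bin", 0), ("zImage.uimg", 2 * MIB), ("rootfs.uimg", 4 * MIB)],
#     6 * MIB,
#     "flash.bin",
# )
```

With this layout the kernel image sits at 0x210000 and the ramdisk at 0x410000, which are the addresses U-Boot would need in order to boot them. The sketch does not check that parts overlap, so u-boot.bin and zImage.uimg each have to stay under 2 MiB.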

You will need to pass it some kind of bootloader image via -bios (or a pflash option), yes. I doubt that a ROM dump would work though -- typically the ROM will assume much closer fidelity to the real hardware than QEMU provides. You'd want a bootloader written and tested to work with QEMU. One example of that is if you use the 'virt' board and a UEFI image which is built for QEMU.
Otherwise QEMU will use its "built in bootloader" which is a handful of instructions that are capable of booting the kernel you pass it with -kernel.

How can a specific application be monitored by perf inside the kvm?

I have an application that I want to monitor with perf stat while it runs inside a KVM VM.
After Googling I found that perf kvm stat can do this. However, running this command fails:
sudo perf kvm stat record -p appPID
It just prints the usage help:
usage: perf kvm stat record [<options>]
-p, --pid <pid> record events on existing process id
-t, --tid <tid> record events on existing thread id
-r, --realtime <n> collect data with this RT SCHED_FIFO priority
--no-buffering collect data without buffering
-a, --all-cpus system-wide collection from all CPUs
-C, --cpu <cpu> list of cpus to monitor
-c, --count <n> event period to sample
-o, --output <file> output file name
-i, --no-inherit child tasks do not inherit counters
-m, --mmap-pages <pages[,pages]>
number of mmap data pages and AUX area tracing mmap pages
-v, --verbose be more verbose (show counter open errors, etc)
-q, --quiet don't print any message
Does anyone know what the problem is?

Use KVM with a vPMU (virtualization of the PMU counters), see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/sect-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools-vPMU.html ("2.2. VIRTUAL PERFORMANCE MONITORING UNIT (VPMU)"). Then run perf record -p $pid and perf stat -p $pid inside the guest.
The host system has no knowledge (tables) of guest processes. They are managed by the guest kernel, which may not be Linux, or may be a different Linux version with an incompatible table format, so the host kernel can't profile a specific guest process. It can only profile the whole guest, and there is the perf kvm command for that: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Virtualization_Tuning_and_Optimization_Guide/chap-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools.html#sect-Virtualization_Tuning_Optimization_Guide-Monitoring_Tools-perf_kvm
