Wrong qstat GPU resource count SGE - gpu

I have a GPU resource called gpus. When I run qstat -F gpus I get odd output of the form "qc:gpus=-1", i.e. a negative number of available GPUs is reported, while qstat -g c says I have multiple GPUs available. Multiple jobs fail because of "unavailable gpus". It's as if the GPU count starts from 1 instead of 8 on each node, so it goes negative as soon as more than one GPU is used. My queue configuration is:
hostlist node-01 node-02 node-03 node-04 node-05
seq_no 0
load_thresholds NONE
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list smp mpich2
rerun FALSE
slots 1,[node-01=8],[node-02=8],[node-03=8],[node-04=8],[node-05=8]
Does anyone have any idea why this is happening?

I believe the "gpus" complex value is set in the host configuration. You can see it if you do
qconf -se node-01
And you can check the definition of the "gpus" complex with
qconf -sc
For instance, my UGE has this definition for the "ngpus" complex:
#name shortcut type relop requestable consumable default urgency
ngpus gpu INT <= YES YES 0 1000
And an example node "qconf -se gpu01":
hostname gpu01.cm.cluster
...
complex_values exclusive=true,m_mem_free=65490.000000M, \
m_mem_free_n0=32722.546875M,m_mem_free_n1=32768.000000M, \
ngpus=2,slots=16,vendor=intel
You can modify the value with "qconf -me node-01". See the complex(5) man page for details.
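For example, if each node really does have 8 GPUs, the host entry edited via "qconf -me node-01" should end up containing something like this (a sketch only; your exact complex_values line will look different):
complex_values gpus=8
If the consumable is instead left at a much smaller total (for example 1), it goes negative as soon as jobs consume more than that, which would explain the "qc:gpus=-1" output.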

Returning a map object in cypher

I need to create edges between a set of nodes, but an edge may already exist. I need to know which edges have been created so that I can increment the edge counter for the two connected nodes.
I want to know the edges count for every node without querying the graph each time.
Example:
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[m1:member]->(u)
MERGE (g2)-[m2:member]->(u)
Sometimes the user is already a member of the group, so I don't want to increment the counter in that case.
I tried to use the result statistics, but they only return the total number of created relationships. I also thought about using a map and filling in its contents with ON CREATE SET after each MERGE:
WITH {g1:0, g2:0} as res
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[m1:member]->(u)
ON CREATE SET res.g1 = 1
MERGE (g2)-[m2:member]->(u)
ON CREATE SET res.g2 = 1
RETURN res
But it does not work; the server crashes immediately after executing the query.
Exception:
------ FAST MEMORY TEST ------
17235:M 28 Feb 2022 16:56:50.016 # main thread terminated
17235:M 28 Feb 2022 16:56:50.017 # Bio thread for job type #0 terminated
17235:M 28 Feb 2022 16:56:50.017 # Bio thread for job type #1 terminated
17235:M 28 Feb 2022 16:56:50.018 # Bio thread for job type #2 terminated
Fast memory test PASSED, however your memory can still be broken.
Please run a memory test for several hours if possible.
------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /lib/x86_64-linux-gnu/libc.so.6 (base 0x7fbfe3dcc000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
=== REDIS BUG REPORT END. Make sure to include from START to END. ===
Please report the crash by opening an issue on github:
http://github.com/redis/redis/issues
Suspect RAM error? Use redis-server --test-memory to verify it.
Segmentation fault
Any ideas?
Thanks in advance
Neo4j already stores a counter inside each node with the number of its relationships, to provide fast count access. When you want to get the number of members of a group, you can simply do:
MATCH (g:group)
RETURN size((g)-[:member]->())
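Applied to your example, you can drop the manual counters entirely and just read the counts after the MERGEs (a sketch reusing your labels and ids; adjust as needed):
MERGE (u:user {id:999049043279872})
MERGE (g1:group {id:346709075951616})
MERGE (g2:group {id:346709075951617})
MERGE (g1)-[:member]->(u)
MERGE (g2)-[:member]->(u)
RETURN size((g1)-[:member]->()) AS g1Members, size((g2)-[:member]->()) AS g2Members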

ZFS: Unable to expand pool after increasing disk size in vmware

I have a Centos7 VM with ZFS on linux installed.
The VM has a disk, /dev/sdb, that I've added to a pool named 'backup', and in this pool I've created a dataset.
Now, I wanted to increase the size of the disk in VMware, and then expand the size of the pool, but I'm not getting this to work.
I've tried 'zpool online -e backup sdb', but nothing changes.
I've tried running 'partprobe /dev/sdb' before and after the line above, but nothing changes.
I've tried rebooting plus the above; nothing changes.
I've tried 'parted /dev/sdb' and resizing the partition (it suggests the actual new size of the volume), followed by all of the above, but nothing changes.
I've tried 'zpool export backup' + 'zpool import backup' in various combinations with all of the above. No luck.
Also: 'lsblk' and 'df -h' report the old/wrong size of /dev/sdb, even though parted seems to understand that it has been increased.
PS: autoexpand=on
What to do?
I faced a similar issue today and had to try a lot before finding the solution.
When I tried the known solutions (using zpool) of setting autoexpand to on and re-running partprobe, the system would not expand the pool (even after a restart).
Finally, I solved it using parted, without getting into zpool at all.
We need to be careful here since wrong partition selections can cause data loss.
What worked for me in your situation:
Step 1: Find which partition you need to expand. In my case it is partition 5, as seen below (the unallocated space comes right after this partition). Use parted -l:
parted -l
Output
Model: VMware, VMware Virtual S (scsi)
Disk /dev/sda: 69.8GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 1049kB 2097kB 1049kB bios_grub
2 2097kB 540MB 538MB fat32 EFI System Partition boot, esp
3 540MB 2009MB 1469MB swap
4 2009MB 3592MB 1583MB zfs
5 3592MB 32.2GB 28.6GB zfs
Step 2: Explicitly instruct parted to expand partition number 5 to 100% of the available space. Note that '5' is not static; you must use the number of the partition you wish to expand. Double-check this. Use parted /dev/XXX resizepart YY 100%:
parted /dev/sda resizepart 5 100%
After this, I was able to use the entire space in the VM.
For reference:
lsblk before
sda 8:0 0 65G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 513M 0 part /boot/grub
│ /boot/efi
├─sda3 8:3 0 1.4G 0 part
│ └─cryptoswap 253:1 0 1.4G 0 crypt [SWAP]
├─sda4 8:4 0 1.5G 0 part
└─sda5 8:5 0 29.5G 0 part
lsblk after
sda 8:0 0 65G 0 disk
├─sda1 8:1 0 1M 0 part
├─sda2 8:2 0 513M 0 part /boot/grub
│ /boot/efi
├─sda3 8:3 0 1.4G 0 part
│ └─cryptoswap 253:1 0 1.4G 0 crypt [SWAP]
├─sda4 8:4 0 1.5G 0 part
└─sda5 8:5 0 61.7G 0 part
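Translated back to the original /dev/sdb pool, the rough sequence would be (a sketch only; take the partition number N from parted -l for /dev/sdb and double-check it, since resizing the wrong partition can destroy data):
parted /dev/sdb resizepart N 100%
partprobe /dev/sdb
zpool online -e backup sdb
zpool list backup
With autoexpand=on already set, the pool should pick up the extra space once the partition itself has grown; zpool list will show the new size.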

How to get the ID of GPU allocated to a SLURM job on a multiple GPUs node?

When I submit a SLURM job with the option --gres=gpu:1 to a node with two GPUs, how can I get the ID of the GPU which is allocated for the job? Is there an environment variable for this purpose? The GPUs I'm using are all nvidia GPUs.
Thanks.
You can get the GPU id with the environment variable CUDA_VISIBLE_DEVICES. This variable is a comma separated list of the GPU ids assigned to the job.
You can check the environment variables SLURM_STEP_GPUS or SLURM_JOB_GPUS for a given node:
echo ${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}
Note that CUDA_VISIBLE_DEVICES may not correspond to the actual physical GPU IDs (see @isarandi's comment).
Also, note this should work for non-Nvidia GPUs as well.
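A minimal batch script to print both values side by side might look like this (a sketch; adjust the gres request to your site):
#!/bin/bash
#SBATCH --gres=gpu:1
# Print the GPU ids Slurm assigned to this job
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
echo "SLURM GPU ids=${SLURM_STEP_GPUS:-$SLURM_JOB_GPUS}"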
Slurm stores this information in an environment variable, SLURM_JOB_GPUS.
One way to keep track of such information is to log all SLURM-related variables when running a job, for example (following Kaldi's slurm.pl, which is a great script for wrapping Slurm jobs) by including the following command in the script run by sbatch:
set | grep SLURM | while read line; do echo "# $line"; done

Setting SGE for running an executable with different input files on different nodes

I used to work with a cluster using the SLURM scheduler, but now I am more or less forced to switch to an SGE-based cluster, and I'm trying to get the hang of it. The thing I was working on under SLURM involves running an executable with N input files, using a SLURM configuration file set up in this fashion:
slurmConf.conf SLURM configuration file
0 /path/to/exec /path/to/input1
1 /path/to/exec /path/to/input2
2 /path/to/exec /path/to/input3
3 /path/to/exec /path/to/input4
4 /path/to/exec /path/to/input5
5 /path/to/exec /path/to/input6
6 /path/to/exec /path/to/input7
7 /path/to/exec /path/to/input8
8 /path/to/exec /path/to/input9
9 /path/to/exec /path/to/input10
And my working submission script in SLURM contains this line:
srun -n $SLURM_NNODES --multi-prog $slconf
$slconf refers to a path to that configuration file
This setup worked as I wanted: it runs the executable with 10 different inputs at the same time on 10 nodes. Now that I have just transitioned to an SGE system, I want to do the same thing, but I have read the manual and found nothing quite like SLURM's --multi-prog. Could you please shed some light on how to achieve the same thing on an SGE system?
Thank you very much!
You could use the "job array" feature of the Grid Engine.
Create a shell script sge_job.sh
#!/bin/sh
#
# sge_job.sh -- SGE job description script
#
#$ -t 1-10
/path/to/exec /path/to/input$SGE_TASK_ID
And submit this script to SGE with qsub.
qsub sge_job.sh
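If you would rather keep a single configuration file like your slurmConf.conf instead of numbered input files, the task id can also be used to pick a line out of that file (a sketch, assuming a hypothetical jobs.conf with one '/path/to/exec /path/to/inputN' command per line):
#!/bin/sh
#$ -t 1-10
# Run the command found on line $SGE_TASK_ID of jobs.conf
CMD=$(sed -n "${SGE_TASK_ID}p" jobs.conf)
$CMD
This is enough for simple commands without quoting; anything more complex is safer with the numbered-input approach above.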
Dmitri Chubarov's answer is excellent, and the most robust way to proceed as it places less load on the submit node when submitting many jobs (>1000). Alternatively, you can wrap qsub in a for loop:
for i in {1..10}
do
echo "/path/to/exec /path/to/input${i}" | qsub
done
I sometimes use the above when whatever varies as input is not easily captured as a range of integers.
Example:
for f in `ls /some/path/input*`
do
echo "/path/to/exec ${f}" | qsub
done

Hadoop jobs getting poor locality

I have some fairly simple Hadoop streaming jobs that look like this:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-101.jar \
-files hdfs:///apps/local/count.pl \
-input /foo/data/bz2 \
-output /user/me/myoutput \
-mapper "cut -f4,8 -d," \
-reducer count.pl \
-combiner count.pl
The count.pl script is just a simple script that accumulates counts in a hash and prints them out at the end - the details are probably not relevant but I can post it if necessary.
The input is a directory containing 5 files encoded with bz2 compression, roughly the same size as each other, for a total of about 5GB (compressed).
When I look at the running job, it has 45 mappers, but they're all running on one node. The particular node changes from run to run, but always only one node. Therefore I'm achieving poor data locality as data is transferred over the network to this node, and probably achieving poor CPU usage too.
The entire cluster has 9 nodes, all the same basic configuration. The blocks of the data for all 5 files are spread out among the 9 nodes, as reported by the HDFS Name Node web UI.
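The same block placement can also be checked from the command line, e.g.:
hdfs fsck /foo/data/bz2 -files -blocks -locations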
I'm happy to share any requested info from my configuration, but this is a corporate cluster and I don't want to upload any full config files.
It looks like this previous thread [ why map task always running on a single node ] is relevant but not conclusive.
EDIT: at @jtravaglini's suggestion I tried the following variation and saw the same problem - all 45 map tasks running on a single node:
yarn jar \
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.2.0.2.0.6.0-101.jar \
wordcount /foo/data/bz2 /user/me/myoutput
At the end of the output of that task in my shell, I see:
Launched map tasks=45
Launched reduce tasks=1
Data-local map tasks=18
Rack-local map tasks=27
which is the number of data-local tasks you'd expect to see on one node just by chance alone.
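One additional check that is easy to run (not covered above) is whether all nine NodeManagers are actually registered and healthy, since the scheduler will only place containers on nodes that report capacity:
yarn node -list -all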