Selecting from multiple SLURM GPU resources

I'm submitting jobs to a cluster via the SLURM scheduler. Let's say I have access to 5 types of GPUs in my cluster, of type A, B, C, D, and E. I would like to submit a job that requests a GPU of type A or B or C, but NOT of type D or E. So I need some kind of OR logic with the --gres flag.
As a concrete example, here is what it looks like when I request a gpu of a single type (in this case, an RTX 2080):
qlogin -p gpu --gres=gpu:rtx2080:1 --mem=8g -c 2
I'd like to do the same thing, but allow SLURM to pick from a list of allowed GPU types.

Slurm does not have that option at this time.
One workaround is for the system administrator to set up node features matching the GPU type, to allow a request such as:
qlogin -p gpu --gres=gpu:1 --constraint="rtx2080|rtx3090" --mem=8g -c 2
(assuming qlogin uses the same options as sbatch)
If that is not possible, you can submit as many jobs as there are GPU types you want, all with the same --job-name=<SOME_NAME> and the --dependency=singleton option. Then you use whichever job starts first and cancel the others with
scancel --jobname <SOME_NAME> --state=PENDING
The --dependency=singleton option makes sure only one job with that name runs at a time.
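A minimal sketch of that workaround, assuming sbatch is available and that job.sh and the list of GPU types are placeholders to replace with your own:
import subprocess

# Hypothetical GPU types; use the type names defined on your cluster.
gpu_types = ["rtx2080", "rtx3090", "titanx"]
job_name = "pick-any-gpu"

for gpu in gpu_types:
    # Same job name + singleton dependency: at most one of these
    # jobs can be running at any given time.
    subprocess.run(
        ["sbatch",
         f"--job-name={job_name}",
         "--dependency=singleton",
         f"--gres=gpu:{gpu}:1",
         "--mem=8g", "-c", "2",
         "job.sh"],
        check=True,
    )

# Once one job starts, cancel the still-pending duplicates:
#   scancel --jobname pick-any-gpu --state=PENDING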

Related

JVM Runtime.availableProcessors() returns 2 when it should be 4

I'm running OpenJDK 11 on Alpine Linux in a container in an AWS EKS cluster.
The application sizes a thread pool based on the number of CPUs returned by Runtime.getRuntime().availableProcessors().
This call returns 2 processors even though the container shows that 4 CPUs are available:
# cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
Any idea why and how to solve the problem?
Update
Doing some more digging (prompted by some great questions from @gohm'c in the comments), I found a way to add some trace logging to the JVM with -Xlog:os+container=trace:
[0.001s][trace][os,container] CPU Shares is: 1536
[0.001s][trace][os,container] CPU Share count based on shares: 2
Now, my pod spec defines resources.requests.cpu: "1500m".
I don't know why there is a slight discrepancy (1500m vs. 1536 shares), but when I change the value of the CPU request, the CPU Shares value in the trace changes accordingly.
I understand how the resources.limits.cpu value could affect the number of CPUs the JVM sees. But why does the resources.requests.cpu value do that? This seems like a bug to me. Any thoughts?
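The arithmetic behind the two trace lines can be reproduced directly (a sketch based on two documented behaviors: Kubernetes maps requests.cpu to cgroup cpu.shares at 1024 shares per core, and the JVM's container support rounds the share-based CPU count up):
import math

request_millicores = 1500                     # resources.requests.cpu: "1500m"
shares = request_millicores * 1024 // 1000    # Kubernetes: 1024 shares per core
print(shares)                                 # 1536 -> "CPU Shares is: 1536"

per_cpu_shares = 1024                         # the JVM's per-CPU share constant
print(math.ceil(shares / per_cpu_shares))     # 2 -> "CPU Share count based on shares: 2"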

Python multiprocessing between ubuntu and centOS

I am trying to run some parallel jobs through Python multiprocessing. Here is an example code:
import multiprocessing as mp
import os

def f(name, total):
    print('process {:d} starting doing business in {:d}'.format(name, total))
    # there will be some unix command to run an external program here

if __name__ == '__main__':
    total_task_num = 100
    all_processes = []
    for i in range(total_task_num):
        p = mp.Process(target=f, args=(i, total_task_num))
        all_processes.append(p)
        p.start()
    for p in all_processes:
        p.join()
I also set export OMP_NUM_THREADS=1 to make sure each process uses only one thread.
Now I have 20 cores on my desktop. For 100 parallel jobs, I want them to run in 5 cycles so that each core runs one job at a time (20 * 5 = 100).
I ran the same code on CentOS and on Ubuntu. CentOS seems to split the work automatically: only 20 jobs run in parallel at any one time. Ubuntu, however, starts all 100 jobs simultaneously, so each core is occupied by 5 jobs. This significantly increases the total run time due to the high load.
I wonder if there is an elegant solution to teach Ubuntu to run only 1 job per core.
To pin a process to a specific CPU on Linux, you use the taskset command. You can build your logic around taskset -p [mask] [pid], assigning each process to a specific core in a loop.
Python also provides affinity control via os.sched_setaffinity(pid, mask), where pid is the process id of the process to confine and mask represents the group of CPUs it shall be confined to.
There are also other Python tools, such as https://pypi.org/project/affinity/, that can be explored.
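That said, the simplest way to get at most one job per core is usually to cap the number of worker processes rather than pin them. A minimal sketch using multiprocessing.Pool (the task body and the 20-core count come from the question; the pool-based structure is a suggested alternative, not the original code):
import multiprocessing as mp

def f(name, total):
    print('process {:d} starting doing business in {:d}'.format(name, total))
    # unix command to run the external program would go here

if __name__ == '__main__':
    total_task_num = 100
    # At most 20 worker processes run at once; remaining tasks wait
    # in the pool's internal queue until a worker becomes free.
    with mp.Pool(processes=20) as pool:
        pool.starmap(f, [(i, total_task_num) for i in range(total_task_num)])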

attempt to call field 'replicate_commands' (a nil value)

I use Jedis + Lua to EVAL a script; here is my Lua script:
redis.replicate_commands()
local second = redis.call('TIME')[1]
local currentKey = KEYS[1]..second
if redis.call('EXISTS', currentKey) == 0 then
    redis.call('SETEX', currentKey, 1, 1)
    return 1
else
    return redis.call('INCR', currentKey)
end
Because I use TIME, it reports the error: Write commands not allowed after non deterministic commands.
After searching on the internet, I added redis.replicate_commands() as the first line of the Lua script, but it still reports an error: ERR Error running script (call to f_c89a6ee8ad732a325e530f4a69226851cde302e2): #user_script:1: user_script:1: attempt to call field 'replicate_commands' (a nil value)
Does replicate_commands need arguments, or is there another way to solve my problem?
Redis version: 3.0
Jedis version: 2.9
Lua version: I don't know where to find it
The error attempt to call field 'replicate_commands' (a nil value) means that replicate_commands() doesn't exist in the redis object. It is a Lua-side error message.
replicate_commands() wasn't introduced until Redis 3.2. See EVAL - Replicating commands instead of scripts. Consider upgrading.
The first error message (Write commands not allowed after non deterministic commands) is a Redis-side message: you cannot call write commands (like SET, SETEX, INCR, etc.) after calling non-deterministic commands (like SPOP, SCAN, RANDOMKEY, TIME, etc.).
A very important part of scripting is writing scripts that are pure functions.
Scripts executed in a Redis instance are, by default, propagated to replicas and to the AOF file by sending the script itself -- not the resulting commands.
This way, if the Redis server is restarted and replays the AOF log, or if the script is replicated to a slave, the script delivers the same dataset.
This is why replicate_commands() was introduced in Redis 3.2. And starting with Redis 5, scripts are always replicated as effects -- as if replicate_commands() had been called when the script started. But on versions before 3.2, you simply cannot do this.
Therefore, either upgrade to 3.2 or later, or compute currentKey on the client and pass it to the script instead.
Note that building currentKey dynamically inside the script also makes it single-instance-only:
All Redis commands must be analyzed before execution to determine which keys the command will operate on. In order for this to be true for EVAL, keys must be passed explicitly. This is useful in many ways, but especially to make sure Redis Cluster can forward your request to the appropriate cluster node.
Note this rule is not enforced in order to provide the user with opportunities to abuse the Redis single instance configuration, at the cost of writing scripts not compatible with Redis Cluster.
Finally, the Lua version in Redis 3.0.0 is Lua 5.1.5, the same all the way up to Redis 6 RC1.
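For the client-side option, a minimal sketch (shown with redis-py for brevity; the Jedis eval call takes the same script, key count, and key). The time bucket now comes from the client clock rather than the server's TIME, which is an assumption you must be comfortable with:
import time
import redis

# The time-bucketed key is computed on the client, so the script
# never calls TIME and stays deterministic.
script = """
if redis.call('EXISTS', KEYS[1]) == 0 then
    redis.call('SETEX', KEYS[1], 1, 1)
    return 1
else
    return redis.call('INCR', KEYS[1])
end
"""

r = redis.Redis()
current_key = 'mykey:' + str(int(time.time()))   # 'mykey:' is a placeholder prefix
print(r.eval(script, 1, current_key))            # returns the per-second count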

Can I add a secondary GPU to a bare metal server?

Can I add a secondary (second) GPU to a bare metal server?
I tried to verify the price details to confirm this, but I get the following errors:
slcli order place --verify --billing monthly --complex-type SoftLayer_Container_Product_Order_Hardware_Server DUAL_E52600_V4_12_DRIVES DALLAS12 REBOOT_KVM_OVER_IP UNLIMITED_SSL_VPN_USERS_1_PPTP_VPN_USER_PER_ACCOUNT NESSUS_VULNERABILITY_ASSESSMENT_REPORTING NOTIFICATION_EMAIL_AND_TICKET 1_IP_ADDRESS AUTOMATED_NOTIFICATION MONITORING_HOST_PING BANDWIDTH_500_GB REDUNDANT_POWER_SUPPLY INTEL_TXT_TRUSTED_EXECUTION_TECHNOLOGY OS_UBUNTU_16_04_LTS_XENIAL_XERUS_MINIMAL_64_BIT INTEL_INTEL_XEON_E52620_V4_2_10 RAM_128_GB_DDR4_2133_ECC_REG 10_GBPS_REDUNDANT_PUBLIC_PRIVATE_NETWORK_UPLINKS DISK_CONTROLLER_NONRAID HARD_DRIVE_1_9TB_SSD_SED_5DWPD HARD_DRIVE_2_00_TB_SATA_2 HARD_DRIVE_3_8TB_SSD_SED_3DWPD GPU_NVIDIA_TESLA_K80 GPU_NVIDIA_TESLA_M10_ACCELERATOR
SoftLayerAPIError(SoftLayer_Exception_Public): Unable to add NVIDIA Tesla M10 GPU Accelerator because a Graphics Processing Unit price has already been added.
slcli order place --verify --billing monthly --complex-type SoftLayer_Container_Product_Order_Hardware_Server DUAL_E52600_V4_12_DRIVES DALLAS12 REBOOT_KVM_OVER_IP UNLIMITED_SSL_VPN_USERS_1_PPTP_VPN_USER_PER_ACCOUNT NESSUS_VULNERABILITY_ASSESSMENT_REPORTING NOTIFICATION_EMAIL_AND_TICKET 1_IP_ADDRESS AUTOMATED_NOTIFICATION MONITORING_HOST_PING BANDWIDTH_500_GB REDUNDANT_POWER_SUPPLY INTEL_TXT_TRUSTED_EXECUTION_TECHNOLOGY OS_UBUNTU_16_04_LTS_XENIAL_XERUS_MINIMAL_64_BIT INTEL_INTEL_XEON_E52620_V4_2_10 RAM_128_GB_DDR4_2133_ECC_REG 10_GBPS_REDUNDANT_PUBLIC_PRIVATE_NETWORK_UPLINKS DISK_CONTROLLER_NONRAID HARD_DRIVE_1_9TB_SSD_SED_5DWPD HARD_DRIVE_2_00_TB_SATA_2 HARD_DRIVE_3_8TB_SSD_SED_3DWPD GPU_NVIDIA_TESLA_K80 GPU_NVIDIA_TESLA_V100
SoftLayerAPIError(SoftLayer_Exception_Order_Item_Rule): The V100 can only be used with a V100
Currently it is not possible to order a bare metal server with two GPU items through the slcli client.
There seem to be issues in slcli: it lists only gpu0 items, but there should be both gpu0 and gpu1 items. The following issues were opened: #983, #984, and #985.
In the meantime, the only way is to place the order manually through the placeOrder method (use verifyOrder if you are not ready to order). The following resources have examples of how to order a bare metal server; you need to replace the values with your own and set the prices:
https://softlayer.github.io/rest/place_order/ (CURL or REST)
Adding SSH keys to Softlayer JSON order for Bare metal (Python)
Subnet error while ordering baremetal server
To order a DUAL_E52600_V4_12_DRIVES you need to use packageId 553, and to find out which item prices to send with the order you can use the following REST call:
https://api.softlayer.com/rest/v3.1/SoftLayer_Product_Package/553/getItemPrices?objectMask=mask[categories,pricingLocationGroup[locations]]
For DALLAS12, look for the item prices whose locationGroupId is null.
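A minimal sketch of such a manual order with the SoftLayer Python client (the hostname, domain, and price IDs are placeholders; the real price IDs come from the getItemPrices call above, and you would swap verifyOrder for placeOrder once the order verifies):
import SoftLayer

client = SoftLayer.create_client_from_env()  # credentials from env/config file

order = {
    'complexType': 'SoftLayer_Container_Product_Order_Hardware_Server',
    'packageId': 553,                # DUAL_E52600_V4_12_DRIVES
    'location': 'DALLAS12',          # or the corresponding datacenter id
    'hardware': [{'hostname': 'mygpuserver', 'domain': 'example.com'}],
    # Placeholder price IDs -- one per configured item, including the
    # gpu0 and gpu1 categories, taken from the getItemPrices response.
    'prices': [{'id': 12345}, {'id': 67890}],
    'quantity': 1,
}

# verifyOrder checks the order without purchasing anything.
result = client['Product_Order'].verifyOrder(order)
print(result)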

julia on PBS cluster: what to give to addprocs()?

I'm trying to set up a cluster across machines on a PBS-managed cluster. I'm perfectly able to compute within one node by saying julia -p 12 (after having reserved one node with 12 CPUs).
I understand that to use several machines, I have to add them to the master process with addprocs. I was able to do that on a different cluster (SGE), but on this one something is going wrong.
You can see everything I'm doing, including submit scripts etc., on this branch of a github repo.
To get a list of machines, I parse the PBS_NODEFILE, which for a submit script with the option
#PBS -l nodes=2:ppn=12 # give me 2 nodes with 12 processors each
looks something like this:
red0004
red0004
...
red0004
red0347
...
red0347
I parse this file with bind_pe_procs() in sge.jl in the repo and give a vector of machine names to addprocs. When I submit this, it fails; I put up a gist with the resulting SSH error. I don't know what it means.
Does this have to do with a system setting, i.e. do I have to talk to the sysadmin about SSH between machines? What are the right questions to ask?
I am also unsure about what exactly I have to give to addprocs(). I don't want to add the master process (I don't want worker 1 SSHing into itself?), so I exclude ENV["HOST"] = node001 from my list. But what about all the processors with the same hostname, say node002? Do I list all of them:
machines = [ "red0347" for i=1:12]
or just once
machines = ["red0347"]
in addprocs(machines)?
thanks!
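For reference, the nodefile parsing described above amounts to collapsing the repeated hostnames into (host, count) pairs while skipping the master's own node. A minimal sketch (Python for illustration; the repo's bind_pe_procs() is the Julia counterpart, and excluding ENV["HOST"] follows the question's own reasoning):
import os
from collections import Counter

def parse_nodefile(path, master_host):
    # Each line of PBS_NODEFILE is one hostname, repeated once per slot.
    with open(path) as fh:
        hosts = [line.strip() for line in fh if line.strip()]
    counts = Counter(h for h in hosts if h != master_host)
    return list(counts.items())   # e.g. [('red0347', 12)]

machines = parse_nodefile(os.environ['PBS_NODEFILE'], os.environ.get('HOST', ''))
print(machines)
Note that current Julia's addprocs also accepts ("host", count) tuples, so a hostname should not need to be repeated once per processor; check the documentation for the Julia version installed on your cluster.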