Balancing OpenMPI Workload - Distribute Over Master and Slave Nodes

Hopefully this isn't a repeat question, but I'm having an issue balancing a workload on my local cluster. This is my current MPI hostfile:
#The Hostfile for Open MPI
#Master Node, 'slots=2' is used because we are running a 2-core machine
localhost slots=2
#Slave nodes, 8-core machines as well
slave1-ip slots=2
slave2-ip slots=2
slave3-ip slots=2
When I run mpirun -np 4 --hostfile my_hostfile program, it prefers to do all of the calculations on the local host first.
For example, in my nqueens code, the distribution of calculations at the end is:
Node 1 computed load of 1963
Node 2 computed load of 0
Node 3 computed load of 0
Node 4 computed load of 1
However, when I change the hostfile line localhost slots=2 to localhost slots=1, all of the calculations are run on the slave nodes and I get a much more even distribution:
Node 1 computed load of 497
Node 2 computed load of 486
Node 3 computed load of 493
Node 4 computed load of 488
Is there a way to balance the load so that work is spread over both the master and slave nodes when I have localhost slots=2? Is there some sort of config file that specifies this? I have tried the --loadbalance flag and it did nothing.
P.S. I followed this tutorial when setting up my cluster:
http://techtinkering.com/2009/12/02/setting-up-a-beowulf-cluster-using-open-mpi-on-linux/
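To make the "local host first" behaviour described above concrete, here is a minimal sketch of two placement policies applied to the hostfile from the question: filling each host's slots in order, and dealing ranks out one per host. It is a simplified model written for illustration, not Open MPI code; the function names are made up, and treating a missing slots= entry as 1 is an assumption.

```python
# Simplified model of rank placement; this is NOT Open MPI's implementation.

def parse_hostfile(text):
    """Return [(host, slots)] from hostfile text, skipping comments."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        slots = 1  # assumption: one slot when no slots= entry is given
        for field in fields[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        hosts.append((fields[0], slots))
    return hosts


def map_by_slot(hosts, nprocs):
    """Fill every slot of a host before moving on to the next host."""
    placement = []
    for host, slots in hosts:
        for _ in range(slots):
            if len(placement) == nprocs:
                return placement
            placement.append(host)
    return placement


def map_by_node(hosts, nprocs):
    """Deal ranks out one per host, round-robin, within each host's slots."""
    remaining = dict(hosts)
    placement = []
    while len(placement) < nprocs and any(remaining.values()):
        for host, _ in hosts:
            if len(placement) == nprocs:
                break
            if remaining[host] > 0:
                remaining[host] -= 1
                placement.append(host)
    return placement


HOSTFILE = """\
# The hostfile for Open MPI
localhost slots=2
slave1-ip slots=2
slave2-ip slots=2
slave3-ip slots=2
"""

if __name__ == "__main__":
    hosts = parse_hostfile(HOSTFILE)
    print("by slot:", map_by_slot(hosts, 4))
    print("by node:", map_by_node(hosts, 4))
```

In this model, filling by slot puts two of the four ranks on localhost, while dealing by node gives each machine one rank. Open MPI exposes round-robin placement through mpirun's --map-by node option (--bynode in older releases). Whether that evens out this particular nqueens run is not something the sketch can show.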

Related

How to run redis sentinel monitoring redis servers

I have 3 Redis servers running with 3 sentinels, one on each host:
3 Redis + 3 sentinels (3 hosts total)
1. Can I run Sentinel on a separate host, or should it always run alongside redis-server?
3 Redis on 3 hosts
3 sentinels on 3 other hosts (6 hosts total)
2. Is it possible to monitor all 3 Redis servers with only one Redis Sentinel?
3 Redis on 3 hosts
1 sentinel on 1 host (3 or 4 hosts total)

1. You can run Sentinel on separate hosts or on the same hosts as Redis.
The benefit of running it on separate hosts is that the sentinel instances are not affected by load on the Redis instances.
The benefit of running it on the same hosts is mainly cost.
2. It might be possible, but it doesn't make sense.
The benefit of a Redis Sentinel deployment over a single-node Redis deployment is that it adds high availability (HA).
This means that if the master fails, one of the slaves is promoted to master and the cluster continues to function.
If you have only a single sentinel instance, you don't have HA, since that sentinel is itself a single point of failure.
Therefore, to achieve HA you must have at least 3 sentinel instances running on different physical nodes.
If you don't need HA, just run a single Redis instance without Sentinel.
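The "at least 3 sentinels" advice above follows from simple majority arithmetic: Sentinel needs a majority of the sentinel processes to authorize a failover. A minimal sketch of that arithmetic (the function names are made up for illustration):

```python
def majority(sentinels):
    """Smallest number of sentinels that forms a majority."""
    return sentinels // 2 + 1


def tolerated_failures(sentinels):
    """How many sentinels can be down while a majority is still reachable."""
    return sentinels - majority(sentinels)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} sentinel(s): majority={majority(n)}, "
              f"can lose {tolerated_failures(n)}")
```

With one or two sentinels no failure can be tolerated, which is why three is the usual minimum.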

How to restart redis cluster node after failure

I am experimenting with Redis Cluster, following the documentation, and one thing confuses me.
Initial Configuration
35edd8052caf37149b4f9cc800fcd2ba60018ab5 127.0.0.1:30005@40005 slave bd76f831d34ed265a964e5f5caff2c0807c96b85 0 1524390407263 5 connected
d9e92c606f1fddebf84bbbc6f76485e418647683 127.0.0.1:30003@40003 master - 0 1524390407263 8 connected 10923-16383
edf62838d10b99018a0ecb7698c1b9ac52aa3bbb 127.0.0.1:30002@40002 myself,master - 0 1524390407000 2 connected 5461-10922
bd76f831d34ed265a964e5f5caff2c0807c96b85 127.0.0.1:30001@40001 master - 0 1524390407062 1 connected 0-5460
55a72ea5b4d0a77e2b18ca2b3f74b20d3550244c 127.0.0.1:30006@40006 slave edf62838d10b99018a0ecb7698c1b9ac52aa3bbb 0 1524390407562 6 connected
26788ce4523c95a93bd63907c1c75827fe61476a 127.0.0.1:30004@40004 slave d9e92c606f1fddebf84bbbc6f76485e418647683 0 1524390407263 8 connected
To test what happens when a master fails, I crashed one manually using the following command:
redis-cli -p 30001 debug segfault
Now the configuration looks like this (30001 has failed and 30005 has been promoted to master):
35edd8052caf37149b4f9cc800fcd2ba60018ab5 127.0.0.1:30005@40005 master - 0 1524390694964 9 connected 0-5460
d9e92c606f1fddebf84bbbc6f76485e418647683 127.0.0.1:30003@40003 master - 0 1524390695064 8 connected 10923-16383
edf62838d10b99018a0ecb7698c1b9ac52aa3bbb 127.0.0.1:30002@40002 myself,master - 0 1524390694000 2 connected 5461-10922
bd76f831d34ed265a964e5f5caff2c0807c96b85 127.0.0.1:30001@40001 master,fail - 1524390636966 1524390636165 1 disconnected
55a72ea5b4d0a77e2b18ca2b3f74b20d3550244c 127.0.0.1:30006@40006 slave edf62838d10b99018a0ecb7698c1b9ac52aa3bbb 0 1524390694964 6 connected
26788ce4523c95a93bd63907c1c75827fe61476a 127.0.0.1:30004@40004 slave d9e92c606f1fddebf84bbbc6f76485e418647683 0 1524390695164 8 connected
How can I add 30001 to the cluster again? And how can I start only that node?
I am following this document:
https://redis.io/topics/cluster-tutorial (it contains the statement "I restarted the crashed instance so that it rejoins the cluster as a slave" but does not say how to do that).

Creating a cluster with redis-trib.rb requires running Redis instances, each of which should be started with its own config file:
../redis-server redis.conf
where redis.conf contains the configuration for that node.
For instance:
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
The Redis cluster is then created as below:
./redis-trib.rb create --replicas 1 host1:port1 host2:port2 host3:port3 host4:port4 host5:port5 host6:port6
The Ruby script assigns master and slave roles among these nodes, and the node information ends up in a nodes.conf file (the one named in redis.conf).
When you start the server again using ../redis-server redis.conf, it picks up its node information (its ID, whether it is a master or a slave) from nodes.conf and reconnects to the cluster.
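To show what kind of node information that is, here is a small sketch that splits node lines of the form printed by cluster nodes into their fields. The field order (ID, address, flags, master ID, ping-sent, pong-recv, config epoch, link state, slots) is the documented one; the ClusterNode type and the function name are made up for illustration, and the sample lines come from the question.

```python
from typing import List, NamedTuple, Optional


class ClusterNode(NamedTuple):
    node_id: str
    address: str
    flags: List[str]
    master_id: Optional[str]
    link_state: str
    slots: List[str]


def parse_node_line(line):
    """Split one 'cluster nodes' line into its main fields."""
    fields = line.split()
    node_id, address, flags, master_id = fields[:4]
    # fields[4:7] hold ping-sent, pong-recv and config-epoch (ignored here)
    return ClusterNode(
        node_id=node_id,
        address=address,
        flags=flags.split(","),
        master_id=None if master_id == "-" else master_id,
        link_state=fields[7],
        slots=fields[8:],
    )


SAMPLE = """\
35edd8052caf37149b4f9cc800fcd2ba60018ab5 127.0.0.1:30005@40005 master - 0 1524390694964 9 connected 0-5460
bd76f831d34ed265a964e5f5caff2c0807c96b85 127.0.0.1:30001@40001 master,fail - 1524390636966 1524390636165 1 disconnected
55a72ea5b4d0a77e2b18ca2b3f74b20d3550244c 127.0.0.1:30006@40006 slave edf62838d10b99018a0ecb7698c1b9ac52aa3bbb 0 1524390694964 6 connected
"""

if __name__ == "__main__":
    for line in SAMPLE.splitlines():
        node = parse_node_line(line)
        print(node.address, node.flags, node.link_state, node.slots)
```

The failed node keeps its ID in this table, which is what lets it be recognised when it is started again.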

You can restart the Redis instance on the required port using the same command you used to start it earlier, i.e.:
cd 30001
../redis-server redis.conf

Assuming that you followed the tutorial and created the cluster using the create-cluster script, i.e.:
# pwd: redis/utils/create-cluster
./create-cluster start
./create-cluster create
To bring back the node that you failed, start it again using
./create-cluster start
This will start the failed node. Currently running nodes won't be affected.
https://github.com/antirez/redis/blob/unstable/utils/create-cluster/create-cluster#L25

Unable to Run MPI CLUSTER within a LAN

This is a snapshot of my /etc/hosts file.
karpathy is the master and client is the slave.
I have successfully:
set up passwordless SSH
mounted the shared directory with sudo mount -t nfs karpathy:/home/mpiuser/cloud ~/cloud
I can log in to my client simply with ssh client.
I followed this blog:
http://mpitutorial.com/tutorials/running-an-mpi-cluster-within-a-lan/
mpirun -np 5 -hosts karpathy ./cpi output
mpirun -np 5 -hosts client ./cpi
I get this error:
[mpiexec@karpathy] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@karpathy] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:179): error waiting for event
[mpiexec@karpathy] main (./ui/mpich/mpiexec.c:397): process manager error waiting for completion

I hope you have already found the solution. In case you haven't, I would suggest a couple of things.
1. Disable the firewall on both nodes:
sudo ufw disable
2. Create a file named machinefile (or whatever you like) that stores each node's hostname along with its number of CPUs.
My machinefile contains:
master:8
slave:4
master and slave are the hostnames, while 8 and 4 are the number of CPUs on each node.
To compile, use:
mpicxx -o filename filename.cpp
To run, pass the machinefile as an argument:
mpirun -np 12 -f machinefile ./filename
12 is the number of processes. Since the two nodes have 12 CPUs combined, it makes sense to split the work across 12 processes.
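The process count above is just the sum of the per-host counts in the machinefile. A minimal sketch that derives it, assuming the host:count layout shown in this answer (the function name is made up, and treating a line without a count as 1 is an assumption):

```python
def parse_machinefile(text):
    """Return {hostname: process_count} from 'host:count' lines."""
    counts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        host, _, count = line.partition(":")
        # assumption: a host listed without a count contributes one process
        counts[host] = int(count) if count else 1
    return counts


MACHINEFILE = """\
master:8
slave:4
"""

if __name__ == "__main__":
    counts = parse_machinefile(MACHINEFILE)
    total = sum(counts.values())
    print(f"mpirun -np {total} -f machinefile ./filename")
```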

Redis Cluster: No automatic failover for master failure

I am trying to implement a Redis cluster with six machines.
I have a vagrant cluster of six machines:
192.168.56.101
192.168.56.102
192.168.56.103
192.168.56.104
192.168.56.105
192.168.56.106
All of them are running redis-server.
I edited the /etc/redis/redis.conf file on all of the above servers, adding this:
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
cluster-slave-validity-factor 0
appendonly yes
I then ran this on one of the six machines:
./redis-trib.rb create --replicas 1 192.168.56.101:6379 192.168.56.102:6379 192.168.56.103:6379 192.168.56.104:6379 192.168.56.105:6379 192.168.56.106:6379
The Redis cluster is up and running. I checked manually: a value set on one machine shows up on another.
$ redis-cli -p 6379 cluster nodes
3c6ffdddfec4e726f29d06a6da550f94d976f859 192.168.56.105:6379 master - 0 1450088598212 5 connected
47d04bc98ab42fc793f9f382855e5c54ab8f2e20 192.168.56.102:6379 slave caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 0 1450088598716 7 connected
040d4bb6a00569fc44eec05440a5fe0796952ccf 192.168.56.101:6379 myself,slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 0 4 connected
caf2cec45114dc8f4cbc6d96c6dbb20b62a39f90 192.168.56.104:6379 master - 0 1450088599720 7 connected 0-10922
d78293d0821de3ab3d2bca82b24525e976e7ab63 192.168.56.106:6379 slave 5318e48e9ef0fc68d2dc723a336b791fc43e23c8 0 1450088599316 8 connected
5318e48e9ef0fc68d2dc723a336b791fc43e23c8 192.168.56.103:6379 master - 0 1450088599218 8 connected 10923-16383
My problem is that when I shut down or stop redis-server on any one machine that is a master, the whole cluster goes down, but if all three slaves die the cluster still works properly.
What should I do so that a slave becomes a master if a master fails (fault tolerance)?
I am under the assumption that Redis handles all of this and that I need not worry about it after deploying the cluster. Am I right, or do I have to do something myself?
Another question: let's say I have six machines with 16GB of RAM each. How much total data would I be able to handle on this Redis cluster with three masters and three slaves?
Thank you.

The setting cluster-slave-validity-factor 0 may be the culprit here.
From redis.conf:
# A slave of a failing master will avoid to start a failover if its data
# looks too old.
In your setup, the slave of the terminated master considers itself unfit to be elected master, since the time since it last contacted the master is greater than the computed value of:
(node-timeout * slave-validity-factor) + repl-ping-slave-period
Therefore, even with a redundant slave, the cluster state changes to DOWN and the cluster becomes unavailable.
You can try a different value, for example the suggested default:
cluster-slave-validity-factor 10
This will ensure that the cluster can tolerate the failure of any one Redis instance (slave or master).
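As a worked example of the threshold formula quoted above, here is a minimal sketch using the question's node timeout of 5000 ms and assuming the default repl-ping-slave-period of 10 seconds. The function names are made up for illustration. One caveat worth checking against the redis.conf shipped with your version: its comments also describe a factor of 0 as a special value that makes a slave always attempt the failover, so the sketch models 0 as "no limit".

```python
def max_disconnect_seconds(node_timeout_ms, validity_factor, repl_ping_period_s=10):
    """Longest time without master contact after which a slave may still fail over.

    Returns None when the factor is 0, modelled here as 'no limit'.
    """
    if validity_factor == 0:
        return None
    return node_timeout_ms / 1000 * validity_factor + repl_ping_period_s


def may_fail_over(disconnected_s, node_timeout_ms, validity_factor,
                  repl_ping_period_s=10):
    """True if a slave disconnected for this long is still a failover candidate."""
    limit = max_disconnect_seconds(node_timeout_ms, validity_factor,
                                   repl_ping_period_s)
    return limit is None or disconnected_s <= limit


if __name__ == "__main__":
    # (5 s * 10) + 10 s with the suggested default factor
    print(max_disconnect_seconds(5000, 10))
    print(may_fail_over(45, 5000, 10), may_fail_over(90, 5000, 10))
```

With the suggested default of 10, a slave stays eligible for up to 60 seconds without contact from its master.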
For your second question: six machines with 16GB of RAM each can function as a Redis Cluster of 3 master instances and 3 slave instances, so the theoretical maximum is 16GB x 3 = 48GB of data. Such a cluster can tolerate at most ONE node failure if cluster-require-full-coverage is turned on; otherwise it may still be able to serve data from the shards that remain available on the functioning instances.

Redis cluster creation [closed]

Is it possible to create a Redis cluster with 2 nodes, one acting as the master and the other as a slave?
I get the following error if I try with 2 nodes (one as master and the other as slave):
>>> Creating cluster
Connecting to node 127.0.0.1:6379: OK
Connecting to node 192.168.40.159:6379: OK
*** ERROR: Invalid configuration for cluster creation.
*** Redis Cluster requires at least 3 master nodes.
*** This is not possible with 2 nodes and 1 replicas per node.
*** At least 6 nodes are required.

Yes. The requirement of at least 3 master nodes is imposed by the Ruby script; it is not a hard limit of the cluster itself.
The first thing you need to do is send a cluster command with 16385 arguments, like
cluster addslots 0 1 2 3 ... 16383
to the node. Since there are too many arguments to type manually in redis-cli, I suggest writing a program to do it: open a TCP socket to the Redis node, convert the command above into a Redis protocol string, and write it to the socket.
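A minimal sketch of such a program, assuming the standard Redis wire format (an array of bulk strings) and slots numbered 0 to 16383. The function names are made up for illustration, the address in the commented-out call is hypothetical, and the network call is left disabled so the script only builds the payload.

```python
import socket


def encode_command(*args):
    """Encode a command as a Redis protocol array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def addslots_command(first=0, last=16383):
    """Build CLUSTER ADDSLOTS for every slot from first to last inclusive."""
    return encode_command("CLUSTER", "ADDSLOTS", *range(first, last + 1))


def send_command(host, port, payload):
    """Send an encoded command and return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
        return conn.recv(4096)


if __name__ == "__main__":
    payload = addslots_command()
    print(len(payload), "bytes, starting with", payload[:40])
    # To actually assign the slots (hypothetical address), uncomment:
    # print(send_command("127.0.0.1", 6379, payload))
```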
The single-node cluster will come online a few seconds after you send the command. Then connect to the other node with redis-cli and type the following commands to make it a slave:
cluster meet MASTER_HOST MASTER_PORT
cluster replicate MASTER_ID
where MASTER_HOST:MASTER_PORT is the address of the first node and MASTER_ID is the ID of that node, which you can retrieve with the cluster nodes command.
For convenience, I've written a Python tool for this kind of Redis cluster management. You can install it with
pip install redis-trib
For more detail, see https://github.com/HunanTV/redis-trib.py/

Redis-Cluster is not a fit for your use case.
For your use case, you need to configure one server (the master), then configure a second server and add the "slaveof" directive - pointing it to the master. How you handle failover is up to your scenario but I would recommend the use of redis-sentinel.
For a more detailed walkthrough, see the Redis Replication page

Nope, it is not possible to create a Redis cluster with 1 master node. As suggested here, setting up a Redis cluster requires at least 3 master nodes.

The 'redis-trib.rb create' command requires at least 3 nodes.
The following approach creates a Redis cluster with 1 master and 1 slave.
Using redis-trib
$ redis-server 5001/redis.conf
$ redis-trib.rb fix 127.0.0.1:5001
so many messages ...
$ redis-server 5002/redis.conf
$ redis-trib.rb add-node --slave 127.0.0.1:5002 127.0.0.1:5001
>>> Adding node 127.0.0.1:5002 to cluster 127.0.0.1:5001
Connecting to node 127.0.0.1:5001: OK
>>> Performing Cluster Check (using node 127.0.0.1:5001)
M: 015bec64d631990b83ad63736d906cda257a762c 127.0.0.1:5001
slots:0-16383 (16384 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master 127.0.0.1:5001
Connecting to node 127.0.0.1:5002: OK
>>> Send CLUSTER MEET to node 127.0.0.1:5002 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 127.0.0.1:5001.
[OK] New node added correctly.
Using cluster commands
$ redis-server 5001/redis.conf
Using Ruby : addslots
$ echo '(0..16383).each{|x| puts "cluster addslots "+x.to_s}' | ruby | redis-cli -c -p 5001 > /dev/null
$ redis-server 5002/redis.conf
$ redis-cli -c -p 5002
127.0.0.1:5002> cluster meet 127.0.0.1 5001
OK
127.0.0.1:5002> cluster replicate 7c38d2e5e76fc4857fe238e34b4096fc9c9f12a5
OK
(7c38d2e5e76fc4857fe238e34b4096fc9c9f12a5 is the node ID of 5001.)
