Local processes of head node of a PBS cluster - ps

When I run ps or top on the head node of a PBS cluster, it shows all the processes running on all nodes.
I wonder if there is any way to limit the output to only the processes on the head node.

Related

Redis cluster cannot auto fail over

I set up my Redis cluster (version 6.2) with 3 master nodes and 3 slave nodes. It works well in normal scenarios.
However, if I kill one of the master nodes, the auto-failover does not happen even if I wait a very long time. When I use the "cluster nodes" command, the output tells me that the killed node is marked as "master, failed", and all 3 slave nodes are still listed as "slave". From the log I also cannot see any useful information.
In my cluster config, everything is left at its default except the two settings below:
cluster-node-timeout 5000
cluster-require-full-coverage no
Does anyone have an idea how to check what is wrong? That would be very appreciated!
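A minimal sketch of how one might inspect the cluster state from a surviving node; the port 7001 is a placeholder for whatever port your nodes actually listen on:

# Every surviving node should agree that the killed node carries the "fail" flag;
# if only some nodes see it as failed, the failure was never confirmed cluster-wide.
redis-cli -p 7001 cluster nodes

# cluster_state, cluster_slots_ok and cluster_known_nodes give a quick health summary.
redis-cli -p 7001 cluster info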

Set up docker-swarm to use specific nodes as backup?

Is there a way to set up docker-swarm so that it only uses specific nodes (workers or managers) as fail-over nodes? For instance, if one specific worker dies (or if a service on it dies), only then would it use another node; before that happens it is as if the node were not in the swarm.
No, that is not possible out of the box. However, docker-swarm does have the features to build it up yourself. Let's say you have 3 worker nodes on which you want to run service A: 2 of the 3 nodes will always be available and node 3 will be the backup.
1. Add a label to the 3 nodes, e.g. runs=serviceA, and constrain the service to that label. This makes sure your service only runs on those 3 nodes (see the sketch after these steps).
2. Make the 3rd node unable to schedule tasks by running docker node update --availability drain <NODE-ID>
3. Whenever you need the node back, run docker node update --availability active <NODE-ID>
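A minimal sketch of steps 1 and 2; the service name serviceA, the nginx image, and the <NODE-*-ID> placeholders are assumptions, not part of the question:

# Step 1: label the three candidate nodes and pin the service to that label.
docker node update --label-add runs=serviceA <NODE-1-ID>
docker node update --label-add runs=serviceA <NODE-2-ID>
docker node update --label-add runs=serviceA <NODE-3-ID>
docker service create --name serviceA --replicas 2 \
  --constraint 'node.labels.runs==serviceA' nginx

# Step 2: park the third node so it only receives tasks once you re-activate it.
docker node update --availability drain <NODE-3-ID>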

Can not connect by ssh between nodes

I have 5 Proxmox nodes running Debian. We had some power problems and all nodes went down. Right now the nodes cannot ssh to each other... From my computer I can access all of them, but from one node to another the connection takes too long and then appears disconnected...
Has anyone had this problem?
Thanks.
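A minimal sketch of first diagnostic steps one might run from an affected node; node2 is a placeholder hostname:

# -vvv shows at which stage the connection stalls (DNS lookup, key exchange, authentication).
ssh -vvv root@node2

# Verify that host name resolution between the nodes still works after the outage.
getent hosts node2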

Is it possible to connect to other RabbitMQ nodes when one node is down?

The environment I have consists of two separate servers, each running the RabbitMQ service application. They are correctly clustered and the queues are using mirroring correctly.
Node A is master
Node B is slave
My question is more specifically about the case when Node A goes down but Service A is still up, and Node B and Service B are still up. At this point, Node B is promoted to master. When an application connects to Node B it connects okay, of course.
rabbitmqctl cluster_status on Node B shows the cluster is up with two nodes and Node B is running. rabbitmqctl cluster_status on Node A shows the node is down. This is expected behavior.
Is it possible for an application to connect to Node A and be able to publish/pop queue items as normal?

How can I determine which nodes in my rabbitmq cluster are HA?

I have a clustered HA rabbitmq setup. I am using the "exactly" policy similar to:
rabbitmqctl set_policy ha-two "^two\." '{"ha-mode":"exactly","ha-params":10,"ha-sync-mode":"automatic"}'
I have 30 machines running, of which 10 are HA nodes with replicated queues. When my broker (randomly assigned to be the first HA node) goes down, I need my celery workers to point to a new HA node (one of the 9 left). I have a script that automates this. The problem is that I do not know how to distinguish between a regular cluster node and an HA node. When I issue the command:
rabbitmqctl cluster_status
The categories I get are "running nodes", "disc", and "ram", but there is no way here to tell whether a node is HA.
Any ideas?
In a cluster, every node shares everything with the others, so you don't have to distinguish nodes in your application in order to access all entities.
In your case, when one of the HA nodes goes down (bringing their number to 9), the HA queues will be replicated to the first available node (it doesn't matter whether it is disc or ram).
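That said, if you still want to see which nodes currently hold a given queue's master and mirrors (so your script can pick one of them), a minimal sketch assuming classic mirrored queues on the default vhost:

# pid is the queue master, slave_pids are the mirrors; each value embeds the node name.
rabbitmqctl list_queues name pid slave_pids synchronised_slave_pids

# Shows which queues your "exactly"/ha-params policy currently matches.
rabbitmqctl list_policies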