Apache Ignite: NODE_LEFT event

I want to understand how a node-left event is triggered in an Apache Ignite grid.
Do nodes keep pinging each other constantly to detect whether other nodes are present, or do they only ping each other when required?
If a ping from a client node is unsuccessful, can that also trigger a NODE_LEFT event, or can it only be triggered by a server node?
Once a node has left, which node triggers the topology update event, i.e. partition map exchange (PME)? Can it be triggered by a client node, or only by server nodes?

Yes, nodes ping each other to verify the connection. Here is a more detailed explanation of how a node failure happens. You might also check this video.
The final decision to fail a node (remove it from the cluster) is made by the coordinator node, which issues a special event that has to be acknowledged by the other nodes (NODE_FAILED).
A node might also leave the cluster explicitly by sending a TcpDiscoveryNodeLeftMessage (i.e. triggering a NODE_LEFT event), for example when you stop it gracefully.
Only the coordinator node can change the topology version, meaning that a PME always starts on the coordinator and spreads to the other nodes afterward.
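For reference, here is a minimal sketch (not part of the original answer) of how these discovery events can be observed from application code, assuming a 2.x node started with a plain IgniteConfiguration; note that all events are disabled by default and must be enabled via setIncludeEventTypes:

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.events.DiscoveryEvent;
import org.apache.ignite.events.EventType;
import org.apache.ignite.lang.IgnitePredicate;

public class NodeLeftListenerExample {
    public static void main(String[] args) {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // All events are disabled by default; enable the discovery events we care about.
        cfg.setIncludeEventTypes(EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);

        Ignite ignite = Ignition.start(cfg);

        // Fires locally whenever the cluster registers a graceful leave (NODE_LEFT)
        // or a failure decided on the coordinator (NODE_FAILED).
        IgnitePredicate<DiscoveryEvent> lsnr = evt -> {
            System.out.println("Node " + evt.eventNode().id()
                + " left or failed; topology version: " + evt.topologyVersion());
            return true; // keep listening
        };

        ignite.events().localListen(lsnr, EventType.EVT_NODE_LEFT, EventType.EVT_NODE_FAILED);
    }
}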

Related

What ports does Selenium Grid use for heartbeat signals?

I am trying to set up a testing environment using Selenium Grid. The hub runs on an AWS machine while the nodes run on AWS WorkSpaces. At this point, nodes can register with the hub, but about one minute later, the hub complains:
Marking the node http://192.168.x.x:1444 as down: cannot reach the node for 2 tries
I have been doing some research and the problem seems to be that the hub sends heartbeat signals that do not reach the nodes and then it drops the session. Since there is a firewall that might be blocking the heartbeat signals, I need to know which ports are used to send such signals in order to configure the firewall accordingly.
Thank you

Ignite cluster active event

Is there a way to get notified through a listener when the Ignite cluster gets activated/deactivated? Maybe I'm blind but I can't seem to find this event.
On node #2, I would like to be notified when the cluster is activated on node #1.
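One hedged pointer, not confirmed by the question: newer Ignite releases expose EVT_CLUSTER_ACTIVATED and EVT_CLUSTER_DEACTIVATED event types (they are not available in all versions), which can be subscribed to with the same localListen pattern as in the sketch above, reusing the cfg and ignite variables from that sketch:

// Assumption: these constants exist in your Ignite version; older releases do not have them.
cfg.setIncludeEventTypes(EventType.EVT_CLUSTER_ACTIVATED, EventType.EVT_CLUSTER_DEACTIVATED);

IgnitePredicate<Event> stateLsnr = evt -> {
    System.out.println("Cluster state event: " + evt.name());
    return true; // keep listening
};
ignite.events().localListen(stateLsnr, EventType.EVT_CLUSTER_ACTIVATED, EventType.EVT_CLUSTER_DEACTIVATED);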

ignite client node semaphore not reacquiring permit after bouncing

I am running Ignite 2.1.0 with 1 server and 1 client node.
My client node acquires the 1 available semaphore permit as follows:
IgniteSemaphore semaphore = _ignite.semaphore(name, 1, true, true);
if (semaphore.tryAcquire()) {
    // ... work performed while holding the single permit
}
I bounce the client node, confirming that it leaves the topology. On restarting, the tryAcquire() call above returns false. This is not what I was expecting: I expected the client node to reacquire the permit that was released when the client left the topology. The server node has no code running on it that would attempt to acquire the permit once it is released.
It looks like Ignite has a bug. You can follow the discussion in this ticket to track plans for fixing it: https://issues.apache.org/jira/browse/IGNITE-4173
Note that if the failed node wasn't the last node in the topology holding an instance of the semaphore, then it behaves as expected.
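Based on that note, one possible mitigation (a sketch under that assumption, not an officially documented workaround) is to keep an instance of the same semaphore alive on the server node so the bouncing client is never the last holder; the class name, semaphore name, and configuration path below are hypothetical:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteSemaphore;
import org.apache.ignite.Ignition;

public class ServerSemaphoreHolder {
    public static void main(String[] args) {
        // Runs on the server node; "server-config.xml" and "mySemaphore" are hypothetical names.
        Ignite ignite = Ignition.start("server-config.xml");

        // Obtain the same failover-safe semaphore, but never acquire a permit here:
        // the point is only that an instance stays alive on the server, so the bouncing
        // client is not the last node in the topology holding an instance.
        IgniteSemaphore serverSideRef = ignite.semaphore("mySemaphore", 1, true, true);

        System.out.println("Holding semaphore instance: " + serverSideRef.name());
    }
}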

Akka.net Lighthouse keeps trying to connect to failed node

I am trying to learn akka.net clustering.
I thought I understood that when a node went down, it would be removed from the cluster. But that does not seem to be happening.
I fired up an instance of Lighthouse (as the seed node), made a super simple Akka.NET project, and connected them. It all connected fine.
But when I kill the node, Lighthouse keeps looking for it over and over. Eventually it says something about the leader not being able to perform its duties.
I know that the node did not leave the cluster gracefully, but I imagine that I will have nodes that crash.
I thought that when that happens, the gossip system was supposed to remove the dead node from the cluster and everything would move on. (Then, if the node came back online, it could ask to be added back into the cluster.)
But I must be missing something, because Lighthouse just keeps retrying over and over.
Why does it do that instead of just waiting for it to connect again?
I added this to the cluster section of my configuration, and it caused the dead node to be timed out:
auto-down-unreachable-after = 5s

Couchbase 2.5, two nodes with one replica: one node fails and the service is no longer available

We are testing Couchbase with a two-node cluster and one replica.
When we stop the service on one node, the other one does not respond until we restart the service or manually fail over the stopped node.
Is there a way to maintain the service from the good node when one node is temporarily unavailable?
If a node goes down, then in order to activate the replicas on the other node you will need to manually fail it over. If you want this to happen automatically, you can enable auto-failover, but to use that feature I'm pretty sure you must have at least a three-node cluster. When you want to add the failed node back, you can simply re-add it to the cluster and rebalance.