How to set up multiple GemFire/Geode WAN clusters on one machine for testing?

What's needed to run multiple GemFire/Geode clusters on one machine? I'm trying to test WAN gateways locally before setting them up on servers.
I have one cluster (i.e. gemfire.distributed-system-id=1) up and running with one locator and one server.
I am trying to set up a second cluster (i.e. gemfire.distributed-system-id=2), but receive the following error when attempting to connect to the locator in cluster 2:
Exception caused JMX Manager startup to fail because: 'HTTP service
failed to start'
I assume the error is due to a JMX Manager already running in cluster 1, so I'm guessing I need to start a second JMX Manager on a different port in cluster 2. Is this a correct assumption? If so, how do I set up the second JMX Manager?

Your assumption is correct: the exception is being thrown because the members of the first cluster already started some services (Pulse, the JMX Manager, etc.) on the default ports.
You basically want to make sure that the properties http-service-port and jmx-manager-port (not an exhaustive list; there are other port properties you need to look at) are different in the second cluster.
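As a minimal sketch, assuming cluster 1 uses the defaults (locator port 10334, jmx-manager-port 1099, http-service-port 7070), the second cluster's locator could be started on non-default ports like this (the exact port numbers are arbitrary choices):
gfsh> start locator --name=locator2 --port=10335 --J=-Dgemfire.distributed-system-id=2 --J=-Dgemfire.jmx-manager-port=1100 --J=-Dgemfire.http-service-port=7071 --J=-Dgemfire.remote-locators=localhost[10334]
Here remote-locators points the new cluster at cluster 1's locator so the two distributed systems can discover each other for the WAN gateway.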
Hope this helps.
Cheers.

How to configure Akka.Cluster for services that crash when binding to port 0

What I am testing is the following scenario:
Start 2 Lighthouses, then start a third service that is a member of the cluster. Its seed nodes are configured to be the two Lighthouses that were previously started.
Now this third service has its HOCON set to bind to port 0, which does its job and gives me a random port.
Now when I force quit this service to simulate a crash, the logging output from Akka.NET gets really chatty (important parts):
AssociationError...Tried to associate with unreachable remote address
address is now gated for 5000ms ... No connection could be made because the target machine actively refused it.
And it seems like it just goes on forever. I assume this is probably harmless and it just looks like a terrible error. The message itself makes sense: the service is literally gone, so it cannot and will never be able to connect.
Now if I restart the service, since it's configured to bind to port 0 for Akka.Remote, it will get an entirely new port, so the Unreachable status of the failed instance will never be resolved.
Is this the expected behavior? I also think there is a configuration setting that might come into play here:
auto-down-unreachable-after
Now this comes with its own warning:
Using auto-down implies that two separate clusters will automatically be formed in case of network partition.
Setting this (under akka.cluster in HOCON) does silence the messages:
auto-down-unreachable-after = 3s
And I get a new message after the node is marked unreachable:
Association to [akka.tcp://ClusterName#localhost:58977] having UID [983892349] is irrecoverably failed. UID is now quarantined and all messages to this UID will be delivered to dead letters. Remote actorsystem must be restarted to recover from this situation.
'Remote actorsystem must be restarted to recover from this situation' seems pretty serious and something to avoid. At the same time, given that the service joins on a random port, it really is irrecoverable. In trying to gain some more knowledge about the UID, it seems that it's internally assigned, so I can only guess there would not be any collisions later in time with UIDs, and this would be the proper behavior.
This seems to be the only option outside of
log-info = off
to just silence the logs.
I assume the logging of the Lighthouse services is chatty, right? That is 'normal' behaviour of the Akka gossip protocol trying to communicate with the crashed node. When this happens, you must decide what you want to happen and configure it.
The solution is not always the same for each situation; it can depend, for example, on whether you are running the services on a cloud microservices platform. But one of the options is indeed 'auto-downing'. The crashed node is first marked 'UNREACHABLE' (as you can see), which means the node isn't out of the cluster yet, but the cluster continues to operate without it; auto-downing then removes it after the configured timeout. That's also why the same node cannot rejoin while it is still marked 'UNREACHABLE'.
Be aware that auto-downing could result in a 'split-brain' of the cluster, where the cluster breaks into two independent parts (for example, one cluster of 4 nodes gets split into 2 clusters of 2 nodes). This is a situation you don't want, so this may not be the best solution!
Akka.NET has another solution you can configure to deal with this correctly: the Split Brain Resolver. More information on how to configure it: https://getakka.net/articles/clustering/split-brain-resolver.html
These are all strategies to prevent 'split-brain' situations and will involve sacrificing nodes to keep the cluster consistent. Use these strategies in combination with, for example, a microservices orchestration platform (so that instances will restart themselves after crashing/exiting) to create a perfect self-healing Akka cluster.
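As a minimal HOCON sketch of what enabling the Split Brain Resolver can look like (the provider class name and strategy names follow the linked docs; verify them against your Akka.NET version before relying on this):
akka.cluster {
  downing-provider-class = "Akka.Cluster.SBR.SplitBrainResolverProvider, Akka.Cluster"
  split-brain-resolver {
    # down the minority side of a partition
    active-strategy = keep-majority
    # wait for membership to settle before acting
    stable-after = 20s
  }
}
keep-majority sacrifices the smaller side of a partition, which avoids the two-independent-clusters outcome that plain auto-downing can produce.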

SAP HANA Vora distributed log service refused to start

I installed SAP HANA Vora on a 3-node MapR cluster. While trying to bring up the Vora service via the Vora Manager UI, I get the following error:
Error occurred while starting all services: vora-dlog refused to
start. Cannot continue Start All Jobs. Error: There are no health
checks registered for service vora-dlog.
The vora-manager log file displays the following error:
vora.vora-dlog: [c.xxxxxxx] : Error while creating dlog store.
nomad[xxxxx]: client: failed to query for node allocations: no known servers
nomad[xxxxx]: client:rpcproxy: No servers available.
All 3 nodes in the cluster have 2 IPs in different subnets. Can anyone suggest how to configure a health check for consul? And what else can be wrong here?
The messages from the VoraMgr log file are not sufficient to understand the actual problem. Are there other messages from dlog before 'Error while creating dlog store.'? I have seen that message e.g. if the disk was full and the dlog could not create its local persistency.
Also, the 2 different networks could cause an issue like the one you described. You can configure the use of different network interface names on different nodes; however, on each node all Vora services as well as the Vora Manager must use the same network interface name. If you are using 2 different subnets, the configuration must allow network traffic between them. Could you give some additional info on your topology and network configuration?
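Regarding the Consul health-check part of the question: I can't speak to how Vora registers its services, but a generic Consul service definition with a TCP health check looks something like the sketch below (the service name and port are hypothetical placeholders, not Vora's actual values):
{
  "service": {
    "name": "vora-dlog",
    "port": 12345,
    "check": {
      "tcp": "localhost:12345",
      "interval": "10s"
    }
  }
}
Dropped into Consul's configuration directory and loaded with a reload, Consul then reports the service as passing or critical based on whether the TCP connection succeeds, which is the kind of registered check the 'no health checks registered for service vora-dlog' error is complaining about.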

How does a GlassFish cluster find active IIOP endpoints?

I have a curiosity and I was searching for it without any result. In GlassFish documentation it is written:
If the GlassFish Server instance on which the application client is
deployed participates in a cluster, the GlassFish Server finds all
currently active IIOP endpoints in the cluster automatically. However,
a client should have at least two endpoints specified for
bootstrapping purposes, in case one of the endpoints has failed.
but I am asking myself how this list is created.
I've done some tests with a stand-alone client that is executed in a JVM and makes some RMI calls to an application deployed in a GlassFish cluster. I can see from the logs that the IIOP endpoint list is completed automatically and set as the com.sun.appserv.iiop.endpoints system property, but if I stop a server instance or start another one during the execution of the client, the list remains the one that was created when the JVM was started.
GlassFish clustering is managed by the GMS (Group Management Service) which usually uses UDP Multicast, but can use TCP where that is not available.
See section 4 "Administering GlassFish Server Clusters" in the HA Administration Guide (PDF)
The Group Management Service (GMS) enables instances to participate in a cluster by
detecting changes in cluster membership and notifying instances of the changes. To
ensure that GMS can detect changes in cluster membership, a cluster's GMS settings
must be configured correctly.
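As for the 'at least two endpoints for bootstrapping' recommendation quoted in the question, a stand-alone client is typically launched with the endpoint list set explicitly; a sketch (the host names are placeholders, and 3700 is the default IIOP port):
java -Dcom.sun.appserv.iiop.endpoints=host1:3700,host2:3700 -jar my-client.jar
If GMS discovery appears not to work at all, the asadmin validate-multicast subcommand can be used to check whether UDP multicast traffic actually flows between the cluster hosts.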

Get Number of connection from all host to my activemq broker

ActiveMQ broker setup:
Broker is running on machine: hostA
Clients from different hosts can connect to my broker instance running on hostA; there can be any number of clients from any host.
Is there a way to find out how many clients are connected to the broker, and also a list telling me how many connections each host has to my broker?
I want to do this without making assumptions about the number of hosts.
I can do this by using the lsof command and some parsing of its output, but I am in a situation where I cannot use that.
Is there any feature provided by the ActiveMQ command line utility activemq-admin?
You can get to pretty much any MBean attribute ActiveMQ exposes via activemq-admin. There are no attributes or operations that give you a quick count of connections from specific clients, so you will have to do some work on your end to get all the details you want, but all the raw data is there.
Examples:
Broker Stats:
activemq-admin query --objname type=Broker,brokerName=localhost
Connection Stats
activemq-admin query --objname type=Broker,brokerName=localhost,connector=clientConnectors,connectorName=<transport connector name>,connectionViewType=clientId,connectionName=*
See full doc here.
NOTE: The documentation as of this writing has not been updated to take into account the MBean changes made in AMQ; the object names referenced in its examples are not correct.
You can get the object name (or example syntax) from JMX (using JConsole or VisualVM, for example) via the MBeanInfo. Each object name will start with something like org.apache.activemq:type=. For the script, remove the 'org.apache.activemq:' prefix and you should be in business for anything you need from JMX via the script.
I think you may also look into using Jolokia with your broker. Although not compatible with the activemq-admin script, it lets you reach everything the activemq-admin script can, and it also gives you access to all of the operations. In the past I've heavily used the activemq-admin script for local monitoring and command-line administration of the broker, but I have started converting everything to hit the Jolokia service. But again, activemq-admin will give you a way to access what you are looking for here.
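For example, a sketch of reading the broker's current connection count over Jolokia's REST interface (the port, credentials, and broker name assume a default standalone ActiveMQ installation; adjust to yours):
curl -u admin:admin 'http://localhost:8161/api/jolokia/read/org.apache.activemq:type=Broker,brokerName=localhost/CurrentConnectionsCount'
CurrentConnectionsCount is an attribute of the Broker MBean; per-connection details, such as each client's remote address, live on the individual connection MBeans returned by the query examples above.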

Why are my WebLogic clustered MDB app deployments in warning state?

I have a WebLogic cluster on which I've deployed numerous topics and applications that use them. My applications uniformly show themselves in a Warning status. Looking at Monitoring on the deployment, I see the MDB application connects to Server #1, but on server #2 it shows this:
MDB application appName is NOT connected to messaging system.
My JMS server is targeted to a migratable target, which is in turn targeted to the #1 server and has a cluster identified. And messages sent to either server all flow as expected; I just don't know why these deployments show a Warning state.
WebLogic 11g
This can be avoided by using the parameter below:
<start-mdbs-with-application>false</start-mdbs-with-application>
In weblogic-application.xml, setting start-mdbs-with-application to false forces MDBs to defer starting until after the server instance opens its listen port, near the end of the server boot-up process.
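A sketch of where the element sits in the descriptor (the surrounding structure follows the standard weblogic-application.xml layout; verify the namespace against your WebLogic release):
<weblogic-application xmlns="http://xmlns.oracle.com/weblogic/weblogic-application">
  <ejb>
    <start-mdbs-with-application>false</start-mdbs-with-application>
  </ejb>
</weblogic-application>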
If you want to perform startup tasks after JMS and JDBC services are available, but before applications and modules have been activated, you can select the Run Before Application Deployments option in the Administration Console (or set the StartupClassMBean’s LoadBeforeAppActivation attribute to “true”).
If you want to perform startup tasks before JMS and JDBC services are available, you can select the Run Before Application Activations option in the Administration Console (or set the StartupClassMBean’s LoadBeforeAppDeployments attribute to “true”).
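If you prefer scripting over the Administration Console, a hedged WLST sketch of setting that MBean attribute (the credentials, URL, and startup class name MyStartupClass are placeholders; this assumes the startup class already exists in the domain):
# connect to the admin server and enter an edit session
connect('weblogic', 'password', 't3://adminhost:7001')
edit()
startEdit()
# navigate to the existing startup class and flip the attribute
cd('/StartupClasses/MyStartupClass')
cmo.setLoadBeforeAppActivation(true)
save()
activate()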
Refer to: http://docs.oracle.com/cd/E13222_01/wls/docs81/ejb/message_beans.html
This is applicable to versions through 12c and later.
I don't like unanswered questions, so I'm going to answer this one.
The problem is resolved, though I was not involved in its resolution. At present the problem only exists for the length of time it takes the JMS subsystem to fully initialize. During that period (with many queues, it can take a while) the JNDI system throws errors and the apps are truly in a Warning state. Once JMS is fully initialized, everything goes green.
My belief is that someone corrected something in the JMS Server / Cluster config. I'll never know what it was.