How to fix an mcast issue in a GemFire caching system in a peer-to-peer setup

I am using GemFire caching in a peer-to-peer setup. The system ran fine with GemFire 6 for a number of years. I recently upgraded to GemFire 7 and now get this error in the agents and in one of the processes:
[main] ERROR [GemfirePeer] Issues while creating gemfire distributed region : com.gemstone.gemfire.IncompatibleSystemException: Rejected new system node because mcast was disabled which does not match the distributed system it is attempting to join. To fix this make sure the "mcast-port" gemfire property is set the same on all members of the same distributed system.
mcast-port=0 is set in the configuration properties of all processes.
Can someone please suggest what the issue could be here?

This message means that at least one member you have started has mcast-port set to a non-zero value; it could potentially be a leftover member from your 6.x install as well.
I would recommend that you use locators for member discovery.
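For example, a locator-based peer configuration would look something like this (a minimal sketch; the locator host and port are placeholders, not taken from your setup):
import java.util.Properties;
import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;

public class PeerStartup {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("mcast-port", "0");            // disable multicast on every member
        props.setProperty("locators", "locHost[10334]"); // identical locator list on every member
        Cache cache = new CacheFactory(props).create();  // joins the distributed system as a peer
    }
}
With every member carrying mcast-port=0 and the same locators list, discovery no longer depends on multicast at all, which also makes a stray member with a stale multicast configuration much easier to spot.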

Related

Apache Ignite: Getting affinity for too old topology version that is already out of history (try to increase 'IGNITE_AFFINITY_HISTORY_SIZE')

I am getting this exception intermittently while trying to run co-located join queries on cached data. Below are some specifics of the environment and how the caches are initialized.
Running embedded with a spring boot application
Deployed in Kubernetes environment with TcpDiscoveryJdbcIpFinder
Running on 3+ nodes
The caches are created dynamically using BinaryObjects and QueryEntity (see the sketch after this list)
The affinity keys are forced to be a static value using AffinityKeyMapper (for the same group of data)
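Roughly, the dynamic creation looks like this (a simplified sketch; the cache, type, and field names are illustrative, and the AffinityKeyMapper wiring is omitted):
import java.util.Collections;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.QueryEntity;
import org.apache.ignite.configuration.CacheConfiguration;

public class DynamicCaches {
    static IgniteCache<Object, BinaryObject> createTradeCache(Ignite ignite) {
        QueryEntity entity = new QueryEntity("java.lang.String", "Trade"); // key type, value type
        entity.addQueryField("groupId", "java.lang.String", null);
        entity.addQueryField("amount", "java.lang.Double", null);
        CacheConfiguration<Object, BinaryObject> cfg = new CacheConfiguration<>("tradeCache");
        cfg.setQueryEntities(Collections.singletonList(entity));
        // Work with BinaryObject values directly; no Trade class is needed on the classpath
        return ignite.getOrCreateCache(cfg).withKeepBinary();
    }
}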
I am getting Getting affinity for too old topology version that is already out of history (try to increase 'IGNITE_AFFINITY_HISTORY_SIZE') sporadically. Sometimes it happens continuously for a few minutes; sometimes it works on the second or third try; and sometimes we don't see the error for hours. I already increased IGNITE_AFFINITY_HISTORY_SIZE to 100000 and we still get this message.
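For reference, we raise the value before the node starts; in our deployment it is passed as a JVM -D argument, which is equivalent to this sketch:
// Must run before Ignition.start() (or Spring Boot auto-configuration),
// because Ignite reads the value once during node startup.
System.setProperty("IGNITE_AFFINITY_HISTORY_SIZE", "100000");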

Ignite error upgrading the setup in Kubernetes

While upgrading the Ignite deployment in Kubernetes (EKS) to address the Log4j vulnerability, I got the error below:
[ignite-1] Caused by: class org.apache.ignite.spi.IgniteSpiException: BaselineTopology of joining node (54b55de4-7742-4e82-9212-7158bf51b4a9) is not compatible with BaselineTopology in the cluster. Joining node BlT id (4) is greater than cluster BlT id (3). New BaselineTopology was set on joining node with set-baseline command. Consider cleaning persistent storage of the node and adding it to the cluster again.
The setup is a 3-node cluster with native persistence enabled (PVC). This has happened many times in our journey with Apache Ignite, even though we followed the official guide.
I cannot clean the storage because the pod keeps restarting; by the time I get a shell into the pod, it crashes and restarts again.
This might be due to a wrong startup order; starting the nodes manually in reverse order may resolve it, but I'm not sure that is possible in K8s. Another possible cause is the baseline auto-adjustment, which can change your baseline unexpectedly; I suggest you turn it off if it's enabled.
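If the auto-adjustment is enabled, it can be switched off programmatically; a sketch, assuming Ignite 2.8+ where the baseline auto-adjust API exists (the same can be done from the command line with control.sh --baseline auto_adjust disable):
import org.apache.ignite.Ignite;

public class BaselineFix {
    // Disable auto-adjust so the baseline only changes when set-baseline is run explicitly.
    static void disableBaselineAutoAdjust(Ignite ignite) {
        ignite.cluster().baselineAutoAdjustEnabled(false);
    }
}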
One workaround to clean the DB of a failing pod (quite tricky) is to replace the Ignite image with a simple image such as plain Debian or Alpine, just to be able to access the CLI, while keeping the same PVC attached; once you fix the persistence issue, set the Ignite image back. The other is to access the underlying PV directly, if possible, and do the surgery in place.

How to change Ignite to maintenance mode?

What is Ignite maintenance mode, and how do I put a node into this mode? I was stuck joining a node to the cluster: it complains about cleaning up the persistent data, but the data can be cleaned (using control.sh) only in maintenance mode.
This is a special mode, similar to running Windows in safe mode after a crash or data corruption: most of the cluster functionality is disabled, and the user is asked to perform some maintenance task to resolve the issue. The most straightforward example I can think of is cleaning (removing) corrupted files on disk, just as in your question. You can refer to the IEP-53: Maintenance Mode proposal for the details.
I don't think there is a way to enter this mode manually unless you trigger some preconfigured condition, such as stopping a node in the middle of checkpointing with the WAL disabled. Once the state is fixed, maintenance mode should be resolved automatically, allowing the node to join the cluster.
Also, from my understanding, this mode applies to a particular node rather than the whole cluster. I.e., you can have a 4-node cluster with only one node in maintenance mode; in that case, you have to run the control.sh commands locally for the concrete failed node, not from another healthy node. If that's not the case, please provide more details or file a JIRA ticket, because the reported behavior looks quite broken to me.
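For completeness, the persistent-data cleanup the question refers to is run with the control script locally on the node that entered maintenance mode; a hedged example, assuming Ignite 2.9+ where the persistence maintenance commands exist:
control.sh --persistence info              # list cache directories flagged as corrupted
control.sh --persistence clean corrupted   # remove data of the corrupted caches only
After the cleanup succeeds, maintenance mode should resolve itself on the next restart, as described above, and the node can rejoin the cluster.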

Dynamic GemFire Region Creation with PCC

I am using Pivotal GemFire 9.1.1 through Pivotal Cloud Cache 1.3.1 and ran into the following error while using the @EnableClusterConfiguration SDG annotation:
2018-11-17T16:30:35.279-05:00 [APP/PROC/WEB/0] [OUT] org.springframework.context.ApplicationContextException: Failed to start bean 'gemfireClusterSchemaObjectInitializer'; nested exception is org.apache.geode.cache.client.ServerOperationException: remote server on ac62ca98-0ec5-4a30-606b-1cc9(:8:loner):47710:a6159523:: The function is not registered for function id CreateRegionFunction
2018-11-17T16:30:35.279-05:00 [APP/PROC/WEB/0] [OUT] at org.springframework.context.support.DefaultLifecycleProcessor.doStart(DefaultLifecycleProcessor.java:184)
Finally, I ran into this post - https://github.com/spring-projects/spring-boot-data-geode/issues/15
Is there any other annotation I can use with Spring Boot 2+ that will help me create GemFire Regions dynamically?
Thanks!
Unfortunately, no; at the moment there is no other way to "dynamically" push cluster/server-side configuration from a Spring/GemFire cache client to a cluster of PCC servers running in PCF using SDG/SBDG.
This is now also due to an underlying issue, SBDG Issue #16: "HTTP client does not authenticate when pushing cluster config from client to server using @EnableClusterConfiguration with PCC 1.5."
For the time being, you must create Regions (and Indexes) manually, following the documentation provided by PCC.
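For example, after connecting Gfsh to the cluster, a Region can be created by hand like this (the Region name and type are placeholders):
gfsh> create region --name=Customers --type=PARTITION_REDUNDANT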
I am sorry for any inconvenience or trouble this has caused you. This will be resolved very soon.
This does work in a local, non-managed context, even when starting your cluster (servers) with Gfsh. It just does not work in PCF with PCC yet.
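For reference, the client-side configuration that works in the local, non-managed case looks something like this (a sketch; the entity package name is a placeholder, not something from your code):
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.gemfire.config.annotation.ClientCacheApplication;
import org.springframework.data.gemfire.config.annotation.EnableClusterConfiguration;
import org.springframework.data.gemfire.config.annotation.EnableEntityDefinedRegions;

@SpringBootApplication
@ClientCacheApplication
@EnableEntityDefinedRegions(basePackages = "example.app.model") // create client Regions from @Region entities
@EnableClusterConfiguration(useHttp = true) // push matching Region definitions to the servers over HTTP
public class ClientApplication {
    public static void main(String[] args) {
        SpringApplication.run(ClientApplication.class, args);
    }
}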
Regards.

Why are my WebLogic clustered MDB app deployments in warning state?

I have a WebLogic cluster on which I've deployed numerous topics and applications that use them. My applications uniformly show themselves in a Warning status. Looking at Monitoring on the deployment, I see the MDB application connects to Server #1, but on server #2 it shows this:
MDB application appName is NOT connected to messaging system.
My JMS server is targeted to a migratable target, which is in turn targeted to the #1 server and has a cluster identified. Messages sent to either server flow as expected; I just don't know why these deployments show a Warning state.
WebLogic 11g
This can be avoided by using the parameter below:
<start-mdbs-with-application>false</start-mdbs-with-application>
In weblogic-application.xml, setting start-mdbs-with-application to false forces MDBs to defer starting until after the server instance opens its listen port, near the end of the server boot process.
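For reference, the element lives in the ejb section of weblogic-application.xml; a sketch (namespace and schema attributes omitted):
<weblogic-application>
    <ejb>
        <start-mdbs-with-application>false</start-mdbs-with-application>
    </ejb>
</weblogic-application>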
If you want to perform startup tasks after JMS and JDBC services are available, but before applications and modules have been activated, you can select the Run Before Application Activations option in the Administration Console (or set the StartupClassMBean's LoadBeforeAppActivation attribute to "true").
If you want to perform startup tasks before JMS and JDBC services are available, you can select the Run Before Application Deployments option in the Administration Console (or set the StartupClassMBean's LoadBeforeAppDeployments attribute to "true").
Refer to http://docs.oracle.com/cd/E13222_01/wls/docs81/ejb/message_beans.html; this remains applicable through 12c and later versions.
I don't like unanswered questions, so I'm going to answer this one.
The problem is resolved, though I was not involved in its resolution. At present, the problem only exists for the length of time it takes the JMS subsystem to fully initialize. During that period (with many queues, it can take a while), the JNDI system throws errors and the apps are genuinely in a Warning state. Once JMS is fully initialized, everything goes green.
My belief is that someone corrected something in the JMS Server / Cluster config. I'll never know what it was.