Gemfire WAN Gateway-sender configuration

We are using the GemFire WAN topology and are having problems setting up the gateway-senders.
A couple of assumptions:
- Replicated regions
- Serial gateway-senders
- manual-start is false for all gateway-senders
Let's say we have 2 clusters; within each cluster, we have 2 members (Member A and Member B).
Member A's cache.xml:
<gfe:gateway-sender id="gateway-sender-A" parallel="false" remote-distributed-system-id="2" manual-start="false" />
<gfe:replicated-region name="data" scope="DISTRIBUTED_NO_ACK">
    <gfe:replicated-region name="subData" data-policy="REPLICATE" scope="DISTRIBUTED_ACK">
        <gfe:gateway-sender-ref bean="gateway-sender-A"/>
    </gfe:replicated-region>
</gfe:replicated-region>
Member B's cache.xml:
<gfe:gateway-sender id="gateway-sender-B" parallel="false" remote-distributed-system-id="2" manual-start="false" />
<gfe:replicated-region name="data" scope="DISTRIBUTED_NO_ACK">
    <gfe:replicated-region name="subData" data-policy="REPLICATE" scope="DISTRIBUTED_ACK">
        <gfe:gateway-sender-ref bean="gateway-sender-B"/>
    </gfe:replicated-region>
</gfe:replicated-region>
There is a problem when we start up the two members within one cluster. It raises this error:
java.lang.IllegalStateException: Cannot create Region /data with [gateway-sender-A] gateway sender ids because another cache has the same region defined with [gateway-sender-B] gateway sender ids
Looking at the "High Availability for Gateway Senders" documentation, our understanding is that we can create 2 gateway-senders, of which only one will be doing the sending at any given point in time. Ultimately, we want to have 2 gateway-senders (one in each member) for one cache region: one as the primary sender and the other as the secondary sender.
Thanks

The Geode documentation says:
For serial Senders, Queue HA is achieved by configuring identical serial Senders in multiple members. The Queue is replicated between the members.
So if the two gateway senders in members A and B are doing the same job (apart from their primary/secondary roles), you should use identical settings, i.e., the same sender id.
Among the gateway senders, the one that manages to acquire a specific distributed lock becomes the primary sender, usually the first one that comes up. I don't see a property to force one to become primary.
(Geode is the open-source version of GemFire, in case you're wondering.)
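As an illustration, here is a minimal sketch of what an "identical" serial sender could look like when built with the Geode Java API instead of XML; the sender id, remote distributed system id, and class name are assumptions for the example, not values from the question:

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.wan.GatewaySender;
import org.apache.geode.cache.wan.GatewaySenderFactory;

public class SenderSetup {
    // Run the same code on member A and member B: same id, same settings.
    static GatewaySender createSerialSender(Cache cache) {
        GatewaySenderFactory factory = cache.createGatewaySenderFactory();
        factory.setParallel(false);    // serial sender, as in the question
        factory.setManualStart(false); // must match on every member
        return factory.create("gateway-sender", 2); // same id, same remote system id
    }
}

Whichever member's sender acquires the distributed lock first becomes the primary; the identical sender on the other member keeps a replicated copy of the queue as the secondary.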

After changing the sender ids to be the same for both members, we ran into another problem:
java.lang.IllegalStateException: Cannot create Gateway Sender "some-gateway-sender-id" with manual start "false" because another cache has the same Gateway Sender defined with manual start "true"
It turned out that our problem was inconsistent configuration formats.
Member A used the native cache.xml format:
<?xml version="1.0"?>
<!DOCTYPE cache PUBLIC "-//GemStone Systems, Inc.//GemFire Declarative Caching 8.0//EN" "http://www.gemstone.com/dtd/cache8_0.dtd">
Member B used the Spring Data GemFire format:
xmlns:gfe="http://www.springframework.org/schema/gemfire" xsi:schemaLocation="http://www.springframework.org/schema/gemfire http://www.springframework.org/schema/gemfire/spring-gemfire.xsd">
<gfe:gateway-sender ...>
We switched to the Spring Data GemFire format for both members, and that solved both issues.
TL;DR: gateway-senders need to have the same ids if they are in the same cluster, and all members should use a consistent cache XML format.

Related

Spring JMS - message-driven-channel-adapter: the number of consumers doesn't reduce to the standard level

I have a message-driven-channel-adapter and I defined max-concurrent-consumers as 100 and concurrent-consumers as 2. When I ran a load test, I saw the number of concurrent consumers increase, but after the load test it didn't drop back to the standard level. I'm checking it with the RabbitMQ management portal.
When the project is restarted (no load test), the Get (empty) rate is 650/s, but after a load test it stays at about 2500/s and does not return to 650/s. I think the number of concurrent consumers is increased under load but never reduced back to the original value.
How can I make it drop back to the normal level again?
Here is my message-driven-channel-adapter definition:
<int-jms:message-driven-channel-adapter id="inboundAdapter"
    auto-startup="true"
    connection-factory="jmsConnectionFactory"
    destination="inboundQueue"
    channel="requestChannel"
    error-channel="errorHandlerChannel"
    receive-timeout="-1"
    concurrent-consumers="2"
    max-concurrent-consumers="100" />
With receive-timeout="-1", the container has no control over an idle consumer (it is blocked in the JMS client).
You also need to set max-messages-per-task for the container to consider stopping a consumer:
<int-jms:message-driven-channel-adapter id="inboundAdapter"
    auto-startup="true"
    connection-factory="jmsConnectionFactory"
    destination-name="inboundQueue"
    channel="requestChannel"
    error-channel="errorHandlerChannel"
    receive-timeout="5000"
    concurrent-consumers="2"
    max-messages-per-task="10"
    max-concurrent-consumers="100" />
The time before an idle consumer is stopped is receiveTimeout * maxMessagesPerTask (with the settings above, 5000 ms * 10 = 50 seconds).
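For comparison, here is a rough Java-config sketch of the same settings on Spring JMS's DefaultMessageListenerContainer, which underlies the message-driven adapter (the class and method names around the container are illustrative, not from the question):

import javax.jms.ConnectionFactory;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

public class ListenerConfig {
    static DefaultMessageListenerContainer container(ConnectionFactory connectionFactory) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("inboundQueue");
        container.setReceiveTimeout(5000);        // poll rather than block indefinitely
        container.setMaxMessagesPerTask(10);      // lets idle consumers be considered for shutdown
        container.setConcurrentConsumers(2);      // the baseline level
        container.setMaxConcurrentConsumers(100); // the ceiling under load
        return container;
    }
}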

Apache Geode debug Unknown pdx type=2140705

I start a GFSH client and connect to Geode. There is a lot of data in myRegion, and to check through it I run:
query --query="select * from /myRegion"
I am getting the response:
Result : false
startCount : 0
endCount : 20
Message : Unknown pdx type=2140705
How does one troubleshoot / debug this problem?
UPDATE: The error in the Geode server log is:
[info 2018/07/04 10:53:07.275 BST IsGeode <Function Execution Processor1> tid=0x48] Exception occurred:
java.lang.IllegalStateException: Unknown pdx type=1318971
    at org.apache.geode.internal.InternalDataSerializer.readPdxSerializable(InternalDataSerializer.java:3042)
    at org.apache.geode.internal.InternalDataSerializer.basicReadObject(InternalDataSerializer.java:2859)
    at org.apache.geode.DataSerializer.readObject(DataSerializer.java:2961)
    at org.apache.geode.internal.util.BlobHelper.deserializeBlob(BlobHelper.java:90)
    at org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:1911)
    at org.apache.geode.internal.cache.EntryEventImpl.deserialize(EntryEventImpl.java:1904)
    at org.apache.geode.internal.cache.PreferBytesCachedDeserializable.getDeserializedValue(PreferBytesCachedDeserializable.java:73)
    at org.apache.geode.internal.cache.LocalRegion.getDeserialized(LocalRegion.java:1269)
    at org.apache.geode.internal.cache.LocalRegion$NonTXEntry.getValue(LocalRegion.java:8771)
    at org.apache.geode.internal.cache.EntriesSet$EntriesIterator.moveNext(EntriesSet.java:179)
    at org.apache.geode.internal.cache.EntriesSet$EntriesIterator.next(EntriesSet.java:134)
    at org.apache.geode.cache.query.internal.CompiledSelect.doNestedIterations(CompiledSelect.java:837)
    at org.apache.geode.cache.query.internal.CompiledSelect.doIterationEvaluate(CompiledSelect.java:699)
    at org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:423)
    at org.apache.geode.cache.query.internal.CompiledSelect.evaluate(CompiledSelect.java:53)
    at org.apache.geode.cache.query.internal.DefaultQuery.executeUsingContext(DefaultQuery.java:558)
    at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:385)
    at org.apache.geode.cache.query.internal.DefaultQuery.execute(DefaultQuery.java:319)
    at org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:247)
    at org.apache.geode.management.internal.cli.functions.DataCommandFunction.select(DataCommandFunction.java:202)
    at org.apache.geode.management.internal.cli.functions.DataCommandFunction.execute(DataCommandFunction.java:147)
    at org.apache.geode.internal.cache.MemberFunctionStreamingMessage.process(MemberFunctionStreamingMessage.java:185)
    at org.apache.geode.distributed.internal.DistributionMessage.scheduleAction(DistributionMessage.java:374)
    at org.apache.geode.distributed.internal.DistributionMessage$1.run(DistributionMessage.java:440)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.geode.distributed.internal.DistributionManager.runUntilShutdown(DistributionManager.java:662)
    at org.apache.geode.distributed.internal.DistributionManager$9$1.run(DistributionManager.java:1108)
    at java.lang.Thread.run(Thread.java:748)
You can tell the immediate cause from the stack trace.
A PDX serialized stream contains a type id which is a reference into a repository of type metadata maintained by a GemFire cluster. In this case, the serialized data of the object contained a typeId that is not in the cluster's metadata repository.
So the question becomes: what serialized that object, and why did it use an invalid type id?
The only way I've seen this happen before is when a cluster is fully restarted and the PDX metadata goes away, either because it was not persistent or because it was deleted (by clearing out the locator working directory, for example).
GemFire clients cache the mapping between a type and its type ID. This allows them to quickly serialize objects without continually looking up the type ID from the server. Client connections can persist across cluster restarts, and when a client reconnects it does not flush the cached information; it continues to write objects using its cached type IDs.
So the combination of a cluster restart that loses the PDX metadata and a client that is not restarted (e.g., an app server) is the only way I have seen this happen before. Does this match your scenario?
If so, one of the best ways to avoid this is to persist your pdx metadata and never delete it.
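For illustration, a minimal sketch of enabling a persistent PDX registry through the Geode Java API (the class name is illustrative; the equivalent cache.xml setting is the pdx element's persistent="true" attribute):

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.CacheFactory;

public class PdxSetup {
    public static void main(String[] args) {
        // Persist the PDX type registry so a full cluster restart does not lose it
        Cache cache = new CacheFactory()
            .setPdxPersistent(true)
            .create();
    }
}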

Gemfire WAN with Peer to Peer combined

We are using the multi-site WAN configuration. We have two clusters separated by a geographical distance, one in North America and one in Europe.
Context: Cluster 1 has two members, A and B, that are both gateway senders. Cluster 2 has two members, C and D, that are both gateway receivers. When member A in cluster 1 starts, it reads data from the database and loads it into the GemFire cache, which gets sent to cluster 2. Everything so far is good.
Problem: If both members in cluster 2 are restarted at the same time, they lose all the GemFire regions/data. At that point, we could restart member A in cluster 1; it would again load data from the DB and push it to cluster 2. But we would prefer to avoid restarting member A, and to do so without persisting to disk.
Is there a solution where, if cluster 2 is restarted, it can request a full copy of the data from cluster 1?
Not sure if it's possible, but could we somehow set up peer-to-peer for the gateway receivers in cluster 2 (on top of WAN), so they would be updated automatically upon restart?
Thanks
Getting a full copy of the data over the WAN is not supported at this time. What you could do instead is run a function on all members of the sending site that simply iterates over all the data and puts it back again in the region, i.e., something like:
import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.FunctionContext;
import org.apache.geode.cache.execute.RegionFunctionContext;
import org.apache.geode.cache.partition.PartitionRegionHelper;

public void execute(FunctionContext context) {
    RegionFunctionContext ctx = (RegionFunctionContext) context;
    // Operate only on the data this member hosts locally
    Region<Object, Object> localData = PartitionRegionHelper.getLocalDataForContext(ctx);
    for (Object key : localData.keySet()) {
        Object val = localData.get(key);
        // Re-putting the value enqueues it on the gateway sender again
        localData.put(key, val);
    }
}
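The function could then be invoked over the region from any peer, for example (a sketch: "RePutFunction" and "data" are hypothetical names, and cache is assumed to be the member's Cache):

import org.apache.geode.cache.Region;
import org.apache.geode.cache.execute.FunctionService;

Region<Object, Object> region = cache.getRegion("data");
FunctionService.onRegion(region).execute("RePutFunction").getResult();

Each re-put is enqueued on the gateway sender again, so the remote site receives a fresh copy of the data.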

Gemfire WAN Gateway-senders/receivers members

Within a cluster, we want to create members that are neither senders nor receivers, while having one sender/receiver in each cluster. We started up the sender/receiver in a given cluster first, with no errors. As soon as we started up the member that is neither a sender nor a receiver, it raised this error:
java.lang.IllegalStateException: Cannot create Region /data with [gateway-sender-A] gateway sender ids because another cache has the same region defined with [] gateway sender ids.
Some assumptions:
- Replicated regions
- Serial gateway-senders
- manual-start is false for all gateway-senders
My guess is that since the member doesn't have a gateway-sender-id, it's complaining about the blank value, which confuses me. I thought we could have members in a cluster that are neither senders nor receivers. Can someone clarify?
Thanks
I assume you are using peer-to-peer configuration within a cluster, so all of these members are peers, not clients. In that case, the configuration for the same region needs to be identical: if member A has gateway sender 1, member B should also have sender 1 in its gateway-sender-ids property.
This error is thrown when the starting member figures out that somewhere in the cluster there is a region with the same name but a different configuration; the error message tells you the detailed reason.
To achieve "neither sender nor receiver", I think simply not configuring a gateway sender/receiver on that specific instance will do.
Figured it out.
My old understanding was based on a diagram in the documentation where the two top-left members were neither senders nor receivers. This is wrong; the diagram is a little misleading.
You CAN set which members are receivers in cache XML, but you CANNOT set which of the members hosting the region are senders. In other words, having the <gateway-sender> tag in a member doesn't mean that member is the sender; it simply means the region has a sender somewhere in the cluster.
Therefore, if you want a sender for a specified region, you must include the <gateway-sender> tag in all the members hosting that region for it to be valid.
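To make that concrete, here is a hedged sketch of the equivalent Java API call, run identically on every member that hosts the region (the names are reused from the question for illustration):

import org.apache.geode.cache.Cache;
import org.apache.geode.cache.Region;
import org.apache.geode.cache.RegionShortcut;

public class RegionSetup {
    // Every member hosting /data declares the same sender id
    static Region<String, Object> createDataRegion(Cache cache) {
        return cache.<String, Object>createRegionFactory(RegionShortcut.REPLICATE)
                .addGatewaySenderId("gateway-sender-A")
                .create("data");
    }
}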

Maximum number of messages sent to a Queue in OpenMQ?

I am currently using GlassFish v2.1 and I have set up a queue to send and receive messages from, with Session beans and MDBs respectively. However, I have noticed that I can send only a maximum of 1000 messages to the queue. Is there any reason why I cannot send more than 1000 messages to the queue? I do have a "developer" profile set up for the GlassFish domain. Could that be the reason? Or is there some resource configuration setting that I need to modify?
I have set up the sun-resources.xml configuration as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE resources PUBLIC "-//Sun Microsystems, Inc.//DTD Application Server 9.0 Resource Definitions //EN" "http://www.sun.com/software/appserver/dtds/sun-resources_1_3.dtd">
<resources>
    <admin-object-resource
        enabled="true"
        jndi-name="jms/UpdateQueue"
        object-type="user"
        res-adapter="jmsra"
        res-type="javax.jms.Queue">
        <description/>
        <property name="Name" value="UpdatePhysicalQueue"/>
    </admin-object-resource>
    <connector-resource
        enabled="true"
        jndi-name="jms/UpdateQueueFactory"
        object-type="user"
        pool-name="jms/UpdateQueueFactoryPool">
        <description/>
    </connector-resource>
    <connector-connection-pool
        associate-with-thread="false"
        connection-creation-retry-attempts="0"
        connection-creation-retry-interval-in-seconds="10"
        connection-definition-name="javax.jms.QueueConnectionFactory"
        connection-leak-reclaim="false"
        connection-leak-timeout-in-seconds="0"
        fail-all-connections="false"
        idle-timeout-in-seconds="300"
        is-connection-validation-required="false"
        lazy-connection-association="false"
        lazy-connection-enlistment="false"
        match-connections="true"
        max-connection-usage-count="0"
        max-pool-size="32"
        max-wait-time-in-millis="60000"
        name="jms/UpdateFactoryPool"
        pool-resize-quantity="2"
        resource-adapter-name="jmsra"
        steady-pool-size="8"
        validate-atmost-once-period-in-seconds="0"/>
</resources>
Hmm... further investigation revealed the following in the imq logs:
[17/Nov/2009:10:27:57 CST] ERROR sendMessage: Sending message failed. Connection ID: 427038234214377984:
com.sun.messaging.jmq.jmsserver.util.BrokerException: transaction failed: [B4303]: The maximum number of messages [1,000] that the producer can process in a single transaction (TID=427038234364096768) has been exceeded. Please either limit the # of messages per transaction or increase the imq.transaction.producer.maxNumMsgs property.
So what would I do if I needed to send more than 5000 messages at a time?
What I am trying to do is read all the records in a table and update a particular field of each record based on the corresponding value of that record in a legacy table to which I have read-only access. This table has more than 10k records in it. As of now, I am sequentially going through each record in a for loop, getting the corresponding record from the legacy table, comparing the field values, updating the record if necessary, and adding corresponding new records in other tables.
However, I was hoping to improve performance by processing all the records asynchronously. To do that, I was thinking of sending each record's info as a separate message, hence the large number of messages.
To configure OpenMQ and set arbitrary broker properties, have a look at this blog post.
But actually, I wouldn't advise increasing the imq.transaction.producer.maxNumMsgs property, at least not above the value recommended in the documentation:
The maximum number of messages that a producer can process in a single transaction. It is recommended that the value be less than 5000 to prevent the exhausting of resources.
If you need to send more messages, consider doing it in several transactions.
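For example, here is a plain JMS sketch (the class, method, and chunk size are illustrative) that commits every 500 messages, so no single transaction exceeds the broker's 1,000-message limit:

import java.util.List;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

public class ChunkedSender {
    static void sendInChunks(ConnectionFactory cf, Queue queue, List<String> payloads)
            throws JMSException {
        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(queue);
            int sent = 0;
            for (String payload : payloads) {
                producer.send(session.createTextMessage(payload));
                if (++sent % 500 == 0) {
                    session.commit(); // end this transaction and start the next
                }
            }
            session.commit(); // commit the final partial chunk
        } finally {
            conn.close();
        }
    }
}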