How to store related entries in the Geode region - locking

We operate on sketches (sizes vary from 1 GB to 15 GB). We currently break them into parcels (each around 50 MB) and store the parcels in a Geode partitioned region. We read this data from S3 and put all of it into the region. Once this succeeds, we insert a marker-key entry into the region. This marker key is very important in our business logic.
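A simplified sketch of the load path (illustrative only, not our real code; the marker value is just a placeholder, what matters for us is the key):

import java.util.List;
import org.apache.geode.cache.Region;

// Simplified sketch of our load path (illustrative only, not the real code).
// All parcels are written first; the marker key is written last, so its presence
// is meant to imply that every part of the sketch made it into the region.
public class SketchWriter {

    void storeSketch(Region<String, Object> region, String sketchId, List<Object> parcels) {
        for (int i = 0; i < parcels.size(); i++) {
            // part keys follow the E12345_00, E12345_01, ... scheme
            region.put(String.format("%s_%02d", sketchId, i), parcels.get(i));
        }
        // marker key such as [E12345]; the marker value here is only a placeholder
        region.put("[" + sketchId + "]", Boolean.TRUE);
    }
}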
Below is the region configuration:
<region name="region_abc">
  <region-attributes data-policy="partition" statistics-enabled="true">
    <key-constraint>java.lang.String</key-constraint>
    <entry-time-to-live>
      <expiration-attributes action="destroy" timeout="86400"/>
    </entry-time-to-live>
    <partition-attributes redundant-copies="0">
      <partition-resolver name="SingleBucketPartitioner">
        <class-name>com.companyname.geode.sketch.partition.SingleBucketPartitioner</class-name>
      </partition-resolver>
    </partition-attributes>
    <cache-loader>
      <class-name>com.companyname.geode.abc.cache.BitmapSketchParcelCacheLoader</class-name>
      <parameter name="s3-region-name">
        <string>us-east-1</string>
      </parameter>
      <parameter name="s3-bucket-name">
        <string>xyz</string>
      </parameter>
      <parameter name="s3-folder-name">
        <string>abc</string>
      </parameter>
      <parameter name="s3-read-timeout">
        <string>600</string>
      </parameter>
      <parameter name="read-through-pool-size">
        <string>70</string>
      </parameter>
      <parameter name="measurement-group">
        <string>abcd</string>
      </parameter>
    </cache-loader>
    <cache-listener>
      <class-name>com.companyname.geode.abc.cache.ClearMarkerKeyAfterAnyEntryDestroyCacheListener</class-name>
    </cache-listener>
    <eviction-attributes>
      <lru-heap-percentage action="local-destroy"/>
    </eviction-attributes>
  </region-attributes>
</region>
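For context, the cache loader above is a read-through from S3: when an entry is missing, Geode asks the loader for it. A heavily simplified sketch of its shape (not our real class; fetchParcelFromS3 is a placeholder), mainly to show how the <parameter> elements reach the loader:

import java.util.Properties;
import org.apache.geode.cache.CacheLoader;
import org.apache.geode.cache.CacheLoaderException;
import org.apache.geode.cache.Declarable;
import org.apache.geode.cache.LoaderHelper;

// Heavily simplified sketch of the read-through loader declared above (not the real class).
// The <parameter> elements from cache.xml arrive through Declarable.init(Properties);
// fetchParcelFromS3 is a placeholder for the actual S3 read.
public class BitmapSketchParcelCacheLoaderSketch implements CacheLoader<String, Object>, Declarable {

    private String bucket;
    private String folder;

    @Override
    public void init(Properties props) {
        this.bucket = props.getProperty("s3-bucket-name");
        this.folder = props.getProperty("s3-folder-name");
    }

    @Override
    public Object load(LoaderHelper<String, Object> helper) throws CacheLoaderException {
        // helper.getKey() is the missing entry key, e.g. "E12345_03"
        return fetchParcelFromS3(bucket, folder, helper.getKey());
    }

    @Override
    public void close() {
        // nothing to release in this sketch
    }

    private Object fetchParcelFromS3(String bucket, String folder, String key) {
        throw new UnsupportedOperationException("placeholder for the real S3 read");
    }
}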
If the marker key is present in the region, then we assume that we have all the entries of the sketch.
We have currently set cache eviction to trigger at 70% of heap usage, which evicts entries using the LRU algorithm. We have been seeing inconsistencies in the data due to this eviction: in some scenarios eviction removes some or many of the entries but not the marker key, which leaves the object inconsistent, because the application thinks it has all the entries when it actually doesn't.
To fix this, we also implemented a cache listener for the destroy event, but somehow this does not fix the issue either.
@Override
public void afterDestroy(EntryEvent<String, BitmapSketch> event) {
    String regionKey = event.getKey();
    Region<String, BitmapSketch> region = event.getRegion();
    // act only when a non-marker key is evicted and the marker key is still present
    if (regionKey != null && !regionKey.startsWith("[")) {
        // asynchronous call
        reloadExecutor.submit(
            () -> {
                String markerKey = "[" + regionKey.substring(0, regionKey.indexOf("_")).trim() + "]";
                // check for marker key presence before removing the marker key
                if (region.containsKey(markerKey)) {
                    logger.info("FixGeodeCacheInconsistency : Marker key exists !!! Deleting the marker key associated with the entry key. Region: `{}`; Entry Key: `{}`; Marker Key: `{}`",
                            region.getName(), regionKey, markerKey);
                    // remove the marker key from the region to restore consistency for the sketch
                    region.remove(markerKey);
                    logger.info("FixGeodeCacheInconsistency : Marker key destroyed. Region: `{}`; Entry Key: `{}`; Marker Key: `{}`",
                            region.getName(), regionKey, markerKey);
                }
            });
    }
}
We are now looking for a more reliable solution and trying to take a deeper look at the problem.
A couple of notes:
We are breaking one big object into parts and storing the parts as entries in the region.
We add one marker key to the region to indicate the object's existence.
We read all the parts of this object from the region to reconstruct the big object (see the read-path sketch after the example below).
The Geode region does not know about the connection between these parts.
A simple example of one such object:
For example, the object is E12345
Marker key:- [E12345]
Parts/Entries:- E12345_00, E12345_01, E12345_02, E12345_03, E12345_04, E12345_05 and so on....
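The read path is roughly as follows (simplified sketch, not our real code; partCount and assembleSketch are illustrative placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.geode.cache.Region;

// Simplified sketch of the read path (illustrative only, not the real code).
// If the marker key is present we assume every part exists and fetch them all.
public class SketchReader {

    Object readSketch(Region<String, Object> region, String sketchId, int partCount) {
        String markerKey = "[" + sketchId + "]";
        if (!region.containsKey(markerKey)) {
            return null; // the object is considered absent
        }
        List<Object> parts = new ArrayList<>();
        for (int i = 0; i < partCount; i++) {
            // a part that was evicted would have to be re-read through the cache loader here
            parts.add(region.get(String.format("%s_%02d", sketchId, i)));
        }
        return assembleSketch(parts);
    }

    private Object assembleSketch(List<Object> parts) {
        throw new UnsupportedOperationException("placeholder for the real reassembly");
    }
}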
Geode cache eviction sometimes evicts some of the parts but not the marker key, which is causing all the issues.
We are trying to come up with an approach that achieves any of the below:
Is there an option to group related entries together so that Geode knows they are all parts of one broader object? (See the partition-resolver sketch after this list.)
How can we make sure that Geode cache eviction does not cause inconsistencies? Currently it removes some of the entries and leaves others, which makes the end results inconsistent.
Is this a good use case for the region locking semantics?
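To make the grouping question concrete, what we have in mind is along the lines of a custom PartitionResolver that derives the same routing object for every key of one sketch, so all parts and the marker key land in the same bucket. This is only an illustrative sketch (and, as far as we understand, co-location by itself does not stop per-entry LRU eviction):

import org.apache.geode.cache.EntryOperation;
import org.apache.geode.cache.PartitionResolver;

// Illustrative sketch of a resolver that co-locates E12345_00, E12345_01, ...
// and [E12345] by returning the same routing object ("E12345") for all of them.
public class SketchGroupPartitionResolver implements PartitionResolver<String, Object> {

    @Override
    public Object getRoutingObject(EntryOperation<String, Object> opDetails) {
        String key = opDetails.getKey();
        if (key.startsWith("[")) {
            // marker key such as [E12345] -> routing object E12345
            return key.substring(1, key.length() - 1);
        }
        int sep = key.indexOf('_');
        // part key such as E12345_03 -> routing object E12345
        return sep > 0 ? key.substring(0, sep) : key;
    }

    @Override
    public String getName() {
        return "SketchGroupPartitionResolver";
    }

    @Override
    public void close() {
        // no resources to release
    }
}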
I will be glad to provide more context and details as required.
Any details/guidance/suggestions are appreciated.

Related

Infinispan clustered lock performance does not improve with more nodes?

I have a piece of code that is essentially executing the following with Infinispan in embedded mode, using version 13.0.0 of the -core and -clustered-lock modules:
@Inject
lateinit var lockManager: ClusteredLockManager

private fun getLock(lockName: String): ClusteredLock {
    lockManager.defineLock(lockName)
    return lockManager.get(lockName)
}

fun createSession(sessionId: String) {
    tryLockCounter.increment()
    logger.debugf("Trying to start session %s. trying to acquire lock", sessionId)
    Future.fromCompletionStage(getLock(sessionId).lock()).map {
        acquiredLockCounter.increment()
        logger.debugf("Starting session %s. Got lock", sessionId)
    }.onFailure {
        logger.errorf(it, "Failed to start session %s", sessionId)
    }
}
I take this piece of code and deploy it to Kubernetes. I then run it in six pods distributed over six nodes in the same region. The code exposes createSession with random GUIDs through an API. This API is called and creates sessions in chunks of 500, using a k8s service in front of the pods, which means the load gets balanced over the pods. I notice that the execution time to acquire a lock grows linearly with the number of sessions. In the beginning it's around 10 ms; when there are about 20_000 sessions it takes about 100 ms, and the trend continues in a stable fashion.
I then take the same code and run it, but this time with twelve pods on twelve nodes. To my surprise I see that the performance characteristics are almost identical to when I had six pods. I've been digging in to the code but still haven't figured out why this is, I'm wondering if there's a good reason why infinispan here doesn't seem to perform better with more nodes?
For completeness, the configuration of the locks is as follows:
val global = GlobalConfigurationBuilder.defaultClusteredBuilder()
global.addModule(ClusteredLockManagerConfigurationBuilder::class.java)
    .reliability(Reliability.AVAILABLE)
    .numOwner(1)
Looking at the code, the clustered locks use DIST_SYNC, which should spread the load of the cache out onto the different nodes.
UPDATE:
The two counters in the code above are simply Micrometer counters. It is through them and Prometheus that I can see how the lock creation starts to slow down.
It's correctly observed that there's one lock created per session id; this is by design and what we'd like. Our use case is that we want to ensure that a session is running in at least one place. Without going too deep into detail, this can be achieved by ensuring that we always have at least two pods trying to acquire the same lock. The Infinispan library is great in that it tells us directly when the lock holder dies, without any extra chattiness between pods, which means that we have a "cheap" way of ensuring that execution of the session continues when one pod is removed.
After digging deeper into the code I found the following in CacheNotifierImpl in the core library:
private CompletionStage<Void> doNotifyModified(K key, V value, Metadata metadata, V previousValue,
      Metadata previousMetadata, boolean pre, InvocationContext ctx, FlagAffectedCommand command) {
   if (clusteringDependentLogic.running().commitType(command, ctx, extractSegment(command, key), false).isLocal()
         && (command == null || !command.hasAnyFlag(FlagBitSets.PUT_FOR_STATE_TRANSFER))) {
      EventImpl<K, V> e = EventImpl.createEvent(cache.wired(), CACHE_ENTRY_MODIFIED);
      boolean isLocalNodePrimaryOwner = isLocalNodePrimaryOwner(key);
      Object batchIdentifier = ctx.isInTxScope() ? null : Thread.currentThread();
      try {
         AggregateCompletionStage<Void> aggregateCompletionStage = null;
         for (CacheEntryListenerInvocation<K, V> listener : cacheEntryModifiedListeners) {
            // Need a wrapper per invocation since converter could modify the entry in it
            configureEvent(listener, e, key, value, metadata, pre, ctx, command, previousValue, previousMetadata);
            aggregateCompletionStage = composeStageIfNeeded(aggregateCompletionStage,
                  listener.invoke(new EventWrapper<>(key, e), isLocalNodePrimaryOwner));
         }
The lock library uses a clustered listener on the entry-modified event, and that listener uses a filter to only notify when the key for the lock is modified. It seems to me the core library still has to check this condition on every registered listener, which of course becomes a very big list as the number of sessions grows. I suspect this is the reason, and if it is, it would be really awesome if the core library supported a kind of key filter so that it could use a hashmap for these listeners instead of going through the whole list of listeners.
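To illustrate what I mean, here is a purely conceptual sketch (not Infinispan code) of dispatching key-filtered listeners through a map lookup instead of scanning every registered listener:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Purely conceptual sketch (not Infinispan code): listeners registered with a key
// filter are indexed by key, so a modification only touches the listeners for that
// key instead of iterating the full listener list on every event.
class KeyedListenerRegistry<K, E> {

    private final Map<K, List<Consumer<E>>> listenersByKey = new ConcurrentHashMap<>();

    void register(K key, Consumer<E> listener) {
        listenersByKey.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(listener);
    }

    void notifyModified(K key, E event) {
        // O(1) lookup instead of an O(n) scan over all registered listeners
        List<Consumer<E>> listeners = listenersByKey.get(key);
        if (listeners != null) {
            listeners.forEach(l -> l.accept(event));
        }
    }
}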
I believe you are creating a clustered lock per session id. Is this what you need? What is the acquiredLockCounter? We are about to deprecate the "lock" method in favour of "tryLock" with a timeout, since the lock method will block forever if the clustered lock is never acquired. Do you ever unlock the clustered lock in another piece of code? If you shared a complete reproducer of the code, it would be very helpful for us. Thanks!
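For reference, acquisition with a timeout would look roughly like this (a Java sketch of tryLock; the 5-second timeout is arbitrary):

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import org.infinispan.lock.api.ClusteredLock;
import org.infinispan.lock.api.ClusteredLockManager;

// Sketch (in Java for brevity) of acquiring the clustered lock with a timeout via
// tryLock instead of lock(), so a caller is not parked forever when the lock never
// becomes free.
public class SessionStarter {

    CompletableFuture<Void> startSession(ClusteredLockManager lockManager, String sessionId) {
        lockManager.defineLock(sessionId);
        ClusteredLock lock = lockManager.get(sessionId);
        return lock.tryLock(5, TimeUnit.SECONDS).thenAccept(acquired -> {
            if (acquired) {
                // this node owns the session; remember to unlock() when it ends
            } else {
                // lock not acquired within the timeout; retry or let another pod take it
            }
        });
    }
}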

Hibernate Search: monitoring the index process

I am using Hibernate Search to index data from a PostgreSQL database. Since the process takes really long, I want to display a progress bar to estimate how long indexing will take; I also want to display which entity is currently being indexed.
First I enabled jmx_enabled and generate_statistics in my persistence.xml:
<property name="hibernate.search.generate_statistics" value="true"/>
<property name="hibernate.search.jmx_enabled" value="true"/>
Then I added the progress monitor to the FullTextSession in my indexing class like this:
MassIndexerProgressMonitor monitor = new SimpleIndexingProgressMonitor();
FullTextSession fullTextSession = Search.getFullTextSession(em.unwrap(Session.class));
fullTextSession.getStatistics();
fullTextSession.createIndexer(TCase.class).progressMonitor(monitor).startAndWait();
The problem is that I still don't know how to print the progress results to the console while indexing.
According to the documentation of SimpleIndexingProgressMonitor, you need the INFO level enabled at the package level org.hibernate.search.batchindexing.impl or the class level org.hibernate.search.batchindexing.impl.SimpleIndexingProgressMonitor.
Can you check your log level?
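If you want output on the console regardless of the logging configuration, another option is to pass your own monitor instead of SimpleIndexingProgressMonitor. A sketch, assuming the MassIndexerProgressMonitor contract of Hibernate Search 5 (adjust the overridden methods to the version you are on):

import java.util.concurrent.atomic.AtomicLong;
import org.hibernate.search.batchindexing.MassIndexerProgressMonitor;

// Sketch of a custom monitor that prints progress to the console.
public class ConsoleIndexingProgressMonitor implements MassIndexerProgressMonitor {

    private final AtomicLong totalCount = new AtomicLong();
    private final AtomicLong documentsAdded = new AtomicLong();

    @Override
    public void addToTotalCount(long count) {
        totalCount.addAndGet(count);
    }

    @Override
    public void documentsAdded(long increment) {
        long done = documentsAdded.addAndGet(increment);
        long total = totalCount.get();
        System.out.printf("Indexed %d of %d documents (%.1f%%)%n",
                done, total, total == 0 ? 0.0 : 100.0 * done / total);
    }

    @Override
    public void documentsBuilt(int number) {
        // no-op: only overall progress is reported in this sketch
    }

    @Override
    public void entitiesLoaded(int size) {
        // no-op
    }

    @Override
    public void indexingCompleted() {
        System.out.println("Mass indexing completed.");
    }
}

Then plug it in the same way as in your code, e.g. fullTextSession.createIndexer(TCase.class).progressMonitor(new ConsoleIndexingProgressMonitor()).startAndWait();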

Gemfire WAN Gateway-sender configuration

We are using the Gemfire WAN topology and have problems setting up the gateway-senders.
Couple of assumptions:
- Replicated regions
- Serial gateway-senders
- manual-start is false for all gateway-senders
Let's say we have 2 clusters; within each cluster, we have 2 members (Member A and Member B).
Member A's cache.xml
<gfe:gateway-sender id="gateway-sender-A" parallel="false" remote-distributed-system-id="2" manual-start="false" />
<gfe:replicated-region name="data" scope="DISTRIBUTED_NO_ACK">
<gfe:replicated-region name="subData" data-policy="REPLICATE" scope="DISTRIBUTED_ACK">
<gfe:gateway-sender-ref bean="gateway-sender-A"/>
</gfe:replicated-region>
</gfe:replicated-region>
Member B's cache.xml
<gfe:gateway-sender id="gateway-sender-B" parallel="false" remote-distributed-system-id="2" manual-start="false" />
<gfe:replicated-region name="data" scope="DISTRIBUTED_NO_ACK">
<gfe:replicated-region name="subData" data-policy="REPLICATE" scope="DISTRIBUTED_ACK">
<gfe:gateway-sender-ref bean="gateway-sender-B"/>
</gfe:replicated-region>
</gfe:replicated-region>
There is a problem when we start up the two members within one cluster. It raises this error:
java.lang.IllegalStateException: Cannot create Region /data with [gateway-sender-A] gateway sender ids because another cache has the same region defined with [gateway-sender-B] gateway sender ids
Looking at the "High Availability for Gateway Senders" documentation, our understanding is that we can create 2 gateway-senders, in which only one will be doing the sending at a given point in time. Ultimately, we want to have 2 gateway senders (one in each member) for one cache region, one as the primary sender and the other as the secondary sender.
Thanks
The Geode documentation says:
For serial Senders, Queue HA is achieved by configuring identical serial Senders in multiple members. The Queue is replicated between the members.
So if the two gateway senders in members A and B are doing the same job (apart from their primary/secondary roles), you should use identical settings.
Among the gateway senders, one will manage to acquire a specific distributed lock and become the primary sender, usually the first one that comes up. I don't see a property to force one to become primary.
Geode is the open-source version of GemFire, in case you're wondering.
After changing the sender-ids to the same for both members, we have another problem:
java.lang.IllegalStateException: Cannot create Gateway Sender "some-gateway-sender-id" with manual start "false" because another cache has the same Gateway Sender defined with manual start "true"
It seems like our problem was the inconsistent format.
Member A used XML format
<?xml version="1.0"?>
<!DOCTYPE cache PUBLIC "-//GemStone Systems, Inc.//GemFire Declarative Caching 8.0//EN" "http://www.gemstone.com/dtd/cache8_0.dtd">
Member B used Spring Gemfire Data format
xmlns:gfe="http://www.springframework.org/schema/gemfire" xsi:schemaLocation="http://www.springframework.org/schema/gemfire http://www.springframework.org/schema/gemfire/spring-gemfire.xsd">
<gfe:gateway-sender ...>
We switched to using the Spring Gemfire Data format for both members, and it solved both issues.
TL;DR: the gateway-senders need to have the same id if they are in the same cluster, and the members need to use consistent cache XML formats.

NHibernate SysCache2 and SQLDependency problems

I've set ENABLE_BROKER on my SQL Server 2008 database to use SQLDependency.
I've configured my .Net app to use Syscache2 with a cache region as follows:
<syscache2>
  <cacheRegion name="BlogEntriesCacheRegion" priority="High">
    <dependencies>
      <commands>
        <add name="BlogEntries"
             command="Select EntryId from dbo.Blog_Entries where ENABLED=1" />
      </commands>
    </dependencies>
  </cacheRegion>
</syscache2>
My Hbm file looks like this:
<?xml version="1.0" encoding="utf-8"?>
<hibernate-mapping xmlns="urn:nhibernate-mapping-2.2">
<class name="BlogEntry" table="Blog_Entries">
<cache usage="nonstrict-read-write" region="BlogEntriesCacheRegion"/>
....
</class>
</hibernate-mapping>
I also have query caching enabled for queries against BlogEntry
When I first query, the results are cached in the 2nd level cache, as expected.
If I now go and change a row in Blog_Entries, everything works as expected: the cache is expired and I get this message:
2010-03-03 12:56:50,583 [7] DEBUG NHibernate.Caches.SysCache2.SysCacheRegion - Cache items for region 'BlogEntriesCacheRegion' have been removed from the cache for the following reason : DependencyChanged
I expect that. On the next page request, the query and its results are stored back in the cache. However, the cache is immediately invalidated again, even though nothing further has changed.
DEBUG NHibernate.Caches.SysCache2.SysCacheRegion - Cache items for region 'BlogEntriesCacheRegion' have been removed from the cache for the following reason : DependencyChanged
My cache is constantly invalidated on every subsequent request, with no changes to the underlying data. Only a restart of the application allows the cache to operate again, and then only until the data is cached for the first time (again, the first dirtying of the cache causes it to never work again).
Has anyone seen this problem or got any ideas what this could be? I was thinking that SysCache2 needs to handle the SQLDependency OnChange event, which it probably is doing, so I don't understand why SQL Server keeps sending SQLDependency DependencyChanged notifications.
thanks
We are getting the same problem on one database instance, but not on the other. It definitely seems to be some kind of permission problem on the database end, because the exact same NHibernate configuration is used in both cases.
In the working case the cache behaves as expected; in the other (a database engine with much stricter permissions) we get the exact same behaviour you mentioned.

Maximum number of messages sent to a Queue in OpenMQ?

I am currently using GlassFish v2.1 and I have set up a queue to send and receive messages from, with session beans and MDBs respectively. However, I have noticed that I can send only a maximum of 1000 messages to the queue. Is there any reason why I cannot send more than 1000 messages to the queue? I do have a "developer" profile set up for the GlassFish domain. Could that be the reason? Or is there some resource configuration setting that I need to modify?
I have setup the sun-resources.xml configuration properties as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE resources PUBLIC "-//Sun Microsystems, Inc.//DTD Application Server 9.0 Resource Definitions //EN" "http://www.sun.com/software/appserver/dtds/sun-resources_1_3.dtd">
<resources>
<admin-object-resource
enabled="true"
jndi-name="jms/UpdateQueue"
object-type="user"
res-adapter="jmsra"
res-type="javax.jms.Queue">
<description/>
<property name="Name" value="UpdatePhysicalQueue"/>
</admin-object-resource>
<connector-resource
enabled="true" jndi-name="jms/UpdateQueueFactory"
object-type="user"
pool-name="jms/UpdateQueueFactoryPool">
<description/>
</connector-resource>
<connector-connection-pool
associate-with-thread="false"
connection-creation-retry-attempts="0"
connection-creation-retry-interval-in-seconds="10"
connection-definition-name="javax.jms.QueueConnectionFactory"
connection-leak-reclaim="false"
connection-leak-timeout-in-seconds="0"
fail-all-connections="false"
idle-timeout-in-seconds="300"
is-connection-validation-required="false"
lazy-connection-association="false"
lazy-connection-enlistment="false"
match-connections="true"
max-connection-usage-count="0"
max-pool-size="32"
max-wait-time-in-millis="60000"
name="jms/UpdateFactoryPool"
pool-resize-quantity="2"
resource-adapter-name="jmsra"
steady-pool-size="8"
validate-atmost-once-period-in-seconds="0"/>
</resources>
Hmm .. further investigation revealed the following in the imq logs:
[17/Nov/2009:10:27:57 CST] ERROR sendMessage: Sending message failed. Connection ID: 427038234214377984:
com.sun.messaging.jmq.jmsserver.util.BrokerException: transaction failed: [B4303]: The maximum number of messages [1,000] that the producer can process in a single transaction (TID=427038234364096768) has been exceeded. Please either limit the # of messages per transaction or increase the imq.transaction.producer.maxNumMsgs property.
So what would I do if I needed to send more than 5000 messages at a time?
What I am trying to do is read all the records in a table and update a particular field of each record, based on the corresponding value of that record in a legacy table to which I have read-only access. This table has more than 10k records in it. As of now, I am sequentially going through each record in a for loop, getting the corresponding record from the legacy table, comparing the field values, updating the record if necessary and adding corresponding new records in other tables.
However, I was hoping to improve performance by processing all the records asynchronously. To do that I was thinking of sending each record info as a separate message and hence requiring so many messages.
To configure OpenMQ and set arbitrary broker properties, have a look at this blog post.
But actually, I wouldn't advise increasing the imq.transaction.producer.maxNumMsgs property, at least not above the value recommended in the documentation:
The maximum number of messages that a producer can process in a single transaction. It is recommended that the value be less than 5000 to prevent the exhausting of resources.
If you need to send more messages, consider doing it in several transactions.
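If you go the several-transactions route, a rough sketch with plain JMS (the batch size, text messages, and class name are illustrative; adapt it to your session bean):

import java.util.List;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

// Rough sketch: send the records in several smaller transacted batches so that no
// single transaction exceeds imq.transaction.producer.maxNumMsgs.
public class BatchedQueueSender {

    private static final int BATCH_SIZE = 500;

    public void sendAll(ConnectionFactory factory, Queue queue, List<String> records) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageProducer producer = session.createProducer(queue);
            int inBatch = 0;
            for (String record : records) {
                producer.send(session.createTextMessage(record));
                if (++inBatch == BATCH_SIZE) {
                    session.commit(); // close out this transaction before hitting the limit
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                session.commit(); // commit the final partial batch
            }
        } finally {
            connection.close();
        }
    }
}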