Can not start/stop cache within lock or transaction with DataStorageConfiguration - ignite

I have one server node and one client node. Persistence is enabled in DataStorageConfiguration.
After restarting my server node, I try to perform operations on the cache and get the exception below. The exception occurs when I use DataStorageConfiguration.
Caused by: class org.apache.ignite.IgniteException: Cannot start/stop cache within lock or transaction. [cacheName=, operation=dynamicStartCache]
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.checkEmptyTransactionsEx(
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.dynamicStartCache(
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.dynamicStartCache(
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.publicJCache(
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.publicJCache(
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.checkProxyIsValid(
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.onEnter(
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(
Can you please help?

Ignite Cache Reconnection Issue (Cache is stopped)
I followed the question linked above: I listen for the reconnect event and then call ignite.getOrCreateCache(spaCacheName);

I guess you should not use e.g. getOrCreateCache within a started transaction. Create all of your caches before starting a transaction. I think it's pretty self-explanatory.


Apache Ignite - Partition Map exchange causes deadlock when used with write through enabled cache

We have Ignite running in server mode in our JVM. Ignite goes into a deadlock in the following scenario. I have added the thread stacks at the end of this question.
a. Create a cache with write-through enabled
b. In the CacheWriter.write() implementation:
1. Wait for a second to allow step c to be invoked
2. Try to read from another cache
c. While step b is executing, trigger a thread that creates a new cache
d. On executing the above scenario, Ignite goes into a deadlock because:
1. A read lock has been acquired by the cache.put() operation
2. When cache creation is triggered in a separate thread, a Partition Map Exchange (PME) is also started
3. PME tries to acquire all 16 locks, but waits because one read lock is already held
4. While reading from the cache, cache.get() cannot complete because it waits for the current Partition Map Exchange to complete
We have faced this issue in production; the scenario above is just a reproducer. The write-through implementation only tries to read from a cache, and the cache creation happens in a completely different thread.
Why does Ignite block all cache.get() operations for PME when PME does not even hold all the required locks? Shouldn’t the call be blocked only after PME operation has all the locks?
Why does PME stop everything? If I create cache A, then only operations related to cache A or its cache group should be stopped.
Also is there any solution to solve this deadlock?
Thread executing cache.put() and write through
"main" #1 prio=5 os_prio=0 tid=0x0000000003505000 nid=0x43f4 waiting on condition [0x000000000334b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.repairableGet(
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.get(
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.get(
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.get(
at ReadWriteThroughInterceptor.write(
at org.apache.ignite.internal.processors.cache.GridCacheLoaderWriterStore.write(
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$AtomicCacheUpdateClosure.update(
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry$
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.invokeClosure(
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Invoke.access$5700(
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invokeDown(
at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(
at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.innerUpdate(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateSingle(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal0(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAllAsyncInternal(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicAbstractUpdateFuture.sendSingleRequest(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridNearAtomicSingleUpdateFuture.mapOnTopology(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.update0(
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put0(
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.put(
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.put(
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest.writeToCache(
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest.lambda$runTest$0(
at com.eqtechnologic.eqube.cache.tests.readerwriter.WriteReadThroughTest$$Lambda$1095/2028767654.execute(Unknown Source)
at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(
at org.junit.jupiter.api.AssertDoesNotThrow.assertDoesNotThrow(
at org.junit.jupiter.api.Assertions.assertDoesNotThrow(
at WriteReadThroughTest.runTest(
PME thread waiting for locks
"exchange-worker-#39" #56 prio=5 os_prio=0 tid=0x0000000022b91800 nid=0x450 waiting on condition [0x000000002866e000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000076e73b428> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireInterruptibly(
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(
at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lockInterruptibly(
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.lock0(
at org.apache.ignite.internal.util.StripedCompositeReadWriteLock$WriteLock.lockInterruptibly(
at org.apache.ignite.internal.util.IgniteUtils.writeLock(
at org.apache.ignite.internal.processors.cache.distributed.dht.topology.GridDhtPartitionTopologyImpl.updateTopologyVersion(
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.updateTopologies(
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(
Technically, you have answered your own question, which is great work, to be honest.
You are not supposed to have blocking methods in your write-through cache store implementation that might conflict with PME or cause pool starvation.
You have to remember that PME is a show-stopper mechanism: the entire user load is stopped. In short, that is required to ensure ACID guarantees. The lock is indeed divided into multiple parts to speed up processing, i.e. allowing up to 16 threads to perform cache operations concurrently. But a PME needs exclusive control over the cluster, so it acquires the write lock on all of those parts.
Shouldn’t the call be blocked only after PME operation has all the locks?
Yes, that's indeed how it's supposed to work. But in your case, PME tries to get the write lock while a read lock is still held, so it waits for that read lock to be released, and all further read lock requests are queued behind the write lock.
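The queueing behaviour described here can be reproduced with a plain JDK lock. The sketch below is only an analogy, not Ignite code: it assumes a single non-fair ReentrantReadWriteLock (the lock type that appears in the exchange-worker dump), and the class and method names are made up for illustration. One thread holds the read lock, a second thread queues for the write lock, and a third thread then asks for the read lock with a timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; plain JDK, no Ignite involved.
final class WriterQueueDemo {

    /**
     * Returns true if a second reader thread obtained the read lock while a
     * writer was already queued behind the first reader.
     */
    static boolean secondReaderAcquires() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        lock.readLock().lock(); // reader 1: stands in for the thread inside cache.put()
        try {
            // Writer: stands in for the exchange worker. It parks until reader 1 releases.
            Thread writer = new Thread(() -> {
                try {
                    lock.writeLock().lockInterruptibly();
                    lock.writeLock().unlock();
                } catch (InterruptedException ignored) {
                    // nothing to clean up in this demo
                }
            });
            writer.setDaemon(true);
            writer.start();

            // Wait until the writer is really parked in the lock's wait queue.
            while (!(lock.hasQueuedThread(writer)
                    && writer.getState() == Thread.State.WAITING)) {
                Thread.sleep(10);
            }

            // Reader 2: a different thread requesting the read lock with a timeout.
            boolean[] acquired = new boolean[1];
            Thread reader2 = new Thread(() -> {
                try {
                    if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        lock.readLock().unlock();
                    }
                } catch (InterruptedException ignored) {
                    // treated as "not acquired"
                }
            });
            reader2.start();
            reader2.join();
            return acquired[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running the demo", e);
        } finally {
            lock.readLock().unlock(); // lets the queued writer proceed
        }
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderAcquires());
    }
}
```

In this sketch the second reader times out even though no writer holds the lock yet, because the queued writer is first in line. In the reported deadlock the inner cache.get() waits on the exchange future rather than on this lock directly, but the ordering problem is the same: the reader that must finish first is waiting on something that can only finish after the writer.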
Also is there any solution to solve this deadlock?
Move cache-related logic out of the CacheStore. Ideally, do not start caches dynamically, since that triggers PME. Have them created in advance if possible.
Check whether other mechanisms, such as continuous queries or entry processors, would work.
But still, it all depends on your use case.
I don't think creating a cache inside the cache store will work. From the documentation for CacheWriter:
A CacheWriter is used for write-through to an external resource.
(Emphasis mine.)
Without knowing your use case, it's difficult to suggest an alternative approach, but creating your caches in advance or using a continuous query as a trigger works in similar situations.

Apache Ignite Fault Tolerance

I have a few questions about Ignite caches in partitioned mode:
1) When a node goes down in an Ignite cluster and the failed node is primary for a key, does the backup for that key become the new primary?
2) What happens to the backup copies on the failed node? Will they be recreated in the cluster?
3) If I set CacheRebalanceMode in the cache configuration, will it apply to node failure as well, or only to node addition?
1) Yes, this is right. The former backup will become the new primary, and a new backup will receive the copy in the background.
2) Yes, if a backup is lost, a new node will be assigned to this role. It will receive the copy in the background.
3) In synchronous rebalance mode, a node will not complete its start process, and the user will not be able to use the API, until the data is rebalanced. This doesn't affect the rebalancing process in case of failures.

Aerospike cluster rebalancing causing errors

When adding a new node to an Aerospike cluster, a rebalance happens for the new node. For large data sets this takes time, and some requests to the new node fail until the rebalance is complete. The only solution I could figure out is to retry the request until it gets the data.
Is there a better way?
I don't think it is possible to keep the node out of cluster for requests until it's done replicating because it is also master for one of the partitions.
If you are performing batch reads, there is an improvement in 3.6.0. While the cluster is in flux, if the client directs the read transaction to Node_A, but the partition containing the record has been moved to Node_B, Node_A proxies the request to Node_B.
Is that what you are doing?
You should not be in a position where the client cannot connect to the cluster, or it cannot complete a transaction.
I know that SO frowns on this, but can you provide more detail about the failures? What kinds of transactions are you performing? What versions are you using?
Requests shouldn't be failing, the new node will proxy to the node that currently has the data.
Prior to Aerospike 3.6.0 batch read requests were the exception. I suspect this is your problem.
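If retries are kept as a stopgap while the failures are investigated, bounding them is safer than retrying until the data arrives. The following is a minimal sketch in plain Java, not Aerospike client code: the class name, the Supplier standing in for the read call, and the attempt and delay parameters are all assumptions made for illustration.

```java
import java.util.function.Supplier;

// Hypothetical helper; the Supplier stands in for whatever client read is failing.
final class BoundedRetry {

    /**
     * Calls read.get() up to maxAttempts times, sleeping between attempts and
     * doubling the delay each time. Rethrows the last failure if all attempts fail.
     */
    static <T> T retry(Supplier<T> read, int maxAttempts, long initialDelayMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return read.get();
            } catch (RuntimeException e) {
                // A real client would narrow this to its own transient error types.
                last = e;
                if (attempt == maxAttempts) {
                    break;
                }
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                delayMs *= 2; // simple exponential backoff
            }
        }
        throw last;
    }
}
```

A call would look like `BoundedRetry.retry(() -> readRecord(key), 5, 50L)`, where `readRecord` is a placeholder for the application's own read method.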

nservicebus, azure and transaction scope

I have a strange thing going on that I suspect is related to transaction scope within NServiceBus, but I thought I should ask before I go off in the wrong direction.
Here is my issue: I have an endpoint hosted in an Azure worker role, using Azure queues for the transport. I am doing very simple database writes in a few of my handlers using Entity Framework 5 against SQL Server 2012. Everything works fine until I scale out the worker role to more than one instance.
When I do that, I start getting sporadic deadlock errors:
Error 1205 : Transaction (Process ID) was deadlocked on resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
Should I be looking at changing the default isolation level in my endpoint, or am I looking in the wrong direction here?
I have run explain plans on all statements, so I am sure there are no full scans, and the database is not under load; the only variable that seems to cause this is scaling out horizontally with more instances in Azure.
Changing the default isolation mode to read committed resolved the issue.

NHibernate ISessionFactory.OpenSession() does not open a database connection

I have NHibernate configured with Fluent NHibernate connecting to a PostgreSQL database.
I have a worker class that takes an ISessionFactory as a constructor parameter and consumes messages from a queue. For each message the worker process calls ISessionFactory.OpenSession() and it does some database processing.
When I add more worker processes, the performance of the system remains the same, which is odd.
After some more investigation I realized that all worker processes are using a single database connection. For example, I would add 8 worker processes, but on the database I can see only one database connection.
My understanding is that ISessionFactory.OpenSession() will open a new database connection unless the connection pool is full.
So is my understanding wrong, or is this an issue with the Postgres NHibernate driver?
OpenSession does not open a database connection until needed, and it closes it (i.e. releases it back into the pool) as soon as possible.
By default the session will keep the connection open for the lifetime of a transaction and, as Diego said, it only opens it when needed.
If you want to manage your own connections you can call