Ignite: ClassNotFoundException for a class of another service

I created a Spring service (Service A) that uses an Ignite cache. The cache preloads data from a database using a CacheStoreAdapter. The cache configuration is:
final CacheConfiguration<String, StandardItem> cacheConfiguration = new CacheConfiguration<>(Identifiers.IGNITE_ITEM);
cacheConfiguration.setName(Identifiers.IGNITE_ITEM);
cacheConfiguration.setIndexedTypes(String.class, IgniteItem.class);
cacheConfiguration.setAtomicityMode(CacheAtomicityMode.TRANSACTIONAL);
cacheConfiguration.setCacheStoreFactory(FactoryBuilder.factoryOf(StandardItemCacheStore.class.getName()));
cacheConfiguration.setReadThrough(true);
cacheConfiguration.setWriteThrough(true);
cacheConfiguration.setCopyOnRead(false);
cacheConfiguration.setCacheMode(CacheMode.LOCAL);
cacheConfiguration.setOnheapCacheEnabled(true);
return cacheConfiguration;
The service itself works fine. However, I have another service (Service B) that uses Ignite on the same network. Its configuration is the following:
final CacheConfiguration<String, String> cacheConfiguration = new CacheConfiguration<>(Constants.IGNITE_RESULT_CACHE);
cacheConfiguration.setName(Constants.IGNITE_RESULT_CACHE);
cacheConfiguration.setIndexedTypes(String.class, String.class);
cacheConfiguration.setAtomicityMode(CacheAtomicityMode.ATOMIC);
cacheConfiguration.setCopyOnRead(false);
cacheConfiguration.setCacheMode(CacheMode.LOCAL);
cacheConfiguration.setOnheapCacheEnabled(true);
cacheConfiguration.setEvictionPolicy(new LruEvictionPolicy(1_000_000));
final IgniteConfiguration igniteConfiguration = new IgniteConfiguration();
igniteConfiguration.setIgniteInstanceName("ServiceGrid");
igniteConfiguration.setCacheConfiguration(cacheConfiguration);
igniteConfiguration.setIncludeEventTypes();
return igniteConfiguration;
But Service B now throws the following exceptions:
2017-12-01 10:25:39.512 ERROR 39 --- [34%ServiceGrid%] .c.d.d.p.GridDhtPartitionsExchangeFuture : Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=1], nodeId=79de964e, evt=DISCOVERY_CUSTOM_EVT]
java.lang.RuntimeException: Failed to create an instance of io.mio.scap.dao.cache.IgniteItemCacheStore
at javax.cache.configuration.FactoryBuilder$ClassFactory.create(FactoryBuilder.java:134) ~[cache-api-1.0.0.jar!/:na]
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.createCache(GridCacheProcessor.java:1392) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1867) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:748) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:838) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:579) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901) [ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.2.0.jar!/:2.2.0]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
Caused by: java.lang.ClassNotFoundException: io.mio.scap.dao.cache.IgniteItemCacheStore
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_151]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_151]
at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:93) ~[app.jar:na]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_151]
at javax.cache.configuration.FactoryBuilder$ClassFactory.create(FactoryBuilder.java:130) ~[cache-api-1.0.0.jar!/:na]
... 8 common frames omitted
2017-12-01 10:25:39.513 INFO 39 --- [34%ServiceGrid%] .c.d.d.p.GridDhtPartitionsExchangeFuture : Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=1], time=0ms]
2017-12-01 10:25:39.516 ERROR 39 --- [34%ServiceGrid%] .i.p.c.GridCachePartitionExchangeManager : Failed to wait for completion of partition map exchange (preloading will not start): GridDhtPartitionsExchangeFuture [dummy=false, forcePreload=false, reassign=false, discoEvt=DiscoveryCustomEvent [customMsg=null, affTopVer=AffinityTopologyVersion [topVer=6, minorTopVer=1], super=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=79de964e-08e9-4a09-b915-4a221d10c642, addrs=[127.0.0.1, 172.19.0.10], sockAddrs=[1f3ec767d8a2/172.19.0.10:47500, /127.0.0.1:47500], discPort=47500, order=6, intOrder=6, lastExchangeTime=1512123936779, loc=false, ver=2.2.0#20170915-sha1:5747ce6b, isClient=false], topVer=6, nodeId8=68a843cf, msg=null, type=DISCOVERY_CUSTOM_EVT, tstamp=1512123939493]], crd=TcpDiscoveryNode [id=6e5c36ce-5042-4f9b-b507-3f9ed4eb1384, addrs=[127.0.0.1, 172.19.0.11], sockAddrs=[5b8f17420165/172.19.0.11:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1512123917048, loc=false, ver=2.2.0#20170915-sha1:5747ce6b, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=1], nodeId=79de964e, evt=DISCOVERY_CUSTOM_EVT], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=false, hash=2088336463], init=false, lastVer=null, partReleaseFut=null, exchActions=null, affChangeMsg=null, skipPreload=false, clientOnlyExchange=false, initTs=1512123939493, centralizedAff=false, changeGlobalStateE=null, forcedRebFut=null, done=true, evtLatch=0, remaining=[a9181efd-950b-4d23-9697-a301c81094aa, 2fd78465-8a0f-4318-bded-716348753cef, 79de964e-08e9-4a09-b915-4a221d10c642, 3fa0ac5d-c464-46a7-9dac-c56873b64dd0, 6e5c36ce-5042-4f9b-b507-3f9ed4eb1384], super=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=java.lang.RuntimeException: Failed to create an instance of io.mio.scap.dao.cache.IgniteItemCacheStore, hash=1326392751]]
org.apache.ignite.IgniteCheckedException: Failed to create an instance of io.mio.scap.dao.cache.IgniteItemCacheStore
at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7229) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:258) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:206) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:158) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1911) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) [ignite-core-2.2.0.jar!/:2.2.0]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
Caused by: java.lang.RuntimeException: Failed to create an instance of io.mio.scap.dao.cache.IgniteItemCacheStore
at javax.cache.configuration.FactoryBuilder$ClassFactory.create(FactoryBuilder.java:134) ~[cache-api-1.0.0.jar!/:na]
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.createCache(GridCacheProcessor.java:1392) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.prepareCacheStart(GridCacheProcessor.java:1867) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.CacheAffinitySharedManager.onCacheChangeRequest(CacheAffinitySharedManager.java:748) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.onCacheChangeRequest(GridDhtPartitionsExchangeFuture.java:838) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:579) ~[ignite-core-2.2.0.jar!/:2.2.0]
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:1901) ~[ignite-core-2.2.0.jar!/:2.2.0]
... 2 common frames omitted
Caused by: java.lang.ClassNotFoundException: io.mio.scap.dao.cache.IgniteItemCacheStore
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_151]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_151]
at org.springframework.boot.loader.LaunchedURLClassLoader.loadClass(LaunchedURLClassLoader.java:93) ~[app.jar:na]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_151]
at javax.cache.configuration.FactoryBuilder$ClassFactory.create(FactoryBuilder.java:130) ~[cache-api-1.0.0.jar!/:na]
... 8 common frames omitted
2017-12-01 10:25:40.890 INFO 39 --- [34%ServiceGrid%] o.apache.ignite.internal.exchange.time : Started exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], crd=false, evt=18, node=TcpDiscoveryNode [id=68a843cf-2970-4f2a-930b-c9a36b3942ce, addrs=[127.0.0.1, 172.19.0.9], sockAddrs=[/127.0.0.1:47500, 2a1936cdc613/172.19.0.9:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1512123940480, loc=true, ver=2.2.0#20170915-sha1:5747ce6b, isClient=false], evtNode=TcpDiscoveryNode [id=68a843cf-2970-4f2a-930b-c9a36b3942ce, addrs=[127.0.0.1, 172.19.0.9], sockAddrs=[/127.0.0.1:47500, 2a1936cdc613/172.19.0.9:47500], discPort=47500, order=4, intOrder=4, lastExchangeTime=1512123940480, loc=true, ver=2.2.0#20170915-sha1:5747ce6b, isClient=false], customEvt=CacheAffinityChangeMessage [id=2b0ab911061-e15fbf79-7ad2-420e-b870-5a61d94b442c, topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], exchId=null, partsMsg=null, exchangeNeeded=true]]
2017-12-01 10:25:40.896 INFO 39 --- [34%ServiceGrid%] .c.d.d.p.GridDhtPartitionsExchangeFuture : Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], waitTime=0ms]
2017-12-01 10:25:40.898 INFO 39 --- [34%ServiceGrid%] o.apache.ignite.internal.exchange.time : Finished exchange init [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], crd=false]
2017-12-01 10:25:40.970 INFO 39 --- [39%ServiceGrid%] .c.d.d.p.GridDhtPartitionsExchangeFuture : Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=2], time=0ms]
2017-12-01 10:25:40.976 INFO 39 --- [34%ServiceGrid%] .i.p.c.GridCachePartitionExchangeManager : Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=6, minorTopVer=2], evt=DISCOVERY_CUSTOM_EVT, node=6e5c36ce-5042-4f9b-b507-3f9ed4eb1384]
My question now is: why does Service B need to know about (or have) the IgniteItemCacheStore class, which it does not even use? It is only used by Service A.

All Ignite nodes need to have the configurations of all caches, and the cache store is part of the cache configuration. Therefore, all Ignite nodes need all classes used by the cache store.
There are ways to sidestep this, such as a factory that returns a lazily initialized proxy of the CacheStore, but your mileage may vary.
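The lazily initialized proxy idea can be sketched with plain JDK reflection. This is only an illustration under assumptions: `ItemStore`, `DatabaseItemStore` and `lazyStore` are hypothetical stand-ins, not Ignite APIs. In a real setup the proxy would implement the cache store interface and be returned from a `javax.cache.configuration.Factory` whose own class is available on every node. The point is that the implementation class is looked up by name only when a store method is first called, so a node that never touches the store never needs that class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class LazyStoreSketch {

    /** Hypothetical stand-in for a cache store interface (not the Ignite CacheStore API). */
    public interface ItemStore {
        String load(String key);
    }

    /** Hypothetical real implementation that would exist only in Service A's jar. */
    public static class DatabaseItemStore implements ItemStore {
        @Override
        public String load(String key) {
            return "item-" + key;
        }
    }

    /** Returns a proxy that resolves implClassName only when a method is first invoked. */
    public static ItemStore lazyStore(String implClassName) {
        InvocationHandler handler = new InvocationHandler() {
            private ItemStore delegate; // created on first use

            @Override
            public synchronized Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                if (delegate == null) {
                    // The class lookup happens here, not when the proxy is created.
                    delegate = (ItemStore) Class.forName(implClassName)
                            .getDeclaredConstructor().newInstance();
                }
                try {
                    return method.invoke(delegate, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow what the real store threw
                }
            }
        };
        return (ItemStore) Proxy.newProxyInstance(
                ItemStore.class.getClassLoader(), new Class<?>[] {ItemStore.class}, handler);
    }

    public static void main(String[] args) {
        ItemStore present = lazyStore(DatabaseItemStore.class.getName());
        System.out.println(present.load("42")); // prints "item-42"

        // Creating a proxy for a class that is not on the classpath succeeds.
        ItemStore missing = lazyStore("com.example.NotOnThisClasspath");
        try {
            missing.load("42"); // the failure surfaces only on first real use
        } catch (RuntimeException e) {
            System.out.println("failed on first use: " + e.getCause());
        }
    }
}
```

The trade-off is that a missing class is no longer reported at cache start but at the first store access, so a misconfigured node fails later and less visibly.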

Related

Ignite recurring stacktrace: Failed to process selector key

Ignite version: 2.14.0
Node configuration: 2 Nodes running on same PC (IPV4) using localhost and 255 available ports:
TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Collections.singletonList("127.0.0.1"));
The nodes also use 2 different working directories, a thread pool size of 16, and 2 caches (one atomic, one transactional).
What happens: using an ExecutorService, I submit 8 threads to the pool. The classes run correctly (4 on each node) and execute their tasks as expected.
But during execution, the following error is raised repeatedly on both nodes: GRAVE (SEVERE): "Failed to process selector key".
The application generates a high computational load. A simple "for loop" with a sleep produces no error.
The full stack trace follows:
GRAVE: Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=97567668, bytesSent=100128669, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1675265761563, hashCode=2143442267, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=1690656, resendCnt=0, rcvCnt=1696452, sentCnt=1691375, reserved=true, lastAck=1696448, nodeLeft=false, node=TcpDiscoveryNode [id=cd1ffdf0-b9b3-49ef-a9e3-db1676fad428, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675265584899, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=69, queueLimit=4096, reserveCnt=101, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=1690656, resendCnt=0, rcvCnt=1696452, sentCnt=1691375, reserved=true, lastAck=1696448, nodeLeft=false, node=TcpDiscoveryNode [id=cd1ffdf0-b9b3-49ef-a9e3-db1676fad428, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675265584899, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=69, queueLimit=4096, reserveCnt=101, 
pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/0:0:0:0:0:0:0:1:47101, rmtAddr=/0:0:0:0:0:0:0:1:56361, createTime=1675265760336, closeTime=0, bytesSent=8479762, bytesRcvd=7459908, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1675265760336, lastSndTime=1675265761545, lastRcvTime=1675265761563, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#b329ba4, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=true]]]
java.io.IOException: An existing connection was forcibly closed by the remote host
at java.base/sun.nio.ch.SocketDispatcher.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:58)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite0(GridNioServer.java:1715)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1407)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2511)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2273)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.base/java.lang.Thread.run(Thread.java:834)
Expected: I read that this could be a configuration problem, but I don't understand how to fix it.
The configuration seems simple, and even though the execution completes without calculation errors, I would like to avoid this exception.
NODE1
[2023-02-02 15:54:30] [AVVERTENZA] Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=An existing connection was forcibly closed by the remote host] - [org.apache.ignite.logger.java.JavaLogger warning:]
[2023-02-02 15:54:30] [AVVERTENZA] Unacknowledged messages queue size overflow, will attempt to reconnect [remoteAddr=/127.0.0.1:63660, queueLimit=4096] - [org.apache.ignite.logger.java.JavaLogger warning:]
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63670] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63671] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Received incoming connection when already connected to this node, rejecting [locNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNode=8a378874-f3ae-4d0c-9733-a6b143097658] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63672] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Received incoming connection when already connected to this node, rejecting [locNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNode=8a378874-f3ae-4d0c-9733-a6b143097658] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63673] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] Received incoming connection when already connected to this node, rejecting [locNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNode=8a378874-f3ae-4d0c-9733-a6b143097658] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [GRAVE ] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=2269317, bytesSent=3928093, bytesRcvd0=1909138, bytesSent0=720914, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1675349670621, hashCode=722948156, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=17152, resendCnt=870, rcvCnt=31061, sentCnt=18796, reserved=true, lastAck=31040, nodeLeft=false, node=TcpDiscoveryNode [id=8a378874-f3ae-4d0c-9733-a6b143097658, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675349650217, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=7, queueLimit=4096, reserveCnt=9, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=17152, resendCnt=870, rcvCnt=31061, sentCnt=18796, reserved=true, lastAck=31040, nodeLeft=false, node=TcpDiscoveryNode [id=8a378874-f3ae-4d0c-9733-a6b143097658, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675349650217, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=7, queueLimit=4096, reserveCnt=9, 
pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63670, createTime=1675349670241, closeTime=0, bytesSent=720914, bytesRcvd=1909138, bytesSent0=720914, bytesRcvd0=1909138, sndSchedTime=1675349670241, lastSndTime=1675349670277, lastRcvTime=1675349670621, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#16179752, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=true]]] - [org.apache.ignite.logger.java.JavaLogger error:
java.io.IOException: An existing connection was forcibly closed by the remote host
NODE2
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:63669] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Received incoming connection from remote node while connecting to this node, rejecting [locNode=8a378874-f3ae-4d0c-9733-a6b143097658, locNodeOrder=1, rmtNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNodeOrder=2] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Established outgoing communication connection [locAddr=/127.0.0.1:63670, rmtAddr=/127.0.0.1:47101] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] Established outgoing communication connection [locAddr=/127.0.0.1:63676, rmtAddr=/127.0.0.1:47101] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] TCP client created [client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=84, bytesSent=56, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1675349671637, hashCode=762674116, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=9391 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=31061, resendCnt=753, rcvCnt=17160, sentCnt=31871, reserved=true, lastAck=17152, nodeLeft=false, node=TcpDiscoveryNode [id=af74d5c9-3631-4fdf-b9f2-0babc853019f, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47501, /0:0:0:0:0:0:0:1:47501, WOPR/192.168.56.1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1675349650060, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=false, connectCnt=8, queueLimit=4096, reserveCnt=9, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=31061, resendCnt=531, rcvCnt=17160, sentCnt=31871, reserved=true, lastAck=17152, nodeLeft=false, node=TcpDiscoveryNode [id=af74d5c9-3631-4fdf-b9f2-0babc853019f, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47501, /0:0:0:0:0:0:0:1:47501, WOPR/192.168.56.1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1675349650060, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=false, connectCnt=8, queueLimit=4096, 
reserveCnt=9, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=org.apache.ignite.internal.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/127.0.0.1:63676, rmtAddr=/127.0.0.1:47101, createTime=1675349671637, closeTime=0, bytesSent=0, bytesRcvd=0, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1675349671637, lastSndTime=1675349671637, lastRcvTime=1675349671637, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser#544beb47, directMode=true], GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]], super=GridAbstractCommunicationClient [lastUsed=1675349671637, closed=false, connIdx=0]], duration=339ms] - [org.apache.ignite.logger.java.JavaLogger info:]

Ignite client unstable since upgrade from 2.7.0 to 2.9.0

We upgraded a number of libraries in our application, one of them being Ignite. Since then, the Ignite node running in client mode has been crashing. My guess is that one of the upgrades caused the cache to grow in size (so I don't think the Ignite upgrade itself is the problem).
So I increased the heap size from 10 to 20 GB. But when about 50% of it is used, the JVM hangs.
I'm confused as to why it does this when only 50% is in use.
[12/3/20 16:07:58:788 GMT] 000000c4 IgniteKernal I .... Heap [used=9937MB, free=51.48%, comm=10680MB]
followed by
[12/3/20 16:08:26:410 GMT] 000000bd IgniteKernal W Possible too long JVM pause: 2418 milliseconds.
[12/3/20 16:08:27:465 GMT] 000000c5 TcpCommunicat W Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[12/3/20 16:08:27:411 GMT] 000000c5 TcpCommunicat E Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=48849402273, bytesSent=15994664546, bytesRcvd0=54446, bytesSent0=102, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=null, finished=false, heartbeatTs=1607011706410, hashCode=433635054, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#51]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=9025120, resendCnt=0, rcvCnt=9025150, sentCnt=9025152, reserved=true, lastAck=9025120, nodeLeft=false, node=TcpDiscoveryNode [id=b3ca311e-077f-42a5-884a-807b539730b6, consistentId=10.60.46.12:48500, addrs=ArrayList [10.60.46.12], sockAddrs=HashSet [hex-wgc-p-web02/10.60.46.12:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1607006097079, loc=false, ver=2.9.0#20201015-sha1:70742da8, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=9025120, resendCnt=0, rcvCnt=9025150, sentCnt=9025152, reserved=true, lastAck=9025120, nodeLeft=false, node=TcpDiscoveryNode [id=b3ca311e-077f-42a5-884a-807b539730b6, consistentId=10.60.46.12:48500, addrs=ArrayList [10.60.46.12], sockAddrs=HashSet [hex-wgc-p-web02/10.60.46.12:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1607006097079, loc=false, ver=2.9.0#20201015-sha1:70742da8, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/10.223.132.3:52550, rmtAddr=/10.60.46.12:48100, createTime=1607006097572, closeTime=0, bytesSent=15994657850, bytesRcvd=48849402273, bytesSent0=102, 
bytesRcvd0=54446, sndSchedTime=1607006097572, lastSndTime=1607011706410, lastRcvTime=1607011706410, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#93200255, directMode=true], GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]]]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:235)
at sun.nio.ch.IOUtil.read(IOUtil.java:204)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:394)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1330)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2472)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2239)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:822)
[12/3/20 16:08:44:437 GMT] 000000c4 SystemOut O [16:08:44] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]]]
[12/3/20 16:08:44:436 GMT] 000000c4 W java.util.logging.LogManager$RootLogger log Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1806)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1801)
at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:221)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:822)
[12/3/20 16:08:44:434 GMT] 000000c4 G W Thread [name="tcp-comm-worker-#1-#63", id=211, state=WAITING, blockCnt=2, waitCnt=100]
[12/3/20 16:08:44:432 GMT] 000000c4 G E Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#63, blockedFor=18s]
[12/3/20 16:09:14:486 GMT] 000000c4 SystemOut O [16:09:14] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011736000]]]
These look like network issues.
[12/3/20 16:08:27:411 GMT] 000000c5 TcpCommunicat E Failed to process selector key
[workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#63, blockedFor=18s]
Check that you are able to connect from your client machines to your server machines and that firewall configs are properly set up.
see: https://ignite.apache.org/docs/latest/clustering/network-configuration
Make sure you've set -Djava.net.preferIPv4Stack=true if you are using IPv4 addresses.
If there are containers and/or private addresses involved, it might cause connection issues.
See: https://ignite.apache.org/docs/latest/clustering/running-client-nodes-behind-nat#limitations
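To see which addresses a JVM can actually bind to (and therefore which ones a node might advertise), you can list the local interfaces and check whether the IPv4 flag took effect. This is a rough sketch with the standard library only; `LocalAddresses` is a made-up helper name, and nothing here is Ignite API.

```java
import java.io.UncheckedIOException;
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

/** Lists local addresses so loopback, private and IPv6 entries are easy to spot. */
final class LocalAddresses {
    /** True if the JVM was started with -Djava.net.preferIPv4Stack=true. */
    static boolean ipv4StackPreferred() {
        return Boolean.getBoolean("java.net.preferIPv4Stack");
    }

    /** Formats an address as "<family> <scope> <address>". */
    static String describe(InetAddress addr) {
        String family = addr instanceof Inet4Address ? "IPv4" : "IPv6";
        String scope = addr.isLoopbackAddress() ? "loopback"
            : addr.isSiteLocalAddress() ? "private"
            : addr.isLinkLocalAddress() ? "link-local"
            : "other";
        return family + " " + scope + " " + addr.getHostAddress();
    }

    /** Returns one description per address of every interface that is up. */
    static List<String> list() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            if (nics == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface nic : Collections.list(nics)) {
                if (!nic.isUp()) {
                    continue;
                }
                for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                    result.add(nic.getName() + ": " + describe(addr));
                }
            }
        } catch (SocketException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("preferIPv4Stack=" + ipv4StackPreferred());
        list().forEach(System.out::println);
    }
}
```

If the output contains only private or container-bridge addresses that the other side cannot route to, that would be consistent with the NAT limitations described in the link above.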

apache ignite client hangs on startup

I'm experiencing issues with my Ignite cluster where the clients consistently hang during startup. The cluster runs in k8s and has 3 nodes.
I've created one simple cache / near cache, and since then I have been making changes to it to gauge the performance implications. This is the client startup code:
Ignition.setClientMode(true);
IgniteConfiguration igniteConfiguration = new IgniteConfiguration();
igniteConfiguration.setIncludeEventTypes(EventType.EVTS_ALL);
igniteConfiguration.setPeerClassLoadingEnabled(true);
TcpDiscoverySpi tcpDiscoverySpi = new TcpDiscoverySpi();
igniteConfiguration.setDiscoverySpi(tcpDiscoverySpi);
TcpDiscoveryIpFinder podResolver = getKubePodResolver();
tcpDiscoverySpi.setIpFinder(podResolver);
tcpDiscoverySpi.setJoinTimeout(30000);
tcpDiscoverySpi.setAckTimeout(30000);
tcpDiscoverySpi.setSocketTimeout(30000);
tcpDiscoverySpi.setNetworkTimeout(30000);
tcpDiscoverySpi.failureDetectionTimeoutEnabled(true);
try (Ignite ignite = Ignition.start(igniteConfiguration)) {
ignite.destroyCache("myCache");
NearCacheConfiguration<Integer, Integer> nearCfg = new NearCacheConfiguration<>();
nearCfg.setNearEvictionPolicy(new LruEvictionPolicy<>(5000));
nearCfg.setNearStartSize(5000);
CacheConfiguration<Integer, Integer> cacheConfiguration = new CacheConfiguration<Integer, Integer>("myCache");
cacheConfiguration.setOnheapCacheEnabled(false);
cacheConfiguration.setStatisticsEnabled(true);
cacheConfiguration.setWriteBehindEnabled(true);
cacheConfiguration.setCacheMode(CacheMode.PARTITIONED);
// toggling btwn partitioned and replicated
// cacheConfiguration.setCacheMode(CacheMode.REPLICATED);
cacheConfiguration.setQueryParallelism(3);
IgniteCache<Integer, Integer> cache = ignite.getOrCreateCache(cacheConfiguration, nearCfg);
After creating the cache I run gets and puts to fill it up to 10k entries. When I restart the client it hangs. I can reproduce this by simply restarting the client.
When I take a thread dump on the client, I see the main thread waiting on a future. The relevant threads are:
"main" #1 prio=5 os_prio=0 tid=0x00007fc03800b800 nid=0x6 waiting on condition [0x00007fc04139d000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:151)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager.onKernalStart(GridCachePartitionExchangeManager.java:595)
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onKernalStart(GridCacheProcessor.java:769)
at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1060)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1909)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1652)
- locked <0x0000000086b27728> (a org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance)
at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1080)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:600)
at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:525)
at org.apache.ignite.Ignition.start(Ignition.java:322)
...
"exchange-worker-#35" #60 prio=5 os_prio=0 tid=0x00007fc039093000 nid=0x42 waiting on condition [0x00007fbfe3bfc000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:338)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:217)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:159)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:2289)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
"disco-event-worker-#34" #57 prio=5 os_prio=0 tid=0x00007fc0388ad000 nid=0x3f waiting on condition [0x00007fbff013b000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000086556ea0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body0(GridDiscoveryManager.java:2552)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryWorker.body(GridDiscoveryManager.java:2534)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
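For repeated runs, the same information can be captured from inside the JVM instead of taking a full dump each time. The sketch below uses only `java.lang.management`; the class name `WaitingThreads` is hypothetical, and the thread-name prefixes are simply the ones that appear in the dump above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

/** Summarizes the state and top stack frame of threads selected by name prefix. */
final class WaitingThreads {
    /** Returns "<name> <state> at <top frame>" for each live thread matching a prefix. */
    static List<String> summarize(String... namePrefixes) {
        List<String> out = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread ended while the dump was taken
            }
            for (String prefix : namePrefixes) {
                if (info.getThreadName().startsWith(prefix)) {
                    StackTraceElement[] stack = info.getStackTrace();
                    String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
                    out.add(info.getThreadName() + " " + info.getThreadState() + " at " + top);
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prefixes taken from the thread dump above.
        summarize("main", "exchange-worker", "disco-event-worker").forEach(System.out::println);
    }
}
```

Calling `summarize` from a watchdog thread a few seconds after `Ignition.start` is invoked would show whether `main` is still parked in the partition exchange wait.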
When I turn Ignite debug logging on, here is a sample of the output from the client and from the server cluster:
client:
[02:48:29,653][WARNING][main][GridCachePartitionExchangeManager] Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c26a231d-027a-49c0-8d64-7d5c92be0c7a, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.16.102.6], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, near-test-5d9699b96f-lzsbx/172.16.102.6:0], discPort=0, order=35, intOrder=0, lastExchangeTime=1514342845847, loc=true, ver=2.3.0#20171028-sha1:8add7fd5, isClient=true], topVer=35, nodeId8=c26a231d, msg=null, type=NODE_JOINED, tstamp=1514342848946], crd=TcpDiscoveryNode [id=cc754ef0-a004-40c9-985f-f43b2df66e39, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.16.83.5], sockAddrs=[/172.16.83.5:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1514342846946, loc=false, ver=2.3.0#20171028-sha1:8add7fd5, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=c26a231d-027a-49c0-8d64-7d5c92be0c7a, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.16.102.6], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, near-test-5d9699b96f-lzsbx/172.16.102.6:0], discPort=0, order=35, intOrder=0, lastExchangeTime=1514342845847, loc=true, ver=2.3.0#20171028-sha1:8add7fd5, isClient=true], topVer=35, nodeId8=c26a231d, msg=null, type=NODE_JOINED, tstamp=1514342848946], nodeId=c26a231d, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=2111815064], init=true, lastVer=null, partReleaseFut=null, exchActions=null, affChangeMsg=null, initTs=1514342849646, centralizedAff=false, changeGlobalStateE=null, done=false, state=CLIENT, evtLatch=0, remaining=[b1581d62-f72f-4a56-93b6-babd364cc695, cc754ef0-a004-40c9-985f-f43b2df66e39, 8f840e6f-c40d-46fd-8476-06793c25d329], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, 
hash=353891789]]]
[02:48:35,555][WARNING][exchange-worker-#35][diagnostic] Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], node=c26a231d-027a-49c0-8d64-7d5c92be0c7a]. Dumping pending objects that might be the cause:
[02:48:45,555][WARNING][exchange-worker-#35][diagnostic] Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=35, minorTopVer=0], node=c26a231d-027a-49c0-8d64-7d5c92be0c7a]. Dumping pending objects that might be the cause:
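The "Still waiting for initial partition map exchange" warning carries a `remaining=[...]` field. Assuming that field lists the nodes the exchange future has not yet heard from (this is my reading of the internal `toString` output, not documented behaviour), pulling the IDs out makes it easier to match them against the server logs. A small sketch, with `PendingExchangeNodes` being a made-up name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts node IDs from the remaining=[...] field of an exchange warning line. */
final class PendingExchangeNodes {
    private static final Pattern REMAINING = Pattern.compile("remaining=\\[([^\\]]*)\\]");

    /** Returns the IDs inside the first remaining=[...] field, or an empty list if absent. */
    static List<String> remainingNodeIds(String logLine) {
        List<String> ids = new ArrayList<>();
        Matcher m = REMAINING.matcher(logLine);
        if (m.find()) {
            for (String id : m.group(1).split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(trimmed);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Excerpt copied from the client warning above.
        String excerpt = "state=CLIENT, evtLatch=0, remaining=[b1581d62-f72f-4a56-93b6-babd364cc695, "
            + "cc754ef0-a004-40c9-985f-f43b2df66e39, 8f840e6f-c40d-46fd-8476-06793c25d329], "
            + "super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null]";
        remainingNodeIds(excerpt).forEach(System.out::println);
    }
}
```

In the line above, the list includes `cc754ef0` (the `crd` node in the same message) and `8f840e6f` (the server whose log follows).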
one of the server nodes:
2017-12-27 02:46:52,298 ignite-8df95c79b-bbtvx ignite: [priority='INFO' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.IgniteKernal#463']
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=8f840e6f, uptime=04:43:04.048]
^-- H/N/C [hosts=5, nodes=5, CPUs=10]
^-- CPU [cur=0.5%, avg=3.3%, GC=0%]
^-- PageMemory [pages=1024]
^-- Heap [used=400MB, free=80.36%, comm=2041MB]
^-- Non heap [used=72MB, free=-1%, comm=74MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
^-- Outbound messages queue [size=0]
2017-12-27 02:46:52,298 ignite-8df95c79b-bbtvx ignite: [priority='INFO' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.IgniteKernal#463'] FreeList [name=null, buckets=256, dataPages=9, reusePages=638]
2017-12-27 02:46:52,298 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=9b979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342812287, period=60000, cancel=false, task=org.apache.ignite.internal.IgniteKernal$4#6cb224d], process=true]
2017-12-27 02:46:52,656 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=a6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342812652, period=3000, cancel=false, task=org.apache.ignite.internal.processors.query.GridQueryProcessor$2#3f625e1a], process=true]
2017-12-27 02:46:53,887 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=c6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342813885, period=3000, cancel=false, task=MetricsUpdater [prevGcTime=2117, prevCpuTime=578225, super=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$MetricsUpdater#24c52bbf]], process=true]
2017-12-27 02:46:54,481 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=69979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814475, period=5000, cancel=false, task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner#2032f1ff], process=true]
2017-12-27 02:46:54,481 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=dba79d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814475, period=5000, cancel=false, task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner#7c995f6b], process=true]
2017-12-27 02:46:54,776 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=ea829359061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814774, period=5000, cancel=false, task=org.apache.ignite.internal.processors.cache.query.continuous.CacheContinuousQueryManager$BackupCleaner#76ae058], process=true]
2017-12-27 02:46:54,845 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=7b979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342814839, period=30000, cancel=false, task=org.apache.ignite.internal.IgniteKernal$2#1905ce7], process=true]
2017-12-27 02:46:54,908 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=GridCommunicationMessageSet [nodeId=cc754ef0-a004-40c9-985f-f43b2df66e39, endTime=1514342814898, timeoutId=5b979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, topic=T6 [topic=TOPIC_CACHE, id1=83e8ca36-2305-3266-8e65-1463be879baa, id2=0], plc=5, msgs=[], reserved=false, timeout=10000, skipOnTimeout=false, lastTs=1514325827646], process=true]
2017-12-27 02:46:55,664 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=a6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342815654, period=3000, cancel=false, task=org.apache.ignite.internal.processors.query.GridQueryProcessor$2#3f625e1a], process=true]
2017-12-27 02:46:55,823 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='nio-acceptor-#29' class='org.apache.ignite.internal.processors.odbc.ClientListenerProcessor#452'] Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
2017-12-27 02:46:56,871 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='nio-acceptor-#33' class='org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestProtocol#452'] Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
2017-12-27 02:46:56,897 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=CancelableTask [id=c6979d49061-90a56489-d5c5-4100-bfd4-dd2732dca5a1, endTime=1514342816887, period=3000, cancel=false, task=MetricsUpdater [prevGcTime=2117, prevCpuTime=578240, super=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$MetricsUpdater#24c52bbf]], process=true]
2017-12-27 02:46:56,951 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='nio-acceptor-#24' class='org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi#452'] Balancing data [min0=0, minIdx=0, max0=-1, maxIdx=-1]
2017-12-27 02:46:56,984 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor#452'] Timeout has occurred [obj=org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask#67741cd0, process=true]
2017-12-27 02:46:56,984 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore#452'] Deployment meta for local deployment: GridDeploymentMetadata [depMode=SHARED, alias=org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask$1, clsName=org.apache.ignite.internal.processors.cache.GridCacheProcessor$RemovedItemsCleanupTask$1, userVer=null, sndNodeId=8f840e6f-c40d-46fd-8476-06793c25d329, clsLdrId=null, clsLdr=null, participants=null, parentLdr=null, record=true, nodeFilter=null, seqNum=n/a]
2017-12-27 02:46:56,985 ignite-8df95c79b-bbtvx ignite: [priority='DEBUG' thread='grid-timeout-worker-#23' class='org.apache.ignite.internal.managers.deployment.GridDeploymentLocalStore#452'] Acquired deployment class from local cache: GridDeployment [ts=1514325826446, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader#764c12b6, clsLdrId=8a979d49061-8f840e6f-c40d-46fd-8476-06793c25d329, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=0]
Any idea of what is going on here?

Apache Ignite Near Cache Always Missed

When using a near cache, everything works fine until a second client (it could be Visor) connects to or disconnects from the cluster while a cache operation is in progress.
After the second client connects or disconnects, the original client always misses the near cache until it is restarted. It is almost as if the cluster tells the client that there are issues and that it should treat the cluster as the source of truth.
We have been able to reproduce this by running our test and connecting/disconnecting with Visor. During a disconnect we can see a timeout for IgniteTxManager$NodeFailureTimeoutObject in the logs of the original client.
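To make the "always missed" symptom visible, it helps to look at the hit ratio per interval rather than the cumulative counters, because a long run of earlier hits can mask a sudden switch to misses. The sketch below is plain Java and assumes the hit and miss counts are cumulative totals like the `HitCount` / `MissCount` values in our stats log line; the class name `HitRateWindow` and the second sample (120 hits, 90 misses) are invented for illustration.

```java
/** Computes the hit ratio between successive snapshots of cumulative counters. */
final class HitRateWindow {
    private long lastHits;
    private long lastMisses;

    /**
     * Records a new snapshot and returns the hit ratio of the reads made since the
     * previous snapshot, or NaN if there were no reads in that interval.
     */
    double update(long totalHits, long totalMisses) {
        long hits = totalHits - lastHits;
        long misses = totalMisses - lastMisses;
        lastHits = totalHits;
        lastMisses = totalMisses;
        long reads = hits + misses;
        return reads == 0 ? Double.NaN : (double) hits / reads;
    }

    public static void main(String[] args) {
        HitRateWindow window = new HitRateWindow();
        // First snapshot: the values from the stats line in the log (120 hits, 50 misses).
        System.out.println("before topology change: " + window.update(120, 50));
        // Hypothetical later snapshot: 40 more reads, all of them misses.
        System.out.println("after topology change:  " + window.update(120, 90));
    }
}
```

An interval ratio that drops to zero right after the Visor node leaves, while the cumulative ratio only declines slowly, would match the behaviour described above.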
Below is a snippet of the logs with the org.apache.ignite.internal.processors suppressed.
[2017-10-09 14:26:52.148] boot - 9955 DEBUG [http-nio-8081-exec-8] --- CacheHelper: Total time accessing cache ng-security-service-ORG_SPEC_CACHE for key * | value com.cache.model.PrefixCluster#78475a88: 0 millis
[2017-10-09 14:26:52.150] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDiscoveryManager: Daemon node left topology: TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true]
[2017-10-09 14:26:52.150] boot - 9955 DEBUG [http-nio-8081-exec-8] --- OrgSpecCacheImpl: OrgSpec Cache Stats: OrgSpec ObjId: IgniteCacheProxy [delegate=GridNearCacheAdapter [], opCtx=null, restartFut=null] HitCount: 120, MissCount: 50, AvgReadTime: 120, Eviction Count: 0
[2017-10-09 14:26:52.150] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentPerVersionStore: Processing node departure event: DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], topVer=58, nodeId8=2e573c60, msg=Node left: TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], type=NODE_LEFT, tstamp=1507577212142]
[2017-10-09 14:26:52.163] boot - 9955 INFO [exchange-worker-#27%null%] --- time: Started exchange init [topVer=AffinityTopologyVersion [topVer=58, minorTopVer=0], crd=false, evt=11, node=TcpDiscoveryNode [id=2e573c60-45f0-4429-a3fa-068489663148, addrs=[0:0:0:0:0:0:0:1%lo, 10.70.242.138, 127.0.0.1], sockAddrs=[port-svc-inc-13.tw-test.net/10.70.242.138:0, /0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0], discPort=0, order=56, intOrder=0, lastExchangeTime=1507576971754, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], evtNode=TcpDiscoveryNode [id=2e573c60-45f0-4429-a3fa-068489663148, addrs=[0:0:0:0:0:0:0:1%lo, 10.70.242.138, 127.0.0.1], sockAddrs=[port-svc-inc-13.tw-test.net/10.70.242.138:0, /0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0], discPort=0, order=56, intOrder=0, lastExchangeTime=1507576971754, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], customEvt=null]
[2017-10-09 14:26:52.164] boot - 9955 INFO [exchange-worker-#27%null%] --- GridDhtPartitionsExchangeFuture: Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=58, minorTopVer=0], time=0ms]
[2017-10-09 14:26:52.164] boot - 9955 INFO [exchange-worker-#27%null%] --- GridDhtPartitionsExchangeFuture: Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=58, minorTopVer=0], time=0ms]
[2017-10-09 14:26:52.164] boot - 9955 INFO [exchange-worker-#27%null%] --- time: Finished exchange init [topVer=AffinityTopologyVersion [topVer=58, minorTopVer=0], crd=false]
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentLocalStore: Deployment meta for local deployment: GridDeploymentMetadata [depMode=SHARED, alias=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1, clsName=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1, userVer=null, sndNodeId=2e573c60-45f0-4429-a3fa-068489663148, clsLdrId=null, clsLdr=null, participants=null, parentLdr=null, record=true, nodeFilter=null, seqNum=n/a]
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: Registering [ldrRsrcs={org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f={org.apache.ignite.internal.util.typedef.T2=org.apache.ignite.internal.util.typedef.T2, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionHistorySuppliersMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionHistorySuppliersMap, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, java.util.Collections$UnmodifiableList=java.util.Collections$UnmodifiableList, org.apache.ignite.internal.visor.cache.VisorCacheMetricsCollectorTask=org.apache.ignite.internal.visor.cache.VisorCacheMetricsCollectorTask, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionsToReloadMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionsToReloadMap, org.apache.ignite.internal.processors.service.GridServiceProcessor$1=org.apache.ignite.internal.processors.service.GridServiceProcessor$1, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionCountersMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionCountersMap}}, ldr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, rsrc=class org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1]
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: Resources to register: {org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1}
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: New resources: {org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1}
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: Removing resources [clsLdrToIgnore=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, rsrcs={org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1}]
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentLocalStore: Retrieved auto-loaded resource from spi: DeploymentResourceAdapter [name=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1, rsrcCls=class org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1, clsLdr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f]
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentLocalStore: Acquired deployment class: GridDeployment [ts=1507576972855, depMode=SHARED, clsLdr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, clsLdrId=6d9e6920f51-2e573c60-45f0-4429-a3fa-068489663148, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=0]
[2017-10-09 14:26:52.203] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridResourceProcessor: Injecting resources [target=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1#61ea2cff]
[2017-10-09 14:26:52.211] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentLocalStore: Deployment meta for local deployment: GridDeploymentMetadata [depMode=SHARED, alias=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1, clsName=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1, userVer=null, sndNodeId=2e573c60-45f0-4429-a3fa-068489663148, clsLdrId=null, clsLdr=null, participants=null, parentLdr=null, record=true, nodeFilter=null, seqNum=n/a]
[2017-10-09 14:26:52.211] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: Registering [ldrRsrcs={org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f={org.apache.ignite.internal.util.typedef.T2=org.apache.ignite.internal.util.typedef.T2, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionHistorySuppliersMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionHistorySuppliersMap, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, java.util.Collections$UnmodifiableList=java.util.Collections$UnmodifiableList, org.apache.ignite.internal.visor.cache.VisorCacheMetricsCollectorTask=org.apache.ignite.internal.visor.cache.VisorCacheMetricsCollectorTask, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionsToReloadMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionsToReloadMap, org.apache.ignite.internal.processors.service.GridServiceProcessor$1=org.apache.ignite.internal.processors.service.GridServiceProcessor$1, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionCountersMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionCountersMap, org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1}}, ldr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, rsrc=class org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1]
[2017-10-09 14:26:52.211] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: Resources to register: {org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1}
[2017-10-09 14:26:52.211] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: New resources: {org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1}
[2017-10-09 14:26:52.211] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- LocalDeploymentSpi: Removing resources [clsLdrToIgnore=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, rsrcs={org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1}]
[2017-10-09 14:26:52.211] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentLocalStore: Retrieved auto-loaded resource from spi: DeploymentResourceAdapter [name=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1, rsrcCls=class org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1, clsLdr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f]
[2017-10-09 14:26:52.212] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridDeploymentLocalStore: Acquired deployment class: GridDeployment [ts=1507576972855, depMode=SHARED, clsLdr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, clsLdrId=6d9e6920f51-2e573c60-45f0-4429-a3fa-068489663148, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=0]
[2017-10-09 14:26:52.212] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridResourceProcessor: Injecting resources [target=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1#1ff41d49]
[2017-10-09 14:26:52.213] boot - 9955 DEBUG [pub-#34%null%] --- GridClosureProcessor: Grid runnable started: closure-proc-worker
[2017-10-09 14:26:52.213] boot - 9955 DEBUG [pub-#34%null%] --- GridClosureProcessor: Grid runnable finished normally: closure-proc-worker
[2017-10-09 14:26:52.216] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- TcpCommunicationSpi: Forcing NIO client close since node has left [nodeId=4cc6c321-d9cc-4149-a6ef-cba68877a269, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=714, bytesSent=6799, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=null, finished=false, hashCode=1557230104, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#20%null%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=1, sentCnt=1, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=1, sentCnt=1, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.70.242.138:47100, rmtAddr=/10.70.255.8:53916, createTime=1507577162587, closeTime=0, bytesSent=6799, bytesRcvd=714, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1507577162587, lastSndTime=1507577162697, lastRcvTime=1507577162617, readsPaused=false, 
filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser#9573b3b, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]], super=GridAbstractCommunicationClient [lastUsed=1507577162587, closed=false, connIdx=0]]]
[2017-10-09 14:26:52.217] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- TcpCommunicationSpi: Offered move [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=714, bytesSent=6799, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=null, finished=false, hashCode=1557230104, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#20%null%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=1, sentCnt=1, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=1, sentCnt=1, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=4cc6c321-d9cc-4149-a6ef-cba68877a269, addrs=[10.70.255.8, 127.0.0.1, 172.17.0.1], sockAddrs=[/172.17.0.1:0, /127.0.0.1:0, /10.70.255.8:0], discPort=0, order=57, intOrder=31, lastExchangeTime=1507577126368, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true], connected=true, connectCnt=0, queueLimit=4096, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.70.242.138:47100, rmtAddr=/10.70.255.8:53916, createTime=1507577162587, closeTime=0, bytesSent=6799, bytesRcvd=714, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1507577162587, lastSndTime=1507577162697, lastRcvTime=1507577162617, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser#9573b3b, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]], fut=NioOperationFuture [op=CLOSE]]
[2017-10-09 14:26:52.217] boot - 9955 DEBUG [disco-event-worker-#26%null%] --- GridIoManager: Removed messages from discovery startup delay list (sender node left topology): null
[2017-10-09 14:26:52.217] boot - 9955 DEBUG [pub-#35%null%] --- GridClosureProcessor: Grid runnable started: closure-proc-worker
[2017-10-09 14:26:52.217] boot - 9955 DEBUG [pub-#35%null%] --- GridClosureProcessor: Grid runnable finished normally: closure-proc-worker
[2017-10-09 14:26:52.283] boot - 9955 DEBUG [grid-timeout-worker-#15%null%] --- GridTimeoutProcessor: Timeout has occurred: org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject#7ff59c90
[2017-10-09 14:26:52.284] boot - 9955 DEBUG [grid-timeout-worker-#15%null%] --- GridDeploymentLocalStore: Deployment meta for local deployment: GridDeploymentMetadata [depMode=SHARED, alias=org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject$2, clsName=org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject$2, userVer=null, sndNodeId=2e573c60-45f0-4429-a3fa-068489663148, clsLdrId=null, clsLdr=null, participants=null, parentLdr=null, record=true, nodeFilter=null, seqNum=n/a]
[2017-10-09 14:26:52.285] boot - 9955 DEBUG [grid-timeout-worker-#15%null%] --- LocalDeploymentSpi: Registering [ldrRsrcs={org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f={org.apache.ignite.internal.util.typedef.T2=org.apache.ignite.internal.util.typedef.T2, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionHistorySuppliersMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionHistorySuppliersMap, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, java.util.Collections$UnmodifiableList=java.util.Collections$UnmodifiableList, org.apache.ignite.internal.visor.cache.VisorCacheMetricsCollectorTask=org.apache.ignite.internal.visor.cache.VisorCacheMetricsCollectorTask, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionsToReloadMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionsToReloadMap, org.apache.ignite.internal.processors.service.GridServiceProcessor$1=org.apache.ignite.internal.processors.service.GridServiceProcessor$1, org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1=org.apache.ignite.internal.processors.datastructures.DataStructuresProcessor$1$1, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionMap, org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionCountersMap=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.IgniteDhtPartitionCountersMap, org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1=org.apache.ignite.internal.processors.task.GridTaskProcessor$TaskDiscoveryListener$1}}, 
ldr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, rsrc=class org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject$2]
[2017-10-09 14:26:52.285] boot - 9955 DEBUG [grid-timeout-worker-#15%null%] --- GridDeploymentLocalStore: Retrieved auto-loaded resource from spi: DeploymentResourceAdapter [name=org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject$2, rsrcCls=class org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject$2, clsLdr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f]
[2017-10-09 14:26:52.285] boot - 9955 DEBUG [grid-timeout-worker-#15%null%] --- GridDeploymentLocalStore: Acquired deployment class: GridDeployment [ts=1507576972855, depMode=SHARED, clsLdr=org.springframework.boot.loader.LaunchedURLClassLoader#7adf9f5f, clsLdrId=6d9e6920f51-2e573c60-45f0-4429-a3fa-068489663148, userVer=0, loc=true, sampleClsName=org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionFullMap, pendingUndeploy=false, undeployed=false, usage=0]
[2017-10-09 14:26:52.285] boot - 9955 DEBUG [grid-timeout-worker-#15%null%] --- GridResourceProcessor: Injecting resources [target=org.apache.ignite.internal.processors.cache.transactions.IgniteTxManager$NodeFailureTimeoutObject$2#3f183e4]
[2017-10-09 14:26:52.317] boot - 9955 DEBUG [http-nio-8081-exec-8] --- CacheHelper: Total time accessing cache ng-security-service-ORG_SPEC_CACHE for key * | value com.cache.model.PrefixCluster#6954be5d: 167 millis
[2017-10-09 14:26:52.319] boot - 9955 DEBUG [http-nio-8081-exec-8] --- OrgSpecCacheImpl: OrgSpec Cache Stats: OrgSpec ObjId: IgniteCacheProxy [delegate=GridNearCacheAdapter [], opCtx=null, restartFut=null] HitCount: 126, MissCount: 53, AvgReadTime: 126, Eviction Count: 0
[2017-10-09 14:26:52.321] boot - 9955 DEBUG [sys-#36%null%] --- GridClosureProcessor: Grid runnable started: closure-proc-worker
My question is: is this expected behavior? Can we keep the near cache from being bypassed, or at least resume using the near cache after the bad client disconnects?
It turns out there is a bug in the near cache. When a topology change occurs, it can wipe out the topology version on the NearCacheGridEntry, which causes every check of whether the entry is valid to return false.
A bug has been submitted: https://issues.apache.org/jira/browse/IGNITE-6767

Apache Ignite Cache put fails within the IgniteRunnable run method

I am getting the error below when calling igniteCache.put() inside the run() method of an IgniteRunnable.
I have only 2 nodes (a client and a server).
1) The client creates the cache:
CacheConfiguration<Integer, LAttribute> cfg = new CacheConfiguration<Integer, LAttribute>();
cfg.setIndexedTypes(Integer.class, LoanAttribute.class);
cfg.setCacheMode(CacheMode.PARTITIONED);
cfg.setName("inv_result");
cfg.setCopyOnRead(false);
cfg.setAtomicityMode(CacheAtomicityMode.ATOMIC);
2) The client submits the IgniteRunnable task to the server.
3) The client leaves the cluster.
On the server (within the run() method):
1) Get the cache and put a value:
IgniteCache<Integer, LAttribute> iCache = Ignition.localIgnite().cache("inv_result");
System.out.println("Begin .. "+iCache.size(CachePeekMode.ALL));
iCache.put(la.getId(), la);
Error:
[21:41:14,859][SEVERE][pub-#67%null%][GridJobWorker] Failed to execute job due to unexpected runtime exception [jobId=f4606f39b51-21c994a7-6b35-49fa-b696-582fa7825c31, ses=GridJobSessionImpl [ses=GridTaskSessionImpl [taskName=com.test.ignite.compute.AssetRestrictionComputeJob, dep=GridDeployment [ts=1492836063447, depMode=SHARED, clsLdr=sun.misc.Launcher$AppClassLoader#73d16e93, clsLdrId=438a5f39b51-76a937b0-7831-458b-aee4-cec662f02b0d, userVer=0, loc=true, sampleClsName=o.a.i.i.processors.cache.distributed.dht.preloader.GridDhtPartitionMap2, pendingUndeploy=false, undeployed=false, usage=1], taskClsName=com.bfm.seclending.ignite.compute.AssetRestrictionComputeJob, sesId=c4606f39b51-21c994a7-6b35-49fa-b696-582fa7825c31, startTime=1492836072790, endTime=9223372036854775807, taskNodeId=21c994a7-6b35-49fa-b696-582fa7825c31, clsLdr=sun.misc.Launcher$AppClassLoader#73d16e93, closed=false, cpSpi=null, failSpi=null, loadSpi=null, usage=1, fullSup=false, internal=false, subjId=21c994a7-6b35-49fa-b696-582fa7825c31, mapFut=IgniteFuture [orig=GridFutureAdapter [resFlag=0, res=null, startTime=1492836072829, endTime=0, ignoreInterrupts=false, state=INIT]]], jobId=f4606f39b51-21c994a7-6b35-49fa-b696-582fa7825c31]]
javax.cache.CacheException: class org.apache.ignite.IgniteInterruptedException: Failed to wait for asynchronous operation permit (thread got interrupted).
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1440)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.cacheException(IgniteCacheProxy.java:2183)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.put(IgniteCacheProxy.java:1383)
at co.test.ignite.compute.AssetRestrictionComputeJob.run(AssetRestrictionComputeJob.java:110)
at org.apache.ignite.internal.processors.closure.GridClosureProcessor$C4V2.execute(GridClosureProcessor.java:2215)
at org.apache.ignite.internal.processors.job.GridJobWorker$2.call(GridJobWorker.java:556)
at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6564)
at org.apache.ignite.internal.processors.job.GridJobWorker.execute0(GridJobWorker.java:550)
at org.apache.ignite.internal.processors.job.GridJobWorker.body(GridJobWorker.java:479)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at org.apache.ignite.internal.processors.job.GridJobProcessor.processJobExecuteRequest(GridJobProcessor.java:1180)
at org.apache.ignite.internal.processors.job.GridJobProcessor$JobExecutionListener.onMessage(GridJobProcessor.java:1894)
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1082)
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:710)
at org.apache.ignite.internal.managers.communication.GridIoManager.access$1700(GridIoManager.java:102)
at org.apache.ignite.internal.managers.communication.GridIoManager$5.run(GridIoManager.java:673)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: class org.apache.ignite.IgniteInterruptedException: Failed to wait for asynchronous operation permit (thread got interrupted).
at org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:766)
at org.apache.ignite.internal.util.IgniteUtils$3.apply(IgniteUtils.java:764)
... 19 more
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.asyncOpAcquire(GridCacheAdapter.java:4597)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.asyncOp(GridDhtAtomicCache.java:817)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.updateAsync0(GridDhtAtomicCache.java:1148)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.putAsync0(GridDhtAtomicCache.java:618)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAsync(GridCacheAdapter.java:2541)
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.put(GridDhtAtomicCache.java:595)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.put(GridCacheAdapter.java:2215)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxy.put(IgniteCacheProxy.java:1376)
... 16 more
Most likely the server node was stopped in the middle of execution. That's the only case in which internal Ignite threads are interrupted. When this happens, the job can be automatically failed over to another node: https://apacheignite.readme.io/docs/fault-tolerance
I found the reason for the InterruptedException: the executor service that I am using on the client side to submit the jobs was not waiting for the jobs to complete.
Now that I call future.get(), everything works. Thanks.
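For anyone hitting the same thing, here is a minimal sketch of the "wait before exiting" pattern described above, using only java.util.concurrent. The class name WaitForJobs, the method runAndWait, and the sleeping stand-in task are hypothetical; in the original setup the task body would presumably submit the IgniteRunnable (for example via ignite.compute().run(job)), which is an assumption since the client code was not posted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class WaitForJobs {

    /**
     * Submits jobCount stand-in jobs, blocks until every one of them has
     * finished, and returns how many ran to completion.
     */
    static int runAndWait(int jobCount) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < jobCount; i++) {
                // Stand-in for the real work (assumption: the real task would
                // send a job to the cluster and wait for it).
                futures.add(executor.submit(() -> {
                    try {
                        Thread.sleep(50);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                }));
            }
            // Block on each future so nothing is still running when the
            // caller goes on to shut down (or leave the cluster).
            for (Future<?> future : futures) {
                future.get();
            }
        } finally {
            executor.shutdown();
        }
        return completed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Completed jobs: " + runAndWait(3));
    }
}
```

The point is only the ordering: call get() on every future first, and stop the client node afterwards. Without the get() calls the client can leave the topology while the job is still doing its put on the server.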