Related
Ignite version: 2.14.0
Node configuration: 2 nodes running on the same PC (IPv4) using localhost and 255 available ports:
TcpDiscoveryMulticastIpFinder ipFinder = new TcpDiscoveryMulticastIpFinder();
ipFinder.setAddresses(Collections.singletonList("127.0.0.1"));
Also: 2 different working directories, a thread pool of 16, and 2 caches (one atomic, one transactional).
What happens: Using an ExecutorService, I submit 8 threads to the pool. The classes run correctly (4 on each node) and execute their tasks as expected.
But during execution the following exception is raised repeatedly, and fairly frequently, on both nodes: GRAVE (SEVERE): "Failed to process selector key".
The application generates a high computational load. A simple "for loop" with a sleep gives no error.
Full stack trace follows (the Italian IOException message means "An existing connection was forcibly closed by the remote host"):
GRAVE: Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=97567668, bytesSent=100128669, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1675265761563, hashCode=2143442267, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#26%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=1690656, resendCnt=0, rcvCnt=1696452, sentCnt=1691375, reserved=true, lastAck=1696448, nodeLeft=false, node=TcpDiscoveryNode [id=cd1ffdf0-b9b3-49ef-a9e3-db1676fad428, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675265584899, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=69, queueLimit=4096, reserveCnt=101, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=1690656, resendCnt=0, rcvCnt=1696452, sentCnt=1691375, reserved=true, lastAck=1696448, nodeLeft=false, node=TcpDiscoveryNode [id=cd1ffdf0-b9b3-49ef-a9e3-db1676fad428, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675265584899, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=69, queueLimit=4096, reserveCnt=101, 
pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/0:0:0:0:0:0:0:1:47101, rmtAddr=/0:0:0:0:0:0:0:1:56361, createTime=1675265760336, closeTime=0, bytesSent=8479762, bytesRcvd=7459908, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1675265760336, lastSndTime=1675265761545, lastRcvTime=1675265761563, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#b329ba4, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=true]]]
java.io.IOException: Connessione in corso interrotta forzatamente dall'host remoto
at java.base/sun.nio.ch.SocketDispatcher.write0(Native Method)
at java.base/sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:51)
at java.base/sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:113)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:58)
at java.base/sun.nio.ch.IOUtil.write(IOUtil.java:50)
at java.base/sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:466)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite0(GridNioServer.java:1715)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1407)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2511)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2273)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125)
at java.base/java.lang.Thread.run(Thread.java:834)
Expected: I read that it could be a configuration problem, but I don't understand how to fix it.
The configuration seems simple, and even though the execution completes without calculation errors, I would like to avoid this exception.
NODE1
[2023-02-02 15:54:30] [AVVERTENZA] Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Connessione in corso interrotta forzatamente dall'host remoto] - [org.apache.ignite.logger.java.JavaLogger warning:]
[2023-02-02 15:54:30] [AVVERTENZA] Unacknowledged messages queue size overflow, will attempt to reconnect [remoteAddr=/127.0.0.1:63660, queueLimit=4096] - [org.apache.ignite.logger.java.JavaLogger warning:]
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63670] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63671] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Received incoming connection when already connected to this node, rejecting [locNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNode=8a378874-f3ae-4d0c-9733-a6b143097658] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63672] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Received incoming connection when already connected to this node, rejecting [locNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNode=8a378874-f3ae-4d0c-9733-a6b143097658] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63673] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] Received incoming connection when already connected to this node, rejecting [locNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNode=8a378874-f3ae-4d0c-9733-a6b143097658] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [GRAVE ] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=2269317, bytesSent=3928093, bytesRcvd0=1909138, bytesSent0=720914, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1675349670621, hashCode=722948156, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#23%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=17152, resendCnt=870, rcvCnt=31061, sentCnt=18796, reserved=true, lastAck=31040, nodeLeft=false, node=TcpDiscoveryNode [id=8a378874-f3ae-4d0c-9733-a6b143097658, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675349650217, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=7, queueLimit=4096, reserveCnt=9, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=17152, resendCnt=870, rcvCnt=31061, sentCnt=18796, reserved=true, lastAck=31040, nodeLeft=false, node=TcpDiscoveryNode [id=8a378874-f3ae-4d0c-9733-a6b143097658, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47500, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47500, /0:0:0:0:0:0:0:1:47500, WOPR/192.168.56.1:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1675349650217, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=true, connectCnt=7, queueLimit=4096, reserveCnt=9, 
pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/127.0.0.1:47101, rmtAddr=/127.0.0.1:63670, createTime=1675349670241, closeTime=0, bytesSent=720914, bytesRcvd=1909138, bytesSent0=720914, bytesRcvd0=1909138, sndSchedTime=1675349670241, lastSndTime=1675349670277, lastRcvTime=1675349670621, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#16179752, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=true]]] - [org.apache.ignite.logger.java.JavaLogger error:
java.io.IOException: Connessione in corso interrotta forzatamente dall'host remoto
NODE2
[2023-02-02 15:54:30] [INFORMAZIONI] Accepted incoming communication connection [locAddr=/127.0.0.1:47100, rmtAddr=/127.0.0.1:63669] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Received incoming connection from remote node while connecting to this node, rejecting [locNode=8a378874-f3ae-4d0c-9733-a6b143097658, locNodeOrder=1, rmtNode=af74d5c9-3631-4fdf-b9f2-0babc853019f, rmtNodeOrder=2] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:30] [INFORMAZIONI] Established outgoing communication connection [locAddr=/127.0.0.1:63670, rmtAddr=/127.0.0.1:47101] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] Established outgoing communication connection [locAddr=/127.0.0.1:63676, rmtAddr=/127.0.0.1:47101] - [org.apache.ignite.logger.java.JavaLogger info:]
[2023-02-02 15:54:31] [INFORMAZIONI] TCP client created [client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=1, bytesRcvd=84, bytesSent=56, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-1, igniteInstanceName=TcpCommunicationSpi, finished=false, heartbeatTs=1675349671637, hashCode=762674116, interrupted=false, runner=grid-nio-worker-tcp-comm-1-#24%TcpCommunicationSpi%]]], writeBuf=java.nio.DirectByteBuffer[pos=9391 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=31061, resendCnt=753, rcvCnt=17160, sentCnt=31871, reserved=true, lastAck=17152, nodeLeft=false, node=TcpDiscoveryNode [id=af74d5c9-3631-4fdf-b9f2-0babc853019f, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47501, /0:0:0:0:0:0:0:1:47501, WOPR/192.168.56.1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1675349650060, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=false, connectCnt=8, queueLimit=4096, reserveCnt=9, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=31061, resendCnt=531, rcvCnt=17160, sentCnt=31871, reserved=true, lastAck=17152, nodeLeft=false, node=TcpDiscoveryNode [id=af74d5c9-3631-4fdf-b9f2-0babc853019f, consistentId=0:0:0:0:0:0:0:1,127.0.0.1,192.168.178.30,192.168.56.1:47501, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.178.30, 192.168.56.1], sockAddrs=HashSet [host.docker.internal/192.168.178.30:47501, /0:0:0:0:0:0:0:1:47501, WOPR/192.168.56.1:47501, /127.0.0.1:47501], discPort=47501, order=2, intOrder=2, lastExchangeTime=1675349650060, loc=false, ver=2.14.0#20220929-sha1:951e8deb, isClient=false], connected=false, connectCnt=8, queueLimit=4096, 
reserveCnt=9, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=org.apache.ignite.internal.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/127.0.0.1:63676, rmtAddr=/127.0.0.1:47101, createTime=1675349671637, closeTime=0, bytesSent=0, bytesRcvd=0, bytesSent0=0, bytesRcvd0=0, sndSchedTime=1675349671637, lastSndTime=1675349671637, lastRcvTime=1675349671637, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser#544beb47, directMode=true], GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]], super=GridAbstractCommunicationClient [lastUsed=1675349671637, closed=false, connIdx=0]], duration=339ms] - [org.apache.ignite.logger.java.JavaLogger info:]
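For reading these long lines, it can help to pull the counters out of a GridNioRecoveryDescriptor and compare them with the queueLimit=4096 mentioned in the "Unacknowledged messages queue size overflow" warning. The following is only a minimal sketch in plain Java: the class and method names are hypothetical, the sample fragment is shortened from the NODE1 line above, and treating sentCnt minus acked as "messages still waiting for an acknowledgement" is an assumption about what the fields mean, not something confirmed by the Ignite source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecoveryBacklog {
    /** Returns the first numeric value of "name=<digits>" in a log fragment, or -1 if absent. */
    static long field(String log, String name) {
        Matcher m = Pattern.compile("\\b" + Pattern.quote(name) + "=(\\d+)").matcher(log);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Assumption: messages sent but not yet acknowledged = sentCnt - acked. */
    static long unacked(String log) {
        return field(log, "sentCnt") - field(log, "acked");
    }

    public static void main(String[] args) {
        // Shortened copy of the inRecovery descriptor from the NODE1 log line.
        String fragment = "inRecovery=GridNioRecoveryDescriptor [acked=17152, resendCnt=870, "
            + "rcvCnt=31061, sentCnt=18796, reserved=true, lastAck=31040, queueLimit=4096]";

        System.out.println("unacked=" + unacked(fragment)
            + ", queueLimit=" + field(fragment, "queueLimit"));
        // prints: unacked=1644, queueLimit=4096
    }
}
```

Running the same extraction over successive log lines shows whether the gap between sent and acknowledged messages keeps growing while the computation is under load.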
In the production environment, the log of one node (172.11.11.36) shows:
[..common.ignite.spi.CustomTcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/172.11.11.49:53137, rmtPort=53137
[2021-12-14T15:25:21,681][ERROR][sys-stripe-15-#16][org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi] Failed to send message to remote node [node=TcpDiscoveryNode [id=f6fe6cd0-612b-4a26-8b63-2054b749fe7f, consistentId=node-live-39, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.11.11.39], sockAddrs=HashSet [ip-172-11-11-39.ap-northeast-1.compute.internal/172.11.11.39:47500, ip-172-17-0-1.ap-northeast-1.compute.internal/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=5, intOrder=5, lastExchangeTime=1638264334577, loc=false, ver=2.9.1#20201203-sha1:adce517, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[107382444]]]]]
org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Remote node does not observe current node in topology : f6fe6cd0-612b-4a26-8b63-2054b749fe7f
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3819) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3635) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3375) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3180) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2100) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2195) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1296) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.sendDeferredUpdateResponse(GridDhtAtomicCache.java:3643) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache.access$3300(GridDhtAtomicCache.java:141) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout.run(GridDhtAtomicCache.java:3889) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565) [ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.9.1.jar:2.9.1]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
and the log of node 172.11.11.39 shows:
[2021-12-14T15:25:21,641][WARN ][disco-event-worker-#71][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=702a3e0f-afc9-446e-9c9d-7ec25b185b49, consistentId=node-live-36, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.11.11.36], sockAddrs=HashSet [ip-172-11-11-36.ap-northeast-1.compute.internal/172.11.11.36:47500, ip-172-17-0-1.ap-northeast-1.compute.internal/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1638264334663, loc=false, ver=2.9.1#20201203-sha1:adcce517, isClient=false]
[2021-12-14T15:25:21,680][WARN ][grid-nio-worker-tcp-comm-6-#45][org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi] Close incoming connection, unknown node [nodeId=702a3e0f-afc9-446e-9c9d-7ec25b185b49, ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=6, bytesRcvd=437336562986, bytesSent=474752492909, bytesRcvd0=1781892, bytesSent0=1106881, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-6, igniteInstanceName=null, finished=false, heartbeatTs=1639495521670, hashCode=1976943565, interrupted=false, runner=grid-nio-worker-tcp-comm-6-#45]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=38 lim=38 cap=32768], inRecovery=null, outRecovery=null, closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/172.11.11.39:47100, rmtAddr=/172.11.11.36:49818, createTime=1639495521670, closeTime=0, bytesSent=18, bytesRcvd=42, bytesSent0=18, bytesRcvd0=42, sndSchedTime=1639495521670, lastSndTime=1639495521670, lastRcvTime=1639495521670, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#4f37de39, directMode=true], GridConnectionBytesVerifyFilter], accepted=true, markedForClose=false]]]
[2021-12-14T15:25:21,673][ERROR][query-#105][org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi] Failed to send message to remote node [node=TcpDiscoveryNode [id=702a3e0f-afc9-446e-9c9d-7ec25b185b49, consistentId=node-live-36, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.11.11.36], sockAddrs=HashSet [ip-172-11-11-36.ap-northeast-1.compute.internal/172.11.11.36:47500, ip-172-17-0-1.ap-northeast-1.compute.internal/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1638264334663, loc=false, ver=2.9.1#20201203-sha1:adcce517, isClient=false], msg=GridIoMessage [plc=10, topic=TOPIC_QUERY, topicOrd=19, ordered=false, timeout=0, skipOnTimeout=false, msg=GridQueryNextPageResponse [qryReqId=78777738, segmentId=0, qry=2, page=0, allRows=364, cols=4, retry=null, retryCause=null, last=true, removeMapping=false, valsSize=1456, rowsSize=0]]]
org.apache.ignite.internal.cluster.ClusterTopologyCheckedException: Failed to send message (node left topology): TcpDiscoveryNode [id=702a3e0f-afc9-446e-9c9d-7ec25b185b49, consistentId=node-live-36, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.11.11.36], sockAddrs=HashSet [ip-172-11-11-36.ap-northeast-1.compute.internal/172.11.11.36:47500, ip-172-17-0-1.ap-northeast-1.compute.internal/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1638264334663, loc=false, ver=2.9.1#20201203-sha1:adcce517, isClient=false]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3736) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3635) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3375) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3180) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:3013) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2960) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2100) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2195) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.map(GridDhtLockFuture.java:1026) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.onOwnerChanged(GridDhtLockFuture.java:714) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.notifyOwnerChanged(GridCacheMvccManager.java:227) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.access$200(GridCacheMvccManager.java:82) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheMvccManager$3.onOwnerChanged(GridCacheMvccManager.java:164) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.checkOwnerChanged(GridCacheMapEntry.java:4935) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.checkOwnerChanged(GridCacheMapEntry.java:4887) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.GridDistributedCacheEntry.readyLock(GridDistributedCacheEntry.java:516) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.readyLocks(GridDhtLockFuture.java:622) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLockFuture.map(GridDhtLockFuture.java:830) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.lockAllAsync(GridDhtTransactionalCacheAdapter.java:1274) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processNearLockRequest0(GridDhtTransactionalCacheAdapter.java:815) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.processNearLockRequest(GridDhtTransactionalCacheAdapter.java:800) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter.access$000(GridDhtTransactionalCacheAdapter.java:112) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$3.apply(GridDhtTransactionalCacheAdapter.java:158) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTransactionalCacheAdapter$3.apply(GridDhtTransactionalCacheAdapter.java:156) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1142) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:591) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:392) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:318) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1907) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1528) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager.access$5300(GridIoManager.java:241) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.GridIoManager$9.execute(GridIoManager.java:1421) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.managers.communication.TraceRunnable.run(TraceRunnable.java:55) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:565) ~[ignite-core-2.9.1.jar:2.9.1]
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) ~[ignite-core-2.9.1.jar:2.9.1]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
and then node 172.11.11.36 shows:
[..common.ignite.spi.CustomTcpDiscoverySpi] Initialized connection with remote server node [nodeId=ad21f9e2-cfd0-44b2-821f-7be19184b3d8, rmtAddr=/172.11.11.21:59943]
[2021-12-14T15:25:23,215][WARN ][tcp-disco-msg-worker-[ad21f9e2 172.11.11.37:47500 crd]-#2-#67][..common.ignite.spi.CustomTcpDiscoverySpi] Node is out of topology (probably, due to short-time network problems).
[2021-12-14T15:25:23,216][WARN ][disco-event-worker-#69][org.apache.ignite.internal.managers.discovery.GridDiscoveryManager] Local node SEGMENTED: TcpDiscoveryNode [id=702a3e0f-afc9-446e-9c9d-7ec25b185b49, consistentId=node-live-36, addrs=ArrayList [0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.11.11.36], sockAddrs=HashSet [ip-172-11-11-36.ap-northeast-1.compute.internal/172.11.11.36:47500, ip-172-17-0-1.ap-northeast-1.compute.internal/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1639495523214, loc=true, ver=2.9.1#20201203-sha1:adcce517, isClient=false]
[2021-12-14T15:25:23,228][ERROR][disco-event-worker-#69][] Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeFailureHandler [super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SEGMENTATION, err=null]]
and then node 172.11.11.36 shows:
[2021-12-14T15:25:23,240][ERROR][node-stopper][] Stopping local node on Ignite failure: [failureCtx=FailureContext [type=SEGMENTATION, err=null]]
and this node shut down completely.
At that time I checked the logs and confirmed that the network was working well (this node could connect to other servers, other servers could connect to this node and exchange partition data, and other client nodes could connect to this node to execute query tasks).
But I don't know why the other server nodes show the same error log (Close incoming connection, unknown node), which caused the node to shut down.
Does anybody know the root cause, and how to prevent this from happening again?
Network problems like this have two common causes:
A network problem(!)
A long JVM pause
You don't show in your logs what happened before the errors, but there's a good chance you'll see warnings about a "Long JVM pause," which means that no Ignite code was being executed for a period of time. In this case, it means that messages from other nodes were not being handled. There are a number of causes for long pauses, but the most common is incorrectly configured garbage collectors. See the documentation for some hints.
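To make the idea of a "long JVM pause" concrete, here is a minimal sketch of the general technique behind such a warning: a thread sleeps for a short, fixed interval and measures how much later than expected it woke up. This is not Ignite's actual implementation; the class name, the 50 ms interval, and the 500 ms threshold are all hypothetical values chosen for illustration.

```java
public class PauseDetectorSketch {
    /** Returns the delay in ms beyond the requested sleep, or 0 if the thread woke on time. */
    static long pauseMillis(long beforeNanos, long afterNanos, long sleepMs) {
        long elapsedMs = (afterNanos - beforeNanos) / 1_000_000L;
        return Math.max(0L, elapsedMs - sleepMs);
    }

    public static void main(String[] args) throws InterruptedException {
        final long sleepMs = 50;       // hypothetical sampling interval
        final long thresholdMs = 500;  // hypothetical warning threshold

        for (int i = 0; i < 20; i++) {
            long before = System.nanoTime();
            Thread.sleep(sleepMs);
            long pause = pauseMillis(before, System.nanoTime(), sleepMs);

            // If the whole JVM was stopped (for example by a GC), this thread
            // wakes up late and the overshoot approximates the pause length.
            if (pause >= thresholdMs)
                System.out.println("Possible too long JVM pause: " + pause + " milliseconds.");
        }
    }
}
```

While the JVM is stopped no thread runs, including the ones that answer messages from other nodes, which is why a long enough pause can look like a network failure from the outside.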
I configured static IP addresses:
TcpDiscoverySpi spi = new TcpDiscoverySpi();
TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
ipFinder.setAddresses(Arrays.asList("76.3.16.109", "76.3.16.110", "76.3.16.111", "76.3.16.112", "76.3.16.113"));
Ignite log:
Failed to send message [node=TcpDiscoveryNode [id=2402793f-f484-4f3a-9213-82beeebfd09a, consistentId=76.3.16.110:23054, addrs=ArrayList [76.3.16.110], sockAddrs=HashSet [fl-76-3-16-110.dhcp.embarqhsd.net/76.3.16.110:23054], discPort=23054, order=15, intOrder=9, lastExchangeTime=1631517404103, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], msg=GridQueryCancelRequest [qryReqId=3560], errMsg=Failed to send message (node left topology): TcpDiscoveryNode [id=2402793f-f484-4f3a-9213-82beeebfd09a, consistentId=76.3.16.110:23054, addrs=ArrayList [76.3.16.110], sockAddrs=HashSet [fl-76-3-16-110.dhcp.embarqhsd.net/76.3.16.110:23054], discPort=23054, order=15, intOrder=9, lastExchangeTime=1631517404103, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false]]
/etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.2.144.62 tools.cmc.rnd.huawei.com
/etc/networks
default 0.0.0.0
loopback 127.0.0.0
link-local 169.254.0.0
/etc/hostname
EulerOS
I don't know which part of the configuration is causing the problem.
There are no similar problems in other environments.
Please take a look, thank you.
[2021-10-28 19:42:20,560][WARN ][0][0][tcp-disco-sock-reader-[d5c103a9 115.0.77.41:39585]-#4-#800][][IgniteLoggerImp][74] Failed to shutdown socket: closing inbound before receiving peer's close_notify
javax.net.ssl.SSLException: closing inbound before receiving peer's close_notify
at sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:735)
at sun.security.ssl.SSLSocketImpl.shutdownInput(SSLSocketImpl.java:714)
at org.apache.ignite.internal.util.IgniteUtils.close(IgniteUtils.java:4232)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader.body(ServerImpl.java:7382)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:58)
[2021-10-28 19:42:25,205][WARN ][0][0][jvm-pause-detector-worker][][IgniteLoggerImp][72] Possible too long JVM pause: 4503 milliseconds.
[2021-10-28 19:42:47,285][WARN ][0][0][fm.monitor.rebuild-105-1][ROOT][IgniteLoggerImp][72] Query produced big result set. [fetched=100000, duration=1250ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=alarmCache, sql='SELECT\n__Z0.MERGED __C0_0,\n__Z0.SPECIALALARMSTATUS __C0_1,\n__Z0.NATIVEMODN __C0_2,\n__Z0.SEVERITY __C0_3,\n__Z0.ACKED __C0_4,\n__Z0.CLEARED __C0_5,\n__Z0.MEDN __C0_6,\n__Z0.CSN __C0_7\nFROM "alarmCache".ALARMRECORD __Z0', plan=SELECT\n __Z0.MERGED AS __C0_0,\n __Z0.SPECIALALARMSTATUS AS __C0_1,\n __Z0.NATIVEMODN AS __C0_2,\n __Z0.SEVERITY AS __C0_3,\n __Z0.ACKED AS __C0_4,\n __Z0.CLEARED AS __C0_5,\n __Z0.MEDN AS __C0_6,\n __Z0.CSN AS __C0_7\nFROM "alarmCache".ALARMRECORD __Z0\n /* "alarmCache".ALARMRECORD._SCAN /\n / scanCount: 140023 /, node=TcpDiscoveryNode [id=4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, consistentId=115.0.77.40:23054, addrs=ArrayList [115.0.77.40], sockAddrs=HashSet [EulerOS/115.0.77.40:23054], discPort=23054, order=1, intOrder=1, lastExchangeTime=1635415317696, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], reqId=35, segment=1]
[2021-10-28 19:42:47,287][WARN ][0][0][fm.monitor.rebuild-105-1][ROOT][IgniteLoggerImp][72] Query produced big result set. [fetched=100000, duration=1260ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=alarmCache, sql='SELECT\n__Z0.MERGED __C0_0,\n__Z0.SPECIALALARMSTATUS __C0_1,\n__Z0.NATIVEMODN __C0_2,\n__Z0.SEVERITY __C0_3,\n__Z0.ACKED __C0_4,\n__Z0.CLEARED __C0_5,\n__Z0.MEDN __C0_6,\n__Z0.CSN __C0_7\nFROM "alarmCache".ALARMRECORD __Z0', plan=SELECT\n __Z0.MERGED AS __C0_0,\n __Z0.SPECIALALARMSTATUS AS __C0_1,\n __Z0.NATIVEMODN AS __C0_2,\n __Z0.SEVERITY AS __C0_3,\n __Z0.ACKED AS __C0_4,\n __Z0.CLEARED AS __C0_5,\n __Z0.MEDN AS __C0_6,\n __Z0.CSN AS __C0_7\nFROM "alarmCache".ALARMRECORD __Z0\n / "alarmCache".ALARMRECORD._SCAN /\n / scanCount: 140688 /, node=TcpDiscoveryNode [id=4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, consistentId=115.0.77.40:23054, addrs=ArrayList [115.0.77.40], sockAddrs=HashSet [EulerOS/115.0.77.40:23054], discPort=23054, order=1, intOrder=1, lastExchangeTime=1635415317696, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], reqId=35, segment=0]
[2021-10-28 19:42:47,453][WARN ][0][0][fm.monitor.rebuild-105-1][ROOT][IgniteLoggerImp][72] Query produced big result set. [fetched=140022, duration=1422ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=alarmCache, sql='SELECT\n__Z0.MERGED __C0_0,\n__Z0.SPECIALALARMSTATUS __C0_1,\n__Z0.NATIVEMODN __C0_2,\n__Z0.SEVERITY __C0_3,\n__Z0.ACKED __C0_4,\n__Z0.CLEARED __C0_5,\n__Z0.MEDN __C0_6,\n__Z0.CSN __C0_7\nFROM "alarmCache".ALARMRECORD __Z0', plan=SELECT\n __Z0.MERGED AS __C0_0,\n __Z0.SPECIALALARMSTATUS AS __C0_1,\n __Z0.NATIVEMODN AS __C0_2,\n __Z0.SEVERITY AS __C0_3,\n __Z0.ACKED AS __C0_4,\n __Z0.CLEARED AS __C0_5,\n __Z0.MEDN AS __C0_6,\n __Z0.CSN AS __C0_7\nFROM "alarmCache".ALARMRECORD __Z0\n / "alarmCache".ALARMRECORD._SCAN /\n / scanCount: 140023 /, node=TcpDiscoveryNode [id=4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, consistentId=115.0.77.40:23054, addrs=ArrayList [115.0.77.40], sockAddrs=HashSet [EulerOS/115.0.77.40:23054], discPort=23054, order=1, intOrder=1, lastExchangeTime=1635415317696, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], reqId=35, segment=1]
[2021-10-28 19:42:47,461][WARN ][0][0][fm.monitor.rebuild-105-1][ROOT][IgniteLoggerImp][72] Query produced big result set. [fetched=140687, duration=1432ms, type=MAP, distributedJoin=false, enforceJoinOrder=false, lazy=false, schema=alarmCache, sql='SELECT\n__Z0.MERGED __C0_0,\n__Z0.SPECIALALARMSTATUS __C0_1,\n__Z0.NATIVEMODN __C0_2,\n__Z0.SEVERITY __C0_3,\n__Z0.ACKED __C0_4,\n__Z0.CLEARED __C0_5,\n__Z0.MEDN __C0_6,\n__Z0.CSN __C0_7\nFROM "alarmCache".ALARMRECORD __Z0', plan=SELECT\n __Z0.MERGED AS __C0_0,\n __Z0.SPECIALALARMSTATUS AS __C0_1,\n __Z0.NATIVEMODN AS __C0_2,\n __Z0.SEVERITY AS __C0_3,\n __Z0.ACKED AS __C0_4,\n __Z0.CLEARED AS __C0_5,\n __Z0.MEDN AS __C0_6,\n __Z0.CSN AS __C0_7\nFROM "alarmCache".ALARMRECORD __Z0\n / "alarmCache".ALARMRECORD._SCAN /\n / scanCount: 140688 */, node=TcpDiscoveryNode [id=4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, consistentId=115.0.77.40:23054, addrs=ArrayList [115.0.77.40], sockAddrs=HashSet [EulerOS/115.0.77.40:23054], discPort=23054, order=1, intOrder=1, lastExchangeTime=1635415317696, loc=true, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], reqId=35, segment=0]
[2021-10-28 19:43:04,759][WARN ][0][0][jvm-pause-detector-worker][ROOT][IgniteLoggerImp][72] Possible too long JVM pause: 12688 milliseconds.
[2021-10-28 19:43:04,775][WARN ][0][0][tcp-disco-msg-worker-[crd]-#2-#55][][IgniteLoggerImp][72] Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=d5c103a9-4b8a-4430-bab1-fa63ad8066e7, consistentId=115.0.77.41:23054, addrs=ArrayList [115.0.77.41], sockAddrs=HashSet [EulerOS/115.0.77.40:23054, 115.0.77.41/115.0.77.41:23054], discPort=23054, order=0, intOrder=2, lastExchangeTime=1635421284474, loc=false, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket#492d48ec, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1635415317736, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=9d88656cc71-4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, verifierNodeId=4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]], next=TcpDiscoveryNode [id=d5c103a9-4b8a-4430-bab1-fa63ad8066e7, consistentId=115.0.77.41:23054, addrs=ArrayList [115.0.77.41], sockAddrs=HashSet [EulerOS/115.0.77.40:23054, 115.0.77.41/115.0.77.41:23054], discPort=23054, order=0, intOrder=2, lastExchangeTime=1635421284474, loc=false, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], errMsg=Failed to send message to next node [msg=TcpDiscoveryNodeAddedMessage [node=TcpDiscoveryNode [id=d5c103a9-4b8a-4430-bab1-fa63ad8066e7, consistentId=115.0.77.41:23054, addrs=ArrayList [115.0.77.41], sockAddrs=HashSet [EulerOS/115.0.77.40:23054, 115.0.77.41/115.0.77.41:23054], discPort=23054, order=0, intOrder=2, lastExchangeTime=1635421284474, loc=false, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false], dataPacket=o.a.i.spi.discovery.tcp.internal.DiscoveryDataPacket#492d48ec, discardMsgId=null, discardCustomMsgId=null, top=null, clientTop=null, gridStartTime=1635415317736, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=9d88656cc71-4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, verifierNodeId=4b9a0352-7d68-4a51-8f31-b4ae2405ee7c, topVer=0, 
pendingIdx=0, failedNodes=null, isClient=false]], next=ClusterNode [id=d5c103a9-4b8a-4430-bab1-fa63ad8066e7, order=0, addr=[115.0.77.41], daemon=false]]]
[2021-10-28 19:43:04,777][WARN ][0][0][tcp-disco-msg-worker-[crd]-#2-#55][ROOT][IgniteLoggerImp][72] Local node has detected failed nodes and started cluster-wide procedure. To speed up failure detection please see 'Failure Detection' section under javadoc for 'TcpDiscoverySpi'
[2021-10-28 19:43:04,814][WARN ][0][0][disco-event-worker-#57][][IgniteLoggerImp][72] Node FAILED: TcpDiscoveryNode [id=d5c103a9-4b8a-4430-bab1-fa63ad8066e7, consistentId=115.0.77.41:23054, addrs=ArrayList [115.0.77.41], sockAddrs=HashSet [EulerOS/115.0.77.40:23054, 115.0.77.41/115.0.77.41:23054], discPort=23054, order=2, intOrder=2, lastExchangeTime=1635421284474, loc=false, ver=2.11.0#20210911-sha1:8f3f07d3, isClient=false]
[2021-10-28 19:42:20,907][DEBUG][0][0][tcp-disco-msg-worker-[]-#2-#35][ROOT]
[IgniteLoggerImp][51] Message has been added to a worker's queue: TcpDiscoveryStatusCheckMessage [creatorNode=null, failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=04671b6cc71-d5c103a9-4b8a-4430-bab1-fa63ad8066e7, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]]
[2021-10-28 19:42:20,907][DEBUG][0][0][tcp-disco-msg-worker-[]-#2-#35][ROOT]
[IgniteLoggerImp][51] Processing message [cls=TcpDiscoveryStatusCheckMessage, id=04671b6cc71-d5c103a9-4b8a-4430-bab1-fa63ad8066e7]
[2021-10-28 19:42:20,907][DEBUG][0][0][tcp-disco-msg-worker-[]-#2-#35][ROOT]
[IgniteLoggerImp][51] Ignore message, local node order is not initialized
[msg=TcpDiscoveryStatusCheckMessage [creatorNode=null, failedNodeId=null, status=0, super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=04671b6cc71-d5c103a9-4b8a-4430-bab1-fa63ad8066e7, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=false]],
locNode=Tcp
We upgraded a bunch of libraries in our application, one of them being Ignite. Since then, the Ignite node running in client mode has been crashing. My guess is that one of the upgrades caused the cache to grow in size (so I don't think the Ignite upgrade itself is the problem).
So I increased the heap size from 10 GB to 20 GB. But when about 50% of it is used, the JVM hangs.
I'm confused as to why it does this when only 50% is in use.
[12/3/20 16:07:58:788 GMT] 000000c4 IgniteKernal I .... Heap [used=9937MB, free=51.48%, comm=10680MB]
followed by
[12/3/20 16:08:26:410 GMT] 000000bd IgniteKernal W Possible too long JVM pause: 2418 milliseconds.
[12/3/20 16:08:27:465 GMT] 000000c5 TcpCommunicat W Client disconnected abruptly due to network connection loss or because the connection was left open on application shutdown. [cls=class o.a.i.i.util.nio.GridNioException, msg=Connection reset by peer]
[12/3/20 16:08:27:411 GMT] 000000c5 TcpCommunicat E Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=48849402273, bytesSent=15994664546, bytesRcvd0=54446, bytesSent0=102, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=null, finished=false, heartbeatTs=1607011706410, hashCode=433635054, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#51]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=9025120, resendCnt=0, rcvCnt=9025150, sentCnt=9025152, reserved=true, lastAck=9025120, nodeLeft=false, node=TcpDiscoveryNode [id=b3ca311e-077f-42a5-884a-807b539730b6, consistentId=10.60.46.12:48500, addrs=ArrayList [10.60.46.12], sockAddrs=HashSet [hex-wgc-p-web02/10.60.46.12:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1607006097079, loc=false, ver=2.9.0#20201015-sha1:70742da8, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=9025120, resendCnt=0, rcvCnt=9025150, sentCnt=9025152, reserved=true, lastAck=9025120, nodeLeft=false, node=TcpDiscoveryNode [id=b3ca311e-077f-42a5-884a-807b539730b6, consistentId=10.60.46.12:48500, addrs=ArrayList [10.60.46.12], sockAddrs=HashSet [hex-wgc-p-web02/10.60.46.12:48500], discPort=48500, order=1, intOrder=1, lastExchangeTime=1607006097079, loc=false, ver=2.9.0#20201015-sha1:70742da8, isClient=false], connected=false, connectCnt=1, queueLimit=4096, reserveCnt=1, pairedConnections=false], closeSocket=true, outboundMessagesQueueSizeMetric=o.a.i.i.processors.metric.impl.LongAdderMetric#69a257d1, super=GridNioSessionImpl [locAddr=/10.223.132.3:52550, rmtAddr=/10.60.46.12:48100, createTime=1607006097572, closeTime=0, bytesSent=15994657850, bytesRcvd=48849402273, bytesSent0=102, 
bytesRcvd0=54446, sndSchedTime=1607006097572, lastSndTime=1607011706410, lastRcvTime=1607011706410, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#93200255, directMode=true], GridConnectionBytesVerifyFilter], accepted=false, markedForClose=false]]]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:51)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:235)
at sun.nio.ch.IOUtil.read(IOUtil.java:204)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:394)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processRead(GridNioServer.java:1330)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2472)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2239)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1880)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:822)
[12/3/20 16:08:44:437 GMT] 000000c4 SystemOut O [16:08:44] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]]]
[12/3/20 16:08:44:436 GMT] 000000c4 W java.util.logging.LogManager$RootLogger log Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]]]
class org.apache.ignite.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011706420]
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1806)
at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance$3.apply(IgnitionEx.java:1801)
at org.apache.ignite.internal.worker.WorkersRegistry.onIdle(WorkersRegistry.java:234)
at org.apache.ignite.internal.util.worker.GridWorker.onIdle(GridWorker.java:297)
at org.apache.ignite.internal.processors.timeout.GridTimeoutProcessor$TimeoutWorker.body(GridTimeoutProcessor.java:221)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:822)
[12/3/20 16:08:44:434 GMT] 000000c4 G W Thread [name="tcp-comm-worker-#1-#63", id=211, state=WAITING, blockCnt=2, waitCnt=100]
[12/3/20 16:08:44:432 GMT] 000000c4 G E Blocked system-critical thread has been detected. This can lead to cluster-wide undefined behaviour [workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#63, blockedFor=18s]
[12/3/20 16:09:14:486 GMT] 000000c4 SystemOut O [16:09:14] Possible failure suppressed accordingly to a configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=UnmodifiableSet [SYSTEM_WORKER_BLOCKED, SYSTEM_CRITICAL_OPERATION_TIMEOUT]]], failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class o.a.i.IgniteException: GridWorker [name=tcp-comm-worker, igniteInstanceName=null, finished=false, heartbeatTs=1607011736000]]]
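A minimal sketch for reading the same heap numbers from inside the JVM, which may help correlate heap usage and GC activity with the "Possible too long JVM pause" warnings. It assumes the free= value in the Heap log line is computed against the maximum heap; that assumption is at least consistent with the numbers above (9937 MB used out of a 20 GB = 20480 MB heap leaves about 51.48% free). The class name HeapReport and the helper freePercent are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

class HeapReport {
    /** Percentage of the maximum heap that is still free (assumed meaning of "free=" in the log). */
    static double freePercent(long usedMb, long maxMb) {
        return 100.0 * (maxMb - usedMb) / maxMb;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long used = heap.getUsed() / mb;
        long comm = heap.getCommitted() / mb;
        long max = heap.getMax() / mb; // getMax() returns -1 when the maximum is undefined

        System.out.println("used=" + used + "MB, comm=" + comm + "MB, max=" + max + "MB");
        if (max > 0) {
            System.out.printf("free=%.2f%%%n", freePercent(used, max));
        }

        // Cumulative GC counts and times; a large jump between two samples
        // around a pause warning points at garbage collection.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": collections=" + gc.getCollectionCount()
                + ", timeMs=" + gc.getCollectionTime());
        }
    }
}
```

With the values from the log, freePercent(9937, 20480) reproduces the reported free percentage, so "50% in use" here is measured against the full 20 GB rather than the committed 10680 MB.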
These look like network issues.
[12/3/20 16:08:27:411 GMT] 000000c5 TcpCommunicat E Failed to process selector key
[workerName=tcp-comm-worker, threadName=tcp-comm-worker-#1-#63, blockedFor=18s]
Check that you are able to connect from your client machines to your server machines and that firewall configs are properly set up.
See: https://ignite.apache.org/docs/latest/clustering/network-configuration
Make sure you've set -Djava.net.preferIPv4Stack=true if you are using IPv4 addresses.
If there are containers and/or private addresses involved, it might cause connection issues.
See: https://ignite.apache.org/docs/latest/clustering/running-client-nodes-behind-nat#limitations
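The connectivity check suggested above can be sketched in plain Java, as a rough first test rather than a full diagnosis. The host and ports below are taken from the log (discovery port 48500 and communication port 48100 on 10.60.46.12); the class name PortCheck and the method canConnect are hypothetical. Note that this only tests the client-to-server direction; as far as I know, server nodes may also need to open connections back to the client, which is what the NAT limitations page is about.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    /** Returns true if a TCP connection to host:port can be opened within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, unreachable, or unresolved host
            return false;
        }
    }

    public static void main(String[] args) {
        // Server address and ports as they appear in the log above; adjust for your cluster.
        String host = args.length > 0 ? args[0] : "10.60.46.12";
        int[] ports = {48500, 48100};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable=" + canConnect(host, port, 2000));
        }
    }
}
```

Run it on the client machine; if either port reports reachable=false, a firewall or address-binding problem is a more likely suspect than heap size.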
I want to connect to Apache Ignite from Node.js through its Redis interface.
In the Ignite config, I added the XML definition below to config-default.xml:
<property name="connectorConfiguration">
    <bean class="org.apache.ignite.configuration.ConnectorConfiguration">
        <property name="host" value="localhost"/>
        <property name="port" value="6379"/>
    </bean>
</property>
Then I ran ignite.bat (test platform: Windows 7, 64-bit).
The console output looks fine:
[15:59:09] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[15:59:09]
[15:59:09] Ignite node started OK (id=b58f9f35)
[15:59:09] Topology snapshot [ver=1, servers=1, clients=0, CPUs=4, heap=1.0GB]
But when I test Node.js code such as:
var redis = require("redis");
var client = redis.createClient({detect_buffers: true});
client.get("test", function (err, reply) {
    if (err == null) {
        console.log('reply:' + reply);
    }
    else {
        console.log('error:' + err);
    }
});
client.quit();
I get an error message in Node.js:
events.js:183
throw er; // Unhandled 'error' event
^
AbortError: Ready check failed: Stream connection ended and command aborted. It might have been processed.
at RedisClient.flush_and_error (d:\APP\nodejs\mcanserver\node_modules\redis\index.js:362:23)
at RedisClient.connection_gone (d:\APP\nodejs\mcanserver\node_modules\redis\index.js:597:14)
at Socket.<anonymous> (d:\APP\nodejs\mcanserver\node_modules\redis\index.js:293:14)
at Object.onceWrapper (events.js:313:30)
at emitNone (events.js:111:20)
at Socket.emit (events.js:208:7)
at endReadableNT (_stream_readable.js:1064:12)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
Waiting for the debugger to disconnect...
AbortError
and these errors in the Ignite console:
[16:08:59,809][SEVERE][grid-nio-worker-tcp-rest-0-#36][GridTcpRestProtocol] Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=ByteBufferNioClientWorker [readBuf=java.nio.HeapByteBuffer[pos=14 lim=14 cap=8192], super=AbstractNioClientWorker [idx=0, bytesRcvd=0, bytesSent=0, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-rest-0, igniteInstanceName=null, finished=false, hashCode=1424564915, interrupted=false, runner=grid-nio-worker-tcp-rest-0-#36]]], writeBuf=null, readBuf=null, inRecovery=null, outRecovery=null, super=GridNioSessionImpl [locAddr=/127.0.0.1:6379, rmtAddr=/127.0.0.1:18970, createTime=1527667738762, closeTime=0, bytesSent=0, bytesRcvd=14, bytesSent0=0, bytesRcvd0=14, sndSchedTime=1527667738762, lastSndTime=1527667738762, lastRcvTime=1527667738762, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=GridTcpRestParser [jdkMarshaller=JdkMarshaller [], routerClient=false], directMode=false]], accepted=true]]]
java.lang.IllegalArgumentException: No enum constant org.apache.ignite.internal.processors.rest.protocols.tcp.redis.GridRedisCommand.INFO
at java.lang.Enum.valueOf(Unknown Source)
at org.apache.ignite.internal.processors.rest.protocols.tcp.redis.GridRedisCommand.valueOf(GridRedisCommand.java:26)
at org.apache.ignite.internal.processors.rest.protocols.tcp.redis.GridRedisMessage.command(GridRedisMessage.java:124)
at org.apache.ignite.internal.processors.rest.protocols.tcp.redis.GridRedisNioListener.onMessage(GridRedisNioListener.java:132)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener.onMessage(GridTcpRestNioListener.java:193)
at org.apache.ignite.internal.processors.rest.protocols.tcp.GridTcpRestNioListener.onMessage(GridTcpRestNioListener.java:94)
at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onMessageReceived(GridNioFilterChain.java:279)
at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedMessageReceived(GridNioFilterAdapter.java:109)
I need help understanding what is happening and how to fix this error. Thanks.
I'm afraid your client uses the INFO command, which isn't implemented by Apache Ignite. Is it possible to try an alternative client, or maybe a memcached client? Note that the memcached protocol has binary and text forms, only one of which is supported.
Just ran into the same issue with the Node.js Redis client. Indeed, the INFO operation is not supported by Apache Ignite, and the Node Redis client uses it to check whether Redis is ready to serve queries. A workaround is to turn the ready check off:
redis.createClient({no_ready_check: true});
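To see what the ready check puts on the wire, here is a small Java sketch of the Redis wire format (RESP), in which a command is sent as an array of bulk strings. The class name RespEncoder is hypothetical, and the assumption that the client sends a lowercase "info" as its first command is mine. It is at least consistent with the Ignite log above, where the session had received exactly 14 bytes before the parser failed on INFO.

```java
import java.nio.charset.StandardCharsets;

class RespEncoder {
    /** Encodes a command and its arguments as a RESP array of bulk strings. */
    static byte[] encode(String... parts) {
        StringBuilder sb = new StringBuilder();
        // Array header: '*' followed by the number of elements
        sb.append('*').append(parts.length).append("\r\n");
        for (String part : parts) {
            // Bulk string: '$' followed by the byte length, then the payload
            int len = part.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(len).append("\r\n").append(part).append("\r\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The ready-check command (assumed to be sent as lowercase "info")
        System.out.println("info -> " + encode("info").length + " bytes");
        // The GET from the question, one of the commands Ignite does handle
        System.out.println("GET test -> " + encode("GET", "test").length + " bytes");
    }
}
```

The encoded "info" command is 14 bytes long, the same count as bytesRcvd=14 in the SEVERE log entry, which suggests the connection was dropped on the very first command, before the GET was ever sent.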