Apache Ignite node unable to join the cluster

I'm using the apacheignite:2.5.0 Docker image deployed on 2 different
EC2 instances with the static IP finder; the config file is below. One of the nodes is unable to join the cluster. I have attached the logs
below: the other node accepts the connection and then disconnects. I ran the Docker containers with --net=host so each container binds its ports directly on the host machine, and all ports are open in the security group.
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:util="http://www.springframework.org/schema/util"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/util
           http://www.springframework.org/schema/util/spring-util.xsd">
    <bean abstract="false" id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
        <property name="discoverySpi">
            <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
                <property name="ipFinder">
                    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                        <property name="addresses">
                            <list>
                                <value>34.241.10.9:47500</value>
                            </list>
                        </property>
                    </bean>
                </property>
            </bean>
        </property>
    </bean>
</beans>
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.18.0.1, 172.31.29.3], sockAddrs=[/172.31.29.3:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.18.0.1:47500], discPort=47500, order=312, intOrder=157, lastExchangeTime=1529067545288, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] Topology snapshot [ver=312, servers=2, clients=0, CPUs=6, offheap=3.8GB, heap=2.0GB]
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] Data Regions Configured:
[12:59:25,309][INFO][disco-event-worker-#37][GridDiscoveryManager] ^-- default [initSize=256.0 MiB, maxSize=710.0 MiB, persistenceEnabled=false]
[12:59:25,309][INFO][exchange-worker-#38][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], crd=true, evt=NODE_JOINED, evtNode=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, customEvt=null, allowMerge=true]
[12:59:25,309][WARNING][disco-event-worker-#37][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.18.0.1, 172.31.29.3], sockAddrs=[/172.31.29.3:47500, /172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.18.0.1:47500], discPort=47500, order=312, intOrder=157, lastExchangeTime=1529067545288, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[12:59:25,310][INFO][exchange-worker-#38][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], waitTime=0ms, futInfo=NA]
[12:59:25,310][INFO][exchange-worker-#38][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], crd=true]
[12:59:25,310][INFO][disco-event-worker-#37][GridDiscoveryManager] Topology snapshot [ver=313, servers=1, clients=0, CPUs=2, offheap=0.69GB, heap=1.0GB]
[12:59:25,310][INFO][disco-event-worker-#37][GridDiscoveryManager] Data Regions Configured:
[12:59:25,310][INFO][disco-event-worker-#37][GridDiscoveryManager] ^-- default [initSize=256.0 MiB, maxSize=710.0 MiB, persistenceEnabled=false]
[12:59:25,310][INFO][disco-event-worker-#37][GridDhtPartitionsExchangeFuture] Coordinator received all messages, try merge [ver=AffinityTopologyVersion [topVer=312, minorTopVer=0]]
[12:59:25,311][INFO][disco-event-worker-#37][GridCachePartitionExchangeManager] Merge exchange future [curFut=AffinityTopologyVersion [topVer=312, minorTopVer=0], mergedFut=AffinityTopologyVersion [topVer=313, minorTopVer=0], evt=NODE_FAILED, evtNode=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15, evtNodeClient=false]
[12:59:25,311][INFO][disco-event-worker-#37][GridDhtPartitionsExchangeFuture] finishExchangeOnCoordinator [topVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=313, minorTopVer=0]]
[12:59:25,311][INFO][disco-event-worker-#37][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=312, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=313, minorTopVer=0], err=null]
[12:59:25,312][INFO][exchange-worker-#38][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=313, minorTopVer=0], evt=NODE_JOINED, node=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15]
[12:59:25,315][INFO][grid-timeout-worker-#23][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=225f750c, uptime=01:42:00.504]
^-- H/N/C [hosts=1, nodes=1, CPUs=2]
^-- CPU [cur=0.17%, avg=0.4%, GC=0%]
^-- PageMemory [pages=200]
^-- Heap [used=73MB, free=92.47%, comm=981MB]
^-- Non heap [used=53MB, free=96.47%, comm=55MB]
^-- Outbound messages queue [size=0]
^-- Public thread pool [active=0, idle=6, qSize=0]
^-- System thread pool [active=0, idle=8, qSize=0]
[12:59:25,320][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/34.241.7.9, rmtPort=53627]
[12:59:25,320][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/34.241.7.9, rmtPort=53627]
[12:59:25,320][INFO][tcp-disco-sock-reader-#628][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/34.241.7.9:53627, rmtPort=53627]
[12:59:25,325][INFO][tcp-disco-sock-reader-#628][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/34.241.7.9:53627, rmtPort=53627]
[12:59:30,332][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/34.241.7.9, rmtPort=50418]
[12:59:30,332][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/34.241.7.9, rmtPort=50418]
[12:59:30,332][INFO][tcp-disco-sock-reader-#629][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/34.241.7.9:50418, rmtPort=50418]
[12:59:30,334][INFO][tcp-disco-sock-reader-#629][TcpDiscoverySpi] Finished
Logs from the second Ignite node:
[12:13:12,850][INFO][main][TcpCommunicationSpi] Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[12:13:12,869][WARNING][main][TcpCommunicationSpi] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[12:13:12,888][WARNING][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[12:13:12,918][WARNING][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[12:13:12,919][INFO][main][IgniteKernal] Security status [authentication=off, tls/ssl=off]
[12:13:13,275][INFO][main][ClientListenerProcessor] Client connector processor has started on TCP port 10800
[12:13:13,328][INFO][main][GridTcpRestProtocol] Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[12:13:13,369][INFO][main][IgniteKernal] Non-loopback local IPs: 172.17.0.1, 172.18.0.1, 172.31.29.3, fe80:0:0:0:10f0:92ff:fea1:d09f%vethee2519f, fe80:0:0:0:42:19ff:fe73:ee80%docker_gwbridge, fe80:0:0:0:42:e6ff:fe14:144a%docker0, fe80:0:0:0:4b3:6ff:fe01:7ee0%eth0, fe80:0:0:0:64f4:8bff:fe83:7e97%vethdae9948, fe80:0:0:0:9474:a1ff:fe6b:3368%vethcb2500f
[12:13:13,370][INFO][main][IgniteKernal] Enabled local MACs: 02421973EE80, 0242E614144A, 06B306017EE0, 12F092A1D09F, 66F48B837E97, 9674A16B3368
[12:13:13,429][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=07b55edb-cdb7-45eb-bfd6-36fe9c5f5f15]
[12:13:18,555][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:18:20,925][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:23:22,710][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:28:23,988][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:33:25,004][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:38:25,815][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:43:26,831][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]
[12:48:27,916][WARNING][main][TcpDiscoverySpi] Node has not been connected to topology and will repeat join process. Check remote nodes logs for possible error messages. Note that large topology may require significant time to start. Increase 'TcpDiscoverySpi.networkTimeout' configuration property if getting this message on the starting nodes [networkTimeout=5000]

If you are using the same config file to start both nodes, try setting localPortRange on the DiscoverySpi.
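A sketch of what that might look like in the Spring XML above; the 10-port range is an arbitrary example, adjust to your setup:

```xml
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <!-- Scan a range of discovery ports instead of a single port -->
    <property name="localPort" value="47500"/>
    <property name="localPortRange" value="10"/>
    <property name="ipFinder">
        <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
                <list>
                    <!-- Cover the whole port range on the remote host -->
                    <value>34.241.10.9:47500..47509</value>
                </list>
            </property>
        </bean>
    </property>
</bean>
```

Make sure the whole port range is also open in the security group.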

Related

Remote Apache Ignite cluster connection failures

I can successfully join and leave a single-node Apache Ignite 2.8.1 topology running as a Docker container on my local Docker server.
Running the exact same program against a remote Docker server, I can see my program join the cluster topology, but before the connection completes I get the following connection error:
SEVERE: Failed to send message to remote node [node=TcpDiscoveryNode [id=a239f009-bddd-4a06-845f-abb304850849, consistentId=127.0.0.1,172.17.0.13:42002, addrs=ArrayList [127.0.0.1, 172.17.0.13], sockAddrs=HashSet [/172.17.0.13:42002, /127.0.0.1:42002], discPort=42002, order=1, intOrder=1, lastExchangeTime=1605015503009, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtPartitionsSingleMessage [parts=null, partCntrs=null, partsSizes=null, partHistCntrs=null, err=null, client=true, exchangeStartTime=106333448635300, finishMsg=null, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=dc9a3700-5377-4095-ac2b-31a2cea3d9a5, consistentId=dc9a3700-5377-4095-ac2b-31a2cea3d9a5, addrs=ArrayList [0:0:0:0:0:0:0:1, 10.91.7.30, 127.0.0.1, 192.168.1.81, 192.168.38.1], sockAddrs=HashSet [host.docker.internal/192.168.1.81:0, /0:0:0:0:0:0:0:1:0, GBLG7Y7GH2.mshome.net/192.168.38.1:0, /127.0.0.1:0, GBLG7Y7GH2.enterprisenet.org/10.91.7.30:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1605015498538, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=true], topVer=2, nodeId8=dc9a3700, msg=null, type=NODE_JOINED, tstamp=1605015505481], nodeId=dc9a3700, evt=NODE_JOINED], lastVer=GridCacheVersion [topVer=0, order=1605015496511, nodeOrder=0], super=GridCacheMessage [msgId=1, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], err=null, skipPrepare=false]]]]]
class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a239f009-bddd-4a06-845f-abb304850849, addrs=[/172.17.0.13:42003, /127.0.0.1:42003]]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3738)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendLocalPartitions(GridDhtPartitionsExchangeFuture.java:2020)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.clientOnlyExchange(GridDhtPartitionsExchangeFuture.java:1436)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:903)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3214)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a239f009-bddd-4a06-845f-abb304850849, addrs=[/172.17.0.13:42003, /127.0.0.1:42003]]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3740)
... 15 more
Caused by: java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3584)
... 15 more
Caused by: java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3584)
... 15 more
In my view the problem relates to the client connection settings, so I tried increasing the client discovery SPI "joinTimeout", "networkTimeout" and "socketTimeout" settings, as well as the "connectionTimeout" and "socketWriteTimeout" settings, but without success.
You have to set up an AddressResolver for the node running inside the remote Docker container.
Have a look at: https://www.gridgain.com/docs/latest/installation-guide/aws/manual-install-on-ec2#connecting-a-client-node
If you're using Spring configuration, your config should look something like this:
<property name="addressResolver">
    <bean class="org.apache.ignite.configuration.BasicAddressResolver">
        <constructor-arg>
            <map>
                <entry key="172.31.59.27" value="3.93.186.198"/>
            </map>
        </constructor-arg>
    </bean>
</property>
<!-- other properties -->
<!-- Discovery configuration -->
</bean>
Here 172.31.59.27 is the internal IP and 3.93.186.198 is the external IP that you connect to.
Did you open the 47500 (discovery) and 47100 (communication) ports both ways between your Docker host and the remote node?

Apache Ignite Topology Snapshot refresh (.NET)

I start an Apache Ignite server node and a client node.
My scenario is: when the client node is closed, how can the server node's topology snapshot be updated at the same time?
Currently, the topology snapshot is refreshed only when the NodeFailed event is received by the server, about 20 seconds later.
What method or configuration on the server side allows it to receive the NodeFailed event immediately, or refresh the topology snapshot sooner?
This is server log:
[09:08:50,522][WARNING][disco-event-worker-#45%ignite-instance-f69c161b-9f38-4576-b52b-ef3077ba3156%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=5f346db2-50fd-4d83-b518-a09690569274, consistentId=5f346db2-50fd-4d83-b518-a09690569274, addrs=ArrayList [0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.40.1, 192.168.50.135, 192.168.65.1], sockAddrs=HashSet [DESKTOP-1BLUS7R/192.168.40.1:0, /[0:0:0:0:0:0:0:1]:0, /127.0.0.1:0, /192.168.65.1:0, /192.168.50.135:0], discPort=0, order=3, intOrder=3, lastExchangeTime=1602810475243, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=true]
[09:08:50,525][INFO][disco-event-worker-#45%ignite-instance-f69c161b-9f38-4576-b52b-ef3077ba3156%][GridDiscoveryManager] Topology snapshot [ver=5, locNode=f6d3f760, servers=1, clients=0, state=ACTIVE, CPUs=6, offheap=1.5GB, heap=2.0GB]
[09:08:50,525][INFO][disco-event-worker-#45%ignite-instance-f69c161b-9f38-4576-b52b-ef3077ba3156%][GridDiscoveryManager] ^-- Baseline [id=0, size=1, online=1, offline=0]
You can reduce the server node's ClientFailureDetectionTimeout property to make the server check client nodes more frequently. The default is 30 seconds.
//
// Summary:
//     Gets or sets the failure detection timeout used by Apache.Ignite.Core.Discovery.Tcp.TcpDiscoverySpi
//     and Apache.Ignite.Core.Communication.Tcp.TcpCommunicationSpi for client nodes.
[DefaultValue(typeof(TimeSpan), "00:00:30")]
public TimeSpan ClientFailureDetectionTimeout { get; set; }
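A minimal sketch of lowering that timeout from .NET using the property above; the 10-second value is an arbitrary example, tune it for your network:

```csharp
using System;
using Apache.Ignite.Core;

class FastClientFailureDetection
{
    static void Main()
    {
        var cfg = new IgniteConfiguration
        {
            // Treat unreachable client nodes as failed after 10 s
            // instead of the 30 s default.
            ClientFailureDetectionTimeout = TimeSpan.FromSeconds(10)
        };

        using (var ignite = Ignition.Start(cfg))
        {
            // The server now fires NodeFailed, and refreshes the
            // topology snapshot, sooner after a client disappears.
        }
    }
}
```

Note that an aggressive timeout makes the cluster more sensitive to GC pauses and network hiccups on client nodes.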

Ignite startup error on TomEE (tomee-webprofile-7.1.0)

Failed to unmarshal discovery data for component: 1
class org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: TomEEWebappClassLoader
context: chinawork
delegate: false
[16:27:05] ver. 2.7.0#20181201-sha1:256ae401
[16:27:05] 2018 Copyright(C) Apache Software Foundation
[16:27:05]
[16:27:05] Ignite documentation: http://ignite.apache.org
[16:27:05]
[16:27:05] Quiet mode.
[16:27:05] ^-- Logging by 'JavaLogger [quiet=true, config=null]'
[16:27:05] ^-- To see **FULL** console log here add -DIGNITE_QUIET=false or "-v" to ignite.{sh|bat}
[16:27:05]
[16:27:05] OS: Windows 10 10.0 amd64
[16:27:05] VM information: Java(TM) SE Runtime Environment 1.8.0_152-b16 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.152-b16
[16:27:05] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[16:27:05] Initial heap size is 126MB (should be no less than 512MB, use -Xms512m -Xmx512m).
[16:27:05] Configured plugins:
[16:27:05] ^-- None
[16:27:05]
[16:27:05] Configured failure handler: [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]]]
[16:27:06] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[16:27:06] Security status [authentication=off, tls/ssl=off]
[16:27:07] REST protocols do not start on client node. To start the protocols on client node set '-DIGNITE_REST_START_ON_CLIENT=true' system property.
December 11, 2018 4:27:13 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to unmarshal discovery data for component: 1
class org.apache.ignite.IgniteCheckedException: Failed to deserialize object with given class loader: TomEEWebappClassLoader
context: cnf-soa
delegate: false
----------> Parent Classloader:
java.net.URLClassLoader#f6f4d33
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:147)
at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:161)
at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
at org.apache.ignite.spi.discovery.tcp.internal.DiscoveryDataPacket.unmarshalData(DiscoveryDataPacket.java:280)
at org.apache.ignite.spi.discovery.tcp.internal.DiscoveryDataPacket.unmarshalGridData(DiscoveryDataPacket.java:123)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.onExchange(TcpDiscoverySpi.java:2006)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processNodeAddFinishedMessage(ClientImpl.java:2181)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.processDiscoveryMessage(ClientImpl.java:2060)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1905)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$1.body(ClientImpl.java:304)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.io.InvalidClassException: javax.cache.configuration.MutableConfiguration; local class incompatible: stream classdesc serialVersionUID = 201306200821, local class serialVersionUID = 201405
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1880)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2037)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at java.util.HashMap.readObject(HashMap.java:1409)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2173)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:139)
... 12 more
[16:27:14] Performance suggestions for grid 'igniteCosco' (fix if possible)
[16:27:14] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[16:27:14] ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
[16:27:14] ^-- Specify JVM heap max size (add '-Xmx<size>[g|G|m|M|k|K]' to JVM options)
[16:27:14] ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[16:27:14] ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
[16:27:14] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
[16:27:14]
[16:27:14] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[16:27:14]
[16:27:14] Ignite node started OK (id=9d93bb08, instance name=igniteCosco)
[16:27:14] Topology snapshot [ver=2, locNode=9d93bb08, servers=1, clients=1, state=ACTIVE, CPUs=8, offheap=3.1GB, heap=7.1GB]
December 11, 2018 4:27:15 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to send message: TcpDiscoveryClientMetricsUpdateMessage [super=TcpDiscoveryAbstractMessage [sndNodeId=null, id=1ac606c9761-9d93bb08-2ba3-4234-807b-941605b3597b, verifierNodeId=null, topVer=0, pendingIdx=0, failedNodes=null, isClient=true]]
java.net.SocketException: Socket is closed
at java.net.Socket.getSendBufferSize(Socket.java:1215)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.socketStream(TcpDiscoverySpi.java:1480)
at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.writeToSocket(TcpDiscoverySpi.java:1606)
at org.apache.ignite.spi.discovery.tcp.ClientImpl$SocketWriter.body(ClientImpl.java:1362)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
December 11, 2018 4:27:25 PM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Failed to reconnect to cluster (consider increasing 'networkTimeout' configuration property) [networkTimeout=5000]
2018-12-11 16:27:25.768 [localhost-startStop-1] ERROR cjf.web.CommonServlet - Failed to load initialization resource file [/cjf/config/cjfinit.properties].
javax.cache.CacheException: class org.apache.ignite.IgniteClientDisconnectedException: Failed to execute dynamic cache change request, client node disconnected.
at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:3310)
at cjf.init.InitIgniteCache.intercept(InitIgniteCache.java:148)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.init.CjfClusterInterceptor.intercept(CjfClusterInterceptor.java:37)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.init.CjfMailInterceptor.intercept(CjfMailInterceptor.java:34)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.init.InitSsoInterceptor.intercept(InitSsoInterceptor.java:52)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.init.InitServletInterceptor.intercept(InitServletInterceptor.java:33)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.init.InitCjfInterceptor.intercept(InitCjfInterceptor.java:50)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.init.SysCacheInterceptor.intercept(SysCacheInterceptor.java:129)
at cjf.common.responsibility.DefaultActionInvocation.invoke(DefaultActionInvocation.java:26)
at cjf.web.CommonServlet.initCaches(CommonServlet.java:111)
at cjf.web.CommonServlet.init(CommonServlet.java:58)
at javax.servlet.GenericServlet.init(GenericServlet.java:158)
at org.apache.catalina.core.StandardWrapper.initServlet(StandardWrapper.java:1144)
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1091)
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:983)
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4978)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5290)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:1140)
at org.apache.catalina.startup.HostConfig$DeployDirectory.run(HostConfig.java:1875)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ignite.IgniteClientDisconnectedException: Failed to execute dynamic cache change request, client node disconnected.
at org.apache.ignite.internal.util.IgniteUtils$15.apply(IgniteUtils.java:948)
at org.apache.ignite.internal.util.IgniteUtils$15.apply(IgniteUtils.java:944)
... 35 common frames omitted
Caused by: org.apache.ignite.internal.IgniteClientDisconnectedCheckedException: Failed to execute dynamic cache change request, client node disconnected.
at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onDisconnected(GridCacheProcessor.java:1173)
at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3949)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:821)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.lambda$onDiscovery$0(GridDiscoveryManager.java:604)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body0(GridDiscoveryManager.java:2667)
at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$DiscoveryMessageNotifierWorker.body(GridDiscoveryManager.java:2705)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
... 1 common frames omitted
IgniteConfiguration:
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="clientMode" value="true"/>
    <property name="igniteInstanceName" value="igniteTest"/>
    <property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                    <property name="addresses">
                        <list>
                            <value>127.0.0.1:47500..47510</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
    </property>
</bean>
TomEE's lib/javaee-api-7.0-1.jar contains javax.cache version 1.1, while Ignite depends on javax.cache 1.0.
You need to eliminate this dependency conflict. It makes sense to exclude javax.cache by setting openejb.classloader.forced-skip=javax.cache in system.properties.
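That suggestion, as it might appear in TomEE's conf/system.properties (the file path is assumed from the standard TomEE layout):

```properties
# Skip TomEE's bundled javax.cache 1.1 classes so the webapp
# sees the javax.cache 1.0 API that Ignite depends on.
openejb.classloader.forced-skip = javax.cache
```

Restart TomEE after changing the file so the classloader settings take effect.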
It looks like you have put some type into the Discovery data which is not present on the other nodes.
I can see that you have "local class incompatible". Is it possible that you have javax-cache 1.0 on one node but javax-cache 1.1 on another? That could cause the problem you are observing.

Connect to Ignite server with public and private ip

I tried to connect my Ignite client A (running in the Eclipse IDE) to a remote Ignite server B running in a different network (an OpenStack VM). B has a public ("floating") IP, like 193.224.x.x, and a private IP, 192.168.0.4 (not visible from A).
In A, I set the public IP of B to connect to in Java (IgniteConfiguration -> TcpDiscoverySpi.setIpFinder -> TcpDiscoveryVmIpFinder -> setAddresses(Arrays.asList("193.224.x.x"))). Port 47500 (and some other ports Ignite needs) is open on B to everyone.
When I start the client, I get an exception after a while:
SEVERE: Failed to reinitialize local partitions (preloading will be stopped): GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4a4a9c63-b3e6-4191-a966-6fe86071c7d5, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.100], sockAddrs=[/192.168.1.100:0, /0:0:0:0:0:0:0:1:0, /127.0.0.1:0], discPort=0, order=6, intOrder=0, lastExchangeTime=1530529560836, loc=true, ver=2.5.0#20180523-sha1:86e110c7, isClient=true], topVer=6, nodeId8=4a4a9c63, msg=null, type=NODE_JOINED, tstamp=1530529560973], nodeId=4a4a9c63, evt=NODE_JOINED]
class org.apache.ignite.IgniteCheckedException: Failed to send message (node may have left the grid or TCP connection cannot be established due to firewall issues) [node=TcpDiscoveryNode [id=d5828cee-0bbb-45e8-ba55-c34c1e68f165, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 192.168.0.4], sockAddrs=[/192.168.0.4:47500, /172.17.0.1:47500, 0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1530529560939, loc=false, ver=2.5.0#20180523-sha1:86e110c7, isClient=false], topic=TOPIC_CACHE, msg=GridDhtPartitionsSingleMessage [parts=null, partCntrs=null, partSizes=null, partHistCntrs=null, err=null, client=true, compress=true, finishMsg=null, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=6, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4a4a9c63-b3e6-4191-a966-6fe86071c7d5, addrs=[0:0:0:0:0:0:0:1, 127.0.0.1, 192.168.1.100], sockAddrs=[/192.168.1.100:0, /0:0:0:0:0:0:0:1:0, /127.0.0.1:0], discPort=0, order=6, intOrder=0, lastExchangeTime=1530529560836, loc=true, ver=2.5.0#20180523-sha1:86e110c7, isClient=true], topVer=6, nodeId8=4a4a9c63, msg=null, type=NODE_JOINED, tstamp=1530529560973], nodeId=4a4a9c63, evt=NODE_JOINED], lastVer=GridCacheVersion [topVer=0, order=1530529560661, nodeOrder=0], super=GridCacheMessage [msgId=1, depInfo=null, err=null, skipPrepare=false]]], policy=2]
I see signs that the client actually connects to the server for a moment (Topology snapshot [ver=6, servers=1, clients=1, CPUs=8,) but after that it is disconnected (or something else happens). From the exception it seems the client tries to connect to sockAddrs=[/192.168.0.4:47500..., which fails, instead of 193.224.x.x:47500.
I tried what I found to let B know its external IP in the config file, but none of the following worked:
<property name="addressResolver">
    <bean class="org.apache.ignite.configuration.BasicAddressResolver">
        <constructor-arg>
            <map>
                <entry key="192.168.0.4" value="193.224.x.x"/>
            </map>
        </constructor-arg>
    </bean>
</property>
nor
<bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
    <property name="localAddress" value="193.224.x.x"/>
</bean>
nor
<property name="discoverySpi">
    <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="localAddress" value="193.224.x.x"/>
    </bean>
</property>
I have no more ideas how to fix it. The Ignite docs are very brief regarding this clustering configuration.
It looks like Discovery works for you but Communication fails.
You can try supplying your own TcpCommunicationSpi to IgniteConfiguration, setting localAddress on it to 193.224.x.x on the server node. However, this will likely cause all node-to-node traffic to travel over the external network.
You can also try setting localAddress to 193.224.x.x (or another external address) on node A, to make sure it doesn't bind to its own 192.168.x.x network that isn't shared with B, while leaving the configuration on B intact.
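A minimal sketch of the first suggestion, assuming B's private address is 192.168.0.4 and its public address is 193.224.x.x as in the question:

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="communicationSpi">
        <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
            <!-- Advertise the external address so the remote client can reach Communication;
                 as noted above, this routes node-to-node traffic over the external network -->
            <property name="localAddress" value="193.224.x.x"/>
        </bean>
    </property>
</bean>
```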

Apache ignite node not able to join grid

I'm using a static IP finder configuration with 2 Ignite Docker containers installed in 2 different EC2 instances,
but the nodes are not able to join each other. Below are the logs:
[07:40:10,696][INFO][disco-event-worker-#41][GridDiscoveryManager] Topology snapshot [ver=46, servers=2, clients=0, CPUs=6, offheap=3.8GB, heap=2.0GB]
[07:40:10,696][INFO][disco-event-worker-#41][GridDiscoveryManager] Data Regions Configured:
[07:40:10,696][INFO][disco-event-worker-#41][GridDiscoveryManager] ^-- default [initSize=256.0 MiB, maxSize=3.1 GiB, persistenceEnabled=false]
[07:40:10,697][INFO][exchange-worker-#42][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], crd=true, evt=NODE_JOINED, evtNode=05bece82-1950-4fc0-a58e-c062ad4e9b18, customEvt=null, allowMerge=true]
[07:40:10,697][INFO][exchange-worker-#42][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], waitTime=0ms, futInfo=NA]
[07:40:10,697][INFO][exchange-worker-#42][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], crd=true]
[07:40:10,697][WARNING][disco-event-worker-#41][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=05bece82-1950-4fc0-a58e-c062ad4e9b18, addrs=[0:0:0:0:0:0:0:1%lo, 127.0.0.1, 172.17.0.1, 172.19.0.1, 192.168.1.202], sockAddrs=[/172.17.0.1:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500, /172.19.0.1:47500, /192.168.1.202:47500], discPort=47500, order=46, intOrder=24, lastExchangeTime=1529048390669, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false]
[07:40:10,698][INFO][disco-event-worker-#41][GridDiscoveryManager] Topology snapshot [ver=47, servers=1, clients=0, CPUs=4, offheap=3.1GB, heap=1.0GB]
[07:40:10,698][INFO][disco-event-worker-#41][GridDiscoveryManager] Data Regions Configured:
[07:40:10,698][INFO][disco-event-worker-#41][GridDiscoveryManager] ^-- default [initSize=256.0 MiB, maxSize=3.1 GiB, persistenceEnabled=false]
[07:40:10,699][INFO][disco-event-worker-#41][GridDhtPartitionsExchangeFuture] Coordinator received all messages, try merge [ver=AffinityTopologyVersion [topVer=46, minorTopVer=0]]
[07:40:10,699][INFO][disco-event-worker-#41][GridCachePartitionExchangeManager] Merge exchange future [curFut=AffinityTopologyVersion [topVer=46, minorTopVer=0], mergedFut=AffinityTopologyVersion [topVer=47, minorTopVer=0], evt=NODE_FAILED, evtNode=05bece82-1950-4fc0-a58e-c062ad4e9b18, evtNodeClient=false]
[07:40:10,699][INFO][disco-event-worker-#41][GridDhtPartitionsExchangeFuture] finishExchangeOnCoordinator [topVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=47, minorTopVer=0]]
[07:40:10,700][INFO][disco-event-worker-#41][GridDhtPartitionsExchangeFuture] Finish exchange future [startVer=AffinityTopologyVersion [topVer=46, minorTopVer=0], resVer=AffinityTopologyVersion [topVer=47, minorTopVer=0], err=null]
[07:40:10,701][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/53.247.167.223, rmtPort=50787]
[07:40:10,701][INFO][tcp-disco-srvr-#3][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/53.247.167.223, rmtPort=50787]
[07:40:10,701][INFO][tcp-disco-sock-reader-#133][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/53.247.167.223:50787, rmtPort=50787]
[07:40:10,702][INFO][exchange-worker-#42][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=47, minorTopVer=0], evt=NODE_JOINED, node=05bece82-1950-4fc0-a58e-c062ad4e9b18]
[07:40:10,704][INFO][tcp-disco-sock-reader-#133][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/53.247.167.223:50787, rmtPort=50787]
You can pass the first container's host name to the Ignite node in the second container via a system environment variable in your Ignite configuration:
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
<property name="addresses">
<list>
<value>#{systemEnvironment['IGNITE_HOST'] ?: '127.0.0.1'}:47500..47509</value>
</list>
</property>
</bean>
An example docker-compose.yml for two communicating Ignite services (note the top-level networks definition, which is required for the shared net network to exist):
version: "3"
services:
  ignite:
    image: image_name1
    networks:
      - net
  face:
    image: image_name2
    depends_on:
      - ignite
    networks:
      - net
    environment:
      IGNITE_HOST: 'ignite'
networks:
  net:
The Ignite node in 'face' can then connect to the Ignite node of the 'ignite' service using the address ignite:47500..47509.
Try using internal IP addresses, as suggested in this answer: http://apache-ignite-users.70518.x6.nabble.com/Ignite-docker-container-not-able-to-join-in-cluster-td22080.html
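Following that advice, a sketch of a static IP finder that lists the instances' internal (VPC) addresses instead of the public ones — the 172.31.x.x values below are placeholders for your own instances' private IPs:

```xml
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
    <property name="addresses">
        <list>
            <!-- Private IPs of both EC2 instances, reachable inside the VPC -->
            <value>172.31.29.3:47500..47509</value>
            <value>172.31.29.4:47500..47509</value>
        </list>
    </property>
</bean>
```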