I get this error when running FT.SEARCH through redis-cli on both redis/redis-stack-server:latest and redislabs/redismod:latest, following the how-to here for creating an index, adding documents, and querying them: https://redis.io/docs/stack/search/indexing_json/
It also happens when I follow these steps:
> FT.CREATE myIdx on JSON PREFIX 1 entity: SCHEMA $.position.y AS y NUMERIC $.position.x AS x NUMERIC $.name AS name TEXT
> JSON.SET entity:1 $ '{"id":"entityA","name":"EntityAlpha","longName":"This is entity alpha","speed":9.66,"ownerId":"god","position":{"x":15,"y":15},"color":"red"}'
> JSON.SET entity:2 $ '{"id":"entityB","name":"EntityBeta","longName":"This is entity beta","speed":9.66,"ownerId":"god","position":{"x":20,"y":20},"color":"red"}'
> JSON.SET entity:3 $ '{"id":"entityC","name":"EntityCeta","longName":"This is entity ceta","speed":9.66,"ownerId":"god","position":{"x":15,"y":25},"color":"fire"}'
> FT.SEARCH myIdx "#name:(Entity*)"
(error) elem.map is not a function
> FT.SEARCH myIdx "#x:[0 200]"
(error) elem.map is not a function
> FT.SEARCH myIdx "#name:(EntityAlpha)"
(error) elem.map is not a function
Here are results of 'info':
> info
# Server
redis_version:6.2.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:9c335ca9779faba5
redis_mode:standalone
os:Linux 5.15.0-48-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:1
process_supervised:no
run_id:4dad70a5a0bf25821f440ad397ed5d114e637fbf
tcp_port:6379
server_time_usec:1664744098230246
uptime_in_seconds:39
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:3799714
executable:/data/redis-server
config_file:
io_threads_active:0
# Clients
connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:16
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:9735336
used_memory_human:9.28M
used_memory_rss:32952320
used_memory_rss_human:31.43M
used_memory_peak:9735336
used_memory_peak_human:9.28M
used_memory_peak_perc:100.00%
used_memory_overhead:9354584
used_memory_startup:9313304
used_memory_dataset:380752
used_memory_dataset_perc:90.22%
allocator_allocated:10321296
allocator_active:10809344
allocator_resident:13840384
total_system_memory:12321312768
total_system_memory_human:11.48G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.05
allocator_frag_bytes:488048
allocator_rss_ratio:1.28
allocator_rss_bytes:3031040
rss_overhead_ratio:2.38
rss_overhead_bytes:19111936
mem_fragmentation_ratio:3.40
mem_fragmentation_bytes:23259752
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:40984
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:0
# Persistence
loading:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1664744059
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0
# Stats
total_connections_received:2
total_commands_processed:6
instantaneous_ops_per_sec:0
total_net_input_bytes:28
total_net_output_bytes:4712
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:20
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
total_forks:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:0
dump_payload_sanitizations:0
total_reads_processed:2
total_writes_processed:1
io_threaded_reads_processed:0
io_threaded_writes_processed:0
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:0f0fce86ffcbff2fc056b6d4e788242be12f2b2c
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:0.827450
used_cpu_user:0.459694
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000
used_cpu_sys_main_thread:0.061016
used_cpu_user_main_thread:0.130168
# Modules
module:name=graph,ver=20815,api=1,filters=0,usedby=[],using=[ReJSON],options=[]
module:name=timeseries,ver=10616,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]
module:name=ReJSON,ver=20011,api=1,filters=0,usedby=[search|graph],using=[],options=[handle-io-errors]
module:name=ai,ver=10205,api=1,filters=0,usedby=[],using=[],options=[handle-io-errors]
module:name=rg,ver=10204,api=1,filters=1,usedby=[rg],using=[rg],options=[]
module:name=search,ver=999999,api=1,filters=0,usedby=[],using=[ReJSON],options=[handle-io-errors]
module:name=bf,ver=20209,api=1,filters=0,usedby=[],using=[],options=[]
# Errorstats
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=5,expires=0,avg_ttl=0
Please help.
Hi, I am facing a critical issue with Ignite on our production servers. We have 2 instances with a heap size of 8 GB each. Sometimes, due to a long GC pause or a network issue, one of our instances gets stopped. This causes AWS auto-scaling to kick in and bring another instance up. This is fine, but we have observed that in this state the grid becomes unstable: the new Ignite instances are never able to join the topology and hang forever, causing new auto-scaled instances to be launched again and again. The workaround is to restart the other instances in the cluster, as doing so lets nodes join again. But ideally, in a prod environment, this should happen automatically with auto-scaling.
I also configured a longer failure detection timeout, but that doesn't solve it completely and we still observe this sometimes.
The logs observed on the new instances that fail to come up are below. The Ignite version used is 2.4, and off-heap mode is used for partitioned caches. Our grid is set up using the TCP discovery service with an S3 bucket.
I also have some transactional caches that lock based on tryLocks.
evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]]
2018-07-18 16:34:10.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
2018-07-18 16:34:20.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
2018-07-18 16:34:20.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Last exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=247159314], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], DataStreamerReleaseFuture 
[topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]]]], exchActions=ExchangeActions [startCaches=null, stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], affChangeMsg=null, initTs=1531931329576, centralizedAff=false, changeGlobalStateE=null, done=false, state=SRV, evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]
2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.a.i.i.p.c.GridCachePartitionExchangeManager - First 10 pending exchange futures [total=0]
2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Last 10 exchange futures (total: 1):
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false]
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending transactions:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending explicit locks:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending cache futures:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending atomic cache futures:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending data streamer futures:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending transaction deadlock detection futures:
2018-07-18 16:34:20.547 UTC [FDPS] [grid-nio-worker-tcp-comm-3-#28%fdps%] [INFO ] [,] o.apache.ignite.internal.diagnostic - Exchange future waiting for coordinator response [crd=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0]]
Remote node information:
General node info [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, client=false, discoTopVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], time=12:34:20.537]
Partitions exchange info [readyVer=AffinityTopologyVersion [topVer=29, minorTopVer=0]]
Last initialized exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=30, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531930706210], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931660254, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=30, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531930706210], nodeId=ba6aba6c, evt=NODE_JOINED], added=true, 
initFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1921954756], init=false, lastVer=GridCacheVersion [topVer=0, order=1531930704443, nodeOrder=0], partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935479, nodeOrder=26], threadId=39726, id=559000, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=221, val=49583853497448469294730566354366524577617095530402283666, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547787212113, nodeOrder=26], threadId=39741, id=603904, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=288, val=49583853499611641578988037213538229804531966271996035234, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935487, 
nodeOrder=26], threadId=39740, id=558993, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=133, val=49583853497448469294730566354417299462040910024459419794, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935323, nodeOrder=26], threadId=39728, id=558949, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=1023, val=49583853497448469294730566353278491339963927967496667282, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935470, nodeOrder=26], threadId=39951, id=559009, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=556, val=49583853497448469294730566354226289182541798339977937042, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate 
[nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935497, nodeOrder=26], threadId=39683, id=558982, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=373, val=49583853497448469294730566354541818821461216966893109394, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935339, nodeOrder=26], threadId=39682, id=558941, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=156, val=49583853497448469294730566353353444740780034976328450194, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935358, nodeOrder=26], threadId=39936, id=558921, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=59, val=49583853497448469294730566353578304943228356208982229138, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan 
[topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandida... and 48550 skipped ...ead=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935486, nodeOrder=26], threadId=39894, id=558992, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=488, val=49583853497448469294730566354434224423515514832905306258, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935331, nodeOrder=26], threadId=39893, id=558948, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=570, val=49583853497448469294730566353289371672340459630069022866, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]]]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1531930706210, centralizedAff=false, changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, 
remaining=[ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=325602672]]
Communication SPI statistics [rmtNode=7d5e83aa-736a-4190-8b64-7261db7382f6]
Communication SPI recovery descriptors:
[key=ConnectionKey [nodeId=7d5e83aa-736a-4190-8b64-7261db7382f6, idx=0, connCnt=0], msgsSent=5, msgsAckedByRmt=0, msgsRcvd=7, lastAcked=0, reserveCnt=1, descIdHash=1972345954]
Communication SPI clients:
[node=7d5e83aa-736a-4190-8b64-7261db7382f6, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=5740, bytesSent=77322, bytesRcvd0=853, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=fdps, finished=false, hashCode=2068348067, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#28%fdps%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=7, sentCnt=5, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=true, connectCnt=0, queueLimit=262144, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=7, sentCnt=5, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=true, connectCnt=0, queueLimit=262144, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.83.87.131:47100, rmtAddr=/10.83.89.183:34664, createTime=1531931330498, closeTime=0, bytesSent=77322, bytesRcvd=5740, bytesSent0=0, bytesRcvd0=853, sndSchedTime=1531931330498, lastSndTime=1531931500547, lastRcvTime=1531931660527, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser#665c2413, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]], super=GridAbstractCommunicationClient [lastUsed=1531931330508, closed=false, connIdx=0]]]
NIO sessions statistics:
>> Selector info [idx=3, keysCnt=1, bytesRcvd=5740, bytesRcvd0=853, bytesSent=77322, bytesSent0=0]
Connection info [in=true, rmtAddr=/10.83.89.183:34664, locAddr=/10.83.87.131:47100, msgsSent=5, msgsAckedByRmt=0, descIdHash=1972345954, unackedMsgs=[IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage], msgsRcvd=7, lastAcked=0, descIdHash=1972345954, bytesRcvd=5740, bytesRcvd0=853, bytesSent=77322, bytesSent0=0, opQueueSize=0]
Exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531931329402], crd=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531931329402], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=980776600], init=false, lastVer=GridCacheVersion [topVer=0, order=1531931327875, nodeOrder=0], partReleaseFut=null, exchActions=null, affChangeMsg=null, initTs=0, centralizedAff=false, changeGlobalStateE=null, done=false, state=null, evtLatch=0, remaining=[], 
super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=2138568466]]
Local communication statistics:
Communication SPI statistics [rmtNode=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9]
Communication SPI recovery descriptors:
[key=ConnectionKey [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, idx=0, connCnt=-1], msgsSent=7, msgsAckedByRmt=0, msgsRcvd=6, lastAcked=0, reserveCnt=1, descIdHash=1891649612]
Communication SPI clients:
[node=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=92833, bytesSent=5698, bytesRcvd0=15539, bytesSent0=853, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=fdps, finished=false, hashCode=2040212682, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#25%fdps%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=6, sentCnt=7, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=262144, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=6, sentCnt=7, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=262144, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.83.89.183:34664, rmtAddr=ip-10-83-87-131.ec2.internal/10.83.87.131:47100, createTime=1531931330468, closeTime=0, bytesSent=5698, bytesRcvd=92833, bytesSent0=853, bytesRcvd0=15539, sndSchedTime=1531931330468, lastSndTime=1531931660528, lastRcvTime=1531931660538, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser#72024a61, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]], super=GridAbstractCommunicationClient [lastUsed=1531931330468, closed=false, connIdx=0]]]
NIO sessions statistics:
>> Selector info [idx=0, keysCnt=1, bytesRcvd=92833, bytesRcvd0=15539, bytesSent=5698, bytesSent0=853]
Connection info [in=false, rmtAddr=ip-10-83-87-131.ec2.internal/10.83.87.131:47100, locAddr=/10.83.89.183:34664, msgsSent=7, msgsAckedByRmt=0, descIdHash=1891649612, unackedMsgs=[GridDhtPartitionsSingleMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage], msgsRcvd=6, lastAcked=0, descIdHash=1891649612, bytesRcvd=92833, bytesRcvd0=15539, bytesSent=5698, bytesSent0=853, opQueueSize=0]
2018-07-18 16:34:29.598 UTC [FDPS] [localhost-startStop-1] [WARN ] [,] o.a.i.i.p.c.GridCachePartitionExchangeManager - Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931669507, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931669507, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=247159314], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], 
futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]]]], exchActions=ExchangeActions [startCaches=null, stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], affChangeMsg=null, initTs=1531931329576, centralizedAff=false, changeGlobalStateE=null, done=false, state=SRV, evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]]
2018-07-18 16:34:30.537 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
2018-07-18 16:34:40.537 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
Info about the other node 10-83-85-122
The other joining node never started; it was stuck in the Ignite start phase. Its logs don't show the node coming up or IP discovery kicking in, and the node was eventually removed by autoscaling.
Transactional errors received
javax.cache.CacheException: Failed to acquire lock for keys (primary node left grid, retry transaction if possible) [keys=[UserKeyCacheObjectImpl [part=281,
Partition map exchange is the process by which nodes exchange information about where each piece of data is stored. It happens every time the topology changes.
Every node sends a GridDhtPartitionsSingleMessage to the coordinator. Once the coordinator has collected all such messages, it sends a GridDhtPartitionsFullMessage back to the other nodes. These messages are sent over the communication SPI.
If some non-coordinator node doesn't send its SingleMessage to the coordinator, or if the coordinator doesn't send the FullMessage, the "Failed to wait for partition map exchange" error occurs.
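The message flow described above can be pictured with a small toy model. This is only a sketch and not Ignite code: the class `ExchangeSketch`, its methods, and the node names are hypothetical and exist purely for illustration. It shows why a single missing SingleMessage is enough to keep the coordinator from sending the FullMessage.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the coordinator's bookkeeping during an exchange (hypothetical, not Ignite code). */
public class ExchangeSketch {
    /** Nodes whose SingleMessage the coordinator has not received yet. */
    public static Set<String> remaining(List<String> expectedNodes, List<String> receivedFrom) {
        Set<String> pending = new LinkedHashSet<>(expectedNodes);
        pending.removeAll(receivedFrom);
        return pending;
    }

    /** The FullMessage can only be sent once no SingleMessage is outstanding. */
    public static boolean canSendFullMessage(List<String> expectedNodes, List<String> receivedFrom) {
        return remaining(expectedNodes, receivedFrom).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up node names: node-b never sends its SingleMessage.
        List<String> expected = List.of("node-a", "node-b", "node-c");
        List<String> received = List.of("node-a", "node-c");
        System.out.println("remaining=" + remaining(expected, received));
        System.out.println("canSendFullMessage=" + canSendFullMessage(expected, received));
    }
}
```

With the made-up input in `main`, the coordinator keeps waiting on `node-b`, so every other node waits for a FullMessage that never arrives.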
Judging by the piece of log you provided, the node with ID=ba6aba6c didn't send the SingleMessage to the coordinator. This may mean that the communication SPI isn't working properly there. Make sure that the ports required by the communication SPI are available; usually they are 47100..47200.
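Finding the node an exchange is waiting on can be scripted. The exchange-future dump in the question contains a `remaining=[...]` field; the sketch below assumes that this field lists the node IDs the exchange is still waiting on, and pulls them out with a regular expression. The class name `RemainingNodes` is hypothetical, and the sample string is an abbreviated copy of the dump from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts node IDs from "remaining=[...]" fields in an exchange-future dump (illustrative helper). */
public class RemainingNodes {
    private static final Pattern REMAINING = Pattern.compile("remaining=\\[([^\\]]*)\\]");

    /** Returns the distinct IDs found in all "remaining=[...]" fields, in order of appearance. */
    public static List<String> parse(String dump) {
        List<String> ids = new ArrayList<>();
        Matcher m = REMAINING.matcher(dump);
        while (m.find()) {
            for (String id : m.group(1).split(",")) {
                String trimmed = id.trim();
                if (!trimmed.isEmpty() && !ids.contains(trimmed)) {
                    ids.add(trimmed);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Abbreviated from the "Still waiting for initial partition map exchange" warning.
        String dump = "done=false, state=SRV, evtLatch=0, "
            + "remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], "
            + "super=GridFutureAdapter [ignoreInterrupts=false, state=INIT]";
        System.out.println(parse(dump)); // prints [a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9]
    }
}
```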
Also, the joining node may be stuck on something. Look at its log to figure out what is happening there.
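To check the communication ports from another host, a plain TCP probe is usually enough. The following is a minimal sketch using only the JDK; the class `CommPortProbe`, the 200 ms timeout, and the default host are assumptions made for illustration. A port that does not answer may be blocked by a firewall or may simply have no node listening on it, so treat the result as a hint rather than proof.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Probes a host for TCP ports that accept connections (illustrative helper, not part of Ignite). */
public class CommPortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat all as "not reachable".
            return false;
        }
    }

    /** Returns every port in [from, to] that accepted a connection. */
    public static List<Integer> reachablePorts(String host, int from, int to, int timeoutMs) {
        List<Integer> open = new ArrayList<>();
        for (int port = from; port <= to; port++) {
            if (isReachable(host, port, timeoutMs)) {
                open.add(port);
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Pass the address of the suspect node as the first argument.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println("Reachable ports on " + host + ": "
            + reachablePorts(host, 47100, 47200, 200));
    }
}
```

Run it from the machine that hosts the coordinator against the address of the node that failed to respond, and then in the other direction, since the connection has to work both ways.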
We are running Ignite 2.4 with 2 server nodes and about 30 client nodes. We use ZooKeeper discovery, and the nodes are deployed in a Docker Swarm environment.
After running for a while, I saw the exception below in one of the Ignite clients, and the caches no longer seem to work:
service-be - [INFO ] 2018-06-15 02:01:52.256 [grid-timeout-worker-#55] org.apache.ignite.internal.IgniteKernal -
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=5249f20c, uptime=02:49:02.178]
^-- H/N/C [hosts=34, nodes=34, CPUs=816]
^-- CPU [cur=24.2%, avg=0.27%, GC=0%]
^-- PageMemory [pages=0]
^-- Heap [used=848MB, free=17.19%, comm=1024MB]
^-- Non heap [used=241MB, free=84.12%, comm=251MB]
^-- Outbound messages queue [size=4]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=24, qSize=0]
service-be - [INFO ] 2018-06-15 02:01:52.432 [grid-nio-worker-tcp-comm-2-#59] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/10.11.0.7:47100, rmtAddr=/10.11.0.75:59204]
service-be - [INFO ] 2018-06-15 02:01:52.433 [grid-nio-worker-tcp-comm-2-#59] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Received incoming connection when already connected to this node, rejecting [locNode=5249f20c-456b-4b6f-ab41-f5cd5c3c05ba, rmtNode=6739c9af-42d1-4aad-ac9c-ac738ed13534]
service-be - [INFO ] 2018-06-15 02:01:52.634 [grid-nio-worker-tcp-comm-3-#60] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/10.11.0.7:47100, rmtAddr=/10.11.0.75:59206]
service-be - [INFO ] 2018-06-15 02:01:52.635 [grid-nio-worker-tcp-comm-3-#60] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Received incoming connection when already connected to this node, rejecting [locNode=5249f20c-456b-4b6f-ab41-f5cd5c3c05ba, rmtNode=6739c9af-42d1-4aad-ac9c-ac738ed13534]
service-be - [INFO ] 2018-06-15 02:01:52.836 [grid-nio-worker-tcp-comm-4-#61] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/10.11.0.7:47100, rmtAddr=/10.11.0.75:59208]
service-be - [INFO ] 2018-06-15 02:01:52.837 [grid-nio-worker-tcp-comm-4-#61] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Received incoming connection when already connected to this node, rejecting [locNode=5249f20c-456b-4b6f-ab41-f5cd5c3c05ba, rmtNode=6739c9af-42d1-4aad-ac9c-ac738ed13534]
service-be - [INFO ] 2018-06-15 02:01:53.038 [grid-nio-worker-tcp-comm-5-#62] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/10.11.0.7:47100, rmtAddr=/10.11.0.75:59210]
service-be - [INFO ] 2018-06-15 02:01:53.039 [grid-nio-worker-tcp-comm-5-#62] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Received incoming connection when already connected to this node, rejecting [locNode=5249f20c-456b-4b6f-ab41-f5cd5c3c05ba, rmtNode=6739c9af-42d1-4aad-ac9c-ac738ed13534]
service-be - [ERROR] 2018-06-15 02:01:53.231 [grid-nio-worker-tcp-comm-0-#57] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Failed to process selector key [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=70700138, bytesSent=18478193, bytesRcvd0=0, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=null, finished=false, hashCode=30436088, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#57]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=186 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=48224, resendCnt=0, rcvCnt=111504, sentCnt=48229, reserved=true, lastAck=111488, nodeLeft=false, node=TcpDiscoveryNode [id=6739c9af-42d1-4aad-ac9c-ac738ed13534, addrs=[10.11.0.74, 10.11.0.75, 127.0.0.1, 172.18.0.22], sockAddrs=[/172.18.0.22:47500, bdd554c3dc77/10.11.0.75:47500, /10.11.0.74:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1529039549468, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=131072, reserveCnt=2, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=48224, resendCnt=0, rcvCnt=111504, sentCnt=48229, reserved=true, lastAck=111488, nodeLeft=false, node=TcpDiscoveryNode [id=6739c9af-42d1-4aad-ac9c-ac738ed13534, addrs=[10.11.0.74, 10.11.0.75, 127.0.0.1, 172.18.0.22], sockAddrs=[/172.18.0.22:47500, bdd554c3dc77/10.11.0.75:47500, /10.11.0.74:47500, /127.0.0.1:47500], discPort=47500, order=1, intOrder=1, lastExchangeTime=1529039549468, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=131072, reserveCnt=2, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.11.0.7:42970, rmtAddr=bdd554c3dc77/10.11.0.75:47100, createTime=1529039561958, closeTime=0, bytesSent=18478193, bytesRcvd=70700138, bytesSent0=0, 
bytesRcvd0=0, sndSchedTime=1529044007457, lastSndTime=1529049712225, lastRcvTime=1529049712225, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=o.a.i.i.util.nio.GridDirectParser#7a15b36, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]]]
java.io.IOException: Broken pipe
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:51)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite0(GridNioServer.java:1636)
at org.apache.ignite.internal.util.nio.GridNioServer$DirectNioClientWorker.processWrite(GridNioServer.java:1293)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2307)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2080)
at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1749)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
at java.lang.Thread.run(Thread.java:748)
service-be - [WARN ] 2018-06-15 02:01:53.231 [grid-nio-worker-tcp-comm-0-#57] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Closing NIO session because of unhandled exception [cls=class o.a.i.i.util.nio.GridNioException, msg=Broken pipe]
service-be - [INFO ] 2018-06-15 02:01:53.240 [grid-nio-worker-tcp-comm-6-#63] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Accepted incoming communication connection [locAddr=/10.11.0.7:47100, rmtAddr=/10.11.0.75:59212]
service-be - [WARN ] 2018-06-15 02:02:03.253 [tcp-comm-worker-#1] org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi - Connect timed out (consider increasing 'failureDetectionTimeout' configuration property) [addr=/172.18.0.22:47100, failureDetectionTimeout=10000]
When I search for the remote node that we seem to have trouble connecting to (the one mentioned in the trace above), I also see these warnings on some of the other client nodes.
Any obvious pointers on what could be going wrong? From what I have found, one suggestion was to use IPv4, but the Docker overlay network already has enableipv6 disabled in our case, so I am not sure how much that will help.
[root@rhel743411 logs]# egrep -i "6739c9af-42d1-4aad-ac9c-ac738ed13534" *
service1-mw.log:service1-mw - [WARN ] 2018-06-16 00:27:55.884 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:26:02.991, curTime=00:27:55.876, fut=GridDhtColocatedLockFuture [threadId=39579, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=776a8520461-6a403605-a8fd-4ed1-bd45-92e648929a2a, lockVer=GridCacheVersion [topVer=140519300, order=1529059257539, nodeOrder=6], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service1-mw.log:service1-mw - [WARN ] 2018-06-16 00:27:55.884 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:25:55.893, curTime=00:27:55.876, fut=GridDhtColocatedLockFuture [threadId=297, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=f03a8520461-6a403605-a8fd-4ed1-bd45-92e648929a2a, lockVer=GridCacheVersion [topVer=140519300, order=1529059253553, nodeOrder=6], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service1-mw.log:service1-mw - [WARN ] 2018-06-16 00:27:55.884 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:26:51.661, curTime=00:27:55.876, fut=GridDhtColocatedLockFuture [threadId=38749, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false], UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=354b8520461-6a403605-a8fd-4ed1-bd45-92e648929a2a, lockVer=GridCacheVersion [topVer=140519300, order=1529059268380, nodeOrder=6], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service1-mw.log:service1-mw - [WARN ] 2018-06-16 00:27:55.885 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:26:51.772, curTime=00:27:55.876, fut=GridDhtColocatedLockFuture [threadId=343, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false]], futId=125b8520461-6a403605-a8fd-4ed1-bd45-92e648929a2a, lockVer=GridCacheVersion [topVer=140519300, order=1529059268816, nodeOrder=6], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:01:10.227 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=23:59:12.637, curTime=00:01:10.221, fut=GridDhtColocatedLockFuture [threadId=21129, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=f5216120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529058842000, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:09:10.242 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:07:30.520, curTime=00:09:10.239, fut=GridDhtColocatedLockFuture [threadId=21304, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=42176120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529058982457, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:13:10.269 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:11:32.462, curTime=00:13:10.268, fut=GridDhtColocatedLockFuture [threadId=21368, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false]], futId=c0f96120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059041395, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:15:10.281 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:13:43.800, curTime=00:15:10.279, fut=GridDhtColocatedLockFuture [threadId=172, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false], UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=49ab6120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059079186, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:17:10.289 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:15:44.860, curTime=00:17:10.287, fut=GridDhtColocatedLockFuture [threadId=172, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=a3ec6120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059106786, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:20:10.299 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:18:51.741, curTime=00:20:10.298, fut=GridDhtColocatedLockFuture [threadId=172, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false], UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=8ace6120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059136637, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:21:10.308 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:19:19.018, curTime=00:21:10.304, fut=GridDhtColocatedLockFuture [threadId=21484, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=bd7f6120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059155514, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:24:10.326 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:23:03.860, curTime=00:24:10.323, fut=GridDhtColocatedLockFuture [threadId=21544, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false]], futId=f3e17120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059200701, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:24:10.326 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:22:52.783, curTime=00:24:10.323, fut=GridDhtColocatedLockFuture [threadId=172, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false], UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=edc17120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059199113, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:26:10.330 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:24:59.321, curTime=00:26:10.328, fut=GridDhtColocatedLockFuture [threadId=172, keys=[UserKeyCacheObjectImpl [part=7, val=7, hasValBytes=false]], futId=74737120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059232146, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
service2-mw.log:service2y-mw - [WARN ] 2018-06-16 00:29:10.349 [grid-timeout-worker-#55] org.apache.ignite.internal.diagnostic - Found long running cache future [startTime=00:27:32.480, curTime=00:29:10.347, fut=GridDhtColocatedLockFuture [threadId=21621, keys=[UserKeyCacheObjectImpl [part=8, val=8, hasValBytes=false]], futId=1fe57120461-0c4dcfda-c90b-42a3-83c4-8d2f8ecb6ab1, lockVer=GridCacheVersion [topVer=140519300, order=1529059289421, nodeOrder=17], read=false, retval=true, err=null, timeout=120000, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], done=0, trackable=true, createTtl=-1, accessTtl=-1, skipStore=false, keepBinary=false, recovery=false, miniId=1, topVer=AffinityTopologyVersion [topVer=34, minorTopVer=0], innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false, loc=false, done=false]], inTx=false, super=GridCompoundIdentityFuture [super=GridCompoundFuture [rdc=Bool reducer: true, initFlag=1, lsnrCalls=0, done=false, cancelled=false, err=null, futs=[false]]]]]
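When the grep output gets this long, it can help to tally the warnings by the node each future is waiting on. The sketch below assumes that the `innerFuts=[[node=...` field names the remote node that has not answered yet; the class `LongRunningFutureTally` is hypothetical, and the sample lines are abbreviated copies of the warnings above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Counts "Found long running cache future" warnings per awaited node (illustrative helper). */
public class LongRunningFutureTally {
    private static final Pattern INNER_NODE =
        Pattern.compile("innerFuts=\\[\\[node=([0-9a-fA-F-]{36})");

    /** Maps each awaited node ID to the number of warnings that mention it. */
    public static Map<String, Integer> countByNode(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            if (!line.contains("Found long running cache future")) {
                continue;
            }
            Matcher m = INNER_NODE.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[WARN ] Found long running cache future [startTime=00:26:02.991, "
                + "innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false]]]",
            "[WARN ] Found long running cache future [startTime=00:25:55.893, "
                + "innerFuts=[[node=6739c9af-42d1-4aad-ac9c-ac738ed13534, rcvRes=false]]]",
            "[INFO ] Accepted incoming communication connection");
        System.out.println(countByNode(lines)); // prints {6739c9af-42d1-4aad-ac9c-ac738ed13534=2}
    }
}
```

If nearly all long-running futures point at the same node ID, as they do in the excerpt above, that node is the first place to look.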
I have a 3-node cluster with 20+ clients, running in a Spark context. Initially it works fine, but at random times, when a new node (i.e. a client) tries to connect to the cluster, the cluster becomes inoperative. I got the following logs while it was stuck. If I explicitly restart any Ignite server, the cluster is released and works fine again. I am using Ignite 2.4.0; the same issue also occurs with Ignite 2.5.0.
Client Side Logs
Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]
Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]
Failed to wait for initial partition map exchange. Possible reasons are:
^-- Transactions in deadlock.
^-- Long running transactions (ignore if this is the case).
^-- Unreleased explicit locks.
Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=
Server Side Logs
Possible starvation in striped pool. Thread name: sys-stripe-0-#1 Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareResponse [nearEvicted=null, futId=869dd4ca361-fe7e167d-4d80-4f57-b004-13359a9f2c11, miniId=1, super=GridDistributedTxPrepareResponse [txState=null, part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094903, nodeOrder=1], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=984, val=null, hasValBytes=true], val=BinaryObjectImpl [arr= true, ctx=false, start=0], prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null, nearFutId=0, flags=]]]], o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout#2735c674, Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareRequest [nearNodeId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, futId=6576e4ca361-6e7cdac2-d5a3-4624-9ad3-b93f25546cc3, miniId=1, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], invalidateNearEntries={}, nearWrites=null, owned=null, nearXidVer=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], subjId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, taskNameHash=0, preloadKeys=null, super=GridDistributedTxPrepareRequest [threadId=86, concurrency=OPTIMISTIC, isolation=READ_COMMITTED, writeVer=GridCacheVersion [topVer=139084030, order=1527604094935, nodeOrder=2], timeout=0, reads=null, writes=[IgniteTxEntry [key=BinaryObjectImpl [arr= true, ctx=false, start=0], cacheId=-1755241537, txKey=null, val=[op=UPDATE, val=BinaryObjectImpl [arr= true, ctx=false, start=0]], 
prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2, arr=[65774,65775]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=49328, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=needRes]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[98591]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[114926]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, 
futId=32946, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNear
Using Ignite 2.1, I start the first node from the command line in default server mode with peer class loading enabled. I see the following line in the logs:
When I start the second node (using IgniteSpringBean on a Tomcat server, in client mode), I get the following error, even though peer class loading is enabled:
org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling (make sure same versions of all classes are available on all nodes or enable peer-class-loading) [clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2,...
Visor tells me that both the server and the client node are in the topology and both have peer class loading enabled...
Server logs:
[vagrant@tw apache-ignite-fabric-2.1.0-bin]$ ./bin/ignite.sh ./config/example-default.xml -v
Ignite Command Line Startup, ver. 2.1.0#20170720-sha1:a6ca5c8a
2017 Copyright(C) Apache Software Foundation
[13:41:51,967][INFO][main][IgniteKernal]
>>> __________ ________________
>>> / _/ ___/ |/ / _/_ __/ __/
>>> _/ // (7 7 // / / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.1.0#20170720-sha1:a6ca5c8a
>>> 2017 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org
[13:41:51,967][INFO][main][IgniteKernal] Config URL: file:/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin/./config/example-default.xml
[13:41:51,968][INFO][main][IgniteKernal] Daemon mode: off
[13:41:51,968][INFO][main][IgniteKernal] OS: Linux 3.10.0-327.el7.x86_64 amd64
[13:41:51,968][INFO][main][IgniteKernal] OS user: vagrant
[13:41:51,968][INFO][main][IgniteKernal] PID: 8122
[13:41:51,968][INFO][main][IgniteKernal] Language runtime: Java Platform API Specification ver. 1.8
[13:41:51,968][INFO][main][IgniteKernal] VM information: Java(TM) SE Runtime Environment 1.8.0_60-b27 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.60-b23
[13:41:51,970][INFO][main][IgniteKernal] VM total memory: 0.97GB
[13:41:51,970][INFO][main][IgniteKernal] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49122, auth: off, ssl: off)]
[13:41:51,970][INFO][main][IgniteKernal] IGNITE_HOME=/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin
[13:41:51,971][INFO][main][IgniteKernal] VM arguments: [-Xms1g, -Xmx1g, -XX:+AggressiveOpts, -XX:MaxMetaspaceSize=256m, -DIGNITE_QUIET=false, -DIGNITE_SUCCESS_FILE=/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin/work/ignite_success_96df797d-5531-4b3e-b396-5f44cdc1470e, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=49122, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -DIGNITE_HOME=/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin, -DIGNITE_PROG_NAME=./bin/ignite.sh]
[13:41:51,973][INFO][main][IgniteKernal] System cache's MemoryPolicy size is configured to 40 MB. Use MemoryConfiguration.systemCacheMemorySize property to change the setting.
[13:41:51,980][INFO][main][IgniteKernal] Configured caches [in 'sysMemPlc' memoryPolicy: ['ignite-sys-cache']]
[13:41:51,980][WARNING][main][IgniteKernal] Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
[13:41:52,002][INFO][main][IgniteKernal] 3-rd party licenses can be found at: /home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin/libs/licenses
[13:41:52,077][INFO][main][IgnitePluginProcessor] Configured plugins:
[13:41:52,078][INFO][main][IgnitePluginProcessor] ^-- None
[13:41:52,078][INFO][main][IgnitePluginProcessor]
[13:41:52,138][INFO][main][TcpCommunicationSpi] Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[13:41:52,150][WARNING][main][TcpCommunicationSpi] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[13:41:52,169][WARNING][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[13:41:52,196][WARNING][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[13:41:52,197][INFO][main][IgniteKernal] Security status [authentication=off, tls/ssl=off]
[13:41:52,516][INFO][main][SqlListenerProcessor] SQL connector processor has started on TCP port 10800
[13:41:52,550][INFO][main][GridTcpRestProtocol] Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[13:41:52,593][INFO][main][IgniteKernal] Non-loopback local IPs: 10.0.10.103, 10.0.2.15, fe80:0:0:0:a00:27ff:fe51:d0d8%eth0, fe80:0:0:0:a00:27ff:fee7:1d4f%eth1
[13:41:52,593][INFO][main][IgniteKernal] Enabled local MACs: 08002751D0D8, 080027E71D4F
[13:41:52,637][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=2a929c01-f8a6-4b14-9857-88eaa2b58a87]
[13:41:54,030][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], crd=true, evt=10, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328114016, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328114016, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=null]
[13:41:54,042][WARNING][exchange-worker-#28%null%][IgniteCacheDatabaseSharedManager] No user-defined default MemoryPolicy found; system default of 1GB size will be used.
[13:41:54,299][INFO][exchange-worker-#28%null%][GridCacheProcessor] Started cache [name=ignite-sys-cache, memoryPolicyName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL]
[13:41:54,302][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], waitTime=0ms]
[13:41:54,333][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], time=0ms]
[13:41:54,347][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], crd=true]
[13:41:54,350][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=12, minorTopVer=0], evt=NODE_JOINED, node=2a929c01-f8a6-4b14-9857-88eaa2b58a87]
[13:41:54,450][INFO][main][IgniteKernal] Performance suggestions for grid (fix if possible)
[13:41:54,451][INFO][main][IgniteKernal] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[13:41:54,451][INFO][main][IgniteKernal] ^-- Disable grid events (remove 'includeEventTypes' from configuration)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Reduce pages swapping ratio (set vm.swappiness=10)
[13:41:54,451][INFO][main][IgniteKernal] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
[13:41:54,451][INFO][main][IgniteKernal]
[13:41:54,451][INFO][main][IgniteKernal] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[13:41:54,451][INFO][main][IgniteKernal]
[13:41:54,459][INFO][main][IgniteKernal]
>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.1.0#20170720-sha1:a6ca5c8a97e9a4c9d73d40ce76d1504c14ba1940
>>> +----------------------------------------------------------------------+
>>> OS name: Linux 3.10.0-327.el7.x86_64 amd64
>>> CPU(s): 1
>>> Heap: 1.0GB
>>> VM name: 8122@tw.dna.com
>>> Local node [ID=2A929C01-F8A6-4B14-9857-88EAA2B58A87, order=12, clientMode=false]
>>> Local node addresses: [10.0.10.103/0:0:0:0:0:0:0:1%lo, 10.0.2.15/10.0.10.103, /10.0.2.15, /127.0.0.1]
>>> Local ports: TCP:10800 TCP:11211 TCP:47100 UDP:47400 TCP:47500
[13:41:54,462][INFO][main][GridDiscoveryManager] Topology snapshot [ver=12, servers=1, clients=0, CPUs=1, heap=1.0GB]
[13:42:54,444][INFO][grid-timeout-worker-#15%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=2a929c01, name=null, uptime=00:01:00:007]
^-- H/N/C [hosts=1, nodes=1, CPUs=1]
^-- CPU [cur=2.33%, avg=1.57%, GC=0%]
^-- PageMemory [pages=200]
^-- Heap [used=107MB, free=89.12%, comm=989MB]
^-- Non heap [used=36MB, free=97.59%, comm=37MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
^-- Outbound messages queue [size=0]
[13:43:46,444][INFO][disco-event-worker-#27%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=c8c42745-f838-48ea-9145-5783a6f77681, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.101, 10.0.2.15, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /10.0.10.101:0, /10.0.2.15:0], discPort=0, order=13, intOrder=8, lastExchangeTime=1505328226398, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true]
[13:43:46,446][INFO][disco-event-worker-#27%null%][GridDiscoveryManager] Topology snapshot [ver=13, servers=1, clients=1, CPUs=2, heap=3.0GB]
[13:43:46,448][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=true, evt=10, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328226435, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328226435, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=null]
[13:43:46,448][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=0ms]
[13:43:46,449][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=true]
[13:43:46,449][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=0], evt=NODE_JOINED, node=c8c42745-f838-48ea-9145-5783a6f77681]
[13:43:47,121][INFO][grid-nio-worker-tcp-comm-0-#17%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.0.10.103:47100, rmtAddr=/10.0.10.101:54857]
[13:43:47,357][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=true, evt=18, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227343, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227343, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=DynamicCacheChangeBatch [id=dcedd8c7e51-9d6cee64-90a5-4c0b-a1ed-b4c7a1697bfb, reqs=[DynamicCacheChangeRequest [cacheName=ignite-sys-atomic-cache@dna-EVENT_DELIVERY_SET, hasCfg=true, nodeId=c8c42745-f838-48ea-9145-5783a6f77681, clientStartOnly=false, stop=false, destroy=false]], exchangeActions=ExchangeActions [startCaches=[ignite-sys-atomic-cache@dna-EVENT_DELIVERY_SET], stopCaches=null, startGrps=[dna-EVENT_DELIVERY_SET], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false]]
[13:43:47,378][INFO][exchange-worker-#28%null%][GridCacheProcessor] Started cache [name=ignite-sys-atomic-cache@dna-EVENT_DELIVERY_SET, group=dna-EVENT_DELIVERY_SET, memoryPolicyName=default, mode=PARTITIONED, atomicity=TRANSACTIONAL]
[13:43:47,379][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], waitTime=0ms]
[13:43:47,496][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], time=0ms]
[13:43:47,512][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=true]
[13:43:47,515][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=c8c42745-f838-48ea-9145-5783a6f77681]
[13:43:47,558][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], crd=true, evt=18, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227557, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227557, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=DynamicCacheChangeBatch [id=3dedd8c7e51-9d6cee64-90a5-4c0b-a1ed-b4c7a1697bfb, reqs=[DynamicCacheChangeRequest [cacheName=datastructures_ATOMIC_PARTITIONED_0@dna-EVENT_DELIVERY_SET, hasCfg=true, nodeId=c8c42745-f838-48ea-9145-5783a6f77681, clientStartOnly=false, stop=false, destroy=false]], exchangeActions=ExchangeActions [startCaches=[datastructures_ATOMIC_PARTITIONED_0@dna-EVENT_DELIVERY_SET], stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false]]
[13:43:47,597][INFO][exchange-worker-#28%null%][GridCacheProcessor] Started cache [name=datastructures_ATOMIC_PARTITIONED_0@dna-EVENT_DELIVERY_SET, group=dna-EVENT_DELIVERY_SET, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC]
[13:43:47,597][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], waitTime=0ms]
[13:43:47,623][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], time=0ms]
[13:43:47,625][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], crd=true]
[13:43:47,626][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=2], evt=DISCOVERY_CUSTOM_EVT, node=c8c42745-f838-48ea-9145-5783a6f77681]
[13:43:47,915][SEVERE][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Failed to unmarshal discovery custom message.
class org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling (make sure same versions of all classes are available on all nodes or enable peer-class-loading) [clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2, cls=scan.fragment.node.ignite.VersionedInterceptor]
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:124)
at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:143)
at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9733)
at org.apache.ignite.spi.discovery.tcp.messages.TcpDiscoveryCustomEventMessage.message(TcpDiscoveryCustomEventMessage.java:81)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:5436)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:5321)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2629)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2420)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6576)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2506)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.lang.ClassNotFoundException: scan.fragment.node.ignite.VersionedInterceptor
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8465)
at org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.resolveClass(JdkMarshallerObjectInputStream.java:54)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at java.util.ArrayList.readObject(ArrayList.java:791)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:121)
... 12 more
Peer class loading works only with the Compute Grid [1]. Your VersionedInterceptor looks like part of the cache configuration (an implementation of CacheInterceptor?). Such classes must be explicitly deployed on all nodes before the cluster starts.
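The "Caused by" part of the trace is plain JDK deserialization: the server's class loader is asked for a class it does not have. Here is a minimal sketch of that mechanism using only the JDK, with no Ignite involved. All names in it (MissingClassDemo, Interceptor, LoaderAwareInputStream, canUnmarshal) are hypothetical and exist only for illustration; Interceptor stands in for a class like VersionedInterceptor, and the parent of the system class loader plays the role of a node that never received the JAR.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;

class MissingClassDemo {
    /** Hypothetical stand-in for a user class referenced from a cache configuration. */
    static class Interceptor implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    /** Resolves classes through one explicit class loader (an assumption made for this demo). */
    static class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader ldr;

        LoaderAwareInputStream(InputStream in, ClassLoader ldr) throws IOException {
            super(in);
            this.ldr = ldr;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // Throws ClassNotFoundException when ldr cannot see the class.
            return Class.forName(desc.getName(), false, ldr);
        }
    }

    /** Serializes an object with standard JDK serialization. */
    static byte[] marshal(Object obj) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Returns true if the bytes can be deserialized using only classes visible to ldr. */
    static boolean canUnmarshal(byte[] bytes, ClassLoader ldr) {
        try (ObjectInputStream in = new LoaderAwareInputStream(new ByteArrayInputStream(bytes), ldr)) {
            in.readObject();
            return true;
        } catch (ClassNotFoundException e) {
            return false; // the receiving side does not have the class
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = marshal(new Interceptor());
        // The loader that defined this demo can see Interceptor.
        ClassLoader sender = MissingClassDemo.class.getClassLoader();
        // The parent of the system loader only knows JDK classes.
        ClassLoader receiver = ClassLoader.getSystemClassLoader().getParent();
        System.out.println("sender side:   " + canUnmarshal(bytes, sender));
        System.out.println("receiver side: " + canUnmarshal(bytes, receiver));
    }
}
```

In this sketch the only way to make the receiving side succeed is to make the class visible to its loader, which mirrors the advice above to deploy the class on every node.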