Redis 3.2.3 crashed by signal: 11

We have a big Redis node in a cluster and are resharding it to other nodes. While doing that, we repeatedly hit the error reported below, which forced us to kill the instance manually with kill -9 and reload it.
In normal operation the node has been working for months without any problems. The only special thing about it is that there are some very big keys to migrate; for those we usually expect a timeout during migration, but we had never before seen this error, which forces us to restart the node.
resharding log:
Moving slot 7929 from <source-node>:<port> to <target-node>:<port>: .................$
Moving slot 7930 from <source-node>:<port> to <target-node>:<port>: .................$
Moving slot 7931 from <source-node>:<port> to <target-node>:<port>: .................$
Moving slot 7932 from <source-node>:<port> to <target-node>:<port>: .................$
[ERR] Calling MIGRATE: Connection timed out
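The reshard log above can be post-processed to identify which slot was in flight when MIGRATE timed out. A minimal sketch (hypothetical helper, pure Python; assumes the log format shown above):

```python
import re

def last_slot_in_flight(log_lines):
    """Scan a reshard log; return the slot number named by the last
    'Moving slot' line seen before an [ERR] entry, or None if no error."""
    slot = None
    for line in log_lines:
        m = re.match(r"Moving slot (\d+) ", line)
        if m:
            slot = int(m.group(1))
        elif line.startswith("[ERR]"):
            return slot
    return None
```

Knowing the failing slot makes it easier to inspect that slot's keys for oversized values before retrying the migration.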
redis log:
=== REDIS BUG REPORT START: Cut & paste starting from here ===
29395:M 31 Jan 22:16:03.248 # Redis 3.2.3 crashed by signal: 11
29395:M 31 Jan 22:16:03.248 # Crashed running the instuction at: 0x468d4d
29395:M 31 Jan 22:16:03.248 # Accessing address: (nil)
29395:M 31 Jan 22:16:03.248 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
EIP:
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](migrateCloseSocket+0x5d)[0x468d4d]
Backtrace:
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](logStackTrace+0x29)[0x45e699]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](sigsegvHandler+0xaa)[0x45ebca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7ffec32ce340]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](migrateCloseSocket+0x5d)[0x468d4d]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](migrateCommand+0x6d6)[0x469526]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](call+0x85)[0x426fb5]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](processCommand+0x367)[0x42a0e7]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](processInputBuffer+0x105)[0x436d05]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](aeProcessEvents+0x218)[0x421428]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](aeMain+0x2b)[0x4216db]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster](main+0x410)[0x41e690]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7ffec2f19ec5]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:7010 [cluster][0x41e902]
------ INFO OUTPUT ------
# Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4992f89db2d932d
redis_mode:cluster
os:Linux 3.13.0-37-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.2
process_id:29395
run_id:ea2b51c12beba3394b49febc23e62ddda85ddf7d
tcp_port:7010
uptime_in_seconds:218901
uptime_in_days:2
hz:10
lru_clock:9505952
executable:/home/cybranding/redis-3.2.3/redis-stable/src/redis-server
config_file:/etc/redis_cluster_client2/redis-3.2.3/7010/redis.conf
# Clients
connected_clients:7
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:252229967880
used_memory_human:234.91G
used_memory_rss:226196897792
used_memory_rss_human:210.66G
used_memory_peak:261549868192
used_memory_peak_human:243.59G
total_system_memory:405738954752
total_system_memory_human:377.87G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.90
mem_allocator:jemalloc-4.0.3
# Persistence
loading:0
rdb_changes_since_last_save:2441620072
rdb_bgsave_in_progress:0
rdb_last_save_time:1485682059
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:78198652011
aof_base_size:71141120009
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
# Stats
total_connections_received:50546
total_commands_processed:56092302
instantaneous_ops_per_sec:470
total_net_input_bytes:8820578118
total_net_output_bytes:15465464075
instantaneous_input_kbps:65.53
instantaneous_output_kbps:39.50
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:70160
evicted_keys:0
keyspace_hits:25360054
keyspace_misses:16707970
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:1
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:17440.44
used_cpu_user:20364.37
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
# Commandstats
cmdstat_get:calls=3013601,usec=69399477,usec_per_call=23.03
cmdstat_setex:calls=93465,usec=29887340,usec_per_call=319.77
cmdstat_incr:calls=6164797,usec=48002289,usec_per_call=7.79
cmdstat_sadd:calls=6404,usec=171138,usec_per_call=26.72
cmdstat_sismember:calls=1920066,usec=57777586,usec_per_call=30.09
cmdstat_zincrby:calls=29150890,usec=1841876323,usec_per_call=63.18
cmdstat_zrevrange:calls=298356,usec=100612284,usec_per_call=337.22
cmdstat_zscore:calls=2641113,usec=185788620,usec_per_call=70.34
cmdstat_expire:calls=10839539,usec=70516036,usec_per_call=6.51
cmdstat_ping:calls=69,usec=114,usec_per_call=1.65
cmdstat_info:calls=50357,usec=8779396,usec_per_call=174.34
cmdstat_cluster:calls=1034173,usec=262678023,usec_per_call=254.00
cmdstat_migrate:calls=879469,usec=6675930023,usec_per_call=7590.86
cmdstat_command:calls=3,usec=10340,usec_per_call=3446.67
# Cluster
cluster_enabled:1
# Keyspace
db0:keys=141374638,expires=30371,avg_ttl=45048480
hash_init_value: 1485278005
------ CLIENT LIST OUTPUT ------
id=50534 addr=127.0.0.1:43431 fd=10 name= age=828 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=zincrby
id=50535 addr=127.0.0.1:43432 fd=11 name= age=828 idle=7 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=sismember
id=50541 addr=127.0.0.1:58608 fd=16 name= age=346 idle=49 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=get
id=50108 addr=127.0.0.1:55292 fd=18 name= age=39041 idle=6 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=zrevrange
id=50546 addr=127.0.0.1:50654 fd=19 name= age=108 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=migrate
id=50543 addr=127.0.0.1:60246 fd=13 name= age=264 idle=264 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=info
id=50544 addr=127.0.0.1:54748 fd=17 name= age=165 idle=159 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=cluster
------ CURRENT CLIENT INFO ------
id=50546 addr=127.0.0.1:50654 fd=19 name= age=108 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=migrate
argv[0]: 'DEL'
argv[1]: 'nht.tw.urls.dec:쩐다'
------ REGISTERS ------
29395:M 31 Jan 22:16:03.283 #
RAX:00007fcc4d3f30e3 RBX:b8e0a2b8e097b8e0
RCX:0000000000000002 RDX:0000000000000001
RDI:00007fcc4d3f30f9 RSI:00000000004e7857
RBP:00007fdf418f6430 RSP:00007fff08e86490
R8 :0000000000000005 R9 :00007ffec2a0d5a0
R10:0000000000000002 R11:00007ffec2c00180
R12:0000000000000001 R13:00007fc0e9df1d40
R14:0000000000000002 R15:0000000000000000
RIP:0000000000468d4d EFL:0000000000010202
CSGSFS:0000000000000033
29395:M 31 Jan 22:16:03.283 # (00007fff08e8649f) -> 0000000000000058
29395:M 31 Jan 22:16:03.283 # (00007fff08e8649e) -> 0000000004af8cbe
29395:M 31 Jan 22:16:03.284 # (00007fff08e8649d) -> 0000000000000000
29395:M 31 Jan 22:16:03.284 # (00007fff08e8649c) -> 00007fc07adec0f0
29395:M 31 Jan 22:16:03.284 # (00007fff08e8649b) -> 0000000000000000
29395:M 31 Jan 22:16:03.284 # (00007fff08e8649a) -> 00007ffec39f5690
29395:M 31 Jan 22:16:03.284 # (00007fff08e86499) -> 0000000100000000
29395:M 31 Jan 22:16:03.284 # (00007fff08e86498) -> 00007fc000000002
29395:M 31 Jan 22:16:03.284 # (00007fff08e86497) -> 00007fc0ee22c610
29395:M 31 Jan 22:16:03.284 # (00007fff08e86496) -> 00007fec6ad43fa0
29395:M 31 Jan 22:16:03.284 # (00007fff08e86495) -> 00007fc00000000a
29395:M 31 Jan 22:16:03.284 # (00007fff08e86494) -> 00007fc000000000
29395:M 31 Jan 22:16:03.284 # (00007fff08e86493) -> 0000000000469526
29395:M 31 Jan 22:16:03.284 # (00007fff08e86492) -> 0000000000000001
29395:M 31 Jan 22:16:03.284 # (00007fff08e86491) -> 00007fc07adec0f0
29395:M 31 Jan 22:16:03.284 # (00007fff08e86490) -> 00007fc0e9df1d40
------ FAST MEMORY TEST ------
29395:M 31 Jan 22:16:03.284 # Bio thread for job type #0 terminated
29395:M 31 Jan 22:16:03.285 # Bio thread for job type #1 terminated
*** Preparing to test memory region 724000 (94208 bytes)
*** Preparing to test memory region 1907000 (135168 bytes)
*** Preparing to test memory region 7fc053400000 (268131368960 bytes)
*** Preparing to test memory region 7ffec13ff000 (8388608 bytes)
*** Preparing to test memory region 7ffec1c00000 (14680064 bytes)
*** Preparing to test memory region 7ffec2a00000 (4194304 bytes)
*** Preparing to test memory region 7ffec32b9000 (20480 bytes)
*** Preparing to test memory region 7ffec34d8000 (16384 bytes)
*** Preparing to test memory region 7ffec39f5000 (16384 bytes)
*** Preparing to test memory region 7ffec3a01000 (4096 bytes)
*** Preparing to test memory region 7ffec3a02000 (8192 bytes)
*** Preparing to test memory region 7ffec3a06000 (4096 bytes)

Related

Redis connexions problems, Right Configuration?

Hello, recently we have been having a lot of problems between our applications and the Redis/Sentinel cluster.
Is the configuration of these parameters the ideal one?
redis conf:
timeout 0
tcp-backlog 511
tcp-keepalive 300
Right now we have about ±300 active clients and 38442 total connections in the cluster. I can see that some connections have an age of 7-9 hours; is that normal?
Normally, when we encounter these problems, we restart the cluster and everything works fine again for an indeterminate period of time. But that is not a solution; the devs assure us that the connections are well managed by the multiplexer in the standard .NET library used to connect to Redis.
Any suggestions as to why we might be experiencing these problems?
Some examples of the output of client information (291 total lines):
id=26728 addr=111.222.333.444:51393 laddr=ipClusterRedis:6379 fd=83 name=OT-DEVELOPSERVER age=24767 idle=28 flags=P db=0 sub=4 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20496 events=r cmd=ping user=default redir=-1
id=27877 addr=222.333.444.555:1817 laddr=ipClusterRedis:6379 fd=256 name=translations-application-v6 age=13859 idle=37 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20496 events=r cmd=ping user=default redir=-1
id=26681 addr=333.444.555.666:51377 laddr=ipClusterRedis:6379 fd=36 name=OT-DEVELOPSERVER age=24767 idle=35 flags=P db=0 sub=2 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20496 events=r cmd=ping user=default redir=-1
id=27743 addr=444.555.666.777:59131 laddr=ipClusterRedis:6379 fd=11 name=offermanager-application-v3 age=22565 idle=29 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 argv-mem=0 obl=0 oll=0 omem=0 tot-mem=20504 events=r cmd=info user=default redir=-1
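One way to audit those long-lived connections is to post-process the CLIENT LIST output shown above. A minimal sketch (hypothetical helper; assumes the space-separated key=value format Redis emits):

```python
def stale_clients(client_list_text, max_idle=3600):
    """Parse CLIENT LIST output and return (name, age, idle) tuples for
    connections that have been idle longer than max_idle seconds."""
    out = []
    for line in client_list_text.strip().splitlines():
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        idle = int(fields.get("idle", 0))
        if idle > max_idle:
            out.append((fields.get("name", ""), int(fields.get("age", 0)), idle))
    return out
```

Grouping the results by `name` makes it easy to see which application holds the old connections; note that a high `age` alone is normal for pooled/multiplexed clients as long as `idle` stays low.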
Example of error taken from one application:
2022-06-15 10:24:52 ERROR Execution FetchedJobsWatcher is in the Failed state now due to an exception, execution will be retried no more than in 00:00:09 StackExchange.Redis.RedisTimeoutException: Timeout performing SMEMBERS (5000ms), next: ZRANGEBYSCORE pluginnamex:hangfireschedule, inst: 1, qu: 0, qs: 50, aw: False, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, serverEndpoint: IPredisCluster.myorganization:6379, mc: 1/1/0, mgr: 10 of 10 available, clientName: OT-TESTSERVER, IOCP: (Busy=0,Free=1000,Min=6,Max=1000), WORKER: (Busy=1,Free=32766,Min=6,Max=32767), v: 2.2.88.56325 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2886
at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
at Hangfire.Redis.FetchedJobsWatcher.Execute(CancellationToken cancellationToken)
at Hangfire.Processing.BackgroundExecution.Run(Action`2 callback, Object state)

How to debug Redis when memory usage is over 20GB?

I am having a serious issue with Redis and cannot get my head around its root cause. I am running dockerised redis6-alpine, and it works well in some other setups with the exact same configuration. Any hints/guidance are appreciated! Here is a quick view of my configuration:
# Server
redis_version:6.0.14
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:e0ee1c530b6d150b
redis_mode:standalone
os:Linux 4.19.0-16-amd64 x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:10.2.1
process_id:1
run_id:79e32727ad7a6eb2a9427060ba44fa918c01de5a
tcp_port:6379
uptime_in_seconds:215118
uptime_in_days:2
hz:10
configured_hz:10
lru_clock:14634210
executable:/data/redis-server
config_file:
io_threads_active:0
# Clients
connected_clients:58
client_recent_max_input_buffer:131072
client_recent_max_output_buffer:786456
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
# Memory
used_memory:23814128424
used_memory_human:22.18G
used_memory_rss:25484832768
used_memory_rss_human:23.73G
used_memory_peak:26912783984
used_memory_peak_human:25.06G
used_memory_peak_perc:88.49%
used_memory_overhead:43319344
used_memory_startup:803160
used_memory_dataset:23770809080
used_memory_dataset_perc:99.82%
allocator_allocated:23814697320
allocator_active:25313169408
allocator_resident:25528991744
total_system_memory:135186935808
total_system_memory_human:125.90G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.06
allocator_frag_bytes:1498472088
allocator_rss_ratio:1.01
allocator_rss_bytes:215822336
rss_overhead_ratio:1.00
rss_overhead_bytes:-44158976
mem_fragmentation_ratio:1.07
mem_fragmentation_bytes:1670633120
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:1345144
mem_aof_buffer:8
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0
# Persistence
loading:0
rdb_changes_since_last_save:193999866
rdb_bgsave_in_progress:0
rdb_last_save_time:1625031828
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:167
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:565936128
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:18283706329
aof_base_size:18280419536
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
# Stats
total_connections_received:5857404
total_commands_processed:518486668
instantaneous_ops_per_sec:1821
total_net_input_bytes:177162832735
total_net_output_bytes:8187427886085
instantaneous_input_kbps:432.05
instantaneous_output_kbps:53204.55
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:30957
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:21298
evicted_keys:0
keyspace_hits:372132606
keyspace_misses:10318752
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:593467
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:4
total_reads_processed:394752846
total_writes_processed:382030652
io_threaded_reads_processed:0
io_threaded_writes_processed:0
# Replication
role:master
connected_slaves:0
master_replid:2a7e6a090bdaa0474ee666c6c6b555e0c1c3942a
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:12596.074365
used_cpu_user:6090.006186
used_cpu_sys_children:164.531584
used_cpu_user_children:1229.917070
# Modules
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=134616,expires=96904,avg_ttl=777017159
db1:keys=348560,expires=332665,avg_ttl=63989490
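For a first pass at debugging, the INFO dump above already contains the numbers needed to compute the fragmentation ratio by hand. A minimal sketch of that arithmetic (hypothetical helper names):

```python
def parse_info(info_text):
    """Turn the 'key:value' lines of INFO output into a dict,
    skipping blank lines and '# Section' headers."""
    d = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        d[key] = value
    return d

def frag_ratio(info):
    """RSS over logical usage: values well above 1.0 suggest fragmentation,
    values below 1.0 suggest the OS has swapped or reclaimed pages."""
    return int(info["used_memory_rss"]) / int(info["used_memory"])
```

For the numbers above, 25484832768 / 23814128424 ≈ 1.07, matching the reported mem_fragmentation_ratio — so the ~22 GB here is genuine dataset growth rather than fragmentation.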

redis appendonly.aof file size

Ignite issue: when a node in the cluster becomes unstable, new nodes are unable to join the cluster and hang indefinitely

Hi, I am facing a critical issue with Ignite on our production servers. We have 2 instances with heap sizes of 8 GB each. Sometimes, due to a long GC pause or a network issue, one of our instances gets stopped. This causes AWS auto-scaling to kick in and bring another instance up. That is fine, but we have observed that in this state the grid becomes unstable and our new Ignite instances are never able to join the topology; they hang forever, causing new autoscaled instances to come up again and again. The workaround is to restart the other instances in the cluster, as doing so causes nodes to join again. But ideally, in a prod environment, this should happen automatically with auto-scaling.
We had also added a longer failure-detection timeout, but that doesn't solve it completely and we still observe this sometimes.
The logs observed on the new instances that do not come up are below. The Ignite version used is 2.4, and off-heap mode is used for partitioned caches. Our grid is set up using TCP discovery with an S3 bucket.
I have some transactional caches as well which do locking based on tryLocks.
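The tryLock pattern mentioned above — acquiring with a bound instead of blocking forever — is one way to keep lock holders from stalling a partition-map exchange indefinitely. A language-agnostic sketch of the pattern, shown here in Python (Ignite's `cache.lock(key)` returns a `java.util.concurrent.locks.Lock` with equivalent `tryLock` semantics):

```python
import threading

def with_try_lock(lock, timeout, action):
    """Bounded lock acquisition: run `action` only if the lock is obtained
    within `timeout` seconds; otherwise return None so the caller can back
    off and retry instead of holding up the topology exchange forever."""
    if not lock.acquire(timeout=timeout):
        return None
    try:
        return action()
    finally:
        lock.release()
```

The key point is that a failed acquisition surfaces to the caller (here as None) rather than leaving a thread parked on the lock while the grid waits for a partition release.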
evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]]
2018-07-18 16:34:10.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
2018-07-18 16:34:20.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
2018-07-18 16:34:20.534 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Ready affinity version: AffinityTopologyVersion [topVer=-1, minorTopVer=0]
2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Last exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=247159314], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], DataStreamerReleaseFuture 
[topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]]]], exchActions=ExchangeActions [startCaches=null, stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], affChangeMsg=null, initTs=1531931329576, centralizedAff=false, changeGlobalStateE=null, done=false, state=SRV, evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]
2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.a.i.i.p.c.GridCachePartitionExchangeManager - First 10 pending exchange futures [total=0]
2018-07-18 16:34:20.535 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Last 10 exchange futures (total: 1):
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - >>> GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931660255, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], done=false]
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending transactions:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending explicit locks:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending cache futures:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending atomic cache futures:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending data streamer futures:
2018-07-18 16:34:20.536 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Pending transaction deadlock detection futures:
2018-07-18 16:34:20.547 UTC [FDPS] [grid-nio-worker-tcp-comm-3-#28%fdps%] [INFO ] [,] o.apache.ignite.internal.diagnostic - Exchange future waiting for coordinator response [crd=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0]]
Remote node information:
General node info [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, client=false, discoTopVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], time=12:34:20.537]
Partitions exchange info [readyVer=AffinityTopologyVersion [topVer=29, minorTopVer=0]]
Last initialized exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=30, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531930706210], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931660254, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=30, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62, addrs=[10.83.85.122, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-85-122.ec2.internal/10.83.85.122:47500], discPort=47500, order=30, intOrder=16, lastExchangeTime=1531930705943, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531930706210], nodeId=ba6aba6c, evt=NODE_JOINED], added=true, 
initFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1921954756], init=false, lastVer=GridCacheVersion [topVer=0, order=1531930704443, nodeOrder=0], partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935479, nodeOrder=26], threadId=39726, id=559000, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=221, val=49583853497448469294730566354366524577617095530402283666, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547787212113, nodeOrder=26], threadId=39741, id=603904, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=288, val=49583853499611641578988037213538229804531966271996035234, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935487, 
nodeOrder=26], threadId=39740, id=558993, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=133, val=49583853497448469294730566354417299462040910024459419794, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935323, nodeOrder=26], threadId=39728, id=558949, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=1023, val=49583853497448469294730566353278491339963927967496667282, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935470, nodeOrder=26], threadId=39951, id=559009, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=556, val=49583853497448469294730566354226289182541798339977937042, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate 
[nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935497, nodeOrder=26], threadId=39683, id=558982, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=373, val=49583853497448469294730566354541818821461216966893109394, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935339, nodeOrder=26], threadId=39682, id=558941, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=156, val=49583853497448469294730566353353444740780034976328450194, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935358, nodeOrder=26], threadId=39936, id=558921, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=59, val=49583853497448469294730566353578304943228356208982229138, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan 
[topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandida... and 48550 skipped ...ead=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935486, nodeOrder=26], threadId=39894, id=558992, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=488, val=49583853497448469294730566354434224423515514832905306258, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]], ExplicitLockSpan [topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], firstCand=GridCacheMvccCandidate [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, ver=GridCacheVersion [topVer=141782290, order=1547786935331, nodeOrder=26], threadId=39893, id=558948, topVer=AffinityTopologyVersion [topVer=29, minorTopVer=0], reentry=null, otherNodeId=null, otherVer=null, mappedDhtNodes=null, mappedNearNodes=null, ownerVer=null, serOrder=null, key=KeyCacheObjectImpl [part=570, val=49583853497448469294730566353289371672340459630069022866, hasValBytes=false], masks=local=1|owner=0|ready=0|reentry=0|used=0|tx=0|single_implicit=0|dht_local=0|near_local=0|removed=0|read=0, prevVer=null, nextVer=null]]]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=30, minorTopVer=0], futures=[]]]], exchActions=null, affChangeMsg=null, initTs=1531930706210, centralizedAff=false, changeGlobalStateE=null, done=false, state=CRD, evtLatch=0, 
remaining=[ba6aba6c-7f5d-41bf-bfcc-5eefcad36b62], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=325602672]]
Communication SPI statistics [rmtNode=7d5e83aa-736a-4190-8b64-7261db7382f6]
Communication SPI recovery descriptors:
[key=ConnectionKey [nodeId=7d5e83aa-736a-4190-8b64-7261db7382f6, idx=0, connCnt=0], msgsSent=5, msgsAckedByRmt=0, msgsRcvd=7, lastAcked=0, reserveCnt=1, descIdHash=1972345954]
Communication SPI clients:
[node=7d5e83aa-736a-4190-8b64-7261db7382f6, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=3, bytesRcvd=5740, bytesSent=77322, bytesRcvd0=853, bytesSent0=0, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-3, igniteInstanceName=fdps, finished=false, hashCode=2068348067, interrupted=false, runner=grid-nio-worker-tcp-comm-3-#28%fdps%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=7, sentCnt=5, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=true, connectCnt=0, queueLimit=262144, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=7, sentCnt=5, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=true, connectCnt=0, queueLimit=262144, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.83.87.131:47100, rmtAddr=/10.83.89.183:34664, createTime=1531931330498, closeTime=0, bytesSent=77322, bytesRcvd=5740, bytesSent0=0, bytesRcvd0=853, sndSchedTime=1531931330498, lastSndTime=1531931500547, lastRcvTime=1531931660527, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser#665c2413, directMode=true], GridConnectionBytesVerifyFilter], accepted=true]], super=GridAbstractCommunicationClient [lastUsed=1531931330508, closed=false, connIdx=0]]]
NIO sessions statistics:
>> Selector info [idx=3, keysCnt=1, bytesRcvd=5740, bytesRcvd0=853, bytesSent=77322, bytesSent0=0]
Connection info [in=true, rmtAddr=/10.83.89.183:34664, locAddr=/10.83.87.131:47100, msgsSent=5, msgsAckedByRmt=0, descIdHash=1972345954, unackedMsgs=[IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage], msgsRcvd=7, lastAcked=0, descIdHash=1972345954, bytesRcvd=5740, bytesRcvd0=853, bytesSent=77322, bytesSent0=0, opQueueSize=0]
Exchange future: GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531931329402], crd=null, exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=a450db0b, msg=Node joined: TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931329178, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], type=NODE_JOINED, tstamp=1531931329402], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=980776600], init=false, lastVer=GridCacheVersion [topVer=0, order=1531931327875, nodeOrder=0], partReleaseFut=null, exchActions=null, affChangeMsg=null, initTs=0, centralizedAff=false, changeGlobalStateE=null, done=false, state=null, evtLatch=0, remaining=[], 
super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=2138568466]]
Local communication statistics:
Communication SPI statistics [rmtNode=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9]
Communication SPI recovery descriptors:
[key=ConnectionKey [nodeId=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, idx=0, connCnt=-1], msgsSent=7, msgsAckedByRmt=0, msgsRcvd=6, lastAcked=0, reserveCnt=1, descIdHash=1891649612]
Communication SPI clients:
[node=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, client=GridTcpNioCommunicationClient [ses=GridSelectorNioSessionImpl [worker=DirectNioClientWorker [super=AbstractNioClientWorker [idx=0, bytesRcvd=92833, bytesSent=5698, bytesRcvd0=15539, bytesSent0=853, select=true, super=GridWorker [name=grid-nio-worker-tcp-comm-0, igniteInstanceName=fdps, finished=false, hashCode=2040212682, interrupted=false, runner=grid-nio-worker-tcp-comm-0-#25%fdps%]]], writeBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], readBuf=java.nio.DirectByteBuffer[pos=0 lim=32768 cap=32768], inRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=6, sentCnt=7, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=262144, reserveCnt=1, pairedConnections=false], outRecovery=GridNioRecoveryDescriptor [acked=0, resendCnt=0, rcvCnt=6, sentCnt=7, reserved=true, lastAck=0, nodeLeft=false, node=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], connected=false, connectCnt=1, queueLimit=262144, reserveCnt=1, pairedConnections=false], super=GridNioSessionImpl [locAddr=/10.83.89.183:34664, rmtAddr=ip-10-83-87-131.ec2.internal/10.83.87.131:47100, createTime=1531931330468, closeTime=0, bytesSent=5698, bytesRcvd=92833, bytesSent0=853, bytesRcvd0=15539, sndSchedTime=1531931330468, lastSndTime=1531931660528, lastRcvTime=1531931660538, readsPaused=false, filterChain=FilterChain[filters=[GridNioCodecFilter 
[parser=org.apache.ignite.internal.util.nio.GridDirectParser#72024a61, directMode=true], GridConnectionBytesVerifyFilter], accepted=false]], super=GridAbstractCommunicationClient [lastUsed=1531931330468, closed=false, connIdx=0]]]
NIO sessions statistics:
>> Selector info [idx=0, keysCnt=1, bytesRcvd=92833, bytesRcvd0=15539, bytesSent=5698, bytesSent0=853]
Connection info [in=false, rmtAddr=ip-10-83-87-131.ec2.internal/10.83.87.131:47100, locAddr=/10.83.89.183:34664, msgsSent=7, msgsAckedByRmt=0, descIdHash=1891649612, unackedMsgs=[GridDhtPartitionsSingleMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage, IgniteDiagnosticMessage], msgsRcvd=6, lastAcked=0, descIdHash=1891649612, bytesRcvd=92833, bytesRcvd0=15539, bytesSent=5698, bytesSent0=853, opQueueSize=0]
2018-07-18 16:34:29.598 UTC [FDPS] [localhost-startStop-1] [WARN ] [,] o.a.i.i.p.c.GridCachePartitionExchangeManager - Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931669507, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], crd=TcpDiscoveryNode [id=a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9, addrs=[10.83.87.131, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-87-131.ec2.internal/10.83.87.131:47500], discPort=47500, order=26, intOrder=14, lastExchangeTime=1531931329258, loc=false, ver=2.4.0#20180305-sha1:aa342270, isClient=false], exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=7d5e83aa-736a-4190-8b64-7261db7382f6, addrs=[10.83.89.183, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ip-10-83-89-183.ec2.internal/10.83.89.183:47500], discPort=47500, order=32, intOrder=17, lastExchangeTime=1531931669507, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=false], topVer=32, nodeId8=7d5e83aa, msg=null, type=NODE_JOINED, tstamp=1531931329481], nodeId=7d5e83aa, evt=NODE_JOINED], added=true, initFut=GridFutureAdapter [ignoreInterrupts=false, state=DONE, res=true, hash=247159314], init=true, lastVer=null, partReleaseFut=PartitionReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[ExplicitLockReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], TxReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]], AtomicUpdateReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], 
futures=[]], DataStreamerReleaseFuture [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], futures=[]]]], exchActions=ExchangeActions [startCaches=null, stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], affChangeMsg=null, initTs=1531931329576, centralizedAff=false, changeGlobalStateE=null, done=false, state=SRV, evtLatch=0, remaining=[a450db0b-ce86-4f0b-a34b-a2f9c83bb3d9], super=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1272213534]]]
2018-07-18 16:34:30.537 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
2018-07-18 16:34:40.537 UTC [FDPS] [exchange-worker-#35%fdps%] [WARN ] [,] o.apache.ignite.internal.diagnostic - Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=32, minorTopVer=0], node=7d5e83aa-736a-4190-8b64-7261db7382f6]. Dumping pending objects that might be the cause:
Info about the other node 10-83-85-122
The other joining node never got started and was stuck in the Ignite start phase. The logs also don't show the node coming up or IP discovery kicking in, which eventually caused the node to be removed via autoscaling.
Transactional errors received
javax.cache.CacheException: Failed to acquire lock for keys (primary node left grid, retry transaction if possible) [keys=[UserKeyCacheObjectImpl [part=281,
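The exception message itself suggests the remedy: retry the transaction when the primary node has left the grid. As a rough illustration of bounded retries (pure Java, no Ignite dependency; `RetryUtil` and its policy are hypothetical helpers, not an Ignite API — in a real client the `Callable` body would hold the transaction and the caught type would be `javax.cache.CacheException`):

```java
import java.util.concurrent.Callable;

class RetryUtil {
    /**
     * Runs op up to maxAttempts times, returning its result on the first
     * success and rethrowing the last failure if every attempt fails.
     */
    public static <T> T withRetries(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1)
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            }
            catch (Exception e) {
                // Primary node left; topology may settle before the next attempt.
                last = e;
            }
        }
        throw last;
    }
}
```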
Partition map exchange is the process by which nodes exchange information about where each piece of data is stored. It happens every time the topology changes.
Every node sends a GridDhtPartitionsSingleMessage to the coordinator. Once the coordinator has collected all such messages, it sends a GridDhtPartitionsFullMessage back to the other nodes. These messages are sent over the communication SPI.
If a non-coordinator node doesn't send its SingleMessage to the coordinator, or the coordinator doesn't send the FullMessage, the "Failed to wait for partition map exchange" error occurs.
Judging by the piece of log you provided, the node with ID=ba6aba6c didn't send its SingleMessage to the coordinator. That may mean the communication SPI isn't working properly there. Make sure the ports required by the communication SPI are available; usually that's the range 47100..47200.
The joining node may also be stuck on something. Look at its log to figure out what is happening there.
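If a firewall only opens part of that range, the communication SPI's port window can be pinned explicitly in the node configuration. A minimal Spring XML sketch, assuming the standard IgniteConfiguration bean layout (`localPort` and `localPortRange` are the public TcpCommunicationSpi properties):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <property name="communicationSpi">
    <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
      <!-- First port the communication SPI tries to bind to. -->
      <property name="localPort" value="47100"/>
      <!-- Ports to try after localPort is busy, i.e. 47100..47200. -->
      <property name="localPortRange" value="100"/>
    </bean>
  </property>
</bean>
```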

Ignite Cluster getting stuck when new node Join or release

I have a 3-node cluster with 20+ clients, running in a Spark context. Initially it works fine, but randomly the cluster becomes inoperative whenever a new node (i.e. a client) tries to connect. I got the following logs when it was stuck. If I explicitly restart any Ignite server, the cluster is released and works fine again. I'm using Ignite 2.4.0; the same issue is reproduced in Ignite 2.5.0 too.
client side Logs
Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]
Failed to wait for partition map exchange [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], node=4d885cfd-45ed-43a2-8088-f35c9469797f]. Dumping pending objects that might be the cause:
GridDhtPartitionsExchangeFuture [topVer=AffinityTopologyVersion [topVer=44, minorTopVer=0], evt=NODE_JOINED, evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=[0:0:0:0:0:0:0:1%lo, 10.13.10.179, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, hdn6.mstorm.com/10.13.10.179:0], discPort=0, order=44, intOrder=0, lastExchangeTime=1527651620413, loc=true, ver=2.4.0#20180305-sha1:aa342270, isClient=true], done=false]
Failed to wait for initial partition map exchange. Possible reasons are:
^-- Transactions in deadlock.
^-- Long running transactions (ignore if this is the case).
^-- Unreleased explicit locks.
Still waiting for initial partition map exchange [fut=GridDhtPartitionsExchangeFuture [firstDiscoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=4d885cfd-45ed-43a2-8088-f35c9469797f, addrs=
Server Side Logs
Possible starvation in striped pool. Thread name: sys-stripe-0-#1 Queue: [Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareResponse [nearEvicted=null, futId=869dd4ca361-fe7e167d-4d80-4f57-b004-13359a9f2c11, miniId=1, super=GridDistributedTxPrepareResponse [txState=null, part=-1, err=null, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094903, nodeOrder=1], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=984, val=null, hasValBytes=true], val=BinaryObjectImpl [arr= true, ctx=false, start=0], prevVal=null, super=GridDhtAtomicAbstractUpdateRequest [onRes=false, nearNodeId=null, nearFutId=0, flags=]]]], o.a.i.i.processors.cache.distributed.dht.atomic.GridDhtAtomicCache$DeferredUpdateTimeout#2735c674, Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtTxPrepareRequest [nearNodeId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, futId=6576e4ca361-6e7cdac2-d5a3-4624-9ad3-b93f25546cc3, miniId=1, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], invalidateNearEntries={}, nearWrites=null, owned=null, nearXidVer=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], subjId=628e3078-17fd-4e49-b9ae-ad94ad97a2f1, taskNameHash=0, preloadKeys=null, super=GridDistributedTxPrepareRequest [threadId=86, concurrency=OPTIMISTIC, isolation=READ_COMMITTED, writeVer=GridCacheVersion [topVer=139084030, order=1527604094935, nodeOrder=2], timeout=0, reads=null, writes=[IgniteTxEntry [key=BinaryObjectImpl [arr= true, ctx=false, start=0], cacheId=-1755241537, txKey=null, val=[op=UPDATE, val=BinaryObjectImpl [arr= true, ctx=false, start=0]], 
prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=null, filtersPassed=false, filtersSet=false, entry=null, prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=null]], dhtVers=null, txSize=0, plc=2, txState=null, flags=onePhase|last, super=GridDistributedBaseMessage [ver=GridCacheVersion [topVer=139084030, order=1527604094933, nodeOrder=2], committedVers=null, rolledbackVers=null, cnt=0, super=GridCacheIdMessage [cacheId=0]]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=2, arr=[65774,65775]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, futId=49328, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNearAtomicAbstractUpdateRequest [res=null, flags=needRes]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[98591]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtAtomicDeferredUpdateResponse [futIds=GridLongList [idx=1, arr=[114926]]]]], Message closure [msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridNearAtomicSingleUpdateRequest [key=KeyCacheObjectImpl [part=1016, val=null, hasValBytes=true], parent=GridNearAtomicAbstractSingleUpdateRequest [nodeId=null, 
futId=32946, topVer=AffinityTopologyVersion [topVer=20, minorTopVer=0], parent=GridNear

org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling

Using Ignite 2.1, I start the first node in default server mode, with peer class loading enabled, from the command line. I see the following line in the logs:
When I start the second node (using IgniteSpringBean on a Tomcat server, in client mode), I get the following error, even though peer class loading is enabled:
org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling (make sure same versions of all classes are available on all nodes or enable peer-class-loading) [clsLdr=sun.misc.Launcher$AppClassLoader@18b4aac2,...
Visor tells me that both the server and the client node are in the topology and both have peer class loading enabled...
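For reference, peer class loading is a per-node flag that has to be set to the same value on every server and client in the topology; in the standard XML template it is the `peerClassLoadingEnabled` property of IgniteConfiguration (a sketch, not my actual config):

```xml
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
  <!-- Must match on every node, servers and clients alike. -->
  <property name="peerClassLoadingEnabled" value="true"/>
</bean>
```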
Server logs:
[vagrant@tw apache-ignite-fabric-2.1.0-bin]$ ./bin/ignite.sh ./config/example-default.xml -v
Ignite Command Line Startup, ver. 2.1.0#20170720-sha1:a6ca5c8a
2017 Copyright(C) Apache Software Foundation
[13:41:51,967][INFO][main][IgniteKernal]
>>>    __________  ________________
>>>   /  _/ ___/ |/ /  _/_  __/ __/
>>>  _/ // (7 7    // /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.1.0#20170720-sha1:a6ca5c8a
>>> 2017 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org
[13:41:51,967][INFO][main][IgniteKernal] Config URL: file:/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin/./config/example-default.xml
[13:41:51,968][INFO][main][IgniteKernal] Daemon mode: off
[13:41:51,968][INFO][main][IgniteKernal] OS: Linux 3.10.0-327.el7.x86_64 amd64
[13:41:51,968][INFO][main][IgniteKernal] OS user: vagrant
[13:41:51,968][INFO][main][IgniteKernal] PID: 8122
[13:41:51,968][INFO][main][IgniteKernal] Language runtime: Java Platform API Specification ver. 1.8
[13:41:51,968][INFO][main][IgniteKernal] VM information: Java(TM) SE Runtime Environment 1.8.0_60-b27 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.60-b23
[13:41:51,970][INFO][main][IgniteKernal] VM total memory: 0.97GB
[13:41:51,970][INFO][main][IgniteKernal] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49122, auth: off, ssl: off)]
[13:41:51,970][INFO][main][IgniteKernal] IGNITE_HOME=/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin
[13:41:51,971][INFO][main][IgniteKernal] VM arguments: [-Xms1g, -Xmx1g, -XX:+AggressiveOpts, -XX:MaxMetaspaceSize=256m, -DIGNITE_QUIET=false, -DIGNITE_SUCCESS_FILE=/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin/work/ignite_success_96df797d-5531-4b3e-b396-5f44cdc1470e, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=49122, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -DIGNITE_HOME=/home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin, -DIGNITE_PROG_NAME=./bin/ignite.sh]
[13:41:51,973][INFO][main][IgniteKernal] System cache's MemoryPolicy size is configured to 40 MB. Use MemoryConfiguration.systemCacheMemorySize property to change the setting.
[13:41:51,980][INFO][main][IgniteKernal] Configured caches [in 'sysMemPlc' memoryPolicy: ['ignite-sys-cache']]
[13:41:51,980][WARNING][main][IgniteKernal] Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
[13:41:52,002][INFO][main][IgniteKernal] 3-rd party licenses can be found at: /home/vagrant/ignite/apache-ignite-fabric-2.1.0-bin/libs/licenses
[13:41:52,077][INFO][main][IgnitePluginProcessor] Configured plugins:
[13:41:52,078][INFO][main][IgnitePluginProcessor] ^-- None
[13:41:52,078][INFO][main][IgnitePluginProcessor]
[13:41:52,138][INFO][main][TcpCommunicationSpi] Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
[13:41:52,150][WARNING][main][TcpCommunicationSpi] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
[13:41:52,169][WARNING][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[13:41:52,196][WARNING][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[13:41:52,197][INFO][main][IgniteKernal] Security status [authentication=off, tls/ssl=off]
[13:41:52,516][INFO][main][SqlListenerProcessor] SQL connector processor has started on TCP port 10800
[13:41:52,550][INFO][main][GridTcpRestProtocol] Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
[13:41:52,593][INFO][main][IgniteKernal] Non-loopback local IPs: 10.0.10.103, 10.0.2.15, fe80:0:0:0:a00:27ff:fe51:d0d8%eth0, fe80:0:0:0:a00:27ff:fee7:1d4f%eth1
[13:41:52,593][INFO][main][IgniteKernal] Enabled local MACs: 08002751D0D8, 080027E71D4F
[13:41:52,637][INFO][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=2a929c01-f8a6-4b14-9857-88eaa2b58a87]
[13:41:54,030][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], crd=true, evt=10, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328114016, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328114016, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=null]
[13:41:54,042][WARNING][exchange-worker-#28%null%][IgniteCacheDatabaseSharedManager] No user-defined default MemoryPolicy found; system default of 1GB size will be used.
[13:41:54,299][INFO][exchange-worker-#28%null%][GridCacheProcessor] Started cache [name=ignite-sys-cache, memoryPolicyName=sysMemPlc, mode=REPLICATED, atomicity=TRANSACTIONAL]
[13:41:54,302][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], waitTime=0ms]
[13:41:54,333][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], time=0ms]
[13:41:54,347][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=12, minorTopVer=0], crd=true]
[13:41:54,350][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=12, minorTopVer=0], evt=NODE_JOINED, node=2a929c01-f8a6-4b14-9857-88eaa2b58a87]
[13:41:54,450][INFO][main][IgniteKernal] Performance suggestions for grid (fix if possible)
[13:41:54,451][INFO][main][IgniteKernal] To disable, set -DIGNITE_PERFORMANCE_SUGGESTIONS_DISABLED=true
[13:41:54,451][INFO][main][IgniteKernal] ^-- Disable grid events (remove 'includeEventTypes' from configuration)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Enable G1 Garbage Collector (add '-XX:+UseG1GC' to JVM options)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Set max direct memory size if getting 'OOME: Direct buffer memory' (add '-XX:MaxDirectMemorySize=<size>[g|G|m|M|k|K]' to JVM options)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Disable processing of calls to System.gc() (add '-XX:+DisableExplicitGC' to JVM options)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Speed up flushing of dirty pages by OS (alter vm.dirty_expire_centisecs parameter by setting to 500)
[13:41:54,451][INFO][main][IgniteKernal] ^-- Reduce pages swapping ratio (set vm.swappiness=10)
[13:41:54,451][INFO][main][IgniteKernal] Refer to this page for more performance suggestions: https://apacheignite.readme.io/docs/jvm-and-system-tuning
[13:41:54,451][INFO][main][IgniteKernal]
[13:41:54,451][INFO][main][IgniteKernal] To start Console Management & Monitoring run ignitevisorcmd.{sh|bat}
[13:41:54,451][INFO][main][IgniteKernal]
[13:41:54,459][INFO][main][IgniteKernal]
>>> +----------------------------------------------------------------------+
>>> Ignite ver. 2.1.0#20170720-sha1:a6ca5c8a97e9a4c9d73d40ce76d1504c14ba1940
>>> +----------------------------------------------------------------------+
>>> OS name: Linux 3.10.0-327.el7.x86_64 amd64
>>> CPU(s): 1
>>> Heap: 1.0GB
>>> VM name: 8122#tw.dna.com
>>> Local node [ID=2A929C01-F8A6-4B14-9857-88EAA2B58A87, order=12, clientMode=false]
>>> Local node addresses: [10.0.10.103/0:0:0:0:0:0:0:1%lo, 10.0.2.15/10.0.10.103, /10.0.2.15, /127.0.0.1]
>>> Local ports: TCP:10800 TCP:11211 TCP:47100 UDP:47400 TCP:47500
[13:41:54,462][INFO][main][GridDiscoveryManager] Topology snapshot [ver=12, servers=1, clients=0, CPUs=1, heap=1.0GB]
[13:42:54,444][INFO][grid-timeout-worker-#15%null%][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
^-- Node [id=2a929c01, name=null, uptime=00:01:00:007]
^-- H/N/C [hosts=1, nodes=1, CPUs=1]
^-- CPU [cur=2.33%, avg=1.57%, GC=0%]
^-- PageMemory [pages=200]
^-- Heap [used=107MB, free=89.12%, comm=989MB]
^-- Non heap [used=36MB, free=97.59%, comm=37MB]
^-- Public thread pool [active=0, idle=0, qSize=0]
^-- System thread pool [active=0, idle=6, qSize=0]
^-- Outbound messages queue [size=0]
[13:43:46,444][INFO][disco-event-worker-#27%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=c8c42745-f838-48ea-9145-5783a6f77681, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.101, 10.0.2.15, 127.0.0.1], sockAddrs=[/0:0:0:0:0:0:0:1%lo:0, /127.0.0.1:0, /10.0.10.101:0, /10.0.2.15:0], discPort=0, order=13, intOrder=8, lastExchangeTime=1505328226398, loc=false, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=true]
[13:43:46,446][INFO][disco-event-worker-#27%null%][GridDiscoveryManager] Topology snapshot [ver=13, servers=1, clients=1, CPUs=2, heap=3.0GB]
[13:43:46,448][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=true, evt=10, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328226435, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328226435, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=null]
[13:43:46,448][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], time=0ms]
[13:43:46,449][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=0], crd=true]
[13:43:46,449][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=0], evt=NODE_JOINED, node=c8c42745-f838-48ea-9145-5783a6f77681]
[13:43:47,121][INFO][grid-nio-worker-tcp-comm-0-#17%null%][TcpCommunicationSpi] Accepted incoming communication connection [locAddr=/10.0.10.103:47100, rmtAddr=/10.0.10.101:54857]
[13:43:47,357][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=true, evt=18, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227343, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227343, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=DynamicCacheChangeBatch [id=dcedd8c7e51-9d6cee64-90a5-4c0b-a1ed-b4c7a1697bfb, reqs=[DynamicCacheChangeRequest [cacheName=ignite-sys-atomic-cache#dna-EVENT_DELIVERY_SET, hasCfg=true, nodeId=c8c42745-f838-48ea-9145-5783a6f77681, clientStartOnly=false, stop=false, destroy=false]], exchangeActions=ExchangeActions [startCaches=[ignite-sys-atomic-cache#dna-EVENT_DELIVERY_SET], stopCaches=null, startGrps=[dna-EVENT_DELIVERY_SET], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false]]
[13:43:47,378][INFO][exchange-worker-#28%null%][GridCacheProcessor] Started cache [name=ignite-sys-atomic-cache#dna-EVENT_DELIVERY_SET, group=dna-EVENT_DELIVERY_SET, memoryPolicyName=default, mode=PARTITIONED, atomicity=TRANSACTIONAL]
[13:43:47,379][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], waitTime=0ms]
[13:43:47,496][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], time=0ms]
[13:43:47,512][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=1], crd=true]
[13:43:47,515][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=1], evt=DISCOVERY_CUSTOM_EVT, node=c8c42745-f838-48ea-9145-5783a6f77681]
[13:43:47,558][INFO][exchange-worker-#28%null%][time] Started exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], crd=true, evt=18, node=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227557, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], evtNode=TcpDiscoveryNode [id=2a929c01-f8a6-4b14-9857-88eaa2b58a87, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.10.103, 10.0.2.15, 127.0.0.1], sockAddrs=[/10.0.10.103:47500, /10.0.2.15:47500, /0:0:0:0:0:0:0:1%lo:47500, /127.0.0.1:47500], discPort=47500, order=12, intOrder=7, lastExchangeTime=1505328227557, loc=true, ver=2.1.0#20170720-sha1:a6ca5c8a, isClient=false], customEvt=DynamicCacheChangeBatch [id=3dedd8c7e51-9d6cee64-90a5-4c0b-a1ed-b4c7a1697bfb, reqs=[DynamicCacheChangeRequest [cacheName=datastructures_ATOMIC_PARTITIONED_0#dna-EVENT_DELIVERY_SET, hasCfg=true, nodeId=c8c42745-f838-48ea-9145-5783a6f77681, clientStartOnly=false, stop=false, destroy=false]], exchangeActions=ExchangeActions [startCaches=[datastructures_ATOMIC_PARTITIONED_0#dna-EVENT_DELIVERY_SET], stopCaches=null, startGrps=[], stopGrps=[], resetParts=null, stateChangeRequest=null], startCaches=false]]
[13:43:47,597][INFO][exchange-worker-#28%null%][GridCacheProcessor] Started cache [name=datastructures_ATOMIC_PARTITIONED_0#dna-EVENT_DELIVERY_SET, group=dna-EVENT_DELIVERY_SET, memoryPolicyName=default, mode=PARTITIONED, atomicity=ATOMIC]
[13:43:47,597][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Finished waiting for partition release future [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], waitTime=0ms]
[13:43:47,623][INFO][exchange-worker-#28%null%][GridDhtPartitionsExchangeFuture] Snapshot initialization completed [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], time=0ms]
[13:43:47,625][INFO][exchange-worker-#28%null%][time] Finished exchange init [topVer=AffinityTopologyVersion [topVer=13, minorTopVer=2], crd=true]
[13:43:47,626][INFO][exchange-worker-#28%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=13, minorTopVer=2], evt=DISCOVERY_CUSTOM_EVT, node=c8c42745-f838-48ea-9145-5783a6f77681]
[13:43:47,915][SEVERE][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Failed to unmarshal discovery custom message.
class org.apache.ignite.IgniteCheckedException: Failed to find class with given class loader for unmarshalling (make sure same versions of all classes are available on all nodes or enable peer-class-loading) [clsLdr=sun.misc.Launcher$AppClassLoader#18b4aac2, cls=scan.fragment.node.ignite.VersionedInterceptor]
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:124)
at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:143)
at org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:82)
at org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9733)
at org.apache.ignite.spi.discovery.tcp.messages.TcpDiscoveryCustomEventMessage.message(TcpDiscoveryCustomEventMessage.java:81)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.notifyDiscoveryListener(ServerImpl.java:5436)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processCustomMessage(ServerImpl.java:5321)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2629)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.processMessage(ServerImpl.java:2420)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$MessageWorkerAdapter.body(ServerImpl.java:6576)
at org.apache.ignite.spi.discovery.tcp.ServerImpl$RingMessageWorker.body(ServerImpl.java:2506)
at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.lang.ClassNotFoundException: scan.fragment.node.ignite.VersionedInterceptor
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.ignite.internal.util.IgniteUtils.forName(IgniteUtils.java:8465)
at org.apache.ignite.marshaller.jdk.JdkMarshallerObjectInputStream.resolveClass(JdkMarshallerObjectInputStream.java:54)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at java.util.ArrayList.readObject(ArrayList.java:791)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1900)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:121)
... 12 more
Peer class loading works with the Compute Grid [1] only. It looks like your VersionedInterceptor is part of the cache configuration (an implementation of CacheInterceptor?); such classes have to be explicitly deployed on all nodes before the cluster starts.
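For completeness, peer class loading is enabled via a flag on IgniteConfiguration; a minimal Spring XML sketch (note that, as said above, this only helps for compute tasks/closures, not for cache configuration classes):

```xml
<!-- Sketch: enable peer class loading for Compute Grid tasks. -->
<bean class="org.apache.ignite.configuration.IgniteConfiguration">
    <!-- Applies to compute closures only; classes referenced from the
         cache configuration (e.g. a CacheInterceptor implementation)
         still must be on every server node's classpath before start. -->
    <property name="peerClassLoadingEnabled" value="true"/>
</bean>
```

For the interceptor itself, the fix is to package the class (here scan.fragment.node.ignite.VersionedInterceptor) into a JAR and put it on the classpath of every server node (e.g. the distribution's libs/ directory) before starting them.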