Aerospike Cluster Nodes intermittently going down and coming back up

I have an Aerospike cluster of 15 nodes. The cluster performs fairly well under a normal load of 10k TPS. Today I ran some tests at a higher throughput, raising the TPS to around 130k-150k.
I observed that some nodes intermittently went down and automatically came back up after a few seconds. Because of these nodes going down, we are getting heartbeat timeouts and, hence, read timeouts.
Each cluster node has 8 cores and 120GB of RAM, and I am storing data in memory.
All nodes have sufficient space remaining: out of a total cluster space of 1.2TB (15*120), only 275 GB is used.
Also, the network is not at all flaky. All these machines are in a data centre and have high bandwidth.
Some observations from monitoring AMC:
Saw some nodes (around 5-6) become inactive for a few seconds.
There was a high number of client connections on a few of the nodes that went down. For example, most nodes had 6000-7000 client connections, while one node had an unusual 25000 client connections.
Some error logs in cluster nodes:
Sep 15 2020 17:00:43 GMT: WARNING (hb): (hb.c:4864) (repeated:5) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:00:43 GMT: WARNING (socket): (socket.c:808) (repeated:5) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:00:53 GMT: WARNING (socket): (socket.c:740) (repeated:3) Timeout while connecting
Sep 15 2020 17:00:53 GMT: WARNING (hb): (hb.c:4864) (repeated:3) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:00:53 GMT: WARNING (socket): (socket.c:808) (repeated:3) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:01:03 GMT: WARNING (socket): (socket.c:740) (repeated:1) Timeout while connecting
Sep 15 2020 17:01:03 GMT: WARNING (hb): (hb.c:4864) (repeated:1) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:01:03 GMT: WARNING (socket): (socket.c:808) (repeated:1) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:01:13 GMT: WARNING (socket): (socket.c:740) (repeated:2) Timeout while connecting
Sep 15 2020 17:01:13 GMT: WARNING (hb): (hb.c:4864) (repeated:2) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:01:13 GMT: WARNING (socket): (socket.c:808) (repeated:2) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:02:44 GMT: WARNING (socket): (socket.c:740) Timeout while connecting
Sep 15 2020 17:02:44 GMT: WARNING (socket): (socket.c:808) Error while connecting socket to 10.33.54.144:2057
Sep 15 2020 17:02:44 GMT: WARNING (hb): (hb.c:4864) could not create heartbeat connection to node {10.33.54.144:2057}
Sep 15 2020 17:02:53 GMT: WARNING (socket): (socket.c:740) (repeated:1) Timeout while connecting
We also saw the following error logs on the nodes that were going down:
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9280f220a0102 on fd 4155 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9b676220a0102 on fd 4149 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9fbd6200a0102 on fd 42 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb96d3d220a0102 on fd 4444 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb99036210a0102 on fd 4278 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9f102220a0102 on fd 4143 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb91822210a0102 on fd 4515 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9e5ff200a0102 on fd 4173 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb93f65200a0102 on fd 38 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9132f220a0102 on fd 4414 failed : Connection reset by peer
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb939be210a0102 on fd 4567 failed : Connection reset by peer
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9b19a220a0102 on fd 4165 failed : Broken pipe
Attaching the aerospike.conf file here:
service {
user root
group root
service-threads 12
transaction-queues 12
transaction-threads-per-queue 4
proto-fd-max 50000
migrate-threads 1
pidfile /var/run/aerospike/asd.pid
}
logging {
file /var/log/aerospike/aerospike.log {
context any info
context migrate debug
}
}
network {
service {
address any
port 3000
}
heartbeat {
mode mesh
port 2057
mesh-seed-address-port 10.34.154.177 2057
mesh-seed-address-port 10.34.15.40 2057
mesh-seed-address-port 10.32.255.229 2057
mesh-seed-address-port 10.33.54.144 2057
mesh-seed-address-port 10.32.190.157 2057
mesh-seed-address-port 10.32.101.63 2057
mesh-seed-address-port 10.34.2.241 2057
mesh-seed-address-port 10.32.214.251 2057
mesh-seed-address-port 10.34.30.114 2057
mesh-seed-address-port 10.33.162.134 2057
mesh-seed-address-port 10.33.190.57 2057
mesh-seed-address-port 10.34.61.109 2057
mesh-seed-address-port 10.34.47.19 2057
mesh-seed-address-port 10.33.34.24 2057
mesh-seed-address-port 10.34.118.182 2057
interval 150
timeout 20
}
fabric {
port 3001
}
info {
port 3003
}
}
namespace PS1 {
replication-factor 2
memory-size 70G
single-bin false
data-in-index false
storage-engine memory
stop-writes-pct 90
high-water-memory-pct 75
}
namespace LS1 {
replication-factor 2
memory-size 30G
single-bin false
data-in-index false
storage-engine memory
stop-writes-pct 90
high-water-memory-pct 75
}
Any possible explanations for this?

It seems the nodes are having network connectivity issues at such a high throughput. This can have different root causes, from a simple network-related bottleneck (bandwidth, packets per second) to something on the node itself getting in the way of interfacing properly with the network (a surge in soft interrupts, improper distribution of network queues, CPU thrashing). This would prevent heartbeat connections/messages from going through, resulting in nodes leaving the cluster until the situation recovers. If running in a cloud/virtualized environment, some hosts may have noisier neighbors than others, etc...
The increase in the number of connections is a symptom: any slowdown on a node causes the client to compensate by opening more connections in order to sustain its throughput, which can also lead to a downward-spiraling effect.
Finally, a single node leaving or joining the cluster shouldn't impact read transactions much. Check your client policy and make sure socketTimeout / totalTimeout / maxRetries, etc. are set correctly so that a read can quickly retry against a different replica.
This article can help on this last point: https://discuss.aerospike.com/t/understanding-timeout-and-retry-policies/2852/3
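As an illustration only, here is a minimal sketch of such a read policy using the Aerospike Java client; the seed host, namespace, set, key and timeout values are placeholders to adapt to your own cluster and latency profile:

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;
import com.aerospike.client.policy.Replica;

public class ReadPolicyExample {
    public static void main(String[] args) {
        // Placeholder seed host: any reachable node of the cluster works.
        AerospikeClient client = new AerospikeClient("10.34.154.177", 3000);

        Policy readPolicy = new Policy();
        readPolicy.socketTimeout = 100;        // ms per attempt against one node
        readPolicy.totalTimeout = 500;         // ms budget across all attempts
        readPolicy.maxRetries = 2;             // allow a retry or two
        readPolicy.replica = Replica.SEQUENCE; // try master first, then replica

        // "PS1" is a namespace from the posted config; set and key are made up.
        Record record = client.get(readPolicy, new Key("PS1", "demo", "some-key"));
        System.out.println(record);
        client.close();
    }
}

With settings along these lines, a read that stalls on a slow node is abandoned after socketTimeout and retried against the other replica instead of waiting out the full client-side timeout.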

Related

OpenVPN Raspberry Pi login loop

Suddenly my NordVPN connection through OpenVPN on my Raspberry Pi isn't working anymore. Now I get the following error:
Sun Sep 13 12:25:14 2020 Outgoing Control Channel Authentication: Using 512 bit message hash 'SHA512' for HMAC authentication
Sun Sep 13 12:25:14 2020 Incoming Control Channel Authentication: Using 512 bit message hash 'SHA512' for HMAC authentication
Sun Sep 13 12:25:14 2020 TCP/UDP: Preserving recently used remote address: [AF_INET]62.112.11.159:443
Sun Sep 13 12:25:14 2020 Socket Buffers: R=[87380->87380] S=[16384->16384]
Sun Sep 13 12:25:14 2020 Attempting to establish TCP connection with [AF_INET]62.112.11.159:443 [nonblock]
Sun Sep 13 12:25:15 2020 TCP connection established with [AF_INET]62.112.11.159:443
Sun Sep 13 12:25:15 2020 TCP_CLIENT link local: (not bound)
Sun Sep 13 12:25:15 2020 TCP_CLIENT link remote: [AF_INET]62.112.11.159:443
Sun Sep 13 12:25:15 2020 Connection reset, restarting [0]
Sun Sep 13 12:25:15 2020 SIGUSR1[soft,connection-reset] received, process restarting
Sun Sep 13 12:25:15 2020 Restart pause, 5 second(s)
No idea what to do. I can't find any server log. I tried removing and reinstalling, and tried updating. I can connect to the internet; it's just that when I try to connect with an OVPN file it does this in a loop. I can even give wrong login information and it won't say anything. Can anyone shed some light on this? Thanks

Redis crashing without any log errors

I'm debugging some weird behavior in my Redis instance, where it's crashing every 2 days or so but not showing any errors whatsoever, only this in the logs:
1:C 10 Sep 2020 15:44:14.517 # Configuration loaded
1:M 10 Sep 2020 15:44:14.522 * Running mode=standalone, port=6379.
1:M 10 Sep 2020 15:44:14.522 # Server initialized
1:M 10 Sep 2020 15:44:14.524 * Ready to accept connections
1:C 12 Sep 2020 13:20:23.751 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 12 Sep 2020 13:20:23.751 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 12 Sep 2020 13:20:23.751 # Configuration loaded
1:M 12 Sep 2020 13:20:23.757 * Running mode=standalone, port=6379.
1:M 12 Sep 2020 13:20:23.757 # Server initialized
1:M 12 Sep 2020 13:20:23.758 * Ready to accept connections
That's all Redis says to me.
I have lots of RAM available, but I have Redis running as a single instance in a Docker container. Could a lack of processing power cause this? Should I use multiple nodes? I don't want to set up a cluster just to find out the problem was something else. How can I trace down the actual cause of the problem?
So, in the end, it was exactly what I thought it was not: a memory leak!
I had 16GB that was slowly being consumed until Redis crashed, with no warning from Redis, the operating system, or Docker. I fixed the app that caused the leak and the problem was gone.
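For anyone chasing a similar slow consumption, a couple of standard commands are enough to watch memory over time (the container name below is a placeholder):

# Redis's own view of its memory usage
redis-cli INFO memory | grep used_memory_human
# The container's view, if Redis runs in Docker
docker stats redis-container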

How to set up an Aerospike cluster with a single node?

I currently have a working cluster with two nodes. Following is the content of /etc/aerospike/aerospike.conf -
network {
service {
address any
port 3000
}
heartbeat {
mode mesh
port 3002 # Heartbeat port for this node.
# List one or more other nodes, one ip-address & port per line:
mesh-seed-address-port <existing server's ip> 3002
mesh-seed-address-port <other server's ip> 3002
interval 250
timeout 10
}
fabric {
port 3001
}
info {
port 3003
}
}
I tried changing the heartbeat settings by removing the address and port of the other node -
heartbeat {
mode mesh
port 3002 # Heartbeat port for this node.
# List one or more other nodes, one ip-address & port per line:
mesh-seed-address-port <existing server's ip> 3002
interval 250
timeout 10
}
Then I restarted the aerospike and the amc services -
service aerospike restart
service amc restart
However, the /var/log/aerospike/aerospike.log file still shows two nodes present -
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756260 free-pct 99 heap-kbytes (2343074,2344032,2417664) heap-efficiency-pct 96.9
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:263) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:285) fds: proto (20,23,3) heartbeat (1,1,0) fabric (19,19,0)
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:294) heartbeat-received: self 0 foreign 1488
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {FC} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {FC} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {FC} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {TARGETPARAMS} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {TARGETPARAMS} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {TARGETPARAMS} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:169) NODE-ID bb93c00b70b0022 CLUSTER-SIZE 2
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756196 free-pct 99 heap-kbytes (2343073,2344032,2417664) heap-efficiency-pct 96.9
So does the AMC console.
This should help: http://www.aerospike.com/docs/operations/manage/cluster_mng/removing_node
Once the node is removed properly, you can restart it with the different heartbeat config so that it doesn't join the other node.
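In rough outline, and only as a sketch (the linked article is the authoritative procedure), that looks something like:

# On the node you want to take out of the cluster:
service aerospike stop

# On the remaining node, wait until migrations have completed (visible in
# AMC or in asadm's info output), then remove the other node's
# mesh-seed-address-port line from /etc/aerospike/aerospike.conf and restart:
service aerospike restart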
To check the version, simply run asd --version. You can also use asinfo -v build.
The version is also printed in asadm / AMC and in the logs right at startup.

Aerospike sudden crash

I am running a 5-node cluster with version 3.7.0.2, and after some hours of usage all 5 instances crashed. I have seen some other reports of crashes with this version. Should I download version 3.7.1? Will it fix the crash?
Linux aerospike2 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 15.10)
config:
# Aerospike database configuration file.
service {
user root
group root
paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
pidfile /var/run/aerospike/asd.pid
service-threads 32
transaction-queues 32
transaction-threads-per-queue 32
batch-index-threads 32
proto-fd-max 15000
batch-max-requests 200000
}
logging {
# Log file must be an absolute path.
file /var/log/aerospike/aerospike.log {
context any info
}
}
network {
service {
address 10.240.0.6
port 3000
}
heartbeat {
mode mesh
address 10.240.0.6 # IP of the NIC on which this node is listening
mesh-seed-address-port 10.240.0.6 3002
mesh-seed-address-port 10.240.0.5 3002
port 3002
interval 150
timeout 10
}
fabric {
port 3001
}
info {
port 3003
}
}
namespace test {
replication-factor 10
memory-size 3500M
default-ttl 0 # 30 days, use 0 to never expire/evict.
ldt-enabled true
storage-engine device {
file /data/aerospike.dat
write-block-size 1M
filesize 300G
# data-in-memory true
}
}
LOGS:
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services...
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ...
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal()
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx)
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message.
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal()
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ...
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake!
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084) system memory: free 6590544kb ( 86 percent free )
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090) ClusterSize 5 ::: objects 13593274 ::: sub_objects 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099) rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104) replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116) heartbeat_received: self 0 : foreign 322
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117) heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129) tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203) partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed)
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s)
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor)
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed]
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><> Aerospike Community Edition build 3.7.1 <><><><><><><><><><>
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) user root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) group root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) pidfile /var/run/aerospike/asd.pid
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-queues 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-threads-per-queue 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-index-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) proto-fd-max 15000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-max-requests 200000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Log file must be an absolute path.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /var/log/aerospike/aerospike.log {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) context any info
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) #address any
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) heartbeat {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mode mesh
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.6 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.5 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) interval 150
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) timeout 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) fabric {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3001
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) info {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3003
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) replication-factor 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) memory-size 3500M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) default-ttl 0 # 30 days, use 0 to never expire/evict.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) ldt-enabled true
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) storage-engine device {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /data/aerospike.dat
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) write-block-size 1M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) filesize 300G
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test"
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1%
This was fixed in version 3.7.1 and above of the Aerospike server.
More details on the issue from the release notes / Jira:
[AER-4487], [AER-4690] - (Clustering/Migration) Race condition causing incorrect heartbeat fd saved and later not removable.
Please also see:
https://discuss.aerospike.com/t/aerospike-crash/2327

Aerospike DB always starts in COLD mode

It's stated here that Aerospike should try to start in warm mode, meaning it reuses the same memory region holding the keys. Instead, every time the database is restarted, all keys are loaded back from the SSD drive, which can take tens of minutes if not hours. What I see in the log is the following:
Oct 12 2015 03:24:11 GMT: INFO (config): (cfg.c::3234) Node id bb9e10daab0c902
Oct 12 2015 03:24:11 GMT: INFO (namespace): (namespace_cold.c::101) ns organic **beginning COLD start**
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3607) opened device /dev/xvdb: usable size 322122547200, io-min-size 512
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3681) shadow device /dev/xvdc is compatible with main device
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /dev/xvdb has 307200 wblocks of size 1048576
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3141) device /dev/xvdb: reading device to load index
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3146) In TID 104520: Using arena #150 for loading data for namespace "organic"
Oct 12 2015 03:24:13 GMT: INFO (drv_ssd): (drv_ssd.c::3942) {organic} loaded 962647 records, 0 subrecords, /dev/xvdb 0%
What could be the reason that Aerospike fails to perform a fast restart?
Thanks!
You are using the Community Edition of the software. Warm start is not supported in it; it is available only in the Enterprise Edition.