How to set up an Aerospike cluster with a single node?

I currently have a working cluster with two nodes. The following is the content of /etc/aerospike/aerospike.conf:
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port <existing server's ip> 3002
        mesh-seed-address-port <other server's ip> 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
I tried changing the heartbeat settings by removing the address and port of the other node:
heartbeat {
    mode mesh
    port 3002 # Heartbeat port for this node.
    # List one or more other nodes, one ip-address & port per line:
    mesh-seed-address-port <existing server's ip> 3002
    interval 250
    timeout 10
}
Then I restarted the aerospike and amc services:
service aerospike restart
service amc restart
However, the /var/log/aerospike/aerospike.log file still shows two nodes present:
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756260 free-pct 99 heap-kbytes (2343074,2344032,2417664) heap-efficiency-pct 96.9
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:263) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:285) fds: proto (20,23,3) heartbeat (1,1,0) fabric (19,19,0)
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:294) heartbeat-received: self 0 foreign 1488
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {FC} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {FC} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {FC} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {TARGETPARAMS} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {TARGETPARAMS} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {TARGETPARAMS} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:169) NODE-ID bb93c00b70b0022 CLUSTER-SIZE 2
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756196 free-pct 99 heap-kbytes (2343073,2344032,2417664) heap-efficiency-pct 96.9
So does the AMC console.

This should help: http://www.aerospike.com/docs/operations/manage/cluster_mng/removing_node
Once the node is removed properly, you can restart it with the modified heartbeat config so that it doesn't rejoin the other node.
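A minimal sketch of that sequence, assuming the asinfo and asadm tools are installed and reusing the placeholder addresses from the config above (the exact info-command syntax can vary slightly between server versions):

service aerospike stop    # on the node being removed
# On the remaining node, drop the old mesh seed and the alumni entry:
asinfo -v 'tip-clear:host-port-list=<other server's ip>:3002'
asinfo -v 'services-alumni-reset'
# Verify the cluster size is now 1:
asadm -e "info network"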
To check the version, simply run asd --version. You can also use asinfo -v build.
The version is also printed within asadm / AMC and in the logs right at startup.

Aerospike Cluster Nodes intermittently going down and coming back up

I have an Aerospike cluster of 15 nodes. This cluster performs fairly well under a normal load of 10k TPS. Today I ran some tests at a higher TPS, raising it to around 130k-150k TPS.
I observed that some nodes intermittently went down and automatically came back up after a few seconds. Because of these nodes going down, we are getting heartbeat timeouts and hence read timeouts.
One cluster node configuration: 8 cores. 120GB RAM. I am storing data in memory.
All nodes have sufficient space remaining. Out of a total cluster space of 1.2TB (15*120), only 275 GB of space is used up.
Also, the network is not at all flaky. All these machines are in a data centre and are high-bandwidth machines.
Some observations made by monitoring AMC:
Saw some nodes (around 5-6) become inactive for a few seconds
There was a high number of client connections on a few of the nodes that went down. For example, while all other nodes had 6000-7000 client connections, one node had an unusually high 25000 client connections.
Some error logs in cluster nodes:
Sep 15 2020 17:00:43 GMT: WARNING (hb): (hb.c:4864) (repeated:5) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:00:43 GMT: WARNING (socket): (socket.c:808) (repeated:5) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:00:53 GMT: WARNING (socket): (socket.c:740) (repeated:3) Timeout while connecting
Sep 15 2020 17:00:53 GMT: WARNING (hb): (hb.c:4864) (repeated:3) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:00:53 GMT: WARNING (socket): (socket.c:808) (repeated:3) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:01:03 GMT: WARNING (socket): (socket.c:740) (repeated:1) Timeout while connecting
Sep 15 2020 17:01:03 GMT: WARNING (hb): (hb.c:4864) (repeated:1) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:01:03 GMT: WARNING (socket): (socket.c:808) (repeated:1) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:01:13 GMT: WARNING (socket): (socket.c:740) (repeated:2) Timeout while connecting
Sep 15 2020 17:01:13 GMT: WARNING (hb): (hb.c:4864) (repeated:2) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:01:13 GMT: WARNING (socket): (socket.c:808) (repeated:2) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:02:44 GMT: WARNING (socket): (socket.c:740) Timeout while connecting
Sep 15 2020 17:02:44 GMT: WARNING (socket): (socket.c:808) Error while connecting socket to 10.33.54.144:2057
Sep 15 2020 17:02:44 GMT: WARNING (hb): (hb.c:4864) could not create heartbeat connection to node {10.33.54.144:2057}
Sep 15 2020 17:02:53 GMT: WARNING (socket): (socket.c:740) (repeated:1) Timeout while connecting
We also saw some of these error logs in nodes that were going down:
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9280f220a0102 on fd 4155 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9b676220a0102 on fd 4149 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9fbd6200a0102 on fd 42 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb96d3d220a0102 on fd 4444 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb99036210a0102 on fd 4278 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9f102220a0102 on fd 4143 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb91822210a0102 on fd 4515 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9e5ff200a0102 on fd 4173 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb93f65200a0102 on fd 38 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9132f220a0102 on fd 4414 failed : Connection reset by peer
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb939be210a0102 on fd 4567 failed : Connection reset by peer
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9b19a220a0102 on fd 4165 failed : Broken pipe
Attaching the aerospike.conf file here:
service {
    user root
    group root
    service-threads 12
    transaction-queues 12
    transaction-threads-per-queue 4
    proto-fd-max 50000
    migrate-threads 1
    pidfile /var/run/aerospike/asd.pid
}
logging {
    file /var/log/aerospike/aerospike.log {
        context any info
        context migrate debug
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 2057
        mesh-seed-address-port 10.34.154.177 2057
        mesh-seed-address-port 10.34.15.40 2057
        mesh-seed-address-port 10.32.255.229 2057
        mesh-seed-address-port 10.33.54.144 2057
        mesh-seed-address-port 10.32.190.157 2057
        mesh-seed-address-port 10.32.101.63 2057
        mesh-seed-address-port 10.34.2.241 2057
        mesh-seed-address-port 10.32.214.251 2057
        mesh-seed-address-port 10.34.30.114 2057
        mesh-seed-address-port 10.33.162.134 2057
        mesh-seed-address-port 10.33.190.57 2057
        mesh-seed-address-port 10.34.61.109 2057
        mesh-seed-address-port 10.34.47.19 2057
        mesh-seed-address-port 10.33.34.24 2057
        mesh-seed-address-port 10.34.118.182 2057
        interval 150
        timeout 20
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace PS1 {
    replication-factor 2
    memory-size 70G
    single-bin false
    data-in-index false
    storage-engine memory
    stop-writes-pct 90
    high-water-memory-pct 75
}
namespace LS1 {
    replication-factor 2
    memory-size 30G
    single-bin false
    data-in-index false
    storage-engine memory
    stop-writes-pct 90
    high-water-memory-pct 75
}
Any possible explanations for this?
It seems the nodes are having network connectivity issues at such high throughput. This can have different root causes, from a simple network-related bottleneck (bandwidth, packets per second) to something on the node itself getting in the way of interfacing properly with the network (soft-interrupt surges, improper distribution of network queues, CPU thrashing). This would prevent heartbeat connections/messages from going through, resulting in nodes leaving the cluster until the situation recovers. If running in a cloud/virtualized environment, some hosts may have noisier neighbors than others, etc.
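A rough sketch of how one might check for the soft-interrupt and network-queue issues mentioned above, assuming a Linux host with sysstat and ethtool installed and an interface named eth0 (the interface name is an assumption):

mpstat -P ALL 1                # watch the %soft column; one saturated CPU suggests poor IRQ/queue distribution
ethtool -l eth0                # how many RX/TX queues the NIC exposes vs. how many are in use
cat /proc/net/softnet_stat     # per-CPU counters; a growing second column means packets dropped in softirq processing
netstat -s | grep -i retrans   # TCP retransmissions hint at packet loss or saturation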
The increase in the number of connections is a symptom: any slowdown on a node causes the client to compensate by increasing throughput, which increases the number of connections and can lead to a downward spiraling effect.
Finally, a single node leaving or joining the cluster shouldn't impact read transactions much. Check your client policy and make sure socketTimeout / totalTimeout / maxRetries, etc. are set correctly so that a read can quickly retry against a different replica.
This article can help on this last point: https://discuss.aerospike.com/t/understanding-timeout-and-retry-policies/2852/3

(gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255]

I am trying to use ssh to connect to a Google Cloud Compute Engine instance (macOS Catalina):
gcloud beta compute ssh --zone "us-west1-b" "mac-vm" --project "mac-vm-282201"
and get the error:
ssh: connect to host 34.105.11.187 port 22: Operation timed out
ERROR: (gcloud.beta.compute.ssh) [/usr/bin/ssh] exited with return code [255].
Then I tried
ssh -i ~/.ssh/mac-vm-key asd61404@34.105.11.187
and also got the error:
ssh: connect to host 34.105.11.187 port 22: Operation timed out
So I found this command to diagnose it:
gcloud compute ssh --zone "us-west1-b" "mac-vm" --project "mac-vm-282201" --ssh-flag="-vvv"
which returns:
OpenSSH_7.9p1, LibreSSL 2.7.3
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 48: Applying options for *
debug2: resolve_canonicalize: hostname 34.105.11.187 is address
debug2: ssh_connect_direct
debug1: Connecting to 34.105.11.187 [34.105.11.187] port 22.
I don't know how to fix this issue.
Thanks in advance!
Here is my recent serial console output:
Jul 4 02:28:39 mac-vm google_network_daemon[684]: For info, please visit https://www.isc.org/software/dhcp/
Jul 4 02:28:39 mac-vm dhclient[684]:
Jul 4 02:28:39 mac-vm dhclient[684]: Listening on Socket/ens4
[ 19.458355] google_network_daemon[684]: Listening on Socket/ens4
Jul 4 02:28:39 mac-vm google_network_daemon[684]: Listening on Socket/ens4
Jul 4 02:28:39 mac-vm dhclient[684]: Sending on Socket/ens4
[ 19.458697] google_network_daemon[684]: Sending on Socket/ens4
Jul 4 02:28:39 mac-vm google_network_daemon[684]: Sending on Socket/ens4
Jul 4 02:28:39 mac-vm systemd[1]: Finished Wait until snapd is fully seeded.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Apply the settings specified in cloud-config...
Jul 4 02:28:39 mac-vm systemd[1]: Condition check resulted in Auto import assertions from block devices being skipped.
Jul 4 02:28:39 mac-vm systemd[1]: Reached target Multi-User System.
Jul 4 02:28:39 mac-vm systemd[1]: Reached target Graphical Interface.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Update UTMP about System Runlevel Changes...
Jul 4 02:28:39 mac-vm systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Jul 4 02:28:39 mac-vm systemd[1]: Finished Update UTMP about System Runlevel Changes.
[ 20.216129] cloud-init[718]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:config' at Sat, 04 Jul 2020 02:28:39 +0000. Up 20.11 seconds.
Jul 4 02:28:39 mac-vm cloud-init[718]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:config' at Sat, 04 Jul 2020 02:28:39 +0000. Up 20.11 seconds.
Jul 4 02:28:39 mac-vm systemd[1]: Finished Apply the settings specified in cloud-config.
Jul 4 02:28:39 mac-vm systemd[1]: Starting Execute cloud user/final scripts...
Jul 4 02:28:41 mac-vm google-clock-skew: INFO Synced system time with hardware clock.
[ 20.886105] cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:final' at Sat, 04 Jul 2020 02:28:41 +0000. Up 20.76 seconds.
[ 20.886430] cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 finished at Sat, 04 Jul 2020 02:28:41 +0000. Datasource DataSourceGCE. Up 20.87 seconds
Jul 4 02:28:41 mac-vm cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 running 'modules:final' at Sat, 04 Jul 2020 02:28:41 +0000. Up 20.76 seconds.
Jul 4 02:28:41 mac-vm cloud-init[725]: Cloud-init v. 20.1-10-g71af48df-0ubuntu5 finished at Sat, 04 Jul 2020 02:28:41 +0000. Datasource DataSourceGCE. Up 20.87 seconds
Jul 4 02:28:41 mac-vm systemd[1]: Finished Execute cloud user/final scripts.
Jul 4 02:28:41 mac-vm systemd[1]: Reached target Cloud-init target.
Jul 4 02:28:41 mac-vm systemd[1]: Starting Google Compute Engine Startup Scripts...
Jul 4 02:28:41 mac-vm startup-script: INFO Starting startup scripts.
Jul 4 02:28:41 mac-vm startup-script: INFO Found startup-script in metadata.
Jul 4 02:28:42 mac-vm startup-script: INFO startup-script: sudo: ufw: command not found
Jul 4 02:28:42 mac-vm startup-script: INFO startup-script: Return code 1.
Jul 4 02:28:42 mac-vm startup-script: INFO Finished running startup scripts.
Jul 4 02:28:42 mac-vm systemd[1]: google-startup-scripts.service: Succeeded.
Jul 4 02:28:42 mac-vm systemd[1]: Finished Google Compute Engine Startup Scripts.
Jul 4 02:28:42 mac-vm systemd[1]: Startup finished in 1.396s (kernel) + 20.065s (userspace) = 21.461s.
Jul 4 02:29:06 mac-vm systemd[1]: systemd-hostnamed.service: Succeeded.
Jul 4 02:43:32 mac-vm systemd[1]: Starting Cleanup of Temporary Directories...
Jul 4 02:43:32 mac-vm systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jul 4 02:43:32 mac-vm systemd[1]: Finished Cleanup of Temporary Directories.

Cannot restart redis-sentinel unit

I'm trying to configure 3 Redis instances and 6 sentinels (3 of them running on the Redis hosts and the rest on different hosts). But when I install the redis-sentinel package, put my configuration under /etc/redis/sentinel.conf, and restart the service using systemctl restart redis-sentinel, I get this error:
Job for redis-sentinel.service failed because a timeout was exceeded.
See "systemctl status redis-sentinel.service" and "journalctl -xe" for details.
Here is the output of journalctl -u redis-sentinel:
Jan 01 08:07:07 redis1 systemd[1]: Starting Advanced key-value store...
Jan 01 08:07:07 redis1 redis-sentinel[16269]: 16269:X 01 Jan 2020 08:07:07.263 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Jan 01 08:07:07 redis1 redis-sentinel[16269]: 16269:X 01 Jan 2020 08:07:07.263 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=16269, just started
Jan 01 08:07:07 redis1 redis-sentinel[16269]: 16269:X 01 Jan 2020 08:07:07.263 # Configuration loaded
Jan 01 08:07:07 redis1 systemd[1]: redis-sentinel.service: Can't open PID file /var/run/sentinel/redis-sentinel.pid (yet?) after start: No such file or directory
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Start operation timed out. Terminating.
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Failed with result 'timeout'.
Jan 01 08:08:37 redis1 systemd[1]: Failed to start Advanced key-value store.
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Service hold-off time over, scheduling restart.
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Scheduled restart job, restart counter is at 5.
Jan 01 08:08:37 redis1 systemd[1]: Stopped Advanced key-value store.
Jan 01 08:08:37 redis1 systemd[1]: Starting Advanced key-value store...
Jan 01 08:08:37 redis1 redis-sentinel[16307]: 16307:X 01 Jan 2020 08:08:37.738 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
Jan 01 08:08:37 redis1 redis-sentinel[16307]: 16307:X 01 Jan 2020 08:08:37.739 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=16307, just started
Jan 01 08:08:37 redis1 redis-sentinel[16307]: 16307:X 01 Jan 2020 08:08:37.739 # Configuration loaded
Jan 01 08:08:37 redis1 systemd[1]: redis-sentinel.service: Can't open PID file /var/run/sentinel/redis-sentinel.pid (yet?) after start: No such file or directory
and my sentinel.conf file:
port 26379
daemonize yes
sentinel myid 851994c7364e2138e03ee1cd346fbdc4f1404e4c
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 172.28.128.11 6379 2
sentinel down-after-milliseconds mymaster 5000
# Generated by CONFIG REWRITE
dir "/"
protected-mode no
sentinel failover-timeout mymaster 60000
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel current-epoch 0
If you are trying to run your Redis servers on a Debian-based distribution, add the following to your Redis configurations:
pidfile /var/run/redis/redis-sentinel.pid in /etc/redis/sentinel.conf
pidfile /var/run/redis/redis-server.pid in /etc/redis/redis.conf
What's the output in the sentinel log file?
I had a similar issue where Sentinel received a lot of sigterms.
In that case you need to make sure that, if you use the daemonize yes setting, the systemd unit file uses Type=forking.
Also make sure that the location of the PID file specified in the sentinel config matches the location specified in the systemd unit file.
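A minimal sketch of what that alignment can look like, assuming the PID file path from the log above (/var/run/sentinel/redis-sentinel.pid) and a systemd drop-in; adjust the paths to whatever your unit file actually uses:

# /etc/redis/sentinel.conf
daemonize yes
pidfile /var/run/sentinel/redis-sentinel.pid

# /etc/systemd/system/redis-sentinel.service.d/override.conf
[Service]
Type=forking
PIDFile=/var/run/sentinel/redis-sentinel.pid

# then reload and restart
systemctl daemon-reload
systemctl restart redis-sentinel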
If you face the error below in journalctl or systemctl logs:
Jun 26 10:13:02 x systemd[1]: redis-server.service: Failed with result 'exit-code'.
Jun 26 10:13:02 x systemd[1]: redis-server.service: Scheduled restart job, restart counter is at 5.
Jun 26 10:13:02 x systemd[1]: Stopped Advanced key-value store.
Jun 26 10:13:02 x systemd[1]: redis-server.service: Start request repeated too quickly.
Jun 26 10:13:02 x systemd[1]: redis-server.service: Failed with result 'exit-code'.
Jun 26 10:13:02 x systemd[1]: Failed to start Advanced key-value store.
Then check /var/log/redis/redis-server.log for more information; in most cases the issue is mentioned there. For example, if a dump.rdb file is placed in /var/lib/redis, the issue might be with the database count or the Redis version. In another scenario, disabled IPv6 might be the issue.
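A quick way to pull the relevant details (a minimal sketch, assuming the default Debian log locations):

journalctl -u redis-sentinel --no-pager | tail -n 50
tail -n 100 /var/log/redis/redis-server.log
tail -n 100 /var/log/redis/redis-sentinel.log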

Aerospike sudden crash

I am running a 5-node cluster with version 3.7.0.2, and after some hours of usage all 5 instances crashed. I have seen some other reports of crashes in this version. Should I download version 3.7.1? Will it fix the crash?
Linux aerospike2 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 15.10)
config:
# Aerospike database configuration file.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 32
    batch-index-threads 32
    proto-fd-max 15000
    batch-max-requests 200000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address 10.240.0.6
        port 3000
    }
    heartbeat {
        mode mesh
        address 10.240.0.6 # IP of the NIC on which this node is listening
        mesh-seed-address-port 10.240.0.6 3002
        mesh-seed-address-port 10.240.0.5 3002
        port 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace test {
    replication-factor 10
    memory-size 3500M
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    storage-engine device {
        file /data/aerospike.dat
        write-block-size 1M
        filesize 300G
        # data-in-memory true
    }
}
LOGS:
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services...
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ...
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal()
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx)
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message.
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal()
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ...
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake!
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084) system memory: free 6590544kb ( 86 percent free )
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090) ClusterSize 5 ::: objects 13593274 ::: sub_objects 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099) rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104) replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116) heartbeat_received: self 0 : foreign 322
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117) heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129) tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203) partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed)
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s)
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor)
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed]
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><> Aerospike Community Edition build 3.7.1 <><><><><><><><><><>
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) user root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) group root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) pidfile /var/run/aerospike/asd.pid
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-queues 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-threads-per-queue 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-index-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) proto-fd-max 15000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-max-requests 200000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Log file must be an absolute path.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /var/log/aerospike/aerospike.log {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) context any info
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) #address any
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) heartbeat {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mode mesh
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.6 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.5 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) interval 150
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) timeout 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) fabric {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3001
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) info {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3003
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) replication-factor 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) memory-size 3500M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) default-ttl 0 # 30 days, use 0 to never expire/evict.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) ldt-enabled true
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) storage-engine device {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /data/aerospike.dat
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) write-block-size 1M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) filesize 300G
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test"
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1%
This was fixed in version 3.7.1 and above of the Aerospike server.
More details on the issue and the Jira tickets:
[AER-4487], [AER-4690] - (Clustering/Migration) Race condition causing an incorrect heartbeat fd to be saved and later not be removable.
Please also see:
https://discuss.aerospike.com/t/aerospike-crash/2327
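To confirm which build a node is actually running before and after upgrading (a small sketch; the dpkg query assumes Debian/Ubuntu, and the package name may differ per distribution):

asd --version
asinfo -v build
dpkg -l | grep -i aerospike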

Aerospike DB always starts in COLD mode

It's stated here that Aerospike should try to start in warm mode, meaning it reuses the same memory region holding the keys. Instead, every time the database is restarted, all keys are loaded back from the SSD drive, which can take tens of minutes if not hours. What I see in the log is the following:
Oct 12 2015 03:24:11 GMT: INFO (config): (cfg.c::3234) Node id bb9e10daab0c902
Oct 12 2015 03:24:11 GMT: INFO (namespace): (namespace_cold.c::101) ns organic **beginning COLD start**
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3607) opened device /dev/xvdb: usable size 322122547200, io-min-size 512
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3681) shadow device /dev/xvdc is compatible with main device
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /dev/xvdb has 307200 wblocks of size 1048576
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3141) device /dev/xvdb: reading device to load index
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3146) In TID 104520: Using arena #150 for loading data for namespace "organic"
Oct 12 2015 03:24:13 GMT: INFO (drv_ssd): (drv_ssd.c::3942) {organic} loaded 962647 records, 0 subrecords, /dev/xvdb 0%
What could be the reason that Aerospike fails to perform a fast restart?
Thanks!
You are using the Community Edition of the software. Warm start is not supported in it; it is available only in the Enterprise Edition.
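A quick way to confirm which edition and build a node is running (a sketch based on the startup banner already visible in the logs above):

asd --version
grep -i "edition" /var/log/aerospike/aerospike.log | head -n 1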