Data loss detection in Aerospike

If we have 6 partitions with replication factor 2 and paxos-single-replica-limit 3 (once we are down to 3 nodes, the replication factor becomes 1), and all of a sudden 3 nodes die because of a cascading failure, it might happen that a few partitions were not able to migrate. But as per this doc, the cluster will continue as if nothing happened. In strong consistency mode the partitions may instead go dead, and we would have to manually revive them.
How can I know when there has been data loss, so that I can restore from a previous snapshot?
If it matters, we are on the Community Edition.
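For reference, the scenario described corresponds to settings along these lines in aerospike.conf (the values here are illustrative, matching the question, not a recommendation):

```
service {
    paxos-single-replica-limit 3   # cluster size at which replica count drops to 1
}

namespace test {
    replication-factor 2
}
```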

In strongly consistent mode (which requires an Enterprise license) there will not be any silent data loss, and if a majority of the cluster literally dies, the dead partitions will need to be manually revived.
In the absence of strong consistency (the default), you can grep for "rebalanced: expected-migrations" in the Aerospike logs of all live nodes. The result will look somewhat like the below:
Jun 27 2022 19:11:22 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (0,0,0) fresh-partitions 0
Jun 27 2022 19:18:13 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (2325,1718,1978) fresh-partitions 0
Jun 27 2022 19:18:13 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (2325,1718,1978) fresh-partitions 0
Jun 27 2022 19:35:29 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (514,50,50) fresh-partitions 0
Jun 27 2022 19:35:29 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (0,0,0) fresh-partitions 0
Jun 27 2022 19:58:18 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1941,1711,1293) fresh-partitions 0
Jun 27 2022 19:58:18 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (1941,1711,1293) fresh-partitions 0
Jun 27 2022 20:12:54 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1369,1089,1393) fresh-partitions 170
Jun 27 2022 20:12:54 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (833,307,1245) fresh-partitions 0
Jun 27 2022 20:19:07 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1467,1172,1576) fresh-partitions 190
Jun 27 2022 20:19:07 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (385,418,770) fresh-partitions 0
Jun 27 2022 20:19:59 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1830,1477,1926) fresh-partitions 128
Jun 27 2022 20:19:59 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (581,614,1162) fresh-partitions 0
Look for fresh-partitions here. If it is 1 or more, a partition was unavailable and Aerospike created a fresh (empty) partition for you. If the node that held the original partition has died, that means data loss. If the other nodes come back (because they did not die but were network-partitioned), the older data is not lost, but conflict resolution takes place between the older partition and the freshly created one (the default conflict-resolution policy is generation, meaning the copy of a key that has been modified more often wins).
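The check above can be automated. A minimal sketch (assuming the log format matches the sample lines shown) that reports namespaces where a rebalance created fresh partitions:

```python
import re

# Matches lines like:
# ... {test2} rebalanced: expected-migrations (1369,1089,1393) fresh-partitions 170
PATTERN = re.compile(r"\{(?P<ns>[^}]+)\} rebalanced: .*fresh-partitions (?P<fresh>\d+)")

def namespaces_with_fresh_partitions(text):
    """Return {namespace: fresh-partition count} for rebalances that created fresh partitions."""
    hits = {}
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m and int(m.group("fresh")) > 0:
            hits[m.group("ns")] = int(m.group("fresh"))
    return hits

sample = (
    "Jun 27 2022 20:12:54 GMT: INFO (partition): (partition_balance.c:928) "
    "{test2} rebalanced: expected-migrations (1369,1089,1393) fresh-partitions 170\n"
    "Jun 27 2022 20:12:54 GMT: INFO (partition): (partition_balance.c:928) "
    "{test} rebalanced: expected-migrations (833,307,1245) fresh-partitions 0\n"
)

print(namespaces_with_fresh_partitions(sample))  # {'test2': 170}
```

Running this over the output of the grep across all live nodes would flag any namespace that lost a partition during a rebalance.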

Related

Why does redis forcibly demote master to slave?

I run the container with docker using redis:latest, and after about 30 minutes the master changes to a slave and is no longer writable.
The slave also logs an error once per second saying it cannot find the master.
1:M 08 Jul 2022 03:10:55.899 * DB saved on disk
1:M 08 Jul 2022 03:15:56.087 * 100 changes in 300 seconds. Saving...
1:M 08 Jul 2022 03:15:56.089 * Background saving started by pid 61
61:C 08 Jul 2022 03:15:56.091 * DB saved on disk
61:C 08 Jul 2022 03:15:56.092 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
1:M 08 Jul 2022 03:15:56.189 * Background saving terminated with success
1:S 08 Jul 2022 03:20:12.258 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 08 Jul 2022 03:20:12.258 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 03:20:12.258 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 03:20:12.259 * REPLICAOF 178.20.40.200:8886 enabled (user request from 'id=39 addr=95.182.123.66:36904 laddr=172.31.9.234:6379 fd=11 name= age=1 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=47 qbuf-free=20427 argv-mem=24 multi-mem=0 rbs=1024 rbp=0 obl=0 oll=0 omem=0 tot-mem=22320 events=r cmd=slaveof user=default redir=-1 resp=2')
1:S 08 Jul 2022 03:20:12.524 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 03:20:12.791 * Master replied to PING, replication can continue...
1:S 08 Jul 2022 03:20:13.335 * Trying a partial resynchronization (request 6743ff015583c86f3ac7a4305026c42991a1ca18:1).
1:S 08 Jul 2022 03:20:13.603 * Full resync from master: ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ:1
1:S 08 Jul 2022 03:20:13.603 * MASTER <-> REPLICA sync: receiving 54976 bytes from master to disk
1:S 08 Jul 2022 03:20:14.138 * Discarding previously cached master state.
1:S 08 Jul 2022 03:20:14.138 * MASTER <-> REPLICA sync: Flushing old data
1:S 08 Jul 2022 03:20:14.139 * MASTER <-> REPLICA sync: Loading DB in memory
1:S 08 Jul 2022 03:20:14.140 # Wrong signature trying to load DB from file
1:S 08 Jul 2022 03:20:14.140 # Failed trying to load the MASTER synchronization DB from disk: Invalid argument
1:S 08 Jul 2022 03:20:14.140 * Reconnecting to MASTER 178.20.40.200:8886 after failure
1:S 08 Jul 2022 03:20:14.140 * MASTER <-> REPLICA sync started
...
1:S 08 Jul 2022 05:09:50.010 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:50.298 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:50.587 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:50.587 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:51.013 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:51.014 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:51.294 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:51.581 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:51.581 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:52.017 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:52.017 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:52.297 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:52.578 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:52.578 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:53.021 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:53.021 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:53.308 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:53.594 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:53.594 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:54.025 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:54.025 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:54.316 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:54.608 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:54.608 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:55.028 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:55.028 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:55.309 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:55.588 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:55.588 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:56.031 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:56.031 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:56.311 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:56.592 # Failed to read response from the server: Connection reset by peer
1:S 08 Jul 2022 05:09:56.592 # Master did not respond to command during SYNC handshake
1:S 08 Jul 2022 05:09:57.035 * Connecting to MASTER 178.20.40.200:8886
1:S 08 Jul 2022 05:09:57.035 * MASTER <-> REPLICA sync started
1:S 08 Jul 2022 05:09:57.321 * Non blocking connect for SYNC fired the event.
1:S 08 Jul 2022 05:09:57.610 * Master replied to PING, replication can continue...
...
SLAVEOF NO ONE
config set slave-read-only no
If I force the slave to be writable with the above commands and try to write, all data is flushed after about 5 seconds.
I don't want the master turned into a slave.
I am getting this error on a clean EC2 Amazon Linux instance.
I don't know what's causing it, because I also have enough memory.
Why does Redis forcibly demote the master to a slave?

How to set up an Aerospike cluster with a single node?

I currently have a working cluster with two nodes. Following is the content of /etc/aerospike/aerospike.conf -
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 3002 # Heartbeat port for this node.
        # List one or more other nodes, one ip-address & port per line:
        mesh-seed-address-port <existing server's ip> 3002
        mesh-seed-address-port <other server's ip> 3002
        interval 250
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
I tried changing the heartbeat settings by removing the address-port of the other node -
heartbeat {
    mode mesh
    port 3002 # Heartbeat port for this node.
    # List one or more other nodes, one ip-address & port per line:
    mesh-seed-address-port <existing server's ip> 3002
    interval 250
    timeout 10
}
Then I restarted the aerospike and the amc services -
service aerospike restart
service amc restart
However, still the /var/log/aerospike/aerospike.log file shows two nodes present -
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756260 free-pct 99 heap-kbytes (2343074,2344032,2417664) heap-efficiency-pct 96.9
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:263) in-progress: tsvc-q 0 info-q 0 nsup-delete-q 0 rw-hash 0 proxy-hash 0 tree-gc-q 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:285) fds: proto (20,23,3) heartbeat (1,1,0) fabric (19,19,0)
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:294) heartbeat-received: self 0 foreign 1488
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {FC} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {FC} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {FC} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:348) {TARGETPARAMS} objects: all 0 master 0 prole 0
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:409) {TARGETPARAMS} migrations: complete
Mar 07 2017 13:16:28 GMT: INFO (info): (ticker.c:428) {TARGETPARAMS} memory-usage: total-bytes 0 index-bytes 0 sindex-bytes 0 data-bytes 0 used-pct 0.00
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:169) NODE-ID bb93c00b70b0022 CLUSTER-SIZE 2
Mar 07 2017 13:16:38 GMT: INFO (info): (ticker.c:249) system-memory: free-kbytes 125756196 free-pct 99 heap-kbytes (2343073,2344032,2417664) heap-efficiency-pct 96.9
So does the AMC console.
This should help: http://www.aerospike.com/docs/operations/manage/cluster_mng/removing_node
Once the node is removed properly, you can restart it with the changed heartbeat config so that it doesn't rejoin the other node.
For the version, simply run asd --version. You can also use asinfo -v build.
The version is also printed in asadm / AMC and in the logs right at startup.

Httpd (Apache) is not starting on Fedora 23

I tried hard to install nginx, but it didn't work. After that I tried to install and configure an Apache (httpd) server on my Fedora 23 distribution. But the server doesn't start; it returns an error when I try to start it: sudo systemctl start httpd → Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.
After that I looked at the logs with journalctl -xe, which shows this:
Jun 28 11:26:49 cyber audit[15981]: CRED_REFR pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:s
Jun 28 11:26:49 cyber sudo[15981]: pam_systemd(sudo:session): Cannot create session: Already running in a session
Jun 28 11:26:49 cyber kernel: audit: type=1105 audit(1467095209.363:842): pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
Jun 28 11:26:49 cyber sudo[15981]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jun 28 11:26:49 cyber audit[15981]: USER_START pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:
Jun 28 11:26:49 cyber polkitd[900]: Registered Authentication Agent for unix-process:15982:464115 (system bus name :1.256 [/usr/bin/pkttyagent --notif
Jun 28 11:26:49 cyber systemd[1]: Starting The Apache HTTP Server...
-- Subject: Unit httpd.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit httpd.service has begun starting up.
Jun 28 11:26:49 cyber audit[15988]: AVC avc: denied { append } for pid=15988 comm="httpd" name="error.log" dev="dm-0" ino=1704951 scontext=system_u
Jun 28 11:26:49 cyber kernel: audit: type=1400 audit(1467095209.491:843): avc: denied { append } for pid=15988 comm="httpd" name="error.log" dev="d
Jun 28 11:26:49 cyber systemd[1]: httpd.service: Main process exited, code=exited, status=1/FAILURE
Jun 28 11:26:49 cyber systemd[1]: Failed to start The Apache HTTP Server.
-- Subject: Unit httpd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit httpd.service has failed.
--
-- The result is failed.
Jun 28 11:26:49 cyber systemd[1]: httpd.service: Unit entered failed state.
Jun 28 11:26:49 cyber audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=httpd comm="system
Jun 28 11:26:49 cyber kernel: audit: type=1130 audit(1467095209.525:844): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0
Jun 28 11:26:49 cyber kernel: audit: type=1106 audit(1467095209.533:845): pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
Jun 28 11:26:49 cyber kernel: audit: type=1104 audit(1467095209.533:846): pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:
Jun 28 11:26:49 cyber systemd[1]: httpd.service: Failed with result 'exit-code'.
Jun 28 11:26:49 cyber sudo[15981]: pam_unix(sudo:session): session closed for user root
Jun 28 11:26:49 cyber audit[15981]: USER_END pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:se
Jun 28 11:26:49 cyber audit[15981]: CRED_DISP pid=15981 uid=0 auid=1000 ses=1 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 msg='op=PAM:s
Jun 28 11:26:50 cyber polkitd[900]: Unregistered Authentication Agent for unix-process:15982:464115 (system bus name :1.256, object path /org/freedesk
Jun 28 11:27:15 cyber google-chrome.desktop[3111]: [1:1:0628/112715:ERROR:PlatformKeyboardEvent.cpp(117)] Not implemented reached in static PlatformEv
I followed all the steps in this guide.
Can anybody explain what the error is and how to solve it?
service httpd status

Aerospike sudden crash

I am running a 5-node cluster on version 3.7.0.2, and after some hours of usage all 5 instances crashed. I have seen some other reports of crashes in this version. Should I download version 3.7.1? Will it fix the crash?
Linux aerospike2 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 15.10)
config:
# Aerospike database configuration file.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 32
    batch-index-threads 32
    proto-fd-max 15000
    batch-max-requests 200000
}

logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}

network {
    service {
        address 10.240.0.6
        port 3000
    }
    heartbeat {
        mode mesh
        address 10.240.0.6 # IP of the NIC on which this node is listening
        mesh-seed-address-port 10.240.0.6 3002
        mesh-seed-address-port 10.240.0.5 3002
        port 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}

namespace test {
    replication-factor 10
    memory-size 3500M
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    storage-engine device {
        file /data/aerospike.dat
        write-block-size 1M
        filesize 300G
        # data-in-memory true
    }
}
LOGS:
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services...
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ...
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal()
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx)
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message.
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal()
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ...
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake!
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084) system memory: free 6590544kb ( 86 percent free )
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090) ClusterSize 5 ::: objects 13593274 ::: sub_objects 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099) rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104) replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116) heartbeat_received: self 0 : foreign 322
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117) heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129) tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203) partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed)
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s)
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor)
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed]
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><> Aerospike Community Edition build 3.7.1 <><><><><><><><><><>
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) user root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) group root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) pidfile /var/run/aerospike/asd.pid
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-queues 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-threads-per-queue 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-index-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) proto-fd-max 15000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-max-requests 200000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Log file must be an absolute path.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /var/log/aerospike/aerospike.log {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) context any info
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) #address any
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) heartbeat {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mode mesh
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.6 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.5 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) interval 150
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) timeout 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) fabric {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3001
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) info {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3003
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) replication-factor 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) memory-size 3500M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) default-ttl 0 # 30 days, use 0 to never expire/evict.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) ldt-enabled true
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) storage-engine device {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /data/aerospike.dat
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) write-block-size 1M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) filesize 300G
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test"
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1%
This was fixed in Aerospike Server version 3.7.1 and above.
More details on the issue are in these Jira tickets:
[AER-4487], [AER-4690] - (Clustering/Migration) Race condition causing incorrect heartbeat fd saved and later not removable.
Please also see:
https://discuss.aerospike.com/t/aerospike-crash/2327
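To check whether a node has hit this crash, one can grep the log for the SIGABRT banner, which also names the build that aborted. A minimal sketch, using an inline sample line so it runs standalone; in practice point `LOG` at the real file (`/var/log/aerospike/aerospike.log`, the path from the config above):

```shell
# Scan an Aerospike log for the SIGABRT crash banner; the matching line
# also reports the build, so it shows which version actually aborted.
# Inline sample line for illustration only.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
EOF
# Print the aborting build, if any crash banner is present.
grep -o "aborting Aerospike .* build [0-9.]*" "$LOG"
rm -f "$LOG"
```

If this prints a build older than the fix, upgrading the node is the remedy.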

Aerospike DB always starts in COLD mode

It's stated here that Aerospike should try to start in warm mode, meaning it reuses the same memory region holding the keys. Instead, every time the database is restarted, all keys are loaded back from the SSD drive, which can take tens of minutes, if not hours. What I see in the log is the following:
Oct 12 2015 03:24:11 GMT: INFO (config): (cfg.c::3234) Node id bb9e10daab0c902
Oct 12 2015 03:24:11 GMT: INFO (namespace): (namespace_cold.c::101) ns organic **beginning COLD start**
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3607) opened device /dev/xvdb: usable size 322122547200, io-min-size 512
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3681) shadow device /dev/xvdc is compatible with main device
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /dev/xvdb has 307200 wblocks of size 1048576
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3141) device /dev/xvdb: reading device to load index
Oct 12 2015 03:24:11 GMT: INFO (drv_ssd): (drv_ssd.c::3146) In TID 104520: Using arena #150 for loading data for namespace "organic"
Oct 12 2015 03:24:13 GMT: INFO (drv_ssd): (drv_ssd.c::3942) {organic} loaded 962647 records, 0 subrecords, /dev/xvdb 0%
What could be the reason that Aerospike fails to perform fast restart?
Thanks!
You are using the Community Edition of the software. Warm start is not supported in it; it is available only in the Enterprise Edition.
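As a quick check, the startup banner and the namespace start line in the log together show which edition is running and whether the restart was a cold one. A sketch using inline sample lines adapted from the logs above; in practice set `LOG` to your configured log path (e.g. `/var/log/aerospike/aerospike.log`):

```shell
# Report edition and cold-start count from an Aerospike log.
# Inline sample lines for illustration only.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Oct 12 2015 03:24:11 GMT: INFO (as): (as.c::410) <><><><><><><><><><> Aerospike Community Edition build 3.7.1 <><><><><><><><><><>
Oct 12 2015 03:24:11 GMT: INFO (namespace): (namespace_cold.c::101) ns organic beginning COLD start
EOF
# The startup banner names the edition and build.
grep -o "Aerospike [A-Za-z]* Edition build [0-9.]*" "$LOG" | tail -1
# Count namespace cold starts recorded in the log.
grep -c "beginning COLD start" "$LOG"
rm -f "$LOG"
```

If the banner says Community Edition, every restart will cold start and rebuild the index from disk, as seen above.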