RabbitMQ Exception: An unexpected connection driver error occured

I am using the RabbitMQ message broker as a messaging service. I have seen this error 3-4 times in the past 10 days.
c.r.c.i.ForgivingExceptionHandler in RabbitMQ Error On Write Thread - An unexpected connection driver error occured
This is the only stack trace I could see; no other information was found. What could be the reasons for this error?
Sept 15 19:19:13 test-env app/w.1 [error] c.r.c.i.ForgivingExceptionHandler in RabbitMQ Error On Write Thread - An unexpected connection driver error occured
Sept 15 19:19:13 test-env app/w.1 javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1554)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1566)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71)
Sept 15 19:19:13 test-env app/w.1 at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
Sept 15 19:19:13 test-env app/w.1 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
Sept 15 19:19:13 test-env app/w.1 at java.io.DataOutputStream.flush(DataOutputStream.java:123)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.SocketFrameHandler.flush(SocketFrameHandler.java:197)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.AMQConnection.flush(AMQConnection.java:573)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.AMQCommand.transmit(AMQCommand.java:134)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:455)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.AMQChannel.quiescingTransmit(AMQChannel.java:434)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.AMQChannel.quiescingRpc(AMQChannel.java:351)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:614)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:547)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.ChannelN.close(ChannelN.java:540)
Sept 15 19:19:13 test-env app/w.1 Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.Alerts.getSSLException(Alerts.java:214)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1967)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1924)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1888)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1833)
Sept 15 19:19:13 test-env app/w.1 at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:128)
Sept 15 19:19:13 test-env app/w.1 at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
Sept 15 19:19:13 test-env app/w.1 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
Sept 15 19:19:13 test-env app/w.1 at java.io.DataOutputStream.flush(DataOutputStream.java:123)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.SocketFrameHandler.flush(SocketFrameHandler.java:197)
Sept 15 19:19:13 test-env app/w.1 at com.rabbitmq.client.impl.HeartbeatSender$HeartbeatRunnable.run(HeartbeatSender.java:140)
Sept 15 19:19:13 test-env app/w.1 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
Sept 15 19:19:13 test-env app/w.1 at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
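For reference, the RabbitMQ Java client can be configured to ride out connection resets like this through heartbeats and automatic connection recovery. The sketch below uses the client's standard ConnectionFactory API; the host, port and trust-all TLS setup are placeholders rather than values from the question.

import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ResilientConnection {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("broker.example.com");      // placeholder host
        factory.setPort(5671);                      // assumed TLS port
        factory.useSslProtocol();                   // trusts every certificate; fine for a sketch, not for production
        factory.setRequestedHeartbeat(30);          // detect dead TCP connections sooner
        factory.setAutomaticRecoveryEnabled(true);  // re-open the connection and channels after an I/O error
        factory.setNetworkRecoveryInterval(5000);   // wait 5 s between recovery attempts

        try (Connection connection = factory.newConnection()) {
            System.out.println("Connected, open = " + connection.isOpen());
        }
    }
}

With recovery enabled, an error like the one above is still logged by the exception handler, but the client attempts to reconnect on its own instead of leaving a dead connection behind.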

Related

data loss detection in aerospike

Suppose we have a 6-node cluster with replication factor 2 and paxos-single-replica-limit 3 (once we are down to 3 nodes, the replication factor becomes 1), and all of a sudden 3 nodes die because of a cascading effect. It might happen that a few partitions were not able to migrate, but as per this doc the cluster will continue as if nothing happened. In strong-consistency mode the partitions may instead end up as dead partitions that we have to revive manually.
How can I know when there has been data loss, so that I can restore from a previous snapshot?
If it matters, we are on the Community Edition.
In strong-consistency mode (requires an Enterprise license) there will not be any data loss; if the majority of the cluster literally dies, the resulting dead partitions will need to be manually revived.
In the absence of strong-consistency mode (the default), one can grep for "rebalanced: expected-migrations" in the Aerospike logs of all live nodes. The result would look somewhat like the below:
Jun 27 2022 19:11:22 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (0,0,0) fresh-partitions 0
Jun 27 2022 19:18:13 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (2325,1718,1978) fresh-partitions 0
Jun 27 2022 19:18:13 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (2325,1718,1978) fresh-partitions 0
Jun 27 2022 19:35:29 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (514,50,50) fresh-partitions 0
Jun 27 2022 19:35:29 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (0,0,0) fresh-partitions 0
Jun 27 2022 19:58:18 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1941,1711,1293) fresh-partitions 0
Jun 27 2022 19:58:18 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (1941,1711,1293) fresh-partitions 0
Jun 27 2022 20:12:54 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1369,1089,1393) fresh-partitions 170
Jun 27 2022 20:12:54 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (833,307,1245) fresh-partitions 0
Jun 27 2022 20:19:07 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1467,1172,1576) fresh-partitions 190
Jun 27 2022 20:19:07 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (385,418,770) fresh-partitions 0
Jun 27 2022 20:19:59 GMT: INFO (partition): (partition_balance.c:928) {test2} rebalanced: expected-migrations (1830,1477,1926) fresh-partitions 128
Jun 27 2022 20:19:59 GMT: INFO (partition): (partition_balance.c:928) {test} rebalanced: expected-migrations (581,614,1162) fresh-partitions 0
Look at fresh-partitions here. If it is 1 or more, that means some partitions were not available and Aerospike created fresh (empty) partitions for you. If the other nodes have actually died, that means there is data loss. If the other nodes come back again (because they did not die but were only network-partitioned), the older data will not be lost, but conflict resolution will take place between the older partitions and the freshly created ones (the default conflict-resolution strategy is generation, which means the copy of a key that was modified more often is the one kept after conflict resolution).
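If you would rather automate that check than eyeball the logs, a small scan along the following lines works. This is only a sketch: it assumes the default /var/log/aerospike/aerospike.log location and the log format shown above.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FreshPartitionCheck {
    // Matches e.g. "{test2} rebalanced: expected-migrations (1369,1089,1393) fresh-partitions 170"
    private static final Pattern REBALANCE =
        Pattern.compile("\\{(\\S+)\\} rebalanced: expected-migrations .* fresh-partitions (\\d+)");

    public static void main(String[] args) throws IOException {
        String logPath = args.length > 0 ? args[0] : "/var/log/aerospike/aerospike.log";
        try (Stream<String> lines = Files.lines(Paths.get(logPath))) {
            lines.forEach(line -> {
                Matcher m = REBALANCE.matcher(line);
                // A non-zero fresh-partitions count means empty partitions were created,
                // so data is lost if the original owners never rejoin.
                if (m.find() && Long.parseLong(m.group(2)) > 0) {
                    System.out.println("namespace " + m.group(1) + ": "
                        + m.group(2) + " fresh partitions -> " + line);
                }
            });
        }
    }
}

Run it on every live node (or point it at collected logs) and investigate any line it prints.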

Postgresql 13 AlmaLinux Permissions Issue

I'm trying to get PostgreSQL 13.x running on AlmaLinux. The package manager shows v13.x is available, and I install postgresql-server:
sudo yum install -y postgresql-server postgresql-contrib
I then try to initialize the first db:
sudo postgresql-setup --initdb
Because I had to redo some things, this directory is not empty, and then I run into permissions issues. I've tried setting ownership to my account, but that doesn't work either:
ERROR: Data directory /var/lib/pgsql/data is not empty!
ERROR: Initializing database failed, possibly see /var/lib/pgsql/initdb_postgresql.log
[Wed Jun 29 18:31:40 rich#server1 /var/lib] cd pgsql
-bash: cd: pgsql: Permission denied
[Wed Jun 29 18:31:50 rich#server1 /var/lib] sudo ls -al pgsql/data
total 60
drwx------. 20 postgres postgres 4096 May 26 11:53 .
drwx------. 4 postgres postgres 54 Jun 29 18:27 ..
drwx------. 5 rich rich 41 Jun 29 11:01 base
-rw-------. 1 rich rich 30 Jun 29 15:58 current_logfiles
drwx------. 2 rich rich 4096 Jun 29 11:01 global
drwx------. 2 rich rich 32 Jun 29 11:02 log
drwx------. 2 rich rich 6 Jun 29 11:01 pg_commit_ts
drwx------. 2 rich rich 6 Jun 29 11:01 pg_dynshmem
-rw-------. 1 rich rich 4760 Jun 29 11:01 pg_hba.conf
-rw-------. 1 rich rich 1636 Jun 29 11:01 pg_ident.conf
drwx------. 4 rich rich 68 Jun 29 11:01 pg_logical
drwx------. 4 rich rich 36 Jun 29 11:01 pg_multixact
drwx------. 2 rich rich 6 Jun 29 11:01 pg_notify
drwx------. 2 rich rich 6 Jun 29 11:01 pg_replslot
drwx------. 2 rich rich 6 Jun 29 11:01 pg_serial
drwx------. 2 rich rich 6 Jun 29 11:01 pg_snapshots
drwx------. 2 rich rich 6 Jun 29 11:01 pg_stat
drwx------. 2 rich rich 6 Jun 29 11:01 pg_stat_tmp
drwx------. 2 rich rich 18 Jun 29 11:01 pg_subtrans
drwx------. 2 rich rich 6 Jun 29 11:01 pg_tblspc
drwx------. 2 rich rich 6 Jun 29 11:01 pg_twophase
-rw-------. 1 rich rich 3 Jun 29 11:01 PG_VERSION
drwx------. 3 rich rich 60 Jun 29 11:01 pg_wal
drwx------. 2 rich rich 18 Jun 29 11:01 pg_xact
-rw-------. 1 rich rich 88 Jun 29 11:01 postgresql.auto.conf
-rw-------. 1 rich rich 28098 Jun 29 11:01 postgresql.conf
So now I'm caught between initializing a new database and getting past permissions. Anybody know how I can get this running?
EDIT: further info
Here is the report behind the permissions issue:
░░ A start job for unit postgresql.service has begun execution.
░░
░░ The job identifier is 2450.
Jun 29 21:28:58 server1.project33.ca postgresql-check-db-dir[2593]: cat: /var/lib/pgsql/data/PG_VERSION: Permission denied
Jun 29 21:28:58 server1.project33.ca postgresql-check-db-dir[2592]: An old version '' of the database format was found.
Jun 29 21:28:58 server1.project33.ca postgresql-check-db-dir[2592]: You need to dump and reload before using PostgreSQL 13.7.
Jun 29 21:28:58 server1.project33.ca postgresql-check-db-dir[2592]: See /usr/share/doc/postgresql/README.rpm-dist for more information.
Jun 29 21:28:58 server1.project33.ca systemd[1]: postgresql.service: Control process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ An ExecStartPre= process belonging to unit postgresql.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Jun 29 21:28:58 server1.project33.ca systemd[1]: postgresql.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit postgresql.service has entered the 'failed' state with result 'exit-code'.
Jun 29 21:28:58 server1.project33.ca systemd[1]: Failed to start PostgreSQL database server.
░░ Subject: A start job for unit postgresql.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit postgresql.service has finished with a failure.
░░
░░ The job identifier is 2450 and the job result is failed.
Solved: sudo postgresql-setup --initdb
The permissions issue was /var/lib/pgsql/data needing to be owned by postgres:postgres.
Then log in as the postgres user and start with:
createdb `whoami`
...which creates a database named after the current user. From there you can create roles and databases, or exit back to your own user and use the same createdb syntax again.

Aerospike Cluster Nodes intermittently going down and coming back up

I have an Aerospike cluster of 15 nodes. The cluster performs fairly well under a normal load of about 10k TPS. Today I ran some tests at a higher throughput, raising the load to around 130k-150k TPS.
I observed that some nodes intermittently went down and automatically came back up after a few seconds. Because of these nodes going down, we are getting heartbeat timeouts and hence read timeouts.
Configuration of one cluster node: 8 cores, 120 GB RAM. I am storing data in memory.
All nodes have sufficient space remaining: out of a total cluster space of 1.2 TB (15*120), only 275 GB is used.
Also, the network is not at all flaky. All these machines are in a data centre and have high bandwidth.
Some observations made by monitoring AMC:
I saw some nodes (around 5-6) become inactive for a few seconds.
There was a high number of client connections on a few of the nodes that went down. For example, there were 6000-7000 client connections on all the other nodes, while one node had an unusual 25000 client connections.
Some error logs from the cluster nodes:
Sep 15 2020 17:00:43 GMT: WARNING (hb): (hb.c:4864) (repeated:5) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:00:43 GMT: WARNING (socket): (socket.c:808) (repeated:5) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:00:53 GMT: WARNING (socket): (socket.c:740) (repeated:3) Timeout while connecting
Sep 15 2020 17:00:53 GMT: WARNING (hb): (hb.c:4864) (repeated:3) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:00:53 GMT: WARNING (socket): (socket.c:808) (repeated:3) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:01:03 GMT: WARNING (socket): (socket.c:740) (repeated:1) Timeout while connecting
Sep 15 2020 17:01:03 GMT: WARNING (hb): (hb.c:4864) (repeated:1) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:01:03 GMT: WARNING (socket): (socket.c:808) (repeated:1) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:01:13 GMT: WARNING (socket): (socket.c:740) (repeated:2) Timeout while connecting
Sep 15 2020 17:01:13 GMT: WARNING (hb): (hb.c:4864) (repeated:2) could not create heartbeat connection to node {10.33.162.134:2057}
Sep 15 2020 17:01:13 GMT: WARNING (socket): (socket.c:808) (repeated:2) Error while connecting socket to 10.33.162.134:2057
Sep 15 2020 17:02:44 GMT: WARNING (socket): (socket.c:740) Timeout while connecting
Sep 15 2020 17:02:44 GMT: WARNING (socket): (socket.c:808) Error while connecting socket to 10.33.54.144:2057
Sep 15 2020 17:02:44 GMT: WARNING (hb): (hb.c:4864) could not create heartbeat connection to node {10.33.54.144:2057}
Sep 15 2020 17:02:53 GMT: WARNING (socket): (socket.c:740) (repeated:1) Timeout while connecting
We also saw some of these error logs on the nodes that were going down:
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9280f220a0102 on fd 4155 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9b676220a0102 on fd 4149 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9fbd6200a0102 on fd 42 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb96d3d220a0102 on fd 4444 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb99036210a0102 on fd 4278 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9f102220a0102 on fd 4143 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb91822210a0102 on fd 4515 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9e5ff200a0102 on fd 4173 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb93f65200a0102 on fd 38 failed : Broken pipe
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9132f220a0102 on fd 4414 failed : Connection reset by peer
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb939be210a0102 on fd 4567 failed : Connection reset by peer
Sep 15 2020 17:08:58 GMT: WARNING (hb): (hb.c:5122) sending mesh message to bb9b19a220a0102 on fd 4165 failed : Broken pipe
Attaching the aerospike.conf file here:
service {
    user root
    group root
    service-threads 12
    transaction-queues 12
    transaction-threads-per-queue 4
    proto-fd-max 50000
    migrate-threads 1
    pidfile /var/run/aerospike/asd.pid
}
logging {
    file /var/log/aerospike/aerospike.log {
        context any info
        context migrate debug
    }
}
network {
    service {
        address any
        port 3000
    }
    heartbeat {
        mode mesh
        port 2057
        mesh-seed-address-port 10.34.154.177 2057
        mesh-seed-address-port 10.34.15.40 2057
        mesh-seed-address-port 10.32.255.229 2057
        mesh-seed-address-port 10.33.54.144 2057
        mesh-seed-address-port 10.32.190.157 2057
        mesh-seed-address-port 10.32.101.63 2057
        mesh-seed-address-port 10.34.2.241 2057
        mesh-seed-address-port 10.32.214.251 2057
        mesh-seed-address-port 10.34.30.114 2057
        mesh-seed-address-port 10.33.162.134 2057
        mesh-seed-address-port 10.33.190.57 2057
        mesh-seed-address-port 10.34.61.109 2057
        mesh-seed-address-port 10.34.47.19 2057
        mesh-seed-address-port 10.33.34.24 2057
        mesh-seed-address-port 10.34.118.182 2057
        interval 150
        timeout 20
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace PS1 {
    replication-factor 2
    memory-size 70G
    single-bin false
    data-in-index false
    storage-engine memory
    stop-writes-pct 90
    high-water-memory-pct 75
}
namespace LS1 {
    replication-factor 2
    memory-size 30G
    single-bin false
    data-in-index false
    storage-engine memory
    stop-writes-pct 90
    high-water-memory-pct 75
}
Any possible explanations for this?
It seems the nodes are having network connectivity issues at such higher throughput. This can have different root causes, from a simple network-related bottleneck (bandwidth, packets per second) to something on the node itself getting in the way of interfacing properly with the network (a surge in soft interrupts, improper distribution of network queues, CPU thrashing). This would prevent heartbeat connections/messages from going through, resulting in nodes leaving the cluster until the situation recovers. If running in a cloud/virtualized environment, some hosts may have noisier neighbors than others, etc.
The increase in the number of connections is a symptom: any slowdown on a node causes the client to compensate in order to sustain the throughput, which increases the number of connections and can lead to a downward-spiraling effect.
Finally, a single node leaving or joining the cluster shouldn't impact read transactions much. Check your policy and make sure you have socketTimeout / totalTimeout / maxRetries, etc. set correctly so that a read can quickly retry against a different replica.
This article can help on this latest point: https://discuss.aerospike.com/t/understanding-timeout-and-retry-policies/2852/3
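As an illustration of that last point, a read policy in the Aerospike Java client can be tuned roughly as below. This is a sketch, not code from the question: the seed IP is one of the nodes listed in the config above, while the set name and key are made up for the example.

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Record;
import com.aerospike.client.policy.Policy;
import com.aerospike.client.policy.Replica;

public class ReadPolicyExample {
    public static void main(String[] args) {
        try (AerospikeClient client = new AerospikeClient("10.34.154.177", 3000)) {
            Policy readPolicy = new Policy();
            readPolicy.socketTimeout = 100;        // per-attempt socket timeout (ms)
            readPolicy.totalTimeout = 500;         // overall transaction budget (ms)
            readPolicy.maxRetries = 2;             // retries beyond the first attempt
            readPolicy.sleepBetweenRetries = 10;   // small pause between attempts (ms)
            readPolicy.replica = Replica.SEQUENCE; // retry against another replica, not the same node

            Key key = new Key("PS1", "demo-set", "some-user-key"); // hypothetical set and key
            Record record = client.get(readPolicy, key);           // null if the record does not exist
            System.out.println(record);
        }
    }
}

With settings like these, a read that hits a briefly unresponsive node fails over to a replica well inside the client's total timeout instead of surfacing as a read timeout.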

apache2 does not install on ubuntu 16.04

I am installing apache2 on Ubuntu 16.04 and I am getting this error:
Job for apache2.service failed because the control process exited with error code. See "systemctl status apache2.service" and "journalctl -xe" for details.
invoke-rc.d: initscript apache2, action "start" failed.
● apache2.service - LSB: Apache2 web server
Loaded: loaded (/etc/init.d/apache2; bad; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: failed (Result: exit-code) since Fri 2017-06-30 15:19:36 IST; 10ms ago
Docs: man:systemd-sysv-generator(8)
Process: 18460 ExecStart=/etc/init.d/apache2 start (code=exited, status=2)
Jun 30 15:19:36 shradha-Vostro-3546 systemd[1]: Starting LSB: Apache2 web se....
Jun 30 15:19:36 shradha-Vostro-3546 apache2[18460]: /etc/init.d/apache2: 46: ...
Jun 30 15:19:36 shradha-Vostro-3546 apache2[18460]: /etc/init.d/apache2: 57: ...
Jun 30 15:19:36 shradha-Vostro-3546 apache2[18460]: ERROR: APACHE_PID_FILE ne...
Jun 30 15:19:36 shradha-Vostro-3546 systemd[1]: apache2.service: Control pro...2
Jun 30 15:19:36 shradha-Vostro-3546 systemd[1]: Failed to start LSB: Apache2....
Jun 30 15:19:36 shradha-Vostro-3546 systemd[1]: apache2.service: Unit entere....
Jun 30 15:19:36 shradha-Vostro-3546 systemd[1]: apache2.service: Failed with....
Please help me.

Aerospike sudden crash

I am running a 5-node cluster with version 3.7.0.2, and after some hours of usage all 5 instances crashed. I have seen some other reports of crashes in this version. Should I download version 3.7.1? Will it fix the crash?
Linux aerospike2 4.2.0-18-generic #22-Ubuntu SMP Fri Nov 6 18:25:50
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux (Ubuntu 15.10)
config:
# Aerospike database configuration file.
service {
    user root
    group root
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    pidfile /var/run/aerospike/asd.pid
    service-threads 32
    transaction-queues 32
    transaction-threads-per-queue 32
    batch-index-threads 32
    proto-fd-max 15000
    batch-max-requests 200000
}
logging {
    # Log file must be an absolute path.
    file /var/log/aerospike/aerospike.log {
        context any info
    }
}
network {
    service {
        address 10.240.0.6
        port 3000
    }
    heartbeat {
        mode mesh
        address 10.240.0.6 # IP of the NIC on which this node is listening
        mesh-seed-address-port 10.240.0.6 3002
        mesh-seed-address-port 10.240.0.5 3002
        port 3002
        interval 150
        timeout 10
    }
    fabric {
        port 3001
    }
    info {
        port 3003
    }
}
namespace test {
    replication-factor 10
    memory-size 3500M
    default-ttl 0 # 30 days, use 0 to never expire/evict.
    ldt-enabled true
    storage-engine device {
        file /data/aerospike.dat
        write-block-size 1M
        filesize 300G
        # data-in-memory true
    }
}
LOGS:
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::3202) device /data/aerospike.dat: read complete: UNIQUE 13593274 (REPLACED 0) (GEN 63) (EXPIRED 0) (MAX-TTL 0) records
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1072) ns test loading free & defrag queues
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1006) /data/aerospike.dat init defrag profile: 0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1096) /data/aerospike.dat init wblock free-q 220796, defrag-q 2
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::2373) ns test starting device maintenance threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::1488) ns test starting write worker threads
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::923) ns test starting defrag threads
Jan 07 2016 11:28:34 GMT: INFO (as): (as.c::457) initializing services...
Jan 07 2016 11:28:34 GMT: INFO (tsvc): (thr_tsvc.c::819) shared queues: 32 queues with 32 threads each
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2649) Sending 10.240.0.14 as the IP address for receiving heartbeats
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2661) heartbeat socket initialization
Jan 07 2016 11:28:34 GMT: INFO (hb): (hb.c::2675) initializing mesh heartbeat socket : 10.240.0.14:3002
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3454) partitions from storage: total 4096 found 4096 lost(set) 0 lost(unset) 0
Jan 07 2016 11:28:34 GMT: INFO (partition): (partition.c::3432) {test} 4096 partitions: found 0 absent, 4096 stored
Jan 07 2016 11:28:34 GMT: INFO (paxos): (paxos.c::3458) Paxos service ignited: bb90e00f00a0142
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::609) Initialize batch-index-threads to 32
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::635) Created JEMalloc arena #151 for batch normal buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (batch.c::636) Created JEMalloc arena #152 for batch huge buffers
Jan 07 2016 11:28:34 GMT: INFO (batch): (thr_batch.c::347) Initialize batch-threads to 4
Jan 07 2016 11:28:34 GMT: INFO (drv_ssd): (drv_ssd.c::4147) {test} floor set at 1049 wblocks per device
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3539) listening for other nodes (max 3000 milliseconds) ...
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.6:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2143) connecting to remote heartbeat service at 10.240.0.5:3002
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.6:3002 (10.240.0.6:3002) via socket 60 from 10.240.0.14:55702
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh seed host at 10.240.0.5:3002 (10.240.0.5:3002) via socket 61 from 10.240.0.14:40626
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.23:3002 (10.240.0.23:3002) via socket 62 from 10.240.0.14:42802
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::1085) initiated connection to mesh non-seed host at 10.240.0.13:3002 (10.240.0.13:3002) via socket 63 from 10.240.0.14:35384
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90500f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90600f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90500f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90600f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3547) ... other node(s) detected - node will operate in a multi-node cluster
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90500f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90600f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #8 for thr_demarshal()
Jan 07 2016 11:28:37 GMT: INFO (ldt): (thr_nsup.c::1139) LDT supervisor started
Jan 07 2016 11:28:37 GMT: INFO (nsup): (thr_nsup.c::1176) namespace supervisor started
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::3516) paxos supervisor thread started
Jan 07 2016 11:28:37 GMT: INFO (demarshal): (thr_demarshal.c::308) Service started: socket 0.0.0.0:3000
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb90d00f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (hb): (hb.c::2571) new heartbeat received: bb91700f00a0142 principal node is bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb90d00f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (fabric): (fabric.c::1811) fabric: node bb91700f00a0142 arrived
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb90d00f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:37 GMT: INFO (paxos): (paxos.c::2250) Skip node arrival bb91700f00a0142 cluster principal bb90e00f00a0142 pulse principal bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is now principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::383) DISALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3198) SUCCESSION [6]#bb91700f00a0142*: bb91700f00a0142 bb90e00f00a0142 bb90d00f00a0142 bb90600f00a0142 bb90500f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3209) node bb91700f00a0142 is still principal pro tempore
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::2331) Sent partition sync request to node bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2490) CLUSTER SIZE = 5
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::2533) Global state is well formed
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2262) setting replication factors: cluster size 5, paxos single replica limit 1
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::2278) {test} replication factor is 5
Jan 07 2016 11:28:38 GMT: INFO (config): (cluster_config.c::421) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (cluster_config.c::380) rack aware is disabled
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::3337) {test} re-balanced, expected migrations - (5789 tx, 6010 rx)
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3355) global partition state: total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3356) partition state after fixing lost partitions (master): total 4096 lost 0 unique 0 duplicate 4096
Jan 07 2016 11:28:38 GMT: INFO (paxos): (partition.c::3357) 0 new partition version tree paths generated
Jan 07 2016 11:28:38 GMT: INFO (partition): (partition.c::375) ALLOW MIGRATIONS
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::3293) received partition sync message from bb91700f00a0142
Jan 07 2016 11:28:38 GMT: INFO (paxos): (paxos.c::803) Node allows migrations. Ignoring duplicate partition sync message.
Jan 07 2016 11:28:38 GMT: WARNING (paxos): (paxos.c::3301) unable to apply partition sync message state
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #18 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #19 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #20 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #21 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #22 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #23 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #24 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #25 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #26 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #27 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #28 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #30 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #29 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #31 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #32 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #33 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #34 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #35 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #36 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #37 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #38 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #39 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #40 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #41 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #42 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #43 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #44 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #45 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #46 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #47 for thr_demarshal()
Jan 07 2016 11:28:38 GMT: INFO (demarshal): (thr_demarshal.c::279) Saved original JEMalloc arena #48 for thr_demarshal()
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::860) Waiting to spawn demarshal threads ...
Jan 07 2016 11:28:39 GMT: INFO (demarshal): (thr_demarshal.c::863) Started 32 Demarshal Threads
Jan 07 2016 11:28:39 GMT: INFO (as): (as.c::494) service ready: soon there will be cake!
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5084) system memory: free 6590544kb ( 86 percent free )
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5090) ClusterSize 5 ::: objects 13593274 ::: sub_objects 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5099) rec refs 13596175 ::: rec locks 1 ::: trees 0 ::: wr reqs 0 ::: mig tx 2633 ::: mig rx 30
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5104) replica errs :: null 0 non-null 0 ::: sync copy errs :: master 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5114) trans_in_progress: wr 0 prox 0 wait 0 ::: q 0 ::: iq 0 ::: dq 0 : fds - proto (22, 35, 13) : hb (4, 4, 0) : fab (72, 72, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5116) heartbeat_received: self 0 : foreign 322
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5117) heartbeat_stats: bt 0 bf 0 nt 0 ni 0 nn 0 nnir 0 nal 0 sf1 0 sf2 0 sf3 0 sf4 0 sf5 0 sf6 0 mrf 0 eh 0 efd 0 efa 0 um 0 mcf 0 rc 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5129) tree_counts: nsup 0 scan 0 dup 0 wprocess 0 migrx 30 migtx 2633 ssdr 0 ssdw 0 rw 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5158) {test} disk bytes used 89561376640 : avail pct 71 : cache-read pct 0.00
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5160) {test} memory bytes used 869969536 (index 869969536 : sindex 0) : used pct 23.70
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5171) {test} ldt_gc: cnt 0 io 0 gc 0 (0, 0, 0)
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5194) {test} migrations - remaining (5777 tx, 5982 rx), active (1 tx, 2 rx), 0.34% complete
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5203) partitions: actual 792 sync 3304 desync 0 zombie 0 absent 0
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: reads (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: writes_master (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: proxy (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: udf (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query (0 total) msec
Jan 07 2016 11:28:49 GMT: INFO (info): (hist.c::137) histogram dump: query_rec_count (0 total) count
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5385) node id bb90e00f00a0142
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5389) reads 0,0 : writes 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5393) udf reads 0,0 : udf writes 0,0 : udf deletes 0,0 : lua errors 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5396) basic scans 0,0 : aggregation scans 0,0 : udf background scans 0,0 :: active scans 0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5400) index (new) batches 0,0 : direct (old) batches 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5404) aggregation queries 0,0 : lookup queries 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5406) proxies 0,0
Jan 07 2016 11:28:49 GMT: INFO (info): (thr_info.c::5415) {test} objects 13593274 : sub-objects 0 : master objects 2625756 : master sub-objects 0 : prole objects 3126 : prole sub-objects 0
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c05441008 with fne: 0x7f7c03c0e108 and fd: 68 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e1b008 with fne: 0x7f7c03c0e108 and fd: 78 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e9d008 with fne: 0x7f7c03c0e108 and fd: 80 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07dda008 with fne: 0x7f7c03c0e108 and fd: 76 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07d99008 with fne: 0x7f7c03c0e108 and fd: 75 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07ede008 with fne: 0x7f7c03c0e108 and fd: 81 (Failed)
Jan 07 2016 11:28:54 GMT: WARNING (fabric): (fabric.c::2093) releasing fb: 0x7f7c07e5c008 with fne: 0x7f7c03c0e108 and fd: 79 (Failed)
Jan 07 2016 11:28:54 GMT: INFO (drv_ssd): (drv_ssd.c::2088) device /data/aerospike.dat: used 89561376640, contig-free 220797M (220797 wblocks), swb-free 0, w-q 0 w-tot 0 (0.0/s), defrag-q 0 defrag-tot 2 (0.1/s) defrag-w-tot 0 (0.0/s)
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: WARNING (rw): (thr_rw.c::307) write_request_destructor(): Close fd FOR BATCH.
Jan 07 2016 11:28:54 GMT: CRITICAL (demarshal): (thr_demarshal.c:thr_demarshal_resume:124) unable to resume socket FD -1 on epoll instance FD 115: 9 (Bad file descriptor)
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::94) SIGABRT received, aborting Aerospike Community Edition build 3.7.1 os ubuntu12.04
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: found 13 frames
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 0: /usr/bin/asd(as_sig_handle_abort+0x5d) [0x48a07a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 1: /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f7c3c97e2f0]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 2: /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7c3c97e267]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 3: /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f7c3c97feca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 4: /usr/bin/asd(cf_fault_event+0x2a3) [0x516b1a]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 5: /usr/bin/asd(thr_demarshal_resume+0x8b) [0x49f473]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 6: /usr/bin/asd(as_end_of_transaction_ok+0x9) [0x4d58f4]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 7: /usr/bin/asd(write_request_destructor+0x132) [0x4c1c8e]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 8: /usr/bin/asd(cf_rchash_free+0x26) [0x541028]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 9: /usr/bin/asd(cf_rchash_reduce+0xb5) [0x541fe9]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 10: /usr/bin/asd(rw_retransmit_fn+0x44) [0x4c0eca]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 11: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76aa) [0x7f7c3dbe16aa]
Jan 07 2016 11:28:54 GMT: WARNING (as): (signal.c::96) stacktrace: frame 12: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f7c3ca4feed]
Jan 07 2016 12:13:37 GMT: INFO (as): (as.c::410) <><><><><><><><><><> Aerospike Community Edition build 3.7.1 <><><><><><><><><><>
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Aerospike database configuration file.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) user root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) group root
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) pidfile /var/run/aerospike/asd.pid
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-queues 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) transaction-threads-per-queue 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-index-threads 32
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) proto-fd-max 15000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) batch-max-requests 200000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) logging {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) # Log file must be an absolute path.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /var/log/aerospike/aerospike.log {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) context any info
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) network {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) service {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) #address any
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3000
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) heartbeat {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mode mesh
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.6 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) mesh-seed-address-port 10.240.0.5 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3002
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) interval 150
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) timeout 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) fabric {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3001
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) info {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) port 3003
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) namespace test {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) replication-factor 10
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) memory-size 3500M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) default-ttl 0 # 30 days, use 0 to never expire/evict.
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) ldt-enabled true
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) storage-engine device {
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) file /data/aerospike.dat
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) write-block-size 1M
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) filesize 300G
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247) }
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3247)
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3265) system file descriptor limit: 100000, proto-fd-max: 15000
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::119) Node ip: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (cf:misc): (id.c::327) Heartbeat address for mesh: 10.240.0.14
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3309) Rack Aware mode not enabled
Jan 07 2016 12:13:37 GMT: INFO (config): (cfg.c::3312) Node id bb90e00f00a0142
Jan 07 2016 12:13:37 GMT: INFO (namespace): (namespace_cold.c::101) ns test beginning COLD start
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3797) opened file /data/aerospike.dat: usable size 322122547200
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::1107) /data/aerospike.dat has 307200 wblocks of size 1048576
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3176) device /data/aerospike.dat: reading device to load index
Jan 07 2016 12:13:37 GMT: INFO (drv_ssd): (drv_ssd.c::3181) In TID 13102: Using arena #150 for loading data for namespace "test"
Jan 07 2016 12:13:39 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 134133 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:41 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 258771 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:43 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 388121 records, 0 subrecords, /data/aerospike.dat 0%
Jan 07 2016 12:13:45 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 512116 records, 0 subrecords, /data/aerospike.dat 1%
Jan 07 2016 12:13:47 GMT: INFO (drv_ssd): (drv_ssd.c::3977) {test} loaded 641566 records, 0 subrecords, /data/aerospike.dat 1%
This was fixed in version 3.7.1 and above of the Aerospike server.
More details on the issue, from the release notes and Jira:
[AER-4487], [AER-4690] - (Clustering/Migration) Race condition causing incorrect heartbeat fd saved and later not removable.
Please also see:
https://discuss.aerospike.com/t/aerospike-crash/2327