Aerospike Community Edition: what should I do to `aerospike.conf` to set up a cluster?

I'm trying to set up a three-node Aerospike cluster on Ubuntu 14.04. Apart from the IP address/name, each machine is identical. I installed Aerospike and the management console, per the documentation, on each machine.
I then edited the network/service and network/heartbeat sections in /etc/aerospike/aerospike.conf:
network {
    service {
        address any
        port 3000
        access-address 10.0.1.11 # 10.0.1.12 and 10.0.1.13 on the other two nodes
    }
    heartbeat {
        mode mesh
        port 3002
        mesh-seed-address-port 10.0.1.11 3002
        mesh-seed-address-port 10.0.1.12 3002
        mesh-seed-address-port 10.0.1.13 3002
        interval 150
        timeout 10
    }
    [...]
}
When I sudo service aerospike start on each of the nodes, the service runs but it's not clustered. If I try to add another node in the management console, it informs me: "Node 10.0.1.12:3000 cannot be monitored here as it belongs to a different cluster."
Can you see what I'm doing wrong? What changes should I make to aerospike.conf, on each of the nodes, in order to set up an Aerospike cluster instead of three isolated instances?

Your configuration appears correct.
Check if you are able to open a TCP connection over ports 3001 and 3002 from each host to the rest.
nc -z -w5 <host> 3001; echo $?
nc -z -w5 <host> 3002; echo $?
If not, I would first suspect the firewall configuration.
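If those checks fail, here is a minimal sketch of how to inspect and open the fabric/heartbeat ports, assuming ufw (or plain iptables) manages the firewall and that the nodes share the 10.0.1.0/24 subnet used above:
sudo ufw status verbose
sudo ufw allow proto tcp from 10.0.1.0/24 to any port 3001:3002
sudo iptables -L -n    # if ufw is not in use, inspect the raw iptables rules instead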
Update 1:
The netcat commands returned 0, so let's try to get more info.
Run and provide the output of the following on each node:
asinfo -v service
asinfo -v services
asadm -e info
Update 2:
After inspecting the output in the gists, the asadm -e "info net" output indicated that all nodes had the same Node ID.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Node Fqdn Ip Client Current HB HB
. Id . . Conns Time Self Foreign
h *BB9000000000094 hadoop01.woolford.io:3000 10.0.1.11:3000 15 174464730 37129 0
Number of rows: 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Node Fqdn Ip Client Current HB HB
. Id . . Conns Time Self Foreign
h *BB9000000000094 hadoop03.woolford.io:3000 10.0.1.13:3000 5 174464730 37218 0
Number of rows: 1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Network Information~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Node Node Fqdn Ip Client Current HB HB
. Id . . Conns Time Self Foreign
h *BB9000000000094 hadoop02.woolford.io:3000 10.0.1.12:3000 5 174464731 37203 0
Number of rows: 1
The Node ID is constructed from the fabric port (port 3001 in hex) followed by the MAC address in reverse byte order. Another red flag is that "HB Self" is non-zero; it is expected to be zero in a mesh configuration (in a multicast configuration it will also be non-zero, since the nodes receive their own heartbeat messages).
Because all of the Node IDs are the same, this indicates that all of the MAC addresses are the same (though it is possible to change the Node IDs using rack-aware configuration). Heartbeats that appear to have originated from the local node (determined by the heartbeat carrying the same Node ID) are ignored, which is why the nodes never join each other.
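As a worked illustration of the Node ID layout (the MAC address below is hypothetical): fabric port 3001 is 0x0BB9, and the remaining digits come from the MAC bytes in reverse order:
MAC address     00:0a:95:9d:68:16
reversed bytes  16 68 9d 95 0a 00
Node ID         0BB9 + 16689D950A00 = 0BB916689D950A00 (displayed without the leading zero, like the BB9... IDs above)
In the output above, all three nodes share the same MAC-derived portion, 000000000094, which is why their Node IDs collide.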
Update 3:
The MAC addresses are all unique, which contradicts the previous conclusion. A reply provided the interface name being used, em1, which is not an interface name Aerospike looks for: Aerospike looks for interfaces named eth#, bond#, or wlan#. I assume that, since the name wasn't one of the expected three, this caused the issue with the MAC addresses; if so, I would expect the following warning to appear in the logs:
Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno %d %s
For such scenarios the network-interface-name parameter can be used to tell Aerospike which interface to use for Node ID generation. This parameter also determines which interface's IP address is advertised to client applications.
network {
    service {
        address any
        port 3000
        access-address 10.0.1.11 # 10.0.1.12 and 10.0.1.13 on the other two nodes
        network-interface-name em1 # Needed for Node ID
    }
    [...]
}
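After adding network-interface-name on each node, restart the service and re-check with the same commands as above; the Node Id column should now show a different value per host. A sketch:
sudo service aerospike restart
asadm -e info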
Update 4:
With the 3.6.0 release, these device names are automatically discovered. See AER-4026 in the release notes.

Related

`oc cluster up` fails during initial startup

I am trying out OKD, but it fails for me during the oc cluster up port check step. The debug output is not very verbose, to put it politely. Do you have an idea what to look for?
$ oc cluster up
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
error: a port needed by OpenShift is not available
But the required ports 53 and 8443 are not taken
sudo netstat -tulpn | grep '\(:8443\|:53\)'
At least netstat returns nothing
Versions:
$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
and
CentOS Linux release 7.6.1810 (Core)
I have not been able to find out how to turn debugging on so that it is possible to see what it really checks for.
Does the user you are running the command as have enough privileges to open privileged ports (ports < 1024) on your host machine?
Try running oc cluster up as root or with sudo.
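For example, a quick sketch that re-runs the check with elevated privileges and double-checks the two ports with ss:
sudo oc cluster up
sudo ss -tulpn | grep -E ':(53|8443)\b'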
Yes, I am starting the whole OKD as the root user.

Google Cloud VM instance SSH connection ~60 seconds timeout with 30 second keepalive

I've been connecting to a Google Cloud VM instance via gcloud ssh from my macOS:
$ gcloud compute ssh [username]@[instance]
Starting from a week ago, the connection just drops after ~60 seconds of idle time and returns:
Connection to [my_external_ip] closed by remote host.
Connection to [my_external_ip] closed.
ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].
I configured the TCP keepalive time to 30 seconds on both my MacBook and the VM, but that did not solve the problem.
Any idea how I can extend the connection duration?
This is unlikely to be an issue with your timeout setting; it is more likely an issue with your firewall rules or routes.
Firstly, I would suggest checking your firewall rules and ensuring you have an ingress firewall rule opening port 22 (a few useful commands follow the list below). If you have, check the configuration of this rule, in particular:
Check the IP range in 'Source filters'. Does the range include the IP address of your home computer? For testing purposes, to ensure it does, you could temporarily set this to 0.0.0.0/0 to include all IP addresses.
Check the 'Targets' drop-down. Is this set to apply to 'All instances in the network' or is it set to 'Specified target tags'? If you have set it to 'Specified target tags', make sure that the same tag is added to the 'Network tags' section of the instance; otherwise the firewall rule will not apply to the instance and SSH traffic will not be allowed.
Ensure this rule has a higher priority than any other rules that could counteract it (when I say higher priority I mean a lower number; for example, a rule with a priority of 1000 is a higher priority than a rule with a priority of 20000).
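To review these settings from the command line, something like the following may help (the rule name here is a placeholder):
gcloud compute firewall-rules list
gcloud compute firewall-rules describe default-allow-ssh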
If the above doesn't resolve the issue, run the following command to check the routes:
gcloud compute routes list
Ensure there is an entry which contains the following:
default 0.0.0.0/0 default-internet-gateway
EDIT
If you are able to sometimes SSH into the instance but then the connection drops, there may be some useful information in the logs, or the serial console.
You can access the serial console by clicking on the instance name in the GCP Console, then clicking on "Serial port 1".
When you SSH into the instance, information about the SSH session populates the serial console output (this can be refreshed by hitting 'Refresh' at the top of the page). Information about the session ending also populates the serial console. There may be some useful information/clues about why the session ends in this output.
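The same serial console output can also be pulled from the command line (the instance name and zone below are placeholders):
gcloud compute instances get-serial-port-output my-instance --zone us-central1-a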
It might also be worth checking the status of SSH daemon on the instance and giving it a restart to see if that makes a difference:
Check status of sshd:
systemctl status sshd
Restart sshd:
sudo systemctl restart sshd

Redis 3 Waiting for the cluster to join

So I'm trying to create a cluster using the default Redis guide.
But when running ruby /usr/share/doc/redis-tools/examples/redis-trib.rb create .... I get stuck forever at "Waiting for the cluster to join".
Each redis.conf is bound to its respective static IP address (not only 127.0.0.1).
My nodes are each located on a separate instance of Ubuntu 16.04 in an ESXi environment without ANY firewall between them.
The hosts were not created separately; I just copied the first one and changed the hostname + static interface for the other two, in case that could cause something.
Master-slave replication works, so I doubt there is a connection issue.
Here is a screenshot, if that can help in some way: http://i.imgur.com/LrNOrut.png
Any ideas?
UPDATE
I have checked all hosts from another physical interface and connected to them successfully with cluster-enabled no.
Both 6379 and 16379 are accepting connections on both 127.0.0.1 and 192....
And all hosts can reach each other with telnet <host> 16379.
Try to keep only one IP in the "bind" configuration directive in /etc/redis/redis.conf, or even comment it out.
I had the same problem when there was the following string in my config:
bind 127.0.0.1 172.19.2.10X
Removing the loopback address from the bind directive on all the nodes got past that obstacle.
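A minimal sketch of the corrected directive, assuming a node address of 192.168.1.10 and the Ubuntu redis-server package (use each node's own static address and restart Redis on every node afterwards):
# /etc/redis/redis.conf
bind 192.168.1.10
sudo systemctl restart redis-server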

Force docker-machine to specific IP using Hyper-V, network unreachable

I have found a partial answer to this question, and it successfully sets the machine to the desired IP address. But the network is unreachable from inside a docker-machine created with the Hyper-V driver.
The TL;DR of the answer above is to create a script, /var/lib/boot2docker/bootsync.sh:
sudo cat /var/run/udhcpc.eth0.pid | xargs sudo kill
sudo ifconfig eth0 192.168.XXX.YYY netmask 255.255.255.0 broadcast 192.168.XXX.255 up
Once I make the script, I restart the machine.
After the restart, the IP is set to my desired address (expected). I can remote in at that address, so it is at least reachable through the host. But when I test for connections, there is no connection to the internet (unexpected).
Boot2Docker version 17.05.0-ce, build HEAD : 5ed2840 - Fri May 5 21:04:09 UTC 2017
Docker version 17.05.0-ce, build 89658be
docker#machine:~$ docker pull ubuntu
Using default tag: latest
Error response from daemon: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io on [::1]:53: read udp [::1]:48331->[::1]:53: read: connection refused
docker#machine:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: Network is unreachable
If I remove the script and restart again, I am assigned a new/random IP address (expected), can remote in at that new IP address, and can make network connections (expected):
docker#pm:~$ docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
aafe6b5e13de: Pull complete
0a2b43a72660: Pull complete
18bdd1e546d2: Pull complete
8198342c3e05: Pull complete
f56970a44fd4: Pull complete
Digest: sha256:f3a61450ae43896c4332bda5e78b453f4a93179045f20c8181043b26b5e79028
Status: Downloaded newer image for ubuntu:latest
docker#pm:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=43 time=18.424 ms
64 bytes from 8.8.8.8: seq=1 ttl=43 time=27.638 ms
The accepted answer has several upvotes, but it reads like a confirmed workaround for VirtualBox. I'm not sure what it is about Hyper-V that would cause the IP assignment to cut off internet access.
I had the same problem, and I solved it by adding the following to the end of bootsync.sh:
route add default gw <address>
There was no default route to the gateway or the internet, so it must be set manually.
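So the full /var/lib/boot2docker/bootsync.sh ends up looking something like this (192.168.XXX.1 is a placeholder for your virtual switch's gateway address):
sudo cat /var/run/udhcpc.eth0.pid | xargs sudo kill
sudo ifconfig eth0 192.168.XXX.YYY netmask 255.255.255.0 broadcast 192.168.XXX.255 up
sudo route add default gw 192.168.XXX.1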

RabbitMQ - AWS EC2 Clustering hell

Sorry, I should be shot for even having to ask this, but I've wasted a day on this and feel like I've read everything there is.
I can't create a cluster on my EC2 instances (3) that are spread across three different regions. The hosts:
rabbit@ip-172-31-47-217
rabbit@ip-172-31-1-82
rabbit@ip-172-31-36-111
The initial state before trying to make the cluster:
ubuntu@ip-172-31-47-217:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-172-31-47-217' ...
[{nodes,[{disc,['rabbit@ip-172-31-47-217']}]},
{running_nodes,['rabbit@ip-172-31-47-217']},
{partitions,[]}]
ubuntu@ip-172-31-36-111:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-172-31-36-111' ...
[{nodes,[{disc,['rabbit@ip-172-31-36-111']}]},
{running_nodes,['rabbit@ip-172-31-36-111']},
{partitions,[]}]
ubuntu@ip-172-31-1-82:~$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@ip-172-31-1-82' ...
[{nodes,[{disc,['rabbit@ip-172-31-1-82']}]},
{running_nodes,['rabbit@ip-172-31-1-82']},
{partitions,[]}]
When I try to check status from one server for another:
sudo rabbitmqctl status -n rabbit@ip-172-31-1-82
Status of node 'rabbit@ip-172-31-1-82' ...
Error: unable to connect to node 'rabbit@ip-172-31-1-82': nodedown
nodes in question: ['rabbit@ip-172-31-1-82']
hosts, their running nodes and ports:
- unable to connect to epmd on ip-172-31-1-82: timeout (timed out)
current node details:
- node name: 'rabbitmqctl3835@ip-172-31-36-111'
- home dir: /var/lib/rabbitmq
- cookie hash: 0tsf/OyQZI7zobmv1Ia97w==
All three servers have the same erlang cookie hash.
I can verify the host names are set up properly:
host ip-172-31-36-111
ip-172-31-36-111.us-west-2.compute.internal has address 172.31.36.111
I know the ports are open:
netstat -plten | grep beam
I opened all TCP and UDP ports at this point as a test; no change.
And finally, to see if this would behave differently given those failures:
sudo rabbitmqctl join_cluster --ram rabbit@ip-172-31-1-82
Clustering node 'rabbit@ip-172-31-47-217' with 'rabbit@ip-172-31-1-82' ...
Error: {cannot_discover_cluster,"The nodes provided are either offline or not running"}
Please help; I'm being driven insane by this.
The problem is that they are in different regions (presumably in EC2-classic - you didn't mention whether you were using a VPC). This means they cannot communicate via their private IPs (see e.g. Can EC2 instances in different regions communicate over their private IP addresses?)
ping 172.31.36.111
will fail from one of the other servers, for example. Pinging using the hostname will probably even fail on the DNS lookup.
Your options are:
Put them in separate zones in a single region (in EC2 classic, they will be able to communicate). You could also use a VPC in this case, putting them in separate subnets but allowing interconnections via appropriately set up security groups.
Set up /etc/hosts on each server to point to the relevant public IPs of the other servers (you could attach Elastic IPs to each server to ensure stability across server restarts); a sketch follows this list. You could also set the hostname of each server for clarity. Set up your security groups to allow access on the relevant ports that RabbitMQ uses. There may be security implications of doing this, since the data will be travelling over the public internet.
Set up a VPN between each server in the cluster. Amazon VPC has a VPN facility, but there are ways of setting it up yourself I think.
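For option 2, a minimal sketch of the /etc/hosts entries on each node (the public/Elastic IPs below are placeholders); the security groups would also need to allow 4369 (epmd) and the inter-node distribution port (25672 by default on recent RabbitMQ versions) between the three hosts:
203.0.113.11   ip-172-31-47-217
203.0.113.12   ip-172-31-1-82
203.0.113.13   ip-172-31-36-111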
I think option 1 is the simplest. Option 2 has major security implications (I believe there are ways of securing the connection between the cluster servers, but they aren't documented on the RabbitMQ website as far as I can tell). Option 3 is complex but probably the best option if you need multiple regions.
Note that RabbitMQ clusters aren't meant to be run over wide geographical areas, since they aren't too reliable in the face of network partitions. See here: https://www.rabbitmq.com/clustering.html