Replication issue on basic 3-node Disque cluster - disque

I'm hitting a replication issue on a three-node Disque cluster. It seems odd because the use case is fairly typical, so it's entirely possible I'm doing something wrong.
This is how to reproduce it locally:
# relevant disque info
disque_version:1.0-rc1
disque_git_sha1:0192ba7e
disque_git_dirty:0
disque_build_id:b02910aa5c47590a
Start 3 Disque nodes on ports 9001, 9002 and 9003, then have the servers on ports 9002 and 9003 meet with 9001.
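For example, something like this should start the three local nodes (flags assumed to follow the usual Redis-style server options):
$ disque-server --port 9001 &
$ disque-server --port 9002 &
$ disque-server --port 9003 &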
127.0.0.1:9002> CLUSTER MEET 127.0.0.1 9001 #=> OK
127.0.0.1:9003> CLUSTER MEET 127.0.0.1 9001 #=> OK
The HELLO reports the same data for all three nodes, as expected.
127.0.0.1:9003> hello
1) (integer) 1
2) "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd"
3) 1) "b61c63e8fd0c67544f895f5d045aa832ccb47e08"
   2) "127.0.0.1"
   3) "9001"
   4) "1"
4) 1) "b32eb6501e272a06d4c20a1459260ceba658b5cd"
   2) "127.0.0.1"
   3) "9002"
   4) "1"
5) 1) "e93cbbd17ad12369dd2066a55f9d4c51be9c93dd"
   2) "127.0.0.1"
   3) "9003"
   4) "1"
Enqueuing a job succeeds, but the job does not show up in either QLEN or QPEEK on the other nodes.
127.0.0.1:9001> addjob myqueue body 1 #=> D-b61c63e8-IFA29ufvL37FRVjVVWisbO/x-05a1
127.0.0.1:9001> qlen myqueue #=> 1
127.0.0.1:9002> qlen myqueue #=> 0
127.0.0.1:9002> qpeek myqueue 1 #=> (empty list or set)
127.0.0.1:9003> qlen myqueue #=> 0
127.0.0.1:9003> qpeek myqueue 1 #=> (empty list or set)
When I explicitly set a replication level higher than the number of nodes, Disque fails with NOREPL, as one would expect. An explicit replication level of 2 succeeds, but the jobs are still nowhere to be seen on nodes 9002 and 9003. The same behavior occurs regardless of the node on which I add the job.
My understanding is that replication happens synchronously when calling ADDJOB (unless ASYNC is used explicitly), but it doesn't seem to be working properly. The test suite passes on the master branch, so I'm hitting a wall here and will have to dig into the source code. Any help will be greatly appreciated!

The job is replicated, but it's enqueued on only one node. Try killing the first node to see the job enqueued on a different one.
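For example (using the job ID from the transcript above): SHOW on one of the other nodes should return the job's metadata if that node holds a replica, even though its queue is empty, and once the node listening on 9001 is stopped the job should eventually be re-queued on a surviving node after its RETRY period elapses:
127.0.0.1:9002> show D-b61c63e8-IFA29ufvL37FRVjVVWisbO/x-05a1 #=> job metadata, so a replica exists here
(stop the 9001 node and wait out the retry period)
127.0.0.1:9002> qlen myqueue #=> 1 (the job may be re-queued on 9003 instead)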

Related

High performance Airflow Celery Workers do not work

I have:
2 Airflow and Celery nodes (node 1, node 2)
1 PostgreSQL node (node 3)
2 RabbitMQ nodes (node 4, node 5)
I want to implement failover of Celery workers.
Node 1 - Airflow (webserver, scheduler), Celery (workers_1, flower_1)
Node 2 - Celery (workers_2, flower_2)
I run tasks that write the current timestamp to the database for 5 minutes. After that, I kill worker_1 (using kill ... (pid of the worker), or systemctl stop/restart worker.service). I expect that Celery, having received an error, will re-execute the task on worker_2, but the task is not re-executed.
I tried adding these variables to airflow.cfg and ./airflow-certified:
task_acks_late=True
CELERY_ACKS_LATE=True
CELERY_TASK_REJECT_ON_WORKER_LOST=True
task_reject_on_worker_lost=True
task_track_started=True
But it did not help.
How can I make it so that if my worker dies, the task starts again on another worker?
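A note on where these settings live: task_acks_late, task_reject_on_worker_lost and task_track_started are Celery settings rather than airflow.cfg keys, so one common way to apply them (a sketch, assuming Airflow's celery_config_options mechanism; the module name is illustrative) is to point Airflow at a custom Celery configuration module:
# airflow.cfg (module name illustrative)
[celery]
celery_config_options = my_celery_config.CELERY_CONFIG
The referenced CELERY_CONFIG object would extend Airflow's default Celery configuration with task_acks_late and task_reject_on_worker_lost set to True.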

Redis cluster cannot add nodes

There are two Redis servers, and I have three Redis instances running on each server.
When I executed cluster meet [ip] [port] to add the cluster nodes, I found I could only add the nodes running on the same server. Every time I run this command it echoes "OK", but when I use cluster nodes to check the node list, it always shows something like this.
172.18.0.155:7010> cluster meet 172.18.0.156 7020
OK
172.18.0.155:7010> cluster nodes
ad829d8b297c79f644f48609f17985c5586b4941 127.0.0.1:7010#17010 myself,master - 0 1540538312000 1 connected
87a8017cfb498e47b6b48f0ad69fc066c466a9c2 172.18.0.156:7020#17020 handshake - 1540538308677 0 0 disconnected
fdf5879554741759aab14eba701dc185b605ac16 127.0.0.1:7012#17012 master - 0 1540538313000 0 connected
ec7b3ecba7a175ddb81f254821243dd469a7f961 127.0.0.1:7011#17011 master - 0 1540538314288 2 connected
You can see the node's status is disconnected, and it disappears from the list if you check again about 5 seconds later.
Has anybody met this problem before? I have no idea how to solve it. Please help me. Thanks a lot.
I have solved the problem. I had made a mistake in the bind configuration. Once I bound each instance to the one IP it uses to communicate with the other nodes, the cluster nodes could be added normally.
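For illustration (addresses taken from the transcript above), the relevant redis.conf change on each instance looks something like this:
# redis.conf on the 172.18.0.155 instances: bind the address the other server can reach
bind 172.18.0.155
# and bind 172.18.0.156 on the 172.18.0.156 instances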

Aerospike: count all objects in a namespace?

So I've been going through Aerospike's documentation, and I still fail to understand how to get the number of master (non-replicated) records in a namespace via the Java API (not AQL). Any help?
I tried this with a server running on localhost, port 3000. You can explore the Info class further and use better constructs. I have 10,000 objects in a namespace called 'test'.
// Info is com.aerospike.client.Info; namespace stats come back as ';'-separated key=value pairs
String output = Info.request("127.0.0.1", 3000, "namespace/test");
String[] mp = output.split(";");
console.printf(mp[0]); // first pair is the object count, e.g. objects=10000
I got the following output on my console:
objects=10000
I repeated this with a three-node cluster: 10,000 master objects, replication factor of 2. I created the cluster by running 3 Aerospike processes on the same VM on separate ports: 3000, 4000 and 5000.
Java code (not parsing the string):
console.printf("\nNode at Port 3000\n");
console.printf(Info.request("127.0.0.1", 3000, "namespace/test"));
console.printf("\nNode at Port 4000\n");
console.printf(Info.request("127.0.0.1", 4000, "namespace/test"));
console.printf("\nNode at Port 5000\n");
console.printf(Info.request("127.0.0.1", 5000, "namespace/test"));
Relevant Output:
Node at Port 3000
objects=6537;sub_objects=0;master_objects=3286;master_sub_objects=0;prole_objects=3251;prole_sub_objects=0;...
Node at Port 4000
objects=6674;sub_objects=0;master_objects=3294;master_sub_objects=0;prole_objects=3380;prole_sub_objects=0;...
Node at Port 5000
objects=6789;sub_objects=0;master_objects=3420;master_sub_objects=0;prole_objects=3369;prole_sub_objects=0;...
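Since each node only reports its own statistics, a sketch of getting the cluster-wide master count from the Java client (error handling omitted; uses com.aerospike.client.AerospikeClient, com.aerospike.client.Info and com.aerospike.client.cluster.Node) is to sum master_objects over all nodes:
// Sum master_objects for namespace 'test' across every node in the cluster
AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
long totalMasterObjects = 0;
for (Node node : client.getNodes()) {
    // per-node stats come back as ';'-separated key=value pairs
    String stats = Info.request(node, "namespace/test");
    for (String pair : stats.split(";")) {
        if (pair.startsWith("master_objects=")) {
            totalMasterObjects += Long.parseLong(pair.substring("master_objects=".length()));
        }
    }
}
client.close();
System.out.printf("master_objects total: %d%n", totalMasterObjects);
With the per-node numbers above this prints 10000 (3286 + 3294 + 3420).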
Cross check with asadm
asadm>info
3-node cluster, relevant output:

How to scale down a CrateDB cluster?

For testing, I wanted to shrink my 3-node cluster to 2 nodes, and later do the same for my 5-node cluster.
However, after following the best practice for shrinking a cluster:
1. Back up all tables.
2. For all tables: alter table xyz set (number_of_replicas=2) if it was less than 2 before.
3. SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;
   3a. If the data check should always be green, set min_availability to 'full': https://crate.io/docs/reference/configuration.html#graceful-stop
4. Initiate a graceful stop on one node.
5. Wait for the data check to turn green.
6. Repeat from 3.
7. When done, persist the node configuration in crate.yml:
   gateway.recover_after_nodes: n
   discovery.zen.minimum_master_nodes: (n/2) + 1
   gateway.expected_nodes: n
My cluster never went back to "green" again, and I also have a critical node check failing.
What went wrong here?
crate.yml:
...
################################## Discovery ##################################
# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.
# Set to ensure a node sees M other master eligible nodes to be considered
# operational within the cluster. Its recommended to set it to a higher value
# than 1 when running more than 2 nodes in the cluster.
#
# We highly recommend to set the minimum master nodes as follows:
# minimum_master_nodes: (N / 2) + 1 where N is the cluster size
# That will ensure a full recovery of the cluster state.
#
discovery.zen.minimum_master_nodes: 2
# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
# discovery.zen.ping.timeout: 3s
#
# Time a node is waiting for responses from other nodes to a published
# cluster state.
#
# discovery.zen.publish_timeout: 30s
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
# For example, Amazon Web Services doesn't support multicast discovery.
# Therefore, you need to specify the instances you want to connect to a
# cluster as described in the following steps:
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
# If you want to debug the discovery process, you can set a logger in
# 'config/logging.yml' to help you doing so.
#
################################### Gateway ###################################
# The gateway persists cluster meta data on disk every time the meta data
# changes. This data is stored persistently across full cluster restarts
# and recovered after nodes are started again.
# Defines the number of nodes that need to be started before any cluster
# state recovery will start.
#
gateway.recover_after_nodes: 3
# Defines the time to wait before starting the recovery once the number
# of nodes defined in gateway.recover_after_nodes are started.
#
#gateway.recover_after_time: 5m
# Defines how many nodes should be waited for until the cluster state is
# recovered immediately. The value should be equal to the number of nodes
# in the cluster.
#
gateway.expected_nodes: 3
So there are two things that are important:
The number of replicas is essentially the number of nodes you can lose in a typical setup (2 is recommended so that you can scale down AND lose a node in the process and still be OK)
The procedure is recommended for clusters > 2 nodes ;)
CrateDB will automatically distribute the shards across the cluster in a way that no replica and primary share a node. If that is not possible (which is the case if you have 2 nodes and 1 primary with 2 replicas), the data check will never return to 'green'. So in your case, set the number of replicas to 1 in order to get the cluster back to green (alter table mytable set (number_of_replicas = 1)).
The critical node check is due to the cluster not having received an updated crate.yml yet: your file still has the configuration of a 3-node cluster in it, hence the message. Since CrateDB only loads expected_nodes at startup (it's not a runtime setting), a restart of the whole cluster is required to conclude scaling down. It can be done with a rolling restart, but be sure to run SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>; properly, otherwise the consensus will not work...
Also, it's recommended to scale down one node at a time in order to avoid overloading the cluster with rebalancing and accidentally losing data.
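Concretely, for the 2-node case above the fix boils down to (table name illustrative):
alter table mytable set (number_of_replicas = 1);
set global persistent discovery.zen.minimum_master_nodes = 2;  -- (2 / 2) + 1
followed by updating gateway.expected_nodes, gateway.recover_after_nodes and discovery.zen.minimum_master_nodes in crate.yml and restarting the nodes one by one.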

Redis Sentinel : last node doesn't become master

I'm trying to set up an automatic failover system in a 3-node Redis cluster. I installed redis-sentinel on each of these nodes (just like this guy: http://www.symantec.com/connect/blogs/configuring-redis-high-availability).
Everything is fine as long as I have two or three nodes. The problem is that whenever there's only one node remaining and it's a slave, it does not get elected as master automatically. The quorum is set to 1, so the last node detects the ODOWN of the master but can't vote for the failover since there's no majority.
To overcome this (surprising) issue, I wrote a little script that asks the other nodes for their masters, and if they don't answer it sets the current node as the master. This script is called from the redis-sentinel.conf file as a notification script. However, as soon as the redis-sentinel service is started, this configuration is "erased"! If I look at the configuration file in /etc, the "sentinel notification-script" line has disappeared (redis-sentinel rewrites its configuration file, so why not), BUT the configuration I wrote is no longer available:
1) 1) "name"
2) "mymaster"
3) "ip"
4) "x.x.x.x"
5) "port"
6) "6379"
7) "runid"
8) "somerunid"
9) "flags"
10) "master"
11) "pending-commands"
12) "0"
13) "last-ping-sent"
14) "0"
15) "last-ok-ping-reply"
16) "395"
17) "last-ping-reply"
18) "395"
19) "down-after-milliseconds"
20) "30000"
21) "info-refresh"
22) "674"
23) "role-reported"
24) "master"
25) "role-reported-time"
26) "171302"
27) "config-epoch"
28) "0"
29) "num-slaves"
30) "1"
31) "num-other-sentinels"
32) "1"
33) "quorum"
34) "1"
35) "failover-timeout"
36) "180000"
37) "parallel-syncs"
38) "1"
That is the output of the SENTINEL masters command. The only thing is that I had previously set "down-after-milliseconds" to 5000 and "failover-timeout" to 10000 ...
I don't know if anyone has run into anything similar? Well, should someone have an idea about what's happening, I'd be glad to hear it ;)
This is a reason not to place your Sentinels on your Redis instance nodes. Think of them as monitoring agents. You wouldn't place your website monitor on the same node that runs your website and expect to catch the node's death. The same applies to Sentinel.
The proper route to sentinel monitoring is to ideally run them from the clients, and if that isn't possible or workable, then from dedicated nodes as close to the clients as possible.
As antirez said, you need to have enough Sentinels to hold the election. There are two elections: 1) deciding on the new master and 2) deciding which Sentinel handles the promotion. In your scenario you only have one Sentinel, but to elect a Sentinel to handle the promotion, your Sentinel needs votes from a quorum of Sentinels. This number is a majority of all Sentinels seen. In your case it needs two Sentinels to vote before an election can take place. This quorum number is not configurable and is unaffected by the quorum setting. It is in place to reduce the chances of multiple masters.
I would also strongly advise against setting the quorum to less than half + 1 of your Sentinels. This can lead to split-brain operation where you have two masters, or in your case you could even have three. If you lost connectivity between your master and the two slaves but clients still had connectivity, your settings could trigger split-brain: a slave would be promoted, and new connections would talk to that master while existing ones continued talking to the original. Thus you would have valid data in two masters which likely conflict with each other.
The author of that Symantec article only considers the Redis daemon dying, not the node. Thus it really isn't an HA setup.
The quorum is only used to reach the ODOWN state, which triggers the failover. For the failover to actually happen, the slave must be voted in by a majority, so a single node can't get elected. If you have such a requirement, and you don't care about only the majority side being able to continue in your cluster (this means unbounded data loss on the minority side if clients get partitioned with a minority where there is a master), you can just add Sentinels on your client machines as well. This way the total number of Sentinels is, for example, 5, and even if two Redis nodes are down, the one remaining node plus the two Sentinels running client-side are enough to reach a majority of 3. Unfortunately the Sentinel documentation is not complete enough to explain this stuff. All the information needed to get the picture right is there, but there are no examples for faster reading / deploying.
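As a sketch of that layout (master address and name illustrative, timings taken from the question above): with three Redis nodes plus two client-side Sentinels, each sentinel.conf monitors the same master with a quorum that is still a majority of the five Sentinels:
sentinel monitor mymaster x.x.x.x 6379 3
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 10000
That way losing any two nodes still leaves three Sentinels able to authorize a failover.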