I am writing a script to monitor Redis replication latency in a group of Redis slaves managed by Sentinel. I am looking at the output of the INFO replication command, which looks like this:
# Replication
role:master
connected_slaves:5
slave0:ip=x.x.x.x,port=6379,state=online,offset=22246539656,lag=0
slave1:ip=y.y.y.y,port=6379,state=online,offset=22246538633,lag=1
slave2:ip=z.z.z.z,port=6379,state=online,offset=22247193804,lag=0
slave3:ip=n.n.n.n,port=6379,state=online,offset=22246538633,lag=1
slave4:ip=m.m.m.m,port=6379,state=online,offset=22244239193,lag=1
master_repl_offset:22246539199
repl_backlog_active:1
repl_backlog_size:536870912
repl_backlog_first_byte_offset:21709668288
repl_backlog_histlen:536870912
I had thought that the offset for each slave was a measure of how much data had been replicated so far, so I could look at the difference between the master_repl_offset and the offset values for the various slaves to determine the amount of data not yet replicated. However, in the above output, the offsets for slave0 and slave2 are both higher than for the master. Have I misunderstood what these numbers mean?
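For reference, the comparison described above can be sketched in Python. This is only a minimal sketch that parses the text format shown in the question; the function names `parse_replication_info` and `offset_deltas` are hypothetical, and the sample uses two of the slave lines from the output above. It computes the difference but does not explain why a slave offset can exceed the master's.

```python
def parse_replication_info(text):
    """Parse `INFO replication` text into the master offset and per-slave fields."""
    master_offset = None
    slaves = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "# Replication" header, and anything without a key.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and key[5:].isdigit():
            # e.g. "ip=x.x.x.x,port=6379,state=online,offset=...,lag=0"
            slaves[key] = dict(item.split("=", 1) for item in value.split(","))
    return master_offset, slaves


def offset_deltas(text):
    """Return {slave_name: master_repl_offset - slave offset}, in bytes."""
    master_offset, slaves = parse_replication_info(text)
    return {name: master_offset - int(f["offset"]) for name, f in slaves.items()}


SAMPLE = """\
# Replication
role:master
connected_slaves:2
slave0:ip=x.x.x.x,port=6379,state=online,offset=22246539656,lag=0
slave1:ip=y.y.y.y,port=6379,state=online,offset=22246538633,lag=1
master_repl_offset:22246539199
"""

for name, delta in offset_deltas(SAMPLE).items():
    print(name, delta)
```

With the sample values, slave0 yields a negative difference, which is exactly the situation the question asks about.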
Apologies in advance if this sounds silly. I am new to Splunk and took a Udemy course, but I can't figure this out.
If I check the indexes.conf file on my cluster master, I see repFactor = 0:
#
# By default none of the indexes are replicated.
#
repFactor = 0
But if I check https://:8089/services/cluster/config
I see a replication factor:
replication_factor 2
So I am confused about whether my data is being replicated.
I have two indexes in a cluster.
I believe replication_factor determines how many replicas to keep among the nodes in the cluster, and repFactor determines whether or not to replicate a particular index.
For repFactor, which is an index specific setting
The indexes.conf repFactor attribute
When you add a new index stanza, you must set the repFactor attribute to "auto". This causes the index's data to be replicated to other peers in the cluster.
Note: By default, repFactor is set to 0, which means that the index will not be replicated. For clustered indexes, you must set it to "auto".
The only valid values for repFactor are 0 and "auto".
For replication_factor, which is a cluster setting:
Replication factor and cluster resiliency
The cluster can tolerate a failure of (replication factor - 1) peer nodes. For example, to ensure that your system can tolerate a failure of two peers, you must configure a replication factor of 3, which means that the cluster stores three identical copies of each bucket on separate nodes. With a replication factor of 3, you can be certain that all your data will be available if no more than two peer nodes in the cluster fail. With two nodes down, you still have one complete copy of data available on the remaining peers.
By increasing the replication factor, you can tolerate more peer node failures. With a replication factor of 2, you can tolerate just one node failure; with a replication factor of 3, you can tolerate two concurrent failures; and so on.
The repFactor setting lets you choose which indexes are replicated. By default, none are. The replication_factor setting says how many copies of a replicated bucket to make. Both must be non-zero to replicate data.
The Cluster Manager should confirm that. Select Settings->Indexer Clustering to see which indexes are replicated and their state.
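The relationship between the two settings can be written down as a tiny Python sketch. It only encodes the rules quoted above (repFactor is 0 or "auto" per index, replication_factor is a cluster-wide count); the function names are hypothetical and nothing here talks to Splunk.

```python
def is_index_replicated(rep_factor, replication_factor):
    """True if an index is replicated under the rules quoted above.

    rep_factor: per-index indexes.conf value, either 0 or "auto".
    replication_factor: cluster-wide number of copies of each bucket.
    """
    return rep_factor == "auto" and replication_factor > 0


def tolerated_peer_failures(replication_factor):
    """The cluster can tolerate (replication factor - 1) peer failures."""
    return max(replication_factor - 1, 0)


# The situation in the question: repFactor = 0 with replication_factor 2.
print(is_index_replicated(0, 2))
print(is_index_replicated("auto", 2))
print(tolerated_peer_failures(2))
```

Under these rules, an index left at the default repFactor = 0 is not replicated even though the cluster's replication_factor is 2.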
We have many datacenters, but datacenter1 is the main one.
The master in datacenter1 is monitored by Sentinel, so if the master goes down, one of the replicas becomes master, and all data is synced continuously.
We want one Redis replica in each datacenter that replicates all data from datacenter1 but cannot become master. (The other datacenters should always get their data from datacenter1; only the replica in datacenter1 should be able to become master, and the other replicas must not.)
Is there a Redis config for this, or any other idea?
Redis Multi Datacenter
Redis config [1] has a replica-priority parameter which should serve your purpose.
The replica priority is an integer number published by Redis in the INFO
output. It is used by Redis Sentinel in order to select a replica to promote
into a master if the master is no longer working correctly.
A replica with a low priority number is considered better for promotion, so
for instance if there are three replicas with priority 10, 100, 25 Sentinel
will pick the one with priority 10, that is the lowest.
However a special priority of 0 marks the replica as not able to perform the
role of master, so a replica with priority of 0 will never be selected by
Redis Sentinel for promotion.
By default the priority is 100.
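The idea is to set a lower replica-priority value on the replicas in datacenter1 and a higher value on the replicas in the other datacenters, so that Sentinel prefers datacenter1. If the replicas in the other datacenters must never be promoted at all, set their replica-priority to 0.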
[1] redis.conf file of Redis version 6.2.6: https://github.com/redis/redis/blob/6.2.6/redis.conf
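As an illustration only, here is a small Python sketch that picks the replica-priority line for each datacenter, relying on the documented behaviour quoted above that a priority of 0 is never selected for promotion. The helper names and the datacenter name "datacenter2" are hypothetical; the output is just the line you would place in each replica's redis.conf.

```python
def replica_priority(datacenter, promotable_dc="datacenter1", default=100):
    """Return the replica-priority for a replica in the given datacenter.

    Replicas outside the promotable datacenter get 0, which Sentinel
    never selects for promotion; the others keep the default of 100.
    """
    return default if datacenter == promotable_dc else 0


def conf_line(datacenter):
    """Render the redis.conf line for a replica in the given datacenter."""
    return f"replica-priority {replica_priority(datacenter)}"


for dc in ["datacenter1", "datacenter2"]:
    print(dc, "->", conf_line(dc))
```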
I'm trying to set up Redis Sentinel.
I know that when a master goes down, Sentinel picks one of its slaves and promotes it to master.
I was wondering which attributes the selection is based on, and which slave gets selected as the new master.
After the Sentinel leader election, the leader Sentinel performs the following steps:
Remove slaves that are already in down status from the slave list.
Remove slaves whose disconnection time is more than (ten times down-after-milliseconds) + the time the master has been down.
Select slave(s) by replica-priority (configured on the slave), preferring the lowest value.
If multiple slaves are selected, sort them by replication offset and select the most in-sync slave (maximum offset).
If there are still multiple candidates, sort by run ID and select the smaller one.
So master selection considers the following, in order:
Disconnection time
Priority
Replication offset
Run Id
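The ordering above can be sketched as a sort key in Python. This is a simplified model of the steps as described in this answer, not Sentinel's actual code; the `Replica` class, its field names, and `pick_replica` are hypothetical. It also assumes, per the redis.conf documentation, that a priority of 0 excludes a replica entirely.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    run_id: str
    priority: int
    offset: int
    down: bool = False
    disconnected_ms: int = 0


def pick_replica(replicas, down_after_ms, master_down_ms):
    """Pick a promotion candidate following the order described above."""
    # Disconnection limit: ten times down-after-milliseconds plus master down time.
    max_disconnect = down_after_ms * 10 + master_down_ms
    candidates = [
        r for r in replicas
        if not r.down
        and r.disconnected_ms <= max_disconnect
        and r.priority != 0  # priority 0 is never promoted
    ]
    if not candidates:
        return None
    # Lowest priority first, then highest offset, then smallest run ID.
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.run_id))


replicas = [
    Replica("c", priority=100, offset=500),
    Replica("b", priority=100, offset=900),
    Replica("a", priority=100, offset=900),
    Replica("z", priority=0, offset=9999),
    Replica("d", priority=10, offset=100, down=True),
]
chosen = pick_replica(replicas, down_after_ms=30000, master_down_ms=5000)
print(chosen.run_id)
```

In this example the down replica and the priority-0 replica are filtered out, the remaining three tie on priority, two tie on offset, and the run ID breaks the tie.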
I want to understand the behavior of Aerospike in its different consistency modes.
Consider an Aerospike cluster running with 3 nodes and replication factor 3.
AP mode is simple; the documentation says:
Aerospike will allow reads and writes in every sub-cluster.
And the maximum number of nodes that can go down is < 3 (the replication factor).
For Aerospike strong consistency, it says:
Note that the only successful writes are those made on replication-factor number of nodes. Every other write is unsuccessful
Does this really mean that no writes are allowed if available nodes < replication factor?
And then the same document says:
All writes are committed to every replica before the system returns success to the client. In case one of the replica writes fails, the master will ensure that the write is completed to the appropriate number of replicas within the cluster (or sub cluster in case the system has been compromised.)
What does "appropriate number of replicas" mean?
So if I lose one node from my 3-node cluster with strong consistency and replication factor 3, will I not be able to write data?
For aerospike strong consistency it says
Note that the only successful writes are those made on
replication-factor number of nodes. Every other write is unsuccessful
Does this really mean that no writes are allowed if available nodes <
replication factor?
Yes, if there are fewer than replication-factor nodes then it is impossible to meet the user specified replication-factor.
All writes are committed to every replica before the system returns
success to the client. In case one of the replica writes fails, the
master will ensure that the write is completed to the appropriate
number of replicas within the cluster (or sub cluster in case the
system has been compromised.)
What does "appropriate number of replicas" mean?
It means replication-factor nodes must receive the write. When a node fails, a new node can be promoted to replica status until either the node returns or an operator registers a new roster (cluster membership list).
So if I lose one node from my 3-node cluster with strong consistency
and replication factor 3, will I not be able to write data?
Yes, so having all nodes as replicas wouldn't be a very useful configuration. Replication-factor 3 allows up to 2 nodes to be down, but only if the remaining nodes are able to satisfy the replication-factor. So for replication-factor 3 you would probably want to run with a minimum of 5 nodes.
You are correct, with 3 nodes and RF 3, losing one node means the cluster will not be able to successfully take write transactions since it wouldn't be able to write the required number of copies (3 in this case).
Appropriate number of replicas means a number of replicas that would match the replication factor configured.
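The rule stated in these answers reduces to a one-line check, sketched below in Python. This is a deliberately simplified model of what the answers say (writes need replication-factor nodes available); it ignores the roster and per-partition details of a real cluster, and the function name is hypothetical.

```python
def can_accept_writes(available_nodes, replication_factor):
    """Per the answers above: writes need replication-factor nodes available."""
    return available_nodes >= replication_factor


# 3-node cluster, RF 3, one node lost.
print(can_accept_writes(available_nodes=2, replication_factor=3))
# 5-node cluster, RF 3, two nodes lost.
print(can_accept_writes(available_nodes=3, replication_factor=3))
```

This is why the answer suggests at least 5 nodes for replication-factor 3: two nodes can be down and three still remain to hold the copies.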
I'm using Apache Cassandra 2.1.1 and when using nodetool status the Load for one of my nodes is about half the size of the other two while the Owns is almost equal on all the nodes. I am somewhat new to Cassandra and don't know if I should be worried about this or not. I have tried using repair and cleanup after restarting all the nodes, but it still appears unbalanced. I am using GossipingPropertyFileSnitch with each node configured dc=DC1 and rack=RAC1 specified in cassandra-rackdc.properties. I am also using Murmur3Partitioner with NetworkTopologyStrategy where my keyspace is defined as
CREATE KEYSPACE awl WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '2'} AND durable_writes = true;
I believe the problem to be with the awl keyspace since the size of the data/awl folder is the same size as reported by nodetool status. My output for nodetool status is below. Any help would be much appreciated.
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 10.1.1.152 3.56 GB 256 68.4% d42945cc-59eb-41de-9872-1fa252762797 RAC1
UN 10.1.1.153 6.8 GB 256 67.2% 065c471d-5025-4bf1-854d-52d579f2a6d3 RAC1
UN 10.1.1.154 6.31 GB 256 64.4% 46f05522-29cc-491c-ab65-334b205fc415 RAC1
I would suspect this is due to the distribution of the key values that are being inserted. They are probably not well distributed across the possible key values, so many of them are hashing to one node. Since you are using replication factor 2, the second replica is the next node in the ring, resulting in two nodes with more data than the third node.
You didn't show your table schema, so I don't know what you are using for the partition and clustering keys. You want to use key values that have a high cardinality and good distribution to avoid hot spots where a lot of inserts are hashing to one node. With a better distribution you will get better performance and more even space usage across the nodes.
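To illustrate the effect, here is a toy Python model of the placement described above. It is a sketch under loud assumptions: it uses MD5 modulo the node count as a stand-in for Murmur3 token ranges, gives each node a single position instead of 256 vnodes, and places the second replica on the next node. The node addresses come from the question; the function names and key values are hypothetical.

```python
import hashlib

NODES = ["10.1.1.152", "10.1.1.153", "10.1.1.154"]


def replicas_for(key, rf=2):
    """Return the nodes holding a partition key: a primary plus the next node(s)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    primary = int.from_bytes(digest[:8], "big") % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(rf)]


def load_per_node(keys, rf=2):
    """Count how many rows each node stores for the given partition keys."""
    load = {node: 0 for node in NODES}
    for key in keys:
        for node in replicas_for(key, rf):
            load[node] += 1
    return load


# Low cardinality: every row shares one partition key, so only two nodes hold data.
print(load_per_node(["hot"] * 1000))
# High cardinality: rows spread across many keys and the load evens out.
print(load_per_node([f"key-{i}" for i in range(3000)]))
```

With a single hot key, two nodes carry all the rows and the third carries none, which mirrors the two-heavy, one-light pattern in the nodetool output.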