MaxScale, ranks and priority - Galera

I'm using a MaxScale (6.2) readwritesplit router with 3 Galera servers (MariaDB 10.4). Two of them are in DC1, and the third is in a distant DC2.
Using ranks is the only option I see here, as priority will be used by galeramon to select the master.
My goal is to tell MaxScale to use DC1 as much as possible, so server1 as master (priority=1, rank=primary), server2 as slave (priority=2, rank=primary), and to use the DC2 server3 only if neither server1 nor server2 is reachable (priority=3, rank=secondary).
Is this the correct behavior?
[server1]
type=server
address=192.168.0.11
priority=1
rank=primary
[server2]
type=server
address=192.168.0.12
priority=2
rank=primary
[server3]
type=server
address=192.168.0.21
priority=3
rank=secondary
[split]
type=service
router=readwritesplit
servers=server1,server2,server3
[monitor]
type=monitor
module=galeramon
servers=server1,server2,server3
root_node_as_master=true
use_priority=true

Yes, the readwritesplit module only uses connections that have the same rank. This means that if both server1 and server2 fail, the readwritesplit service will use server3 as long as it is up.
When either of the two other servers comes back up, the galeramon monitor will shift the Master label from server3 to one of the other nodes. At that point readwritesplit will discard the connection to server3 and reconnect to the nodes with the higher rank, if possible.
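If you want to double-check what the monitor and router see, MaxScale's CLI can list the servers with their current state, and the configured rank shows up in the per-server parameters (a quick sanity check, assuming maxctrl is available on the MaxScale host):

maxctrl list servers
maxctrl show server server3

The first command shows each server's state (Master / Slave, Synced, Running); the second one includes the rank=secondary parameter for the DC2 node.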

Related

Is it possible for only the Redis master instances to handle all reads and writes, and for the slaves to be used only for failover?

My work system consists of Spring web applications and uses Redis as a transaction counter that conditionally blocks transaction requests.
The transaction is as follows:
1. Check whether or not the data exists. (HGET)
2. If it doesn't, save a new one with count 0 and set an expiration time. (HSET, EXPIRE)
3. Increase the count value. (INCRBY)
4. If the increased count value reaches a specific configured limit, set the transaction to 'blocked'. (HSET)
The limit value is my company's business policy.
These read and write operations are requested one right after another.
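For reference, a rough redis-cli sketch of that flow (the key and field names are made up; if the counter lives in a hash, the increment command would be HINCRBY rather than INCRBY):

HGET tx:12345 count              # 1. check whether the data exists
HSET tx:12345 count 0            # 2. if not, save a new entry with count 0 ...
EXPIRE tx:12345 60               #    ... and set an expiration time (60s is arbitrary)
HINCRBY tx:12345 count 1         # 3. increase the count value
HSET tx:12345 status blocked     # 4. only if the count reached the configured limit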
Currently, I use one Redis instance on one machine (only a master, no replicas).
I want Redis HA, so I need slave instances, but at the same time I want all reads and writes to go only to the master instances because of slave replication latency.
After some research, I found that it is a good idea to use a proxy server for Redis HA. However, with a proxy, it seems impossible to have only the master instances receive requests and keep the slaves only for failover.
Is this possible?
Thanks in advance.
What you need is Redis Sentinel.
With Redis Sentinel, you can get the master address from Sentinel and read/write with the master. If the master goes down, Sentinel will perform a failover and elect a new master; you can then get the address of the new master from Sentinel.
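For illustration, asking Sentinel for the current master might look like this (26379 is the default Sentinel port, and mymaster is just an example monitored-master name):

redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster

It returns the IP and port of the current master; after a failover, the same call returns the newly elected master.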
As you're going to use Lettuce as the Redis cluster driver, you should set the read preference to master and things should work fine. A sample configuration might look like this:
// Read only from master nodes, never from replicas
LettuceClientConfiguration lettuceClientConfiguration =
        LettuceClientConfiguration.builder().readFrom(ReadFrom.MASTER).build();

// Seed nodes of the cluster
RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration();
List<RedisNode> redisNodes = new ArrayList<>();
redisNodes.add(new RedisNode("127.0.0.1", 9000));
redisNodes.add(new RedisNode("127.0.0.1", 9001));
redisNodes.add(new RedisNode("127.0.0.1", 9002));
redisNodes.add(new RedisNode("127.0.0.1", 9003));
redisNodes.add(new RedisNode("127.0.0.1", 9004));
redisNodes.add(new RedisNode("127.0.0.1", 9005));
redisClusterConfiguration.setClusterNodes(redisNodes);

LettuceConnectionFactory lettuceConnectionFactory =
        new LettuceConnectionFactory(redisClusterConfiguration, lettuceClientConfiguration);
lettuceConnectionFactory.afterPropertiesSet();
See in action at Redis Cluster Configuration
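For completeness, a minimal (hypothetical) way to use the factory from Spring Data Redis might be:

StringRedisTemplate template = new StringRedisTemplate(lettuceConnectionFactory);
template.afterPropertiesSet();
template.opsForValue().set("tx:12345", "0");   // writes always go to the master; with ReadFrom.MASTER, reads do too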

How does PerformanceCounter count current connections to an IIS website?

We can use C# code or Performance Monitor on Windows Server to view the current connections to an IIS website.
PerformanceCounter performanceCounter = new System.Diagnostics.PerformanceCounter();
performanceCounter.CategoryName = "Web Service";
performanceCounter.CounterName = "Current Connections";
performanceCounter.InstanceName = "SMS_Collection_CFC";
string data = string.Format("{0}\t{1} = {2}", performanceCounter.CategoryName,
performanceCounter.CounterName, performanceCounter.NextValue());
This returns the number of connections.
Is this counting TCP connections under the hood? We know there are many TCP connection states, such as ESTABLISHED and TIME_WAIT; which state is the performance counter counting?
Since nobody has answered this post, I'll post my findings.
On the server, I invoked the code from the original post, and it returned 574.
string data = string.Format("{0}\t{1} = {2}", performanceCounter.CategoryName,
performanceCounter.CounterName, performanceCounter.NextValue());
Then I ran the netstat command. The website occupies port 9010.
netstat -an | find /i "9010"
It returns 550 established TCP connections, so I guess the counter is monitoring established TCP connections.
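To count only the ESTABLISHED connections on that port directly (assuming the same port 9010), something like this should work in cmd:

netstat -an | find "9010" | find /c "ESTABLISHED"

find /c prints the number of matching lines, so the output is a single count rather than the full list.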

Redis cluster cannot add nodes

There are two Redis servers, and I have run three Redis instances on each server.
When I executed cluster meet [ip] [port] to add the cluster nodes, I found I could only add the nodes running on the same server. Every time I run this command, it always echoes "OK". But when I use cluster nodes to check the node list, it always shows this:
172.18.0.155:7010> cluster meet 172.18.0.156 7020
OK
172.18.0.155:7010> cluster nodes
ad829d8b297c79f644f48609f17985c5586b4941 127.0.0.1:7010#17010 myself,master - 0 1540538312000 1 connected
87a8017cfb498e47b6b48f0ad69fc066c466a9c2 172.18.0.156:7020#17020 handshake - 1540538308677 0 0 disconnected
fdf5879554741759aab14eba701dc185b605ac16 127.0.0.1:7012#17012 master - 0 1540538313000 0 connected
ec7b3ecba7a175ddb81f254821243dd469a7f961 127.0.0.1:7011#17011 master - 0 1540538314288 2 connected
You can see the node's status is disconnected, and it disappears from the list if you check again about 5 seconds later.
Has anybody met this problem before? I have no idea how to solve it. Please help me. Thanks a lot.
I have solved the problem. I found I had made a mistake in the bind configuration. Once I bound only the IP that the node uses to communicate with the other nodes, the cluster nodes could be added normally.
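For reference, the relevant redis.conf line would be along these lines on each node (the address is that node's own LAN IP, taken from the question; adjust per node):

bind 172.18.0.155

The local nodes appear as 127.0.0.1 in the cluster nodes output above, so they were announcing the loopback address instead of the LAN address, which lines up with the cross-server handshake never completing.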

How to scale down a CrateDB cluster?

For testing, I wanted to shrink my 3-node cluster to 2 nodes, and later do the same for my 5-node cluster.
However, after following the best practice of shrinking a cluster:
1. Back up all tables
2. For all tables: alter table xyz set (number_of_replicas=2) if it was less than 2 before
3. SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>;
3a. If the data check should always be green, set the min_availability to 'full':
https://crate.io/docs/reference/configuration.html#graceful-stop
4. Initiate graceful stop on one node
5. Wait for the data check to turn green
6. Repeat from 3.
7. When done, persist the node configurations in crate.yml:
gateway.recover_after_nodes: n
discovery.zen.minimum_master_nodes: (n/2) + 1
gateway.expected_nodes: n
My cluster never went back to "green" again, and I also have a critical node check failing.
What went wrong here?
crate.yml:
...
################################## Discovery ##################################
# Discovery infrastructure ensures nodes can be found within a cluster
# and master node is elected. Multicast discovery is the default.
# Set to ensure a node sees M other master eligible nodes to be considered
# operational within the cluster. Its recommended to set it to a higher value
# than 1 when running more than 2 nodes in the cluster.
#
# We highly recommend to set the minimum master nodes as follows:
# minimum_master_nodes: (N / 2) + 1 where N is the cluster size
# That will ensure a full recovery of the cluster state.
#
discovery.zen.minimum_master_nodes: 2
# Set the time to wait for ping responses from other nodes when discovering.
# Set this option to a higher value on a slow or congested network
# to minimize discovery failures:
#
# discovery.zen.ping.timeout: 3s
#
# Time a node is waiting for responses from other nodes to a published
# cluster state.
#
# discovery.zen.publish_timeout: 30s
# Unicast discovery allows to explicitly control which nodes will be used
# to discover the cluster. It can be used when multicast is not present,
# or to restrict the cluster communication-wise.
# For example, Amazon Web Services doesn't support multicast discovery.
# Therefore, you need to specify the instances you want to connect to a
# cluster as described in the following steps:
#
# 1. Disable multicast discovery (enabled by default):
#
discovery.zen.ping.multicast.enabled: false
#
# 2. Configure an initial list of master nodes in the cluster
# to perform discovery when new nodes (master or data) are started:
#
# If you want to debug the discovery process, you can set a logger in
# 'config/logging.yml' to help you doing so.
#
################################### Gateway ###################################
# The gateway persists cluster meta data on disk every time the meta data
# changes. This data is stored persistently across full cluster restarts
# and recovered after nodes are started again.
# Defines the number of nodes that need to be started before any cluster
# state recovery will start.
#
gateway.recover_after_nodes: 3
# Defines the time to wait before starting the recovery once the number
# of nodes defined in gateway.recover_after_nodes are started.
#
#gateway.recover_after_time: 5m
# Defines how many nodes should be waited for until the cluster state is
# recovered immediately. The value should be equal to the number of nodes
# in the cluster.
#
gateway.expected_nodes: 3
So there are two things that are important:
The number of replicas is essentially the number of nodes you can lose in a typical setup (2 is recommended so that you can scale down AND lose a node in the process and still be ok)
The procedure is recommended for clusters > 2 nodes ;)
CrateDB will automatically distribute the shards across the cluster in a way that no replica and primary share a node. If that is not possible (which is the case if you have 2 nodes and 1 primary with 2 replicas), the data check will never return to 'green'. So in your case, set the number of replicas to 1 in order to get the cluster back to green (alter table mytable set (number_of_replicas = 1)).
The critical node check is due to the cluster not having received an updated crate.yml yet: your file still has the configuration of a 3-node cluster in it, hence the message. Since CrateDB only loads expected_nodes at startup (it's not a runtime setting), a restart of the whole cluster is required to conclude scaling down. It can be done with a rolling restart, but be sure to run SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = <half of the cluster + 1>; properly, otherwise the consensus will not work...
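For the 2-node cluster in the question, the two statements would be roughly (the table name is just an example):

alter table mytable set (number_of_replicas = 1);
SET GLOBAL PERSISTENT discovery.zen.minimum_master_nodes = 2;   -- (2 / 2) + 1 = 2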
Also, it's recommended to scale down one node at a time in order to avoid overloading the cluster with rebalancing and accidentally losing data.

Redis Cluster fail-over issue. Slave to master promotion takes more than 2 seconds. How to solve it?

I have a Redis cluster with multiple masters and 1 slave per master.
I am using Jedis. The Redis version is 3.0.7.
Now, when one of the masters goes down, Jedis tries to make the slave the master, and it takes more than 2 seconds, which is a huge amount of time in my scenario.
Following are the two different exceptions and their stack traces:
JedisClusterException & JedisClusterMaxRedirectionsException.
JedisConnectionException!! - exc: redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down
redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down
at redis.clients.jedis.Protocol.processError(Protocol.java:115)
at redis.clients.jedis.Protocol.process(Protocol.java:151)
at redis.clients.jedis.Protocol.read(Protocol.java:205)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:297)
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:222)
at redis.clients.jedis.Jedis.incr(Jedis.java:548)
at redis.clients.jedis.JedisCluster$27.execute(JedisCluster.java:319)
at redis.clients.jedis.JedisCluster$27.execute(JedisCluster.java:316)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:119)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:30)
at redis.clients.jedis.JedisCluster.incr(JedisCluster.java:321)
at com.demo.redisCluster.App.main(App.java:37)
.................................................
JedisClusterMaxRedirectionsException!! - exc: redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections?
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections?
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:97)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:131)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:152)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:131)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:152)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:131)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:30)
at redis.clients.jedis.JedisCluster.incr(JedisCluster.java:321)
at com.demo.redisCluster.App.main(App.java:37)