It is possible for only the Redis master instances to handle all reads and writes, and for the slaves to be used only for failover? - redis

My work system consists of Spring web applications and it uses Redis as a transaction counter and it conditionally blocks transaction requests.
The transaction is as follows:
Check whether or not data exists. (HGET)
If it doesn't, saves new one with count 0 and set expiration time. (HSET, EXPIRE)
Increases a count value. (INCRBY)
If the increased count value reaches a specific configured limit, it sets the transaction to 'blocked' (HSET)
The limit value is my company's business policy.
Such read and write operations are requested one after another, immediately.
Currently, I use one Redis instance at one machine. (only master, no replications.)
I want to get Redis HA, so I need slave insntaces but at the same time, I want to have all reads and writes to Redis only to master insntaces because of slave data relication latency.
After some research, I found that it is a good idea to have a proxy server to use Redis HA. However, with proxy, it seems impossible to use only the master instances to receive requests and the slaves only for failover.
Is it possible??
Thanks in advance.

What you need is Redis Sentinel.
With Redis Sentinel, you can get the master address from sentinel, and read/write with master. If master is down, Redis Sentinel will do failover, and elect a new master. Then you can get the address of the new master from sentinel.

As you're going to use Lettuce for Redis cluster driver, you should set read preference to Master and things should be working fine a sample code might look like this.
LettuceClientConfiguration lettuceClientConfiguration =
LettuceClientConfiguration.builder().readFrom(ReadFrom.MASTER).build();
RedisClusterConfiguration redisClusterConfiguration = new RedisClusterConfiguration();
List<RedisNode> redisNodes = new ArrayList<>();
redisNodes.add(new RedisNode("127.0.0.1", 9000));
redisNodes.add(new RedisNode("127.0.0.1", 9001));
redisNodes.add(new RedisNode("127.0.0.1", 9002));
redisNodes.add(new RedisNode("127.0.0.1", 9003));
redisNodes.add(new RedisNode("127.0.0.1", 9004));
redisNodes.add(new RedisNode("127.0.0.1", 9005));
redisClusterConfiguration.setClusterNodes(redisNodes);
LettuceConnectionFactory lettuceConnectionFactory =
new LettuceConnectionFactory(redisClusterConfiguration, lettuceClientConfiguration);
lettuceConnectionFactory.afterPropertiesSet();
See in action at Redis Cluster Configuration

Related

Why don't sentinels subscribe channel "__sentinel__:hello" in sentinelReconnectInstance

When looking into Redis's source code, I find when sentinelRedisInstance is SRI_SENTINEL, sentinelReconnectInstance will not initialize its link->pc and will not subscribe channel "__sentinel__:hello", as the following code shows.
void sentinelReconnectInstance(sentinelRedisInstance *ri) {
...
if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
...
retval = redisAsyncCommand(link->pc,
sentinelReceiveHelloMessages, ri, "%s %s",
sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
SENTINEL_HELLO_CHANNEL);
...
As a result, I think sentinels can't get any message from channel "__sentinel__:hello".
However, in redis's doc, it says
Every Sentinel is subscribed to the Pub/Sub channel sentinel:hello of every master and replica, looking for unknown sentinels. When new sentinels are detected, they are added as sentinels of this master.
I think it means that all sentinels actually subscribe to channel "__sentinel__:hello", but I can't see any corresponding implementation in redis's source code.
Correct me if I'm wrong.
sentinelRedisInstance of type SRI_MASTER connects to master node that this sentinel is monitoring, sentinelRedisInstance of type SRI_SLAVE connects to slave node, and sentinelRedisInstance of type SRI_SENTINEL connects to other sentinels.
There's no need to subscribe to sentinel's channel, instead, sentinels only need to subscribe to channels of master and slave nodes. If there's a new sentinel, it will publish hello message to master and slave's channel. So that other sentinels will discover them.

Apache Ignite - Distibuted Queue and Executors

I am planning to use Apache Ignite Distributed Queue.
I am using Ignite with a spring boot application. So, on bootup, I will be adding 20 names in a queue. But, since there are 3 servers in a cluster, the same 20 names gets added 3 times. But, i want to add them only once in the queue.
Ignite ignite = Ignition.ignite();
IgniteQueue<String> queue = ignite.queue(
"queueName", // Queue name.
0, // Queue capacity. 0 for unbounded queue.
null // Collection configuration.
);
Distributed executors, will be able to poll from the queue and run the task. Here, the executor is expected to poll, run the task and then add the same name to the queue. Trying to achieve round robin here.
Only one executor should be running the same task at any point of time, though there are multiple servers in a cluster.
Any suggestion for this.
You can launch ignite cluster singleton service https://apacheignite.readme.io/docs/cluster-singletons which will fill data to queue. Also you can adding data from coordinator node (oldest node in cluster) ignite.cluster().forOldest().node().isLocal()
I fixed bootup time duplicate cache loading issue this way:
final IgniteAtomicLong cacheLoadCnt = ignite.atomicLong(cacheName + "Cnt", 0, true);
if (cacheLoadCnt.get() == 0) {
loadCache();
cacheLoadCnt.addAndGet(1);
}

Gemfire WAN with Peer to Peer combined

We are using the multi-site WAN configuration. We have two clusters across geographical distances in North America and Europe.
Context: Cluster 1 has two members A and B that are both gateway senders. Cluster B has two members C and D that are both gateway receivers. When member A in cluster 1 starts, it reads data from database and loads it into the gemfire cache which gets sent to the cluster 2. Everything so far is good.
Problem: If both members in Cluster 2 are restarted at the same time, they lose all the gemfire regions/data. At that point, we could restart member A in cluster 1, it again loads data from the DB and gets pushed to cluster B. But we would prefer to avoid the restart of member A and without persisting to hard disk.
Is there a solution where if cluster 2 is restarted, it can request a full copy of data from cluster 1?
Not sure if it's possible, but could we somehow setup peer to peer for the gateway receivers in cluster 2 (on top of WAN), so they would be updated automatically upon restart.
Thanks
Getting a full copy of data over WAN is not supported at this time. What you could do instead is run a function on all members of site A, that simply iterates over all data and puts it back again in the region. i.e something like:
public void execute(FunctionContext context) {
RegionFunctionContext ctx = (RegionFunctionContext)context;
Region localData = PartitionRegionHelper.getLocalDataForContext(ctx);
for (Object key : localData.keySet()) {
Object val = localData.get(key);
localData.put(key, val);
}
}

Redis Cluster fail-over issue. Slave to master promotion takes more than 2 seconds. How to solve it?

I have a Redis cluster with multiple masters and 1 slave per master.
I am using jedis. Redis version is : 3.0.7.
Now when one of the master goes down Jedis is trying to make slave as master and it takes more than 2 seconds. Which is a huge time in the my scenarios.
*Following the two different Exceptions and the Stack :
JedisClusterException & JedisClusterMaxRedirectionsException.
JedisConnectionException!! - exc: redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down
redis.clients.jedis.exceptions.JedisClusterException: CLUSTERDOWN The cluster is down
at redis.clients.jedis.Protocol.processError(Protocol.java:115)
at redis.clients.jedis.Protocol.process(Protocol.java:151)
at redis.clients.jedis.Protocol.read(Protocol.java:205)
at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:297)
at redis.clients.jedis.Connection.getIntegerReply(Connection.java:222)
at redis.clients.jedis.Jedis.incr(Jedis.java:548)
at redis.clients.jedis.JedisCluster$27.execute(JedisCluster.java:319)
at redis.clients.jedis.JedisCluster$27.execute(JedisCluster.java:316)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:119)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:30)
at redis.clients.jedis.JedisCluster.incr(JedisCluster.java:321)
at com.demo.redisCluster.App.main(App.java:37)
.................................................
JedisClusterMaxRedirectionsException!! - exc: redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections?
redis.clients.jedis.exceptions.JedisClusterMaxRedirectionsException: Too many Cluster redirections?
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:97)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:131)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:152)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:131)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:152)
at redis.clients.jedis.JedisClusterCommand.runWithRetries(JedisClusterCommand.java:131)
at redis.clients.jedis.JedisClusterCommand.run(JedisClusterCommand.java:30)
at redis.clients.jedis.JedisCluster.incr(JedisCluster.java:321)
at com.demo.redisCluster.App.main(App.java:37)*

Jedis getResource() is taking lot of time

I am trying to use sentinal redis to get/set keys from redis. I was trying to stress test my setup with about 2000 concurrent requests.
i used sentinel to put a single key on redis and then I executed 1000 concurrent get requests from redis.
But the underlying jedis used my sentinel is blocking call on getResource() (pool size is 500) and the overall average response time that I am achieving is around 500 ms, but my target was about 10 ms.
I am attaching sample of jvisualvm snapshot here
redis.clients.jedis.JedisSentinelPool.getResource() 98.02227 4.0845232601E7 ms 4779
redis.clients.jedis.BinaryJedis.get() 1.6894469 703981.381 ms 141
org.apache.catalina.core.ApplicationFilterChain.doFilter() 0.12820946 53424.035 ms 6875
org.springframework.core.serializer.support.DeserializingConverter.convert() 0.046286926 19287.457 ms 4
redis.clients.jedis.JedisSentinelPool.returnResource() 0.04444578 18520.263 ms 4
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept() 0.035538 14808.45 ms 11430
May anyone help to debug further into the issue?
From JedisSentinelPool implementation of getResource() from Jedis sources (2.6.2):
#Override
public Jedis getResource() {
while (true) {
Jedis jedis = super.getResource();
jedis.setDataSource(this);
// get a reference because it can change concurrently
final HostAndPort master = currentHostMaster;
final HostAndPort connection = new HostAndPort(jedis.getClient().getHost(), jedis.getClient()
.getPort());
if (master.equals(connection)) {
// connected to the correct master
return jedis;
} else {
returnBrokenResource(jedis);
}
}
}
Note the while(true) and the returnBrokenResource(jedis), it means that it tries to get a jedis resource randomly from the pool that is indeed connected to the correct master and retries if it is not the good one. It is a dirty check and also a blocking call.
The super.getResource() call refers to JedisPool traditionnal implementation that is actually based on Apache Commons Pool (2.0). It does a lot to get an object from the pool, and I think it even repairs fail connections for instance. With a lot of contention on your pool, as probably in your stress test, it can probably take a lot of time to get a resource from the pool, just to see it is not connected to the correct master, so you end up calling it again, adding contention, slowing getting the resource etc...
You should check all the jedis instances in your pool to see if there's a lot of 'bad' connections.
Maybe you should give up using a common pool for your stress test (only create Jedis instances manually connected to the correct node, and close them nicely), or setting multiple ones to mitigate the cost of looking to "dirty" unchecked jedis resources.
Also with a pool of 500 jedis instances, you can't emulate 1000 concurrent queries, you need at least 1000.