Failover and client timeout - Redis

I am using ServiceStack 5.0.2 with Redis Sentinel (3 + 3) and having issues in case of a failover: commands issued during or after a failover fail with a timeout.
I have come up with the idea of implementing a retry pattern via a custom IRedisClient, but there is probably a better strategy to employ in this case.
The answer given in the post "How does ServiceStack PooledRedisClientManager failover work?" does not seem to be the right way to go.
Thank you,

Redis clients wrap a TCP connection to a Redis server. A Redis client that was connected to the instance that failed over will fail, but any new Redis clients retrieved from the pool after the failover will be connected to the newly failed-over instance.
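As a minimal sketch of that pattern (the sentinel addresses and the "mymaster" group name are placeholder assumptions), you can let ServiceStack's RedisSentinel resolve the current master and hand out pooled clients, resolving and disposing a client per operation so that calls made after a failover pick up a connection to the newly promoted master:

    using System;
    using ServiceStack.Redis;

    class RedisFailoverExample
    {
        static void Main()
        {
            // Placeholder sentinel endpoints and the master group name from sentinel.conf.
            var sentinelHosts = new[] { "10.0.0.1:26379", "10.0.0.2:26379", "10.0.0.3:26379" };
            var sentinel = new RedisSentinel(sentinelHosts, "mymaster");

            // Start() asks Sentinel for the current master/replicas and returns a pooled
            // IRedisClientsManager that is refreshed when Sentinel reports a failover.
            IRedisClientsManager manager = sentinel.Start();

            // Resolve a client per operation and dispose it, rather than holding one long-lived client.
            using (var client = manager.GetClient())
            {
                client.SetValue("key", "value");
                Console.WriteLine(client.GetValue("key"));
            }
        }
    }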

Related

Is StackExchange.Redis a "smart client" when using Redis Cluster?

A smart client for Redis cluster will "take persistent connections to many nodes, will cache hashslot -> node info, and will update the table when they receive a -MOVED error".
I checked numerous documents but can't find a definitive answer on whether StackExchange.Redis is a smart client. Can anyone advise? Thanks.
I am using the StackExchange.Redis client in my web application to connect to a Redis cluster with 6 Redis server instances. The StackExchange.Redis client works perfectly with Redis Cluster, and we did not get any -MOVED errors.
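For reference, a minimal connection sketch (node names and ports are placeholders): the multiplexer discovers the cluster topology from the seed endpoints, keeps a slot map, and follows -MOVED redirects internally, which is why application code normally never sees them:

    using System;
    using StackExchange.Redis;

    class ClusterClientExample
    {
        static void Main()
        {
            // Placeholder seed nodes; the client discovers the rest of the cluster from these.
            var options = new ConfigurationOptions
            {
                EndPoints = { "node1:6379", "node2:6379", "node3:6379" },
                AbortOnConnectFail = false  // keep retrying the initial connection
            };

            using (var muxer = ConnectionMultiplexer.Connect(options))
            {
                IDatabase db = muxer.GetDatabase();

                // The key's hash slot is mapped to the owning master; -MOVED replies
                // caused by slot migration are handled transparently by the library.
                db.StringSet("greeting", "hello");
                Console.WriteLine(db.StringGet("greeting"));
            }
        }
    }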

Can the clients of a RabbitMQ cluster reconnect to another Node if the one they are connected to fails?

I feel like I am missing something very fundamental here.
I can bring up a RabbitMQ cluster with three nodes (rabbit1, rabbit2 and rabbit3) without an issue. Then when I start writing my microservices it seems like each client connects to only one rabbit instance. So let's say I have all my services connect to rabbit1.
If rabbit1 then goes down will my entire infrastructure blow up? Do the services have a way of switching to another rabbit node? It seems like they cannot, in which case, what is the point of having a cluster?
In case someone else runs into this and has trouble (like myself) finding this in the documentation, RabbitMQ does not manage client connection auto-recovery. From the docs:
Some client libraries provide a mechanism for automatic recovery from
network connection failures... Other clients may consider network
failure recovery to be a responsibility of the application.
So first check if your library offers auto-recovery; if not, you'll have to implement it yourself.
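As an illustration of the first option, the .NET RabbitMQ client supports automatic recovery and accepts a list of hostnames to try when (re)connecting; the node names below are placeholders for the three cluster nodes, and the Java client exposes equivalent settings:

    using System;
    using System.Collections.Generic;
    using System.Text;
    using RabbitMQ.Client;

    class RabbitFailoverExample
    {
        static void Main()
        {
            var factory = new ConnectionFactory
            {
                // Re-establish the connection and topology after a network or node failure.
                AutomaticRecoveryEnabled = true,
                NetworkRecoveryInterval = TimeSpan.FromSeconds(5)
            };

            // Placeholder node names; the client connects to one that is reachable,
            // and recovery retries across this list if the current node goes down.
            var hostnames = new List<string> { "rabbit1", "rabbit2", "rabbit3" };

            using (IConnection connection = factory.CreateConnection(hostnames))
            using (IModel channel = connection.CreateModel())
            {
                channel.QueueDeclare("orders", durable: true, exclusive: false, autoDelete: false);
                channel.BasicPublish(exchange: "", routingKey: "orders",
                                     basicProperties: null,
                                     body: Encoding.UTF8.GetBytes("hello"));
            }
        }
    }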

Client's interaction with Redis Cluster

I've started exploring Redis Cluster and its C client (hiredis). I've been unable to find much info about the client's interaction with the Redis cluster. I've got some queries in this regard:
Does the client make a connection with all the nodes of the cluster (masters and slaves) in the beginning?
Is there any coordinator node which proxies the client's request to the correct node?
If not, does the client periodically get the info about the hash-slot holdings of each node in the cluster (in order to send its request to the correct node)?
Which client-cluster connection specific parameters are configurable?
Does the client make a connection with all the nodes?
Yes, the client maintains a connection with at least all the masters.
Is there a coordinator node which proxies the client's request to the correct node?
No, there isn't. By design, Redis Cluster does not have a proxy.
(Aside: There is some talk of developing a proxy solution for redis - but I don't expect it to be released any time soon.)
Does the client periodically get info about hash slot bindings?
When a client starts up, it builds up a cache of hash-slot mappings. Then, at runtime, if a slot is migrated to another master, redis cluster will return a specific error that will tell the client the new owner for that slot. The client is then expected to cache the new owner, and retry the request against the new node.
As a result of this design, clients usually have a very good cache of every slot and its owner, and there is very little overhead.
Which client connection parameters are configurable?
The most important parameter is the list of server nodes to connect to the cluster. You don't have to specify all the nodes - the client can auto-discover all the masters. As long as even one node is active, the client will discover all the other nodes.
Apart from that, you have connection timeout parameters and parameters to control TLS.
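To make the slot mapping concrete, here is a small self-contained sketch of how a client computes the hash slot it caches per key (hash tags like {user} are ignored here for simplicity; the slot-to-node bookkeeping around it is up to the client library):

    using System;
    using System.Text;

    class HashSlotExample
    {
        // CRC16-CCITT (XModem), the checksum Redis Cluster uses for hash slots.
        static ushort Crc16(byte[] data)
        {
            ushort crc = 0;
            foreach (byte b in data)
            {
                crc ^= (ushort)(b << 8);
                for (int i = 0; i < 8; i++)
                    crc = (crc & 0x8000) != 0 ? (ushort)((crc << 1) ^ 0x1021) : (ushort)(crc << 1);
            }
            return crc;
        }

        // Each key maps deterministically to one of 16384 slots; the client's cached
        // slot -> node table tells it which master should receive the command.
        static int HashSlot(string key) => Crc16(Encoding.UTF8.GetBytes(key)) % 16384;

        static void Main()
        {
            Console.WriteLine(HashSlot("user:1000"));  // prints a slot in [0, 16383]
        }
    }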

Redis Cluster configuration for CacheManager.NET

I have a basic question about Redis connection parameters from the CacheManager.NET perspective. In a case where we have a Redis cluster with a master and 2 slaves, and a quorum of Sentinel processes, should we provide the IP:PORT combinations pointing to the Sentinel processes, or to the actual Redis server processes?
As suggested in https://seanmcgary.com/posts/how-to-build-a-fault-tolerant-redis-cluster-with-sentinel, it is advisable to ask the Sentinel process about the actual master before making the connection. That probably goes in line with Jedis, which provides JedisSentinelPool to do the initial lookup.
Essentially what we want is load balancing on reads (via CacheManager.NET), while writes should go to the current master node of the cluster.
CacheManager relies on StackExchange.Redis for the Redis implementation. Therefore, whatever this client library supports, CacheManager does, too.
Unfortunately, Sentinel support is not implemented; there have been issues on GitHub about that for years.
That being said, I did some testing with a multi master/slave + Sentinel setup. I added all the non-Sentinel nodes as endpoints to the multiplexer configuration, and it kind of works because the Redis client knows how to handle multiple master/slave instances.
In the process of switching to another master, the client might throw exceptions that it cannot write to a read-only slave and such. CacheManager might retry those calls, and after a short amount of time, when the leader election is done, the call should go through.
But this is not 100% stable and I would not put that in production, as "official" support is still missing...
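For reference, that test configuration looked roughly like the sketch below (host names, ports, and retry values are placeholder assumptions; the Sentinel endpoints are deliberately not listed):

    using System;
    using CacheManager.Core;

    class CacheManagerRedisExample
    {
        static void Main()
        {
            // Placeholder hosts: list the actual Redis master/slave nodes, not the sentinels.
            var cache = CacheFactory.Build<string>(settings => settings
                .WithRedisConfiguration("redis", config => config
                    .WithEndpoint("host1", 6379)
                    .WithEndpoint("host2", 6379)
                    .WithEndpoint("host3", 6379))
                .WithMaxRetries(50)        // retry calls that fail while a failover is in progress
                .WithRetryTimeout(100)     // milliseconds to wait between retries
                .WithRedisCacheHandle("redis"));

            cache.Put("key", "value");
            Console.WriteLine(cache.Get("key"));
        }
    }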
As an alternative to running with Sentinels, you could run Redis in Cluster mode, which should just work, or behind a proxy which deals with all that master/slave stuff.
Twemproxy is one alternative.
I still have to add support for Twemproxy to CacheManager, as many features are simply not available, like Lua scripting, getting a list of servers, or flush commands...
This will come in 1.0.2.
Hope that helps.

Apache ActiveMQ Failover Protocol

Why can only Java provide support for the failover protocol in ActiveMQ, and not other languages?
My doubt is this: with a failover URL like failover://(tcp://host1:61616,tcp://host2:61616)?randomize=false, the client still uses one of the inner URLs such as tcp://host1:61616, so how does the broker come to know whether the call was using the failover protocol or not, and how does the broker decide that it needs to replicate the message?
Please understand that the failover protocol is meant for reconnect logic on the client side only; the AMQ broker isn't even aware whether a client is using the failover protocol or not.
From the official AMQ documentation:
The Failover transport layers reconnect logic on top of any of the
other transports.
The Failover configuration syntax allows you to specify any number of
composite uris. The Failover transport randomly chooses one of the
composite URI and attempts to establish a connection to it. If it does
not succeed or if it subsequently fails, a new connection is
established to one of the other uris in the list.
Not sure what you mean by replication here, but as per the official doc:
The Failover transport tracks transactions by default. The inflight
transactions are replayed on reconnection.
There are different scenarios to put up a HA solution with ActiveMQ.
If clients connect to host1 and host2 using the failover protocol, then the broker setup needs to be set up for HA as well.
One solution is to cluster host1 and host2 in an active-active setup. Then messages are always propagated when they are asked for - the queues are shared across the entire cluster among all AMQ brokers.
Otherwise, if the active-active solution is not preferred, a master-slave solution can be set up where the two brokers, host1 and host2, share the data area (for instance using a database for persistence or a shared SAN disk).
There are more combinations of setups, but the failover protocol assumes that the entire solution can handle messages arriving at different brokers if one goes down. As far as I know, there is no other magic in the failover protocol from the broker's perspective.
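To illustrate the client-side nature of the failover transport, here is a minimal sketch using the .NET NMS client (the Java ActiveMQConnectionFactory accepts the same URI string); the host names are placeholders, and the broker only ever sees a plain tcp:// connection from whichever inner URI the client happened to pick:

    using Apache.NMS;
    using Apache.NMS.ActiveMQ;

    class FailoverClientExample
    {
        static void Main()
        {
            // The failover wrapper lives entirely in the client: it picks one inner URI
            // and reconnects to another if that connection dies. Placeholder host names.
            var factory = new ConnectionFactory(
                "failover:(tcp://host1:61616,tcp://host2:61616)?randomize=false");

            using (IConnection connection = factory.CreateConnection())
            using (ISession session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge))
            {
                connection.Start();
                IQueue queue = session.GetQueue("orders");
                using (IMessageProducer producer = session.CreateProducer(queue))
                {
                    producer.Send(session.CreateTextMessage("hello"));
                }
            }
        }
    }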