I've started exploring Redis Cluster and its C client (hiredis). I've been unable to find much info about the client's interaction with the Redis cluster. I have some questions in this regard:
Does the client make a connection with all the nodes of the cluster (masters and slaves) at the beginning?
Is there any coordinator node which proxies the client's requests to the correct node?
If not, does the client periodically get info about the hash-slot holdings of each node in the cluster (in order to send its requests to the correct node)?
Which client-cluster connection-specific parameters are configurable?
Does the client make a connection with all the nodes?
Yes, the client maintains a connection with all the masters at least.
Is there a coordinator node which proxies the client's request to the correct node?
No, there isn't. By design, Redis Cluster does not have a proxy.
(Aside: there is some talk of developing a proxy solution for Redis, but I don't expect it to be released any time soon.)
Does the client periodically get info about hash slot bindings?
When a client starts up, it builds a cache of hash-slot mappings. Then, at runtime, if a slot is migrated to another master, Redis Cluster returns a specific error (MOVED) that tells the client the new owner of that slot. The client is then expected to cache the new owner and retry the request against the new node.
As a result of this design, clients usually have a very good cache of every slot and its owner, and there is very little overhead.
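The question asks about hiredis, but to keep all examples in one language, here is a rough Java sketch of that redirect flow using the Jedis client. A cluster-aware client (JedisCluster, Lettuce, or hiredis-cluster for C) normally does this internally; the node address and key below are placeholders:

    import redis.clients.jedis.HostAndPort;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.exceptions.JedisMovedDataException;

    public class MovedRedirectDemo {
        public static void main(String[] args) {
            // Start by asking any known node; it may or may not own the key's slot.
            HostAndPort target = new HostAndPort("127.0.0.1", 7000);
            for (int attempt = 0; attempt < 5; attempt++) {
                try (Jedis node = new Jedis(target.getHost(), target.getPort())) {
                    System.out.println("value = " + node.get("user:42"));
                    break;                           // success: this node owned the slot
                } catch (JedisMovedDataException moved) {
                    // MOVED names the node that now owns the slot: cache it and retry there.
                    target = moved.getTargetNode();
                }
            }
        }
    }

A real client keeps a full slot-to-node map, so after the first redirect subsequent requests for that slot go straight to the right node.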
Which client connection parameters are configurable?
The most important parameter is the list of seed nodes used to connect to the cluster. You don't have to specify all the nodes; the client can auto-discover all the masters. As long as even one of the listed nodes is reachable, the client will discover all the other nodes.
Apart from that, you have connection timeout parameters and parameters to control TLS.
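As an illustration (not hiredis itself), a cluster connection with seed nodes, a timeout, and TLS enabled might be configured like this with the Lettuce Java client; the addresses are placeholders and the exact builder methods may vary by client version:

    import io.lettuce.core.RedisURI;
    import io.lettuce.core.cluster.RedisClusterClient;
    import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

    import java.time.Duration;
    import java.util.Arrays;

    public class ClusterClientConfigDemo {
        public static void main(String[] args) {
            // Seed nodes: the full topology is discovered from whichever node answers first.
            RedisURI seed1 = RedisURI.Builder.redis("10.0.0.1", 7000)
                    .withSsl(true)                       // TLS toward the cluster
                    .withTimeout(Duration.ofSeconds(2))  // connection/command timeout
                    .build();
            RedisURI seed2 = RedisURI.Builder.redis("10.0.0.2", 7000)
                    .withSsl(true)
                    .withTimeout(Duration.ofSeconds(2))
                    .build();

            RedisClusterClient client = RedisClusterClient.create(Arrays.asList(seed1, seed2));
            try (StatefulRedisClusterConnection<String, String> conn = client.connect()) {
                conn.sync().set("greeting", "hello");
                System.out.println(conn.sync().get("greeting"));
            } finally {
                client.shutdown();
            }
        }
    }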
I want to set up a RabbitMQ cluster behind a load balancer and connect to it using Spring AMQP. Questions:
Does the Spring client need to know the address of each node in the RMQ cluster, or is it sufficient for it to know just the address of the load balancer?
If the Spring client is only aware of the load balancer, how will it maintain connections/a connection factory for each node in the cluster?
Is there any code sample that shows how to make the Spring client work with a load balancer?
It only needs the load balancer; however, Spring AMQP maintains a long-lived shared connection, so a load balancer generally doesn't bring much value unless you have multiple applications.
With a single application (with one connection factory), you will only be connected to one broker.
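A minimal sketch of that single-connection setup, assuming the load balancer is reachable at rabbit-lb.example.com and an exchange named myExchange already exists (both names are placeholders):

    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
    import org.springframework.amqp.rabbit.core.RabbitTemplate;

    public class LbConnectionDemo {
        public static void main(String[] args) {
            // The client only needs the load balancer's address; the LB picks the actual broker.
            CachingConnectionFactory cf = new CachingConnectionFactory("rabbit-lb.example.com", 5672);
            cf.setUsername("guest");
            cf.setPassword("guest");

            RabbitTemplate template = new RabbitTemplate(cf);
            // All components sharing this factory reuse the single long-lived connection.
            template.convertAndSend("myExchange", "my.routing.key", "hello");
            cf.destroy();
        }
    }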
Clarification
See the documentation.
Starting with version 1.3, the CachingConnectionFactory can be configured to cache connections as well as just channels. In this case, each call to createConnection() creates a new connection (or retrieves an idle one from the cache). Closing a connection returns it to the cache (if the cache size has not been reached). Channels created on such connections are cached too. The use of separate connections might be useful in some environments, such as consuming from an HA cluster, in conjunction with a load balancer, to connect to different cluster members. Set the cacheMode to CacheMode.CONNECTION.
By default all components (listener containers, RabbitTemplates) share a single connection to the broker.
Starting with version 2.0.2, RabbitTemplate has a property usePublisherConnection; if this is set to true, publishers will use a connection separate from the one used by the listener containers. This is generally recommended, to avoid a blocked publisher connection preventing consumers from receiving messages.
As shown in the quote, the use of a single (or two) connections is controlled by the connection factory's cache mode.
Setting the cache mode to CONNECTION means that each component (listener container consumer, RabbitTemplate) gets its own connection. In practice there will only be one or two publisher connections, because publish operations are generally short-lived and the connection is cached for reuse. You might get one or two more publisher connections if concurrent publish operations are performed.
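A hedged sketch of the settings described in the quote (the host name is a placeholder):

    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
    import org.springframework.amqp.rabbit.connection.CachingConnectionFactory.CacheMode;
    import org.springframework.amqp.rabbit.core.RabbitTemplate;

    public class ConnectionCacheDemo {
        public static void main(String[] args) {
            CachingConnectionFactory cf = new CachingConnectionFactory("rabbit-lb.example.com", 5672);
            cf.setCacheMode(CacheMode.CONNECTION);   // cache connections, not just channels
            cf.setConnectionCacheSize(5);            // allow several cached connections

            RabbitTemplate template = new RabbitTemplate(cf);
            // Publishers use a connection separate from the listener containers (2.0.2+).
            template.setUsePublisherConnection(true);

            template.convertAndSend("myExchange", "my.routing.key", "hello");
            cf.destroy();
        }
    }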
We are trying to prevent our application startups from just spinning if we cannot reach the remote cluster. From what I've read, Force Server Mode states:
In this case, discovery will happen as if all the nodes in topology
were server nodes.
What I want to know is:
Does this client then permanently act as a server, which would run computations and store cache data?
If a connection to the cluster does not happen at first, could a later connection to an established cluster cause issues with consistency? What would be the expected behavior with a topology version mismatch? Is there potential for a split-brain scenario?
No, it's still a client node, but it behaves as a server at the discovery protocol level. For example, it can start without any server nodes running.
A client node can never cause data inconsistency, as it never stores the data. This does not depend on the forceServerMode flag.
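Assuming this question is about Apache Ignite (the forceServerMode flag and client/server node terminology suggest it), a client node forced into server-mode discovery might be configured roughly like this; the discovery addresses are placeholders:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    import java.util.Arrays;

    public class ForceServerModeClient {
        public static void main(String[] args) {
            TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
            ipFinder.setAddresses(Arrays.asList("10.0.0.1:47500..47509"));  // remote cluster

            TcpDiscoverySpi discovery = new TcpDiscoverySpi();
            discovery.setIpFinder(ipFinder);
            discovery.setForceServerMode(true);   // joins discovery as if it were a server node

            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setClientMode(true);              // still a client node: it does not store cache data
            cfg.setDiscoverySpi(discovery);

            // With forceServerMode the client can start even if no server node is reachable yet.
            Ignite ignite = Ignition.start(cfg);
        }
    }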
I have a cluster of RabbitMQ servers. But the machines where the servers are hosted differ in terms of installed software. They have no overlapping capabilities; they are specialized workers. For example, only one node has email software installed.
I know that a queue is bound to the node on which it is created. My question is: how can I set up my queues so I can send certain messages to the specially endowed node, where my special software is waiting for work, bypassing RabbitMQ's round-robin message distribution?
Maybe that is not the right approach; I am open to any working solution.
You could always connect to the IP address of the specific node in the cluster, rather than connecting to some kind of a load balancer that is in front of the cluster - so specify a different IP in the client's method for opening the connection. This of course defeats the purpose of the cluster, but it seems to me that your setup does the same thing :)
By RabbitMQ server, do you mean an actual RabbitMQ server or clients/workers?
If I understand correctly, you can create a single exchange of type "topic". For each worker, create an exclusive queue and bind it to the exchange with some unique routing key, which in your case will be the feature of the host. When submitting a message to the exchange, use the feature as the routing key. The message will be routed to the appropriate host.
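A rough sketch of that layout with the RabbitMQ Java client; the exchange name, routing key, and host are made up for illustration:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    import java.nio.charset.StandardCharsets;

    public class FeatureRoutingDemo {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("rabbit.example.com");

            try (Connection conn = factory.newConnection();
                 Channel channel = conn.createChannel()) {

                channel.exchangeDeclare("work", "topic", true);

                // On the email-capable host: an exclusive, server-named queue bound
                // with that host's feature as the routing key.
                String queue = channel.queueDeclare().getQueue();
                channel.queueBind(queue, "work", "feature.email");

                // Producer side: publish with the required feature as the routing key,
                // so the message lands only on the host that declared that binding.
                channel.basicPublish("work", "feature.email", null,
                        "send welcome mail to user 42".getBytes(StandardCharsets.UTF_8));
            }
        }
    }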
I have the following scenario which I want to fulfill:
RabbitMQ must be load balanced (is this something provided by RabbitMQ out of the box, or would something like an HAProxy load balancer work better? Which option balances the load well?)
Can HAProxy directly push messages to RabbitMQ? (Let's say a POST request coming to http://localhost:3333/redirectToRabbit gets redirected to RabbitMQ, and optionally either the ACK or the response goes back to the client. Also note that HAProxy would load balance the request.)
With HA, what is the best configuration (an exchange with a durable queue, a durable queue alone, or something else)? NOTE: how would messages get redirected to another RabbitMQ instance if one of the RabbitMQ instances goes down (persisted and automatically redirected to an available RabbitMQ node)?
Assume you set up a two-node RabbitMQ cluster. Before talking about HAProxy, you need to understand HA policies and the behavior of HA queues first. Different HA options can cause completely different behaviors of RabbitMQ message replication and node failover. RabbitMQ is very flexible, so don't expect a single golden configuration that meets all scenarios.
Then, since you have two nodes which can accept connections, your client can either use a load balancer (such as HAProxy) or use a client driver which supports connecting to multiple nodes of a cluster. Either way will work.
When using HAProxy, you have one load balancer IP. The client connects only to this load balancer IP, and the load balancer forwards the connection to one of the underlying nodes. Once a connection is created, the client connection instance keeps talking to that one node. When one of the nodes is down, if no "health checking" options are configured in your load balancer, clients might get random connection failures. When you have "health checking" options configured correctly, the load balancer knows which nodes are down, so clients will only connect to healthy nodes, which solves the issue.
When not using a load balancer and relying only on the client driver to connect to all the nodes, the client driver should be able to handle connection failures or health checks internally and do failover/retry, etc., to ensure connections go to healthy nodes.
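For the second approach, here is a sketch with the RabbitMQ Java client, which accepts a list of node addresses and can recover connections automatically; the hostnames are placeholders:

    import com.rabbitmq.client.Address;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;

    public class MultiNodeConnectDemo {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setAutomaticRecoveryEnabled(true);   // reconnect (possibly to another node) on failure
            factory.setNetworkRecoveryInterval(5000);    // retry every 5 seconds

            Address[] nodes = {
                new Address("rabbit-node1.example.com", 5672),
                new Address("rabbit-node2.example.com", 5672)
            };

            // The driver tries the listed nodes in turn and fails over to a healthy one.
            Connection conn = factory.newConnection(nodes);
            System.out.println("Connected to " + conn.getAddress());
        }
    }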
I have a Redis Cluster that clients are connecting to via HAProxy with a virtual IP. The Redis cluster has three nodes (with each node sharing the same server with a running Sentinel instance).
My question is: when a client gets a "MOVED" error/message from a cluster node upon sending a request, does it bypass HAProxy the second time it connects, since it was provided with an IP:port when the MOVED message was issued? If not, how does HAProxy know to send it to the correct node the second time?
I just need to understand how this works under the hood.
If you want to use HAProxy in front of Redis Cluster nodes, you will need to either:
Set up an HAProxy for each master/slave pair, and wire up something to update HAProxy when a failure happens, as well as probably intercept the topology related commands to insert the virtual IPs rather than the IPs the nodes themselves have and report via the topology commands/responses.
Customize HAProxy to teach it how to be the cluster-aware Redis client so the actual client doesn't know about cluster at all. This means teaching it the Redis protocol, storing the cluster's topology information, and selecting the node to query based on the key(s) being accessed by the consumer code.
With Redis Cluster, the client must be able to access every node in the cluster. Of the two options above, option 2 is the "easier" one, but at this point I wouldn't recommend either.
Conceivably you could use the VIP as a "first place to get the topology info" IP, but I suspect you'd see serious issues develop, because that original IP would not be one of the ones properly reported as a node handling data. For that you could simply use round-robin DNS and avoid that problem, or use the built-in "here is a list of cluster IPs (or names?)" option in the initial connection configuration.
Your simplest, and least likely to be problematic, route is to go "full native" and simply give full and direct access to every node in the cluster to your clients and not use HAProxy at all.
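For completeness, the "full native" approach with a cluster-aware client (here Jedis, with placeholder node addresses) looks roughly like this:

    import redis.clients.jedis.HostAndPort;
    import redis.clients.jedis.JedisCluster;

    import java.util.HashSet;
    import java.util.Set;

    public class FullNativeAccessDemo {
        public static void main(String[] args) {
            // Give the client the real node addresses; it discovers the rest and routes by slot.
            Set<HostAndPort> nodes = new HashSet<>();
            nodes.add(new HostAndPort("10.0.0.1", 7000));
            nodes.add(new HostAndPort("10.0.0.2", 7000));
            nodes.add(new HostAndPort("10.0.0.3", 7000));

            try (JedisCluster cluster = new JedisCluster(nodes)) {
                cluster.set("greeting", "hello");
                System.out.println(cluster.get("greeting"));
            }
        }
    }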