Highly available and load-balanced ActiveMQ cluster

Please be aware that I am a relative newbie to ActiveMQ.
I am currently working with a small cluster of ActiveMQ (version 5.15.x) nodes (fewer than 5). I recently experimented with configuring "Shared File System Master Slave" with KahaDB in order to make the cluster highly available.
Having done so and seeing how it operates, I'm now considering whether this configuration provides the level of throughput required for both consumers/producers, as only one broker's ports are available at one time.
My question has two parts. First, does it make sense to configure the cluster as highly available AND load balanced (through a Network of Brokers)? Second, is that even technically viable, or do I need to review my design considerations to favor one aspect over the other?

I had some discussions with the ActiveMQ maintainers in IRC on this topic a few months back.
It seems that they would recommend using ActiveMQ Artemis instead of ActiveMQ 5.
Artemis has a HA solution:
https://activemq.apache.org/artemis/docs/latest/clusters.html
https://activemq.apache.org/artemis/docs/latest/ha.html
The idea is to use Data Replication to allow for failover, etc:
When using replication, the live and the backup servers do not share the same data directories, all data synchronization is done over the network. Therefore all (persistent) data received by the live server will be duplicated to the backup.
And, I think you want to have at least 3 nodes (or some odd number) to avoid issues with split brain during network partitions.
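The preference for three (or another odd number of) nodes can be illustrated with plain majority arithmetic. This is only a sketch of generic majority voting, not Artemis's actual quorum logic; the function names are made up for illustration.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be lost while a strict majority still remains."""
    return n - majority(n)


# Print cluster size, majority size, and tolerated failures side by side.
for n in range(1, 6):
    print(f"nodes={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

Under this model a two-node cluster tolerates no failures, and going from three nodes to four adds no extra tolerance, which is why odd sizes are usually suggested.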
It seems like Artemis can mostly be used as a drop-in replacement for ActiveMQ; it can still speak the OpenWire protocol, etc.
However, I haven't actually tried this yet, so YMMV.

Related

ActiveMQ datastore for cluster setup

We have been running single-instance ActiveMQ 5.16.0 brokers in production. Now we are planning to use a cluster of AMQ brokers for HA and load distribution, with consistent message data. Currently we are using only one queue.
HA can be achieved using failover, but do the brokers need to use the same datastore, or can each have its own? If I use separate instances for the AMQ brokers, how do I set up a common datastore?
Please guide me on how to set up the datastore for HA and load distribution.
Multiple ActiveMQ servers clustered together can provide HA in a couple of ways:
1) Scale message flow by using compute resources across multiple broker nodes
2) Maintain message flow during a planned or unplanned outage of a single broker node
3) Share a data store in the event of ActiveMQ process failure
A network of brokers solves #1 and #2. A standard three-node cluster will give you excellent performance and the ability to scale the number of producers and consumers, and it splits the full flow across three nodes to provide increased capacity.
Solving for #3 is complicated in all messaging products. Brokers are always working to be completely empty, so clustering the data store of a single broker becomes an anti-pattern of sorts. Many times, relying on RAID disk with a single broker node will provide higher reliability than adding NFSv4, GFSv2, or JDBC and using a shared store.
That being said, if you must use a shared store, follow best practices and use GFSv2 or NFSv4. JDBC is much slower and requires significant DB maintenance to keep running efficiently.
Note: @Kevin Boone's note about CIFS/SMB is incorrect, and CIFS/SMB should not be used. Otherwise, his responses are solid.
You can configure ActiveMQ so that instances share a message store, or so they have separate message stores. If they share a message store, then (essentially) the brokers will automatically form a master-slave configuration, such that only one broker (at a time) will accept connections from clients, and only one broker will update the store. Clients need to identify both brokers in their connection URIs, and will connect to whichever broker happens to be master.
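As a rough illustration of "identifying both brokers in the connection URI", the sketch below assembles a failover transport URI of the kind ActiveMQ clients accept. The host names are hypothetical, 61616 is assumed to be the OpenWire port in use, and the helper function itself is made up for this example; only the `failover:(...)` syntax and the `randomize` option come from ActiveMQ.

```python
def failover_uri(brokers, randomize=False):
    """Build an ActiveMQ failover URI from (host, port) pairs."""
    endpoints = ",".join(f"tcp://{host}:{port}" for host, port in brokers)
    # randomize=false makes the client try the brokers in the listed order.
    flag = "true" if randomize else "false"
    return f"failover:({endpoints})?randomize={flag}"


# Hypothetical host names for a two-broker master-slave pair.
uri = failover_uri([("broker-a.example", 61616), ("broker-b.example", 61616)])
print(uri)
```

The client is handed the whole list and connects to whichever broker is currently accepting connections, so it does not need to know which one is master.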
With a shared message store like this, locks in the message store coordinate the master-slave assignment, which makes the choice of message store critical. Stores can be shared filesystems, or shared databases. Only a few shared filesystem implementations work properly -- anything based on NFS 4.x should work. CIFS/SMB stores can work, but there's so much variation between providers that it's hard to be sure. NFS v3 doesn't work, however well-implemented, because the locking semantics are inappropriate. In any case, the store needs to be robust, or replicated, or both, because the whole broker cluster depends on it. No store, no brokers.
In my experience, it's easier to get good throughput from a shared file store than a shared database although, of course, there are many factors to consider. Poor network connectivity will make it hard to get good throughput with any kind of shared store (or any kind of cluster, for that matter).
When using individual message stores, it's typical to put the brokers into some kind of mesh, with 'network connectors' to pass messages from one broker to another. Both brokers will accept connections from clients (there is no master), and the network connections will deal with the situation where messages are sent to one broker, but need to be consumed from another.
Clients don't necessarily need to specify all brokers in their connection URIs, but generally will, in case one of the brokers is down.
A mesh is generally easier to set up, and (broadly speaking) can handle more client load, than a master-slave with shared filestore. However, (a) losing a broker amounts to losing any messages that were associated with it (until the broker can be restored) and (b) the mesh interferes with messaging patterns like message grouping and exclusive consumers.
There's really no hard-and-fast rule to determine which configuration to use. Many installers who already have some sort of shared store infrastructure (a decent relational database, or a clustered NFS, for example) will tend to want to use it. The rise in cloud deployments has had the effect that mesh operation with no shared store has become (I think) a lot more popular, because it's so symmetric.
There's more -- a lot more -- that could be said here. As a broad question, I suspect the OP is a bit out-of-scope for SO. You'll probably get more traction if you break your question up into smaller, more focused, parts.

Load balancing for RabbitMQ server (broker), not the consumers(clients)

In this example I have a setup of 2 consumers and 2 publishers in my network. At the centre is a RabbitMQ broker, as shown in the screenshot below. For fail-safe reasons, I am wondering whether RabbitMQ supports load balancing or mirroring of the server (broker) in any way. I would like to get rid of the star topology for two reasons:
1) If one broker fails, another broker can take over immediately
2) If one broker's network throughput is not good enough, the other takes over
Solving one or the other (or even both) would be great.
[Figure: My current infrastructure]
[Figure: Preferred infrastructure]
RabbitMQ clustering (docs) can meet your first requirement. Use three nodes and be sure your applications are coded and tested to take failure scenarios into account.
I don't know of anything out-of-the-box that can meet your second requirement. You will have to implement something that uses cluster statistics or application statistics to determine when to switch to another cluster due to lower throughput.
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.

In RabbitMQ, How to make Queues in different clusters to Be Highly Available without Clustering?

In RabbitMQ, if two clusters are hosted in geographically different locations, we can't use clustering between them. How, then, can we make them highly available? That is, if one site's whole cluster goes down, the messages should be mirrored to the other site, and the other site should be able to handle them. Note: the sites are connected by WAN.
I can't lose any message on either site. Publishing a message to the right site can be taken care of, but if messages are in a queue (work queue) or are being processed by a consumer and the site suddenly goes down, taking the broker and consumer with it, how can the other site handle those messages? In a cluster, if one node dies, the other one has all the messages mirrored and knows which were acknowledged. How can this be achieved over a WAN, given that clustering across a WAN is not practical?
I think the question illustrates a conceptual problem with the design. To summarize,
There are two sites, connected via WAN
One site is the primary, while one is the active standby
There is a desire for complete replication of system state (total consistency) between site A and B, to include the status of messages in the queue and messages being processed.
Essentially, you want 100% consistency, availability, and partition tolerance. Such a design is not possible according to the CAP theorem. What RabbitMQ provides is either consistency and availability with low partition tolerance (via clustering), or availability and partition tolerance (via federation or shovel). RabbitMQ does not deal very well with the case of needing consistency and partition tolerance, since message brokers really handle highly transient traffic.
Instead, what is needed is to fully scope the problem to something that can be solved using the available technologies. It sounds to me like the correct approach (since it's over a WAN) is to sacrifice availability for consistency and partition tolerance, and have your application handle the failover case. You may be able to configure RabbitMQ sufficiently in this regard - see https://www.rabbitmq.com/partitions.html.

Apache Kafka: Mirroring vs. Replication

Mirroring replicates data between Kafka clusters, while Replication replicates data across nodes within a Kafka cluster.
Is there any specific use for Replication if Mirroring has already been set up?
They are used for different use cases. Let's try to clarify.
As described in the documentation,
The purpose of adding replication in Kafka is for stronger durability and higher availability. We want to guarantee that any successfully published message will not be lost and can be consumed, even when there are server failures. Such failures can be caused by machine error, program error, or more commonly, software upgrades. We have the following high-level goals:
Inside a cluster there might be network partitions (a single server fails, and so forth), therefore we want to provide replication between the nodes. Given a setup of three nodes and one cluster, if server1 fails, there are two replicas Kafka can choose from. Same cluster implies same response times (ok, it also depends on how these servers are configured, sure, but in a normal scenario they should not differ so much).
Mirroring, on the other hand, seems to be very valuable, for example, when you are migrating a data center, or when you have multiple data centers (e.g., AWS in the US and AWS in Ireland). Of course, these are just a couple of use cases. So what you do here is to give applications belonging to the same data center a faster and better way to access data - data locality in some contexts is everything.
If you have one node in each cluster, in case of failure, you might have way higher response times to go, let's say, from AWS located in Ireland to AWS in the US.
You might object that, to achieve data locality (services in cluster one read from Kafka in cluster one), the data still has to be copied from one cluster to the other. That's definitely true, but the advantages of mirroring can outweigh those of reading directly (via an SSH tunnel?) from a Kafka cluster in another data center. Reading remotely exposes you to dropped connections, longer client connection and session times (depending on the location of the data center), and legal constraints (some data can be collected in one country while other data shouldn't be).
Replication is the basis of higher availability. You shouldn't use Mirroring to handle high availability in a context where data locality matters. At the same time, you should not use just Replication where you need to duplicate data across data centers (I don't even know if you can without Mirroring/an ssh tunnel).

Is it necessary to use three nodes to build RabbitMQ cluster?

I have to say the official website provides very little information to understand RabbitMQ clearly.
The official website suggests using three nodes to build a cluster. What is the reason for that? I suppose it's like ZooKeeper, which needs an odd number of nodes to do a quorum and elect the master.
Also, what is the advantage of using a non-HA cluster? Improved performance, or something else? If the node on which a queue resides is down, then the queue is not working. So, in all situations, is it necessary to configure the cluster with mirrored queues and auto-sync?
Three nodes is the minimum for reasonable HA.
Suppose you have a queue mirrored across two nodes: if the node holding the master goes down, the other one is promoted to be the new master.
Please read the sections "Automatically handling partitions" and "More about pause-minority mode" in the RabbitMQ partitions documentation:
It is therefore not a good idea to enable pause-minority mode on a cluster of two nodes since in the event of any network partition or node failure, both nodes will pause
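The two-node case in the quoted passage can be checked with a small model. This is a simplified sketch of my reading of the pause-minority rule (a node pauses when it can see half or fewer of the cluster's nodes, itself included); it is not RabbitMQ code, and the function name is made up.

```python
def pauses(visible_nodes: int, cluster_size: int) -> bool:
    """True if a node that can see `visible_nodes` members (itself included) would pause."""
    # Half or fewer of the cluster counts as a minority in this model.
    return visible_nodes * 2 <= cluster_size


# Two-node cluster, partitioned or with one node down: each survivor sees only itself.
print(pauses(1, 2))
# Three-node cluster split 2/1: compare the larger side with the isolated node.
print(pauses(2, 3), pauses(1, 3))
```

In this model every node of a two-node cluster pauses as soon as it loses sight of its peer, whereas in a three-node cluster the two-node side keeps running.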
RabbitMQ can handle the cluster in different ways, depending on where you deploy it: LAN, WAN, unstable LAN, etc. You can also use federation or shovel.
what is the advantage of using a non-HA cluster? Improve the performance or what?
I'd say yes, or simply you have an environment where you don't need to have HA queues since you can have only temporary queues.
is it necessary to set the cluster to be mirror queue and auto-sync?
You can also opt for manual sync, since the queue is blocked while it syncs, and that can be a problem if there are lots of messages to sync. For example, you can decide to sync the queues when there is no traffic.
This is explained clearly in the section "Unsynchronised Slaves".
Your question is a bit general, and the answer depends on what you are looking for.