ActiveMQ, Network of brokers, offline durable subscriber dedupe - activemq

Scenario: Two ActiveMQ nodes A, B. No master slave, but peers, with network connectors between them.
A durable topic subscriber is registered with both (as it uses failover and at one point connects to A and at another point connects to B).
Issue: If the subscriber is online against A, a copy of each message is also placed in its offline subscription on B.
Question: Is this by design? Can this be configured so that each message is deduplicated and delivered to the subscriber through only one of the subscriptions?

Apparently by-design: http://activemq.apache.org/how-do-distributed-queues-work.html
See "Distributed Topics in Store/Forward" where it says:
For topics the above algorithm is followed except, every interested client receives a copy of the message - plus ActiveMQ will check for loops (to avoid a message flowing infinitely around a ring of brokers).

Related

Confirmation of messages between nodes in dynamic shovelling

I know Rabbit MQ supports the mechanism of Publisher Confirms – the broker's acknowledgements to publishers. The documentation states the broker confirms messages as it handles them by sending a basic.ack on a channel that was set in “confirm mode”. This communication is between a broker and a publisher client.
Let’s assume that I have a main node A and a secondary node B in another data center, and that a dynamic shovel is set up from A to B. According to the documentation, “ack-mode” determines how the shovel acknowledges messages. If set to “on-confirm”, messages are acknowledged to the source broker (A) after they have been confirmed by the destination (broker B).
I’d like to ask whether these two mechanisms are connected (or whether they can be). When a client connected to node A receives a confirmation, does that mean that the message has been published to node B too (if ack-mode=on-confirm)?
No, these are not connected. In the case of dynamic shovels, what comes into the picture is ack-mode, which is one of the shovel's configuration parameters. It can take three possible values:
on-confirm
on-publish
no-ack
This is how it works.
ack-mode Determines how the shovel should acknowledge messages. If set to on-confirm (the default), messages are acknowledged to the source broker after they have been confirmed by the destination. This handles network errors and broker failures without losing messages, and is the slowest option.
If set to on-publish, messages are acknowledged to the source broker after they have been published at the destination. This handles network errors without losing messages, but may lose messages in the event of broker failures.
If set to no-ack, message acknowledgements are not used. This is the fastest option, but may lose messages in the event of network or broker failures.
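As a rough illustration of where ack-mode sits, here is a minimal sketch that builds a dynamic shovel definition as JSON. The key names (src-uri, src-queue, dest-uri, dest-queue, ack-mode) are the dynamic shovel parameter names; the URIs, the queue name "orders" and the helper shovel_definition are placeholders invented for this example.

```python
import json

# The three ack-mode values listed above.
VALID_ACK_MODES = {"on-confirm", "on-publish", "no-ack"}


def shovel_definition(src_uri, src_queue, dest_uri, dest_queue,
                      ack_mode="on-confirm"):
    """Build the JSON body for a dynamic shovel (hypothetical helper)."""
    if ack_mode not in VALID_ACK_MODES:
        raise ValueError(f"unknown ack-mode: {ack_mode!r}")
    return {
        "src-uri": src_uri,
        "src-queue": src_queue,
        "dest-uri": dest_uri,
        "dest-queue": dest_queue,
        "ack-mode": ack_mode,
    }


# Placeholder hosts and queue names, not taken from the question.
definition = shovel_definition(
    "amqp://node-a", "orders", "amqp://node-b", "orders"
)
print(json.dumps(definition, indent=2))
```

The printed JSON is the kind of value that would be passed to rabbitmqctl set_parameter shovel <name> '<json>'. Whatever ack-mode is chosen, it only governs the shovel's acknowledgement to the source broker. The publisher confirm sent to a client on A is a separate exchange.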

RabbitMQ - How to Federate / Mirror Messages

I set up two nodes, A and B. Both have RabbitMQ with the federation plugin installed.
From the Web UI, I can see the "Federation Status" > "State" is "running" on A and B.
On A, I created a queue called "test1".
On B, I can see the "test1" queue (replicated from A).
On A, I added a message.
However, the message does not appear in the replicated queue on B - the message stays on A.
This is the policy I used on A and B:
rabbitmqctl set_policy --apply-to exchanges my-queue "test1" \
'{"federation-upstream-set":"all"}'
So, it's like this: A (upstream) -> B (downstream) and B (upstream) -> A (downstream)
Am I supposed to see messages replicated to both A and B? Did I misconfigure the directions?
However, the message does not appear in the replicated queue on B - the message stays on A.
TL;DR: federated exchange != federated queue.
References:
https://www.rabbitmq.com/federated-exchanges.html
https://www.rabbitmq.com/federated-queues.html
The "How it works" section on federated queues explains:
" The federated queue will only retrieve messages when it has run out of messages locally, it has consumers that need messages, and the upstream queue has "spare" messages that are not being consumed ... "
Whereas the "What does a federated exchange do?" explains:
" ... messages published to the upstream exchanges are copied to the federated exchange, as though they were published directly to it ... "
recap:
if you use a federated queue,
you would need a consumer on the B side that needs messages (pull model?).
if you use a federated exchange,
messages are copied directly (push model?).
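To make the push/pull difference concrete, here is a toy in-memory model. It is not how RabbitMQ is implemented, and the class names are made up for this sketch. It assumes "spare" upstream messages means "upstream has no consumers", which is a simplification of the rule quoted above.

```python
from collections import deque


class FederatedExchangeModel:
    """Push model: every upstream publish is also copied downstream."""

    def __init__(self):
        self.upstream = deque()
        self.downstream = deque()

    def publish_upstream(self, msg):
        self.upstream.append(msg)
        self.downstream.append(msg)  # copied as if published directly


class FederatedQueueModel:
    """Pull model: downstream fetches only when it actually needs a message."""

    def __init__(self):
        self.upstream = deque()
        self.downstream = deque()
        self.upstream_consumers = 0
        self.downstream_consumers = 0

    def publish_upstream(self, msg):
        self.upstream.append(msg)  # nothing moves yet

    def downstream_consume(self):
        if self.downstream_consumers == 0:
            return None  # no consumer downstream, so no demand
        out_of_local = not self.downstream
        spare_upstream = bool(self.upstream) and self.upstream_consumers == 0
        if out_of_local and spare_upstream:
            # Move (not copy) one message from upstream to downstream.
            self.downstream.append(self.upstream.popleft())
        return self.downstream.popleft() if self.downstream else None
```

In this model, publishing to a FederatedQueueModel with no downstream consumer leaves the message upstream, which matches what the question describes: the message stays on A.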
Use cases
Redundancy / Backups
Federated exchanges copy messages (max-hops copies) so they can be used for redundancy.
E.g.
here is my data, back it up.
Content distribution network
Federated exchanges copy messages (max-hops copies) so they can be used to distribute content across regions (that's also redundancy btw) provided you configure the topology correctly.
E.g.
hey everybody, please apply this security patch, which you can find at your nearest broker.
Load balancing
Federated queues can be used for load balancing: if a message is available upstream and there is no consumer there to process it, a free consumer downstream is able to receive the message and work on it. Rock on.
E.g.
I'm a computer, and I feel bored, can I help you? Any job you need me to do?
double-whammy
Federated exchange + federated queues = you can distribute the same set of tasks to multiple regions (cluster), and one worker in each cluster can perform the job.
E.g.
It's end of the quarter, I need performance metrics for each region (cluster), each store manager (one node in cluster) will aggregate metrics (inside cluster), and we'll give gift cards to the top 3.

RabbitMQ clustering and mirror queues behavior behind the scenes

Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
From what I read, it seems that all actions other than publishes go only to the master, and the master then broadcasts the effect of the actions to the slaves (this is from the documentation). From my understanding, this means a consumer will always consume messages from the master queue. Also, if I send a request to a slave to consume a message, that slave will make an extra hop to the master to fetch that message.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
It seems there are so many extra hops when dealing with slaves, so it seems you could have a better performance if you know only the master. But how do you handle master failure? Then one of the slaves will be elected master, so you have to know where to connect to?
I am asking all of this because we are using a RabbitMQ cluster with HAProxy in front, so we can decouple the cluster structure from our apps. This way, whenever a node goes down, HAProxy will redirect to living nodes. But we have problems when we kill one of the rabbit nodes. The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in this case, otherwise you will lose them.
Even with all of this, messages can still be lost, because they may be in transit when I kill a node (in some buffers, somewhere on the network etc). So you have to use transactions or publisher confirms, which confirm delivery only after all the mirrors have received the message. But here is another issue. You may have duplicate messages, because the broker might have sent a confirmation that never reached the producer (due to network failures, etc). Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
Is there a way of avoiding this? Or do I have to decide between losing a couple of messages and duplicating some messages?
Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
This blog outlines exactly what happens.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
The message will be redirected to the master Queue - that is, the node on which the Queue was created.
But how do you handle master failure? Then one of the slaves will be elected master, so you have to know where to connect to?
Again, this is covered here. Essentially, you need a separate service that polls RabbitMQ and determines whether nodes are alive or not. RabbitMQ provides a management API for this. Your publishing and consuming applications need to refer to this service, either directly or through a shared data store, in order to determine the correct node to publish to or consume from.
The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in this cases, otherwise you will lose them.
You need to subscribe to connection-interrupted events to react to severed connections. You will need to build in some level of redundancy on the client in order to ensure that messages are not lost. I suggest, as above, that you introduce a service specifically designed to interrogate RabbitMQ. Your client can attempt to publish a message to the last known active connection, and should this fail, the client might ask the monitor service for an up-to-date listing of the RabbitMQ cluster. Assuming that there is at least one active node, the client may then establish a connection to it and publish the message successfully.
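The retry flow described above could look roughly like this. It is only a sketch: publish and list_live_nodes are hypothetical callables standing in for your AMQP client and your monitor service, and a failed publish is assumed to surface as ConnectionError.

```python
def publish_with_failover(message, last_known_node, list_live_nodes, publish):
    """Try the last known node first, then ask the monitor for live nodes.

    publish(node, message) and list_live_nodes() are hypothetical callables
    supplied by the caller; they are not part of any RabbitMQ library.
    Returns the node that accepted the message.
    """
    try:
        publish(last_known_node, message)
        return last_known_node
    except ConnectionError:
        pass  # fall through and consult the monitor service

    for node in list_live_nodes():
        if node == last_known_node:
            continue  # already failed a moment ago
        try:
            publish(node, message)
            return node
        except ConnectionError:
            continue  # the monitor's view may be stale, try the next node

    raise ConnectionError("no live RabbitMQ node accepted the message")
```

The caller would remember the returned node as the new "last known" one. Bear in mind that resending after an ambiguous failure is exactly what can produce duplicates, so this goes hand in hand with deduplication on the consumer side.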
Even with all of this, messages can still be lost, because they may be in transit when I kill a node
There are certain edge cases that you can't cover with redundancy, and neither can RabbitMQ. For example, when a message lands in a Queue, the HA policy invokes a background process to copy the message to a backup node. During this process there is potential for the message to be lost before it is persisted to the backup node: should the active node fail immediately, the message will be lost for good. There is nothing that can be done about this. Unfortunately, when we get down to the level of actual bytes travelling across the wire, there's a limit to the number of safeguards that we can build.
Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
You can handle this a number of ways. For example, setting the message-ttl to a relatively low value will ensure that duplicated messages don't remain on the Queue for extended periods of time. You can also tag each message with a unique reference, and check that reference at the consumer level. Of course, this would require storing a cache of processed messages to compare incoming messages against; the idea being that if a previously processed message arrives, its tag will have been cached by the consumer, and the message can be ignored.
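For the "unique reference plus cache" approach, a minimal sketch might look like the following. The class name, the message_id argument and the cache size are assumptions made for this example. The cache lives in process memory, so it forgets everything on restart; a real deployment would likely need a shared or persistent store.

```python
from collections import OrderedDict


class DedupingConsumer:
    """Skips messages whose unique reference has already been processed."""

    def __init__(self, handler, max_remembered=10000):
        self._handler = handler
        self._seen = OrderedDict()  # message_id -> True, oldest first
        self._max = max_remembered

    def on_message(self, message_id, body):
        """Return True if the message was processed, False if ignored."""
        if message_id in self._seen:
            return False  # duplicate: its tag is already cached
        self._handler(body)
        # Record the id only after the handler succeeded, so a message whose
        # handler raised is retried instead of being silently dropped.
        self._seen[message_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)  # evict the oldest id
        return True
```

The bounded cache is a trade-off: a duplicate that arrives after its id has been evicted will be processed again, so max_remembered has to be sized against how late a duplicate can realistically show up.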
One thing that I'd stress with AMQP and Queue-based solutions in general is that your infrastructure provides the tools, but not the entire solution. You have to bridge those gaps based on your business needs. Often, the best solution is derived through trial and error. I hope my suggestions are of use. I blog about a number of RabbitMQ design solutions, including the issues you mentioned, here if you're interested.

ActiveMQ network of brokers don't forward messages

I had two ActiveMQ brokers (A and B) that were configured as a store-and-forward network. They work perfectly, forwarding messages from A to B when a consumer is connected to broker B and the producer sends messages to A. The problem is that when the consumer is killed and reconnects to A, the queued messages on B (which were forwarded from A) are not forwarded back to A, where the consumer is now connected. Even if I send new messages to B, all messages stay stuck on B until I restart the brokers. I have tried setting networkTTL="4" and duplex="true" on the broker network connector, but it doesn't work.
Late answer, but hopefully this will help someone else in the future.
Messages are getting stuck in B because by default AMQ doesn't allow messages to be sent back to a broker to which they have previously been delivered. In the normal case, this prevents messages from going in cycles around mesh-like network topologies without getting delivered, but in the failover case it results in messages stuck on one broker and unable to get to the broker where all the consumers are.
To allow messages to go back to a broker if the current broker is a dead-end because there are no consumers connected to it, you should use replayWhenNoConsumers=true to allow forwarding messages that got stuck on B back to A.
That configuration option, some settings you might want to use in conjunction with it, and some considerations when using it, are described in the "Stuck Messages (version 5.6)" section of http://activemq.apache.org/networks-of-brokers.html, http://tmielke.blogspot.de/2012/03/i-have-messages-on-queue-but-they-dont.html, and https://issues.apache.org/jira/browse/AMQ-4465. Be sure that you can live with the side effects of these changes (e.g. the potential for duplicate message delivery of other messages across your broker-to-broker network connections).
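The rule described in this answer can be pictured with a tiny decision function. This is a toy model only, not ActiveMQ's actual code: the function name and parameters are invented here, and it assumes each message carries the list of brokers it has already visited.

```python
def may_forward(broker_path, target_broker, local_consumers,
                replay_when_no_consumers=False):
    """Toy model: may a message be forwarded to target_broker?

    broker_path is the list of brokers the message has already passed
    through (an assumption of this sketch).
    """
    if target_broker not in broker_path:
        return True  # the target has never seen this message
    # The message would be going back to a broker it came from. Allow that
    # only when replay is enabled and this broker is a dead end.
    return replay_when_no_consumers and local_consumers == 0


# A message produced on A and forwarded to B; the consumer has moved to A.
print(may_forward(["A"], "A", local_consumers=0))
print(may_forward(["A"], "A", local_consumers=0,
                  replay_when_no_consumers=True))
```

With the default setting the first call refuses to send the message back to A, which is the "stuck on B" situation from the question. The second call allows it because replay is enabled and B has no consumers.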
Can you give more information on the configuration of broker A and B, as well as what you are trying to achieve?
It seems to me you could achieve what you want by setting a network of brokers (with A and B), with the producer only connecting to one, the consumer to the other.
The messages will automatically be transmitted to the other broker as long as the other broker has an active subscription to the destination the message was sent to.
I would not recommend changing the networkTTL if you are not sure of the consequences (it tends to lead to unwanted message loops).

ActiveMQ message ordering with Network of brokers

I have configured two brokers, A and B, using network connectors.
Is the message order preserved if I am using an exclusive consumer (single consumer) or message groups (JMSXGroupID)?
In the network of broker documentation I found this:
Total message ordering is not preserved with networks of brokers. Total ordering works with a single consumer but a networkBridge introduces a second consumer. In addition, network bridge consumers forward messages via producer.send(..), so they go from the head of the queue on the forwarding broker to the tail of the queue on the target. If single consumer moves between networked brokers, total order may be preserved if all messages always follow the consumer but this can be difficult to guarantee with large message backlogs.
In a single broker, message ordering is possible through an exclusive consumer. What will happen if I am using a network of brokers with exclusive consumers?
Total message ordering in a network of brokers doesn't work - even if you have a single consumer or multiple consumers using the "exclusive consumer" feature.
Consider the following scenario with 2 brokers (let's call these broker-A & broker-B) in a network, 1 consumer (consumer-A), 2 producers (producer-A & producer-B), and 1 queue (queue-A).
producer-A connects to broker-A, sends a message (message-1) to queue-A, and disconnects.
producer-B connects to broker-B, sends a message (message-2) to queue-A, and disconnects.
producer-A reconnects to broker-A, sends a message (message-3) to queue-A, and disconnects.
consumer-A connects to broker-A and receives the messages from queue-A. Even though the messages were sent in order message-1, message-2, message-3 the consumer will actually receive the messages in the order message-1, message-3, message-2 because message-1 and message-3 were sent to broker-A and message-2 was sent to broker-B and had to be moved across the network of brokers to broker-A based on consumer demand.
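The scenario above can be replayed with two in-memory queues. This is only a simulation of the ordering effect, with plain deques standing in for queue-A on each broker; it does not talk to ActiveMQ.

```python
from collections import deque

# queue-A as it exists on each broker.
broker_a = deque()
broker_b = deque()

# Producers send in the global order message-1, message-2, message-3.
broker_a.append("message-1")  # producer-A -> broker-A
broker_b.append("message-2")  # producer-B -> broker-B
broker_a.append("message-3")  # producer-A -> broker-A

# consumer-A connects to broker-A. The network bridge now forwards what is
# on broker-B, and forwarded messages land at the tail of the queue on A.
while broker_b:
    broker_a.append(broker_b.popleft())

received = list(broker_a)
print(received)  # ['message-1', 'message-3', 'message-2']
```

message-2 ends up last even though it was sent second, because it had to cross the bridge and was appended behind what broker-A already held.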
It's worth noting that one of the main goals of a network of brokers is scalability. However, in order to guarantee messages are consumed in order the messages have to be consumed serially which can drastically reduce performance and would almost certainly nullify any scalability gains provided by the network of brokers. Total message ordering and a network of brokers are fundamentally opposed ideas. If you really want total message ordering you shouldn't use a network of brokers.