RabbitMQ: how can a worker "ignore" a message and let another worker handle it?

Here's my current architecture:
I have a bunch of IoT devices that connect through a raw, persistent, duplex TCP connection to one instance of my "worker", which is connected to a RabbitMQ queue.
My publisher publishes messages that look like this:
{
  "iot_device_name": "A",
  "command": "reboot"
}
The worker is then able to map the iot_device_name to the TCP socket.
All is working nicely, but if we want to add HA and scale out a bit, it would be better to have 4 instances of the worker. Load balancing the TCP connections is not a problem (with HAProxy or Nginx).
Now the problem is how to split the load on the queue side, as the list of IoT devices handled by a worker is dynamic (i.e., a device could disconnect and reconnect to another worker).
So is there a way for a worker to say, "Hmmm, no, I can't handle this message because I don't know this device, give me another one", so that another worker can take it and handle it?
Other information that may be of help:
the workers are all on the same network, which is also the publisher's network
the number of workers is not dynamic, and even if we extrapolate the number of devices for the next few years, 8 workers would take us VERY FAR, since they simply route/transcode messages, so their CPU load is negligible

So if I understand your architecture correctly, you have commands sent to your publisher on one side, which are pushed into RabbitMQ.
On the consumer side, you have multiple workers to which the messages are dispatched, and each worker has a bunch of devices connected to it.
If this is indeed your architecture, I'd propose the following for your RabbitMQ configuration:
use a direct exchange
each worker has its own queue (exclusive), and manages the bindings between the exchange and its queue dynamically:
each time a device connects to a worker, that worker adds a binding between its queue and the exchange, with the device's identifier as the routing key
each time a worker detects that a device is no longer connected to it, it removes the related binding from the RabbitMQ configuration
Regarding the detection of disconnected devices: I'd expect it to be common that a worker only realizes a device is no longer connected to it upon receiving a command to push to that device. In such cases, in addition to adapting the bindings, the worker should republish the message to the same exchange with the same routing key, so that it gets another shot at being consumed by the proper worker (see the sketch below).
I'd also consider configuring a TTL on the queues; there's no point in consuming a message that's too old.
The publisher will of course also need to adapt its publishing, using the intended device's identifier as the routing key.
I hope this proposal makes sense. There are a few other cases to consider: an alternate exchange to make sure we don't lose requests if there is a (short) window during which a device hasn't yet reconnected to any worker and we get a command for it anyway; adding a property to republished messages to make sure we don't create an infinite loop in the system; ... but what is described above should be a reasonable starting point to achieve your goal.
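For illustration, here is a minimal sketch of the worker side of this proposal using the .NET RabbitMQ client. This is a sketch, not a drop-in implementation: the "commands" exchange name and the sockets registry are made up for the example.

using System.Collections.Generic;
using System.Net.Sockets;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

channel.ExchangeDeclare(exchange: "commands", type: "direct");

// Exclusive, server-named queue owned by this worker instance.
var queueName = channel.QueueDeclare().QueueName;

// This worker's map of connected devices to their TCP sockets
// (maintained by the TCP-accept code, out of scope here).
var sockets = new Dictionary<string, Socket>();

// Call when a device opens its TCP connection to this worker.
void OnDeviceConnected(string device) =>
    channel.QueueBind(queueName, "commands", routingKey: device);

// Call when this worker detects the device is gone.
void OnDeviceDisconnected(string device) =>
    channel.QueueUnbind(queueName, "commands", device, null);

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (model, ea) =>
{
    var device = ea.RoutingKey;
    if (sockets.TryGetValue(device, out var socket))
    {
        socket.Send(ea.Body.ToArray());
    }
    else
    {
        // The device moved to another worker: drop our stale binding and
        // republish so the right worker gets another shot at the message.
        OnDeviceDisconnected(device);
        channel.BasicPublish("commands", device, ea.BasicProperties, ea.Body);
    }
    channel.BasicAck(ea.DeliveryTag, multiple: false);
};
channel.BasicConsume(queue: queueName, autoAck: false, consumer: consumer);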

Related

RabbitMQ security design to declare queues from server (and use from client)

I have a test app (my first with RabbitMQ) which runs on partially trusted clients (in that I don't want them creating queues on their own), so I will look into the security permissions of the queues and the credentials that the clients connect with.
For messaging, there are mostly one-way broadcasts from server to clients, and sometimes a query from the server to a specific client (the replies to which are sent on a replyTo queue dedicated to that client, on which the server listens for responses).
I currently have a receive function on the server which looks out for "Announce" broadcasts from clients:
agentAnnounceListener.Received += (model, ea) =>
{
    var body = ea.Body;
    var props = ea.BasicProperties;
    var message = Encoding.UTF8.GetString(body);
    Console.WriteLine(
        "[{0}] from: {1}. body: {2}",
        // AMQP timestamps are in whole seconds, not milliseconds
        DateTimeOffset.FromUnixTimeSeconds(props.Timestamp.UnixTime).Date,
        props.ReplyTo,
        message);
    // create return replyTo queue, snipped in next code section
};
I am looking to create the replyTo queue in the above receive handler:
var result = channel.QueueDeclare(
    queue: ea.BasicProperties.ReplyTo,
    durable: false,
    exclusive: false,
    autoDelete: false,
    arguments: null);
Alternatively, I could store the received announcements in a database, and on a regular timer run through that list and declare a queue for each entry on every pass.
In both scenarios, this newly created queue would then be used at a future point by the server to send queries to the client.
My questions are:
1) Is it better to create a reply queue on the server when receiving the message from the client, or, if I do it externally (on a timer), are there any performance issues with declaring queues that already exist (there could be thousands of endpoints)?
2) If a client starts to misbehave, is there any way it can be booted (in the receive function I can look up how many messages per minute it sends and boot it if certain criteria are met)? Are there any other filters that can be defined earlier in the pipeline, before receive, to kick clients who send too many messages?
3) In the above example, notice that my messages continuously come in on each run (the same old messages). How do I clear them out, please?
I think preventing clients from creating queues just complicates the design without much security benefit.
You are allowing clients to create messages, and in RabbitMQ it's not very easy to stop clients from flooding your server with them.
If you want to rate-limit your clients, RabbitMQ may not be the best choice. It rate-limits automatically when the server starts to struggle to process all the messages, but you can't set a strict per-client rate limit on the server with an out-of-the-box solution. Also, clients are normally allowed to create queues.
Approach 1 - Web App
Maybe you should use a web application instead:
Clients authenticate with your server
To announce, clients send a POST request to a certain endpoint, e.g. /api/announce, possibly providing some credentials that allow them to do so
To receive incoming messages: GET /api/messages
To acknowledge a processed message: POST /api/acknowledge
When a client acknowledges receipt, you delete the message from your database.
With this design, you can write custom logic to rate-limit or ban clients that misbehave, and you have full control over your server.
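As a very rough sketch of that shape (an ASP.NET Core minimal API; the endpoint names follow the list above, and an in-memory dictionary stands in for your database):

using System.Collections.Concurrent;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// clientId -> (messageId -> message body); a stand-in for a real database.
var pending = new ConcurrentDictionary<string, ConcurrentDictionary<string, string>>();

// Clients announce themselves (authentication omitted for brevity).
app.MapPost("/api/announce", (Announcement a) =>
{
    pending.TryAdd(a.ClientId, new ConcurrentDictionary<string, string>());
    return Results.Ok();
});

// Clients poll for their pending messages...
app.MapGet("/api/messages", (string clientId) =>
    pending.TryGetValue(clientId, out var msgs) ? Results.Ok(msgs) : Results.NotFound());

// ...and acknowledge each processed message, at which point it is deleted.
app.MapPost("/api/acknowledge", (Ack ack) =>
{
    if (pending.TryGetValue(ack.ClientId, out var msgs))
        msgs.TryRemove(ack.MessageId, out _);
    return Results.Ok();
});

app.Run();

record Announcement(string ClientId);
record Ack(string ClientId, string MessageId);

Rate-limiting or banning misbehaving clients then becomes ordinary middleware or per-endpoint logic.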
Approach 2 - RabbitMQ Management API
If you still want to use RabbitMQ, you can potentially achieve what you want by using the RabbitMQ Management API.
You'll need to write an app that queries the RabbitMQ Management API on a timer and:
Gets all the current connections, and checks the message rate for each of them.
If the message rate exceeds your threshold, closes the connection or revokes the user's permissions using the /api/permissions/vhost/user endpoint (see the sketch below).
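A sketch of such a watchdog in C# (the /api/connections listing and DELETE /api/connections/{name} are real Management API endpoints; the threshold, polling interval, and credentials here are placeholders):

using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

const double MaxBytesPerSecond = 1_000_000; // pick a threshold for your system

var http = new HttpClient { BaseAddress = new Uri("http://localhost:15672/") };
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue(
    "Basic", Convert.ToBase64String(Encoding.ASCII.GetBytes("guest:guest")));

while (true)
{
    using var doc = JsonDocument.Parse(await http.GetStringAsync("api/connections"));
    foreach (var conn in doc.RootElement.EnumerateArray())
    {
        var name = conn.GetProperty("name").GetString()!;
        // recv_oct_details.rate is the connection's incoming byte rate.
        if (conn.TryGetProperty("recv_oct_details", out var details) &&
            details.GetProperty("rate").GetDouble() > MaxBytesPerSecond)
        {
            // Close the offending connection (you could instead revoke the
            // user's permissions via PUT /api/permissions/{vhost}/{user}).
            await http.DeleteAsync("api/connections/" + Uri.EscapeDataString(name));
        }
    }
    await Task.Delay(TimeSpan.FromSeconds(30));
}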
In my opinion, a web app may be easier if you don't need all the queueing functionality (like worker queues or complicated routing) that you get out of the box with RabbitMQ.
Here are some general architecture/reliability ideas for your scenario. Responses to your 3 specific questions are at the end.
General Architecture Ideas
I'm not sure that the declare-response-queues-on-server approach yields performance/stability benefits; you'd have to benchmark that. I think the simplest topology to achieve what you want is the following:
Each client, when it connects, declares an exclusive and/or auto-delete anonymous queue. If the clients' network connectivity is so sketchy that holding open a direct connection is undesirable, do something similar to Alex's proposed "Web App" above: have clients hit an endpoint that declares an exclusive/auto-delete queue on their behalf, and close the connection (automatically deleting the queue upon consumer departure) when a client doesn't get in touch regularly enough. This should only be done if you can't tune the RabbitMQ heartbeats from the clients to work in the face of network unreliability, or if you can prove that you need queue-creation rate limiting inside the web app layer.
Each client's queue is bound to a broadcast topic exchange, which the server uses to communicate broadcast messages (matched by a wildcarded binding key) or specifically targeted messages (a routing key that only matches one client's queue name). See the sketch after this list.
When the server needs a reply back from a client, you could either have the server declare the response queue before sending the "response-needed" message and encode the response queue in the message (basically what you're doing now), or you could build semantics into your clients whereby they stop consuming from their broadcast queue for a fixed amount of time before attempting an exclusive (mutex) consume again, publish their responses to their own queue, and ensure that the server consumes those responses within the allotted time before closing its consume and restoring normal broadcast semantics. That second approach is much more complicated, though, and likely not worth it.
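For illustration, a sketch of the client side of that topology in the .NET client (exchange and binding-key names are invented; note that with a topic exchange the wildcards live on the bindings, not on the published routing key):

using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

channel.ExchangeDeclare(exchange: "server.out", type: "topic");

var clientId = "client-42"; // unique per client
var queue = channel.QueueDeclare(queue: clientId, durable: false,
                                 exclusive: true, autoDelete: true,
                                 arguments: null).QueueName;

// One binding catches broadcasts, the other catches messages aimed at
// this client specifically.
channel.QueueBind(queue, "server.out", routingKey: "broadcast.#");
channel.QueueBind(queue, "server.out", routingKey: "client." + clientId);

// The server then publishes to "broadcast.all" to reach every client,
// or to "client.client-42" to reach just this one.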
Preventing Clients Overwhelming RabbitMQ
Things that can reduce server load and help prevent clients from DoSing your server with RMQ operations include:
Setting appropriate, low max-length thresholds on all the queues, so the number of messages stored by the server never exceeds a certain multiple of the number of clients (see the example declaration after this list).
Setting per-queue expirations, or per-message expirations, to make sure that stale messages do not accumulate.
Rate-limiting specific RabbitMQ operations is quite tricky, but you can rate-limit at the TCP level (using e.g. HAProxy or another router/proxy stack) to ensure that your clients don't send too much data, or open too many connections, at a time. In my experience (just one data point; if in doubt, benchmark!), RabbitMQ cares less about the count of messages ingested per unit time than about the data volume and the largest possible per-message size ingested. Lots of small messages are usually OK; a few huge ones can cause latency spikes. Beyond that, rate-limiting bytes at the TCP layer will probably let you scale such a system very far before you have to reassess.
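Both limits are ordinary queue arguments. For example (a fragment reusing channel from a .NET client connection; the queue name and numbers are arbitrary):

// Cap the queue at 100 messages and expire anything older than 60 seconds.
var args = new Dictionary<string, object>
{
    { "x-max-length", 100 },    // max-length threshold
    { "x-message-ttl", 60000 }, // per-message TTL, in milliseconds
};
channel.QueueDeclare(queue: "client42_to_server", durable: false,
                     exclusive: false, autoDelete: false, arguments: args);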
Specific Answers
In light of the above, my answers to your specific questions would be:
Q: Should you create reply queues on the server in response to received messages?
A: Yes, probably. If you're worried about the queue-creation rate that results from doing that, you can rate-limit per server instance. It looks like you're using the .NET client, so you should be able to use one of the existing rate-limiting solutions for that platform to have a single queue-creation rate limiter per server instance, which, unless you have many thousands of servers (not clients), should allow you to reach a very, very large scale before reassessing.
Q: Are there performance implications to declaring queues based on client actions? Or re-declaring queues?
A: Benchmark and see! Re-declares are probably OK; if you rate-limit properly you may not need to worry about this at all. In my experience, floods of queue-declare events can cause latency to go up a bit, but don't break the server. But that's just my experience! Everyone's scenario/deployment is different, so there's no substitute for benchmarking.

In this case, you'd fire up a publisher/consumer pair with a steady stream of messages, tracking e.g. publish/confirm latency or message-received latency, RabbitMQ server load/resource usage, etc. While some number of publish/consume pairs are running, declare a lot of queues in high parallel and see what happens to your metrics.

Also in my experience, the redeclaration of queues (which is idempotent) doesn't cause much if any noticeable load spike. More important to watch is the rate of establishing new connections/channels. You can also rate-limit queue creation very effectively on a per-server basis (see my answer to the first question), so I think if you implement that correctly you won't need to worry about this for a long time. Whether RabbitMQ's performance suffers as a function of the number of queues that exist (as opposed to the declaration rate) would be another thing to benchmark, though.
Q: Can you kick clients based on misbehavior? Message rates?
A: Yes, though it's a bit tricky to set up, this can be done in an at least somewhat elegant way. You have two options:
Option one is what you proposed: keep track of message rates on your server, as you're doing, and "kick" clients based on that. This has coordination problems if you have more than one server, requires writing code that lives in your message-receive loops, and doesn't trip until RabbitMQ actually delivers the messages to your server's consumers. Those are all significant drawbacks.
Option two: use max-length and dead-letter exchanges to build a "kick bad clients" agent. The length limit on a RabbitMQ queue tells the queue system: "if more than X messages are in the queue, drop them or send them to the dead-letter exchange (if one is configured)". Dead-letter exchanges allow you to send messages that exceed the length limit (or meet other conditions) to a specific queue/exchange. Here's how you can combine the two to detect clients that publish messages too quickly (faster than your server can consume them) and kick them:
Each client declares its main $clientID_to_server queue with a max-length of some number, say X, that should never build up in the queue unless the client is "outrunning" the server. That queue has a dead-letter topic exchange of ratelimit (or some other constant name).
Each client also declares/owns a queue called $clientID_overwhelm, with a max-length of 1. That queue is bound to the ratelimit exchange with a routing key of $clientID_to_server. This means that when messages are published to the $clientID_to_server queue at too great a rate for the server to keep up, the messages will be routed to $clientID_overwhelm, but only one will be kept around (so you don't fill up RabbitMQ, and only ever store X+1 messages per client).
You start a simple agent/service which discovers (e.g. via the RabbitMQ Management API) all connected client IDs, and consumes (using just one connection) from all of their *_overwhelm queues. Whenever it receives a message on that connection, it gets the client ID from the routing key of that message and kicks that client: either by doing something out-of-band in your app; by deleting that client's $clientID_to_server and $clientID_overwhelm queues, thus forcing an error the next time the client tries to do anything; or by closing that client's connection to RabbitMQ via the /connections endpoint in the RabbitMQ Management API (this is pretty intrusive and should only be done if you really need to). This service should be pretty easy to write, since it doesn't need to coordinate state with any part of your system besides RabbitMQ. You will lose some messages from misbehaving clients with this solution, though: if you need to keep them all, remove the max-length limit on the overwhelm queue (and run the risk of filling up RabbitMQ).
Using that approach, you can detect spamming clients as it happens according to RabbitMQ, not just as it happens according to your server. You could extend it by also adding a per-message TTL to messages sent by the clients and triggering the dead-letter-kick behavior when messages sit in a queue for more than a certain amount of time. This would change the pseudo-rate-limiting from "when the server consumer gets behind by message count" to "when the server consumer gets behind by message delivery timestamp". A sketch of the per-client topology follows.
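Here is a rough sketch of that per-client topology in the .NET client (names follow the description above; channel is an open channel, and the kicking agent itself is omitted):

var clientId = "client42"; // illustrative

channel.ExchangeDeclare(exchange: "ratelimit", type: "topic");

// Main queue: overflow beyond X messages dead-letters into "ratelimit".
var mainArgs = new Dictionary<string, object>
{
    { "x-max-length", 1000 }, // X: headroom before we call it spam
    { "x-dead-letter-exchange", "ratelimit" },
};
channel.QueueDeclare(queue: clientId + "_to_server", durable: false,
                     exclusive: false, autoDelete: false, arguments: mainArgs);

// Overwhelm queue: keeps at most one dead-lettered message per client.
var overwhelmArgs = new Dictionary<string, object> { { "x-max-length", 1 } };
channel.QueueDeclare(queue: clientId + "_overwhelm", durable: false,
                     exclusive: false, autoDelete: false, arguments: overwhelmArgs);

// Dead-lettered messages keep their original routing key, so the
// overwhelm queue is bound on the main queue's name.
channel.QueueBind(queue: clientId + "_overwhelm", exchange: "ratelimit",
                  routingKey: clientId + "_to_server");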
Q: Why do messages get redelivered on each run, and how do I get rid of them?
A: Use acknowledgements, or noack (but probably acknowledgements). Getting a message in "receive" just pulls it into your consumer; it doesn't pop it from the queue. It's like a database transaction: to finally pop the message, you have to acknowledge it after you receive it. Alternatively, you could start your consumer in "noack" mode, which will cause the receive behavior to work the way you assumed it would. Be warned, though: noack mode imposes a big tradeoff. RabbitMQ delivers messages to your consumer out-of-band (basically: even if your server is locked up or sleeping, if it has issued a consume, Rabbit is pushing messages to it), so if you consume in noack mode those messages are permanently removed from RabbitMQ the moment they are pushed to the server. If the server then crashes or shuts down before draining its "local queue" of pending-receive messages, those messages will be lost forever. Be careful with this if it's important that you don't lose messages.
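In the .NET client, that means consuming with autoAck: false and explicitly acking each message once you have handled it. A fragment, adapted to your receive handler (the queue name is a placeholder):

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (model, ea) =>
{
    var message = Encoding.UTF8.GetString(ea.Body.ToArray());
    // ... handle the message ...

    // Only now is the message actually removed from the queue.
    channel.BasicAck(deliveryTag: ea.DeliveryTag, multiple: false);
};
// autoAck: false is "ack mode"; passing true gives the riskier noack behavior.
channel.BasicConsume(queue: "my_queue", autoAck: false, consumer: consumer);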

How do I monitor RabbitMQ exchange lifecycle events

I'm working with a product suite which uses RabbitMQ as a back end for service bus messaging. Many of the clients use software (NeuronESB) which is supposed to automatically configure exchanges, queues and channels as needed. Somewhere in the system exchanges in Rabbit are being deleted and not re-created, resulting in unexpected issues. Because of the size of the system and closed source nature of at least one of the service bus clients, an audit of code has been unsuccessful in determining the source of the deletion of these exchanges.
I have tried using the firehose functionality of Rabbit, but that only provides the messages being sent through Rabbit, not the internal activities I need.
What methods are available for logging the creation and deletion of exchanges in RabbitMQ? Ideally I would like to know the date, time and client IP of the deleter, but even just getting the date and time would allow me to narrow my search of logs to help find the offender.
Try the Event Exchange plugin (rabbitmq_event_exchange); that should do the trick.
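With the plugin enabled (rabbitmq-plugins enable rabbitmq_event_exchange), the broker publishes internal events to the amq.rabbitmq.event topic exchange, and you can consume deletion events like any other message. A minimal sketch in the .NET client (header names follow the plugin's documentation):

using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

// Bind a private queue to the event exchange for exchange-deletion events.
var queue = channel.QueueDeclare().QueueName;
channel.QueueBind(queue, "amq.rabbitmq.event", routingKey: "exchange.deleted");

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (model, ea) =>
{
    // Event details (resource name, vhost, ...) arrive as message headers.
    var headers = ea.BasicProperties.Headers;
    Console.WriteLine("[{0}] exchange deleted: {1}", DateTime.Now,
        Encoding.UTF8.GetString((byte[])headers["name"]));
};
channel.BasicConsume(queue: queue, autoAck: true, consumer: consumer);
Console.ReadLine();

This gives you the date and time; recent RabbitMQ versions also include a user_who_performed_action header on these events, which helps narrow down the offender.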
If it doesn't work for some reason, the last resort I can think of is:
Get a test environment with fewer clients/messages if your app is busy, then analyze your traffic with Wireshark (it understands AMQP) to filter for the requests that delete exchanges.

Redis publish-subscribe: Is Redis guaranteed to deliver the message even under massive stress?

Provided that both the subscribed client and the publishing server keep their connections, is Redis guaranteed to always deliver the published message to the subscribed client eventually, even in situations where the client and/or server are massively stressed? Or should I plan for the possibility that Redis might occasionally drop messages as things get "hot"?
Redis absolutely does not provide any guaranteed delivery for publish-and-subscribe traffic. The mechanism is based only on sockets and event loops; there is no queue involved (not even in memory). If a subscriber is not listening while a publication occurs, the event is lost for that subscriber.
It is possible to implement some guaranteed-delivery mechanisms on top of Redis, but not with the publish-and-subscribe API. The list data type in Redis can be used as a queue, and as the foundation of more advanced queuing systems, but it does not provide multicast capabilities (so no publish-and-subscribe).
AFAIK, there is no obvious way to easily implement publish-and-subscribe and guaranteed delivery at the same time with Redis.
Redis does not provide guaranteed delivery through its Pub/Sub mechanism. Moreover, if a subscriber is not actively listening on a channel, it will not receive the messages that were published in the meantime.
I previously wrote a detailed article that describes how one can use Redis lists in combination with BLPOP to implement reliable multicast pub/sub delivery:
http://blog.radiant3.ca/2013/01/03/reliable-delivery-message-queues-with-redis/
For the record, here's the high-level strategy:
When each consumer starts up and gets ready to consume messages, it registers by adding itself to a Set representing all consumers registered on a queue.
When a producer publishes a message on a queue, it:
Saves the content of the message in a Redis key
Iterates over the set of consumers registered on the queue, and pushes the message ID onto a List for each registered consumer
Each consumer continuously watches for new entries in its consumer-specific list; when one comes in, it removes the entry (using a BLPOP operation), handles the message, and moves on to the next one. (A sketch follows below.)
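For illustration, here is a rough C# sketch of that strategy using StackExchange.Redis (key names are invented; note that StackExchange.Redis deliberately does not expose blocking commands like BLPOP, so this polls with ListLeftPop instead):

using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("localhost");
var db = redis.GetDatabase();

// Consumer startup: register in the queue's consumer set.
var consumerId = "consumer-1";
db.SetAdd("queue:myqueue:consumers", consumerId);

// Producer: store the payload once, then fan the ID out to every consumer.
void Publish(string messageId, string payload)
{
    db.StringSet("message:" + messageId, payload);
    foreach (var consumer in db.SetMembers("queue:myqueue:consumers"))
        db.ListRightPush("queue:myqueue:" + consumer, messageId);
}

// Consumer loop: pop IDs off our private list and process the messages.
while (true)
{
    var id = db.ListLeftPop("queue:myqueue:" + consumerId);
    if (id.IsNull) { Thread.Sleep(100); continue; } // stand-in for BLPOP
    var payload = db.StringGet("message:" + id);
    Console.WriteLine("processing {0}: {1}", id, payload);
}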
I have also made a Java implementation of these principles available open-source:
https://github.com/davidmarquis/redisq
These principles have been used to process about 1,000 messages per second from a single Redis instance and two instances of the consumer application, each instance consuming messages with 5 threads.

Notify consumer when a queue is deleted on rabbitmq

I have some clients connected to an exchange through queues declared with autodelete:yes. They are all both publishers and consumers, but for now let's assume they are just publishing messages. Because each client has a unique binding key, I can do explicit things with each message on the machine that consumes those messages. Everything works fine.
Now, if a client crashes or I terminate it manually (via SIGINT, Ctrl+C), its queue gets deleted. Is there any way I can notify the consumers on the remote machines that the queue has been deleted?
I'm thinking of adding a signal handler to my client application, so that whenever I catch a SIGINT or SIGTERM I notify the remote consumer (by sending it a message saying that the queue with this unique ID is going to be deleted).
Are there any other ways to do this, or is my way the correct one?
As a general rule in messaging, consuming applications do not care about the status of producing applications.
In RabbitMQ, producing applications may become aware of a consuming application's status by way of one of two mechanisms. The first (and preferred) method is a dead-letter exchange (DLX). When your message can't be delivered (because the destination queue no longer exists), it is routed there, and your application can pull messages off the queues configured on the DLX to figure out which ones didn't make it to their destination.
The second method is to set the mandatory flag on the message. This causes the broker to send the message right back to the producing application via a Basic.Return method when the destination queue is no longer there.
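In the .NET client, for instance, you would publish with mandatory: true and listen for returns on the channel (a fragment; the exchange and routing-key names are placeholders):

// Returned (unroutable) messages come back via the BasicReturn event.
channel.BasicReturn += (sender, ea) =>
    Console.WriteLine("no queue for routing key {0}: {1}", ea.RoutingKey, ea.ReplyText);

channel.BasicPublish(exchange: "my.exchange", routingKey: "device.42",
                     mandatory: true, basicProperties: null,
                     body: Encoding.UTF8.GetBytes("reboot"));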
If the above items don't meet your needs, you may want to revisit your architecture somewhat as there is probably a better way to design your application.

Why is pausing a queue not a broker function?

I was looking for an ActiveMQ broker admin command to tell it to pause a queue - that is:
continue accepting messages from producing clients
cease delivering to consuming clients, allowing the queue backlog to grow until the queue is resumed, whereupon the backlog is sent to clients.
I was unable to find such a command. The commonest answer was that it should be managed at the client end -- that is, locate every consumer and stop it. Other answers were workarounds, like manipulating network routes or firewalls so that the clients and broker could no longer communicate.
A cursory survey of other message queues indicates that ActiveMQ is not unusual in this regard.
It seems to me there are two reasons this functionality might not be implemented:
It is difficult to implement -- but I can't think of any reason why.
It is counter to the design philosophy of message queues
Which is it, and why?
Pausing a queue is supported in the newly released ActiveMQ 5.12.0. Quoting the JIRA issue (linked below):
When the queue is "paused":
no messages are sent to the associated consumers
messages are still enqueued on the queue
the queue can still be browsed
all the JMX counters for the queue remain available and correct
...
implemented pause/resume/isPaused queue view MBean ops and attribute
when paused, there is no dispatch to regular queue consumers; send and browse work as normal. Any in-flight messages will continue in flight until acked as normal.
See https://issues.apache.org/jira/browse/AMQ-5229
If you have Jolokia enabled (I think it is enabled by default nowadays), you can use something like the following curl request to pause the queue:
curl --user admin:admin http://127.0.0.1:8161/api/jolokia/exec/org.apache.activemq:brokerName=localhost,destinationName=myQueue,destinationType=Queue,type=Broker/pause
(Using the default username, password and broker name and a queue called myQueue)
Replace "pause" with "resume" in order to resume the queue.
Probably not too complicated to implement - as you say.
I don't know if it's an active design decision or if there has simply been no demand. Other similar products, such as IBM WebSphere MQ, implement "get/put inhibited" states on queues, so it's obviously not totally against the philosophy of messaging - rather, it's a tool for operating and troubleshooting live systems.
I'm a bit biased, but I actually like to decouple the sender from the receiver (if they are two different systems, they might eventually get switched/upgraded/changed...).
An easy way to decouple the systems, and still be able to do what you want, is to have the sender send to one queue, "DATA.OUT", and the receiver listen to another, "DATA.IN". Then you can use Apache Camel (which is typically bundled with ActiveMQ to achieve Enterprise Integration Patterns) to route from DATA.OUT to DATA.IN.
A Camel route can be started/stopped via JMX, which achieves something similar to what you described.
I guess the ActiveMQ design in this matter would rather have you do these kinds of things in a middleware layer, such as Apache Camel, than directly on the queues.