Scatter Gather : Wait for all "Gather-Workers" to complete [duplicate] - rabbitmq

I've configured a rabbitmq fanout exchange called "ex_foo" for a RPC workload. When clients connect to the server, they create their own non-durable RPC receive queue and connect to it with a BasicConsumer. The apps listen for messages/commands and respond to the queue defined in the reply_to part of the request.
One of the simple messages/commands I'm sending out to the fanout exchange (and thus, to every application/client connected to it) is a type of ping request message, and my problem is that I don't know how many ping responses I will get (or should expect), because I don't know how many clients are connected to the fanout exchange at any one time. All clients connected to the fanout exchange should reply.
If it gets delivered to 10 queues on the fanout exchange (i.e. 10 clients are connected), how do I know how many responses to expect? In order to know that, would I have to know how many times it was delivered? Is there anything more sophisticated than a sleep timer? Simply put, my admin tool can't just wait indefinitely and needs to quit after it has received all pings (or a time-out has elapsed).

What you are looking for is something like the Scatter-Gather (http://www.eaipatterns.com/BroadcastAggregate.html) pattern, isn’t it?
You don’t know which consumers are bound to the fanout exchange, so you can:
1) Implement a keep-alive from the consumer(s), using for example a queue that the producer consumes from. Each consumer sends a keep-alive every second; if you don’t receive one, you can consider that consumer offline.
2) Use an in-memory database where the consumers are registered (again with a keep-alive).
3) Use the management HTTP API to list the queues bound to the fanout exchange, in this way:
http://rabbitmqip/api/exchanges/vhost/yourfanout/bindings/source and the result is like this:
[{"source":"yourfanout","vhost":"/","destination":"amq.gen-xOpYc8m10Qy1s4KCNFCgFw","destination_type":"queue","routing_key":"","arguments":{},"properties_key":"~"},{"source":"yourfanout","vhost":"/","destination":"myqueue","destination_type":"queue","routing_key":"","arguments":{},"properties_key":"~"}]
Once you have counted the bound queues, you know how many replies to expect.
Call the API before sending each request.
NOTE: the last option only works if each consumer uses its own temporary queue bound to the exchange.
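As a rough sketch of the counting step in the HTTP API option, the snippet below parses a bindings list shaped like the one shown above and counts the distinct destination queues. The function name and the sample constant are made up for illustration, and it assumes one reply per bound queue, which only holds when each client has its own queue.

```python
import json

# Sample response body, shaped like the bindings list shown above.
SAMPLE_BINDINGS = """
[{"source":"yourfanout","vhost":"/","destination":"amq.gen-xOpYc8m10Qy1s4KCNFCgFw",
  "destination_type":"queue","routing_key":"","arguments":{},"properties_key":"~"},
 {"source":"yourfanout","vhost":"/","destination":"myqueue",
  "destination_type":"queue","routing_key":"","arguments":{},"properties_key":"~"}]
"""


def expected_reply_count(bindings_json: str) -> int:
    """Count distinct queues bound to the exchange (assumed: one reply per queue)."""
    bindings = json.loads(bindings_json)
    queues = {b["destination"] for b in bindings if b["destination_type"] == "queue"}
    return len(queues)


print(expected_reply_count(SAMPLE_BINDINGS))  # prints 2
```

A client can still connect or disconnect between the API call and the publish, so the count is best treated as a hint and combined with a time-out.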
I found this resource that could help you (http://geekswithblogs.net/michaelstephenson/archive/2012/08/06/150373.aspx)
I don't know your exact goal, but with a keep-alive you need to wait at most one second before deciding whether a consumer is alive.
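Whichever way the expected count is obtained, the admin tool still needs a gather loop that stops on "all replies received" or "time-out elapsed", whichever comes first. Here is a minimal sketch of that loop. It uses a standard-library queue.Queue as a stand-in for the reply queue, and the function and parameter names are hypothetical.

```python
import queue
import time


def gather_replies(replies: "queue.Queue[str]", expected: int, timeout_s: float) -> list:
    """Collect replies until `expected` have arrived or `timeout_s` has elapsed."""
    deadline = time.monotonic() + timeout_s
    collected = []
    while len(collected) < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # time-out elapsed: give up on the missing clients
        try:
            collected.append(replies.get(timeout=remaining))
        except queue.Empty:
            break  # nothing arrived in the remaining time
    return collected
```

With a real client library, the body of the loop would read from the reply_to queue instead; the deadline arithmetic stays the same.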

Related

RabbitMQ security design to declare queues from server (and use from client)

I have a test app (my first with RabbitMQ) which runs on partially trusted clients (in that I don't want them creating queues on their own), so I will look into the security permissions of the queues and the credentials that the clients connect with.
For messaging there are mostly one-way broadcasts from server to clients, and sometimes a query from the server to a specific client (for which the replies will be sent on a replyTo queue which is dedicated to that client and on which the server listens for responses).
I currently have a receive function on the server which looks out for "Announce" broadcasts from clients:
    agentAnnounceListener.Received += (model, ea) =>
    {
        var body = ea.Body;
        var props = ea.BasicProperties;
        var message = Encoding.UTF8.GetString(body);
        Console.WriteLine(
            "[{0}] from: {1}. body: {2}",
            DateTimeOffset.FromUnixTimeMilliseconds(ea.BasicProperties.Timestamp.UnixTime).Date,
            props.ReplyTo,
            message);
        // create return replyTo queue, snipped in next code section
    };
I am looking to create the replyTo queue in the above receive handler:
    var result = channel.QueueDeclare(
        queue: ea.BasicProperties.ReplyTo,
        durable: false,
        exclusive: false,
        autoDelete: false,
        arguments: null);
Alternatively, I could store the received announcements in a database, and on a regular timer run through this list and declare a queue for each on every pass.
In both scenarios this newly created queue would then be used at a future point by the server to send queries to the client.
My questions are:
1) Is it better to create the reply queue on the server when receiving the message from the client, or, if I do it externally (on a timer), are there any performance issues with declaring queues that already exist (there could be thousands of endpoints)?
2) If a client starts to misbehave, is there any way that it can be booted (in the receive function I can look up how many messages per minute it sends and boot it if certain criteria are met)? Are there any other filters that can be defined earlier in the pipeline, before receive, to kick clients who are sending too many messages?
3) In the above example, notice that my messages continuously come in on each run (the same old messages). How do I clear them out?
I think preventing clients from creating queues just complicates the design without much security benefit.
You are allowing clients to create messages. In RabbitMQ, it's not very easy to stop clients from flooding your server with messages.
If you want to rate-limit your clients, RabbitMQ may not be the best choice. It rate-limits automatically when the server starts to struggle with processing all the messages, but you can't set a strict per-client rate limit on the server with an out-of-the-box solution. Also, clients are normally allowed to create queues.
Approach 1 - Web App
Maybe you should try using a web application instead:
Clients authenticate with your server.
To Announce, clients send a POST request to a certain endpoint, e.g. /api/announce, maybe providing some credentials that allow them to do so.
To receive incoming messages: GET /api/messages
To acknowledge a processed message: POST /api/acknowledge
When a client acknowledges receipt, you delete the message from your database.
With this design, you can write custom logic to rate-limit or ban clients that misbehave, and you have full control of your server.
Approach 2 - RabbitMQ Management API
If you still want to use RabbitMQ, you can potentially achieve what you want by using the RabbitMQ Management API.
You'll need to write an app that queries the RabbitMQ Management API on a timer and:
Gets all the current connections, and checks the message rate for each of them.
If the message rate exceeds your threshold, closes the connection or revokes the user's permissions using the /api/permissions/vhost/user endpoint.
In my opinion, a web app may be easier if you don't need all the queueing functionality, like worker queues or complicated routing, that you get out of the box with RabbitMQ.
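Either approach needs some rule for deciding that a client is over the limit. One possible rule is a per-client sliding window, sketched below. The class name, the threshold and the idea of passing timestamps in explicitly are all assumptions made for illustration, not part of RabbitMQ or its Management API.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Tracks message timestamps per client over a sliding window."""

    def __init__(self, max_messages: int, window_s: float = 60.0):
        self.max_messages = max_messages
        self.window_s = window_s
        self._seen = defaultdict(deque)

    def record(self, client_id: str, now: float) -> bool:
        """Record one message; return True if the client is over the limit."""
        stamps = self._seen[client_id]
        stamps.append(now)
        # Discard timestamps that have fallen out of the window.
        while stamps and stamps[0] <= now - self.window_s:
            stamps.popleft()
        return len(stamps) > self.max_messages
```

In the web app this could run inside the request handler; in the Management API variant the polling app would feed it the rates it observes and act on a True result.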
Here are some general architecture/reliability ideas for your scenario. Responses to your 3 specific questions are at the end.
General Architecture Ideas
I'm not sure that the declare-response-queues-on-server approach yields performance/stability benefits; you'd have to benchmark that. I think the simplest topology to achieve what you want is the following:
Each client, when it connects, declares an exclusive and/or autodelete anonymous queue. If the clients' network connectivity is so sketchy that holding open a direct connection is undesirable, do something similar to Alex's proposed "Web App" above, and have clients hit an endpoint that declares an exclusive/autodelete queue on their behalf, and closes the connection (automatically deleting the queue upon consumer departure) when a client doesn't get in touch regularly enough. This should only be done if you can't tune the RabbitMQ heartbeats from the clients to work in the face of network unreliability, or if you can prove that you need queue-creation rate limiting inside the web app layer.
Each client's queue is bound to a broadcast topic exchange, which the server uses to communicate broadcast messages (wildcarded routing key) or specifically targeted messages (routing key that only matches one client's queue name).
When the server needs to get a reply back from the clients, you could either have the server declare the response queue before sending the "response-needed" message, and encode the response queue in the message (basically what you're doing now), or you could build semantics in your clients in which they stop consuming from their broadcast queue for a fixed amount of time before attempting an exclusive (mutex) consume again, publish their responses to their own queue, and ensure that the server consumes those responses within the allotted time, before closing the server consume and restoring normal broadcast semantics. That second approach is much more complicated and likely not worth it, though.
Preventing Clients Overwhelming RabbitMQ
Things that can reduce the server load and help prevent clients DoSing your server with RMQ operations include:
Setting appropriate, low max-length thresholds on all the queues, so the amount of messages stored by the server will never exceed a certain multiple of the number of clients.
Setting per-queue expirations, or per-message expirations, to make sure that stale messages do not accumulate.
Rate-limiting specific RabbitMQ operations is quite tricky, but you can rate-limit at the TCP level (using e.g. HAProxy or other router/proxy stacks), to ensure that your clients don't send too much data, or open too many connections, at a time. In my experience (just one data point; if in doubt, benchmark!) RabbitMQ cares less about the count of messages ingested per unit of time than it does about the data volume and the largest possible per-message size ingested. Lots of small messages are usually OK; a few huge ones can cause latency spikes. Otherwise, rate-limiting the bytes at the TCP layer will probably allow you to scale such a system very far before you have to re-assess.
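For the first two points, the limits are expressed as optional queue arguments at declare time. A small sketch of what such an argument table might look like follows; the helper names and the example numbers are hypothetical, while x-max-length, x-message-ttl and x-expires are the standard RabbitMQ argument names.

```python
def bounded_queue_arguments(max_length: int, message_ttl_ms: int, idle_expiry_ms: int) -> dict:
    """Optional queue arguments that cap how much a single queue can hold."""
    return {
        "x-max-length": max_length,       # oldest messages are dropped past this count
        "x-message-ttl": message_ttl_ms,  # per-queue message expiry, in milliseconds
        "x-expires": idle_expiry_ms,      # delete the queue after it sits unused this long
    }


def worst_case_stored_messages(clients: int, max_length: int) -> int:
    """Upper bound on ready messages across all client queues."""
    return clients * max_length


args = bounded_queue_arguments(max_length=100, message_ttl_ms=30_000, idle_expiry_ms=600_000)
```

The dictionary would be passed as the arguments of whatever queue-declare call your client library offers (the `arguments:` parameter of QueueDeclare in the .NET snippet above, for example).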
Specific Answers
In light of the above, my answers to your specific questions would be:
Q: Should you create reply queues on the server in response to received messages?
A: Yes, probably. If you're worried about the queue-creation rate that happens as a result of that, you can rate-limit per server instance. It looks like you're using Node, so you should be able to use one of the existing solutions for that platform to have a single queue-creation rate limiter per node server instance, which, unless you have many thousands of servers (not clients), should allow you to reach a very, very large scale before re-assessing.
Q: Are there performance implications to declaring queues based on client actions? Or re-declaring queues?
A: Benchmark and see! Re-declares are probably OK; if you rate-limit properly you may not need to worry about this at all. In my experience, floods of queue-declare events can cause latency to go up a bit, but don't break the server. But that's just my experience! Everyone's scenario/deployment is different, so there's no substitute for benchmarking. In this case, you'd fire up a publisher/consumer with a steady stream of messages, tracking e.g. publish/confirm latency or message-received latency, rabbitmq server load/resource usage, etc. While some number of publish/consume pairs were running, declare a lot of queues in high parallel and see what happens to your metrics. Also in my experience, the redeclaration of queues (idempotent) doesn't cause much if any noticeable load spikes. More important to watch is the rate of establishing new connections/channels. You can also rate-limit queue creations very effectively on a per-server basis (see my answer to the first question), so I think if you implement that correctly you won't need to worry about this for a long time. Whether RabbitMQ's performance suffers as a function of the number of queues that exist (as opposed to declaration rate) would be another thing to benchmark though.
Q: Can you kick clients based on misbehavior? Message rates?
A: Yes, though it's a bit tricky to set up, this can be done in an at least somewhat elegant way. You have two options:
Option one: what you proposed: keep track of message rates on your server, as you're doing, and "kick" clients based on that. This has coordination problems if you have more than one server, and requires writing code that lives in your message-receive loops, and doesn't trip until RabbitMQ actually delivers the messages to your server's consumers. Those are all significant drawbacks.
Option two: use max-length, and dead letter exchanges to build a "kick bad clients" agent. The length limits on RabbitMQ queues tell the queue system "if more messages than X are in the queue, drop them or send them to the dead letter exchange (if one is configured)". Dead-letter exchanges allow you to send messages that are greater than the length (or meet other conditions) to a specific queue/exchange. Here's how you can combine those to detect clients that publish messages too quickly (faster than your server can consume them) and kick clients:
Each client declares its main $clientID_to_server queue with a max-length of some number, say X, that should never build up in the queue unless the client is "outrunning" the server. That queue has a dead-letter topic exchange named ratelimit (or some other constant name).
Each client also declares/owns a queue called $clientID_overwhelm, with a max-length of 1. That queue is bound to the ratelimit exchange with a routing key of $clientID_to_server. This means that when messages are published to the $clientID_to_server queue at too great a rate for the server to keep up, the messages will be routed to $clientID_overwhelm, but only one will be kept around (so you don't fill up RabbitMQ, and only ever store X+1 messages per client).
You start a simple agent/service which discovers (e.g. via the RabbitMQ Management API) all connected client IDs, and consumes (using just one connection) from all of their *_overwhelm queues. Whenever it receives a message on that connection, it gets the client ID from the routing key of that message, and then kicks that client (either by doing something out-of-band in your app; deleting that client's $clientID_to_server and $clientID_overwhelm queues, thus forcing an error the next time the client tries to do anything; or closing that client's connection to RabbitMQ via the /connections endpoint in the RabbitMQ management API--this is pretty intrusive and should only be done if you really need to). This service should be pretty easy to write, since it doesn't need to coordinate state with any other parts of your system besides RabbitMQ. You'll lose some messages from misbehaving clients with this solution, though: if you need to keep them all, remove the max-length limit on the overwhelm queue (and run the risk of filling up RabbitMQ).
Using that approach, you can detect spamming clients as they happen according to RabbitMQ, not just as they happen according to your server. You could extend it by also adding a per-message TTL to messages sent by the clients, and triggering the dead-letter-kick behavior if messages sit in the queue for more than a certain amount of time--this would change the pseudo-rate-limiting from "when the server consumer gets behind by message count" to "when the server consumer gets behind by message delivery timestamp".
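To make the X+1 bound concrete, here is a small in-memory model of the two queues. It is not broker code: BoundedQueue is a made-up class that only mimics a length limit with head-drop overflow (RabbitMQ's default) and an optional dead-letter target, and X = 3 is an arbitrary example value.

```python
from collections import deque


class BoundedQueue:
    """In-memory stand-in for a queue with x-max-length and an optional dead-letter target."""

    def __init__(self, max_length: int, dead_letter_to: "BoundedQueue | None" = None):
        self.max_length = max_length
        self.dead_letter_to = dead_letter_to
        self.messages = deque()

    def publish(self, message: str) -> None:
        self.messages.append(message)
        # Over the limit: evict from the head (oldest first), dead-lettering if configured.
        while len(self.messages) > self.max_length:
            evicted = self.messages.popleft()
            if self.dead_letter_to is not None:
                self.dead_letter_to.publish(evicted)


X = 3  # hypothetical per-client limit
overwhelm = BoundedQueue(max_length=1)                            # $clientID_overwhelm
to_server = BoundedQueue(max_length=X, dead_letter_to=overwhelm)  # $clientID_to_server

for i in range(10):  # the client publishes 10 messages while the server consumes none
    to_server.publish(f"msg-{i}")

client_should_be_kicked = len(overwhelm.messages) > 0
stored = len(to_server.messages) + len(overwhelm.messages)
```

After the loop the main queue holds the three newest messages, the overwhelm queue holds a single dead-lettered one, and the kick agent has its signal.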
Q: Why do messages get redelivered on each run, and how do I get rid of them?
A: Use acknowledgements or noack (but probably acknowledgements). Getting a message in "receive" just pulls it into your consumer, but doesn't pop it from the queue. It's like a database transaction: to finally pop it you have to acknowledge it after you receive it. Alternatively, you could start your consumer in "noack" mode, which will cause the receive behavior to work the way you assumed it would. However, be warned, noack mode imposes a big tradeoff: since RabbitMQ is delivering messages to your consumer out-of-band (basically: even if your server is locked up or sleeping, if it has issued a consume, rabbit is pushing messages to it), if you consume in noack mode those messages are permanently removed from RabbitMQ when it pushes them to the server, so if the server crashes or shuts down before draining its "local queue" with any messages pending-receive, those messages will be lost forever. Be careful with this if it's important that you don't lose messages.

RabbitMQ: dropping messages when no consumers are connected

I'm trying to set up RabbitMQ in a model where there is only one producer and one consumer, and where messages sent by the producer are delivered to the consumer only if the consumer is connected, but dropped if the consumer is not present.
Basically I want the queue to drop all the messages it receives when no consumer is connected to it.
An additional constraint is that the queue must be declared on the RabbitMQ server side, and must not be explicitly created by the consumer or the producer.
Is that possible?
I've looked at a few things, but I can't seem to make it work:
durable vs non-durable does not work, because it is only useful when the broker restarts. I need the same effect but on a connection.
setting auto_delete to true on the queue means that my client can never connect to this queue again.
x-message-ttl and max-length make it possible to lose message even when there is a consumer connected.
I've looked at topic exchanges, but as far as I can tell, these only affect the routing of messages between the exchange and the queue based on the message content, and can't take into account whether or not a queue has connected consumers.
The effect that I'm looking for would be something like auto_delete on disconnect, and auto_create on connect. Is there a mechanism in rabbitmq that lets me do that?
After a bit more research, I discovered that one of the assumptions in my question regarding x-message-ttl was wrong. I overlooked a single sentence from the RabbitMQ documentation:
Setting the TTL to 0 causes messages to be expired upon reaching a queue unless they can be delivered to a consumer immediately
https://www.rabbitmq.com/ttl.html
It turns out that the simplest solution is to set x-message-ttl to 0 on my queue.
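For reference, a sketch of the two places that setting could live. The first is the optional queue argument; the second is the JSON body of a policy, which may suit the constraint that the queue is configured on the server side rather than by the clients. The constant and function names are made up, and the pattern in the usage line is only an example.

```python
import json

# Optional argument to pass when the queue is declared.
DROP_WHEN_IDLE_ARGUMENTS = {"x-message-ttl": 0}


def ttl_policy_body(queue_pattern: str) -> str:
    """JSON body for a policy that applies a zero message TTL to matching queues."""
    return json.dumps({
        "pattern": queue_pattern,
        "definition": {"message-ttl": 0},
        "apply-to": "queues",
    })


print(ttl_policy_body("^events$"))  # "^events$" is a hypothetical queue-name pattern
```

Note the key is x-message-ttl as a queue argument but message-ttl inside a policy definition.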
You cannot do it directly, but there is a mechanism that is not difficult to implement.
You have to enable the Event Exchange Plugin. It provides an exchange that your server app can bind a queue to in order to receive RabbitMQ's internal events. You would be interested in the consumer.created and consumer.deleted events.
When these events are received you can trigger an action (create or delete the queue you need). More information here: https://www.rabbitmq.com/event-exchange.html
Hope this helps.
If your consumer is allowed to dynamically bind/unbind a queue on the broker during start/stop, it should be possible that way (e.g. the queue is set up in advance, and during startup the consumer binds it to the exchange it wants to receive messages from).

RabbitMQ same message to each consumer

I have implemented the example from the RabbitMQ website:
RabbitMQ Example
I have expanded it to have an application with a button to send a message.
Now I have started two consumers on two different computers.
When I send messages, the first message is sent to computer1, the second to computer2, the third to computer1, and so on.
Why is this, and how can I change the behavior to send each message to each consumer?
Why is this
As noted by Yazan, messages are consumed from a single queue in a round-robin manner. The behavior you are seeing is by design, making it easy to scale up the number of consumers for a given queue.
how can I change the behavior to send each message to each consumer?
To have each consumer receive the same message, you need to create a queue for each consumer and deliver the same message to each queue.
The easiest way to do this is to use a fanout exchange. This will send every message to every queue that is bound to the exchange, completely ignoring the routing key.
If you need more control over the routing, you can use a topic or direct exchange and manage the routing keys.
Whatever type of exchange you choose, though, you will need to have a queue per consumer and have each message routed to each queue.
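The difference between the two behaviours can be shown with a toy model (plain Python, no broker involved). The function names are invented for this sketch, and it ignores things like prefetch and acknowledgements that influence real dispatch order.

```python
from itertools import cycle


def round_robin(messages, consumers):
    """One shared queue: each message goes to exactly one consumer, in turn."""
    received = {c: [] for c in consumers}
    for consumer, message in zip(cycle(consumers), messages):
        received[consumer].append(message)
    return received


def fanout(messages, consumers):
    """One queue per consumer: every queue gets a copy of every message."""
    return {c: list(messages) for c in consumers}


msgs = ["m1", "m2", "m3"]
print(round_robin(msgs, ["computer1", "computer2"]))
print(fanout(msgs, ["computer1", "computer2"]))
```

The first call reproduces what the question describes (m1 and m3 on computer1, m2 on computer2); the second is what a fanout exchange with one queue per consumer gives you.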
You can't; it's controlled by the server. Check the "Round-robin dispatching" section of the tutorial.
The server decides which consumer's turn it is. I'm not sure whether there is a set of algorithms you can pick from, but in the end the server controls this (I think round-robin is the default),
unless you want to use routing keys and exchanges.
I would see this more as a design question. Ideally, producers should create the exchanges and the consumers create the queues and each consumer can create its own queue and hook it up to an exchange. This makes sure every consumer gets its message with its private queue.
What you're doing is essentially the 'worker queues' model, which is used to distribute tasks among worker nodes. Since each task needs to be performed only once, the message is sent to only one node. If you want to send a message to all the nodes, you need a different model called 'pub-sub', where each message is broadcast to all the subscribers. The following link shows a simple pub-sub tutorial
https://www.rabbitmq.com/tutorials/tutorial-three-python.html

API design around RabbitMQ for publisher/subscriber

TL;DR - What's the best way to expose RabbitMQ to a consumer via a REST API?
I'm creating an API to publish and consume messages from RabbitMQ. In my current design, the publisher is going to make a POST request. My API will route the POST request to the exchange. This way, the publisher doesn't have to know the server address, exchange name, etc. while publishing.
Now the consumer part is where I'm not sure how to proceed.
At the beginning there will be no queues. When a new consumer wants to subscribe to a TOPIC, I will create a queue and bind it to the exchange. I need help with answers to a few questions:
Once I create a queue for the consumer, what's the next step to let the consumer get messages from that queue?
I make the consumer ask for a batch of messages (say 50 messages) from the queue. Then once I receive an ack from the consumer I will send the next 50 messages from the queue. If I don't receive an ack I will requeue the 50 messages back into the queue. Isn't this expensive in terms of opening and closing the connection between the consumer and my API?
If there is a better approach, please suggest it.
In general, your idea of putting RMQ behind a REST API is a good one. You don't want to expose RMQ to the world, directly.
For the specific questions:
Once I create a queue for the consumer, what's the next step to let the consumer get messages from that queue?
Have you read the tutorials? I would start there, for the language you are working with: http://www.rabbitmq.com/getstarted.html
Isn't this expensive in terms of opening and closing connection between the consumer and my API?
Don't open and close connections for each batch of messages.
Your application instance (the "consumer" app) should have a single connection. That connection stays open as long as you need it - across as many calls to RabbitMQ as you want.
I typically open my RMQ connection as soon as the app starts, and I leave it open until the app shuts down.
Within the consumer app, using that one single connection, you will create multiple channels through the connection. A channel is where the actual work is done.
Depending on your language, you will have a single channel per thread; a single channel per queue being consumed; etc
You can create and destroy channels very quickly, unlike connections.
More specifically, with your idea of batch processing, this will be handled by putting a consumer prefetch limit on your consumer and then requiring each message to be acknowledged after it has been processed.
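How a prefetch limit plus manual acknowledgements gives you "batches" without reopening anything can be sketched with a small in-memory model. PrefetchChannel is a hypothetical class written for this illustration, not a client-library type, and the requeue-on-disconnect ordering is a simplification.

```python
from collections import deque


class PrefetchChannel:
    """In-memory model of a consumer with a prefetch limit and manual acks."""

    def __init__(self, messages, prefetch: int):
        self.ready = deque(messages)   # still in the queue
        self.unacked = {}              # delivered, awaiting ack (tag -> message)
        self.prefetch = prefetch
        self._next_tag = 1

    def deliver(self) -> list:
        """Deliver messages until `prefetch` are outstanding; return the new batch."""
        batch = []
        while self.ready and len(self.unacked) < self.prefetch:
            message = self.ready.popleft()
            self.unacked[self._next_tag] = message
            batch.append((self._next_tag, message))
            self._next_tag += 1
        return batch

    def ack(self, tag: int) -> None:
        """Acknowledge one delivery, freeing a slot for the next message."""
        del self.unacked[tag]

    def connection_lost(self) -> None:
        """Unacked messages go back to the queue, ahead of the ones never delivered."""
        for tag in sorted(self.unacked, reverse=True):
            self.ready.appendleft(self.unacked.pop(tag))
```

With prefetch set to 50, the consumer never holds more than 50 unacknowledged messages, each ack lets one more through, and anything unacknowledged when the connection drops becomes available again, which covers the "requeue the 50 messages" case from the question.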
