Exchanges/queues access permissions in RabbitMQ? - rabbitmq

I'm writing a mobile application and considering RabbitMQ for user-to-user messaging. As far as I understand, it's possible for each client to listen on it's unique queue (named, say, after user id) and message to other clients by means of publishing to their queues. But I'm worried that such model is unreliable as nasty clients may, say, read from someone else's queue thus disrupting the whole scheme.
Is there a way to impose basic read-write permissions on RabbitMQ queues/exchanges thus providing security? Or am I getting the whole concept wrong and peer-to-peer communication just isn't what the thing is designed for?

Access control is implemented in RabbitMQ. If you want user U0 to have only read access to queue Q0, you can set U0 read permission to ^QO$. If you want finer control over users permissions, you'll have to implement this in your application, server-side.
Yet, I'd say that your design doesn't scale well. Although there isn't any hardcoded limit for the number of queues and exchanges you can delcare, you should keep this number quite reasonable since it's a bit resource-intensive. 1 million users would mean 1 million queue/exchange processes and possibly 1 million opened sockets if each user is consuming its queue. Not sure a single host would handle this properly.
For a messaging app, I would rather recommend you use any persistent database. RabbitMQ is not really suited for this kind of use-cases.

Related

RabbitMQ security design to declare queues from server (and use from client)

I have a test app (first with RabbitMQ) which runs on partially trusted clients (in that i don't want them creating queues on their own), so i will look into the security permissions of the queues and credentials that the clients connect with.
For messaging there are mostly one-way broadcasts from server to clients, and sometimes a query from server to a specific client (over which the replies will be sent on a replyTo queue which is dedicated to that client on which the server listens for responses).
I currently have a receive function on the server which looks out for "Announce" broadcast from clients:
agentAnnounceListener.Received += (model, ea) =>
{
var body = ea.Body;
var props = ea.BasicProperties;
var message = Encoding.UTF8.GetString(body);
Console.WriteLine(
"[{0}] from: {1}. body: {2}",
DateTimeOffset.FromUnixTimeMilliseconds(ea.BasicProperties.Timestamp.UnixTime).Date,
props.ReplyTo,
message);
// create return replyTo queue, snipped in next code section
};
I am looking to create the return to topic in the above receive handler:
var result = channel.QueueDeclare(
queue: ea.BasicProperties.ReplyTo,
durable: false,
exclusive: false,
autoDelete: false,
arguments: null);
Alternatively, i could store the received announcements in a database, and on a regular timer run through this list and declare a queue for each on every pass.
In both scenarioes this newly created channel would then be used at a future point by the server to send queries to the client.
My questions are please:
1) Is it better to create a reply channel on the server when receiving the message from client, or if i do it externally (on a timer) are there any performance issues for declaring queues that already exist (there could be thousands of end points)?
2) If a client starts to miss behave, is there any way that they can be booted (in the receive function i can look up how many messages per minute and boot if certain criteria are met)? Are there any other filters that can be defined prior to receive in the pipeline to kick clients who are sending too many messages?
3) In the above example notice my messages continuously come in each run (the same old messages), how do i clear them out please?
I think preventing clients from creating queues just complicates the design without much security benefit.
You are allowing clients to create messages. In RabbitMQ, its not very easy to stop clients from flooding your server with messages.
If you want to rate-limit your clients, RabbitMQ may not be the best choice. It does rate-limiting automatically when servers starts to struggle with processing all the messages, but you can't set a strict rate limit on per-client basis on the server using out-of-the-box solution. Also, clients are normally allowed to create queues.
Approach 1 - Web App
Maybe you should try to use web application instead:
Clients authenticate with your server
To Announce, clients send a POST request to a certain endpoint, ie /api/announce, maybe providing some credentials that allow them to do so
To receive incoming messages, GET /api/messages
To acknowledge processed message: POST /api/acknowledge
When client acknowledges receipt, you delete your message from database.
With this design, you can write custom logic to rate-limit or ban clients that misbehave and you have full control of your server
Approach 2 - RabbitMQ Management API
If you still want to use RabbitMQ, you can potentially achieve what you want by using RabbitMQ Management API
You'll need to write an app that will query RabbitMQ Management API on timer basis and:
Get all the current connections, and check message rate for each of them.
If message rate exceed your threshold, close connection or revoke user's permissions using /api/permissions/vhost/user endpoint.
In my opinion, web app may be easier if you don't need all the queueing functionality like worker queues or complicated routing that you can get out of the box with RabbitMQ.
Here are some general architecture/reliability ideas for your scenario. Responses to your 3 specific questions are at the end.
General Architecture Ideas
I'm not sure that the declare-response-queues-on-server approach yields performance/stability benefits; you'd have to benchmark that. I think the simplest topology to achieve what you want is the following:
Each client, when it connects, declares an exclusive and/or autodelete anonymous queue. If the clients' network connectivity is so sketchy that holding open a direct connection is undesirable, so something similar to Alex's proposed "Web App" above, and have clients hit an endpoint that declares an exclusive/autodelete queue on their behalf, and closes the connection (automatically deleting the queue upon consumer departure) when a client doesn't get in touch regularly enough. This should only be done if you can't tune the RabbitMQ heartbeats from the clients to work in the face of network unreliability, or if you can prove that you need queue-creation rate limiting inside the web app layer.
Each client's queue is bound to a broadcast topic exchange, which the server uses to communicate broadcast messages (wildcarded routing key) or specifically targeted messages (routing key that only matches one client's queue name).
When the server needs to get a reply back from the clients, you could either have the server declare the response queue before sending the "response-needed" message, and encode the response queue in the message (basically what you're doing now), or you could build semantics in your clients in which they stop consuming from their broadcast queue for a fixed amount of time before attempting an exclusive (mutex) consume again, publish their responses to their own queue, and ensure that the server consumes those responses within the allotted time, before closing the server consume and restoring normal broadcast semantics. That second approach is much more complicated and likely not worth it, though.
Preventing Clients Overwhelming RabbitMQ
Things that can reduce the server load and help prevent clients DoSing your server with RMQ operations include:
Setting appropriate, low max-length thresholds on all the queues, so the amount of messages stored by the server will never exceed a certain multiple of the number of clients.
Setting per-queue expirations, or per-message expirations, to make sure that stale messages do not accumulate.
Rate-limiting specific RabbitMQ operations is quite tricky, but you can rate-limit at the TCP level (using e.g. HAProxy or other router/proxy stacks), to ensure that your clients don't send too much data, or open too many connections, at a time. In my experience (just one data point; if in doubt, benchmark!) RabbitMQ cares less about the count of messages ingested per time than it does the data volume and largest possible per-message size ingested. Lots of small messages are usually OK; a few huge ones can cause latency spikes, otherwise, rate-limiting the bytes at the TCP layer will probably allow you to scale such a system very far before you have to re-assess.
Specific Answers
In light of the above, my answers to your specific questions would be:
Q: Should you create reply queues on the server in response to received messages?
A: Yes, probably. If you're worried about the queue-creation rate
that happens as a result of that, you can rate-limit per server instance. It looks like you're using Node, so you should be able to use one of the existing solutions for that platform to have a single queue-creation rate limiter per node server instance, which, unless you have many thousands of servers (not clients), should allow you to reach a very, very large scale before re-assessing.
Q: Are there performance implications to declaring queues based on client actions? Or re-declaring queues?
A: Benchmark and see! Re-declares are probably OK; if you rate-limit properly you may not need to worry about this at all. In my experience, floods of queue-declare events can cause latency to go up a bit, but don't break the server. But that's just my experience! Everyone's scenario/deployment is different, so there's no substitute for benchmarking. In this case, you'd fire up a publisher/consumer with a steady stream of messages, tracking e.g. publish/confirm latency or message-received latency, rabbitmq server load/resource usage, etc. While some number of publish/consume pairs were running, declare a lot of queues in high parallel and see what happens to your metrics. Also in my experience, the redeclaration of queues (idempotent) doesn't cause much if any noticeable load spikes. More important to watch is the rate of establishing new connections/channels. You can also rate-limit queue creations very effectively on a per-server basis (see my answer to the first question), so I think if you implement that correctly you won't need to worry about this for a long time. Whether RabbitMQ's performance suffers as a function of the number of queues that exist (as opposed to declaration rate) would be another thing to benchmark though.
Q: Can you kick clients based on misbehavior? Message rates?
A: Yes, though it's a bit tricky to set up, this can be done in an at least somewhat elegant way. You have two options:
Option one: what you proposed: keep track of message rates on your server, as you're doing, and "kick" clients based on that. This has coordination problems if you have more than one server, and requires writing code that lives in your message-receive loops, and doesn't trip until RabbitMQ actually delivers the messages to your server's consumers. Those are all significant drawbacks.
Option two: use max-length, and dead letter exchanges to build a "kick bad clients" agent. The length limits on RabbitMQ queues tell the queue system "if more messages than X are in the queue, drop them or send them to the dead letter exchange (if one is configured)". Dead-letter exchanges allow you to send messages that are greater than the length (or meet other conditions) to a specific queue/exchange. Here's how you can combine those to detect clients that publish messages too quickly (faster than your server can consume them) and kick clients:
Each client declares it's main $clientID_to_server queue with a max-length of some number, say X that should never build up in the queue unless the client is "outrunning" the server. That queue has a dead-letter topic exchange of ratelimit or some constant name.
Each client also declares/owns a queue called $clientID_overwhelm, with a max-length of 1. That queue is bound to the ratelimit exchange with a routing key of $clientID_to_server. This means that when messages are published to the $clientID_to_server queue at too great a rate for the server to keep up, the messages will be routed to $clientID_overwhelm, but only one will be kept around (so you don't fill up RabbitMQ, and only ever store X+1 messages per client).
You start a simple agent/service which discovers (e.g. via the RabbitMQ Management API) all connected client IDs, and consumes (using just one connection) from all of their *_overwhelm queues. Whenever it receives a message on that connection, it gets the client ID from the routing key of that message, and then kicks that client (either by doing something out-of-band in your app; deleting that client's $clientID_to_server and $clientID_overwhelm queues, thus forcing an error the next time the client tries to do anything; or closing that client's connection to RabbitMQ via the /connections endpoint in the RabbitMQ management API--this is pretty intrusive and should only be done if you really need to). This service should be pretty easy to write, since it doesn't need to coordinate state with any other parts of your system besides RabbitMQ. You'll lose some messages from misbehaving clients with this solution, though: if you need to keep them all, remove the max-length limit on the overwhelm queue (and run the risk of filling up RabbitMQ).
Using that approach, you can detect spamming clients as they happen according to RabbitMQ, not just as they happen according to your server. You could extend it by also adding a per-message TTL to messages sent by the clients, and triggering the dead-letter-kick behavior if messages sit in the queue for more than a certain amount of time--this would change the pseudo-rate-limiting from "when the server consumer gets behind by message count" to "when the server consumer gets behind by message delivery timestamp".
Q: Why do messages get redelivered on each run, and how do I get rid of them?
A: Use acknowledgements or noack (but probably acknowledgements). Getting a message in "receive" just pulls it into your consumer, but doesn't pop it from the queue. It's like a database transaction: to finally pop it you have to acknowledge it after you receive it. Altnernatively, you could start your consumer in "noack" mode, which will cause the receive behavior to work the way you assumed it would. However, be warned, noack mode imposes a big tradeoff: since RabbitMQ is delivering messages to your consumer out-of-band (basically: even if your server is locked up or sleeping, if it has issued a consume, rabbit is pushing messages to it), if you consume in noack mode those messages are permanently removed from RabbitMQ when it pushes them to the server, so if the server crashes or shuts down before draining its "local queue" with any messages pending-receive, those messages will be lost forever. Be careful with this if it's important that you don't lose messages.

Read all messages from the very begining

Consider a group chat scenario where 4 clients connect to a topic on an exchange. These clients each send an receive messages to the topic and as a result, they all send/receive messages from this topic.
Now imagine that a 5th client comes in and wants to read everything that was send from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW since I was unable to find it on the documentations anywhere (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).
Specifically, the question is: is there a way for RabbitMQ to output all messages having been sent to a topic upon a new subscriber joining?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up-front, the participant queues can be set up and configured in advance, subscribed to the topic, and will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configurations can yield queues that persist across server reboots.
Note that the original question/feature request as-described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages dumped into queues are intended to be processed by only one message consumer, and once processed, the message broker's job is to forget about the message.
One other way of implementing such a functionality is to have an audit queue, where all published messages are distributed to the queue, and a writer service writes them all to an audit log somewhere (usually in a persistent data store or text file). This would be something you would have to build, as there is currently no plug-in to automatically send messages out to a persistent storage (e.g. Couchbase, Elasticsearch).
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.
What you would like to do is not a correct usage for RabbitMQ. Message Queues are not databases. They are not long term persistence solutions, like a RDBMS is. You can mainly use RabbitMQ as a buffer for processing incoming messages, which after the consumer handles it, get inserted into the database. When a new client connects to you service, the database will be read, not the message queue.
Relevant
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.
Apache Kafka is the right solution for this use-case. "Log Compaction enabled topics" a.k.a. compacted topics are specifically designed for this usecase. But the catch is, obviously your messages have to be idempotent, strictly no delta-business. Because kafka will compact from time to time and may retain only the last message of a "key".

RabbitMQ Pub/Sub setup with large number of disconnected clients...

This is a new area for me so hopefully my question makes sense.
In my program I have a large number of clients which are windows services running on laptops - that are often disconnected. Occasionally they come on line and I want them to receive updates based on user profiles. There are many types of notifications that require the client to perform some work on the local application (i.e. the laptop).
I realize that I could do this with a series of restful database queries, but since there are so many clients (upwards to 10,000) and there are lots of different notification types, I was curious if perhaps this was not a problem better suited for a messaging product like RabbitMQ or even 0MQ.
But how would one set this up. (let's assume in RabbitMQ?
Would each user be assigned their own queue?
Or is it preferable to have each queue be a distinct notification type and you would use some combination of direct exchanges or filtering messages based on a routing key, where the routing key could be a username.
Since each user may potentially have a different set of notifications based on their user profile, I am thinking that each client/consumer would have a specific message for each notification sitting on a queue waiting for them to come online and process it.
Is this the right way of thinking about the problem? Thanks in advance.
It will be easier for you to balance a lot of queues than filter long ones, so it's better to use queue per consumer.
Messages can have arbitrary headers and body so it is the right place for notification types.
Since you will be using long-living queues, waiting for consumers on disk - you better use lazy queues https://www.rabbitmq.com/lazy-queues.html (it's available since version 3.6.0)

What is an MQ and why do I want to use it?

On my team at work, we use the IBM MQ technology a lot for cross-application communication. I've seen lately on Hacker News and other places about other MQ technologies like RabbitMQ. I have a basic understanding of what it is (a commonly checked area to put and get messages), but what I want to know what exactly is it good at? How will I know where I want to use it and when? Why not just stick with more rudimentary forms of interprocess messaging?
All the explanations so far are accurate and to the point - but might be missing something: one of the main benefits of message queueing: resilience.
Imagine this: you need to communicate with two or three other systems. A common approach these days will be web services which is fine if you need an answers right away.
However: web services can be down and not available - what do you do then? Putting your message into a message queue (which has a component on your machine/server, too) typically will work in this scenario - your message just doesn't get delivered and thus processed right now - but it will later on, when the other side of the service comes back online.
So in many cases, using message queues to connect disparate systems is a more reliable, more robust way of sending messages back and forth. It doesn't work well for everything (if you want to know the current stock price for MSFT, putting that request into a queue might not be the best of ideas) - but in lots of cases, like putting an order into your supplier's message queue, it works really well and can help ease some of the reliability issues with other technologies.
MQ stands for messaging queue.
It's an abstraction layer that allows multiple processes (likely on different machines) to communicate via various models (e.g., point-to-point, publish subscribe, etc.). Depending on the implementation, it can be configured for things like guaranteed reliability, error reporting, security, discovery, performance, etc.
You can do all this manually with sockets, but it's very difficult.
For example: Suppose you want to processes to communicate, but one of them can die in the middle and later get reconnected. How would you ensure that interim messages were not lost? MQ solutions can do that for you.
Message queueuing systems are supposed to give you several bonuses. Among most important ones are monitoring and transactional behavior.
Transactional design is important if you want to be immune to failures, such as power failure. Imagine that you want to notify a bank system of ATM money withdrawal, and it has to be done exactly once per request, no matter what servers failed temporarily in the middle. MQ systems would allow you to coordinate transactions across multiple database, MQ and other systems.
Needless to say, such systems are very slow compared to named pipes, TCP or other non-transactional tools. If high performance is required, you would not allow your messages to be written thru disk. Instead, it will complicate your design - to achieve exotic reliable AND fast communication, which pushes the designer into really non-trivial tricks.
MQ systems normally allow users to watch the queue contents, write plugins, clear queus, etc.
MQ simply stands for Message Queue.
You would use one when you need to reliably send a inter-process/cross-platform/cross-application message that isn't time dependent.
The Message Queue receives the message, places it in the proper queue, and waits for the application to retrieve the message when ready.
reference: web services can be down and not available - what do you do then?
As an extension to that; what if your local network and your local pc is down as well?? While you wait for the system to recover the dependent deployed systems elsewhere waiting for that data needs to see an alternative data stream.
Otherwise, that might not be good enough 'real time' response for today's and very soon in the future Internet of Things (IOT) requirements.
if you want true parallel, non volatile storage of various FIFO streams(at least at some point along the signal chain) use an FPGA and FRAM memory. FRAM runs at clock speed and FPGA devices can be reprogrammed on the fly adding and taking away however many independent parallel data streams are needed(within established constraints of course).

MSMQ, WCF, and Flaky Servers

I have two applications, let us call them A and B. Currently A uses WCF to send messages to B. A doesn't need a response and B never sends messages back to A.
Unfortunately, there is a flaky network connection between the servers A and B are running on. This results in A getting timeout errors from time to time.
I would like to use WCF+MSMQ as a buffer between the two applications. That way if B goes down temporarily, or is otherwise inaccessible, the messages are not lost.
From an architectural standpoint, how should I configure this?
I think you might have inflated your question a bit with the inclusion of the word "architectural".
If you truly need an architectural overview of this issue from that high of a level, including SLA concerns, your SL will be as good as your MSMQ deployment, so if you are concerned about SL, just look at the documentation on the internet about MSMQ and SLA.
If you are looking more for the actual implementation from a code standpoint, this article is excellent:
http://code.msdn.microsoft.com/msmqpluswcf
It goes over a lot of the things you'll need to know, including how to setup MSMQ and how to implement chunking to get around MSMQ's 4MB limit (if this is necessary... I hope it's not).
Here's a good article about creating a durable and transactional queue that will cross machines using an MSMQ cluster: http://www.devx.com/enterprise/Article/39015/1954