ActiveMQ prevent consumer handling specific message - activemq

We have a design challenge where the situation is as follow:
There are multiple producers and multiple consumers (on same queue).
Each message represent a task with parameters that consumer needs to handle.
The problem is that there are certain tasks that take lots of memory (and cpu power) which we know the consumer have no capacity to handle this. the good thing is that we know how much memory (and cpu power) it approximately can take in advance, so we could prevent a consumer taking that task and giving a change to other consumer with enough memory to handle.
There is the prefetch setting but i can't see how it can configure to meet this requirement
Finally I found an option to rollback a transaction, so the consumer can basically check if it has enough hardware resources to handle the task and if not rollback which retrieves the message back to queue allowing next consumer take it and so forth.
Not sure if that's the right approach or there is a better way?

The messages could have properties set which indicate whether or not they will require high CPU and/or memory and then consumers could use selectors to only receive the messages which fit their hardware constraints.

Related

RabbitMQ - Reprioritize message already in queue

We are building spark based jobs. Processing each message delivered by the queue takes time. There is a need to be able to reprioritize one already sent to the queue.
I am aware there is priority queue implementation available, but not sure how to re-prioritize the existing message in the queue?
One bad workaround is to push that message again as higher priority, so that it handled on priority. Later drop the message with same content which had low or no priority when it's turns comes next.
Is there a natural way we can handle this situation or any other queues that supports scenario better?
Unfortunately there isn't. Queues are to be considered as lists of messages in flight. It is not possible to delete/update them.
Your approach of submitting a higher priority message is the only feasible solution.
RabbitMQ is a messaging system (such as the postal one), it is not a DataBase or a storage service. The storage in form of queues is a necessary feature as much as the postal service needs storage for postcards in transit. It is optimized for the purpose and does not allow to access the messages easily.

RabbitMQ consumer overload

I`ve been reading about the principles of AMQP messaging confirms. (https://www.rabbitmq.com/confirms.html). Really helpful and wel written article but one particular thing about consumer aknowledgments is really confusing, here is the quote:
Another things that's important to consider when using automatic acknowledgement mode is that of consumer overload.
Consumer overload? Message queue is processed and kept in RAM by broker (if I understand it correctly). What overload is it about? Does consumer have some kind of second queue?
Another part of that article is even more confusing:
Consumers therefore can be overwhelmed by the rate of deliveries, potentially accumulating a backlog in memory and running out of heap or getting their process terminated by the OS.
What backlog? How is this all works together? What part of job is done by consumer (besides consuming message and processing it of course)? I thought that broker is keeping queues alive and forwards the messages but now I am reading about some mysterious backlogs and consumer overloads. This is really confusing, can someone explain it a bit or at least point me to the good source?
I believe the documentation you're referring to deals with what, in my opinion, is sort of a design flaw in either AMQP 0-9-1 or RabbitMQ's implementation of it.
Consider the following scenario:
A queue has thousands of messages sitting in it
A single consumer subscribes to the queue with AutoAck=true and no pre-fetch count set
What is going to happen?
RabbitMQ's implementation is to deliver an arbitrary number of messages to a client who has not pre-fetch count. Further, with Auto-Ack, prefetch count is irrelevant, because messages are acknowledged upon delivery to the consumer.
In-memory buffers:
The default client API implementations of the consumer have an in-memory buffer (in .NET it is some type of blocking collection (if I remember correctly). So, before the message is processed, but after the message is received from the broker, it goes into this in-memory holding area. Now, the design flaw is this holding area. A consumer has no choice but to accept the message coming from the broker, as it is published to the client asynchronously. This is a flaw with the AMQP protocol specification (see page 53).
Thus, every message in the queue at that point will be delivered to the consumer immediately and the consumer will be inundated with messages. Assuming each message is small, but takes 5 minutes to process, it is entirely possible that this one consumer will be able to drain the entire queue before any other consumers can attach to it. And since AutoAck is turned on, the broker will forget about these messages immediately after delivery.
Obviously this is not a good scenario if you'd like to get those messages processed, because they've left the relative safety of the broker and are now sitting in RAM at the consuming endpoint. Let's say an exception is encountered that crashes the consuming endpoint - poof, all the messages are gone.
How to work around this?
You must turn Auto-Ack off, and generally it is also a good idea to set reasonable pre-fetch count (usually 2-3 is sufficient).
Being able to signal back pressure a basic problem in distributed systems. Without explicit acknowledgements, the consumer does not have any way to say "Slow down" to broker. With auto-ack on, as soon as the TCP acknowledgement is received by broker, it deletes the message from its memory/disk.
However, it does not mean that the consuming application has processed the message or ave enough memory to store incoming messages. The backlog in the article is simply a data structure used to store unprocessed messages (in the consumer application)

RabbitMQ: throttling fast producer against large queues with slow consumer

We're currently using RabbitMQ, where a continuously super-fast producer is paired with a consumer limited by a limited resource (e.g. slow-ish MySQL inserts).
We don't like declaring a queue with x-max-length, since all messages will be dropped or dead-lettered once the limit is reached, and we don't want to loose messages.
Adding more consumers is easy, but they'll all be limited by the one shared resource, so that won't work. The problem still remains: How to slow down the producer?
Sure, we could put a flow control flag in Redis, memcached, MySQL or something else that the producer reads as pointed out in an answer to a similar question, or perhaps better, the producer could periodically test for queue length and throttle itself, but these seem like hacks to me.
I'm mostly questioning whether I have a fundamental misunderstanding. I had expected this to be a common scenario, and so I'm wondering:
What is best practice for throttling producers? How is this done with RabbitMQ? Or do you do this in a completely different way?
Background
Assume the producer actually knows how to slow himself down with the right input. E.g. a hardware sensor or hardware random number generator, that can generate as many events as needed.
In our particular real case, we have an API that users can use to add messages. Instead of devouring and discarding messages, we'd like to apply back-pressure by having our API return an error if the queue is "full", so the caller/user knows to back-off, or have the API block until the consumer catches up. We don't control our user, so regardless of how fast the consumer is, I can create a producer that is faster.
I was hoping for something like the API for a TCP socket, where a write() can block and where a select() can be used to determine if a handle is writable. So either having the RabbitMQ API block or have it return an error if the queue is full.
For the x-max-length property, you said you don't want messages to be dropped or dead-lettered. I see there was an update in adding some more capabilities for this. As I see it is specified in the documentation:
"Use the overflow setting to configure queue overflow behaviour. If overflow is set to reject-publish, the most recently published messages will be discarded. In addition, if publisher confirms are enabled, the publisher will be informed of the reject via a basic.nack message"
So as I understand it, you can use queue limit to reject the new messages from publishers thus pushing some backpressure to the upstream.
I don't think that this is in any way rabbitmq specific. Basically you have a scenario, where there are two systems of different processing capabilities, and this mismatch will either pose a risk of overflowing the queue (whatever it would be), or even in case of a constant mismatch between producer and consumer, simply create more and more time-distance between event creation and its handling.
I used to deal with this kind of scenarios, and unfortunately there is no magic bullet. You either have to speed up even handling (better hardware, more suited software?) or throttle the event creation (which has nothing to do with MQ really).
Now, I would ask you what's the goal and how the events are produced. Are the events are produced constantly, with either unlimitted or just very high rate (for example readings from sensors - the more, the better), or are they created in batches/spikes (for example: user requests in specific time periods, batch loads from CRM system). I assume that the goal is to process everything cause you mention you don't want to loose any queued message.
If the output is constant, then some limiter (either internal counter, if the producer is the only producer, or external queue length checks if queue can be filled with some other system) is definitely in place.
IF eventsInTimePeriod/timePeriod > estimatedConsumerBandwidth
THEN LowerRate()
ELSE RiseRate()
In real world scenarios we used to simply limit the output manually to the estimated values and there were some alerts set for queue length, time from queue entry to queue leaving etc. Where such limiters were omitted (by mistake mostly) we used to find later some tasks that were supposed to be handled in few hours, that were waiting for three months for their turn.
I'm afraid it's hard to answer to "How to slow down the producer?" if we know nothing about it, but some ideas are: aforementioned rate check or maybe a blocking AddMessage method:
AddMessage(message)
WHILE(getQueueLength() > maxAllowedQueueLength)
spin(1000); // or sleep or whatever
mqAdapter.AddMessage(message)
I'd say it all depends on specific of the producer application and in general your architecture.

processing messages

I use a queue per message type. I have tended to create a windows service per queue to process those messages. Is this the best use of resources? I suspect not. How do you decide how many processes should service a queue(s)?
One thing to consider here is service levels. Does all of the data represented by the message types require identical processing service levels? Are some messages more important than others? Do some messages have latency requirement for delivery? Are some messages critical to the business whereas others not? Are the expected volumes of all message types different?
Currently the way you have things set up means that you can manage each of your message type channels as a separate concern, which allows you maximum flexibility to support all possible service level scenarios. However this comes as a cost of higher resource cost/more moving parts.
I would say that unless resource usage is a concern, then your set up is the best possible as you decouple your data processing channels from one another very effectively in this way.

Maximize throughput with RabbitMQ

In our project, we want to use the RabbitMQ in "Task Queues" pattern to pass data.
On the producer side, we build a few TCP server(in node.js) to recv
high concurrent data and send it to MQ without doing anything.
On the consumer side, we use JAVA client to get the task data from
MQ, handle it and then ack.
So the question is:
To get the maximum message passing throughput/performance( For example, 400,000 msg/second) , How many queues is best? Does that more queue means better throughput/performance? And is there anything else should I notice?
Any known best practices guide for using RabbitMQ in such scenario?
Any comments are highly appreciated!!
For best performance in RabbitMQ, follow the advice of its creators. From the RabbitMQ blog:
RabbitMQ's queues are fastest when they're empty. When a queue is
empty, and it has consumers ready to receive messages, then as soon as
a message is received by the queue, it goes straight out to the
consumer. In the case of a persistent message in a durable queue, yes,
it will also go to disk, but that's done in an asynchronous manner and
is buffered heavily. The main point is that very little book-keeping
needs to be done, very few data structures are modified, and very
little additional memory needs allocating.
If you really want to dig deep into the performance of RabbitMQ queues, this other blog entry of theirs goes into the data much further.
According to a response I once got from the rabbitmq-discuss mailing group there are other things that you can try to increase throughput and reduce latency:
Use a larger prefetch count. Small values hurt performance.
A topic exchange is slower than a direct or a fanout exchange.
Make sure queues stay short. Longer queues impose more processing
overhead.
If you care about latency and message rates then use smaller messages.
Use an efficient format (e.g. avoid XML) or compress the payload.
Experiment with HiPE, which helps performance.
Avoid transactions and persistence. Also avoid publishing in immediate
or mandatory mode. Avoid HA. Clustering can also impact performance.
You will achieve better throughput on a multi-core system if you have
multiple queues and consumers.
Use at least v2.8.1, which introduces flow control. Make sure the
memory and disk space alarms never trigger.
Virtualisation can impose a small performance penalty.
Tune your OS and network stack. Make sure you provide more than enough
RAM. Provide fast cores and RAM.
You will increase the throughput with a larger prefetch count AND at the same time ACK multiple messages (instead of sending ACK for each message) from your consumer.
But, of course, ACK with multiple flag on (http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.ack) requires extra logic on your consumer application (http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-August/029600.html). You will have to keep a list of delivery-tags of the messages delivered from the broker, their status (whether your application has handled them or not) and ACK every N-th delivery-tag (NDTAG) when all of the messages with delivery-tag less than or equal to NDTAG have been handled.