Does RabbitMQ's queue time-to-live (TTL) feature work with lazy queues?
From what I can see, a lazy queue is no different from a standard queue when it comes to TTL.
Reading this in the documentation makes me think it won't work.
Lazy Queues - queues that move their contents to disk as early as practically possible, and only load them in RAM when requested by consumers
https://www.rabbitmq.com/lazy-queues.html
https://www.rabbitmq.com/ttl.html
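To make the combination concrete, here is a minimal sketch of the optional queue arguments that request both behaviours at once. The argument names x-message-ttl and x-queue-mode come from the two documentation pages linked above; the helper name lazy_ttl_queue_arguments is hypothetical, and the snippet only builds the dictionary, it does not talk to a broker.

```python
def lazy_ttl_queue_arguments(ttl_ms: int) -> dict:
    """Build optional queue arguments combining a per-queue message TTL with lazy mode.

    Hypothetical helper for illustration; only the two argument keys are RabbitMQ names.
    """
    if ttl_ms < 0:
        raise ValueError("TTL must be a non-negative number of milliseconds")
    return {
        "x-message-ttl": ttl_ms,  # per-queue message TTL, in milliseconds
        "x-queue-mode": "lazy",   # ask the broker to move messages to disk early
    }


# Assumption: a client library would pass this dict as the optional
# "arguments" of its queue declaration call (not executed here).
arguments = lazy_ttl_queue_arguments(60_000)
print(arguments)
```

If the two features really are independent, declaring a queue with both arguments and checking that messages expire after the TTL would be a quick way to confirm it on your own broker version.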
Under the hood, how is a FIFO queue turned into a priority queue in a distributed fashion? Are they actually swapping the underlying data structure, or is it a "hacked" fix?
The underlying data structures are multiple queues, each assigned a priority. Each queue is an Erlang VM process. This is why having more than 10 or so priorities isn't recommended as performance suffers. If your load is light enough, this may be acceptable.
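The "one queue per priority" idea can be sketched in a few lines. This is a toy in-process model, not RabbitMQ's Erlang implementation: the class name MultiQueuePriority and its methods are hypothetical, and the clamping of out-of-range priorities is an assumption based on the priority documentation.

```python
from collections import deque


class MultiQueuePriority:
    """Toy model: one FIFO sub-queue per priority level (hypothetical names)."""

    def __init__(self, max_priority: int):
        self.max_priority = max_priority
        # One FIFO per level, 0..max_priority. More levels means more sub-queues.
        self._queues = [deque() for _ in range(max_priority + 1)]

    def publish(self, message, priority: int = 0):
        # Assumption: priorities outside the range are clamped to 0..max_priority.
        level = max(0, min(priority, self.max_priority))
        self._queues[level].append(message)

    def deliver(self):
        # Scan from the highest level down; FIFO order is kept within a level.
        for level in range(self.max_priority, -1, -1):
            if self._queues[level]:
                return self._queues[level].popleft()
        return None  # nothing waiting


q = MultiQueuePriority(max_priority=3)
q.publish("low-1", 0)
q.publish("high-1", 3)
q.publish("low-2", 0)
q.publish("high-2", 3)
print([q.deliver() for _ in range(4)])
```

The cost of each extra priority level is visible here as one more sub-queue to allocate and scan, which is in line with the advice to keep the number of priorities small.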
NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
When a message is published with a priority set, the message with the higher priority value gets placed at the head of the queue. This is done by actually swapping the messages in the queue, and it can only happen while the message is waiting in the queue to be consumed. To give RabbitMQ a chance to prioritise the messages, set the consumer's basic.qos prefetch count as low as possible. If a consumer that has not set basic.qos connects to an empty queue and messages are subsequently published to it, the messages may not spend any time at all waiting in the queue. In that case, the priority queue never gets an opportunity to prioritise them.
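The effect of the prefetch count on prioritisation can be illustrated with a small simulation. This is a simplified model under stated assumptions, not broker code: all messages are published in one burst before the consumer finishes anything, the consumer handles its local buffer in arrival order, and the function name consumption_order is hypothetical.

```python
import heapq
from collections import deque
from itertools import count


def consumption_order(published, prefetch=None):
    """Return the order in which a single consumer handles a burst of messages.

    published: list of (name, priority) in publish order.
    prefetch:  max unacknowledged messages at the consumer; None means unlimited.
    """
    waiting = []        # messages still in the broker queue, as a max-heap
    buffered = deque()  # messages already pushed to the consumer, not yet acked
    seq = count()       # tie-breaker that keeps FIFO order within a priority

    for name, priority in published:
        if prefetch is None or len(buffered) < prefetch:
            buffered.append(name)  # goes straight out, spends no time queued
        else:
            heapq.heappush(waiting, (-priority, next(seq), name))

    consumed = []
    while buffered:
        consumed.append(buffered.popleft())  # handle and ack one message
        if waiting:
            # The ack frees one prefetch slot; the broker sends its best message.
            buffered.append(heapq.heappop(waiting)[2])
    return consumed


burst = [("a", 1), ("b", 5), ("c", 9), ("d", 3)]
print(consumption_order(burst, prefetch=None))  # ['a', 'b', 'c', 'd']
print(consumption_order(burst, prefetch=1))     # ['a', 'c', 'b', 'd']
```

With no prefetch limit every message leaves the queue immediately in publish order, so priorities have no effect; with a prefetch of 1, everything after the first message waits in the queue and is reordered.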
Reference: https://www.rabbitmq.com/priority.html
How does RabbitMQ physically store messages in RAM and on disk?
I know that RabbitMQ tries to keep messages in memory, but I don't know how they are laid out in RAM. I also know that messages can be written to disk when they are published as persistent or when the broker is under memory pressure, but I don't know how they are stored on disk.
I'd like to understand these internals. Unfortunately, the official documentation on the RabbitMQ website does not cover them.
Which document should I read for this?
RabbitMQ uses a custom database to store messages. It is usually located here:
/var/lib/rabbitmq/mnesia/rabbit@hostname/queues
Starting from version 3.5.5, RabbitMQ introduced new credit flow settings:
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
Let’s take a look at how RabbitMQ queues store messages. When a
message enters the queue, the queue needs to determine if the message
should be persisted or not. If the message has to be persisted, then
RabbitMQ will do so right away[3]. Now even if a message was persisted
to disk, this doesn’t mean the message got removed from RAM, since
RabbitMQ keeps a cache of messages in RAM for fast access when
delivering messages to consumers. Whenever we are talking about paging
messages out to disk, we are talking about what RabbitMQ does when it
has to send messages from this cache to the file system.
This blog post is detailed enough.
I also suggest reading about lazy queues:
https://www.rabbitmq.com/lazy-queues.html
and
https://www.rabbitmq.com/blog/2015/12/28/whats-new-in-rabbitmq-3-6-0/
Lazy Queues: This new type of queues work by sending every message that
is delivered to them straight to the file system, and only loading
messages in RAM when consumers arrive to the queues. To optimize disk
reads messages are loaded in batches.
In one of our applications the back pressure did not work and there was a huge pileup in a queue on RabbitMQ. This caused the RMQ node to choke.
Is there a way to apply flow control (manually) on that queue in such cases? That would have slowed down the producer and given us headroom.
In your case, the consumers are not fast enough to handle the messages.
Basically, you had a load spike.
That does not mean you need to stop the publishers.
You could:
Increase the number of consumers
Use lazy queues
You didn't see flow control kick in because RabbitMQ was still able to handle the incoming messages.
I have been looking at message queues (currently choosing between Kafka and RabbitMQ) for one of my projects, where the following are the most important requirements.
Must have features
Messages in queues should be persistent (only until they are processed successfully by consumers).
Messages in queues should be removed only when downstream consumers have processed them successfully. Basically, a consumer should ACK that it processed a message successfully.
Good to have features
To increase throughput, consumers should be able to pull a batch of messages from the queue.
If you go with Kafka, it only retains messages for a configurable duration, after which they are discarded to free up space, whether or not they have been consumed.
It is simply the responsibility of the Kafka consumers to keep track of what has been consumed.
IMHO, if you need to keep messages persisted forever, then consider using a different storage medium (a database, maybe).
In our project, we want to use RabbitMQ in the "Task Queues" pattern to pass data.
On the producer side, we build a few TCP servers (in node.js) to receive
highly concurrent data and send it to the MQ without any processing.
On the consumer side, we use the Java client to get the task data from
the MQ, handle it and then ack.
So the question is:
To get the maximum message-passing throughput (for example, 400,000 msg/second), how many queues is best? Do more queues mean better throughput? And is there anything else I should be aware of?
Is there a known best-practices guide for using RabbitMQ in such a scenario?
Any comments are highly appreciated!!
For best performance in RabbitMQ, follow the advice of its creators. From the RabbitMQ blog:
RabbitMQ's queues are fastest when they're empty. When a queue is
empty, and it has consumers ready to receive messages, then as soon as
a message is received by the queue, it goes straight out to the
consumer. In the case of a persistent message in a durable queue, yes,
it will also go to disk, but that's done in an asynchronous manner and
is buffered heavily. The main point is that very little book-keeping
needs to be done, very few data structures are modified, and very
little additional memory needs allocating.
If you really want to dig deep into the performance of RabbitMQ queues, this other blog entry of theirs goes into the data much further.
According to a response I once got on the rabbitmq-discuss mailing list, there are other things you can try to increase throughput and reduce latency:
Use a larger prefetch count. Small values hurt performance.
A topic exchange is slower than a direct or a fanout exchange.
Make sure queues stay short. Longer queues impose more processing
overhead.
If you care about latency and message rates then use smaller messages.
Use an efficient format (e.g. avoid XML) or compress the payload.
Experiment with HiPE, which helps performance.
Avoid transactions and persistence. Also avoid publishing in immediate
or mandatory mode. Avoid HA. Clustering can also impact performance.
You will achieve better throughput on a multi-core system if you have
multiple queues and consumers.
Use at least v2.8.1, which introduces flow control. Make sure the
memory and disk space alarms never trigger.
Virtualisation can impose a small performance penalty.
Tune your OS and network stack. Make sure you provide more than enough
RAM. Provide fast cores and RAM.
You will increase throughput if you use a larger prefetch count AND, at the same time, ACK multiple messages at once from your consumer (instead of sending an ACK for each message).
But, of course, ACKing with the multiple flag set (http://www.rabbitmq.com/amqp-0-9-1-reference.html#basic.ack) requires extra logic in your consumer application (http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-August/029600.html). You will have to keep a list of the delivery-tags of the messages delivered by the broker, along with their status (whether your application has handled them or not), and ACK every N-th delivery-tag (NDTAG) once all of the messages with a delivery-tag less than or equal to NDTAG have been handled.
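The bookkeeping described above could look roughly like the sketch below. It is not tied to any client library: the class name MultipleAckTracker and its method are hypothetical, and it assumes delivery-tags on a channel start at 1 and increase by one per delivery. It is a slight variation on "every N-th tag": it returns the highest tag of the contiguous handled run once that run covers at least N unacknowledged messages.

```python
class MultipleAckTracker:
    """Decide when it is safe to send one ACK with multiple=True (hypothetical helper)."""

    def __init__(self, batch_size: int):
        self.batch_size = batch_size
        self._handled = set()  # tags handled by the application but not yet acked
        self._last_acked = 0   # highest tag already covered by a multiple-ACK

    def mark_handled(self, delivery_tag: int):
        """Record a handled message.

        Returns a delivery-tag to ACK with the multiple flag set, or None if
        no ACK should be sent yet.
        """
        self._handled.add(delivery_tag)

        # Find the end of the contiguous run of handled tags after the last ACK.
        tag = self._last_acked
        while tag + 1 in self._handled:
            tag += 1

        if tag - self._last_acked < self.batch_size:
            return None  # run too short, or blocked by an unhandled earlier tag

        # Everything up to `tag` is handled, so one multiple-ACK covers it all.
        for covered in range(self._last_acked + 1, tag + 1):
            self._handled.discard(covered)
        self._last_acked = tag
        return tag


tracker = MultipleAckTracker(batch_size=3)
for handled_tag in (2, 1, 4, 3, 5):  # messages may finish out of order
    print(handled_tag, "->", tracker.mark_handled(handled_tag))
```

In a real consumer, the returned tag would be passed to the client's basic.ack call with the multiple flag enabled. A timer-based flush for short tails is left out here and would be needed so that a final partial batch still gets acknowledged.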