A shovel consumes messages from the source queue and re-publishes each message to the destination broker (using, by default, the original exchange name and routing_key when applicable).
I could not find any documentation on the expected behavior of message TTLs when shovels are involved:
Does the clock used for calculating the TTL start when the message is received
at the source broker or at the destination broker? Or does the TTL only apply to the first publish, that is, at the source broker?
What happens if the expiration time elapses before the message reaches the destination broker?
So, I think you answered the question with the documentation you pasted in. All a shovel does is move messages from one queue to another, re-publishing them in the process. It preserves all original message properties, which theoretically includes the TTL property.
That being said, I don't believe this is something you need to worry about.
Message TTL starts when the queue receives the message. When the message is re-published, the clock resets on the new queue.
Messages being transported by shovel will ideally spend no more than a few milliseconds in the initial queue, if they even end up there at all (a message queue with a consumer attached doesn't actually enqueue any messages under most conditions). So, the time spent in the first queue should be so small that it doesn't matter.
Message lifetime should have a fair amount of tolerance for network transport, etc., so the activities of shovel are on par with the normal noise.
If you find yourself in the situation where a large number of messages are accumulating in the queue before they can be shovel'd, then you might need to handle expiration in your application. There are other benefits and caveats to doing this, but you get a little finer-grained control overall.
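For context, here is a minimal sketch with the RabbitMQ Java client (broker address, exchange name, and routing key are placeholders) of publishing with a per-message TTL via the expiration property, which is the property a shovel would carry along, plus a timestamp the consuming application could use to enforce its own expiration:

```java
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class TtlPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Per-message TTL is expressed via the "expiration" property,
            // in milliseconds, as a string. The timestamp is purely for
            // application-level expiration checks on the consumer side.
            AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
                    .expiration("60000")               // 60 seconds
                    .timestamp(new java.util.Date())   // creation time
                    .build();

            // "source-exchange" and "my.key" are illustrative names.
            channel.basicPublish("source-exchange", "my.key", props,
                    "payload".getBytes(java.nio.charset.StandardCharsets.UTF_8));
        }
    }
}
```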
Related
I'm not sure how to resiliently handle RabbitMQ messages in the event of an intermittent outage.
I subscribe in a Windows service, read the message, then store it in my database. If I can't process the record because of the data, I publish it to a dead-letter queue for a human to address and reprocess.
I am not sure what to do if I have some intermittent technical issue that will fix itself (database reboot, network outage, drive space, etc.). I don't want hundreds of messages showing up on the dead-letter queue that just needed to wait out a glitch but are now waiting on a human.
Currently, I re-queue the event and retry it once, but it retries so fast that the issue is not usually resolved. I thought of retrying forever, but I don't want a real issue to get stuck in an infinite loop.
This is a broad topic, but from the server side you can persist your messages and make your queues durable; this means that if the server gets restarted, they won't be lost. Check more here: How to persist messages during RabbitMQ broker restart?
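As a rough sketch with the Java client (queue name and broker address are placeholders), durability is a combination of a durable queue declaration and marking each message as persistent:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class DurablePublish {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // durable = true: the queue definition survives a broker restart.
            channel.queueDeclare("work-queue", true, false, false, null);

            // PERSISTENT_TEXT_PLAIN sets deliveryMode = 2 so the message body
            // is written to disk; a durable queue alone does not persist
            // transient messages across a restart.
            channel.basicPublish("", "work-queue",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello".getBytes(java.nio.charset.StandardCharsets.UTF_8));
        }
    }
}
```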
For the consumer (client) it will depend on how you configure your client, from the docs:
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
Check more here: https://www.rabbitmq.com/reliability.html#consumer
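A hedged sketch of what that can look like with the Java client (the queue name and the deduplication helpers are placeholders for your own logic): only run the more expensive deduplication path when the redelivered flag is set.

```java
import com.rabbitmq.client.*;

public class RedeliveryAwareConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        channel.queueDeclare("work-queue", true, false, false, null);

        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            boolean redelivered = delivery.getEnvelope().isRedeliver();
            long deliveryTag = delivery.getEnvelope().getDeliveryTag();
            String body = new String(delivery.getBody(),
                    java.nio.charset.StandardCharsets.UTF_8);

            // Only pay the deduplication cost when the broker hints that
            // this message may have been seen before.
            if (redelivered && alreadyProcessed(body)) {
                channel.basicAck(deliveryTag, false);
                return;
            }
            process(body);
            channel.basicAck(deliveryTag, false);
        };
        channel.basicConsume("work-queue", false, onDeliver, consumerTag -> { });
    }

    // Placeholder: look up a processed-message record in your own store.
    static boolean alreadyProcessed(String body) { return false; }
    static void process(String body) { /* application logic */ }
}
```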
I've been reading about the principles of AMQP message confirms (https://www.rabbitmq.com/confirms.html). It is a really helpful and well-written article, but one particular thing about consumer acknowledgements is really confusing. Here is the quote:
Another thing that's important to consider when using automatic acknowledgement mode is that of consumer overload.
Consumer overload? The message queue is processed and kept in RAM by the broker (if I understand it correctly). What overload is this about? Does the consumer have some kind of second queue?
Another part of that article is even more confusing:
Consumers therefore can be overwhelmed by the rate of deliveries, potentially accumulating a backlog in memory and running out of heap or getting their process terminated by the OS.
What backlog? How does this all work together? What part of the job is done by the consumer (besides consuming the message and processing it, of course)? I thought that the broker keeps the queues and forwards the messages, but now I am reading about some mysterious backlogs and consumer overloads. This is really confusing; can someone explain it a bit or at least point me to a good source?
I believe the documentation you're referring to deals with what, in my opinion, is sort of a design flaw in either AMQP 0-9-1 or RabbitMQ's implementation of it.
Consider the following scenario:
A queue has thousands of messages sitting in it
A single consumer subscribes to the queue with AutoAck=true and no pre-fetch count set
What is going to happen?
RabbitMQ's implementation is to deliver an arbitrary number of messages to a client that has no pre-fetch count set. Further, with auto-ack, the pre-fetch count is irrelevant, because messages are acknowledged upon delivery to the consumer.
In-memory buffers:
The default client API implementations of the consumer have an in-memory buffer (in .NET it is some type of blocking collection, if I remember correctly). So, before the message is processed, but after the message is received from the broker, it goes into this in-memory holding area. Now, the design flaw is this holding area. A consumer has no choice but to accept a message coming from the broker, as it is pushed to the client asynchronously. This is a flaw with the AMQP protocol specification (see page 53).
Thus, every message in the queue at that point will be delivered to the consumer immediately and the consumer will be inundated with messages. Assuming each message is small, but takes 5 minutes to process, it is entirely possible that this one consumer will be able to drain the entire queue before any other consumers can attach to it. And since AutoAck is turned on, the broker will forget about these messages immediately after delivery.
Obviously this is not a good scenario if you'd like to get those messages processed, because they've left the relative safety of the broker and are now sitting in RAM at the consuming endpoint. Let's say an exception is encountered that crashes the consuming endpoint - poof, all the messages are gone.
How to work around this?
You must turn auto-ack off, and generally it is also a good idea to set a reasonable pre-fetch count (usually 2-3 is sufficient).
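A minimal sketch of that workaround with the Java client (queue name and broker address are placeholders): turn auto-ack off and cap the number of unacknowledged deliveries with basicQos.

```java
import com.rabbitmq.client.*;

public class BoundedConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        channel.basicQos(3); // at most 3 unacked messages in flight per consumer

        DeliverCallback onDeliver = (tag, delivery) -> {
            process(delivery.getBody()); // slow work happens here
            // Ack only after processing, so a crash before this line leaves
            // the message safely requeueable on the broker.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        // autoAck = false: the broker keeps the message until we ack it.
        channel.basicConsume("work-queue", false, onDeliver, tag -> { });
    }

    static void process(byte[] body) { /* application logic */ }
}
```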
Being able to signal back pressure is a basic problem in distributed systems. Without explicit acknowledgements, the consumer does not have any way to say "slow down" to the broker. With auto-ack on, as soon as the TCP acknowledgement is received by the broker, it deletes the message from its memory/disk.
However, that does not mean the consuming application has processed the message or has enough memory to store incoming messages. The backlog in the article is simply the data structure used to store unprocessed messages in the consumer application.
I'm positive I'm missing a nuance of MassTransit and/or RabbitMQ, but how long do durable (permanent?) messages stay on queues?
The situation I'm thinking of is one in which all consumers of a certain type of event are unavailable - obviously when they come back up, you want them to be able to take the appropriate actions based on the events they "missed" while they were offline.
However, what about the case when a new consumer starts reading off of the same queue after days/months/years? Is that consumer now going to be pulling in all events since the beginning of time? I'm almost certain that's not the case, but how is durability balanced with timeliness?
As far as I know, MassTransit doesn't control message lifetime. Neither does RabbitMQ, so a message will stay in the queue forever. The only exception is the request/response model, in which you can set a timeout period within which you will accept a response.
In general, if you need to control lifetime, you can store the creation time in the message and check it in your consumers.
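One way to sketch that with the raw Java client (the "created-at" header name, queue name, and 10-minute cutoff are all arbitrary choices): put a creation timestamp in the message headers at publish time and let the consumer discard anything older than its tolerance.

```java
import com.rabbitmq.client.*;
import java.util.Map;

public class AgeCheckingConsumer {
    static final long MAX_AGE_MS = 10 * 60 * 1000; // 10 minutes, application-defined

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();

        DeliverCallback onDeliver = (tag, delivery) -> {
            Map<String, Object> headers = delivery.getProperties().getHeaders();
            // "created-at" is a header the publisher set itself (epoch millis).
            Object created = headers == null ? null : headers.get("created-at");
            long createdAt = created instanceof Number ? ((Number) created).longValue() : 0L;
            long deliveryTag = delivery.getEnvelope().getDeliveryTag();

            if (createdAt > 0 && System.currentTimeMillis() - createdAt > MAX_AGE_MS) {
                // Too old: ack and drop (or route to an audit queue) instead of processing.
                channel.basicAck(deliveryTag, false);
                return;
            }
            process(delivery.getBody());
            channel.basicAck(deliveryTag, false);
        };
        channel.basicConsume("events", false, onDeliver, tag -> { });
    }

    static void process(byte[] body) { /* application logic */ }
}
```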
I have a RabbitMQ setup where a (java) producer sends messages to a fanout exchange, which are handled by a consumer. It's no problem if messages get lost when the consumer dies, so for performance I set autoAck=true at the consumer side.
Now I'm investigating a situation in which the rate the consumer can handle messages, is lower than the rate at which they are sent.
After a while, a (huge) backlog of messages must queue up somewhere. Is there a way to get visibility on this backlog?
Using the RabbitMQ management interface does not work: the queue appears empty
Ready: 0
Unacknowledged: 0
Total: 0
I assume the queue is empty because the messages are prefetched (without limit) by the RabbitMQ client used by the consumer. But limiting the prefetch with e.g.
channel.basicQos(10)
does not help either, probably because this only limits unacknowledged messages, and with autoAck=true, messages are ack'ed from the moment they are prefetched by the client.
Setting autoAck=false (and explicit ack'ing on delivery) is a solution (the Unacknowledged counter keeps on rising), but I was wondering whether this is the only way?
Preferably I'd like to limit the amount of cached messages at the client side irrespective of acknowledgements, such that the backlog eventually becomes visible through the RabbitMQ management interface.
Alternatively, is there a way to query the number of messages sitting somewhere in the client's prefetch queue waiting to be delivered?
I suggest using a combination of basicQos and autoAck=false. This will make everything show up in the queues both through the admin website and the REST APIs. Having an unlimited number of messages sent to each consumer seems to defeat the point of a queue.
If your queues are time-sensitive, you can also add a TTL on the queues so that messages are automatically dead-lettered (or dropped) after, for example, 60 minutes.
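For example, a per-queue TTL can be set at declaration time with the x-message-ttl argument (Java client; the queue name and the 60-minute value are illustrative):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.util.HashMap;
import java.util.Map;

public class QueueTtlDeclare {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            Map<String, Object> queueArgs = new HashMap<>();
            queueArgs.put("x-message-ttl", 60 * 60 * 1000); // expire messages after 60 minutes
            // Expired messages are dropped, or dead-lettered if the queue also
            // has a dead-letter exchange configured.
            channel.queueDeclare("time-sensitive-queue", true, false, false, queueArgs);
        }
    }
}
```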
We are building a solution in which we publish messages to a time-out queue. After the TTL expires, messages are pushed to the main queue for re-processing.
We are setting a counter value so that messages will be retried x number of times.
The solution is working fine. But in the scenario where the message at the head of the queue has the highest TTL and has not yet expired, other messages with lower expiry times will not be re-published (to the main queue).
Is this understanding correct? If yes, what is the solution so that each message is re-processed right after its TTL expires?
Appreciating answers / viewpoint.
Thanks.
If you use per-queue message TTL, then messages expire and get removed from the queue from head to tail (in the same order they were published).
When you use per-message TTL, messages are removed from the queue only when they reach the queue head, so a situation where expired messages still reside in the middle of the queue is normal. Such messages will not be sent to a consumer and will be dead-lettered (or dropped), but due to the strict FIFO nature of RabbitMQ's queues that only happens, as written above, when they reach the queue head, so the delay before removal may be greater than the actual message TTL. For example, if there are two messages, the first with TTL=10sec and the second with TTL=1sec, the second message will also be dead-lettered after 10sec, because it sits behind the first one.
To deal with messages that have different TTLs, a common workaround is to declare a few queues, each for messages with the same (or almost the same) TTL, say, with a precision of 10sec. The actual precision may vary, as it is very application-specific and somewhat empirical.
If you pick separate per-TTL queues, use per-queue TTL rather than per-message TTL for ease of the message workflow and to avoid ambiguity about what happens to messages. Developers after you will thank you for that.
To re-process messages after their TTL, use Dead Letter Exchanges, but beware of the cycled-messages problem: if the RabbitMQ broker detects that your message workflow is cycled (a message gets published to the same exchange with the same routing key it was dead-lettered from), it will silently drop the message.
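A sketch of that wiring with the Java client (all names and the 30-second delay are placeholders): a wait queue with a per-queue TTL whose dead-letter exchange routes expired messages back onto the main queue, avoiding the cycle because the wait queue is fed via the default exchange rather than the exchange it dead-letters to.

```java
import com.rabbitmq.client.*;
import java.util.HashMap;
import java.util.Map;

public class DelayedRetryTopology {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Main exchange/queue where messages are actually processed.
            channel.exchangeDeclare("main-exchange", "direct", true);
            channel.queueDeclare("main-queue", true, false, false, null);
            channel.queueBind("main-queue", "main-exchange", "work");

            // Wait queue: every message sits here for 30s, then is dead-lettered
            // to main-exchange with routing key "work".
            Map<String, Object> waitArgs = new HashMap<>();
            waitArgs.put("x-message-ttl", 30_000);
            waitArgs.put("x-dead-letter-exchange", "main-exchange");
            waitArgs.put("x-dead-letter-routing-key", "work");
            channel.queueDeclare("wait-queue", true, false, false, waitArgs);

            // To retry a failed message, publish it to the default exchange
            // with the wait queue's name as the routing key.
            channel.basicPublish("", "wait-queue", null,
                    "retry me".getBytes(java.nio.charset.StandardCharsets.UTF_8));
        }
    }
}
```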
The per-queue TTL is simple enough and works fine,
but per-message TTL does not work as I expected, namely that each message would be published to an online consumer right after its own TTL expires.
Why does RabbitMQ provide this feature? For which business scenario?