Persistent message queue with at-least-once delivery - RabbitMQ

I have been evaluating message queues (currently deciding between Kafka and RabbitMQ) for one of my projects, where these are the biggest must-have features.
Must have features
Messages in queues should be persistent (only until they are processed successfully by consumers).
Messages in queues should be removed only when downstream consumers have processed them successfully. Basically, a consumer should ACK that it processed a message successfully.
Good-to-have features
To increase throughput, consumers should be able to pull a batch of messages from the queue.

If you go with Kafka, it only retains messages for a configurable duration, after which they are discarded to free up space, whether or not they have been consumed.
It is simply the responsibility of the Kafka consumers to keep track of what has been consumed.
IMHO, if you need to keep the messages persisted forever, then consider using a different storage medium (a database, maybe).
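To make the retention point concrete, here is a toy Python model, not the Kafka client API (RetainedLog, append, enforce_retention and read_from are hypothetical names). Records are dropped purely by age, and the consumer's committed offset is its own bookkeeping, so records it never read can still disappear. Real Kafka deletes whole log segments according to topic settings such as retention.ms, so treat this as a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    offset: int
    timestamp: float
    value: str


@dataclass
class RetainedLog:
    """Toy append-only log: records expire by age, never by consumption."""
    retention_seconds: float
    records: list = field(default_factory=list)
    next_offset: int = 0

    def append(self, value, now):
        self.records.append(Record(self.next_offset, now, value))
        self.next_offset += 1

    def enforce_retention(self, now):
        # Drop anything older than the retention window, consumed or not.
        self.records = [r for r in self.records
                        if now - r.timestamp < self.retention_seconds]

    def read_from(self, offset):
        # Return the records at or after `offset` that still exist.
        return [r for r in self.records if r.offset >= offset]


log = RetainedLog(retention_seconds=60)
log.append("a", now=0)
log.append("b", now=30)
log.append("c", now=90)

committed_offset = 0  # the consumer's own record of where it got to
log.enforce_retention(now=100)
remaining = log.read_from(committed_offset)
print([r.value for r in remaining])  # ['c']
```

The consumer never read "a" or "b", yet they are gone; the log does not know or care about its position.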

Related

Distribute messages from RabbitMQ to consumers running on Heroku dynos as a 'round robin'

I have a RabbitMQ setup in which jobs are sent to an exchange, which passes them to a queue. A consumer carries out the jobs from the queue correctly in turn. However, these jobs are long processes (several minutes at least). For scalability, I need to be able to have multiple consumers picking a job from the top of the queue and executing it.
The consumer is running on a Heroku dyno called 'queue'. When I scale the dyno, it appears to create additional consumers for each dyno (I can see these on the RabbitMQ dashboard). However, the number of tasks in the queue is unchanged - the extra consumers appear to be doing nothing. Please see the picture below to understand my setup.
Am I missing something here?
Why are the consumers showing as 'idle'? I know from my logs that at least one consumer is actively working through a task.
How can my consumer utilisation be 0% when at least one consumer is definitely working hard?
How can I make the other three consumers actually pull some jobs from the queue?
Thanks
EDIT: I've discovered that the round robin dispatching is actually working, but only if the additional consumers are already running when the messages are sent to the queue. This seems like counterintuitive behaviour to me. If I saw a large queue and wanted to add more consumers, the added consumers would do nothing until more items are added to the queue.
To pick out the key point from the other answer, the likely culprit here is pre-fetching, as described under "Consumer Acknowledgements and Publisher Confirms".
Rather than delivering one message at a time and waiting for it to be acknowledged, the server will send batches to the consumer. If the consumer acknowledges some but then crashes, the remaining messages will be sent to a different consumer; but if the consumer is still running, the unacknowledged messages won't be sent to any new consumer.
This explains the behaviour you're seeing:
You create the queue, and deliver some messages to it, with no consumer running.
You run a single consumer, and it pre-fetches all the messages on the queue.
You run a second consumer; although the queue isn't empty, all the messages are marked as sent to the first consumer, awaiting acknowledgement; so the second consumer sits idle.
A new message arrives in the queue; it is distributed in round-robin fashion to the second consumer.
The solution is to set a prefetch count with the basic.qos method in the consumer. If you set it to 1, RabbitMQ won't send a message to a consumer until it has acknowledged the previous one, so multiple consumers with that setting will share the messages fairly, each taking the next message as soon as it is free.
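The sequence above can be reproduced with a toy Python model of push dispatch (Broker, add_consumer and ack are hypothetical names, not a client API). A prefetch of None stands for prefetch_count = 0, i.e. no limit. With a real client such as pika, the setting itself would be channel.basic_qos(prefetch_count=1), called on the consumer's channel before it starts consuming.

```python
from collections import deque


class Broker:
    """Toy model of push dispatch with a per-consumer prefetch limit.

    prefetch=None stands for prefetch_count = 0 (no limit); an integer is
    the maximum number of unacknowledged messages a consumer may hold.
    """

    def __init__(self, messages):
        self.ready = deque(messages)  # messages waiting on the queue
        self.unacked = {}             # consumer name -> delivered, not yet acked
        self.prefetch = {}            # consumer name -> limit or None

    def add_consumer(self, name, prefetch):
        self.unacked[name] = []
        self.prefetch[name] = prefetch
        self.dispatch()

    def has_capacity(self, name):
        limit = self.prefetch[name]
        return limit is None or len(self.unacked[name]) < limit

    def dispatch(self):
        # Hand out ready messages to consumers with spare capacity,
        # cycling through them in a fixed order.
        progress = True
        while self.ready and progress:
            progress = False
            for name in self.unacked:
                if self.ready and self.has_capacity(name):
                    self.unacked[name].append(self.ready.popleft())
                    progress = True

    def ack(self, name, message):
        # Acknowledging frees a prefetch slot, so more messages can flow.
        self.unacked[name].remove(message)
        self.dispatch()


# Unlimited prefetch: the first consumer grabs everything.
greedy = Broker(range(6))
greedy.add_consumer("A", prefetch=None)
greedy.add_consumer("B", prefetch=None)
print(greedy.unacked)  # {'A': [0, 1, 2, 3, 4, 5], 'B': []}

# Prefetch of 1: each consumer holds one message at a time.
fair = Broker(range(6))
fair.add_consumer("A", prefetch=1)
fair.add_consumer("B", prefetch=1)
print(fair.unacked)  # {'A': [0], 'B': [1]}
fair.ack("A", 0)
print(fair.unacked)  # {'A': [2], 'B': [1]}
```

With no limit, A takes everything and B sits idle even though the queue still "has" messages; with a prefetch of 1, the next ready message goes to whichever consumer acknowledges first.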
I am not familiar with Heroku, so I don't know how a Heroku worker builds a RabbitMQ consumer; I have only had a quick look at the Heroku documentation.
Why are the consumers showing as 'idle'?
I think you mean the queue is 'idle'? The queue's state describes the queue's traffic: 'idle' just means the queue's process has no work in progress, and it becomes 'running' when a message is published to the queue.
How can my consumer utilisation be 0% when at least one consumer is definitely working hard?
As with the queue state, according to the official explanation, a consumer utilisation below 100% means the queue could deliver messages faster if:
There were more consumers
The consumers were faster
The consumers had a higher prefetch count
In your situation, prefetch_count = 0 means there is no limit on prefetching, so it is effectively too large. And Messages.total = Messages.unacked = 78 means your consumer is too slow: too many messages have already been delivered to the consumer and are still waiting to be processed and acknowledged.
So if your message rate is not high, the queue's state and consumer utilisation fields tell you very little.
If I saw a large queue and wanted to add more consumers, the added consumers would do nothing until more items are added to the queue.
Because these unacked messages have already been prefetched by existing consumers, they will not be consumed by new consumers unless you requeue the unacked messages.

Resiliently processing messages from RabbitMQ

I'm not sure how to resiliently handle RabbitMQ messages in the event of an intermittent outage.
I subscribe in a Windows service, read the message, then store it in my database. If I can't process the record because of the data, I publish it to a dead letter queue for a human to address and reprocess.
I am not sure what to do if I have some intermittent technical issue that will fix itself (database reboot, network outage, drive space, etc.). I don't want hundreds of messages showing up in the dead letter queue that only needed to wait out a glitch but would now be waiting on a human.
Currently, I re-queue the event and retry it once, but it retries so fast the issue is not usually resolved. I thought of retrying forever but I don't want a real issue to get stuck in an infinite loop.
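One common way to get the behaviour described (retry with increasing waits, then give up) is bounded exponential backoff. This is not taken from the answer below; it is a minimal Python sketch, assuming your handler raises one of two hypothetical exception types, TransientError for glitches that may fix themselves and PermanentError for bad data. The sleep function is injectable so the schedule can be observed without waiting.

```python
import time


class TransientError(Exception):
    """Hypothetical: a glitch that may fix itself (DB reboot, network blip)."""


class PermanentError(Exception):
    """Hypothetical: the message data itself cannot be processed."""


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    # Exponential backoff: base, base*factor, base*factor**2, ... capped at `cap`.
    return [min(cap, base * factor ** i) for i in range(attempts)]


def process_with_retry(handler, message, delays, sleep=time.sleep):
    """Return 'ok', or 'dead-letter' once retries are exhausted or data is bad."""
    for attempt in range(len(delays) + 1):
        try:
            handler(message)
            return "ok"
        except PermanentError:
            return "dead-letter"      # bad data: a human needs to look at it
        except TransientError:
            if attempt == len(delays):
                break                 # out of retries
            sleep(delays[attempt])    # wait longer before each new attempt
    return "dead-letter"


failures = {"left": 2}


def flaky(msg):
    # Fails twice (e.g. database restarting), then succeeds.
    if failures["left"] > 0:
        failures["left"] -= 1
        raise TransientError("database restarting")


def always_down(msg):
    raise TransientError("network outage")


slept = []
result = process_with_retry(flaky, "m1", backoff_delays(attempts=4), sleep=slept.append)
print(result, slept)  # ok [1.0, 2.0]

slept2 = []
result2 = process_with_retry(always_down, "m2", backoff_delays(attempts=4), sleep=slept2.append)
print(result2, slept2)  # dead-letter [1.0, 2.0, 4.0, 8.0]
```

Bounding the number of attempts avoids the infinite loop, while the growing delays give a transient outage time to clear. In RabbitMQ specifically, sleeping inside the consumer keeps the message unacked and occupies that consumer; a frequently used alternative is a separate retry queue whose messages carry a TTL (x-message-ttl) and whose dead-letter exchange (x-dead-letter-exchange) routes them back to the main queue.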
It is a broad topic, but on the server side you can publish persistent messages and make your queues durable; this means that if the server is restarted, they won't be lost. See more here: How to persist messages during RabbitMQ broker restart?
For the consumer (client), it depends on how you configure your client. From the docs:
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
Check more here: https://www.rabbitmq.com/reliability.html#consumer
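The last point in the quoted docs (only pay for deduplication when the redelivered flag is set) can be sketched as below. This is a minimal in-memory model, assuming the publisher sets a unique message id on every message; Delivery and DedupingConsumer are hypothetical names. With pika, the corresponding fields would be method.redelivered on the Basic.Deliver frame and properties.message_id, but the set of processed ids here is only a stand-in for something durable, such as a table written in the same transaction as the business data.

```python
from dataclasses import dataclass


@dataclass
class Delivery:
    message_id: str    # assumption: publisher sets a unique id per message
    body: str
    redelivered: bool  # set by the broker when the message was requeued


class DedupingConsumer:
    def __init__(self):
        # Stand-in for durable storage; in practice record the id atomically
        # with the work itself, or a crash between the two reintroduces duplicates.
        self.processed_ids = set()
        self.results = []

    def handle(self, d):
        # redelivered=False guarantees the message was never seen, so the
        # (possibly expensive) duplicate check is only done when it is True.
        if d.redelivered and d.message_id in self.processed_ids:
            return "skipped"
        self.results.append(d.body)
        self.processed_ids.add(d.message_id)
        return "processed"


c = DedupingConsumer()
outcomes = [
    c.handle(Delivery("42", "insert row", redelivered=False)),
    c.handle(Delivery("42", "insert row", redelivered=True)),  # requeued after a dropped connection
    c.handle(Delivery("43", "other row", redelivered=True)),   # flagged, but never actually processed
]
print(outcomes)  # ['processed', 'skipped', 'processed']
```

The third delivery shows why the flag is only a hint: it was redelivered, yet it had never been processed, so it must still be handled.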

RabbitMQ delivery throttle

So I'm testing RabbitMQ on a single node. Plain and simple:
One producer sends messages to the queue,
Multiple consumers take tasks from that queue.
Currently the consumers process thousands of messages per second; they are too fast, so I need them to slow down. Managing consumer-side throttling is not possible due to the unreliable nature of the network.
Collectively, the consumers must not take more than 10 messages per second from that queue.
Is there a way to configure RabbitMQ so that the queue dispatches a maximum of 10 messages per second?
If I remember correctly, once RabbitMQ has delivered a message to the queue, it's up to the consumers to consume it. There are consumer clients in many different languages, and you haven't mentioned a specific one, so I'm giving a generic answer.
In my understanding, you shouldn't try to impose any restrictions on RabbitMQ itself; instead, consider implementing a pool of message consumers on the client side that can handle no more than X messages simultaneously. Alternatively, you can use some kind of semaphore in the handler itself rather than on the RabbitMQ server.
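The semaphore variant can be sketched in Python with the standard library. This is an illustration under assumptions: ConcurrencyLimiter and slow_handler are hypothetical names, the limit of 3 stands in for X, and the threads stand in for deliveries arriving from whatever client library you use. The in-flight counters exist only to show that the cap holds.

```python
import threading
import time

MAX_IN_FLIGHT = 3  # hypothetical value for "X"


class ConcurrencyLimiter:
    """Run handlers, but never more than `limit` at the same time."""

    def __init__(self, limit):
        self._sem = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0

    def run(self, handler, message):
        with self._sem:  # blocks while `limit` handlers are already running
            with self._lock:
                self.in_flight += 1
                self.peak = max(self.peak, self.in_flight)
            try:
                handler(message)
            finally:
                with self._lock:
                    self.in_flight -= 1


def slow_handler(message):
    time.sleep(0.01)  # stands in for real work


limiter = ConcurrencyLimiter(MAX_IN_FLIGHT)
threads = [threading.Thread(target=limiter.run, args=(slow_handler, i))
           for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(limiter.peak)  # never more than MAX_IN_FLIGHT
```

Note that a cap on concurrent handlers limits throughput only indirectly (roughly X divided by the time each handler takes), so it is not the same as a hard 10 messages per second. The sketch also covers a single process; a limit shared by several consumer processes would need state they all share.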

How can I get data from RabbitMQ? I don't want to consume it from the queue

Is there a tool that can view the data in a queue? I just want to know what data is in the queue, but I don't want to consume it. The web UI and REST API only show counts; I want details.
How can I use Mnesia to query a queue's data, like with a MySQL client?
There are a few options:
Firehose
You may consider firehose feature
https://www.rabbitmq.com/firehose.html
RabbitMQ has a "firehose" feature, where the administrator can enable (on a per-node, per-vhost basis) an exchange to which publish- and delivery-notifications should be CCed.
rabbitmq_tracing plugin
https://www.rabbitmq.com/plugins.html
Second queue
Just set up your exchange so that it delivers messages to two queues. One queue is for actual business processing. The second queue is for debugging purposes only. Reading messages from the second queue will consume them. For that debug queue you may enable a reasonable TTL and/or a queue length limit. Otherwise, unconsumed messages will eventually eat all disk space.
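A toy Python model of the "second queue" setup follows (SimQueue and FanoutExchange are hypothetical names; timestamps are plain numbers passed in explicitly). A fanout exchange gives each bound queue its own copy, and the debug queue applies a length limit that drops the oldest message, which is RabbitMQ's default overflow behaviour, plus a per-message TTL. On a real broker the equivalent settings are the queue arguments x-max-length and x-message-ttl (milliseconds), e.g. in pika channel.queue_declare(queue="debug", arguments={"x-message-ttl": 60000, "x-max-length": 1000}), where the queue name and numbers are only examples.

```python
from collections import deque


class SimQueue:
    """Toy queue with optional max length (oldest dropped first) and TTL."""

    def __init__(self, max_length=None, ttl=None):
        self.items = deque()  # (enqueued_at, body), oldest at the left
        self.max_length = max_length
        self.ttl = ttl

    def publish(self, body, now):
        self.items.append((now, body))
        if self.max_length is not None:
            while len(self.items) > self.max_length:
                self.items.popleft()  # drop the oldest message

    def expire(self, now):
        # Remove messages that have lived at least `ttl` time units.
        if self.ttl is not None:
            while self.items and now - self.items[0][0] >= self.ttl:
                self.items.popleft()

    def bodies(self):
        return [body for _, body in self.items]


class FanoutExchange:
    def __init__(self, *queues):
        self.queues = queues

    def publish(self, body, now):
        for q in self.queues:
            q.publish(body, now)  # every bound queue gets its own copy


work = SimQueue()                       # business processing: no limits
debug = SimQueue(max_length=3, ttl=60)  # debugging only: bounded
exchange = FanoutExchange(work, debug)
for i in range(5):
    exchange.publish(f"msg-{i}", now=i * 10)

debug.expire(now=85)
print(work.bodies())   # ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4']
print(debug.bodies())  # ['msg-3', 'msg-4']
```

Consuming from the debug copy leaves the work queue untouched, and the two limits keep the debug queue from growing without bound.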
Consume and re-send
You may consume a message (to see it) and immediately re-send the same message to the same queue. The RabbitMQ management GUI has this option. Note that this will change the order of the messages.
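Why re-sending changes the order can be shown with a plain deque standing in for the queue; peek_by_republish is a hypothetical helper, and this models re-publishing to the tail, not any particular client or GUI call.

```python
from collections import deque


def peek_by_republish(queue, count):
    """Consume up to `count` messages to look at them, then publish each back."""
    seen = []
    for _ in range(min(count, len(queue))):
        msg = queue.popleft()  # consume from the head
        seen.append(msg)
        queue.append(msg)      # re-publish: it lands at the tail
    return seen


q = deque(["a", "b", "c", "d"])
seen = peek_by_republish(q, 2)
print(seen)     # ['a', 'b']
print(list(q))  # ['c', 'd', 'a', 'b']
```

The inspected messages now sit behind ones that arrived after them.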

ActiveMQ KahaDB Persistence Store Full

I am using ActiveMQ 5.4 with KahaDB as message store.
While publishing messages (with persistence set to true) to a topic that has a durable subscriber, the persistence store keeps growing even though the messages are dispatched to the subscriber. This is causing an issue, as the message store gets full and stops accepting messages.
So my question is: why is the persistence store not discarding the messages in KahaDB even though the messages are being dispatched?
Regards,
Srinivas
What you are seeing is an interaction between the ActiveMQ message store behaviour and that for durable subscriptions on topics.
When you have durable subscriptions, a topic is treated like a queue for each subscriber's clientId (set on the Connection). The logic being that the client doesn't want to miss any messages when they disconnect. So if they disconnect, the durable subscription hangs around and keeps the messages alive.
The AMQ message store uses data logs for its message journal. These are written sequentially and never actually removed from (that would require random access). There is a second file which keeps track of which messages have been consumed. Once all the messages in a data file have been consumed, that file is deleted.
So what you're seeing is that some of the messages in the data files are not being consumed by these durable subscriptions and just hang around. Not using clientIds consistently for durable subscribers would cause this issue. It's likely that something is wrong with the way the feature is being used; using JMX to inspect the subscriptions on the broker should help you track down the root cause.
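The data-file behaviour described above can be illustrated with a toy Python model. It is not KahaDB's implementation; Journal, append, ack and gc are hypothetical names, and the pending map plays the role of the second file that tracks which messages are still owed to a subscription.

```python
class Journal:
    """Toy journal: messages go into fixed-size data files, and a file is
    deleted only when no message in it is still owed to a subscriber."""

    def __init__(self, messages_per_file):
        self.messages_per_file = messages_per_file
        self.files = []    # each data file is a list of message ids
        self.pending = {}  # message id -> subscribers that have not acked it

    def append(self, msg_id, subscribers):
        if not self.files or len(self.files[-1]) == self.messages_per_file:
            self.files.append([])  # start a new data file
        self.files[-1].append(msg_id)
        self.pending[msg_id] = set(subscribers)

    def ack(self, msg_id, subscriber):
        self.pending[msg_id].discard(subscriber)

    def gc(self):
        # Keep a file if any message in it is still pending; delete the rest.
        self.files = [f for f in self.files if any(self.pending[m] for m in f)]
        return len(self.files)


j = Journal(messages_per_file=3)
for m in range(9):
    j.append(m, {"sub-A"})
for m in range(9):
    if m != 4:  # message 4 is never acknowledged by the subscription
        j.ack(m, "sub-A")

remaining = j.gc()
print(remaining)  # 1
print(j.files)    # [[3, 4, 5]]
```

A single unacknowledged message keeps its whole data file alive, which is how a stale durable subscription makes the store grow.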
As a general rule, whenever you think that you might want to use a durable subscription, use virtual topics instead - they are much easier to reason about, inspect and load balance. On the other hand if you just want to get the last couple of messages when you reconnect a topic subscriber rather than all the messages you may have missed, use retroactive consumers.
An easy way around this issue is to always use a time to live when you send a message; pretty much every use case has a time limit by which a message ought to be consumed anyway. ActiveMQ will expire messages beyond this point and free up the messages in the data files for deletion.
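Continuing the same toy picture, a short Python sketch shows why a time to live helps: an expired message no longer has to be acknowledged before its data file can go. deletable_files is a hypothetical helper and the expiry times are made-up numbers. In JMS, the TTL itself is set on the sending side, for example with MessageProducer.setTimeToLive(long), in milliseconds.

```python
def deletable_files(files, acked, expires_at, now):
    """Return indexes of data files whose messages are all released.

    files: list of lists of message ids
    acked: ids acknowledged by every subscription that needs them
    expires_at: message id -> absolute expiry time (missing = never expires)
    """
    def released(m):
        exp = expires_at.get(m)
        return m in acked or (exp is not None and now >= exp)

    return [i for i, f in enumerate(files) if all(released(m) for m in f)]


files = [[0, 1, 2], [3, 4, 5]]
acked = {0, 1, 2, 3, 5}  # message 4 is stuck with an offline durable subscriber

no_ttl = {}
with_ttl = {m: 100 for m in range(6)}  # every message sent with a TTL ending at t=100

before = deletable_files(files, acked, no_ttl, now=200)
after = deletable_files(files, acked, with_ttl, now=200)
print(before)  # [0]
print(after)   # [0, 1]
```

Without a TTL, the second file waits forever on message 4; with one, it becomes deletable once the message expires.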