ActiveMQ KahaDB Persistence Store Full

I am using ActiveMQ 5.4 with KahaDB as the message store.
While publishing messages (with persistence true) to a topic that has a durable subscriber, the persistence store keeps growing even though the messages are dispatched to the subscriber. This is causing an issue: the message store is getting full and not accepting any more messages.
So my question is: why is the persistence store not discarding the messages in KahaDB, even though the messages are getting dispatched?
Regards,
Srinivas

What you are seeing is an interaction between the ActiveMQ message store behaviour and durable subscriptions on topics.
When you have durable subscriptions, a topic is treated like a queue for each subscriber's clientId (set on the Connection). The logic is that the client doesn't want to miss any messages while it is disconnected, so if it disconnects, the durable subscription hangs around and keeps the messages alive.
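For reference, here is a minimal sketch of how such a subscription is created with the JMS 1.1 API used by ActiveMQ 5.x; the broker URL, clientId, topic and subscription name are placeholders:

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class DurableSubscriberExample {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        // The clientId + subscription name pair identifies the durable
        // subscription; while it is offline, the broker retains its messages.
        connection.setClientID("billing-app");
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("PRICES");
        MessageConsumer consumer = session.createDurableSubscriber(topic, "billing-sub");
        connection.start();
        Message message = consumer.receive(5000); // wait up to 5 seconds
        System.out.println("Received: " + message);
        connection.close();
    }
}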
The AMQ message store uses data logs for its message journal. These are written sequentially and never modified in place (that would require random access). A second file keeps track of which messages have been consumed; once every message in a data file has been consumed, that file is deleted.
So what you're seeing is that some of the messages in the data files are not being consumed by these durable subscriptions and just hang around, keeping the files alive. Inconsistent use of clientIds by durable subscribers is a common cause of this issue. It's likely that something is wrong with the way the feature is being used; if you use JMX to inspect the subscriptions on the broker, that should help you track down the root cause.
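As a sketch of that JMX inspection (the JMX URL is the default for a local broker; the object name format shown is for ActiveMQ 5.8+, while older releases such as 5.4 use org.apache.activemq:BrokerName=localhost,Type=Broker instead):

import java.util.Arrays;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListDurableSubscriptions {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName broker = new ObjectName(
                "org.apache.activemq:type=Broker,brokerName=localhost");
            // Offline durable subscriptions are the usual culprits
            // keeping KahaDB data files alive.
            ObjectName[] inactive = (ObjectName[]) mbs.getAttribute(
                broker, "InactiveDurableTopicSubscribers");
            Arrays.stream(inactive).forEach(System.out::println);
        }
    }
}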
As a general rule, whenever you think you might want a durable subscription, use virtual topics instead - they are much easier to reason about, inspect and load-balance. If, on the other hand, you just want the last couple of messages when a topic subscriber reconnects rather than everything it missed, use retroactive consumers.
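With virtual topics, out of the box a producer publishes to a topic whose name starts with VirtualTopic., and each consuming application reads its own copy from a Consumer.<app>.<topic> queue; the topic and application names below are made up:

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class VirtualTopicExample {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        connection.start();

        // Each consuming application reads from its own queue; the broker
        // fans a copy of every topic message out to each Consumer.* queue.
        // Competing consumers on the same queue give you load balancing.
        MessageConsumer consumer = session.createConsumer(
            session.createQueue("Consumer.Billing.VirtualTopic.Orders"));

        // Producers publish to the virtual topic as a normal topic.
        MessageProducer producer =
            session.createProducer(session.createTopic("VirtualTopic.Orders"));
        producer.send(session.createTextMessage("order-123"));

        System.out.println("Received: " + consumer.receive(5000));
        connection.close();
    }
}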
An easy way to get around this issue is to always set a time to live when you send a message - pretty much every use case has a time limit by which a message ought to be consumed anyway. ActiveMQ will expire messages beyond that point and free up the messages in the data files for deletion.
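Setting the TTL is one line on the JMS producer; the durations below are just examples:

import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class TtlProducerExample {
    public static void main(String[] args) throws JMSException {
        ConnectionFactory factory =
            new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer =
            session.createProducer(session.createTopic("PRICES"));

        // Default TTL (milliseconds) for everything this producer sends.
        producer.setTimeToLive(24 * 60 * 60 * 1000L); // 24 hours

        // Or override per message with the four-argument send().
        producer.send(session.createTextMessage("quote"),
                      DeliveryMode.PERSISTENT,
                      Message.DEFAULT_PRIORITY,
                      60 * 60 * 1000L); // this one expires after 1 hour
        connection.close();
    }
}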

Related

How to enqueue old messages into a new queue in RabbitMQ Exchange?

I have a topic exchange that only routes messages to a queue named payments.
Somewhere in the future, I will decide to add another queue, payment_analyze, to analyze all old and new messages that have been enqueued.
Durable exchanges and queues survive RabbitMQ restarts, and persistent messages get written to disk, but when binding a new queue to an old durable exchange, old messages are not redirected (only new ones are).
From my understanding, this is the intended behavior as exchanges do not store messages and only act as a "proxy"
How do I achieve this?
Possible Solution
Creating a queue named parking and adding every enqueued message to it; whenever a new queue is added, consume messages from parking without acknowledging them, to keep the new queue "semi" up to date.
Even though you've configured persistent messages on the payments queue, this just means messages will survive a broker restart - once a message has been consumed and acknowledged, it is removed.
If you know you're going to need the payment_analyze queue at some point in the future, is it viable to just create this queue/binding upfront and route messages to both payment_analyze and payments? Messages on payment_analyze will bank up until you're ready to start consuming them. Note: if you're producing a large number of messages, this approach might result in storage issues...
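A sketch of that upfront declaration with the RabbitMQ Java client; the exchange name and routing key are made up, the queue names come from the question:

import com.rabbitmq.client.BuiltinExchangeType;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class DeclareAnalyzeQueueUpfront {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // placeholder host
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // Durable topic exchange and durable queues survive broker restarts.
            channel.exchangeDeclare("payments_exchange", BuiltinExchangeType.TOPIC, true);
            channel.queueDeclare("payments", true, false, false, null);
            channel.queueDeclare("payment_analyze", true, false, false, null);
            // Bind both queues with the same pattern so every new message is
            // copied to both; payment_analyze just banks them up for later.
            channel.queueBind("payments", "payments_exchange", "payments.#");
            channel.queueBind("payment_analyze", "payments_exchange", "payments.#");
        }
    }
}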
As an alternative, you could write the messages to BLOB storage (or some other data store) as part of your payments queue consumer (or a different queue/consumer altogether), and then when you're ready to introduce the payment_analyze queue, you could write a script to read all the old messages from BLOB storage and send them to the RabbitMQ exchange. With 'topic' exchanges - see here - you can probably be clever with wildcards and routing keys in your queue bindings to ensure both old messages (from BLOB storage) and new messages are routed to the payment_analyze queue, but only new messages are routed to the payments queue (so that your payments queue consumer does not reprocess old messages).
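Continuing the hypothetical declarations above, that routing-key trick could look like this - new messages published with payments.new, replayed ones with payments.replay (both keys made up):

// New messages are published with routing key "payments.new", replayed
// messages from BLOB storage with "payments.replay".
channel.queueBind("payments", "payments_exchange", "payments.new");
// payment_analyze matches both old and new via the '#' wildcard:
channel.queueBind("payment_analyze", "payments_exchange", "payments.#");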
Another option (assuming you're not overly invested in RabbitMQ) could be to consider Apache Kafka instead which deals with this scenario quite nicely as messages aren't automatically removed from a partition once they've been processed by a subscriber.
Anyways, just a few options to consider...

Resiliently processing messages from RabbitMQ

I'm not sure how to resiliently handle RabbitMQ messages in the event of an intermittent outage.
I subscribe in a Windows service, read the message, then store it in my database. If I can't process the record because of the data, I publish it to a dead-letter queue for a human to address and reprocess.
I am not sure what to do if I have some intermittent technical issue that will fix itself (database reboot, network outage, drive space, etc.). I don't want hundreds of messages showing up in the dead-letter queue that just needed to wait out a glitch but are now waiting on a human.
Currently, I re-queue the event and retry it once, but it retries so fast that the issue is usually not yet resolved. I thought of retrying forever, but I don't want a real issue to get stuck in an infinite loop.
This is a broad topic, but from the server side you can persist your messages and make your queues durable; this means that in the event the server gets restarted, they won't be lost. Check more here: How to persist messages during RabbitMQ broker restart?
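Concretely, durability is a flag on the queue declaration and persistence a flag on each published message; a minimal sketch with the RabbitMQ Java client, where the host and queue name are placeholders:

import java.nio.charset.StandardCharsets;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

public class PersistentPublish {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            // durable=true: the queue definition survives a broker restart.
            channel.queueDeclare("work", true, false, false, null);
            // PERSISTENT_TEXT_PLAIN sets deliveryMode=2, so the message body
            // is written to disk. Both flags are needed for a message to
            // survive a restart.
            channel.basicPublish("", "work",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello".getBytes(StandardCharsets.UTF_8));
        }
    }
}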
For the consumer (client), it will depend on how you configure your client. From the docs:
In the event of network failure (or a node crashing), messages can be duplicated, and consumers must be prepared to handle them. If possible, the simplest way to handle this is to ensure that your consumers handle messages in an idempotent way rather than explicitly deal with deduplication.
If a message is delivered to a consumer and then requeued (because it was not acknowledged before the consumer connection dropped, for example) then RabbitMQ will set the redelivered flag on it when it is delivered again (whether to the same consumer or a different one). This is a hint that a consumer may have seen this message before (although that's not guaranteed, the message may have made it out of the broker but not into a consumer before the connection dropped). Conversely if the redelivered flag is not set then it is guaranteed that the message has not been seen before. Therefore if a consumer finds it more expensive to deduplicate messages or process them in an idempotent manner, it can do this only for messages with the redelivered flag set.
Check more here: https://www.rabbitmq.com/reliability.html#consumer
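A sketch of a consumer that acts on that redelivered hint with manual acks, paying the deduplication cost only for flagged messages; the queue name and handler are hypothetical:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class IdempotentConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        DeliverCallback onDeliver = (consumerTag, delivery) -> {
            // The broker sets this flag only when the message may have been
            // delivered before; unflagged messages are guaranteed fresh.
            if (delivery.getEnvelope().isRedeliver()) {
                System.out.println("possible duplicate, check dedup store first");
                // e.g. look up a processed-ids table here (hypothetical step)
            }
            // process(delivery.getBody()) ... then ack only on success; an
            // unacked message is requeued if this consumer's connection drops.
            channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
        };
        channel.basicConsume("work", false /* manual acks */, onDeliver,
                             consumerTag -> { /* consumer cancelled */ });
    }
}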

Read all messages from the very beginning

Consider a group chat scenario where 4 clients connect to a topic on an exchange. These clients each send and receive messages via this topic.
Now imagine that a 5th client comes in and wants to read everything that was sent from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW since I was unable to find it on the documentations anywhere (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).
Specifically, the question is: is there a way for RabbitMQ to output all messages having been sent to a topic upon a new subscriber joining?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up-front, the participant queues can be set up and configured in advance, subscribed to the topic, and will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configurations can yield queues that persist across server reboots.
Note that the original question/feature request as-described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages dumped into queues are intended to be processed by only one message consumer, and once processed, the message broker's job is to forget about the message.
One other way of implementing such functionality is to have an audit queue, to which all published messages are routed, and a writer service that writes them all to an audit log somewhere (usually a persistent data store or text file). This would be something you would have to build, as there is currently no plug-in to automatically send messages out to persistent storage (e.g. Couchbase, Elasticsearch).
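A sketch of the audit wiring, reusing the RabbitMQ Java client Channel API from the earlier examples; the exchange and queue names are made up:

// Hypothetical audit setup: a durable queue bound with the '#' wildcard
// receives a copy of every message published to the topic exchange.
channel.exchangeDeclare("chat", BuiltinExchangeType.TOPIC, true);
channel.queueDeclare("audit", true, false, false, null);
channel.queueBind("audit", "chat", "#"); // match every routing key
// A single writer service consumes "audit" and appends each message
// to a persistent store (file, database, etc.).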
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.
What you would like to do is not a correct usage of RabbitMQ. Message queues are not databases; they are not long-term persistence solutions the way an RDBMS is. You can mainly use RabbitMQ as a buffer for incoming messages which, after a consumer handles them, get inserted into the database. When a new client connects to your service, the database is read, not the message queue.
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.
Apache Kafka is the right solution for this use case. "Log compaction enabled topics", a.k.a. compacted topics, are specifically designed for it. But the catch is, obviously, that your messages have to be idempotent - strictly no delta-business - because Kafka will compact from time to time and may retain only the last message for a given "key".
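Creating such a topic programmatically, sketched with the Kafka AdminClient; the broker address, topic name, partition and replication counts are placeholders:

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps at least the latest record per key
            // instead of deleting by age, so late joiners can rebuild state.
            NewTopic topic = new NewTopic("chat-state", 1, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}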

ActiveMQ - Kahadb log files will not clear

I've been tasked with investigating why the db-*.log files are not clearing.
From what I have found through vast searching, everything points to the messages still being on the queue. I've looked in hawtio at the queues on all the configured topics, and the queue size is zero.
From my understanding, the enqueue size and dequeue size should in theory be the same, but they're not - my dequeue size is 0.
I've looked at the topics and there's no operation to purge them.
I'd like to be able to clear out all messages so that the kahadb logs will disappear.
I think you've hit on one weakness of ActiveMQ itself: it cannot guarantee that consumers are really strict when consuming messages.
We have similar problems with our ActiveMQ (5.10.7), because KahaDB seems to suffer from something like "disk fragmentation", and we noticed this could come from at least two issues with consumers:
Case 1: Slow consumer
We have in our system a consumer which cannot consume many messages at once. If even one unconsumed message stays in a KahaDB data file, the whole file is kept (with all the other messages in it which are already consumed and acknowledged).
To prevent the KahaDB storage from reaching 100% (which would slow the producers), we transfer the messages to a temporary queue on another ActiveMQ instance, with a Camel route like this:
from("activemqPROD:queue:BIG_QUEUE_UNCONSUMED")
.to("activemqTEMP:queue:TEMP_BIG_QUEUE");
then push them back:
from("activemqTEMP:queue:TEMP_BIG_QUEUE")
.to("activemqPROD:queue:BIG_QUEUE_UNCONSUMED");
The alternative is to store them on the file system and then reload them, but you lose the JMS (and custom) headers. With the temporary-queue solution you keep all the headers.
Case 2: Consumers that never acknowledge
Sometimes, even after we perform the operation above and all unconsumed queues are empty, the storage stays above 0%.
Looking into the KahaDB files, we can see that pages are still present even though there are no more messages in any queue.
For the topics, we stopped using durable subscriptions, so the storage should also stay at 0%.
The probable cause (this is a supposition, but one we hold with strong confidence) is that some of the consumed messages were never acknowledged properly.
The reason we think this is the cause is that in the logs we can still see messages like:
"not removing data file: 12345 as contained ack(s) refer to referenced file: [12344, 12345]"
This can happen, for example, when a consumer disconnects abruptly (it consumed some messages but disconnected before sending the ack).
In our case the messages never expire, so this could also be a contributing factor; however, it is not clear whether setting an expiration would destroy non-acked messages.
Because we do not want to lose any event, there is no expiration time on these specific queues.
According to your question, it looks like you are in the second case, so our solution is:
Be sure no producers or consumers are still connected to the ActiveMQ broker
Be sure all queues and durable topics are empty
Delete all files in the KahaDB storage (on the file system)
Restart ActiveMQ (fresh)
Unfortunately we did not find a better way to deal with these cases; if someone has a better alternative, we would be happy to hear about it.
This article can also give you some solutions (like setting an expiry policy for the ActiveMQ.DLQ queue).
Add this log config to log4j.properties; then you can see in kahadb.log exactly what is holding the KahaDB files:
log4j.appender.kahadb=org.apache.log4j.RollingFileAppender
log4j.appender.kahadb.file=${activemq.base}/data/kahadb.log
log4j.appender.kahadb.maxFileSize=1024KB
log4j.appender.kahadb.maxBackupIndex=5
log4j.appender.kahadb.append=true
log4j.appender.kahadb.layout=org.apache.log4j.PatternLayout
log4j.appender.kahadb.layout.ConversionPattern=%d [%-15.15t] %-5p %-30.30c{1} - %m%n
log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE, kahadb
As an alternative: once you've found out which queue is keeping the log files alive, you could map it to its own KahaDB instance as described here: http://activemq.apache.org/kahadb.html
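That per-destination split uses the mKahaDB persistence adapter in the broker XML; a sketch based on the docs, where the queue name and journal size are placeholders:

<persistenceAdapter>
  <mKahaDB directory="${activemq.base}/data/kahadb">
    <filteredPersistenceAdapters>
      <!-- The problem destination gets its own journal, so its unacked
           messages no longer pin data files shared with other queues. -->
      <filteredKahaDB queue="BIG.PROBLEM.QUEUE">
        <persistenceAdapter>
          <kahaDB journalMaxFileLength="32mb"/>
        </persistenceAdapter>
      </filteredKahaDB>
      <!-- Catch-all store for everything else. -->
      <filteredKahaDB>
        <persistenceAdapter>
          <kahaDB/>
        </persistenceAdapter>
      </filteredKahaDB>
    </filteredPersistenceAdapters>
  </mKahaDB>
</persistenceAdapter>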

Persistent message queue with at-least-once delivery

I have been looking at message queues (currently deciding between Kafka and RabbitMQ) for one of my projects, and these are the biggest must-have features.
Must-have features
Messages in queues should be persistent (but only until they are processed successfully by consumers).
Messages in queues should be removed only when downstream consumers are able to process the message successfully. Basically, a consumer should ACK that it processed a message successfully.
Good-to-have features
To increase throughput, consumers should be able to pull a batch of messages from the queue.
If you go with Kafka, it only retains messages for a configurable duration of time, after which the messages are discarded to free up space, whether they were consumed or not.
It is simply the responsibility of the Kafka consumers to keep track of what has been consumed.
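Within that retention window, though, consumer-tracked offsets give you the ACK semantics you describe: commit the offset only after processing succeeds. A sketch, where the broker address, group id, topic and handler are placeholders:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "processors");
        props.put("enable.auto.commit", "false"); // commit only after processing
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));
            while (true) {
                // poll() naturally returns a batch, covering the throughput wish.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // hypothetical handler
                }
                // Committing after the batch is the "ACK": on a crash before
                // this line, the batch is re-read - at-least-once delivery.
                consumer.commitSync();
            }
        }
    }

    static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}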
IMHO, if you require the messages to be persisted forever, then consider using a different storage medium (a database, maybe).