I've been tasked with investigating why the db-*.log files are not clearing.
From extensive searching, everything points to messages still being on the queue. I've looked in hawtio at the queues and at all the configured topics, and the queue size is zero.
As I understand it, the Enqueue size and Dequeue size should in theory be the same, but they're not: my Dequeue size is 0.
I've looked at the topics and there's no operation to purge them.
I'd like to be able to clear out all messages so that the kahadb logs will disappear.
I think you have put your finger on a weakness of ActiveMQ itself: it cannot guarantee that consumers are strict about consuming (and acknowledging) their messages.
We have similar problems with our ActiveMQ (5.10.7). KahaDB seems to suffer from something like "disk fragmentation", and we noticed that this can come from at least two issues with consumers:
Case 1: Slow consumer
Our system has a consumer that cannot consume many messages at once. If even one unconsumed message remains in a KahaDB page, KahaDB keeps the whole page, including all the other messages that have already been consumed and acknowledged.
To prevent the KahaDB storage from reaching 100% (which slows down the producers), we transfer the messages to a temporary queue on another ActiveMQ instance, like this:
from("activemqPROD:queue:BIG_QUEUE_UNCONSUMED")
.to("activemqTEMP:queue:TEMP_BIG_QUEUE");
and then push them back:
from("activemqTEMP:queue:TEMP_BIG_QUEUE")
.to("activemqPROD:queue:BIG_QUEUE_UNCONSUMED");
The alternative is to store them on the file system and then reload them, but then you lose the JMS (and custom) headers. With the temporary queue solution you keep all headers.
Case 2: Consumer that never acknowledges
Sometimes, even after the previous operation, and even when all the unconsumed queues are empty, the storage stays above 0%.
Looking into the KahaDB files, we can see that pages are still present even though no messages remain in any of the QUEUES.
For the TOPICS, we stopped using durable subscriptions, so their storage should also stay at 0%.
The probable cause (this is a supposition, but we are fairly confident) is that some of the consumed messages were never acknowledged properly.
We think this is the cause because the logs still show messages like
"not removing data file: 12345 as contained ack(s) refer to referenced file: [12344, 12345]"
This can happen, for example, when a consumer disconnects abruptly (it consumed some messages but disconnected before sending the ack).
In our case the messages never expire, so that could also contribute to this problem. However, it is not clear whether setting an expiration would get rid of "non-acked" messages.
Because we do not want to lose any events, there is no expiration time for these specific queues.
From your question, it looks like you are in the second case, so our solution is:
1. Make sure no producers or consumers are still connecting to ActiveMQ
2. Make sure all queues and durable topics are empty
3. Delete all files in the KahaDB storage (from the file system)
4. Restart ActiveMQ (fresh)
Unfortunately we did not find a better way to manage these cases; if someone has a better alternative, we would be happy to hear it.
This article also suggests some solutions (like setting an expiry policy for the ActiveMQ.DLQ queue).
Add this log config to log4j.properties. Then you can see exactly what is holding on to the KahaDB files in kahadb.log.
log4j.appender.kahadb=org.apache.log4j.RollingFileAppender
log4j.appender.kahadb.file=${activemq.base}/data/kahadb.log
log4j.appender.kahadb.maxFileSize=1024KB
log4j.appender.kahadb.maxBackupIndex=5
log4j.appender.kahadb.append=true
log4j.appender.kahadb.layout=org.apache.log4j.PatternLayout
log4j.appender.kahadb.layout.ConversionPattern=%d [%-15.15t] %-5p %-30.30c{1} - %m%n
log4j.logger.org.apache.activemq.store.kahadb.MessageDatabase=TRACE, kahadb
As an alternative: once you've found out which queue is keeping the log files alive, you could map it to its own KahaDB instance, as described here: http://activemq.apache.org/kahadb.html
In my app (multiple instances), we occasionally see the connection between my app and RabbitMQ get lost due to network issues (my app and RabbitMQ are both alive). After the connection is re-established, we receive the messages that were unacked.
This creates an issue for us: my app wasn't dead and is still processing the message it received before, but now the message is redelivered, which causes the app to process it again (which can be fatal for us).
Since the app has multiple instances, it is not easy for an instance to check whether another instance is processing the same message at the same time. We can't simply filter out redelivered messages, because we need this feature to handle instance/app crashes and re-deployments.
There doesn't seem to be an API to tell RabbitMQ when not to redeliver unacked messages.
So what is the recommended practice for handling this situation?
Thanks,
The general solution for such a scenario is to make the consumers handle messages in an idempotent manner. What I generally do, in case there is no unique identifier in the message body, is add an attribute idempotencyId (a GUID) to the message body on the producer side. On the consumer side, this id is validated for each message against the values stored in a database, and any duplicates are rejected.
This approach also works for messages that might be shoveled from another cluster, and when multiple consumer instances are listening in the same cluster: in both cases it guarantees one-time processing.
I would suggest going over the RabbitMQ Reliability Guide here
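To make the idempotencyId idea concrete, here is a minimal in-memory sketch. The class name, the method names and the HashSet standing in for the database are all assumptions made for illustration; with several consumer instances you would need a shared store with an atomic "insert if absent" (for example a unique constraint) instead.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch of an idempotent consumer (all names are illustrative).
class IdempotentConsumer {
    // Stand-in for the database of already-seen ids (an assumption).
    private final Set<String> seenIds = new HashSet<>();
    private int processed = 0;

    /** Returns true if the message was processed, false if it was a duplicate. */
    synchronized boolean handle(String idempotencyId, String body) {
        // Set.add returns false when the id was already recorded.
        if (!seenIds.add(idempotencyId)) {
            return false; // duplicate: reject without processing
        }
        processed++; // real business logic would run here
        return true;
    }

    synchronized int processedCount() {
        return processed;
    }

    public static void main(String[] args) {
        IdempotentConsumer consumer = new IdempotentConsumer();
        String id = UUID.randomUUID().toString(); // GUID added by the producer
        System.out.println(consumer.handle(id, "payload")); // first delivery
        System.out.println(consumer.handle(id, "payload")); // redelivery of the same message
    }
}
```

Note that this sketch records the id before doing the work. If the work can fail, you would probably want to remove the id again (or record it in the same transaction as the work) so that a later redelivery is not rejected.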
Yeah, exactly-once delivery is not something RabbitMQ is good at. In fact, I'd say you should probably not be using it for these kinds of problems. Honestly, the only way to truly fix this is to use distributed transactions or locking.
Anyway, you could turn the problem on its head by ack'ing the message as soon as the consumer gets it, before it starts working on it. That would avoid the RabbitMQ-related duplication issue at least. This is at-most-once delivery.
Of course, it means that if the consumer crashes, the message is lost forever. So you need to persist the message right before you ack it so that you can recover it later, and the consumer should remove it once the work is complete.
Considering that crashes are rare, you can then have a single dedicated process that just works on those persisted messages. Or for that matter, handle them manually.
Just be aware that you are pushing the duplication problem in front of you, because the consumer might fail to remove the persisted message after it's done working with it anyway, but at least you have the option to implement it however you want.
Storage in this case could be anything from files to an RDBMS, or something like ZooKeeper or Redis to lock/unlock in-flight messages.
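The persist, ack, work, remove sequence above could look roughly like this. It is only a sketch: the class and method names are made up, the map stands in for whatever durable storage you pick, and the "crash" is simulated with a flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: persist, then ack, then work, then remove.
class AckEarlyConsumer {
    // Stand-in for durable storage (files, RDBMS, Redis, ...); an assumption.
    private final Map<String, String> inFlight = new LinkedHashMap<>();
    // Ids acked to the broker; a real client would call its ack API instead.
    private final List<String> acked = new ArrayList<>();

    void onDelivery(String messageId, String body, boolean crashWhileWorking) {
        inFlight.put(messageId, body); // 1. persist first
        acked.add(messageId);          // 2. ack: the broker will not redeliver
        if (crashWhileWorking) {
            return;                    // simulated crash: the entry stays in the store
        }
        process(body);                 // 3. do the work
        inFlight.remove(messageId);    // 4. remove once the work is complete
    }

    /** What the dedicated recovery process (or a human) would find after a crash. */
    Map<String, String> leftovers() {
        return new LinkedHashMap<>(inFlight);
    }

    int ackedCount() {
        return acked.size();
    }

    private void process(String body) {
        // business logic placeholder
    }

    public static void main(String[] args) {
        AckEarlyConsumer consumer = new AckEarlyConsumer();
        consumer.onDelivery("m1", "first", false);
        consumer.onDelivery("m2", "second", true);
        System.out.println(consumer.leftovers()); // messages that still need recovery
    }
}
```

If the consumer dies between steps 3 and 4, the entry is still in the store even though the work was done, which is exactly the "pushing the duplication problem in front of you" caveat mentioned above.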
Consider a group chat scenario where 4 clients connect to a topic on an exchange. Each of these clients sends messages to the topic and receives messages from it.
Now imagine that a 5th client comes in and wants to read everything that was sent from the beginning of time (as in, since the topic was first created and connected to).
Is there a built-in functionality in RabbitMQ to support this?
Many thanks,
Edit:
For clarification, what I'm really asking is whether or not RabbitMQ supports SOW, since I was unable to find it anywhere in the documentation (http://devnull.crankuptheamps.com/documentation/html/develop/configuration/html/chapters/sow.html).
Specifically, the question is: is there a way for RabbitMQ to output all messages that have been sent to a topic when a new subscriber joins?
The short answer is no.
The long answer is maybe. If all potential "participants" are known up-front, the participant queues can be set up and configured in advance, subscribed to the topic, and will collect all messages published to the topic (matching the routing key) while the server is running. Additional server configurations can yield queues that persist across server reboots.
Note that the original question/feature request as-described is inconsistent with RabbitMQ's architecture. RabbitMQ is supposed to be a transient storage node, where clients connect and disconnect at random. Messages dumped into queues are intended to be processed by only one message consumer, and once processed, the message broker's job is to forget about the message.
One other way of implementing such functionality is to have an audit queue: all published messages are also routed to this queue, and a writer service writes them all to an audit log somewhere (usually a persistent data store or a text file). This would be something you would have to build, as there is currently no plug-in to automatically send messages out to persistent storage (e.g. Couchbase, Elasticsearch).
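As a rough idea of what the writer service's storage side could look like, here is a sketch that appends each audited message to a text file and replays the file for a late joiner. The class name and the one-message-per-line format are assumptions for illustration (messages containing line breaks would need escaping), and the part that actually consumes from the audit queue is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical audit log: one message per line in an append-only text file.
class AuditLog {
    private final Path file;

    AuditLog(Path file) {
        this.file = file;
    }

    /** Called by the writer service for every message taken from the audit queue. */
    void append(String message) {
        try {
            Files.write(file, Collections.singletonList(message), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Called when a new client wants everything published so far, in order. */
    List<String> replay() {
        if (!Files.exists(file)) {
            return new ArrayList<>();
        }
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        AuditLog log = new AuditLog(Files.createTempFile("audit", ".log"));
        log.append("alice: hi");
        log.append("bob: hello");
        System.out.println(log.replay());
    }
}
```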
Alternatively, if used as a debug tool, there is the Firehose plug-in. This is satisfactory when you are able to manually enable/disable it, but is not a good long-term solution as it will turn itself off upon any interruption of the broker.
What you would like to do is not a correct usage of RabbitMQ. Message queues are not databases. They are not long-term persistence solutions like an RDBMS is. You can mainly use RabbitMQ as a buffer for incoming messages, which get inserted into the database after the consumer handles them. When a new client connects to your service, it reads from the database, not from the message queue.
Relevant
Also, unless you are building a really big, highly scalable system, I doubt you actually need RabbitMQ.
Apache Kafka is the right solution for this use case. "Log compaction enabled topics", a.k.a. compacted topics, are specifically designed for it. The catch is that your messages have to be idempotent, with strictly no delta business: each message must carry the full state for its key, because Kafka compacts from time to time and may retain only the last message for a given "key".
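To see why deltas don't survive compaction, here is a tiny model of what a fully compacted topic ends up holding. This is not Kafka code; the class is invented for illustration and it compacts eagerly on every publish, whereas real compaction runs in the background and older records can remain visible for a while.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the end state of a compacted topic: last value per key.
class CompactedTopic {
    private final Map<String, String> latestByKey = new LinkedHashMap<>();

    void publish(String key, String value) {
        // Remove first so the re-published key moves to the end, like a newer offset.
        latestByKey.remove(key);
        latestByKey.put(key, value);
    }

    /** What a consumer starting from the beginning would read after compaction. */
    Map<String, String> snapshot() {
        return new LinkedHashMap<>(latestByKey);
    }

    public static void main(String[] args) {
        CompactedTopic topic = new CompactedTopic();
        topic.publish("alice", "status=online");
        topic.publish("bob", "status=online");
        topic.publish("alice", "status=away"); // replaces alice's earlier record
        System.out.println(topic.snapshot());
    }
}
```

If the values were deltas such as "+5" instead of full state, everything except the last delta per key would be gone, and a late joiner could not rebuild the totals.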
Before a consumer nacks a message, is there any way the consumer can modify the message's state so that, when it consumes the message upon redelivery, it sees that changed state? I'd rather not reject + re-enqueue a new message, but please let me know if that's the only way to accomplish this.
My goal is to determine how many times specific messages are being redelivered. I see two ways of doing this:
(1) On the message itself as described above. The message would be a container of basic stats and the application payload message.
(2) In some external storage. We would uniquely identify the message by the message id that we set.
I know 2 is possible, but my question is if 1 is possible.
There is no way to do (1) like you want. You would need to change the message, and the message would then become another message. If you want to do something like that (and it's possible that this is what you meant with "I'd rather not reject + reenqueue new message"), you should ACK the message, increment one field in it and publish it again (again, maybe this is what you meant when you said reenqueue it). So your message payload would have some ID, a counter, and the (obviously different) payload that is the actual content.
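The "ACK, increment, publish again" variant could be sketched like this. Everything here is hypothetical: the Envelope class is one possible shape for the ID + counter + payload container, and an in-memory queue stands in for the broker (polling it plays the role of deliver + ACK).

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of "ACK, increment a counter, publish again".
class RepublishWithCounter {
    /** Container message: id, attempt counter and the application payload. */
    static final class Envelope {
        final String id;
        final int attempts;
        final String payload;

        Envelope(String id, int attempts, String payload) {
            this.id = id;
            this.attempts = attempts;
            this.payload = payload;
        }

        Envelope nextAttempt() {
            return new Envelope(id, attempts + 1, payload);
        }
    }

    // Stand-in for the broker queue (an assumption, not a RabbitMQ API).
    private final Queue<Envelope> queue = new ArrayDeque<>();

    void publish(Envelope envelope) {
        queue.add(envelope);
    }

    /** Models delivery followed by an ACK: the message leaves the queue. */
    Envelope consume() {
        return queue.poll();
    }

    /** Instead of NACKing, publish a NEW message with the counter incremented. */
    void retry(Envelope consumed) {
        publish(consumed.nextAttempt());
    }

    public static void main(String[] args) {
        RepublishWithCounter broker = new RepublishWithCounter();
        broker.publish(new Envelope("m1", 0, "hello"));
        broker.retry(broker.consume());
        System.out.println(broker.consume().attempts);
    }
}
```

The weak spot is the gap between the ACK and the republish: if the consumer dies there, the message is gone.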
Definitely the much better way is (2), for multiple reasons:
- it does not interfere with business logic, that is, this diagnostic part is isolated
- you are leaving re-queueing to RabbitMQ (as you are supposed to do), meaning that you are not worrying about losing messages or handling message meta info that has no use for your business logic
- ACKing and NACKing are used the way they are supposed to be used, which is why they are in the AMQP specification
- since you do need to know how many times specific messages have been redelivered, you have that number somewhere externally, meaning that it's independent of (RabbitMQ's) message persistence, lifetime, queue durability, mirroring, etc.
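For (2), the external bookkeeping can be very small. A minimal sketch, assuming the message id you set yourself is available on every delivery; the class name is made up and the ConcurrentHashMap is a stand-in for the real external storage:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical external counter of deliveries per message id.
class RedeliveryCounter {
    // Stand-in for external storage (database, Redis, ...); an assumption.
    private final Map<String, Integer> deliveries = new ConcurrentHashMap<>();

    /** Call on every delivery; returns how many times this id was REdelivered so far. */
    int recordDelivery(String messageId) {
        int total = deliveries.merge(messageId, 1, Integer::sum);
        return total - 1; // the first delivery is not a redelivery
    }

    /** Redeliveries seen so far for this id (0 if the id was never delivered). */
    int redeliveries(String messageId) {
        return Math.max(0, deliveries.getOrDefault(messageId, 0) - 1);
    }

    public static void main(String[] args) {
        RedeliveryCounter counter = new RedeliveryCounter();
        counter.recordDelivery("m1"); // first delivery
        counter.recordDelivery("m1"); // after a NACK and requeue
        System.out.println(counter.redeliveries("m1"));
    }
}
```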
Even though this question was marked as solved some time ago, I want to mention that there is a way to do this, at least for the redelivery count. It may have been added after the original answer was written. RabbitMQ has a different type of queue called quorum queues.
Quorum queues offer the option to set a redelivery limit:
Quorum queues support poison message handling via a redelivery limit. This feature is currently unique to Quorum queues.
To achieve this, RabbitMQ counts the number of deliveries in a message header. The header attribute is called x-delivery-count.
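On the consumer side you could read that header for diagnostics. The sketch below deliberately works on a plain Map so that it does not depend on any client library; the class and method names are made up, and treating a missing header as zero is my assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for reading the x-delivery-count header from a header map.
class DeliveryCountHeader {
    static final String HEADER = "x-delivery-count";

    /** Returns the delivery count from the headers, or 0 if the header is absent. */
    static long deliveryCount(Map<String, Object> headers) {
        Object value = headers == null ? null : headers.get(HEADER);
        // The numeric type may vary, so accept any Number.
        return (value instanceof Number) ? ((Number) value).longValue() : 0L;
    }

    /** True when the message has been delivered more often than we want to tolerate. */
    static boolean looksPoisonous(Map<String, Object> headers, long limit) {
        return deliveryCount(headers) > limit;
    }

    public static void main(String[] args) {
        Map<String, Object> headers = new HashMap<>();
        headers.put(HEADER, 4L);
        System.out.println(deliveryCount(headers));
        System.out.println(looksPoisonous(headers, 3));
    }
}
```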
Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
From what I read, it seems that all actions other than publishes go only to the master, and the master then broadcasts the effect of the actions to the slaves (this is from the documentation). From my understanding, this means a consumer will always consume messages from the master queue. Also, if I send a request to a slave for consuming a message, that slave will do an extra hop by going to the master to fetch that message.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
It seems there are so many extra hops when dealing with slaves, so it seems you could have a better performance if you know only the master. But how do you handle master failure? Then one of the slaves will be elected master, so you have to know where to connect to?
I'm asking all of this because we are using a RabbitMQ cluster with HAProxy in front, so that we can decouple the cluster structure from our apps. This way, whenever a node goes down, HAProxy will redirect to the living nodes. But we have problems when we kill one of the rabbit nodes. The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in these cases, otherwise you will lose them.
Even with all of this, messages can still be lost, because they may be in transit when I kill a node (in some buffers, somewhere on the network, etc.). So you have to use transactions or publisher confirms, which guarantee delivery once all the mirrors have received the message. But here is another issue: you may get duplicate messages, because the broker might have sent a confirmation that never reached the producer (due to network failures, etc.). Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
Is there a way of avoiding this? Or do I have to decide whether I would rather lose a couple of messages or get some messages duplicated?
Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node?
This blog outlines exactly what happens.
But what happens when I publish to a slave node? Will this node do the same thing of sending first the message to the master?
The message will be redirected to the master Queue - that is, the node on which the Queue was created.
But how do you handle master failure? Then one of the slaves will be elected master, so you have to know where to connect to?
Again, this is covered here. Essentially, you need a separate service that polls RabbitMQ and determines whether nodes are alive or not. RabbitMQ provides a management API for this. Your publishing and consuming applications need to refer to this service, either directly or through a mutual data store, in order to determine the correct node to publish to or consume from.
The connection to rabbit is permanent, so if it fails, you have to recreate it. Also, you have to resend the messages in these cases, otherwise you will lose them.
You need to subscribe to connection-interrupted events to react to severed connections. You will need to build in some level of redundancy on the client in order to ensure that messages are not lost. I suggest, as above, that you introduce a service specifically designed to interrogate RabbitMQ. Your client can attempt to publish a message to the last known active connection, and should this fail, the client might ask the monitor service for an up-to-date listing of the RabbitMQ cluster. Assuming that there is at least one active node, the client may then establish a connection to it and publish the message successfully.
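The publish-with-fallback flow could be sketched as follows. The monitor service and the transport are modelled as plain functional interfaces, so nothing here is a real RabbitMQ API; the names and signatures are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Supplier;

// Hypothetical sketch of client-side failover using a monitor service.
class FailoverPublisher {
    private final Supplier<List<String>> monitor;   // assumed monitor service: lists live nodes
    private final BiPredicate<String, String> send; // assumed transport: (node, message) -> delivered?
    private String lastKnownNode;

    FailoverPublisher(String initialNode, Supplier<List<String>> monitor,
                      BiPredicate<String, String> send) {
        this.lastKnownNode = initialNode;
        this.monitor = monitor;
        this.send = send;
    }

    /** Returns true once some node accepted the message, false if none did. */
    boolean publish(String message) {
        String failed = lastKnownNode;
        if (failed != null && send.test(failed, message)) {
            return true; // the last known connection still works
        }
        for (String node : monitor.get()) { // ask for an up-to-date listing
            if (node.equals(failed)) {
                continue; // already tried
            }
            if (send.test(node, message)) {
                lastKnownNode = node;
                return true;
            }
        }
        return false; // caller should keep the message and retry later
    }

    String lastKnownNode() {
        return lastKnownNode;
    }

    public static void main(String[] args) {
        FailoverPublisher publisher = new FailoverPublisher("rabbit1",
                () -> Arrays.asList("rabbit2", "rabbit3"),
                (node, message) -> !node.equals("rabbit1")); // pretend rabbit1 is down
        System.out.println(publisher.publish("hello") + " via " + publisher.lastKnownNode());
    }
}
```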
Even with all of this, messages can still be lost, because they may be in transit when I kill a node
There are certain edge cases that you can't cover with redundancy, and neither can RabbitMQ. For example, when a message lands in a Queue, the HA policy invokes a background process to copy the message to a backup node. During this process there is potential for the message to be lost before it is persisted to the backup node: should the active node fail at that moment, the message will be lost for good. There is nothing that can be done about this. Unfortunately, when we get down to the level of actual bytes travelling across the wire, there's a limit to the number of safeguards that we can build.
Therefore consumer applications will need to perform deduplication or handle incoming messages in an idempotent manner.
You can handle this a number of ways. For example, setting the message-ttl to a relatively low value will ensure that duplicated messages don't remain on the Queue for extended periods of time. You can also tag each message with a unique reference, and check that reference at the consumer level. Of course, this would require storing a cache of processed messages to compare incoming messages against; the idea being that if a previously processed message arrives, its tag will have been cached by the consumer, and the message can be ignored.
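A consumer-side cache of processed tags might look like the sketch below. I've made it bounded so that it cannot grow forever; the class name and the capacity-based eviction are my own assumptions, and the trade-off is that a duplicate arriving after its tag has been evicted will be processed again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache of already-processed message tags.
class ProcessedTagCache {
    private final Map<String, Boolean> seen;

    ProcessedTagCache(final int capacity) {
        // Insertion-ordered map that drops its oldest entry once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time a tag is seen, false for a (remembered) duplicate. */
    synchronized boolean firstTimeSeen(String tag) {
        return seen.putIfAbsent(tag, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        ProcessedTagCache cache = new ProcessedTagCache(1000);
        System.out.println(cache.firstTimeSeen("msg-42")); // process it
        System.out.println(cache.firstTimeSeen("msg-42")); // ignore the duplicate
    }
}
```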
One thing that I'd stress with AMQP and Queue-based solutions in general is that your infrastructure provides the tools, but not the entire solution. You have to bridge those gaps based on your business needs. Often, the best solution is derived through trial and error. I hope my suggestions are of use. I blog about a number of RabbitMQ design solutions, including the issues you mentioned, here if you're interested.
I am using ActiveMQ 5.4 with KahaDB as message store.
While publishing messages (with persistence set to true) to a topic that has a durable subscriber, the persistence store keeps growing even though the messages are dispatched to the subscriber. This causes an issue, as the message store fills up and stops accepting any more messages.
So my question is: why is the persistence store not discarding the messages in KahaDB, even though the messages are being dispatched?
Regards,
Srinivas
What you are seeing is an interaction between the ActiveMQ message store behaviour and that for durable subscriptions on topics.
When you have durable subscriptions, a topic is treated like a queue for each subscriber's clientId (set on the Connection). The logic being that the client doesn't want to miss any messages when they disconnect. So if they disconnect, the durable subscription hangs around and keeps the messages alive.
The AMQ message store uses data logs for its message journal. These are written sequentially, and messages are never actually removed from them (that would require random access). There is a second file which keeps track of which messages have been consumed. Once all the messages in a data file have been consumed, that file is deleted.
So what you're seeing is that some of the messages in the data file are not being consumed by these durable subscriptions and just hang around. ClientIds not being used consistently by durable subscribers would cause this issue. It's likely that there is something wrong with the way the feature is being used; using JMX to inspect the subscriptions on the broker should help you track down the root cause.
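If it helps, the "a data file is only deleted once everything in it has been consumed" rule can be modelled in a few lines. This is a toy model with invented names, not the store's actual code, and it ignores other reasons a file may be kept (such as acks that refer to other files).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of journal data files and when they become deletable.
class JournalModel {
    // data file id -> ids of messages in that file that are not yet consumed
    private final Map<Integer, Set<String>> pendingByFile = new TreeMap<>();

    /** A message is appended to a data file. */
    void write(int fileId, String messageId) {
        pendingByFile.computeIfAbsent(fileId, id -> new HashSet<>()).add(messageId);
    }

    /** A message is consumed; it stays in the file but is marked as done. */
    void consume(int fileId, String messageId) {
        Set<String> pending = pendingByFile.get(fileId);
        if (pending != null) {
            pending.remove(messageId);
        }
    }

    /** Only files with no pending messages can be deleted. */
    Set<Integer> deletableFiles() {
        Set<Integer> deletable = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> entry : pendingByFile.entrySet()) {
            if (entry.getValue().isEmpty()) {
                deletable.add(entry.getKey());
            }
        }
        return deletable;
    }

    public static void main(String[] args) {
        JournalModel journal = new JournalModel();
        journal.write(1, "a");
        journal.write(1, "b");
        journal.write(2, "c");
        journal.consume(1, "a");
        journal.consume(2, "c");
        // File 1 is pinned by the single unconsumed message "b".
        System.out.println(journal.deletableFiles());
    }
}
```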
As a general rule, whenever you think that you might want to use a durable subscription, use virtual topics instead - they are much easier to reason about, inspect and load balance. On the other hand if you just want to get the last couple of messages when you reconnect a topic subscriber rather than all the messages you may have missed, use retroactive consumers.
An easy way to get around this issue is to always use a time to live when you send a message - pretty much every use case has a time limit of when a message ought to be consumed by anyway. ActiveMQ will expire messages beyond this point, and free up the messages in the data files for deletion.