How do you replay KahaDB message archives? - activemq

In the ActiveMQ KahaDB documentation, it mentions that you can archive KahaDB data files so they can be replayed later if needed. Yet, through some searching and looking through the documentation and the draft copy of ActiveMQ in Action, I can't find any examples or clues about how to actually replay those files.
I'm hoping someone out there can point me in the right direction on what needs to be done to actually perform a replay.

KahaDB only replays messages/events when a broker is started, in order to return the broker to the state it was in before it was stopped (recovering persistent messages, etc.)
It does not retain historical messages to be replayed on demand. Once a message is dequeued successfully, it's removed from the KahaDB data files.
If you have a requirement to copy messages for auditing/reuse, then I suggest looking into something like mirrored queues or the Camel wire-tap pattern.
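If the wire-tap route fits your case, a minimal Camel sketch could look like the one below. This is just an illustration: the broker URL and the incoming.orders / incoming.orders.archive / orders.process queue names are placeholders I made up, not anything prescribed by ActiveMQ.

```java
import org.apache.activemq.ActiveMQConnectionFactory;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.component.jms.JmsComponent;
import org.apache.camel.impl.DefaultCamelContext;

public class WireTapArchive {
    public static void main(String[] args) throws Exception {
        DefaultCamelContext context = new DefaultCamelContext();
        // Connect the Camel JMS component to the broker (URL is a placeholder).
        context.addComponent("jms", JmsComponent.jmsComponentAutoAcknowledge(
                new ActiveMQConnectionFactory("tcp://localhost:61616")));

        context.addRoutes(new RouteBuilder() {
            @Override
            public void configure() {
                // Copy every message to an archive queue before normal processing;
                // the archived copies can later be consumed again ("replayed") on demand.
                from("jms:queue:incoming.orders")
                    .wireTap("jms:queue:incoming.orders.archive")
                    .to("jms:queue:orders.process");
            }
        });

        context.start();
        Thread.sleep(60_000);   // keep the route running for the demo
        context.stop();
    }
}
```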

Related

The strange behavior of `delete-after` attribute of dynamic shovel

I was exploring the shovel plugin for moving messages from source queues to temporary queues as part of a bigger use case. I was creating a dynamic shovel for each queue to move its messages to the temporary queue, and deleting the dynamic shovel via the attribute "delete-after": "queue-length". I could see in the RabbitMQ Management console (Admin -> Shovel Status) that the dynamic shovel had been deleted successfully, but the source/temporary queues' state was still running.
The issue was that when new messages arrived on the source queues, they were automatically moved to the temporary queues even though there was no consumer on the source queue.
Note:
Both the source and temporary queues are durable.
Messages are persistent (delivery mode: 2).
The operation was performed in parallel, as there are hundreds of queues: I was creating a dynamic shovel for each queue and then deleting it.
When I remove the dynamic shovel using the DELETE HTTP API instead of the above approach, it works perfectly, but I want to avoid making an extra HTTP call, as the number of source queues is in the hundreds.
The delete-after attribute was deprecated and renamed to src-delete-after quite a while back. RabbitMQ v3.7.x still supports the delete-after attribute, but support was removed in v3.8.x (up to v3.8.3) and then brought back in v3.8.4.
https://github.com/rabbitmq/rabbitmq-shovel/issues/72
Thanks to Michael
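For reference, on v3.8.4+ the same dynamic shovel can be declared over the HTTP API with the renamed attribute, roughly as sketched below. The host, credentials, vhost, shovel name and queue names are all placeholders, and the only functional change from the setup in the question is src-delete-after in place of the deprecated delete-after.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class CreateShovel {
    public static void main(String[] args) throws Exception {
        // All names, URIs and credentials below are placeholders.
        // Text block requires Java 15+.
        String body = """
            {"value": {
                "src-protocol":     "amqp091",
                "src-uri":          "amqp://localhost",
                "src-queue":        "source.queue.1",
                "dest-protocol":    "amqp091",
                "dest-uri":         "amqp://localhost",
                "dest-queue":       "temp.queue.1",
                "src-delete-after": "queue-length"
            }}""";

        String auth = Base64.getEncoder().encodeToString("guest:guest".getBytes());
        HttpRequest request = HttpRequest.newBuilder()
                // PUT /api/parameters/shovel/{vhost}/{shovel-name}
                .uri(URI.create("http://localhost:15672/api/parameters/shovel/%2F/move-queue-1"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
    }
}
```

The same JSON value can also be set with rabbitmqctl set_parameter shovel <name> '<json>' if you'd rather not go through HTTP at all.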

Artemis vs Activemq 5 message store

In ActiveMQ 5, each queue had a folder containing its data and messages, everything.
That meant that, in case of an issue, for example an out-of-disk-space error, some files would get corrupted before the server crashed. In that case, in ActiveMQ 5, we would find logs indicating corrupted files, and we could delete the corrupted queue's folder, resulting in a small loss of messages instead of ALL messages.
In Artemis, it seems that messages are stored in the same files, independently of the queue they belong to. This means that if I get an out-of-disk-space error, I might have to delete all my messages.
First, can you confirm this change of behaviour, and secondly, is there a way to recover? As a bonus, if anyone knows why this change was made, I would like to understand.
Artemis uses a completely new message journal implementation compared to 5.x. The same journal is used for all messages. However, it isn't subject to the same corruption problems you've seen with 5.x: if records from the journal can't be processed, they are simply skipped.
If you get an out-of-disk-space error you should never need to delete all your messages. The journal files themselves are allocated and filled with zeroes to meet their configured size before they are actually used, so if you were going to run out of disk space you'd do so during that process, before any messages were written to them.
The Artemis journal implementation was written from the ground up for high performance specifically in conjunction with the broker's non-blocking architecture.
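For what it's worth, both the journal file size and a disk-usage limit are configurable. The embedded-broker sketch below is only meant to show where those knobs sit; the directory, 10 MiB file size and 90% threshold are arbitrary example values, and on a standalone broker you would set the corresponding elements in broker.xml instead.

```java
import org.apache.activemq.artemis.core.config.Configuration;
import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
import org.apache.activemq.artemis.core.server.embedded.EmbeddedActiveMQ;

public class JournalConfigSketch {
    public static void main(String[] args) throws Exception {
        Configuration config = new ConfigurationImpl()
                .setPersistenceEnabled(true)
                .setSecurityEnabled(false)
                .setJournalDirectory("data/journal")     // placeholder path
                .setJournalFileSize(10 * 1024 * 1024)    // journal files preallocated to 10 MiB
                .setMaxDiskUsage(90);                    // stop accepting writes past 90% disk usage
        config.addAcceptorConfiguration("in-vm", "vm://0");

        EmbeddedActiveMQ broker = new EmbeddedActiveMQ();
        broker.setConfiguration(config);
        broker.start();
        // ... use the broker ...
        broker.stop();
    }
}
```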

Any Alternatives to Purging Active MQ message queues?

I am new to ActiveMQ, but sometimes the queues are not being processed and keep piling up. Is it good practice to purge? Isn't there some other solution that would save me from keeping all my messages around for reprocessing, apart from purging? I really don't want to lose the queues. Is this possible?
The correct way to deal with this is to set an expiration on messages so that after a given time the broker can discard them. Letting messages just pile into queues without regard to their lifetime will lead you into all sorts of problems, most notably storage.
You need to develop a strategy for how long the messages should live so that the broker can start getting rid of them once they are no longer of use. If you don't do that, then purging the queue is your only option.
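For illustration, the simplest place to set that expiration is a time-to-live on the JMS producer. In the sketch below (the broker URL and queue name are made-up placeholders) every message sent lives for at most one hour before the broker may discard it.

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ExpiringProducer {
    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616"); // placeholder broker URL
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("orders");                // placeholder queue name

            MessageProducer producer = session.createProducer(queue);
            // Anything not consumed within an hour is expired by the broker
            // instead of piling up on the queue forever.
            producer.setTimeToLive(60 * 60 * 1000L);

            TextMessage message = session.createTextMessage("example payload");
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```

Keep in mind that, by default, ActiveMQ moves expired persistent messages to a dead-letter queue, so you'll likely want a policy for that as well.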

Using a Message Broker for database replication (currently RabbitMQ)

When my system's data changes, I publish every single change to at least 4 different consumers (around 3000 messages a second), so I want to use a message broker.
Most of the consumers are responsible for updating their database tables with the change.
(The DBs are different - Couch, MySQL, etc. - so solutions such as using their own replication mechanisms or DB triggers are not possible.)
Questions:
Does anyone have experience with data replication between DBs using a message broker?
Is it good practice?
What do I do in case of failures?
Let's say, using RabbitMQ, the client removed 10,000 messages from the queue, acked, and threw an exception each time before handling them. Now they are lost. Is there a way to go back in the queue?
(Re-queueing them will mess up their order.)
Is using RabbitMQ good practice? Isn't the ability to go back in the queue, as in Kafka, important for failure scenarios?
Thanks.
I don't have experience with DB replication using message brokers, but maybe this can help put you on the right track:
2. What do I do in case of failures?
Let's say, using RabbitMQ, the client removed 10,000 messages from the queue, acked, and threw an exception each time before handling them. Now they are lost. Is there a way to go back in the queue?
You can use dead-lettering to avoid losing messages. I'd suggest not acking until you are sure the consumers have processed the messages successfully, unless it is a long-running task. In case of failure, use basic.reject instead of basic.ack to send them to a dead-letter queue. You have a medium throughput, so you have to be careful with that.
However, the order is not guaranteed. You'll need to implement a manual mechanism to recover them in the order they were published, maybe by using message headers with some sort of timestamp or id mechanism, so you can re-process them in the correct order.
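A minimal sketch of that idea with the RabbitMQ Java client is below; all of the exchange/queue names and the handler are placeholders I invented. The work queue is declared with a dead-letter exchange, the consumer acks only after the database update succeeds, and rejects without requeue on failure so the message is parked instead of lost.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.util.HashMap;
import java.util.Map;

public class DeadLetterConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                        // placeholder host
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        // Failed messages get parked here instead of being lost.
        channel.exchangeDeclare("dlx", "fanout", true);
        channel.queueDeclare("replication.dlq", true, false, false, null);
        channel.queueBind("replication.dlq", "dlx", "");

        // The work queue routes rejected messages to the dead-letter exchange.
        Map<String, Object> queueArgs = new HashMap<>();
        queueArgs.put("x-dead-letter-exchange", "dlx");
        channel.queueDeclare("replication.work", true, false, false, queueArgs);

        DeliverCallback onDelivery = (consumerTag, delivery) -> {
            long tag = delivery.getEnvelope().getDeliveryTag();
            try {
                applyChangeToDatabase(new String(delivery.getBody())); // hypothetical handler
            } catch (Exception e) {
                channel.basicReject(tag, false); // no requeue -> message goes to the DLQ
                return;
            }
            channel.basicAck(tag, false);        // ack only after successful processing
        };
        // autoAck=false so nothing counts as delivered until we ack explicitly.
        channel.basicConsume("replication.work", false, onDelivery, consumerTag -> { });
    }

    private static void applyChangeToDatabase(String payload) {
        // placeholder for the real database update
    }
}
```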

Need advice on suitable message queue for Storm spout

I'm developing a prototype Lambda system and my data is streaming in via Flume to HDFS. I also need to get the data into Storm. Flume is a push system and Storm is more of a pull system, so I don't believe it's wise to try to connect a spout to Flume directly; rather, I think there should be a message queue between the two. Again, this is a prototype, so I'm looking for best practices, not perfection. I'm thinking of putting an AMQP-compliant queue as a Flume sink and then pulling the messages from a spout.
Is this a good approach? If so, I want to use a message queue that has relatively robust support in both the Flume world (as a sink) and the Storm world (as a spout). If I go AMQP, then I assume that gives me the option to use whatever AMQP-compliant queue I want, correct? Thanks.
If you're going to use AMQP, I'd recommend sticking to the finalized 1.0 version of the AMQP spec. Otherwise, you're going to feel some pain when you try to upgrade to it from previous versions.
Your approach makes a lot of sense, but for us the AMQP-compliance issue looked a little less important. I will try to explain why.
We are using Kafka to get data into Storm. The main reasons are performance and usability. AMQP-compliant queues do not seem to be designed for holding information for a considerable time, while with Kafka this is just a configuration setting. This allows us to keep messages for a long time and to "play back" those messages easily (as the position we wish to consume from is always controlled by the consumer, we can consume the same messages again and again without needing to set up an entire system for that purpose). Also, Kafka's performance is incomparable to anything I have seen.
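That playback really is just the consumer choosing its own offset. Here is a rough sketch with the Kafka consumer API (the bootstrap address, group id and topic name are assumptions on my part) that rewinds to the start of the retained log and reads everything again:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("group.id", "replay-demo");                // placeholder group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // placeholder topic
            consumer.poll(Duration.ofSeconds(1));                    // join the group, get partitions
            // Rewind to the start of the retained log and consume everything again.
            consumer.seekToBeginning(consumer.assignment());
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```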
Storm has a very useful KafkaSpout (there is a wiring sketch after this list), for which the main things to pay attention to are:
Error reporting - there is still some room for improvement there; messages are not as clear as one would hope.
It depends on ZooKeeper (which is already there if you have Storm), and a path has to be created manually for it.
Depending on the Storm version, pay attention to the Kafka version in use. It is documented, but can easily be missed and cause unclear problems.
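For completeness, wiring that spout into a topology with the older ZooKeeper-based storm-kafka module looks roughly like the sketch below; the ZooKeeper address, topic, offset root path and consumer id are placeholder values (the root path is the ZooKeeper path mentioned above).

```java
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaTopologySketch {
    public static void main(String[] args) {
        // ZooKeeper hosts used by Kafka (and reused by the spout to store offsets).
        BrokerHosts hosts = new ZkHosts("localhost:2181");

        // Topic, ZooKeeper root path for offsets, and consumer id are placeholders.
        SpoutConfig spoutConfig =
                new SpoutConfig(hosts, "flume-events", "/kafka-spout", "storm-consumer");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        // ... add bolts and submit the topology as usual ...
    }
}
```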
You can have the data streamed to a broker topic first. Then Flume and a Storm spout can both consume from that topic. Flume has a JMS source that makes it easy to consume from the message broker, and there is a Storm JMS spout to get the messages into Storm.