NServiceBus Message Replay Archive Architecture

I'm building an application that needs to preserve copies of the messages it is sending so that I can replay all the messages at a later stage. This is necessary because the processing of the messages will change dramatically over the course of development, but the data must be captured ASAP since it is real-time observation data. I can't seem to find any built-in functionality that directly addresses this, and while I could write a custom tool to persist the data, that seems to contradict the purpose of using NServiceBus in the first place. Some options I'm considering:
1. Use the ForwardReceivedMessagesTo functionality of the Target bus to create an Archive queue, and build a simple application which uses this Archive queue as an input queue, simply forwarding messages onto the Target bus whenever the Replayer tool runs. This does clear the Archive queue, requiring it to be backed up first using the mqbkup utility, but that can be automated as part of the replay process. Alternatively, using two alternating Archive queues (one taking in new messages, and one for replaying) would solve this.
2. Use a publish/subscribe model and have an Archiver subscribe to the Target queue, placing each message in an Archive queue. A Replayer tool similar to the one above could use the Archive queue as an input queue and forward the messages to the Target. This would also clear the Archive queue, requiring one of the solutions above.
3. The MassTransit people mention something called BusDriver that allows copying messages between queues, but I can't find anything more about it.
My primary concern is to choose the approach that is least likely to lose data, since once an observation is made it can never be made again outside of a narrow time window. This seems like it should be a common problem, and yet I can't seem to find a straightforward solution to it. Suggestions?
Update: I've decided to go with a journalled Target queue. I'll have an Archiver use the journal as an input and store messages to a database (could just be file-based), as well as allow for replaying messages from that database to the Target queue. While it would be possible to write a tool that copies messages from the journal queue to the target queue, the real problem - from a practical perspective - is that of managing the journal queue: it can't be backed up easily (mqbkup brings down the MSMQ service, which is unacceptable), and operating non-destructively on the queue requires me to write an MSMQ-based tool when I'd rather stick to the NServiceBus level of abstraction. Ultimately, MSMQ is a transport and not a store of messages, so it needs to be treated as such.
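The store-and-replay half of that design is transport-agnostic, so it can be sketched independently of NServiceBus. The following is a minimal Python illustration (not NServiceBus code; the table layout, file name, and the send_to_target hook are all hypothetical) of an Archiver that persists journal messages to a local database and a Replayer that forwards them back to the Target queue:

    import sqlite3
    import time

    # Hypothetical archive store: one row per observation message, keyed by the
    # original message id, so a replay can preserve ordering by capture time.
    def init_store(path="archive.db"):
        db = sqlite3.connect(path)
        db.execute("""CREATE TABLE IF NOT EXISTS archive (
                          message_id TEXT PRIMARY KEY,
                          captured_at REAL NOT NULL,
                          body BLOB NOT NULL)""")
        return db

    def archive_message(db, message_id, body):
        # Called by the Archiver for every message read from the journal queue.
        db.execute("INSERT OR IGNORE INTO archive VALUES (?, ?, ?)",
                   (message_id, time.time(), body))
        db.commit()

    def replay(db, send_to_target):
        # Re-send every archived message to the Target queue, oldest first.
        # send_to_target is a placeholder for whatever transport-level send
        # the Replayer actually uses (e.g. an NServiceBus endpoint send).
        for message_id, body in db.execute(
                "SELECT message_id, body FROM archive ORDER BY captured_at"):
            send_to_target(message_id, body)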

The first option seems viable. I would add that NSB comes with a tool named ReturnToSourceQueue.exe that will replay messages for you. You could turn on journalling to keep messages, but something would have to clean that up since it doesn't roll. We have NetApp and we've used its SnapMirror to back up queue data; I'm sure there are similar things out there.

Regarding option 3, here is the BusDriver command-line reference:
BusDriver is a command-line utility used to administer queues, service bus instances and other things related to MassTransit.
Command-Line Reference
busdriver.exe [verb] [-option:value] [--switch]
help, --help  Displays help

count         Counts the number of messages in the specified queue
  -uri        The URI of the queue

peek          Displays the body of messages in the queue without removing the messages
  -uri        The URI of the queue
  -count      The number of messages to display

move          Move messages from one queue to another
  -from       The URI of the source queue
  -to         The URI of the destination queue
  -count      The number of messages to move

requeue       Requeue messages from one queue to another
  -uri        The URI of the queue
  -count      The number of messages to move

save          Save messages from a queue to a set of files
  -uri        The URI of the source queue
  -file       The name of the file to write to (will have .1, .2 appended automatically for each message)
  -count      The number of messages to save
  --remove    If set, the messages will be removed from the queue

load          Load messages from a set of files into a queue
  -uri        The URI of the destination queue
  -file       The name of the file to read from (will have .1, .2 appended automatically for each message)
  -count      The number of messages to load
  --remove    If set, the message file will be removed once the message has been loaded

trace         Request a trace of messages that have been received by a service bus
  -uri        The URI of the control bus for the service bus instance
  -count      The number of messages to request

status        Request a status probe of the bus at the endpoint
  -uri        The URI of the control bus for the service bus instance

exit, quit    Exit the interactive console (run without arguments to start the interactive console)
Examples:

count -uri:msmq://localhost/mt_server
  Returns the number of messages that are present in the local MSMQ private queue named "mt_server"

peek -uri:msmq://localhost/mt_client
  Displays the body of the first message present in the local MSMQ private queue named "mt_client"

trace -uri:msmq://localhost/mt_subscriptions
  Requests and displays a trace of the last 100 messages received by mt_subscriptions (the default queue name used by the subscription service, which is part of the RuntimeServices)

move -from:msmq://localhost/mt_server_error -to:msmq://localhost/mt_server
  Moves one message from the mt_server_error queue to the mt_server queue (typically done to reprocess a message that was previously moved to the error queue due to a processing error, etc.)
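For the archive/replay scenario in the original question, the save and load verbs could in principle be combined along these lines (the queue names, file name, and counts are illustrative, not taken from the BusDriver documentation):

save -uri:msmq://localhost/archive -file:observations.msg -count:1000
  Saves up to 1000 messages from the archive queue to observations.msg.1, observations.msg.2, ... (non-destructively, unless --remove is also passed)

load -uri:msmq://localhost/target -file:observations.msg -count:1000
  Loads the saved message files back onto the target queue for reprocessing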

Related

Message Delivery Guarantee for Multiple Consumers in Pub/Sub and Messaging Queues

Requirement
A system undergoes some state change, and multiple other parts of the system have to know about it (let's call them observers) so that they can perform some actions based on the current state. The actions of the observers are important; if some of the observers are not online (not listening currently due to some trouble, but will be back soon), the message should not be discarded until all the observers get it.
Trying to accomplish this with a pub/sub model, here are my findings (please correct me if this understanding is wrong):
The publisher creates an event on a specific topic, and multiple subscribers can consume the same message. This model either provides no delivery guarantee (in Redis), or delivery is guaranteed once (with messaging queues), i.e. when one of the consumers acknowledges a message, the message is discarded (RabbitMQ).
Example
A new Person Profile entity gets created in DB
Now,
A background verification service has to know this to trigger the verification process.
The subscriptions service has to know this to add default subscriptions to the user.
Both tasks are important, unrelated, and can run in parallel.
Now, in the queue model, if the subscription service is down for some reason and the background verification process acknowledges the message, the message will be removed from the queue; and if it is fire-and-forget like most pub/sub, delivery is not guaranteed for either service anyway.
One more point: both tasks are unrelated and need not be triggered one after the other.
In short, I need to make sure all the consumers get the same message and are able to acknowledge it individually; the message should be evicted only after all the consumers have acknowledged it. Neither of the above approaches does this.
Anything I am missing here? How should I approach this problem?
This scenario is explicitly supported by RabbitMQ's model, which separates "exchanges" from "queues":
A publisher always sends a message to an "exchange", which is just a stateless routing address; it doesn't need to know what queue(s) the message should end up in
A consumer always reads messages from a "queue", which contains its own copy of messages, regardless of where they originated
Multiple consumers can subscribe to the same queue, and each message will be delivered to exactly one consumer
Crucially, an exchange can route the same message to multiple queues, and each will receive a copy of the message
The key thing to understand here is that while we talk about consumers "subscribing" to a queue, the "subscription" part of a "pub-sub" setup is actually the routing from the exchange to the queue.
So a RabbitMQ pub-sub system might look like this:
A new Person Profile entity gets created in DB
This event is published as a message to an "events" topic exchange with a routing key of "entity.profile.created"
The exchange routes copies of the message to multiple queues:
A "verification_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.#"
A "subscription_setup_service" queue has been bound to this exchange to receive a copy of all messages matching "entity.profile.created"
The consuming scripts don't know anything about this routing; they just know that messages will appear in the queue for events that are relevant to them:
The verification service picks up the copy of the message on the "verification_service" queue, processes, and acknowledges it
The subscription setup service picks up the copy of the message on the "subscription_setup_service" queue, processes, and acknowledges it
If there are multiple consuming scripts looking at the same queue, they'll share the messages on that queue between them, but still completely independently of any other queue.
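A rough sketch of that topology with the Python pika client (the exchange, queue, and routing-key names are the ones from the example above; this is an illustration, not code from the original answer):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    # One stateless routing address...
    ch.exchange_declare(exchange="events", exchange_type="topic", durable=True)

    # ...fanned out to one queue per consuming service.
    ch.queue_declare(queue="verification_service", durable=True)
    ch.queue_bind(queue="verification_service", exchange="events",
                  routing_key="entity.profile.#")

    ch.queue_declare(queue="subscription_setup_service", durable=True)
    ch.queue_bind(queue="subscription_setup_service", exchange="events",
                  routing_key="entity.profile.created")

    # The publisher only knows the exchange and the routing key; each bound
    # queue gets its own copy of the message and acknowledges it independently.
    ch.basic_publish(exchange="events", routing_key="entity.profile.created",
                     body=b'{"profile_id": 42}',
                     properties=pika.BasicProperties(delivery_mode=2))

    conn.close()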
As you mentioned, this is not something you can control with the Redis Pub/Sub data structure.
But you can do it easily with Redis Streams.
Streams allow you to post messages using the XADD command and then control which consumers are dealing with a message and acknowledge that the message has been processed.
You can look at this sample application that provides (in Java) examples of:
posting and consuming messages
creating multiple consumer groups
managing exceptions
Links:
Getting Started with Redis Streams and Java
Redis Streams in Action (a project that shows how to use ADD/ACK/PENDING/CLAIM and build an error-proof streaming application with Redis Streams and Spring Data)
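The same pattern looks roughly like this in Python with redis-py (the stream, group, and consumer names are arbitrary; this is a sketch, not code from the linked samples):

    import redis

    r = redis.Redis()

    # Publisher: append the event to a stream (kept until explicitly trimmed).
    r.xadd("profile-events", {"type": "profile.created", "profile_id": "42"})

    # Each service gets its own consumer group, so both see every message.
    for group in ("verification", "subscriptions"):
        try:
            r.xgroup_create("profile-events", group, id="0", mkstream=True)
        except redis.ResponseError:
            pass  # group already exists

    # A consumer in the "verification" group reads new messages and acks them
    # individually; unacked messages stay pending and can be claimed later.
    msgs = r.xreadgroup("verification", "worker-1",
                        {"profile-events": ">"}, count=10, block=1000)
    for stream, entries in msgs:
        for msg_id, fields in entries:
            print(msg_id, fields)  # ... process ...
            r.xack("profile-events", "verification", msg_id)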

How can I get data from RabbitMQ? I don't want to consume it from the queue

Is there a tool that can view the data in a queue? I just want to know what data is in the queue, but I don't want to consume it. The web UI and REST API just show message counts; I want details.
How can I use Mnesia to query the queue's data, like a MySQL client?
There are a few options
Firehose
You may consider the firehose feature:
https://www.rabbitmq.com/firehose.html

RabbitMQ has a "firehose" feature, where the administrator can enable (on a per-node, per-vhost basis) an exchange to which publish- and delivery-notifications should be CCed.
rabbitmq_tracing plugin
https://www.rabbitmq.com/plugins.html
Second queue
Just set up your exchange so it will deliver messages to two queues. One queue is for the actual business processing. The second queue is for debug purposes only. Reading messages from the second queue will consume them. For that debug queue you may enable a reasonable TTL and/or a queue length limit. Otherwise, unconsumed messages will eventually eat all disk space.
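A rough sketch of such a debug queue with the Python pika client (the exchange and queue names, TTL, and length limit are arbitrary; x-message-ttl and x-max-length are the standard per-queue arguments):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()

    ch.exchange_declare(exchange="orders", exchange_type="fanout", durable=True)

    # Real queue used by the business consumers.
    ch.queue_declare(queue="orders.work", durable=True)
    ch.queue_bind(queue="orders.work", exchange="orders")

    # Debug copy: messages expire after 10 minutes and at most 10,000 are kept,
    # so peeking at it never lets unconsumed copies fill the disk.
    ch.queue_declare(queue="orders.debug", durable=True, arguments={
        "x-message-ttl": 600_000,  # milliseconds
        "x-max-length": 10_000,
    })
    ch.queue_bind(queue="orders.debug", exchange="orders")

    conn.close()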
Consume and re-send
You may consume a message (to see it) and immediately re-send the same message to the same queue. The RabbitMQ management GUI has this option. Note that this will change the order of the messages.

Move message From one Queue to other Queue without deleting it Rabbitmq

I have the following problem.
My program sends messages directly to the queue (without an exchange). I need to monitor incoming new messages and send them to another queue without removing them from the source queue.
I don't have access to the program code, so I'm not able to publish messages to an exchange first.
Is it possible to solve this problem using the management web interface of RabbitMQ?
I tried to use the shovel plugin, but it removes all messages from the source queue after ack.
First, to clear up a few things:
"My program sends messages directly to the queue (without an exchange)" - this is not true; at the very least (and most likely in this case) the nameless default exchange is used.
"removes all messages from the source queue after ack" - this is by design and therefore perfectly fine.
You should never keep messages in the queue; a queue is made to be consumed. As Derick Bailey says here:
RabbitMQ is not a database. RabbitMQ is a message broker and queueing system.
At the same link you will find your answer. I cannot give a concrete one since you didn't provide the motivation, but whatever it is, keeping messages in the queue is never good!
Maybe you want to log/store your message first and then process it, with the consequence of processing being some third action, or whatever...

What belongs into a DLQ / Invalid Message Queue?

Is there a good best practice about what kind of messages an application is allowed to reject?
My understanding is that all messages which can't be handled should be rejected to the dead letter queue - no matter if the problem is a syntax error or a semantic error in the message or if the application is temporarily not able to handle the message (for instance because the db just went down).
Of course - if the app already knows upfront that it will not be able to handle a message (DB down), it should stop accepting messages.
So what's the common understanding / best practice?
My response is with respect to WebSphere MQ:
A Dead Letter Queue (DLQ for short) is a place where messages that could not be delivered to their destination are put. Messages can be put on the DLQ by queue managers, message channel agents (MCAs), and applications. All messages on the DLQ must be prefixed with a dead-letter header structure, MQDLH. The MQDLH header is added automatically when the queue manager or MCAs put messages, whereas applications must prefix the MQDLH explicitly.
As far as applications are concerned, if they are unable to handle a message, say for example because the message format is not understood, they can put the message to a BACKOUT queue instead of a DLQ. A BACKOUT queue is just like any normal queue where messages rejected by applications can be put. The advantage of a BACKOUT queue is that you can specify a BACKOUT queue on a per-queue basis, and the messages put there need not have the MQDLH header prefixed.
An application can be written to read the messages from the BACKOUT queue and route them back to the target queue as-is. However, the messages in a DLQ require additional processing to remove the MQDLH before they are put onto a target queue.
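As an illustration only, a drain-and-requeue tool along those lines might look like this with the Python pymqi bindings (the queue manager, channel, and queue names are made up):

    import pymqi

    qmgr = pymqi.connect("QM1", "DEV.APP.SVRCONN", "localhost(1414)")

    backout = pymqi.Queue(qmgr, "APP.BACKOUT")
    target = pymqi.Queue(qmgr, "APP.TARGET")

    # Drain the backout queue and put the body of each message back onto the
    # target queue (no MQDLH to strip, unlike a real dead-letter queue).
    while True:
        try:
            msg = backout.get()
        except pymqi.MQMIError as e:
            if e.reason == pymqi.CMQC.MQRC_NO_MSG_AVAILABLE:
                break  # backout queue is empty
            raise
        target.put(msg)

    backout.close()
    target.close()
    qmgr.disconnect()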

RabbitMQ use of immediate and mandatory bits

I am using RabbitMQ server.
For publishing messages, I set the immediate field to true and tried sending 50,000 messages. Using rabbitmqctl list_queues, I saw that the number of messages in the queue was zero.
Then, I changed the immediate flag to false and again tried sending 50,000 messages. Using rabbitmqctl list_queues, I saw that a total of 100,000 messages were in queues (till now, no consumer was present).
After that, I started a consumer and it consumed all the 100,000 messages.
Can anybody please help me understand the immediate bit field and this behavior? Also, I could not understand the concept of the mandatory bit field.
The immediate and mandatory fields are part of the AMQP specification, and are also covered in the RabbitMQ FAQ to clarify how its implementers interpreted their meaning:
Mandatory
This flag tells the server how to react if a message cannot be routed to a queue. Specifically, if mandatory is set and after running the bindings the message was placed on zero queues then the message is returned to the sender (with a basic.return). If mandatory had not been set under the same circumstances the server would silently drop the message.
Or in my words, "Put this message on at least one queue. If you can't, send it back to me."
Immediate
For a message published with immediate set, if a matching queue has ready consumers then one of them will have the message routed to it. If the lucky consumer crashes before ack'ing receipt the message will be requeued and/or delivered to other consumers on that queue (if there's no crash the message is ack'ed and it's all done as per normal). If, however, a matching queue has zero ready consumers the message will not be enqueued for subsequent redelivery from that queue. Only if all of the matching queues have no ready consumers is the message returned to the sender (via basic.return).
Or in my words, "If there is at least one consumer connected to my queue that can take delivery of a message right this moment, deliver this message to them immediately. If there are no consumers connected then there's no point in having my message consumed later and they'll never see it. They snooze, they lose."
http://www.rabbitmq.com/blog/2012/11/19/breaking-things-with-rabbitmq-3-0/
Removal of "immediate" flag
What changed? We removed support for the
rarely-used "immediate" flag on AMQP's basic.publish.
Why on earth did you do that? Support for "immediate" made many parts
of the codebase more complex, particularly around mirrored queues. It
also stood in the way of our being able to deliver substantial
performance improvements in mirrored queues.
What do I need to do? If you just want to be able to publish messages
that will be dropped if they are not consumed immediately, you can
publish to a queue with a TTL of 0.
If you also need your publisher to be able to determine that this has
happened, you can also use the DLX feature to route such messages to
another queue, from which the publisher can consume them.
Just copied the announcement here for a quick reference.
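To make the two halves concrete, here is a rough Python pika sketch of a mandatory publish plus the TTL-0 / DLX substitute suggested above for the removed immediate flag (all queue and exchange names are made up; this is an illustration, not from the announcement):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    ch = conn.channel()
    ch.confirm_delivery()

    # Mandatory: if the broker cannot route the message to any queue it sends a
    # basic.return; pika's blocking channel surfaces that as UnroutableError.
    try:
        ch.basic_publish(exchange="", routing_key="no.such.queue",
                         body=b"hello", mandatory=True)
    except pika.exceptions.UnroutableError:
        print("message came back: nothing was bound for that routing key")

    # Emulating the removed "immediate" flag: a per-queue TTL of 0 drops any
    # message that cannot be handed to a consumer right away, and the DLX
    # routes those dropped copies somewhere the publisher can observe them.
    ch.exchange_declare(exchange="expired", exchange_type="fanout")
    ch.queue_declare(queue="expired.audit")
    ch.queue_bind(queue="expired.audit", exchange="expired")
    ch.queue_declare(queue="work.immediate", arguments={
        "x-message-ttl": 0,
        "x-dead-letter-exchange": "expired",
    })
    ch.basic_publish(exchange="", routing_key="work.immediate", body=b"hello")

    conn.close()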