RabbitMQ + MassTransit: how to cancel a queued message before it is processed?

In some exceptional situations I need to tell the consumer on the receiving end that some messages shouldn't be processed. Otherwise the two systems will get out of sync (we deal with some outdated external systems, and if, for example, a connection is dropped, we have to discard all operations queued in the scope of that connection).
Should I take the risk and resolve problem messages manually? Use compensating actions (which could be tough to support in my case)? Anything else?

There are a few ways:
You can set a time-to-live when sending a message: await endpoint.Send(myMessage, c => c.TimeToLive = TimeSpan.FromHours(1));, but this will apply to all messages that are sent (or published) like this. I would consider this, after looking at your requirements. This is technical, but it is a proper messaging pattern.
Make the TTL and the generation timestamp properties of the message itself and let the consumer decide whether the message is still worth processing. This is more of a business-level solution and probably the most correct way.
Combine tech and business: keep the timestamp and TTL in message headers so they don't pollute your message contracts, and filter expired messages out using custom middleware. In this case, be careful to log such drops so you won't be left wondering why messages disappear now and then.
Almost any unreliable integration can be monitored using sagas, with timeouts. For example, we use a saga to integrate with Twilio. Since we have no ability to open a webhook for them, we poll after some interval to check the message status. You can start a saga when you get a message and schedule a message to check if the processing is still waiting. As discussed in comments, you can either use the "human intervention required" way to fix the issue or let the saga decide to drop the message.
A similar way could be to use a lookup table, where you put the list of messages that are no longer relevant for processing. Such a table would be similar to the list of sagas. It seems that this way would also require scheduling. Both here and for the saga, I'd recommend using a separate receive endpoint (a queue) for the DropIt message, with only one consumer. It would prevent DropIt messages from getting stuck behind the integration messages that are waiting to be processed (some of which should already have been dropped).
Use the RMQ management API to remove messages from the queue. This is the worst method and I wouldn't recommend it.
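The header-based option above (timestamp and TTL in headers, with logged drops) can be sketched without any broker library. This is only an illustration: the header names `sent-at`, `ttl-seconds` and `message-id` and the `expiry_filter` wrapper are assumptions, not MassTransit APIs.

```python
import logging
from datetime import datetime, timedelta, timezone

log = logging.getLogger("expiry-filter")

def is_expired(headers, now):
    """Return True when the message outlived the TTL declared in its headers."""
    sent_at = datetime.fromisoformat(headers["sent-at"])      # hypothetical header
    ttl = timedelta(seconds=float(headers["ttl-seconds"]))   # hypothetical header
    return now - sent_at > ttl

def expiry_filter(handler):
    """Wrap a consumer so stale messages are logged and skipped, not processed."""
    def wrapped(headers, body, now=None):
        now = now or datetime.now(timezone.utc)
        if is_expired(headers, now):
            # Log every drop so disappearing messages can be explained later.
            log.warning("dropping expired message %s", headers.get("message-id"))
            return False
        handler(body)
        return True
    return wrapped
```

Wrapping a handler with `expiry_filter` keeps the expiry rule out of the message contract while still leaving a log trail for every dropped message.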

From what I understand, you're building a system that sends messages to 3rd party systems. In other words, systems you don't control. It has an API but compensating actions aren't always possible, because the API doesn't provide it or because actions are performed inside the 3rd party system that can't be compensated or rolled back?
If possible, try to solve this via sagas. Make sure the saga executes the different steps (the sending of messages) in the right order, so that messages that can be compensated are sent first and the saga can compensate them if they fail. The ones that cannot be compensated should be sent last, when you're as sure as possible that they won't have to be compensated, because that last message is the final step in synchronizing all systems.
All in all, keeping everything in sync is one of the problems with distributed systems. Compensating actions are the way to deal with this. If compensating actions aren't possible, you're in a very difficult situation. Try to see if the business can help by becoming more flexible and accepting that you need to compensate things, even where they initially tell you it's not possible.

In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed.
Can't you invert this into:
Tell the consumer that an earlier message can be processed.
This way you can easily turn this into a state machine (like a saga) that acts on two messages. If the 2nd message never arrives then you can discard the 1st after a while or do something else.
The strategy here is to halt/wait until certain that no actions need to be reverted.
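A minimal sketch of that two-message state machine, assuming each operation carries a correlation id and time is passed in as plain seconds. The class and method names are made up for illustration; a real saga would persist this state rather than hold it in memory.

```python
class PendingOperations:
    """Hold each operation until a confirmation arrives or a deadline passes."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.pending = {}  # correlation id -> (operation, received_at)

    def on_operation(self, correlation_id, operation, now):
        """First message: park the operation instead of processing it."""
        self.pending[correlation_id] = (operation, now)

    def on_confirmation(self, correlation_id):
        """Second message: release the parked operation, if it is still waiting."""
        entry = self.pending.pop(correlation_id, None)
        return entry[0] if entry else None

    def expire(self, now):
        """Discard operations whose confirmation never arrived in time."""
        stale = [cid for cid, (_, t) in self.pending.items() if now - t > self.timeout]
        for cid in stale:
            del self.pending[cid]
        return stale
```

Calling `expire` on a schedule plays the role of the saga timeout: anything not confirmed within the window is dropped before it can reach the external system.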

Related

RabbitMQ Architecture Gut Check

So I'm thinking of using RabbitMQ to send messages between all the varied apps in our organization. In the attached image is essentially the picture in my mind of how things would work.
So the message goes into the exchange, and splits out into three queues.
Payloads are always JSON text.
The consumers are long-running Windows services whose only job is to sit and listen for messages destined for their particular application. When a message comes in, they look at the header to determine how this payload JSON should be interpreted, and which REST endpoint it should be sent to. e.g., "When I see a 'WORK_ORDER_COMPLETE' header I am going to parse this as a WorkOrderCompleteDto and send it as a POST to the CompletedWorkOrder WebAPI method at timelabor-api.mycompany.com. If the API returns other than 200, I reject the message and let rabbit handle it. If I get a 200 back from the API, then I ack the message to rabbit."
The end applications are simply our internal line-of-business apps that we use for inventory, billing, etc. Those applications are then responsible for performing their respective function (decrementing inventory, creating a billing record, yadda yadda).
Does this in any way make a sensible understanding of a proper way to use Rabbit?
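The consumer described in the question could be sketched roughly as follows. This is only an illustration of the header-based dispatch: the `type` header name, the URL path and the injected `post` function are assumptions, and the returned strings stand in for the real ack/reject calls on the channel.

```python
import json

# Hypothetical routing table: message-type header -> REST endpoint.
ROUTES = {
    "WORK_ORDER_COMPLETE": "https://timelabor-api.mycompany.com/CompletedWorkOrder",
}

def make_consumer(routes, post):
    """Build a consumer; post(url, payload_dict) must return an HTTP status code."""
    def consume(headers, body):
        url = routes.get(headers.get("type"))  # "type" is an assumed header name
        if url is None:
            return "reject"  # no route for this message type
        status = post(url, json.loads(body))
        return "ack" if status == 200 else "reject"
    return consume
```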
Conceptually, I believe you may be relying on RabbitMQ to do things that your application needs to do.
The assumption of the architecture seems to be that each message is processed by each of your consuming applications totally in a vacuum. What this means is that you don't care that a message processed successfully by Billing_App ultimately failed with Inventory_App. Maybe this is true, but in my experience, it isn't.
If the end goal is to achieve some consistent state in the overall data, you're going to need some supervisory component orchestrating and monitoring the various operations to ensure that the state is consistent. In effect, your plan to reject a message back to RabbitMQ means you need to put more thought into what happens when something fails.
I would focus on identifying some UML activity diagrams that describe your behavior and how it achieves the end-state, and use that as a guide to determine how the orchestration of your application needs to be designed.

RabbitMQ Message States

I'm working with RabbitMQ and I'd like to have multiple consumers doing different things for the same message, with this message being in exactly one queue. Each consumer would work on its own, and the moment a consumer finishes its part, it marks the message as having completed phase "x". When all the phases are completed for one message, basicAck() is called to remove the message from the queue.
I suspect this is impossible; if so, I would approach it another way: have multiple queues with the same message (using an exchange), each queue with a different consumer, which would communicate with a server. This server would then work with a database, checking and updating the completed phases. When all the phases are completed, log it in some way.
But this workaround seems exceedingly inefficient, and I'd like to skip it if possible.
Is it possible to set "states" or "phases" on a message in RabbitMQ?
So, first of all, in the context you're talking about, a "message" is an order to do some unit of work.
The first part of your question, by referring to "marking the message" treats the message as a stateful object. This is incorrect. Once a message is produced, it is immutable, meaning no changes are permitted to it. If you violate, or attempt to violate this principle, you have made an excursion beyond the realm of sound design.
So, let's reframe. In a properly architected message-oriented system, a message can represent either a command ("do something") or an event ("something happened"). Note that sometimes we can call a reply message (something sent in response to a command) a third category, but it's really a sub-category of event.
Thus, we are led to the possibility of having (a) one message going to one queue, to be picked up by one consumer, or (b) one message going to many queues, to be picked up by many consumers. You take (a) and (b) to compose complex system behaviors that evolve over time with the execution of each of these small behaviors, and suddenly you have a complex system.
Messages do, in fact, have state. Their state is "processed" or "unprocessed", as appropriate. That is the limit to their statefulness.
Bottom Line
Your situation describes a series of activities (what each consumer does) being acted upon some sort of shared state among the activities. The role of messages and the message broker is to assist in the orchestration of these activities, by providing instruction on what to do (via commands) and what took place (via events). Messages themselves cannot be the shared state. So, you still need some sort of a database or other means to persist the state of your system. There is no way to avoid this.
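To illustrate the point, here is a small sketch in which each consumer reports an event ("phase completed") and a separate tracker owns the shared state. The phase names are invented, and the in-memory dictionary is only a stand-in for the database mentioned above.

```python
class PhaseTracker:
    """Shared state lives here, not in the message; consumers only report events."""

    REQUIRED = frozenset({"billing", "inventory", "shipping"})  # hypothetical phases

    def __init__(self):
        self.done = {}  # work id -> set of completed phases (stand-in for a database)

    def on_phase_completed(self, work_id, phase):
        """Record a 'phase completed' event; return True once every phase is done."""
        if phase not in self.REQUIRED:
            raise ValueError(f"unknown phase: {phase}")
        phases = self.done.setdefault(work_id, set())
        phases.add(phase)
        return phases >= self.REQUIRED
```

Each consumer acks its own copy of the message as soon as it finishes, so no message has to wait in a queue for the other phases.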

Nservicebus Sequence

We have a requirement for all our messages to be processed in the order of arrival to MSMQ.
We will be exposing a WCF service to the clients, and this WCF service will post the messages using NServiceBus (Sendonly Bus) to MSMQ.
We are going to develop a Windows service (MessageHandler), which will use NServiceBus to read the messages from MSMQ and save them to the database. Our database will not be available for a few hours every day.
During the db downtime we expect the process to retry the first message in MSMQ and halt processing of other messages until the database is up. Once the database is up we want NServiceBus to process the messages in the order they were sent.
Will setting up MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" help in this scenario?
What is the best way using NServiceBus to handle this scenario?
We have a requirement for all our messages to be processed in the order of arrival to MSMQ.
See the answer to this question How to handle message order in nservicebus?, and also this post here.
I am in agreement that while in-order delivery is possible, it is much better to design your system such that order does not matter. The linked article outlines the following solution:
Add a sequence number to all messages.
In the receiver, check that the sequence number is the last seen number + 1; if not, throw an out-of-sequence exception.
Enable second-level retries (so if messages are out of order they will be tried again later, hopefully after the correct message was received).
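The receiver-side check from those steps might look like the sketch below. It is deliberately library-free: `SequencedReceiver` and `OutOfSequenceError` are illustrative names, and in a real endpoint the raised exception is what would trigger the retry.

```python
class OutOfSequenceError(Exception):
    """Raised so the messaging framework can retry the message later."""

class SequencedReceiver:
    def __init__(self, handler, last_seen=0):
        self.handler = handler
        self.last_seen = last_seen

    def receive(self, seq, body):
        """Process the message only if it is the direct successor of the last one."""
        if seq != self.last_seen + 1:
            raise OutOfSequenceError(f"expected {self.last_seen + 1}, got {seq}")
        self.handler(body)
        self.last_seen = seq  # advance only after the handler succeeded
```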
However, in the interest of answering your specific question:
Will setting up MaximumConcurrencyLevel="1" MaximumMessageThroughputPerSecond="1" help in this scenario?
Not really.
Whenever you have a requirement for ordered delivery, the fundamental laws of logic dictate that somewhere along your message processing pipeline you must have a single-threaded process in order to guarantee in-order delivery.
Where this happens is up to you (check out the resequencer pattern), but you could certainly throttle the NServiceBus handler to a single thread (I don't think you need to set MaximumMessageThroughputPerSecond to make it single-threaded, though).
However, even if you did this, and even if you used transactional queues, you could still not guarantee that each message would be dequeued and processed to the database in order, because if there are any permanent failures on any of the messages they will be removed from the queue and the next message processed.
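For reference, the resequencer pattern mentioned above can be sketched in a few lines. This is a generic in-memory illustration that assumes every message carries a gap-free sequence number; it is not an NServiceBus feature.

```python
class Resequencer:
    """Buffer out-of-order messages and release them strictly in sequence."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq
        self.buffer = {}  # sequence number -> message body

    def accept(self, seq, body):
        """Store the message; return every body that is now deliverable in order."""
        self.buffer[seq] = body
        ready = []
        while self.next_seq in self.buffer:
            ready.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

Note that a message that never arrives blocks everything behind it, which is the same trade-off described above for permanent failures.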
During the db downtime we expect that the process to retry the first message in MSMQ and halt processing other messages until the database is up. Once the database is up we want NServicebus to process in the order the message is sent.
This is not recommended. The second level retry functionality in NServiceBus is designed to handle unexpected and short-term outages, not planned and long-term outages.
For starters, when your NServiceBus message handler endpoint tries to process a message in its input queue and finds the database unavailable, it will apply its second-level retry policy, which by default will retry the message 5 times at increasing intervals and then fail permanently, moving the failed message to its error queue. It will then move on to the next message in the input queue.
While this doesn't violate your in-order delivery requirement on its own, it will make life very difficult for two reasons:
The permanently failed messages will need to be re-processed with priority once the database becomes available again, and
there will be a ton of unwanted failure logging, which will obfuscate any genuine handling errors.
If you have regular planned outages which you know about in advance, then the simplest way to deal with them is to implement a service window, which is another term for a schedule.
However, Windows services manager does not support the concept of service windows, so you would have to use a scheduled task to stop then start your service, or look at other options such as hangfire, quartz.net or some other cron-type library.
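Whichever scheduler is used, the core of a service window is a time-of-day check. A minimal sketch, assuming the window is given as a start and end time of day; the function name and the example hours are made up:

```python
from datetime import time

def in_service_window(now, start, end):
    """True when the time of day falls inside [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 02:00.
    return now >= start or now < end
```

A scheduled job could call this to decide whether the message-handling service should currently be running, for example `in_service_window(time(23, 0), time(6, 0), time(22, 0))` for a service that should only run between 06:00 and 22:00.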
It kind of depends on why you need the messages to arrive in order. If, for example, you first receive an Order message and then various OrderLine messages that all belong to that order, there are multiple possibilities.
One is to just accept that there can be OrderLine messages without an Order. The Order will come in later anyway. Eventual Consistency.
Another one is to collect messages (and possible state) in an NServiceBus Saga. When normally MessageA needs to arrive first, only to receive MessageB and MessageC later, give all three messages the ability to start the saga. All three messages need to have something that ties them together, like a unique GUID. Then the saga will make sure it collects them properly and when all messages have arrived, perhaps store its final state and mark the saga as completed.
Another option is to just persist all messages directly into the database and have something else figure out what belongs to what. This is a scenario useful for a data warehouse where the data just needs to be collected, no matter what. Some data might not be 100% accurate (or consistent) but that's okay.
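The saga option above, where any of the three messages may start the saga, can be illustrated with a small in-memory model. This is not the NServiceBus saga API; the class name is invented, and a real saga would persist its state.

```python
class CollectingSaga:
    """Collect related messages by correlation id, regardless of arrival order."""

    EXPECTED = ("MessageA", "MessageB", "MessageC")

    def __init__(self):
        self.sagas = {}  # correlation id -> {message type: payload}

    def handle(self, correlation_id, message_type, payload):
        """Any expected message may start the saga; return the full set once all arrived."""
        state = self.sagas.setdefault(correlation_id, {})
        state[message_type] = payload
        if all(t in state for t in self.EXPECTED):
            return self.sagas.pop(correlation_id)  # saga completed
        return None
```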
Asynchronous messaging makes it hard to process them 100% in order, especially when the client calling the WCF is making mistakes and/or sending them out of order. It wouldn't be the first time I had such a requirement and out-of-order messages.

RabbitMQ: throttling fast producer against large queues with slow consumer

We're currently using RabbitMQ, where a continuously super-fast producer is paired with a consumer limited by a limited resource (e.g. slow-ish MySQL inserts).
We don't like declaring a queue with x-max-length, since all messages will be dropped or dead-lettered once the limit is reached, and we don't want to lose messages.
Adding more consumers is easy, but they'll all be limited by the one shared resource, so that won't work. The problem still remains: How to slow down the producer?
Sure, we could put a flow control flag in Redis, memcached, MySQL or something else that the producer reads as pointed out in an answer to a similar question, or perhaps better, the producer could periodically test for queue length and throttle itself, but these seem like hacks to me.
I'm mostly questioning whether I have a fundamental misunderstanding. I had expected this to be a common scenario, and so I'm wondering:
What is best practice for throttling producers? How is this done with RabbitMQ? Or do you do this in a completely different way?
Background
Assume the producer actually knows how to slow itself down given the right input, e.g. a hardware sensor or hardware random number generator that can generate as many events as needed.
In our particular real case, we have an API that users can use to add messages. Instead of devouring and discarding messages, we'd like to apply back-pressure by having our API return an error if the queue is "full", so the caller/user knows to back off, or have the API block until the consumer catches up. We don't control our users, so regardless of how fast the consumer is, someone can create a producer that is faster.
I was hoping for something like the API for a TCP socket, where a write() can block and where a select() can be used to determine if a handle is writable. So either having the RabbitMQ API block or have it return an error if the queue is full.
For the x-max-length property, you said you don't want messages to be dropped or dead-lettered. I see there was an update in adding some more capabilities for this. As I see it is specified in the documentation:
"Use the overflow setting to configure queue overflow behaviour. If overflow is set to reject-publish, the most recently published messages will be discarded. In addition, if publisher confirms are enabled, the publisher will be informed of the reject via a basic.nack message"
So as I understand it, you can use queue limit to reject the new messages from publishers thus pushing some backpressure to the upstream.
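The effect of a length limit combined with reject-publish can be modelled in memory to show how the rejection reaches the API caller. This is a simulation only, not RabbitMQ client code; on a real broker the behaviour comes from the `x-max-length` and `x-overflow` queue arguments plus publisher confirms. The 202/429 status codes are an assumed mapping.

```python
from collections import deque

class BoundedQueue:
    """In-memory model of a length-limited queue that refuses new messages when full."""

    def __init__(self, max_length):
        self.max_length = max_length
        self.messages = deque()

    def publish(self, message):
        """Return 'ack' when the message is stored, 'nack' when the queue is full."""
        if len(self.messages) >= self.max_length:
            return "nack"
        self.messages.append(message)
        return "ack"

    def consume(self):
        return self.messages.popleft() if self.messages else None

def api_add_message(queue, message):
    """Translate a nack into an HTTP-style status so callers know to back off."""
    return 202 if queue.publish(message) == "ack" else 429
```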
I don't think that this is in any way rabbitmq specific. Basically you have a scenario, where there are two systems of different processing capabilities, and this mismatch will either pose a risk of overflowing the queue (whatever it would be), or even in case of a constant mismatch between producer and consumer, simply create more and more time-distance between event creation and its handling.
I used to deal with this kind of scenario, and unfortunately there is no magic bullet. You either have to speed up event handling (better hardware, better-suited software?) or throttle the event creation (which has nothing to do with MQ really).
Now, I would ask you what the goal is and how the events are produced. Are the events produced constantly, at an unlimited or just very high rate (for example readings from sensors: the more, the better), or are they created in batches/spikes (for example user requests in specific time periods, or batch loads from a CRM system)? I assume that the goal is to process everything, because you mention you don't want to lose any queued message.
If the output is constant, then some limiter (either an internal counter, if the producer is the only producer, or external queue length checks if the queue can be filled by some other system) is definitely in order.
IF eventsInTimePeriod / timePeriod > estimatedConsumerBandwidth
    THEN LowerRate()
    ELSE RaiseRate()
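A runnable version of that rule might look like this. The multiplicative step and the minimum rate are assumptions added for the sketch; the pseudocode above does not say how much to lower or raise the rate.

```python
def adjust_rate(current_rate, events_in_period, period_seconds,
                consumer_bandwidth, step=0.1, floor=1.0):
    """Return the next target rate (events/s) for the producer."""
    observed = events_in_period / period_seconds
    if observed > consumer_bandwidth:
        # Producing faster than the consumer drains: back off, but never below the floor.
        return max(floor, current_rate * (1 - step))
    # Consumer is keeping up: probe for a slightly higher rate.
    return current_rate * (1 + step)
```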
In real-world scenarios we used to simply limit the output manually to the estimated values, and there were alerts set for queue length, time from queue entry to queue exit, etc. Where such limiters were omitted (mostly by mistake), we would later find tasks that were supposed to be handled within a few hours but had been waiting three months for their turn.
I'm afraid it's hard to answer "How to slow down the producer?" when we know nothing about it, but some ideas are the aforementioned rate check or maybe a blocking AddMessage method:
AddMessage(message)
    WHILE (getQueueLength() > maxAllowedQueueLength)
        spin(1000); // or sleep or whatever
    mqAdapter.AddMessage(message)
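The same blocking idea can be shown with Python's standard `queue.Queue`, whose `put` blocks while the buffer is full. This is an in-process stand-in for the broker queue; against a real broker the producer would have to poll the queue length instead, as in the pseudocode above. The `add_message` helper is a made-up name.

```python
import queue

def add_message(buffer, message, timeout=None):
    """Block until there is room in the buffer; return False if the timeout elapses first."""
    try:
        buffer.put(message, timeout=timeout)  # blocks while the buffer is full
        return True
    except queue.Full:
        return False

# A bounded buffer: producers stall once two messages are waiting.
buffer = queue.Queue(maxsize=2)
```

With `timeout=None` the call behaves like a blocking `write()`; with a timeout the caller gets a clear signal that it should back off.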
I'd say it all depends on specific of the producer application and in general your architecture.

Why is NServiceBus Bus.Publish() not transactional?

Setup:
I have a couple of subscribers subscribing to an event produced by a publisher on the same machine via MSMQ. The subscribers use two different endpoint names, and each runs in its own process. (This is NSB 4.6.3)
Scenario:
Now, if I do something "bad" to one of the subscribers (say remove proper permission in MSMQ to receive messages, or delete the queue in MSMQ outright...), and call Bus.Publish(), I will still have one event successfully published to the "good" subscriber (if the good one precedes the bad one on the subscriber list in subscription storage), or none successful (if the bad one precedes the good one).
Conclusion:
The upshot here is that Bus.Publish() does not seem to be transactional, as to making publishing to subscribers all succeed or all fail. Depending on the order of the subscribers on the list, the end result might be different.
Questions:
Is this behavior by design?
What is the thought behind this?
If I want to make this call transactional, what is the recommended way? (One option seems to enclose Bus.Publish() in a TransactionScope in my code...)
Publish is transactional, or at least, it is if there is an ambient transaction. Assuming you have not taken steps to disable transactions, all message handlers have an ambient transaction running when you enter the Handle method. (Inspect Transaction.Current.TransactionInformation to see first-hand.) If you are operating out of an IWantToRunWhenBusStartsAndStops, however, there will be no ambient transaction, so then yes you would need to wrap with your own TransactionScope.
How delivery is handled (specific for the MSMQ transport) is different depending upon whether the destination is a local or remote queue.
Remote Queues
For a remote queue, delivery is not directly handled by the publisher at all. It simply drops the two messages in the "Outbox", so to speak. MSMQ uses store-and-forward to ensure that these messages are eventually delivered to their intended destinations, whether that be on the same machine or a remote machine. In these cases, you may look at your outgoing queues and see that there are messages stuck there that are unable to be delivered because of whatever you have done to their destinations.
The safety afforded by store-and-forward means that one errant subscriber cannot take down a publisher, and so overall coupling is reduced. This is a good thing! But it also means that monitoring outgoing queues is a very important part of your DevOps story when deploying an NServiceBus system.
Local Queues
For local queues, MSMQ may still technically use a concept of an outgoing queue in its own plumbing - I'm not sure and it doesn't really matter. But an additional step that MSMQ is capable of doing (and does) is to check the existence of a local queue before you try to send to it, and it will throw an exception if the queue doesn't exist or something is wrong with it. This would indeed affect the publisher.
So yes, if you publish a message from a non-transactional state like the inside of an IWantToRunWhenBusStartsAndStops, and the downed queue happens to be #2 on the list in subscription storage, you could observe a message arriving at SubscriberA but not at SubscriberB. If it were within a message handler with transactions disabled, you could see multiple copies arriving at SubscriberA because of the message retry logic!
Upshot
IWantToRunWhenBusStartsAndStops is great for quick demos and proving things out, but try to put as little real logic in it as possible, opting instead for the safety of message handlers where the ambient transaction applies. Also remember that an exception inside there could potentially take down your host process. Certainly don't publish inside of one without wrapping it with your own transaction.
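The all-or-nothing effect that a transaction gives to publishing can be illustrated with a small conceptual model: sends are buffered and only handed to the transport if the surrounding block finishes without an exception. This is not how NServiceBus or MSMQ is implemented, and `TransactionalPublisher` is an invented name.

```python
class TransactionalPublisher:
    """Buffer publishes and hand them to the transport only on successful exit."""

    def __init__(self, transport_send):
        self.transport_send = transport_send  # callable(subscriber, event)
        self.buffered = []

    def publish(self, subscriber, event):
        self.buffered.append((subscriber, event))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # "Commit": nothing went wrong, so every subscriber gets the event.
            for subscriber, event in self.buffered:
                self.transport_send(subscriber, event)
        # "Rollback" simply discards the buffer.
        self.buffered.clear()
        return False  # never swallow the exception
```

If anything inside the `with` block raises, no subscriber receives the event, which avoids the "SubscriberA got it, SubscriberB did not" outcome described above.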