There is a little exclamation mark next to the processing time in ServiceInsight. Anyone has any idea what it means? I can't find any information on the internet.
That is an indication of a clock drift. Essentially it means the message was processed before it was sent, which happens if the sender and receiver endpoints are different machines and their clocks are not in sync. It is also documented here.
Related
In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed. Otherwise two systems will become out-of-sync (we deal with some outdates external systems, and if, for example, connection is dropped we have to discard all queued operations in scope of that connection).
Take a risk and resolve problem messages manually? Compensation actions (that could be tough to support in my case)? Anything else?
There are a few ways:
You can set a time-to-live when sending a message: await endpoint.Send(myMessage, c => c.TimeToLive = TimeSpan.FromHours(1));, but this will apply to all messages that are sent (or published) like this. I would consider this, after looking at your requirements. This is technical, but it is a proper messaging pattern.
Make TTL and generation timestamp properties of your message itself and let the consumer decide if the message is still worth processing. This is more business and, probably, the most correct way.
Combine tech and business - keep the timestamp and TTL in message headers so they don't pollute your message contracts, and filter them out using a custom middleware. In this case, you need to be careful to log such drops so you won't be left wonder why messages disappear now and then.
Almost any unreliable integration can be monitored using sagas, with timeouts. For example, we use a saga to integrate with Twilio. Since we have no ability to open a webhook for them, we poll after some interval to check the message status. You can start a saga when you get a message and schedule a message to check if the processing is still waiting. As discussed in comments, you can either use the "human intervention required" way to fix the issue or let the saga decide to drop the message.
A similar way could be to use a lookup table, where you put the list of messages that aren't relevant for processing. Such a table would be similar to the list of sagas. It seems that this way would also require scheduling. Both here, and for the saga, I'd recommend using a separate receive endpoint (a queue) for the DropIt message, with only one consumer. It would prevent DropIt messages from getting stuck behind the integration messages that are waiting to be processed (and some should be already dropped)
Use RMQ management API to remove messages from the queue. This is the worst method, I won't recommend it.
From what I understand, you're building a system that sends messages to 3rd party systems. In other words, systems you don't control. It has an API but compensating actions aren't always possible, because the API doesn't provide it or because actions are performed inside the 3rd party system that can't be compensated or rolled back?
If possible try to solve this via sagas. Make sure the saga executes the different steps (the sending of messages) in the right order. So that messages that cannot be compensated are sent last. This way message that can be compensated if they fail, will be compensated by the saga. The ones that cannot be compensated should be sent last, when you're as sure as possible that they don't have to be compensated. Because that last message is the last step in synchronizing all systems.
All in all this is one of the problems with distributed systems, keeping everything in sync. Compensating actions is the way to deal with this. If compensating actions aren't possible, you're in a very difficult situation. Try to see if the business can help by becoming more flexible and accepting that you need to compensate things, where they'll tell you it's not possible.
In some exceptional situations I need somehow to tell consumer on receiving point that some messages shouldn’t be processed.
Can't you revert this into:
Tell the consumer that an earlier message can be processed.
This way you can easily turn this in a state machine (like a saga) that acts on two messages. If the 2nd message never arrives then you can discard the 1st after a while or do something else.
The strategy here is to halt/wait until certain that no actions need to be reverted.
TL;DR: I need to "replay" dead letter messages back into their original queues once I've fixed the consumer code that was originally causing the messages to be rejected.
I have configured the Dead Letter Exchange (DLX) for RabbitMQ and am successfully routing rejected messages to a dead letter queue. But now I want to look at the messages in the dead letter queue and try to decide what to do with each of them. Some (many?) of these messages should be replayed (requeued) to their original queues (available in the "x-death" headers) once the offending consumer code has been fixed. But how do I actually go about doing this? Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to? And what about searching the dead letter queue? What if I know that a message (let's say which is encoded in JSON) has a certain attribute that I want to search for and replay? For example, I fix a defect which I know will allow message with PacketId: 1234 to successfully process now. I could also write a one-off program for this I suppose.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them. It seems like there should be some sort of Swiss Army Knife for this sort of thing. I did a pretty extensive search on Google and Stack Overflow but didn't really come up with much. The closest thing I could find were shovels but that doesn't really seem like the right tool for the job.
Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
generally speaking, yes.
you could set up a delayed re-try to resend the message back to the original queue, using a combination of the delay message exchange plugin.
but this would only automate the retries on an interval, and you may not have fixed the problem before the retries happen.
in some circumstances this is ok - like when the error is caused by an external resource being temporarily unavailable.
in your case, though, i believe your thoughts on creating an app to handle the dead letters is the best way to go, for several reasons:
you need to search through the messages, which isn't possible RMQ
this means you'll need a database to store the messages from the DLX/queue
because you're pulling the messages out of the DLX/queue, you'll need to ensure you get all the header info from the message so that you can re-publish to the correct queue when the time comes.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them.
and you're not!
there are many solutions to this problem that all come down to the solution you've suggested.
some larger "service bus" implementations have this type of feature built in to them. i believe NServiceBus (or the SaaS version of it) has this built in, for example - though I'm not 100% sure of it.
if you want to look into this further, do some search for the term "poison message" - this is generally the term used for this situation. I've found a few things on google with a quick search, that may help you down the path:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-January/025019.html
https://web.archive.org/web/20170809194056/http://tafakari.co.ke/2014/07/rabbitmq-poison-messages/
https://web.archive.org/web/20170809170555/http://kjnilsson.github.io/blog/2014/01/30/spread-the-poison/
hope that helps!
Our system receives ISO8583 messages which we decode and handle appropriately. Now we are getting invalid ISO messages in between which our system can't handle. In fact, it sends nothing in return. This causes a timeout on the other side. As a consequence, the (invalid) transaction is reverted which then causes quite a messup as there is nothing to be reverted.
Can anyone give me a clue on how to deal with/answer an invalid/undecodable ISO8583 message? Is there a standard answer (e.g. 'NAK' like)?
According to the ISO-8583 spec, 6XX (or 16XX, if you're using the '93 version)-class messages are appropriate for administrative notifications. Generally, a 644 or 1644 MTI is prescribed for notifying the sender of a problem processing a message, where
X6XX - Indicates an administrative message, often containing the details of a failure
XX4X - Indicates that the message is a notification; the sender is not to repeat the message
XXX4 - Indicates the source of the message (acquirer/issuer/other); here, it's Other
Putting it all together, your message should have at least the following fields
MTI: 1644
DE-24 (Function code): 650 (Unable to parse message)
Of course, you're to include the standard message identification fields: DE-7,11,12,39. These fields will be necessary for the message sender to match your response with the request.
I don't think there is a standard way of handling invalid ISO 8583 request messages. You didn't say why you are receiving invalid request messages, and without knowing that it is difficult to suggest how you should handle them.
Depending on the situation it may be best not to answer invalid ISO 8583 requests. In fact I know of systems that not only don't answer invalid request messages but will also blacklist the device that sent the invalid message and refuse to answer all other messages from it.
If you do decide not to respond to invalid request messages then as you have found out the client is likely to time out and then attempt to reverse the transaction. This is not usually a problem because servers will usually respond with an approval message to reversal request for transactions that don't exist. Remember that when a client times out after sending a request, it doesn't know if the request was processed or even if the request was received. So a server has to be prepared to handle both 1. a request that was received and processed (by undoing the transaction and then responding with an approval), and 2. a request that was never received/processed (by sending an approval). NOTE: In case 2 there is no need to undo the transaction because the transaction never took place.
From my experience with integrating ISO links, invalid ISO messages are usually, by industry standard, handled by a dropping down of the acquring host's connection followed by an angry mail from the acquirer's service provider accusing you of segfaulting down their mainframe.
Other than that different implementations, when implemented well, will handle invalid messages differently, from what #kolossus said in case the parser fails completely, to a normal **10 response with a specific response code such as RC 12 "Invalid transaction" when just some subfields don't make sense (such as problems with packaging of the complex subfields with tokens, track2 parsing etc)
The practical reason why #kolossus solution doesn't really make sense and why Stuard has a point is, if the client has issues of forming the ISO messages, then it almost certainly has a problem with parsing them too, so another ISO message doesn't really tell the client anything except provoking a parser exception on his side too.
End result will be the same - a technical reversal by the client, just not after a timeout. Basically, with iso8583, the best way to handle invalid messages is to not have them, there's no clean way.
In my situation multiple modules report their state over a CAN bus to a central processor, which replies and drives them. There's also a supervising processor, which listens in on the CAN bus and analyzes incoming messages from the modules for critically dangerous situations (two different modules reporting activating outputs which are absolutely forbidden from being activated simultaneously).
This all works okay as long as the CAN bus is noise-free.
CAN bus guarantees the recipient to receive a message; the message will be resent if no recipient confirms receiving it. The problem begins if there's more than one recipient and all of them absolutely must receive the message.
If the line is clean, both receive it, confirm it, and everything is okay.
If the message is badly damaged, neither will receive it, and it will be resent. That's okay.
But if the noise on the line is "just on the brink", one of them will receive it, and confirm, and the other will fail to receive it (noise on its end of the bus just minimally worse), and since the sender got the confirmation, the message won't be resent.
Is there a reliable way to assure two different recipients of a message both receive it? ...other than sending two messages with two addresses, specifically? (it's essential that the supervising CPU hears the same messages as the main CPU, not just similar)
There is no way at the CAN layer to detect receipt by more than one module. You would need to add messages to your communication protocol to confirm receipt if this is absolutely critical. As mentioned, you could have each module receive the same message and send a unique reply.
Some general thoughts:
1) Are the important messages broadcast periodically? If so, the recipient could test that the periodicity of the message is correct and fail safely if the period is violated.
2) CAN is a very robust network. In my many years, I have not seen noise affecting a single node like you described other than when the node was at the end of a exceedingly (and irrationally) long wire. You are correct to worry about this scenario and design your message format and system to be robust to all CAN failures. Generally, when safety or reliability was paramount, we would have more than one CAN bus communicating the information along with a number of crosscheck messages to verify that not only the path was intact but the device on the other end was operating intelligently. Our general assumption was that if crosscheck messages were making the trip, then our operational messages were making the trip successfully as well.
Obviously not.
It fails even in the simple case, that one receiver is shutdown.
There is no possibility for the master to detect this (for this single packet).
You need an advanced CAN, with more acknowledge slots, for each recipients one slot.
But you could request that each reciepient has to confirm the message with a unique response message.
So your master can detect by a timeout that not all reciepent received the message.
Getting this WCF error, and no idea how to fix it:
System.ServiceModel.CommunicationException: The sequence has been terminated by the remote endpoint. The user specified maximum retry count for a particular message has been exceeded. Because of this the reliable session cannot continue. The reliable session was faulted.
Any ideas welcome :(
From the error message, it would appear that you're using reliable messaging. One of its features is that if a message transfer fails, it will be retried - up to a maximum number of attempts.
Obviously, in your setup, this max number has been maxed out. This might indicate a problem with the network, or your service code, or both. Really hard to tell from here without knowing what you're doing and what your setup is......
I guess the main question would be: do you really need the reliable messaging feature? What are you trying to achieve with this? If you could turn it off, you wouldn't be seeing those errors... can you switch to some other mechanism, maybe message queueing (MSMQ)? Or can you rearchitect your app so you can live with the odd chance that one message might get delivered "out of band" ?