Error codes and messages best practice - error-handling

I am planning out an EDI system that sends, amongst other things, an XML acknowledgement message containing several elements, but specifically these three: ErrorCode, ErrorSeverity and ErrorDescription.
I will be parsing an inbound XML message and, depending on whether it passes checks on message formatting, syntax, structure, validity and some business rules, I will return either a success or a failure acknowledgement.
I have free rein to pick the ErrorCode, ErrorSeverity and ErrorDescription values. Rather than naively starting at ErrorCode [1], ErrorSeverity [Error], ErrorDescription [Cannot Find Inbound XML File] and adding errors as I think of them while coding the inbound message parser, I was wondering whether there's a best practice for picking error codes and severities.
I know HTTP status codes use ranges such as 2xx for success, 4xx for client errors and 5xx for server errors, and I wondered if anyone has suggestions that might help me down the road, before I code myself into a corner and say "if only all my warnings had started with a 3" or something similar!
I think ErrorSeverity won't need much more than [Error], [Warning], [Info] and maybe [OK]?
Thanks.

You can find existing error codes for EDI here:
http://msdn.microsoft.com/en-us/library/bb245948.aspx
The systems/developers you will be communicating with will probably be happy if you use one of the standards described in those documents.

I like to distinguish "Error" (the input data were faulty) from "Fatal" (the system is broken). The first calls for fixing the data and retrying; the other does not.
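A minimal sketch of that distinction, assuming the four severities proposed in the question plus a separate Fatal level. The names `Severity`, `sender_should_fix_and_resend` and `receiver_needs_attention` are illustrative, not part of any EDI standard.

```python
from enum import Enum


class Severity(Enum):
    """Values for the ErrorSeverity element (illustrative)."""
    OK = "OK"
    INFO = "Info"
    WARNING = "Warning"
    ERROR = "Error"   # the input data were faulty: fix the data and resend
    FATAL = "Fatal"   # the receiving system is broken: resending won't help


def sender_should_fix_and_resend(severity: Severity) -> bool:
    """True only when the fault lies in the submitted data."""
    return severity is Severity.ERROR


def receiver_needs_attention(severity: Severity) -> bool:
    """A Fatal acknowledgement means someone on the receiving side must act."""
    return severity is Severity.FATAL
```

Keeping the two levels apart lets the sender's code decide mechanically whether a retry with corrected data makes sense.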
It should go without saying that any "error" messages should be actionable; they should make it clear to the recipient exactly what is wrong and what action needs to be taken to correct the fault in the data.
If you are separately communicating severity, then not only do I not see any need to stick to predefined numeric ranges, I suggest that such a move is a straitjacket. If you do decide there's mnemonic value in using ranges, make the ranges at least ten times larger than you think you need now (digits are cheap, why not use five? ;-)
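If you do want ranges, one way to follow that advice is to give each validation stage listed in the question its own five-digit block. The block boundaries and category names below are hypothetical; severity still travels separately in ErrorSeverity, so a block only says where the fault was found, not how bad it is.

```python
# Hypothetical five-digit blocks, one per validation stage from the question.
# Each block holds 10,000 codes, far more than any stage should ever need.
CODE_BLOCKS = {
    "transport": range(10000, 20000),   # e.g. cannot find inbound XML file
    "format": range(20000, 30000),
    "syntax": range(30000, 40000),
    "structure": range(40000, 50000),
    "validity": range(50000, 60000),
    "business": range(60000, 70000),
}


def category_of(code: int) -> str:
    """Return the block a code belongs to, or raise if it is unassigned."""
    for name, block in CODE_BLOCKS.items():
        if code in block:
            return name
    raise ValueError(f"error code {code} is not in any assigned block")
```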
You might also consider parameterizing your messages; e.g. explicit fields to indicate position in the text. That makes it easier for code to receive the message and do something useful with it (without having to parse the human-readable text looking for clues).
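A sketch of an acknowledgement with parameterized position fields, built with Python's standard `xml.etree.ElementTree`. ErrorCode, ErrorSeverity and ErrorDescription come from the question; the root element name `Acknowledgement` and the `Line`/`Column` fields are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_ack(code: int, severity: str, description: str,
              line: Optional[int] = None,
              column: Optional[int] = None) -> str:
    """Build an acknowledgement whose error position is machine-readable."""
    ack = ET.Element("Acknowledgement")          # hypothetical root name
    ET.SubElement(ack, "ErrorCode").text = str(code)
    ET.SubElement(ack, "ErrorSeverity").text = severity
    ET.SubElement(ack, "ErrorDescription").text = description
    # Explicit position fields let the receiver act without parsing prose.
    if line is not None:
        ET.SubElement(ack, "Line").text = str(line)
    if column is not None:
        ET.SubElement(ack, "Column").text = str(column)
    return ET.tostring(ack, encoding="unicode")
```

For example, `build_ack(30012, "Error", "Unexpected element Qty", line=42, column=7)` gives the receiver the position as two integers, while the description stays free text for humans.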

Related

Where does the business data go in a RabbitMQ message

I've been tasked with rearchitecting/repairing a flawed RabbitMQ/Elixir application. The guy who originally wrote it is unavailable.
One thing he did was to put the data of the message in the header property rather than in the payload. I don't know which is the appropriate way. I tend to lean toward the payload because, in my mind, the data isn't a header. I've been looking for example RabbitMQ messages, and while I found hundreds of sites addressing details about the message data/content/format, I can't find any tangible examples.
So my question is two-fold: does anyone know where I can find examples of the shape/format of the message, and which is the right place to put the data we are trying to send?
thanks!
Consider posting a letter: think of the headers as what you write on the envelope and the body as what you put inside it. RabbitMQ is the postal system.
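To make the analogy concrete, here is a minimal stdlib-only sketch of splitting an outgoing message: routing and processing metadata go into the headers, and the business data is serialized as JSON into the body. The `OutgoingMessage` type and the header names are illustrative. With the pika client, for example, `headers` would be passed as `pika.BasicProperties(headers=...)` and `body` as the `body` argument of `channel.basic_publish`.

```python
import json
from dataclasses import dataclass, field


@dataclass
class OutgoingMessage:
    headers: dict = field(default_factory=dict)  # the envelope
    body: bytes = b""                            # the letter


def make_message(order: dict, source: str) -> OutgoingMessage:
    """Put metadata on the envelope and the business data inside it."""
    return OutgoingMessage(
        headers={"source": source, "schema-version": 1},  # illustrative names
        body=json.dumps(order).encode("utf-8"),
    )
```

Consumers can then route or filter on the headers without decoding the body, and the body can be any format the two sides agree on.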

Dealing with dead letters in RabbitMQ

TL;DR: I need to "replay" dead letter messages back into their original queues once I've fixed the consumer code that was originally causing the messages to be rejected.
I have configured the Dead Letter Exchange (DLX) for RabbitMQ and am successfully routing rejected messages to a dead letter queue. But now I want to look at the messages in the dead letter queue and decide what to do with each of them. Some (many?) of these messages should be replayed (requeued) to their original queues (available in the "x-death" headers) once the offending consumer code has been fixed. But how do I actually go about doing this? Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to? And what about searching the dead letter queue? What if I know that a message (say, one encoded as JSON) has a certain attribute that I want to search for and replay? For example, I fix a defect that I know will now allow the message with PacketId: 1234 to be processed successfully. I suppose I could also write a one-off program for this.
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them. It seems like there should be some sort of Swiss Army Knife for this sort of thing. I did a pretty extensive search on Google and Stack Overflow but didn't really come up with much. The closest thing I could find were shovels but that doesn't really seem like the right tool for the job.
Should I write a one-off program that reads messages from the dead letter queue and allows me to specify a target queue to send them to?
Generally speaking, yes.
You could set up a delayed retry that resends the message to the original queue, using the delayed message exchange plugin in combination with the dead letter exchange.
But this would only automate the retries on an interval, and you may not have fixed the problem before the retries happen.
In some circumstances this is OK, for example when the error is caused by an external resource being temporarily unavailable.
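A sketch of that interval-based retry decision, assuming the rabbitmq-delayed-message-exchange plugin, which delays a message published to an exchange of type `x-delayed-message` by the number of milliseconds in its `x-delay` header. The retry counter header `x-retry-count` is hypothetical (maintained by your own republishing code), and the limit and backoff values are arbitrary.

```python
from typing import Optional

MAX_ATTEMPTS = 5          # arbitrary: after this, leave it for a human
BASE_DELAY_MS = 1_000     # arbitrary: first retry after one second


def next_retry(headers: dict) -> Optional[dict]:
    """Headers for republishing to the delayed exchange, or None to give up."""
    # Hypothetical counter that our own republishing code increments.
    attempts = int(headers.get("x-retry-count", 0))
    if attempts >= MAX_ATTEMPTS:
        return None  # park the message: a code fix is probably needed
    return {
        **headers,
        "x-retry-count": attempts + 1,
        "x-delay": BASE_DELAY_MS * 2 ** attempts,  # exponential backoff
    }
```

The cap matters for exactly the reason given above: if the failure is a bug rather than a transient outage, retries alone will never succeed.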
In your case, though, I believe your idea of creating an app to handle the dead letters is the best way to go, for several reasons:
You need to search through the messages, which isn't possible in RabbitMQ.
This means you'll need a database to store the messages from the DLX/queue.
Because you're pulling the messages out of the DLX/queue, you'll need to make sure you capture all the header info from each message so that you can republish it to the correct queue when the time comes.
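The core of such a one-off replay tool might look like the sketch below. It reads the original queue from the `x-death` header (RabbitMQ puts the most recent dead-lettering event first, so the first entry is the latest) and selects messages whose JSON body matches an attribute such as `PacketId`. The `DeadLetter` type is hypothetical and stands in for whatever you stored in your database; actually republishing (for example through the default exchange, with the queue name as the routing key) is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class DeadLetter:
    headers: dict
    body: bytes


def original_queue(msg: DeadLetter) -> str:
    """Queue the message was dead-lettered from (most recent x-death entry)."""
    deaths = msg.headers.get("x-death") or []
    if not deaths:
        raise ValueError("message has no x-death header")
    return deaths[0]["queue"]


def select_for_replay(messages, attribute: str, value):
    """Yield (target_queue, message) for messages whose JSON body matches."""
    for msg in messages:
        try:
            payload = json.loads(msg.body)
        except ValueError:
            continue  # not JSON: leave it for manual inspection
        if isinstance(payload, dict) and payload.get(attribute) == value:
            yield original_queue(msg), msg
```

After fixing the defect for PacketId 1234, `select_for_replay(stored, "PacketId", 1234)` would give exactly the messages to send back and where each one should go.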
I certainly can't be the first one to encounter these problems and I'm wondering if anyone else has already solved them.
And you're not!
There are many solutions to this problem, and they all come down to the one you've suggested.
Some larger "service bus" implementations have this type of feature built in. I believe NServiceBus (or its SaaS version) has it built in, for example, though I'm not 100% sure of it.
If you want to look into this further, search for the term "poison message", which is the term generally used for this situation. A quick Google search turned up a few things that may help you down the path:
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2013-January/025019.html
https://web.archive.org/web/20170809194056/http://tafakari.co.ke/2014/07/rabbitmq-poison-messages/
https://web.archive.org/web/20170809170555/http://kjnilsson.github.io/blog/2014/01/30/spread-the-poison/
Hope that helps!

How to answer an invalid ISO8583 message

Our system receives ISO8583 messages, which we decode and handle appropriately. Now we are intermittently receiving invalid ISO messages that our system can't handle; in fact, it sends nothing in return. This causes a timeout on the other side, and as a consequence the (invalid) transaction is reversed, which causes quite a mess because there is nothing to reverse.
Can anyone give me a clue on how to deal with/answer an invalid/undecodable ISO8583 message? Is there a standard answer (e.g. 'NAK' like)?
According to the ISO 8583 spec, messages of class 6XX (or 16XX if you're using the 1993 version) are appropriate for administrative notifications. Generally, a 644 or 1644 MTI is prescribed for notifying the sender of a problem processing a message, where
X6XX - Indicates an administrative message, often containing the details of a failure
XX4X - Indicates that the message is a notification; the sender is not to repeat the message
XXX4 - Indicates the source of the message (acquirer/issuer/other); here, it's Other
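The digit-by-digit reading above can be expressed as a small decoder. The lookup tables are a partial sketch covering commonly cited values; check them against the version of the spec your link actually uses.

```python
VERSIONS = {"0": "1987", "1": "1993", "2": "2003"}
CLASSES = {"1": "authorization", "2": "financial", "3": "file action",
           "4": "reversal/chargeback", "5": "reconciliation",
           "6": "administrative", "7": "fee collection",
           "8": "network management"}
FUNCTIONS = {"0": "request", "1": "request response", "2": "advice",
             "3": "advice response", "4": "notification",
             "5": "notification acknowledgement"}
ORIGINS = {"0": "acquirer", "1": "acquirer repeat", "2": "issuer",
           "3": "issuer repeat", "4": "other", "5": "other repeat"}


def decode_mti(mti: str) -> dict:
    """Split a four-digit MTI into version, class, function and origin."""
    if len(mti) != 4 or not mti.isdigit():
        raise ValueError(f"not a four-digit MTI: {mti!r}")
    v, c, f, o = mti
    return {"version": VERSIONS.get(v, "?"), "class": CLASSES.get(c, "?"),
            "function": FUNCTIONS.get(f, "?"), "origin": ORIGINS.get(o, "?")}
```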
Putting it all together, your message should have at least the following fields
MTI: 1644
DE-24 (Function code): 650 (Unable to parse message)
Of course, you should also include the standard message identification fields: DE-7, 11, 12 and 39. The sender needs these to match your response with its request.
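A sketch of assembling that notification, treating a message as a plain dict from data-element number to string value; in practice you would likely build it with your ISO 8583 library's message type and your network's field formats. It echoes DE-7, 11 and 12 from whatever part of the request could be decoded. The DE-39 value is a caller-supplied parameter because the answer does not say which code to use.

```python
from typing import Optional

# Transmission date/time, STAN, local transaction date/time.
ECHOED_FIELDS = (7, 11, 12)


def build_parse_failure_notice(request_fields: dict,
                               response_code: Optional[str] = None) -> dict:
    """Build a 1644 notification saying the request could not be parsed."""
    notice = {"MTI": "1644", 24: "650"}  # DE-24 650: unable to parse message
    for de in ECHOED_FIELDS:
        if de in request_fields:         # echo whatever could be recovered
            notice[de] = request_fields[de]
    if response_code is not None:
        notice[39] = response_code       # value depends on your network's spec
    return notice
```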
I don't think there is a standard way of handling invalid ISO 8583 request messages. You didn't say why you are receiving invalid request messages, and without knowing that it is difficult to suggest how you should handle them.
Depending on the situation it may be best not to answer invalid ISO 8583 requests. In fact I know of systems that not only don't answer invalid request messages but will also blacklist the device that sent the invalid message and refuse to answer all other messages from it.
If you do decide not to respond to invalid request messages, then, as you have found out, the client is likely to time out and then attempt to reverse the transaction. This is not usually a problem, because servers usually respond with an approval to a reversal request for a transaction that doesn't exist. Remember that when a client times out after sending a request, it doesn't know whether the request was processed or even received. So a server has to be prepared to handle both: 1. a request that was received and processed (by undoing the transaction and then responding with an approval), and 2. a request that was never received/processed (by simply sending an approval). NOTE: In case 2 there is no need to undo the transaction, because the transaction never took place.
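The two cases reduce to an idempotent reversal handler: undo the transaction if it exists, and approve either way. A minimal sketch with an in-memory ledger keyed by STAN (DE-11); the ledger, the key choice and the "00" approval code are assumptions for illustration.

```python
def handle_reversal(ledger: dict, stan: str) -> str:
    """Reverse a transaction if we have it; approve the reversal either way."""
    if stan in ledger:       # case 1: request was received and processed
        del ledger[stan]     # undo it
    # case 2: request never arrived, so there is nothing to undo
    return "00"              # approval (code assumed; check your spec)
```

Because the handler gives the same answer whether or not the original request arrived, a client that timed out can always reverse safely.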
From my experience integrating ISO links, invalid ISO messages are, by industry standard, usually handled by the acquiring host dropping the connection, followed by an angry email from the acquirer's service provider accusing you of segfaulting their mainframe.
Other than that, different implementations, when implemented well, handle invalid messages differently, ranging from what kolossus suggested (when the parser fails completely) to a normal **10 response with a specific response code such as RC 12 "Invalid transaction" when only some subfields don't make sense (such as problems with packaging of complex subfields with tokens, Track 2 parsing, etc.).
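The "**10 response" is the request MTI with its function digit moved from request (0) to request response (1), e.g. 0100 to 0110 or 0200 to 0210. A small helper sketch, with the response code supplied by the caller (RC 12 is the example from the answer); the dict-based message representation is an assumption.

```python
def response_mti(request_mti: str) -> str:
    """Turn a request, advice or notification MTI into its response MTI."""
    if len(request_mti) != 4 or not request_mti.isdigit():
        raise ValueError(f"not a four-digit MTI: {request_mti!r}")
    function = request_mti[2]
    if function not in "024":   # request, advice, notification
        raise ValueError(f"{request_mti} is not a request-type MTI")
    return request_mti[:2] + str(int(function) + 1) + request_mti[3]


def reject(request: dict, response_code: str = "12") -> dict:
    """Answer a decodable but invalid request with a specific response code."""
    return {"MTI": response_mti(request["MTI"]), 39: response_code}
```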
The practical reason why kolossus's solution doesn't really make sense, and why Stuard has a point, is that if the client has trouble forming ISO messages, it almost certainly has trouble parsing them too, so another ISO message doesn't really tell the client anything; it will most likely just provoke a parser exception on its side as well.
The end result will be the same: a technical reversal by the client, just not after a timeout. Basically, with ISO 8583 the best way to handle invalid messages is not to have them; there's no clean way.

Preserving delivery order

I am considering using AMQP for an application where delivery order is paramount.
I cannot therefore use the normal re-delivery features, as undelivered messages are re-queued out of order.
It looks like what I must do is to leave the message on the queue until it has been processed, and then specifically delete it. It is then possible that the same message is processed twice in order, but that is easy to trap and deal with.
However, I don't see how to do this. What I am looking for is some sort of peek and delete message methods, giving me direct control, but they don't seem to exist.
Am I missing something, or trying to solve the problem in the wrong way?
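The duplicate trap mentioned in the question can be sketched as a consumer that tracks the last sequence number it processed and skips anything at or below it, so a redelivered message is harmless. This assumes the publisher stamps each message with a monotonically increasing sequence number, which AMQP does not do for you; the `InOrderConsumer` class is hypothetical.

```python
class InOrderConsumer:
    """Process messages strictly in sequence; ignore replays of old ones."""

    def __init__(self):
        self.last_seq = 0
        self.processed = []

    def handle(self, seq: int, payload) -> bool:
        if seq <= self.last_seq:
            return False                  # duplicate: already processed
        if seq != self.last_seq + 1:
            raise RuntimeError(f"gap: expected {self.last_seq + 1}, got {seq}")
        self.processed.append(payload)    # do the real work here
        self.last_seq = seq               # record progress only after success
        return True
```

In a real system `last_seq` would have to be stored durably, in the same transaction as the work itself, or a crash between the two steps could still cause a double-processing.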
You cannot peek and delete in AMQP. In fact, you cannot browse the messages on a queue without consuming them, and RabbitMQ does not provide any extension to enable this.
The general response to your problem is "Think very carefully if you actually need that in-order constraint", because, for instance, with that constraint in place, you cannot have multiple consumers on a queue.
I have had to solve the same problem. My solution was to wrap the messages into one single message: the outer message is processed first, and then the inner messages are processed in their wrapped order. This has some disadvantages, for example big messages (once the wrapping hierarchy contains many messages) and more difficult serialization, but for me the solution was good enough.
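A sketch of the wrapping approach from this answer: the ordered messages travel as one JSON envelope, and the consumer unpacks them in their original order. The `batch` field name is hypothetical.

```python
import json


def wrap(messages: list) -> bytes:
    """Pack messages into one body; list order is the processing order."""
    return json.dumps({"batch": messages}).encode("utf-8")


def process_wrapped(body: bytes, handler) -> None:
    """Unpack and handle each inner message in the order it was wrapped."""
    for inner in json.loads(body)["batch"]:
        handler(inner)
```

Because the whole batch is a single AMQP message, it is acknowledged or requeued as a unit, which is what keeps the inner order intact; the price is the larger messages the answer mentions.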

How to rollback an NHibernate Transaction within NServiceBus

It's my understanding that we essentially have two kinds of exceptions when using NServiceBus:
Environmental: a required component is not currently available. This usually results in a full rollback of the transaction. This is the case I see described for rollback in the NServiceBus documentation (including putting the message back on the bus, which sounds fantastic). How do I do this?
Validation: a message is being processed that cannot succeed because of business logic, rules, etc. In this case I want to roll back all database interaction, but there's no value in keeping the command in the queue, so I just want to roll back the NHibernate part of the transaction, not the MSMQ part. How do I do this? Normally I would validate before any single message is processed, but when multiple messages are bound together into a single transaction and you want to roll them all back, this isn't possible through pre-validation.
My assumption is either the answer is insanely obvious and I've overlooked it or what I'm trying to do isn't possible (in regards to the Validation exception).
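NServiceBus and NHibernate are .NET libraries, so the following is only a language-neutral sketch in Python, with `sqlite3` standing in for the NHibernate session and the handler's return value or exception standing in for what the bus does with the message. It shows the split the question asks about: a validation failure rolls back the database work but completes the message with an error reply, while any other exception rolls back and propagates so the transport can retry. `ValidationError`, the `orders` table and the reply format are hypothetical.

```python
import sqlite3


class ValidationError(Exception):
    """Business-rule failure: retrying the same message cannot succeed."""


def handle_batch(conn: sqlite3.Connection, orders: list) -> dict:
    """Apply several orders as one database transaction."""
    try:
        with conn:  # commits on success, rolls back on any exception
            for order in orders:
                if order["qty"] <= 0:
                    raise ValidationError(
                        f"order {order['id']}: qty must be positive")
                conn.execute("INSERT INTO orders (id, qty) VALUES (?, ?)",
                             (order["id"], order["qty"]))
    except ValidationError as exc:
        # Database work is already rolled back; the message is consumed
        # and the integrator gets a reply explaining the problem.
        return {"status": "rejected", "reason": str(exc)}
    # Any other exception propagated out of the with-block: the transport
    # can roll back its receive and redeliver (environmental failure).
    return {"status": "ok"}
```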
NSB takes care of getting the message out of the way by moving it to an error queue (v2.5). In v3 this functionality is enhanced and will give you more options to handle faults (DB, custom, etc.). The error queue is configured in your app.config.
In my experience, it's easiest (and probably also more appropriate) to ensure that messages are very likely to succeed when they participate in a distributed transaction.
Therefore, most validation logic should already have been carried out when you dispatch the command message, and rollback is reserved for the truly exceptional case.
If your client cannot perform the validation, maybe you should insert a validation service in front of your current service. This validation service could route invalid command messages somewhere else before they reach the real service.
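A sketch of such a validation service: it checks each command and picks a destination queue, so only commands likely to succeed reach the real service. The queue names and the rules are hypothetical.

```python
def validate(command: dict) -> list:
    """Return a list of problems; an empty list means the command looks valid."""
    problems = []
    if not command.get("customer_id"):
        problems.append("customer_id is required")
    if command.get("qty", 0) <= 0:
        problems.append("qty must be positive")
    return problems


def route(command: dict) -> tuple:
    """Choose where the command goes next, along with any problems found."""
    problems = validate(command)
    if problems:
        return "orders.invalid", problems   # hypothetical holding queue
    return "orders", []                     # hypothetical real service queue
```

Invalid commands never enter the distributed transaction, so rollback in the real service stays reserved for environmental failures.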
Thank you for your answers. I believe the answer lies somewhere between the two.
We are unfortunately unable to implement a validation service but we've simply added better upfront validation to the message processing logic.
Unfortunately until we get to v3 we are currently unable to use the Error Queue as we are utilizing the message response functionality to alert integrators of issues with their messages. And throwing an unhandled error prevents any responses from being generated.