Why is error handling important?

I was given the task of writing the coding guidelines for my team, and it was going great until my manager asked me to write an explanation of Why Error Handling is Important.
I know it instinctively, but how do I express this in words?
I tried to google it first but came up empty, so I now ask my fellow coding wizards.

IMHO ... most programs are very large, very complex and written by multiple people. This combination of factors almost always leads to some kind of software bug. It's not that programmers are malicious, stupid or lazy ... it's just that in the rush to meet a deadline we often don't foresee every possible thing that a user can do to our programs and something is bound to happen.
In this respect error handling serves two purposes.
First, it lets the user know, in a relatively friendly manner, that something has gone wrong and that they should contact the technical support department or that someone from tech support has been notified. As we all know, there's a HUGE difference between receiving a rather nasty, jargon-riddled notice that says something like "Object reference not set to an instance of an object" and receiving a nice popup window that says "There has been an issue. Please contact the helpdesk".
Second, it allows the programmer to put in some niceties to aid in the debugging of issues. For instance, in my code I typically write a custom error handler that takes in a number of parameters and spits back a nice, formatted message that can be emailed to the helpdesk, stashed in an event log, written to a log file, etc. The error message will contain as much info as I can cram in there to help me figure out what happened: stack traces, function parameters, database calls ... you name it. I like verbose error messages because they help me figure out what actually happened. The user never has to see any of it; they get the nice, friendly message above, letting them know that someone can figure out what's going on.
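A minimal Python sketch of those two purposes, assuming a hypothetical helper `report_error` and a hypothetical entry point `handle_action` (neither name comes from the answer above). The full traceback and context go to a log, and the user only ever sees the friendly text.

```python
import logging
import traceback

logger = logging.getLogger("app.errors")

FRIENDLY_MESSAGE = "There has been an issue. Please contact the helpdesk."


def report_error(exc, **context):
    """Log a verbose report for developers and return the text for the user."""
    details = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    context_text = ", ".join(f"{k}={v!r}" for k, v in sorted(context.items()))
    # Everything useful for debugging goes to the log, not to the screen.
    logger.error("Unhandled error (%s)\n%s", context_text, details)
    return FRIENDLY_MESSAGE


def handle_action(action, *args):
    """Hypothetical UI entry point: run an action, never show a raw traceback.

    Returns (True, result) on success or (False, friendly_message) on failure.
    """
    try:
        return True, action(*args)
    except Exception as exc:
        return False, report_error(exc, action=action.__name__, args=args)
```

Where the verbose report ends up (email, event log, file) would be a matter of configuring handlers on the logger, which this sketch leaves out.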

Error handling is important because it makes it easier for the end users of your code to use it correctly. Another important issue is that it makes your code easier to maintain. Error handling makes it easier to embed input specifications into the code, so you don't have to look up the design when you write and later maintain the code.

Why Error Handling is Important.
Because of what can happen if you don't.
If you're capable of writing coding guidelines, you should be able to handle this, surely?

It's quite simple to explain to a layman manager:
If you handle your errors, your program will likely continue to function after an error, your customer can likely continue working, and you can provide a report of exactly how the bug occurred so you can fix it.
If you don't handle your errors, your program may crash, lose all of your customer's work, and you likely won't know where the bug occurred (provided you don't handle your fatal exception with a stack trace).

Another huge reason that error handling is so important is security! Certain types of errors, if not handled properly, can leave a program and the underlying operating system in a vulnerable state. Handling errors must be a deliberate and well-thought-out process because, even when handled gracefully, errors can write to log files or splash error messages to the screen that supply potential attackers with very valuable information that they can use later to take advantage of specific vulnerabilities.

First I would ask: is it important?
I've seen (ugly) code where some errors were ignored (e.g. a null reference).
So what type of errors are important to handle?
There is a big difference between System.IO.FileNotFoundException, System.Data.SqlClient.SqlException and System.ApplicationException.

Related

Why is it wrong to think that `an exception is related to how frequently something happens`?

I came across the following excerpt in the CLR via C# book:
Important Many developers incorrectly believe that an exception is related to how frequently
something happens. For example, a developer designing a file Read method is likely to say the
following: “When reading from a file, you will eventually reach the end of its data. Since reaching the
end will always happen, I’ll design my Read method so that it reports the end by returning a special
value; I won’t have it throw an exception.” The problem with this statement is that it is being made by
the developer designing the Read method, not by the developer calling the Read method.
When designing the Read method, it is impossible for the developer to know all of the possible
situations in which the method gets called. Therefore, the developer can’t possibly know how often the
caller of the Read method will attempt to read past the end of the file. In fact, since most files contain
structured data, attempting to read past the end of a file is something that rarely happens.
I cannot understand two things which the excerpt (from my pov) was intended to explain. What does it mean that an exception is related to how frequently something happens? How is it possible to prove that it is not a correct way of thinking (I believe that a counterexample does the job of proving this, but still I do not understand the counterexample presented in the above excerpt)?
I do not understand the counterexample. OK, let someone call a method which reads from a file many times after the end of the file was reached. OK, let the method report the end of the file all these times. I see no reason for this to be worse than throwing an exception.
The author is saying a developer should not attempt to guess how often a branch of code will be executed by users, and should not decide whether to throw an exception from that branch based on their guess. In other words, it is incorrect to define an exception as, "something that doesn't happen very often."
The obvious reason for not making guesses is they may be wrong. A more fundamental reason is that exceptions are not necessarily infrequent, depending on the business domain. Consider an e-commerce site where users enter credit card numbers. Users will frequently enter their card numbers incorrectly. If we related exceptions to how frequently something happens, we might determine an incorrect CC number is not an exception, because it happens quite often.
Developers may be reluctant to throw exceptions. This often results in applications that "fail slow" because error conditions propagate beyond the point where they occur. Exceptions encourage an application to fail fast.
Related: Avoid in-band error indicators.
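To make the "fail slow" versus "fail fast" point concrete, here is a small Python sketch. It relies on two real string methods: `str.find` returns -1 when nothing is found (an in-band indicator), while `str.index` raises `ValueError`. The two wrapper functions are made up for illustration and assume a single-character marker.

```python
def value_after_slow(text, marker):
    """Return the character after marker, using an in-band error indicator."""
    pos = text.find(marker)  # -1 means "not found", but it is also a valid index offset
    return text[pos + 1]     # with pos == -1 this quietly returns text[0]


def value_after_fast(text, marker):
    """Return the character after marker, failing at the point of the error."""
    pos = text.index(marker)  # raises ValueError when marker is missing
    return text[pos + 1]
```

With the input `"abc"` and marker `"="`, the first version hands back `"a"` as if nothing were wrong, and the mistake surfaces somewhere else later (if at all). The second version stops right where the problem is.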

Assertions in ABAP

Over the years I've written code in a variety of languages and environments, but one constant seemed to be the consensus on the use of assertions. As I understand it, they are there for the development process when you want to identify "impossible" errors and other situations to which your first reaction would be "that can't be right" and which cannot be handled gracefully, leaving the system in a state where it has no choice but to terminate. Assertions are easy to understand and quick to code but due to their fail-fast nature are unsuitable for production code. Ideally, assertions are used to discover all development bugs and then removed or turned off when shipping the code. Input or program states that are wrong, but possible (and expected to occur) should instead be handled gracefully via exceptions or other error handling techniques.
However, none of this seems to hold true for writing ABAP code for SAP. I've just spent the better part of an hour trying to track down the precise location where an assert was giving me an unintelligible error. This turned out to be five levels down in standard SAP code, which is apparently riddled with ASSERT statements. I now know that a certain variable identifying a table IS NOT INITIAL while its accompanying variable identifying a field is.
This tells me nothing. The Web Dynpro component running this code actually "catches" this assert, showing me a generic error message, which only serves to prevent the debugger from launching when the assert is tripped.
My question therefore is what the guidelines or best practices are for the use of assertions in ABAP. Is this SAP writing bad code? Is it an accepted practice to fill your custom code with asserts and leave them in when shipping the code? If so, how would we go about handling these asserts in runtime so that the application doesn't crash and burn while still being able to identify the cause of the error?
The guidelines and best practices are virtually the same in ABAP development as in any other language. Assertions should be used as internal sanity checks only; use exceptions for regular input validation errors and other expected failures. It might be sensible to leave the assertions in the code - after all, you'd probably rather want your program to crash in a controlled fashion than continue in an unforeseen way and probably damage some critical data in the process without anyone noticing. Take a look at checkpoint groups if you don't want your program to abort in a production environment - but in my opinion: What's the use of a sanity check (as a last line of defense) if it's disabled in the environment where it matters most?
Of course I'm assuming that the input is validated properly (so that crashes are prevented) and that all APIs are used according to the intended use and documentation. Unfortunately - as with every other programming language - it's up to the developer to live up to these standards.
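The same split, sketched in Python rather than ABAP (the function names are made up for illustration): input that is wrong but possible raises an ordinary exception, while an assertion guards an internal assumption whose failure can only mean a bug. Note that Python skips `assert` statements when the interpreter runs with `-O`, so the trade-off about disabling checks in production applies there too.

```python
def parse_quantity(raw):
    """Validate external input: wrong-but-possible values raise an exception."""
    try:
        quantity = int(raw)
    except ValueError:
        raise ValueError(f"quantity must be a whole number, got {raw!r}") from None
    if quantity <= 0:
        raise ValueError(f"quantity must be positive, got {quantity}")
    return quantity


def unit_price(total_cents, quantity):
    """Internal helper: callers are expected to pass already-validated values."""
    # Sanity check: if this trips, the bug is in our own calling code.
    assert quantity > 0, f"unit_price called with quantity={quantity}"
    return total_cents // quantity
```

A caller would typically catch the `ValueError` from `parse_quantity` and show a message, but would not try to catch the `AssertionError`.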

Pythonic error handling of complex functions

I'd like to know if there is a Pythonic way of handling errors in long-running functions where an error in one part does not affect the ability of the function to continue.
As an example, consider a function that, given a list of URLs, recursively retrieves each resource and all linked resources under the path of the top-level URLs. It stores the retrieved resources in a local filesystem with a directory structure mirroring the URL structure. Essentially this is a basic recursive wget for a list of pages.
There are quite a number of points where this function could fail:
A URL may be invalid, or unresolvable
The host may not be reachable (perhaps temporarily)
Saving locally may have disk errors
anything else you can think of.
A failure on retrieving or saving any one resource only affects the function's ability to continue to process that resource and any child resources that may be linked from it, but it is possible to continue to retrieve other resources.
A simple model of error handling is that on the first error, an appropriate exception is raised for the caller to handle. The problem with this is that it terminates the function and does not allow it to continue. The error could possibly be fixed and the function restarted from the beginning but this would cause work to be redone, and any permanent errors may mean we never complete.
A couple of alternatives I have in mind are:
Record errors in a list as they occur and abort processing that resource and any child resources, but continue on to the next resource. A threshold could be used to abort the entire function if too many errors occur, or perhaps just try everything. The caller can interrogate this list at the completion of the function to see if there were any problems.
The caller could provide a callable object that is called with each error. This moves responsibility for recording errors back to the caller. You could even specify that if the callable returns False that processing should stop. This would move the threshold management to the caller.
Implement the former with the latter, providing an error handling object that encodes the former's behavior.
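A rough sketch of how these alternatives might fit together, assuming a generic hypothetical `process_all` helper instead of the URL fetcher (the names and the `max_errors` parameter are illustrative only):

```python
def process_all(items, process, on_error=None, max_errors=None):
    """Apply process to each item, continuing past individual failures.

    Returns (results, errors), where errors is a list of (item, exception).
    """
    results, errors = [], []
    for item in items:
        try:
            results.append(process(item))
        except Exception as exc:
            # Alternative 1: record the error and move on to the next item.
            errors.append((item, exc))
            # Alternative 2: an optional callback, which may return False
            # to tell us to stop processing altogether.
            if on_error is not None and on_error(item, exc) is False:
                break
            # Optional threshold for aborting the whole run.
            if max_errors is not None and len(errors) >= max_errors:
                break
    return results, errors
```

For example, `process_all(["1", "x", "3"], int)` would convert the two valid strings and leave one `("x", ValueError)` entry in the error list for the caller to inspect.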
In Python discussions, I've often noted certain approaches described as Pythonic or non-Pythonic. I'd like to know if there are any particularly Pythonic approaches to handling the type of scenario described above.
Does Python have any batteries included that model more sophisticated error handling than the terminate model of exception handling, or do the more complex batteries included use a model of error handling that I should copy to stay Pythonic?
Note: Please do not focus on the example. I'm not looking to solve problems in that particular space, but it seemed like a good example that most people here would have an understanding of.
I don't think there's a particularly clear "Pythonic/non-Pythonic" distinction at the level you're talking about here.
One of the big reasons there's no "one-size-fits-all" solution in this domain, is that the exact semantics you want are going to be problem specific.
For one situation, abort-on-first-failure may be adequate.
For another, you may want abort-and-rollback if any of the operations fails.
For a third, you may want to complete as many as possible and simply log-and-ignore failures.
For a fourth alternative, you may want to complete as many as possible, but raise an exception at the end to report any that failed.
Even supporting an error handler doesn't necessarily cover all of those desired behaviours - a simple per-failure error handler can't easily provide abort-and-rollback semantics, or generate a single exception at the end. (It's not impossible - you just have to mess around with tricks like passing bound methods or closures as your error handlers)
So the best you can do is take an educated guess at typical usage scenarios and desirable behaviours in the face of errors, and design your API accordingly.
A fully general solution would accept an on-error handler that is given each failure as it happens, and a final "errors occurred" handler that gives the caller a chance to decide how multiple errors are handled (with some protocol to allow data to be passed from the individual error handlers to the final batch error handler).
However, providing such a general solution is likely to be an API design failure. The designer of the API shouldn't be afraid to have an opinion on how their API should be used, and how errors should be handled. The main thing to keep in mind is to not overengineer your solution:
if the naive approach is adequate, don't mess with it
if collecting failures in a list and reporting a single error is good enough, do that
if you need to rollback everything if one part fails, then just implement it that way
if there's a genuine use case for custom error handling, then accept an error handler as a part of the API. But have a specific use case in mind when you do this, don't just do it for the sake of it. And when you do, have a sensible default handler that is used if the user doesn't specify one (this may just be the naive "raise immediately" approach)
If you do offer selectable error handlers, consider offering some standard error handlers that can be passed in either as callables or as named strings (i.e. along the lines of the error handler selection for text codecs)
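As a sketch of that last suggestion: Python's codecs take an `errors` argument such as `"strict"`, `"ignore"` or `"replace"` (for example `b"...".decode("utf-8", errors="replace")`). A similar API of your own might look like the following, where `convert_all` and the handler table are hypothetical names and only two standard handlers are provided:

```python
def _raise(item, exc):
    """Naive default: re-raise the error immediately."""
    raise exc


def _ignore(item, exc):
    """Drop the failing item and carry on."""
    return None


STANDARD_HANDLERS = {"strict": _raise, "ignore": _ignore}


def convert_all(items, convert, errors="strict"):
    """Convert each item; errors is a handler name or any callable(item, exc)."""
    handler = STANDARD_HANDLERS[errors] if isinstance(errors, str) else errors
    converted = []
    for item in items:
        try:
            converted.append(convert(item))
        except Exception as exc:
            handler(item, exc)
    return converted
```

The default stays the simple "raise immediately" behaviour, so callers who never think about error handling still get a sensible result.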
Perhaps the best you're going to get as a general principle is that "Pythonic" error handling will be as simple as possible, but no simpler. But at that point, the word is just being used as a synonym for "good code", which isn't really its intent.
On the other hand, it is slightly easier to talk about what actual forms non-Pythonic error handling might take:
def myFunction(an_arg, error_handler):
    # Do stuff (err_occurred and err stand in for whatever that stuff reports)
    if err_occurred:
        if isinstance(err, RuntimeError):
            error_handler.handleRuntimeError()
        elif isinstance(err, IOError):
            error_handler.handleIOError()
The Pythonic idiom is that error handlers, if supported at all, are just simple callables. Give them the information they need to decide how to handle the situation, rather than try to decide too much on their behalf. If you want to make it easier to implement common aspects of the error handling, then provide a separate helper class with a __call__ method that does the dispatch, so people can decide whether or not they want to use it (or how much they want to override when they do use it). This isn't completely Python-specific, but it is something that folks coming from languages that make it annoyingly difficult to pass arbitrary callables around (such as Java, C, C++) may get wrong. So complex error handling protocols would definitely be a way to head into "non-Pythonic error handling" territory.
The other problem in the above non-Pythonic code is that there's no default handler provided. Forcing every API user to make a decision they may not yet be equipped to make is just poor API design. But now we're back in general "good code"/"bad code" territory, so Pythonic/non-Pythonic really shouldn't be used to describe the difference.
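For contrast, a sketch of what the callable-based version might look like. All names here are hypothetical: `my_function` is a stand-in that sums reciprocals, the handler is any callable taking the exception, there is a default handler, and the dispatching helper class is optional.

```python
def raise_immediately(exc):
    """Default handler: the naive behaviour, simply re-raise the error."""
    raise exc


def my_function(values, error_handler=raise_immediately):
    """Sum the reciprocals of values, passing each failure to error_handler."""
    total = 0.0
    for value in values:
        try:
            total += 1 / value
        except (ZeroDivisionError, TypeError) as exc:
            error_handler(exc)  # a plain callable decides what happens next
    return total


class DispatchingHandler:
    """Optional helper: a callable that routes errors by exception type."""

    def __init__(self):
        self.seen = []

    def __call__(self, exc):
        if isinstance(exc, ZeroDivisionError):
            self.handle_zero_division(exc)
        else:
            self.handle_other(exc)

    def handle_zero_division(self, exc):
        self.seen.append(("zero", exc))

    def handle_other(self, exc):
        self.seen.append(("other", exc))
```

Callers who want something simpler can pass any callable, for instance the `append` method of a list, and never touch the helper class.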
Error handling should rely on exceptions and logging, so for each error raise an exception and log an error message.
Then at any caller function level catch the exception, log any other additional error if needed and handle the issue.
If the issue is not fully handled, then re-raise the exception so that upper levels can catch the same exception and perform different actions.
At any of these stages you can keep a counter for some types of exceptions, so that you perform certain actions only once a specific number of issues has occurred.
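A small sketch of that raise, log, re-raise pattern with a counter, assuming made-up `load_record` and `load_all` functions and a hypothetical `max_failures` threshold:

```python
import logging
from collections import Counter

logger = logging.getLogger(__name__)


def load_record(raw):
    """Lowest level: log what went wrong, then let the caller decide."""
    try:
        return int(raw)
    except ValueError:
        logger.error("could not parse record %r", raw)
        raise  # not fully handled here, so re-raise for upper levels


def load_all(raws, max_failures=2):
    """Caller level: tolerate a few bad records, give up after too many."""
    failures = Counter()
    records = []
    for raw in raws:
        try:
            records.append(load_record(raw))
        except ValueError:
            failures["parse"] += 1
            if failures["parse"] > max_failures:
                logger.error("too many parse failures, aborting")
                raise
    return records
```

Here a single bad record is skipped, while a third failure propagates the original `ValueError` to whoever called `load_all`.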

Throwing Exceptions for user errors? Or better to design custom error message framework?

So I never got into detailed error processing too much when I played in VBA/VB6 a lot. Mostly then, if you ran into a user error (such as some input of theirs failing a validation test of some kind), you popped a MsgBox() with some error information and the critical (or warning) icon, and safely aborted out of the code and hope they got a clue.
In .NET, my reading basically points to exceptions as the end-all in error handling. It looks to me that if you know a spot of code where a user can screw up, you're supposed to catch it with either try...catch blocks (for things like data conversions), or standard if...then...else constructs for other things, and then throw a new exception if needed.
Isn't throwing an exception essentially a forced crash of a program in a sense (granted, you get the option of continuing)? Or are exceptions geared specifically for things like data conversion errors and other "things that shouldn't happen", and resume use of MsgBox() and friends for minor user screwups?
Consider the case where you have a TextBox that is only supposed to accept numeric data (or heck, just a specific set of characters). Barring some other trick that lets you restrict that field (let's just assume it's freeform, programmatically), it would seem a bit of a waste to throw new exceptions every time they type in an invalid character, or even if the error checking doesn't happen until they press a submit button (like on a webpage). Popping a MsgBox() seems more sane in that case.
So what's the straight dope on exceptions and throwing new ones on user errors? How about if your program also exposes a programmatic framework? Bad usage of one of the programmatic functions definitely seems like new exception territory to me.
Exceptions in .NET certainly are available for bad entry, not just the things that should never go wrong. In any case, you shouldn't be letting unhandled exceptions get to the user.
You're probably going to be displaying a MsgBox whether you're 'testing' the input or letting the framework detect an exception, so it doesn't make a huge amount of difference. Seeing as exceptions are generally slow, you should probably use 'if' statements to capture the obvious validation errors, and rely on exceptions to capture the more obscure scenarios.
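The split can be sketched as follows, in Python rather than .NET and with hypothetical names: plain `if` checks turn obvious user mistakes into a message for the UI, while the programmatic API raises an exception when it is called incorrectly.

```python
def validate_age_text(text):
    """Return a message to show the user, or None if the text is acceptable."""
    stripped = text.strip()
    if not stripped:
        return "Enter your age."
    if not (stripped.isascii() and stripped.isdigit()):
        return "Age must be a whole number."
    if int(stripped) > 150:
        return "Age must be 150 or less."
    return None


def set_age(person, age):
    """Programmatic API: misuse is the calling code's bug, so raise."""
    if not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError(f"age must be an int between 0 and 150, got {age!r}")
    person["age"] = age
```

A form handler would call `validate_age_text` first and only pass the converted value to `set_age` once no message comes back.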

How to get users to pay attention to problems?

We occasionally need to notify users about warnings or problems. But often times, especially if it's a common problem, users will just dismiss the warning and continue. Often times users won't even remember seeing the warning, but we check their logs and see that several were displayed. So, how do you get users to pay attention when you're trying to tell them something important?
This isn't as simple as forcing users to resolve all problems before allowing them to save. They often need to save data that isn't strictly okay by our business rules for various reasons (usually for problems that can't be solved right away, or at all).
We've got a better warning/error handling system in mind that I think will help a lot, but I want to see what others have done.
If you want users to pay attention to warnings, use them in moderation!
The big problem with the UAC in Vista is that people are getting so many notifications, that they stop reading who is exactly requesting access to what, they just give permission without thinking.
Another example is the delete confirmation in explorer when sending files to the recycle bin. I got so used to just hitting 'Ok' immediately after pressing 'delete', that I missed the fact that the dialog was telling me that the file would not be moved to the bin, but deleted immediately, for whatever reason.
My personal fix: I disabled the delete confirmation for the recycle bin. If something can not be moved to the bin, I still get a message, and this time I know that it might be important, so I pay attention.
Conclusion: Don't spam the user with messages, or the important warnings will get lost in the noise.
The quality of your warning will not prevent users from submitting invalid data. If you allow invalid data to be submitted, it will be.
If you have data that must be submitted to a rules system, then that data must be valid before it is submitted. However, allowing users to save their work is a separate issue. You should allow users to save their work, then submit the data to the rules engine when it is valid.
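One way to picture that separation, as a sketch with invented names and two invented rules: saving always succeeds and records the outstanding problems next to the data, while submitting refuses until the list is empty.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    data: dict
    problems: list = field(default_factory=list)


def check_rules(data):
    """Hypothetical business rules; returns a list of short problem messages."""
    problems = []
    if not data.get("post_code"):
        problems.append("Enter post code.")
    if data.get("quantity", 0) <= 0:
        problems.append("Enter a quantity above zero.")
    return problems


def save(store, key, data):
    """Saving never blocks: problems are stored alongside the work."""
    draft = Draft(dict(data), check_rules(data))
    store[key] = draft
    return draft


def submit(draft):
    """Submission to the rules system requires a clean draft."""
    if draft.problems:
        raise ValueError("not ready to submit: " + " ".join(draft.problems))
    return "submitted"
```

The stored problem list could also drive whatever non-blocking indicator the UI uses, since it travels with the saved data.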
The fundamental problem is that users don't like to read, they just want to be left alone to do their work :).
The best way to combat this is the following:
Don't pop up a window unless absolutely necessary
If you do, make the error or warning message as short and succinct as you possibly can
Long error/warning messages simply won't get read. The user will get to about the fifth word and think "this is taking too much time, I just want to get back to work".
My advice boils down to three things.
Reevaluate what you think is important for the user to know.
Don't be lazy and ask the user to resolve what your program can resolve for itself.
Don't interrupt what the user is doing with stupid (and yes, they are stupid) messages.
If you have a form with required data, then color-code the field as red or highlight it with an asterisk to indicate it's required. Disable the "OK" or "Confirm" button until they fill out all required fields.
For fields with incomplete or inconsistent data, bring up a tooltip or color-code the field so the user knows that something may be wrong. You could also display the list of warnings prominently somewhere on your form. But don't stop the data entry. You'll just frustrate and anger your users.
I must admit that I too often click on "OK" or whatever I'm conditioned to do for a dialog to go away without thinking. Usually this occurs when there are just too many of them.
Without claiming to be a psychologist of any kind, I think it is natural to pay attention to unusual things and filter away repetitive things.
With that in mind, it may be worth making less important dialogs less intrusive so that the really important ones get more attention.
I think toaster messages and the way Google handles messages in its online apps are really nice examples of how to notify a user of something inessential.
--EDIT--
Now that I re-read my post, I remember reading this in "Don't Make Me Think".
A brilliant little book (few tens of pages) that's full of nice and easy to understand usability things. Somewhat focused on online usability, but definitely applicable to offline applications too.
This is what we've got planned. Essentially, create something Bugzilla-ish for storing these errors/warnings/whatever. But it also goes hand in hand with some of the other answers.
Instead of using simple MessageBox, display warnings/errors in a Visual Studio-like error window. As long as there are problems, they'll be displayed in this window.
If the data is saved, save all warnings/errors to the database. Now anyone can see what the current issues are - bonus! Also, those problems can be loaded from the database instead of detecting them in the app all the time, which will help a lot - some problems are not trivial to detect.
Allow users to perform several actions, like:
Acknowledge the problem, so it is no longer displayed.
Assign the problem to another user
Flag the problem as "not really a problem"
Set a "must be solved by" date
(probably others, the design hasn't been fully thought out yet)
Log all of these actions to the database, so we have accountability
That's it in a nutshell. Now problems stick around, so they're in the users' faces until they're solved. The problems can be tracked, so we can tell where the ball was dropped if we get bit. I hope it works!
Though I never got around to implementing this at a previous site, I wanted to create a custom dialog box where users would have to check a box stating that they have read and acknowledged the message (and then log that response). This was for an ISO-xxxx company so this kind of bureaucracy was a logical response to these types of mistakes.
My other, much more sinister, idea was to make "No" or "Cancel" the default options. Eventually they would get the Tab-Enter keystrokes down pat and then you would just switch it back.
Break the system!
It has honestly been my experience that if you don't want an end user to do something without explicitly understanding it, stop them from doing it...
As seriously annoying as the whole "Windows Error/warning Messages" thing gets, I never take notice until a program tells me I can't do something... then I am forced to ask myself "Why Not"
Time to google the answer... or RTFM
I know that it is not always feasible to use this approach, but if you can... they will listen!
I like programs that hint that there's a problem while ignoring it as long as possible - which sounds very like what you're striving for. One thing I've been thinking about (but vaguely, since I haven't had a use for it) is putting a status indicator for errors/warnings (a bit like the omnipresent throbber of a web-browser, but for errors). This icon would change state, a bit like a traffic light, to show that the program has problems that will have to be addressed sooner or later - perhaps yellow for warnings if the problem with the data could be corrected later and isn't going to cause any major problems, red for any problem that is going to have to be fixed before they complete the current job (for form data, that would mean the whole transaction, not the current form). Obviously the colours wouldn't be enough, there would have to be some support for colour-blind people, but you get the idea. Clicking the indicator would bring up a list of the problems (and perhaps explanations as to why that is a problem - so that people can point out when the code's assumptions are unhelpful or wrong), and selecting a problem would allow you to jump to the field where it can be fixed.
One thing you should probably do, whatever method you go with in the end, is to look through your warnings and work out whether they're actually necessary. I've seen far too many programs that warn me about perfectly reasonable input that is then accepted, or warn me about the usual behaviour of the program. That's the sort of thing that helps condition people to click through warnings. If you have logs of the warnings, you might start there - Why are people clicking through them? They might be conditioned, or it might be that there genuinely isn't a problem, and someone hasn't told you that things have changed.
I quite like the Firefox method when installing plugins: The OK button is disabled and displays a countdown for 5 seconds. After that the user can choose to ignore it.
For web applications, the alert() and confirm() JavaScript methods, while somewhat basic, achieve the effect of either preventing users from doing something or making sure that they clearly agree to something that they have been warned about.
For other situations, where the action will not cause considerable disruption in the business processes, we often display a small warning box at the top of the page after, say, a form is submitted.
For example, our applications require location validation in several places (valid city/state/zip).
If the location is absolutely critical, we will make it required on the form.
If the location is required for some aspects of the application, we will use the confirm() to make sure they understand that they will not be able to use certain features without a valid location.
In some cases, we use a default location. In that case, we provide the message/warning box at the top of the next page indicating that a default location is being used.
I've found that if you're producing log messages for your own use (even if that use directly benefits the users themselves), the only way to get users to report problems is to have the application do it for them.
In the case of dealing with user input that might be wrong, have you considered using something like the red squigglies used by spell checking or some kind of highlighting of the problem areas as the user does their work? Most users have been trained to ignore dialogs by using buggy software, but that kind of message might make it clear that the error is the user's to fix.
Do you have a good idea why each of the exceptional situations occurs? What do you try to achieve with each of these messages:
make user review the data for obvious typos or mistakes
make someone else review the data at a later stage when more information is available
inform this user and anyone else looking at the data at a later stage about any assumptions made
make sure user understands the consequences of their actions
Can any of these goals be achieved more effectively in a different way?
Some ideas (none of them automatically qualify as a silver bullet):
Keep messages short and relevant, exclude any language that does not contribute additional info (such as "please" etc), tell users what is expected of them (i.e. instead of "Post code is empty" use "Enter post code".).
Use language that is understood by the users, always give sufficient information, try to be as specific as possible.
Use different looking messages for different types of warnings and errors (use font, colour, imaging, possibly animation and sound).
Revisit the entire process, so that someone has to process any info submitted with warnings later on.
Visualise warnings the next time the info is brought to the screen (i.e. highlight problematic areas) so that they can be resolved later, when more info is available.
Add a sign off to the warnings, for instance request a user to enter their password each time they need to dismiss a warning.
Make actions undo-able, so you don't really need the warnings
Don't try to solve it with programming. See if you can change the data input process.
Use colors and icons.
Green - everything is ok (or confirmation something happened as expected)
Yellow - Warning. The user may or may not want to look into the issue
Red - Error. Something that requires user interaction to resolve.
I would also suggest (as others have on this thread) to use sparingly.