Pythonic error handling of complex functions - error-handling

I'd like to know if there is a Pythonic way for handling errors in long-running functions that can have errors in part that do not affect the ability of the function to continue.
As an example, consider a function that given a list of URLs, it recursively retrieves the resource and all linked resources under the path of the top level URLs. It stores the retrieved resources in a local filesystem with a directory structure mirroring the URL structure. Essentially this is a basic recursive wget for a list of pages.
There are quite a number of points where this function could fail:
A URL may be invalid, or unresolvable
The host may not be reachable (perhaps temporarily)
Saving locally may have disk errors
anything else you can think of.
A failure on retrieving or saving any one resource only affects the function's ability to continue to process that resource and any child resources that may be linked from it, but it is possible to continue to retrieve other resources.
A simple model of error handling is that on the first error, an appropriate exception is raised for the caller to handle. The problem with this is that it terminates the function and does not allow it to continue. The error could possibly be fixed and the function restarted from the beginning but this would cause work to be redone, and any permanent errors may mean we never complete.
A couple of alternatives I have in mind are:
Record errors in a list as they occur and abort processing that resource any any child resources, but continue on to the next resource. A threshold could be used to abort the entire function if too many errors occur, or perhaps just try everything. The caller can interrogate this list at the completion of the function to see if there were any problems.
The caller could provide a callable object that is called with each error. This moves responsibility for recording errors back to the caller. You could even specify that if the callable returns False that processing should stop. This would move the threshold management to the caller.
Implement the former with the latter, providing an error handling object than encodes the former's behavior.
In Python discussions, I've often noted certain approaches described as Pythonic or non-Pythonic. I'd like to know if there are any particularly Pythonic approaches to handling the type of scenario described above.
Does Python have any batteries included that model more sophisticated error handling than the terminate model of exception handling, or do the more complex batteries included use a model of error handling that I should copy to stay Pythonic?
Note: Please do not focus on the example. I'm not looking to solve problems in that particular space, but it seemed like a good example that most people here would have an understanding of.

I don't think there's a particularly clear "Pythonic/non-Pythonic" distinction at the level you're talking about here.
One of the big reasons there's no "one-size-fits-all" solution in this domain, is that the exact semantics you want are going to be problem specific.
For one situation, abort-on-first-failure may be adequate.
For another, you may want abort-and-rollback if any of the operations fails.
For a third, you may want to complete as many as possible and simply log-and-ignore failures
For a fourth alternative, you may want to complete as many as possible, but raise an exception at the end to report any that failed.
Even supporting an error handler doesn't necessarily cover all of those desired behaviours - a simple per-failure error handler can't easily provide abort-and-rollback semantics, or generate a single exception at the end. (It's not impossible - you just have to mess around with tricks like passing bound methods or closures as your error handlers)
So the best you can do is take an educated guess at typical usage scenarios and desirable behaviours in the face of errors, and design your API accordingly.
A fully general solution would accept an on-error handler that is given each failure as it happens, and a final "errors occurred" handler that gives the caller a chance to decide how multiple errors are handled (with some protocol to allow data to be passed from the individual error handlers to the final batch error handler).
However, providing such a general solution is likely to be an API design failure. The designer of the API shouldn't be afraid to have an opinion on how their API should be used, and how errors should be handled. The main thing to keep in mind is to not overengineer your solution:
if the naive approach is adequate, don't mess with it
if collecting failures in a list and reporting a single error is good enough, do that
if you need to rollback everything if one part fails, then just implement it that way
if there's a genuine use case for custom error handling, then accept an error handler as a part of the API. But have a specific use case in mind when you do this, don't just do it for the sake of it. And when you do, have a sensible default handler that is used if the user doesn't specify one (this may just be the naive "raise immediately" approach)
If you do offer selectable error handlers, consider offering some standard error handlers that can be passed in either as callables or as named strings (i.e. along the lines of the error handler selection for text codecs)
Perhaps the best you're going to get as a general principle is that "Pythonic" error handling will be as simple as possible, but no simpler. But at that point, the word is just being used as a synonym for "good code", which isn't really its intent.
On the other hand, it is slightly easier to talk about what actual forms non-Pythonic error handling might take:
def myFunction(an_arg, error_handler)
# Do stuff
if err_occurred:
if isinstance(err, RuntimeError):
error_handler.handleRuntimeError()
elif isinstance(err, IOError):
error_handler.handleIOError()
The Pythonic idiom is that error handlers, if supported at all, are just simple callables. Give them the information they need to decide how to handle the situation, rather than try to decide too much on their behalf. If you want to make it easier to implement common aspects of the error handling, then provide a separate helper class with a __call__ method that does the dispatch, so people can decide whether or not they want to use it (or how much they want to override when they do use it). This isn't completely Python-specific, but it is something that folks coming from languages that make it annoyingly difficult to pass arbitrary callables around (such as Java, C, C++) may get wrong. So complex error handling protocols would definitely be a way to head into "non-Pythonic error handling" territory.
The other problem in the above non-Pythonic code is that there's no default handler provided. Forcing every API user to make a decision they may not yet be equipped to make is just poor API design. But now we're back in general "good code"/"bad code" territory, so Pythonic/non-Pythonic really shouldn't be used to describe the difference.

Error handling should rely on exceptions and logging, so for each error raise an exception and log an error message.
Then at any caller function level catch the exception, log any other additional error if needed and handle the issue.
If the issue is not fully handled, then re-raise the exception again so that upper levels can catch the same exception and perform different actions.
In any of this stages you can keep a counter of some types of exceptions so that you can perform some actions only if there have been a specific number of issues.

Related

Why is it wrong to think that `an exception is related to how frequently something happens`?

I met the following excerpt in the CLR via C# book:
Important Many developers incorrectly believe that an exception is related to how frequently
something happens. For example, a developer designing a file Read method is likely to say the
following: “When reading from a file, you will eventually reach the end of its data. Since reaching the
end will always happen, I’ll design my Read method so that it reports the end by returning a special
value; I won’t have it throw an exception.” The problem with this statement is that it is being made by
the developer designing the Read method, not by the developer calling the Read method.
When designing the Read method, it is impossible for the developer to know all of the possible
situations in which the method gets called. Therefore, the developer can’t possibly know how often the
caller of the Read method will attempt to read past the end of the file. In fact, since most files contain
structured data, attempting to read past the end of a file is something that rarely happens.
I can not understand two things which the excerpt (from my pov) was intended to explain. What does it mean that an exception is related to how frequently something happens? How is it possible to prove that it is not a correct way of thinking (I believe that a counterexample does the job of proving this, but still I do not understand the counterexample presented in the above excerpt)?
I do not understand the counterexample. Ok, let someone call a method which reads from a file many times after the end of the file was reached. Ok, let the method to report the end of the file all these times. I see no reasons for this to be worse than throwing an exception.
The author is saying a developer should not attempt to guess how often a branch of code will be executed by users, and should not decide whether to throw an exception from that branch based on their guess. In other words, it is incorrect to define an exception as, "something that doesn't happen very often."
The obvious reason for not making guesses is they may be wrong. A more fundamental reason is that exceptions are not necessarily infrequent, depending on the business domain. Consider an e-commerce site where users enter credit card numbers. Users will frequently enter their card numbers incorrectly. If we related exceptions to how frequently something happens, we might determine an incorrect CC number is not an exception, because it happens quite often.
Developers may be reluctant to throw exceptions. This often results in applications that "fail slow" because error conditions propagate beyond the point where they occur. Exceptions encourage an application to fail fast.
Related: Avoid in-band error indicators.

Assertions in ABAP

Over the years I've written code in a variety of languages and environments, but one constant seemed to be the consensus on the use of assertions. As I understand it, they are there for the development process when you want to identify "impossible" errors and other situations to which your first reaction would be "that can't be right" and which cannot be handled gracefully, leaving the system in a state where it has no choice but to terminate. Assertions are easy to understand and quick to code but due to their fail-fast nature are unsuitable for development code. Ideally, assertions are used to discover all development bugs and then removed or turned off when shipping the code. Input or program states that are wrong, but possible (and expected to occur) should instead be handled gracefully via exceptions or other error handling techniques.
However, none of this seems to hold true for writing ABAP code for SAP. I've just spent the better part of an hour trying to track down the precise location where an assert was giving me an unintelligible error. This turned out to be five levels down in standard SAP code, which is apparently riddled with ASSERT statements. I now know that a certain variable identifying a table IS NOT INITIAL while its accompanying variable identifying a field is.
This tells me nothing. The Web Dynpro component running this code actually "catches" this assert, showing me a generic error message, which only serves to prevent the debugger from launching when the assert is tripped.
My question therefore is what the guidelines or best practices are for the use of assertions in ABAP. Is this SAP writing bad code? Is it an accepted practice to fill your custom code with asserts and leave them in when shipping the code? If so, how would we go about handling these asserts in runtime so that the application doesn't crash and burn while still being able to identify the cause of the error?
The guidelines and best practices are virtually the same in ABAP development as in any other language. Assertion should be used as internal guidance checks only, exceptions for regular input validation errors and other stuff. It might be sensible to leave the assertions in the code - after all, you'd probably rather want your program to crash in a controlled fashion than continue in an unforeseen way and probably damage some critical data in the process without anyone noticing. Take a look at checkpoint groups if you don't want your program to abort in a production environment - but in my opinion: What's the use of a sanity check (as a last line of defense) if it's disabled in the environment where it matters most?
Of course I'm assuming that the input is validated properly (so that crashes are prevented) and that all APIs are used according to the intended use and documentation. Unfortunately - as with every other programming language - it's up to the developer to live up to these standards.

Proper library behavior with erroneous user input

Suppose I'm writing a library that stores a sequence of doubles to a file in a certain format. The format requires that the doubles are monotonically increasing.
Now, some users won't read the manual carefully or write buggy frontends that do something like
store(3.0)
store(3.1)
store(0.3)
store(7.8)
What the library could do is
Error out when store(0.3) is called.
Try to correct the error by making a good guess, e.g., actually store(3.3).
Correct the error and write a message to stderr.
[...]
The advantage of (1) would be that the user cannot miss it. If the code ran for a long time (which is the regular case in my context), though, the user wouldn't be too happy with the program aborting.
(2) would do away with this, but possibly encourage misusing the library.
Are there policies in any language that advocate one approach over the other?
Irrespective of the language used, my general advice is to always fail quickly. This localises errors to the actual source of the problem - i.e., throw an error or exception and bail out (perhaps permitting the programmer to catch the exception, depending on the language). Similarly, some languages with checked exceptions might force the programmer to add a check for malformed input.
The reason for this is simple - the further away from the actual source of the problem that the errors manifest, the harder the program is to debug. Let's say the programmer didn't mean 3.3 (as opposed to 0.3) and you corrected it for him - well, the program will keep running, but at some point the value 3.3 will manifest and potentially cause other problems. It might also be that the source of these values was some kind of sorting algorithm with bugs - the fact that your library doesn't fail in this case will simply make it harder to debug the sorting algorithm and identify the real cause of the failure.
It also plays hell with any attempts to unit test the code - code that should fail doesn't necessarily fail in the right place. This just makes the code magical and much more difficult to manage as part of a development process.
There is an alternative to simply failing and forcing the user or client program to start the interaction all over again - you could do things in a transactional manner such that the library is left in a consistent state after the failure, permitting the user to proceed from the last valid input (for example). This should be implemented with proper rollback semantics though, to ensure data consistency.
So in summary: fail fast, and fail early.

Throwing Exceptions for user errors? Or better to design custom error message framework?

So I never got into detailed error processing too much when I played in VBA/VB6 a lot. Mostly then, if you ran into a user error (such as some input of theirs failing a validation test of some kind), you popped a MsgBox() with some error information and the critical (or warning) icon, and safely aborted out of the code and hope they got a clue.
In .NET, my reading basically points to exceptions as the end-all in error handling. It looks to me that if you know a spot of code where a user can screw up, you're supposed to catch it with either try...catch blocks (for things like data conversions), or standard if...the...else constructs for other things, and then throw a new exception if needed.
Isn't throwing an exception essentially a forced crash of a program in a sense (granted, you get the option of continuing)? Or are exceptions geared specifically for things like data conversion errors and other "things that shouldn't happen", and resume use of MsgBox() and friends for minor user screwups?
Consider the case of where you have a TextBox that is only supposed to accept numeric data (or heck, just a specific set of characters). Barring some other trick that lets you restrict that field (let's just assume it's freeform, programatically), it would seem a bit of a waste to throw new exceptions everytime they type in an invalid character, or even if the error checking doesn't happen until they press a submit button (like on a webpage). Popping a MsgBox() seems more sane in that case.
So what's the straight dope on exceptions and throwing new ones on user errors? How about if your program also exposes a programmatic framework? Bad usage of one of the programmatic functions definitely seems like new exception territory to me.
Exceptions in .NET certainly are available for bad entry, not just the things that should never go wrong. In any case, you shouldn't be letting unhanded exceptions get to the user.
You're probably going to displaying a MsgBox whether you 'testing' the input, or letting the framework detect an exception, so it doesn't make a huge amount of difference. Seeing as exceptions are generally slow, you should probably use 'if' statements to capture the obvious validation errors, and rely on exceptions to capture the more obscure scenarios.

Why is error handling important?

I was given a task of write the coding guidelines for my team, and it was going great until my manager asked me to write an explanation of Why Error Handling is Important.
I know it instinctively, but how do I express this in words?
I tried to google it first but came up empty, so I now ask my fellow coding wizards.
IMHO ... most programs are very large, very complex and written by multiple people. This combination of factors almost always leads to some kind of software bug. It's not that programmers are malicious, stupid or lazy ... it's just that in the rush to meet a deadline we often don't forsee every possible thing that a user can do to our programs and something is bound to happen.
In this respect error handling serves two purposes.
First, it lets the user know, in a relatively friendly manner, that something has gone wrong and that they should contact the technical support department or that someone from tech support has been notified. As we all know there's a HUGE difference between receiving a rather nasty, tech riddled notice that says something like "Object not set to reference of an object" etc. ... and receiving a nice popup type window that says "There has been an issue. Please contact the helpdesk".
Second it allows the programmer to put in some niceties to aid in the debugging of issues. For instance ... in my code, I typically write a custom error handler that takes in a number of parameters and spits back a nice, formatted message that can either be emailed to the helpdesk, stashed in an event log, written to a log file etc.. The error message will contain as much info as I can cram in there to help me figure out what happened, stack traces, function parameters, database calls ... you name it. I like verbose error messages to help me figure out what actually happened. The user never has to see any of it, they get the nice, friendly message above, letting them know that someone can figure out what's going on.
Error handling is important because it makes it easier for the end users of your code to use it correctly. Another important issue is that it makes your code easier to maintain. Error handling makes it easier to embed input specifications into the code, so you don't have to look up the design when you write and later maintain the code.
Why Error Handling is Important.
Because of what can happen if you don't.
If you're capable of writing coding guidelines, you should be able to handle this, surely?
Its quite simple to explain to a layman manager:
If your handle your errors, your program will likely continue to function after an error, your customer can likely continue working, and you can provide a report of exactly how the bug occurred so you can fix it.
If you don't handle your errors, your program may crash, lose all of your customers work and you likely won't know where the bug occurred (provided you don't handle your fatal exception with a stack trace).
Another huge reason that error handling is so important is security! Certain types of errors, if not handled properly can leave a program and the underlying operating system in a vulnerable state. Handling errors must be a deliberate and well thought out process because even when handled gracefully, errors can write to log files or splash error messages to the screen that supply potential attackers with very valuable information that they can use later to take advantage of specific vulnerabilities.
First I would ask is it important?
I've seen (ugly) code where some errors were ignored (eg null reference)
So what type of errors are important to handle?
There is a big difference between System.IO.FileNotFoundException, System.Data.SqlClient.SqlException and System.ApplicationException