This is most certainly a language agnostic question and one that has bothered me for quite some time now. An example will probably help me explain the dilemma I am facing:
Let us say we have a method which is responsible for reading a file, populating a collection with some objects (which store information from the file), and then returning the collection...something like the following:
public List<SomeObject> loadConfiguration(String filename);
Let us also say that at the time of implementing this method, it would seem infeasible for the application to continue if the collection returned was empty (a size of 0). Now, the question is, should this validation (checking for an empty collection and perhaps the subsequent throwing of an exception) be done within the method? Or, should this methods sole responsibility be to perform the load of the file and ignore the task of validation, allowing validation to be done at some later stage outside of the method?
I guess the general question is: is it better to decouple the validation from the actual task being performed by a method? Will this make things, in general, easier at a later stage to change or build upon - in the case of my example above, it may be the case at a later stage where a different strategy is added to recover from the event of an empty collection being return from the 'loadConfiguration' method..... this would be difficult if the validation (and resulting exception) was being done in the method.
Perhaps I am being overly pedantic in the quest for some dogmatic answer, where instead it simply just relies on the context in which a method is being used. Anyhow, I would be very interested in seeing what others have to say regarding this.
Thanks all!
My recommendation is to stick to the single responsibility principle which says, in a nutshell, that each object should have 1 purpose. In this instance, your method has 3 purposes and then 4 if you count the validation aspect.
Here's my recommendation on how to handle this and how to provide a large amount of flexibility for future updates.
Keep your LoadConfig method
Have it call the a new method for reading the file.
Pass the previous method's return value to another method for loading the file into the collection.
Pass the object collection into some validation method.
Return the collection.
That's taking 1 method initially and breaking it into 4 with one calling 3 others. This should allow you to change pieces w/o having any impact on others.
Hope this helps
I guess the general question is: is it
better to decouple the validation from
the actual task being performed by a
method?
Yes. (At least if you really insist on answering such a general question – it’s always quite easy to find a counter-example.) If you keep both the parts of the solution separate, you can exchange, drop or reuse any of them. That’s a clear plus. Of course you must be careful not to jeopardize your object’s invariants by exposing the non-validating API, but I think you are aware of that. You’ll have to do some little extra typing, but that won’t hurt you.
I will answer your question by a question: do you want various validation methods for the product of your method ?
This is the same as the 'constructor' issue: is it better to raise an exception during the construction or initialize a void object and then call an 'init' method... you are sure to raise a debate here!
In general, I would recommend performing the validation as soon as possible: this is known as the Fail Fast which advocates that finding problems as soon as possible is better than delaying the detection since diagnosis is immediate while later you would have to revert the whole flow....
If you're not convinced, think of it this way: do you really want to write 3 lines every time you load a file ? (load, parse, validate) Well, that violates the DRY principle.
So, go agile there:
write your method with validation: it is responsible for loading a valid configuration (1)
if you ever need some parametrization, add it then (like a 'check' parameter, with a default value which preserves the old behavior of course)
(1) Of course, I don't advocate a single method to do all this at once... it's an organization matter: under the covers this method should call dedicated methods to organize the code :)
To deflect the question to a more basic one, each method should do as little as possible. So in your example, there should be a method that reads in the file, a method that extracts the necessary data from the file, another method to write that data to the collection, and another method that calls these methods. The validation can go in a separate method, or in one of the others, depending on where it makes the most sense.
private byte[] ReadFile(string fileSpec)
{
// code to read in file, and return contents
}
private FileData GetFileData(string fileContents)
{
// code to create FileData struct from file contents
}
private void FileDataCollection: Collection<FileData> { }
public void DoItAll (string fileSpec, FileDataCollection filDtaCol)
{
filDtaCol.Add(GetFileData(ReadFile(fileSpec)));
}
Add validation, verification to each of the methods as appropriate
You are designing an API and should not make any unnecessary assumptions about your client. A method should take only the information that it needs, return only the information requested, and only fail when it is unable to return a meaningful value.
So, with that in mind, if the configuration is loadable but empty, then returning an empty list seems correct to me. If your client has an application specific requirement to fail when provided an empty list, then it may do so, but future clients may not have that requirement. The loadConfiguration method itself should fail when it really fails, such as when it is unable to read or parse the file.
But you can continue to decouple your interface. For example, why must the configuration be stored in a file? Why can't I provide a URL, a row in a database, or a raw string containing the configuration data? Very few methods should take a file path as an argument since it binds them tightly to the local file system and makes them responsible for opening, reading, and closing files in addition to their core logic. Consider accepting an input stream as an alternative. Or if you want to allow for elaborate alternatives -- like data from a database -- consider accepting a ConfigurationReader interface or similar.
Methods should be highly cohesive ... that is single minded. So my opinion would be to separate the responsibilities as you have described. I sometimes feel tempted to say...it is just a short method so it does not matter...then I regret it 1.5 weeks later.
I think this depends on the case: If you could think of a scenario where you would use this method and it returned an empty list, and this would be okay, then I would not put the validation inside the method. But for e.g. a method which inserts data into a database which have to be validated (is the email address correct, has a name been specified, ... ) then it should be ok to put validation code inside the function and throw an exception.
Another alternative, not mentioned above, is to support Dependency Injection and have the method client inject a validator. This would allow the preservation of the "strong" Resource Acquisition Is Initialization principle, that is to say Any Object which Loads Successfully is Ready For Business (Matthieu's mention of Fail Fast is much the same notion).
It also allows a resource implementation class to create its own low-level validators which rely on the structure of the resource without exposing clients to implementation details unnecessarily, which can be useful when dealing with multiple disparate resource providers such as Ryan listed.
Related
I work on a class in VBA, that encapsulates downloading stuff with MSXML2.XmlHttp.
There are three possibilities for the return value: Text, XML and Stream.
Should I create a function for each:
aText=myDownloader.TextSynchronous(URL,formData,dlPost,....)
aXml.load myDownloader.XmlSynchronous(URL,formData,dlPost,....)
Or can I just return the XmlHttpObject I created inside the class and then have
aText=myDownloader.Synchronous(URL,formData,dlPost,.....).ResponseText
aXML=myDownloader.Synchronous(URL,formData,dlPost,.....).ResponseXML
In the former case I can set the obj to nothing in the class but have to write several functions that are more or less the same.
In the latter case, I relay on the "garbage collector" but have a leaner class.
Both should work, but which one is better coding style?
In my opinion, the first way is better because you don't expose low level details to a high level of the abstraction.
I did something similar with a web crawler in Java, so I have a class only to manipulate the URL connection getting all the needed data (low level) and a high level class using the low level class that return an object called Page.
You can have a third method that only execute myDownloader.Synchronous(URL,formData,dlPost,.....) and stores the returned object in a private variable and the others method only manipulate this object. This form, you will only open the connection one time.
After much seeking around in the web (triggered by the comment by EmmadKareem) I found this:
First of all, Dont do localObject=Nothing at the end of a method - the variable goes out of scope anyway and is discarded. see this older but enlightening post on msdn
VBA uses reference counting and apart from some older bugs on ADO this seems to work woute well and (as I understand) immediately discards ressources that are not used anymore. So from a performance/memory usage point of view this seems not to be a problem.
As to the coding style: I think the uncomfortable fdeeling I had when I designed this could go away by simply renaming the function to myDownloader.getSyncDLObj(...) or some such.
There seem to be two camps on codestyle. One promotes clean code, which is easy to read, but uses five lines everytime you use it. Its most important prerogative is "every function should do one thing and one thing only. Their approach would probably look something like
myDownloader.URL="..."
myDownloader.method=dlSync
myDownloader.download
aText=myDownloader.getXmlHttpObj.ResponseText
myDownloader.freeResources
and one is OK with the more cluttered, but less lineconsuming
aText=myDownloader.getSyncObj(...).ResponseText
both have their merits both none is wrong, dangerous or frowned upon. As this is a helper class and I use it to remove the inner workings of the xmlhttp from the main code I am more comfortable with the second approach here. (One line for one goal ;)
I would be very interested on anyones take on that matter
I use the Scanner class for reading multiple similar files. I would like to extend it to make sure they all use the same delimiter and I can also add methods like skipUntilYouFind(String thisHere) that all valid for them all.
I can make a utility-class that contain them, or embed the Scanner Class as a variable inside another class but this is more cumbersome.
I have found some reasons to declare a class final, but why is it done here?
Probably because extending it and overwriting some of it's methods would probably break it. And making it easier to overwrite methods would expose to much of the inner workings, so if in the future they decide to change those (for performance or some other reasons), it would be harder for them to change the class without breaking all the classes that extend it.
For example, consider the following method in the class:
public boolean nextBoolean() {
clearCaches();
return Boolean.parseBoolean(next(boolPattern()));
}
Say you want to overwrite this because you want to make 'awesome' evaluate to a 'true' boolean (for whatever reason). If you overwrite it, you can't call super.nextBoolean(), since that would consume the next token using the default logic. But if you don't call super.nextBoolean(), clearCaches() won't be called, possibly breaking the other not overwritten methods. You can't call clearCaches() because it's private. If they made it protected, but then realized that it's causing a performance problem, and wanted a new implementation that doesn't clear caches anymore, then they might break your overwritten implementation which would still be calling that.
So basically it's so they can easily change the hidden parts inside the class, which are quite complex, and protecting you from making a broken child class (or a class that could be easily be broken).
I suppose it is due to security reasons. This class reads user input, so that someone with bad intentions could extend it, modify it's behavior and you'd be screwed. If it is final, it is not that easy for the bad guy, because if he makes his own type of Scanner (not java.util.Scanner), the principles of Polymorphism would be broken. See the bad guy can be smart enough to write a bot/script which does this automatically on remote servers... He can even do it by dynamic classloading in compiled application.
I think that the link you provided explains it all.
In your case it seems like you should prefer composition instead of inheritance anyway. You are creating a utility that has some predefined behavior, and that can hide some (or all) of the details of the Scanner class.
I've seen many implementations that used inheritance in order to change a behavior. The end result was usually a monolithic design, and in some cases, a broken contract, and/or broken behavior.
I have the habit of always validating property setters against bad data, even if there's no where in my program that would reasonably input bad data. My QA person doesn't want me throwing exceptions unless I can explain where they would occur. Should I be validating all properties? Is there a standard on this I could point to?
Example:
public void setName(String newName){
if (newName == null){
throw new IllegalArgumentException("Name cannot be null");
}
name = newName;
}
...
//Only call to setName(String)
t.setName("Jim");
You're enforcing your method's preconditions, which are an important part of its contract. There's nothing wrong with doing that, and it also serves as self-documenting code (if I read your method's code, I immediately see what I shouldn't pass to it), though asserts may be preferable for that.
Personally I prefer using Asserts in these wildly improbable cases just to avoid difficult to read code but to make it clear that assumptions are being made in the function's algorithms.
But, of course, this is very much a judgement call that has to be made on a case-by-case basis. You can see it (and I have seen it) get completely out of hand - to the point where a simple function is turned into a tangle of if statements that pretty much never evaluate to true.
You are doing ok !
Whether it's a setter or a function - always validate and throw meaningfull exception. you never know when you'll need it, and you will...
In general I don't favor this practice. It's not that performing validation is bad, but rather, on things like simple setters it tends to create more clutter than its worth in protecting from bugs. I prefer using unit tests to insure there are no bugs.
Well, it's been awhile since this question was posted but I'd like to give a different point of view on this topic.
Using the specific example you posted, IMHO you should doing validation, but in a different way.
The key to archieving validation lies in the question itself. Think about it: you're dealing with names, not strings.
A string is a name when it's not null. We can also think of additional characteristics that make a string a name: is cannot be empty nor contain spaces.
Suppose you need to add those validation rules: if you stick with your approach you'll end up cluttering your setter as #SingleShot said.
Also, what would you do if more than one domain object has a setName setter?
Even if you use helper classes as #dave does, code will still be duplicated: calls to the helper instances.
Now, think for a moment: what if all the arguments you could ever receive in the setName method were valid? No validation would be needed for sure.
I might sound too optimistic, but it can be done.
Remember you're dealing with names, so why don't model the concept of a name?
Here's a vanilla, dummy implementation to show the idea:
public class Name
public static Name From(String value) {
if (string.IsNullOrEmpty(value)) throw new ...
if (value.contains(' ')) throw new ...
return new Name(value);
}
private Name(string value) {
this.value = value;
}
// other Name stuff goes here...
}
Because validation is happening at the moment of creation, you can only get valid Name instances. There's no way to create a Name instance from an "invalid" string.
Not only the validation code has been centralized, but also exceptions are thrown in a context that have meaning to them (the creation of a Name instance).
You can read about great design principles in Hernan Wilkinson's "Design Principles Behind Patagonia" (the name example is taken from it). Be sure to check the ESUG 2010 Video and the presentation slides
Finally, I think you might find Jim Shore's "Fail Fast" article interesting.
It's a tradeoff. It's more code to write, review and maintain, but you'll probably find problems quicker if somehow a null name gets through.
I lean towards having it because eventually you find you do need it.
I used to have utility classes to keep the code to a minimum. So instead of
if (name == null) { throw new ...
you could have
Util.assertNotNull(name)
Then Java added asserts to the language and you could do it more directly. And turn it off if you wanted.
It's well done in my opinion. For null values throw IllegalArgumentException. For other kind of validations you should consider using a customized exceptions hierarchy related to your domain objects.
I'm not aware of any documented standard that says 'validate all user input' but it's a very good idea. In the current version of the program it may not be possible to access this particular property with invalid data but that's not going to prevent it from happening in the future. All sorts of interesting things happen during maintenance. And hey, you never know when someone will reuse the class in another application that doesn't validate data before passing it in.
Is type checking considered bad practice even if you are checking against an interface? I understand that you should always program to an interface and not an implementation - is this what it means?
For example, in PHP, is the following OK?
if($class instanceof AnInterface) {
// Do some code
}
Or is there a better way of altering the behaviour of code based on a class type?
Edit: Just to be clear I am talking about checking whether a class implements an interface not just that it is an instance of a certain class.
As long as you follow the LSP, I don't see a problem. Your code must work with any implementation of the interface. It's not a problem that certain implementations cause you to follow different code paths, as long as you can correctly work with any implementation of the interface.
If your code doesn't work with all implementations of the interface, then you shouldn't use the interface in the first place.
If you can avoid type checking you should; however, one scenario where I found it handy, was we had a web service which took a message but the contents of the message could change. We had to persist the message back into a db, in order to get the right component to break the message down to its proper tables we used type checking in a sense.
What I find more common and flexible then if ($class instanceof SomeOtherType) is to define an IProcessing strategy for example and then using factory based on the type $class create the correct class.
So in c# roughly this:
void Process(Message msg)
{
IProcessor processor=ProcessignFactory.GetProcessor(msg.GetType());
processor.Process(msg);
}
However sometimes doing this can be overkill if your only dealing with one variation that won't change implement it using a type check, and when / if you find you were wrong and it requires more checks then refactor it into a more robust solution.
In my practice any checking for type (as well as type casting) has always indicated that something is wrong with the code or with the language.
So I try to avoid it whenever possible.
Run-time type checking is often necessary in situations where an interface provides all the methods necessary to do something, but does not provide enough to do it well. A prime example of such a situation is determining the number of items in an enumerable sequence. It's possible to make such a determination by enumerating through the sequence, but many enumerable objects "know" how many items they contain. If an object knows how many items it contains, it will likely be more efficient to ask it than to enumerate through the collection and count the items individually.
Arguably, IEnumerable should have provided some methods to ask what it knows about the number of items it contains [recognizing the possibility that the object may know that the number is unbounded, or that it's at most 4,591 (but could be a lot less), etc.], but it doesn't. What might be ideal would be if a new version of IEnumerable interface could be produced that included default implementations for any "new" methods it adds, and if such interface could be considered to be implemented by any implementations of the present version. Unfortunately, because no such feature exists, the only way to get the count of an enumerable collection without enumerating it is to check whether it implements any known collection interfaces that include a Count member.
I'm wondering if it's a good idea to make verifications in getters and setters, or elsewhere in the code.
This might surprise you be when it comes to optimizations and speeding up the code, I think you should not make verifications in getters and setters, but in the code where you're updating your files or database. Am I wrong?
Well, one of the reasons why classes usually contain private members with public getters/setters is exactly because they can verify data.
If you have a Number than can be between 1 and 100, i would definitely put something in the setter that validates that and then maybe throw an exception that is being caught by the code. The reason is simple: If you don't do it in the setter, you have to remember that 1 to 100 limitation every time you set it, which leads to duplicated code or when you forget it, it leads to an invalid state.
As for performance, i'm with Knuth here:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."
#Terrapin, re:
If all you have is a bunch of [simple
public set/get] properties ... they
might as well be fields
Properties have other advantages over fields. They're a more explicit contract, they're serialized, they can be debugged later, they're a nice place for extension through inheritance. The clunkier syntax is an accidental complexity -- .net 3.5 for example overcomes this.
A common (and flawed) practice is to start with public fields, and turn them into properties later, on an 'as needed' basis. This breaks your contract with anyone who consumes your class, so it's best to start with properties.
It depends.
Generally, code should fail fast. If the value can be set by multiple points in the code and you validate only on after retrieving the value, the bug appears to be in the code that does the update. If the setters validate the input, you know what code is trying to set invalid values.
From the perspective of having the most maintainable code, I think you should do as much validation as you can in the setter of a property. This way you won't be caching or otherwise dealing with invalid data.
After all, this is what properties are meant for. If all you have is a bunch of properties like...
public string Name
{
get
{
return _name;
}
set
{
_name = value;
}
}
... they might as well be fields
Validation should be captured separately from getters or setters in a validation method. That way if the validation needs to be reused across multiple components, it is available.
When the setter is called, such a validation service should be utilized to sanitize input into the object. That way you know all information stored in an object is valid at all times.
You don't need any kind of validation for the getter, because information on the object is already trusted to be valid.
Don't save your validation until a database update!! It is better to fail fast.
I like to implement IDataErrorInfo and put my validation logic in its Error and this[columnName] properties. That way if you want to check programmatically whether there's an error you can simply test either of those properties in code, or you can hand the validation off to the data binding in Web Forms, Windows Forms or WPF.
WPF's "ValidatesOnDataError" Binding property makes this particularly easy.
I try to never let my objects enter an invalid state, so setters definitely would have validation as well as any methods that change state. This way, I never have to worry that the object I'm dealing with is invalid. If you keep your methods as validation boundaries, then you never have to worry about validation frameworks and IsValid() method calls sprinkled all over the place.
You might wanna check out Domain Driven Design, by Eric Evans. DDD has this notion of a Specification:
... explicit predicate-like VALUE
OBJECTS for specialized purposes. A
SPECIFICATION is a predicate that
determines if an object does or does
not satisfy some criteria.
I think failing fast is one thing, the other is where to keep the logic for validation. The domain is the right place to keep the logic and I think a Specification Object or a validate method on your Domain objects would be a good place.