Data verifications in Getter/Setter or elsewhere? - optimization

I'm wondering if it's a good idea to make verifications in getters and setters, or elsewhere in the code.
This might surprise you be when it comes to optimizations and speeding up the code, I think you should not make verifications in getters and setters, but in the code where you're updating your files or database. Am I wrong?

Well, one of the reasons why classes usually contain private members with public getters/setters is exactly because they can verify data.
If you have a Number than can be between 1 and 100, i would definitely put something in the setter that validates that and then maybe throw an exception that is being caught by the code. The reason is simple: If you don't do it in the setter, you have to remember that 1 to 100 limitation every time you set it, which leads to duplicated code or when you forget it, it leads to an invalid state.
As for performance, i'm with Knuth here:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil."

#Terrapin, re:
If all you have is a bunch of [simple
public set/get] properties ... they
might as well be fields
Properties have other advantages over fields. They're a more explicit contract, they're serialized, they can be debugged later, they're a nice place for extension through inheritance. The clunkier syntax is an accidental complexity -- .net 3.5 for example overcomes this.
A common (and flawed) practice is to start with public fields, and turn them into properties later, on an 'as needed' basis. This breaks your contract with anyone who consumes your class, so it's best to start with properties.

It depends.
Generally, code should fail fast. If the value can be set by multiple points in the code and you validate only on after retrieving the value, the bug appears to be in the code that does the update. If the setters validate the input, you know what code is trying to set invalid values.

From the perspective of having the most maintainable code, I think you should do as much validation as you can in the setter of a property. This way you won't be caching or otherwise dealing with invalid data.
After all, this is what properties are meant for. If all you have is a bunch of properties like...
public string Name
{
get
{
return _name;
}
set
{
_name = value;
}
}
... they might as well be fields

Validation should be captured separately from getters or setters in a validation method. That way if the validation needs to be reused across multiple components, it is available.
When the setter is called, such a validation service should be utilized to sanitize input into the object. That way you know all information stored in an object is valid at all times.
You don't need any kind of validation for the getter, because information on the object is already trusted to be valid.
Don't save your validation until a database update!! It is better to fail fast.

I like to implement IDataErrorInfo and put my validation logic in its Error and this[columnName] properties. That way if you want to check programmatically whether there's an error you can simply test either of those properties in code, or you can hand the validation off to the data binding in Web Forms, Windows Forms or WPF.
WPF's "ValidatesOnDataError" Binding property makes this particularly easy.

I try to never let my objects enter an invalid state, so setters definitely would have validation as well as any methods that change state. This way, I never have to worry that the object I'm dealing with is invalid. If you keep your methods as validation boundaries, then you never have to worry about validation frameworks and IsValid() method calls sprinkled all over the place.

You might wanna check out Domain Driven Design, by Eric Evans. DDD has this notion of a Specification:
... explicit predicate-like VALUE
OBJECTS for specialized purposes. A
SPECIFICATION is a predicate that
determines if an object does or does
not satisfy some criteria.
I think failing fast is one thing, the other is where to keep the logic for validation. The domain is the right place to keep the logic and I think a Specification Object or a validate method on your Domain objects would be a good place.

Related

In what cases should public fields be used instead of properties? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Public Data members vs Getters, Setters
In what cases should public fields be used, instead of properties or getter and setter methods (where there is no support for properties)? Where exactly is their use recommended, and why, or, if it is not, why are they still allowed as a language feature? After all, they break the Object-Oriented principle of encapsulation where getters and setters are allowed and encouraged.
If you have a constant that needs to be public, you might as well make it a public field instead of creating a getter property for it.
Apart from that, I don't see a need, as far as good OOP principles are concerned.
They are there and allowed because sometimes you need the flexibility.
That's hard to tell, but in my opinion public fields are only valid when using structs.
struct Simple
{
public int Position;
public bool Exists;
public double LastValue;
};
But different people have different thoughts about:
http://kristofverbiest.blogspot.com/2007/02/public-fields-and-properties-are-not.html
http://blogs.msdn.com/b/ericgu/archive/2007/02/01/properties-vs-public-fields-redux.aspx
http://www.markhneedham.com/blog/2009/02/04/c-public-fields-vs-automatic-properties/
If your compiler does not optimize getter and setter invocations, the access to your properties might be more expensive than reading and writing fields (call stack). That might be relevant if you perform many, many invocations.
But, to be honest, I know no language where this is true. At least in both .NET and Java this is optimized well.
From a design point of view I know no case where using fields is recommended...
Cheers
Matthias
Let's first look at the question why we need accessors (getters/setters)? You need them to be able to override the behaviour when assigning a new value/reading a value. You might want to add caching or return a calculated value instead of a property.
Your question can now be formed as do I always want this behaviour? I can think of cases where this is not useful at all: structures (what were structs in C). Passing a parameter object or a class wrapping multiple values to be inserted into a Collection are cases where one actually does not need accessors: The object is merely a container for variables.
There is one single reason(*) why to use get instead of public field: lazy evaluation. I.e. the value you want may be stored in a database, or may be long to compute, and don't want your program to initialize it at startup, but only when needed.
There is one single reason(*) why to use set instead of public field: other fields modifications. I.e. you change the value of other fields when you the value of the target field changes.
Forcing to use get and set on every field is in contradiction with the YAGNI principle.
If you want to expose the value of a field from an object, then expose it! It is completely pointless to create an object with four independent fields and mandating that all of them uses get/set or properties access.
*: Other reasons such as possible data type change are pointless. In fact, wherever you use a = o.get_value() instead of a = o.value, if you change the type returned by get_value() you have to change at every use, just as if you would have changed the type of value.
The main reason is nothing to do with OOP encapsulation (though people often say it is), and everything to do with versioning.
Indeed from the OOP position one could argue that fields are better than "blind" properties, as a lack of encapsulation is clearer than something that pretends to encapsulation and then blows it away. If encapsulation is important, then it should be good to see when it isn't there.
A property called Foo will not be treated the same from the outside as a public field called Foo. In some languages this is explicit (the language doesn't directly support properties, so you've got a getFoo and a setFoo) and in some it is implicit (C# and VB.NET directly support properties, but they are not binary-compatible with fields and code compiled to use a field will break if it's changed to a property, and vice-versa).
If your Foo just does a "blind" set and write of an underlying field, then there is currently no encapsulation advantage to this over exposing the field.
However, if there is a later requirement to take advantage of encapsulation to prevent invalid values (you should always prevent invalid values, but maybe you didn't realise some where invalid when you first wrote the class, or maybe "valid" has changed with a scope change), to wrap memoised evaluation, to trigger other changes in the object, to trigger an on-change event, to prevent expensive needless equivalent sets, and so on, then you can't make that change without breaking running code.
If the class is internal to the component in question, this isn't a concern, and I'd say use fields if fields read sensibly under the general YAGNI principle. However, YAGNI doesn't play quite so well across component boundaries (if I did need my component to work today, I certainly am probably going to need that it works tomorrow after you've changed your component that mine depends on), so it can make sense to pre-emptively use properties.

What Getters and Setters should and shouldn't do [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
Convention question: When do you use a Getter/Setter function rather than using a Property
I've run into a lot of differing opinions on Getters and Setters lately, so I figured I should make it into it's own question.
A previous question of mine received an immediate comment (later deleted) that stated setters shouldn't have any side effects, and a SetProperty method would be a better choice.
Indeed, this seems to be Microsoft's opinion as well. However, their properties often raise events, such as Resized when a form's Width or Height property is set. OwenP also states "you shouldn't let a property throw exceptions, properties shouldn't have side effects, order shouldn't matter, and properties should return relatively quickly."
Yet Michael Stum states that exceptions should be thrown while validating data within a setter. If your setter doesn't throw an exception, how could you effectively validate data, as so many of the answers to this question suggest?
What about when you need to raise an event, like nearly all of Microsoft's Control's do? Aren't you then at the mercy of whomever subscribed to your event? If their handler performs a massive amount of information, or throws an error itself, what happens to your setter?
Finally, what about lazy loading within the getter? This too could violate the previous guidelines.
What is acceptable to place in a getter or setter, and what should be kept in only accessor methods?
Edit:
From another article in the MSDN:
The get and set methods are generally no different from other methods. They can perform any program logic, throw exceptions, be overridden, and be declared with any modifiers allowed by the programming language. Note, however, that properties can also be static. If a property is static, there are limitations on what the get and set methods can do. See your programming language reference for details.
My view:
If a setter or getter is expected to be expensive, don't make it a property, make it a method.
If setting a property triggers events due to changes, this is fine. How else would you allow listeners to be notified of changes? However, you may want to offer a BeginInit/EndInit pair to suppress events until all changes are made. Normally, it is the responsibility of the event handler to return promptly, but if you really can't trust it to do so, then you may wish to signal the event in another thread.
If setting a property throws exceptions on invalid values, it's also fine. This is a reasonable way to signal the problem when the value is completely wrong. In other cases, you set a bunch of properties and then call a method that uses them to do something, such as make a connection. This would allow holding off validation and error-handling until the properties are used, so the properties would not need to throw anything.
Accessing a property may have side-effects so long as they aren't unexpected and don't matter. This means a JIT instantiation in a getter is fine. Likewise, setting a dirty flag for the instance whenever a change is made is just fine, as it setting a related property, such as a different format for the same value.
If it does something as opposed to just accessing a value, it should be a method. Method are verbs, so creating a connection would be done by the OpenConnection() method, not a Connection property. A Connection property would be used to retrieve the connection in use, or to bind the instance to another connection.
edit - added 5, changed 2 and 3
I agree with the idea that getters/settings shouldn't have side effects, but I would say that they shouldn't have non-obvious side effects.
As far as throwing exceptions, if you are setting a property to an invalid value (in a very basic sense), then validation exceptions are fine. However, if the setter is running whole series of complicated business rule validation, or trying to go off and update other objects, or any other thing that may cause an exception, then that is bad. But this problem is not really an issue with the exception itself, but rather that the setter is going off and secretly performing a lot of functionlity that the caller would not (or should not) expect.
The same with events. If a setter is throwing an event saying that "this property changed", then it's OK, because that's an obvious side effect. But if it's firing off some other custom event so cause some hidden chuck of code to execute in another part of a system, it's bad.
This is the same reason that I avoid lazy-loading in getters. Indeed, they can make things easier a lot of the time, but they can make things a more confusing some of the time, because there always ends up being convoluted logic around exactly when you want the child objects loaded. It's usually just one more line of code to explicitly load the child objects when you are populating the parent object, and it can avoid a lot of confusion about the object state. But this aspect can get very subjective, and a lot of it depends on the situation.
I've always found the conservative approach to be best, when working in C# anyway. Because properties are syntactically the same as fields, they should work like fields: no exceptions, no validation, no funny business. (Indeed, most of my properties start out as simple fields, and don't become properties until absolutely necessary.) The idea is that if you see something that looks like it's getting or setting a field set, then it IS something like getting or setting a field, in terms of functionality (there is no exception thrown), overall efficiency (setting variables doesn't trigger a cascade of delegate calls, for example) and effect on program's state (setting a variable sets that variable, and doesn't call lots of delegates that could do just about anything).
Sensible things for a property set to do include setting a flag to indicate that there's been a change:
set {
if(this.value!=value) {
this.changed=true;
this.value=value;
}
}
Perhaps actually set a value on another object, e.g.:
set { this.otherObject.value=value; }
Maybe disentangle the input a bit, to simplify the class's internal code:
set {
this.isValid=(value&Flags.IsValid)!=0;
this.setting=value&Flags.SettingMask;
}
(Of course, in these latter two cases, the get function might well do the opposite.)
If anything more complicated needs to happen, in particular calling delegates, or performing validation, or throwing exceptions, then my view is that a function is better. (Quite often, my fields turn into properties with get and set, and then end up as a get property and a set function.) Similarly for the getters; if you're returning a reference to something, it's no problem, but if you're creating a whole new large object and filling it in each time the property is read -- not so hot.

Is it better for class data to be passed internally or accessed directly?

Example:
// access fields directly
private void doThis()
{
return doSomeWork(this.data);
}
// receive data as an argument
private void doThis(data)
{
return doSomeWork(data);
}
The first option is coupled to the value in this.data while the second option avoids this coupling. I feel like the second option is always better. It promotes loose coupling WITHIN the class. Accessing global class data willy-nilly throughout just seems like a bad idea. Obviously this class data needs to be accessed directly at some point. However, if accesses, to this global class data can be eliminated by parameter passing, it seems that this is always preferable.
The second example has the advantage of working with any data of the proper type, whereas the first is bound to working with the just class data. Even if you don't NEED the additional flexibility, it seems nice to leave it as an option.
I just don't see any advantage in accessing member data directly from private methods as in the first example. Whats the best practice here? I've referenced code complete, but was not able to find anything on this particular issue.
if the data is part of the object's state, private/protected is just fine. option 1 - good.
i noticed some developers like to create private/protected vars just to pass parameters between methods in a class so that they dun have to pass them in the method call. they are not really to store the model/state of an object. ...then, option 1 - NOT good.
Why option 1 not good in this case...
expose only as much as you need (var scoping). so, pass the data in. do not create a private/protected var just to pass data between 2 methods.
private methods that figures out everything internally makes it very easy to understand. keep it this way, unless its unavoidable.
private/protected vars make it harder to refactor as your method is not 'self encompassing', it depends on external vars that might be used elsewhere.
my 2 cents! :-)
In class global data are not a problem IMHO. Classes are used to couple state, behaviour and identity. So such a coupling is not a problem. The argument suggests, that you can call that method with data from other objects, even of other classes and I think that should be more considered than coupling inside class.
They are both instance methods, therefore #1 makes more sense unless you have a situation involving threads (but depending on the language and scenario, even then you can simply lock/mark the data method as syncronized - my Java knowledge is rusty).
The second technique is more reminiscent of procedural programming.

How much responsibility should a method have?

This is most certainly a language agnostic question and one that has bothered me for quite some time now. An example will probably help me explain the dilemma I am facing:
Let us say we have a method which is responsible for reading a file, populating a collection with some objects (which store information from the file), and then returning the collection...something like the following:
public List<SomeObject> loadConfiguration(String filename);
Let us also say that at the time of implementing this method, it would seem infeasible for the application to continue if the collection returned was empty (a size of 0). Now, the question is, should this validation (checking for an empty collection and perhaps the subsequent throwing of an exception) be done within the method? Or, should this methods sole responsibility be to perform the load of the file and ignore the task of validation, allowing validation to be done at some later stage outside of the method?
I guess the general question is: is it better to decouple the validation from the actual task being performed by a method? Will this make things, in general, easier at a later stage to change or build upon - in the case of my example above, it may be the case at a later stage where a different strategy is added to recover from the event of an empty collection being return from the 'loadConfiguration' method..... this would be difficult if the validation (and resulting exception) was being done in the method.
Perhaps I am being overly pedantic in the quest for some dogmatic answer, where instead it simply just relies on the context in which a method is being used. Anyhow, I would be very interested in seeing what others have to say regarding this.
Thanks all!
My recommendation is to stick to the single responsibility principle which says, in a nutshell, that each object should have 1 purpose. In this instance, your method has 3 purposes and then 4 if you count the validation aspect.
Here's my recommendation on how to handle this and how to provide a large amount of flexibility for future updates.
Keep your LoadConfig method
Have it call the a new method for reading the file.
Pass the previous method's return value to another method for loading the file into the collection.
Pass the object collection into some validation method.
Return the collection.
That's taking 1 method initially and breaking it into 4 with one calling 3 others. This should allow you to change pieces w/o having any impact on others.
Hope this helps
I guess the general question is: is it
better to decouple the validation from
the actual task being performed by a
method?
Yes. (At least if you really insist on answering such a general question – it’s always quite easy to find a counter-example.) If you keep both the parts of the solution separate, you can exchange, drop or reuse any of them. That’s a clear plus. Of course you must be careful not to jeopardize your object’s invariants by exposing the non-validating API, but I think you are aware of that. You’ll have to do some little extra typing, but that won’t hurt you.
I will answer your question by a question: do you want various validation methods for the product of your method ?
This is the same as the 'constructor' issue: is it better to raise an exception during the construction or initialize a void object and then call an 'init' method... you are sure to raise a debate here!
In general, I would recommend performing the validation as soon as possible: this is known as the Fail Fast which advocates that finding problems as soon as possible is better than delaying the detection since diagnosis is immediate while later you would have to revert the whole flow....
If you're not convinced, think of it this way: do you really want to write 3 lines every time you load a file ? (load, parse, validate) Well, that violates the DRY principle.
So, go agile there:
write your method with validation: it is responsible for loading a valid configuration (1)
if you ever need some parametrization, add it then (like a 'check' parameter, with a default value which preserves the old behavior of course)
(1) Of course, I don't advocate a single method to do all this at once... it's an organization matter: under the covers this method should call dedicated methods to organize the code :)
To deflect the question to a more basic one, each method should do as little as possible. So in your example, there should be a method that reads in the file, a method that extracts the necessary data from the file, another method to write that data to the collection, and another method that calls these methods. The validation can go in a separate method, or in one of the others, depending on where it makes the most sense.
private byte[] ReadFile(string fileSpec)
{
// code to read in file, and return contents
}
private FileData GetFileData(string fileContents)
{
// code to create FileData struct from file contents
}
private void FileDataCollection: Collection<FileData> { }
public void DoItAll (string fileSpec, FileDataCollection filDtaCol)
{
filDtaCol.Add(GetFileData(ReadFile(fileSpec)));
}
Add validation, verification to each of the methods as appropriate
You are designing an API and should not make any unnecessary assumptions about your client. A method should take only the information that it needs, return only the information requested, and only fail when it is unable to return a meaningful value.
So, with that in mind, if the configuration is loadable but empty, then returning an empty list seems correct to me. If your client has an application specific requirement to fail when provided an empty list, then it may do so, but future clients may not have that requirement. The loadConfiguration method itself should fail when it really fails, such as when it is unable to read or parse the file.
But you can continue to decouple your interface. For example, why must the configuration be stored in a file? Why can't I provide a URL, a row in a database, or a raw string containing the configuration data? Very few methods should take a file path as an argument since it binds them tightly to the local file system and makes them responsible for opening, reading, and closing files in addition to their core logic. Consider accepting an input stream as an alternative. Or if you want to allow for elaborate alternatives -- like data from a database -- consider accepting a ConfigurationReader interface or similar.
Methods should be highly cohesive ... that is single minded. So my opinion would be to separate the responsibilities as you have described. I sometimes feel tempted to say...it is just a short method so it does not matter...then I regret it 1.5 weeks later.
I think this depends on the case: If you could think of a scenario where you would use this method and it returned an empty list, and this would be okay, then I would not put the validation inside the method. But for e.g. a method which inserts data into a database which have to be validated (is the email address correct, has a name been specified, ... ) then it should be ok to put validation code inside the function and throw an exception.
Another alternative, not mentioned above, is to support Dependency Injection and have the method client inject a validator. This would allow the preservation of the "strong" Resource Acquisition Is Initialization principle, that is to say Any Object which Loads Successfully is Ready For Business (Matthieu's mention of Fail Fast is much the same notion).
It also allows a resource implementation class to create its own low-level validators which rely on the structure of the resource without exposing clients to implementation details unnecessarily, which can be useful when dealing with multiple disparate resource providers such as Ryan listed.

OOP Design Question - Validating properties

I have the habit of always validating property setters against bad data, even if there's no where in my program that would reasonably input bad data. My QA person doesn't want me throwing exceptions unless I can explain where they would occur. Should I be validating all properties? Is there a standard on this I could point to?
Example:
public void setName(String newName){
if (newName == null){
throw new IllegalArgumentException("Name cannot be null");
}
name = newName;
}
...
//Only call to setName(String)
t.setName("Jim");
You're enforcing your method's preconditions, which are an important part of its contract. There's nothing wrong with doing that, and it also serves as self-documenting code (if I read your method's code, I immediately see what I shouldn't pass to it), though asserts may be preferable for that.
Personally I prefer using Asserts in these wildly improbable cases just to avoid difficult to read code but to make it clear that assumptions are being made in the function's algorithms.
But, of course, this is very much a judgement call that has to be made on a case-by-case basis. You can see it (and I have seen it) get completely out of hand - to the point where a simple function is turned into a tangle of if statements that pretty much never evaluate to true.
You are doing ok !
Whether it's a setter or a function - always validate and throw meaningfull exception. you never know when you'll need it, and you will...
In general I don't favor this practice. It's not that performing validation is bad, but rather, on things like simple setters it tends to create more clutter than its worth in protecting from bugs. I prefer using unit tests to insure there are no bugs.
Well, it's been awhile since this question was posted but I'd like to give a different point of view on this topic.
Using the specific example you posted, IMHO you should doing validation, but in a different way.
The key to archieving validation lies in the question itself. Think about it: you're dealing with names, not strings.
A string is a name when it's not null. We can also think of additional characteristics that make a string a name: is cannot be empty nor contain spaces.
Suppose you need to add those validation rules: if you stick with your approach you'll end up cluttering your setter as #SingleShot said.
Also, what would you do if more than one domain object has a setName setter?
Even if you use helper classes as #dave does, code will still be duplicated: calls to the helper instances.
Now, think for a moment: what if all the arguments you could ever receive in the setName method were valid? No validation would be needed for sure.
I might sound too optimistic, but it can be done.
Remember you're dealing with names, so why don't model the concept of a name?
Here's a vanilla, dummy implementation to show the idea:
public class Name
public static Name From(String value) {
if (string.IsNullOrEmpty(value)) throw new ...
if (value.contains(' ')) throw new ...
return new Name(value);
}
private Name(string value) {
this.value = value;
}
// other Name stuff goes here...
}
Because validation is happening at the moment of creation, you can only get valid Name instances. There's no way to create a Name instance from an "invalid" string.
Not only the validation code has been centralized, but also exceptions are thrown in a context that have meaning to them (the creation of a Name instance).
You can read about great design principles in Hernan Wilkinson's "Design Principles Behind Patagonia" (the name example is taken from it). Be sure to check the ESUG 2010 Video and the presentation slides
Finally, I think you might find Jim Shore's "Fail Fast" article interesting.
It's a tradeoff. It's more code to write, review and maintain, but you'll probably find problems quicker if somehow a null name gets through.
I lean towards having it because eventually you find you do need it.
I used to have utility classes to keep the code to a minimum. So instead of
if (name == null) { throw new ...
you could have
Util.assertNotNull(name)
Then Java added asserts to the language and you could do it more directly. And turn it off if you wanted.
It's well done in my opinion. For null values throw IllegalArgumentException. For other kind of validations you should consider using a customized exceptions hierarchy related to your domain objects.
I'm not aware of any documented standard that says 'validate all user input' but it's a very good idea. In the current version of the program it may not be possible to access this particular property with invalid data but that's not going to prevent it from happening in the future. All sorts of interesting things happen during maintenance. And hey, you never know when someone will reuse the class in another application that doesn't validate data before passing it in.