Is type checking ever OK? - oop

Is type checking considered bad practice even if you are checking against an interface? I understand that you should always program to an interface and not an implementation - is this what it means?
For example, in PHP, is the following OK?
if ($class instanceof AnInterface) {
    // Do some code
}
Or is there a better way of altering the behaviour of code based on a class type?
Edit: Just to be clear I am talking about checking whether a class implements an interface not just that it is an instance of a certain class.

As long as you follow the LSP, I don't see a problem. Your code must work with any implementation of the interface. It's not a problem that certain implementations cause you to follow different code paths, as long as you can correctly work with any implementation of the interface.
If your code doesn't work with all implementations of the interface, then you shouldn't use the interface in the first place.

If you can avoid type checking, you should. However, one scenario where I found it handy was a web service that took a message whose contents could vary. We had to persist the message to a database, and to get the right component to break the message down into its proper tables we used type checking, in a sense.
What I find more common and more flexible than if ($class instanceof SomeOtherType) is to define a processing strategy (an IProcessor interface, for example) and then use a factory that creates the correct implementation based on the type of $class.
So in C#, roughly this:
void Process(Message msg)
{
    // The factory picks the processor that matches the message's run-time type
    IProcessor processor = ProcessingFactory.GetProcessor(msg.GetType());
    processor.Process(msg);
}
However, sometimes this can be overkill. If you're only dealing with one variation that won't change, implement it using a type check; when, or if, you find you were wrong and it needs more checks, refactor it into a more robust solution.

In my practice any checking for type (as well as type casting) has always indicated that something is wrong with the code or with the language.
So I try to avoid it whenever possible.

Run-time type checking is often necessary in situations where an interface provides all the methods necessary to do something, but does not provide enough to do it well. A prime example of such a situation is determining the number of items in an enumerable sequence. It's possible to make such a determination by enumerating through the sequence, but many enumerable objects "know" how many items they contain. If an object knows how many items it contains, it will likely be more efficient to ask it than to enumerate through the collection and count the items individually.
Arguably, IEnumerable should have provided some methods to ask what it knows about the number of items it contains [recognizing the possibility that the object may know that the number is unbounded, or that it's at most 4,591 (but could be a lot less), etc.], but it doesn't. What might be ideal would be if a new version of IEnumerable interface could be produced that included default implementations for any "new" methods it adds, and if such interface could be considered to be implemented by any implementations of the present version. Unfortunately, because no such feature exists, the only way to get the count of an enumerable collection without enumerating it is to check whether it implements any known collection interfaces that include a Count member.
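The same idea can be sketched outside .NET. The Python below is only an illustration, not the answerer's code: count_items is a made-up helper that checks at run time whether an iterable also implements the standard Sized interface, and only enumerates when it does not.

```python
from collections.abc import Iterable, Sized


def count_items(items: Iterable) -> int:
    """Return the number of items, asking the object when it already knows."""
    if isinstance(items, Sized):
        # Run-time type check: the object exposes its own length.
        return len(items)
    # Fallback: walk the whole sequence and count one by one.
    return sum(1 for _ in items)


print(count_items([10, 20, 30]))         # a list knows its length
print(count_items(x for x in range(5)))  # a generator must be enumerated
```

Both paths give a correct answer for any iterable; the type check only selects the cheaper one, which fits the LSP argument made earlier in the thread.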

Related

Is it possible to determine if two objects both implement a common interface which is not specified at compile time?

Given Object1 and Object2, are there any techniques for determining if they both implement a common interface? No problem if the interface is known at compile time (use typeof ... is [known interface]), but what about if interface isn't specified at compile time?
Specific use case is implementing a strongly typed collection object. I only want to add Object2 if it shares a common interface as Object1. Typename doesn't work since it returns the underlying object type and I may have two distinct objects each implementing ISomeInterface but on different underlying classes.
An example that doesn't quite work can be found here (as it relies on typename but that doesn't allow for interface comparisons)
Specifically, expanding the IsTypeSafe function found here on CodeReview but adapted so that if an object supports an interface common to all previously added items, it can be added to the list.
Specific question: is there a way to determine if two objects both implement a common interface that is unspecified at compile time?
I got really confused with your "unspecified at compile time" wording, but the crux of your question is here:
if an object supports an interface common to all previously added items, it can be added to the list.
In other words, you're asking if there's a way to do this in VBA (pseudo-mish-mash of VBA/C#):
isOk = item.Type.Interfaces.Any(i => other.Type.Interfaces.Contains(i))
In order to be able to inspect an object variable's implemented interfaces, you'd need to be able to inspect its type at run-time. This ability is called "reflection"... and VBA can't do that.
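For comparison, here is a rough sketch of that check in a language that does support reflection. It is Python rather than VBA, every class name is hypothetical, and "interface" is taken to mean a class that derives directly from ABC, which is an assumption made only for this example.

```python
from abc import ABC


class Shape(ABC):        # hypothetical interface
    pass


class Printable(ABC):    # hypothetical interface
    pass


class Circle(Shape, Printable):
    pass


class Square(Shape):
    pass


class Report(Printable):
    pass


def interfaces_of(obj: object) -> set:
    # type(obj).__mro__ lists the object's class and all of its base classes.
    # Here an "interface" is any of those classes that derives directly from ABC.
    return {cls for cls in type(obj).__mro__ if ABC in cls.__bases__}


def common_interfaces(a: object, b: object) -> set:
    """Interfaces implemented by both objects, discovered at run time."""
    return interfaces_of(a) & interfaces_of(b)


print(common_interfaces(Circle(), Square()))  # the set containing Shape
print(common_interfaces(Square(), Report()))  # an empty set
```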
Rubberduck (disclaimer: I manage this OSS VBIDE add-in project) has a COM API that might eventually grow to support exactly that though (it's open-source, implement it - we are very happy to take pull requests!), but in order to work its magic it needs to literally parse and resolve the entire project and all its references, which means using reflection for what you'd like to use it for, would be a massive performance hit.
A "type-safe" List class in VBA is basically smoke and mirrors. Sorry!

Flaw: Constructor does Real Work

I have a class which represents a set of numbers. The constructor takes three arguments: startValue, endValue and stepSize.
The class is responsible for holding a list containing all values between start and end value taking the stepSize into consideration.
Example: startValue: 3, endValue: 1, stepSize = -1, Collection = { 3,2,1 }
I am currently creating the collection and some info strings about the object in the constructor. The public members are read only info strings and the collection.
My constructor does three things at the moment:
Checks the arguments; this could throw an exception from the constructor
Fills values into the collection
Generates the information strings
I can see that my constructor does real work, but how can I fix this, or should I fix it at all? If I move the "methods" out of the constructor, it is like having an init function, which leaves me with a not fully initialized object. Is the existence of my object doubtful? Or is it not that bad to have some work done in the constructor, since it is still possible to test the constructor because no object references are created?
For me it looks wrong but it seems that I just can't find a solution. I also have taken a builder into account but I am not sure if that's right because you can't choose between different types of creations. However single unit tests would have less responsibility.
I am writing my code in C# but I would prefer a general solution, that's why the text contains no code.
EDIT: Thanks for editing my poor text (: I changed the title back because it represents my opinion and the edited title did not. I am not asking if real work is a flaw or not. For me, it is. Take a look at this reference.
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
The blog states the problems quite well. Still I can't find a solution.
Concepts that urge you to keep your constructors light weight:
Inversion of control (Dependency Injection)
Single responsibility principle (as applied to the constructor rather than a class)
Lazy initialization
Testing
K.I.S.S.
D.R.Y.
Links to arguments of why:
How much work should be done in a constructor?
What (not) to do in a constructor
Should a C++ constructor do real work?
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
If you check the arguments in the constructor, that validation code can't be shared if those arguments come in from any other source (setter, constructor, parameter object).
If you fill values into the collection or generate the information strings in the constructor, that code can't be shared with other constructors you may need to add later.
Besides not being shareable, that work also can't be delayed until it is really needed (lazy init). Overriding through inheritance also offers more options when there are many methods that each do one thing rather than one do-everything constructor.
Your constructor only needs to put your class into a usable state. It does NOT have to be fully initialized. But it is perfectly free to use other methods to do the real work. That just doesn't take advantage of the "lazy init" idea. Sometimes you need it, sometimes you don't.
Just keep in mind that anything the constructor does or calls is being shoved down the users' and testers' throats.
EDIT:
You still haven't accepted an answer and I've had some sleep so I'll take a stab at a design. A good design is flexible so I'm going to assume it's OK that I'm not sure what the information strings are, or whether our object is required to represent a set of numbers by being a collection (and so provides iterators, size(), add(), remove(), etc) or is merely backed by a collection and provides some narrow specialized access to those numbers (such as being immutable).
This little guy is the Parameter Object pattern
/** Throws exception if the sign of (endValue - startValue) differs from the sign of stepSize */
ListDefinition(T startValue, T endValue, T stepSize);
T can be int or long or short or char. Have fun but be consistent.
/** An interface, independent from any one collection implementation */
ListFactory(ListDefinition ld){
/** Make as many as you like */
List<T> build();
}
If we don't need to narrow access to the collection, we're done. If we do, wrap it in a facade before exposing it.
/** Provides read access only. Immutable if List l kept private. */
ImmutableFacade(List l);
Oh wait, requirements change, forgot about 'information strings'. :)
/** Build list of info strings */
InformationStrings(String infoFilePath) {
List<String> read();
}
Have no idea if this is what you had in mind but if you want the power to count line numbers by twos you now have it. :)
/** Assuming information strings have a 1 to 1 relationship with our numbers */
MapFactory(List l, List infoStrings){
/** Make as many as you like */
Map<T, String> build();
}
So, yes I'd use the builder pattern to wire all that together. Or you could try to use one object to do all that. Up to you. But I think you'll find few of these constructors doing much of anything.
EDIT2
I know this answer's already been accepted but I've realized there's room for improvement and I can't resist. The ListDefinition above works by exposing its contents with getters, ick. There is a "Tell, don't ask" design principle that is being violated here for no good reason.
ListDefinition(T startValue, T endValue, T stepSize) {
List<T> buildList(List<T> l);
}
This lets us build any kind of list implementation and have it initialized according to the definition. Now we don't need ListFactory. buildList is something I call a shunt. It returns the same reference it accepted after having done something with it. It simply allows you to skip giving the new ArrayList a name. Making a list now looks like this:
ListDefinition<int> ld = new ListDefinition<int>(3, 1, -1);
List<int> l = new ImmutableFacade<int>( ld.buildList( new ArrayList<int>() ) );
Which works fine. Bit hard to read. So why not add a static factory method:
List<int> l = ImmutableRangeOfNumbers.over(3, 1, -1);
This doesn't accept dependency injections but it's built on classes that do. It's effectively a dependency injection container. This makes it a nice shorthand for popular combinations and configurations of the underlying classes. You don't have to make one for every combination. The point of doing this with many classes is now you can put together whatever combination you need.
Well, that's my 2 cents. I'm gonna find something else to obsess on. Feedback welcome.
As far as cohesion is concerned, there's no "real work", only work that's in line (or not) with the class/method's responsibility.
A constructor's responsibility is to create an instance of a class. And a valid instance for that matter. I'm a big fan of keeping the validation part as intrinsic as possible, so that you can see the invariants every time you look at the class. In other words, that the class "contains its own definition".
However, there are cases when an object is a complex assemblage of multiple other objects, with conditional logic, non-trivial validation or other creation sub-tasks involved. This is when I'd delegate the object creation to another class (Factory or Builder pattern) and restrain the accessibility scope of the constructor, but I think twice before doing it.
In your case, I see no conditionals (except argument checking), no composition or inspection of complex objects. The work done by your constructor is cohesive with the class because it essentially only populates its internals. While you may (and should) of course extract atomic, well identified construction steps into private methods inside the same class, I don't see the need for a separate builder class.
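As a rough illustration of that advice, here is a minimal Python sketch (the question is about C#, and NumberRange, _validate and _build_values are names invented for this example). The constructor still does all the work, but each step lives in its own small private method of the same class.

```python
class NumberRange:
    """Holds every value from start to end (inclusive), stepping by step."""

    def __init__(self, start: int, end: int, step: int):
        # The constructor only orchestrates; each step is a named private helper.
        self._validate(start, end, step)
        self.values = self._build_values(start, end, step)

    @staticmethod
    def _validate(start: int, end: int, step: int) -> None:
        if step == 0:
            raise ValueError("step must not be zero")
        if (end - start) * step < 0:
            raise ValueError("step points away from end")

    @staticmethod
    def _build_values(start: int, end: int, step: int) -> tuple:
        # range() excludes its stop value, so move the stop one unit past end.
        stop = end + (1 if step > 0 else -1)
        return tuple(range(start, stop, step))


print(NumberRange(3, 1, -1).values)  # the example from the question
```

A failed validation raises before any state is stored, so no half-built object escapes the constructor.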
The constructor is a special member function in that it constructs the object, but after all, it is a member function. As such, it is allowed to do things.
Consider for example c++ std::fstream. It opens a file in the constructor. Can throw an exception, but doesn't have to.
As long as you can test the class, it is all good.
It's true, a constructor should do the minimum of work, oriented to a single aim: successful creation of a valid object. Whatever it takes is OK. But not more.
In your example, creating this collection in the constructor is perfectly valid, as an object of your class represents a set of numbers (your words). If an object is a set of numbers, you should clearly create it in the constructor! Otherwise, the constructor does not do what it is made for: constructing a fresh, valid object.
These info strings call my attention. What is their purpose? What exactly do you do? This sounds like something peripheral, something that can be left for later and exposed through a method, like
String getInfo()
or similar.
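A small sketch of what "left for later" could look like, in Python rather than C#. NumberSet and the wording of the info string are invented for illustration; functools.cached_property builds the string on first access and keeps it afterwards.

```python
from functools import cached_property


class NumberSet:
    def __init__(self, values):
        # The constructor only stores the numbers.
        self.values = tuple(values)

    @cached_property
    def info(self) -> str:
        # Built on first access only, then cached on the instance.
        return f"{len(self.values)} values: {list(self.values)}"


numbers = NumberSet([3, 2, 1])
print(numbers.info)
```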
If you want to use Microsoft's .NET Framework as an example here, it is perfectly valid both semantically and in terms of common practice, for a constructor to do some real work.
An example of where Microsoft does this is in their implementation of System.IO.FileStream. This class performs string processing on path names, opens new file handles, opens threads, binds all sorts of things, and invokes many system functions. The constructor is actually, in effect, about 1,200 lines of code.
I believe your example, where you are creating a list, is absolutely fine and valid. I would just make sure that you fail as early as possible. Say you pass a minimum size higher than the maximum size: with a poorly written loop condition you could get stuck in an infinite loop, exhausting all available memory.
The takeaway is "it depends" and you should use your best judgement. If all you wanted was a second opinion, then I say you're fine.
It's not a good practice to do "real work" in the constructor: you can initialize class members, but you shouldn't call other methods or do more "heavy lifting" in the constructor.
If you need to do some initialization which requires a big amount of code running, a good practice will be to do it in an init() method which will be called after the object was constructed.
The reasoning for not doing heavy lifting inside the constructor is: in case something bad happens, and fails silently, you'll end up having a messed up object and it'll be a nightmare to debug and realize where the issues are coming from.
In the case you describe above I would only do the assignments in the constructor and then, in two separate methods, I would implement the validations and generate the string-information.
Implementing it this way also conforms with SRP: "Single Responsibility Principle" which suggests that any method/function should do one thing, and one thing only.

Downsides about using interface as parameter and return type in OOP

This is a question independent from languages.
Conceptually, it's good to code for interfaces(contracts) instead of specific implementations. I've got no problem understanding merits about the practice.
However, when I really code in that practice, the users of my classes, from time to time need to cast the interfaces for specific needs of specific functions provided by specific classes that implement that interface.
I understand there must be something wrong, either on my side or on the user's side, as the interface should expose all methods/properties(in the case of c#) that can possibly be necessary.
The code base is huge, and the users are clients.
It won't be particularly easy to make changes on either side.
That makes me wonder some downsides about using interface as parameter and return type.
Can people please list demerits of the practice? And please, include any solution if you know how to work around it.
Thanks a lot for enlightening me.
EDIT:
To be a bit more specific:
Assume we have a class called DbInfoExtractor. It has a public method GetInfo, as follows:
public IInformation GetInfo(IInfoParam);
where IInformation is an interface implemented by specific classes like VideoInfo, AudioInfo, TextInfo, etc., and IInfoParam is an interface implemented by specific classes like VideoInfoParam, AudioInfoParam, TextInfoParam, etc.
Apparently, depending on the specific object passed into the method GetInfo, the DbInfoExtractor needs to take different actions, as it is reasonable to assume that for different types of information, the extractor considers different sets of aspects(e.g. {size, title, date} for video, {title, author} for text information, etc) as search keys and search for relevant information in different ways.
Here, I see two options to go on:
1. Use if ... else ... to decide which action to take depending on the type of the parameter the GetInfo method receives. This is certainly bad, as avoiding this situation is one of the very reasons we use polymorphism.
2. Call IInfoParam.TakeAction(), where each specific implementation of IInfoParam has its own TakeAction() method that actually searches for and finds the corresponding information in the database.
This option seems better, but still quite bad, as it shouldn't be the parameter that takes action searching and finding the information; it should be the responsibility of DbInfoExtractor.
So how can I delegate the TakeAction back to DbInfoExtractor? (I actually wrote some code to do this, but it's neither standard nor elegant. Basically I make parameter classes nested classes in DbInfoExtractor, so that they can call various versions of TakeAction of DbInfoExtractor.)
Please enlighten me!
Thanks.
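One common way to hand the work back to the extractor without nesting classes is double dispatch: the parameter object only chooses which extractor method to call. The sketch below is not from this thread; it uses Python instead of C#, and the method names dispatch, get_video_info and get_text_info, as well as the returned strings, are invented stand-ins for real database lookups.

```python
class VideoInfoParam:
    def __init__(self, title: str):
        self.title = title

    def dispatch(self, extractor: "DbInfoExtractor") -> str:
        # The parameter only picks the method; the extractor does the searching.
        return extractor.get_video_info(self)


class TextInfoParam:
    def __init__(self, author: str):
        self.author = author

    def dispatch(self, extractor: "DbInfoExtractor") -> str:
        return extractor.get_text_info(self)


class DbInfoExtractor:
    def get_info(self, param) -> str:
        # No if/else on types here: the parameter calls back the right method.
        return param.dispatch(self)

    def get_video_info(self, param: VideoInfoParam) -> str:
        return f"video lookup by title={param.title}"

    def get_text_info(self, param: TextInfoParam) -> str:
        return f"text lookup by author={param.author}"


extractor = DbInfoExtractor()
print(extractor.get_info(VideoInfoParam("Intro")))
print(extractor.get_info(TextInfoParam("Knuth")))
```

The search logic stays in DbInfoExtractor, at the cost of one extra method per parameter type.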
Why not
public IVideoInformation GetVideoInformation(VideoQuery);
public IAudioInformation GetAudioInformation(AudioQuery);
// etc.
It doesn't look like there's a need for polymorphism here.
The query types are Query Objects, if you need those. They probably don't need to be interfaces; they know nothing about the database. A simple list of parameters (maybe just ID) might be sufficient.
The question is what does the client have, and what do they want? That's your interface.
Switch statements and casting are a smell, and typically mean that you've violated the Liskov substitution principle.

Should encapsulated objects be public or private?

I'm a little unclear as to how far to take the idea of making all members within a class private and providing public methods to handle mutations. Primitive types are not the issue; it's encapsulated objects that I am unclear about. The benefit of making object members private is the ability to hide methods that do not apply to the context of the class being built. The downside is that you have to provide public methods to pass parameters to the underlying object (more methods, more work). On the other hand, if you want to have all methods and properties exposed for the underlying object, couldn't you just make the object public? What are the dangers in having objects exposed this way?
For example, I would find it useful to have everything from a vector, or ArrayList, exposed. The only downside I can think of is that public members could potentially be assigned a value of the wrong type via implicit casting (or something to that effect). Would a volatile designation reduce the potential for problems?
Just a side note: I understand that true encapsulation implies that members are private.
What are the dangers in having objects exposed this way?
Changing the type of those objects would require changing the interface to the class. With private objects + public getters/setters, you'd only have to modify the code in the getters and setters, assuming you want to keep the things being returned the same.
Note that this is why properties are useful in languages such as Python, which technically doesn't have private class members, only obscured ones at most.
The problem with making instance variables public is that you can never change your mind later, and make them private, without breaking existing code that relies on directly public access to those instance vars. Some examples:
You decide to later make your class thread-safe by synchronizing all access to instance vars, or maybe by using a ThreadLocal to create a new copy of the value for each thread. Can't do it if any thread can directly access the variables.
Using your example of a vector or array list - at some point, you realize that there is a security flaw in your code because those classes are mutable, so somebody else can replace the contents of the list. If this were only available via an accessor method, you could easily solve the problem by making an immutable copy of the list upon request, but you can't do that with a public variable.
You realize later that one of your instance vars is redundant and can be derived based on other variables. Once again, easy if you're using accessors, impossible with public variables.
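The mutable-list example above can be sketched as follows. This is a hedged Python illustration with an invented Roster class: the internal list stays private by convention and the accessor hands out an immutable snapshot.

```python
class Roster:
    def __init__(self, names):
        self._names = list(names)  # private by convention

    @property
    def names(self) -> tuple:
        # Hand out an immutable snapshot; callers cannot alter the internal list.
        return tuple(self._names)

    def add(self, name: str) -> None:
        self._names.append(name)


roster = Roster(["Ann"])
snapshot = roster.names
roster.add("Bob")
print(snapshot)      # unchanged by the later add
print(roster.names)  # reflects the current contents
```

Because callers only ever saw the accessor, the snapshot could later become a read-only view or a derived value without breaking them.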
I think that it boils down to a practical point - if you know that you're the only one who will be using this code, and it pains you to write accessors (every IDE will do it for you automatically), and you don't mind changing your own code later if you decide to break the API, then go for it. But if other people will be using your class, or if you would like to make it easier to refactor later for your own use, stick with accessors.
Object oriented design is just a guideline. Think about it from the perspective of the person who will be using your class. Balance OOD with making it intuitive and easy to use.
You could run into issues depending on the language you are using and how it treats return statements or assignment operators. In some cases it may give you a reference, or values in other cases.
For example, say you have a PrimeCalculator class that figures out prime numbers, then you have another class that does something with those prime numbers.
public PrimeCalculator calculatorObject = new PrimeCalculator();
Vector<int> primeNumbers = calculatorObject.PrimeNumbersVector;
/* do something complicated here */
primeNumbers.clear(); // free up some memory
When you use this stuff later, possibly in another class, you don't want the overhead of calculating the numbers again so you use the same calculatorObject.
Vector<int> primes = calculatorObject.PrimeNumbersVector;
int tenthPrime = primes.elementAt(9);
It may not exactly be clear at this point whether primes and primeNumbers reference the same Vector. If they do, trying to get the tenth prime from primes would throw an error.
You can do it this way if you're careful and understand what exactly is happening in your situation, but you have a smaller margin of error using functions to return a value rather than assigning the variable directly.
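In a language where assignment copies a reference, the problem is easy to reproduce. The Python sketch below mirrors the PrimeCalculator example with invented member names and contrasts the public field with an accessor that returns a copy.

```python
class PrimeCalculator:
    def __init__(self):
        # Public field holding the first ten primes.
        self.prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

    def primes(self) -> list:
        # Accessor: callers get their own copy.
        return list(self.prime_numbers)


# Direct field access: both names refer to the same list object.
calc = PrimeCalculator()
alias = calc.prime_numbers
alias.clear()                          # "free up some memory"
shared_len = len(calc.prime_numbers)   # the calculator's data is gone too

# Accessor: clearing the copy leaves the calculator intact.
calc2 = PrimeCalculator()
copy = calc2.primes()
copy.clear()
kept_len = len(calc2.prime_numbers)

print(shared_len, kept_len)
```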
Well you can check the post :
first this
then this
This should solve your confusion . It solved mine ! Thanks to Nicol Bolas.
Also read the comments below the accepted answer (also notice the link given in the second last comment by me ( in the first post) )
Also visit the wikipedia post

How much responsibility should a method have?

This is most certainly a language agnostic question and one that has bothered me for quite some time now. An example will probably help me explain the dilemma I am facing:
Let us say we have a method which is responsible for reading a file, populating a collection with some objects (which store information from the file), and then returning the collection...something like the following:
public List<SomeObject> loadConfiguration(String filename);
Let us also say that at the time of implementing this method, it would seem infeasible for the application to continue if the collection returned was empty (a size of 0). Now, the question is, should this validation (checking for an empty collection and perhaps the subsequent throwing of an exception) be done within the method? Or, should this method's sole responsibility be to perform the load of the file and ignore the task of validation, allowing validation to be done at some later stage outside of the method?
I guess the general question is: is it better to decouple the validation from the actual task being performed by a method? Will this make things, in general, easier to change or build upon at a later stage? In the case of my example above, a different strategy may be added later to recover from an empty collection being returned from the 'loadConfiguration' method. This would be difficult if the validation (and resulting exception) was being done in the method.
Perhaps I am being overly pedantic in the quest for some dogmatic answer, where instead it simply just relies on the context in which a method is being used. Anyhow, I would be very interested in seeing what others have to say regarding this.
Thanks all!
My recommendation is to stick to the single responsibility principle which says, in a nutshell, that each object should have 1 purpose. In this instance, your method has 3 purposes and then 4 if you count the validation aspect.
Here's my recommendation on how to handle this and how to provide a large amount of flexibility for future updates.
Keep your LoadConfig method
Have it call a new method for reading the file.
Pass the previous method's return value to another method for loading the file into the collection.
Pass the object collection into some validation method.
Return the collection.
That's taking 1 method initially and breaking it into 4 with one calling 3 others. This should allow you to change pieces w/o having any impact on others.
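The steps above can be sketched roughly as follows. This is Python for illustration only; the "key=value" file format and all function names are assumptions, since the question does not say what the file contains.

```python
def read_file(filename: str) -> str:
    # Step 1: only file access happens here.
    with open(filename, encoding="utf-8") as handle:
        return handle.read()


def parse_entries(text: str) -> list:
    # Step 2: turn raw text into objects.
    # Hypothetical format: one "key=value" pair per non-blank line.
    entries = []
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition("=")
            entries.append((key.strip(), value.strip()))
    return entries


def validate(entries: list) -> None:
    # Step 3: the rule from the question, an empty configuration is an error.
    if not entries:
        raise ValueError("configuration is empty")


def load_configuration(filename: str) -> list:
    # The original entry point now only wires the three steps together.
    entries = parse_entries(read_file(filename))
    validate(entries)
    return entries
```

Each piece can now be tested or replaced on its own; for example, parse_entries can be exercised with a plain string and no file at all.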
Hope this helps
I guess the general question is: is it better to decouple the validation from the actual task being performed by a method?
Yes. (At least if you really insist on answering such a general question – it’s always quite easy to find a counter-example.) If you keep both the parts of the solution separate, you can exchange, drop or reuse any of them. That’s a clear plus. Of course you must be careful not to jeopardize your object’s invariants by exposing the non-validating API, but I think you are aware of that. You’ll have to do some little extra typing, but that won’t hurt you.
I will answer your question with a question: do you want various validation methods for the product of your method?
This is the same as the 'constructor' issue: is it better to raise an exception during the construction or initialize a void object and then call an 'init' method... you are sure to raise a debate here!
In general, I would recommend performing the validation as soon as possible. This is known as Fail Fast: finding problems as soon as possible is better than delaying detection, since the diagnosis is immediate, while later you would have to trace back through the whole flow.
If you're not convinced, think of it this way: do you really want to write 3 lines every time you load a file (load, parse, validate)? Well, that violates the DRY principle.
So, go agile there:
write your method with validation: it is responsible for loading a valid configuration (1)
if you ever need some parametrization, add it then (like a 'check' parameter, with a default value which preserves the old behavior of course)
(1) Of course, I don't advocate a single method to do all this at once... it's an organization matter: under the covers this method should call dedicated methods to organize the code :)
To deflect the question to a more basic one, each method should do as little as possible. So in your example, there should be a method that reads in the file, a method that extracts the necessary data from the file, another method to write that data to the collection, and another method that calls these methods. The validation can go in a separate method, or in one of the others, depending on where it makes the most sense.
private string ReadFile(string fileSpec)
{
    // code to read in file, and return contents
}
private FileData GetFileData(string fileContents)
{
    // code to create FileData struct from file contents
}
// Collection<T> lives in System.Collections.ObjectModel
public class FileDataCollection : Collection<FileData> { }
public void DoItAll(string fileSpec, FileDataCollection filDtaCol)
{
    filDtaCol.Add(GetFileData(ReadFile(fileSpec)));
}
Add validation, verification to each of the methods as appropriate
You are designing an API and should not make any unnecessary assumptions about your client. A method should take only the information that it needs, return only the information requested, and only fail when it is unable to return a meaningful value.
So, with that in mind, if the configuration is loadable but empty, then returning an empty list seems correct to me. If your client has an application specific requirement to fail when provided an empty list, then it may do so, but future clients may not have that requirement. The loadConfiguration method itself should fail when it really fails, such as when it is unable to read or parse the file.
But you can continue to decouple your interface. For example, why must the configuration be stored in a file? Why can't I provide a URL, a row in a database, or a raw string containing the configuration data? Very few methods should take a file path as an argument since it binds them tightly to the local file system and makes them responsible for opening, reading, and closing files in addition to their core logic. Consider accepting an input stream as an alternative. Or if you want to allow for elaborate alternatives -- like data from a database -- consider accepting a ConfigurationReader interface or similar.
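A minimal sketch of the input-stream idea, in Python and with an assumed format of one entry per non-blank line. The function no longer knows whether the text came from a file, a URL, a database row, or a string.

```python
import io
from typing import TextIO


def load_configuration(source: TextIO) -> list:
    """Read non-blank lines from any text stream; the caller decides the source."""
    return [line.strip() for line in source if line.strip()]


# A raw string works just as well as an open file.
from_string = load_configuration(io.StringIO("host=localhost\n\nport=8080\n"))
print(from_string)
```

Opening and closing the file becomes the caller's job, for example with open(path) as handle: load_configuration(handle).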
Methods should be highly cohesive ... that is single minded. So my opinion would be to separate the responsibilities as you have described. I sometimes feel tempted to say...it is just a short method so it does not matter...then I regret it 1.5 weeks later.
I think this depends on the case: If you could think of a scenario where you would use this method and it returned an empty list, and this would be okay, then I would not put the validation inside the method. But for e.g. a method which inserts data into a database which have to be validated (is the email address correct, has a name been specified, ... ) then it should be ok to put validation code inside the function and throw an exception.
Another alternative, not mentioned above, is to support Dependency Injection and have the method client inject a validator. This would allow the preservation of the "strong" Resource Acquisition Is Initialization principle, that is to say Any Object which Loads Successfully is Ready For Business (Matthieu's mention of Fail Fast is much the same notion).
It also allows a resource implementation class to create its own low-level validators which rely on the structure of the resource without exposing clients to implementation details unnecessarily, which can be useful when dealing with multiple disparate resource providers such as Ryan listed.
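Injecting the validator might look roughly like this. It is a Python sketch under assumed names (reject_empty, accept_anything), not code from any answer here: the loader always runs a validator, and the client decides which one.

```python
from typing import Callable, Iterable, List


def accept_anything(entries: List[str]) -> None:
    # Default validator: every configuration is acceptable.
    pass


def reject_empty(entries: List[str]) -> None:
    # Stricter validator for clients that cannot continue without entries.
    if not entries:
        raise ValueError("configuration is empty")


def load_configuration(
    lines: Iterable[str],
    validator: Callable[[List[str]], None] = accept_anything,
) -> List[str]:
    entries = [line.strip() for line in lines if line.strip()]
    # Validation happens before anything is returned, so a successful
    # load always yields a configuration the client considers usable.
    validator(entries)
    return entries


print(load_configuration(["a=1", ""], validator=reject_empty))
print(load_configuration([]))  # the default validator allows an empty result
```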