Should encapsulated objects be public or private? - oop

I'm a little unclear as to how far to take the idea of making all members within a class private and providing public methods to handle mutations. Primitive types are not the issue; it's encapsulated objects that I am unclear about. The benefit of making object members private is the ability to hide methods that do not apply to the context of the class being built. The downside is that you have to provide public methods to pass parameters to the underlying object (more methods, more work). On the other hand, if you want to have all methods and properties exposed for the underlying object, couldn't you just make the object public? What are the dangers in having objects exposed this way?
For example, I would find it useful to have everything from a Vector, or ArrayList, exposed. The only downside I can think of is that public members could potentially be assigned a value of the wrong type via implicit casting (or something to that effect). Would a volatile designation reduce the potential for problems?
Just a side note: I understand that true encapsulation implies that members are private.

What are the dangers in having objects exposed this way?
Changing the type of those objects would require changing the interface to the class. With private objects + public getters/setters, you'd only have to modify the code in the getters and setters, assuming you want to keep the things being returned the same.
Note that this is why properties are useful in languages such as Python, which technically doesn't have private class members, only obscured ones at most.

The problem with making instance variables public is that you can never change your mind later and make them private without breaking existing code that relies on direct public access to those instance vars. Some examples:
You decide to later make your class thread-safe by synchronizing all access to instance vars, or maybe by using a ThreadLocal to create a new copy of the value for each thread. Can't do it if any thread can directly access the variables.
Using your example of a vector or array list - at some point, you realize that there is a security flaw in your code because those classes are mutable, so somebody else can replace the contents of the list. If this were only available via an accessor method, you could easily solve the problem by making an immutable copy of the list upon request, but you can't do that with a public variable.
You realize later that one of your instance vars is redundant and can be derived based on other variables. Once again, easy if you're using accessors, impossible with public variables.
I think that it boils down to a practical point - if you know that you're the only one who will be using this code, and it pains you to write accessors (every IDE will do it for you automatically), and you don't mind changing your own code later if you decide to break the API, then go for it. But if other people will be using your class, or if you would like to make it easier to refactor later for your own use, stick with accessors.
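To make the three examples above concrete, here is a minimal Java sketch. The class Roster and its method names are made up for illustration and are not from the question. Because the list is private, locking, a read-only snapshot and a derived value can all be introduced without any caller changing its code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class: the list is private, so how it is exposed can change later.
class Roster {
    private final List<String> names = new ArrayList<>();

    // Mutation goes through a method, so validation and locking can be added here.
    public synchronized void add(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("name must not be empty");
        }
        names.add(name);
    }

    // Callers receive a read-only snapshot, never the internal list itself.
    public synchronized List<String> getNames() {
        return Collections.unmodifiableList(new ArrayList<>(names));
    }

    // A derived value: no redundant field is stored, and callers cannot tell.
    public synchronized int getCount() {
        return names.size();
    }
}
```

With a public field none of these changes would be possible after the fact, because callers would already hold a reference to the mutable list.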

Object oriented design is just a guideline. Think about it from the perspective of the person who will be using your class. Balance OOD with making it intuitive and easy to use.

You could run into issues depending on the language you are using and how it treats return statements or assignment operators. In some cases it may give you a reference, or values in other cases.
For example, say you have a PrimeCalculator class that figures out prime numbers, then you have another class that does something with those prime numbers.
public PrimeCalculator calculatorObject = new PrimeCalculator();
Vector<Integer> primeNumbers = calculatorObject.PrimeNumbersVector;
/* do something complicated here */
primeNumbers.clear(); // free up some memory
When you use this stuff later, possibly in another class, you don't want the overhead of calculating the numbers again so you use the same calculatorObject.
Vector<Integer> primes = calculatorObject.PrimeNumbersVector;
int tenthPrime = primes.elementAt(9);
It may not exactly be clear at this point whether primes and primeNumbers reference the same Vector. If they do, trying to get the tenth prime from primes would throw an error.
You can do it this way if you're careful and understand what exactly is happening in your situation, but you have a smaller margin of error using functions to return a value rather than assigning the variable directly.
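Here is a small Java sketch of that aliasing problem. In Java, assigning a collection copies the reference, not the contents. The PrimeCalculator below is a hypothetical stand-in: the field name primeNumbers and the accessor getPrimeNumbers() are my own, not from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the PrimeCalculator described above.
class PrimeCalculator {
    // Public field: every caller gets a reference to this very list.
    public final List<Integer> primeNumbers = new ArrayList<>();

    PrimeCalculator(int count) {
        for (int candidate = 2; primeNumbers.size() < count; candidate++) {
            if (isPrime(candidate)) {
                primeNumbers.add(candidate);
            }
        }
    }

    private static boolean isPrime(int n) {
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return n >= 2;
    }

    // Accessor: each caller gets an independent copy.
    public List<Integer> getPrimeNumbers() {
        return new ArrayList<>(primeNumbers);
    }
}

class AliasingDemo {
    public static void main(String[] args) {
        PrimeCalculator calc = new PrimeCalculator(10);

        List<Integer> copy = calc.getPrimeNumbers();
        copy.clear();                                   // only the copy is emptied
        System.out.println(calc.primeNumbers.size());   // prints 10

        List<Integer> alias = calc.primeNumbers;
        alias.clear();                                  // empties the calculator's own list
        System.out.println(calc.primeNumbers.size());   // prints 0
    }
}
```

The copy costs some memory on every call, which is the trade-off for the smaller margin of error.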

Well you can check the post :
first this
then this
This should resolve your confusion. It resolved mine! Thanks to Nicol Bolas.
Also read the comments below the accepted answer (and note the link I gave in the second-to-last comment on the first post).
Also visit the wikipedia post

Related

Flaw: Constructor does Real Work

I have a class which represents a set of numbers. The constructor takes three arguments: startValue, endValue and stepSize.
The class is responsible for holding a list containing all values between start and end value taking the stepSize into consideration.
Example: startValue: 3, endValue: 1, stepSize = -1, Collection = { 3,2,1 }
I am currently creating the collection and some info strings about the object in the constructor. The public members are read only info strings and the collection.
My constructor does three things at the moment:
Checks the arguments; this could throw an exception from the constructor
Fills values into the collection
Generates the information strings
I can see that my constructor does real work, but how can I fix this, or should I fix it at all? If I move the "methods" out of the constructor, it is like having an init function, and it leaves me with a not fully initialized object. Is the existence of my object doubtful? Or is it not that bad to have some work done in the constructor, since the constructor is still testable because no object references are created?
For me it looks wrong but it seems that I just can't find a solution. I also have taken a builder into account but I am not sure if that's right because you can't choose between different types of creations. However single unit tests would have less responsibility.
I am writing my code in C# but I would prefer a general solution, that's why the text contains no code.
EDIT: Thanks for editing my poor text (: I changed the title back because it represents my opinion and the edited title did not. I am not asking if real work is a flaw or not. For me, it is. Take a look at this reference.
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
The blog states the problems quite well. Still I can't find a solution.
Concepts that urge you to keep your constructors light weight:
Inversion of control (Dependency Injection)
Single responsibility principle (as applied to the constructor rather than a class)
Lazy initialization
Testing
K.I.S.S.
D.R.Y.
Links to arguments of why:
How much work should be done in a constructor?
What (not) to do in a constructor
Should a C++ constructor do real work?
http://misko.hevery.com/code-reviewers-guide/flaw-constructor-does-real-work/
If you check the arguments in the constructor that validation code can't be shared if those arguments come in from any other source (setter, constructor, parameter object)
If you fill values into the collection or generate the information strings in the constructor that code can't be shared with other constructors you may need to add later.
Besides not being shareable, that work also can't be delayed until it is really needed (lazy init). Inheritance also gives you more options for overriding when there are many methods that each do one thing rather than one do-everything constructor.
Your constructor only needs to put your class into a usable state. It does NOT have to be fully initialized. But it is perfectly free to use other methods to do the real work. That just doesn't take advantage of the "lazy init" idea. Sometimes you need it, sometimes you don't.
Just keep in mind that anything the constructor does or calls is being shoved down the user's / tester's throat.
EDIT:
You still haven't accepted an answer and I've had some sleep so I'll take a stab at a design. A good design is flexible so I'm going to assume it's OK that I'm not sure what the information strings are, or whether our object is required to represent a set of numbers by being a collection (and so provides iterators, size(), add(), remove(), etc) or is merely backed by a collection and provides some narrow specialized access to those numbers (such as being immutable).
This little guy is the Parameter Object pattern
/** Throws an exception if the sign of (endValue - startValue) differs from the sign of stepSize */
ListDefinition(T startValue, T endValue, T stepSize);
T can be int or long or short or char. Have fun but be consistent.
/** An interface, independent from any one collection implementation */
ListFactory(ListDefinition ld){
/** Make as many as you like */
List<T> build();
}
If we don't need to narrow access to the collection, we're done. If we do, wrap it in a facade before exposing it.
/** Provides read access only. Immutable if List l kept private. */
ImmutableFacade(List l);
Oh wait, requirements change, forgot about 'information strings'. :)
/** Build list of info strings */
InformationStrings(String infoFilePath) {
List<String> read();
}
Have no idea if this is what you had in mind but if you want the power to count line numbers by twos you now have it. :)
/** Assuming information strings have a 1 to 1 relationship with our numbers */
MapFactory(List l, List infoStrings){
/** Make as many as you like */
Map<T, String> build();
}
So, yes I'd use the builder pattern to wire all that together. Or you could try to use one object to do all that. Up to you. But I think you'll find few of these constructors doing much of anything.
EDIT2
I know this answer's already been accepted but I've realized there's room for improvement and I can't resist. The ListDefinition above works by exposing its contents with getters, ick. There is a "Tell, don't ask" design principle that is being violated here for no good reason.
ListDefinition(T startValue, T endValue, T stepSize) {
List<T> buildList(List<T> l);
}
This lets us build any kind of list implementation and have it initialized according to the definition. Now we don't need ListFactory. buildList is something I call a shunt. It returns the same reference it accepted after having done something with it. It simply allows you to skip giving the new ArrayList a name. Making a list now looks like this:
ListDefinition<int> ld = new ListDefinition<int>(3, 1, -1);
List<int> l = new ImmutableFacade<int>( ld.buildList( new ArrayList<int>() ) );
Which works fine. Bit hard to read. So why not add a static factory method:
List<int> l = ImmutableRangeOfNumbers.over(3, 1, -1);
This doesn't accept dependency injections but it's built on classes that do. It's effectively a dependency injection container. This makes it a nice shorthand for popular combinations and configurations of the underlying classes. You don't have to make one for every combination. The point of doing this with many classes is now you can put together whatever combination you need.
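For anyone who wants to run it, here is one way the EDIT2 design might look in plain Java. This is a sketch under assumptions, not a definitive implementation: Java generics can't take int, so the classes are fixed to Integer instead of a type parameter T, and Collections.unmodifiableList stands in for the ImmutableFacade.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Parameter object: validates once, then can fill any List implementation.
class ListDefinition {
    private final int startValue;
    private final int endValue;
    private final int stepSize;

    ListDefinition(int startValue, int endValue, int stepSize) {
        long span = (long) endValue - startValue;   // long avoids int overflow
        if (stepSize == 0) {
            throw new IllegalArgumentException("stepSize must not be zero");
        }
        if (span != 0 && Long.signum(span) != Integer.signum(stepSize)) {
            throw new IllegalArgumentException("stepSize points away from endValue");
        }
        this.startValue = startValue;
        this.endValue = endValue;
        this.stepSize = stepSize;
    }

    // The "shunt": fills the list it is given and returns the same reference.
    List<Integer> buildList(List<Integer> target) {
        for (long v = startValue; stepSize > 0 ? v <= endValue : v >= endValue; v += stepSize) {
            target.add((int) v);
        }
        return target;
    }
}

// Static factory for one popular combination of the classes above.
class ImmutableRangeOfNumbers {
    static List<Integer> over(int startValue, int endValue, int stepSize) {
        List<Integer> filled =
                new ListDefinition(startValue, endValue, stepSize).buildList(new ArrayList<>());
        return Collections.unmodifiableList(filled);
    }
}
```

ImmutableRangeOfNumbers.over(3, 1, -1) yields the list 3, 2, 1 from the question, and none of the constructors involved does more than check and store its arguments.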
Well, that's my 2 cents. I'm gonna find something else to obsess on. Feedback welcome.
As far as cohesion is concerned, there's no "real work", only work that's in line (or not) with the class/method's responsibility.
A constructor's responsibility is to create an instance of a class. And a valid instance for that matter. I'm a big fan of keeping the validation part as intrinsic as possible, so that you can see the invariants every time you look at the class. In other words, that the class "contains its own definition".
However, there are cases when an object is a complex assemblage of multiple other objects, with conditional logic, non-trivial validation or other creation sub-tasks involved. This is when I'd delegate the object creation to another class (Factory or Builder pattern) and restrain the accessibility scope of the constructor, but I think twice before doing it.
In your case, I see no conditionals (except argument checking), no composition or inspection of complex objects. The work done by your constructor is cohesive with the class because it essentially only populates its internals. While you may (and should) of course extract atomic, well identified construction steps into private methods inside the same class, I don't see the need for a separate builder class.
The constructor is a special member function in that it constructs the object, but after all it is a member function. As such, it is allowed to do things.
Consider for example C++ std::fstream. It opens a file in the constructor. Can throw an exception, but doesn't have to.
As long as you can test the class, it is all good.
It's true, a constructor should do the minimum of work, directed at a single aim: successful creation of a valid object. Whatever that takes is OK. But not more.
In your example, creating this collection in the constructor is perfectly valid, as an object of your class represents a set of numbers (your words). If an object is a set of numbers, you should clearly create it in the constructor! Otherwise the constructor does not do what it is made for: constructing a fresh, valid object.
These info strings catch my attention. What is their purpose? What exactly do you do with them? They sound like something peripheral, something that can be left for later and exposed through a method, like
String getInfo()
or similar.
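A rough Java sketch of that split, under some assumptions: NumberSet is a made-up name, the wording of the info string is invented, and the lazy getInfo() is not thread-safe as written. The constructor validates and fills the collection, because that is what makes the object valid. The info string is only built when somebody asks for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class representing the set of numbers from the question.
class NumberSet {
    private final List<Integer> values;
    private String info;   // built on first request, not in the constructor

    NumberSet(int startValue, int endValue, int stepSize) {
        checkArguments(startValue, endValue, stepSize);
        this.values = Collections.unmodifiableList(fill(startValue, endValue, stepSize));
    }

    private static void checkArguments(int start, int end, int step) {
        if (step == 0 || (end > start && step < 0) || (end < start && step > 0)) {
            throw new IllegalArgumentException("stepSize never reaches endValue");
        }
    }

    private static List<Integer> fill(int start, int end, int step) {
        List<Integer> result = new ArrayList<>();
        for (long v = start; step > 0 ? v <= end : v >= end; v += step) {
            result.add((int) v);
        }
        return result;
    }

    List<Integer> getValues() {
        return values;
    }

    // Peripheral information, computed lazily and then cached.
    String getInfo() {
        if (info == null) {
            info = values.size() + " values from " + values.get(0)
                    + " to " + values.get(values.size() - 1);
        }
        return info;
    }
}
```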
If you want to use Microsoft's .NET Framework as an example here, it is perfectly valid, both semantically and in terms of common practice, for a constructor to do some real work.
An example of where Microsoft does this is in their implementation of System.IO.FileStream. This class performs string processing on path names, opens new file handles, opens threads, binds all sorts of things, and invokes many system functions. The constructor is actually, in effect, about 1,200 lines of code.
I believe your example, where you are creating a list, is absolutely fine and valid. I would just make sure that you fail as early as possible. Say you pass a minimum that is higher than the maximum: with a poorly written loop condition you could get stuck in an infinite loop, exhausting all available memory.
The takeaway is "it depends" and you should use your best judgement. If all you wanted was a second opinion, then I say you're fine.
It's not a good practice to do "real work" in the constructor: you can initialize class members, but you shouldn't call other methods or do more "heavy lifting" in the constructor.
If you need to do some initialization which requires a big amount of code running, a good practice will be to do it in an init() method which will be called after the object was constructed.
The reasoning for not doing heavy lifting inside the constructor is: in case something bad happens, and fails silently, you'll end up having a messed up object and it'll be a nightmare to debug and realize where the issues are coming from.
In the case you describe above I would only do the assignments in the constructor and then, in two separate methods, I would implement the validations and generate the string-information.
Implementing it this way also conforms with SRP: "Single Responsibility Principle" which suggests that any method/function should do one thing, and one thing only.

OOP confusion in classes

I am from a C# background and have been programming for quite some time now. But only recently I started giving some thought to how I program. Apparently, my OOP is very bad.
I have a few questions; maybe someone can help me out. They are basic, but I want to confirm.
1- In C#, we can declare class properties like
private int _test;
and their setters and getters like
public int Test {get; set;}
Now, let's say I have to use this property inside the class. Which one will I use? The private one or the public one? Or are they both the same?
2- Let's say that I have to implement a class that does XML parsing. There can be different things that we can use as input for the class, like a file path. Should I make this a class property, or should I just pass it as an argument to a public function in the class? Which approach is better? Check the following.
I can create a class property and use like this
public string FilePath {get; set;}
public int Parse()
{
var document = XDocument.Load(this.FilePath);
.........//Remaining code
}
Or
I can pass the filepath as a parameter
public int Parse(string filePath)
On what basis should I decide whether to make something a property or pass it as an argument?
I know the solutions to these questions, but I want to know the correct approach. If you can recommend some video lectures or books, that would be nice also.
Fields vs Properties
Seems like you've got a few terms confused.
private int _test;
This is an instance field (also called member).
This field will allow direct access to the value from inside the class.
Note that I said "inside the class". Because it is private, it is not accessible from outside the class. This is important to preserve encapsulation, a cornerstone of OOP. Encapsulation basically tells us that instance members shouldn't be accessed directly from outside the class.
For this reason we make the member private and provide methods that "set" and "get" the variable (at least: in Java this is the way). These methods are exposed to the outside world and force whoever is using your class to go through your methods instead of accessing your variable directly.
It should be noted that you also want to use your methods/properties when you're inside the current class. Each time you don't, you risk bypassing validation rules. Play it safe and always use the methods instead of the backing field.
The net result of this is that you can force your logic to be applied to changes (set) or retrieval (get). The best example is validation: by forcing people to use your method, your validation logic will be applied before (possibly) setting a field to a new value.
public int Test {get; set;}
This is an automatically implemented property. A property is, roughly speaking, an easier way of using get/set methods.
Behind the scenes, your code translates to
private int _somevariableyoudontknow;
public void setTest(int t){
this._somevariableyoudontknow = t;
}
public int getTest(){
return this._somevariableyoudontknow;
}
So it is really very similar to getters and setters. What's so nice about properties is that you can define in one line what you'd otherwise do in 7 lines, while still keeping all the possibilities of explicit getters and setters.
Where is my validation logic, you ask?
In order to add validation logic, you have to create a custom implemented property.
The syntax looks like this:
private int _iChoseThisName;
public int Test {
    get {
        return _iChoseThisName;
    }
    set {
        // Reject invalid values before touching the backing field.
        if (value <= 5) {
            throw new ArgumentException("Value must be over 5!");
        }
        _iChoseThisName = value;
    }
}
Basically all we did was provide an implementation for your get and set. Notice the value keyword!
Properties can be used as such:
var result = SomeClass.Test; // returns the value from the 'Test' property
SomeClass.Test = 10; // sets the value of the 'Test' property
Last small note: just because you have a property named Test does not mean the backing variable is named test or _test. The compiler generates a variable name for the backing field in such a way that it never clashes with your own names.
XML Parsing
If you want your second question answered, you're going to have to show what your current architecture looks like.
It shouldn't be necessary though: it makes most sense to pass it as a parameter with your constructor. You should just create a new XmlParser (random name) object for each file you want to parse. Once you're parsing, you don't want to change the file location.
If you do want this: create a method that does the parsing and let it take the filename as a parameter, that way you still keep it in one call.
You don't want to create a property for the simple reason that you might forget to both set the property and call the parse method.
There are really two questions wrapped in your first question.
1) Should I use getters and setters (Accessors and Mutators) to access a member variable.
The answer depends on whether the implementation of the variable is likely to change. In some cases, the interface type (the type returned by the getter, and set by the setter) needs to be kept consistent but the underlying mechanism for storing the data may change. For instance, the type of the property may be a String but in fact the data is stored in a portion of a much larger String and the getter extracts that portion of the String and returns it to the user.
2) What visibility should I give a property?
Visibility is entirely dependent on use. If the property needs to be accessible to other classes or to classes that inherit from the base class then the property needs to be public or protected.
I never expose implementation to external concerns. Which is to say I always put a getter and setter on public and protected data because it helps me ensure that I will keep the interface the same even if the underlying implementation changes. Another common issue with external changes is that I want a chance to intercept an outside user's attempt to modify a property, maybe to prevent it, but more likely to keep the objects state in a good or safe state. This is especially important for cached values that may be exposed as properties. Think of a property that sums the contents of an array of values. You don't want to recalculate the value every time it is referenced so you need to be certain that the setter for the elements in the array tells the object that the sum needs to be recalculated. This way you keep the calculation to a minimum.
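The cached sum could look roughly like this in Java. The class name Totals and the dirty-flag field are my own choices for illustration. The point is that every write goes through setValue, so the object always knows when the cache is out of date.

```java
// Hypothetical class: an array of values with a cached sum.
class Totals {
    private final int[] values;
    private long cachedSum;
    private boolean sumIsStale = true;

    Totals(int size) {
        values = new int[size];
    }

    // Every write goes through here, so the cache can be invalidated.
    public void setValue(int index, int value) {
        values[index] = value;
        sumIsStale = true;
    }

    public int getValue(int index) {
        return values[index];
    }

    // Recalculates only when something changed since the last call.
    public long getSum() {
        if (sumIsStale) {
            long sum = 0;
            for (int v : values) {
                sum += v;
            }
            cachedSum = sum;
            sumIsStale = false;
        }
        return cachedSum;
    }
}
```

If the array were public, a caller could write to it directly and getSum() would keep returning the old total.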
I think the second question is: When do I make a value that I could pass in to a constructor public?
It depends on what the value is used for. I generally think that there are two distinct types of variables passed in to constructors. Those that assist in the creation of the object (your XML file path is a good example of this) and those that are passed in because the object is going to be responsible for their management. An example of this is in collections which you can often initialize the collection with an array.
I follow these guidelines.
If the value passed in can be changed without damaging the state of the object then it can be made into a property and publicly visible.
If changing the value passed in will damage the state of the object or redefine its identity then it should be left to the constructor to initialize the state and not be accessible again through property methods.
A lot of these terms are confusing because of the many different paradigms and languages in OO Design. The best place to learn about good practices in OO Design is to start with a good book on Patterns. While the so-called Gang of Four Book http://en.wikipedia.org/wiki/Design_Patterns was the standard for many years, there have since been many better books written.
Here are a couple resources on Design Patterns:
http://sourcemaking.com/design_patterns
http://www.oodesign.com/
And a couple on C# specific.
http://msdn.microsoft.com/en-us/magazine/cc301852.aspx
http://www.codeproject.com/Articles/572738/Building-an-application-using-design-patterns-and
I can possibly answer your first question. You asked "I have to use this property inside the class." That sounds to me like you need to use your private variable. The public method which you provided I believe will only do two things: Allow a client to set one of your private variables, or to allow a client to "see" (get) the private variable. But if you want to "use this property inside the class", the private variable is the one that should be your focus while working with the data within the class. Happy holidays :)
The following is my personal opinion based on my personal experience in various programming languages. I do not think that best practices are necessarily static for all projects.
When to use getters, when to use private instance variables directly
It depends.
You probably know that, but let's talk about why we usually want getters and setters instead of public instance variables: they allow us to acquire the full power of OOP.
While an instance variable is just some dumb piece of memory (the amount of dumbness surely depends on the language you're working in), a getter is not bound to a specific memory location. The getter allows children in the OOP hierarchy to override the behaviour of the "instance variable" without being bound to it. Thus, if you have an interface with various implementations, some may use an instance variable, while others may use IO to fetch data from the network, calculate it from other values, etc.
Thus, getters do not necessarily return the instance variable (in some languages this is more complicated, such as C++ with the virtual keyword, but I'll try to be language-independent here).
Why is that related to the inner class behaviour? If you have a class with a non-final getter, the getter and the inner variable may return different values. Thus, if you need to be sure it is the inner value, use it directly. If you, however, rely on the "real" value, always use the getter.
If the getter is final or the language enforces the getter to be equal (and this case is way more common than the first case), I personally prefer accessing the private field directly; this makes code easy to read (imho) and does not yield any performance penalty (does not apply to all languages).
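A tiny Java illustration of that difference. The classes Account and BonusAccount are invented for the example. Inside the base class, reading the field always gives the stored value, while calling the non-final getter gives whatever a subclass decided the value should be.

```java
// Hypothetical base class with an overridable (non-final) getter.
class Account {
    private final int balance;

    Account(int balance) {
        this.balance = balance;
    }

    // Non-final, so subclasses may redefine what "the balance" means.
    public int getBalance() {
        return balance;
    }

    // Reads the raw field: always the stored value.
    public String storedReport() {
        return "stored=" + balance;
    }

    // Goes through the getter: reflects any override.
    public String effectiveReport() {
        return "effective=" + getBalance();
    }
}

// Hypothetical subclass that changes what the getter returns.
class BonusAccount extends Account {
    BonusAccount(int balance) {
        super(balance);
    }

    @Override
    public int getBalance() {
        return super.getBalance() + 100;
    }
}
```

For new BonusAccount(50), storedReport() gives "stored=50" while effectiveReport() gives "effective=150". Which one is right depends on whether you need the inner value or the "real" one.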
When to use parameters, when to use instance variables/properties
Use parameters wherever possible.
Never use instance variables or properties as parameters. A method should be as self-contained as possible. In the example you stated, the parameterized version is way better imo.
Instance variables (with getters or not) are properties of the instance. As they are part of the instance, they should be logically bound to it.
Have a look at your example. If you hear the word XMLParser, what do you think about it? Do you think that a parser can only parse a single file it is bound to? Or do you think that a parser can parse any file? I tend towards the latter (besides, using an instance variable would kill thread-safety).
Another example: You wish to create an XMLArchiver, taking multiple xml documents into a single archive. When implementing, you'd have the filename as a parameter of the constructor maybe opening an outputstream towards the file and storing a reference to it as an instance variable. Then, you'd call archiver.add(stuff-to-add) multiple times. As you see, the file (thus, the filename) is naturally bound to the XMLArchiver instance, not to the method adding files to it.
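Sketched in Java, the two cases might look like this. Both class bodies are hypothetical simplifications: the parser just counts elements in an XML string using the standard javax.xml.parsers API, and the archiver keeps documents in memory instead of opening an output stream.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical parser: no per-document state, so one instance can parse anything.
class XmlParser {
    public int countElements(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches every element in the document.
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }
}

// Hypothetical archiver: the archive name is part of the instance's identity.
class XmlArchiver {
    private final String archiveName;
    private final List<String> documents = new ArrayList<>();

    XmlArchiver(String archiveName) {
        this.archiveName = archiveName;
    }

    public void add(String xml) {
        documents.add(xml);
    }

    public String describe() {
        return archiveName + ": " + documents.size() + " document(s)";
    }
}
```

The input to countElements changes on every call, so it is a parameter. The archive name stays the same for the lifetime of the archiver, so it belongs to the constructor.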

Is it better for class data to be passed internally or accessed directly?

Example:
// access fields directly
private void doThis()
{
    doSomeWork(this.data);
}
// receive data as an argument
// (Data stands in for whatever type this.data has)
private void doThis(Data data)
{
    doSomeWork(data);
}
The first option is coupled to the value in this.data while the second option avoids this coupling. I feel like the second option is always better. It promotes loose coupling WITHIN the class. Accessing global class data willy-nilly throughout just seems like a bad idea. Obviously this class data needs to be accessed directly at some point. However, if accesses to this global class data can be eliminated by parameter passing, it seems that this is always preferable.
The second example has the advantage of working with any data of the proper type, whereas the first is bound to working with just the class data. Even if you don't NEED the additional flexibility, it seems nice to leave it as an option.
I just don't see any advantage in accessing member data directly from private methods as in the first example. What's the best practice here? I've consulted Code Complete, but was not able to find anything on this particular issue.
if the data is part of the object's state, private/protected is just fine. option 1 - good.
i noticed some developers like to create private/protected vars just to pass parameters between methods in a class so that they don't have to pass them in the method call. those vars are not really there to store the model/state of an object. ...then, option 1 - NOT good.
Why option 1 not good in this case...
expose only as much as you need (var scoping). so, pass the data in. do not create a private/protected var just to pass data between 2 methods.
a private method that figures out everything internally is very easy to understand. keep it this way, unless it's unavoidable.
private/protected vars make it harder to refactor as your method is not 'self encompassing', it depends on external vars that might be used elsewhere.
my 2 cents! :-)
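a quick Java sketch of the distinction, with made-up names (InvoicePrinter, currency). the currency is real object state, so reading the field directly is fine. the running total only exists to get from one method to the next, so it is passed as a parameter instead of being parked in a field.

```java
// Hypothetical class used only to illustrate the two kinds of data.
class InvoicePrinter {
    // Genuine state: belongs to the object for its whole life.
    private final String currency;

    InvoicePrinter(String currency) {
        this.currency = currency;
    }

    public String print(int[] amounts) {
        int total = sum(amounts);   // passed along explicitly...
        return format(total);       // ...not stored in a field between calls
    }

    // Depends only on its argument, so it is easy to move or reuse.
    private static int sum(int[] amounts) {
        int total = 0;
        for (int a : amounts) {
            total += a;
        }
        return total;
    }

    // Reads 'currency' directly: it is object state, so option 1 is fine here.
    private String format(int total) {
        return total + " " + currency;
    }
}
```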
In class global data are not a problem IMHO. Classes are used to couple state, behaviour and identity. So such a coupling is not a problem. The argument suggests, that you can call that method with data from other objects, even of other classes and I think that should be more considered than coupling inside class.
They are both instance methods, therefore #1 makes more sense unless you have a situation involving threads (but depending on the language and scenario, even then you can simply lock/mark the data method as synchronized - my Java knowledge is rusty).
The second technique is more reminiscent of procedural programming.

What is the point of defining Access Modifiers?

I understand the differences between them (at least in C#). I know the effects they have on the elements to which they are assigned. What I don't understand is why it is important to implement them - why not have everything Public?
The material I read on the subject usually goes on about how classes and methods shouldn't have unnecessary access to others, but I've yet to come across an example of why/how that would be a bad thing. It seems like a security thing, but I'm the programmer; I create the methods and define what they will (or will not) do. Why would I spend all the effort to write a function which tried to change a variable it shouldn't, or tried to read information in another class, if that would be bad?
I apologize if this is a dumb question. It's just something I ran into on the first articles I ever read on OOP, and I've never felt like it really clicked.
"I'm the programmer" is a correct assumption only if you're the only programmer.
In many cases, other programmers work with the first programmer's code. They use it in ways he didn't intend by fiddling with the values of fields they shouldn't, and they create a hack that works, but breaks when the producer of the original code changes it.
OOP is about creating libraries with well-defined contracts. If all your variables are public and accessible to others, then the "contract" theoretically includes every field in the object (and its sub-objects), so it becomes much harder to build a new, different implementation that still honors the original contract.
Also, the more "moving parts" of your object are exposed, the easier it is for a user of your class to manipulate it incorrectly.
You probably don't need this, but here's an example I consider amusing:
Say you sell a car with no hood over the engine compartment. Come nighttime, the driver turns on the lights. He gets to his destination, gets out of the car and then remembers he left the light on. He's too lazy to unlock the car's door, so he pulls the wire to the lights out from where it's attached to the battery. This works fine - the light is out. However, because he didn't use the intended mechanism, he finds himself with a problem next time he's driving in the dark.
Living in the USA (go ahead, downvote me!), he refuses to take responsibility for his incorrect use of the car's innards, and sues you, the manufacturer for creating a product that's unsafe to drive in the dark because the lights can't be reliably turned on after having been turned off.
This is why all cars have hoods over their engine compartments :)
A more serious example: You create a Fraction class, with a numerator and denominator field and a bunch of methods to manipulate fractions. Your constructor doesn't let its caller create a fraction with a 0 denominator, but since your fields are public, it's easy for a user to set the denominator of an existing (valid) fraction to 0, and hilarity ensues.
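A minimal Java sketch of that Fraction idea, assuming private fields and a checked setter. The class layout, method names, and exception type are my own choices for illustration, not anything from a real library:

```java
// Hypothetical Fraction class; names and exception choice are illustrative.
final class Fraction {
    private int numerator;
    private int denominator; // invariant: never 0

    Fraction(int numerator, int denominator) {
        this.numerator = numerator;
        setDenominator(denominator); // same check as every later change
    }

    int getNumerator() { return numerator; }

    int getDenominator() { return denominator; }

    // The only way to change the denominator, so the invariant cannot be
    // bypassed the way it could be with a public field.
    void setDenominator(int denominator) {
        if (denominator == 0) {
            throw new IllegalArgumentException("denominator must not be 0");
        }
        this.denominator = denominator;
    }

    double toDouble() { return (double) numerator / denominator; }
}
```

With public fields, `f.denominator = 0` would compile and silently produce a broken object; here the same mistake fails immediately at the call to setDenominator.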
First, nothing in the language forces you to use access modifiers - you are free to make everything public in your class if you wish. However, there are some compelling reasons for using them. Here's my perspective.
Hiding the internals of how your class operates allows you to protect that class from unintended uses. While you may be the creator of the class, in many cases you will not be the only consumer - or even maintainer. Hiding internal state protects the class from people who may not understand its workings as well as you. Making everything public creates the temptation to "tweak" the internal state or internal behavior when the class isn't acting the way you may want - rather than actually correcting the public interface or internal implementation. This is the road to ruin.
Hiding internals helps to de-clutter the namespace, and allows tools like Intellisense to display only the relevant and meaningful methods/properties/fields. Don't discount tools like Intellisense - they are a powerful means for developers to quickly identify what they can do with your class.
Hiding internals allows you to structure an interface appropriate for the problem the class is solving. Exposing all of the internals (which often substantially outnumber the exposed interface) makes it hard to later understand what the class is trying to solve.
Hiding internals allows you to focus your testing on the appropriate portion - the public interface. When all methods/properties of a class are public, the number of permutations you must potentially test increases significantly - since any particular call path becomes possible.
Hiding internals helps you control (enforce) the call paths through your class. This makes it easier to ensure that your consumers understand what your class can be asked to do - and when. Typically, there are only a few paths through your code that are meaningful and useful. Allowing a consumer to take any path makes it more likely that they will not get meaningful results - and will interpret that as your code being buggy. Limiting how your consumers can use your class actually frees them to use it correctly.
Hiding the internal implementation frees you to change it with the knowledge that it will not adversely impact consumers of your class - so long as your public interface remains unchanged. If you decide to use a dictionary rather than a list internally - no one should care. But if you made all the internals of your class available, someone could write code that depends on the fact that you internally use a list. Imagine having to change all of the consumers when you want to change such choices about your implementation. The golden rule is: consumers of a class should not care how the class does what it does.
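To make the list-versus-dictionary point concrete, here is a small Java sketch. WordCounter and its methods are hypothetical names invented for this example; the point is only that the storage is private, so it can be swapped freely:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class; the name and methods are illustrative only.
final class WordCounter {
    // Implementation detail. An earlier version could just as well have
    // kept a list of (word, count) pairs here; callers cannot tell.
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one more occurrence of the word.
    void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    // Returns how often the word was added, or 0 if it never was.
    int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```

Consumers only ever see add and countOf, so replacing the map with some other structure later would not require touching any of their code.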
It is primarily a hiding and sharing thing. You may produce and use all your own code, but other people provide libraries, etc. to be used more widely.
Making things non-public allows you to explicitly define the external interface of your class. The non-public stuff is not part of the external interface, which means you can change anything you want internally without affecting anyone using the external interface.
You only want to expose the API and keep everything else hidden. Why?
OK, let's assume you want to make an awesome Matrix library, so you write:
class Matrix {
    public Object[][] data; // the data your matrix stores
    ...
    public Object[] getRow(int row) { return data[row]; }
}
By default, any other programmer who uses your library will want to maximize the speed of his program by tapping into the underlying structure.
// Someone else's function
Object one(Matrix m) { return m.data[0][0]; }
Now you discover that using a flat, one-dimensional array to emulate the matrix will increase performance, so you change data from
Object[][] data => Object[] data
which causes one() to break. In other words, by changing your implementation you broke backward compatibility :-(
By encapsulating you divide internal implementation from external interface (achieved with a private modifier).
That way you can change implementation as much as possible without breaking backward compatibility :D Profit!!!
Of course, if you are the only programmer who is ever going to modify or use that class, you might as well keep it public.
Note: There are other major benefits to encapsulating your stuff; this is just one of many. See Encapsulation for more details.
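For comparison, a rough sketch of what the encapsulated Matrix could look like in Java. The constructor, get/set, and the bounds check are my own assumptions added for illustration; the original only names data and getRow:

```java
// Hypothetical encapsulated version of the Matrix sketch above.
final class Matrix {
    private final int rows;
    private final int cols;
    // Row-major flat storage. This could go back to Object[][] later
    // without breaking any caller, because nobody outside can see it.
    private final Object[] data;

    Matrix(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.data = new Object[rows * cols];
    }

    // Maps (row, col) to a position in the flat array, rejecting bad input.
    private int index(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols) {
            throw new IndexOutOfBoundsException("(" + row + ", " + col + ")");
        }
        return row * cols + col;
    }

    Object get(int row, int col) { return data[index(row, col)]; }

    void set(int row, int col, Object value) { data[index(row, col)] = value; }

    // Returns a copy, so callers never hold a reference to internal storage.
    Object[] getRow(int row) {
        index(row, 0); // validates the row number
        Object[] out = new Object[cols];
        System.arraycopy(data, row * cols, out, 0, cols);
        return out;
    }
}
```

Someone else's one() would now be written as m.get(0, 0), and it keeps working whichever storage layout the library uses.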
I think the best reason for this is to provide layers of abstraction on your code.
As your application grows, you will need to have your objects interacting with other objects. Having publicly modifiable fields makes it harder to wrap your head around your entire application.
Limiting what you make public on your classes makes it easier to abstract your design so you can understand each layer of your code.
For some classes, it may seem ridiculous to have private members with a bunch of methods that just set and get those values. Here is the reason for it. Say you have a class whose members are public and directly accessible:
class A
{
public int i;
....
}
You go on to use that class in a bunch of code that directly accesses i, and then you realize that i should have some constraints on it: say, i should always be >= 0 and less than 100 (for argument's sake).
Now, you could go through all of the code where you used i and check for this constraint, but you could instead make i private and add a public property I that checks it for you:
using System;

class A
{
    private int i;
    public int I
    {
        get { return i; }
        set
        {
            // Reject anything outside [0, 100).
            if (value >= 0 && value < 100)
                i = value;
            else
                throw new ArgumentOutOfRangeException(nameof(value));
        }
    }
}
This hides all of that error checking. While the example is trite, situations like these come up quite often.
It is not related to security at all.
Access modifiers and scope are all about structure, layers, organization, and communication.
If you are the only programmer, it is probably fine until you have so much code even you can't remember. At that point, it's just like a team environment - the access modifiers and the structure of the code guide you to stay within the architecture.

Should protected attributes always be banned?

I seldom use inheritance, but when I do, I never use protected attributes because I think it breaks the encapsulation of the inherited classes.
Do you use protected attributes? What do you use them for?
In this interview on Design by Bill Venners, Joshua Bloch, the author of Effective Java, says:
Trusting Subclasses
Bill Venners: Should I trust subclasses more intimately than non-subclasses? For example, do I make it easier for a subclass implementation to break me than I would for a non-subclass? In particular, how do you feel about protected data?
Josh Bloch: To write something that is both subclassable and robust against a malicious subclass is actually a pretty tough thing to do, assuming you give the subclass access to your internal data structures. If the subclass does not have access to anything that an ordinary user doesn't, then it's harder for the subclass to do damage. But unless you make all your methods final, the subclass can still break your contracts by just doing the wrong things in response to method invocation. That's precisely why the security critical classes like String are final. Otherwise someone could write a subclass that makes Strings appear mutable, which would be sufficient to break security. So you must trust your subclasses. If you don't trust them, then you can't allow them, because subclasses can so easily cause a class to violate its contracts.
As far as protected data in general, it's a necessary evil. It should be kept to a minimum. Most protected data and protected methods amount to committing to an implementation detail. A protected field is an implementation detail that you are making visible to subclasses. Even a protected method is a piece of internal structure that you are making visible to subclasses.
The reason you make it visible is that it's often necessary in order to allow subclasses to do their job, or to do it efficiently. But once you've done it, you're committed to it. It is now something that you are not allowed to change, even if you later find a more efficient implementation that no longer involves the use of a particular field or method.
So all other things being equal, you shouldn't have any protected members at all. But that said, if you have too few, then your class may not be usable as a super class, or at least not as an efficient super class. Often you find out after the fact. My philosophy is to have as few protected members as possible when you first write the class. Then try to subclass it. You may find out that without a particular protected method, all subclasses will have to do some bad thing.
As an example, if you look at AbstractList, you'll find that there is a protected method to delete a range of the list in one shot (removeRange). Why is that in there? Because the normal idiom to remove a range, based on the public API, is to call subList to get a sub-List, and then call clear on that sub-List. Without this particular protected method, however, the only thing that clear could do is repeatedly remove individual elements. Think about it. If you have an array representation, what will it do? It will repeatedly collapse the array, doing order N work N times. So it will take a quadratic amount of work, instead of the linear amount of work that it should. By providing this protected method, we allow any implementation that can efficiently delete an entire range to do so. And any reasonable List implementation can delete a range more efficiently all at once.
That we would need this protected method is something you would have to be way smarter than me to know up front. Basically, I implemented the thing. Then, as we started to subclass it, we realized that range delete was quadratic. We couldn't afford that, so I put in the protected method. I think that's the best approach with protected methods. Put in as few as possible, and then add more as needed. Protected methods represent commitments to designs that you may want to change. You can always add protected methods, but you can't take them out.
Bill Venners: And protected data?
Josh Bloch: The same thing, but even more. Protected data is even more dangerous in terms of messing up your data invariants. If you give someone else access to some internal data, they have free reign over it.
Short version: it breaks encapsulation but it's a necessary evil that should be kept to a minimum.
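The removeRange example from the interview can be tried out with a small subclass of java.util.AbstractList. IntArrayList below is a hypothetical class written only for this illustration, and the rangeRemovals counter exists purely to show that subList(...).clear() ends up in the overridden protected method:

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical array-backed list, written only to illustrate removeRange.
final class IntArrayList extends AbstractList<Integer> {
    private final int[] elements;
    private int size;
    int rangeRemovals = 0; // counts removeRange calls, for demonstration only

    IntArrayList(int... initial) {
        elements = Arrays.copyOf(initial, initial.length);
        size = initial.length;
    }

    @Override
    public Integer get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return elements[index];
    }

    @Override
    public int size() { return size; }

    // Closes the gap with one array copy instead of shifting the tail
    // once per removed element.
    @Override
    protected void removeRange(int fromIndex, int toIndex) {
        rangeRemovals++;
        modCount++; // structural modification, as AbstractList expects
        System.arraycopy(elements, toIndex, elements, fromIndex, size - toIndex);
        size -= toIndex - fromIndex;
    }
}
```

Usage: for a list built as new IntArrayList(10, 20, 30, 40, 50), calling list.subList(1, 4).clear() goes through removeRange once and leaves the elements 10 and 50.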
C#:
I use protected for abstract or virtual methods that I want derived classes to override. I also make a method protected if it may be called by derived classes, but I don't want it called outside the class hierarchy.
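The same idea sketched in Java rather than C#, with made-up class names (Report, SalesReport). One difference to keep in mind: in Java, protected also grants access to other classes in the same package, which C# protected does not do.

```java
// Hypothetical report classes illustrating a protected hook method.
abstract class Report {
    // Public entry point: fixes the overall sequence of steps.
    public final String render() {
        return header() + "\n" + body();
    }

    // Must be supplied by subclasses; not part of the public interface.
    protected abstract String body();

    // May be overridden by subclasses; has a default.
    protected String header() { return "REPORT"; }
}

final class SalesReport extends Report {
    @Override
    protected String body() { return "sales: 42"; }
}
```

Callers use only render(); the pieces that subclasses fill in stay out of the public interface.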
You may need them for a static (or 'global') attribute that you want your subclasses, or classes from the same package (in Java), to benefit from.
Such static final attributes, representing some kind of 'constant value', seldom have a getter function, so a protected static final attribute might make sense in that case.
Scott Meyers says don't use protected attributes in Effective C++ (3rd ed.):
Item 22: Declare data members private.
The reason is the same one you give: it breaks encapsulation. The consequence is that otherwise local changes to the layout of the class might break dependent types and result in changes in many other places.
I don't use protected attributes in Java because there they are also accessible to every class in the same package. But in C++, I'll use them in abstract classes, allowing the inheriting class to inherit them directly.
There are never any good reasons to have protected attributes. A base class must be able to depend on state, which means restricting access to data through accessor methods. You can't give anyone access to your private data, even children.
I recently worked on a project where the "protected" member was a very good idea. The class hierarchy was something like:
[+] Base
|
+--[+] BaseMap
| |
| +--[+] Map
| |
| +--[+] HashMap
|
+--[+] // something else ?
The Base implemented a std::list but nothing else. The direct access to the list was forbidden to the user, but as the Base class was incomplete, it relied anyway on derived classes to implement the indirection to the list.
The indirection could come in at least two flavors: std::map and stdext::hash_map. Both maps behave the same way, except that the hash_map needs the Key to be hashable (in VC2003, castable to size_t).
So BaseMap implemented a TMap as a templated type that was a map-like container.
Map and HashMap were two derived classes of BaseMap, one specializing BaseMap on std::map, and the other on stdext::hash_map.
So:
Base was not usable as such (no public accessors!) and only provided common features and code
BaseMap needed easy read/write to a std::list
Map and HashMap needed easy read/write access to the TMap defined in BaseMap.
For me, the only solution was to use protected for the std::list and the TMap member variables. There was no way I would make those private, because I would end up exposing all or almost all of their features through read/write accessors anyway.
In the end, I guess that if you end up dividing your class into multiple objects, each derivation adding needed features to its parent class, and only the most derived class being really usable, then protected is the way to go. The fact that the "protected member" was a class, and so was almost impossible to "break", helped.
But otherwise, protected should be avoided as much as possible (i.e.: Use private by default, and public when you must expose the method).
The protected keyword is a conceptual error and language design botch, and several modern languages, such as Nim and Ceylon (see http://ceylon-lang.org/documentation/faq/language-design/#no_protected_modifier), that have been carefully designed rather than just copying common mistakes, don't have such a keyword.
It's not protected members that breaks encapsulation, it's exposing members that shouldn't be exposed that breaks encapsulation ... it doesn't matter whether they are protected or public. The problem with protected is that it is wrongheaded and misleading ... declaring members protected (rather than private) doesn't protect them, it does the opposite, exactly as public does. A protected member, being accessible outside the class, is exposed to the world and so its semantics must be maintained forever, just as is the case for public. The whole idea of "protected" is nonsense ... encapsulation is not security, and the keyword just furthers the confusion between the two. You can help a little by avoiding all uses of protected in your own classes -- if something is an internal part of the implementation, isn't part of the class's semantics, and may change in the future, then make it private or internal to your package, module, assembly, etc. If it is an unchangeable part of the class semantics, then make it public, and then you won't annoy users of your class who can see that there's a useful member in the documentation but can't use it, unless they are creating their own instances and can get at it by subclassing.
In general, no, you really don't want to use protected data members. This is doubly true if you're writing an API. Once someone inherits from your class you can never really do maintenance and not somehow break them in a weird and sometimes wild way.
I use them. In short, it's a good way, if you want to have some attributes shared. Granted, you could write set/get functions for them, but if there is no validation, then what's the point? It's also faster.
Consider this: you have a class which is your base class. It has quite a few attributes you want to use in the child objects. You could write a get/set function for each, or you can just set them.
My typical example is a file/stream handler. You want to access the handler (i.e. file descriptor), but you want to hide it from other classes. It's way easier than writing a set/get function for it.
I think protected attributes are a bad idea. I use CheckStyle to enforce that rule with my Java development teams.
In general, yes. A protected method is usually better.
In use, there is a level of simplicity given by using a protected final variable for an object that is shared by all the children of a class. I'd always advise against using it with primitives or collections since the contracts are impossible to define for those types.
Lately I've come to separate stuff you do with primitives and raw collections from stuff you do with well-formed classes. Primitives and collections should ALWAYS be private.
Also, I've started occasionally exposing public member variables when they are declared final and are well-formed classes that are not too flexible (again, not primitives or collections).
This isn't some stupid shortcut, I thought it out pretty seriously and decided there is absolutely no difference between a public final variable exposing an object and a getter.
It depends on what you want. If you want a fast class, then the data should be protected, and you should use protected and public methods.
I think you should assume that the users who derive from your class know your class quite well, or at least have read your manual for the functions they are going to override.
If your users mess with your class it is not your problem. Every malicious user can add the following lines when overriding one of your virtuals:
(C#)
static Random rnd=new Random();
//...
if (rnd.Next()%1000==0) throw new Exception("My base class sucks! HAHAHAHA! xD");
//...
You can't seal every class to prevent this.
Of course if you want a constraint on some of your fields then use accessor functions or properties or something you want and make that field private because there is no other solution...
But I personally don't like to stick to the oop principles at all costs. Especially making properties with the only purpose to make data members private.
(C#):
private int _foo; // int is an arbitrary type chosen for illustration
public int foo
{
    get { return _foo; }
    set { _foo = value; }
}
This was my personal opinion.
But do what your boss requires (if he wants private fields, then do that).
I use protected variables/attributes within base classes that I know I don't plan on changing into methods. That way, subclasses have full access to their inherited variables, and don't have the (artificially created) overhead of going through getters/setters to access them. An example is a class using an underlying I/O stream; there is little reason not to allow subclasses direct access to the underlying stream.
This is fine for member variables that are used in direct simple ways within the base class and all subclasses. But for a variable that has a more complicated use (e.g., accessing it causes side effects in other members within the class), a directly accessible variable is not appropriate. In this case, it can be made private and public/protected getters/setters can be provided instead. An example is an internal buffering mechanism provided by the base class, where accessing the buffers directly from a subclass would compromise the integrity of the algorithms used by the base class to manage them.
It's a design judgment call, based on how simple the member variable is and how simple it is expected to remain in future versions.
Encapsulation is great, but it can be taken too far. I've seen classes whose own private methods accessed their member variables using only getter/setter methods. This is overkill, since if a class can't trust its own private methods with its own private data, who can it trust?