Is "serialisation without duplication" possible in c++0x? - serialization

One of the big uses of code generation in c++ is to support message serialisation. Typically, you want to support specifying message contents and layout in the same step and produce code for that message type that can give you objects capable of being serialised to/from communication streams. In the past, this has usually resulted in code that looks like:
    class MyMessage : public SerialisableObject
    {
        // message members
        int myNumber_;
        std::string myString_;
        std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
    public:
        // ctor, dtor, accessors, mutators, then:
        virtual void Serialise(SerialisationStream & stream)
        {
            stream & myNumber_;
            stream & myString_;
            stream & aBunchOfThingsIWantToSerialise_;
        }
    };
The problem with this kind of design is that it violates an important rule of good architecture: you should not have to specify the intent of a design twice. Duplication of intent, like duplicated code and other common development duplication, leaves room for one place in the code to diverge from the other, causing errors.
In the above, the duplication is the list of members. Potential errors include adding a member to the class but forgetting to add it to the serialisation list, serialising a member twice (possibly by not using the same order as the member declaration or possibly due to a misspelling of a similar member, among other ways), or serialising something that is not a member (which might produce a compiler error, unless name lookup finds something at a different scope than the object that matches lookup rules). That kind of mistake is the same reason we no longer try to match every heap allocation with a delete (instead using smart pointers) or every file open with a close (using RAII ctor/dtor mechanisms): we don't want to have to match up our intent in multiple places, because there are times we, or another engineer less familiar with the intent, make mistakes.
Generally, therefore, this has been one of the things that code generation could take care of. You might create a file MyMessage.cg to specify both layout and members in one step
    serialisable MyMessage
    {
        int myNumber_;
        std::string myString_;
        std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
    };
that would be run through a code generation utility and produce the code.
I was wondering if it is possible yet to do this in c++0x without external code generation. Are there any new language mechanisms that make it possible to specify a class as serialisable once, so that the names and layout of its members are used to lay out the message during serialisation?
To be clear, I know that there are tricks with boost tuples and fusion that can come close to this kind of behavior even in the pre-c++0x language. Those usages, though, being based on indexing into the tuple rather than by-member-name access, have all been brittle to changing the layout, as other places in the code that access the messages would then also need to be reordered. Some kind of by-member-name access is necessary to not have to duplicate the layout specification in places in the code that use the messages.
Also, I know it might be nice to take this up to the next level and ask for specifying when some of the members shouldn't be serialised. Other languages that offer serialisation built in often offer some kind of attribute to do this, so
int myNonSerialisedNumber_ [[noserialise]];
might seem natural. However, I personally think it is bad design to have serialisable objects where not everything is serialised, since the lifetime of messages is in the transport to/from the communications layer, separate from other data lifetimes. Also, you could have an object which has a purely serialisable object as one of its members, so such functionality doesn't buy anything the language doesn't already offer.
Is this possible? Or did the standards committee leave out this kind of introspective capability? I don't need it to look like the code gen file above - any simple method for compiletime specification of layout and members in a single step would solve this common problem.

This is both possible and practical in C++11 – in fact it was possible back in C++03, the syntax was just a little too unwieldy. I wrote a small library based around the same idea - see the following:
www.github.com/molw5/framework
Sample syntax:
    class Object : serializable <Object,
        value <NAME("Field 1"), int>,
        value <NAME("Field 2"), float>,
        value <NAME("Field 3"), double>>
    {
    };
Most of the underlying code could be reproduced, in principle, in C++03. Some of the implementation details would have been...tricky without variadic templates, but I believe it would have been possible to recover the core functionality. What you could not reproduce in C++03 was the NAME macro above, and the syntax relies fairly heavily on it. The macro provides the machinery necessary to generate a unique type name from a string; that is, the following:
    NAME("Field 1")
expands to
    type_string <'F', 'i', 'e', 'l', 'd', ' ', '1'>
through the use of some common macros and constexpr (for character extraction). Back in C++03 something similar to the type_string above would need to be entered manually.
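To make the idea of a "type generated from a string" concrete, here is a minimal sketch of what a type_string-like template could look like. It is not the library's actual implementation: the `str()` helper and the `field1`/`field2` aliases are assumptions added for illustration, and the macro that extracts the characters from a string literal is left out.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for the library's type_string: the characters of the
// name are template arguments, so each distinct name is a distinct type.
template <char... Cs>
struct type_string {
    // Rebuild the name at run time, e.g. to write a field label to a stream.
    static std::string str() { return std::string{Cs...}; }
};

// Illustrative aliases (not part of the library).
using field1 = type_string<'F', 'i', 'e', 'l', 'd', ' ', '1'>;
using field2 = type_string<'F', 'i', 'e', 'l', 'd', ' ', '2'>;

// The same characters always name the same type; different characters do not.
static_assert(std::is_same<field1, type_string<'F', 'i', 'e', 'l', 'd', ' ', '1'>>::value,
              "same characters, same type");
static_assert(!std::is_same<field1, field2>::value,
              "different characters, different type");
```

Because the name is part of the type, a field can be looked up by name at compile time rather than by position, which is what removes the brittleness of index-based tuple access.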

C++, of any form, supports neither introspection nor reflection (to the extent that they are different).
One nice thing about doing serialization manually (ie: without introspection or reflection) is that you can provide object versioning. You can support older forms of the serialization, and simply create reasonable defaults for the data that wasn't in the old versions. Or if a new version removes some data, you can simply serialize and discard it.
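The versioning point can be sketched as follows. This is only an illustration under assumed names (`Settings`, `save`, `load`, `loadFromText`) and an assumed whitespace-separated text format with a leading version number; it is not taken from any particular library.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical message type: `theme` was added in format version 2.
struct Settings {
    int volume = 0;
    std::string theme = "light";  // default used when reading version 1 data
};

const int kCurrentVersion = 2;

// Always write the newest format, tagged with its version number.
void save(std::ostream& out, const Settings& s) {
    out << kCurrentVersion << ' ' << s.volume << ' ' << s.theme << '\n';
}

// Read any known version; fields missing from old data keep their defaults.
Settings load(std::istream& in) {
    Settings s;
    int version = 0;
    in >> version >> s.volume;
    if (version >= 2) {
        in >> s.theme;
    }
    return s;
}

// Convenience helper for trying the functions on in-memory text.
Settings loadFromText(const std::string& text) {
    std::istringstream in(text);
    return load(in);
}
```

Reading the version 1 text "1 7" yields volume 7 with the default theme, while version 2 data round-trips unchanged. Hand-written code makes this kind of branching easy; a purely generated layout would need some other way to express it.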
It seems to me that what you need is Boost.Serialization.

Polymorphism versus switch case tradeoffs

I haven't found any clear articles on this, but I was wondering about why polymorphism is the recommended design pattern over exhaustive switch case / pattern matching. I ask this because I've gotten a lot of heat from experienced developers for not using polymorphic classes, and it's been troubling me. I've personally had a terrible time with polymorphism and a wonderful time with switch cases, the reduction in abstractions and indirection makes readability of the code so much easier in my opinion. This is in direct contrast with books like "clean code" which are typically seen as industry standards.
Note: I use TypeScript, so the following examples may not apply in other languages, but I think the principle generally applies as long as you have exhaustive pattern matching / switch cases.
List the options
If you want to know the possible values of an action, this is trivial with an enum and a switch case. For classes it requires some reflection magic.
    // definitely two actions here, I could even loop over them programmatically with basic primitives
    enum Action {
        A = 'a',
        B = 'b',
    }
Following the code
Dependency injection and abstract classes mean that jump to definition will never go where you want
    function doLetterThing(myEnum: Action) {
        switch (myEnum) {
            case Action.A:
                return;
            case Action.B:
                return;
            default:
                exhaustiveCheck(myEnum);
        }
    }
versus
    function doLetterThing(action: BaseAction) {
        action.doAction();
    }
If I jump to definition for BaseAction or doAction I will end up on the abstract class, which doesn't help me debug the function or the implementation. If you have a dependency injection pattern with only a single class, this means that you can "guess" by going to the main class / function and looking for how "BaseAction" is instantiated and following that type to the place and scrolling to find the implementation. This seems generally like a bad UX for a developer though.
(small note about whether dependency injection is good, traits seem to do a good enough job in cases where they are necessary (though either done prematurely as a rule rather than as a necessity seems to lead to more difficult to follow code))
Write less code
This depends, but if you have to define an extra abstract class for your base type, plus override all the function types, how is that less code than single-line switch cases? With good types, if you add an option to the enum, your type checker will flag all the places where you need to handle it, which will usually involve adding one line for the case and one or more lines for the implementation. Compare this with polymorphic classes, where you need to define a new class, which needs the new function syntax with the correct params and the opening and closing parens. In most cases, switch cases need less code and fewer lines.
Colocation
Everything for a type being in one place is nice, but generally whenever I implement a function like this I look for a similarly implemented function. With a switch case it is right next to it; with a derived class I would need to find it in another file or directory.
If I implemented a feature change such as trimming spaces off the ends of a string for one type, I would need to open all the class files to check whether they implement something similar and that it is implemented correctly in all of them. And if I forget, I might have different behaviour for different types without knowing. With a switch, the colocation makes this extremely obvious (though not foolproof).
Conclusion
Am I missing something? It doesn't make sense that we have these clear design principles, which I can basically only find affirmative articles about, when I don't see any clear benefits and do see serious downsides compared to basic pattern-matching style development.
Consider the SOLID principles, in particular OCP (open-closed) and DI (dependency inversion).
To extend a switch case or enum and add new functionality in the future, you must modify the existing code. Modifying legacy code is risky and expensive. Risky because you may inadvertently introduce regression. Expensive because you have to learn (or re-learn) implementation details, and then re-test the legacy code (which presumably was working before you modified it).
Dependency on concrete implementations creates tight coupling and inhibits modularity. This makes code rigid and fragile, because a change in one place affects many dependents.
In addition, consider scalability. An abstraction supports any number of implementations, many of which are potentially unknown at the time the abstraction is created. A developer needn't understand or care about additional implementations. How many cases can a developer juggle in one switch, 10? 100?
Note this does not mean polymorphism (or OOP) is suitable for every class or application. For example, there are counterpoints in "Should every class implement an interface?". When considering extensibility and scalability, there is an assumption that a code base will grow over time. If you're working with a few thousand lines of code, "enterprise-level" standards are going to feel very heavy. Likewise, coupling a few classes together when you only have a few classes won't be very noticeable.
Benefits of good design are realized years down the road when code is able to evolve in new directions.
I think you are missing the point. The main purpose of clean code is not to make your life easier while implementing the current feature; rather, it makes your life easier in the future when you are extending or maintaining the code.
In your example, you may feel comfortable implementing your two actions using a switch case. But what happens if you need to add more actions in the future? Using the abstract class, you can easily create a new action type and the caller doesn't need to be modified. But if you keep using a switch case it will get a lot messier, especially for complex cases.
Also, following a better design pattern (DI in this case) will make the code easier to test. When you consider only easy cases, you may not see the usefulness of proper design patterns. But if you think about the broader picture, it really pays off.
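The extension argument above can be sketched in a few lines. The example is written in C++ rather than the question's TypeScript, and every name in it (`Action`, `ActionA`, `ActionB`, `ActionC`, `runAll`) is hypothetical; it only illustrates that a caller written against an interface does not change when a new implementation appears.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface standing in for the question's BaseAction.
struct Action {
    virtual ~Action() = default;
    virtual std::string doAction() const = 0;
};

struct ActionA : Action {
    std::string doAction() const override { return "a"; }
};

struct ActionB : Action {
    std::string doAction() const override { return "b"; }
};

// The caller depends only on the interface and never lists the concrete types.
std::string runAll(const std::vector<std::unique_ptr<Action>>& actions) {
    std::string out;
    for (const auto& action : actions) {
        out += action->doAction();
    }
    return out;
}

// Added later: a new action type, with no change to runAll above.
struct ActionC : Action {
    std::string doAction() const override { return "c"; }
};
```

The trade-off raised in the question still applies: the set of actions is no longer visible in one place, whereas a switch over an enum lists them all but must be edited for every new case.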
"Base class" goes against Clean Code. There should not be a "base class", not just because of the poor naming but also because of the composition-over-inheritance rule. So from now on I will assume it is an interface that other classes implement rather than extend (which is important for my example). First of all, I would like to address your concerns:
Answer for Concerns
This depends, but if have to define an extra abstract class for your
base type, plus override all the function types, how is that less code
than single line switch cases
I think "write less code" should not be about character count. Otherwise Ruby, Go, or even Python obviously beats Java, doesn't it? So I would not count lines, parentheses, etc., but rather the code that you have to test and maintain.
Everything for a type is in one place which is nice, but generally
whenever I implement a function like this is I look for a similarly
implemented function.
If "look for a similarly implemented function" means that keeping the implementations together lets you copy parts of a similar function, then that is itself a hint that refactoring is needed. Implementations live in separate classes for a reason: they are completely different. They may still follow some pattern. Let's look at it from a communication perspective: if we have Letter and Phone implementations, we should not need to read one of them in order to implement the other. So your assumption is wrong here; if you have to look at their code to implement a new feature, then your interface is not guiding you. Let's be more specific:
    interface Communication {
        void sendMessage();
    }
    class Letter implements Communication {
        @Override
        public void sendMessage() {
            // get receiver
            // get sender
            // set message
            // send message
        }
    }
Now we need Phone. If we have to go to the Letter implementation to get an idea of how to implement Phone, then our interface is not enough to guide our implementation. Technically, Phone and Letter send a message in different ways. So we need a design pattern here, maybe the Template Method pattern? Let's see:
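    // MessageFactory and LetterMessageFactory are assumed to be defined elsewhere.
    interface Communication {
        default void sendMessage() {
            getMessageFactory().sendMessage(getSender(), getReceiver(), getBody());
        }
        MessageFactory getMessageFactory();
        String getSender();
        String getReceiver();
        String getBody();
    }
    class Letter implements Communication {
        private String sender, receiver, body; // set elsewhere, e.g. in a constructor
        public String getSender() { return sender; }
        public String getReceiver() { return receiver; }
        public String getBody() { return body; }
        public MessageFactory getMessageFactory() { return new LetterMessageFactory(); }
    }
Now when we need to implement Phone we don't need to look at the details of other implementations. We know exactly what we need to return, and our Communication interface's default method handles how to send the message.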
If I implemented a feature change such as trimming spaces off the ends
of a string for one type, I would need to open all the class files to
make sure if they implement something similar that it is implemented
correctly in all of them...
So if there is a "feature change", it should affect only the class that implements that feature, not all classes. You should not have to change all of the implementations. And if the implementation is the same in all of them, why does each class implement it separately? It should be kept as a default method in their interface. Then, if a feature change is required, only the default method changes, and you update and test your implementation in one place.
Those are my main answers to your concerns. But I think the real issue is that you haven't yet seen the benefit. I also struggled with this until I worked on a big project where other teams needed to extend my features. I will divide the benefits into topics, with extreme examples that may make them easier to understand:
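As a rough sketch of that point, using the question's trimming example: the shared step lives in one non-virtual function and each type only supplies its own pieces. The example is in C++, where a base class with a non-virtual member plays the role of the interface's default method, and the names `message`, `channel`, `body` and `trim` are made up for the illustration.

```cpp
#include <string>

// Hypothetical C++ counterpart of the Communication interface above: the
// shared step (trimming) lives in the base class, the hooks are per type.
class Communication {
public:
    virtual ~Communication() = default;

    // Shared algorithm; changing the trimming rule here updates every type.
    std::string message() const {
        return channel() + ": " + trim(body());
    }

protected:
    virtual std::string channel() const = 0;
    virtual std::string body() const = 0;

private:
    // Remove leading and trailing spaces.
    static std::string trim(const std::string& s) {
        const auto first = s.find_first_not_of(' ');
        if (first == std::string::npos) return "";
        const auto last = s.find_last_not_of(' ');
        return s.substr(first, last - first + 1);
    }
};

class Letter : public Communication {
protected:
    std::string channel() const override { return "letter"; }
    std::string body() const override { return "  hello  "; }
};

class Phone : public Communication {
protected:
    std::string channel() const override { return "phone"; }
    std::string body() const override { return "hi "; }
};
```

With this layout there is no need to open Letter and Phone to check that both trim consistently, because neither of them trims at all.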
Easy to read
Normally, when you see a function call, you should not feel the need to go to its implementation to understand what is happening there. It should be self-explanatory. Take action.doAction(), or let's say communication.sendMessage() if the classes implement the Communication interface. I don't need to go to the base class or search for implementations in order to debug. Whether the implementing class is "Letter" or "Phone", I know that it sends a message; I don't need its implementation details. So I don't want to see all the implementing classes listed, as in your example ("switch Letter; Phone..."). In your example doLetterThing is responsible for one thing (doAction); since all the cases do the same thing, why show the developer all of them? They just make the code harder to read.
Easy to extend
Imagine that you are extending a big project whose source you don't have access to (I'm using an extreme example to make the benefit easier to see). In the Java world, you would be implementing an SPI (Service Provider Interface). I can show you two examples of this, https://github.com/apereo/cas and https://github.com/keycloak/keycloak, where you can see that interfaces and implementations are separated and you just implement new behavior when it is required, with no need to touch the original source. Why is this important? Consider the following scenario:
Let's suppose that Keycloak calls communication.sendMessage(). It doesn't know the implementations at build time. If you extend Keycloak in this case, you can have your own class that implements the Communication interface, let's say "Computer". Now, if you have your SPI implementation on the classpath, Keycloak reads it and calls your computer.sendMessage(). We did not touch the source code, but we extended the capabilities of the message handler class. We can't achieve this without touching the source if the code is written against switch cases.

Public vs. Private?

I don't really understand why it's generally good practice to make member variables and member functions private.
Is it for the sake of preventing people from screwing with things/more of an organizational tool?
Basically, yes, it's to prevent people from screwing with things.
Encapsulation (information hiding) is the term you're looking for.
By only publishing the bare minimum of information to the outside world, you're free to change the internals as much as you want.
For example, let's say you implement your phone book as an array of entries and don't hide that fact.
Someone then comes along and writes code which searches or manipulates your array without going through your "normal" interface. That means that, when you want to start using a linked list or some other more efficient data structure, their code will break, because it's used that information.
And that's your fault for publishing that information, not theirs for using it :-)
Classic examples are the setters and getters. You might think that you could just expose the temperature variable itself in a class so that a user could just do:
    Location here = new Location();
    int currTemp = here.temp;
But what if you later wanted it to actually web-scrape information from the Bureau of Meteorology whenever you asked for the temperature? If you'd encapsulated the information in the first place, the caller would just be doing:
    int currTemp = here.getTemp();
and you could change the implementation of that method as much as you want. The only thing you have to preserve is the API (function name, arguments, return type and so on).
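A small sketch of that freedom, in C++: two hypothetical versions of the class (kept in namespaces `v1` and `v2` so that they can sit side by side) and one caller, `report`, that compiles unchanged against either. The "service" in `v2` is just a stand-in function, not a real network call.

```cpp
namespace v1 {
// First version: the temperature is simply a stored value.
class Location {
public:
    int getTemp() const { return temp_; }
private:
    int temp_ = 21;
};
}  // namespace v1

namespace v2 {
// Later version: the value comes from somewhere else entirely.
class Location {
public:
    int getTemp() const { return fetchFromService(); }
private:
    // Stand-in for a web lookup; a real one would do I/O here.
    static int fetchFromService() { return 21; }
};
}  // namespace v2

// Caller code: identical for both versions, because it only uses the API.
template <class AnyLocation>
int report(const AnyLocation& here) {
    return here.getTemp();
}
```

Had the caller read a public `temp` field directly, the move to `v2` would have broken it.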
Interestingly, it's not just in code. Certain large companies will pepper their documentation with phrases like:
This technical information is for instructional purposes only and may change in future releases.
That allows them to deliver what the customer wants (the extra information) but doesn't lock them in to supporting it for all eternity.
The main reason is that you, the library developer, have insurance that nobody will be using parts of your code that you don't want to have to maintain.
Every public piece of your code can, and inevitably will get used by your customers. If you later discover that your design was actually terrible, and that version 2.0 should be written much better, then you realise that your paying customers actually want you to preserve all existing functionality, and you're locked in to maintaining backwards compatibility at the price of making better software.
By making as much of your code as possible private, you are unreservedly declaring that this code is nobody's business and that you can and will be able to rewrite it at any time.
It's to prevent people from screwing with things - but not from a security perspective.
Instead, it's intended to allow users of your class to only care about the public sections, leaving you (the author) free to modify the implementation (private) without worrying about breaking someone else's code.
For instance, most programming languages seem to store Strings as a char[] (an array of characters). If for some reason it was discovered that a linked list of nodes (each containing a single character) performed better, the internal implementation using the array could be switched, without (theoretically) breaking any code using the String class.
It's to present a clear code contract to anyone (you, someone else) who is using your object... separate "how to use it" from "how it works". This is known as Encapsulation.
On a side note, at least on .NET (probably on other platforms as well), it's not very hard for someone who really wants access to get to private portions of an object (in .NET, using reflection).
Take the typical example of a counter: the thing the bodyguard at your night club holds in his hand to make his punch harder and to count the people entering and leaving the club.
Now the thing is defined like this:
    public class Counter {
        private int count = 0;
        public void increment()
        {
            count++;
        }
        public void decrement()
        {
            count--;
        }
    }
As you can see, there are no setters/getters for count, because we don't want users (programmers) of this class, to be able to call myCounter.setCount(100), or even worse myCounter.Count -= 10; because that's not what this thing does, it goes up one for everyone entering and down for everyone leaving.
There is a scope for a lot of debate on this.
For example ... If a lot of the .NET Framework were private, this would prevent developers from screwing things up, but at the same time it would prevent devs from using the functionality.
In my personal opinion, I would give preference to making methods public. But I would suggest to make use of the Facade pattern. In simple terms, you have a class that encapsulates complex functionality. For example, in the .net framework, the WebClient is a Facade that hides the complex http request/response logic.
Also ... Keep classes simple ... and you should have few public methods. That is a better abstraction than having large classes with lots of private methods
It is useful to know how an object is 'put together'; have a look at this video on YouTube:
http://www.youtube.com/watch?v=RcZAkBVNYTA&list=PL3FEE93A664B3B2E7&index=11&feature=plpp_video

Is it good convention for a class to perform functions on itself?

I've always been taught that if you are doing something to an object, that should be an external thing, so one would Save(Class) rather than having the object save itself: Class.Save().
I've noticed that in the .Net libraries, it is common to have a class modify itself as with String.Format() or sort itself as with List.Sort().
My question is, in strict OOP is it appropriate to have a class which performs functions on itself when called to do so, or should such functions be external and called on an object of the class' type?
Great question. I have just recently reflected on a very similar issue and was eventually going to ask much the same thing here on SO.
In OOP textbooks, you sometimes see examples such as Dog.Bark(), or Person.SayHello(). I have come to the conclusion that those are bad examples. When you call those methods, you make a dog bark, or a person say hello. However, in the real world, you couldn't do this; a dog decides for itself when it's going to bark. A person decides for themselves when to say hello to someone. Therefore, these methods would more appropriately be modelled as events (where supported by the programming language).
You would e.g. have a function Attack(Dog), PlayWith(Dog), or Greet(Person) which would trigger the appropriate events.
Attack(dog) // triggers the Dog.Bark event
Greet(johnDoe) // triggers the Person.SaysHello event
As soon as you have more than one parameter, it won't be so easy deciding how to best write the code. Let's say I want to store a new item, say an integer, into a collection. There's many ways to formulate this; for example:
StoreInto(1, collection) // the "classic" procedural approach
1.StoreInto(collection) // possible in .NET with extension methods
Store(1).Into(collection) // possible by using state-keeping temporary objects
According to the thinking laid out above, the last variant would be the preferred one, because it doesn't force an object (the 1) to do something to itself. However, if you follow that programming style, it will soon become clear that this fluent interface-like code is quite verbose, and while it's easy to read, it can be tiring to write or even hard to remember the exact syntax.
P.S.: Concerning global functions: In the case of .NET (which you mentioned in your question), you don't have much choice, since the .NET languages do not provide for global functions. I think these would be technically possible with the CLI, but the languages disallow that feature. F# has global functions, but they can only be used from C# or VB.NET when they are packed into a module. I believe Java also doesn't have global functions.
I have come across scenarios where this lack is a pity (e.g. with fluent interface implementations). But generally, we're probably better off without global functions, as some developers might always fall back into old habits, and leave a procedural codebase for an OOP developer to maintain. Yikes.
Btw., in VB.NET, however, you can mimic global functions by using modules. Example:
Globals.vb:
    Module Globals
        Public Sub Save(ByVal obj As SomeClass)
            ...
        End Sub
    End Module
Demo.vb:
    Imports Globals
    ...
    Dim obj As SomeClass = ...
    Save(obj)
I guess the answer is "It Depends"... for Persistence of an object I would side with having that behavior defined within a separate repository object. So with your Save() example I might have this:
repository.Save(class)
However with an Airplane object you may want the class to know how to fly with a method like so:
airplane.Fly()
This is one of the examples I've seen from Fowler about an anemic data model. I don't think in this case you would want to have a separate service like this:
new airplaneService().Fly(airplane)
With static methods and extension methods it makes a ton of sense, as in your List.Sort() example. So it depends on your usage patterns. You wouldn't want to have to new up an instance of a ListSorter class just to be able to sort a list like this:
new listSorter().Sort(list)
In strict OOP (Smalltalk or Ruby), all methods belong to an instance object or a class object. In "real" OOP (like C++ or C#), you will have static methods that essentially stand completely on their own.
Going back to strict OOP, I'm more familiar with Ruby, and Ruby has several "pairs" of methods that either return a modified copy or modify the object in place; a method name ending with a ! indicates that the method modifies its receiver. For instance:
>> s = 'hello'
=> "hello"
>> s.reverse
=> "olleh"
>> s
=> "hello"
>> s.reverse!
=> "olleh"
>> s
=> "olleh"
The key is to find some middle ground between pure OOP and pure procedural that works for what you need to do. A Class should do only one thing (and do it well). Most of the time, that won't include saving itself to disk, but that doesn't mean Class shouldn't know how to serialize itself to a stream, for instance.
I'm not sure what distinction you seem to be drawing when you say "doing something to an object". In many if not most cases, the class itself is the best place to define its operations, as under "strict OOP" it is the only code that has access to internal state on which those operations depend (information hiding, encapsulation, ...).
That said, if you have an operation which applies to several otherwise unrelated types, then it might make sense for each type to expose an interface which lets the operation do most of the work in a more or less standard way. To tie it in to your example, several classes might implement an interface ISaveable which exposes a Save method on each. Individual Save methods take advantage of their access to internal class state, but given a collection of ISaveable instances, some external code could define an operation for saving them to a custom store of some kind without having to know the messy details.
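A sketch of that arrangement, in C++ rather than .NET: `Saveable` plays the role of ISaveable, and `User`, `Order` and `saveAll` are hypothetical names chosen for the example. Each class writes its own private state, while the external function only knows the interface.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical C++ counterpart of the ISaveable interface described above.
struct Saveable {
    virtual ~Saveable() = default;
    virtual void save(std::ostream& out) const = 0;
};

class User : public Saveable {
public:
    explicit User(std::string name) : name_(std::move(name)) {}
    void save(std::ostream& out) const override { out << "user:" << name_ << '\n'; }
private:
    std::string name_;  // private state, reachable only from the member
};

class Order : public Saveable {
public:
    explicit Order(int id) : id_(id) {}
    void save(std::ostream& out) const override { out << "order:" << id_ << '\n'; }
private:
    int id_;
};

// External operation: knows the interface, not the classes behind it.
std::string saveAll(const std::vector<const Saveable*>& items) {
    std::ostringstream out;
    for (const Saveable* item : items) {
        item->save(out);
    }
    return out.str();
}
```

Here the destination is an in-memory string, but `saveAll` could just as well hand each item a file or network stream without the classes changing.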
It depends on what information is needed to do the work. If the work is unrelated to the class (mostly equivalently, can be made to work on virtually any class with a common interface), for example, std::sort, then make it a free function. If it must know the internals, make it a member function.
Edit: Another important consideration is performance. In-place sorting, for example, can be miles faster than returning a new, sorted copy. This is why quicksort is faster than merge sort in the vast majority of cases, even though merge sort has the better theoretical worst case: quicksort can be performed in place, whereas I've never heard of an in-place merge sort. Just because it's technically possible to perform an operation within the class's public interface doesn't mean that you actually should.

What is the point of defining Access Modifiers?

I understand the differences between them (at least in C#). I know the effects they have on the elements to which they are assigned. What I don't understand is why it is important to implement them - why not have everything Public?
The material I read on the subject usually goes on about how classes and methods shouldn't have unnecessary access to others, but I've yet to come across an example of why/how that would be a bad thing. It seems like a security thing, but I'm the programmer; I create the methods and define what they will (or will not) do. Why would I spend all the effort to write a function which tried to change a variable it shouldn't, or tried to read information in another class, if that would be bad?
I apologize if this is a dumb question. It's just something I ran into on the first articles I ever read on OOP, and I've never felt like it really clicked.
"I'm the programmer" is a correct assumption only if you're the only programmer.
In many cases, other programmers work with the first programmer's code. They use it in ways he didn't intend by fiddling with the values of fields they shouldn't, and they create a hack that works, but breaks when the producer of the original code changes it.
OOP is about creating libraries with well-defined contracts. If all your variables are public and accessible to others, then the "contract" theoretically includes every field in the object (and its sub-objects), so it becomes much harder to build a new, different implementation that still honors the original contract.
Also, the more "moving parts" of your object are exposed, the easier it is for a user of your class to manipulate it incorrectly.
You probably don't need this, but here's an example I consider amusing:
Say you sell a car with no hood over the engine compartment. Come nighttime, the driver turns on the lights. He gets to his destination, gets out of the car and then remembers he left the light on. He's too lazy to unlock the car's door, so he pulls the wire to the lights out from where it's attached to the battery. This works fine - the light is out. However, because he didn't use the intended mechanism, he finds himself with a problem next time he's driving in the dark.
Living in the USA (go ahead, downvote me!), he refuses to take responsibility for his incorrect use of the car's innards, and sues you, the manufacturer for creating a product that's unsafe to drive in the dark because the lights can't be reliably turned on after having been turned off.
This is why all cars have hoods over their engine compartments :)
A more serious example: You create a Fraction class, with a numerator and denominator field and a bunch of methods to manipulate fractions. Your constructor doesn't let its caller create a fraction with a 0 denominator, but since your fields are public, it's easy for a user to set the denominator of an existing (valid) fraction to 0, and hilarity ensues.
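The Fraction case can be sketched like this (in C++; the member names are made up for the illustration). With private fields, every path that can change the denominator runs the same check, so the zero-denominator state cannot be reached from outside the class.

```cpp
#include <stdexcept>

class Fraction {
public:
    Fraction(int numerator, int denominator)
        : numerator_(numerator), denominator_(checked(denominator)) {}

    int numerator() const { return numerator_; }
    int denominator() const { return denominator_; }

    // The only way to change the denominator goes through the same check.
    void setDenominator(int denominator) { denominator_ = checked(denominator); }

    double value() const { return static_cast<double>(numerator_) / denominator_; }

private:
    // Rejects zero; returns the value unchanged otherwise.
    static int checked(int denominator) {
        if (denominator == 0) {
            throw std::invalid_argument("denominator must not be zero");
        }
        return denominator;
    }

    int numerator_;
    int denominator_;
};
```

If `denominator_` were public, `f.denominator_ = 0;` would compile and the failure would only show up later, far from the line that caused it.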
First, nothing in the language forces you to use access modifiers - you are free to make everything public in your class if you wish. However, there are some compelling reasons for using them. Here's my perspective.
Hiding the internals of how your class operates allows you to protect that class from unintended uses. While you may be the creator of the class, in many cases you will not be the only consumer - or even maintainer. Hiding internal state protects the class from people who may not understand its workings as well as you do. Making everything public creates the temptation to "tweak" the internal state or internal behavior when the class isn't acting the way you want - rather than actually correcting the public interface or the internal implementation. This is the road to ruin.
Hiding internals helps to de-clutter the namespace, and allows tools like Intellisense to display only the relevant and meaningful methods/properties/fields. Don't discount tools like Intellisense - they are a powerful means for developers to quickly identify what they can do with your class.
Hiding internals allows you to structure an interface appropriate for the problem the class is solving. Exposing all of the internals (which often substantially outnumber the exposed interface) makes it hard to later understand what the class is trying to solve.
Hiding internals allows you to focus your testing on the appropriate portion - the public interface. When all methods/properties of a class are public, the number of permutations you must potentially test increases significantly - since any particular call path becomes possible.
Hiding internals helps you control (enforce) the call paths through your class. This makes it easier to ensure that your consumers understand what your class can be asked to do - and when. Typically, there are only a few paths through your code that are meaningful and useful. Allowing a consumer to take any path makes it more likely that they will not get meaningful results - and will interpret that as your code being buggy. Limiting how your consumers can use your class actually frees them to use it correctly.
Hiding the internal implementation frees you to change it with the knowledge that it will not adversely impact consumers of your class - so long as your public interface remains unchanged. If you decide to use a dictionary rather than a list internally - no one should care. But if you made all the internals of your class available, someone could write code that depends on the fact that you internally use a list. Imagine having to change all of the consumers when you want to change such choices about your implementation. The golden rule is: consumers of a class should not care how the class does what it does.
It is primarily a hiding and sharing thing. You may produce and use all your own code, but other people provide libraries, etc. to be used more widely.
Making things non-public allows you to explicitly define the external interface of your class. The non-public stuff is not part of the external interface, which means you can change anything you want internally without affecting anyone using the external interface.
You only want to expose the API and keep everything else hidden. Why?
OK, let's assume you want to make an awesome Matrix library, so you write
class Matrix {
    public Object[][] data; // the data your matrix stores
    ...
    public Object[] getRow(int row) { return data[row]; }
}
Any other programmer who uses your library will be tempted to maximize the speed of their program by tapping into the underlying structure.
// Someone else's function
Object one(Matrix m) { return m.data[0][0]; }
Now you discover that using a flat list (a one-dimensional array) to emulate the matrix will improve performance, so you change data from
Object[][] data => Object[] data
which causes one() to break. In other words, by changing your implementation you broke backward compatibility :-(
By encapsulating, you separate the internal implementation from the external interface (achieved with the private modifier).
That way you can change implementation as much as possible without breaking backward compatibility :D Profit!!!
Of course, if you are the only programmer who is ever going to modify or use that class, you might as well keep it public.
Note: There are other major benefits for encapsulating your stuff, this is just one of many. See Encapsulation for more details
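Here is a rough sketch of the encapsulated version of the Matrix idea, in Python (where a leading underscore signals "internal" by convention rather than by an enforced private modifier). The method names `get`, `set` and `get_row` are assumptions made for illustration. The storage is already the flat list, and callers that stick to the public methods never notice which layout is in use:

```python
class Matrix:
    """Matrix that exposes rows and cells, not its storage layout."""

    def __init__(self, rows: int, cols: int) -> None:
        self._rows = rows
        self._cols = cols
        # Internal detail: one flat list in row-major order.
        # An earlier version could have used a list of lists instead.
        self._data = [0] * (rows * cols)

    def get(self, row: int, col: int):
        return self._data[row * self._cols + col]

    def set(self, row: int, col: int, value) -> None:
        self._data[row * self._cols + col] = value

    def get_row(self, row: int) -> list:
        start = row * self._cols
        return self._data[start:start + self._cols]


def one(m: Matrix):
    # Someone else's function: it uses only the public interface,
    # so it keeps working when the storage layout changes.
    return m.get(0, 0)


m = Matrix(2, 3)
m.set(0, 0, 7)
print(one(m))
print(m.get_row(0))
```

Switching `_data` back to a list of lists would only require changing the bodies of `get`, `set` and `get_row`; `one` stays untouched.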
I think the best reason for this is to provide layers of abstraction on your code.
As your application grows, you will need to have your objects interacting with other objects. Having publicly modifiable fields makes it harder to wrap your head around your entire application.
Limiting what you make public on your classes makes it easier to abstract your design so you can understand each layer of your code.
For some classes, it may seem ridiculous to have private members with a bunch of methods that just set and get those values. To see why it is worth doing, say you have a class whose members are public and directly accessible:
class A
{
    public int i;
    ....
}
And now you go on using that in a bunch of code you wrote. After writing a bunch of code that directly accesses i, you realize that i should have some constraints on it, like i should always be >= 0 and less than 100 (for argument's sake).
Now, you could go through all of your code where you used i and check for this constraint, but you could instead make i private and add a public property I (or a setter method) that does the check for you:
class A
{
    private int i;

    public int I
    {
        get { return i; }
        set
        {
            // Reject anything outside [0, 100).
            if (value >= 0 && value < 100)
                i = value;
            else
                throw new ArgumentOutOfRangeException(nameof(value));
        }
    }
}
This hides all of that error checking. While the example is trite, situations like these come up quite often.
It is not related to security at all.
Access modifiers and scope are all about structure, layers, organization, and communication.
If you are the only programmer, it is probably fine until you have so much code even you can't remember. At that point, it's just like a team environment - the access modifiers and the structure of the code guide you to stay within the architecture.

Is type checking ever OK?

Is type checking considered bad practice even if you are checking against an interface? I understand that you should always program to an interface and not an implementation - is this what it means?
For example, in PHP, is the following OK?
if($class instanceof AnInterface) {
// Do some code
}
Or is there a better way of altering the behaviour of code based on a class type?
Edit: Just to be clear, I am talking about checking whether a class implements an interface, not just whether it is an instance of a certain class.
As long as you follow the LSP, I don't see a problem. Your code must work with any implementation of the interface. It's not a problem that certain implementations cause you to follow different code paths, as long as you can correctly work with any implementation of the interface.
If your code doesn't work with all implementations of the interface, then you shouldn't use the interface in the first place.
If you can avoid type checking, you should. However, one scenario where I found it handy was a web service that took a message whose contents could vary. We had to persist the message to a database, and to get the right component to break the message down into its proper tables, we used type checking in a sense.
What I find more common and more flexible than if ($class instanceof SomeOtherType) is to define a processing strategy, for example an IProcessor interface, and then use a factory that creates the correct implementation based on the type of $class.
So in C#, roughly this:
void Process(Message msg)
{
    // Pick the processor from the message's runtime type, then delegate.
    IProcessor processor = ProcessingFactory.GetProcessor(msg.GetType());
    processor.Process(msg);
}
However, sometimes doing this can be overkill. If you're only dealing with one variation that won't change, implement it using a type check, and when (or if) you find you were wrong and it requires more checks, refactor it into a more robust solution.
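The factory itself is not shown above, so here is one way it might look, sketched in Python. Everything in it is hypothetical (the message classes, the processors, the `_PROCESSORS` registry and `get_processor`); it only illustrates the idea that the type-to-behaviour mapping lives in a single table instead of in scattered type checks:

```python
from dataclasses import dataclass


@dataclass
class OrderMessage:
    order_id: int


@dataclass
class RefundMessage:
    order_id: int
    amount: float


class OrderProcessor:
    def process(self, msg: OrderMessage) -> str:
        return f"stored order {msg.order_id}"


class RefundProcessor:
    def process(self, msg: RefundMessage) -> str:
        return f"stored refund of {msg.amount} for order {msg.order_id}"


# The only place that knows which processor handles which message type.
_PROCESSORS = {
    OrderMessage: OrderProcessor,
    RefundMessage: RefundProcessor,
}


def get_processor(msg_type: type):
    try:
        return _PROCESSORS[msg_type]()
    except KeyError:
        raise TypeError(f"no processor registered for {msg_type.__name__}") from None


def process(msg) -> str:
    return get_processor(type(msg)).process(msg)


print(process(OrderMessage(1)))
print(process(RefundMessage(2, 5.0)))
```

Supporting a new message type then means adding one processor class and one registry entry, with no change to `process`.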
In my practice any checking for type (as well as type casting) has always indicated that something is wrong with the code or with the language.
So I try to avoid it whenever possible.
Run-time type checking is often necessary in situations where an interface provides all the methods necessary to do something, but does not provide enough to do it well. A prime example of such a situation is determining the number of items in an enumerable sequence. It's possible to make such a determination by enumerating through the sequence, but many enumerable objects "know" how many items they contain. If an object knows how many items it contains, it will likely be more efficient to ask it than to enumerate through the collection and count the items individually.
Arguably, IEnumerable should have provided some methods to ask what it knows about the number of items it contains [recognizing the possibility that the object may know that the number is unbounded, or that it's at most 4,591 (but could be a lot less), etc.], but it doesn't. What might be ideal would be if a new version of the IEnumerable interface could be produced that included default implementations for any "new" methods it adds, and if such an interface could be considered to be implemented by any implementation of the present version. Unfortunately, because no such feature exists, the only way to get the count of an enumerable collection without enumerating it is to check whether it implements any known collection interface that includes a Count member.
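The same pattern can be sketched in Python, where `collections.abc.Sized` plays roughly the role of "a known collection interface with a Count member". The helper name `count_items` is made up for this illustration. It works with any iterable and uses the type check only as a fast path:

```python
from collections.abc import Iterable, Sized


def count_items(items: Iterable) -> int:
    # Fast path: the object already knows its size.
    if isinstance(items, Sized):
        return len(items)
    # Fallback: walk the sequence (this consumes one-shot iterators).
    return sum(1 for _ in items)


print(count_items([1, 2, 3]))               # list: asks the object directly
print(count_items(x for x in range(5)))     # generator: has to enumerate
```

The function stays correct for every implementation of the general interface, which is what keeps this kind of type check within the LSP argument made earlier in the thread.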