Why are the message classes generated by the protocol buffer compiler all immutable? - serialization

The message classes generated by the protocol buffer compiler are immutable. They contain the appropriate getter methods but no setter methods. This constraint does not apply to other serialization technologies like Java binary serialization, XML, JSON, etc.
As per my understanding, immutability is of use while doing concurrent programming. Immutability could be of help in achieving thread-safety. But, I assume, that is not the reason in case of protocol buffer.
What could be the reason for making the message classes immutable?
After reading the protocol buffer documentation, it seems the above applies only to Java (at least) and not to C++ and the other supported platforms/languages.
Note: This question is only to satisfy my curiosity.
Thanks.

The Google implementation does indeed use a builder pattern, i.e. a mutable builder (not very usable as an entity in its own right) that creates an immutable object instance. This is not a requirement; indeed, there are alternative implementations for several platforms that do not use this design pattern. But frankly, it simply isn't an issue: if there is any friction (and what you describe is friction), then you should simply avoid using your DTO types (i.e. the objects used for serialization) as your primary domain entity types. As soon as you do that, it becomes a non-issue: you write your own domain entity types with whatever pattern you like (including any domain logic etc.), and then map to/from the DTO types as and when you need to. The choice of design pattern used by the DTO tier is then a mere uninteresting implementation detail.
But again: for your chosen platform, take a look to see if any alternative implementations might suit your requirements more closely.
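To make the builder-plus-mapping idea concrete, here is a minimal hand-written sketch. It is not protoc output: PersonMessage, Person and all their members are hypothetical names, and the DTO only imitates the newBuilder()/setX()/build() shape that the generated Java classes expose.

```java
class BuilderDemo {
    public static void main(String[] args) {
        PersonMessage received = PersonMessage.newBuilder().setName("Ada").setAge(36).build();
        Person person = Person.fromMessage(received); // map DTO -> domain entity
        person.haveBirthday();                        // mutate freely in the domain layer
        PersonMessage toSend = person.toMessage();    // map back only when serializing
        System.out.println(toSend.getName() + " " + toSend.getAge());
    }
}

// Immutable DTO (hypothetical): getters only, state fixed at build() time.
final class PersonMessage {
    private final String name;
    private final int age;

    private PersonMessage(Builder builder) {
        this.name = builder.name;
        this.age = builder.age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    static Builder newBuilder() { return new Builder(); }

    // The mutable half of the pair; setters return this so calls can be chained.
    static final class Builder {
        private String name = "";
        private int age;

        Builder setName(String name) { this.name = name; return this; }
        Builder setAge(int age) { this.age = age; return this; }
        PersonMessage build() { return new PersonMessage(this); }
    }
}

// Mutable domain entity (hypothetical), written in whatever style the application prefers.
final class Person {
    private final String name;
    private int age;

    Person(String name, int age) {
        this.name = name;
        this.age = age;
    }

    void haveBirthday() { age++; }

    static Person fromMessage(PersonMessage message) {
        return new Person(message.getName(), message.getAge());
    }

    PersonMessage toMessage() {
        return PersonMessage.newBuilder().setName(name).setAge(age).build();
    }
}
```

With this split, the immutability of the DTO never leaks into the domain code; only the two mapping methods know about it.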

Related

Why is Kotlin’s ArrayDeque a concrete class without an interface?

Of course, I can use the whole Java standard library as long as I stay on the JVM, but quite often Kotlin offers equivalent classes, sometimes shadowing Java’s classes (and using them behind my back). As such, Kotlin offers an ArrayDeque to replace Java’s ArrayDeque.
Now, the good old java.util.ArrayDeque, being in line with the other collection classes, implements the interface java.util.Deque (it's actually one of four such implementations in java.util). That is usually considered good practice: if you just care about a Deque, you should not need to know whether it is implemented using an array or a list.
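For reference, this is roughly what the Java side looks like when code depends only on the interface. It is a small sketch; DequeDemo and drainSum are made-up names, while ArrayDeque, LinkedList and Deque are the real java.util types.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

class DequeDemo {
    // Depends only on the Deque interface, not on how the deque is implemented.
    static int drainSum(Deque<Integer> deque) {
        int sum = 0;
        while (!deque.isEmpty()) {
            sum += deque.pollFirst();
        }
        return sum;
    }

    public static void main(String[] args) {
        Deque<Integer> arrayBacked = new ArrayDeque<>();
        Deque<Integer> linkBacked = new LinkedList<>();
        for (int i = 1; i <= 3; i++) {
            arrayBacked.addLast(i);
            linkBacked.addFirst(i);
        }
        System.out.println(drainSum(arrayBacked)); // 6
        System.out.println(drainSum(linkBacked));  // 6
    }
}
```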
However, the new-fangled kotlin.collections.ArrayDeque (where actually the package name is redundant, since it is accessible without) has no interface it implements, apart from MutableList, which is too generic.
I don’t want to trigger opinion-based answers here, even if I can already imagine a few of them. Rather, I wonder if there may exist a hard technical reason to decide against using a strategy used in other places of the Kotlin standard library itself.

When is it okay to depend on concrete classes?

Today I was asked a question I could not find an answer for so here I am, asking for your help!
The Dependency Inversion Principle states that both concrete classes and abstractions should depend on abstractions, right?
Regardless of that, we still depend on framework classes like Integer and String. Is there a good answer as to why that is okay?
I know we should not reinvent the wheel just because it may change ever so slightly, and these particular classes I mentioned will most likely never change in a way their users will notice (their interfaces won't change).
As an additional point, note that the elements you are citing are actually in java.lang, i.e. there is no import statement, and one could say therefore no dependency to anything induced by using these types, besides the fact you use Java.
As soon as you step out of java.lang, I believe you are usually better off applying DIP, i.e. always prefer using a List<T> over using a ArrayList<T>.
The problem is to limit dependencies across borders of "components" (or module/package...) to only functional dependencies (i.e. abstractions, interfaces in Java). At some point you do need concrete implementations, that are necessarily built using at least some data structures, even if it is only arrays of basic int type. This does not violate DIP.
The claim in jaco0646's answer that using Plain Old Data does not violate DIP is kind of borderline; in most cases you could use a signature that explicitly passes the fields of the struct you are considering instead of packing them into a single object. This approach is indeed more general: e.g. you can implement it without having that POD class, maybe relying on some relational DBMS, you can interact with code written in any language, etc.
However in practice, it can make sense to use POD in signatures, so that if I add a field to a POD, this will automatically propagate to all signatures that use the struct. Some of these functions may not use the new field, so we are now giving them too much information (we are leaking, with respect to strict "need to know"). Still, it can be a pragmatic answer in many cases to opt for this approach.
If we look at e.g. web services, there is a general tendency to consider that PODs are not a problem in service signatures, and using them helps keep clients compatible even if some new fields appear in the struct.
In OOP, an object is the encapsulated combination of data and behavior. The data is hidden; the behavior is exposed. It is these objects to which the DIP applies. Ideally, these objects should be instantiated in a Composition Root, which is the only component that depends on the objects' concrete classes.
Obviously something has to depend on the concrete classes in order to instantiate them. This is typically your DI container. The idea is that the container's sole concern is instantiating concrete classes, so everything else can obey the DIP.
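A hand-rolled composition root, without any DI container, might look like the sketch below. Greeter, EnglishGreeter, WelcomeService and CompositionRoot are all hypothetical names chosen for illustration.

```java
class CompositionRoot {
    // The only place that names a concrete class.
    static WelcomeService wire() {
        return new WelcomeService(new EnglishGreeter());
    }

    public static void main(String[] args) {
        System.out.println(wire().welcome("Ada")); // Hello, Ada!
    }
}

// The abstraction that everything else depends on.
interface Greeter {
    String greet(String name);
}

// A concrete implementation, referenced only from the composition root.
final class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// Obeys the DIP: it knows the Greeter interface, never EnglishGreeter.
final class WelcomeService {
    private final Greeter greeter;

    WelcomeService(Greeter greeter) {
        this.greeter = greeter;
    }

    String welcome(String name) {
        return greeter.greet(name) + "!";
    }
}
```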
On the other hand, in contrast to objects, we have primitive data structures. These classes are not (necessarily) encapsulated, expose their data, and have little or no behavior. It is fine to depend on the concrete classes of data structures. These are not "objects" in the OOP sense. The DIP does not apply to data structures. Dependencies on concrete data structures should be local, however, and not exposed outside of the object that owns those dependencies.
Note that you will often see "hybrids" in code: classes that behave as both objects and data structures. They expose both their data and behavior. Hybrids are the worst of both worlds, and whether you apply the DIP to them or not, the bigger problem is that they are trying to serve two opposing purposes and violating encapsulation.

Abstract Data Type vs. non Abstract Data Types (in Java)

I have read a lot about abstract data types (ADTs) and I'm asking myself whether there are non-abstract/concrete data types.
There is already a question on SO about ADTs, but this question doesn't cover "non-abstract" data types.
The definition of ADT only mentions what operations are to be
performed but not how these operations will be implemented
reference
So an ADT hides the concrete implementation from the user and "only" offers a bunch of permissible operations/methods; e.g., the Stack in Java (reference). Only methods like pop(), push(), empty() are visible and the concrete implementation is hidden.
Following this line of argument leads me to the question of whether there is a "non-abstract" data type.
Even a primitive data type like java.lang.Integer has well-defined operations, like +, -, ... and according to Wikipedia it is an ADT.
For example, integers are an ADT, defined as the values …, −2, −1, 0, 1, 2, …, and by the operations of addition, subtraction, multiplication, and division, together with greater than, less than, etc.,
reference
java.lang.Integer is not a primitive type. It is an ADT that wraps the primitive Java type int. The same holds for the other Java primitive types and the corresponding wrappers.
You don't need OOP support in a language to have ADTs. If you don't have such support, you establish conventions for the ADT in the code you write (i.e. you only use it as previously defined by the operations and possible values of the ADT).
That's why ADTs predate the class and object concepts present in OOP languages. They existed before. Statements like class just introduced direct support in the languages, allowing compilers to check what you are doing with the ADTs.
Primitive types are just values that can be stored in memory, without any other associated code. They don't know about themselves or their operations. Their internal representation is known to external actors, unlike that of ADTs, and so are the possible operations: these are manipulations of the values performed from the outside.
Primitive types carry with them, although you don't necessarily see it, implementation details relating to the CPU or virtual machine architecture, because they map to the register sizes the CPU has available and to instructions that the CPU executes directly. Hence the maximum integer value limits, for example.
If I am allowed to say this, the hardware knows your primitive types.
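The value limits mentioned above are easy to observe in Java. The sketch below uses only standard library members (Integer.SIZE, Integer.MAX_VALUE, Math.addExact); the class and helper names are made up.

```java
class PrimitiveLimits {
    // Plain int addition wraps around silently on overflow.
    static int wrappingAdd(int a, int b) {
        return a + b;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static boolean overflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.SIZE);                                           // 32
        System.out.println(wrappingAdd(Integer.MAX_VALUE, 1) == Integer.MIN_VALUE); // true
        System.out.println(overflows(Integer.MAX_VALUE, 1));                        // true
    }
}
```

The fixed 32-bit width is exactly the kind of representation detail that a purely mathematical "integer" ADT would not have.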
So your non-abstract data types are the primitive types of a language, if those types aren't themselves ADTs too. If they happen to be ADTs, you probably have to create them (not just declare them; there will be code setting things up in memory, not only the storage at a certain address), so they have an identity, and they usually offer methods invoked through that identity, that is, they know about themselves.
Because in some languages everything is an object, like in Python, the builtin types (the ones that are readily available with no need to define classes) are sometimes called primitive too, despite not being primitive at all by the above definition.
Edit:
As mentioned by jaco0646, there is more to the words concrete/abstract in OOP.
An ADT is already an abstraction. It represents a category of similar objects you can instantiate from.
But an ADT can be even more abstract, and is referred to as such (as opposed to concrete data types) if you declare it with no intention of instantiating objects from it. Usually you do this because other "concrete" ADTs (the ones you instantiate) inherit from the "abstract" ADT. This allows the sharing and extension of behaviour between several different ADTs.
For example, you can define an API like that, and make one or more different ADTs offer (and respect) that API to their users, just by inheritance.
Abstract ADTs may be defined by you or be available in language types or libraries.
For example, a Python builtin list object is also a collections.abc.Iterable. In Python you can use multiple inheritance to add functionality like that, although there are other ways.
In Java you can't, but you have interfaces instead, and can declare a class to implement one or more interfaces, besides possibly extending another class.
So an ADT definition whose purpose is to be directly instantiated is a concrete ADT. Otherwise it is abstract.
A closely related notion is that of an abstract method in a class.
It is a method you don't fill with code, because it is meant to be filled by children classes that should implement it, respecting its signature (name and parameters).
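In Java, the abstract/concrete distinction and the abstract method could be sketched as follows. Named, Shape and Rectangle are hypothetical names, not library types.

```java
class AbstractDemo {
    public static void main(String[] args) {
        // "new Shape()" would not compile: Shape is abstract.
        Shape shape = new Rectangle(2.0, 3.0);
        System.out.println(shape.describe()); // rectangle with area 6.0
    }
}

// An interface: a purely abstract API that a class can declare it implements.
interface Named {
    String name();
}

// An abstract class: never instantiated directly, shares behaviour with subclasses.
abstract class Shape implements Named {
    // Abstract method: signature only, to be filled in by child classes.
    abstract double area();

    // Shared behaviour built on top of the abstract pieces.
    String describe() {
        return name() + " with area " + area();
    }
}

// A concrete class: meant to be instantiated, so it must supply every missing method.
final class Rectangle extends Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    double area() {
        return width * height;
    }

    @Override
    public String name() {
        return "rectangle";
    }
}
```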
So depending on your language, you will find different (or similar) ways of implementing these concepts.
I agree with the answer from progmatico, but I would add that concrete (non-abstract) data types include more than primitives.
In Java, Stack happens to be a concrete data type, which extends another concrete data type Vector, which extends an ADT AbstractList.
The interfaces implemented by AbstractList are also ADTs: Iterable, Collection, List.
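The hierarchy described above can be checked with a little reflection. This sketch uses only real java.util and java.lang.reflect types; StackHierarchy and isAbstract are made-up names.

```java
import java.lang.reflect.Modifier;
import java.util.AbstractList;
import java.util.List;
import java.util.Stack;
import java.util.Vector;

class StackHierarchy {
    // True for abstract classes and for interfaces (which are implicitly abstract).
    static boolean isAbstract(Class<?> type) {
        return Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        Stack<String> stack = new Stack<>(); // concrete: can be instantiated
        stack.push("a");
        stack.push("b");
        System.out.println(stack.pop());                                             // b

        System.out.println(Stack.class.getSuperclass().equals(Vector.class));        // true
        System.out.println(Vector.class.getSuperclass().equals(AbstractList.class)); // true
        System.out.println(isAbstract(Stack.class));                                 // false
        System.out.println(isAbstract(AbstractList.class));                          // true
        System.out.println(isAbstract(List.class));                                  // true
    }
}
```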

Multimethods vs Interfaces

Are there languages that idiomatically use both notions at the same time? When will that be necessary if ever? What are the pros and cons of each approach?
Background to the question:
I am a novice (with some python knowledge) trying to build a better picture of how multimethods and interfaces are meant to be used (in general).
I assume that they are not meant to be mixed: Either one declares available logic in terms of interfaces (and implements it as methods of the class) or one does it in terms of multimethods. Is this correct?
Does it make sense to speak of a spectrum of OOP notions where:
one starts with naive subclassing (data and logic(methods) and logic implementation(methods) are tightly coupled)
then passes through interfaces (logic is in the interface, data and logic implementation is in the class)
and ends at multimethods (logic is in the signature of the multimethod, logic implementation is scattered, data is in the class(which is only a datastructure with nice handles))?
This answer, to begin, largely derives from my primary experience developing in common-lisp and clojure.
Yes, multimethods do carry some penalty in cost, but offer almost unlimited flexibility in the ability to craft a dispatch mechanism that precisely models whatever you might look to accomplish by their specialization.
Protocols and interfaces, on the other hand, are also involved with some of these same matters of specialization and dispatch, but they work and are used in a very different manner. These are facilities that follow a convention wherein single dispatch provides only a straightforward mapping of one specialized implementation for a given class. The power of protocols and interfaces is in their typical use to define some group of abstract capabilities that, when taken together, fully specify the API for a given concept. For example, a "pointer" interface might contain the 3 or 4 concepts that represent the notion of what a pointer is. So the general interface of a pointer might look like REFERENCE, DEREFERENCE, ALLOCATE, and DISPOSE. Thus the power of an interface comes from its composition of a group of related definitions that, together, express a complete abstraction -- when implementing an interface in a specific situation, it is normally an all-or-nothing endeavor. Either all four of those functions are present, or whatever this thing is does not represent our definition of pointer.
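To illustrate the difference in dispatch for readers coming from a single-dispatch language, here is a Java sketch. Java has no built-in multimethods, so the second half is only a hand-rolled emulation using a lookup table keyed on the runtime classes of both arguments; every name in it is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

class DispatchDemo {
    interface Shape {}
    static final class Circle implements Shape {}
    static final class Square implements Shape {}

    // Overloads are selected at compile time from the static types of the arguments.
    static String collide(Shape a, Shape b) { return "shape/shape"; }
    static String collide(Circle a, Square b) { return "circle/square"; }

    // Emulated multimethod: implementations keyed on the runtime classes of both arguments.
    static final Map<List<Class<?>>, BiFunction<Shape, Shape, String>> TABLE = new HashMap<>();
    static {
        TABLE.put(List.of(Circle.class, Square.class), (a, b) -> "circle/square");
        TABLE.put(List.of(Square.class, Circle.class), (a, b) -> "square/circle");
    }

    static String collideDynamic(Shape a, Shape b) {
        BiFunction<Shape, Shape, String> impl = TABLE.get(List.of(a.getClass(), b.getClass()));
        return impl != null ? impl.apply(a, b) : "shape/shape"; // fall back to the general case
    }

    public static void main(String[] args) {
        Shape a = new Circle();
        Shape b = new Square();
        System.out.println(collide(a, b));        // shape/shape (static types are Shape, Shape)
        System.out.println(collideDynamic(a, b)); // circle/square (runtime classes decide)
    }
}
```

The table lookup is where the extra cost and the extra flexibility both come from: the key could just as well be computed from anything else about the arguments.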
Hope this helped a little.
Dan Lentz

Dealing with "global" data structures in an object-oriented world

This is a question with many answers - I am interested in knowing what others consider to be "best practice".
Consider the following situation: you have an object-oriented program that contains one or more data structures that are needed by many different classes. How do you make these data structures accessible?
1. You can explicitly pass references around, for example, in the constructors. This is the "proper" solution, but it means duplicating parameters and instance variables all over the program. This makes changes or additions to the global data difficult.
2. You can put all of the data structures inside of a single object, and pass around references to this object. This can either be an object created just for this purpose, or it could be the "main" object of your program. This simplifies the problems of (1), but the data structures may or may not have anything to do with one another, and collecting them together in a single object is pretty arbitrary.
3. You can make the data structures "static". This lets you reference them directly from other classes, without having to pass around references. This entirely avoids the disadvantages of (1), but is clearly not OO. This also means that there can only ever be a single instance of the program.
When there are a lot of data structures, all required by a lot of classes, I tend to use (2). This is a compromise between OO-purity and practicality. What do other folks do? (For what it's worth, I mostly come from the Java world, but this discussion is applicable to any OO language.)
Global data isn't as bad as many OO purists claim!
After all, when implementing OO classes you're usually using an API to your OS. What the heck is this if it isn't a huge pile of global data and services!
If you use some global stuff in your program, you're merely extending this huge environment your class implementation can already see of the OS with a bit of data that is domain specific to your app.
Passing pointers/references everywhere is often taught in OO courses and books, academically it sounds nice. Pragmatically, it is often the thing to do, but it is misguided to follow this rule blindly and absolutely. For a decent sized program, you can end up with a pile of references being passed all over the place and it can result in completely unnecessary drudgery work.
Globally accessible services/data providers (abstracted away behind a nice interface obviously) are pretty much a must in a decent sized app.
I must really really discourage you from using option 3 - making the data static. I've worked on several projects where the early developers made some core data static, only to later realise they did need to run two copies of the program - and incurred a huge amount of work making the data non-static and carefully putting in references into everything.
So in my experience, if you do 3), you will eventually end up doing 1) at twice the cost.
Go for 1, and be fine-grained about what data structures you reference from each object. Don't use "context objects", just pass in precisely the data needed. Yes, it makes the code more complicated, but on the plus side, it makes it clearer - the fact that a FwurzleDigestionListener is holding a reference to both a Fwurzle and a DigestionTract immediately gives the reader an idea about its purpose.
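As a rough sketch of what that fine-grained style looks like: the class names Fwurzle, DigestionTract and FwurzleDigestionListener come from the paragraph above, but every field and method below is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FineGrainedDemo {
    public static void main(String[] args) {
        Fwurzle fwurzle = new Fwurzle("fred");
        DigestionTract tract = new DigestionTract();
        // The constructor signature alone tells the reader what this listener works with.
        FwurzleDigestionListener listener = new FwurzleDigestionListener(fwurzle, tract);
        listener.onDigestionStarted();
        System.out.println(tract.size()); // 1
    }
}

final class Fwurzle {
    final String name;

    Fwurzle(String name) {
        this.name = name;
    }
}

final class DigestionTract {
    private final List<String> contents = new ArrayList<>();

    void accept(Fwurzle fwurzle) {
        contents.add(fwurzle.name);
    }

    int size() {
        return contents.size();
    }
}

// Holds exactly the two references it needs; no context object, no globals.
final class FwurzleDigestionListener {
    private final Fwurzle fwurzle;
    private final DigestionTract tract;

    FwurzleDigestionListener(Fwurzle fwurzle, DigestionTract tract) {
        this.fwurzle = fwurzle;
        this.tract = tract;
    }

    void onDigestionStarted() {
        tract.accept(fwurzle);
    }
}
```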
And by definition, if the data format changes, so will the classes that operate on it, so you have to change them anyway.
You might want to think about altering the requirement that lots of objects need to know about the same data structures. One reason there does not seem to be a clean OO way of sharing data is that sharing data is not very object-oriented.
You will need to look at the specifics of your application, but the general idea is to have one object responsible for the shared data which provides services to the other objects based on the data encapsulated in it. However, these services should not involve giving other objects the data structures - merely giving other objects the pieces of information they need to meet their responsibilities and performing mutations on the data structures internally.
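A minimal sketch of that idea, assuming a made-up Inventory class: the map is never handed out, callers only ask questions and request changes.

```java
import java.util.HashMap;
import java.util.Map;

class SharedDataDemo {
    public static void main(String[] args) {
        Inventory inventory = new Inventory();
        inventory.restock("widget", 1);
        System.out.println(inventory.reserve("widget")); // true
        System.out.println(inventory.reserve("widget")); // false
    }
}

// Owns the shared data structure and exposes services instead of the structure itself.
final class Inventory {
    private final Map<String, Integer> stock = new HashMap<>(); // never exposed

    void restock(String item, int quantity) {
        stock.merge(item, quantity, Integer::sum);
    }

    // Gives out a piece of information, not the map.
    boolean isAvailable(String item) {
        return stock.getOrDefault(item, 0) > 0;
    }

    // Performs the mutation internally on behalf of the caller.
    boolean reserve(String item) {
        if (!isAvailable(item)) {
            return false;
        }
        stock.merge(item, -1, Integer::sum);
        return true;
    }
}
```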
I tend to use 3) and be very careful about the synchronisation and locking across threads. I agree it is less OO, but then you confess to having global data, which is very un-OO in the first place.
Don't get too hung up on whether you are sticking purely to one programming methodology or another, find a solution which fits your problem. I think there are perfectly valid contexts for singletons (Logging for instance).
I use a combination of having one global object and passing interfaces in via constructors.
From the one main global object (usually named after what your program is called or does) you can start up other globals (maybe ones that have their own threads). This lets you control the setting up of program objects in the main object's constructor and tearing them down again in the right order when the application stops in this main object's destructor. Using static classes directly makes it tricky to initialize/uninitialize any resources these classes use in a controlled manner. This main global object also has properties for getting at the interfaces of different sub-systems of your application that various objects may want to get hold of to do their work.
I also pass references to relevant data-structures into constructors of some objects where I feel it is useful to isolate those objects from the rest of the world within the program when they only need to be concerned with a small part of it.
Whether an object grabs the global object and navigates its properties to get the interfaces it wants, or gets passed the interfaces it uses via its constructor, is a matter of taste and intuition. Any object you're implementing that you think might be reused in some other project should definitely be passed the data structures it should use via its constructor. Objects that grab the global object should be more to do with the infrastructure of your application.
Objects that receive interfaces they use via the constructor are probably easier to unit-test because you can feed them a mock interface, and tickle their methods to make sure they return the right arguments or interact with mock interfaces correctly. To test objects that access the main global object, you have to mock up the main global object so that when they request interfaces (I often call these services) from it they get appropriate mock objects and can be tested against them.
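Here is a small sketch of the constructor-injection case being tested with a hand-written fake. MessageSink, Notifier and RecordingSink are hypothetical names, and no mocking library is assumed.

```java
import java.util.ArrayList;
import java.util.List;

class MockInjectionDemo {
    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink(); // stands in for the real sub-system
        Notifier notifier = new Notifier(sink);
        notifier.notifyUser("Ada");
        System.out.println(sink.sent); // [Hello, Ada]
    }
}

// The service interface the object under test depends on.
interface MessageSink {
    void send(String message);
}

// Receives its dependency via the constructor, so a test can pass in any implementation.
final class Notifier {
    private final MessageSink sink;

    Notifier(MessageSink sink) {
        this.sink = sink;
    }

    void notifyUser(String user) {
        sink.send("Hello, " + user);
    }
}

// Hand-written fake that records every interaction for later inspection.
final class RecordingSink implements MessageSink {
    final List<String> sent = new ArrayList<>();

    @Override
    public void send(String message) {
        sent.add(message);
    }
}
```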
I prefer using the singleton pattern as described in the GoF book for these situations. A singleton is not the same as any of the three options described in the question. The constructor is private (or protected) so that it cannot be used just anywhere. You use a get() function (or whatever you prefer to call it) to obtain an instance. However, the architecture of the singleton class guarantees that each call to get() returns the same instance.
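In Java, a singleton along those lines could be sketched like this. Settings and its members are hypothetical; the sketch uses eager initialization in a static field, which is one common Java variant rather than the lazily created instance shown in the GoF book.

```java
class SingletonDemo {
    public static void main(String[] args) {
        Settings.get().setLanguage("fr");
        System.out.println(Settings.get().getLanguage());    // fr
        System.out.println(Settings.get() == Settings.get()); // true
    }
}

final class Settings {
    // Created once, when the class is initialized.
    private static final Settings INSTANCE = new Settings();

    private String language = "en";

    // Private constructor: no other class can create a second instance.
    private Settings() {}

    // Every call returns the same instance.
    static Settings get() {
        return INSTANCE;
    }

    String getLanguage() {
        return language;
    }

    void setLanguage(String language) {
        this.language = language;
    }
}
```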
We should take care not to confuse Object Oriented Design with Object Oriented Implementation. All too often, the term OO Design is used to judge an implementation, just as, imho, it is here.
Design
If in your design you see a lot of objects having a reference to exactly the same object, that means a lot of arrows. The designer should feel an itch here. He should verify whether this object is just commonly used, or if it is really a utility (e.g. a COM factory, a registry of some kind, ...).
From the project's requirements, he can see if it really needs to be a singleton (e.g. 'The Internet'), or if the object is shared because it's too general or too expensive or whatsoever.
Implementation
When you are asked to implement an OO Design in an OO language, you face a lot of decisions, like the one you mentioned: how should I implement all the arrows to the oft used object in the design?
That's the point where questions are addressed about 'static member', 'global variable', 'god class' and 'a-lot-of-function-arguments'.
The Design phase should have clarified if the object needs to be a singleton or not. The implementation phase will decide on how this singleness will be represented in the program.
Option 3), while not purist OO, tends to be the most reasonable solution. But I would not make your class a singleton; instead, use some other object as a static 'dictionary' to manage those shared resources.
I don't like any of your proposed solutions:
1. You are passing around a bunch of "context" objects - the things that use them don't specify what fields or pieces of data they are really interested in.
2. See here for a description of the God Object pattern. This is the worst of all worlds.
3. Simply do not ever use Singleton objects for anything. You seem to have identified a few of the potential problems yourself.