What exactly is an "object" of serialization?

"Object" in this context does not mean 'the goal' or 'intent'.
The following description of serialization was quoted, and I am interested in better understanding its use of the word "object":
Serialization is the process of converting an object into a stream of bytes to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
...
This illustration shows the overall process of serialization
...
Through serialization, a developer can perform actions like sending the object to a remote application by means of a Web Service, passing an object from one domain to another, passing an object through a firewall as an XML string, or maintaining security or user-specific information across applications
Does "object" (in the above context) include (but not necessarily limit itself to) any real-world phenomenon that one would like to model? For example:
As a concrete example, is the object of this narrative a Broadway patron, or is it a revenue seat?
A fictional Broadway show has 3 shows every Saturday. Tickets are
valid for a particular show and enumerated seat. The process of
encoding the showtime and serially enumerated seat number defines
a unique ticket. Tickets are encoded with a barcode comprising
said data to measure attendance.
Additional copied concrete example: Explanation via Picture:
Explanation by Analogy:
Suppose I'm talking to my buddy on the phone and I'm telling him about my new puppy.
Here's my problem: the puppy is a living, breathing mammal. How am I meant to convey a puppy over the phone line? I can't physically put my puppy into my phone receiver.
So instead, I'll have to convey a representation of the puppy over the phone. In other words, I then serialize my dog Rex, and I send him the serialized version of Rex over the phone line:
{ "name": "Rex", "age": 5, "favourite_food": "pedigree_choice_cuts", "favourite_game": "fetch_ball", "favourite_hobby": "wagging_tail" }
It's a perfect representation - a serialization of my dog.
Summary:
Serialization basically means transforming my dog Rex into something else - a JSON object - which can then be transported over the phone line as a series of 1s and 0s. My buddy in NYC can then translate those 1s and 0s back into a JSON object - so that he has a perfect representation of my dog Rex. Simple!

It means any kind of composite data, e.g. an aggregate, a structure, a union, an array, or (in object-oriented languages) some object (e.g. an instance of some class, if your language has them; though there are also prototype-based programming languages, such as JavaScript).
Also be aware of endianness. It makes binary data less portable than textual data.
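To make the endianness point concrete, here is a minimal Python sketch using the standard struct module. The value 0x12345678 is just an arbitrary example; the point is that the same integer yields different byte sequences depending on byte order, while a textual encoding does not depend on it.

```python
import struct

value = 0x12345678

# The same 32-bit unsigned integer, packed with explicit byte orders.
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first

# Reading little-endian bytes as if they were big-endian gives a different number.
misread = struct.unpack(">I", little)[0]

# A textual representation has no byte-order ambiguity.
text = str(value).encode("ascii")
roundtrip = int(text.decode("ascii"))
```

This is why binary formats have to fix a byte order (or record which one was used), whereas text formats pay for portability with size and parsing cost.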
Look into several serialization libraries for more. Start with simple things like RPC XDR, most JSON libraries, etc. See also s11n

Related

What do you call an object whose state can be completely described by its string representation?

Is there a name in the OOP world to refer to such objects? For example, in Java:
"Word".toString();
This will return Word, a string representation of the entity that currently exists in the program.
More examples can be found among other data types such as Doubles, Integers, maybe even lists or other data structures.
And there are some more complex ones that cannot be represented in this way, for example a full fledged RESTful service class might not have a string representation of its current state.
What's the right terminology? Native? Immutable? Those last two terms don't really reflect this definition.
To expand on the question:
Imagine you have a function/method that converts a string to a map, a string could be {key1=value1,key2=value2} and you would get a map back, this doesn't work for some complex objects, how would you describe the parameters of this function if you were to generalize its use for other simple object types?
You have an abstract object that consists of internal state.
You have one or more concrete representations of that object's state.
In one case the concrete representation is a chunk of memory containing primitives and references to other component objects on the heap (in Java, other languages may be different).
You have a different representation that is amenable to being stored in a contiguous block of characters or bytes, and possibly transmitted over a network.
Both representations are canonically equivalent given equivalent contexts containing their non-state information (methods, class hierarchy, etc), but they serve different purposes.
Generically, this could be called a "change of representation". When the first representation above is converted to the second it's called "serialization", and the reverse process is "deserialization". Note that you could have many different representations fulfilling different requirements and supporting different functionality.
One important point to note is that in both cases, in-memory and "serialized" (and any other representations), if an object's state contains references to other objects, then the entire "state" consists of that object and all the objects that can be reached from it, and objects reachable from those objects, etc. This is known as an "object graph", and it exists equally in all representations.
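As a rough illustration of the object graph point, here is a Python sketch. The Node class and the serialize_graph / deserialize_graph helpers are hypothetical names made up for this example. Each reachable object is written once and references are stored as indices, so shared references and cycles survive the round trip.

```python
import json

class Node:
    """Hypothetical object whose state includes references to other objects."""
    def __init__(self, name):
        self.name = name
        self.friends = []

def serialize_graph(root):
    ids = {}      # id(node) -> index of its record
    records = []  # one record per reachable node
    def visit(node):
        if id(node) in ids:          # already written: emit a reference only
            return ids[id(node)]
        ids[id(node)] = len(records)
        record = {"name": node.name, "friends": []}
        records.append(record)
        record["friends"] = [visit(friend) for friend in node.friends]
        return ids[id(node)]
    visit(root)
    return json.dumps(records)

def deserialize_graph(text):
    records = json.loads(text)
    nodes = [Node(record["name"]) for record in records]
    for node, record in zip(nodes, records):
        node.friends = [nodes[index] for index in record["friends"]]
    return nodes[0]

# Two objects that reference each other (a cycle).
a, b = Node("a"), Node("b")
a.friends.append(b)
b.friends.append(a)
restored = deserialize_graph(serialize_graph(a))
```

Serializing only `a` necessarily drags `b` along, which is the cost mentioned above: the "state" is everything reachable from the starting object.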
As to deciding which one you should or shouldn't use, that depends totally on your processing requirements.
for example a full fledged RESTful service class might not have a string representation of its current state
This is incorrect: you can always define a serialized representation of an object's state. It may be inconvenient to do so, but if it is required it can be done.
Imagine you have a function/method that converts a string to a map, a string could be {key1=value1,key2=value2} and you would get a map back, this doesn't work for some complex objects
Again, it can always be made to work if it is a requirement, as long as the cost of doing so is justified.
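For the simple case in the question, the two directions might be sketched in Python like this. The function names are hypothetical, and the sketch assumes keys and values are plain strings that contain no "," or "=" characters; handling richer objects would require an escaping scheme or a real format such as JSON.

```python
def map_to_string(mapping):
    # In-memory representation -> flat string representation.
    return "{" + ",".join(f"{key}={value}" for key, value in mapping.items()) + "}"

def string_to_map(text):
    # Flat string representation -> in-memory representation.
    inner = text.strip()[1:-1]  # drop the surrounding braces
    if not inner:
        return {}
    return dict(pair.split("=", 1) for pair in inner.split(","))

parsed = string_to_map("{key1=value1,key2=value2}")
again = map_to_string(parsed)
```

The "cost" for more complex objects is mostly in defining how nested state, types, and special characters are written down, not in whether it is possible.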
In summary, everything is a representation, and you can arrange to transform one representation to any other and back again, without loss, assuming you're willing to incur the costs of doing so. As mentioned above, one factor is the cost of representing not just the single object, but the entire object graph, which can be substantial.

Write/read class objects to/from file, D-Lang

I'm trying to write/read a class object from/to a file.
I'm new to D and I just want to play a little bit around with it.
Is there a Class/Function to write/read an object to/from a file?
I'm looking for something similar to the ObjectOutputStream class in Java.
Or do I have to serialize (concatenate) the object's variables as strings in the file?
I have a Movie class and a MovieManager class, which contains a dynamic movie-array.
A Movie object contains just a few strings and integer values.
To extend the answer given in the comments, it is worth stating explicitly that D does not provide "one true way" of reading/writing objects to/from files, as there can't be a single optimal one. Different considerations about speed, the resulting file format, handling of references, and similar corner cases may result in different serialization strategies.
That being said, a proper serialization library is most likely what you need and, by lucky chance, one of the most mature D solutions ("Orange" by Jacob Carlborg, https://github.com/jacob-carlborg/orange) is being reviewed right now as a candidate for inclusion into the standard library as std.serialization: newsgroup thread. It may be your best bet.
The library Unmanaged provides a serialization system. You also have Orange,
which is less restrictive, as Unmanaged serialization only works if the object to serialize derives from one of the framework's base classes. But Unmanaged works on the "accessor" principle: the serialized data are read via a method and the deserialized data are set via a method, which allows some things to be updated when the deserializer calls back, for example...

What design pattern is used by IProject.setDescription in Eclipse

I'm designing an API with a specific pattern in mind, but don't know if this pattern has a name. It's similar to the Command pattern in GoF (Gang of Four) but not exactly.
One simple example of it I can find is in Eclipse where you manipulate a project (IProject), not by calling methods on the project that change its state, but by this 3 step process:
1. extracting its state into a descriptor object (IProjectDescription) with getDescription
2. setting properties on the descriptor, e.g. setName
3. applying the descriptor back to the original project with setDescription
The general principle seems to be that you have a complex object as part of a framework with many potentially interdependent properties, and rather than working directly on that object, one property at a time, you extract the properties into a simple data object, manipulate that, and apply it back.
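To show the shape of this extract / modify / apply cycle, here is a minimal Python sketch. Project and ProjectDescription are hypothetical stand-ins, not the Eclipse API, and the validation rules are invented purely to show the all-or-nothing apply step.

```python
from dataclasses import dataclass, field

@dataclass
class ProjectDescription:
    """Simple data object holding a snapshot of the project's properties."""
    name: str
    natures: list = field(default_factory=list)

class Project:
    def __init__(self, name):
        self._name = name
        self._natures = []

    def get_description(self):
        # Hand out a copy, so editing the descriptor does not touch the project.
        return ProjectDescription(self._name, list(self._natures))

    def set_description(self, description):
        # Validate everything first; a bad descriptor leaves the project unchanged.
        if not description.name:
            raise ValueError("project name must not be empty")
        if len(set(description.natures)) != len(description.natures):
            raise ValueError("duplicate natures")
        self._name = description.name
        self._natures = list(description.natures)

project = Project("demo")
description = project.get_description()   # step 1: extract
description.name = "renamed"              # step 2: modify the descriptor
description.natures.append("java")
project.set_description(description)      # step 3: apply back in one call
```

Because interdependent properties arrive together, the framework object can check them as a whole rather than after each individual setter.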
It has some of the attributes of the Command pattern, in that the data object encapsulates all of the changes like a Command would - but it's not really a Command, because you don't execute it on the object, it's simply a representation of the state of the object.
It also has some attributes of a Transactional API, in that, by making the changes all in one hit with the set... call, you allow for the entire modification to effectively "roll back" if any one property change fails. But while that's an advantage of the approach, it's not really the main purpose of it. And what's more, you can achieve the transactional nature without this approach, by simply adding transactional methods to the API (like commit and rollback).
There are two advantages in this pattern that I do want to exploit, although I don't see them being exploited by the Eclipse example above:
1. You can represent the meaningful state of the underlying object while its implementation changes. This is useful for upgrading, or copying state from different types of representations. Say I release a new version of my API where I create an object Foo2 which is a totally new form of my old Foo1, but both have the same basic properties. To upgrade a Foo1 to a Foo2, I can extract those properties as a FooState: foo2.setFooState(foo1.getFooState()), as simply as that. The way in which the properties are interpreted and represented is encapsulated in the Foos and can be totally different.
2. I can persist and transmit the state of the underlying object with my simple data object, where persisting the object itself would be much more complex. So I can extract the state of Foo as a FooState, and persist it as a simple XML document, then later apply it to some new object by "loading" it and applying it. Or I can transmit the FooState simply to a webservice as a JSON object, whereas the Foo itself is too big and complex to transmit. (Or the objects on each end of the service call are entirely different, like Foo1 and Foo2.)
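Both advantages can be sketched in a few lines of Python. The Foo1, Foo2, and FooState names come from the description above, but their fields (title, size) and method names are assumptions chosen only for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class FooState:
    """Plain data object: the meaningful state, independent of implementation."""
    title: str
    size: int

class Foo1:
    def __init__(self, title, size):
        self.title = title
        self.size = size

    def get_foo_state(self):
        return FooState(self.title, self.size)

class Foo2:
    """A newer implementation that stores the same properties differently."""
    def __init__(self):
        self._fields = {}

    def get_foo_state(self):
        return FooState(self._fields["title"], self._fields["size"])

    def set_foo_state(self, state):
        self._fields = {"title": state.title, "size": state.size}

foo1 = Foo1("report", 3)

# Advantage 1: upgrade by copying state between unrelated implementations.
foo2 = Foo2()
foo2.set_foo_state(foo1.get_foo_state())

# Advantage 2: persist or transmit only the small state object (JSON here).
saved = json.dumps(asdict(foo1.get_foo_state()))
foo3 = Foo2()
foo3.set_foo_state(FooState(**json.loads(saved)))
```

Only FooState needs a stable, serializable shape; Foo1 and Foo2 remain free to change their internals.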
Anyway, I can't find a name or example of this pattern anywhere, neither in the Gang of Four design patterns, nor even in Martin Fowler's comprehensive "bliki".
The Data Transfer Object (DTO) that Martin Fowler describes in his book Patterns of Enterprise Application Architecture seems to serve the purpose you describe in point 2.
A DTO is a fairly simple extraction of the more complex Domain Model that it represents.
Fowler describes how a DTO, used in combination with an assembler, can be kept independent of the actual Domain Object (or Objects) that it is supposed to represent. The assembler knows how to create a DTO from the Domain Object and vice versa. He also mentions that the DTO needs to be serializable to persist/transmit its state. What you describe in point 2 seems to match this description.
What you've described in point 1 though does not seem to be an intended purpose, but definitely seems achievable using this pattern.
I'm not sure if you went through the Pattern catalog of his book or the book itself. The book itself describes this in much greater detail.
You may also want to have a look at Transfer Object definition from Oracle which Fowler says here is what he describes as DTO.
Not every design is documented as a single Design Pattern; in fact, most system designs are combinations of multiple patterns.
However, one part of what you're doing with IProjectDescription is using a Memento, although yours seems to be a polymorphic variation. Consider patterns as they appear in pattern catalogues to be pared down to the essential starting point, not the end result. Patterns are by their very nature supposed to be extended and combined.
The Command pattern can give you Commit and RollBack (Do/Undo) and combining it with Memento in that way is a quite common approach. The same thing is seen in the Java Servlet API with HttpRequest & HttpResponse.

Why do we use serialization?

Why do we need to use serialization?
If we want to send an object or piece of data through a network we can use streams of bytes. If we want to save some data to the disk, we can again use the binary mode along with the byte streams and save it.
So what's the advantage of using serialization?
Technically, at the low level, your serialized object will also end up as a stream of bytes on your cable or your filesystem...
So you can also think of it as a standardized and already available way of converting your objects to a stream of bytes. Storing/transferring objects is a very common requirement, and there is little point in reinventing this wheel in every application.
As others have mentioned, you also know that this object->stream_of_bytes implementation is quite robust, tested, and generally architecture-independent.
This does not mean it is the only acceptable way to save or transfer an object: in some cases, you'll have to implement your own methods, for example to avoid saving unnecessary/private members (for example for security or performance reasons). But if you are in a simple case, you can make your life easier by using the serialization/deserialization of your framework, language or VM instead of having to implement it by yourself.
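As an example of the "implement your own methods" case, here is a small Python sketch. The Account class, its fields, and the to_bytes / from_bytes names are hypothetical. It writes only the fields that should leave the process and skips a sensitive member and a cache.

```python
import json

class Account:
    def __init__(self, user, password=None, cache=None):
        self.user = user
        self.password = password   # sensitive: should never be written out
        self.cache = cache or {}   # derived data: not worth storing

    def to_bytes(self):
        # Explicitly choose which members become part of the serialized form.
        return json.dumps({"user": self.user}).encode("utf-8")

    @classmethod
    def from_bytes(cls, data):
        fields = json.loads(data.decode("utf-8"))
        return cls(fields["user"])

data = Account("ann", password="secret", cache={"hits": 10}).to_bytes()
restored = Account.from_bytes(data)
```

Most built-in serialization mechanisms offer hooks for the same purpose, so hand-written code like this is only needed when those hooks do not fit.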
Hope this helps.
Quoting from the book Designing Data-Intensive Applications:
Programs usually work with data in (at least) two different
representations:
In memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for
efficient access and manipulation by the CPU (typically using
pointers).
When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes
(for example, a JSON document). Since a pointer wouldn’t make sense to
any other process, this sequence-of-bytes representation looks quite
different from the data structures that are normally used in memory.
Thus, we need some kind of translation between the two
representations. The translation from the in-memory representation to
a byte sequence is called encoding (also known as serialization or
marshalling), and the reverse is called decoding (parsing,
deserialization, unmarshalling).
Among other reasons, to be compatible between architectures. An integer doesn't have the same number of bytes from one architecture to another, and sometimes from one compiler to another.
Plus what you're talking about is still serialization. Binary Serialization. You're putting all the bytes of your object together in order to store them and be able to reconvert them as an object later. This is serializing.
More info on Wikipedia
Serialization is the process of converting an object into a stream so that it can be saved in a physical file (such as XML) or in a database. The main purpose of serialization in C# is to persist an object and save it in a specified storage medium such as a stream, physical file, or database.
In general, serialization is a method to persist an object's state, but I suggest you read this wiki page; it is pretty detailed and correct in my opinion:
http://en.wikipedia.org/wiki/Serialization
In serialization, the point is not turning an object into bits and bytes; objects ARE bits and bytes already. Serialization is the process of making the object's "state" persistent. Notice the word "state", which means the values of the instance variables of the entire object graph (the target object and all the objects it references either directly or indirectly) WITHOUT methods and other extra runtime stuff stuck to them (plus, of course, a little more info that the JVM needs to restore these objects, such as their class types).
So this is the main reason it is necessary: storing the whole bytes of objects would be expensive and, for all intents and purposes, unnecessary.

Serialization vs. Archiving?

The iOS docs differentiate between "serializing" and "archiving." Is this a general distinction (i.e., holds in other languages) or is it specific to Objective-C? Also, what is the difference between these two?
This is a case of one being the other some (but not all) of the time.
Wikipedia has this to say about serialization:
"Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be "resurrected" later in the same or another computer environment"
So, archiving may only be serialization, but it could also be the combination of serialization and compression, for example. Or perhaps it adds some kind of header info. So serialization is a form of archive, but an archive is not necessarily a serialization.
This isn't really specific to iOS - these terms are thrown around all over. Their specific meaning in the context of iOS could be quite specific, though.
I was actually looking for their difference from an iOS perspective. Adding the following for people who are interested:
Purpose:
Archiving is used to store object graphs. A complete data model can be archived and restored easily. The way Nib files work can be considered an example of archiving.
Serialization is used for storing an arbitrary hierarchy of objects.
The way plist files work can be considered an example of serialization.
Differences (excerpts from the Archives programming guide):
"The archive preserves the identity of every object in the graph and all the relationships it has with all the other objects in the graph."
Every object encoded within the context of rootObject invocation is tracked. If the coder is asked to encode an object more than once, the coder encodes a reference to the first encoding instead of encoding the object again.
"The serialization only preserves the values of the objects and their position in the hierarchy. Multiple references to the same value object might result in multiple objects when deserialized. The mutability of the objects is not maintained."
Implementation differences:
Any object that implements the NSCoding protocol can be archived, whereas only instances of NSArray, NSDictionary, NSString, NSDate, NSNumber, and NSData (and some of their subclasses) can be serialized. The contents of array and dictionary objects must also contain only objects of these few classes.
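The property-list behaviour quoted above can be observed outside of Cocoa as well. The sketch below uses Python's standard plistlib module, which is a separate implementation from Apple's classes, so treat it as an analogy rather than a demonstration of the iOS API. It shows a shared dictionary coming back as two separate objects, and an immutable tuple coming back as a mutable list.

```python
import plistlib

shared = {"city": "Oslo"}
original = {
    "home": shared,
    "work": shared,        # same object referenced twice
    "tags": ("a", "b"),    # immutable sequence
}

# XML property list: only values and their position in the hierarchy are stored.
data = plistlib.dumps(original, fmt=plistlib.FMT_XML)
restored = plistlib.loads(data)

same_before = original["home"] is original["work"]   # identity in memory
same_after = restored["home"] is restored["work"]    # identity after the round trip
equal_after = restored["home"] == restored["work"]   # values are still equal
tags_type = type(restored["tags"])
```

An archiver, by contrast, records each object once and writes references to it, which is what allows identity and relationships to be restored.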
When to Use:
Property lists (serialization) should be used for data that consists primarily of strings and numbers. They are very inefficient when used with large blocks of binary data.
Archiving is worthwhile for objects other than plist objects, or for storing large blocks of data.
Generally speaking, serialization is concerned with converting your program's data types into architecture-independent byte streams. Archiving is specialized serialization in that you can store type and other relationship-based information that allows you to unserialize/unmarshal easily. So archiving can be thought of as a specialization and subset of serialization. For Objective-C:
Serialization converts Objective-C
types to and from an
architecture-independent byte stream.
In contrast to archiving, basic
serialization does not record the data
type of the values nor the
relationships between them; only the
values themselves are recorded. It is
your responsibility to deserialize the
data in the proper order. Several
convenience classes, however, do
provide the ability to serialize
property lists, recording their
structure along with their values.
With C++ boost serialization --
http://www.boost.org/doc/libs/1_45_0/libs/serialization/doc/index.html
Here, we use the term "serialization"
to mean the reversible deconstruction
of an arbitrary set of C++ data
structures to a sequence of bytes.
Such a system can be used to
reconstitute an equivalent structure
in another program context. Depending
on the context, this might used
implement object persistence, remote
parameter passing or other facility.
In this system we use the term
"archive" to refer to a specific
rendering of this stream of bytes.
This could be a file of binary data,
text data, XML, or some other created
by the user of this library.