What does Serializing a graph mean? - serialization

I have seen the expression "Graph Serialization" in many places. What does it mean? And what does serialization mean in general, when is it used, and in which domains does it come up?

Serialization is the process of turning a data set into binary data for transmission or storage. On the iPhone for example, we do this:
NSString *myStringToSerialize = @"I'm going to be bits!";
NSData *data = [myStringToSerialize dataUsingEncoding: NSUnicodeStringEncoding];
The data object is now a binary representation of myStringToSerialize that we can do something with (POST it to a web server, save it to a file, email it, etc.).
Graph Serialization is when you take the graph structure and write it to bits so that you can send it somewhere and read it again.
We normally serialize for two reasons:
1) Serialization provides:
   - A method of persisting objects which is more convenient than writing their properties to a text file on disk, and re-assembling them by reading this back in.
   - A method of issuing remote procedure calls, e.g., as in SOAP.
   - A method for distributing objects, especially in software componentry such as COM, CORBA, etc.
   - A method for detecting changes in time-varying data.
2) Serialization allows us to transfer objects between programming languages and various systems that would not be interoperable without serialization.

Serialization is used to flatten a complex structure into something that can be easily transmitted or stored. Every application uses objects that can represent a functional structure (List, Tree, Graph).
But problems come when you have to use them outside your application. How, for instance, will you save your fabulous customer list once you have edited it? How can you provide a temperature graph through a web service? Think of putting them in a linear structure, like an array of bytes, a string, or a database field.
For example, an XML file is the result of serializing a tree.
Graph serialization is about serializing graphs. The big issue with this type of content is that graphs are harder to crawl. Unlike trees, they can contain cycles, so you can loop through the same nodes forever, and they are harder to represent in a linear way.
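As a rough illustration of the point about cycles, here is a minimal Python sketch that flattens a small cyclic graph into a JSON string and rebuilds it. The function names and the {"nodes": ..., "edges": ...} layout are made up for this example, and it assumes every neighbor also appears as a key of the adjacency dictionary. Nodes get integer IDs and edges refer to those IDs, so a cycle is just another pair of numbers and nothing has to be traversed recursively.

```python
import json

def serialize_graph(adjacency):
    """Flatten {node_name: [neighbor_names]} into a JSON string."""
    # Give every node a stable integer ID so edges can refer to nodes by number.
    names = sorted(adjacency)
    ids = {name: i for i, name in enumerate(names)}
    # Each edge is a pair of IDs; a cycle needs no special treatment.
    edges = [[ids[src], ids[dst]] for src in names for dst in adjacency[src]]
    return json.dumps({"nodes": names, "edges": edges})

def deserialize_graph(text):
    """Rebuild the adjacency dictionary from the JSON string."""
    data = json.loads(text)
    names = data["nodes"]
    adjacency = {name: [] for name in names}
    for src, dst in data["edges"]:
        adjacency[names[src]].append(names[dst])
    return adjacency

graph = {"a": ["b"], "b": ["c"], "c": ["a"]}  # a -> b -> c -> a (a cycle)
wire = serialize_graph(graph)
restored = deserialize_graph(wire)
```

The string in wire can be written to a file or sent over a network, and restored ends up with the same structure as graph.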

Related

Can Protocol Buffer be partially serialized?

Originally, the program saves the data to file by its own defined behavior. First, the data is defined as following:
struct Data {
    DWORD m_Location;
    BYTE m_StableCount;
    BYTE extra[3]; /* nice 4 byte divisible value */
    // the following data is not stored in the file
    DWORD m_Uid;
    WORD m_Address;
};
The fields before m_Uid are stored in the file; the others are NOT.
Now, I want to convert Data into a protocol buffer message. As far as I know, all fields defined in the message can be serialized. So I would have to split Data into two parts: one including all saved fields, the other including the remaining fields.
Here is my question: What if I declare all fields of Data in one message, and only serialize some of the fields with protocol buffers? Does any API support this or NOT?
Thanks in advance.
This largely depends on what library you are using. A lot of protocol buffers implementations work as code-gen from the schema, and you have to use the generated DTO - so you would already need to push the data into a different object model. That is an implementation detail, though - it isn't a protocol requirement. For example, protobuf-net allows your existing model to be used, and makes it possible to ignore/include values both generally, and specifically (i.e. it allows per-instance conditional serialization, using the standard conventions of the .NET world for such things). However, I'm assuming that your question relates specifically to non-.NET code, in which case the challenge would be to find a C/C++ library that allows for this approach.
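To make the persisted/transient split concrete, here is a small Python sketch of the original hand-rolled behavior. It is not the Protocol Buffers API. The class name, method names and the little-endian layout are assumptions for illustration. Only the first three fields of Data are written; m_Uid and m_Address stay in memory and come back as defaults after loading.

```python
import struct

# Layout of the persisted part only: DWORD, BYTE, 3 padding bytes.
# "<" means little-endian with no implicit alignment (an assumption here).
PERSISTED = struct.Struct("<IB3s")

class Data:
    def __init__(self, location, stable_count, uid=0, address=0):
        self.location = location          # m_Location, stored
        self.stable_count = stable_count  # m_StableCount, stored
        self.uid = uid                    # m_Uid, runtime only
        self.address = address            # m_Address, runtime only

    def to_bytes(self):
        # Only the stored fields are packed; uid and address are skipped.
        return PERSISTED.pack(self.location, self.stable_count, b"\x00" * 3)

    @classmethod
    def from_bytes(cls, raw):
        location, stable_count, _padding = PERSISTED.unpack(raw)
        return cls(location, stable_count)

record = Data(location=0x1000, stable_count=7, uid=42, address=9)
blob = record.to_bytes()
loaded = Data.from_bytes(blob)
```

With a code-generating protobuf library, the equivalent would most likely be a message that declares only the stored fields, with the runtime-only fields kept in the surrounding object.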

Why do we use serialization?

Why do we need to use serialization?
If we want to send an object or piece of data through a network we can use streams of bytes. If we want to save some data to the disk, we can again use the binary mode along with the byte streams and save it.
So what's the advantage of using serialization?
Technically on the low-level, your serialized object will also end up as a stream of bytes on your cable or your filesystem...
So you can also think of it as a standardized and already available way of converting your objects to a stream of bytes. Storing/transferring objects is a very common requirement, and there is little point in reinventing this wheel in every application.
As others have mentioned, you also know that this object->stream_of_bytes implementation is quite robust, tested, and generally architecture-independent.
This does not mean it is the only acceptable way to save or transfer an object: in some cases, you'll have to implement your own methods, for example to avoid saving unnecessary/private members (for example for security or performance reasons). But if you are in a simple case, you can make your life easier by using the serialization/deserialization of your framework, language or VM instead of having to implement it by yourself.
Hope this helps.
Quoting from the book Designing Data-Intensive Applications:
Programs usually work with data in (at least) two different
representations:
In memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for
efficient access and manipulation by the CPU (typically using
pointers).
When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes
(for example, a JSON document). Since a pointer wouldn’t make sense to
any other process, this sequence-of-bytes representation looks quite
different from the data structures that are normally used in memory.
Thus, we need some kind of translation between the two
representations. The translation from the in-memory representation to
a byte sequence is called encoding (also known as serialization or
marshalling), and the reverse is called decoding (parsing,
deserialization, unmarshalling).
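A tiny Python sketch of the two representations described in the quote, using JSON as the byte format (the customer record is made up for the example):

```python
import json

# In-memory representation: a dict holding a list, linked by pointers.
customer = {"name": "Ada", "orders": [101, 102]}

# Encoding (serialization): a self-contained sequence of bytes.
encoded = json.dumps(customer).encode("utf-8")

# Decoding (deserialization): a new, equal structure built from the bytes.
decoded = json.loads(encoded.decode("utf-8"))
```

Here encoded is a plain bytes value that could go to a file or a socket, and decoded is an equal but separate object.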
Among other reasons: to be compatible across architectures. An integer doesn't have the same number of bytes from one architecture to another, and sometimes from one compiler to another.
Plus what you're talking about is still serialization. Binary Serialization. You're putting all the bytes of your object together in order to store them and be able to reconvert them as an object later. This is serializing.
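As a small illustration of why dumping raw memory is not portable, the Python sketch below packs the same integer with an explicit width and byte order. A serialization format pins both down, whereas raw memory depends on the machine.

```python
import struct

value = 1

# The same 32-bit unsigned integer in the two common byte orders.
little = struct.pack("<I", value)  # b"\x01\x00\x00\x00"
big = struct.pack(">I", value)     # b"\x00\x00\x00\x01"

# Reading little-endian bytes with the wrong byte order gives a wrong number.
misread = struct.unpack(">I", little)[0]  # 16777216, not 1
```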
More info on Wikipedia.
Serialization is the process of converting an object into a stream so that it can be saved in a physical file (such as XML) or in a database. The main purpose of serialization in C# is to persist an object and save it in a specified storage medium such as a stream, a physical file, or a database.
In general, serialization is a method to persist an object's state, but I suggest you read this wiki page; it is pretty detailed and correct in my opinion:
http://en.wikipedia.org/wiki/Serialization
In serialization, the point is not turning an object into bits and bytes; objects ARE bits and bytes already. Serialization is the process of making the object's "state" persistent. Notice the word "state", which means the values of the instance variables of the entire object graph (the target object and all the objects it references either directly or indirectly) WITHOUT methods and other extra runtime stuff stuck to them (plus, of course, a little more info that the JVM needs to restore these objects, such as their class types).
So this is the main reason it is necessary: storing all the bytes of the objects would be expensive and, for all intents and purposes, unnecessary.

Save MKOverlay objects in Core Data

I have a large MKOverlay that I would like to be saved in Core Data so that I don't have to create it later. Since this isn't one of the types that you can choose in Core Data, how do I go about saving it?
Do I need to somehow encode it first?
Do I then need to decode it when using?
What kind of object do I select in Core Data when creating a new property?
Thanks guys.
If you do not need to query for different overlays and you're not using Core Data elsewhere in your project, then you're probably better off caching the overlay on disk as an encoded NSArray.
However, if you're already using Core Data or you're caching multiple overlays then you can encode/decode the overlay in a field of type NSData. Add additional fields to the entity so you can query for the specific overlay you're looking for.
In iOS 5, you can enable optional storage of NSData fields in an external file by selecting the "Allows External Storage" option. Core Data will apply a size-based heuristic to determine if a blob or external file will result in better performance.
MKOverlay conforms to NSCoding, so you can encode and decode an entire array of MKOverlay objects using an encode method of NSKeyedArchiver and store the result in a binary field in your entity. You'll likely want + (NSData *)archivedDataWithRootObject:(id)rootObject on NSKeyedArchiver and + (id)unarchiveObjectWithData:(NSData *)data on NSKeyedUnarchiver
See the Archives section in the Archives and Serializations Programming guide for details of creating a keyed archive at: http://developer.apple.com/library/ios/#documentation/Cocoa/Conceptual/Archiving/Articles/archives.html
You can write a custom accessor for the entity's binary field that encodes and decodes the overlay array for you. Another option is to create a value transformer that encapsulates the encoding and decoding operations. The end result would be an overlays array property that you can set and read via entity.overlays.
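The accessor idea is not specific to Core Data. As a language-neutral sketch in Python (the OverlayRecord class and its field names are hypothetical, and pickle stands in for the keyed archiver), the binary field holds the encoded bytes while a property hides the encode/decode step:

```python
import pickle

class OverlayRecord:
    """Stand-in for an entity with one binary field."""

    def __init__(self):
        self.overlays_blob = b""  # what would be stored in the binary attribute

    @property
    def overlays(self):
        # Decode on read; an empty blob means nothing has been stored yet.
        return pickle.loads(self.overlays_blob) if self.overlays_blob else []

    @overlays.setter
    def overlays(self, value):
        # Encode on write so callers never touch the raw bytes.
        self.overlays_blob = pickle.dumps(value)

record = OverlayRecord()
# Each overlay is modeled here as a list of (latitude, longitude) pairs.
record.overlays = [[(37.33, -122.03), (37.34, -122.02)]]
```

Callers only ever see record.overlays, which mirrors the entity.overlays property described above.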
I believe you can use Apple's NSCoding libraries to convert the object to and from a serialized state. Core Data may support saving objects, but NSCoding lets you save any class that implements it anywhere, including a string sent to a server, a file written to disk, or, if you're as bad a programmer as me, an NSUserDefaults entry.
edit- You may have to implement NSCoding into your own class based on MKOverlay by adding read and write methods, I'm uncertain.
Why not instead save the properties (size, color, coordinates, etc. can all be described with NSNumbers, and those can be stored in Core Data natively) and recreate the MKOverlay when needed? I think that's a much more efficient approach, to be honest. I'm not sure how much of an impact creating an object has, so prove me wrong if I'm wrong.
You need to take the large dataset that composes the overlay and turn those individual data nodes into NSManagedObjects to be stored in CoreData.
I mean, you probably COULD just NSCoder the entire thing into one giant datablob, but at that point, you might as well just write the thing to a flat file (which frankly might be better if all you want to do is read/write it without changing it).
Don't use Core Data unless you're going to be doing legit querying or piecemeal changes to the dataset.

Advantages and disadvantages of encoding objects with NSCoding or simply writing data to files

I'm curious what the advantages of encoding objects in Objective-C with NSCoding and writing them to disk may be over simply writing a persistence object to disk. Is there a performance increase in terms of I/O or disk space usage?
Well, most NSCoding implementations will handle object graphs correctly; i.e. if you code a member object that's already been coded to the coder, it won't code it again. Decoding will restore the object graph correctly (so the decoded target object has multiple inbound references). You also get all the built in helper coding functions (for primitive types, and objects).
Other than that, NSCoders are just persistence object generators, so you end up doing similar work, only without the annoyances, and with the common cases handled by Apple. What persistence generator could you write that wouldn't duplicate tonnes of NSCoder functionality?

Serialization vs. Archiving?

The iOS docs differentiate between "serializing" and "archiving." Is this a general distinction (i.e., holds in other languages) or is it specific to Objective-C? Also, what is the difference between these two?
This is a case of one being the other some (but not all) of the time.
Wikipedia has this to say about serialization:
"Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be "resurrected" later in the same or another computer environment"
So, archiving may only be serialization, but it could also be the combination of serialization and compression, for example. Or perhaps it adds some kind of header info. So serialization is a form of archive, but an archive is not necessarily a serialization.
This isn't really specific to iOS - these terms are thrown around all over. Their specific meaning in the context of iOS could be quite specific, though.
I was actually trying to look for their difference from an iOS perspective. Adding the following for people interested:
Purpose:
Archiving is used to store object graphs. A complete data model can be archived and restored easily. The way nib files work can be considered an example of archiving.
Serialization is used for storing an arbitrary hierarchy of objects.
The way plist files work can be considered an example of serialization.
Differences (excerpts from the Archives and Serializations Programming Guide):
"The archive preserves the identity of every object in the graph and all the relationships it has with all the other objects in the graph."
Every object encoded within the context of rootObject invocation is tracked. If the coder is asked to encode an object more than once, the coder encodes a reference to the first encoding instead of encoding the object again.
"The serialization only preserves the values of the objects and their position in the hierarchy. Multiple references to the same value object might result in multiple objects when deserialized. The mutability of the objects is not maintained."
Implementation differences:
Any object that implements the NSCoding protocol can be archived, whereas only instances of NSArray, NSDictionary, NSString, NSDate, NSNumber, and NSData (and some of their subclasses) can be serialized. Array and dictionary objects must also contain only objects of these few classes.
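The identity difference quoted above can be seen outside Cocoa too. In this Python sketch, pickle plays the role of an archive and JSON the role of a values-only serialization. These are stand-ins chosen for illustration, not the Apple APIs.

```python
import json
import pickle

shared = ["x", "y"]
root = {"first": shared, "second": shared}  # two references to one object

# Archive-like round trip: the shared object is encoded once and
# later references point back to that first encoding.
archived = pickle.loads(pickle.dumps(root))

# Values-only round trip: the list is written out twice.
plain = json.loads(json.dumps(root))

archive_keeps_identity = archived["first"] is archived["second"]   # True
values_only_keeps_identity = plain["first"] is plain["second"]     # False
```

After the JSON round trip the two entries are equal but are separate lists, which matches "multiple references to the same value object might result in multiple objects when deserialized".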
When to Use:
Property lists (serialization) should be used for data that consists primarily of strings and numbers. They are very inefficient when used with large blocks of binary data.
Archiving is worthwhile for objects other than plist objects, or for storing large blocks of data.
Generally speaking, serialization is concerned with converting your program's data types into architecture-independent byte streams. Archiving is specialized serialization in that you can store type and other relationship-based information that allows you to unserialize/unmarshal easily. So archival can be thought of as a specialization and subset of serialization. For Objective-C:
Serialization converts Objective-C
types to and from an
architecture-independent byte stream.
In contrast to archiving, basic
serialization does not record the data
type of the values nor the
relationships between them; only the
values themselves are recorded. It is
your responsibility to deserialize the
data in the proper order. Several
convenience classes, however, do
provide the ability to serialize
property lists, recording their
structure along with their values.
With C++ boost serialization --
http://www.boost.org/doc/libs/1_45_0/libs/serialization/doc/index.html
Here, we use the term "serialization"
to mean the reversible deconstruction
of an arbitrary set of C++ data
structures to a sequence of bytes.
Such a system can be used to
reconstitute an equivalent structure
in another program context. Depending
on the context, this might used
implement object persistence, remote
parameter passing or other facility.
In this system we use the term
"archive" to refer to a specific
rendering of this stream of bytes.
This could be a file of binary data,
text data, XML, or some other created
by the user of this library.