Need for serialization

I'm new to the concept of serialization; please help me understand it.
What exactly does serialization mean? I have read the definition, but I could not understand the details.
How are basic types (int, string) serialized?
If we don't use serialization in our code, how will data be transmitted?
Is there any implicit serialization process involved when accessing a database from front-end Java/C# code, for example when inserting into or deleting from the database?

Serialization just takes an object and translates it into something simpler. Imagine that you had an object in C# like so:
class Employee
{
    public int age;
    public string fullname;
}

public static void Main()
{
    var john = new Employee();
    john.age = 21;
    john.fullname = "John Smith";

    var matt = new Employee();
    matt.age = 44;
    matt.fullname = "Matt Rogers";
    ...
This is C# friendly. But if you wanted to save that information in a text file in CSV format, you would end up with something like this:
age,fullname
21,John Smith
44,Matt Rogers
When you write a CSV, you are basically serializing information into a different format - in this case a CSV file. You can serialize your object to XML, JSON, database table(s), memory or something else. Here's an example from Udemy regarding serialization.
If you don't serialize, confusion will be transmitted: perhaps your object's ToString() will be implicitly called before transmission, and whatever that returns is what gets transmitted. Therefore it is vital to convert your data to something that is receiver-friendly.
There's always some serialization happening. When you execute a query that populates a DataTable, for example, serialization has occurred.
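Since the question mentions both Java and C#, here is the same idea as a minimal Java sketch (the class and field names simply mirror the C# example above, and the file name is made up): turning the objects into CSV lines is already a form of serialization.

import java.io.FileWriter;
import java.io.IOException;

class Employee {
    int age;
    String fullname;

    Employee(int age, String fullname) {
        this.age = age;
        this.fullname = fullname;
    }

    // Serialize the object into a single CSV line.
    String toCsv() {
        return age + "," + fullname;
    }
}

public class CsvDemo {
    public static void main(String[] args) throws IOException {
        Employee john = new Employee(21, "John Smith");
        Employee matt = new Employee(44, "Matt Rogers");

        // Writing the objects out as CSV is a (very simple) serialization step.
        try (FileWriter out = new FileWriter("employees.csv")) {
            out.write("age,fullname\n");
            out.write(john.toCsv() + "\n");
            out.write(matt.toCsv() + "\n");
        }
    }
}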

Concept:
Serialization is the process of converting an object into a series of bytes.
The objects we use in an application are usually complex, but all of them can be represented as a series of bytes that can be stored in a file or a database, or transferred over a network.
In Java, you can make a class serializable simply by having it implement the Serializable interface.
For a class to be serialized successfully, two conditions must be met:
The class must implement the java.io.Serializable interface.
All of the fields in the class must be serializable. If a field is not serializable, it must be marked transient.
When the program is done serializing, the result is conventionally stored in a file with the .ser extension, which can later be used for deserialization.
Serialization assigns a serialVersionUID to the serialized object, and it has to match the class's serialVersionUID at deserialization time.
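A minimal Java sketch of these points (class, field and file names are just for illustration): the class implements Serializable, declares a serialVersionUID, marks a non-persistent field transient, writes the object to a .ser file and reads it back.

import java.io.*;

class Person implements Serializable {
    // Must match at deserialization time, otherwise an InvalidClassException is thrown.
    private static final long serialVersionUID = 1L;

    private String name;
    private transient String sessionToken; // not worth persisting, so marked transient

    Person(String name, String sessionToken) {
        this.name = name;
        this.sessionToken = sessionToken;
    }

    @Override
    public String toString() {
        return name + " / " + sessionToken;
    }
}

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        // Serialize: object -> series of bytes in a .ser file
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("person.ser"))) {
            out.writeObject(new Person("John Smith", "abc123"));
        }

        // Deserialize: bytes -> object again (the transient field comes back as null)
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("person.ser"))) {
            Person p = (Person) in.readObject();
            System.out.println(p); // John Smith / null
        }
    }
}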

Related

OOAD - File-Format-Reader class vs Object-Model class: which should depend on which?

Let's consider, as an example, the domain of GPS and Geographical (GIS) entities.
We would model the meaningful geographic entities (points, paths, regions) as classes in any desired programming language, and these classes would be a conceptual, "implementation-free" representation of these entities.
On the other hand, there are a lot of file formats that save these features with more or less the same meaning. In the GPS domain the most common file formats are GPX, KML, ShapeFile, WellKnownText, etc.
Supposing, then, I want to create a GpsFeatureCollection class which would contain a Points property, a Paths property, and so on. Also, I would implement classes like GpsReader, KmlReader, ShapeFileReader (and their respective Writers) and so on.
THE QUESTION IS:
Which is the best practice in OOAD:
Have GpsFeatureCollection instantiate a FileFormat(Reader/Writer) class?
Have GpsFeatureCollection implement Read/WriteFromFormat methods instead of separate classes?
Have each file-format reader instantiate an empty GpsFeatureCollection, populate it with data read from the file, and then return the populated object?
Have a mediator class to avoid any dependency between FileFormatClass and ObjectModelClass?
None of the above?
"Well, it depends..."
I am really interested in doing "the right thing". My immediate plans are to use Python, but most probably this would matter for other languages too. This is causing some "analysis paralysis" in my pet project currently...
Here is my take: I pass reader and writer instances to the read() and write() methods. This achieves a good level of decoupling and yet provides the flexibility to pick various readers and writers.
The code uses Java-like syntax.
Declare a Reader interface; we assume multiple implementations such as KMLReader, ShapeFileReader, etc.:
interface Reader {
    GpsFeatureCollection read();
}
Declare a Writer interface; we assume multiple implementations such as KMLWriter, ShapeFileWriter, etc.:
interface Writer {
    void write(GpsFeatureCollection c);
}
Let's declare the GpsFeatureCollection class with read and write methods which accept the respective interfaces as parameters to perform the job.
class GpsFeatureCollection {
    ...
    public static GpsFeatureCollection read(Reader r) {
        return r.read();
    }

    // Instance method, so it can hand 'this' to the writer.
    public void write(Writer w) {
        w.write(this);
    }
}
Some examples of usage with different readers and writers:
// Reading data
GpsFeatureCollection data = GpsFeatureCollection.read(new ShapeFileReader("/tmp/shapefile"));

// Writing data
data.write(new KMLWriter("/tmp/kmlfile"));
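For illustration, a hypothetical ShapeFileReader could look roughly like this (the actual file parsing is elided, and it assumes GpsFeatureCollection exposes some way to add parsed features); all format knowledge stays inside the reader:

class ShapeFileReader implements Reader {
    private final String path;

    ShapeFileReader(String path) {
        this.path = path;
    }

    @Override
    public GpsFeatureCollection read() {
        GpsFeatureCollection collection = new GpsFeatureCollection();
        // Open the file at 'path', parse the ShapeFile records,
        // and add the resulting points/paths/regions to 'collection'.
        return collection;
    }
}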

Is protobuf-net suited for serializing arbitrary object/domain models?

I have been exploring the CQRS/DDD-principles and patterns for a while now and have started implementing a sample project where I have split my storage-model into a WriteModel and a ReadModel. The WriteModel will use a simple NoSQL-like database where aggregates are stored in a key-value style, with value being just a serialized version of the aggregate.
I am now looking at ProtoBuf-Net for serializing and deserializing my domain model aggregates in and out of storage. Other than this post I haven't found any guidance or tips for using ProtoBuf-Net in this area. The point is that the (ideal) requirement for serialization and deserialization of aggregates is that the domain model should have as little knowledge as possible about this infrastructural concern, which implies the following:
No attributes on the classes
No constructors, getters, setters or any other piece of code just for the sake of serialization.
Ability to use any (custom) type possible and have it serialized/deserialized.
Thus far I have implemented just the serialization of the first versions of my aggregates which works perfectly fine. I use the RuntimeTypeModel.Default-instance to configure the MetaModel at runtime and have UseConstructor = false everywhere, which enables me to completely separate the serialization mechanics from my domain-assembly. I have even implemented a custom post-deserialization mechanism that enables me to just-in-time initialize fields after ProtoBuf-Net has deserialized it into a valid instance. So suppose I have class AggregateA like so:
[Version(1)]
public sealed class AggregateA
{
    private readonly int _x;
    private readonly string _y;
    ...
}
Then in my serialization-library I have code something along the following lines:
var metaType = RuntimeTypeModel.Default.Add(typeof(AggregateA), false);
metaType.UseConstructor = false;
metaType.AddField(1, "_x");
metaType.AddField(2, "_y");
...
However, I realize that up to this point I have only implemented the basic scenario, and I am now starting to think about how to approach versioning of my model. I am particularly interested in larger refactoring scenarios, where type A has been split into types A1 and A2, for example:
[Version(2)]
public sealed class AggregateA1
{
    private readonly int _x;
    ...
}

[Version(2)]
public sealed class AggregateA2
{
    private readonly string _y;
    ...
}
Suppose I have a serialized bunch of instances of AggregateA, but now my domain model knows only AggregateA1 and AggregateA2, how would you handle this scenario with ProtoBuf-Net?
A second question deals with point 3: is ProtoBuf-Net capable of handling arbitrary types if you're willing to put in some extra configuration-effort? I've read about exceptions raised when using the DateTimeOffset-type, which makes me think not all types can be serialized by the framework out-of-the-box, but can I serialize these types by registering them in the RuntimeTypeModel? Should I even want to go there? Or better to forget about serializing common .NET types other than the simple ones?
protobuf-net is intended to work with predictable known models. It is true that everything can be configured at runtime, but I have not put any thought as to how to handle your A1/A2 scenario, precisely because that is not a supported scenario (in my defense, I can't see that working nicely with most serializers). Thinking off the top of my head, if you have the configuration/mapping data somewhere, then you could simply deserialize twice; i.e. as long as we still tell it that AggregateA1._x maps to 1 and AggregateA2._y maps to 2, you can do:
object a1 = model.Deserialize(source, null, typeof(AggregateA1));
source.Position = 0; // rewind
object a2 = model.Deserialize(source, null, typeof(AggregateA2));
However, more complex tweaks would require additional thought.
Re "arbitrary types"... define "arbitrary" ;p In particular, there is support for "surrogate" types which can be useful for some transformations - but without a very specific "problem statement" it is hard to answer completely.
Summary:
protobuf-net has an intended usage, which includes both serialization-aware (attributed, etc) and non-aware scenarios (runtime configuration, etc) - but it also works for a range of more bespoke scenarios (letting you drop to the raw reader/writer API if you want to). It does not and cannot guarantee to be a direct fit for every serialization scenario imaginable, and how well it behaves will depend on how far from that scenario you are.

Protocol buffer and OO design

I'm using protocol buffers as the wire data format in a client-server architecture. Domain objects (Java beans) will go through the following life-cycle.
Used in client side business logic
Converted to protobuf format
Transmitted to the server
Converted back to domain object
Used in server side business logic
"Protocol Buffers and O-O Design" section in ProtoBuf documentation recommends wrapping generated class inside proper domain model.
I'd like to find out the best approach.
For example, I have a simple proto definition.
package customer;
option java_package = "com.example";
option java_outer_classname = "CustomerProtos";
message Customer {
    required string name = 1;
    optional string address = 2;
}
This is how the domain model is defined. As you can see, the data is stored entirely in the proto builder object.
package com.example;

public class CustomerModel
{
    private CustomerProtos.Customer.Builder builder = CustomerProtos.Customer.newBuilder();

    public String getName()
    {
        return builder.getName();
    }

    public void setName(String name)
    {
        builder.setName(name);
    }

    public String getAddress()
    {
        return builder.getAddress();
    }

    public void setAddress(String address)
    {
        builder.setAddress(address);
    }

    public byte[] serialize()
    {
        return builder.build().toByteArray();
    }
}
Is this a good practice? These objects are used in all phases of the life-cycle, but we only require the protobuf format during the client-server transmission phase.
Is there any performance issue when accessing the proto builder's getter/setter methods, especially when the proto definition is complex and nested?
I have no experience with protocol buffers, but I would not recommend tailoring your domain objects to a specific serialization/transfer framework. You might regret that in the future.
The domain objects and logic of a software application should be as independent as possible from specific implementation issues (in your case serialization/transfer), because you want your domain to be easy to understand and be reusable/maintainable in the future.
If you want to define your domain objects independent of serialization/transfer, you have two options:
Option 1: Before serialization/transfer, copy the information into protocol-buffers-specific objects and send those to your server. There you would have to copy the information back into your domain objects.
Option 2: Use a serialization library that is not tied to protocol buffers, like Kryo or ProtoStuff, to transfer your domain objects to the server directly.
The disadvantages of option 1 are that your domain is defined twice (which is undesirable with respect to modifications) and that information has to be copied back and forth (which produces error-prone and hard-to-maintain code).
The disadvantages of option 2 are that you lose schema evolution (although ProtoStuff apparently supports it) and that the complete (potentially large) object graph is serialized and transferred, although you could prune the object graph (manually or with JGT) before serialization/transfer.
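A minimal sketch of option 1, using the Customer proto from the question; the CustomerBean and CustomerMapper names are hypothetical. The copying code lives at the transport boundary, so the domain bean stays free of protobuf.

// Plain domain bean - knows nothing about protobuf.
public class CustomerBean {
    private String name;
    private String address;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
}

// Mapping code kept at the transport boundary (option 1).
public final class CustomerMapper {

    public static CustomerProtos.Customer toProto(CustomerBean bean) {
        // A real mapper would guard against null optional fields.
        return CustomerProtos.Customer.newBuilder()
                .setName(bean.getName())
                .setAddress(bean.getAddress())
                .build();
    }

    public static CustomerBean fromProto(CustomerProtos.Customer proto) {
        CustomerBean bean = new CustomerBean();
        bean.setName(proto.getName());
        bean.setAddress(proto.getAddress());
        return bean;
    }
}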
We've made protobuf-converter to solve the problem of transforming your domain model objects into Google Protobuf messages and vice versa.
How to use it:
Domain model classes that have to be transformed into protobuf messages must satisfy these conditions:
The class has to be marked with the @ProtoClass annotation, which contains a reference to the related protobuf message class.
Class fields have to be marked with the @ProtoField annotation. These fields must have getters and setters.
E.g.:
@ProtoClass(ProtobufUser.class)
public class User {
    @ProtoField
    private String name;
    @ProtoField
    private String password;

    // getters and setters for 'name' and 'password' fields
    ...
}
Code for converting a User instance into the related protobuf message:
User userDomain = new User();
...
ProtobufUser userProto = Converter.create().toProtobuf(ProtobufUser.class, userDomain);
Code for backward conversion:
User userDomain = Converter.create().toDomain(User.class, userProto);
Conversion of lists of objects is similar to single object conversion.

An alternative way to use Azure Table Storage?

I'd like to use an entity like this for table storage:
public class MyEntity
{
    public String Text { get; private set; }
    public Int32 SomeValue { get; private set; }

    public MyEntity(String text, Int32 someValue)
    {
        Text = text;
        SomeValue = someValue;
    }
}
But it's not possible, because ATS needs:
a parameterless constructor;
all properties public and read/write;
inheriting from TableServiceEntity.
The first two are things I don't want to do. Why should I allow anybody to change data that should be read-only, or to create objects of this kind in an inconsistent way (what are .ctors for, then?), or, even worse, to alter the PartitionKey or the RowKey? Why are we still constrained by these deserialization requirements?
I don't like developing software that way. How can I use the table storage library in such a way that I can serialize and deserialize the objects myself? I think that as long as the objects inherit from TableServiceEntity it shouldn't be a problem.
So far I have managed to save an object, but I don't know how to retrieve it:
Message m = new Message("message XXXXXXXXXXXXX");
CloudTableClient tableClient = account.CreateCloudTableClient();
tableClient.CreateTableIfNotExist("Messages");
TableServiceContext tcontext = new TableServiceContext(account.TableEndpoint.AbsoluteUri, account.Credentials);
var list = tableClient.ListTables().ToArray();
tcontext.AddObject("Messages", m);
tcontext.SaveChanges();
Is there any way to avoid those deserialization requirements or get the raw object?
Cheers.
If you want to use the Storage Client Library, then yes, there are restrictions on what you can and can't do with your objects that you want to store. Point 1 is correct. I'd expand point 2 to say "All properties that you want to store must be public and read/write" (for integer properties you can get away with having read only properties and it won't try to save them) but you don't actually have to inherit from TableServiceEntity.
TableServiceEntity is just a very light class that has the properties PartitionKey, RowKey, Timestamp and is decorated with the DataServiceKey attribute (take a look with Reflector). All of these things you can do to a class that you create yourself and doesn't inherit from TableServiceEntity (note that the casing of these properties is important).
If this still doesn't give you enough control over how you build your classes, you can always ignore the Storage Client Library and just use the REST API directly. This will give you the ability to serialize and deserialize the XML any way you like. You will lose all of the nice things that come with using the library, like the ability to create queries with LINQ.
The constraints around that ADO.NET wrapper for the Table Storage are indeed somewhat painful. You can also adopt a Fat Entity approach as implemented in Lokad.Cloud. This will give you much more flexibility concerning the serialization of your entities.
Just don't use inheritance.
If you want to use your own POCOs, create your class as you want it and create a separate TableEntity wrapper/container class that holds the PartitionKey and RowKey and carries your class as a serialized byte array.
You can use composition to achieve what you want.
Create your Table Entities as you need to for storage and create your POCOs as wrappers on those providing the API you want the rest of your application code to see.
You can even mix in some interfaces for better code.
How about generating the POCO wrappers at runtime using System.Reflection.Emit? http://blog.kloud.com.au/2012/09/30/a-better-dynamic-tableserviceentity/

Where should the responsibility for parsing the input stream be in this scenario?

Say I am parsing readings from a handheld device of some sort via an input stream. There are readings of different types, and each needs to be parsed differently.
Currently I have a class "handheld" that handles all parsing and creates reading objects of the appropriate type as required. It parses the reading and populates each reading via their "set" methods.
I'm wondering though if the readings themselves should know how to parse the input stream. For instance, when the next reading comes along, should I instantiate the appropriate reading object and call a "parse" method on it, passing it in the input stream?
The main thing I don't like about this is that the parsing code ends up spread all over the place rather than kept neatly in one place. It does, however, get rid of the need for all those set methods, and the reading can just apply itself to the server/database/whatever when required via the "apply" method I have.
So which would be considered the "nicer" (or more OO) way?
I would go with the Factory design pattern.
Create a base class GeneralParser and make a child class for each parser. If there is something common in the parsing logic, put it in the base GeneralParser's Parse method and call base.Parse from the child's Parse method.
I am sure you have a way to determine which parser to use, and I think currently you're using control statements (if, switch, ...) to pick and run the parsing. Instead of that, let the specialized (child) parser class handle it for you.
Pseudo class diagram:
GeneralParser
  |
  +-- XMLParser
  +-- JSONParser
Here is a sample implementation in C#/.NET:
public static class ParserFactory
{
    public static GeneralParser CreateXMLParser()
    {
        return new XMLParser();
    }

    public static GeneralParser CreateJsonParser()
    {
        return new JSONParser();
    }
}
In your program code you might write something like this (pseudo-code), since it depends on how you decide which parser to use:
// ...
GeneralParser parser;
if (to_be_parsed_as_xml)
{
    parser = ParserFactory.CreateXMLParser();
    parser.Parse(stream);
}
else if (to_be_parsed_as_json)
{
    parser = ParserFactory.CreateJsonParser();
    parser.Parse(stream);
}
// ...
You can create a parser on the fly (without keeping its reference) if you only need parsers to parse and nothing more.