LINQ to XML vs. XmlSerializer vs. DataContractSerializer - serialization

In my web method, I receive an object of a third-party C# entity class. The entity class is nothing but a DataContract. This entity class is quite complex and has properties of various types; some properties are collections too. Of course, those linked types are also DataContracts.
I want to serialize that DataContract entity into XML as part of the business logic of my web service. I cannot use DataContractSerializer directly (on the object I receive in the web method) simply because the required XML schema is altogether different, so the XML generated by DataContractSerializer would not validate against that schema.
I am unable to decide which approach to follow for the implementation. I can think of the following options:
LINQ to XML - This looks OK, but I would need to build the XML tree (i.e. the elements, the XML representation of the class instance) manually for each type of object. Since there are many entity classes and they are linked to each other, that is a lot of work writing XML elements by hand. Besides, I'll have to keep modifying the XML tree whenever the entity class introduces a new property. On top of that, the code that builds the XML tree would look a little clumsy (at least in appearance) and would be harder for another developer to maintain or change in the future; he or she would have to study it closely to understand how the XML is generated.
XmlSerializer - I can write my own entity classes that represent the XML structure I want. I then need to copy details from the incoming object into instances of my own classes, so this is additional work (for .NET too, when the code executes!). Then I can run XmlSerializer over my object to generate the XML. In this case I'll have to create the entity classes, and whenever the third-party entity is modified I'll just have to add the new property to my class (with XmlElement or XmlAttribute attributes). But people recommend DataContractSerializer over this one, so I don't want to settle on it until all aspects are clear to me.
DataContractSerializer - Again, I'll have to write my own entity classes since I have no control over the third-party DataContracts, and I need to copy details from the incoming object into instances of my own classes, so this is additional work. However, since DataContractSerializer does not support XML attributes, I'll have to implement IXmlSerializable and generate the required XML in the WriteXml method. DataContractSerializer is faster than XmlSerializer, but again I'll have to handle any changes (in WriteXml) if the third-party entity changes.
Questions:
Which approach is best in this scenario considering performance too?
Can you suggest some better approach?
Is DataContractSerializer worth considering (because it performs better than XmlSerializer) when the incoming entity class is subject to change?
Should LINQ really be used for serialization? Or is it genuinely useful for things other than querying?
Can XmlSerializer be preferred over LINQ in such cases? If yes, why?

I agree with @Werner Strydom's answer.
I decided to use XmlSerializer because the code stays maintainable and it offers the performance I expect. Most importantly, it gives me full control over the XML structure.
This is how I solved my problem:
I created entity classes (representing the various types of XML elements) as per my requirements and passed an instance of the root class (the class representing the root element) through XmlSerializer.
A small use of LINQ in the case of a 1:M relationship:
Wherever I wanted the same element (say Employee) to appear many times under a specific node (say Department), I declared a property of type List<T>, e.g. public List<Employee> Employees in the Department class. In such cases XmlSerializer naturally added an element called Employees (a grouping of all Employee elements) under the Department node. So, after XmlSerializer had serialized the .NET object, I used LINQ to manipulate the XElement (i.e. the XML) it generated: I simply moved all Employee nodes directly under the Department node and removed the Employees node, as sketched below.
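A rough sketch of that approach (the Department/Employee classes here are illustrative placeholders, not the real contracts):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml.Linq;
    using System.Xml.Serialization;

    public class Employee
    {
        [XmlAttribute]
        public string Name { get; set; }
    }

    public class Department
    {
        [XmlAttribute]
        public string Name { get; set; }

        // XmlSerializer wraps the items in an <Employees> grouping element by default.
        public List<Employee> Employees { get; set; }
    }

    class Demo
    {
        static void Main()
        {
            var dept = new Department
            {
                Name = "IT",
                Employees = new List<Employee>
                {
                    new Employee { Name = "A" },
                    new Employee { Name = "B" }
                }
            };

            var writer = new StringWriter();
            new XmlSerializer(typeof(Department)).Serialize(writer, dept);

            // Post-process with LINQ to XML: copy each <Employee> directly under
            // <Department> and drop the <Employees> wrapper.
            var root = XElement.Parse(writer.ToString());
            var wrapper = root.Element("Employees");
            if (wrapper != null)
            {
                root.Add(wrapper.Elements("Employee"));
                wrapper.Remove();
            }

            Console.WriteLine(root);
        }
    }

As far as I know, decorating the list property with [XmlElement] would also suppress the wrapper element, if you prefer to avoid the post-processing step.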
Overall, I got the performance I expected from the combination of XmlSerializer and LINQ.
The downside is that all the classes I created had to be public when they could very well have been internal!
Why not DataContractSerializer and LINQ-to-XML?
DataContractSerializer does not allow the use of XML attributes (unless I implement IXmlSerializable). See the types supported by DataContractSerializer.
LINQ to XML (and IXmlSerializable too) makes the code clumsy when building a complex XML structure, and that code would definitely make other developers scratch their heads when maintaining or changing it.
Is there any other way?
Yes. As mentioned by @Werner Strydom, you can very well generate classes using XSD.exe or a tool like Xsd2Code and work directly with them, if you are happy with the resulting classes.

I'd pick XmlSerializer because it's the most maintainable option for a custom schema (assuming you have the XSD). When you are done developing the system, test its performance in its entirety and determine whether XML serialization is causing problems. If it is, you can then replace it with something that requires more work and test again to see whether there are any gains. But if XML serialization isn't an issue, then you have maintainable code.
The time it takes to parse a small snippet of XML data may be negligible compared to communicating with the database or external systems. On systems with large memory (16GB+) you may find the GC being a bottleneck in .NET 4 and earlier (.NET 4.5 tries to solve this), especially when you work with very large data sets and streams.
Use AutoMapper to map objects created by XSD.EXE to your entities. This will allow the database design to change without impacting the web service.
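A minimal sketch of that mapping, assuming AutoMapper and placeholder type names (CustomerXml for the xsd.exe-generated class, CustomerEntity for the internal entity):

    using AutoMapper;

    public class CustomerXml      // generated from the XSD by xsd.exe
    {
        public string Name { get; set; }
    }

    public class CustomerEntity   // your internal, database-facing entity
    {
        public string Name { get; set; }
    }

    class MappingDemo
    {
        static void Main()
        {
            // Recent AutoMapper versions use MapperConfiguration; older versions
            // expose a static Mapper.CreateMap API instead.
            var config = new MapperConfiguration(cfg =>
                cfg.CreateMap<CustomerEntity, CustomerXml>());
            var mapper = config.CreateMapper();

            CustomerXml dto = mapper.Map<CustomerXml>(new CustomerEntity { Name = "Acme" });
        }
    }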
One thing that is great about LINQ to XML is XSD validation. However, that impacts performance.
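For reference, XSD validation with LINQ to XML looks roughly like this (the schema and file names are placeholders):

    using System;
    using System.Xml.Linq;
    using System.Xml.Schema;

    class ValidationDemo
    {
        static void Main()
        {
            var schemas = new XmlSchemaSet();
            schemas.Add(null, "MySchema.xsd");   // null = take the targetNamespace from the schema

            var doc = XDocument.Load("output.xml");
            doc.Validate(schemas, (sender, e) =>
                Console.WriteLine("Validation error: " + e.Message));
        }
    }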

Another option is to use LINQ and reflection to create a generic class that serializes your objects to XML. A good example of this can be found at http://primecoder.blogspot.com/2010/09/how-to-serialize-objects-to-xml-using.html . I am not sure what your XML needs to look like at the end of the day, but if it is fairly basic this could do the trick. You would not need to make changes as your entity classes add, remove, or change properties, and you could use this across all of your objects (and other projects if stored in a utility DLL).
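Very roughly, the idea from that post looks like the sketch below (this simplified version only handles flat objects with simple property values; the linked article goes further):

    using System.Linq;
    using System.Xml.Linq;

    static class GenericXmlSerializer
    {
        // Turns any object's readable public properties into child elements.
        public static XElement ToXml(object obj)
        {
            var type = obj.GetType();
            return new XElement(type.Name,
                type.GetProperties()
                    .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
                    .Select(p => new XElement(p.Name, p.GetValue(obj, null))));
        }
    }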

Related

DTO to POCO with Lucene

We are using Lucene as the search server for data retrieval.
With this come certain complexities that I was unprepared for, not the least of which is managing relationships between objects.
I want to create clean and simple POCOs for our domain objects. These POCOs will contain the related objects that I need for the UI, but no other fields (the IDs defining these relationships, and various other fields I simply don't need in the UI).
This means that I cannot directly translate Lucene's Hits collection into my UI-friendly POCOs and need some intermediary set of classes that will, at the least, contain the IDs of related objects (stored in the same or other indices). I hesitate to call these DTO objects, but for the sake of simplicity I will call them that.
So I envision it working as follows:
Perform query in Lucene -> Hits collection
Iterate through Hits -> DTO collection
DTO collection -> [service to retrieve related objects, compose a POCO] -> POCOs
Render a UI using the shiny simple POCOs
My fear in doing so is that I'll end up with an Anemic Domain Model ( http://www.martinfowler.com/bliki/AnemicDomainModel.html ).
Is this a valid concern or am I on the right path?
I ended up going with the pattern most familiar to me, a DTO. The DTO has all the IDs - it is merely a CLR reflection of a record retrieved from Lucene.
I then map from the DTO to a POCO in the service layer and use those objects to render the UI elements.
It does not feel slick, but it works.
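A rough sketch of that flow, with made-up field names and a hypothetical resolver standing in for whatever service fetches the related objects:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Thin CLR mirror of the Lucene record: IDs only for relationships.
    // The Hits -> DTO step is a plain loop over the query results, reading
    // stored fields (e.g. doc.Get("categoryId")) into this class.
    public class ProductDto
    {
        public int Id { get; set; }
        public int CategoryId { get; set; }
        public string Name { get; set; }
    }

    public class Category { public int Id { get; set; } public string Name { get; set; } }

    // UI-friendly POCO: relationships resolved, no raw IDs.
    public class Product
    {
        public string Name { get; set; }
        public Category Category { get; set; }
    }

    static class ProductMapper
    {
        // resolveCategory is a placeholder for a repository/cache lookup against
        // the other index or the database.
        public static List<Product> ToPocos(IEnumerable<ProductDto> dtos,
                                            Func<int, Category> resolveCategory)
        {
            return dtos.Select(d => new Product
            {
                Name = d.Name,
                Category = resolveCategory(d.CategoryId)
            }).ToList();
        }
    }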
Without any ID information in your POCOs, your design will likely suffer from anemia, as there will just be an unconnected jumble of objects (which may not even all fit in memory at once). It also seems to me that the lack of IDs would greatly interfere with caching and memoization (which help you avoid hitting the database every time you need an object). I have rarely had the luxury of assuming that all of my data will fit in memory at once.

One large class compared to several small classes for a Cocoa Mac application

I have created an application, using ARC, that parses data from an online XML file. I am able to get everything I need using one class and one call to the API that provides the XML data. Because the XML file is large, I have a lot of variables, IBOutlets, and IBActions associated with this class.
But there are two approaches to this:
1) create a class which parses the XML data and also implements that data for your application, i.e. create one class that does everything (as I have already done)
or
2) create a class which parses the XML data and create other classes which handle the data obtained from the XML parser class, i.e. one class does the parsing and another class implements that data
Note that some APIs that provide XML data track the number of calls per minute or per day to their service, so you would not want several classes calling the API; it is better to make one request that receives all the data you need.
So is it better to use several smaller classes to handle the XML data, or is it fine to just use one large class to do everything?
When in doubt, smaller classes are better.
2) create a class which parses the XML data and create other classes which handle the data obtained from the XML parser class, i.e. one class does the parsing and another class implements that data
One key advantage of this is that the thing that the latter class models is separate from the parsing work that the former class does. This becomes important:
As Peter Willsey said, when your XML parser changes. For example, if you switch from stream-based to document-based parsing, or vice versa, or if you switch from one parsing library to another.
When your XML input changes. If you want to add support for a new format or a new version of a format, or kill off support for an obsolete format, you can simply add/remove parsing classes; the model class can remain unchanged (or receive only small and obvious improvements to support new functionality in new/improved formats).
When you add support for non-XML inputs. For example, JSON, plists, keyed archives, or custom proprietary formats. Again, you can simply add/remove parsing classes; the model class need not change much, if at all.
Even if none of these things ever happen, they're still better separated than mashed together. Parsing input and modeling the user's data are two different jobs; mashing them together makes them hard or impossible to reason about separately. Keep them separate, and you can change one without having to step around the other.
I guess it depends on your application. Something to consider: what if you have to change the XML parser you are using? You would have to rewrite your monolithic class, and you could break a lot of unrelated functionality. If you had abstracted the XML parser, it would just be a matter of rewriting that particular class's implementation. Or what if the scope of your application changes and suddenly you have several views? Would you be able to reuse code elsewhere without violating the DRY (Don't Repeat Yourself) principle?
What you want to strive for is low coupling and high cohesion, meaning classes should not depend on each other and classes should have well defined responsibilities with highly related methods.

Does adding PetaPoco attributes to POCO's have any negative side effects?

Our current application uses a smart-object style for working with the database. We are looking at the feasibility of moving to PetaPoco instead. Looking over the features, I notice you can add attributes to make it easier to CRUD objects. Does adding these attributes have any negative side effects that I should be aware of?
Has anyone found a reason NOT to use these decorators?
Directly to the use of the POCO object instance itself? None.
At least not that I am aware of. Jon Skeet should be able to provide more info because he knows the compiler's inner workings through and through, so he knows exactly what happens with this metadata after it has been compiled.
Other implications indirectly related to these
There are, of course, implications when accessing these declarative attributes, because they are read using reflection, which is normally a slow process.
But there is nothing to worry about here, because PetaPoco is a smart library: it reads these attributes only once, then compiles and caches what it needs, so you pay the penalty once and get blazing performance afterwards, because it uses compiled code.
Non-performance related implications
By putting attributes (of any kind) on your classes, properties, or methods, you bind your code to the particular engine that will use the class, because the attributes are directives for that engine to understand your code.
In the case of PetaPoco attributes this means that your class can be used with PetaPoco but not with some other DAL (e.g. EF) unless you add that DAL's attributes as well (EF Code First uses the very same attribute-based approach).
The second implication relates to the back-end database. If you rename a table, a column, or anything else that appears in a PetaPoco attribute as a constant magic string, you will have to change that string as well. This just means that you have to be thorough when making database changes.
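For illustration, this is the kind of magic string involved (the table and column names are made up):

    using PetaPoco;

    [TableName("tblCustomers")]
    [PrimaryKey("CustomerId")]
    public class Customer
    {
        public int CustomerId { get; set; }

        [Column("Customer_Name")]   // rename the DB column and this string must change too
        public string Name { get; set; }

        [Ignore]                    // computed in code, never persisted
        public string DisplayName { get; set; }
    }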
One downside is that it breaks the separation between the "domain" layer and the "data" layer, since it introduces the PetaPoco file (which contains data logic) to domain classes that should really not have any knowledge or dependency on the data layer.
If you're doing a single-project MVC app or something similar, then it's okay to use the Models directory for both; but for non-trivial, layered apps you'll have to keep two PetaPoco files, or play around with abstracting portions of the file so you can annotate your models without making them "know too much" about the underlying data, or else specify the table and/or primary key name all over the place.

Serialization of Objects

How does serialization of objects work? How does an object get deserialized, and how is an instance created from the serialized data without a call to any constructor?
I've kept this answer language agnostic since a language wasn't given.
When an object is serialized, all the information required to rebuild it is encoded in a form that can be retrieved later. This typically includes the type of the object as well as the values of all its instance variables.
When the object is deserialized, an area of memory of the correct size is allocated and populated using the serialized information, such that the new object is identical to the serialized one.
The running program can then refer to this new object in memory without having to actually call the constructor.
There are lots of little details which this doesn't explain, but this is the general idea of serialization/deserialization.
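As a concrete illustration of that "allocate and populate without a constructor" step, here is a small .NET sketch (most runtimes have a similar back door; the Person type is invented for the example):

    using System;
    using System.Runtime.Serialization;

    class Person
    {
        public string Name;
        public Person() { Console.WriteLine("constructor ran"); }
    }

    class Demo
    {
        static void Main()
        {
            // Allocates a zeroed Person; the constructor is never invoked.
            var p = (Person)FormatterServices.GetUninitializedObject(typeof(Person));
            p.Name = "restored from serialized data";   // fields populated afterwards
            Console.WriteLine(p.Name);                   // "constructor ran" is never printed
        }
    }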
Are you talking about Java? If so, serialization is an extralingual object creation mechanism. It's a backdoor that uses native code to create the object without calling any constructors. Therefore, when designing a class for serializability, you need to make sure that a class created through deserialization maintains the same invariants (key fields being initialized) as you would through the constructor path. A third way to create objects in Java is through cloning, and similar issues apply.
Cloning and serialization don't interact well with the use of final fields if you need to set the value of that field to something different than what is returned by clone or the deserialization process.
Josh Bloch's "Effective Java" has some chapters that explain these issues in more depth.
(this answer may apply to other languages too, but I've only used serialization in Java)
Regarding .NET: this isn't a definitive or textbook answer, and I might be all-out wrong...
.NET serialization needs to be separated into binary vs. the others (XML or an XML derivative, typically). Binary serialization is mostly a black box to me, but it allows objects to be serialized and restored in their current state. XML serialization typically only serializes the public fields/properties of an object, unless overridden by adding a custom IXmlSerializable implementation.
In the case of XML serialization, I believe .NET uses reflection to determine which fields and properties get converted to their equivalent elements. The serializer applies a default behavior, which can be adjusted by applying attributes at the member level (such as [XmlAttribute]).
The metadata (which reflection depends on) stores all of the object's members as well as their attributes and addresses, which allows the serializer to determine how it should build the output.
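A small example of that attribute-driven behavior (the type and member names are invented):

    using System;
    using System.Xml.Serialization;

    public class Book
    {
        [XmlAttribute("isbn")]    // serialized as an XML attribute
        public string Isbn { get; set; }

        [XmlElement("Title")]     // serialized as a child element (the default for public members)
        public string Title { get; set; }
    }

    class Demo
    {
        static void Main()
        {
            var serializer = new XmlSerializer(typeof(Book));
            serializer.Serialize(Console.Out, new Book { Isbn = "123", Title = "Example" });
            // Produces roughly: <Book isbn="123"><Title>Example</Title></Book>
        }
    }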

WCF and nullable attributes in the generated schemas

I am trying to do some contract-first development, and I have already designed a schema where an element has minOccurs="0" and nillable="false".
However, I am not able to produce a DataContract or XmlSerializer class that generates this XSD.
I suspect that this is not possible.
The only solution I have found is an ugly one, see it here.
In that solution I have to implement IXmlSerializable and write the schema and the serialization myself, so I would have to maintain both the schema and the C# class -> ugly.
Has anyone found a solution to this or heard from Microsoft that this is not possible?
As far as I know, both XmlSerializer and DataContractSerializer will generate XML that respects the schema if the object is filled correctly, but if required values are not filled, the produced XML will not validate against the schema.
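For what it's worth, the closest I know of with plain attributes is something like the sketch below (the Order type is a placeholder): if I remember the schema export correctly, IsRequired = false gives you minOccurs="0", but a reference-type member is still exported as nillable="true", which is exactly the mismatch described in the question.

    using System.Runtime.Serialization;

    [DataContract]
    public class Order
    {
        // Exported schema: minOccurs="0" (because IsRequired = false),
        // but still nillable="true" for this reference-type member.
        [DataMember(IsRequired = false)]
        public string Comment { get; set; }
    }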
You could try to use XsdObjectGenerator.