tl;dr: Is there a design pattern for loading an XML file and constructing hierarchical, polymorphic objects based on its contents?
Say, as an example, I have a class Container derived from Storage, with a list of items.
In some way I have to load an XML file that contains:
- Some generic info for Storage (name, location).
- The fact that I need a Container implementation (e.g. not a Cabinet or Shelves).
- The items, with information about them.
Currently, I open a QXmlStreamReader in main, see what kind of Storage I need, construct one, and pass the reader to the Storage to be loaded.
The Storage in its turn iterates over all items, creates them, and passes the reader on for the Item contents. This works, but without polymorphism (lots of switches, ugh).
The problem is that when I want to make things cleanly polymorphic, I hit a problem: I have to initialize both the base and the derived part.
That requires the Call Super pattern, and it breaks when I change the order of elements in the XML file, since QXmlStreamReader is stateful (it has no reset).
This smells, both because of the Call Super and because every class knows about the XML file.
The problem:
class Storage {
    list<Item> items;
    string location;
    virtual Item getItem();
    virtual void loadXML(QXmlStreamReader& reader);
};

class Container : public Storage {
    Item getItem();
    void loadXML(QXmlStreamReader& reader); // needs Call Super
};

class Item {
    string name;
    int size;
    virtual void loadXML(QXmlStreamReader& reader);
};

class Thing : public Item {
    SpecificProperty property;
    void loadXML(QXmlStreamReader& reader); // needs Call Super
};
What other design pattern can I use to construct such a hierarchical structure from an XML file?
Related
Let's consider, as an example, the domain of GPS and Geographical (GIS) entities.
We would model the meaningful geographic entities (points, paths, regions) as classes in any desired programming language, and these classes would be a conceptual, "implementation-free" representation of these entities.
On the other hand, there are a lot of file formats that save these features with more or less the same meaning. In the GPS domain the most common file formats are GPX, KML, ShapeFile, WellKnownText, etc.
Supposing, then, I want to create a GpsFeatureCollection class which would contain a Points property, a Paths property, and so on. Also, I would implement classes like GpsReader, KmlReader, ShapeFileReader (and their respective Writers) and so on.
THE QUESTION IS:
Which is the best practice in OOAD:
Have GpsFeatureCollection instantiate a FileFormat(Reader/Writer) class?
Have GpsFeatureCollection implement Read/WriteFromFormat methods instead of separate classes?
Have each file-format reader instantiate an empty GpsFeatureCollection, populate it with data read from the file, and return the populated object?
Have a mediator class to avoid any dependency between FileFormatClass and ObjectModelClass?
None of the above?
"Well, it depends..."
I am really interested in doing "the right thing". My immediate plans are to use Python, but most probably this would matter for other languages too. This is causing some "analysis paralysis" in my pet project currently...
Here is my take: I pass reader and writer instances to read() and write() methods. This achieves a good level of decoupling and still provides the flexibility to pick various readers and writers.
The code uses Java-like syntax.
Declare a Reader interface; we assume multiple implementations such as KMLReader, ShapeFileReader, etc.
interface Reader {
    GpsFeatureCollection read();
}
Declare a Writer interface; we assume multiple implementations such as KMLWriter, ShapeFileWriter, etc.
interface Writer {
    void write(GpsFeatureCollection c);
}
Let's give the GpsFeatureCollection class read and write methods that accept the respective interfaces as a parameter to perform the job.
class GpsFeatureCollection {
    ...

    public static GpsFeatureCollection read(Reader r) {
        return r.read();
    }

    // Note: write must be an instance method, otherwise 'this' is not available.
    public void write(Writer w) {
        w.write(this);
    }
}
Some examples of usage with different readers and writers:

// Reading data
GpsFeatureCollection data = GpsFeatureCollection.read(new ShapeFileReader("/tmp/shapefile"));

// Writing data
data.write(new KMLWriter("/tmp/kmlfile"));
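For completeness, a skeletal, hypothetical implementation of one such reader (assuming GpsFeatureCollection can be constructed empty and populated); the shapefile decoding itself is elided, the point is only that all format-specific knowledge lives in its own class and the collection never sees it:

// Hypothetical skeleton of a format-specific reader.
class ShapeFileReader implements Reader {
    private final String path;

    public ShapeFileReader(String path) {
        this.path = path;
    }

    @Override
    public GpsFeatureCollection read() {
        GpsFeatureCollection collection = new GpsFeatureCollection();
        // open 'path', decode the shapefile records, and add the resulting
        // points/paths/regions to 'collection' here
        return collection;
    }
}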
I'm using Protocol Buffers as the wire data format in a client-server architecture. Domain objects (Java beans) go through the following life cycle:
Used in client-side business logic
Converted to protobuf format
Transmitted to the server
Converted back to a domain object
Used in server-side business logic
"Protocol Buffers and O-O Design" section in ProtoBuf documentation recommends wrapping generated class inside proper domain model.
I'd like to find-out the best appoach.
For e.g. I have a simple proto definition.
package customer;

option java_package = "com.example";
option java_outer_classname = "CustomerProtos";

message Customer {
    required string name = 1;
    optional string address = 2;
}
This is how the domain model is defined. As you can see, the data is stored entirely in the proto builder object.
package com.example;

public class CustomerModel
{
    private CustomerProtos.Customer.Builder builder = CustomerProtos.Customer.newBuilder();

    public String getName()
    {
        return builder.getName();
    }

    public void setName(String name)
    {
        builder.setName(name);
    }

    public String getAddress()
    {
        return builder.getAddress();
    }

    public void setAddress(String address)
    {
        builder.setAddress(address);
    }

    public byte[] serialize()
    {
        return builder.build().toByteArray();
    }
}
Is this a good practice? These objects are used in all phases of the life cycle, but we only require the protobuf format during the client-server transmission phase.
Is there any performance issue when accessing the proto builder's getter/setter methods, especially when the proto definition is complex and nested?
I have no experience with protocol buffers, but I would not recommend tailoring your domain objects to a specific serialization/transfer framework. You might regret that in the future.
The domain objects and logic of a software application should be as independent as possible from specific implementation issues (in your case serialization/transfer), because you want your domain to be easy to understand and be reusable/maintainable in the future.
If you want to define your domain objects independently of serialization/transfer, you have two options:
Before serialization/transfer, copy the information into protocol-buffers-specific objects and send them to your server. There you copy the information back into your domain objects (sketched in code below).
Use a non-protobuf serialization library like Kryo or ProtoStuff to transfer your domain objects to the server directly.
The disadvantages of option 1 are that your domain is defined twice (which is undesirable with respect to modifications) and that the information has to be copied back and forth (which produces error-prone, hard-to-maintain code).
The disadvantages of option 2 are that you lose schema evolution (although ProtoStuff apparently supports it) and that the complete (potentially large) object graph is serialized and transferred, although you could prune the object graph (manually or with JGT) before serialization/transfer.
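To make option 1 concrete, here is a minimal sketch using the Customer message from the question above; the plain Customer domain class and the mapper are assumptions for illustration, not part of any library:

class Customer {
    private String name;
    private String address;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAddress() { return address; }
    public void setAddress(String address) { this.address = address; }
}

class CustomerMapper {
    // Copy the domain object into the generated protobuf type before transfer.
    static CustomerProtos.Customer toProto(Customer domain) {
        CustomerProtos.Customer.Builder builder = CustomerProtos.Customer.newBuilder()
                .setName(domain.getName());
        if (domain.getAddress() != null) {
            builder.setAddress(domain.getAddress()); // 'address' is optional in the .proto
        }
        return builder.build();
    }

    // Copy the received protobuf message back into a domain object on the server.
    static Customer fromProto(CustomerProtos.Customer proto) {
        Customer domain = new Customer();
        domain.setName(proto.getName());
        if (proto.hasAddress()) {
            domain.setAddress(proto.getAddress());
        }
        return domain;
    }
}

This is exactly the duplicated definition and copy code that the first disadvantage above refers to.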
We've made protobuf-converter to solve the problem of transforming your domain model objects into Google Protobuf messages and vice versa.
How to use it:
Domain model classes that have to be transformed into protobuf messages must satisfy these conditions:
The class has to be marked with the @ProtoClass annotation, which contains a reference to the related protobuf message class.
Class fields have to be marked with the @ProtoField annotation. These fields must have getters and setters.
E.g.:
@ProtoClass(ProtobufUser.class)
public class User {

    @ProtoField
    private String name;

    @ProtoField
    private String password;

    // getters and setters for 'name' and 'password' fields
    ...
}
Code for converting a User instance into the related protobuf message:
User userDomain = new User();
...
ProtobufUser userProto = Converter.create().toProtobuf(ProtobufUser.class, userDomain);
Code for backward conversion:
User userDomain = Converter.create().toDomain(User.class, userProto);
Conversion of lists of objects is similar to single object conversion.
I have a slightly peculiar program which deals with cases very similar to this (in C#-like pseudo-code):
class CDataSet
{
    int m_nID;
    string m_sTag;
    float m_fValue;

    void PrintData()
    {
        //Blah Blah
    }
};

class CDataItem
{
    int m_nID;
    string m_sTag;
    CDataSet m_refData;
    CDataSet m_refParent;

    void Print()
    {
        if (null == m_refData)
        {
            m_refParent.PrintData();
        }
        else
        {
            m_refData.PrintData();
        }
    }
};
Members m_refData and m_refParent are initialized to null and used as follows:
m_refData -> Used when a new data set is added
m_refParent -> Used to point to an existing data set.
A new data set is added only if the field m_nID doesn't match an existing one.
Currently this code manages around 500 objects with around 21 fields per object, and the format of choice as of now is XML, which at 100k+ lines and 5 MB+ is very unwieldy.
I am planning to modify the whole shebang to use ProtoBuf, but currently I'm not sure how to handle the reference semantics. Any thoughts would be much appreciated.
Out of the box, protocol buffers does not have any reference semantics. You would need to cross-reference them manually, typically using an artificial key. Essentially, on the DTO layer you would add a key to CDataSet (that you simply invent, perhaps just an increasing integer), store the key instead of the item in m_refData/m_refParent, and run the fixup manually during serialization/deserialization. You could also just store the index into the set of CDataSet, but that may make insertion etc. more difficult. Up to you; since this is serialization, you could argue that you won't insert (etc.) outside of the initial population, and hence the raw index is fine and reliable.
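A rough sketch of that artificial-key approach (written in Java here, mirroring the field names of the pseudo-code above; the DTO classes and the key scheme are just one possible choice):

import java.util.Map;

// DTO layer: object references are replaced by invented integer keys.
class DataSetDto {
    int key;           // artificial key, e.g. an increasing integer assigned while serializing
    int id;
    String tag;
    float value;
}

class DataItemDto {
    int id;
    String tag;
    Integer dataKey;   // key of m_refData, or null
    Integer parentKey; // key of m_refParent, or null
}

class ReferenceFixup {
    // After deserialization, look each key up to restore the object references.
    static DataSetDto resolve(Integer key, Map<Integer, DataSetDto> setsByKey) {
        return key == null ? null : setsByKey.get(key);
    }
}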
This is, however, a very common scenario - so as an implementation-specific feature I've added optional (opt-in) reference tracking to my implementation (protobuf-net), which essentially automates the above under the covers (so you don't need to change your objects or expose the key outside of the binary stream).
Say I am parsing readings from a handheld device of some sort via an input stream. There are readings of different types, and each needs to be parsed differently.
Currently I have a class "handheld" that handles all parsing and creates reading objects of the appropriate type as required. It parses the reading and populates each reading object via its "set" methods.
I'm wondering though if the readings themselves should know how to parse the input stream. For instance, when the next reading comes along, should I instantiate the appropriate reading object and call a "parse" method on it, passing it in the input stream?
The main thing I don't like about this is that the parsing code is spread all over the place rather than kept neatly in one place. It does, however, get rid of the need for all those set methods, and the reading can just apply itself to the server/database/whatever when required via the "apply" method I have.
So which would be considered the "nicer" (or more OO) way?
I would go with the Factory design pattern.
Create a base class GeneralParser and a child class for each parser. If there is something common in the parsing logic, put it in the base GeneralParser's Parse method and call base.Parse from the child's Parse method.
I am sure you have a way to determine which parser to use; I think you're currently using control statements (if, switch, ...) to do the parsing. Instead of that, let the specialized (child) parser class handle it for you.
Pseudo class diagram:
GeneralParser
  |
  +--> XMLParser
  +--> JSONParser
Here is a sample implementation in C#/.NET:
public static class ParserFactory
{
    public static GeneralParser CreateXMLParser()
    {
        return new XMLParser();
    }

    public static GeneralParser CreateJsonParser()
    {
        return new JSONParser();
    }
}
In your program code, you may write something like this (pseudo-code); it depends on how you decide which parser to use.
// ...
GeneralParser parser;

if (to_be_parsed_as_xml)
{
    parser = ParserFactory.CreateXMLParser();
    parser.Parse(stream);
}
else if (to_be_parsed_as_json)
{
    parser = ParserFactory.CreateJsonParser();
    parser.Parse(stream);
}
// ...
You can create a parser on the fly (without keeping its reference) if you only need parsers to parse and nothing more.
I have a document scanning system where several types of documents are scanned. Initially, a document has no information when it's scanned; it then gets classified, and additional information is entered for it in a second step later. So, I have a base class called Document and subclasses for each type with their respective metadata, like below. I have it set up as a table-per-subclass (joined-subclass) mapping in NHibernate.
public class Document
{
    public int ID { get; set; }
    public string FilePath { get; set; }
}

public class Certificate : Document
{
    // certificate-specific fields
}

public class Correspondence : Document
{
    // correspondence-specific fields
}
What I need to be able to do is create a Document instance first and save it, then retrieve it in a second step later on, convert it to one of the subclass types, and fill in the rest of its information. What would be the best approach, and is this even possible with NHibernate? If at all possible I would like to retain the original document record, but it's not a dealbreaker if I have to jettison it.
Unfortunately, NHibernate does not allow you to switch between subclasses after the initial creation; to get this working the way you want, you have three options:
Use a native SQL call to change the discriminator and (possibly) add or change any subclass-related fields.
Copy the contents of your object to a new object of the proper class and then delete the original.
Don't use subclasses; control the state of your object through an enumeration or some other mechanism that allows you to determine its type at run-time.
This issue has already been discussed here. I would go with Terry Wilcox's tip to use a role for this. Composition over inheritance.