How to solve dependency when deserializing object graph - serialization

When I deserialize an object graph by having each object restore its own nodes and edges (the object-oriented way), I run into a serious dependency problem.
For example, if an object A references itself (a self-circular reference), it expects A to be in its complete original state while it is being restored, because that is the state it was in when it was serialized.
The self-referencing case can be detected, because the object knows it is being restored. But if A needs another object B, it expects B to be in its complete original state too. If B also references A, there is a circular reference back to A, and it becomes the same problem, except that B does not know A is still being restored. If B wants to use some property of A during deserialization, that property is not guaranteed to exist.
Fundamentally, this problem happens because an object needs the complete state of other objects while it is itself in an incomplete state, which doesn't make sense. I have thought about dividing the restore process into multiple phases, but it doesn't make any real difference, because the object remains in an incomplete state until all phases finish.
Can I have some advice or a good solution for this problem?
PS.
I started coding this as a replacement for Cocoa's NSKeyedArchiver, so I assumed that each object encodes itself (its internal state). That may make it different from general graph problems. But I can't exclude hidden state that only the object itself can reach...

For later reference...
I studied this problem a lot after posting this question, and I realized it is fundamentally impossible to solve. (This problem differs from general graph problems because in my case each node can hide edges, and each node has to resolve its own edge information when restoring.)
The core problem is dependency. Each node depends on its original state, which cannot be accessed while deserializing. But if a node doesn't depend on state that doesn't exist yet, it can be restored completely. And the only way to guarantee this is to give up the general graph structure.
So I decided to switch all my data structures to a tree structure. This puts a heavy limitation on the structure, but because a tree has no cyclic dependencies, every node can be restored with fully restored sub-node information.
And I found this. DAG. http://en.wikipedia.org/wiki/Directed_acyclic_graph
It's essentially a directed tree with shared children. I think a DAG structure is also fine, because its dependencies can be fully resolved too.
A DAG has a big limitation: the references cannot be cyclic. I have thought about using weak references, but they have the same problem: they need the full original state when deserializing.
This is a huge limitation, but I decided to trade it for a robust deserialization algorithm. I think shared child references are enough for me. A loss of robustness is unacceptable to me, so I decided to go this way.
With this, I think I can serialize/deserialize my data structure completely.
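For readers who want to see why a DAG restores cleanly, here is a minimal Python sketch of the idea (not the NSKeyedArchiver replacement itself). The `Node` class and the `serialize`/`deserialize` functions are hypothetical names for this example. Ids are assigned children-first, so on restore every child is already complete when its parent is built, and a shared child is stored only once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list[Node] = field(default_factory=list)


def serialize(root: Node) -> dict:
    """Flatten a DAG into records keyed by id; a shared child is stored once."""
    records: dict[int, dict] = {}
    ids: dict[int, int] = {}  # id(node) -> serial id

    def visit(node: Node) -> int:
        key = id(node)
        if key in ids:
            return ids[key]  # shared child: reuse its existing id
        # Children first, so a child always gets a smaller id than its parent.
        # (A cycle would recurse forever here; the input must be acyclic.)
        child_ids = [visit(child) for child in node.children]
        serial = len(ids)
        ids[key] = serial
        records[serial] = {"name": node.name, "children": child_ids}
        return serial

    return {"root": visit(root), "records": records}


def deserialize(data: dict) -> Node:
    """Rebuild nodes in ascending id order, so children are always complete."""
    restored: dict[int, Node] = {}
    for serial in sorted(data["records"]):
        record = data["records"][serial]
        children = [restored[c] for c in record["children"]]
        restored[serial] = Node(record["name"], children)
    return restored[data["root"]]
```

No node ever sees a half-restored dependency here, which is exactly the property that a cyclic graph cannot offer.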
Thanks to the Internet and Wikipedia.

Related

Does adding PetaPoco attributes to POCO's have any negative side effects?

Our current application uses a smart object style for working with the database. We are looking at the feasibility of moving to PetaPoco instead. Looking over the features I notice you can add attributes to make it easier to CRUD objects. Does adding these attributes have any negative side effects that I should be aware of?
Has anyone found a reason NOT to use these decorators?
Directly to the use of the POCO object instance itself? None.
At least not that I would be aware of. Jon Skeet should be able to provide more info because he knows compiler inner workings through and through, so he knows exactly what happens with this metadata after it's been compiled.
Other implications indirectly related to these
There are of course implications when accessing these declarative attributes, because they're read using reflection which is normally a slow process.
But there's nothing to worry about here, because PetaPoco is a smart library: it reads these only once, then compiles and caches the result, so you are penalized only once and get blazing performance afterwards, because it uses compiled code.
Non-performance related implications
By putting attributes (any attributes) on your classes/properties/methods, you bind your code to some degree to the particular engine that will use the class, because the attributes are directives that help this particular engine understand your code.
In the case of PetaPoco attributes, this means that your class can be used with PetaPoco but not with some other DAL (e.g. EF) unless you add that one's attributes as well (EF Code First uses the very same approach with attributes).
The second implication is related to the back-end database. If you rename a table, column or any other part that is given in your PetaPoco attribute as a constant magic string, you will have to change this string as well. This just means that you have to be thorough when making database changes...
One downside is that it breaks the separation between the "domain" layer and the "data" layer, since it introduces the PetaPoco file (which contains data logic) to domain classes that should really not have any knowledge or dependency on the data layer.
If you're doing a single-project MVC app or something then it's okay to just use the Models directory for both, but for non-trivial and separated apps you'll have to have two PetaPoco files or play around with abstracting portions of the file in order to annotate your models without making them "know too much" about the underlying data, or else have you specify the table and/or primary key name all over the place.

Jackson serialization should ignore already included objects

So this seems like a fairly simple answer to a common problem: the infinite loop detected in Jackson. If, when serializing an object tree, Jackson comes upon an object it has already serialized, why doesn't it just ignore it? Is there a way to do this in Jackson, or has someone created something similar?
Why all this mucking around with JsonManagedReference/JsonBackReference, which is completely insufficient if you serialize child objects some of the time (which need a reference to the parent) and parent objects some of the time (which obviously don't want the child to refer back to the parent)?
It seems like now I have to create custom views that take into account every type of circular reference and use case possible which in any non-trivial ORM is a huge task.
EDIT (October 2012)
Jackson 2.x actually now DOES support identity information handling with the @JsonIdentityInfo annotation! So the original answer is a bit out of date...
OBSOLETE
Jackson does not support handling of object identity. This is a non-trivial task, not so much because of identifying shared objects, which can be done by traversing the object graph (incurring some overhead), but rather because of figuring out how to include identity information: which ids to use and how. This in turn is somewhat similar to the inclusion of type information, but it adds a second dimension of extra wrapping to handle.
Doing this has been requested before, and some thought has gone into figuring out how to do it, but the ratio of effort to benefit (i.e. number of requests, how badly it is needed) has been higher than for adding other features.
So your best bet is to use wrapper objects and implement this manually, or have a look at XStream, which can solve this (when enabled; it adds significant overhead in time) and also has a JSON output mode using Jettison.
Implementing this manually for your use case is a bit easier than solving the general case: you could start with BeanSerializerModifier to add a wrapper handler that keeps track of object identities and knows when to serialize an object id instead.
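To illustrate the general idea of tracking identities and writing an id instead of a repeated object, here is a minimal Python sketch. It is not Jackson's implementation or its output format; the `encode` function and the `@id`/`@ref` keys are hypothetical names chosen for this example, and it only handles plain dicts and lists.

```python
import json


def encode(obj, seen=None):
    """Convert nested dicts/lists to JSON-safe data, writing each dict only once."""
    if seen is None:
        seen = {}  # id(obj) -> identity number assigned on first visit
    if isinstance(obj, dict):
        if id(obj) in seen:
            # Already written earlier: emit a reference instead of recursing.
            return {"@ref": seen[id(obj)]}
        seen[id(obj)] = len(seen) + 1
        out = {"@id": seen[id(obj)]}
        for key, value in obj.items():
            out[key] = encode(value, seen)
        return out
    if isinstance(obj, list):
        return [encode(item, seen) for item in obj]
    return obj


# A parent/child pair that refer to each other (a cycle).
parent = {"name": "parent", "children": []}
child = {"name": "child", "parent": parent}
parent["children"].append(child)

text = json.dumps(encode(parent))  # plain json.dumps(parent) would raise ValueError
```

Whether the first occurrence is the parent or the child depends only on where serialization starts, which is what makes this approach work for both use cases in the question.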

Communication in component-based game engine

For a 2D game I'm making (for Android) I'm using a component-based system where a GameObject holds several GameComponent objects. GameComponents can be things such as input components, rendering components, bullet emitting components, and so on. Currently, GameComponents have a reference to the object that owns them and can modify it, but the GameObject itself just has a list of components and it doesn't care what the components are as long as they can be updated when the object is updated.
Sometimes a component has some information which the GameObject needs to know. For example, for collision detection a GameObject registers itself with the collision detection subsystem to be notified when it collides with another object. The collision detection subsystem needs to know the object's bounding box. I store x and y in the object directly (because it is used by several components), but width and height are only known to the rendering component which holds the object's bitmap. I would like to have a method getBoundingBox or getWidth in the GameObject that gets that information. Or in general, I want to send some information from a component to the object. However, in my current design the GameObject doesn't know what specific components it has in the list.
I can think of several ways to solve this problem:
Instead of having a completely generic list of components, I can let the GameObject have specific field for some of the important components. For example, it can have a member variable called renderingComponent; whenever I need to get the width of the object I just use renderingComponent.getWidth(). This solution still allows for generic list of components but it treats some of them differently, and I'm afraid I'll end up having several exceptional fields as more components need to be queried. Some objects don't even have rendering components.
Have the required information as members of the GameObject but allow the components to update it. So an object has a width and a height which are 0 or -1 by default, but a rendering component can set them to the correct values in its update loop. This feels like a hack and I might end up pushing many things to the GameObject class for convenience even if not all objects need them.
Have components implement an interface that indicates what type of information they can be queried for. For example, a rendering component would implement the HasSize interface which includes methods such as getWidth and getHeight. When the GameObject needs the width, it loops over its components checking if they implement the HasSize interface (using the instanceof keyword in Java, or is in C#). This seems like a more generic solution, one disadvantage is that searching for the component might take some time (but then, most objects have 3 or 4 components only).
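The third option can be sketched roughly as follows, in Python rather than Java, with `isinstance` playing the role of `instanceof`. The `HasSize` protocol follows the name used above; `find` and `get_bounding_box` are hypothetical method names, and the component classes are stand-ins.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasSize(Protocol):
    """Capability interface: anything that can report a width and a height."""

    def get_width(self) -> int: ...
    def get_height(self) -> int: ...


class RenderingComponent:
    def __init__(self, width, height):
        self._width = width
        self._height = height

    def update(self):
        pass

    def get_width(self):
        return self._width

    def get_height(self):
        return self._height


class InputComponent:
    def update(self):
        pass


class GameObject:
    def __init__(self, components):
        self.x = 0
        self.y = 0
        self.components = list(components)

    def find(self, capability):
        """Return the first component offering the capability, or None."""
        for component in self.components:
            if isinstance(component, capability):
                return component
        return None

    def get_bounding_box(self):
        sized = self.find(HasSize)
        if sized is None:
            return None  # e.g. an object with no rendering component
        return (self.x, self.y, sized.get_width(), sized.get_height())
```

With at most eight components per object the linear search is cheap, and the result could be cached if it ever showed up in a profile.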
This question isn't about a specific problem. It comes up often in my design and I was wondering what's the best way to handle it. Performance is somewhat important since this is a game, but the number of components per object is generally small (the maximum is 8).
The short version
In a component based system for a game, what is the best way to pass information from the components to the object while keeping the design generic?
We get variations on this question three or four times a week on GameDev.net (where the gameobject is typically called an 'entity') and so far there's no consensus on the best approach. Several different approaches have been shown to be workable however so I wouldn't worry about it too much.
However, usually the problems are about communicating between components. Rarely do people worry about getting information from a component to the entity: if an entity knows what information it needs, then presumably it knows exactly what type of component it needs to access and which property or method it needs to call on that component to get the data. If you need to be reactive rather than active, then register callbacks or set up an observer pattern with the components to let the entity know when something in the component has changed, and read the value at that point.
Completely generic components are largely useless: they need to provide some sort of known interface otherwise there's little point them existing. Otherwise you may as well just have a large associative array of untyped values and be done with it. In Java, Python, C#, and other slightly-higher-level languages than C++ you can use reflection to give you a more generic way of using specific subclasses without having to encode type and interface information into the components themselves.
As for communication:
Some people are making assumptions that an entity will always contain a known set of component types (where each instance is one of several possible subclasses) and therefore can just grab a direct reference to the other component and read/write via its public interface.
Some people are using publish/subscribe, signals/slots, etc., to create arbitrary connections between components. This seems a bit more flexible but ultimately you still need something with knowledge of these implicit dependencies. (And if this is known at compile time, why not just use the previous approach?)
Or, you can put all shared data in the entity itself and use that as a shared communication area (tenuously related to the blackboard system in AI) that each of the components can read and write to. This usually requires some robustness in the face of certain properties not existing when you expected them to. It also doesn't lend itself to parallelism, although I doubt that's a massive concern on a small embedded system...?
Finally, some people have systems where the entity doesn't exist at all. The components live within their subsystems and the only notion of an entity is an ID value in certain components - if a Rendering component (within the Rendering system) and a Player component (within the Players system) have the same ID, then you can assume the former handles the drawing of the latter. But there isn't any single object that aggregates either of those components.
Like others have said, there's no always-right answer here. Different games will lend themselves to different solutions. If you're building a big complex game with lots of different kinds of entities, a more decoupled generic architecture with some kind of abstract messaging between components may be worth the effort for the maintainability you get. For a simpler game with similar entities, it may make the most sense to just push all of that state up into GameObject.
For your specific scenario where you need to store the bounding box somewhere and only the collision component cares about it, I would:
Store it in the collision component itself.
Make the collision detection code work with the components directly.
So, instead of having the collision engine iterate through a collection of GameObjects to resolve the interaction, have it iterate directly through a collection of CollisionComponents. Once a collision has occurred, it will be up to the component to push that up to its parent GameObject.
This gives you a couple of benefits:
Leaves collision-specific state out of GameObject.
Spares you from iterating over GameObjects that don't have collision components. (If you have a lot of non-interactive objects like visual effects and decoration, this can save a decent number of cycles.)
Spares you from burning cycles walking between the object and its component. If you iterate through the objects then do getCollisionComponent() on each one, that pointer-following can cause a cache miss. Doing that for every frame for every object can burn a lot of CPU.
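As a rough sketch of that arrangement (in Python for brevity; the class and method names such as `CollisionSystem`, `register` and `on_collision` are hypothetical), the collision code holds its own list of components and only touches the owning object when a hit actually occurs:

```python
class GameObject:
    def __init__(self, name):
        self.name = name
        self.hits = []

    def on_collision(self, other):
        # The component pushes the event up to its parent object.
        self.hits.append(other.name)


class CollisionComponent:
    """Owns the bounding box, so GameObject carries no collision state."""

    def __init__(self, owner, x, y, width, height):
        self.owner = owner
        self.x, self.y = x, y
        self.width, self.height = width, height

    def overlaps(self, other):
        return (self.x < other.x + other.width
                and other.x < self.x + self.width
                and self.y < other.y + other.height
                and other.y < self.y + self.height)


class CollisionSystem:
    def __init__(self):
        self.components = []  # only objects that can collide appear here

    def register(self, component):
        self.components.append(component)

    def update(self):
        # Naive pairwise test over components, never over GameObjects.
        for i, a in enumerate(self.components):
            for b in self.components[i + 1:]:
                if a.overlaps(b):
                    a.owner.on_collision(b.owner)
                    b.owner.on_collision(a.owner)
```

The pairwise loop is only for illustration; a real engine would likely use some spatial partitioning, but the ownership of the data stays the same.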
If you're interested I have more on this pattern here, although it looks like you already understand most of what's in that chapter.
Use an "event bus". (note that you probably can't use the code as is but it should give you the basic idea).
Basically, create a central resource where every object can register itself as a listener and say "If X happens, I want to know". When something happens in the game, the responsible object can simply send an event X to the event bus and all interested parties will be notified.
[EDIT] For a more detailed discussion, see message passing (thanks to snk_kid for pointing this out).
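A minimal sketch of such a bus, assuming string event names and keyword payloads (the `EventBus` class with its `subscribe` and `publish` methods is a hypothetical design for illustration, not the code referred to above):

```python
from collections import defaultdict


class EventBus:
    def __init__(self):
        self._listeners = defaultdict(list)  # event name -> callbacks

    def subscribe(self, event_type, callback):
        """Register interest: 'if event_type happens, I want to know'."""
        self._listeners[event_type].append(callback)

    def publish(self, event_type, **payload):
        """Notify every listener registered for this event type."""
        # Copy the list so a callback may subscribe during delivery.
        for callback in list(self._listeners.get(event_type, [])):
            callback(**payload)


bus = EventBus()
received = []
bus.subscribe("collision", lambda a, b: received.append((a, b)))
bus.publish("collision", a="ship", b="rock")
bus.publish("score", points=10)  # nobody listens, so nothing happens
```

The sender never learns who is listening, which is the decoupling this answer is after.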
One approach is to initialize a container of components. Each component can provide a service and may also require services from other components. Depending on your programming language and environment you have to come up with a method for providing this information.
In its simplest form you have one-to-one connections between components, but you will also need one-to-many connections. E.g. the CollisionDetector will have a list of components implementing IBoundingBox.
During initialization the container will wire up connections between components, and during run-time there will be no additional cost.
This is close to your solution 3), except that the connections between components are wired only once and are not checked at every iteration of the game loop.
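One way this wiring could look, as a small Python sketch: the `provides`/`requires` class attributes and the `Container.wire` method are assumptions made up for this example (they are not MEF's API), and the detector class is named `CollisionDetector` here.

```python
class RenderingComponent:
    provides = ("bounding_box",)  # services this component offers

    def __init__(self, width, height):
        self.width = width
        self.height = height

    def bounding_box(self):
        return (self.width, self.height)


class CollisionDetector:
    requires = "bounding_box"  # service this component needs (one-to-many)

    def __init__(self):
        self.providers = []  # filled in once by the container

    def boxes(self):
        return [p.bounding_box() for p in self.providers]


class Container:
    def __init__(self, components):
        self.components = list(components)

    def wire(self):
        """Connect each consumer to every provider of the service it requires."""
        for consumer in self.components:
            wanted = getattr(consumer, "requires", None)
            if wanted is None:
                continue
            consumer.providers = [
                c for c in self.components
                if wanted in getattr(c, "provides", ())
            ]
```

After `wire()` has run once during initialization, the detector just walks its `providers` list; no lookup happens inside the game loop.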
The Managed Extensibility Framework for .NET is a nice solution to this problem. I realize that you intend to develop on Android, but you may still get some inspiration from this framework.

Passing object references needlessly through a middleman

I often find myself needing reference to an object that is several objects away, or so it seems. The options I see are passing a reference through a middle-man or just making something available statically. I understand the danger of global scope, but passing a reference through an object that does nothing with it feels ridiculous. I'm okay with a little bit passing around, I suppose. I suspect there's a line to be drawn somewhere.
Does anyone have insight on where to draw this line?
Or a good way to deal with the problem of distributing references amongst dependent objects?
Use the Law of Demeter (with moderation and good taste, not dogmatically). If you're coding a.b.c.d.e, something IS wrong -- you've nailed forevermore the implementation of a to have a b which has a c which... EEP!-) One or at the most two dots is the maximum you should be using. But the alternative is NOT to plump things into globals (and ensure thread-unsafe, buggy, hard-to-maintain code!), it is to have each object "surface" those characteristics it is designed to maintain as part of its interface to clients going forward, instead of just letting poor clients go through such unending chains of nested refs!
This smells of an abstraction that may need some improvement. You seem to be violating the Law of Demeter.
In some cases a global isn't too bad.
Consider, you're probably programming against an operating system's API. That's full of globals, you can probably access a file or the registry, write to the console. Look up a window handle. You can do loads of stuff to access state that is global across the whole computer, or even across the internet... and you don't have to pass a single reference to your class to access it. All this stuff is global if you access the OS's API.
So, when you consider the number of global things that often exist, a global in your own program probably isn't as bad as many people try and make out and scream about.
However, if you want to have very nice OO code that is all unit testable, I suppose you should be writing wrapper classes around any access to globals, whether they come from the OS or are declared by you, to encapsulate them. This means your class that uses this global state can get references to the wrappers, and they can be replaced with fakes.
Hmm, anyway. I'm not quite sure what advice I'm trying to give here, other than say, structuring code is all a balance! And, how to do it for your particular problem depends on your preferences, preferences of people who will use the code, how you're feeling on the day on the academic to pragmatic scale, how big the code base is, how safety critical the system is and how far off the deadline for completion is.
I believe your question is revealing something about your classes. Maybe the responsibilities could be improved? Maybe moving some code would solve problems?
Tell, don't ask.
That's how it was explained to me. There is a natural tendency to call classes to obtain some data. Taken too far, asking too much typically leads to heavy "getter sequences". But there is another way. I must admit it is not easy to find, but it gets easier gradually, both in a specific codebase and in the coder's habits.
Class A wants to perform a calculation, and asks B's data. Sometimes, it is appropriate that A tells B to do the job, possibly passing some parameters. This could replace B's "getName()", used by A to check the validity of the name, by an "isValid()" method on B.
"Asking" has been replaced by "telling" (calling a method that executes the computation).
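The getName()/isValid() example can be sketched like this in Python. The class names and the validity rule (non-empty and alphanumeric) are assumptions invented for illustration.

```python
class AskedAccount:
    """'Ask' style: exposes its data and lets callers decide what it means."""

    def __init__(self, name):
        self._name = name

    def get_name(self):
        return self._name


def is_acceptable(account):
    # The caller pulls the data out and applies the rule itself.
    name = account.get_name()
    return bool(name) and name.isalnum()


class ToldAccount:
    """'Tell' style: the rule lives next to the data it needs."""

    def __init__(self, name):
        self._name = name

    def is_valid(self):
        return bool(self._name) and self._name.isalnum()
```

In the second version the getter disappears entirely, and a change to the rule touches only the class that owns the name.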
For me, this is the question I ask myself when I find too many getter calls. Gradually, the methods find their place in the correct object, and everything gets a bit simpler: I have fewer getters and fewer calls to them. I have less code, and it carries more meaning and aligns better with the functional requirements.
Move the data around
There are other cases where I move some data. For example, if a field moves two objects up, the length of the "getter chain" is reduced by two.
I believe nobody can find the correct model at first.
I first think about it (using hand-written diagrams is quick and a big help), then code it, then think again facing the real thing... Then I code the rest, and any smells I feel in the code, I think again...
Split and merge objects
If a method on A needs data from C, with B as a middle man, I can check whether A and C have something in common. Possibly A, or a part of A, could become C (splitting A, or merging A and C) ...
However, there are cases where I keep the getters of course.
But it's less likely a long chain will be created.
A long chain will probably get broken by one of the techniques above.
I have three patterns for this:
Pass the necessary reference to the object's constructor -- the reference can then be stored as a data member of the object, and doesn't need to be passed again; this implies that the object's factory has the necessary reference. For example, when I'm creating a DOM, I pass the element name to the DOM node when I construct the DOM node.
Let things remember their parent, and get references to properties via their parent; this implies that the parent or ancestor has the necessary property. For example, when I'm creating a DOM, there are various things which are stored as properties of the top-level DomDocument ancestor, and its child nodes can access those properties via the reference which each one has to its parent.
Put all the different things which are passed around as references into a single class, and then pass around just that one class instance as the only thing that's passed around. For example, there are many properties required to render a DOM (e.g. the GDI graphics handle, the viewport coordinates, callback events, etc.) ... I put all of these things into a single 'Context' instance which is passed as the only parameter to the methods of the DOM nodes to be rendered, and each method can get whichever properties it needs out of that context parameter.
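The third pattern might look roughly like this in Python. `RenderContext` and `DomNode` are hypothetical names, and a plain list of strings stands in for the real graphics handle.

```python
from dataclasses import dataclass, field


@dataclass
class RenderContext:
    """Bundles everything rendering needs into one object that is passed along."""

    viewport: tuple                             # (x, y, width, height)
    output: list = field(default_factory=list)  # stand-in for a graphics handle


class DomNode:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def render(self, context, depth=0):
        # Each node takes only what it needs from the shared context.
        context.output.append("  " * depth + self.name)
        for child in self.children:
            child.render(context, depth + 1)


context = RenderContext(viewport=(0, 0, 800, 600))
DomNode("html", [DomNode("body", [DomNode("p")])]).render(context)
```

Adding another rendering property later means adding one field to the context, not another parameter to every `render` signature.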

Reading a pointer from XML without being sure the relevant Obj-C instance exists

I have a "parent" Obj-C object containing (in a collection) a bunch of objects whose instance variables point to one another, possibly circularly (fear not, no retaining going on between these "siblings"). I write the parent object to XML, which of course involves (among other things) writing out its "children", in no particular order, and due to the possible circularity, I replace these references between the children with unique IDs that each child has.
The problem is reading this XML back in... as I create one "child", I come across an ID, but there's no guarantee the object it refers to has been created yet. Since the references are possibly circular, there isn't even an order in which to read them that solves this problem.
What do I do? My current solution is to replace (in the actual instance variables) the references with strings containing the unique IDs. This is nasty, though, because to use these instance variables, instead of something like [oneObject aSibling] I now have to do something like [theParent childWithID:[oneObject aSiblingID]]. I suppose I could create an aSibling method to simplify things, but it feels like there's a cleaner way than all this. Is there?
This sounds an awful lot like you are re-inventing NSCoding as it handles circular references, etc... Now, there might be a good reason to re-invent that wheel. Only you can answer that question.
In any case, sounds like you want a two pass unarchival process.
Pass 1: Grab all the objects out of the backing store and reconstitute. As each object comes out, shove it in a dictionary or map with the UID as the key. Whenever an object contains a UID, register the object as needing to be fixed up; add it to a set or array that you keep around during unarchival.
Pass 2: Walk the set or array of objects that need to be fixed up and fix 'em up, replacing the UIDs with objects from the map you built in pass #1.
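Here is a small Python sketch of the two passes, with plain dicts standing in for the parsed XML records. The `Child` class, the `unarchive` function and the `uid`/`sibling` field names are assumptions for illustration.

```python
class Child:
    def __init__(self, uid):
        self.uid = uid
        self.sibling = None  # real object reference, filled in during pass 2


def unarchive(records):
    """records: iterable of {"uid": ..., "sibling": uid or None}, in any order."""
    by_uid = {}
    fixups = []  # (object, uid it refers to) pairs still to be resolved

    # Pass 1: create every object and remember which ones hold a UID.
    for record in records:
        child = Child(record["uid"])
        by_uid[child.uid] = child
        if record.get("sibling") is not None:
            fixups.append((child, record["sibling"]))

    # Pass 2: every object exists now, so each UID can be resolved.
    for child, sibling_uid in fixups:
        child.sibling = by_uid[sibling_uid]

    return by_uid


children = unarchive([
    {"uid": "a", "sibling": "b"},
    {"uid": "b", "sibling": "a"},  # circular reference back to "a"
    {"uid": "c", "sibling": None},
])
```

After the second pass the instance variables hold real object references again, so callers can follow `sibling` directly instead of looking up ids through the parent.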
I hit a bit of parse error on that last paragraph. Assuming your classes are sensibly declared, they ought to be able to repair themselves on the fly.
(All things considered, this is exactly the kind of data structure that is much easier to implement in a GC'd environment. If you are targeting Mac OS X, not the iPhone, turning on GC is going to make your life easier, most likely)
Java's serialization process does much the same thing. Every object it writes out, it puts in a 'previously seen objects' table. When it comes to writing out a subsequent reference, if it has seen the object before, it writes out a code which indicates that it's a previously seen object from the list. When reading the data back, whenever it sees such a reference, it replaces it on the fly with the instance it read earlier.
That approach means that you don't have to use this map for all instances, but rather the substitution happens only for objects you've seen a second time. However, you still need to be able to uniquely reference the first instance you've got written, whether by some pointer to a part in the data structure or not is dependent on what you're writing.