Code design: Who's responsible for changing object data?

Assume I have some kind of data structure to work on (for example, images) which I want to pre- and post-process in different ways to make further processing steps easier. What's the best way to implement this responsibility in an OOP language like C++?
Further assume I have a lot of different processing algorithms with inherent complexity, which I very likely want to encapsulate in dedicated classes. This means, though, that the algorithm implementations have to externally set some kind of flag in my data to indicate that it has been processed. That doesn't look like clean design to me either, because "having been processed" seems like information associated with the data, and thus something the data object itself should determine and set on its own.
It also looks like a very common source of error in complex applications: someone implements another processing algorithm, forgets to set the flags in the data appropriately, something in a completely different part of the application won't work as expected, and someone will have lots of fun spotting the error.
Can someone outline a general structure of a good and fail-safe way to implement something like this?

To make sure I understand what you are asking, here are my assumptions based on my reading of the question:
The data is some kind of binary format (presumably an image but as you say it could be anything) that can be represented as an array of bytes
There are a number of processing steps (I'll refer to them as transformations) that can be applied to the data
Some transformations depend on others, such that, for example, you would like to avoid applying a transformation if its prerequisite has not been applied. You would like this to be robust, so that attempting to apply an illegal transformation is detected and prevented.
And the question is how to do this in an object-oriented way that avoids future bugs as the complexity of the program increases.
One way is to have the image data object, which encapsulates both the binary data and a record of the transformations that have been applied to it, be responsible for executing each transformation through a Transformer delegate; the Transformer objects implement both the processing algorithm and the knowledge of whether it can be applied given the previous transformations.
So you might define the following (excuse my Java-like naming style; it's been a long time since I've done C++):
An enumerated type called TransformationType
An abstract class called Transformer, with the following methods:
A method called 'getType' which returns a TransformationType
A method called 'canTransform' that accepts a list of TransformationType and returns a boolean. The list indicates transformations that have already been applied to the data, and the boolean indicates whether it is OK to execute this transformation.
A method called 'transform' that accepts an array of bytes and returns an array of (presumably modified) bytes
A class called BinaryData, containing a byte array and a list of TransformationType. This class implements the method 'void transform(Transformer t)' to do the following:
Query the transformer's 'canTransform' method, passing the list of transformation types; either throw an exception or return if canTransform returns false
Replace the byte array with the result of invoking t.transform(data)
Add the transformer's type to the list
I think this accomplishes what you want: the image transformation algorithms are defined polymorphically in classes, but the actual application of the transformations is still 'controlled' by the data object. Hence we do not have to trust external code to do the right thing with respect to setting and checking flags.
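A minimal C++ sketch of this outline; the transformation names and the prerequisite rule are hypothetical stand-ins, not part of the original question:

#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class TransformationType { Grayscale, Blur };

using Applied = std::vector<TransformationType>;

class Transformer
{
public:
    virtual ~Transformer() = default;
    virtual TransformationType getType() const = 0;
    // May this transformation run, given what has already been applied?
    virtual bool canTransform(const Applied & applied) const = 0;
    virtual std::vector<std::uint8_t> transform(const std::vector<std::uint8_t> & data) const = 0;
};

class BinaryData
{
    std::vector<std::uint8_t> bytes_;
    Applied applied_;
public:
    explicit BinaryData(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}
    // The data object stays in control: it checks, applies, and records.
    void transform(const Transformer & t)
    {
        if (!t.canTransform(applied_))
            throw std::logic_error("prerequisite transformation missing");
        bytes_ = t.transform(bytes_);
        applied_.push_back(t.getType());
    }
};

// Hypothetical example: blur requires grayscale to have run first.
class Blur : public Transformer
{
public:
    TransformationType getType() const override { return TransformationType::Blur; }
    bool canTransform(const Applied & a) const override
    {
        return std::find(a.begin(), a.end(), TransformationType::Grayscale) != a.end();
    }
    std::vector<std::uint8_t> transform(const std::vector<std::uint8_t> & d) const override
    {
        return d; // the real blur algorithm is omitted for brevity
    }
};

int main()
{
    BinaryData data(std::vector<std::uint8_t>{1, 2, 3});
    Blur blur;
    try { data.transform(blur); }          // throws: grayscale was never applied
    catch (const std::logic_error &) { /* illegal transformation prevented */ }
}

A new algorithm's author only has to implement the Transformer interface; the flag bookkeeping lives entirely in BinaryData and cannot be forgotten.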

How do you call an object whose state can be completely described by its string representation?

Is there a name in the OOP world to refer to such objects? For example, in Java
"Word".toString();
will return the output Word. This is a string representation of the entity that currently exists in the program.
The same can be done with other datatypes like Double and Integer, and maybe even lists or other data structures.
Some other, more complex objects cannot be represented in this way; for example, a full-fledged RESTful service class might not have a string representation of its current state.
What's the right terminology? Native? Immutable? Those last two terms don't really reflect this definition.
To expand on the question:
Imagine you have a function/method that converts a string to a map; the string could be {key1=value1,key2=value2} and you would get a map back. This doesn't work for some complex objects. How would you describe the parameters of this function if you were to generalize its use for other simple object types?
You have an abstract object that consists of internal state.
You have one or more concrete representations of that object's state.
In one case the concrete representation is a chunk of memory containing primitives and references to other component objects on the heap (in Java, other languages may be different).
You have a different representation that is amenable to being stored in a contiguous block of characters or bytes, and possibly transmitted over a network.
Both representations are canonically equivalent given equivalent contexts containing their non-state information (methods, class hierarchy, etc), but they serve different purposes.
Generically, this could be called a "change of representation". When the first representation above is converted to the second it's called "serialization", and the reverse process is "deserialization". Note that you could have many different representations fulfilling different requirements and supporting different functionality.
One important point to note is that in both cases, in-memory and "serialized" (and any other representations), if an object's state contains references to other objects, then the entire "state" consists of that object and all the objects that can be reached from it, and objects reachable from those objects, etc. This is known as an "object graph", and it exists equally in all representations.
As to deciding which one you should or shouldn't use, that depends totally on your processing requirements.
for example, a full-fledged RESTful service class might not have a string representation of its current state
This is incorrect, you can always define a serialized representation of an object's state. It may be inconvenient to do so, but if it is required it can be done.
Imagine you have a function/method that converts a string to a map; the string could be {key1=value1,key2=value2} and you would get a map back. This doesn't work for some complex objects.
Again, it can always be made to work if it is a requirement, as long as the cost of doing so is justified.
In summary, everything is a representation, and you can arrange to transform one representation to any other and back again, without loss, assuming you're willing to incur the costs of doing so. As mentioned above, one factor is the cost of representing not just the single object, but the entire object graph, which can be substantial.
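To make the "change of representation" idea concrete, here is a minimal C++ sketch of the string-to-map example from the question; the function names are illustrative, not standard:

#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Parse "{key1=value1,key2=value2}" into a map (deserialization).
std::map<std::string, std::string> fromString(const std::string & s)
{
    std::map<std::string, std::string> m;
    std::stringstream ss(s.substr(1, s.size() - 2)); // strip the braces
    std::string pair;
    while (std::getline(ss, pair, ','))
    {
        auto eq = pair.find('=');
        m[pair.substr(0, eq)] = pair.substr(eq + 1);
    }
    return m;
}

// Produce the string representation of the map (serialization).
std::string toString(const std::map<std::string, std::string> & m)
{
    std::string out = "{";
    for (const auto & kv : m)
    {
        if (out.size() > 1)
            out += ",";
        out += kv.first + "=" + kv.second;
    }
    return out + "}";
}

int main()
{
    auto m = fromString("{key1=value1,key2=value2}");
    std::cout << toString(m) << '\n'; // round-trips to the same representation
}

The two functions are exactly the two directions of the change of representation described above; for a more complex object you would recurse through its object graph the same way.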

OOP and object parametrization

I am supposed to develop a program that will heavily depend on input data at runtime (initialization data, read from XML), and I would like to ask about good OOP practice regarding object/architecture design.
Situation
I have the following objects: object_A, object_B, and object_C. Each of them has a specified objective.
object_A = evaluation of equations, requires input, produces output
object_B = evaluation of equations, requires input, produces output
object_C = requires data from object_A and object_B as input, produces output
Then there is object_D, which passes the data around and calls functions among objects_A/_B/_C.
There are two ways to tackle this situation that I know of:
a) Inheritance
object_D inherits from object_A, object_B, object_C. Data is passed by referring to the appropriate structures in objects_A/_B/_C using "this->"; virtual functions in objects_A/_B/_C can then call back to object_D.
hierarchical approach
objects are concealed
difficult to parametrize objects_A/_B/_C (parameters need to travel all the way up the hierarchy to the base classes)
b) Passing pointers
Create object_A/_B/_C, by passing parameters in constructor. Then pass pointers of these objects to constructor of object_D.
no information hiding, all objects are visible
hierarchy might be unclear, especially when there are more levels
easy to pass initialization parameters
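A minimal sketch of approach (b), with hypothetical stand-ins for the objects, showing how runtime parameters flow in through constructors and how pointers are then handed to object_D:

#include <iostream>

class ObjectA
{
    double gain_;
public:
    explicit ObjectA(double gain) : gain_(gain) {} // runtime parameter from XML
    double evaluate(double in) const { return gain_ * in; }
};

class ObjectB
{
    double offset_;
public:
    explicit ObjectB(double offset) : offset_(offset) {}
    double evaluate(double in) const { return in + offset_; }
};

class ObjectC
{
    const ObjectA * a_;
    const ObjectB * b_;
public:
    ObjectC(const ObjectA * a, const ObjectB * b) : a_(a), b_(b) {}
    double evaluate(double in) const { return a_->evaluate(in) + b_->evaluate(in); }
};

class ObjectD
{
    ObjectC * c_;
public:
    explicit ObjectD(ObjectC * c) : c_(c) {}
    double run(double in) const { return c_->evaluate(in); }
};

int main()
{
    // Parameters that would come from the XML input at runtime:
    ObjectA a(2.0);
    ObjectB b(1.5);
    ObjectC c(&a, &b);
    ObjectD d(&c);
    std::cout << d.run(3.0) << '\n'; // prints 10.5
}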
Question
What is an appropriate way of handling software architecture, where many objects require passing initialization parameters at runtime?
Your question is broad and can have more than one good answer. However, I think your scenario can be solved in one of two ways:
Eventing: Instead of tightly coupling your classes using inheritance, you can use events. For instance, when object_A finishes processing it raises an event called 'ClassAFinished'. You then create an event handler for the ClassAFinished event that in turn passes object_A's output to the other objects that rely on it.
The second way is the Chain of Responsibility design pattern. Since your question is related to OOP, I think it's reasonable to use this design pattern. In a nutshell, Chain of Responsibility is a design pattern you use when you have a series (chain) of objects, each of which does some specific processing (its responsibility), but each one can't begin processing until it has received data from the previous object. When it finishes processing, it sends its output to the next object, and so forth (see the sketch below).
These are the two main ideas I wanted to share with you.
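A minimal C++ sketch of the Chain of Responsibility idea, with hypothetical processing steps; the constructor of Offset shows where a runtime parameter would enter:

#include <iostream>
#include <memory>
#include <vector>

// Each handler performs one processing step, then forwards to the next.
class Handler
{
    std::unique_ptr<Handler> next_;
public:
    virtual ~Handler() = default;
    void setNext(std::unique_ptr<Handler> next) { next_ = std::move(next); }
    void handle(std::vector<double> & data)
    {
        process(data);
        if (next_)
            next_->handle(data);
    }
private:
    virtual void process(std::vector<double> & data) = 0;
};

// Hypothetical steps standing in for object_A and object_B.
class Doubler : public Handler
{
    void process(std::vector<double> & d) override
    {
        for (auto & v : d)
            v *= 2.0;
    }
};

class Offset : public Handler
{
public:
    explicit Offset(double o) : offset_(o) {} // runtime parameter enters here
private:
    double offset_;
    void process(std::vector<double> & d) override
    {
        for (auto & v : d)
            v += offset_;
    }
};

int main()
{
    auto chain = std::unique_ptr<Doubler>(new Doubler);
    chain->setNext(std::unique_ptr<Handler>(new Offset(1.5)));
    std::vector<double> data{1.0, 2.0};
    chain->handle(data);
    std::cout << data[0] << ' ' << data[1] << '\n'; // prints 3.5 5.5
}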

What does "serialization provides a method for detecting changes in time-varying data" mean?

I have read many articles about serialization and its ability to detect changes in time-varying data.
Actually, I don't understand how this happens.
Could anyone please give me a short explanation about this sentence:
serialization provides a method for detecting changes in time varying data
It looks like you're quoting from the Wikipedia page on Serialization:
Uses
[...]
A method for detecting changes in time-varying data.
The answer is in the paragraph at the end of that section:
Since both serializing and deserializing can be driven from common code (for example, the Serialize function in Microsoft Foundation Classes), it is possible for the common code to do both at the same time, and thus, 1) detect differences between the objects being serialized and their prior copies, and 2) provide the input for the next such detection. It is not necessary to actually build the prior copy because differences can be detected on the fly. The technique is called differential execution. It is useful in the programming of user interfaces whose contents are time-varying — graphical objects can be created, removed, altered, or made to handle input events without necessarily having to write separate code to do those things.
In other words, if you serialize an object O at time T0 and keep a copy of it, then at a future time T1 you can either deserialize the object from the T0 backup copy and compare it to the live object at T1, or serialize the same object O again at T1 and compare the two serialized versions for differences.
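As a toy C++ illustration of the compare-two-snapshots variant; the textual format here is made up for the example:

#include <iostream>
#include <sstream>
#include <string>

// Hypothetical object whose state we snapshot via serialization.
struct Widget
{
    int x = 0;
    std::string label = "start";
    // Serialize all state into a string (a trivial textual format).
    std::string serialize() const
    {
        std::ostringstream os;
        os << x << '|' << label;
        return os.str();
    }
};

int main()
{
    Widget w;
    std::string snapshotT0 = w.serialize(); // state at time T0

    w.x = 5;                                // the object changes over time
    std::string snapshotT1 = w.serialize(); // state at time T1

    // Detect change by comparing the two serialized snapshots.
    if (snapshotT0 != snapshotT1)
        std::cout << "object changed between T0 and T1\n";
}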

Is "serialisation without duplication" possible in c++0x?

One of the big uses of code generation in c++ is to support message serialisation. Typically, you want to support specifying message contents and layout in the same step and produce code for that message type that can give you objects capable of being serialised to/from communication streams. In the past, this has usually resulted in code that looks like:
class MyMessage : public SerialisableObject
{
    // message members
    int myNumber_;
    std::string myString_;
    std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;

public:
    // ctor, dtor, accessors, mutators, then:
    virtual void Serialise(SerialisationStream & stream)
    {
        stream & myNumber_;
        stream & myString_;
        stream & aBunchOfThingsIWantToSerialise_;
    }
};
The problem with using this kind of design is that it violates an important rule of good architecture: you should not have to specify the intent of a design twice. Duplication of intent, like duplicated code and other common development duplication, leaves room for one place in the code to diverge from the other, causing errors.
In the above, the duplication is the list of members. Potential errors include adding a member to the class but forgetting to add it to the serialisation list; serialising a member twice (possibly by not using the same order as the member declarations, or due to a misspelling of a similar member, among other ways); or serialising something that is not a member (which might produce a compiler error, unless name lookup finds something at a different scope that matches the lookup rules). That kind of mistake is the same reason we no longer try to match every heap allocation with a delete (instead using smart pointers) or every file open with a close (using RAII ctor/dtor mechanisms): we don't want to have to match up our intent in multiple places, because there are times we, or another engineer less familiar with the intent, will make mistakes.
Generally, therefore, this has been one of the things that code generation could take care of. You might create a file MyMessage.cg to specify both layout and members in one step:
serialisable MyMessage
{
    int myNumber_;
    std::string myString_;
    std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
};
that would be run through a code generation utility and produce the code.
I was wondering if it is possible yet to do this in c++0x without external code generation. Are there any new language mechanisms that make it possible to specify a class as serialisable once, such that the names and layout of its members are used to lay out the message during serialisation?
To be clear, I know that there are tricks with Boost tuples and fusion that can come close to this kind of behavior even in the pre-c++0x language. Those usages, though, being based on indexing into the tuple rather than by-member-name access, have all been brittle to changes in layout, as other places in the code that access the messages would then also need to be reordered. Some kind of by-member-name access is necessary to avoid duplicating the layout specification in the places in the code that use the messages.
Also, I know it might be nice to take this up to the next level and ask for a way to specify that some members shouldn't be serialised. Other languages with built-in serialisation often offer some kind of attribute to do this, so
int myNonSerialisedNumber_ [[noserialise]];
might seem natural. However, I personally think it is bad design to have serialisable objects where not everything is serialised, since the lifetime of messages is in the transport to/from the communications layer, separate from other data lifetimes. Also, you could have an object which has a purely serialisable object as one of its members, so such functionality doesn't buy anything the language doesn't already offer.
Is this possible? Or did the standards committee leave out this kind of introspective capability? I don't need it to look like the code-gen file above; any simple method for compile-time specification of layout and members in a single step would solve this common problem.
This is both possible and practical in C++11 – in fact it was possible back in C++03, the syntax was just a little too unwieldy. I wrote a small library based around the same idea - see the following:
www.github.com/molw5/framework
Sample syntax:
class Object : serializable <Object,
    value <NAME("Field 1"), int>,
    value <NAME("Field 2"), float>,
    value <NAME("Field 3"), double>>
{
};
Most of the underlying code could be reproduced, in principle, in C++03; some of the implementation details without variadic templates would have been... tricky, but I believe it would have been possible to recover the core functionality. What you could not reproduce in C++03 is the NAME macro above, and the syntax relies fairly heavily on it. The macro provides the machinery necessary to generate a unique typename from a string; that is, the following:
NAME("Field 1")
expands to
type_string <'F', 'i', 'e', 'l', 'd', ' ', '1'>
through the use of some common macros and constexpr (for character extraction). Back in C++03 something similar to the type_string above would need to be entered manually.
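Short of a library like that, here is a lightweight C++11 sketch of the same "list the members once" idea, using std::tie instead of named fields; the PrintStream below is a made-up stand-in for a real serialisation stream:

#include <cstddef>
#include <iostream>
#include <string>
#include <tuple>
#include <type_traits>

// A toy "stream" that prints each field; a real one would read/write bytes.
struct PrintStream
{
    template <typename T>
    PrintStream & operator&(const T & value)
    {
        std::cout << value << '\n';
        return *this;
    }
};

// Recursively apply the stream to every element of a tuple of references.
template <std::size_t I = 0, typename Stream, typename Tuple>
typename std::enable_if<(I == std::tuple_size<Tuple>::value)>::type
streamAll(Stream &, Tuple &) {}

template <std::size_t I = 0, typename Stream, typename Tuple>
typename std::enable_if<(I < std::tuple_size<Tuple>::value)>::type
streamAll(Stream & s, Tuple & t)
{
    s & std::get<I>(t);
    streamAll<I + 1>(s, t);
}

class MyMessage
{
    int myNumber_ = 42;
    std::string myString_ = "hello";
public:
    // The member list appears exactly once, here; serialisation reuses it.
    std::tuple<int &, std::string &> members()
    {
        return std::tie(myNumber_, myString_);
    }
};

int main()
{
    MyMessage m;
    PrintStream s;
    auto fields = m.members();
    streamAll(s, fields);
}

This still indexes positionally under the hood, so it does not give the by-member-name access the question asks for, but it does keep the member list in a single place.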
C++, of any form, supports neither introspection nor reflection (to the extent that they are different).
One nice thing about doing serialization manually (i.e., without introspection or reflection) is that you can provide object versioning. You can support older forms of the serialization and simply create reasonable defaults for the data that wasn't in the old versions. Or, if a new version removes some data, you can simply deserialize and discard it.
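A minimal sketch of that versioning idea, using a made-up stream type in the style of the question's Serialise example:

#include <string>

// Made-up stand-in for the question's SerialisationStream; a real one
// would write fields on save and overwrite them on load.
struct SerialisationStream
{
    template <typename T>
    SerialisationStream & operator&(T &) { return *this; }
};

class MyMessage
{
    static const int CURRENT_VERSION = 2;
    int myNumber_ = 0;
    std::string myString_; // field added in version 2
public:
    void Serialise(SerialisationStream & stream)
    {
        int version = CURRENT_VERSION;
        stream & version;          // written on save, replaced on load
        stream & myNumber_;
        if (version >= 2)
            stream & myString_;    // only present in newer archives
        else
            myString_ = "default"; // reasonable default for old archives
    }
};

int main()
{
    SerialisationStream stream;
    MyMessage message;
    message.Serialise(stream); // one code path handles every archive version
}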
It seems to me that what you need is Boost.Serialization.

VB.NET: is using Structures considered nasty?

I used to use Structures quite a lot in the VB6 days, and try to avoid them now with .NET. I'm just wondering if using Structures in 2010 instead of a Class is considered nasty?
Thanks for the help.
Choosing a Structure takes consideration; it is not inherently "nasty". There are reasons why a Structure can be nasty; however, there are also reasons a Class can be nasty in its own way...
Basically when you decide between these two object oriented kinds of containers, you're deciding how memory will be used.
There are different semantics associated with Structure and Class in VB.NET and they represent different memory usage patterns.
By creating a Class you're creating a reference type.
good for large data
memory contains a reference to the object's location on the heap (like the concept of pointing to an object), though this happens transparently to the VB.NET programmer because you're in "managed mode"
By creating a Structure you're creating a value type.
good for small data
memory allocated contains the actual value
be judicious, because these are apt to get pushed onto the stack area of memory (i.e. for local vars, but not class fields); too large and you could run into stack issues
There are also some good video resources on YouTube if you prefer learning by video.
There are many articles on the Internet, like these MSDN articles, that teach the basics and details:
Value Types and Reference Types
7.1 Types - Reference and Value
MSDN Type Fundamentals - subheading: Reference and Value Types
Structures exist because in some scenarios they make more sense than classes. They are particularly useful for representing small abstract data types such as 3D points, latitude-longitude pairs, rational numbers, etc.
The basic motivation for using structs is to avoid GC pressure. Since structs live inline (on the stack or inside whatever container you put them in) rather than on the heap, they typically result in far fewer small allocations, which makes a huge difference if you need to hold an array of one million points or rationals.
A key issue to watch out for is that structs are value types, and thus are generally passed around by value (the obvious exception being ref and out parameters). This has important implications. For instance:
Point3D[] points = ...;
points[9].Move(0, 0, 5);
The above code works fine, and increases the z coordinate of the 10th point by 5. The following code, however:
List<Point3D> points = ...;
points[9].Move(0, 0, 5);
will still compile and run, but you will find that the z coordinate of the 10th point remains unchanged. This is because the List's index operator returns a copy of the point, and it is the copy that you are calling Move on.
The solution is quite simple. Always make structs immutable by marking all fields readonly. If you still need to Move points around, define + on the Point3D type and use assignment:
points[9] = points[9] + new Point3D(0, 0, 5);
It's considered pretty bad to use anything without understanding the implications.
Structures are value types, not reference types, and as such they behave slightly differently. When you pass a value type, modifications are made to a copy, not to the original. When you assign a value type to an object reference (say, in a non-generic list), boxing occurs. Make sure you read up on the full effect of choosing one over the other.
Read this for an understanding of the benefits of structures vs. classes and vice versa.
A structure can be preferable when:
You have a small amount of data and simply want the equivalent of the UDT (user-defined type) of previous versions of Visual Basic
You perform a large number of operations on each instance and would incur performance degradation with heap management
You have no need to inherit the structure or to specialize functionality among its instances
You do not box and unbox the structure
You are passing blittable data across a managed/unmanaged boundary
A class is preferable when:
You need to use inheritance and polymorphism
You need to initialize one or more members at creation time
You need to supply an unparameterized constructor
You need unlimited event handling support
To answer your question directly: there is nothing inherently wrong with using a Structure in VB.NET. As with any design decision, you do need to consider the consequences of your choice.
It's important that you're aware of the difference between a class and a structure so that you can make an educated decision about which is appropriate. As stated by Alex et al., one of the key differences between a structure and a class is that a structure is considered a value type and a class is considered a reference type.
Reference types use copy-by-reference semantics: when an object is created or copied, only a pointer to the actual object is allocated on the stack; the actual object data is allocated on the heap.
In contrast, value types have copy-by-value semantics, which means that each time a value type (e.g. a structure) is copied, the entire object is copied to a new location on the stack.
For objects with a small amount of data, this isn't really a problem, but if you have a large amount of data then using a reference type will likely be less expensive in terms of stack allocations because only a pointer will be copied to the stack.
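The distinction is not specific to .NET. As a minimal analogy in C++ (used here for consistency with the earlier sections of this document), copying a value copies the data, while copying a pointer shares it:

#include <iostream>

// A small value type; assignment copies the whole thing.
struct PointValue
{
    int x = 0;
};

int main()
{
    PointValue a;
    PointValue b = a;         // copy-by-value: b is an independent copy
    b.x = 10;
    std::cout << a.x << '\n'; // prints 0: the original is untouched

    PointValue * p = &a;
    PointValue * q = p;       // copy-by-reference: both point at the same object
    q->x = 10;
    std::cout << a.x << '\n'; // prints 10: the shared object changed
}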
Microsoft has guidelines on the use of structures that more precisely describe the differences between classes and structures and the consequences of choosing one over the other.
From a behavioral standpoint, there are three types of 'things' in .NET:
Mutable reference types
Value types which can be mutated without being entirely replaced
Immutable reference and value types
Eric Lippert really dislikes group #2 above, since .NET isn't terribly good at handling them and sometimes treats them as though they're in group #1 or #3. Nonetheless, there are times when mutable value types make more sense semantically than anything else.
Suppose, for example, that one has a rectangle and one wants to make another rectangle which is like the first one, but twice as tall. It is IMHO cleaner to say:
Rect2 = Rect1 ' Makes a new Rectangle that's just like Rect1
Rect2.Height = Rect2.Height*2
than to say either
Rect2 = Rect1.Clone ' Would be necessary if Rect1 were a class
Rect2.Height = Rect2.Height*2
or
Rect2 = New Rectangle(Rect1.Left, Rect1.Top, Rect1.Width, Rect1.Height*2)
When using classes, if one wants an object that's slightly different from an existing object, one must consider before mutating the object whether anyone else might want to use the original; if so, one must make a copy of it and then make the desired changes to the copy. With structs, there's no such restriction.
A simple way to think of value types is to regard every assignment operation as making a clone of the original, but in a way that's considerably cheaper than cloning a class type. If one would end up cloning a lot of objects as often as one would assign references without cloning, that's a substantial argument in favor of structs.