What do you call an object whose state can be completely described by its string representation? - oop

Is there a name in the OOP world for such objects? For example, in Java
"Word".toString();
returns Word. This is a string representation of an entity that currently exists in the program.
The same can be done with other data types such as Doubles and Integers, and maybe even lists or other data structures.
Some other, more complex objects cannot be represented in this way; for example a full fledged RESTful service class might not have a string representation of its current state.
What's the right terminology? Native? Immutable? Neither of those two terms really reflects this definition.
To expand on the question:
Imagine you have a function/method that converts a string to a map, a string could be {key1=value1,key2=value2} and you would get a map back, this doesn't work for some complex objects. How would you describe the parameters of this function if you were to generalize its use to other simple object types?

You have an abstract object that consists of internal state.
You have one or more concrete representations of that object's state.
In one case the concrete representation is a chunk of memory containing primitives and references to other component objects on the heap (in Java, other languages may be different).
You have a different representation that is amenable to being stored in a contiguous block of characters or bytes, and possibly transmitted over a network.
Both representations are canonically equivalent given equivalent contexts containing their non-state information (methods, class hierarchy, etc), but they serve different purposes.
Generically, this could be called a "change of representation". When the first representation above is converted to the second it's called "serialization", and the reverse process is "deserialization". Note that you could have many different representations fulfilling different requirements and supporting different functionality.
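As a minimal sketch of the change of representation described above, the following uses Java's built-in object serialization to turn an in-memory list into bytes and back. The class and method names (RoundTrip, serialize, deserialize) are made up for illustration; only the java.io classes are real.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class RoundTrip {
    // Serialization: in-memory representation -> contiguous block of bytes.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialization: bytes -> a new in-memory object with equivalent state.
    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(List.of("Word", "42", "3.14"));
        Object copy = deserialize(serialize(original));
        System.out.println(copy.equals(original)); // true: same state
        System.out.println(copy == original);      // false: a different object
    }
}
```

The two representations carry the same state, but the restored object is a distinct instance, which is why "equivalent" rather than "identical" is the right word here.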
One important point to note is that in both cases, in-memory and "serialized" (and any other representations), if an object's state contains references to other objects, then the entire "state" consists of that object and all the objects that can be reached from it, and objects reachable from those objects, etc. This is known as an "object graph", and it exists equally in all representations.
As to deciding which one you should or shouldn't use, that depends totally on your processing requirements.
for example a full fledged RESTful service class might not have a string representation of its current state
This is incorrect: you can always define a serialized representation of an object's state. It may be inconvenient to do so, but if it is required it can be done.
Imagine you have a function/method that converts a string to a map, a string could be {key1=value1,key2=value2} and you would get a map back, this doesn't work for some complex objects
Again, it can always be made to work if it is a requirement, as long as the cost of doing so is justified.
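For the simple case in the question, a string-to-map conversion and its inverse might look like the sketch below. MapText, parse and format are hypothetical names, and the sketch assumes keys and values never contain '{', '}', ',' or '='. Lifting that assumption (escaping, nested objects) is exactly where the cost mentioned above starts to grow.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MapText {
    // Parses text such as "{key1=value1,key2=value2}" into an ordered map.
    // Assumption: keys and values contain none of '{', '}', ',' or '='.
    static Map<String, String> parse(String text) {
        String trimmed = text.trim();
        if (trimmed.length() < 2 || !trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("expected {k=v,...} but got: " + text);
        }
        Map<String, String> result = new LinkedHashMap<>();
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return result;
        }
        for (String entry : body.split(",")) {
            String[] pair = entry.split("=", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("missing '=' in entry: " + entry);
            }
            result.put(pair[0].trim(), pair[1].trim());
        }
        return result;
    }

    // The reverse direction: map -> the same textual form.
    static String format(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.append('}').toString();
    }
}
```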
In summary, everything is a representation, and you can arrange to transform one representation to any other and back again, without loss, assuming you're willing to incur the costs of doing so. As mentioned above, one factor is the cost of representing not just the single object, but the entire object graph, which can be substantial.

Related

Behavior of components when structures are in an array

I am currently working on the simulation of a physical system in Fortran90 with something like 50 millions particles. Each has a position x (to simplify).
For now, I am using a 1D vector that contains the position of each particle. And when I have to iterate on every particle, I just go through that vector (as I took care to sort the particles to limit cache misses).
I am now considering creating a particle class. But what about the access to its position as I iterate ? Will it be as fast as the previous case ?
So, what does the compiler do to store the attributes of an object ? And a fortiori, what about the case with more than one attributes?
Thank you for your time.
On "how are derived types stored":
The Fortran Standard requires the components of a sequence type to be stored in memory as a contiguous sequence, in the components' declaration order. Sequence types are those declared with a SEQUENCE statement, which implies that the type shall have at least one component, that each component shall be of an intrinsic or sequence type, and that the type shall not be parameterized or extensible and can't have type-bound procedures. If you want this behavior and your type is suitable, make it a sequence type (you may want to take data alignment into consideration).
On the other hand, the Fortran Standard does not state how compilers have to organize storage for non-sequence derived types. That's not bad at all, as compilers are free to optimize storage. Most of the time, you may expect almost the same as for sequence types: things stored contiguously whenever possible (padding may apply). Arrays and strings are always contiguous. Pointer and allocatable components are only a reference, for obvious reasons, and their targets lie somewhere else.
From the Standard:
A structure resolves into a sequence of components. Unless the
structure includes a SEQUENCE statement, the use of this terminology
in no way implies that these components are stored in this, or any
other, order. Nor is there any requirement that contiguous storage be
used. The sequence merely refers to the fact that in writing the
definitions there will necessarily be an order in which the components
appear, and this will define a sequence of components. This order is
of limited significance because a component of an object of derived
type will always be accessed by a component name except in the
following contexts: the sequence of expressions in a derived-type
value constructor, intrinsic assignment, the data values in namelist
input data, and the inclusion of the structure in an input/output list
of a formatted data transfer, where it is expanded to this sequence of
components. Provided the processor adheres to the defined order in
these cases, it is otherwise free to organize the storage of the
components for any nonsequence structure in memory as best suited to
the particular architecture.
On "Is it faster to have a derived type than independent arrays":
As #VladmirF said in a comment, it's a broad topic that depends highly on how you access and operate on your data, and it has been asked and answered before (check the links in that comment). You may find a lot about it around (link1, link2), and I'll add this one on "cache blocking" that may interest you.

Code design: Who's responsible for changing object data?

Assume I have some kind of data structure to work on (for example images) which I want to pre- and postprocess in different ways to make further processing steps easier. What's the best way to implement this responsibility with an OOP language like C++?
Further assume I have a lot of different processing algorithms with inherent complexity, which I very likely want to encapsulate in dedicated classes. This means, though, that the algorithm implementations have to set some kind of info in my data from the outside to indicate that it has been processed. That doesn't look like clean design to me, because "having been processed" is information associated with the data and thus something the data object itself should determine and set on its own.
It also looks like a very common source of error in complex applications: Someone implements another processing algorithm, forgets to set the flags in the data appropriately, something in completely different parts of the application won't work as expected and someone will have lots of fun spotting the error.
Can someone outline a general structure of a good and fail-safe way to implement something like this?
To make sure I understand what you are asking, here are my assumptions based on my reading of the question:
The data is some kind of binary format (presumably an image but as you say it could be anything) that can be represented as an array of bytes
There are a number of processing steps (I'll refer to them as transformations) that can be applied to the data
Some transformations depend on others such that, for example, you would like to avoid applying a transformation if its prerequisite has not been applied. You would like it to be robust, so that attempting to apply an illegal transformation will be detected and prevented.
And the question is how to do this in an object-oriented way that avoids future bugs as the complexity of the program increases.
One way is to have the image data object, which encapsulates both the binary data and a record of the transformations that have been applied to it, be responsible for executing the transformation through a Transformation object delegate; and the Transformation objects implement both the processing algorithm and the knowledge of whether it can be applied based on previous transformations.
So you might define the following (excuse my Java-like naming style; it's been a long time since I've done C++):
An enumerated type called TransformationType
An abstract class called Transformer, with the following methods:
A method called 'getType' which returns a TransformationType
A method called 'canTransform' that accepts a list of TransformationType and returns a boolean. The list indicates transformations that have already been applied to the data, and the boolean indicates whether it is OK to execute this transformation.
A method called 'transform' that accepts an array of bytes and returns an array of (presumably modified) bytes
A class called BinaryData, containing a byte array and a list of TransformationType. This class implements the method 'void transform(Transformer t)' to do the following:
Query the transformer's 'canTransform' method, passing the list of transformation types; either throw an exception or return if canTransform returns false
Replace the byte array with the results of invoking t.transform(data)
Add the transformer's type to the list
I think this accomplishes what you want - the image transformation algorithms are defined polymorphically in classes, but the actual application of the transformations is still 'controlled' by the data object. Hence we do not have to trust external code to do the right thing wrt setting / checking flags, etc.
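The outline above could be sketched in Java roughly as follows (the answer uses Java-like naming; a C++ version would have the same shape). The enum constants DENOISE and SHARPEN, the two concrete transformers, their placeholder bodies, and the history() and bytes() accessors are invented for illustration and are not part of the answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical transformation kinds; the answer does not name any.
enum TransformationType { DENOISE, SHARPEN }

abstract class Transformer {
    abstract TransformationType getType();
    // 'applied' lists the transformations already performed on the data.
    abstract boolean canTransform(List<TransformationType> applied);
    abstract byte[] transform(byte[] data);
}

// Hypothetical step with no prerequisite; it may run only once.
final class Denoise extends Transformer {
    @Override TransformationType getType() { return TransformationType.DENOISE; }
    @Override boolean canTransform(List<TransformationType> applied) {
        return !applied.contains(TransformationType.DENOISE);
    }
    @Override byte[] transform(byte[] data) {
        return data.clone(); // placeholder for the real algorithm
    }
}

// Hypothetical step that requires DENOISE to have been applied first.
final class Sharpen extends Transformer {
    @Override TransformationType getType() { return TransformationType.SHARPEN; }
    @Override boolean canTransform(List<TransformationType> applied) {
        return applied.contains(TransformationType.DENOISE);
    }
    @Override byte[] transform(byte[] data) {
        return data.clone(); // placeholder for the real algorithm
    }
}

final class BinaryData {
    private byte[] data;
    private final List<TransformationType> applied = new ArrayList<>();

    BinaryData(byte[] data) { this.data = data.clone(); }

    // The data object, not the caller, checks and records each transformation.
    void transform(Transformer t) {
        if (!t.canTransform(Collections.unmodifiableList(applied))) {
            throw new IllegalStateException(t.getType() + " cannot be applied after " + applied);
        }
        data = t.transform(data.clone());
        applied.add(t.getType());
    }

    List<TransformationType> history() { return Collections.unmodifiableList(applied); }
    byte[] bytes() { return data.clone(); }
}
```

Because the list of applied transformations is private to BinaryData, a newly written Transformer cannot forget to set a flag: the bookkeeping happens in one place regardless of which algorithm runs.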

Why do we use serialization?

Why do we need to use serialization?
If we want to send an object or piece of data through a network we can use streams of bytes. If we want to save some data to the disk, we can again use the binary mode along with the byte streams and save it.
So what's the advantage of using serialization?
Technically on the low-level, your serialized object will also end up as a stream of bytes on your cable or your filesystem...
So you can also think of it as a standardized and already available way of converting your objects to a stream of bytes. Storing/transferring objects is a very common requirement, and there is little point in reinventing this wheel in every application.
As others have mentioned, you also know that this object->stream_of_bytes implementation is quite robust, tested, and generally architecture-independent.
This does not mean it is the only acceptable way to save or transfer an object: in some cases, you'll have to implement your own methods, for example to avoid saving unnecessary/private members (for example for security or performance reasons). But if you are in a simple case, you can make your life easier by using the serialization/deserialization of your framework, language or VM instead of having to implement it by yourself.
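In Java, one way to keep a member out of the serialized form without writing a whole custom format is the transient keyword. The sketch below is only an illustration: the Session class, its fields, and the roundTrip helper are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class Session implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String user;
    // 'transient' excludes this field from the serialized bytes.
    private final transient String authToken;

    Session(String user, String authToken) {
        this.user = user;
        this.authToken = authToken;
    }

    String user() { return user; }
    String authToken() { return authToken; }

    // Serializes and immediately deserializes, as if sent over a wire.
    static Session roundTrip(Session session) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(session);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Session) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After Session.roundTrip(new Session("alice", "secret")), the copy still has user "alice" but its authToken is null, since the token never left the original process.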
Hope this helps.
Quoting from Designing Data Intensive Applications book:
Programs usually work with data in (at least) two different
representations:
In memory, data is kept in objects, structs, lists, arrays, hash tables, trees, and so on. These data structures are optimized for
efficient access and manipulation by the CPU (typically using
pointers).
When you want to write data to a file or send it over the network, you have to encode it as some kind of self-contained sequence of bytes
(for example, a JSON document). Since a pointer wouldn’t make sense to
any other process, this sequence-of-bytes representation looks quite
different from the data structures that are normally used in memory.
Thus, we need some kind of translation between the two
representations. The translation from the in-memory representation to
a byte sequence is called encoding (also known as serialization or
marshalling), and the reverse is called decoding (parsing,
deserialization, unmarshalling).
Among other reasons: compatibility between architectures. An integer doesn't necessarily have the same number of bytes from one architecture to another, and sometimes from one compiler to another.
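A portable encoding therefore pins down both the width and the byte order instead of copying whatever the machine happens to use. Here is a small sketch using java.nio.ByteBuffer; PortableInt, encode and decode are illustrative names, and big-endian is an arbitrary choice for the example. (In Java the width of int is fixed by the language, so the byte order is the part that the format has to specify; in C or C++ both would need care.)

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PortableInt {
    // Always writes exactly 4 bytes, most significant byte first.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(Integer.BYTES)
                .order(ByteOrder.BIG_ENDIAN)
                .putInt(value)
                .array();
    }

    // Reads the same 4-byte big-endian layout back, on any platform.
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes)
                .order(ByteOrder.BIG_ENDIAN)
                .getInt();
    }
}
```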
Plus what you're talking about is still serialization. Binary Serialization. You're putting all the bytes of your object together in order to store them and be able to reconvert them as an object later. This is serializing.
More info on wikipedia
Serialization is the process of converting an object into a stream so that it can be saved in a physical file (such as XML) or in a database. The main purpose of serialization in C# is to persist an object and save it in a specified storage medium such as a stream, a physical file or a database.
In general, serialization is a method to persist an object's state, but I suggest you read this wiki page; it is pretty detailed and correct in my opinion:
http://en.wikipedia.org/wiki/Serialization
In serialization, the point is not turning an object into bits and bytes; objects ARE bits and bytes already. Serialization is the process of making the object's "state" persistent. Notice the word "state", which means the values of the instance variables of the entire object graph (the target object and all the objects it references either directly or indirectly) WITHOUT methods and other extra runtime stuff stuck to them (plus, of course, a little more info that the JVM needs to restore these objects, such as their class types).
So this is the main reason of its necessity: Storing the whole bytes of objects would be expensive, and for all intents and purposes, unnecessary.

Serialization vs. Archiving?

The iOS docs differentiate between "serializing" and "archiving." Is this a general distinction (i.e., holds in other languages) or is it specific to Objective-C? Also, what is the difference between these two?
This is a case of one being the other some (but not all) of the time.
Wikipedia has this to say about serialization:
"Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be "resurrected" later in the same or another computer environment"
So, archiving may only be serialization, but it could also be the combination of serialization and compression, for example. Or perhaps it adds some kind of header info. So serialization is a form of archive, but an archive is not necessarily a serialization.
This isn't really specific to iOS - these terms are thrown around all over. Their specific meaning in the context of iOS could be quite specific, though.
I was actually trying to find the difference from an iOS perspective. Adding the following for people who are interested:
Purpose:
Archiving is used to store object graphs. A complete data model can be archived and restored easily. The way Nib files work can be considered an example of archiving.
Serialization is used for storing an arbitrary hierarchy of objects.
The way plist files work can be considered an example of serialization.
Differences (excerpts from the Archives programming guide):
"The archive preserves the identity of every object in the graph and all the relationships it has with all the other objects in the graph."
Every object encoded within the context of rootObject invocation is tracked. If the coder is asked to encode an object more than once, the coder encodes a reference to the first encoding instead of encoding the object again.
"The serialization only preserves the values of the objects and their position in the hierarchy. Multiple references to the same value object might result in multiple objects when deserialized. The mutability of the objects is not maintained."
Implementation differences:
Any object that implements the NSCoding protocol can be archived, whereas only instances of NSArray, NSDictionary, NSString, NSDate, NSNumber, and NSData (and some of their subclasses) can be serialized. Array and dictionary objects must also contain only objects of these few classes.
When to Use:
Property lists (serialization) should be used for data that consists primarily of strings and numbers. They are very inefficient when used with large blocks of binary data.
Archiving is worthwhile for objects other than plist objects, or for storing large blocks of data.
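The identity-preservation difference quoted above can be made concrete with an analogy outside Cocoa. In the sketch below, Java's built-in object serialization plays the role of an "archive" (it writes a shared object once and back-references it afterwards), while a hand-written value-only copy plays the role of a plain "serialization". This is an analogy under those assumptions, not a description of the Cocoa APIs; SharedReferences, Node and both method names are invented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

final class SharedReferences {
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;
        Node(String label) { this.label = label; }
    }

    // "Archive"-like: the stream tracks objects it has already written,
    // so an object referenced twice is restored as one shared object.
    @SuppressWarnings("unchecked")
    static ArrayList<Node> graphRoundTrip(ArrayList<Node> nodes) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(nodes);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ArrayList<Node>) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Serialization"-like: only values and positions are kept, so each
    // position is rebuilt as its own object.
    static ArrayList<Node> valueOnlyRoundTrip(ArrayList<Node> nodes) {
        ArrayList<String> labels = new ArrayList<>();
        for (Node n : nodes) {
            labels.add(n.label);
        }
        ArrayList<Node> rebuilt = new ArrayList<>();
        for (String label : labels) {
            rebuilt.add(new Node(label));
        }
        return rebuilt;
    }
}
```

If a list holds the same Node twice, graphRoundTrip returns a list whose two elements are still one object, whereas valueOnlyRoundTrip returns two separate objects with equal labels.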
Generally speaking, serialization is concerned with converting your program's data types into architecture-independent byte streams. Archiving is specialized serialization in that you can store type and other relationship-based information that allows you to unserialize/unmarshal easily. So archival can be thought of as a specialization and subset of serialization. For Objective-C:
Serialization converts Objective-C
types to and from an
architecture-independent byte stream.
In contrast to archiving, basic
serialization does not record the data
type of the values nor the
relationships between them; only the
values themselves are recorded. It is
your responsibility to deserialize the
data in the proper order. Several
convenience classes, however, do
provide the ability to serialize
property lists, recording their
structure along with their values.
With C++ boost serialization --
http://www.boost.org/doc/libs/1_45_0/libs/serialization/doc/index.html
Here, we use the term "serialization"
to mean the reversible deconstruction
of an arbitrary set of C++ data
structures to a sequence of bytes.
Such a system can be used to
reconstitute an equivalent structure
in another program context. Depending
on the context, this might used
implement object persistence, remote
parameter passing or other facility.
In this system we use the term
"archive" to refer to a specific
rendering of this stream of bytes.
This could be a file of binary data,
text data, XML, or some other created
by the user of this library.

VB.NET: is using Structures considered nasty?

I used to use Structures quite a lot in the VB6 days, and I try to avoid them now with .NET. Just wondering whether using a Structure instead of a Class in 2010 is considered nasty?
Thanks for the help.
Choosing a Structure takes consideration instead of being inherently "nasty". There are reasons why a Structure can be nasty; however there are also reasons a Class can be nasty in its own way...
Basically when you decide between these two object oriented kinds of containers, you're deciding how memory will be used.
There are different semantics associated with Structure and Class in VB.NET and they represent different memory usage patterns.
By creating a Class you're creating a reference type.
good for large data
memory contains a reference to the object's location on the heap (like the concept of pointing to an object), though this happens transparently to the VB.NET programmer because you're in "managed mode".
By creating a Structure you're creating a value type.
good for small data
memory allocated contains the actual value
be judicious because these are apt to get pushed on the stack area of memory (i.e. for local vars, but not class fields) - too large and you could run into stack issues.
Also some good video resources on YouTube if you're an audio learner.
Many articles on the Internet like these MSDN articles to teach the basics and details:
Value Types and Reference Types
7.1 Types - Reference and Value
MSDN Type Fundamentals - subheading: Reference and Value Types
Example
Structures exist because in some scenarios they make more sense than classes. They are particularly useful for representing small abstract data types such as 3D points, latitude-longitude, rational numbers, etc.
The basic motivation for using structs is to avoid GC pressure. Since structs live inline (on the stack or inside whatever container you put them in) rather than on the heap, they typically result in far fewer small allocations, which makes a huge difference if you need to hold an array of one million points or rationals.
A key issue to watch out for is that structs are value types, and thus are generally passed around by value (the obvious exception being ref and out parameters). This has important implications. For instance:
Point3D[] points = ...;
points[9].Move(0, 0, 5);
The above code works fine, and increases the z coordinate of the 10th point by 5. The following code, however:
List<Point3D> points = ...;
points[9].Move(0, 0, 5);
Will still compile and run, but you will find that the z coordinate of the 10th point remains unchanged. This is because the List's index operator returns a copy of the point, and it is the copy that you are calling Move on.
The solution is quite simple. Always make structs immutable by marking all fields readonly. If you still need to Move points around, define + on the Point3D type and use assignment:
points[9] = points[9] + new Point3D(0, 0, 5);
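The "immutable type plus assignment" pattern translates to other languages as well. Here is a sketch in Java, with the caveat that Java has no user-defined value types, so the copy pitfall described above does not arise there in the same way; only the immutability half of the advice carries over. Point3D.plus, MovePoints and moveUp are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Immutable: all fields are final, and "moving" yields a new point.
final class Point3D {
    final double x;
    final double y;
    final double z;

    Point3D(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Plays the role of the '+' operator suggested above.
    Point3D plus(Point3D other) {
        return new Point3D(x + other.x, y + other.y, z + other.z);
    }
}

final class MovePoints {
    // Equivalent of: points[index] = points[index] + new Point3D(0, 0, dz);
    static void moveUp(List<Point3D> points, int index, double dz) {
        points.set(index, points.get(index).plus(new Point3D(0, 0, dz)));
    }

    public static void main(String[] args) {
        List<Point3D> points = new ArrayList<>();
        points.add(new Point3D(1, 2, 3));
        moveUp(points, 0, 5);
        System.out.println(points.get(0).z); // 8.0
    }
}
```

Because the element is replaced explicitly, it is always clear which storage location ends up holding the moved point.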
It's considered pretty bad to use anything without understanding the implications.
Structures are value types, not reference types - and as such, they behave slightly differently. When you pass a value type, modifications are on a copy, not on the original. When you assign a value type to an object reference (say, in a non-generic list), boxing occurs. Make sure you read up on the full effect of choosing one over the other.
Read this for some understanding of the benefits of structures vs. classes and vice versa.
A structure can be preferable when:
You have a small amount of data and simply want the equivalent of the UDT
(user-defined type) of previous versions of Visual Basic
You perform a large number of operations on each instance and would incur
performance degradation with heap management
You have no need to inherit the structure or to specialize
functionality among its instances
You do not box and unbox the structure
You are passing blittable data across a managed/unmanaged boundary
A class is preferable when:
You need to use inheritance and polymorphism
You need to initialize one or more members at creation time
You need to supply an unparameterized constructor
You need unlimited event handling support
To answer your question directly, there is nothing inherently wrong with using a structure in VB.NET. As with any design decision, you do need to consider the consequences of this decision.
It's important that you're aware of the difference between a class and a structure so that you can make an educated decision about which is appropriate. As stated by Alex et al, one of the key differences between a structure and a class is that a structure is considered a value type and a class is considered a reference type.
Reference types use copy-by-reference semantics. This means that when an object is created or copied, only a pointer to the actual object is allocated on the stack; the actual object data is allocated on the heap.
In contrast, value types have copy-by-value semantics, which means that each time a value type (e.g. a structure) is copied, the entire object is copied to a new location (on the stack, for local variables).
For objects with a small amount of data, this isn't really a problem, but if you have a large amount of data then using a reference type will likely be less expensive in terms of stack allocations because only a pointer will be copied to the stack.
Microsoft have guidelines on the use of structures that more precisely describe the differences between classes and structures and the consequences of choosing one over the other.
From a behavioral standpoint, there are three types of 'things' in .net:
Mutable reference types
Value types which can be mutated without being entirely replaced
Immutable reference and value types
Eric Lippert really dislikes group #2 above, since .net isn't terribly good at handling them, and sometimes treats them as though they're in group #1 or #3. Nonetheless, there are times when mutable value types make more sense semantically than would anything else.
Suppose, for example, that one has a rectangle and one wants to make another rectangle which is like the first one, but twice as tall. It is IMHO cleaner to say:
Rect2 = Rect1 ' Makes a new Rectangle that's just like Rect1
Rect2.Height = Rect2.Height*2
than to say either
Rect2 = Rect1.Clone ' Would be necessary if Rect1 were a class
Rect2.Height = Rect2.Height*2
or
Rect2 = New Rectangle(Rect1.Left, Rect1.Top, Rect1.Width, Rect1.Height*2)
When using classes, if one wants an object that's slightly different from an existing object, one must consider before mutating the object whether anyone else might want to use the original; if so, one must make a copy of it and then make the desired changes to the copy. With structs, there's no such restriction.
A simple way to think of value types is to regard every assignment operation as making a clone of the original, but in a way that's considerably cheaper than cloning a class type. If one would end up cloning a lot of objects as often as one would assign references without cloning, that's a substantial argument in favor of structs.