Behavior of components when structures are in an array - oop

I am currently working on the simulation of a physical system in Fortran 90 with roughly 50 million particles. Each has a position x (to simplify).
For now, I am using a 1D array that contains the position of each particle. When I have to iterate over every particle, I just go through that array (I took care to sort the particles to limit cache misses).
I am now considering creating a particle class. But what about access to its position as I iterate? Will it be as fast as in the previous case?
So, what does the compiler do to store the attributes of an object? And, a fortiori, what about the case with more than one attribute?
Thank you for your time.

On "how are derived types stored":
The Fortran Standard requires the components of a sequence type to be stored in memory as a contiguous sequence of storage, in the components' declaration order. Sequence types are those declared with a SEQUENCE statement, which implies that the type has at least one component, that each component is of an intrinsic or sequence type, and that the type is not parameterized or extensible and has no type-bound procedures. If you want this behavior and your type is suitable, make it a sequence type (you may need to take data alignment into consideration).
On the other hand, the Fortran Standard does not state how compilers have to organize storage for non-sequence derived types. That's not a bad thing, as compilers are free to optimize storage. Most of the time, you can expect almost the same layout as for sequence types: components stored contiguously whenever possible (padding may apply). Arrays and strings are always contiguous. Pointer and allocatable components are only a reference, for obvious reasons, and their targets lie somewhere else.
From the Standard:
A structure resolves into a sequence of components. Unless the
structure includes a SEQUENCE statement, the use of this terminology
in no way implies that these components are stored in this, or any
other, order. Nor is there any requirement that contiguous storage be
used. The sequence merely refers to the fact that in writing the
definitions there will necessarily be an order in which the components
appear, and this will define a sequence of components. This order is
of limited significance because a component of an object of derived
type will always be accessed by a component name except in the
following contexts: the sequence of expressions in a derived-type
value constructor, intrinsic assignment, the data values in namelist
input data, and the inclusion of the structure in an input/output list
of a formatted data transfer, where it is expanded to this sequence of
components. Provided the processor adheres to the defined order in
these cases, it is otherwise free to organize the storage of the
components for any nonsequence structure in memory as best suited to
the particular architecture.
On "Is it faster to have a derived type than independent arrays":
As @VladmirF said in a comment, it's a broad topic that depends heavily on how you access and operate on your data, and it has been asked and answered before (check the links in that comment). You can find a lot about it around (link1, link2), and I'll add this one on "cache blocking" that may interest you.

Related

What do you call an object whose state can be completely described by its string representation?

Is there a name in the OOP world to refer to such objects? For example, in Java
"Word".toString();
Will return an output of Word. This is a string representation of the entity that exists currently in the program.
Other examples are data types such as Doubles, Integers, and maybe even lists or other data structures.
And some more complex objects cannot be represented in this way; for example a full fledged RESTful service class might not have a string representation of its current state.
What's the right terminology? Native? Immutable? Those last two terms don't really reflect this definition.
To expand on the question:
Imagine you have a function/method that converts a string to a map, a string could be {key1=value1,key2=value2} and you would get a map back, this doesn't work for some complex objects. How would you describe the parameters of this function if you were to generalize its use for other simple object types?
You have an abstract object that consists of internal state.
You have one or more concrete representations of that object's state.
In one case the concrete representation is a chunk of memory containing primitives and references to other component objects on the heap (in Java, other languages may be different).
You have a different representation that is amenable to being stored in a contiguous block of characters or bytes, and possibly transmitted over a network.
Both representations are canonically equivalent given equivalent contexts containing their non-state information (methods, class hierarchy, etc), but they serve different purposes.
Generically, this could be called a "change of representation". When the first representation above is converted to the second it's called "serialization", and the reverse process is "deserialization". Note that you could have many different representations fulfilling different requirements and supporting different functionality.
One important point to note is that in both cases, in-memory and "serialized" (and any other representations), if an object's state contains references to other objects, then the entire "state" consists of that object and all the objects that can be reached from it, and objects reachable from those objects, etc. This is known as an "object graph", and it exists equally in all representations.
As to deciding which one you should or shouldn't use, that depends totally on your processing requirements.
for example a full fledged RESTful service class might not have a string representation of its current state
This is incorrect: you can always define a serialized representation of an object's state. It may be inconvenient to do so, but if it is required it can be done.
Imagine you have a function/method that converts a string to a map, a string could be {key1=value1,key2=value2} and you would get a map back, this doesn't work for some complex objects
Again, it can always be made to work if it is a requirement, as long as the cost of doing so is justified.
In summary, everything is a representation, and you can arrange to transform one representation to any other and back again, without loss, assuming you're willing to incur the costs of doing so. As mentioned above, one factor is the cost of representing not just the single object, but the entire object graph, which can be substantial.
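As a minimal sketch of such a change of representation, the Java snippet below round-trips a map through the {key1=value1,key2=value2} form from the question. The class name MapCodec and its two methods are hypothetical names chosen for illustration, and the sketch assumes that keys and values never contain '{', '}', ',' or '='; a real format would need escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: converts between a Map and its "{k=v,k=v}" text form.
class MapCodec {
    // Serialization: in-memory representation -> character representation.
    // Assumption: keys and values contain none of '{', '}', ',' or '='.
    public static String serialize(Map<String, String> map) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.append('}').toString();
    }

    // Deserialization: character representation -> in-memory representation.
    public static Map<String, String> deserialize(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        String body = text.substring(1, text.length() - 1); // strip the braces
        if (body.isEmpty()) {
            return map;
        }
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            map.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> original = deserialize("{key1=value1,key2=value2}");
        String text = serialize(original);
        System.out.println(text);
        // The round trip loses nothing: both maps compare equal.
        System.out.println(deserialize(text).equals(original));
    }
}
```

The same idea extends to objects that reference other objects, but then the codec has to walk the whole object graph, which is where the cost mentioned above comes from.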

Code design: Who's responsible for changing object data?

Assume I have some kind of data structure to work on (for example images) which I want to pre- and postprocess in different ways to make further processing steps easier. What's the best way to implement this responsibility with an OOP language like C++?
Further assuming I have a lot of different processing algorithms with inherent complexity, I very likely want to encapsulate them in dedicated classes. This means, though, that the algorithm implementations have to externally set some kind of info in my data to indicate that it has been processed. That doesn't look like clean design to me, because having been processed seems like info associated with the data and thus something the data object itself should determine and set on its own.
It also looks like a very common source of error in complex applications: someone implements another processing algorithm, forgets to set the flags in the data appropriately, something in completely different parts of the application won't work as expected and someone will have lots of fun spotting the error.
Can someone outline a general structure of a good and fail-safe way to implement something like this?
To make sure I understand what you are asking, here are my assumptions based on my reading of the question:
The data is some kind of binary format (presumably an image but as you say it could be anything) that can be represented as an array of bytes
There are a number of processing steps (I'll refer to them as transformations) that can be applied to the data
Some transformations depend on others, such that, for example, you would like to avoid applying a transformation if its prerequisite has not been applied. You would like it to be robust, so that attempting to apply an illegal transformation will be detected and prevented.
And the question is how to do this in an object-oriented way that avoids future bugs as the complexity of the program increases.
One way is to have the image data object, which encapsulates both the binary data and a record of the transformations that have been applied to it, be responsible for executing the transformation through a Transformation object delegate; and the Transformation objects implement both the processing algorithm and the knowledge of whether it can be applied based on previous transformations.
So you might define the following (excuse my Java-like naming style; it's been a long time since I've done C++):
An enumerated type called TransformationType
An abstract class called Transformer, with the following methods:
A method called 'getType' which returns a TransformationType
A method called 'canTransform' that accepts a list of TransformationType and returns a boolean. The list indicates transformations that have already been applied to the data, and the boolean indicates whether it is OK to execute this transformation.
A method called 'transform' that accepts an array of bytes and returns an array of (presumably modified) bytes
A class called BinaryData, containing a byte array and a list of TransformationType. This class implements the method 'void transform(Transformer t)' to do the following:
Query the transformer's 'canTransform' method, passing the list of transformation types; either throw an exception or return if canTransform returns false
Replace the byte array with the results of invoking t.transform(data)
Add the transformer's type to the list
I think this accomplishes what you want - the image transformation algorithms are defined polymorphically in classes, but the actual application of the transformations is still 'controlled' by the data object. Hence we do not have to trust external code to do the right thing wrt setting / checking flags, etc.
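The outline above can be sketched roughly as follows. This is written in Java to match the naming style of the answer rather than C++. The enum values DENOISE and SHARPEN, the two concrete transformers, the history() accessor and the rule that sharpening requires prior denoising are all hypothetical, invented only to make the example runnable; the transform bodies are placeholders, not real image algorithms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TransformDemo {
    public static void main(String[] args) {
        BinaryData image = new BinaryData(new byte[] {1, 2, 3});
        image.transform(new Denoise());
        image.transform(new Sharpen()); // allowed only because DENOISE ran first
        System.out.println(image.history());
    }
}

enum TransformationType { DENOISE, SHARPEN } // hypothetical values

abstract class Transformer {
    abstract TransformationType getType();
    // 'applied' lists the transformations already applied to the data.
    abstract boolean canTransform(List<TransformationType> applied);
    abstract byte[] transform(byte[] data);
}

class BinaryData {
    private byte[] data;
    private final List<TransformationType> applied = new ArrayList<>();

    BinaryData(byte[] data) {
        this.data = data.clone();
    }

    // The data object decides whether a transformation may run and records it.
    void transform(Transformer t) {
        if (!t.canTransform(Collections.unmodifiableList(applied))) {
            throw new IllegalStateException(
                "Cannot apply " + t.getType() + " after " + applied);
        }
        data = t.transform(data.clone());
        applied.add(t.getType());
    }

    List<TransformationType> history() {
        return Collections.unmodifiableList(applied);
    }
}

class Denoise extends Transformer {
    @Override TransformationType getType() { return TransformationType.DENOISE; }
    @Override boolean canTransform(List<TransformationType> applied) {
        return !applied.contains(TransformationType.DENOISE); // run at most once
    }
    @Override byte[] transform(byte[] data) { return data; } // placeholder
}

class Sharpen extends Transformer {
    @Override TransformationType getType() { return TransformationType.SHARPEN; }
    @Override boolean canTransform(List<TransformationType> applied) {
        return applied.contains(TransformationType.DENOISE); // hypothetical rule
    }
    @Override byte[] transform(byte[] data) { return data; } // placeholder
}
```

Because only BinaryData touches the history list, a newly written transformer cannot forget to record itself; the worst it can do is state a wrong precondition in canTransform.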

Some questions on enum and streamwriters vb

Some questions I had regarding VB.NET functions:
How do you differentiate between an enumeration and a record? As far as I'm aware, an enumerated type is simply one constant with multiple identifiers, and a structure contains different data types?
When declared, does a variable of a structure type need to use all its fields or can some be omitted?
Am I correct in saying sets don't exist in VB.NET and the closest thing is an ArrayList?
Is there much of a difference between StreamReaders/StreamWriters and BinaryReaders/BinaryWriters when reading and writing text/binary files, in terms of how they are called and used? (That is, is the only difference the data being read? 2-3 line examples would help.)
I'm a bit confused about transformation variables; I know that they gain their value from the fixed calculation of another variable, but I can't seem to gain an understanding of it.
How do you differentiate between an enumeration and a record?
In what context? Basically, an Enum is a list of named constants and can be used pretty much anywhere. Records mainly have to do with databases and DataSets, which means a record could be made up of any data types.
As far as I'm aware an enumerated type is simply one constant with multiple
identifiers and that a structure contains different data types?
A structure is basically a way of organizing a certain set of variables.
When declared, does a variable of a structure type need to use all its
fields or can some be omitted?
Every field is part of the structure when it is declared.
Am I correct in saying sets don't exist in vb.net and the closest
thing is an arraylist?
Not sure what you mean by set. .NET contains classes for several different types of collections, of which ArrayList is just one.
Is there much of a difference in streamreaders/writers and
binaryreaders/writers when referring to reading and writing to
text/binary files in terms of being called and used?
Basically, the main difference is that text readers and writers work with characters and strings through an encoding and recognize line breaks, whereas binary readers and writers work with raw bytes and primitive values, since binary data doesn't normally include line breaks.
I'm a bit confused about transformation variables; I know that they
gain their value from the fixed calculation of another variable, but I
can't seem to gain an understanding of it.
Not really sure what you're getting at here. I suspect that that has to do with higher math functions and not really .net specific.

VB.NET: is using Structures considered nasty?

I used to use Structures quite a lot in the VB6 days, and try to avoid them now with .NET. Just wondering if using structures in 2010 instead of a Class is considered nasty?
Thanks for the help.
Choosing a Structure takes consideration; it isn't inherently "nasty". There are reasons why a Structure can be nasty; however, there are also reasons a Class can be nasty in its own way...
Basically when you decide between these two object oriented kinds of containers, you're deciding how memory will be used.
There are different semantics associated with Structure and Class in VB.NET and they represent different memory usage patterns.
By creating a Class you're creating a reference type.
good for large data
memory contains a reference to the object's location on the heap (like the concept of pointing to an object), though this happens transparently to the VB.NET programmer because you're in "managed mode".
By creating a Structure you're creating a value type.
good for small data
memory allocated contains the actual value
be judicious because these are apt to get pushed on the stack area of memory (i.e. for local vars, but not class fields) - too large and you could run into stack issues.
There are also some good video resources on YouTube if you're an audio learner.
Many articles on the Internet, like these MSDN articles, teach the basics and details:
Value Types and Reference Types
7.1 Types - Reference and Value
MSDN Type Fundamentals - subheading: Reference and Value Types
Example
Structures exist because in some scenarios they make more sense than classes. They are particularly useful for representing small abstract data types such as 3D points, latitude-longitude, rational numbers, etc.
The basic motivation for using structs is to avoid GC pressure. Since structs live inline (on the stack or inside whatever container you put them in) rather than on the heap, they typically result in far fewer small allocations, which makes a huge difference if you need to hold an array of one million points or rationals.
A key issue to watch out for is that structs are value types, and thus are generally passed around by value (the obvious exception being ref and out parameters). This has important implications. For instance:
Point3D[] points = ...;
points[9].Move(0, 0, 5);
The above code works fine, and increases the z coordinate of the 10th point by 5. The following code, however:
List<Point3D> points = ...;
points[9].Move(0, 0, 5);
Will still compile and run, but you will find that the z coordinate of the 10th point remains unchanged. This is because the List's index operator returns a copy of the point, and it is the copy that you are calling Move on.
The solution is quite simple. Always make structs immutable by marking all fields readonly. If you still need to Move points around, define + on the Point3D type and use assignment:
points[9] = points[9] + new Point3D(0, 0, 5);
It's considered pretty bad to use anything without understanding the implications.
Structures are value types, not reference types - and as such, they behave slightly differently. When you pass a value type, modifications are on a copy, not on the original. When you assign a value type to an object reference (say, in a non-generic list), boxing occurs. Make sure you read up on the full effect of choosing one over the other.
Read this for some understanding of the benefits of structures vs classes and vice versa.
A structure can be preferable when:
You have a small amount of data and simply want the equivalent of the UDT
(user-defined type) of previous versions of Visual Basic
You perform a large number of operations on each instance and would incur
performance degradation with heap management
You have no need to inherit the structure or to specialize
functionality among its instances
You do not box and unbox the structure
You are passing blittable data across a managed/unmanaged boundary
A class is preferable when:
You need to use inheritance and polymorphism
You need to initialize one or more members at creation time
You need to supply an unparameterized constructor
You need unlimited event handling support
To answer your question directly, there is nothing inherently wrong with using a structure in VB.NET. As with any design decision, you do need to consider the consequences of this decision.
It's important that you're aware of the difference between a class and a structure so that you can make an educated decision about which is appropriate. As stated by Alex et al, one of the key differences between a structure and a class is that a structure is considered a value type and a class is considered a reference type.
Reference types use copy-by-reference semantics: this means that when an object is created or copied, only a pointer to the actual object is allocated on the stack, while the actual object data is allocated on the heap.
In contrast, value types have copy-by-value semantics, which means that each time a value type (e.g. a structure) is copied, the entire object is copied to a new location on the stack.
For objects with a small amount of data, this isn't really a problem, but if you have a large amount of data then using a reference type will likely be less expensive in terms of stack allocations because only a pointer will be copied to the stack.
Microsoft have guidelines on the use of structures that more precisely describe the differences between classes and structures and the consequences of choosing one over the other.
From a behavioral standpoint, there are three types of 'things' in .NET:
Mutable reference types
Value types which can be mutated without being entirely replaced
Immutable reference and value types
Eric Lippert really dislikes group #2 above, since .NET isn't terribly good at handling them, and sometimes treats them as though they're in group #1 or #3. Nonetheless, there are times when mutable value types make more sense semantically than would anything else.
Suppose, for example, that one has a rectangle and one wants to make another rectangle which is like the first one, but twice as tall. It is IMHO cleaner to say:
Rect2 = Rect1 ' Makes a new Rectangle that's just like Rect1
Rect2.Height = Rect2.Height*2
than to say either
Rect2 = Rect1.Clone ' Would be necessary if Rect1 were a class
Rect2.Height = Rect2.Height*2
or
Rect2 = New Rectangle(Rect1.Left, Rect1.Top, Rect1.Width, Rect1.Height*2)
When using classes, if one wants an object that's slightly different from an existing object, one must consider before mutating the object whether anyone else might want to use the original; if so, one must make a copy of it and then make the desired changes to the copy. With structs, there's no such restriction.
A simple way to think of value types is to regard every assignment operation as making a clone of the original, but in a way that's considerably cheaper than cloning a class type. If one would end up cloning a lot of objects as often as one would assign references without cloning, that's a substantial argument in favor of structs.

Variable Naming Conventions For Maps/Lists in Dynamically-Typed languages

I am getting into Groovy language, which has dynamic typing (as well as optional static typing). It also has native support for Lists, Maps, and Ranges, so I find myself using lists and maps a lot, especially lists of lists, lists of maps, maps of lists, etc.
In static languages (esp with Generics) you always have an idea of what your type is. I am fairly new to dynamic languages, and it's getting a bit difficult to keep track of what my variable is supposed to be, so I was wondering if other people use some kind of variable naming conventions to keep these straight.
For example, suppose I have a map of dates as key and integers as values. Or List of integers, or List of Maps that contain strings as keys and account objects as values.
It seems like creating a clear convention behind variable names will help me keep track of what data type structure I am dealing with without having to look it up.
Any tips?
This is a common beginner's lament. You could use a naming convention, but odds are you'll drop it before too long and focus on what the variable represents (its meaning in relation to the rest of the code) rather than worrying about how it's represented (its "type").
The name of your variable should explain to someone reading the code what it is supposed to be, what it stands for. If you have a map of dates to integers, does it represent, for example (suggested variable names are in brackets):
a number of payments due on that date (paymentsDue)
a number of days between mapped date and some other point in time (daysPassed)
a number of messages posted on that date on Stack Overflow (numberOfPostedMessages)
In languages where the variable type is not readily available, you might want to append a prefix or suffix, such as paymentsDueMap. I would, however, advise against encoding any additional type information inside a variable name, such as datesToInts - that routinely does more harm than good.
Finally, if you have a complex data structure, such as a list of maps between strings and accounts, the best thing would be to encapsulate that into a separate class, and name it according to its intent.
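As a rough illustration of that last point, here is a sketch in Java (which Groovy interoperates with) that hides a list of maps from strings to accounts behind a class named for its intent. Account, BranchLedger, and every method name are hypothetical; the assumption that each map represents one branch's accounts keyed by account id is invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LedgerDemo {
    public static void main(String[] args) {
        BranchLedger ledger = new BranchLedger();
        int branch = ledger.addBranch();
        ledger.open(branch, "A-1", new Account("Ada"));
        // Callers ask a domain question; the list-of-maps shape stays hidden.
        System.out.println(ledger.find("A-1").isPresent());
    }
}

// Hypothetical domain type.
class Account {
    final String owner;

    Account(String owner) {
        this.owner = owner;
    }
}

// Wraps the raw List<Map<String, Account>> so no variable name has to
// describe the structure.
class BranchLedger {
    private final List<Map<String, Account>> accountsByIdPerBranch = new ArrayList<>();

    // Adds an empty branch and returns its index.
    int addBranch() {
        accountsByIdPerBranch.add(new HashMap<>());
        return accountsByIdPerBranch.size() - 1;
    }

    void open(int branch, String accountId, Account account) {
        accountsByIdPerBranch.get(branch).put(accountId, account);
    }

    // Searches every branch for the given account id.
    Optional<Account> find(String accountId) {
        for (Map<String, Account> accountsById : accountsByIdPerBranch) {
            Account account = accountsById.get(accountId);
            if (account != null) {
                return Optional.of(account);
            }
        }
        return Optional.empty();
    }
}
```

With the wrapper in place, only one private field needs a structural name, and the rest of the code can use names like ledger.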
In static languages (esp with Generics) you always have an idea of what your type is.
After a while of programming in dynamic languages, you learn that using types this way is a crutch. Two pieces of advice:
Use good variable naming. For instance, if you have a map of dates to ints, you can name it something like BirthdateToTotalLookup.
Learn what visual clues to look for. It may seem obvious, but it took me a while to get in the habit of looking for clues like this:
sum += x['10-16-92']
From the piece of code above, I can tell that x is a map that has a date as a key and returns a number of some kind.
If the names can be kept short, then I tend to name maps something like "nounToNoun". So using your example of dates mapping to integers, I would name that "dateToCount" (if the integers are counters for something). That way it's obvious that it is a map, and it's obvious what is being mapped to what. The problem is that sometimes it is difficult to keep these sorts of names short and readable. For example, "userToLoginHistory" starts getting a little unwieldy.
For lists I generally use a plural for the variable name. So "user" would be a single user, and "users" would be a list of users.
To be honest, I am not sure what a good name would be for a list of maps.
One of the benefits of dynamic languages is that even if you're using an object as a Map - it doesn't HAVE to be a map. All it has to do is support whatever messages are sent to it. In Groovy, if I know that a given method expects a map so it can look up things by a String key - I can give it the full map, a stripped-down map, an Expando with a property named the same thing as the key, or any other object that has a property named the same thing as the key. This is because someObject["keyname"] and someObject.keyname are the same thing. (Of course if the code calls someObject.get("keyname") I've got to wire that method up somehow.)
The point is, in a dynamic language like Groovy you think less about TYPES and more about SUPPORTED MESSAGES. If it's conceptually a map, fine - naming it birthdateToTotal would make sense (though I prefer to call it 'totals', because totals[birthdate] looks better than birthdateToTotal[birthdate]) - but if it doesn't have to be specified, don't specify it. You leave yourself flexibility later.
This is something you'll outgrow over time. Not to say I don't know a 20-year programmer still using Hungarian, but he's coding in a static-typed language, so it's almost understandable.
Consider this. That variable you're naming might be a HashMap, so what type do you add to the name? Map? This is a middle-of-the-road answer. Why not Collection? That way, if you decide to change the WAY the data is stored, you don't have to change the variable name. Why not HashMap, if you really want to let the reader know what's going on?
As you may suspect, none of these are necessary. The point of a dynamic language (and even of polymorphism) is that you don't need to know the exact type of the variable being presented; only the data itself is important. While you might like a hint as to how to interface to that data, you'll soon find you already know in most cases, or can easily put that info in the variable name without specifying types: addressesByZipCode, totalByBirthdate, etc.