Efficient way to define a class with multiple, optionally-empty slots in S4 of R?

I am building a package to handle data that arrives with up to 4 different types. Each of these types is a legitimate class in the form of a matrix, data.frame or tree. Depending on the way the data is processed and other experimental factors, some of these data components may be missing, but it is still extremely useful to be able to store this information as an instance of a special class and have methods that recognize the different component data.
Approach 1:
I have experimented with an incremental inheritance structure that looks like a nested tree, where each combination of data types has its own class explicitly defined. This seems difficult to extend for additional data types in the future, and it also forces new developers to learn all the class names, however well-organized those names might be.
Approach 2:
A second approach is to create a single "master-class" that includes a slot for all 4 data types. In order to allow the slots to be NULL for the instances of missing data, it appears necessary to first define a virtual class union between the NULL class and the new data type class, and then use the virtual class union as the expected class for the relevant slot in the master-class. Here is an example (assuming each data type class is already defined):
################################################################################
# Use setClassUnion to define the unholy NULL-data union as a virtual class.
################################################################################
setClassUnion("dataClass1OrNULL", c("dataClass1", "NULL"))
setClassUnion("dataClass2OrNULL", c("dataClass2", "NULL"))
setClassUnion("dataClass3OrNULL", c("dataClass3", "NULL"))
setClassUnion("dataClass4OrNULL", c("dataClass4", "NULL"))
################################################################################
# Now define the master class with all 4 slots, allowing empty (NULL) slots,
# and with an explicit prototype that sets slots to NULL when they are not
# provided at instantiation.
################################################################################
setClass(Class="theMasterClass",
    representation=representation(
        slot1="dataClass1OrNULL",
        slot2="dataClass2OrNULL",
        slot3="dataClass3OrNULL",
        slot4="dataClass4OrNULL"),
    prototype=prototype(slot1=NULL, slot2=NULL, slot3=NULL, slot4=NULL)
)
################################################################################
So the question might be rephrased as:
Are there more efficient and/or flexible alternatives to either of these approaches?
This example is modified from an answer to a SO question about setting the default value of a slot to NULL. This question differs in that I am interested in knowing the best options in R for creating classes with slots that can be empty if needed, while still requiring a specific complex class in all non-empty cases.

In my opinion...
Approach 2
It sort of defeats the purpose to adopt a formal class system, and then to create a class that contains ill-defined slots ('A' or NULL). At a minimum I would try to make DataClass1 have a 'NULL'-like default. As a simple example, the default here is a zero-length numeric vector.
setClass("DataClass1", representation=representation(x="numeric"))
DataClass1 <- function(x=numeric(), ...) {
new("DataClass1", x=x, ...)
}
Then
setClass("MasterClass1", representation=representation(dataClass1="DataClass1"))
MasterClass1 <- function(dataClass1=DataClass1(), ...) {
new("MasterClass1", dataClass1=dataClass1, ...)
}
One benefit of this is that methods don't have to test whether the instance in the slot is NULL or 'DataClass1'
setMethod(length, "DataClass1", function(x) length(x#x))
setMethod(length, "MasterClass1", function(x) length(x#dataClass1))
> length(MasterClass1())
[1] 0
> length(MasterClass1(DataClass1(1:5)))
[1] 5
In response to your comment about warning users when they access 'empty' slots, and remembering that users usually want functions to do something rather than tell them they're doing something wrong, I'd probably return the empty object DataClass1() which accurately reflects the state of the object. Maybe a show method would provide an overview that reinforced the status of the slot -- DataClass1: none. This seems particularly appropriate if MasterClass1 represents a way of coordinating several different analyses, of which the user may do only some.
A limitation of this approach (or your Approach 2) is that you don't get method dispatch -- you can't write methods that are appropriate only when the DataClass1 slot has non-zero length, and you are forced to do some sort of manual dispatch (e.g., with if or switch). This might seem like a limitation for the developer, but it also applies to the user -- the user doesn't get a sense of which operations are uniquely appropriate to instances of MasterClass1 that have non-zero length DataClass1 instances.
Approach 1
When you say that the names of the classes in the hierarchy are going to be confusing to your user, it seems like this is maybe pointing to a more fundamental issue -- you're trying too hard to make a comprehensive representation of data types; a user will never be able to keep track of ClassWithMatrixDataFrameAndTree because it doesn't represent the way they view the data. This is maybe an opportunity to scale back your ambitions to really tackle only the most prominent parts of the area you're investigating. Or perhaps an opportunity to re-think how the user might think of and interact with the data they've collected, and to use the separation of interface (what the user sees) from implementation (how you've chosen to represent the data in classes) provided by class systems to more effectively encapsulate what the user is likely to do.
Putting the naming and number of classes aside, when you say "difficult to extend for additional data types in the future" it makes me wonder if perhaps some of the nuances of S4 classes are tripping you up? The short solution is to avoid writing your own initialize methods, and rely on the constructors to do the tricky work, along the lines of
setClass("A", representation(x="numeric"))
setClass("B", representation(y="numeric"), contains="A")
A <- function(x = numeric(), ...) new("A", x=x, ...)
B <- function(a = A(), y = numeric(), ...) new("B", a, y=y, ...)
and then
> B(A(1:5), 10)
An object of class "B"
Slot "y":
[1] 10
Slot "x":
[1] 1 2 3 4 5

Related

What design patterns can help model objects whose behaviors change dynamically, or objects with many optional behaviors?

Sorry for the long-winded examples, but I've run into this type of software design problem recently, and have been thinking about it. Not sure if there's a term for it, so I'll give 2 general examples to give an idea:
Ex 1: You're working on an RPG game, and you have a class for the main character. That character's reactions to the game world change based on what you're wearing/holding, skills allotted -- basically, your object's internal state.
Say you have the following items in the game:
Ring of regeneration: allows your character to regenerate health over time
Sneaky sneakers: increases sneak
Magic mirror: reflects % of incoming damage
Doofy glasses: highlights crafting resources
Ex 2: You have a Toyota Corolla. There are different configurations of it, but they're all Toyota Corollas. ABS and Traction Control are optional features (behaviors) you can add to the baseline model.
The baseline model does nothing extra when running.
With ABS, the car checks for and responds to sudden stops while the car's running
With Traction Control, the car checks for and responds to loss of traction while the car's running
And obviously, when you have a car with both, it performs both behaviors while running.
Common properties of the two examples:
the class of importance has a concrete blank slate that it can start with
optional items/parts can add an ability or extra behavior to that object; something extra to do per game tick/while running
may or may not mutate the object when item/behavior is added or taken off of it (increment sneak when putting on sneakers, decrement it when taking it off)
Potential solutions (but not adequate):
if statements:
if ch.isWearing 'doofy glasses':
    render with doofy shader
else if ch.isWearing ...
Doesn't work. Need to add a clause for every part. Class can get big and complicated very fast.
Strategy pattern:
class TractionControlStrategy
class ABSstrategy

class Toyota:
    TractionControlStrategy tcs
    ABSstrategy abs

    run():
        if tcs not null:
            tcs.run()
        if abs not null:
            abs.run()

carWithTCS = new Toyota(new TractionControlStrategy())
Not much better than the previous solution, as you still have the long list of if statements.
Strategy with subclasses:
class Toyota:
    run():
        // does nothing

class ToyotaWithABS : Toyota
    ABSstrategy abs = new ABSstrategy()
    run():
        abs.run()

class ToyotaWithTCS : Toyota ...
Satisfies the Open/Closed Principle, I think. Better than the previous one, maybe? But now you'll have to create a class for every combination of configurations. If you find out later on that there are other optional features, the number of classes would double for every feature you need to implement...
How can these types of interactions and behaviors be modeled with OOP? What design patterns, or combinations of design patterns promote this kind of thing?
Really not sure if this is a good question or if I'm clear with what I'm asking as I've never really practiced good software design.
I'm learning OpenGL, working on my 3D mesh/model class. This question is related because in my renderer, indexing and textures are optional for a mesh. So a mesh can be vertices only, indexed vertices, vertices and textures, or all 3. Plus, I can't foresee what features I may want to add in the future as I don't know what I'll be learning like a month down the line, so I need the class to be flexible and extensible.
You are right: any option where you need to describe all possible combinations (whether via switch/if or via a class hierarchy) is not good.
One way is to use the decorator pattern to wrap your main class and add dynamic properties.
Or you can have a separate Stats class as a field in the main class and decorate it with additional items.
class Thing
    BasicStats stats

    constructor()
        this->stats = new BasicStats()

    addItem(Item item)
        // Decorate current stats with new stats
        item->setComponent(this->stats)
        this->stats = item
        return this

    int getHealth()
        return this->stats->getHealth()
You can use it like this:
thing = new Thing() // has basic stats
thing->addItem(new MagicMirror)->addItem(new SilverBullet)
// will go through the chain of decorators to get the value
health = thing->getHealth()
Another way is to have a list of dynamic options (or items) in your main class:
class Thing
    Stats stats
    ItemList items

    updateStats()
        for item in this->items
            item->updateStats(this->stats)
            // OR if we want to not disclose the stats to items
            // we can pass this and items should use
            // character's methods to change stats
            item->updateStats(this)

    add(Item item)
        this->items->append(item)
        return this
Which can be used like this:
thing = new Thing()
thing->add(new MagicMirror())->add(new RingOfRegeneration)->updateStats()
Memento can also be useful if you don't want to change the character stats directly. For example, if you have some "compare" feature where the user (or player) can combine different sets of items to see the impact and then "apply" them.
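A rough Python sketch of that idea (the Stats, Character and RingOfRegeneration names are made up for illustration, not taken from the question): the character snapshots its stats before trying an item, and can either keep the result or roll back.

import copy

class Stats:
    def __init__(self, health=100, sneak=10):
        self.health = health
        self.sneak = sneak

class RingOfRegeneration:
    def apply(self, stats):
        stats.health += 20

class Character:
    def __init__(self):
        self.stats = Stats()
        self._saved = None      # the memento

    def save(self):
        # keep an opaque snapshot of the current stats
        self._saved = copy.deepcopy(self.stats)

    def restore(self):
        # roll back to the snapshot, discarding the "what if" changes
        if self._saved is not None:
            self.stats = self._saved
            self._saved = None

    def try_item(self, item):
        # apply an item tentatively so the player can compare builds
        self.save()
        item.apply(self.stats)

hero = Character()
hero.try_item(RingOfRegeneration())   # compare: health is now 120
hero.restore()                        # decide against it: back to 100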
Also look at chain-of-responsibility, similar to the option with the list of items: you can create a chain of "Items" and request the stat from this chain. For example, you can start by passing the base health value, which will then be transformed by each item, and you'll get the "upgraded" value at the end of the chain.
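For example, a minimal chain-of-responsibility sketch in Python (the item names and bonus values are purely illustrative):

class ItemLink:
    """One link in the chain; transforms a value and passes it on."""
    def __init__(self, next_item=None):
        self.next_item = next_item

    def modify(self, value):
        return value   # default: no change

    def handle(self, value):
        value = self.modify(value)
        return self.next_item.handle(value) if self.next_item else value

class RingOfRegeneration(ItemLink):
    def modify(self, value):
        return value + 20      # flat bonus

class MagicMirror(ItemLink):
    def modify(self, value):
        return value * 1.5     # 50% bonus, purely for illustration

# build the chain and request the "upgraded" health from it
chain = RingOfRegeneration(MagicMirror())
print(chain.handle(100))       # (100 + 20) * 1.5 = 180.0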
Update: one more idea, visitor can be useful too:
# Base class for stats, Element
class Stat
    abstract accept(StatVisitor visitor)

class Health extends Stat
    private int health

    accept(StatVisitor visitor)
        # or you can have visitor->visitHealth(this)
        visitor->visit(this)

    multiply(int mult)
        this->health = this->health * mult

class Strength extends Stat
    private int strength

    accept(StatVisitor visitor)
        visitor->visit(this)

    add(int strength)
        this->strength = this->strength + strength
Stats are "Elements" of the Visitor pattern. The Thing class represents "Client":
# Thing contains stats, this is Client
class Thing
    StatsList stats

    accept(StatVisitor visitor)
        for stat in this->stats
            stat->accept(visitor)
And "Visitors" are our items, which can modify stats:
# Base visitor class
class StatVisitor
    abstract visit(Health health)
    abstract visit(Strength strength)

class MagicMirror extends StatVisitor
    # magic mirror multiplies health by 10
    visit(Health health)
        health->multiply(10)

    # magic mirror increases strength +5
    visit(Strength strength)
        strength->add(5)
Now you can do this:
thing = new Thing()
item = new MagicMirror()
# now update all the stats with MagicMirror
thing->accept(item)
Don't overcomplicate things:
Ring of regeneration: allows your character to regenerate health over time
This really shouldn't modify your object at all. E.g. putting the ring on may add a timer that tries to increase an entity's health (i.e. the wearer's), and taking the ring off removes this timer again.
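As a hedged Python sketch of that approach (World, RegenEffect and the tick-based loop are assumptions for the example, not part of the question): the ring only registers and unregisters a periodic effect; the character class itself is never touched.

class Character:
    def __init__(self):
        self.health = 100

class RegenEffect:
    """Periodic effect that is registered while the ring is worn."""
    def __init__(self, target, amount=1):
        self.target, self.amount = target, amount

    def tick(self):
        self.target.health += self.amount

class World:
    def __init__(self):
        self.timers = []       # everything here runs once per game tick

    def tick(self):
        for t in self.timers:
            t.tick()

class RingOfRegeneration:
    def __init__(self):
        self.effect = None

    def put_on(self, wearer, world):
        self.effect = RegenEffect(wearer)
        world.timers.append(self.effect)    # the character object is untouched

    def take_off(self, world):
        world.timers.remove(self.effect)
        self.effect = None

world, hero, ring = World(), Character(), RingOfRegeneration()
ring.put_on(hero, world)
world.tick(); world.tick()
print(hero.health)             # 102
ring.take_off(world)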
Sneaky sneakers: increases sneak
This can be modelled as a modification on a stat:
class Entity
    Map<Skill, SortedList<Modifier>> skillModifiers

    getSneakChance()
        sneakChance = ..  // compute base value from attributes
        for each mod in skillModifiers[SneakChance]
            sneakChance = mod.apply(sneakChance)
        return sneakChance
The map avoids two drawbacks of decorator and chain-of-responsibility: not every modifier adds another indirection to the evaluation, and you can have simpler, value-based modifiers (increase a value by x%) as opposed to pushing the logic of which stat/skill is to be modified, or how to access it, into the modifier as well.
One difficulty is deciding in what order to apply the modifiers: should you first add 20 to your attack damage and then increase it by 5%, or the other way around, losing the 1 damage point? Should certain modifiers apply to the base value only, i.e. do you need to pass both the accumulated and the base value to mod.apply?
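One possible answer, sketched in Python (the priority scheme and the names FlatBonus/PercentOfBase are illustrative assumptions): give each modifier a priority to fix the order, and pass both the base and the accumulated value to apply.

class Modifier:
    def __init__(self, priority):
        self.priority = priority

    def apply(self, base, current):
        return current

class FlatBonus(Modifier):
    def __init__(self, amount):
        super().__init__(priority=0)    # flat bonuses are applied first
        self.amount = amount

    def apply(self, base, current):
        return current + self.amount

class PercentOfBase(Modifier):
    def __init__(self, percent):
        super().__init__(priority=1)    # percentages later, computed from the base only
        self.percent = percent

    def apply(self, base, current):
        return current + base * self.percent

def attack_damage(base, modifiers):
    total = base
    for mod in sorted(modifiers, key=lambda m: m.priority):
        total = mod.apply(base, total)
    return total

print(attack_damage(100, [PercentOfBase(0.25), FlatBonus(20)]))   # 100 + 20 + 25 = 145.0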
Magic mirror: reflects % of incoming damage
Same as for 2. This time, map modifiers and reactions onto specific events (e.g. taking damage). This allows the same effect to be applied to multiple entities, like an aura or some other AoE spell.
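A small Python sketch of that idea (names are illustrative, and the "reflect" logic is simplified to just reducing the incoming damage): reactions are registered per event, and the same reaction can be attached to several entities.

class Entity:
    def __init__(self, health=100):
        self.health = health
        self.on_damage = []     # reactions keyed to the "taking damage" event

    def take_damage(self, amount):
        for reaction in self.on_damage:
            amount = reaction(self, amount)
        self.health -= amount

def magic_mirror(entity, amount):
    # simplified: only half of the incoming damage gets through
    return amount * 0.5

# the same reaction can be attached to several entities, like an aura
party = [Entity(), Entity()]
for member in party:
    member.on_damage.append(magic_mirror)

party[0].take_damage(50)
print(party[0].health)          # 100 - 25 = 75.0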
Doofy glasses: highlights crafting resources
This is again something that doesn't affect the entity. The GUI controller requires this kind of information, so simply query the player for visual modifiers (unless every entity in your game needs to manipulate the behaviour and look of your GUI).

Another Layer (1 Interface) vs. extending N Interfaces?

I have a Data-Access Layer (SAP ABAP, but the language does not matter here) where I have 1 interface per entity/database-table, like
IF_DATA_CONTRACT_POSITION->get_contract_positions( )
IF_DATA_CONTRACT_HEAD->get_contract_header( )
IF_DATA_OBJECT_CALC->get_object_calculations( )
40 more ...
These interfaces are implemented by the actual database-access class-impls and a generated caching-layer, which is pretty simple since the methods really do not have any parameters and just return "the relevant" data.
In certain consumers, however, I require filtered access to the returned data; specifically, I need to get the data of all interfaces (~50) constrained by contract-position.
So, do you recommend to
A) extend all interfaces with an optional parameter like IF_DATA_CONTRACT_POSITION->get_contract_positions([OPTIONAL-FILTER]), which means my impl and my caching-layer get more complex, or
B) create another interface IF_DATA_FILTER_CONTRACT_POSITION->set_contract_position_filter for the sole purpose of explicitly filtering data-access?
A) When extending every existing interface (the ~40-50 listed above) with the optional contract-position filter/constraint, the API is quite clean and would look like the following:
result = lo_data_object_calc->get_contract_positions( <FILTER> ).
As already mentioned, it would require me to extend every implementation, the data-access as well as the generated caching-layer.
B) With the explicit filter-interface IF_DATA_FILTER_CONTRACT_POSITION on the other hand, I would have yet another interface-layer around data-access and I could generate the uncoupled filtering impls. I would neither need to touch the actual data-access impl nor the generated cache-layer. However, the usage would be a little more clumsy, like
TRY.
    " down-cast from data-interface to filter-interface
    DATA lo_object_filter ?= lo_data_object_calc.
    lo_object_filter->set_contract_position_filter( <FILTER> ).
  CATCH could_not_cast.
    RAISE i-need-a-filter-impl!
ENDTRY.
result = lo_data_object_calc->get_object_calculations( ).
Update 05.08.2014: I decided to go with C) create a separate filter-object which explicitly filters collections retrieved by e.g. get_contract_positions().
I would go for solution A.
1. If you later need to optimize your data retrieval, you can do that in your db-access class.
2. You can use your filter in your where-clause and don't have to program the filtering yourself.
3. Someone else using your interfaces would have to find, understand, and use your filter interfaces/classes. I think it's easier if you have the filter as a parameter inside your data-access methods.

What should I name a class whose sole purpose is procedural?

I have a lot to learn in the way of OO patterns and this is a problem I've come across over the years. I end up in situations where my classes' sole purpose is procedural, just basically wrapping a procedure up in a class. It doesn't seem like the right OO way to do things, and I wonder if someone is experienced with this problem enough to help me consider it in a different way. My specific example in the current application follows.
In my application I'm taking a set of points from engineering survey equipment and normalizing them to be used elsewhere in the program. By "normalize" I mean a set of transformations of the full data set until a destination orientation is reached.
Each transformation procedure will take the input of an array of points (i.e. of the form class point { float x; float y; float z; }) and return an array of the same length but with different values. For example, a transformation like point[] RotateXY(point[] inList, float angle). The other kind of procedure would be of the analysis type, used to supplement the normalization process and decide what transformation to do next. This type of procedure takes in the same points as a parameter but returns a different kind of dataset.
My question is, what is a good pattern to use in this situation? The one I was about to code was a Normalization class which inherits from class types such as RotationXY, for instance. But RotationXY's sole purpose is to rotate the points, so it would basically be implementing a single function. This doesn't seem very nice, though, for the reasons I mentioned in the first paragraph.
Thanks in advance!
The most common/natural approach for finding candidate classes in your problem domain is to look for nouns and then scan for the verbs/actions associated with those nouns to find the behavior that each class should implement. While this is generally good advice, it doesn't mean that your objects must only represent concrete elements. When processes (which are generally modeled as methods) start to grow and become complex, it is a good practice to model them as objects. So, if your transformation has a weight of its own, it is ok to model it as an object and do something like:
class RotateXY
{
    public function apply(point p)
    {
        //Apply the transformation
    }
}
t = new RotateXY();
newPoint = t->apply(oldPoint);
In case you have many transformations, you can create a polymorphic hierarchy and even chain one transformation after another. If you want to dig a bit deeper you can also take a look at the Command design pattern, which closely relates to this.
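To make that concrete, here is a rough Python sketch (names like Pipeline and Translate are my own, not from the question): a small polymorphic hierarchy of transformations that can be chained, with the chain itself being just another transformation.

import math
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    z: float

class Transformation:
    def apply(self, p):
        raise NotImplementedError

    def apply_all(self, points):
        return [self.apply(p) for p in points]

class RotateXY(Transformation):
    def __init__(self, angle):
        self.angle = angle

    def apply(self, p):
        c, s = math.cos(self.angle), math.sin(self.angle)
        return Point(c * p.x - s * p.y, s * p.x + c * p.y, p.z)

class Translate(Transformation):
    def __init__(self, dx, dy, dz):
        self.dx, self.dy, self.dz = dx, dy, dz

    def apply(self, p):
        return Point(p.x + self.dx, p.y + self.dy, p.z + self.dz)

class Pipeline(Transformation):
    """Chains transformations; the chain is itself a Transformation."""
    def __init__(self, *steps):
        self.steps = steps

    def apply(self, p):
        for step in self.steps:
            p = step.apply(p)
        return p

normalize = Pipeline(RotateXY(math.pi / 2), Translate(0, 0, -10))
print(normalize.apply_all([Point(1, 0, 0), Point(0, 1, 0)]))

Pipeline here is essentially the Composite of transformations mentioned in the final comments below.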
Some final comments:
If it fits your case, it is a good idea to model the transformation at the point level and then apply it to a collection of points. That way you can properly isolate the transformation concept, and it is also easier to write test cases. You can later even create a Composite of transformations if you need to.
I generally don't like Utils (or similar) classes with a bunch of static methods, since in most cases it means that your model is missing the abstraction that should carry that behavior.
HTH
Typically, when it comes to classes that contain only static methods, I name them Util, e.g. DbUtil for facading DB access, FileUtil for file I/O, etc. So find some term that all your methods have in common and use it to name the Util class. Maybe in your case GeometryUtil or something along those lines.
Since the particulars of the transformations you apply seem ad hoc for the problem and possibly prone to change in the future, you could code them in a configuration file.
The point's client would read from the file and know what to do. As for the rotation or any other transformation method, they could go well as part of the Point class.
I see nothing particularly wrong with classes/interfaces having just essentially one member.
In your case the member is an "operation with some arguments of one type that returns the same type" -- common for some math/functional problems. You may find it convenient to have an interface/base class and helper methods that combine multiple transformation classes together into a more complex transformation.
Alternative approach: if your language supports it, just go functional style altogether (similar to LINQ in C#).
On the functional-style suggestion: I'd start with the following basic functions (you can probably just find them in the standard libraries of the language):
collection = map(collection, perItemFunction) to transform all items in a collection (Select in C#)
item = reduce(collection, aggregateFunction) to reduce all items into a single entity (Aggregate in C#)
combine 2 functions on an item: funcOnItem = combine(funcFirst, funcSecond). Can be expressed as a lambda in C#: Func<T,T> combined = x => second(first(x)).
"bind"/curry - fix one of the arguments of a function: functionOfOneArg = curry(funcOfTwoArgs, fixedFirstArg). Can be expressed in C# as a lambda: Func<T,T> curried = x => funcOfTwoArgs(fixedFirstArg, x).
This list will let you do something like "rotate all points in a collection over the X axis by 10 and shift Y by 15": map(points, combine(curry(rotateX, 10), curry(shiftY, 15))).
The syntax will depend on the language. E.g. in JavaScript you just pass functions (and map/reduce are part of the language already); in C#, lambdas and the Func classes (like the one-argument function Func<T,R>) are an option. In some languages you have to explicitly use a class/interface to represent a "function" object.
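As a hedged illustration in Python rather than C# (the function names are made up; functools provides reduce and partial/curry out of the box):

from functools import reduce, partial
import math

def rotate_x(angle, p):
    # rotate a point (x, y, z) around the X axis by `angle` radians
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)

def shift_y(dy, p):
    x, y, z = p
    return (x, y + dy, z)

def combine(first, second):
    # compose two point -> point functions
    return lambda p: second(first(p))

points = [(1.0, 2.0, 3.0), (0.0, 1.0, 0.0)]

# "rotate all points over the X axis by 10 degrees and shift Y by 15"
transform = combine(partial(rotate_x, math.radians(10)), partial(shift_y, 15))
transformed = list(map(transform, points))

# reduce example: sum of the transformed Y values
total_y = reduce(lambda acc, p: acc + p[1], transformed, 0.0)
print(transformed, total_y)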
Alternative approach: if you are actually dealing with points and transformations, another traditional approach is to use a matrix to represent all linear operations (if your language supports custom operators you get very natural-looking code).

Overextending object design by adding many trivial fields?

I have to add a bunch of trivial or seldom used attributes to an object in my business model.
So, imagine class Foo which has a bunch of standard information such as Price, Color, Weight, Length. Now, I need to add a bunch of attributes to Foo that are rarely deviating from the norm and rarely used (in the scope of the entire domain). So, Foo.DisplayWhenConditionIsX is true for 95% of instances; likewise, Foo.ShowPriceWhenConditionIsY is almost always true, and Foo.PriceWhenViewedByZ has the same value as Foo.Price most of the time.
It just smells wrong to me to add a dozen fields like this to both my class and database table. However, I don't know that wrapping these new fields into their own FooDisplayAttributes class makes sense. That feels like adding complexity to my DAL and BLL for little gain other than a smaller object. Any recommendations?
Try setting up a separate storage class/struct for the rarely used fields and hold it as a single field, say "rarelyUsedFields" (for example, it will be a pointer in C++ and a reference in Java - you don't mention your language.)
Have setters/getters for these fields on your class. Setters will check if the value is not the same as the default, lazily initialize rarelyUsedFields, and then set the respective field value (say, rarelyUsedFields.DisplayWhenConditionIsX = false). Getters will read the rarelyUsedFields value and return the default values (true for DisplayWhenConditionIsX and so on) if it is NULL, otherwise return rarelyUsedFields.DisplayWhenConditionIsX.
This approach is used quite often; see WebKit's Node.h as an example (and its focused() method).
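A rough Python rendering of the same idea (RarelyUsedFields and the field names are invented for the example; in Python the saving comes from leaving the optional block as None until it is needed):

class RarelyUsedFields:
    __slots__ = ("display_when_x", "show_price_when_y")

    def __init__(self):
        self.display_when_x = True
        self.show_price_when_y = True

class Foo:
    __slots__ = ("price", "color", "_rare")

    def __init__(self, price, color):
        self.price = price
        self.color = color
        self._rare = None          # stays None for the ~95% default case

    @property
    def display_when_x(self):
        # getter: fall back to the default when the block was never created
        return True if self._rare is None else self._rare.display_when_x

    @display_when_x.setter
    def display_when_x(self, value):
        if value is True and self._rare is None:
            return                 # still the default, nothing to store
        if self._rare is None:
            self._rare = RarelyUsedFields()   # lazy initialization
        self._rare.display_when_x = value

f = Foo(9.99, "red")
print(f.display_when_x)    # True, without any extra storage
f.display_when_x = False   # only now is the rare block allocated
print(f.display_when_x)    # False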
Abstraction makes your question a bit hard to understand, but I would suggest using custom getters such as Foo.getPrice() and Foo.getSpecialPrice().
The first one would simply return the attribute, while the second would perform operations on it first.
This is only possible if there is a way to calculate the "seldom used version" from the original attribute value, but in most common cases this would be possible, provided you can access data from another object storing parameters, such as FooShop.getCurrentDiscount().
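For illustration, a minimal Python sketch under those assumptions (FooShop.getCurrentDiscount comes from the paragraph above; the concrete numbers are invented):

class FooShop:
    def get_current_discount(self):
        return 0.25                # 25% off, purely for illustration

class Foo:
    def __init__(self, price, shop):
        self.price = price
        self.shop = shop

    def get_price(self):
        # plain attribute access
        return self.price

    def get_special_price(self):
        # the "seldom used version" is derived from the stored value
        return self.price * (1 - self.shop.get_current_discount())

foo = Foo(100.0, FooShop())
print(foo.get_price(), foo.get_special_price())   # 100.0 75.0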
The problem I see is more about the Foo object having side effects.
In your example, I see two features: display and price.
I would build one or more Displayer objects (which know how to display) and make the price a component object, with a list of internal price modifiers.
Note that all this is relevant only if your Foo objects are called by numerous clients.

Why does Seq[V] not extend Map[Int,V] nor does Set[V] extend Map[V,Bool]?

The three immediate subtypes of Iterable are Map, Seq, and Set. It seems like—aside from performance issues—a Seq is a map from integers to values, and a Set is a map from values to booleans (true if the value is in the set, false otherwise).
If this is the case, why is this not expressed in the type system by making Seq[V] extend Map[Int, V] and Set[V] extend Map[V, Boolean]?
Well, they sort of do, at least with respect to their actually common functionality. Seq[B] inherits from Int => B (via PartialFunction[Int, B]), Map[A, B] inherits from A => B (also via PartialFunction[A, B]), and Set[A] inherits from A => Boolean. Thus, as far as function application and composition methods are concerned, all three can be used interchangeably. Additionally, they can be used interchangeably as far as traversal goes, as all implement TraversableLike.
Seeing a sequence as an assignment from integers to elements is only one way to describe what a sequence is. There are other ways, and there is no reason why that way of describing a sequence should become canonical. The actual purpose of a sequence is to make a bunch of elements accessible and traversable. A sequence is not required to actually assign integer numbers to the elements. For example, most Stream implementations probably don't have a counter running in parallel to the traversal. Requiring that would impose an unnecessary overhead on the implementation.
Besides, a Map[K,V] is also an Iterable[(K,V)]. Following your suggestion, a Seq[A] would also have to be a Map[Int,A], which would by that also make it an Iterable[(Int,A)]. Since Seq extends Iterable, this would make the Seq[A] both an Iterable[A] and an Iterable[(Int,A)] (and, recursively, an Iterable[(Int,(Int,A))], Iterable[(Int,(Int,(Int,A)))], and so on), which is not an allowed way of inheritance in Scala.
You can construct a similar argument for your suggestion regarding Set.
Well, if that were all you cared about in Seq and Set, you'd have a point. Myself, I happen to think that's one of the least important aspects, and one which is already well represented by all of them being functions.
That is, a Map is a function from a key to a value, a Seq is a function from an Int to a value, and a Set is a function from a value to a Boolean. This property, which you called a "map", is a function. And it is already shared by all three.
What, in my opinion, Map, Seq and Set are really about are:
A Seq is concerned with knowing what order its elements are in. Conceptually, how would you prepend an element in a Map? You'd have to renumber all the keys!
A Set is concerned with the presence or absence of an element. How would one model that in a Map? It would have to be a map with a default value -- not a common map -- and one in which all non-default values are the same! That is clearly a degenerate behavior, not an abstraction.
A Map is concerned about mapping arbitrary keys to arbitrary values. A Seq doesn't have arbitrary keys, and a Set doesn't have arbitrary values.