class design for for a dynamic GetHash function - oop

I have the requirement to calculate an internal hashcode of instances of several types (some types are derived from each other). Two aspects are dynamic here and can vary independently. Only the client that requests the hash knows what hash algorithm is to be used and what properties are to be included.
The actual algorithm that is used
for the hash caluctation can change.
What members for each type should be take into account for the hash calculation can change.
How would you design your types around these requirement?

First, require all "hashable" objects to implement the same interface, with one method: getHash().
Second, introduce an Abstract Factory instantiating hash Strategies.
Example (Java 1.6):
public class Foo implements Hashable {
private String field;
#Override
public String getHash() {
HashFactory.INSTANCE.getMD5Strategy().hash(this.field);
}
}
Proper encapsulation of decision making inside your objects, and an effective decoupling of strategies and objects.

Here are some guidelines for GetHashCode() from Eric Lippert.
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
An excerpt:
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.
In short, if the algorithm for generating the hash code can change, then you need to make sure that it doesn't change while the object is in your collection.

Related

When is a class a data class?

I know what classes are about, but for better understanding I need a use case. Recently I discovered the construct of data classes. I get the idea behind normal classes, but I cannot imagine a real use case for data classes.
When should I use a data class and when I use a "normal" class? For all I know, all classes keep data.
Can you provide a good example that distinguishes data classes from non-data classes?
A data class is used to store data. It's lighter than a normal class, and can be compared to an array with key/value (dictionary, hash, etc.), but represented as an object with fixed attributes. In kotlin, according to the documentation, that adds those attributes to the class:
equals()/hashCode() pair
toString() of the form "User(name=John, age=42)"
componentN() functions corresponding to the properties in their order of declaration.
copy() function
Also it has a different behavior during class inheritence :
If there are explicit implementations of equals(), hashCode(), or toString() in the data class body or final implementations in a
superclass, then these functions are not generated, and the existing
implementations are used.
If a supertype has componentN() functions that are open and return compatible types, the corresponding functions are generated for the
data class and override those of the supertype. If the functions of
the supertype cannot be overridden due to incompatible signatures or
due to their being final, an error is reported.
Providing explicit implementations for the componentN() and copy() functions is not allowed.
So in kotlin, if you want to describe an object (a data) then you may use a dataclass, but if you're creating a complex application and your class needs to have special behavior in the constructor, with inheritence or abstraction, then you should use a normal class.
I do not know Kotlin, but in Python, a dataclass can be seen as a structured dict. When you want to use a dict to store an object which has always the same attributes, then you should not put it in a dict but use a Dataclass.
The advantage with a normal class is that you don't need to declare the __init__ method, as it is "automatic" (inherited).
Example :
This is a normal class
class Apple:
def __init__(size:int, color:str, sweet:bool=True):
self.size = size
self.color = color
self.sweet = sweet
Same class as a dataclass
from dataclasses import dataclass
#dataclass
class Apple:
size: int
color: str
sweet: bool = True
Then the advantage compared to a dict is that you are sure of what attribute it has. Also it can contains methods.
The advantage over to a normal class is that it is simpler to declare and make the code lighter. We can see that the attributes keywords (e.g size) are repeated 3 times in a normal class, but appear only once in a dataclass.
The advantage of normal class also is that you can personalize the __init__ method, (in a dataclass also, but then you lose it's main advantage I think) example:
# You need only 2 variable to initialize your class
class Apple:
def __init__(size:int, color:str):
self.size = size
self.color = color
# But you get much more info from those two.
self.sweet = True if color == 'red' else False
self.weight = self.__compute_weight()
self.price = self.weight * PRICE_PER_GRAM
def __compute_weight(self):
# ...
return (self.size**2)*10 # That's a random example
Abstractly, a data class is a pure, inert information record that doesn’t require any special handling when copied or passed around, and it represents nothing more than what is contained in its fields; it has no identity of its own. A typical example is a point in 3D space:
data class Point3D(
val x: Double,
val y: Double,
val z: Double
)
As long as the values are valid, an instance of a data class is entirely interchangeable with its fields, and it can be put apart or rematerialized at will. Often there is even little use for encapsulation: users of the data class can just access the instance’s fields directly. The Kotlin language provides a number of convenience features when data classes are declared as such in your code, which are described in the documentation. Those are useful when for example building more complex data structures employing data classes: you can for example have a hashmap assign values to particular points in space, and then be able to look up the value using a newly-constructed Point3D.
val map = HashMap<Point3D, String>()
map.set(Point3D(3, 4, 5), "point of interest")
println(map.get(Point3D(3, 4, 5))) // prints "point of interest"
For an example of a class that is not a data class, take FileReader. Underneath, this class probably keeps some kind of file handle in a private field, which you can assume to be an integer (as it actually is on at least some platforms). But you cannot expect to store this integer in a database, have another process read that same integer from the database, reconstruct a FileReader from it and expect it to work. Passing file handles between processes requires more ceremony than that, if it is even possible on a given platform. That property makes FileReader not a data class. Many examples of non-data classes will be of this kind: any class whose instances represent transient, local resources like a network connection, a position within a file or a running process, cannot be a data class. Likewise, any class where different instances should not be considered equal even if they contain the same information is not a data class either.
From the comments, it sounds like your question is really about why non-data classes exist in Kotlin and why you would ever choose not to make a data class. Here are some reasons.
Data classes are a lot more restrictive than a regular class:
They have to have a primary constructor, and every parameter of the primary constructor has to be a property.
They cannot have an empty primary constructor.
They cannot be open so they cannot be subclassed.
Here are other reasons:
Sometimes you don't want a class to have a copy function. If a class holds onto some heavy state that is expensive to copy, maybe it shouldn't advertise that it should be copied by presenting a copy function.
Sometimes you want to use an instance of a class in a Set or as Map keys without two different instances being considered as equivalent just because their properties have the same values.
The features of data classes are useful specifically for simple data holders, so the drawbacks are often something you want to avoid.

Recursively building a data class in Kotlin

I have am trying to create a recursive data class like so:
data class AttributeId (
val name: String,
val id: Int,
val children: List<AttributeId>?
)
The thing I'm struggling with now is building the data class by iterating over a source object.
How do I recursively build this object?? Is a data class the wrong solution here?
EDIT: Some more information about the Source object from which I want to construct my data class instance
The source object is a Java Stream that essentially* has the following shape:
public Category(final String value,
final Integer id,
final List<Category> children) {
this.value = value;
this.id = id;
this.children = children;
}
(For brevity the fields I don't care about have been removed from example)
I think I need to map over this stream and call a recursive function in order to construct the AttributeId data class, but my attempts seem to end in a stack overflow and a lot of confusion!
I don't think there's anything necessarily wrong with a data class that contains references to others.
There are certainly some gotchas.  For example:
If the list were mutable, or if its field was mutable (i.e. var rather than val), then you'd have to take care because its hashcode &c could change.
And if the chain of links could form a loop (i.e. you could follow the links and end up back at the original class), that could be very dangerous.  (E.g. calling a method such as toString() or hashCode() might either get stuck in an endless loop or crash the thread with a StackOverflowError.  You'd have to prevent that by overriding those methods to prevent them recursing.)  But that couldn't happen if the list and field were both immutable.
None of these issues are specific to data classes, though; a normal class could suffer the same issues (especially if you overrode methods like toString() or hashCode() without taking care).  So whether you make this a data class comes down to whether it feels like one: whether its primary purpose is to hold data, and/or whether the automatically-generated methods match how you want it to behave.
As Tenfour04 says, it depends what you're constructing these from.  If it naturally forms a tree structure, then this could be a good representation for it.
Obviously, you wouldn't be able to construct a parent before any of its children.  (In particular, the first instance you create would have to have either null or an empty list for its children.)  This would probably mean traversing the source in post-order.  The rest should fall out naturally from that.

Differences between Function that returns a string and read only string property [duplicate]

I need to expose the "is mapped?" state of an instance of a class. The outcome is determined by a basic check. It is not simply exposing the value of a field. I am unsure as to whether I should use a read-only property or a method.
Read-only property:
public bool IsMapped
{
get
{
return MappedField != null;
}
}
Method:
public bool IsMapped()
{
return MappedField != null;
}
I have read MSDN's Choosing Between Properties and Methods but I am still unsure.
The C# standard says
§ 8.7.4
A property is a member that provides access to a characteristic of an object or a class. Examples of properties include the length of a string, the size of a font, the caption of a window, the name of a customer, and so on. Properties are a natural extension of fields. Both are named members with associated types, and the syntax for accessing fields and properties is the same. However, unlike fields, properties do not denote storage locations. Instead, properties have accessors that specify the statements to be executed when their values are read or written.
while as methods are defined as
§ 8.7.3
A method is a member that implements a computation or action that can be performed by an object or class. Methods have a (possibly empty) list of formal parameters, a return value (unless the method’s return-type is void ), and are either static or non-static.
Properties and methods are used to realize encapsulation. Properties encapsulate data, methods encapsulate logic. And this is why you should prefer a read-only property if you are exposing data. In your case there is no logic that modifies the internal state of your object. You want to provide access to a characteristic of an object.
Whether an instance of your object IsMapped or not is a characteristic of your object. It contains a check, but that's why you have properties to access it. Properties can be defined using logic, but they should not expose logic. Just like the example mentioned in the first quote: Imagine the String.Length property. Depending on the implementation, it may be that this property loops through the string and counts the characters. It also does perform an operation, but "from the outside" it just give's an statement over the internal state/characteristics of the object.
I would use the property, because there is no real "doing" (action), no side effects and it's not too complex.
I personally believe that a method should do something or perform some action. You are not performing anything inside IsMapped so it should be a property
I'd go for a property. Mostly because the first senctence on the referenced MSDN-article:
In general, methods represent actions and properties represent data.
In this case it seems pretty clear to me that it should be a property. It's a simple check, no logic, no side effects, no performance impact. It doesn't get much simpler than that check.
Edit:
Please note that if there was any of the above mentioned and you would put it into a method, that method should include a strong verb, not an auxiliary verb like is or has. A method does something. You could name it VerifyMapping or DetermineMappingExistance or something else as long as it starts with a verb.
I think this line in your link is the answer
methods represent actions and properties represent data.
There is no action here, just a piece of data. So it's a Property.
In situations/languages where you have access to both of these constructs, the general divide is as follows:
If the request is for something the object has, use a property (or a field).
If the request is for the result of something the object does, use a method.
A little more specifically, a property is to be used to access, in read and/or write fashion, a data member that is (for consuming purposes) owned by the object exposing the property. Properties are better than fields because the data doesn't have to exist in persistent form all the time (they allow you to be "lazy" about calculation or retrieval of this data value), and they're better than methods for this purpose because you can still use them in code as if they were public fields.
Properties should not, however, result in side effects (with the possible, understandable exception of setting a variable meant to persist the value being returned, avoiding expensive recalculation of a value needed many times); they should, all other things being equal, return a deterministic result (so NextRandomNumber is a bad conceptual choice for a property) and the calculation should not result in the alteration of any state data that would affect other calculations (for instance, getting PropertyA and PropertyB in that order should not return any different result than getting PropertyB and then PropertyA).
A method, OTOH, is conceptually understood as performing some operation and returning the result; in short, it does something, which may extend beyond the scope of computing a return value. Methods, therefore, are to be used when an operation that returns a value has additional side effects. The return value may still be the result of some calculation, but the method may have computed it non-deterministically (GetNextRandomNumber()), or the returned data is in the form of a unique instance of an object, and calling the method again produces a different instance even if it may have the same data (GetCurrentStatus()), or the method may alter state data such that doing exactly the same thing twice in a row produces different results (EncryptDataBlock(); many encryption ciphers work this way by design to ensure encrypting the same data twice in a row produces different ciphertexts).
If at any point you'll need to add parameters in order to get the value, then you need a method. Otherwise you need a property
IMHO , the first read-only property is correct because IsMapped as a Attribute of your object, and you're not performing an action (only an evaluation), but at the end of the day consistancy with your existing codebase probably counts for more than semantics.... unless this is a uni assignment
I'll agree with people here in saying that because it is obtaining data, and has no side-effects, it should be a property.
To expand on that, I'd also accept some side-effects with a setter (but not a getter) if the side-effects made sense to someone "looking at it from the outside".
One way to think about it is that methods are verbs, and properties are adjectives (meanwhile, the objects themselves are nouns, and static objects are abstract nouns).
The only exception to the verb/adjective guideline is that it can make sense to use a method rather than a property when obtaining (or setting) the information in question can be very expensive: Logically, such a feature should probably still be a property, but people are used to thinking of properties as low-impact performance-wise and while there's no real reason why that should always be the case, it could be useful to highlight that GetIsMapped() is relatively heavy perform-wise if it in fact was.
At the level of the running code, there's absolutely no difference between calling a property and calling an equivalent method to get or set; it's all about making life easier for the person writing code that uses it.
I would expect property as it only is returning the detail of a field. On the other hand I would expect
MappedFields[] mf;
public bool IsMapped()
{
mf.All(x => x != null);
}
you should use the property because c# has properties for this reason

Unit testing value objects in isolation from its dependencies

TL;DR
How do you test a value object in isolation from its dependencies without stubbing or injecting them?
In Misko Hevery's blog post To “new” or not to “new”… he advocates the following (quoted from the blog post):
An Injectable class can ask for other Injectables in its constructor.(Sometimes I refer to Injectables as Service Objects, but
that term is overloaded.). Injectable can never ask for a non-Injectable (Newable) in its constructor.
Newables can ask for other Newables in their constructor, but not for Injectables (Sometimes I refer to Newables as Value Object, but
again, the term is overloaded)
Now if I have a Quantity value object like this:
class Quantity{
$quantity=0;
public function __construct($quantity){
$intValidator = new Zend_Validate_Int();
if(!$intValidator->isValid($quantity)){
throw new Exception("Quantity must be an integer.");
}
$gtValidator = new Zend_Validate_GreaterThan(0);
if(!$gtvalidator->isValid($quantity)){
throw new Exception("Quantity must be greater than zero.");
}
$this->quantity=$quantity;
}
}
My Quantity value object depends on at least 2 validators for its proper construction. Normally I would have injected those validators through the constructor, so that I can stub them during testing.
However, according to Misko a newable shouldn't ask for injectables in its constructor. Frankly a Quantity object that looks like this
$quantity=new Quantity(1,$intValidator,$gtValidator); looks really awkward.
Using a dependency injection framework to create a value object is even more awkward. However now my dependencies are hard coded in the Quantity constructor and I have no way to alter them if the business logic changes.
How do you design the value object properly for testing and adherence to the separation between injectables and newables?
Notes:
This is just a very very simplified example. My real object my have serious logic in it that may use other dependencies as well.
I used a PHP example just for illustration. Answers in other languages are appreciated.
A Value Object should only contain primitive values (integers, strings, boolean flags, other Value Objects, etc.).
Often, it would be best to let the Value Object itself protect its invariants. In the Quantity example you supply, it could easily do that by checking the incoming value without relying on external dependencies. However, I realize that you write
This is just a very very simplified example. My real object my have serious logic in it that may use other dependencies as well.
So, while I'm going to outline a solution based on the Quantity example, keep in mind that it looks overly complex because the validation logic is so simple here.
Since you also write
I used a PHP example just for illustration. Answers in other languages are appreciated.
I'm going to answer in F#.
If you have external validation dependencies, but still want to retain Quantity as a Value Object, you'll need to decouple the validation logic from the Value Object.
One way to do that is to define an interface for validation:
type IQuantityValidator =
abstract Validate : decimal -> unit
In this case, I patterned the Validate method on the OP example, which throws exceptions upon validation failures. This means that if the Validate method doesn't throw an exception, all is good. This is the reason the method returns unit.
(If I hadn't decided to pattern this interface on the OP, I'd have preferred using the Specification pattern instead; if so, I'd instead have declared the Validate method as decimal -> bool.)
The IQuantityValidator interface enables you to introduce a Composite:
type CompositeQuantityValidator(validators : IQuantityValidator list) =
interface IQuantityValidator with
member this.Validate value =
validators
|> List.iter (fun validator -> validator.Validate value)
This Composite simply iterates through other IQuantityValidator instances and invokes their Validate method. This enables you to compose arbitrarily complex validator graphs.
One leaf validator could be:
type IntegerValidator() =
interface IQuantityValidator with
member this.Validate value =
if value % 1m <> 0m
then
raise(
ArgumentOutOfRangeException(
"value",
"Quantity must be an integer."))
Another one could be:
type GreaterThanValidator(boundary) =
interface IQuantityValidator with
member this.Validate value =
if value <= boundary
then
raise(
ArgumentOutOfRangeException(
"value",
"Quantity must be greater than zero."))
Notice that the GreaterThanValidator class takes a dependency via its constructor. In this case, boundary is just a decimal, so it's a Primitive Dependency, but it could just as well have been a polymorphic dependency (A.K.A a Service).
You can now compose your own validator from these building blocks:
let myValidator =
CompositeQuantityValidator([IntegerValidator(); GreaterThanValidator(0m)])
When you invoke myValidator with e.g. 9m or 42m, it returns without errors, but if you invoke it with e.g. 9.8m, 0m or -1m it throws the appropriate exception.
If you want to build something a bit more complicated than a decimal, you can introduce a Factory, and compose the Factory with the appropriate validator.
Since Quantity is very simple here, we can just define it as a type alias on decimal:
type Quantity = decimal
A Factory might look like this:
type QuantityFactory(validator : IQuantityValidator) =
member this.Create value : Quantity =
validator.Validate value
value
You can now compose a QuantityFactory instance with your validator of choice:
let factory = QuantityFactory(myValidator)
which will let you supply decimal values as input, and get (validated) Quantity values as output.
These calls succeed:
let x = factory.Create 9m
let y = factory.Create 42m
while these throw appropriate exceptions:
let a = factory.Create 9.8m
let b = factory.Create 0m
let c = factory.Create -1m
Now, all of this is very complex given the simple nature of the example domain, but as the problem domain grows more complex, complex is better than complicated.
Avoid value types with dependencies on non-value types. Also avoid constructors that perform validations and throw exceptions. In your example I'd have a factory type that validates and creates quantities.
Your scenario can also be applied to entities. There are cases where an entity requires some dependency in order to perform some behaviour. As far as I can tell the most popular mechanism to use is double-dispatch.
I'll use C# for my examples.
In your case you could have something like this:
public void Validate(IQuantityValidator validator)
As other answers have noted a value object is typically simple enough to perform its invariant checking in the constructor. An e-mail value object would be a good example as an e-mail has a very specific structure.
Something a bit more complex could be an OrderLine where we need to determine, totally hypothetical, whether it is, say, taxable:
public bool IsTaxable(ITaxableService service)
In the article you reference I would assert that the 'newable' relates quite a bit to the 'transient' type of life cycle that we find in DI containers as we are interested in specific instances. However, when we need to inject specific values the transient business does not really help. This is the case for entities where each is a new instance but has very different state. A repository would hydrate the object but it could just as well use a factory.
The 'true' dependencies typically have a 'singleton' life-cycle.
So for the 'newable' instances a factory could be used if you would like to perform validation upon construction by having the factory call the relevant validation method on your value object using the injected validator dependency as Mark Seemann has mentioned.
This gives you the freedom to still test in isolation without coupling to a specific implementation in your constructor.
Just a slightly different angle on what has already been answered. Hope it helps :)

Is there a commonly used OO Pattern for holding "constant variables"?

I am working on a little pinball-game project for a hobby and am looking for a pattern to encapsulate constant variables.
I have a model, within which there are values which will be constant over the life of that model e.g. maximum speed/maximum gravity etc. Throughout the GUI and other areas these values are required in order to correctly validate input. Currently they are included either as references to a public static final, or just plain hard-coded. I'd like to encapsulate these "constant variables" in an object which can be injected into the model, and retrieved by the view/controller.
To clarify, the value of the "constant variables" may not necessarily be defined at compile-time, they could come from reading in a file; user input etc. What is known at compile time is which ones are needed. A way which may be easier to explain it is that whatever this encapsulation is, the values it provides are immutable.
I'm looking for a way to achieve this which:
has compile time type-safety (i.e. not mapping a string to variable at runtime)
avoids anything static (including enums, which can't be extended)
I know I could define an interface which has the methods such as:
public int getMaximumSpeed();
public int getMaximumGravity();
... and inject an instance of that into the model, and make it accessible in some way. However, this results in a lot of boilerplate code, which is pretty tedious to write/test etc (I am doing this for funsies :-)).
I am looking for a better way to do this, preferably something which has the benefits of being part of a shared vocabulary, as with design patterns.
Is there a better way to do this?
P.S. I've thought some more about this, and the best trade-off I could find would be to have something like:
public class Variables {
enum Variable {
MaxSpeed(100),
MaxGravity(10)
Variable(Object variableValue) {
// assign value to field, provide getter etc.
}
}
public Object getVariable(Variable v) { // look up enum and get member }
} // end of MyVariables
I could then do something like:
Model m = new Model(new Variables());
Advantages: the lookup of a variable is protected by having to be a member of the enum in order to compile, variables can be added with little extra code
Disadvantages: enums cannot be extended, brittleness (a recompile is needed to add a variable), variable values would have to be cast from Object (to Integer in this example), which again isn't type safe, though generics may be an option for that... somehow
Are you looking for the Singleton or, a variant, the Monostate? If not, how does that pattern fail your needs?
Of course, here's the mandatory disclaimer that Anything Global Is Evil.
UPDATE: I did some looking, because I've been having similar debates/issues. I stumbled across a list of "alternatives" to classic global/scope solutions. Thought I'd share.
Thanks for all the time spent by you guys trying to decipher what is a pretty weird question.
I think, in terms of design patterns, the closest that comes to what I'm describing is the factory pattern, where I have a factory of pseudo-constants. Technically it's not creating an instance each call, but rather always providing the same instance (in the sense of a Guice provider). But I can create several factories, which each can provide different psuedo-constants, and inject each into a different model, so the model's UI can validate input a lot more flexibly.
If anyone's interested I've came to the conclusion that an interface providing a method for each psuedo-constant is the way to go:
public interface IVariableProvider {
public int maxGravity();
public int maxSpeed();
// and everything else...
}
public class VariableProvider {
private final int maxGravity, maxSpeed...;
public VariableProvider(int maxGravity, int maxSpeed) {
// assign final fields
}
}
Then I can do:
Model firstModel = new Model(new VariableProvider(2, 10));
Model secondModel = new Model(new VariableProvider(10, 100));
I think as long as the interface doesn't provide a prohibitively large number of variable getters, it wins over some parameterised lookup (which will either be vulnerable at run-time, or will prohibit extension/polymorphism).
P.S. I realise some have been questioning what my problem is with static final values. I made the statement (with tongue in cheek) to a colleague that anything static is an inherently not object-oriented. So in my hobby I used that as the basis for a thought exercise where I try to remove anything static from the project (next I'll be trying to remove all 'if' statements ;-D). If I was on a deadline and I was satisfied public static final values wouldn't hamstring testing, I would have used them pretty quickly.
If you're just using java/IOC, why not just dependency-inject the values?
e.g. Spring inject the values via a map, specify the object as a singleton -
<property name="values">
<map>
<entry> <key><value>a1</value></key><value>b1</value></entry>
<entry> <key><value>a2</value></key><value>b3</value></entry>
</map>
</property>
your class is a singleton that holds an immutable copy of the map set in spring -
private Map<String, String> m;
public String getValue(String s)
{
return m.containsKey(s)?m.get(s):null;
}
public void setValues(Map m)
{
this.m=Collections.unmodifiableMap(m):
}
From what I can tell, you probably don't need to implement a pattern here -- you just need access to a set of constants, and it seems to me that's handled pretty well through the use of a publicly accessible static interface to them. Unless I'm missing something. :)
If you simply want to "objectify" the constants though, for some reason, than the Singleton pattern would probably be called for, if any; I know you mentioned in a comment that you don't mind creating multiple instances of this wrapper object, but in response I'd ask, then why even introduce the sort of confusion that could arise from having multiple instances at all? What practical benefit are you looking for that'd be satisfied with having the data in object form?
Now, if the values aren't constants, then that's different -- in that case, you probably do want a Singleton or Monostate. But if they really are constants, just wrap a set of enums or static constants in a class and be done! Keep-it-simple is as good a "pattern" as any.