VoyageMongo: is it OK to override #= in persistent classes? - smalltalk

I'm having an issue with VoyageMongo. I'm getting duplicated objects when editing them (i.e. altering and saving already persisted objects), specifically with those that override #= and #hash.
Here's the (simplified) case: I have the UserAccount class, with instance vars email, salt (for password encryption) and name. These are the #= and #hash methods:
= anObject
    (self isKindOf: anObject class)
        ifFalse: [ ^ false ].
    ^ self email = anObject email and: [ self salt = anObject salt ]

hash
    ^ (self salt hash + self email hash) hash
email and salt are set at creation and never change. Now, here's a little script:
UserAccount removeAll.
20 timesRepeat: [ UserAccount new save ].
10 timesRepeat: [
    UserAccount selectAll atRandom
        name: 'Joe Doe';
        save ].
UserAccount selectAll size = 20
This generates 20 UserAccounts (#new creates an instance with a random email and salt in this case), then 10 times picks one at random and edits its name. The final size of UserAccount selectAll should stay at 20, but it is usually larger, meaning duplicates are being stored.
Possible culprit: debugging into VOCache, the WeakKeyDictionary holding the cached objects (in the reversedObjects variable, where the objects themselves are the keys) sometimes fails to "hit" an existing object, since #scanFor: starts looking at a different point (more specifically, #startIndexFor:) as the keys array grows larger. When this happens, I can see the object inside VOCache's reversedObjects, but VOCache>>keyAtValue: fails to find it.
Long story short:
is it that I shouldn't override #= in persistent objects? Or...
is it that my #hash is not well implemented?
Or, of course, any other issue I'm not seeing :)
Thanks a lot!
PS: tested this in Pharo 6.1 and 7 with latest VoyageMongo.

As a general rule you shouldn't override #= and #hash in entity objects, as their equality needs to be based on identity rather than on value.
If two objects match on the values of their attributes, that does not necessarily mean they represent the same entity; if you really do need to override #=, you will need a business key.
It's best simply not to override them and to treat entity objects as identity-based once you pull them from the DB, i.e. as if you were working with an object-oriented DB.
Perhaps this is a Voyage bug as the reversedObjects variable should be a WeakIdentityKeyDictionary?
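To make the identity-versus-value point concrete outside Smalltalk, here is a minimal Kotlin/JVM sketch (Account is a hypothetical stand-in for UserAccount, not part of Voyage): with value-based equality, two distinct entities that happen to share field values collapse into a single map key, whereas an identity-keyed map, the non-weak analogue of a WeakIdentityKeyDictionary, keeps them apart.

import java.util.IdentityHashMap

// Hypothetical stand-in for UserAccount: equals/hashCode are value-based,
// like the overridden #= and #hash in the question.
data class Account(val email: String, val salt: String)

fun main() {
    val a = Account("joe@example.com", "s1")
    val b = Account("joe@example.com", "s1") // a distinct entity with equal values

    val valueKeyed = HashMap<Account, String>()
    valueKeyed[a] = "first"
    valueKeyed[b] = "second"    // overwrites: a == b by value
    println(valueKeyed.size)    // 1 -- the two entities collapsed into one key

    val identityKeyed = IdentityHashMap<Account, String>()
    identityKeyed[a] = "first"
    identityKeyed[b] = "second" // kept separate: a !== b
    println(identityKeyed.size) // 2 -- identity tells the entities apart
}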

Related

Kotlin modifying dataclass object key from map changes the reference after modifying variable

I have a MutableMap whose keys are objects of a data class (the User data class) and whose values are lists of another data class (the Dog data class). If I have a variable with a User object, put it in the MutableMap, and test whether the map contains the User, it says true. But if, after putting the user in the MutableMap, I change one of the attributes of the User object through the variable that holds it, the map says that it doesn't contain the user object.
This is an example
data class User(
    var name: String,
    var project: String,
)

data class Dog(
    var kind: String
)

fun main(args: Array<String>) {
    var mapUserDogs: MutableMap<User, MutableList<Dog>> = mutableMapOf()
    var userSelected = User("name2", "P2")

    mapUserDogs.put(
        User("name1", "P1"),
        mutableListOf(Dog("R1"), Dog("R2"))
    )
    mapUserDogs.put(
        userSelected,
        mutableListOf(Dog("R21"), Dog("R31"))
    )

    println(userSelected)
    println(mapUserDogs.keys.toString())
    println(mapUserDogs.contains(userSelected))
    println(mapUserDogs.values.toString())
    println("\n")

    userSelected.name = "Name3"

    println(userSelected)
    println(mapUserDogs.keys.toString())
    println(mapUserDogs.contains(userSelected))
    println(mapUserDogs.values.toString())
}
The print statements show this:
User(name=name2, project=P2)
[User(name=name1, project=P1), User(name=name2, project=P2)]
true
[[Dog(kind=R1), Dog(kind=R2)], [Dog(kind=R21), Dog(kind=R31)]]
User(name=Name3, project=P2)
[User(name=name1, project=P1), User(name=Name3, project=P2)]
false
[[Dog(kind=R1), Dog(kind=R2)], [Dog(kind=R21), Dog(kind=R31)]]
Process finished with exit code 0
But it doesn't make sense. Why does the map say that it doesn't contain the user object, when it clearly still holds a reference to it after the modification?
User(name=Name3, project=P2)
[User(name=name1, project=P1), User(name=Name3, project=P2)]
The user in the keys collection was also changed when I modified the userSelected variable, so the object now has "Name3" as its name both in the variable and in the map's keys, but the map still says it doesn't contain it.
What can I do so that I can change the attributes of the userSelected object and the map still returns true when using the "contains" method? Doing the same process in reverse shows the same thing: if I get the user from the map and modify it, the userSelected variable is also modified, but if I later test whether the map contains userSelected, it says false.
What can I do so that I can change the attributes of the userSelected object and the map still returns true when using the "contains" method?
There is nothing you can do that preserves both your ability to look up the entry in the map and your ability to modify the key.
Make your data class immutable (val instead of var, etc.), and when you need to change a mapping, remove the old key and put in the new key. That's really the only useful thing you can do.
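A minimal sketch of that remove-and-re-insert pattern with an immutable key (same User shape as in the question, but with val properties):

data class User(val name: String, val project: String) // immutable key

fun main() {
    val mapUserDogs = mutableMapOf(User("name2", "P2") to mutableListOf("R21", "R31"))

    // Changing a mapping's key: remove the old entry, re-insert under the new key.
    val oldKey = User("name2", "P2")
    val dogs = mapUserDogs.remove(oldKey)
    if (dogs != null) {
        mapUserDogs[oldKey.copy(name = "Name3")] = dogs
    }

    println(mapUserDogs.contains(User("Name3", "P2"))) // true
}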
To add to Louis Wasserman's correct answer:
This is simply the way that maps work in Kotlin: their contract requires that keys don't change significantly once stored. The docs for java.util.Map* spell this out:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
The safest approach is to use only immutable objects as keys. (Note that not just the object itself, but any objects it references, and so on, must all be immutable for it to be completely safe.)
You can get away with mutable keys as long as, once the key is stored in the map, you're careful never to change anything that would affect the results of calling equals() on it. (This may be appropriate if the object needs some initial set-up that can't all be done in its constructor, or to avoid having both mutable and immutable variants of a class.) But it's not easy to guarantee, and leaves potential problems for future maintenance, so full immutability is preferable.
The effects of mutating keys can be obvious or subtle. As OP noticed, mappings may appear to vanish, and maybe later reappear. But depending on the exact map implementation, it may cause further problems such as errors when fetching/adding/removing unrelated mappings, memory leaks, or even infinite loops. (“The behaviour… is not specified” means that anything can happen!)
What can I do so that I can change the attributes of the userSelected object and the map still returns true when using the "contains" method?
What you're trying to do there is to change the mapping. If you store a map from key K1 to value V, and you mutate the key to hold K2, then you're effectively saying “K1 no longer maps to V; instead, K2 now maps to V.”
So the correct way to do that is to remove the old mapping, and then add the new one. If the key is immutable, that's what you have to do — but even if the key is mutable, you must remove the old mapping before changing it, and then add a new mapping after changing it, so that it never changes while it's stored in the map.
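A sketch of that mutable-key discipline: the mapping is removed before the key changes and re-added afterwards, so the key never changes while it is stored in the map:

data class User(var name: String, var project: String) // mutable key: handle with care

fun main() {
    val mapUserDogs = mutableMapOf(User("name2", "P2") to listOf("R21", "R31"))
    val key = mapUserDogs.keys.first()

    val dogs = mapUserDogs.remove(key) // take the mapping out before mutating
    key.name = "Name3"                 // safe now: the key is in no map
    if (dogs != null) {
        mapUserDogs[key] = dogs        // re-add under the key's new state
    }

    println(mapUserDogs.contains(key)) // true
}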
(* The Kotlin library docs don't address this, unfortunately — IMHO this is one of many areas in which they're lacking, as compared to the exemplary Java docs…)
That happens because data classes in Kotlin are compared by value, unlike regular classes which are compared by reference. When you use a data class as a key, the map gets searched for a User with the same string values for the name and project fields, not for the object itself in memory.
For example:
data class User(
    var name: String,
    var project: String,
)

val user1 = User("Daniel", "Something Cool")
val user2 = User("Daniel", "Something Cool")

println(user1 == user2) // true
works because, even though they are different objects (and thus different references), they have the same name and project values.
However, if I were to do this:
user1.name = "Christian"
println(user1 == user2) // false
the answer would be false because they don't share the same value for all of their fields.
If I made User a standard class:
class User(
    var name: String,
    var project: String,
)

val user1 = User("Daniel", "Something Cool")
val user2 = User("Daniel", "Something Cool")

println(user1 == user2) // false
the answer would be false because they are different references, even though they share the same values.
For your code to work the way you want, make User a regular class instead of a data class.
That's the key difference between regular classes and data classes: a regular class compares by reference, while a data class generates equals()/hashCode() from its properties and therefore compares by value. Data classes are essentially collections of values with (optionally) some methods attached to them; regular classes are individual objects.
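A quick sketch of that suggestion, showing that with a regular class (identity-based equals/hashCode) the lookup survives mutation; the shapes mirror the question's code:

class User(var name: String, var project: String) // regular class: identity equality

fun main() {
    val mapUserDogs = mutableMapOf<User, List<String>>()
    val userSelected = User("name2", "P2")
    mapUserDogs[userSelected] = listOf("R21", "R31")

    userSelected.name = "Name3"                 // identity hash is unaffected
    println(mapUserDogs.contains(userSelected)) // true
}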

Understanding UnsafeRawPointer in objc_setAssociatedObject

I'd like to set n number of associated objects to an object in Swift
My understanding is the usual pattern for the UnsafeRawPointer reference is like this...
static var reference = "someRef"

public func add(to myObject: AnyObject) {
    let adding = ThingToAdd()
    objc_setAssociatedObject(myObject, &reference, adding, .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
}
but this wouldn't work for n number of calls to add(to:)
Similarly, the code below works fine if I only call it once per myObject. However, if I add another associated object in the same way, it replaces the first ThingToAdd with the second. Building a unique string inline does not work either: on the simulator it's fine, but on a device it's not.
public func add(to myObject: AnyObject) {
    let adding = ThingToAdd()
    var reference = "objectref\(UUID().uuidString)".replacingOccurrences(of: "-", with: "")
    objc_setAssociatedObject(myObject, &reference, adding, .OBJC_ASSOCIATION_RETAIN_NONATOMIC)
}
I understand that this is incorrect, however I don't know how to create n number of UnsafeRawPointers in order to store the associated objects with unique references.
Any help / explanation would be greatly appreciated.
Passing &reference to the key: UnsafeRawPointer argument of objc_setAssociatedObject passes the address of the variable's storage to that function.
Global variables and static member variables (and only those) are guaranteed to have a fixed address.
In your first version, the second and all subsequent calls to objc_setAssociatedObject with the same key therefore replace the previous association.
In your second version, the address of a local variable is passed as the key. That address may or may not be the same on subsequent calls. Note that the contents of the string are irrelevant; the key is just a pointer, i.e. the address of that variable.
If you need to associate a variable number of values with a given object, then I would suggest associating a single NSMutableArray (or NSMutableDictionary) and adding the values to that array or dictionary.

Value object in event sourcing

Is there a place for value objects in an event sourced domain model?
Let's define a value object as an object with immutable state that guards its invariants and has no particular identifier.
An event sourced domain model in this context is a domain that is entirely or partially event sourced, meaning that its current state can be derived from applying all events that have occurred in the past. Events themselves are considered immutable, even over time.
Debate has taken place about the validity of using value objects within events - this question goes slightly further: Do value objects have a place in event sourced domains at all?
The (potential) problem with using value objects is that it becomes rather tricky to alter the domain in such a way that invariants are tightened.
An example of this scenario would be to have a Username value object, with the sole constraint that the name must be anywhere between 2 and 16 characters.
While this has been working well for some time, the business decides to only allow usernames of at least 5 characters.
A migration period begins and users with names of less than 5 characters are asked to update their names.
Let's say the process was successful: correction events are applied and everyone is happy.
We tighten the constraints on our Username value object to require at least 5 characters.
For a while everyone is happy, but then we discover a problem with the snapshots and replay all events.
We now face an exception from our Username object: by loading the historic data, we're breaking an invariant of our domain.
The rules of a value object apply retroactively - does this make value objects inherently unsuitable for event sourcing? Would it be worth versioning value objects? Is there a simpler way of avoiding such problems?
I would say that the moment you redefine what Username means without somehow migrating the historical data, you have essentially created two different meanings of Username.
Because there are two different meanings of the word, you have to make that explicit in the code somehow. "Versioning" is one way, although I wouldn't use such a generic solution; there are different modeling options.
You could make it explicit that the history of a "username" is just that: a history. For example, create a HistoricUsername, which is the event-sourced object, even a value object if you want. Then create a Username, which is at all times a username under the most current rules; it is not persisted at all, but created from a HistoricUsername when it can be.
Some people suggest extracting the "rules" from the object and re-applying them later. That way the object itself is valid at all times, and you can ask it to validate itself against rules that might change. I don't really prefer these kinds of solutions, but it's an option, and Username would still be a value object.
So the problem is not really that value objects don't fit into event sourcing; it's just that the modeling has to be more accurate.
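A rough Kotlin sketch of that HistoricUsername/Username split (the class names follow the answer; the 5-to-16-character rule is the example's current rule, and everything else is illustrative):

// Event-sourced value: whatever was recorded historically, always loadable.
class HistoricUsername(val value: String)

// The username under today's rules; never persisted, only derived.
class Username private constructor(val value: String) {
    companion object {
        fun from(historic: HistoricUsername): Username? =
            if (historic.value.length in 5..16) Username(historic.value)
            else null // the historic name no longer satisfies current rules
    }
}

fun main() {
    println(Username.from(HistoricUsername("bob"))?.value)    // null: too short today
    println(Username.from(HistoricUsername("robert"))?.value) // robert
}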
Do value objects have a place in event sourced domains at all?
Yes.
Is there a simpler way of avoiding such problems?
"Don't do that."
The problem you are describing is really one about messaging - if we make backwards incompatible changes to our messages, then things break.
(More precisely, you have a "Username" message, and you are trying to re-use that message with a new set of constraints that reject some previously valid uses of the message).
The answer is that you don't introduce backwards-incompatible changes; instead, you introduce new names that match the new requirements and deprecate the old ones.
Which is to say, adding support for new messages and removing support for the old messages become two separately managed operations.
Greg Young's book Versioning in an Event Sourced System dedicates some chapters to this idea. Also, Rich Hickey ends up touching on these important ideas in most of his talks -- I'd suggest starting from Spec-ulation.
The "value object", meaning that the type that the current implementation of the domain model uses to move the information around, is a separate concern from the messages. The data structures we use in memory don't need to be coupled to our serialization formats.
The representation of the information on the wire is distinct from the representation of information in memory, and that in turn is distinct from the abstractions that manipulate the information in memory.
The challenging thing is that, at the beginning of a project, you have the least amount of information about when the different representations are going to diverge.
We've solved this in a slightly different way. By separating the public API of our value objects from the internal (domain only) API, we are able to evolve one without affecting the other.
For example:
public class Username
{
    private readonly string value;

    // Domain-only (internal) constructor.
    // Does not enforce constraints and can only be called within the domain.
    internal Username(string value)
    {
        this.value = value;
    }

    // Public factory method.
    // Enforces business constraints. Used by consumers of the domain (application layer etc.)
    // to create new instances of the value object.
    public static Username Create(string value)
    {
        // Business constraints. These will evolve and grow over time.
        if (value == null)
        {
            // throw exception etc.
        }

        if (value.Length < 2)
        {
            // throw exception etc.
        }

        return new Username(value);
    }
}
Consumers of the domain must use the static Create method to create a new instance of the value object. This factory method contains all of our business constraints and prevents an instance from being created in an invalid state.
Inside the domain, classes have access to the internal (constraint-less) constructor. Since this does not enforce any business constraints, an instance of the value object can always be created in this way (regardless of its value). By using this constructor when replaying events, we can ensure that replaying historical data will always succeed.
The benefits of this design are:
A single class is used to represent the domain concept (no need for multiple classes, versioning etc.).
Business rules are free to evolve over time.
Historical data always works. A Username from a year ago is still a user name, even if our rules have changed.
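For comparison, a minimal Kotlin sketch of the same split; Kotlin's internal is module-scoped, roughly analogous to the assembly-scoped internal in the C# example, and the names simply mirror that example:

class Username internal constructor(val value: String) { // replay path: no checks
    companion object {
        // Public factory: the only way consumers of the domain create new usernames.
        fun create(value: String): Username {
            require(value.length >= 2) { "Username must be at least 2 characters" }
            return Username(value)
        }
    }
}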
Although this has already been answered, I do find it an interesting situation.
I agree with others that the event data should be record-based and, therefore, nothing more than a data container that may be used to reconstitute the aggregate.
That being said, when the rules change, so does the domain. A major portion of domain-driven design is capturing as much of the domain (rules/structure) as is required. If that is the case, should the changes in the rules not also be kept?
For instance, if we have a Username value object and it starts out with the 2-to-16-character rule, then that is coded as such:
public class Username
{
    public string Value { get; }

    public Username(string value)
    {
        if (value.Length < 2 || value.Length > 16)
        {
            throw new DomainException("Username must be between 2 and 16 characters");
        }

        Value = value;
    }
}
Now we get to 1 March 2018 and the rule changes. We can keep the rule around:
public class Username
{
    public string Value { get; }

    public Username(string value, DateTime registrationDate)
    {
        if (registrationDate < new DateTime(2018, 3, 1) &&
            (value.Length < 2 || value.Length > 16))
        {
            throw new DomainException("Username must be between 2 and 16 characters");
        }

        if (registrationDate >= new DateTime(2018, 3, 1) &&
            (value.Length < 5 || value.Length > 16))
        {
            throw new DomainException("Username must be between 5 and 16 characters");
        }

        Value = value;
    }
}
That is the basic idea: in this way we keep our "old" rules around as well. This may become quite a hassle, but I don't have enough experience to say. Changing our rules retroactively may introduce some pretty tricky situations, so I guess one would need to evaluate this on a case-by-case basis.
Just a thought.

Smalltalk - How can I get an Array (or Collection) of all the instance variables in an object (the current instance) of a class?

Let's say we have a class and we instantiate it, creating an instance of that class. This instance has a number of instance variables, defined by the class, that I need to use. I'd like to get all these instance variables in an Array or some collection so I can iterate through them and set them to some value, not nil.
How can I do this?
I would like to build on @Uko's answer because there is a more direct way to implement his idea.
The message instSize sent to a Class will answer the number of named instance variables of its instances. This, of course, would include instance variables defined in superclasses.
For instance, RemoteTempVectorNode instSize answers with 17 (wow!). Therefore you could do:
fields := (1 to: anObject class instSize) collect: [:i | anObject instVarAt: i]
and then, change them with:
values withIndexDo: [:v :i | anObject instVarAt: i put: v]
where values is the array of new values you want to inject into the object.
So, why am I suggesting this instead of instVarNamed:? Because the latter is indirect. If you take a look at its implementation, you will see that it first has to find the index of the given name by sending instVarIndexFor:ifAbsent: to the object's class. In other words, if you need the ivar names, follow @Uko's suggestion; otherwise don't bring them into the equation, because they will only add CPU cycles to your program.
One more thing. As @Sean DeNegris wisely raised in his comment to your question, it would be beneficial if you elaborated a little more on why you need such an unusual maneuver.
EDIT:
Now that Pharo has Flexible Object Layouts, the mapping between inst var names and the class instSize is no longer valid (in classes that use the new capability). So the simpler approach of using just indexes would not work in general. In fact, under the new "taxonomy" the instSize (number of fields) of an object may differ from its #numberOfInstanceVariables. I guess the added flexibility has its costs and benefits.
You can send #allInstVarNames to a class (a Behavior) to get the names of all instance variables defined by it and by its superclasses. If you want to exclude superclass variables, use #instVarNames.
Let's say that var is your variable that you need to work with. Then you can get the collection of instance variable names and iterate them.
You can use #instVarNamed:put: to set instance variable by name, and #instVarNamed: to get the value by name (in case you need).
I think that something like this may help you:
var class allInstVarNames do: [ :instVarName |
    var instVarNamed: instVarName put: <yourValue> ]
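Not Smalltalk, but for comparison, here is a rough Kotlin/JVM analogue of the same maneuver using reflection (the Base/Thing classes are made up for illustration): enumerate an object's fields, superclasses included, like #allInstVarNames, and set each one to a chosen value.

open class Base { protected var a: String? = null }
class Thing : Base() { private var b: String? = null }

fun main() {
    val obj = Thing()
    var cls: Class<*>? = obj.javaClass
    while (cls != null) {                    // walk up the class hierarchy
        for (field in cls.declaredFields) {
            if (field.type == String::class.java) {
                field.isAccessible = true
                field.set(obj, "some value") // analogous to instVarNamed:put:
                println("${field.name} = ${field.get(obj)}")
            }
        }
        cls = cls.superclass
    }
}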

class design for a dynamic GetHash function

I have the requirement to calculate an internal hash code for instances of several types (some types are derived from each other). Two aspects are dynamic here and can vary independently; only the client that requests the hash knows which hash algorithm is to be used and which properties are to be included:
The actual algorithm used for the hash calculation can change.
Which members of each type are taken into account for the hash calculation can change.
How would you design your types around these requirements?
First, require all "hashable" objects to implement the same interface, with one method: getHash().
Second, introduce an Abstract Factory instantiating hash Strategies.
Example (Java 1.6):
public class Foo implements Hashable {

    private String field;

    @Override
    public String getHash() {
        return HashFactory.INSTANCE.getMD5Strategy().hash(this.field);
    }
}
This gives you proper encapsulation of the decision making inside your objects, and an effective decoupling of strategies from objects.
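Since in this question only the client knows both the algorithm and the property selection, it may also be worth letting the client pass both in. A rough Kotlin sketch (HashStrategy, hashOf and User are all illustrative names, not an existing API):

// A pluggable hash algorithm (the client supplies one).
fun interface HashStrategy {
    fun hash(parts: List<Any?>): Int
}

class User(val id: Int, val email: String, val name: String)

// The client decides which properties participate and which algorithm runs.
fun <T> hashOf(obj: T, strategy: HashStrategy, vararg props: (T) -> Any?): Int =
    strategy.hash(props.map { it(obj) })

fun main() {
    val simple = HashStrategy { parts ->
        parts.fold(17) { acc, p -> 31 * acc + (p?.hashCode() ?: 0) }
    }
    val u = User(1, "joe@example.com", "Joe")
    println(hashOf(u, simple, User::id, User::email)) // hash over id and email only
    println(hashOf(u, simple, User::name))            // hash over name only
}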
Here are some guidelines for GetHashCode() from Eric Lippert.
http://ericlippert.com/2011/02/28/guidelines-and-rules-for-gethashcode/
An excerpt:
Rule: the integer returned by GetHashCode must never change while the object is contained in a data structure that depends on the hash code remaining stable
It is permissible, though dangerous, to make an object whose hash code value can mutate as the fields of the object mutate. If you have such an object and you put it in a hash table then the code which mutates the object and the code which maintains the hash table are required to have some agreed-upon protocol that ensures that the object is not mutated while it is in the hash table. What that protocol looks like is up to you.
In short, if the algorithm for generating the hash code can change, then you need to make sure that it doesn't change while the object is in your collection.