Identical objects in a list produce different hashes and fail comparison tests - vb.net

I have a weird issue. I want to implement an extension to List with a function to merge another list into it excluding the duplicate values:
<Extension()>
Public Sub AddUnique(Of T)(ByVal self As IList(Of T), ByVal items As IEnumerable(Of T))
For Each item In items
If Not self.Contains(item) Then self.Add(item)
Next
End Sub
Now, I have a class that I'll be creating objects from, and adding them to a list:
Class DocInfo
Public Property title As String
Public Property fullPath As String
Sub New(title As String, fullPath As String)
Me.title = title
Me.fullPath = fullPath
End Sub
End Class
Then, I have a list as a global variable:
Public docsInfo As New List(Of DocInfo)
And then I have a button handler that adds new items to that list:
Private Sub AddToList_Button_Click(sender As Object, e As RoutedEventArgs)
Dim candidateItems As New List(Of DocInfo)
For Each doc In selectedDocs
candidateItems.Add(New DocInfo(doc.GetTitle(), doc.GetPathName()))
Next
docsInfo.AddUnique(candidateItems)
End Sub
(The doc and selectedDocs variables are outside of the scope of this question.)
Now, the important bit - GetTitle() and GetPathName() return the same strings on every button click (I have the same docs selected between clicks). Meaning that DocInfo objects that are added to the candidateItems, and then added to docsInfo, are identical. Nevertheless, the extension function AddUnique fails, resulting in duplicates in the list.
Puzzled, I ran GetHashCode() on these duplicate DocInfo objects:
For Each docInfo In docsInfo
Console.WriteLine(docInfo.title)
Console.WriteLine(docInfo.fullPath)
Console.WriteLine(docInfo.GetHashCode())
Next
And this is the output:
Assem1^Test assembly.SLDASM
C:\Users\Justinas\AppData\Local\Temp\swx5396\VC~~\Test assembly\Assem1^Test assembly.SLDASM
7759225
Assem1^Test assembly.SLDASM
C:\Users\Justinas\AppData\Local\Temp\swx5396\VC~~\Test assembly\Assem1^Test assembly.SLDASM
14797678
With each button click, I am getting identical DocInfo objects (the title and fullPath properties have the same values), yet their hashes are different every time, and every comparison I can think of fails to acknowledge that these objects are, for all intents and purposes, identical.
Why is this happening? And how can I fix the AddUnique extension function to work as intended?

This behavior is because of the difference in .NET between "Reference" types and "Value" types. The fundamental philosophy of these is that for "Reference" types, object identity takes precedence over contents (that is, two different object instances with the same contents are still considered distinct), while for "Value" types, the contents are the only thing that matters.
In VB, Class denotes a reference type while Structure denotes a value type. Their respective behaviors are what you would expect, then: by default, Equals on a Class is equivalent to ReferenceEquals, checking to see if the references are the same, and GetHashCode returns a value based on the object identity. Equals on a Structure does member-wise value equality, and GetHashCode returns a value based on the hash codes of the members.
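As a minimal sketch of the default behavior described above, the following console program compares two instances with identical contents, once as a Class and once as a Structure. PointClass and PointStruct are hypothetical types made up for the illustration; they are not part of the question.

```vbnet
Imports System

Module EqualityDemo
    ' Hypothetical reference type: default Equals compares references.
    Class PointClass
        Public Property X As Integer
    End Class

    ' Hypothetical value type: default Equals compares the members.
    Structure PointStruct
        Public X As Integer
    End Structure

    Sub Main()
        Dim c1 As New PointClass With {.X = 1}
        Dim c2 As New PointClass With {.X = 1}
        Console.WriteLine(c1.Equals(c2))   ' False: two distinct instances
        Console.WriteLine(c1.Equals(c1))   ' True: same reference

        Dim s1 As New PointStruct With {.X = 1}
        Dim s2 As New PointStruct With {.X = 1}
        Console.WriteLine(s1.Equals(s2))   ' True: member-wise comparison
    End Sub
End Module
```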
There are a couple of different options for overriding the default behavior, with differing impacts and levels of intrusiveness.
You can change Class to Structure. If you do so, I would strongly recommend eliminating any mutable behavior (i.e. make all fields and properties ReadOnly), because mutable Structures can be extremely hard to reason about correctly. If you really do have immutable data, though, this is the easiest option to maintain: .NET will already do what you want, so you don't have to maintain your own Equals or GetHashCode override.
You can override GetHashCode and Equals on your Class to act like the Structure versions. This won't change anything else about your class, but it will make it act like a value type for the purposes of containers and sequences. If you're worried about maintenance, an alternative would be to do something reflection-based, though this shouldn't be used for anything that will be high-throughput because reflection is generally not particularly performant.
The hashing and ordering containers take optional constructor parameters that let you supply a separate comparer class, which overrides how the contents are compared without altering the Class itself. For hash-based containers such as HashSet, that parameter is an IEqualityComparer(Of T). I'd recommend looking at the MSDN docs for HashSet.
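The second and third options can be sketched roughly as follows. This is an illustration under assumptions, not the only way to do it: the DocInfo class is copied from the question, DocInfoComparer is a hypothetical name, and Tuple.Create is used merely as a convenient way to combine the two hash codes. In practice you would pick one of the two approaches; they are shown together only for comparison.

```vbnet
Imports System
Imports System.Collections.Generic

Class DocInfo
    Public Property title As String
    Public Property fullPath As String

    Sub New(title As String, fullPath As String)
        Me.title = title
        Me.fullPath = fullPath
    End Sub

    ' Option 2: value-style equality on the class itself.
    Public Overrides Function Equals(obj As Object) As Boolean
        Dim other = TryCast(obj, DocInfo)
        If other Is Nothing Then Return False
        Return String.Equals(Me.title, other.title) AndAlso
               String.Equals(Me.fullPath, other.fullPath)
    End Function

    ' Must agree with Equals. Avoid changing title/fullPath while the
    ' object sits in a hash-based container, or lookups will miss it.
    Public Overrides Function GetHashCode() As Integer
        Return Tuple.Create(Me.title, Me.fullPath).GetHashCode()
    End Function
End Class

' Option 3: a separate comparer (hypothetical name) that leaves the class alone.
Class DocInfoComparer
    Implements IEqualityComparer(Of DocInfo)

    Public Overloads Function Equals(x As DocInfo, y As DocInfo) As Boolean _
        Implements IEqualityComparer(Of DocInfo).Equals
        If x Is Nothing OrElse y Is Nothing Then Return x Is y
        Return String.Equals(x.title, y.title) AndAlso
               String.Equals(x.fullPath, y.fullPath)
    End Function

    Public Overloads Function GetHashCode(obj As DocInfo) As Integer _
        Implements IEqualityComparer(Of DocInfo).GetHashCode
        Return Tuple.Create(obj.title, obj.fullPath).GetHashCode()
    End Function
End Class

Module Demo
    Sub Main()
        Dim a As New DocInfo("Assem1", "C:\Temp\Assem1")
        Dim b As New DocInfo("Assem1", "C:\Temp\Assem1")

        ' With the overrides in place, List.Contains sees b as a duplicate of a.
        Dim docs As New List(Of DocInfo) From {a}
        Console.WriteLine(docs.Contains(b))   ' True

        ' With a comparer, the container is told how to compare its items.
        Dim seen As New HashSet(Of DocInfo)(New DocInfoComparer())
        seen.Add(a)
        Console.WriteLine(seen.Add(b))        ' False: treated as a duplicate
    End Sub
End Module
```

With the overrides on DocInfo, the AddUnique extension from the question should work unchanged, because List(Of T).Contains uses the item's own Equals.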

Related

Do I understand not using getters and setters correctly

After reading this piece by Yegor about not using getters and setters, it sounds like something that makes sense to me.
Please note this question is not about whether doing it is better or worse, only whether I am implementing it correctly.
I was wondering in the following two examples in VBA, if I understand the concept correctly, and if I am applying it correctly.
The standard way would be:
Private userName As String
Public Property Get Name() As String
Name = userName
End Property
Public Property Let Name(rData As String)
userName = rData
End Property
It looks to me his way would be something like this:
Private userName As String
Public Function returnName() As String
returnName = userName
End Function
Public Function giveNewName(newName As String) As String
userName = newName
End Function
From what I understand from the two examples above, if I wanted to change the format of userName (let's say return it in all-caps), then I can do this with the second method without changing the name of the method that gives the name through - I can just let returnName point to a userNameCaps property. The rest of my code in my program can still stay the same and point to the method userName.
But if I want to do this with the first example, I can make a new property, but then have to change my code everywhere in the program as well to point to the new property... is that correct?
In other words, in the first example the API gets info from a property, and in the second example the API gets info from a method.
Your 2nd snippet is neither idiomatic nor equivalent. The article you link to is about Java, a language which has no concept whatsoever of object properties - getFoo/setFoo is a mere convention in Java.
In VBA this:
Private userName As String
Public Property Get Name() As String
Name = userName
End Property
Public Property Let Name(rData As String)
userName = rData
End Property
Is ultimately equivalent to this:
Public UserName As String
Not convinced? Add such a public field to a class module, say, Class1. Then add a new class module and add this:
Implements Class1
The compiler will force you to implement a Property Get and a Property Let member, so that the Class1 interface contract can be fulfilled.
So why bother with properties then? Properties are a tool, to help with encapsulation.
Option Explicit
Private Type TSomething
Foo As Long
End Type
Private this As TSomething
Public Property Get Foo() As Long
Foo = this.Foo
End Property
Public Property Let Foo(ByVal value As Long)
If value <= 0 Then Err.Raise 5
this.Foo = value
End Property
Now if you try to assign Foo with a negative value, you'll get a runtime error: the property is encapsulating an internal state that only the class knows and is able to mutate: calling code doesn't see or know about the encapsulated value - all it knows is that Foo is a read/write property. The validation logic in the "setter" ensures the object is in a consistent state at all times.
If you want to break down a property into methods, then you need a Function for the getter, and assignment would be a Sub not a Function. In fact, Rubberduck would tell you that there's a problem with the return value of giveNewName being never assigned: that's a much worse code smell than "OMG you're using properties!".
Functions return a value. Subs/methods do something - in the case of an object/class, that something might imply mutating internal state.
But by avoiding Property Let just because some Java guy said getters & setters are evil, you're just making your VBA API more cluttered than it needs to be - because VBA understands properties, and Java does not. C# and VB.NET do however, so if anything the principles of these languages would be much more readily applicable to VBA than Java's, at least with regards to properties. See Property vs Method.
FWIW public member names in VB would be PascalCase by convention. camelCase public member names are a Java thing. Notice how everything in the standard libraries starts with a Capital first letter?
It seems to me that you've just given the property accessors new names. They are functionally identical.
I think the idea of not using getters/setters implies that you don't try to externally modify an object's state - because if you do, the object is not much more than a user-defined type, a simple collection of data. Objects/Classes should be defined by their behavior. The data they contain should only be there to enable/support that behavior.
That means you don't tell the object how it has to be or what data you want it to hold. You tell it what you want it to do or what is happening to it. The object itself then decides how to modify its state.
To me it seems your example class is a little too simple to work as an example. It's not clear what the intended purpose is: currently you'd probably be better off just using a variable UserName instead.
Have a look at this answer to a related question - I think it provides a good example.
Regarding your edit:
From what I understand from the two examples above is that if I wanted
to change the format of userName (let's say return it in all-caps),
then I can do this with the second method without changing the name of
the method that gives the name through - I can just let returnName
point to a userNameCaps property. The rest of my code in my program
can still stay the same and point to the method userName.
But if I want to do this with the first example, I can make a new
property, but then have to change my code everywhere in the program as
well to point to the new property... is that correct?
Actually, what you're describing here, is possible in both approaches. You can have a property
Public Property Get Name() As String
' possibly more code here...
Name = UCase(UserName)
End Property
or an equivalent function
Public Function Name() As String
' possibly more code here...
Name = UCase(UserName)
End Function
As long as you only change the property/function body, no external code needs to be adapted. Keep the property's/function's signature (the first line, including the Public statement, its name, its type and the order and type of its parameters) unchanged and you should not need to change anything outside the class to accommodate.
The Java article is taking a philosophical design stance that is not limited to Java: the general advice is to severely limit any exposed details of how a class is implemented, to avoid making one's code harder to maintain. Putting such advice into VBA terms isn't irrelevant.
Microsoft popularized the idea of a Property that is in fact a method (or two) which masquerade as a field (i.e. any garden-variety variable). It is a neat-and-tidy way to package up a getter and setter together. Beyond that, really, behind the scenes it's still just a set of functions or subroutines that perform as accessors for your class.
Understand that VBA does not do classes, but it does do interfaces. That's what a "Class Module" is: An interface to an (anonymous) class. When you say Dim o As New MyClassModule, VBA calls some factory function which returns an instance of the class that goes with MyClassModule. From that point, o references the interface (which in turn is wired into the instance). As @Mathieu Guindon has demonstrated, Public UserName As String inside a class module really becomes a Property behind the scenes anyway. Why? Because a Class Module is an interface, and an interface is a set of (pointers to) functions and subroutines.
As for the philosophic design stance, the really big idea here is not to make too many promises. If UserName is a String, it must always remain a String. Furthermore, it must always be available - you cannot remove it from future versions of your class! UserName might not be the best example here (after all, why wouldn't a String cover all needs? for what reason might UserName become superfluous?). But it does happen that what seemed like a good idea at the time the class was being made turns into a big goof. Imagine a Public TwiddlePuff As Integer (or instead getTwiddlePuff() As Integer and setTwiddlePuff(value As Integer)) only to find out (much later on!) that Integer isn't sufficient anymore, maybe it should have been Long. Or maybe a Double. If you try to change TwiddlePuff now, anything compiled back when it was Integer will likely break. So maybe people making new code will be fine, and maybe it's mostly the folks who still need to use some of the old code who are now stuck with a problem.
And what if TwiddlePuff turned out to be a really big design mistake, that it should not have been there in the first place? Well, removing it brings its own set of headaches. If TwiddlePuff was used at all elsewhere, that means some folks may have a big refactoring job on their hands. And that might not be the worst of it - if your code compiles to native binaries especially, that makes for a really big mess, since an interface is about a set of function pointers laid out and ordered in a very specific way.
To reiterate, do not make too many promises. Think through what you will share with others. Properties-getters-setters-accessors are okay, but must be used thoughtfully and sparingly. All of the above is important if what you are making is code that you are going to share with others, and others will take it and use it as part of a larger system of code, and it may be that these others intend to share their larger systems of code with yet even more people who will use that in their even larger systems of code.
That right there is probably why hiding implementation details to the greatest extent possible is regarded as fundamental to object oriented programming.

Understanding Array.ConvertAll, can I DirectCast?

I have a base class, DtaRow, that has an internal array of Strings containing data. I have dozens of subclasses of DtaRow, like UnitRow and AccountRow, whose only purpose is to provide Properties to retrieve the values, so you can do aUnit.Name instead of aUnit.pFields(3).
I also have a DtaTable object that contains a Friend pRows As New Dictionary(Of Integer, DtaRow). I don't generally insert DtaRows into the DtaTable, I insert the subclasses like UnitRows and AccountRows. Any given table has only one type in it.
Over in the main part of the app I have an accessor:
Public Readonly Property Units() As IEnumerable
Get
Return Tables(5).pRows.Values 'oh oh oh oh table 5, table 5...
End Get
End Property
This, obviously, returns a list of DtaRows, not UnitRows, which means I can't do MyDB.Units(5).Name, which is the ultimate goal.
The obvious solution is to Dim ret As New UnitRow() and DirectCast everything into it, but then I'm building thousands of new arrays all the time. Uggg. Alternately I could put DirectCast everywhere I pull out a value, also uggg.
I see there is a method called Array.ConvertAll that looks like it might be what I want. But maybe that just does the loop for me and doesn't really save anything? And if this is what I want, I don't really understand how to use DirectCast in it.
Hopefully I'm just missing some other bit of API that does what I want, but failing that, what's the best solution here? I suspect I need...
to make a widening conversion in each DtaRow subclass?
or something in DtaTable that does the same?
You can use ConvertAll to convert an array into a different type.
Dim arr(2) As A
Dim arr2() As B
arr(0) = New B
arr(1) = New B
arr(2) = New B
arr2 = Array.ConvertAll(arr, Function(o) DirectCast(o, B))
Class A
End Class
Class B
Inherits A
End Class
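In your case, I think it would look like this (Values is a collection rather than an array, so it has to be copied into an array first; ToArray needs Imports System.Linq)
Return Array.ConvertAll(Tables(5).pRows.Values.ToArray(), Function(o) DirectCast(o, UnitRow))
Note that this will create new arrays each time.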
You can cast the objects into a list(Of String) based on the field you want.
Return Tables(5).pRows.Values.Cast(Of DtaRow).Select(Function(r) r.name).ToList
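If the goal is a typed sequence without building a new array on every call, one possible sketch is to expose the rows through LINQ's Cast(Of T), which casts each element lazily as it is enumerated. The class shapes below are simplified stand-ins for the ones described in the question, and the Database class with its unitsTable field is a hypothetical name used only for this illustration.

```vbnet
Imports System.Collections.Generic
Imports System.Linq

' Simplified stand-ins for the classes described in the question.
Class DtaRow
    Public pFields As String()
End Class

Class UnitRow
    Inherits DtaRow

    Public ReadOnly Property Name As String
        Get
            Return pFields(3)
        End Get
    End Property
End Class

Class DtaTable
    Friend pRows As New Dictionary(Of Integer, DtaRow)
End Class

' Hypothetical container class, for illustration only.
Class Database
    Private ReadOnly unitsTable As New DtaTable()

    ' Cast(Of UnitRow) does not copy anything up front; each row is cast
    ' when the caller enumerates it. A row that is not a UnitRow raises
    ' an InvalidCastException at that point.
    Public ReadOnly Property Units As IEnumerable(Of UnitRow)
        Get
            Return unitsTable.pRows.Values.Cast(Of UnitRow)()
        End Get
    End Property
End Class
```

A caller could then write For Each u In db.Units and use u.Name directly, without a DirectCast at each call site. This covers enumeration only; lookup by key would still need its own accessor.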
YES! I went non-linear. This only works because of OOP...
My ultimate goal was to return objects from the collection as a particular type, because I knew I put that type in there in the first place. Sure, I could get the value out of the collection and CType it, but that's fugly - although in C# I would have been perfectly happy because the syntax is nicer.
So wait... the method that retrieves the row from the collection is in the collection class, not the various subclasses of DtaRow. So here is what I did...
Public ReadOnly Property Units() As IEnumerable
Get
Return Tables(dbTblUnits).pRows.Values
End Get
End Property
Public ReadOnly Property Units(ByVal K as Integer) As UnitRow
Get
Return DirectCast(Tables(dbTblUnits)(K), UnitRow)
End Get
End Property
Public ReadOnly Property Units(ByVal K as String) As UnitRow
Get
Return DirectCast(Tables(dbTblUnits).Rows(K), UnitRow)
End Get
End Property
Why does this solve the problem? Well normally if one does...
Dim U as UnitRow = MyDB.Units(K)
It would call the first method (which is all I had originally) which would return the .Values from the Dictionary, and then the Default Property would be called to return .Item(K). But because of the way the method dispatcher works, if I provide a more specific version that more closely matches the parameters, it will call that. So I provide overloads that are peers to the subclasses that do the cast.
Now this isn't perfect, because if I just call Units to get the entire list, when I pull rows out of it I'll still have to cast them. But people expect that, so this is perfectly acceptable in this case. Better yet, when I open this DLL in VBA, only the first of these methods is visible, which returns the entire collection, which means that Units(k) will call the Default Property on the DtaTable, returning a DtaRow, but that's fine in VBA.
OOP to the rescue!

Get item from list as a new copy (not as a reference to the original)

How can I get an item from a list as a new copy/instance, so I can use and change it later without changing the original object in the list?
Public Class entry
Public val As String
' ... other fields ...
End Class
Dim MyList As New List(Of entry)
Dim newitem As New entry
newitem.val = "first"
MyList.Add(newitem)
Now if I try to get an item from this list and change it to something else, it changes the original item in the list as well (it is used as a reference not as a new instance).
Dim newitem2 As New entry
newitem2 = MyList.Item(0)
newitem2.val = "something else"
So now the MyList.item(0).val contains "something else", yet I wanted only the newitem2 to contain that new value for the given field and retain other values from the object in the list.
Is there a way to do this without reassigning all fields one by one?
If entry is defined as a reference type (Class), then your only option is to explicitly create a new instance that has the same values as the originals. For example:
Public Partial Class Entry
Public Function Clone() As Entry
Return New Entry() With { .val = Me.val, … }
End Function
End Class
(The .NET Framework Class Library defined a type ICloneable from early on for exactly this purpose. The type never really caught on for certain reasons.)
Be aware that you might have to do this recursively, that is, if your class contains fields that are of a reference type, you'll have to clone the objects stored in these fields as well.
Then, instead of doing this:
Dim newitem2 As New entry ' (Btw.: `New` is superfluous here, since you are
newitem2 = MyList.Item(0) ' going to throw away the created instance here.)
Do this:
Dim newitem2 As Entry = MyList.Item(0).Clone()
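The recursive cloning mentioned above might look something like the sketch below. It assumes, purely for illustration, that Entry also holds a list of a second reference type; the Tag class and the Tags field are hypothetical and do not appear in the question. Object.MemberwiseClone is used to copy all fields in one step (a shallow copy), after which only the reference-type fields need to be replaced with copies of their own.

```vbnet
Imports System.Collections.Generic

' Hypothetical nested reference type, for illustration only.
Public Class Tag
    Public Name As String

    Public Function Clone() As Tag
        ' Tag has only a String field, so a shallow copy is enough.
        Return DirectCast(Me.MemberwiseClone(), Tag)
    End Function
End Class

Public Class Entry
    Public val As String
    Public Tags As New List(Of Tag)   ' hypothetical field

    Public Function Clone() As Entry
        ' Shallow copy: every field is copied, but Tags still points
        ' at the same list as the original.
        Dim copy = DirectCast(Me.MemberwiseClone(), Entry)

        ' Deep copy of the reference-type field: a new list holding
        ' clones of the individual items.
        copy.Tags = New List(Of Tag)
        For Each t As Tag In Me.Tags
            copy.Tags.Add(t.Clone())
        Next
        Return copy
    End Function
End Class
```

Strings are immutable in .NET, so sharing a String between the original and the copy is harmless; only mutable reference types need the extra step.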
One alternative is to use value types (i.e. declare your item type as Structure). Value types are automatically copied when passed around. However, there are lots of caveats to observe, among them:
Do not do this if your type contains many fields. (Why? Many fields usually means that the type will occupy more bytes in memory, which makes frequent copying quite expensive if the objects get too large.)
Value types should be immutable types. (Why? See e.g. Why are mutable structs evil?)
These are just two guidelines. You can find more information about this topic here:
When should I use a struct instead of a class?
Choosing Between Class and Struct
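As a small sketch of the value-type alternative, the program below shows that reading an item out of a list of Structures yields an independent copy. EntryValue is a hypothetical name; it is left mutable here only to make the copy visible, which goes against the immutability guideline above.

```vbnet
Imports System
Imports System.Collections.Generic

Module StructCopyDemo
    ' Hypothetical value-type version of the entry from the question.
    Structure EntryValue
        Public val As String
    End Structure

    Sub Main()
        Dim myList As New List(Of EntryValue)
        myList.Add(New EntryValue With {.val = "first"})

        ' Assigning a Structure copies its contents.
        Dim copy As EntryValue = myList(0)
        copy.val = "something else"

        Console.WriteLine(myList(0).val)   ' first
        Console.WriteLine(copy.val)        ' something else
    End Sub
End Module
```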

Are there properties that differ in parameters and return type?

I have a class called CalcArray that has an array of doubles called Amounts(), and two ints, StartPeriod and EndPeriod.
The user almost always wants to interact with the items in the array, not the Periods or the object itself. So ideally, I'd like:
property AnAmount() as CalcArray 'So the user can talk to the object if they need to
property AnAmount(i as Integer) as Double 'So the user can just get the value directly
This seems to work sometimes and not others. Is this simply a syntax issue? or is such an overload not possible?
You can do this with a function returning a different type based on how it is called. Especially since you have a param, a function might be more appropriate:
Public Function AnAmount(Of T)(parm As SomeType) As T
to use it:
Dim n as Decimal
n = AnAmount(Of Decimal)(foo)
It's very useful as a way to avoid returning an object and then having to use CType to convert the return value. In this case, an amount implies a value type, but the function would accept Point, Rectangle etc. as T, so you might need to check for valid type requests.
You may be bumping into the limitation that a function or property cannot vary by only the return type. In general, if the parameter list of an overload changes, its return type can change as well. Also watch out for the rule that a default property requires at least one argument. In some cases class inheritance is the issue: properties and functions that shadow base members may have to be marked explicitly with Shadows, Overloads, Overrides etc., or the language will disallow the shadowing.
If these don't cover the cases you've seen, try to catch an example of the problem and study all locations of the same named property in your solution, reporting the results here.
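For what it is worth, here is a minimal sketch of two properties that share a name but differ in parameter list and return type. The CalcArray fields follow the question's description; the Holder class and its calc field are hypothetical names for whatever class owns the property. It only shows that the overload pair itself is legal when the parameter lists differ; it does not try to reproduce the intermittent failure described in the question.

```vbnet
Imports System

Public Class CalcArray
    Public StartPeriod As Integer
    Public EndPeriod As Integer
    Public Amounts As Double() = New Double(11) {}
End Class

' Hypothetical owner of the property, for illustration only.
Public Class Holder
    Private ReadOnly calc As New CalcArray()

    ' No parameters: hand back the whole object.
    Public ReadOnly Property AnAmount() As CalcArray
        Get
            Return calc
        End Get
    End Property

    ' One Integer parameter: hand back a single value.
    Public ReadOnly Property AnAmount(i As Integer) As Double
        Get
            Return calc.Amounts(i)
        End Get
    End Property
End Class

Module OverloadDemo
    Sub Main()
        Dim h As New Holder()
        h.AnAmount.Amounts(3) = 42.5

        Dim whole As CalcArray = h.AnAmount   ' parameterless overload
        Dim one As Double = h.AnAmount(3)     ' Integer overload
        Console.WriteLine(one)
    End Sub
End Module
```

Note that if CalcArray itself had a Default property taking an Integer, an expression like h.AnAmount(3) could be read two ways by a human, which may be one source of the "works sometimes" confusion; that is a guess rather than something the question confirms.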

VB Classes Best Practice - give all properties values?

Sorry if this is a bit random, but is it good practice to give all fields of a class a value when the class is instantiated? I'm just wondering if it's better practice to have a constructor that takes no parameters and gives all the fields default values, or whether fields that have values should be assigned and others left alone until required?
I hope that makes sense,
Becky
I don't know if it makes a performance difference or not, but any fields for which you have explicit default values I personally prefer to assign them in the declarations, as so:
Public Class [MyClass] ' MyClass is a reserved keyword in VB.NET, so it has to be escaped with brackets
Private pIsDirty As Boolean = False
Private pDated As Date = Now()
End Class
Keep in mind most "simple" types like Boolean, Integer, etc. auto-default and don't NEED to be initialized, but I show that here as an example, and sometimes you want it anyway for clarity. Additionally, since any classes I write are all for internal use (we don't sell any code objects for public use), I know who the consumers of my classes are. So I generally just write a minimal constructor (if a non-default one is needed) that only takes the primary fields, and spin up any additional values with the new With syntax in VB, like so:
Dim myObj = New SomeClass() With { .Prop1 = "value", .Prop2 = Now() }
Your class' constructor should accept enough parameters to put the object in a usable state.
You can get the same functionality you seem to be looking for by using Optional Parameters in your constructor.
That way you can set by name just the properties that you have to, and leave the rest with default values until you need to change them.
Sub Notify(ByVal Company As String, Optional ByVal Office As String = "QJZ")
If Office = "QJZ" Then
Debug.WriteLine("Office not supplied -- notifying Headquarters")
Office = "Headquarters"
End If
' Code to notify headquarters or specified office.
End Sub
Remember that optional parameters must be after all non optional parameters.
Ideal practice is to have an object in a usable state as soon as the constructor returns. This reduces errors whereby a partially 'ready' object is inadvertently used.
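Combining the two suggestions above, a constructor with an Optional parameter can leave the object fully usable as soon as it returns. The sketch below is only an illustration: the Notification class, its fields, and the "Headquarters" default are hypothetical, loosely modeled on the Notify example.

```vbnet
Imports System

' Hypothetical class, for illustration only.
Public Class Notification
    Public ReadOnly Company As String
    Public ReadOnly Office As String

    ' Required values come first; optional ones follow with defaults.
    Public Sub New(company As String, Optional office As String = "Headquarters")
        If String.IsNullOrEmpty(company) Then
            Throw New ArgumentException("A company name is required.", "company")
        End If
        Me.Company = company
        Me.Office = office
    End Sub
End Class

Module ConstructorDemo
    Sub Main()
        Dim a As New Notification("Contoso")
        Dim b As New Notification("Contoso", office:="Lisbon")

        Console.WriteLine(a.Office)   ' Headquarters
        Console.WriteLine(b.Office)   ' Lisbon
    End Sub
End Module
```

Because every field is assigned in the constructor and validated up front, there is no window in which a partially initialized Notification can be used by mistake.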
It depends. How do you intend to use the class? What is the purpose of the class? (i.e. is it an entity class for database modeling or some other class)
I always make my entity classes with nullable properties that are all null when the class is instanced with a constructor that takes no parameter. Then when I call .Load I know all properties reflect the database.
Adding a constructor that calls .Load and assigns all of the properties known values from the database would be a feasible route also.
If you are not referring to an entity modeling class then it really depends on your usage of the class.
My personal preference is to assign all properties a known value (from constructor parameters) and therefore the class is in a known - neutral state.