Removing duplicates from 2 DataTables - vb.net

I have 2 DataTables, each with a column X. I want to delete the entire row if its column X value appears in both DataTables. What is the best way to do this? This is what I tried, but it doesn't work, and I'm not sure it is the best way anyway.
Private Function SplitData(ByVal dtSome As DataTable, ByVal dtAll As DataTable) As DataTable
' This Routine Creates the Plant DataDictionary
Dim SomelIndex As Integer = 0
Do While SomelIndex < dtSome.Rows.Count
Dim AlllIndex As Integer = 0
Do While AlllIndex < dtAll.Rows.Count
If dtAll.Rows(AlllIndex).Item("x").ToString = dtSome.Rows(SomelIndex).Item("x").ToString Then
' I have both of the removes below because it doesn't appear to actually remove the rows, even if it gets here
' dtAll.Rows.RemoveAt(AlllIndex)
dtAll.Rows.Remove(dtAll.Rows(AlllIndex))
Exit Do
Else
AlllIndex += 1
End If
Loop
SomelIndex += 1
Loop
Return dtAll
End Function

You could use LINQ to find the common rows and then remove them.
Private Function SplitData(ByVal dtSome As DataTable, ByVal dtAll As DataTable) As DataTable
' This Routine Creates the Plant DataDictionary
Dim common = (
From r1 In dtAll.AsEnumerable()
Join r2 In dtSome.AsEnumerable() On r1("x") Equals r2("x")
Select r1
).ToList()
For Each r In common
dtAll.Rows.Remove(r)
Next
Return dtAll
End Function
I don't know if this is the "best" way, but to me it makes it easier to see what is going on than the nested loops.
Note that DataTable.AsEnumerable requires a reference to System.Data.DataSetExtensions.dll.
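If the join feels heavy, roughly the same effect can be had without LINQ by collecting the key values of dtSome in a HashSet first. This is only a minimal sketch, assuming column "x" is compared as a string as in the question; the names DuplicateRemoval and RemoveSharedRows are made up for illustration.

```vbnet
Imports System.Collections.Generic
Imports System.Data

Module DuplicateRemoval
    ' Hypothetical helper: removes from dtAll every row whose "x" value also occurs in dtSome.
    Function RemoveSharedRows(dtSome As DataTable, dtAll As DataTable) As DataTable
        ' Collect the key values of dtSome once so each lookup is cheap.
        Dim keys As New HashSet(Of String)
        For Each row As DataRow In dtSome.Rows
            keys.Add(row("x").ToString())
        Next

        ' Walk backwards so removing a row does not shift the rows still to be visited.
        For i As Integer = dtAll.Rows.Count - 1 To 0 Step -1
            If keys.Contains(dtAll.Rows(i)("x").ToString()) Then
                dtAll.Rows.RemoveAt(i)
            End If
        Next
        Return dtAll
    End Function
End Module
```

Unlike the nested loops in the question, which stop at the first match for each dtSome row, this removes every matching row in dtAll.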

Just before you return dtAll, add this line:
dtAll.AcceptChanges()
It will commit the changes you made since "the last time AcceptChanges was called"...or in your particular case, since you invoked the function.
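For context, as far as I know Rows.Remove takes the row out of the collection right away, while DataRow.Delete only marks an existing row as deleted until AcceptChanges is called. So AcceptChanges should matter mostly in the Delete case. The sketch below illustrates the difference on a throwaway table; the module and function names are hypothetical.

```vbnet
Imports System.Data

Module RemoveVersusDelete
    ' Builds a two-row table whose rows are already committed (RowState = Unchanged).
    Function BuildTable() As DataTable
        Dim t As New DataTable()
        t.Columns.Add("x", GetType(String))
        t.Rows.Add("a")
        t.Rows.Add("b")
        t.AcceptChanges()
        Return t
    End Function

    ' Remove: the row leaves the collection immediately.
    Function CountAfterRemove() As Integer
        Dim t = BuildTable()
        t.Rows.Remove(t.Rows(0))
        Return t.Rows.Count
    End Function

    ' Delete: the row is only marked as Deleted until AcceptChanges is called.
    Function CountAfterDelete(commit As Boolean) As Integer
        Dim t = BuildTable()
        t.Rows(0).Delete()
        If commit Then t.AcceptChanges()
        Return t.Rows.Count
    End Function
End Module
```

With this setup, CountAfterRemove() and CountAfterDelete(True) should both report one row, while CountAfterDelete(False) still counts the row that is marked as deleted.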

Not sure what my problem was, but this is a slightly neater solution. Instead of using two Do While loops, I am using a For Each and a Do While loop. Probably not the most efficient way, but the code below definitely works and deletes the data.
Public Shared Function SplitDataTables(ByVal dtSome As DataTable, ByVal dtAll As DataTable) As DataTable
For Each drSome As DataRow In dtSome.Rows
Dim intIndex As Integer = 0
Do While intIndex < dtAll.Rows.Count
If drSome.Item("X").ToString = dtAll.Rows(intIndex).Item("X").ToString Then
dtAll.Rows.Remove(dtAll.Rows(intIndex))
Exit Do
Else
intIndex += 1
End If
Loop
Next
Return dtAll
End Function

Related

Compare two datatables, if anything is different show MessageBox

I have two DataTables; one of them is populated when the application starts and the other one is populated on a button click. How can I check (in the fastest way) whether anything changed in the second DataTable?
I have tried this but it does not work:
For Each row1 As DataRow In dtt.Rows
For Each row2 As DataRow In dtt1.Rows
Dim array1 = row1.ItemArray
Dim array2 = row2.ItemArray
If array1.SequenceEqual(array2) Then
Else
End If
Next
Next
The problem is that your loops are nested. This means that the inner For Each loops through every row of dtt1 for each single row of dtt. This is not what you want. You want to loop over the two tables in parallel. You can do so by using the enumerators that the For Each statements use internally:
Dim tablesAreDifferent As Boolean = False
If dtt.Rows.Count = dtt1.Rows.Count Then
Dim enumerator1 = dtt.Rows.GetEnumerator()
Dim enumerator2 = dtt1.Rows.GetEnumerator()
Do While enumerator1.MoveNext() AndAlso enumerator2.MoveNext()
Dim array1 = DirectCast(enumerator1.Current, DataRow).ItemArray
Dim array2 = DirectCast(enumerator2.Current, DataRow).ItemArray
If Not array1.SequenceEqual(array2) Then
tablesAreDifferent = True
Exit Do
End If
Loop
Else
tablesAreDifferent = True
End If
If tablesAreDifferent Then
'Display message
Else
'...
End If
The enumerators work like this: They have an internal cursor that is initially placed before the first row. Before accessing a row through the Current property, you must move to it with the MoveNext function. This function returns the Boolean True if it succeeds, i.e. as long as there are rows available.
Since now we have a single loop statement and advance the cursors of enumerator1 and enumerator2 at each loop, we can compare corresponding rows.
Note that the Rows collection implements IEnumerable and thus the enumerators returned by GetEnumerator are not strongly typed. I.e. Current is typed as Object. If instead you write
Dim enumerator1 = dtt.Rows.Cast(Of DataRow).GetEnumerator()
Dim enumerator2 = dtt1.Rows.Cast(Of DataRow).GetEnumerator()
Then you get enumerators of type IEnumerator(Of DataRow) returning strongly typed DataRows.
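The same parallel walk can also be sketched with a plain index loop, since Rows(i) already returns a strongly typed DataRow. This is a minimal sketch, assuming both tables have the same column layout and row order; TablesDiffer is a hypothetical name.

```vbnet
Imports System.Data
Imports System.Linq

Module TableCompare
    ' Hypothetical helper: True as soon as the tables differ in row count or in any row's values.
    Function TablesDiffer(first As DataTable, second As DataTable) As Boolean
        If first.Rows.Count <> second.Rows.Count Then Return True
        For i As Integer = 0 To first.Rows.Count - 1
            ' ItemArray is an Object(); SequenceEqual compares the values pairwise with Equals.
            If Not first.Rows(i).ItemArray.SequenceEqual(second.Rows(i).ItemArray) Then
                Return True
            End If
        Next
        Return False
    End Function
End Module
```

Because the values are compared as boxed objects, an Integer 1 and a Long 1 count as different, so this only works as expected when the column types match.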

Simpler way to check if a parameter is an Array of a specific type?

Currently I have the method below in a class. When it is called, the class only stores the data if it is an array of length 3 with specific constraints on the first two elements.
However, the following code seems clunky and inefficient to me, especially if I were dealing with larger arrays. Without using Try blocks I haven't been able to find a better way to do this, and I would like to ask here whether one exists.
Overrides Sub output(ByVal data As Object)
Dim array() As Object = TryCast(data, Object())
If Not array Is Nothing AndAlso array.Length = 3 Then
For Each element In array
If Not TypeOf (element) Is Integer Then Return
Next
If Not (array(0) = -1 OrElse array(1) = -1) Then
memory.Add(array)
End If
End If
End Sub
First off - I would suggest that instead of using Return, use Exit Sub as that is more obvious and readable.
Based on your code, I'm assuming that the parameter passed to the sub could be something other than an array and that, if it is an array, it could contain mixed objects rather than only integers or singles. If all the elements in the array are always going to be the same type, then rather than checking every element you can just check that the first element is an Integer. (This isn't the same as checking whether the value itself is integral, of course: you can still have a Single with a value of 1.)
For example, replace the above loop with simply:
If Not TypeOf (array(0)) Is Integer Then Exit Sub
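If the array IS of mixed objects, you could try to speed things up by running that check in a Parallel.ForEach loop instead. Note that Exit Sub inside the lambda only leaves the lambda, so a flag is needed to exit the outer Sub, like this:
Dim allIntegers As Boolean = True
Parallel.ForEach(array,
Sub(element, loopState)
If Not TypeOf (element) Is Integer Then
allIntegers = False
loopState.Stop()
End If
End Sub)
If Not allIntegers Then Exit Sub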
But - the processing overheads of multi-threading the tiny amount of code in the loop will likely cause a performance decrease. If the code in your actual loop is longer, you may get a benefit.
Yet another way is to use the Array.TrueForAll function. Replace your loop with
If Not System.Array.TrueForAll(array, AddressOf IsIntegerType) Then
Exit Sub
End If
and add a function that returns True if the object is an Integer:
Private Function IsIntegerType(value As Object) As Boolean
If TypeOf (value) Is Integer Then
Return True
Else
Return False
End If
End Function
You would have to benchmark these to figure out which is quickest in your own code of course. And check memory usage if that could potentially be a problem.
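If a separate named function feels like too much, the TrueForAll check can probably be written with an inline lambda instead. A minimal sketch, assuming the same rules as the question (an Object array of length 3 holding only Integers); IsIntegerTriple is a hypothetical name.

```vbnet
Imports System

Module ArrayChecks
    ' Hypothetical helper: True when data is an Object() of exactly 3 boxed Integers.
    Function IsIntegerTriple(data As Object) As Boolean
        ' TryCast yields Nothing for anything that is not an Object() array.
        Dim items = TryCast(data, Object())
        If items Is Nothing OrElse items.Length <> 3 Then Return False
        Return Array.TrueForAll(items, Function(v) TypeOf v Is Integer)
    End Function
End Module
```

One thing to keep in mind: an Integer() array is not an Object() array, so TryCast returns Nothing for it and this check rejects it, just as the code in the question does.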
Maybe try this for the body of the output Sub. I don't know if it is an improvement.
If data Is Nothing Then Exit Sub
Dim t As Type = data.GetType
Dim a() As Object
If t.IsArray Then
a = TryCast(data, Object()) ' DirectCast would throw for value-type arrays such as Integer()
If Not (a IsNot Nothing AndAlso a.Length = 3 AndAlso TypeOf a(0) Is Integer) Then
Exit Sub
End If
'other code
If Not (CInt(a(0)) = -1 OrElse CInt(a(1)) = -1) Then
memory.Add(a)
End If
End If

Generate a property name in a loop

I'm trying to find a way of loading a single record with 25 columns into a datatable.
I could list all 25 variables called SPOT1 to SPOT25 (columns in the datatable) but I'm looking for more concise method like using a loop or dictionary.
The code below shows two methods, a 'long' method which is cumbersome, and a 'concise' method which I'm trying to get help with.
Public dctMC As Dictionary(Of String, VariantType)
Dim newMC As New MONTE_CARLO()
'long method: this will work but is cumbersome
newMC.SPOT1=999
newMC.SPOT2=887
...
newMC.SPOT25=5
'concise method: can it be done more concisely, like in a loop for example?
Dim k As String
For x = 1 To 25
k = "SPOT" & CStr(x)
newMC.K = dctMC(k) 'convert newMC.k to newMC.SPOT1 etc
Next
'load record
DATA.MONTE_CARLOs.InsertOnSubmit(newMC)
Per the others, I think there are better solutions, but it is possible...
Public Class MONTE_CARLO
Private mintSpot(-1) As Integer
Property Spot(index As Integer) As Integer
Get
If index > mintSpot.GetUpperBound(0) Then
ReDim Preserve mintSpot(index)
End If
Return mintSpot(index)
End Get
Set(value As Integer)
If index > mintSpot.GetUpperBound(0) Then
ReDim Preserve mintSpot(index)
End If
mintSpot(index) = value
End Set
End Property
End Class
Usage...
Dim newMC As New MONTE_CARLO
For i As Integer = 0 To 100
newMC.Spot(i) = i
Next i
MsgBox(newMC.Spot(20))
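If the class has to keep its separate SPOT1 to SPOT25 properties (for example because it is generated from the database), the loop in the question can be approximated with reflection. This is only a sketch under that assumption: MonteCarloRow is a hypothetical stand-in with three properties, the dictionary holds Integers, and LoadSpots is a made-up helper name.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Reflection

' Hypothetical stand-in for the generated MONTE_CARLO class, reduced to three columns.
Public Class MonteCarloRow
    Public Property SPOT1 As Integer
    Public Property SPOT2 As Integer
    Public Property SPOT3 As Integer
End Class

Module PropertyLoader
    ' Copies values("SPOTn") into the property named SPOTn, for n = 1..count.
    Sub LoadSpots(target As Object, values As Dictionary(Of String, Integer), count As Integer)
        For i As Integer = 1 To count
            Dim propName As String = "SPOT" & CStr(i)
            Dim prop As PropertyInfo = target.GetType().GetProperty(propName)
            ' Skip names that have no matching property or no value in the dictionary.
            If prop IsNot Nothing AndAlso values.ContainsKey(propName) Then
                prop.SetValue(target, values(propName), Nothing)
            End If
        Next
    End Sub
End Module
```

Reflection is slower than direct assignment, but for a single record with 25 columns that is unlikely to be noticeable.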

How to merge two list to have a distinct list without duplicate values in vb.net

I have this problem in vb.net. Let's say I have 2 lists, ListA and ListB, both holding objects of the same type.
E.g., one of the properties of the object is ID (the ID is written in brackets).
ListA      ListB
---------------------------
A(3818)    A(3818)
B(3819)    B(3819)
C(3820)    C(3820)
D(3821)    D(3821)
E(3823)    F(0)
H(3824)    G(0)
I(3825)
How do I merge these two lists into a new distinct list in which objects whose IDs match appear only once and all other objects (whose IDs don't match) are simply added?
The sample output would be:
New List
--------
A(3818)
B(3819)
C(3820)
D(3821)
E(3823)
F(0)
G(0)
H(3824)
I(3825)
When I searched, I found that AddRange() and Union are some of the methods that do the merge. But I am not able to find out whether they work for non-standard objects (anything other than Integer or String).
Use AddRange() and then LINQ's Distinct() to filter out the duplicates.
Dim b = YourCollection.Distinct().ToList()
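One caveat: for a class that does not override Equals and GetHashCode, Distinct() compares references, so two different objects with the same ID both survive. A minimal sketch of a merge keyed on ID follows. The Thing class and MergeById are hypothetical, and treating ID 0 as "no ID yet, always keep" is an assumption read off the sample output, where F(0) and G(0) both appear.

```vbnet
Imports System.Collections.Generic

' Hypothetical item type; the question only says the objects have an ID property.
Public Class Thing
    Public Property Name As String
    Public Property ID As Integer
End Class

Module ListMerge
    Function MergeById(listA As List(Of Thing), listB As List(Of Thing)) As List(Of Thing)
        Dim seen As New HashSet(Of Integer)
        Dim merged As New List(Of Thing)
        For Each source As List(Of Thing) In New List(Of Thing)() {listA, listB}
            For Each item As Thing In source
                ' Assumption: ID 0 means "no ID yet", so such items are never treated as duplicates.
                ' HashSet.Add returns False when the ID has already been seen.
                If item.ID = 0 OrElse seen.Add(item.ID) Then
                    merged.Add(item)
                End If
            Next
        Next
        Return merged
    End Function
End Module
```

The result keeps ListA's items first and then the items that only ListB contributes, so it would still need sorting to match the order shown in the sample output.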
Could use a collection bucket
Dim oCol As New Collection
AddTitems(oCol, oListA)
AddTitems(oCol, olistB)
Public Function AddTitems(oSummaryList As Collection, oList As List(Of thing)) As Collection
For Each oThing As thing In oList
If Not oSummaryList.Contains(CStr(oThing.ID)) Then oSummaryList.Add(oThing, CStr(oThing.ID))
Next
Return oSummaryList
End Function
Here are a couple simple functions that should do that for you. I'm not sure how efficient they are though. I don't think there is anything built in.
Private Function nameOfFunction(list1 as list(of type), list2 as list(of type)) as list(of type)
Dim result as new list(of type)
for a as integer = 0 to math.max(list1.count, list2.count) - 1 step 1
If a < list1.count AndAlso resultHasID(result, list1(a).ID) = False Then
result.add(list1(a))
end if
If a < list2.count AndAlso resultHasID(result, list2(a).ID) = False Then
result.add(list2(a))
end if
next
Return result
End Function
Private Function resultHasID(testList as list(of type), s as string) as boolean
Dim result as Boolean = False
for a as integer = 0 to testlist.count - 1 step 1
if(testlist(a).ID = s) then
result = true
exit for
End if
Next
Return result
End function
For each item as String in ListA
If Not ListB.Contains(item) Then
ListB.Add(item)
End If
Next

vb.net function branching based on optional parameters performance

So I was coding a string search function and ended up with 4 since they needed to go forwards or backwards or be inclusive or exclusive. Then I needed even more functionality like ignoring certain specific things and blah blah.. I figured it would be easier to make a slightly bigger function with optional boolean parameters than to maintain the 8+ functions that would otherwise be required.
Since this is the main workhorse function though, performance is important so I devised a simple test to get a sense of how much I would lose from doing this. The code is as follows:
main window:
Private Sub testbutton_Click(sender As Object, e As RoutedEventArgs) Handles testbutton.Click
Dim rand As New Random
Dim ret As Integer
Dim count As Integer = 100000000
Dim t As Integer = Environment.TickCount
For i = 0 To count
ret = superfunction(rand.Next, False)
Next
t = Environment.TickCount - t
Dim t2 As Integer = Environment.TickCount
For i = 0 To count
ret = simplefunctionNeg(rand.Next)
Next
t2 = Environment.TickCount - t2
MsgBox(t & " " & t2)
End Sub
The functions:
Public Module testoptionality
Public Function superfunction(a As Integer, Optional b As Boolean = False) As Integer
If b Then
Return a
Else
Return -a
End If
End Function
Public Function simpleFunctionPos(a As Integer)
Return a
End Function
Public Function simplefunctionNeg(a As Integer)
Return -a
End Function
End Module
So pretty much as simple as it gets. The weird part is that superfunction is consistently twice as fast as either of the simple functions (my test results are "1076 2122"). This makes no sense. I tried looking for what I might have done wrong, but I can't see it. Can anybody explain this?
You didn't set a return type for the simple functions, so they return Object.
So when you call simplefunctionNeg, the application converts the Integer to Object when returning the value, and then back from Object to Integer when assigning the returned value to your variable.
After setting the return type to Integer, simplefunctionNeg was a little bit faster than superfunction.
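To make the difference concrete, here is a small sketch with both variants side by side. The names SimpleNegTyped, SimpleNegUntyped and TimeTyped are made up for illustration, and Option Strict Off is set explicitly because the untyped declaration would not compile otherwise. Stopwatch is used instead of Environment.TickCount only because it has finer resolution.

```vbnet
Option Strict Off

Imports System.Diagnostics

Module TypedReturns
    ' Same body as simplefunctionNeg in the question, but with an explicit return type.
    Public Function SimpleNegTyped(a As Integer) As Integer
        Return -a
    End Function

    ' No "As Integer": the return type defaults to Object, so the result is boxed.
    Public Function SimpleNegUntyped(a As Integer)
        Return -a
    End Function

    ' Times the typed variant; returns elapsed milliseconds.
    Function TimeTyped(iterations As Integer) As Long
        Dim ret As Integer
        Dim sw = Stopwatch.StartNew()
        For i As Integer = 1 To iterations
            ret = SimpleNegTyped(i)
        Next
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module
```

Turning Option Strict On for the project would flag the missing return type at compile time, which is probably the easiest way to catch this kind of thing.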