Using LINQ to find updated rows in DataTable - vb.net

I'm building an application in VB.NET where I am pushing data from one database to another. The source database is SQL Server and the target is MySQL.
What I am doing is first creating DataTables for each table in each database, which I use to do a comparison. I've written the queries so that the source and target DataTables contain exactly the same columns and values, to make the comparison easier.
This side of the application works fine. What I do next is find rows which do not exist in the target database by finding PKs which do not exist. I then insert these new rows into the target database with no problem.
The Problem
What I now need to do is find rows in each table that have been updated, i.e. are not identical to the corresponding rows in the target DataTable. I have tried using Except() as per the example below:
Public Function GetUpdates(ByVal DSDataSet As MSSQLQuery, ByVal AADataSet As MySQLQuery, Optional ByVal PK As String = Nothing) As List(Of DataRow)
    ' Determines records to be updated in the AADB and returns list of new Rows
    ' Param DSDataSet - MSSQLQuery Object for source table
    ' Param AADataSet - MySQLQuery Object for destination table
    ' Optional Param PK - String of name common columns to treat as PK
    ' Returns List(Of DataRow) containing rows to update in table
    Dim orig = DSDataSet.GetDataset()
    Dim origTable = orig.Tables(0).AsEnumerable()
    Dim destination = AADataSet.GetDataset()
    Dim destinationTable = destination.Tables(0).AsEnumerable()
    ' Get Records which are not in destination table
    Dim ChangedRows = Nothing
    If IsNothing(PK) Then
        ChangedRows = destinationTable.AsEnumerable().Except(origTable.AsEnumerable(), DataRowComparer.Default)
    End If
    Dim List As New List(Of DataRow)
    For Each addRow In ChangedRows
        List.Add(addRow)
    Next
    Return List
End Function
The trouble is that it ends up simply returning the entire set of source rows.
How can I check for these changed rows? I could always hardcode queries to return what I want but this introduces problems because I need to make comparisons for 15 tables so it would be a complete mess.
Ideally I need a solution that takes into account the variable number of columns in the source tables, compares each against what is essentially an identical target table, and simply compares the DataRows for equality.
There should be a corresponding row in the target tables for every source row since the addition of new rows is performed prior to this check for updated rows.
I am also open to using methods other than LINQ to achieve this.
Solution
In the end I implemented a custom comparer to use in the query, as shown below. It first checks whether the first column value matches (the PK in my case); if it does, it then checks column-wise that everything matches.
Any discrepancy sets the flag value to FALSE, which is what gets returned. If there aren't any discrepancies, TRUE is returned. In this case I used = to compare values rather than Equals(), since I'm not concerned about strict equality.
The resulting set of DataRows is used to UPDATE the database using the first column value (PK) in the WHERE clause.
Imports System.Collections.Generic
Imports System.Data
Class MyDataRowComparer
    Inherits EqualityComparer(Of DataRow)
    Public Overloads Overrides Function Equals(x As DataRow, y As DataRow) As Boolean
        If x.Item(0).ToString().Equals(y.Item(0).ToString()) Then
            ' If PK matches then check column-wise.
            Dim Flag As Boolean = True
            ' ItemArray is a plain array, so Length works without importing System.Linq.
            For Counter As Integer = 0 To x.ItemArray.Length - 1
                If Not x.Item(Counter) = y.Item(Counter) Then
                    Flag = False
                End If
            Next
            Return Flag
        Else
            ' Otherwise don't bother and just skip.
            Return False
        End If
    End Function
    ...
End Class

class MyDataRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        // Can add more columns to the Comparison
        return x["ColumnName"].Equals(y["ColumnName"]);
    }
    public int GetHashCode(DataRow obj)
    {
        // Can add more columns to calculate HashCode
        return obj["ColumnName"].GetHashCode();
    }
}
Now the Except statement takes an instance of the comparer (not the type name), like this:
ChangedRows = destinationTable.AsEnumerable()
    .Except(origTable.AsEnumerable(), new MyDataRowComparer());
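Another way to get at the same result, sketched here under assumptions rather than taken from either answer, is to index the target rows by primary key and then compare each source row with its counterpart column by column. The module and function names (RowDiff, FindUpdatedRows) are made up for illustration, the primary key is assumed to be the first column as in the solution above, and comparing values through their string form is an assumption meant to tolerate the SQL Server and MySQL providers returning different .NET types for the same column.

```vbnet
Imports System.Collections.Generic
Imports System.Data

Public Module RowDiff
    ' Hypothetical helper: returns the source rows whose values differ from
    ' the target row with the same primary key (assumed to be column 0).
    Public Function FindUpdatedRows(source As DataTable, target As DataTable) As List(Of DataRow)
        ' Index the target rows by the string form of their key.
        Dim targetByKey As New Dictionary(Of String, DataRow)
        For Each row As DataRow In target.Rows
            targetByKey(row(0).ToString()) = row
        Next

        Dim updated As New List(Of DataRow)
        For Each row As DataRow In source.Rows
            Dim match As DataRow = Nothing
            ' Rows with no counterpart are new rows, which are inserted in an earlier step.
            If Not targetByKey.TryGetValue(row(0).ToString(), match) Then Continue For

            For i As Integer = 0 To source.Columns.Count - 1
                ' Assumption: string forms are compared so that provider type
                ' differences (for example Int32 vs Int64) do not count as changes.
                If Not String.Equals(row(i).ToString(), match(i).ToString()) Then
                    updated.Add(row)
                    Exit For
                End If
            Next
        Next
        Return updated
    End Function
End Module
```

Comparing string forms is a blunt instrument: values such as decimals or dates may format differently on each side even when they are equal, so columns like that might need a type-aware comparison instead.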

Related

Dataset with Datatable

I am trying to check if my dictionary contains values in my dataset.datatable and if its quantities in the second column of the dataset are less than or greater than the quantities in my datatable. I tried using the SELECT method but it doesn’t seem to work, I get the error BC30469 reference to non-shared member requires object reference?
I was just trying to do a simple search in the table first to see if I can even do that..... apparently not. Thanks for the help!
Dim row As DataRow = DataSet.DataTable.Select("ColumnName1 = 'value3'")
If Not row Is Nothing Then
searchedValue = row.Item("ColumnName2")
End If
You could get a dictionary to compare with the one you already have like this (assuming your key is a String and the amount an Int32, and that your dataset contains only one table):
Dim myDBDict As Dictionary(Of String, Int32) =
    myDataSet.Tables(0).Rows.Cast(Of DataRow)().ToDictionary(
        Function(r) r.Field(Of String)("MyIDColumn"),
        Function(r) r.Field(Of Int32)("myAmountColumn"))
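Once both dictionaries exist, the quantity check itself could look something like the sketch below. The names QuantityCheck, CompareQuantities, wanted and inDb are made up for illustration, and writing a message to the console stands in for whatever the application should actually do in each case.

```vbnet
Imports System
Imports System.Collections.Generic

Public Module QuantityCheck
    ' Hypothetical helper: for each key in "wanted", reports how its quantity
    ' relates to the quantity stored under the same key in "inDb".
    Public Sub CompareQuantities(wanted As Dictionary(Of String, Int32), inDb As Dictionary(Of String, Int32))
        For Each pair As KeyValuePair(Of String, Int32) In wanted
            Dim dbAmount As Int32
            If Not inDb.TryGetValue(pair.Key, dbAmount) Then
                Console.WriteLine(pair.Key & ": not in the table")
            ElseIf pair.Value < dbAmount Then
                Console.WriteLine(pair.Key & ": less than the table quantity")
            ElseIf pair.Value > dbAmount Then
                Console.WriteLine(pair.Key & ": greater than the table quantity")
            Else
                Console.WriteLine(pair.Key & ": same quantity")
            End If
        Next
    End Sub
End Module
```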

Can I use a method that returns a list of strings in SSRS report code as the headers in a tablix?

I have a table that needs to contain 50 columns, one for each half hour in the day (+2 for daylight savings). So the columns will be HH1, HH2, HH3... HH50.
I have written this piece of code in the report properties code section.
Function GetHH() As List(Of String)
Dim headers As List(Of String) = new List(Of String)
For index As Integer = 1 to 50
headers.Add("HH" & index)
Next
return headers
End Function
Is there a way to use the output of this function as the headers of my tablix? Or will I need to add the headers to some sort of dataset in the database and add it from there?
The column group functionality would be well suited for this. As you mentioned, you would need to write a SQL statement to return these values in a dataset. Then you can set your column group to group on these values. This way your table always gets the right number of columns and you don't have to add them manually.

Comparing one dataset to another dataset in vb.net

I have two datasets in my code, and I need to compare the second dataset with the first. My first dataset returns this result:
FirstDs:-
MaxUpdatedPrepped MaxUpdatedSent MaxUpdatedStamped
1900-01-01 1900-01-01 1900-01-01
And my second dataset returns this:
SecondDS:-
MaxUpdatedPrepped MaxUpdatedSent MaxUpdatedStamped
1900-01-01 1900-01-01 2014-11-11
I need to compare the two results and show an alert such as "Not matched" if the values in the first dataset do not match the values in the second. I have tried a lot, but I only get the wrong answer:
For i As Integer = 0 To DsMaxDates1.Tables(0).Rows.Count - 1
    Dim found As Boolean = False
    For j As Integer = 0 To ds.Tables(0).Rows.Count - 1
        If DsMaxDates1.Tables(0).Rows(i)(0).ToString = ds.Tables(0).Rows(j)(0).ToString Then
            found = True
        End If
    Next
    If found = False Then
        ASPNET_MsgBox("Another User Working in Same Account. Please Click Reset.")
    End If
Next
The code above returns True instead of False.
You should never change the type of your data unless it's absolutely necessary. Treat dates as Date, integers as Integer, strings as String, decimals as Decimal, etc. The ToString method is mostly used when you want to display the data to the user.
With that being said, you're not comparing datasets, you're comparing datatables.
The reason as to why it returns True is because you only compare the first column. You need to compare all the columns. If your table doesn't contain complex data types like byte arrays then the simplest way is to use LINQ combined with Enumerable.SequenceEqual.
The following code assumes that each table contains the same number of rows and columns.
''Uncomment to unleash the one-liner:
'Dim notEqual As Boolean = (From i As Integer In Enumerable.Range(0, DsMaxDates1.Tables(0).Rows.Count) Where (Not DsMaxDates1.Tables(0).Rows(i).ItemArray.SequenceEqual(ds.Tables(0).Rows(i).ItemArray)) Select True).FirstOrDefault()
Dim notEqual As Boolean = (
From i As Integer In Enumerable.Range(0, DsMaxDates1.Tables(0).Rows.Count)
Where (Not DsMaxDates1.Tables(0).Rows(i).ItemArray.SequenceEqual(ds.Tables(0).Rows(i).ItemArray))
Select True
).FirstOrDefault()
If (notEqual) Then
ASPNET_MsgBox("Another User Working in Same Account. Please Click Reset.")
End If
You can expand this even further by creating a reusable extension method:
Public Module Extensions
<System.Runtime.CompilerServices.Extension()>
Public Function SequenceEqual(table1 As DataTable, table2 As DataTable) As Boolean
Return (((((Not table1 Is Nothing) AndAlso (Not table2 Is Nothing))) AndAlso ((table1.Rows.Count = table2.Rows.Count) AndAlso (table1.Columns.Count = table2.Columns.Count))) AndAlso ((table1.Rows.Count = 0) OrElse (Not (From i As Integer In Enumerable.Range(0, table1.Rows.Count) Where (Not table1.Rows(i).ItemArray.SequenceEqual(table2.Rows(i).ItemArray)) Select True).FirstOrDefault())))
End Function
End Module
Then you can simply do as follows:
If (Not DsMaxDates1.Tables(0).SequenceEqual(ds.Tables(0))) Then
ASPNET_MsgBox("Another User Working in Same Account. Please Click Reset.")
End If

Get duplicated row in two datatable using lambda expression

I have 2 DataTables with identical structure, and some rows have duplicate keys. I want to use a lambda expression to get all the records in table 1 whose keys match the keys of any row in table 2. I tried the following, assuming that item1 and item2 are the keys:
Dim result as IEnumerable(Of DataRow) = table1.Asenumerable.Where(function(t1) _
table2.AsEnumerable().Any(function(t2) t1("item1") = t2("item1") _
andalso t1("item2") = t2("item2")))
But this code snippet always gives me no result (result.Count = 0), even though there are duplicates between the 2 tables.
P/S: sorry for my bad English
I would try following:
' prepare HashSet from keys from table2 '
Dim table2Keys = New HashSet(Of Tuple(Of String, String))
table2Keys.UnionWith(table2.AsEnumerable().Select(Function(x) Tuple.Create(x("item1").ToString(), x("item2").ToString())))
' search table1 for duplicates '
Dim result = table1.AsEnumerable().Where(Function(x) table2Keys.Contains(Tuple.Create(x("item1").ToString(), x("item2").ToString())))
It should have better performance than using Any, because a HashSet lookup can be done in O(1) on average.

Simplest/fastest way to check if value exists in DataTable in VB.net?

I have a DataTable (currently with multiple columns but I could just grab one column if it makes it easier). I want to check if a String value exists in a column of the DataTable. (I'm doing it many times so I want it to be reasonably fast.)
What is a good way to do this? Iterating through the DataTable rows each time seems like a bad way. Can I convert the column to a flat List/Array format, and use a built in function? Something like myStrList.Contains("value")?
You can use Select to find out whether that value exists. If it does, Select returns the matching rows; otherwise it returns an empty array. Here is some sample code to help you.
Dim foundRow() As DataRow
foundRow = dt.Select("SalesCategory='HP'")
If the data in your DataTable doesn't change very often, and you search the DataTable multiple times, and your DataTable contains many rows, then it's likely going to be a lot faster to build your own index for the data.
The simplest way to do this is to sort the data by the key column so that you can then do a binary search on the sorted list. For instance, you can build an index like this:
Private Function BuildIndex(table As DataTable, keyColumnIndex As Integer) As List(Of String)
    Dim index As New List(Of String)(table.Rows.Count)
    For Each row As DataRow In table.Rows
        index.Add(row(keyColumnIndex))
    Next
    index.Sort()
    Return index
End Function
Then, you can check if a value exists in the index quickly with a binary search, like this:
Private Function ItemExists(index As List(Of String), key As String) As Boolean
    ' BinarySearch returns a negative number when the key is not in the list.
    Dim position As Integer = index.BinarySearch(key)
    Return position >= 0
End Function
You could also do the same thing with a simple string array. Or, you could use a Dictionary object (which is an implementation of a hash table) to build a hash index of your DataTable, for instance:
Private Function BuildIndex(table As DataTable, keyColumnIndex As Integer) As Dictionary(Of String, DataRow)
    Dim index As New Dictionary(Of String, DataRow)(table.Rows.Count)
    For Each row As DataRow In table.Rows
        index(row(keyColumnIndex)) = row
    Next
    Return index
End Function
Then, you can get the matching DataRow for a given key, like this:
Dim index As Dictionary(Of String, DataRow) = BuildIndex(myDataTable, myKeyColumnIndex)
Dim row As DataRow = Nothing
If index.TryGetValue(myKey, row) Then
' row was found, can now use row variable to access all the data in that row
Else
' row with that key does not exist
End If
You may also want to look into using either the SortedList or SortedDictionary class. Both keep their entries sorted by key: SortedDictionary is implemented as a binary search tree, while SortedList stores its keys and values in sorted arrays. It's hard to say which of all of these options is going to be fastest in your particular scenario. It all depends on the type of data, how often the index needs to be rebuilt, how often you search it, how many rows are in the DataTable, and what you need to do with the found items. The best thing to do would be to try each one in a test case and see which one works best for what you need.
You should use row filter or DataTable.Rows.Find() instead of select (select does not use indexes). Depending on your table structure, specifically if your field in question is indexed (locally), performance of either way should be much faster than looping through all rows. In .NET, a set of fields needs to be a PrimaryKey to become indexed.
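As a rough illustration of the Rows.Find() route, the sketch below makes the searched column the table's primary key when none is set, then looks the value up by key. KeyLookup and KeyExists are made-up names, and the sketch assumes the column holds unique, non-null values, since assigning PrimaryKey throws if the existing data violates that.

```vbnet
Imports System.Data

Public Module KeyLookup
    ' Hypothetical helper: True when keyValue exists in the table's primary key column.
    ' Assumes keyColumnName holds unique, non-null values.
    Public Function KeyExists(table As DataTable, keyColumnName As String, keyValue As String) As Boolean
        If table.PrimaryKey.Length = 0 Then
            ' Only set the key when the table has none; an existing key is left as it is,
            ' in which case Find searches that key rather than keyColumnName.
            table.PrimaryKey = New DataColumn() {table.Columns(keyColumnName)}
        End If
        ' Rows.Find returns Nothing when no row has that key.
        Return table.Rows.Find(keyValue) IsNot Nothing
    End Function
End Module
```

A call such as KeyExists(dt, "SalesCategory", "HP") would then replace the Select-based check, with dt and the column name taken from the earlier sample.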
If your field is not indexed, I would avoid both select and row filter, because aside from overhead of class complexity, they don't offer compile time check for correctness of your condition. If it's a long one, you may end up spending lots of time debugging it once in a while.
It is always preferable to have your check strictly typed. Having first defined an underlying type, you can also define this helper method, which you can convert to extension method of DataTable class later:
Shared Function CheckValue(myTable As DataTable, columnName As String, searchValue As String) As Boolean
    For Each row As DataRow In myTable.Rows
        If row(columnName) = searchValue Then Return True
    Next
    Return False
End Function
or a more generic version of it:
Shared Function CheckValue(myTable As DataTable, checkFunc As Func(Of DataRow, Boolean)) As Boolean
For Each row As DataRow In myTable.Rows
If checkFunc(row) Then Return True
Next
Return False
End Function
and its usage:
CheckValue(myTable, Function(x) x("myColumn") = "123")
If your row class has MyColumn property of type String, it becomes:
CheckValue(myTable, Function(x) x.myColumn = "123")
One of the benefits of above approach is that you are able to feed calculated fields into your check condition, since myColumn here does not need to match a physical myColumn in the table/database.
bool exists = dt.AsEnumerable().Any(c => c.Field<string>("Author") == "your lookup value");