Get duplicated rows in two DataTables using a lambda expression - VB.NET

I have two DataTables with identical structure, and some rows have duplicate keys. I want to use a lambda expression to get all the records in table1 whose keys duplicate the keys of any row in table2. I tried the following (assume that item1 and item2 are the keys):
Dim result As IEnumerable(Of DataRow) = table1.AsEnumerable().Where(Function(t1) _
    table2.AsEnumerable().Any(Function(t2) t1("item1") = t2("item1") _
        AndAlso t1("item2") = t2("item2")))
But this code snippet always gives me no results (result.Count() = 0), even though there are duplicates between the two tables.

I would try the following:
' prepare a HashSet of the keys from table2
Dim table2Keys As New HashSet(Of Tuple(Of String, String))()
table2Keys.UnionWith(table2.AsEnumerable().Select(Function(x) Tuple.Create(x.Field(Of String)("item1"), x.Field(Of String)("item2"))))
' search table1 for rows whose keys also appear in table2
Dim result = table1.AsEnumerable().Where(Function(x) table2Keys.Contains(Tuple.Create(x.Field(Of String)("item1"), x.Field(Of String)("item2"))))
This should perform better than using Any, because a HashSet lookup takes O(1) time on average, instead of scanning all of table2 for every row of table1.
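If you would rather let LINQ build the hash lookup for you, a Join does the same job: it indexes table2 by key once and then probes that index for each row of table1. This is an alternative sketch, not the answer's own code. It assumes both key columns hold strings (read with Field(Of String), which on .NET Framework needs a reference to System.Data.DataSetExtensions); the FindDuplicateRows and NewKeyTable names and the sample data are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module DuplicateKeysDemo
    ' Hypothetical helper: rows of table1 whose (item1, item2) key also occurs in table2.
    ' Join builds a hash lookup over table2 once, then probes it for every table1 row.
    Function FindDuplicateRows(table1 As DataTable, table2 As DataTable) As List(Of DataRow)
        Dim query = From t1 In table1.AsEnumerable()
                    Join t2 In table2.AsEnumerable()
                    On t1.Field(Of String)("item1") Equals t2.Field(Of String)("item1") And t1.Field(Of String)("item2") Equals t2.Field(Of String)("item2")
                    Select t1
                    Distinct ' a table1 row can match several table2 rows; keep it only once
        Return query.ToList()
    End Function

    ' Hypothetical helper: builds an empty table with the two string key columns.
    Function NewKeyTable() As DataTable
        Dim table As New DataTable()
        table.Columns.Add("item1", GetType(String))
        table.Columns.Add("item2", GetType(String))
        Return table
    End Function

    Sub Main()
        Dim table1 As DataTable = NewKeyTable()
        table1.Rows.Add("A", "1")
        table1.Rows.Add("B", "2")
        Dim table2 As DataTable = NewKeyTable()
        table2.Rows.Add("B", "2")
        table2.Rows.Add("C", "3")

        For Each row As DataRow In FindDuplicateRows(table1, table2)
            Console.WriteLine(row("item1").ToString() & ", " & row("item2").ToString())
        Next
    End Sub
End Module
```

Rows whose key columns are DBNull never match in a LINQ Join, which is usually what you want when looking for duplicate keys.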

Related

ADO.NET: How to bulk copy rows in a DataTable into another

A legacy application is copying 100K records from one DataTable into another like so:
For index = 0 To dataTable.Rows.Count() - 1
    Dim column1 = CType(dataTable.Rows(index).Item("column1"), Integer)
    Dim column2 = CType(dataTable.Rows(index).Item("column2"), Integer)
    Dim column3 = CType(dataTable.Rows(index).Item("column3"), Integer)
    Dim dataRow = ds.Tables("MyTable").NewRow
    dataRow("column1") = column1
    dataRow("column2") = column2
    dataRow("column3") = column3
    ds.Tables("MyTable").Rows.Add(dataRow)
Next
This seems to be very slow as we need to iterate 100K times and add a new row. Are there any .NET APIs to bulk copy rows to decrease the time it takes to copy everything? The source DataTable has a lot more columns whereas the destination is a subset. We could refactor the code to only use the source, but this is a complex app and it will require regression testing since both source and destination tables are global variables and used in many places.
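How about "dataTable.Copy()"? Keep in mind that Copy() duplicates both the schema and the data, so the copy would contain all of the source columns, not just the subset the destination uses.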
https://learn.microsoft.com/en-gb/dotnet/api/system.data.datatable.copy?view=netframework-4.8
// Create an object variable for the copy.
DataTable copyDataTable;
copyDataTable = table.Copy();
I'm not sure by how much, but I would expect this to be faster:
Dim newTable = oldTable.DefaultView.ToTable(False,
    {"column1", "column2", "column3"})
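Because the destination table is a global that other code already references, you may want to fill it in place rather than replace it with a new table. One option, sketched here under the assumption that every destination column has a same-named column in the source, is to wrap LoadDataRow in BeginLoadData/EndLoadData, which suspends notifications, index maintenance and constraint checking while loading. Passing False keeps the new rows in the Added state, like Rows.Add in the original loop; if the destination has a primary key, LoadDataRow updates an existing row with the same key instead of adding a new one. The CopyColumns name is hypothetical, and how much time this saves in your app would need to be measured.

```vbnet
Imports System
Imports System.Data

Module BulkCopyDemo
    ' Hypothetical helper: appends every source row to destination, copying only
    ' the columns that destination defines (matched by column name).
    Sub CopyColumns(source As DataTable, destination As DataTable)
        ' Resolve the source column for each destination column once, not once per row.
        Dim sourceColumns(destination.Columns.Count - 1) As DataColumn
        For i As Integer = 0 To destination.Columns.Count - 1
            sourceColumns(i) = source.Columns(destination.Columns(i).ColumnName)
        Next

        destination.BeginLoadData() ' turn off notifications, index maintenance and constraints
        Try
            For Each row As DataRow In source.Rows
                Dim values(sourceColumns.Length - 1) As Object
                For i As Integer = 0 To sourceColumns.Length - 1
                    values(i) = row(sourceColumns(i))
                Next
                destination.LoadDataRow(values, False) ' False: the new row stays in the Added state
            Next
        Finally
            destination.EndLoadData() ' re-enables constraints (and throws if they are violated)
        End Try
    End Sub

    Sub Main()
        Dim source As New DataTable()
        For Each name As String In {"column1", "column2", "column3", "column4"}
            source.Columns.Add(name, GetType(Integer))
        Next
        source.Rows.Add(1, 2, 3, 4)
        source.Rows.Add(5, 6, 7, 8)

        Dim destination As New DataTable("MyTable")
        For Each name As String In {"column1", "column2", "column3"}
            destination.Columns.Add(name, GetType(Integer))
        Next

        CopyColumns(source, destination)
        Console.WriteLine(destination.Rows.Count)         ' 2
        Console.WriteLine(destination.Rows(1)("column3")) ' 7
    End Sub
End Module
```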

How do I query a local datatable and return information to a datatable in VB.net

I am trying to pass a query and an existing DataTable into a function. The function should run the passed query against the passed DataTable and return the result.
Unfortunately, I am unable to return any data. I have posted my code below. Can anyone help me fix it? I don't know what I am doing wrong.
Public Function ExecQueryTest(Query As String, DT As DataTable) As DataTable
    Dim Result() As DataRow
    'initialize the table to have the same number of columns of the table that is passed into the function
    Dim LocalTable As DataTable = DT
    'initialize counting variables
    Dim x, y As Integer
    'use the select command to run a query and store the results in an array
    Result = DT.Select(Query)
    'remove all items from the localtable after initial formatting
    For x = 0 To LocalTable.Rows.Count - 1
        LocalTable.Rows.RemoveAt(0)
    Next
    'for loop to iterate for the amount of rows stored in result
    For x = 0 To Result.GetUpperBound(0)
        'add each array row into the table
        LocalTable.Rows.Add(Result(x))
    Next
    ExecQueryTest = LocalTable
End Function
If there is a better way to accomplish my goal, I don't mind starting from scratch. I just want to be able to handle dynamic tables, queries, and be able to return the information in a datatable format.
The problem is here:
Dim LocalTable As DataTable = DT
That code does not do what you think it does. DataTable is a reference type, which means assigning DT to the LocalTable variable only assigns a reference to the same object. No new table is created, and nothing is copied. Therefore, this later code also clears out the original table:
'remove all items from the localtable after initial formatting
For x = 0 To LocalTable.Rows.Count - 1
    LocalTable.Rows.RemoveAt(0)
Next
Try this instead:
Public Function ExecQueryTest(Query As String, DT As DataTable) As DataTable
    'create a new, empty DataTable with the same columns as DT to hold the results
    ExecQueryTest = DT.Clone()
    For Each row As DataRow In DT.Select(Query)
        ExecQueryTest.LoadDataRow(row.ItemArray, True)
    Next
End Function
Because LoadDataRow copies the values out of ItemArray into a new row, the original DataRow objects do not need to be cloned.
You can clear a table with just
LocalTable.Clear()
instead of using that loop. Also, the results of your Select can be converted directly to a DataTable using
LocalTable = Result.CopyToDataTable()
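Combining Select with CopyToDataTable, a shorter version of the function might look like the sketch below. One caveat: CopyToDataTable throws an InvalidOperationException when the sequence has no rows, so the sketch falls back to Clone(), which copies the columns but no rows. The ExecQuery name and the sample data are placeholders, and CopyToDataTable needs a reference to System.Data.DataSetExtensions on .NET Framework.

```vbnet
Imports System
Imports System.Data

Module QueryDemo
    ' Returns a new DataTable with the rows of source that match the filter
    ' expression; the source table itself is never modified.
    Public Function ExecQuery(query As String, source As DataTable) As DataTable
        Dim matches As DataRow() = source.Select(query)
        If matches.Length = 0 Then
            ' CopyToDataTable throws on an empty sequence, so return an empty
            ' table with the same columns instead.
            Return source.Clone()
        End If
        Return matches.CopyToDataTable()
    End Function

    Sub Main()
        Dim people As New DataTable("People")
        people.Columns.Add("Name", GetType(String))
        people.Columns.Add("Age", GetType(Integer))
        people.Rows.Add("Ann", 34)
        people.Rows.Add("Bob", 19)

        Console.WriteLine(ExecQuery("Age > 30", people).Rows.Count) ' 1
        Console.WriteLine(ExecQuery("Age > 90", people).Rows.Count) ' 0
        Console.WriteLine(people.Rows.Count)                        ' 2 (unchanged)
    End Sub
End Module
```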

Using LINQ to find updated rows in DataTable

I'm building an application in VB.NET where I am pushing data from one database to another. The source database is SQL Server and the target is MySQL.
What I am doing is first creating DataTables for each table in each database, which I use to do a comparison. I've written the queries in such a way that the source and target DataTables contain exactly the same columns and values, to make the comparison easier.
This side of the application works fine. What I do next is find rows which do not exist in the target database by finding PKs which do not exist. I then insert these new rows into the target database with no problem.
The Problem
What I now need to do is find rows in each table that have been updated, i.e. are not identical to the corresponding rows in the target DataTable. I have tried using Except() as per the example below:
Public Function GetUpdates(ByVal DSDataSet As MSSQLQuery, ByVal AADataSet As MySQLQuery, Optional ByVal PK As String = Nothing) As List(Of DataRow)
    ' Determines records to be updated in the AADB and returns list of new Rows
    ' Param DSDataSet - MSSQLQuery Object for source table
    ' Param AADataSet - MySQLQuery Object for destination table
    ' Optional Param PK - String of name common columns to treat as PK
    ' Returns List(Of DataRow) containing rows to update in table
    Dim orig = DSDataSet.GetDataset()
    Dim origTable = orig.Tables(0).AsEnumerable()
    Dim destination = AADataSet.GetDataset()
    Dim destinationTable = destination.Tables(0).AsEnumerable()
    ' Get Records which are not in destination table
    Dim ChangedRows = Nothing
    If IsNothing(PK) Then
        ChangedRows = destinationTable.AsEnumerable().Except(origTable.AsEnumerable(), DataRowComparer.Default)
    End If
    Dim List As New List(Of DataRow)
    For Each addRow In ChangedRows
        List.Add(addRow)
    Next
    Return List
End Function
The trouble is that it ends up simply returning the entire set of source rows.
How can I check for these changed rows? I could always hardcode queries to return what I want but this introduces problems because I need to make comparisons for 15 tables so it would be a complete mess.
Ideally I need a solution that takes into account the variable number of columns in the source tables when comparing against what is essentially an identical target table, and simply compares the DataRows for equality.
There should be a corresponding row in the target tables for every source row since the addition of new rows is performed prior to this check for updated rows.
I am also open to using methods other than LINQ to achieve this.
Solution
In the end I implemented a custom comparer to use in the query, as shown below. It first checks whether the first column values match (the PK in my case); if they do, it then checks column by column that all values match.
Any discrepancy sets the flag to False, which is what we return; if there are no discrepancies, True is returned. Here I used = rather than Equals() to compare values, since I'm not concerned with strict equality.
The resulting set of DataRows is used to UPDATE the database using the first column value (PK) in the WHERE clause.
Imports System.Collections.Generic
Imports System.Data
Class MyDataRowComparer
    Inherits EqualityComparer(Of DataRow)
    Public Overloads Overrides Function Equals(x As DataRow, y As DataRow) As Boolean
        If x.Item(0).ToString().Equals(y.Item(0).ToString()) Then
            ' If PK matches then check column-wise.
            Dim Flag As Boolean = True
            For Counter As Integer = 0 To x.ItemArray.Length - 1
                ' Late-bound comparison of Object values; requires Option Strict Off.
                If Not x.Item(Counter) = y.Item(Counter) Then
                    Flag = False
                End If
            Next
            Return Flag
        Else
            ' Otherwise don't bother and just skip.
            Return False
        End If
    End Function
    ...
End Class
class MyDataRowComparer : IEqualityComparer<DataRow>
{
    public bool Equals(DataRow x, DataRow y)
    {
        return x["ColumnName"].Equals(y["ColumnName"]);
        // Can add more columns to the Comparison
    }

    public int GetHashCode(DataRow obj)
    {
        return obj["ColumnName"].GetHashCode();
        // Can add more columns to calculate HashCode
    }
}
Now the Except statement will look like this (note that it needs an instance of the comparer, not the type name):
ChangedRows = destinationTable.AsEnumerable().Except(origTable.AsEnumerable(), New MyDataRowComparer())
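As a variation on the comparer approach, you can index the destination rows by key once and then compare each source row with its counterpart. This also avoids having to write a GetHashCode that stays consistent with a column-wise Equals. The sketch below is not the author's implementation. It assumes, as in the question, that both tables have the same columns in the same order with the key in column 0, and it compares values through ToString() on the assumption that the SQL Server and MySQL providers may return different .NET types for the same value; with that choice, DBNull and an empty string compare as equal. FindChangedRows, NewTable and the Id/Name/Age schema are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module ChangedRowsDemo
    ' Hypothetical helper: returns the source rows whose values differ from the
    ' destination row with the same key (column 0). The source rows carry the new values.
    Function FindChangedRows(source As DataTable, destination As DataTable) As List(Of DataRow)
        ' Index destination rows by the string form of the key column.
        Dim byKey As New Dictionary(Of String, DataRow)()
        For Each row As DataRow In destination.Rows
            byKey(row(0).ToString()) = row
        Next

        Dim changed As New List(Of DataRow)()
        For Each row As DataRow In source.Rows
            Dim match As DataRow = Nothing
            If byKey.TryGetValue(row(0).ToString(), match) Then
                ' The keys already match, so compare the remaining columns.
                For i As Integer = 1 To source.Columns.Count - 1
                    If row(i).ToString() <> match(i).ToString() Then
                        changed.Add(row)
                        Exit For
                    End If
                Next
            End If
        Next
        Return changed
    End Function

    ' Hypothetical helper: builds an empty table with an Id/Name/Age schema.
    Function NewTable() As DataTable
        Dim table As New DataTable()
        table.Columns.Add("Id", GetType(Integer))
        table.Columns.Add("Name", GetType(String))
        table.Columns.Add("Age", GetType(Integer))
        Return table
    End Function

    Sub Main()
        Dim source As DataTable = NewTable()
        source.Rows.Add(1, "Ann", 34)
        source.Rows.Add(2, "Bob", 20)
        Dim destination As DataTable = NewTable()
        destination.Rows.Add(1, "Ann", 34)
        destination.Rows.Add(2, "Bob", 19)

        For Each row As DataRow In FindChangedRows(source, destination)
            Console.WriteLine(row("Id")) ' 2
        Next
    End Sub
End Module
```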

Simplest/fastest way to check if value exists in DataTable in VB.net?

I have a DataTable (currently with multiple columns but I could just grab one column if it makes it easier). I want to check if a String value exists in a column of the DataTable. (I'm doing it many times so I want it to be reasonably fast.)
What is a good way to do this? Iterating through the DataTable rows each time seems like a bad way. Can I convert the column to a flat List/Array format, and use a built in function? Something like myStrList.Contains("value")?
You can use Select to find out whether the value exists: it returns an array of the matching rows, which is empty if there is no match. Here is some sample code to help you.
Dim foundRow() As DataRow
foundRow = dt.Select("SalesCategory='HP'")
If the data in your DataTable doesn't change very often, and you search the DataTable multiple times, and your DataTable contains many rows, then it's likely going to be a lot faster to build your own index for the data.
The simplest way to do this is to sort the data by the key column so that you can then do a binary search on the sorted list. For instance, you can build an index like this:
Private Function BuildIndex(table As DataTable, keyColumnIndex As Integer) As List(Of String)
    Dim index As New List(Of String)(table.Rows.Count)
    For Each row As DataRow In table.Rows
        index.Add(row(keyColumnIndex).ToString())
    Next
    index.Sort()
    Return index
End Function
Then, you can check if a value exists in the index quickly with a binary search, like this:
Private Function ItemExists(index As List(Of String), key As String) As Boolean
    ' BinarySearch returns a non-negative position when the key is found
    Dim position As Integer = index.BinarySearch(key)
    Return position >= 0
End Function
You could also do the same thing with a simple string array. Or, you could use a Dictionary object (which is an implementation of a hash table) to build a hash index of your DataTable, for instance:
Private Function BuildIndex(table As DataTable, keyColumnIndex As Integer) As Dictionary(Of String, DataRow)
    Dim index As New Dictionary(Of String, DataRow)(table.Rows.Count)
    For Each row As DataRow In table.Rows
        index(row(keyColumnIndex).ToString()) = row
    Next
    Return index
End Function
Then, you can get the matching DataRow for a given key, like this:
Dim index As Dictionary(Of String, DataRow) = BuildIndex(myDataTable, myKeyColumnIndex)
Dim row As DataRow = Nothing
If index.TryGetValue(myKey, row) Then
    ' row was found, can now use row variable to access all the data in that row
Else
    ' row with that key does not exist
End If
You may also want to look into using either the SortedList or the SortedDictionary class. SortedDictionary is implemented as a binary search tree and SortedList as a pair of sorted arrays; both keep their keys in order and find a key in O(log n) time. It's hard to say which of all of these options is going to be fastest in your particular scenario. It all depends on the type of data, how often the index needs to be rebuilt, how often you search it, how many rows are in the DataTable, and what you need to do with the found items. The best thing to do would be to try each one in a test case and see which one works best for what you need.
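Since the question asks for something like myStrList.Contains("value"): if you only need to know whether a value exists, a HashSet(Of String) built once from the column does exactly that, with average O(1) lookups. A minimal sketch follows; the BuildValueSet name, the Author column and the sample values are hypothetical, and passing StringComparer.OrdinalIgnoreCase is an optional choice for case-insensitive matching. Like the indexes above, the set has to be rebuilt when the table changes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module HashSetIndexDemo
    ' Hypothetical helper: builds a set of the non-null string values in one column.
    Function BuildValueSet(table As DataTable, columnName As String) As HashSet(Of String)
        Dim values = table.AsEnumerable().Select(Function(row) row.Field(Of String)(columnName))
        ' Field(Of String) returns Nothing for DBNull cells; skip those.
        Return New HashSet(Of String)(values.Where(Function(v) v IsNot Nothing), StringComparer.OrdinalIgnoreCase)
    End Function

    Sub Main()
        Dim books As New DataTable()
        books.Columns.Add("Author", GetType(String))
        books.Rows.Add("Knuth")
        books.Rows.Add("Dijkstra")

        Dim authors As HashSet(Of String) = BuildValueSet(books, "Author")
        Console.WriteLine(authors.Contains("knuth")) ' True (case-insensitive comparer)
        Console.WriteLine(authors.Contains("Hoare")) ' False
    End Sub
End Module
```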
You should use RowFilter or DataTable.Rows.Find() instead of Select (Select does not use indexes). Depending on your table structure, specifically whether the field in question is indexed (locally), either approach should be much faster than looping through all rows. In .NET, a set of fields needs to be the table's PrimaryKey to become indexed.
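For the primary-key route, a sketch might look like the following. It assumes the searched column can serve as the table's primary key, meaning its values are unique and non-null; otherwise assigning PrimaryKey throws. Rows.Contains and Rows.Find then look the value up through the key index instead of scanning every row. The BuildBooks name, the Author/Title columns and the values are hypothetical.

```vbnet
Imports System
Imports System.Data

Module PrimaryKeyLookupDemo
    ' Hypothetical helper: builds a sample table whose Author column is the
    ' primary key, so the table maintains an index on it.
    Function BuildBooks() As DataTable
        Dim books As New DataTable()
        books.Columns.Add("Author", GetType(String))
        books.Columns.Add("Title", GetType(String))
        books.Rows.Add("Knuth", "TAOCP")
        books.Rows.Add("Wirth", "Algorithms + Data Structures = Programs")
        books.PrimaryKey = New DataColumn() {books.Columns("Author")}
        Return books
    End Function

    Sub Main()
        Dim books As DataTable = BuildBooks()

        ' Contains only answers "is it there?"; Find also returns the row (or Nothing).
        Console.WriteLine(books.Rows.Contains("Knuth")) ' True
        Dim found As DataRow = books.Rows.Find("Wirth")
        If found IsNot Nothing Then
            Console.WriteLine(found("Title"))
        End If
    End Sub
End Module
```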
If your field is not indexed, I would avoid both Select and RowFilter: aside from the overhead they add, they don't give you a compile-time check that your condition is correct. If the condition is long, you may end up spending a lot of time debugging it now and then.
It is always preferable to have your check strongly typed. Once you have defined an underlying (typed) row type, you can also define this helper method, which you can later convert into an extension method of the DataTable class:
Shared Function CheckValue(myTable As DataTable, columnName As String, searchValue As String) As Boolean
    For Each row As DataRow In myTable.Rows
        If row(columnName) = searchValue Then Return True
    Next
    Return False
End Function
or a more generic version of it:
Shared Function CheckValue(myTable As DataTable, checkFunc As Func(Of DataRow, Boolean)) As Boolean
    For Each row As DataRow In myTable.Rows
        If checkFunc(row) Then Return True
    Next
    Return False
End Function
and its usage:
CheckValue(myTable, Function(x) x("myColumn") = "123")
If you use a typed DataTable whose row class has a MyColumn property of type String, and CheckValue accepts that row type instead of a plain DataRow, it becomes:
CheckValue(myTable, Function(x) x.MyColumn = "123")
One of the benefits of the above approach is that you can feed calculated fields into your check condition, since myColumn here does not need to match a physical myColumn in the table/database.
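For the typed-row usage to compile, the helper has to accept the typed row rather than a plain DataRow. One way, sketched here as an assumption rather than the answer's own code, is a generic extension method over IEnumerable(Of TRow): a strongly typed DataTable enumerates as its own row type, and a plain DataTable can be passed through AsEnumerable(). It behaves like LINQ's Any, so you could also call Any directly. The RowChecks module name and the sample data are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Runtime.CompilerServices

Module RowChecks
    ' Returns True as soon as one row satisfies checkFunc. Same logic as CheckValue,
    ' but generic over the row type, so typed rows keep their typed properties.
    <Extension()>
    Public Function CheckValue(Of TRow As DataRow)(rows As IEnumerable(Of TRow), checkFunc As Func(Of TRow, Boolean)) As Boolean
        For Each row As TRow In rows
            If checkFunc(row) Then Return True
        Next
        Return False
    End Function

    Sub Main()
        Dim table As New DataTable()
        table.Columns.Add("myColumn", GetType(String))
        table.Rows.Add("123")

        ' Plain DataTable: AsEnumerable() yields IEnumerable(Of DataRow), so TRow = DataRow.
        Dim found As Boolean = table.AsEnumerable().CheckValue(Function(r) r.Field(Of String)("myColumn") = "123")
        Console.WriteLine(found) ' True
    End Sub
End Module
```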
bool exists = dt.AsEnumerable().Any(c => c.Field<string>("Author") == "your lookup value");

Get the BindingSource position based on DataTable row

I have a datatable that contains the rows of a database table. This table has a primary key formed by 2 columns.
The components are connected this way: DataTable -> BindingSource -> DataGridView. What I want is to search for a specific row (based on the primary key) and select it in the grid. I can't use the BindingSource.Find method because it only works with one column.
I have access to the DataTable, so I can search it manually, but how can I get the BindingSource position that corresponds to a DataTable row? Or is there another way to solve this?
I'm using Visual Studio 2005, VB.NET.
I am adding an answer to this two-year-old question. One way to solve this is to append this code after the UpdateAll call (in SaveItem_Click):
Me.YourDataSet.Tables("YourTable").Rows(YourBindingSource.Position).Item("YourColumn") = "YourNewValue"
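Then call UpdateAll again. Note that BindingSource.Position is an index into the bound view, so it may not match the row's index in Rows(...) when the view is sorted or filtered.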
Well, I ended up iterating with bindingsource.List and bindingsource.Item. I didn't know it, but these properties contain the DataTable's data with the filter and sorting applied.
Dim value1 As String = "Juan"
Dim value2 As String = "Perez"
For i As Integer = 0 To bsData.Count - 1
    Dim row As DataRowView = CType(bsData.Item(i), DataRowView)
    If row("Column1") = value1 AndAlso row("Column2") = value2 Then
        bsData.Position = i
        Return
    End If
Next
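If the grid can be sorted by the two key columns, another option is to let the underlying DataView do the search. When a BindingSource is bound to a DataTable, its List is the table's DataView, and DataView.Find accepts one value per sort column and returns the row's index in the sorted view, which is also the BindingSource position. This is a sketch under that assumption: if the user re-sorts the grid by other columns, Find no longer applies and the loop above is the safer choice. FindPosition and the sample values are hypothetical, and the code needs a reference to System.Windows.Forms.

```vbnet
Imports System
Imports System.Data
Imports System.Windows.Forms

Module BindingSourceFindDemo
    ' Hypothetical helper: moves bs to the row whose two key columns match,
    ' assuming bs is bound to a DataTable and sorted by exactly those two columns.
    Function FindPosition(bs As BindingSource, value1 As Object, value2 As Object) As Integer
        Dim view As DataView = CType(bs.List, DataView)
        ' Find takes one value per Sort column, in Sort order; -1 means no match.
        Dim index As Integer = view.Find(New Object() {value1, value2})
        If index >= 0 Then bs.Position = index
        Return index
    End Function

    Sub Main()
        Dim people As New DataTable()
        people.Columns.Add("Column1", GetType(String))
        people.Columns.Add("Column2", GetType(String))
        people.Rows.Add("Maria", "Lopez")
        people.Rows.Add("Juan", "Perez")
        people.Rows.Add("Ana", "Diaz")

        Dim bsData As New BindingSource()
        bsData.DataSource = people
        bsData.Sort = "Column1, Column2" ' the view is now ordered by the key columns

        Console.WriteLine(FindPosition(bsData, "Juan", "Perez")) ' 1 (order: Ana, Juan, Maria)
        Console.WriteLine(CType(bsData.Current, DataRowView)("Column2")) ' Perez
    End Sub
End Module
```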