VB.Net - Efficient way of de-duplicating data - vb.net

I am dealing with a legacy application which is written in VB.Net 2.0 against a SQL 2000 database.
There is a single table which has ~125,000 rows and 2 pairs of fields with similar data.
i.e. FieldA1, FieldB1, FieldA2, FieldB2
I need to process a combined, distinct list of FieldA, FieldB.
Using SQL I have confirmed that there are ~140,000 distinct rows.
Due to a very restrictive framework in the application I can only retrieve the data as either 2 XML objects, 2 DataTable objects or 2 DataTableReader objects. I am unable to execute custom SQL using the framework.
Due to a very restrictive DB access policy I am unable to add a View or Stored Proc to retrieve as a single list.
What is the most efficient way to combine the 2 XML / DataTable / DataTableReader objects into a single, distinct, IEnumerable object for later processing?

I may have missed something here but could you not combine both DataTables using Merge?
DataTableA.Merge(DataTableB)
You can then use DataTableA.AsEnumerable()
Then see this answer on how to remove duplicates or
You can do this with a DataView as follows: dt.DefaultView.ToTable(True,[Column names])
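As a rough sketch of the Merge plus DefaultView.ToTable idea above: the BuildTable helper and the sample values are made up for illustration, and it assumes both tables expose their columns under the same names (FieldA, FieldB), renamed beforehand if necessary.

```vbnet
Imports System
Imports System.Data

Module MergeDistinctSketch
    ' Hypothetical helper: builds a two-column table from (FieldA, FieldB) pairs.
    Function BuildTable(ByVal pairs As String()()) As DataTable
        Dim table As New DataTable("Pairs")
        table.Columns.Add("FieldA", GetType(String))
        table.Columns.Add("FieldB", GetType(String))
        For Each pair As String() In pairs
            table.Rows.Add(pair(0), pair(1))
        Next
        Return table
    End Function

    Sub Main()
        Dim tableA As DataTable = BuildTable(New String()() { _
            New String() {"1", "x"}, New String() {"2", "y"}})
        Dim tableB As DataTable = BuildTable(New String()() { _
            New String() {"2", "y"}, New String() {"3", "z"}})

        ' No primary key is defined, so Merge simply appends tableB's rows.
        tableA.Merge(tableB)

        ' True = distinct rows; only the listed columns are kept.
        Dim distinctRows As DataTable = _
            tableA.DefaultView.ToTable(True, "FieldA", "FieldB")

        ' The duplicate ("2", "y") pair is collapsed into one row.
        Console.WriteLine(distinctRows.Rows.Count)
    End Sub
End Module
```

Both DataTable.Merge and this DataView.ToTable overload are available in .NET 2.0, so the sketch should fit the constraints in the question.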

This is the solution I came up with.
Combine the 2 DataTables using .Merge (thanks to Matt's answer)
Using this as a base I came up with the following code to get distinct rows from the DataTable based on 2 columns:
Private Shared Function GetDistinctRows(sourceTable As DataTable, ParamArray columnNames As String()) As DataTable
Dim dt As New DataTable
Dim sort = String.Empty
For Each columnName As String In columnNames
dt.Columns.Add(columnName, sourceTable.Columns(columnName).DataType)
If sort.Length > 0 Then
sort = sort & ","
End If
sort = sort & columnName
Next
Dim lastValue As DataRow = Nothing
For Each dr As DataRow In sourceTable.Select("", sort)
Dim add As Boolean = False
If IsNothing(lastValue) Then
add = True
Else
For Each columnName As String In columnNames
If Not (lastValue(columnName).Equals(dr(columnName))) Then
add = True
Exit For
End If
Next
End If
If add Then
lastValue = dr
dt.ImportRow(dr)
End If
Next
Return dt
End Function
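If the DataTableReader route is preferred, a possible alternative (not the approach used above) is to stream both readers through a Dictionary so no sorting is needed. This is only a sketch: AppendDistinct is a hypothetical helper, the two fields are assumed to be the first two columns, and values are compared as strings, so NULL and an empty string are treated as equal.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports Microsoft.VisualBasic

Module ReaderDedupSketch
    ' Appends each (FieldA, FieldB) pair to result the first time it is seen.
    Sub AppendDistinct(ByVal reader As IDataReader, _
                       ByVal seen As Dictionary(Of String, Boolean), _
                       ByVal result As List(Of String()))
        While reader.Read()
            Dim a As String = Convert.ToString(reader.GetValue(0))
            Dim b As String = Convert.ToString(reader.GetValue(1))
            ' Assumption: neither field contains a NUL character,
            ' so it is safe to use as a separator in the composite key.
            Dim key As String = a & ControlChars.NullChar & b
            If Not seen.ContainsKey(key) Then
                seen.Add(key, True)
                result.Add(New String() {a, b})
            End If
        End While
    End Sub

    Sub Main()
        ' Sample tables standing in for the two objects the framework returns.
        Dim t1 As New DataTable()
        t1.Columns.Add("FieldA1", GetType(String))
        t1.Columns.Add("FieldB1", GetType(String))
        t1.Rows.Add("1", "x")
        t1.Rows.Add("2", "y")

        Dim t2 As New DataTable()
        t2.Columns.Add("FieldA2", GetType(String))
        t2.Columns.Add("FieldB2", GetType(String))
        t2.Rows.Add("2", "y")
        t2.Rows.Add("3", "z")

        Dim seen As New Dictionary(Of String, Boolean)()
        Dim result As List(Of String()) = New List(Of String())()
        AppendDistinct(t1.CreateDataReader(), seen, result)
        AppendDistinct(t2.CreateDataReader(), seen, result)

        ' result is an IEnumerable of distinct pairs for later processing.
        Console.WriteLine(result.Count)
    End Sub
End Module
```

A Dictionary is used rather than HashSet because HashSet only arrived in .NET 3.5.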

Related

Copy two dataTables to another table in vb.net

I have two dataTables which I want to union to make one final dataTable.
Both are results of different functions.
I tried this :
dtfinal = dt1.Copy()
dtfinal = dt2.Copy()
But here the data from dt1 is replaced by the data from dt2. What should I use to get the union of both in the final DataTable?
You can use DataTable.Merge:
Dim allTables() As DataTable = {dt1, dt2}
Dim dtfinal = new DataTable("dtfinal")
dtfinal.BeginLoadData() ' Turns off notifications, index maintenance, and constraints while loading data
For Each t As DataTable in allTables
dtfinal.Merge(t) ' same as table.Merge(t, false, MissingSchemaAction.Add)
Next
dtfinal.EndLoadData()
If you don't have primary keys specified, you could end up with repeated rows where you actually want to merge them. In that case, either specify the primary keys or use the method I have provided here (needs conversion from C#):
Combining n DataTables into a Single DataTable
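To illustrate the primary-key point, here is a small sketch with made-up tables (the Id and Name columns and the BuildTable helper are assumptions for the example). When both tables define the same primary key, Merge matches rows on it instead of appending duplicates.

```vbnet
Imports System
Imports System.Data

Module KeyedMergeSketch
    ' Hypothetical helper: builds a table keyed on Id.
    Function BuildTable(ByVal ids As Integer(), ByVal label As String) As DataTable
        Dim table As New DataTable("Items")
        table.Columns.Add("Id", GetType(Integer))
        table.Columns.Add("Name", GetType(String))
        table.PrimaryKey = New DataColumn() {table.Columns("Id")}
        For Each id As Integer In ids
            table.Rows.Add(id, label & id.ToString())
        Next
        Return table
    End Function

    Sub Main()
        Dim dt1 As DataTable = BuildTable(New Integer() {1, 2}, "first")
        Dim dt2 As DataTable = BuildTable(New Integer() {2, 3}, "second")

        ' Copy keeps the schema, including the primary key.
        Dim dtfinal As DataTable = dt1.Copy()

        ' The row with Id = 2 is matched on the key and updated, not duplicated.
        dtfinal.Merge(dt2)

        Console.WriteLine(dtfinal.Rows.Count)
    End Sub
End Module
```

Without the PrimaryKey assignment the same merge would append every row from dt2, leaving two rows with Id = 2.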

VB.NET Multiple Selects at once using SQL Server CE

I have an array list which contains ids for some items. I would like to perform a multiple select at once from a SQL Server CE database and using my array list which contains what items id to be selected, something similar when doing for example multiple update in oracle (ODP.NET) as explained here: Oracle bulk updates using ODP.NET
where you can pass an array as a parameter.
I would like to do the same but for a multiple select instead in case of SQL Server CE. Is it possible?
DRAFT about what I would like to do:
SqlCeCommand = SqlCeConnection.CreateCommand()
SqlCeCommand.CommandText = "SELECT * FROM MyTable WHERE Id=:ids"
SqlCeCommand.CommandType = CommandType.Text
SqlCeCommand.Parameters.Add(":ids", DbType.Int32, ArrayListOfIds, ParameterDirection.Input)
Using reader As System.Data.SqlServerCe.SqlCeDataReader = SqlCeCommand.ExecuteReader()
Using targetDb As Oracle.DataAccess.Client.OracleBulkCopy = New Oracle.DataAccess.Client.OracleBulkCopy(con.ConnectionString)
targetDb.DestinationTableName = "MyTable"
targetDb.BatchSize = 100
targetDb.NotifyAfter = 100
targetDb.BulkCopyOptions = Oracle.DataAccess.Client.OracleBulkCopyOptions.UseInternalTransaction
AddHandler targetDb.OracleRowsCopied, AddressOf OnOracleRowsCopied
targetDb.WriteToServer(reader)
targetDb.Close()
End Using
reader.Close()
End Using
You should try this approach: construct your "IN" clause and add one parameter per id in a For Each loop:
SqlCeCommand = SqlCeConnection.CreateCommand()
SqlCeCommand.CommandType = CommandType.Text
Dim sb As New StringBuilder()
Dim i As Integer = 1
For Each id As Integer In ArrayListOfIds
' IN clause placeholder
sb.Append("@Id" & i.ToString() & ",")
' matching parameter
SqlCeCommand.Parameters.AddWithValue("@Id" & i.ToString(), id)
i += 1
Next
' Drop the trailing comma and build the query (assumes the list contains at least one id)
SqlCeCommand.CommandText = "SELECT * FROM MyTable WHERE Id IN (" & sb.ToString().TrimEnd(","c) & ")"
If you're calling a Stored Procedure, you can do this:
Serialize the array to a string of XML, like this: https://stackoverflow.com/a/6937351/734914
Call the stored procedure, passing in the string parameter
Parse the string of XML into a local table variable containing the ID's, like this: https://stackoverflow.com/a/8046830/734914
Execute whatever queries you need to using the ID's
The links that I referenced might not be the best examples on the web, but the concept of "serialize to XML, pass string parameter, deserialize XML" should work here

How to store selected data in a list in vb.net?

I have filtered data in a myRawData table where the resulting query will be inserted in myImportedData table.
The situation is that I am going to have some formatting in the filtered data before I will insert it into myImportedData.
My question is how to store the filtered data in a list? Because that is the easiest way for me to reiterate over the filtered data.
So far here is my code. It only stores one item in the list.
Public Sub ImportData()
Dim con2 As MySqlConnection = New MySqlConnection("Data Source=server;Database=dataRecord;User ID=root;")
con2.Open()
Dim sql As MySqlCommand = New MySqlCommand("SELECT dataRec FROM myRawData WHERE dataRec LIKE '%20130517%' ", con2)
Dim dataSet As DataSet = New DataSet()
Dim dataAdapter As New MySqlDataAdapter()
dataAdapter.SelectCommand = sql
dataAdapter.Fill(dataSet, "dataRec")
Dim datTable As DataTable = dataSet.Tables("dataRec")
listOfCanteenSwipe.Add(Convert.ToString(sql.ExecuteScalar()))
'ListBox1.Items.Add(listOfCanteenSwipe(0))
End Sub
Example of data in the myRawData table is this:
myRawData Table
--------------------------
' id ' dataRec
--------------------------
' 1 ' K10201305170434010040074A466
' 2 ' K07201305170434010040074UN45
Please help. Thank you.
EDIT:
All I want to achieve is to store my filtered data in a list. I use the list to loop over the filtered data, and I have no problem with that part.
After storing the data in a list, I will segregate the information in the dataRec field so it can be imported into the myImportedData table.
For context, I will split the dataRec field as shown below:
K07 ----> Loc
20130514 ----> date
0455 ----> time
010 ----> temp
18006D9566 ----> id
Try this
dim x as integer
for x = 0 to datTable.rows.count - 1
listOfCanteenSwipe.Add(datTable.rows(x).item("datarec"))
next
After getting the data in Data Table you can convert it to a generic list. Then you can use this list for further operations:
List<MyType> list = dataTable.Rows.OfType<DataRow>()
.Select(dr => dr.Field<MyType>(columnName)).ToList();
Why not change your SQL statement to split the field out for you?
You should use the power of SQL to perform any data manipulation that can be done at the server. It is usually quicker, avoids extra processing in the client, and is what SQL was designed for.
SELECT
SUBSTRING(dataRec,1,3) as [Loc]
,SUBSTRING(dataRec,4,8) as [date]
,SUBSTRING(dataRec,12,4) as [time]
,SUBSTRING(dataRec,16,3) as [temp]
,SUBSTRING(dataRec,19,10) as [id]
FROM myRawData WHERE dataRec LIKE '%20130517%'
then load this directly into your .NET datatable "myImportedData"
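If the split has to happen on the client instead, the same layout can be sketched with String.Substring. This is only an illustration using the first sample record from the question; the variable names are made up, and it assumes every record has at least 28 characters.

```vbnet
Imports System

Module DataRecSplitSketch
    Sub Main()
        Dim dataRec As String = "K10201305170434010040074A466"

        ' String.Substring is zero-based, whereas SQL SUBSTRING is one-based,
        ' so each start index is one less than in the query above.
        Dim location As String = dataRec.Substring(0, 3)
        Dim swipeDate As String = dataRec.Substring(3, 8)
        Dim swipeTime As String = dataRec.Substring(11, 4)
        Dim temp As String = dataRec.Substring(15, 3)
        Dim id As String = dataRec.Substring(18, 10)

        ' Prints: K10 | 20130517 | 0434 | 010 | 040074A466
        Console.WriteLine(String.Join(" | ", _
            New String() {location, swipeDate, swipeTime, temp, id}))
    End Sub
End Module
```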

VB.NET delete empty datarow

For Each dr In ds.Tables(0).Rows
If String.IsNullOrEmpty(dr("BIL")) Then
dr.Delete() ' how to delete this row?
End If
Next
First, I loop through all the data and check which rows have an empty BIL column. If BIL is empty, I want to delete that row from the DataSet. How do I delete this empty DataRow?
Do you want to delete it in your database or do you want to remove it from the DataTable? Btw, use dr.IsNull("BIL") instead. Your code compiles only because you've set OPTION STRICT off because dr("BIL") returns object instead of string.
The DataSet gets its data from Excel, so there is no identity column. By the way, I just want to remove the row from the DataTable, not from the database.
Then you have to use DataRowCollection.Remove instead of DataRow.Delete. With Delete, the row changes its RowState to Deleted. If you then use a DataAdapter to update the DataSet, DataTable, or DataRow, it will be deleted in the database.
But you can also use Linq-To-DataSet to filter the table and use DataRow.Field extension method which is strongly typed and supports nullable types:
Dim notNullBilRows = From row In ds.Tables(0)
Where Not String.IsNullOrEmpty(row.Field(Of String)("BIL"))
Now you can use CopyToDataTable to create a new DataTable with only rows where BIL is not null, which seems to be the actual requirement.
Dim tblNotNullBilRows = notNullBilRows.CopyToDataTable()
Here's the non-Linq approach with Remove, you have to create an intermediate collection since you cannot remove elements from a collection during enumeration:
Dim removeList = New List(Of DataRow)
For Each dr As DataRow In ds.Tables(0).Rows
If String.IsNullOrEmpty(dr.Field(Of String)("BIL")) Then
removeList.Add(dr)
End If
Next
For Each dr As DataRow In removeList
ds.Tables(0).Rows.Remove(dr)
Next
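The difference between Delete and Remove described in this answer can be seen in a small sketch. The table here is made up, and AcceptChanges is called first so the rows are Unchanged, as they would be after being filled from a data source.

```vbnet
Imports System
Imports System.Data

Module DeleteVersusRemoveSketch
    Sub Main()
        Dim table As New DataTable()
        table.Columns.Add("BIL", GetType(String))
        table.Rows.Add("A")
        table.Rows.Add("B")
        table.AcceptChanges() ' rows are now Unchanged

        Dim first As DataRow = table.Rows(0)
        Dim second As DataRow = table.Rows(1)

        ' Delete only marks the row; it stays in the collection
        ' until AcceptChanges (or a DataAdapter update) is called.
        first.Delete()
        Console.WriteLine(first.RowState.ToString())  ' Deleted
        Console.WriteLine(table.Rows.Count)           ' 2

        ' Remove takes the row out of the collection immediately.
        table.Rows.Remove(second)
        Console.WriteLine(second.RowState.ToString()) ' Detached
        Console.WriteLine(table.Rows.Count)           ' 1
    End Sub
End Module
```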
Try this:
For i As Integer = dt.Rows.Count - 1 To 0 Step -1
If String.IsNullOrEmpty(dt.Rows(i)("BIL")) Then
dt.Rows.RemoveAt(i)
End If
Next
You will want to put the indexes of the rows you wish to delete into an array, then iterate through the array, deleting each row from the DataTable by index. You will not need an 'identity' column to do this; the rows are automatically assigned indexes.
Assuming you have 2 columns in table tbl: ColumnA and ColumnB
Dim dv as new DataView
dv = new DataView(tbl)
dv.RowFilter = "ColumnA <> '' AND ColumnB <> ''"
tbl = dv.ToTable()
tbl should no longer have empty rows. Hope this helps.

DataTable.Select.Where VB.Net - Delete rows

I'm currently pulling information using a query that I'm not allowed to tamper with:
Dim dt As DataTable = BLL.GetData(variable).Tables(0)
Immediately afterwards, I'm removing any records where a field begins with a specific value:
For Each dr As DataRow In dt.Rows
If dr.Item(2).ToString().StartsWith("value") Then
dr.Delete()
End If
Next
What I'd really like to do is something like:
dt.Select.Where(field1 => field1.StartsWith("value")).Delete()
I know that is not the syntax of it and I'm probably very off from what it would be like. The For Each works fine, I'm just trying to "simplify" it. Any idea? Any and all help is appreciated.
Actually, your initial code is probably the cleanest and most straightforward.
To delete items using LINQ, you first need to read them into a separate collection, then loop through that collection and call Delete on each record. If you'd rather go that route, you could try:
Dim records = dt.AsEnumerable().Where(Function(r) r(2).ToString().StartsWith("value")).ToList()
For Each r In records
r.Delete()
Next
The answer I think you are looking for is below from Microsoft. https://msdn.microsoft.com/en-us/library/det4aw50(v=vs.110).aspx?cs-save-lang=1&cs-lang=vb#code-snippet-2
Dim table As DataTable = DataSet1.Tables("Orders")
' Presuming the DataTable has a column named Date.
Dim expression As String
expression = "Date > #1/1/00#"
Dim foundRows() As DataRow
' Use the Select method to find all rows matching the filter.
foundRows = table.Select(expression)
Dim i As Integer
' Print column 0 of each returned row.
For i = 0 to foundRows.GetUpperBound(0)
Console.WriteLine(foundRows(i)(0))
Next i
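The Microsoft sample only prints the matching rows. A sketch of how the same Select call could drive the deletion is below; the column name field1 and the sample values are assumptions, since the question only refers to the column by index.

```vbnet
Imports System
Imports System.Data

Module SelectAndDeleteSketch
    Sub Main()
        Dim dt As New DataTable()
        dt.Columns.Add("field1", GetType(String))
        dt.Rows.Add("value-1")
        dt.Rows.Add("other")
        dt.Rows.Add("value-2")
        dt.AcceptChanges()

        ' Select returns an array of the matching rows, so deleting
        ' while looping over it does not disturb dt.Rows.
        For Each row As DataRow In dt.Select("field1 LIKE 'value%'")
            row.Delete()
        Next

        ' AcceptChanges drops the rows that were marked as deleted.
        dt.AcceptChanges()

        ' Only the "other" row is left.
        Console.WriteLine(dt.Rows.Count)
    End Sub
End Module
```

The filter expression uses the DataColumn expression syntax, where a trailing % in LIKE plays the role of StartsWith.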