Search and replace inside string column in DataTable is slow? - vb.net

I am fetching the distinct words in a string column of a DataTable (dt) and then replacing each unique value with another value, so essentially changing words to other words. Both approaches listed below work, but for 90k records the process is not very fast. Is there a way to speed up either approach?
The first approach is as follows:
'fldNo is column number in dt
For Each Word As String In DistinctWordList
Dim myRow() As DataRow
myRow = dt.Select(MyColumnName & "='" & Word & "'")
For Each row In myRow
row(fldNo) = dicNewWords(Word)
Next
Next
A second LINQ-based approach is as follows, and is actually not very fast either:
Dim flds as new List(of String)
flds.Add(myColumnName)
For Each Word As String In DistinctWordsList
Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
ReDim foundrecs(rowData.Count)
Cnt = 0
For Each row As DataRow In rowData
Dim Index As Integer = dt.Rows.IndexOf(row)
foundrecs(Cnt) = Index + 1 'row.RowId
Cnt += 1
Next
For i = 0 To Cnt
dt(foundrecs(i))(fldNo) = dicNewWords(Word)
Next
Next

So you have your dictionary of replacements:
Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"
You can apply them to your table's ReplaceMe column:
Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep
Next r
On my machine it takes 340ms for 1 million replacements. I can cut that down to 260ms by using the column number rather than the name:
If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
Timing:
'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
Dim d As New Dictionary(Of String, String)
For i = 0 To 9
d(i.ToString()) = (i + 10).ToString()
Next
'put a million rows in a datatable, randomly assign dictionary keys as row values
Dim dt As New DataTable
dt.Columns.Add("ReplaceMe")
Dim r As New Random()
Dim k = d.Keys.ToArray()
For i = 1 To 1000000
dt.Rows.Add(k(r.Next(k.Length)))
Next
'what range of values do we have in our dt?
Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
'it's a crappy way to time, but it'll prove the point
Dim start = DateTime.Now
Dim rep As String = Nothing
For Each ro As DataRow In dt.Rows
If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
Next
Dim ennd = DateTime.Now
'what range of values do we have now
Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")
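One caveat with the loop above: Field(Of String) returns Nothing for a DBNull cell, and Dictionary.TryGetValue throws an ArgumentNullException when given a Nothing key. Below is a minimal sketch that guards against that and times the loop with Stopwatch instead of DateTime.Now. It assumes Option Infer On and that the column to replace is at index 0; the module name and the sample data are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Diagnostics

Module ReplaceDemo
    Sub Main()
        ' Hypothetical replacements and sample data, for illustration only
        Dim d As New Dictionary(Of String, String) From {{"foo", "bar"}, {"baz", "buf"}}
        Dim dt As New DataTable()
        dt.Columns.Add("ReplaceMe", GetType(String))
        dt.Rows.Add("foo")
        dt.Rows.Add(DBNull.Value)   ' a null cell that would break an unguarded TryGetValue
        dt.Rows.Add("other")

        Dim sw = Stopwatch.StartNew()
        Dim rep As String = Nothing
        For Each r As DataRow In dt.Rows
            Dim current = r.Field(Of String)(0)
            ' Skip null cells; replace only values that have a dictionary entry
            If current IsNot Nothing AndAlso d.TryGetValue(current, rep) Then r(0) = rep
        Next
        sw.Stop()

        Console.WriteLine($"{sw.ElapsedMilliseconds} ms")
        For Each r As DataRow In dt.Rows
            Console.WriteLine(If(r.IsNull(0), "(null)", r.Field(Of String)(0)))
        Next
    End Sub
End Module
```

If the column can never contain nulls, the extra check is unnecessary and the original one-liner is fine.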

Related

Cannot create a csv file from a datatable and dataview

I have code that produces a csv file (with headers) containing fields that identify the pixel locations of points of interest in selected images. Unfortunately, there are occasional duplicates that cause issues later in my production process. My plan to solve this is to open the csv file in the midst of the application, read the data into a data table, and sort the data table with a data view using 4 sort keys (I am using the default Column1, Column2 etc., as I cannot figure out how to use the real column header names contained in the first row). I have attempted to write the resulting rows to a new file, and I get the correct number of rows in the output csv file, but the data in each row is missing: I get the text string System.Data.DataRowView, not the 12 fields I was expecting.
In the code, my most recent attempt is to create a writeOutPutLine containing the concatenated contents of the row, in my desired sort order.
I have attempted to use CSVHelper with no success, and also tried about 10 other code sets posted on the various web sites.
Here is my code
Sub New_Csv_Sort()
Dim linecount As Integer = 1
Dim dt As DataTable = New DataTable
Dim dgv As DataGridView = New DataGridView
Dim TextLine As String = ""
Dim SplitLine() As String
Dim objReader As New System.IO.StreamReader(TargetFileName, Encoding.ASCII)
Dim ColumnCount As Integer = 0
For ColumnCount = 1 To 13 Step 1
dt.Columns.Add()
Next
Dim firstline As Boolean = True
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
SplitLine = Split(TextLine, ",")
dt.Rows.Add(SplitLine)
If FirstTime = True Then
' Here I was trying to use the first line to set the column labels but no joy
' For ColumnCount = 1 To 12 Step 1
' dt.datacolumn.
' Next
End If
Loop
' This next is commentted out in the sort statement because I could not get the column names to be recognised
'.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc, ASC"
Dim dv As New DataView(dt) With {
.Sort = "Column5 ASC, Column1 ASC, Column12 ASC, Column11 ASC"
}
FileOpen(SortedTargetFileNumber, SortedTargetFileName, OpenMode.Output)
'I have looked at the data and it is read in correctly,
'but my output line is trash!
For Each Rowview As DataRowView In dv
'WriteOutPutLine = dv.row
For Each DataColumn In dt.Columns
WriteOutPutLine = WriteOutPutLine & Row[col].tostring() & ","
Next
WriteOutPutLine = Rowview.ToString
PrintLine(SortedTargetFileNumber, WriteOutPutLine)
linecount = linecount + 1
Next
FileClose(SortedTargetFileNumber)
MsgBox("Sorted Lines = " & linecount)
End Sub
Sub New_Csv_Sort()
Dim dt As DataTable = New DataTable
Dim TextLine As String = ""
Dim SplitLine() As String
Dim objReader As New System.IO.StreamReader(TargetFileName, Encoding.ASCII)
Dim ColumnCount As Integer = 0
'Create columns for the data -
For ColumnCount = 1 To 13 Step 1
dt.Columns.Add()
Next
'Load the datatable
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
SplitLine = Split(TextLine, ",")
dt.Rows.Add(SplitLine)
Loop
'This is the sort order based on the header titles, translated to the default column names
'.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc ASC"
Dim dv As New DataView(dt) With {
.Sort = "Column5 ASC, Column1 ASC, Column12 ASC, Column11 ASC"
}
'Open the output file
FileOpen(SortedTargetFileNumber, SortedTargetFileName, OpenMode.Output)
'Create a string of all the rows in the dataview separated by commas
Dim DataLine As String = ""
For Each r As DataRowView In dv
DataLine &= String.Join(",", r.Row.ItemArray) & Environment.NewLine
Next
' Write the data to a file
WriteOutPutLine = DataLine
PrintLine(SortedTargetFileNumber, WriteOutPutLine)
FileClose(SortedTargetFileNumber)
' and we are done
End Sub
Try this function:
Private Function ConvertToCSV(ByVal dt As DataTable) As String
Dim sb As New Text.StringBuilder()
For Each row As DataRow In dt.Rows
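'Escape existing backslashes first, otherwise the backslashes added by the later replacements would be doubled as well
sb.AppendLine(String.Join(",", (From i As Object In row.ItemArray Select i.ToString().Replace("\", "\\").Replace("""", """""").Replace(",", "\,").Replace(Environment.NewLine, "\" & Environment.NewLine)).ToArray()))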
Next
Return sb.ToString()
End Function
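The backslash escaping above is one possible convention, but many CSV readers expect the quoting style described in RFC 4180 instead: wrap a field in double quotes when it contains a comma, a quote or a line break, and double any embedded quotes. Here is a hedged sketch of that variant. QuoteField, ToCsv and the sample table are names invented for this example, and the code assumes Option Infer On.

```vbnet
Imports System
Imports System.Data
Imports System.Linq
Imports System.Text

Module CsvExportDemo
    ' Quote a field only when it contains a comma, a double quote, or a line break
    Function QuoteField(value As Object) As String
        Dim s As String = Convert.ToString(value)
        If s.IndexOfAny({","c, """"c, ControlChars.Cr, ControlChars.Lf}) >= 0 Then
            Return """" & s.Replace("""", """""") & """"
        End If
        Return s
    End Function

    Function ToCsv(dt As DataTable) As String
        Dim sb As New StringBuilder()
        ' Header row built from the column names
        sb.AppendLine(String.Join(",", dt.Columns.Cast(Of DataColumn)().Select(Function(c) QuoteField(c.ColumnName))))
        For Each row As DataRow In dt.Rows
            sb.AppendLine(String.Join(",", row.ItemArray.Select(Function(v) QuoteField(v))))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Hypothetical sample data
        Dim dt As New DataTable()
        dt.Columns.Add("Picture")
        dt.Columns.Add("Note")
        dt.Rows.Add("img1.png", "plain")
        dt.Rows.Add("img2.png", "has, comma and ""quotes""")
        Console.Write(ToCsv(dt))
    End Sub
End Module
```

To write a sorted DataView rather than the whole table, the same QuoteField helper could be applied to r.Row.ItemArray for each DataRowView.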

Split output result by n and loop

I'm working on a task where I need to fill in a PDF form.
The form has 8 sections that I need to fill with data, and it looks as shown in the picture (only 4 sections shown).
The point is that my query returns anywhere from 0 to n results, so I need to split them into groups of 8 and put each group on a different page.
I tried the code below, but it does not work because I load a new document for every row. Does anyone have advice on how to do this?
Try
Dim sCommand As OleDb.OleDbCommand
sCommand = New OleDb.OleDbCommand("SELECT a,b,c Query to fetch n results ", _dbCon)
sCommand.CommandTimeout = 0
Dim _dbREADER As OleDb.OleDbDataReader
_dbREADER = sCommand.ExecuteReader
Dim dt As DataTable = New DataTable()
dt.Load(_dbREADER)
Dim totalPages As Integer = dt.Rows.Count / 8
Dim currentPage As Integer = 1
Dim rowCounter As Long = 0
If dt.Rows.Count > 0 Then
For Each row In dt.Rows
rowCounter += 1
If rowCounter = 8 Then
currentPage += 1
rowCounter = 0
End If
_pdfDocumentOutput = System.IO.Path.GetTempPath() & "MailingForm_" & currentPage & ".pdf"
SaveFromResources(_pdfDocument, My.Resources.template)
Using reader As New PdfReader(_pdfDocument)
Using stamper As New PdfStamper(reader, New IO.FileStream(_pdfDocumentOutput, IO.FileMode.Create))
Dim fontName As String = System.IO.Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Fonts), "SCRIPTIN.ttf")
Dim bf As BaseFont = BaseFont.CreateFont(fontName, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED)
Dim pdfForm As AcroFields = stamper.AcroFields
pdfForm.AddSubstitutionFont(bf)
pdfForm.SetField(rowCounter - 1 & "0", row("Customer")) 'Checks the top radiobutton of the VrPr4 field
pdfForm.SetField(rowCounter - 1 & "1", row("Address"))
pdfForm.SetField(rowCounter - 1 & "2", row("Location"))
stamper.FormFlattening = True
End Using
End Using
Next
End If
Status.Text = "Store info loaded ! "
Catch ex As Exception
Status.Text = ex.Message
End Try
I found a solution by splitting the table into batches:
Private Shared Function SplitTable(ByVal originalTable As DataTable, ByVal batchSize As Integer) As List(Of DataTable)
Dim tables As List(Of DataTable) = New List(Of DataTable)()
Dim j As Integer = 1
'Clone copies the schema only, so each new table starts empty
Dim newDt As DataTable = originalTable.Clone()
newDt.TableName = "Table_" & j
For Each row As DataRow In originalTable.Rows
Dim newRow As DataRow = newDt.NewRow()
newRow.ItemArray = row.ItemArray
newDt.Rows.Add(newRow)
'Batch is full: store it and start a new one
If newDt.Rows.Count = batchSize Then
tables.Add(newDt)
j += 1
newDt = originalTable.Clone()
newDt.TableName = "Table_" & j
End If
Next
'Store the final, partially filled batch; this avoids adding an empty table when the row count is an exact multiple of batchSize
If newDt.Rows.Count > 0 Then
tables.Add(newDt)
End If
Return tables
End Function
After that, loop through each of the tables and put all the results into one file:
Dim tables = SplitTable(dt, 8)
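An alternative sketch that avoids copying rows into separate tables: derive the page and the slot on that page directly from the row index, using integer division and Mod. The table, the Customer values and the module name below are made-up sample data, and the PDF stamping itself is left out. The code assumes Option Infer On.

```vbnet
Imports System
Imports System.Data

Module BatchDemo
    Sub Main()
        ' Hypothetical sample data: 19 rows, so two full pages and one partial page
        Dim dt As New DataTable()
        dt.Columns.Add("Customer")
        For n = 1 To 19
            dt.Rows.Add("Customer " & n)
        Next

        Const perPage As Integer = 8
        For i = 0 To dt.Rows.Count - 1
            Dim page As Integer = i \ perPage + 1   ' 1-based page number
            Dim slot As Integer = i Mod perPage     ' 0..7, position within the page
            Console.WriteLine($"page {page}, slot {slot}: {dt.Rows(i)("Customer")}")
        Next
    End Sub
End Module
```

With this arithmetic, a new output page would only need to be started when slot is 0, and the field name for each value could be built from slot in the same way the question builds it from rowCounter.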

VB: Count number of columns in csv

So, quite simple.
I am importing CSVs into a datagrid, but the number of columns in the csv varies.
For 3 Columns, I use this code:
Dim sr As New IO.StreamReader("E:\test.txt")
Dim dt As New DataTable
Dim newline() As String = sr.ReadLine.Split(";"c)
dt.Columns.AddRange({New DataColumn(newline(0)), _
New DataColumn(newline(1)), _
New DataColumn(newline(2))})
While (Not sr.EndOfStream)
newline = sr.ReadLine.Split(";"c)
Dim newrow As DataRow = dt.NewRow
newrow.ItemArray = {newline(0), newline(1), newline(2)}
dt.Rows.Add(newrow)
End While
DG1.DataSource = dt
This works perfectly. But how do I count the number of elements in "newline"?
Can I get a count of them somehow? The other example code I have found does not produce column headers.
If my csv file has 5 columns, I would need an AddRange of 5 instead of 3, and so on.
Thanks in advance
Dim sr As New IO.StreamReader(path)
Dim dt As New DataTable
Dim newline() As String = sr.ReadLine.Split(","c)
'newline.Length is the number of columns in the header row
For i As Integer = 0 To newline.Length - 1
dt.Columns.Add(newline(i))
Next
While (Not sr.EndOfStream)
newline = sr.ReadLine.Split(","c)
Dim newrow As DataRow = dt.NewRow
'assign every field of the line instead of hard-coding the first two
newrow.ItemArray = newline
dt.Rows.Add(newrow)
End While
dgv.DataSource = dt
Columns and item values can be added to a DataTable individually, using dt.Columns.Add and newrow.Item, so that these can be done in a loop instead of hard-coding for a specific number of columns. e.g. (this code assumes Option Infer On, so adjust as needed):
Public Function CsvToDataTable(csvName As String, Optional delimiter As Char = ","c) As DataTable
Dim dt = New DataTable()
For Each line In File.ReadLines(csvName)
If dt.Columns.Count = 0 Then
For Each part In line.Split({delimiter})
dt.Columns.Add(New DataColumn(part))
Next
Else
Dim row = dt.NewRow()
Dim parts = line.Split({delimiter})
For i = 0 To parts.Length - 1
row(i) = parts(i)
Next
dt.Rows.Add(row)
End If
Next
Return dt
End Function
You could then use it like:
Dim dt = CsvToDataTable("E:\test.txt", ";"c)
DG1.DataSource = dt

How can I get String values rather than integer

How To get StartString And EndString
Dim startNumber As Integer
Dim endNumber As Integer
Dim i As Integer
startNumber = 1
endNumber = 4
For i = startNumber To endNumber
MsgBox(i)
Next i
Output: 1,2,3,4
I want to do the same with strings, for example startString AAA and endString AAD,
so that the output is AAA, AAB, AAC, AAD
This is a simple function that should be easy to understand and use. Every time you call it, it just increments the string by one value. Just be careful to check the values in the text boxes or you can have an endless loop on your hands.
Function AddOneChar(Str As String) As String
AddOneChar = ""
Str = StrReverse(Str)
Dim CharSet As String = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Dim Done As Boolean = False
For Each Ltr In Str
If Not Done Then
If InStr(CharSet, Ltr) = CharSet.Length Then
Ltr = CharSet(0)
Else
Ltr = CharSet(InStr(CharSet, Ltr))
Done = True
End If
End If
AddOneChar = Ltr & AddOneChar
Next
If Not Done Then
AddOneChar = CharSet(0) & AddOneChar
End If
End Function
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim S = TextBox1.Text
Do Until S = TextBox2.Text
S = AddOneChar(S)
MsgBox(S)
Loop
End Sub
This generates all the codes between two values for an arbitrary alphabet:
Public Function Generate(starting As String, ending As String, alphabet As String) As IEnumerable(Of String)
Dim increment As Func(Of String, String) = _
Function(x)
Dim f As Func(Of IEnumerable(Of Char), IEnumerable(Of Char)) = Nothing
f = _
Function(cs)
If cs.Any() Then
Dim first = cs.First()
Dim rest = cs.Skip(1)
If first = alphabet.Last() Then
rest = f(rest)
first = alphabet(0)
Else
first = alphabet(alphabet.IndexOf(first) + 1)
End If
Return Enumerable.Repeat(first, 1).Concat(rest)
Else
Return Enumerable.Empty(Of Char)()
End If
End Function
Return New String(f(x.ToCharArray().Reverse()).Reverse().ToArray())
End Function
Dim results = New List(Of String)
Dim text = starting
While True
results.Add(text)
If text = ending Then
Exit While
End If
text = increment(text)
End While
Return results
End Function
I used it like this to produce the required result:
Dim alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
Dim results = Generate("S30AB", "S30B1", alphabet)
This gave me 63 values:
S30AB
S30AC
...
S30BY
S30BZ
S30B0
S30B1
It should now be very easy to modify the alphabet as needed and to use the results.
One option would be to put those String values into an array and then use i as an index into that array to get one element each iteration. If you do that though, keep in mind that array indexes start at 0.
You can also use a For Each loop to access each element of the array without the need for an index.
If the first two characters of your output are always AA,
you can use a Select Case or If/Else statement
to map 1 to A, 2 to B, and so on,
then concatenate the fixed two-character prefix with the letter that results from the mapping.
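A minimal sketch of that idea, replacing the Select Case with character arithmetic. CodeFor and the module name are invented for this example; it assumes Option Infer On and is only meaningful for n from 1 to 26.

```vbnet
Imports System

Module PrefixDemo
    ' Maps 1 -> A, 2 -> B, ... and appends the letter to a fixed prefix
    Function CodeFor(prefix As String, n As Integer) As String
        Return prefix & ChrW(AscW("A"c) + n - 1)
    End Function

    Sub Main()
        For i = 1 To 4
            Console.WriteLine(CodeFor("AA", i))   ' AAA, AAB, AAC, AAD
        Next
    End Sub
End Module
```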
As I understand it, you are looking for a series covering the range between the values in 2 text boxes. Here is code that takes the start and end values and gives the output as required.
Dim startingStr As String = Mid(TextBox1.Text, TextBox1.Text.Length, 1)
Dim endStr As String = Mid(TextBox2.Text, TextBox2.Text.Length, 1)
Dim outputstr As String = String.Empty
Dim startNumber As Integer
Dim endNumber As Integer
startNumber = Asc(startingStr)
endNumber = Asc(endStr)
Dim TempStr As String = Mid(TextBox1.Text, 1, TextBox1.Text.Length - 1)
Dim i As Integer
For i = startNumber To endNumber
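'add the separator only between items, so the output does not start with ", "
If outputstr <> String.Empty Then outputstr &= ", "
outputstr &= TempStr & Chr(i)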
Next i
MsgBox(outputstr)
The first two lines take the last character of the string in each text box,
so in your case they get A and D respectively.
outputstr accumulates the series that is built in the loop.
startNumber and endNumber hold the ASCII values of the characters we fetched.
TempStr stores the part of the string in front of the last character, so for AAA to AAD TempStr holds AA.
The simple loop then appends each character in the range to TempStr, and the result is shown.
In your case, to achieve your goal you could do something like this:
Dim S() As String = {"AAA", "AAB", "AAC", "AAD"}
For Each el In S
MsgBox(el.ToString)
Next
FIX FOR PREVIOUS ISSUE
Dim s1 As String = "AAA"
Dim s2 As String = "AAZ"
Dim StartBase As String = s1.Substring(0, 2)
Dim result As String = String.Empty
For I As Integer = Asc(s1.Last) To Asc(s2.Last)
result &= StartBase & Chr(I) & vbCrLf
Next
'show the whole series once, after the loop has finished
MsgBox(result)
**UPDATE CODE VERSION**
Dim BARCODEBASE As String = "SBA0021"
Dim BarCode1 As String = "SBA0021AA1"
Dim BarCode2 As String = "SBA0021CD9"
'return AA1
Dim FirstBarCodeSuffix As String = Replace(BarCode1, BARCODEBASE, "")
'return CD9
Dim SecondBarCodeSuffix As String = Replace(BarCode2, BARCODEBASE, "")
Dim InternalSecondBarCodeSuffix = SecondBarCodeSuffix.Substring(1, 1)
Dim IsTaskCompleted As Boolean = False
For First As Integer = Asc(FirstBarCodeSuffix.First) To Asc(SecondBarCodeSuffix)
If IsTaskCompleted = True Then Exit For
For Second As Integer = Asc(FirstBarCodeSuffix.First) To Asc(InternalSecondBarCodeSuffix)
For Third As Integer = 1 To 9
Dim tmp = Chr(First) & Chr(Second) & Third
Console.WriteLine(BARCODEBASE & tmp)
If tmp = SecondBarCodeSuffix Then
IsTaskCompleted = True
End If
Next
Next
Next
Console.WriteLine("Completed")
Console.Read()
Take a look at this, try it, and let me know if it helps.

Stored list of array using For Each loop

I want to store each "zone_check_value" in an array of strings, and while inserting I want to check whether the next value is already in the array, that is, whether it is a repeat or duplicate.
Example.
1st Example
1st loop = zone_check_value = ZD1/01/2014
2nd loop = zone_check_value = ZD1/01/2014
2nd Example
1st loop = zone_check_value = ZD1/01/2014
2nd loop = zone_check_value = ZD2/02/2014
3rd loop = zone_check_value = ZD1/01/2014
Code:
For Each dt As DataTable In xls.Tables
Dim array_of_string as String() 'i want to put the value in here
For Each dr As DataRow In dt.Rows
Dim zone_destination As String = dr(2).ToString
Dim affected_date As String = dr(7).ToString
Dim zone_check_value = zone_destination + affected_date
''''''How can i store zone_check_value in a string array?
Next
Next
EDIT
What if I add a Select Case in my loop? Then array_of_string is null by the time I reach the second sheet. I need to check the current value against the values already in array_of_string.
Example Code
For Each dt As DataTable In xls.Tables
Dim array_of_string as String() 'i want to put the value in here
select Case dt.tablename
case "Sheet1"
For Each dr As DataRow In dt.Rows
Dim zone_destination As String = dr(2).ToString
Dim affected_date As String = dr(7).ToString
Dim zone_check_value = zone_destination + affected_date
''''''How can i store zone_check_value in a string array?
Next
case "Sheet 2"
For Each dr As DataRow In dt.Rows
Dim check_value as Boolean = array_of_string.Contains(dr(0).ToString)
'but when i got in sheet 2 the array_of_string is null
Next
Next
The easiest way is to use a List(Of String).
For Each dt As DataTable In xls.Tables
Dim array_of_string as List(Of String) = New List(Of String) 'i want to put the value in here
For Each dr As DataRow In dt.Rows
Dim zone_destination As String = dr(2).ToString
Dim affected_date As String = dr(7).ToString
Dim zone_check_value = zone_destination & affected_date
''''''How can i store zone_check_value in a string array?
array_of_string.Add(zone_check_value)
Next
''' Now if you really need it in array form you can cast it via:
''' Dim values() As String = array_of_string.ToArray()
Next
You might even consider:
For Each dt As DataTable In xls.Tables
'DataRowCollection has no ForEach method, so project the rows with LINQ instead
Dim values As List(Of String) = dt.Rows.Cast(Of DataRow)().Select(Function(item) item(2).ToString() & item(7).ToString()).ToList()
''' Now do something with values.
Next
Either way, make sure you always use the string concatenation operator & to concatenate strings. The arithmetic addition operator + will cause you problems from time to time if you use it on strings.
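For the duplicate check itself, and for the EDIT in the question, a HashSet(Of String) may be a better fit than a list or array. Its Add method returns False when the value is already present, and declaring it before the outer For Each over the tables keeps its contents when the loop moves on to the next sheet. A minimal sketch, using hard-coded sample values in place of the question's DataRow fields and assuming Option Infer On:

```vbnet
Imports System
Imports System.Collections.Generic

Module DuplicateCheckDemo
    Sub Main()
        ' Hypothetical values standing in for zone_destination & affected_date
        Dim incoming = {"ZD1/01/2014", "ZD2/02/2014", "ZD1/01/2014"}

        ' Declared once, outside any loop over tables, so it is not reset per sheet
        Dim seen As New HashSet(Of String)()

        For Each zone_check_value In incoming
            If seen.Add(zone_check_value) Then
                Console.WriteLine("new: " & zone_check_value)
            Else
                Console.WriteLine("duplicate: " & zone_check_value)
            End If
        Next
    End Sub
End Module
```

In the question's code, the declaration sits inside the loop over xls.Tables, so a fresh, unassigned variable is created for every sheet; that is why it is null when Sheet 2 is processed.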