How do I improve performance creating a DataTable in VB.NET? - vb.net

I have the following code and it's slow to load the first time. The CSV file is about 4 MB with 16,000 rows.
If Session("tb") Is Nothing Then
    Dim str As String()
    If IsNothing(Cache("csvdata")) Then
        str = File.ReadAllLines(Server.MapPath("~/test/feed.csv"))
        Cache.Insert("csvdata", str, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero)
    Else
        str = CType(Cache("csvdata"), String())
    End If
    Dim dt As New DataTable
    dt.Columns.Add("Shape", GetType(System.String))
    dt.Columns.Add("Weight", GetType(System.Double))
    dt.Columns.Add("Color", GetType(System.String))
    dt.Columns.Add("Clarity", GetType(System.String))
    dt.Columns.Add("Price", GetType(System.Int32))
    dt.Columns.Add("CutGrade", GetType(System.String))
    For i As Integer = 1 To str.Length - 1
        Dim pattern As String = ",(?=([^""]*""[^""]*"")*[^""]*$)"
        Dim rgx As New Regex(pattern)
        Dim t As String = rgx.Replace(str(i), "\")
        Dim s As String() = t.Split("\"c)
        Dim pr As Int32 = CType(s(5), Int32)
        Dim fpr As Int32
        Dim rate As Double
        Select Case pr
            Case Is < 300
                rate = 2
            Case 301 To 600
                rate = 1.7
            Case Is > 600
                rate = 1.16
        End Select
        fpr = CInt(Math.Round(pr * rate))
        Dim a As Object() = {s(1), s(2), s(3), s(4), fpr, s(40)}
        dt.Rows.Add(a)
    Next
    Session("tb") = dt
    ListView1.DataSource = dt
    ListView1.DataBind()
Else
    Dim x As DataTable = CType(Session("tb"), DataTable)
    ListView1.DataSource = x
    ListView1.DataBind()
End If
The CSV file is cached, and I assume the cache can be shared by everyone (one person loads it once every 12 hours).
Once the Session is created, the page loads fast as well, so creating the DataTable seems to be the slow step.
This is my first time dealing with a DataTable and I'm sure someone can point out what I'm doing wrong.
Thank you
UPDATE:
I have changed the Cache to store the finished DataTable instead of the CSV lines.
It loads fast now, but I would like to know whether this is a bad idea or not.
Cache.Insert("csvdata", dt, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero)
Once it's stored in Cache, I can run queries against it using LINQ.
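In case it helps, querying the cached table could look something like this (a minimal sketch; AsEnumerable needs a reference to System.Data.DataSetExtensions, and the column names are the ones defined above):
Dim cached As DataTable = CType(Cache("csvdata"), DataTable)

' Filter with LINQ and bind the result back to the ListView.
Dim rounds = From row In cached.AsEnumerable()
             Where row.Field(Of String)("Shape") = "Round" AndAlso
                   row.Field(Of Integer)("Price") < 5000
             Select row

ListView1.DataSource = rounds.CopyToDataTable()
ListView1.DataBind()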
SAMPLE CSV (first 3 rows):
Supplier ID,Shape,Weight,Color,Clarity,Price / Carat,Lot Number,Stock Number,Lab,Cert #,Certificate Image,2nd Image,Dimension,Depth %,Table %,Crown Angle,Crown %,Pavilion Angle,Pavilion %,Girdle Thinnest,Girdle Thickest,Girdle %,Culet Size,Culet Condition,Polish,Symmetry,Fluor Color,Fluor Intensity,Enhancements,Remarks,Availability,Is Active,FC-Main Body,FC- Intensity,FC- Overtone,Matched Pair,Separable,Matching Stock #,Pavilion,Syndication,Cut Grade,External Url
9349,Round,1.74,F,VVS1,13650.00,,IM-95-188-243,ABC,11228,,,7.81|7.85|4.62,59.00,62.00,34.00,13.00,,,Medium,,0,None,,Excellent,Very Good,Blue,Medium,,"",Not Specified,Y,,,,False,True,,,,Very Good,http://www.test/teste.
9949,Round,1.00,I,VVS1,6059.00,,IM-95-189-C021,ABC,212197,,,6.37|6.42|3.96,61.90,54.00,34.50,16.00,,,Thin,Slightly Thick,0,None,,Excellent,Good,,None,,"Additional pinpoints are not shown.",Guaranteed Available,Y,,,,False,True,,,,Very Good,http://www.test/test.

Look into using a TextFieldParser to read the CSV instead of splitting the strings yourself.
Also, if you use a List(Of CustomClass), where CustomClass has the Shape, Weight, Color, etc. properties, you can avoid the unnecessary overhead of the DataTable, and you can still run your LINQ queries against the List.
Pardon my C#, I do not have VB.NET installed on this box.
using System;
using System.Collections.Generic;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here; reference Microsoft.VisualBasic.dll

public class Gemstone
{
    public string Shape { get; set; }
    public double Weight { get; set; }
    public string Color { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        TextFieldParser textFieldParser = new TextFieldParser("data.txt");
        textFieldParser.Delimiters = new string[] { "," };
        textFieldParser.ReadLine(); // skip the header line
        // allocate the list with your best calculated guess of its final size
        List<Gemstone> list = new List<Gemstone>(16000);
        while (!textFieldParser.EndOfData)
        {
            string[] fields = textFieldParser.ReadFields();
            Gemstone gemstone = new Gemstone();
            gemstone.Shape = fields[1];
            gemstone.Weight = double.Parse(fields[2]);
            gemstone.Color = fields[3];
            list.Add(gemstone);
        }
    }
}
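Since the question is tagged vb.net, here is a rough VB.NET sketch of the same idea (untested; the Gemstone class and field indexes are carried over from the C# version above):
Imports Microsoft.VisualBasic.FileIO

Public Class Gemstone
    Public Property Shape As String
    Public Property Weight As Double
    Public Property Color As String
End Class

' Parse the CSV into a strongly typed list instead of a DataTable.
Dim list As New List(Of Gemstone)(16000) ' best guess of the final size
Using parser As New TextFieldParser("data.txt")
    parser.Delimiters = New String() {","}
    parser.ReadLine() ' skip the header line
    While Not parser.EndOfData
        Dim fields As String() = parser.ReadFields()
        list.Add(New Gemstone With {
            .Shape = fields(1),
            .Weight = Double.Parse(fields(2)),
            .Color = fields(3)})
    End While
End Using

' LINQ works directly against the list, no DataTable needed.
Dim rounds = From g In list
             Where g.Shape = "Round"
             Order By g.Weight Descending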

FYI, I just found this whole TextFieldParser thing, and I do a LOT of text file parsing, so I tested it...
On an 11 MB file with about 5,200 rows and 300 columns:
TextFieldParser ran at about 25% of the speed of the string-splitting approach I had been using when loading into a DataTable, and at about 15% of the speed with the DataTable code removed:
Dim DataTable As New DataTable()
Dim StartTime As Long = Now.Ticks
Dim Reader As New FileIO.TextFieldParser("file.txt")
Reader.TextFieldType = FileIO.FieldType.Delimited
Reader.SetDelimiters(vbTab)
Reader.HasFieldsEnclosedInQuotes = False
Dim Header As Boolean = True
While Not Reader.EndOfData
    Dim Fields() As String = Reader.ReadFields
    If Header Then
        For I As Integer = 1 To 320
            DataTable.Columns.Add("Col" & I)
        Next
        Header = False
    Else
        If Mid(Fields(0), 1, 1) <> "#" Then DataTable.Rows.Add(Fields)
    End If
End While
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

Dim DataTable2 As New DataTable()
StartTime = Now.Ticks
For I As Integer = 1 To 320
    DataTable2.Columns.Add("Col" & I)
Next
For Each line As String In System.IO.File.ReadAllLines("file.txt")
    Dim NVP() As String = Split(line, vbTab)
    If Mid(line, 1, 1) <> "#" Then DataTable2.Rows.Add(NVP)
Next
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")
With the DataTable code removed:
Dim StartTime As Long = Now.Ticks
Dim Reader As New FileIO.TextFieldParser("file.txt")
Reader.TextFieldType = FileIO.FieldType.Delimited
Reader.SetDelimiters(vbTab)
Reader.HasFieldsEnclosedInQuotes = False
Dim Header As Boolean = True
While Not Reader.EndOfData
    Dim Fields() As String = Reader.ReadFields
End While
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

StartTime = Now.Ticks
For Each line As String In System.IO.File.ReadAllLines("file.txt")
    Dim NVP() As String = Split(line, vbTab)
Next
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")
Kinda surprising to me, but I guess the DataTable has more functionality. Another new thing I discovered that I'll never use :(
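One thing that can narrow the gap when you do need a DataTable (a suggestion on my part, not something benchmarked above) is to suspend change notifications and index maintenance during the bulk load with BeginLoadData/EndLoadData. A sketch, reusing the Reader from the benchmark above:
Dim dt As New DataTable()
For I As Integer = 1 To 320
    dt.Columns.Add("Col" & I)
Next

dt.BeginLoadData() ' turn off notifications, index maintenance and constraints while loading
While Not Reader.EndOfData
    Dim Fields() As String = Reader.ReadFields()
    If Mid(Fields(0), 1, 1) <> "#" Then dt.Rows.Add(Fields)
End While
dt.EndLoadData()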

Related

how to read a specific csv line vb.net

With your permission:
I created a bot to input data to the web using vb.net and Selenium. It retrieves its data from a CSV file.
How can I retrieve only the rows I need from the CSV? For example, if there are 100 rows, take only rows 30-50, ideally without looping over every row.
Dim textFieldParser As TextFieldParser = New TextFieldParser(TextBox1.Text) With
{
    .TextFieldType = FieldType.Delimited,
    .Delimiters = New String() {","}
}
drv = New ChromeDriver(options)
While Not textFieldParser.EndOfData
    Try
        Dim strArrays As String() = textFieldParser.ReadFields()
        Dim name As String = strArrays(0)
        Dim alamat As String = strArrays(1)
        Dim notlp As String = strArrays(2)
        drv.Navigate().GoToUrl("URL")
        Dim Nm = drv.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"))
        Nm.SendKeys(name)
        Threading.Thread.Sleep(3000)
    Catch ex As Exception
        MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
    End Try
End While
Thank you
Here's an example of using TextFieldParser to read one specific line and a specific range of lines. Note that I am using zero-based indexes for the lines. You can adjust as required if you want to use 1-based line numbers.
Public Function GetLine(filePath As String, index As Integer) As String()
    Using parser As New TextFieldParser(filePath) With {.Delimiters = {","}}
        Dim linesDiscarded = 0
        Do Until linesDiscarded = index
            parser.ReadLine()
            linesDiscarded += 1
        Loop
        Return parser.ReadFields()
    End Using
End Function

Public Function GetLines(filePath As String, startIndex As Integer, count As Integer) As List(Of String())
    Using parser As New TextFieldParser(filePath) With {.Delimiters = {","}}
        Dim linesDiscarded = 0
        Do Until linesDiscarded = startIndex
            parser.ReadLine()
            linesDiscarded += 1
        Loop
        Dim lines As New List(Of String())
        Do Until lines.Count = count
            lines.Add(parser.ReadFields())
        Loop
        Return lines
    End Using
End Function
Simple loops to skip and to take lines.
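For example, to take rows 30-49 (zero-based) for the bot, a call site could look like this (the file path here is hypothetical):
' Read a single row, e.g. the 30th line of the CSV.
Dim fields As String() = GetLine("C:\data\input.csv", 30)

' Read 20 rows starting at row 30 (rows 30..49).
For Each row As String() In GetLines("C:\data\input.csv", 30, 20)
    Console.WriteLine(String.Join(" | ", row))
Next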

Need to Batch a Large DataTable and write each batch to a Text file - VB.Net

I have a requirement where I need to query a DB and fetch the records into a DataTable. The DataTable has 20,000 records.
I need to batch these records into batches of 100 records each and write each batch to an individual text file.
So far I have been able to batch the records into batches of 100 each using IEnumerable(Of DataRow).
I am now facing an issue writing the IEnumerable(Of DataRow) to a text file.
My code is as below:
Dim strsql = "Select * from myTable;"
Dim dt As New DataTable
Using cnn As New SqlConnection(connectionString)
    cnn.Open()
    Using dad As New SqlDataAdapter(strsql, cnn)
        dad.Fill(dt)
    End Using
    cnn.Close()
End Using
Dim chunks = getChunks(dt, 100)
For Each chunk As IEnumerable(Of DataRow) In chunks
    Dim path As String = "myFilePath"
    If Not File.Exists(path) Then
        ' Here I will write my batch into the file.
    End If
Next
Public Iterator Function getChunks(ByVal Tab As DataTable, ByVal size As Integer) As IEnumerable(Of IEnumerable(Of DataRow))
    Dim chunk As List(Of DataRow) = New List(Of DataRow)(size)
    For Each row As DataRow In Tab.Rows
        chunk.Add(row)
        If chunk.Count = size Then
            Yield chunk
            chunk = New List(Of DataRow)(size)
        End If
    Next
    If chunk.Any() Then Yield chunk
End Function
Need your help to write the IEnumerable of DataRows into a text file for each batch of records.
Thanks :)
Your existing code is needlessly complex. If this is all you're doing, then using a DataTable is unnecessary/unwise; this is one of the few occasions where I would advocate using a lower-level DataReader to keep the memory impact low.
Writing a DB table to a file, quick, easy and low on memory consumption:
Dim dr = sqlCommand.ExecuteReader()
Dim sb As New StringBuilder
Dim lineNum = -1
Dim batchSize = 100
While dr.Read()
    'turn the row into a string for our file
    For x = 0 To dr.FieldCount - 1
        sb.Append(dr.GetString(x)).Append(",")
    Next x
    sb.Length -= 1 'remove trailing comma
    sb.AppendLine()
    'keep track of lines written so we can batch accordingly
    lineNum += 1
    Dim fileNum = lineNum \ batchSize
    File.AppendAllText($"c:\temp\file{fileNum}.csv", sb.ToString())
    'clear the stringbuilder
    sb.Length = 0
End While
If you really want to use a DataTable, there is nothing stopping you swapping this While dr.Read() loop for a For Each r As DataRow In myDataTable.Rows.
Please note, this isn't an exercise in creating a fully escaped CSV, nor in formatting the data. It demonstrates the concept of taking a firehose of data and writing it to N different files in a single pass, by using the fact that integer-dividing every line number from 0 to 99 by the batch size gives 0 (so those lines go in file 0), every number from 100 to 199 gives 1 (so those lines go in file 1), and so on.
You could instead build the file lines up in the StringBuilder and write them out once per batch, when lineNum Mod batchSize = batchSize - 1, if you feel that would be more efficient than calling File.AppendAllText for every row (which opens and closes the file each time); see the sketch below.
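That buffered variant might look like this (a sketch; I've used GetValue instead of GetString so non-string columns don't throw):
Dim dr = sqlCommand.ExecuteReader()
Dim sb As New StringBuilder()
Dim lineNum = -1
Dim batchSize = 100
While dr.Read()
    For x = 0 To dr.FieldCount - 1
        sb.Append(dr.GetValue(x).ToString()).Append(",")
    Next x
    sb.Length -= 1 'remove trailing comma
    sb.AppendLine()
    lineNum += 1
    'flush the buffer once per completed batch instead of once per row
    If lineNum Mod batchSize = batchSize - 1 Then
        File.AppendAllText($"c:\temp\file{lineNum \ batchSize}.csv", sb.ToString())
        sb.Length = 0
    End If
End While
'write any leftover partial batch
If sb.Length > 0 Then
    File.AppendAllText($"c:\temp\file{lineNum \ batchSize}.csv", sb.ToString())
End If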
Tested this with a table of a little over 1,500 records and 10 fields. The file creation took a little over 5 seconds (excluding data access). All things being equal (which I know they are not) that would be over 13 seconds writing the files.
Since your problem was with the iterator, I assume there were no memory issues with the DataTable.
You can include more than one database object in a Using block by using a comma to designate a list of objects in the Using.
Private Sub OPCode()
    Dim myFilePath = "C:\Users\xxx\Documents\TestLoop\DataFile"
    Dim strsql = "Select * from myTable;"
    Dim dt As New DataTable
    Using cnn As New SqlConnection(connectionString),
          cmd As New SqlCommand(strsql, cnn)
        cnn.Open()
        dt.Load(cmd.ExecuteReader)
    End Using
    Dim sw As New Stopwatch()
    sw.Start()
    Dim StartRow = 0
    Dim EndRow = 99
    Dim FileNum = 1
    Dim TopIndex = dt.Rows.Count - 1
    Do
        For i = StartRow To EndRow
            Dim s = String.Join("|", dt.Rows(i).ItemArray)
            File.AppendAllText(myFilePath & FileNum & ".txt", s & Environment.NewLine)
        Next
        FileNum += 1
        StartRow += 100
        EndRow += 100
        If EndRow >= TopIndex Then
            EndRow = TopIndex
        End If
    Loop Until StartRow >= TopIndex
    sw.Stop()
    MessageBox.Show(sw.ElapsedMilliseconds.ToString)
End Sub
I thought your code was a great use of the iteration function.
Here is the code for your iterator.
Public Iterator Function getChunks(ByVal Tab As DataTable, ByVal size As Integer) As IEnumerable(Of IEnumerable(Of DataRow))
    Dim chunk As List(Of DataRow) = New List(Of DataRow)(size)
    For Each row As DataRow In Tab.Rows
        chunk.Add(row)
        If chunk.Count = size Then
            Yield chunk
            chunk = New List(Of DataRow)(size)
        End If
    Next
    If chunk.Any() Then Yield chunk
End Function
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim dt = LoadDataTable()
    Dim myFilePath As String = "C:\Users\xxx\Documents\TestLoop\DataFile"
    Dim FileNum = 1
    For Each chunk As IEnumerable(Of DataRow) In getChunks(dt, 100)
        For Each row As DataRow In chunk
            Dim s = String.Join("|", row.ItemArray)
            File.AppendAllText(myFilePath & FileNum & ".txt", s & Environment.NewLine)
        Next
        FileNum += 1
    Next
    MessageBox.Show("Done")
End Sub
You just needed to nest the For Each loops to get at the data rows.
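One further tweak worth considering (my suggestion, not part of the answer above): File.AppendAllText opens and closes the file for every row, so opening one StreamWriter per chunk file cuts that overhead:
Dim FileNum = 1
For Each chunk As IEnumerable(Of DataRow) In getChunks(dt, 100)
    ' one file handle per batch instead of one open/close per row
    Using writer As New IO.StreamWriter(myFilePath & FileNum & ".txt")
        For Each row As DataRow In chunk
            writer.WriteLine(String.Join("|", row.ItemArray))
        Next
    End Using
    FileNum += 1
Next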

Performance improvement on vb.net code

I need to write 50 million records with 72 columns to a text file; the file grows to about 9.7 GB.
I need to check each column's length and format it according to the length defined in an XML file.
I read records from Oracle one by one, check the format, and write to the text file.
Writing 5 crore (50 million) records takes more than 24 hours. How can I increase the performance of the code below?
Dim valString As String = Nothing
Dim valName As String = Nothing
Dim valLength As String = Nothing
Dim valDataType As String = Nothing
Dim validationsArray As ArrayList = GetValidations(Directory.GetCurrentDirectory() + "\ReportFormat.xml")
Console.WriteLine("passed xml")
Dim k As Integer = 1
Try
    Console.WriteLine(System.DateTime.Now())
    Dim selectSql As String = "select * from table where " &
        "record_date >= To_Date('01-01-2014','DD-MM-YYYY') and record_date <= To_Date('31-12-2014','DD-MM-YYYY')"
    Dim dataTable As New DataTable
    Dim oracleAccess As New OracleConnection(System.Configuration.ConfigurationManager.AppSettings("OracleConnection"))
    Dim cmd As New OracleCommand()
    cmd.Connection = oracleAccess
    cmd.CommandType = CommandType.Text
    cmd.CommandText = selectSql
    oracleAccess.Open()
    Dim Tablecolumns As New DataTable()
    Using oracleAccess
        Using writer = New StreamWriter(Directory.GetCurrentDirectory() + "\FileName.txt")
            Using odr As OracleDataReader = cmd.ExecuteReader()
                Dim sbHeaderData As New StringBuilder
                For i As Integer = 0 To odr.FieldCount - 1
                    sbHeaderData.Append(odr.GetName(i))
                    sbHeaderData.Append("|")
                Next
                writer.WriteLine(sbHeaderData)
                While odr.Read()
                    Dim sbColumnData As New StringBuilder
                    Dim values(odr.FieldCount - 1) As Object
                    Dim fieldCount As Integer = odr.GetValues(values)
                    For i As Integer = 0 To fieldCount - 1
                        Dim vals As String() = validationsArray(i).ToString().ToUpper().Split("|"c)
                        valName = vals(0).Trim()
                        valDataType = vals(1).Trim()
                        valLength = vals(2).Trim()
                        Select Case valDataType
                            Case "VARCHAR2"
                                If values(i).ToString().Length = valLength Then
                                    sbColumnData.Append(values(i).ToString())
                                ElseIf values(i).ToString().Length > valLength Then
                                    sbColumnData.Append(values(i).ToString().Substring(0, valLength))
                                Else
                                    sbColumnData.Append(values(i).ToString().PadRight(valLength))
                                End If
                            Case "NUMERIC"
                                valLength = valLength.Substring(0, valLength.IndexOf(","))
                                If values(i).ToString().Length = valLength Then
                                    sbColumnData.Append(values(i).ToString())
                                Else
                                    sbColumnData.Append(values(i).ToString().PadLeft(valLength, "0"c))
                                End If
                        End Select
                    Next
                    writer.WriteLine(sbColumnData)
                    k = k + 1
                    Console.WriteLine(k)
                End While
            End Using
            writer.WriteLine(System.DateTime.Now())
        End Using
    End Using
    Console.WriteLine(System.DateTime.Now())
    'Dim Adpt As New OracleDataAdapter(selectSql, oracleAccess)
    'Adpt.Fill(dataTable)
    Return Tablecolumns
Catch ex As Exception
    Console.WriteLine(System.DateTime.Now())
    Console.WriteLine("Error: " & ex.Message)
    Console.ReadLine()
    Return Nothing
End Try
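A hedged sketch of the most obvious wins (my suggestions, untested against this schema): parse the "NAME|TYPE|LENGTH" validation strings once into typed objects instead of re-splitting validationsArray(i) for every column of every row, drop the per-row Console.WriteLine, and give the writer a larger buffer:
' Hypothetical pre-parsed rule type; the layout mirrors the "NAME|TYPE|LENGTH" strings above.
Structure ColumnRule
    Public Name As String
    Public DataType As String
    Public Length As Integer
End Structure

' Parse each validation string once, before the reader loop.
Dim rules(validationsArray.Count - 1) As ColumnRule
For i As Integer = 0 To validationsArray.Count - 1
    Dim vals As String() = validationsArray(i).ToString().ToUpper().Split("|"c)
    Dim lenText As String = vals(2).Trim()
    ' NUMERIC lengths look like "10,2"; keep only the integer part, as the original loop does
    If vals(1).Trim() = "NUMERIC" Then lenText = lenText.Substring(0, lenText.IndexOf(","))
    rules(i) = New ColumnRule With {
        .Name = vals(0).Trim(),
        .DataType = vals(1).Trim(),
        .Length = CInt(lenText)}
Next

' A 1 MB buffer cuts down on flushes; avoid Console.WriteLine inside the loop.
Using writer As New StreamWriter(Directory.GetCurrentDirectory() & "\FileName.txt", False, System.Text.Encoding.UTF8, 1 << 20)
    ' ... same reader loop as above, but using rules(i).DataType and rules(i).Length ...
End Using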

How to convert memory stream to string array and vice versa

I have code where I want to get a stream from an image, convert the memory stream to a string, and store it in a variable. My problem is that I also want to get the image back from the string variable and paint it on a PictureBox.
If I use this:
PictureBox1.Image = Image.FromStream(memoryStream)
I am able to paint the picture on the PictureBox, but this is not what I need. I just want to get the image stream from the file, convert the stream to text, store it in a string variable, and later convert the string variable back to a stream to paint the image on the PictureBox.
Here is my code (VB Express 2008):
Public Function ImageConversion(ByVal image As System.Drawing.Image) As String
    If image Is Nothing Then Return ""
    Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream
    image.Save(memoryStream, System.Drawing.Imaging.ImageFormat.Gif)
    Dim value As String = ""
    For intCnt As Integer = 0 To memoryStream.ToArray.Length - 1
        value = value & memoryStream.ToArray(intCnt) & " "
    Next
    Dim strAsBytes() As Byte = New System.Text.UTF8Encoding().GetBytes(value)
    Dim ms As New System.IO.MemoryStream(strAsBytes)
    PictureBox1.Image = Image.FromStream(ms)
    Return value
End Function
This wouldn't work the way you have posted it (at least the part that recreates the image).
See this:
Public Function ImageConversion(ByVal image As System.Drawing.Image) As String
    If image Is Nothing Then Return ""
    Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream
    image.Save(memoryStream, System.Drawing.Imaging.ImageFormat.Gif)
    Dim value As String = ""
    ' buffer the bytes once instead of repeatedly calling
    ' memoryStream.ToArray via memoryStream.ToArray(intCnt)
    Dim content As Byte() = memoryStream.ToArray()
    For intCnt As Integer = 0 To content.Length - 1
        value = value & content(intCnt) & " "
    Next
    value = value.TrimEnd()
    Return value
End Function
To recreate the image from the created string, you can't use Encoding.GetBytes() like you did, because that gives you a byte array representing the characters of the string. E.g., for "123 32 123" you would not get a byte array with the elements 123, 32, 123. Parse the values back out instead:
Public Function ImageConversion(ByVal stringRepOfImage As String) As System.Drawing.Image
    Dim stringBytes As String() = stringRepOfImage.Split(New String() {" "}, StringSplitOptions.RemoveEmptyEntries)
    Dim bytes As New List(Of Byte)
    For intCount = 0 To stringBytes.Length - 1
        Dim b As Byte
        If Byte.TryParse(stringBytes(intCount), b) Then
            bytes.Add(b)
        Else
            Throw New FormatException("Not a byte value")
        End If
    Next
    Dim ms As New System.IO.MemoryStream(bytes.ToArray)
    Return Image.FromStream(ms)
End Function
Reference: Byte.TryParse
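As an aside (my suggestion, not part of the answer above), Base64 is the conventional way to round-trip binary data through a string, and it makes both directions one-liners:
' Sketch: the standard Base64 round trip for image bytes.
Public Function ImageToString(ByVal image As System.Drawing.Image) As String
    If image Is Nothing Then Return ""
    Using ms As New System.IO.MemoryStream()
        image.Save(ms, System.Drawing.Imaging.ImageFormat.Gif)
        Return Convert.ToBase64String(ms.ToArray())
    End Using
End Function

Public Function StringToImage(ByVal text As String) As System.Drawing.Image
    Dim ms As New System.IO.MemoryStream(Convert.FromBase64String(text))
    Return System.Drawing.Image.FromStream(ms) ' the stream must stay open for the image's lifetime
End Function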

parameter replacement in a string similar to mysql

I am trying to create a user-modifiable string with placeholders that get replaced by already-set data. Example:
Dim name1 As String = "John"
Dim state As String = "Virginia"
Dim idnumb As Integer = 122
Dim textlist1 As String
textlist1 = "Hello {NAME}, I see you are from {STATE}, and your id number is {ID}."
Ideally I would want to replace these tags with the corresponding strings. I am familiar with
Replace(textlist1, "{NAME}", name1)
My question is: is this the correct way to do it, or is there a method more similar to the way we use parameters in MySQL?
Ideally you could use a StringBuilder to avoid allocating a new string for every replace.
Dim name1 As String = "John"
Dim state As String = "Virginia"
Dim idnumb As Integer = 122
Dim textlist1 As StringBuilder = New StringBuilder _
    ("Hello {NAME}, I see you are from {STATE}, and your id number is {ID}.")
textlist1.Replace("{NAME}", name1)
As it turns out, the StringBuilder approach is not the best in this performance comparison.
Here is a little benchmark executed via LINQPad:
Sub Main
    Dim name1 As String = "John"
    Dim state As String = "Virginia"
    Dim idnumb As Integer = 122
    Dim textlist1 As StringBuilder
    Dim sw = New Stopwatch()
    sw.Start()
    For i = 0 To 100000
        textlist1 = New StringBuilder("Hello {NAME}, I see you are from {STATE}, and your id number is {ID}.")
        textlist1.Replace("{NAME}", name1)
        textlist1.Replace("{STATE}", state)
        textlist1.Replace("{ID}", idnumb.ToString())
    Next
    sw.Stop()
    sw.Elapsed.Dump("StringBuilder.Replace")
    sw = New Stopwatch()
    sw.Start()
    For i = 0 To 100000
        Dim test = "Hello {NAME}, I see you are from {STATE}, and your id number is {ID}."
        Dim test2 = test.Replace("{NAME}", name1)
        Dim test3 = test2.Replace("{STATE}", state)
        Dim test4 = test3.Replace("{ID}", idnumb.ToString())
    Next
    sw.Stop()
    sw.Elapsed.Dump("String.Replace")
End Sub
StringBuilder.Replace
00:00:00.2795878
String.Replace
00:00:00.1642420
So it seems that the cost of allocating a StringBuilder outweighs the cost of replacing into a fixed interned string. Of course, the memory fragmentation caused by immutable string behavior is not easily measurable.
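If the goal is something closer to MySQL-style parameters, the nearest built-in analogue is indexed placeholders with String.Format (a sketch; named placeholders like {NAME} would still need a custom formatter or a Replace-based helper like the one above):
Dim name1 As String = "John"
Dim state As String = "Virginia"
Dim idnumb As Integer = 122

' Indexed placeholders are filled in a single pass, with no intermediate strings.
Dim template As String = "Hello {0}, I see you are from {1}, and your id number is {2}."
Dim result As String = String.Format(template, name1, state, idnumb)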