How do I improve performance creating a DataTable in VB.NET? - vb.net
I have the following code and it's slow to load the first time. The CSV file is about 4 MB, with 16,000 rows.
If Session("tb") Is Nothing Then
    Dim str As String()
    If IsNothing(Cache("csvdata")) Then
        str = File.ReadAllLines(Server.MapPath("~/test/feed.csv"))
        Cache.Insert("csvdata", str, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero)
    Else
        str = CType(Cache("csvdata"), String())
    End If
    Dim dt As New DataTable
    dt.Columns.Add("Shape", GetType(System.String))
    dt.Columns.Add("Weight", GetType(System.Double))
    dt.Columns.Add("Color", GetType(System.String))
    dt.Columns.Add("Clarity", GetType(System.String))
    dt.Columns.Add("Price", GetType(System.Int32))
    dt.Columns.Add("CutGrade", GetType(System.String))
    ' Build the quote-aware splitting regex once, not once per row.
    Dim pattern As String = ",(?=([^""]*""[^""]*"")*[^""]*$)"
    Dim rgx As New Regex(pattern)
    For i As Integer = 1 To str.Length - 1
        Dim t As String = rgx.Replace(str(i), "\")
        Dim s As String() = t.Split("\"c)
        Dim pr As Int32 = CType(s(5), Int32)
        Dim fpr As Int32
        Dim rate As Double
        Select Case pr
            Case Is <= 300
                rate = 2
            Case 301 To 600
                rate = 1.7
            Case Is > 600
                rate = 1.16
        End Select
        fpr = CInt(Math.Round(pr * rate))
        Dim a As Object() = {s(1), s(2), s(3), s(4), fpr, s(40)}
        dt.Rows.Add(a)
    Next
    Session("tb") = dt
    ListView1.DataSource = dt
    ListView1.DataBind()
Else
    Dim x As DataTable = CType(Session("tb"), DataTable)
    ListView1.DataSource = x
    ListView1.DataBind()
End If
The CSV file is cached, and I assume the cache can be shared by everyone (one person loads it once every 12 hours).
Once the Session is created, the page loads fast as well, so building the DataTable seems to be the slow step.
This is my first time dealing with a DataTable, and I'm sure someone can point out what I'm doing wrong.
Thank you
UPDATE:
I have changed the Cache to store the finished DataTable instead of the raw CSV lines.
It loads fast now, but I would like to know whether this is a bad idea or not.
Cache.Insert("csvdata", dt, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero)
Once it's stored in the Cache, I can run queries against it using LINQ.
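For what it's worth, a minimal sketch of the kind of LINQ query that works against the cached DataTable (my example, not tested against the real feed; it assumes a reference to System.Data.DataSetExtensions for AsEnumerable, and uses the "Shape"/"Price" columns defined above):

Dim cached As DataTable = CType(Cache("csvdata"), DataTable)
Dim rounds = From row In cached.AsEnumerable() _
             Where row.Field(Of String)("Shape") = "Round" AndAlso row.Field(Of Integer)("Price") < 5000
' CopyToDataTable throws on an empty result, so check .Any() first in real code.
ListView1.DataSource = rounds.CopyToDataTable()
ListView1.DataBind()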
SAMPLE CSV (first 3 rows):
Supplier ID,Shape,Weight,Color,Clarity,Price / Carat,Lot Number,Stock Number,Lab,Cert #,Certificate Image,2nd Image,Dimension,Depth %,Table %,Crown Angle,Crown %,Pavilion Angle,Pavilion %,Girdle Thinnest,Girdle Thickest,Girdle %,Culet Size,Culet Condition,Polish,Symmetry,Fluor Color,Fluor Intensity,Enhancements,Remarks,Availability,Is Active,FC-Main Body,FC- Intensity,FC- Overtone,Matched Pair,Separable,Matching Stock #,Pavilion,Syndication,Cut Grade,External Url
9349,Round,1.74,F,VVS1,13650.00,,IM-95-188-243,ABC,11228,,,7.81|7.85|4.62,59.00,62.00,34.00,13.00,,,Medium,,0,None,,Excellent,Very Good,Blue,Medium,,"",Not Specified,Y,,,,False,True,,,,Very Good,http://www.test/teste.
9949,Round,1.00,I,VVS1,6059.00,,IM-95-189-C021,ABC,212197,,,6.37|6.42|3.96,61.90,54.00,34.50,16.00,,,Thin,Slightly Thick,0,None,,Excellent,Good,,None,,"Additional pinpoints are not shown.",Guaranteed Available,Y,,,,False,True,,,,Very Good,http://www.test/test.
Look into using a TextFieldParser to read the CSV instead of splitting the strings yourself.
Also, if you use a List(Of CustomClass), where CustomClass has the Shape, Weight, Color, etc. properties, you can avoid the unnecessary overhead of the DataTable, and you can still run your LINQ queries against the List.
Pardon my C#, I do not have VB.NET installed on this box.
using System;
using System.Collections.Generic;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives here, even in C#

public class Gemstone
{
    public string Shape { get; set; }
    public double Weight { get; set; }
    public string Color { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        TextFieldParser textFieldParser = new TextFieldParser("data.txt");
        textFieldParser.Delimiters = new string[] { "," };
        textFieldParser.ReadLine(); // skip header line
        List<Gemstone> list = new List<Gemstone>(16000); // allocate the list with your best calculated guess of its final size
        while (!textFieldParser.EndOfData)
        {
            string[] fields = textFieldParser.ReadFields();
            Gemstone gemstone = new Gemstone();
            gemstone.Shape = fields[1];
            gemstone.Weight = double.Parse(fields[2]);
            gemstone.Color = fields[3];
            list.Add(gemstone);
        }
    }
}
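Since the question is tagged vb.net, here is a rough VB.NET translation of the same idea (an untested sketch; Gemstone, data.txt, and the field indexes mirror the C# above, and the multi-line object initializer needs VB 2010 or later):

Imports System.Collections.Generic
Imports Microsoft.VisualBasic.FileIO

Public Class Gemstone
    Public Property Shape As String
    Public Property Weight As Double
    Public Property Color As String
End Class

Module Loader
    Function LoadGemstones(path As String) As List(Of Gemstone)
        Dim list As New List(Of Gemstone)(16000) ' pre-size to the expected row count
        Using parser As New TextFieldParser(path)
            parser.Delimiters = New String() {","}
            parser.ReadLine() ' skip the header line
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                list.Add(New Gemstone With {
                    .Shape = fields(1),
                    .Weight = Double.Parse(fields(2)),
                    .Color = fields(3)})
            End While
        End Using
        Return list
    End Function
End Module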
FYI, I just found this whole TextFieldParser thing, and I do a LOT of text file parsing, so I tested it....
On an 11 MB file with about 5,200 rows and 300 columns, TextFieldParser ran at about 25% of the speed of the Split-based code I had been using when loading into a DataTable, and at about 15% of the speed with the DataTable code removed:
Dim DataTable As New DataTable()
Dim StartTime As Long = Now.Ticks
Dim Reader As New FileIO.TextFieldParser("file.txt")
Reader.TextFieldType = FileIO.FieldType.Delimited
Reader.SetDelimiters(vbTab)
Reader.HasFieldsEnclosedInQuotes = False
Dim Header As Boolean = True
While Not Reader.EndOfData
    Dim Fields() As String = Reader.ReadFields
    If Header Then
        For I As Integer = 1 To 320
            DataTable.Columns.Add("Col" & I)
        Next
        Header = False
    Else
        If Mid(Fields(0), 1, 1) <> "#" Then DataTable.Rows.Add(Fields)
    End If
End While
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

Dim DataTable2 As New DataTable()
StartTime = Now.Ticks
For I As Integer = 1 To 320
    DataTable2.Columns.Add("Col" & I)
Next
For Each line As String In System.IO.File.ReadAllLines("file.txt")
    Dim NVP() As String = Split(line, vbTab)
    If Mid(line, 1, 1) <> "#" Then DataTable2.Rows.Add(NVP)
Next
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")
With the DataTable code removed:
Dim StartTime As Long = Now.Ticks
Dim Reader As New FileIO.TextFieldParser("file.txt")
Reader.TextFieldType = FileIO.FieldType.Delimited
Reader.SetDelimiters(vbTab)
Reader.HasFieldsEnclosedInQuotes = False
While Not Reader.EndOfData
    Dim Fields() As String = Reader.ReadFields
End While
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

StartTime = Now.Ticks
For Each line As String In System.IO.File.ReadAllLines("file.txt")
    Dim NVP() As String = Split(line, vbTab)
Next
Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")
Kinda surprising to me, but I guess the DataTable has more functionality. Another new thing I discover that I'll never use :(
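As an aside, Now.Ticks has fairly coarse resolution, so for timings like the ones above System.Diagnostics.Stopwatch is usually the safer choice. A minimal sketch of the same measurement pattern (my example, not the poster's code):

Dim sw As Stopwatch = Stopwatch.StartNew()
' ... the parsing code under test goes here ...
sw.Stop()
Debug.Print(sw.ElapsedMilliseconds & "ms")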
Related
how to read a specific csv line vb.net
Excuse me, I created a bot to input data into the web using VB.NET and Selenium. It retrieves data from a CSV file. How can I retrieve only the rows I need? For example, if there are 100 rows, take only rows 30-50. The loop below currently reads them all:

Dim textFieldParser As TextFieldParser = New TextFieldParser(TextBox1.Text) With {
    .TextFieldType = FieldType.Delimited,
    .Delimiters = New String() {","}
}
drv = New ChromeDriver(options)
While Not textFieldParser.EndOfData
    Try
        Dim strArrays As String() = textFieldParser.ReadFields()
        Dim name As String = strArrays(0)
        Dim alamat As String = strArrays(1)
        Dim notlp As String = strArrays(2)
        drv.Navigate().GoToUrl("URL")
        Dim Nm = drv.FindElement(By.XPath("/html/body/div[1]/div[3]/form/div[1]/div[1]/div[1]/div/div[2]/input"))
        Nm.SendKeys(name)
        Threading.Thread.Sleep(3000)
    Catch ex As Exception
        MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
    End Try
End While

Thank you
Here's an example of using TextFieldParser to read one specific line and a specific range of lines. Note that I am using zero-based indexes for the lines. You can adjust as required if you want to use 1-based line numbers.

Public Function GetLine(filePath As String, index As Integer) As String()
    Using parser As New TextFieldParser(filePath) With {.Delimiters = {","}}
        Dim linesDiscarded = 0
        Do Until linesDiscarded = index
            parser.ReadLine()
            linesDiscarded += 1
        Loop
        Return parser.ReadFields()
    End Using
End Function

Public Function GetLines(filePath As String, startIndex As Integer, count As Integer) As List(Of String())
    Using parser As New TextFieldParser(filePath) With {.Delimiters = {","}}
        Dim linesDiscarded = 0
        Do Until linesDiscarded = startIndex
            parser.ReadLine()
            linesDiscarded += 1
        Loop
        Dim lines As New List(Of String())
        Do Until lines.Count = count
            lines.Add(parser.ReadFields())
        Loop
        Return lines
    End Using
End Function

Simple loops to skip and to take lines.
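Applied to the bot question above (rows 30 to 50 of the file picked in TextBox1), a call might look like this; the index 29 and count 21 are my own translation of that 1-based range:

' Rows 30-50 inclusive (1-based) = zero-based start index 29, count 21.
Dim rows As List(Of String()) = GetLines(TextBox1.Text, 29, 21)
For Each fields As String() In rows
    Dim name As String = fields(0) ' then feed into Selenium as before
Next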
Need to Batch a Large DataTable and write each batch to a Text file - VB.Net
I have a requirement where I need to query a DB and fetch the records into a DataTable. The DataTable has 20,000 records. I need to batch these records in batches of 100 each and write each batch to an individual text file. So far I have been able to batch the records in groups of 100 using IEnumerable(Of DataRow). I am now facing an issue writing the IEnumerable(Of DataRow) to a text file. My code is as below:

Dim strsql = "Select * from myTable;"
Dim dt As New DataTable
Using cnn As New SqlConnection(connectionString)
    cnn.Open()
    Using dad As New SqlDataAdapter(strsql, cnn)
        dad.Fill(dt)
    End Using
    cnn.Close()
End Using

Dim Chunks = getChunks(dt, 100)
For Each chunk As IEnumerable(Of DataRow) In Chunks
    Dim path As String = "myFilePath"
    If Not File.Exists(path) Then
        '** Here I will write my batch into the file.
    End If
Next

Public Iterator Function getChunks(ByVal Tab As DataTable, ByVal size As Integer) As IEnumerable(Of IEnumerable(Of DataRow))
    Dim chunk As List(Of DataRow) = New List(Of DataRow)(size)
    For Each row As DataRow In Tab.Rows
        chunk.Add(row)
        If chunk.Count = size Then
            Yield chunk
            chunk = New List(Of DataRow)(size)
        End If
    Next
    If chunk.Any() Then Yield chunk
End Function

Need your help writing the IEnumerable of DataRows into a text file for each batch of records. Thanks :)
Your existing code is needlessly complex. If this is all you're doing, then using a DataTable is unnecessary/unwise; this is one of the few occasions where I would advocate a lower-level DataReader to keep the memory impact low. Writing a DB table to a file, quick, easy, and low memory consumption:

Dim dr = sqlCommand.ExecuteReader()
Dim sb As New StringBuilder
Dim lineNum = -1
Dim batchSize = 100
While dr.Read()
    'turn the row into a string for our file
    For x = 0 To dr.FieldCount - 1
        sb.Append(dr.GetString(x)).Append(",")
    Next x
    sb.Length -= 1 'remove trailing comma
    sb.AppendLine()
    'keep track of lines written so we can batch accordingly
    lineNum += 1
    Dim fileNum = lineNum \ batchSize
    File.AppendAllText($"c:\temp\file{fileNum}.csv", sb.ToString())
    'clear the stringbuilder
    sb.Length = 0
End While

If you really want to use a DataTable, there isn't anything stopping you swapping this While dr.Read() for a For Each r As DataRow In myDatatable.Rows.

Please note, this isn't an exercise in creating a fully escaped CSV, nor in formatting the data; it demonstrates the concept of taking a firehose of data and simply writing it to N different files, using the fact that an integer divide on every number from 0 to 99 results in 0 (so those lines go in file 0), every number from 100 to 199 results in 1 (so those lines go in file 1), and so on, over a single stream of data.

You could instead build the file lines in the StringBuilder and write them out every batchSize lines, when lineNum Mod batchSize = batchSize - 1, if you feel that would be more efficient than calling File.AppendAllText (which opens and closes the file) for every row.
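That last suggestion might look something like this (my sketch, not the answerer's code; it buffers batchSize lines and touches each file once per batch):

Dim dr = sqlCommand.ExecuteReader()
Dim sb As New StringBuilder()
Dim lineNum = -1
Dim batchSize = 100
While dr.Read()
    For x = 0 To dr.FieldCount - 1
        sb.Append(dr.GetValue(x).ToString()).Append(",")
    Next
    sb.Length -= 1 'remove trailing comma
    sb.AppendLine()
    lineNum += 1
    ' Flush the buffer once per batch instead of once per row.
    If lineNum Mod batchSize = batchSize - 1 Then
        File.AppendAllText($"c:\temp\file{lineNum \ batchSize}.csv", sb.ToString())
        sb.Length = 0
    End If
End While
' Write any partial final batch.
If sb.Length > 0 Then
    File.AppendAllText($"c:\temp\file{lineNum \ batchSize}.csv", sb.ToString())
End If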
Tested this with a table of a little over 1,500 records and 10 fields. The file creation took a little over 5 seconds (excluding data access). All things being equal (which I know they are not), that would be over 13 seconds writing your files. Since your problem was with the iterator, I assume there were no memory issues with the DataTable. You can include more than one database object in a Using block by using a comma to designate a list of objects in the Using.

Private Sub OPCode()
    Dim myFilePath = "C:\Users\xxx\Documents\TestLoop\DataFile"
    Dim strsql = "Select * from myTable;"
    Dim dt As New DataTable
    Using cnn As New SqlConnection(connectionString), cmd As New SqlCommand(strsql, cnn)
        cnn.Open()
        dt.Load(cmd.ExecuteReader)
    End Using
    Dim sw As New Stopwatch
    sw.Start()
    Dim StartRow = 0
    Dim EndRow = 99
    Dim FileNum = 1
    Dim TopIndex = dt.Rows.Count - 1
    Do
        For i = StartRow To EndRow
            Dim s = String.Join("|", dt.Rows(i).ItemArray)
            File.AppendAllText(myFilePath & FileNum & ".txt", s & Environment.NewLine)
        Next
        FileNum += 1
        StartRow += 100
        EndRow += 100
        If EndRow >= TopIndex Then
            EndRow = TopIndex
        End If
    Loop Until StartRow >= TopIndex
    sw.Stop()
    MessageBox.Show(sw.ElapsedMilliseconds.ToString)
End Sub
I thought your code was a great use of the iterator function. Here is the code for your iterator:

Public Iterator Function getChunks(ByVal Tab As DataTable, ByVal size As Integer) As IEnumerable(Of IEnumerable(Of DataRow))
    Dim chunk As List(Of DataRow) = New List(Of DataRow)(size)
    For Each row As DataRow In Tab.Rows
        chunk.Add(row)
        If chunk.Count = size Then
            Yield chunk
            chunk = New List(Of DataRow)(size)
        End If
    Next
    If chunk.Any() Then Yield chunk
End Function

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim dt = LoadDataTable()
    Dim myFilePath As String = "C:\Users\xxx\Documents\TestLoop\DataFile"
    Dim FileNum = 1
    For Each chunk As IEnumerable(Of DataRow) In getChunks(dt, 100)
        For Each row As DataRow In chunk
            Dim s = String.Join("|", row.ItemArray)
            File.AppendAllText(myFilePath & FileNum & ".txt", s & Environment.NewLine)
        Next
        FileNum += 1
    Next
    MessageBox.Show("Done")
End Sub

You just needed to nest the For Each to get at the data rows.
Performance improvement on vb.net code
I need to write 50 million records with 72 columns to a text file; the file grows to 9.7 GB. I need to check each column's length and format it according to the length defined in an XML file. I read records from Oracle one by one, check the format, and write them to the text file. Writing 5 crore (50 million) records takes more than 24 hours. How can I increase the performance of the code below?

Dim valString As String = Nothing
Dim valName As String = Nothing
Dim valLength As String = Nothing
Dim valDataType As String = Nothing
Dim validationsArray As ArrayList = GetValidations(Directory.GetCurrentDirectory() + "\ReportFormat.xml")
Console.WriteLine("passed xml")
Dim k As Integer = 1
Try
    Console.WriteLine(System.DateTime.Now())
    Dim selectSql As String = "select * from table where record_date >= To_Date('01-01-2014','DD-MM-YYYY') and record_date <= To_Date('31-12-2014','DD-MM-YYYY')"
    Dim dataTable As New DataTable
    Dim oracleAccess As New OracleConnection(System.Configuration.ConfigurationManager.AppSettings("OracleConnection"))
    Dim cmd As New OracleCommand()
    cmd.Connection = oracleAccess
    cmd.CommandType = CommandType.Text
    cmd.CommandText = selectSql
    oracleAccess.Open()
    Dim Tablecolumns As New DataTable()
    Using oracleAccess
        Using writer = New StreamWriter(Directory.GetCurrentDirectory() + "\FileName.txt")
            Using odr As OracleDataReader = cmd.ExecuteReader()
                Dim sbHeaderData As New StringBuilder
                For i As Integer = 0 To odr.FieldCount - 1
                    sbHeaderData.Append(odr.GetName(i))
                    sbHeaderData.Append("|")
                Next
                writer.WriteLine(sbHeaderData)
                While odr.Read()
                    Dim sbColumnData As New StringBuilder
                    Dim values(odr.FieldCount - 1) As Object
                    Dim fieldCount As Integer = odr.GetValues(values)
                    For i As Integer = 0 To fieldCount - 1
                        Dim vals As Array = validationsArray(i).ToString.ToUpper.Split("|"c)
                        valName = vals(0).Trim
                        valDataType = vals(1).Trim
                        valLength = vals(2).Trim
                        Select Case valDataType
                            Case "VARCHAR2"
                                If values(i).ToString().Length = valLength Then
                                    sbColumnData.Append(values(i).ToString())
                                    'sbColumnData.Append("|")
                                ElseIf values(i).ToString().Length > valLength Then
                                    sbColumnData.Append(values(i).ToString().Substring(0, valLength))
                                    'sbColumnData.Append("|")
                                Else
                                    sbColumnData.Append(values(i).ToString().PadRight(valLength))
                                    'sbColumnData.Append("|")
                                End If
                            Case "NUMERIC"
                                valLength = valLength.Substring(0, valLength.IndexOf(","))
                                If values(i).ToString().Length = valLength Then
                                    sbColumnData.Append(values(i).ToString())
                                    'sbColumnData.Append("|")
                                Else
                                    sbColumnData.Append(values(i).ToString().PadLeft(valLength, "0"c))
                                    'sbColumnData.Append("|")
                                End If
                                'sbColumnData.Append((values(i).ToString()))
                        End Select
                    Next
                    writer.WriteLine(sbColumnData)
                    k = k + 1
                    Console.WriteLine(k)
                End While
            End Using
            writer.WriteLine(System.DateTime.Now())
        End Using
    End Using
    Console.WriteLine(System.DateTime.Now())
    'Dim Adpt As New OracleDataAdapter(selectSql, oracleAccess)
    'Adpt.Fill(dataTable)
    Return Tablecolumns
Catch ex As Exception
    Console.WriteLine(System.DateTime.Now())
    Console.WriteLine("Error: " & ex.Message)
    Console.ReadLine()
    Return Nothing
End Try
How to convert memory stream to string array and vice versa
I have code where I want to get a stream from an image, convert the memory stream to a string, and store it in a variable. My problem is that I also want to take the string variable back to an image and paint it on a picture box. If I use PictureBox1.Image = Image.FromStream(memoryStream) I am able to show the picture in the picture box, but this is not what I need. I just want to get the image stream from the file, convert the stream to text, store it in a string variable, and then later convert that string back to a stream to show the image in the picture box. Here is my code (VB Express 2008):

Public Function ImageConversion(ByVal image As System.Drawing.Image) As String
    If image Is Nothing Then Return ""
    Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream
    image.Save(memoryStream, System.Drawing.Imaging.ImageFormat.Gif)
    Dim value As String = ""
    For intCnt As Integer = 0 To memoryStream.ToArray.Length - 1
        value = value & memoryStream.ToArray(intCnt) & " "
    Next
    Dim strAsBytes() As Byte = New System.Text.UTF8Encoding().GetBytes(value)
    Dim ms As New System.IO.MemoryStream(strAsBytes)
    PictureBox1.Image = Image.FromStream(ms)
    Return value
End Function
This wouldn't work in the way you have posted it (at least the part of recreating the image). See this:

Public Function ImageConversion(ByVal image As System.Drawing.Image) As String
    If image Is Nothing Then Return ""
    Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream
    image.Save(memoryStream, System.Drawing.Imaging.ImageFormat.Gif)
    Dim value As String = ""
    ' Call memoryStream.ToArray() once, instead of repeatedly
    ' calling memoryStream.ToArray(intCnt) inside the loop.
    Dim content As Byte() = memoryStream.ToArray()
    For intCnt As Integer = 0 To content.Length - 1
        value = value & content(intCnt) & " "
    Next
    value = value.TrimEnd()
    Return value
End Function

To recreate the image from the created string, you can't use Encoding.GetBytes() like you did, because that gives you a byte array representing the string itself. E.g. for "123 32 123" you would not get a byte array with the elements 123, 32, 123.

Public Function ImageConversion(ByVal stringRepOfImage As String) As System.Drawing.Image
    Dim stringBytes As String() = stringRepOfImage.Split(New String() {" "}, StringSplitOptions.RemoveEmptyEntries)
    Dim bytes As New List(Of Byte)
    For intCount = 0 To stringBytes.Length - 1
        Dim b As Byte
        If Byte.TryParse(stringBytes(intCount), b) Then
            bytes.Add(b)
        Else
            Throw New FormatException("Not a byte value")
        End If
    Next
    Dim ms As New System.IO.MemoryStream(bytes.ToArray)
    Return Image.FromStream(ms)
End Function

Reference: Byte.TryParse
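A quick round-trip usage sketch of those two overloads (my example, assuming a form with PictureBox1; for what it's worth, Convert.ToBase64String/FromBase64String would give a much more compact string for the same job):

Dim asText As String = ImageConversion(PictureBox1.Image) ' image -> "71 73 70 56 ..."
PictureBox1.Image = ImageConversion(asText)               ' parse the bytes back into an image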
parameter replacement in a string similar to mysql
I am trying to create a user-modifiable string in which certain patches of text are replaced with already-set data. Example:

Dim name1 As String = "John"
Dim state As String = "Virginia"
Dim idnumb As Integer = 122
Dim textlist1 As String
textlist1 = "Hello {NAME}, I see you are from {STATE}, and your id number is {ID}."

Ideally I would want to replace these tags with the set strings. I am familiar with

Replace(textlist1, "{NAME}", name1)

My question is: is this the correct way to do this, or is there a method more similar to the way we do parameters in MySQL?
Ideally you could use a StringBuilder to avoid the reallocation of a new string for every replace.

Dim name1 As String = "John"
Dim state As String = "Virginia"
Dim idnumb As Integer = 122
Dim textlist1 As StringBuilder = New StringBuilder _
    ("Hello {NAME}, I see you are from {STATE}, and your id number is {ID}.")
textlist1.Replace("{NAME}", name1)

As it turns out, though, the StringBuilder approach is not the best performer in this comparison. This is just a little benchmark executed via LINQPad:

Sub Main
    Dim name1 As String = "John"
    Dim state As String = "Virginia"
    Dim idnumb As Integer = 122
    Dim textlist1 As StringBuilder

    Dim sw = New Stopwatch()
    sw.Start()
    For i = 0 To 100000
        textlist1 = New StringBuilder("Hello {NAME}, I see you are from {STATE}, and your id number is {ID}.")
        textlist1.Replace("{NAME}", name1)
        textlist1.Replace("{STATE}", state)
        textlist1.Replace("{ID}", idnumb.ToString())
    Next
    sw.Stop()
    sw.Elapsed.Dump("StringBuilder.Replace")

    sw = New Stopwatch()
    sw.Start()
    For i = 0 To 100000
        Dim test = "Hello {NAME}, I see you are from {STATE}, and your id number is {ID}."
        Dim test2 = test.Replace("{NAME}", name1)
        Dim test3 = test2.Replace("{STATE}", state)
        Dim test4 = test3.Replace("{ID}", idnumb.ToString())
    Next
    sw.Stop()
    sw.Elapsed.Dump("String.Replace")
End Sub

StringBuilder.Replace 00:00:00.2795878
String.Replace 00:00:00.1642420

So it seems that the cost of allocating a StringBuilder outweighs the cost of working with a fixed interned string. Of course, the memory fragmentation caused by the immutable-string behavior is not easily measurable.
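As for something closer to MySQL-style parameters, the nearest built-in analogue is probably String.Format with indexed placeholders (a minimal sketch of the idea, not from the benchmark above):

Dim template As String = "Hello {0}, I see you are from {1}, and your id number is {2}."
Dim result As String = String.Format(template, name1, state, idnumb)
' result: "Hello John, I see you are from Virginia, and your id number is 122."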