I have code that produces a CSV file (with headers) whose fields identify the pixel locations of points of interest in selected images. Unfortunately, there are occasional duplicates that cause issues later in my production process. My plan is to open the CSV file in the midst of the application, read the data into a DataTable, and sort it with a DataView using 4 sort keys. I am using the default names (Column1, Column2, etc.) because I cannot figure out how to use the real column header names contained in the first row. I have attempted to write the resulting rows to a new file, and I get the correct number of rows in the output CSV file, but the data in each row is missing: I get the text string 'System.Data.DataRowView' instead of the 12 fields I was expecting.
In the code, my most recent attempt is to create a writeOutPutLine containing the concatenated contents of the row, in my desired sort order.
I have attempted to use CsvHelper with no success, and have also tried about 10 other code samples posted on various websites.
Here is my code:
Sub New_Csv_Sort()
Dim linecount As Integer = 1
Dim dt As DataTable = New DataTable
Dim dgv As DataGridView = New DataGridView
Dim TextLine As String = ""
Dim SplitLine() As String
Dim objReader As New System.IO.StreamReader(TargetFileName, Encoding.ASCII)
Dim ColumnCount As Integer = 0
For ColumnCount = 1 To 13 Step 1
dt.Columns.Add()
Next
Dim firstline As Boolean = True
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
SplitLine = Split(TextLine, ",")
dt.Rows.Add(SplitLine)
If FirstTime = True Then
' Here I was trying to use the first line to set the column labels but no joy
' For ColumnCount = 1 To 12 Step 1
' dt.datacolumn.
' Next
End If
Loop
' The next line is commented out of the sort statement because I could not get the column names to be recognised
'.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc, ASC"
Dim dv As New DataView(dt) With {
.Sort = "Column5 ASC, Column1 ASC, Column12 ASC, Column11 ASC"
}
FileOpen(SortedTargetFileNumber, SortedTargetFileName, OpenMode.Output)
'I have looked at the data and it is read in correctly,
'but my output line is trash!
For Each Rowview As DataRowView In dv
'WriteOutPutLine = dv.row
For Each DataColumn In dt.Columns
WriteOutPutLine = WriteOutPutLine & Row[col].tostring() & ","
Next
WriteOutPutLine = Rowview.ToString
PrintLine(SortedTargetFileNumber, WriteOutPutLine)
linecount = linecount + 1
Next
FileClose(SortedTargetFileNumber)
MsgBox("Sorted Lines = " & linecount)
End Sub
Sub New_Csv_Sort()
Dim dt As DataTable = New DataTable
Dim TextLine As String = ""
Dim SplitLine() As String
Dim objReader As New System.IO.StreamReader(TargetFileName, Encoding.ASCII)
Dim ColumnCount As Integer = 0
'Create columns for the data -
For ColumnCount = 1 To 13 Step 1
dt.Columns.Add()
Next
'Load the datatable
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
SplitLine = Split(TextLine, ",")
dt.Rows.Add(SplitLine)
Loop
'This is the sort order based on the header titles, but it is translated to the default column names
'.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc ASC"
Dim dv As New DataView(dt) With {
.Sort = "Column5 ASC, Column1 ASC, Column12 ASC, Column11 ASC"
}
'Open the output file
FileOpen(SortedTargetFileNumber, SortedTargetFileName, OpenMode.Output)
'Create a string of all the rows in the dataview separated by commas
Dim DataLine As String = ""
For Each r As DataRowView In dv
DataLine &= String.Join(",", r.Row.ItemArray) & Environment.NewLine
Next
' Write the data to a file
WriteOutPutLine = DataLine
PrintLine(SortedTargetFileNumber, WriteOutPutLine)
FileClose(SortedTargetFileNumber)
' and we are done
End Sub
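For what it's worth, the 'System.Data.DataRowView' text in the question appears because DataRowView does not override ToString, so calling it returns the type name rather than the field values. The sketch below is one possible way to use the real header names and drop exact duplicates at the same time. It assumes a plain comma-separated file with no quoted fields; LoadCsv, WriteSortedDistinct and the sample data are names I made up for illustration. Note that every column is a String here, so Y_Loc and X_Loc sort as text ("10" before "9") unless you give those columns a numeric type.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.IO

Module CsvSortSketch
    ' Loads a simple comma-separated file (no quoted fields) into a DataTable,
    ' naming the columns after the header row.
    Public Function LoadCsv(path As String) As DataTable
        Dim dt As New DataTable()
        For Each line As String In File.ReadLines(path)
            If line.Trim().Length = 0 Then Continue For   ' skip blank lines
            Dim parts As String() = line.Split(","c)
            If dt.Columns.Count = 0 Then
                For Each name As String In parts
                    dt.Columns.Add(name.Trim())
                Next
            Else
                dt.Rows.Add(parts)
            End If
        Next
        Return dt
    End Function

    ' Sorts by the given expression, drops exact duplicate rows and writes the result.
    Public Sub WriteSortedDistinct(dt As DataTable, sortExpression As String, outPath As String)
        Dim names(dt.Columns.Count - 1) As String
        For i As Integer = 0 To dt.Columns.Count - 1
            names(i) = dt.Columns(i).ColumnName
        Next
        Dim dv As New DataView(dt) With {.Sort = sortExpression}
        ' First argument True = keep only distinct rows (compared over the listed columns)
        Dim sorted As DataTable = dv.ToTable(True, names)
        Dim output As New List(Of String)
        output.Add(String.Join(",", names))
        For Each r As DataRow In sorted.Rows
            output.Add(String.Join(",", r.ItemArray))
        Next
        File.WriteAllLines(outPath, output)
    End Sub

    Sub Main()
        ' Hypothetical sample data; the real file has more fields per row.
        Dim inPath As String = Path.GetTempFileName()
        File.WriteAllLines(inPath, New String() {"Picture,Datum,Y_Loc,X_Loc", "b.png,D1,20,5", "a.png,D1,10,7", "a.png,D1,10,7"})
        Dim outPath As String = Path.GetTempFileName()
        WriteSortedDistinct(LoadCsv(inPath), "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc ASC", outPath)
        Console.WriteLine(File.ReadAllText(outPath))
    End Sub
End Module
```

Because the columns carry the header names, the sort expression can use Datum, Picture, Y_Loc and X_Loc directly instead of Column5, Column1 and so on.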
Try this function:
Private Function ConvertToCSV(ByVal dt As DataTable) As String
Dim sb As New Text.StringBuilder()
For Each row As DataRow In dt.Rows
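' Escape existing backslashes first; otherwise the backslashes added by the later replacements would be doubled as well
sb.AppendLine(String.Join(",", (From i As Object In row.ItemArray Select i.ToString().Replace("\", "\\").Replace("""", """""").Replace(",", "\,").Replace(Environment.NewLine, "\" & Environment.NewLine)).ToArray()))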
Next
Return sb.ToString()
End Function
I am fetching the distinct words in a string column of a DataTable (dt) and then replacing each unique value with another value, so essentially changing words to other words. Both approaches listed below work; however, for 90k records the process is not very fast. Is there a way to speed up either approach?
The first approach is as follows:
'fldNo is column number in dt
For Each Word As String In DistinctWordList
Dim myRow() As DataRow
myRow = dt.Select(MyColumnName & "='" & Word & "'")
For Each row In myRow
row(fldNo) = dicNewWords(Word)
Next
Next
A second LINQ-based approach is as follows, and is actually not very fast either:
Dim flds as new List(of String)
flds.Add(myColumnName)
For Each Word As String In DistinctWordsList
Dim rowData() As DataRow = dt.AsEnumerable().Where(Function(f) flds.Where(Function(el) f(el) IsNot DBNull.Value AndAlso f(el).ToString = Word).Count = flds.Count).ToArray
ReDim foundrecs(rowData.Count)
Cnt = 0
For Each row As DataRow In rowData
Dim Index As Integer = dt.Rows.IndexOf(row)
foundrecs(Cnt) = Index + 1 'row.RowId
Cnt += 1
Next
For i = 0 To Cnt
dt(foundrecs(i))(fldNo) = dicNewWords(Word)
Next
Next
So you have your dictionary of replacements:
Dim d as New Dictionary(Of String, String)
d("foo") = "bar"
d("baz") = "buf"
You can apply them to your table's ReplaceMe column:
Dim rep as String = Nothing
For Each r as DataRow In dt.Rows
If d.TryGetValue(r.Field(Of String)("ReplaceMe"), rep) Then r("ReplaceMe") = rep
Next r
On my machine it takes 340 ms for 1 million replacements. I can cut that down to 260 ms by using the column number rather than the name: If d.TryGetValue(r.Field(Of String)(0), rep) Then r(0) = rep
Timing:
'setup, fill a dict with string replacements like "1" -> "11", "7" -> "17"
Dim d As New Dictionary(Of String, String)
For i = 0 To 9
d(i.ToString()) = (i + 10).ToString()
Next
'put a million rows in a datatable, randomly assign dictionary keys as row values
Dim dt As New DataTable
dt.Columns.Add("ReplaceMe")
Dim r As New Random()
Dim k = d.Keys.ToArray()
For i = 1 To 1000000
dt.Rows.Add(k(r.Next(k.Length)))
Next
'what range of values do we have in our dt?
Dim minToMaxBefore = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
'it's a crappy way to time, but it'll prove the point
Dim start = DateTime.Now
Dim rep As String = Nothing
For Each ro As DataRow In dt.Rows
If d.TryGetValue(ro.Field(Of String)("ReplaceMe"), rep) Then ro("ReplaceMe") = rep
Next
Dim ennd = DateTime.Now
'what range of values do we have now
Dim minToMaxAfter = dt.Rows.Cast(Of DataRow).Min(Function(ro) ro.Field(Of String)("ReplaceMe")) & " - " & dt.Rows.Cast(Of DataRow).Max(Function(ro) ro.Field(Of String)("ReplaceMe"))
MessageBox.Show($"min to max before of {minToMaxBefore} became {minToMaxAfter} proving replacements occurred, it took {(ennd - start).TotalMilliseconds} ms for 1 million replacements")
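If you want a less crude measurement than subtracting two DateTime.Now values, System.Diagnostics.Stopwatch is the usual tool. Here is a minimal sketch of the same single-pass replacement wrapped in a function and timed with a Stopwatch. ReplaceWords is a hypothetical helper name, and the null check is my addition: Dictionary.TryGetValue throws on a Nothing key, which is what you would get from a DBNull cell.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Data
Imports System.Diagnostics

Module ReplaceTimingSketch
    ' Replaces every value in the given column that appears as a key in the dictionary.
    ' Returns the number of rows changed.
    Public Function ReplaceWords(dt As DataTable, columnIndex As Integer, d As Dictionary(Of String, String)) As Integer
        Dim changed As Integer = 0
        Dim rep As String = Nothing
        For Each ro As DataRow In dt.Rows
            ' TryCast yields Nothing for DBNull, which must not be passed to TryGetValue
            Dim current As String = TryCast(ro(columnIndex), String)
            If current IsNot Nothing AndAlso d.TryGetValue(current, rep) Then
                ro(columnIndex) = rep
                changed += 1
            End If
        Next
        Return changed
    End Function

    Sub Main()
        Dim d As New Dictionary(Of String, String) From {{"foo", "bar"}, {"baz", "buf"}}
        Dim dt As New DataTable()
        dt.Columns.Add("ReplaceMe")
        For i As Integer = 1 To 100000
            dt.Rows.Add(If(i Mod 2 = 0, "foo", "baz"))
        Next
        Dim sw As Stopwatch = Stopwatch.StartNew()
        Dim changed As Integer = ReplaceWords(dt, 0, d)
        sw.Stop()
        Console.WriteLine($"{changed} replacements in {sw.ElapsedMilliseconds} ms")
    End Sub
End Module
```

The point is the same as above: one pass over the rows with a dictionary lookup per row, instead of one dt.Select or LINQ scan per distinct word.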
Hello, this is my first thread.
I'm trying to extract the description from this page (https://www.tokopedia.com/indoislamicstore/cream-zaitun-arofah)
with a regex, replace each <br/> tag with a new line, and export the result to CSV.
The DataGridView looks all right, but the CSV gets screwed up.
This is my code:
Dim dskrip As New System.Text.RegularExpressions.Regex("<p itemprop=""description"" class=""mt-20"">(.*?)\<\/p>\<\/div>")
Dim dskripm As MatchCollection = dskrip.Matches(rssourcecode0)
For Each itemdskrm As Match In dskripm
getdeskripsinew = itemdskrm.Groups(1).Value
Next
Dim deskripsinew As String = Replace(getdeskripsinew, ",", ";")
Dim deskripsitotal As String = Replace(deskripsinew, "<br/>", Environment.NewLine)
' ListView1.s = Environment.NewLine & deskripsinew
txtDeskripsi.Text = deskripsitotal
datascrapes.ColumnCount = 5
datascrapes.Columns(0).Name = "Title"
datascrapes.Columns(1).Name = "Price"
datascrapes.Columns(2).Name = "Deskripsi"
datascrapes.Columns(3).Name = "Gambar"
datascrapes.Columns(4).Name = "Total Produk"
Dim row As String() = New String() {getname, totalprice, deskripsitotal, directoryme + getfilename, "10"}
datascrapes.Rows.Add(row)
Dim filePath As String = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) & "\" & "Tokopedia_Upload.csv"
Dim delimeter As String = ","
Dim sb As New StringBuilder
For i As Integer = 0 To datascrapes.Rows.Count - 1
Dim array As String() = New String(datascrapes.Columns.Count - 1) {}
If i.Equals(0) Then
For j As Integer = 0 To datascrapes.Columns.Count - 1
array(j) = datascrapes.Columns(j).HeaderText
Next
sb.AppendLine(String.Join(delimeter, array))
End If
For j As Integer = 0 To datascrapes.Columns.Count - 1
If Not datascrapes.Rows(i).IsNewRow Then
array(j) = datascrapes(j, i).Value.ToString
End If
Next
If Not datascrapes.Rows(i).IsNewRow Then
sb.AppendLine(String.Join(delimeter, array))
End If
Next
File.WriteAllText(filePath, sb.ToString)
this is the csv file
I'm not sure where your problem is from looking at the CSV file, but there are certain cases where you'll want to quote the values for a CSV. There's no official spec, but RFC 4180 is often used as an unofficial standard. I would recommend using a library like CsvHelper.
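If you would rather not pull in a library, the quoting convention from RFC 4180 is small enough to sketch by hand: wrap a field in double quotes when it contains a comma, a double quote or a line break, and double any embedded double quotes. My guess is that the line breaks you insert for <br/> are what split your rows, since an unquoted line break ends the record. QuoteField and ToCsvLine below are names I made up; treat this as a sketch rather than a complete CSV writer.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports Microsoft.VisualBasic

Module CsvQuoteSketch
    ' Quotes a field when it contains a comma, a double quote or a line break,
    ' doubling any embedded double quotes (the RFC 4180 convention).
    Public Function QuoteField(value As String) As String
        If value Is Nothing Then Return ""
        If value.IndexOfAny(New Char() {","c, """"c, ControlChars.Cr, ControlChars.Lf}) >= 0 Then
            Return """" & value.Replace("""", """""") & """"
        End If
        Return value
    End Function

    ' Joins already-extracted field values into one CSV record.
    Public Function ToCsvLine(fields As IEnumerable(Of String)) As String
        Return String.Join(",", fields.Select(Function(f) QuoteField(f)))
    End Function

    Sub Main()
        ' Hypothetical values: a plain field, one with a comma, one with quotes, one with a line break
        Dim fields As String() = New String() {"Title", "a, b", "say ""hi""", "line1" & vbCrLf & "line2"}
        Console.WriteLine(ToCsvLine(fields))
    End Sub
End Module
```

With this in place you would no longer need to replace commas with semicolons in the description; in your export loop, the values placed into array(j) would go through QuoteField before String.Join.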
So, quite simple.
I am importing CSVs into a datagrid, but the CSV can have a variable number of columns.
For 3 Columns, I use this code:
Dim sr As New IO.StreamReader("E:\test.txt")
Dim dt As New DataTable
Dim newline() As String = sr.ReadLine.Split(";"c)
dt.Columns.AddRange({New DataColumn(newline(0)), _
New DataColumn(newline(1)), _
New DataColumn(newline(2))})
While (Not sr.EndOfStream)
newline = sr.ReadLine.Split(";"c)
Dim newrow As DataRow = dt.NewRow
newrow.ItemArray = {newline(0), newline(1), newline(2)}
dt.Rows.Add(newrow)
End While
DG1.DataSource = dt
This works perfectly. But how do I count the number of "newline"s ?
Can I issue a count on the number of newlines somehow? Any other example code doesn't issue column heads.
If my csv file has 5 columns, I would need an Addrange of 5 instead of 3 and so on..
Thanks in advance
Dim sr As New IO.StreamReader(path)
Dim dt As New DataTable
Dim newline() As String = sr.ReadLine.Split(","c)
' MsgBox(newline.Count)
' dt.Columns.AddRange({New DataColumn(newline(0)),
' New DataColumn(newline(1)),
' New DataColumn(newline(2))})
Dim i As Integer
For i = 0 To newline.Count - 1
dt.Columns.AddRange({New DataColumn(newline(i))})
Next
While (Not sr.EndOfStream)
newline = sr.ReadLine.Split(","c)
Dim newrow As DataRow = dt.NewRow
' Assign every field of the line, not a hard-coded number of them
newrow.ItemArray = newline
dt.Rows.Add(newrow)
End While
dgv.DataSource = dt
End Sub
Columns and item values can be added to a DataTable individually, using dt.Columns.Add and newrow.Item, so that these can be done in a loop instead of hard-coding for a specific number of columns. e.g. (this code assumes Option Infer On, so adjust as needed):
Public Function CsvToDataTable(csvName As String, Optional delimiter As Char = ","c) As DataTable
Dim dt = New DataTable()
For Each line In File.ReadLines(csvName)
If dt.Columns.Count = 0 Then
For Each part In line.Split({delimiter})
dt.Columns.Add(New DataColumn(part))
Next
Else
Dim row = dt.NewRow()
Dim parts = line.Split({delimiter})
For i = 0 To parts.Length - 1
row(i) = parts(i)
Next
dt.Rows.Add(row)
End If
Next
Return dt
End Function
You could then use it like:
Dim dt = CsvToDataTable("E:\test.txt", ";"c)
DG1.DataSource = dt
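One caveat with splitting each line on the delimiter: a quoted field that itself contains the delimiter gets cut in the middle. If your files can contain such fields, Microsoft.VisualBasic.FileIO.TextFieldParser handles the quoting for you. A sketch of the same idea built on it follows; CsvToDataTableQuoted is just a name I picked, and it assumes the first record holds unique column names.

```vb
Imports System
Imports System.Data
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvParserSketch
    ' Reads a delimited file into a DataTable, honouring fields enclosed in double quotes.
    Public Function CsvToDataTableQuoted(csvName As String, Optional delimiter As String = ",") As DataTable
        Dim dt As New DataTable()
        Using parser As New TextFieldParser(csvName)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(delimiter)
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                Dim parts As String() = parser.ReadFields()
                If dt.Columns.Count = 0 Then
                    ' First record: one column per header field
                    For Each part As String In parts
                        dt.Columns.Add(part)
                    Next
                Else
                    Dim row As DataRow = dt.NewRow()
                    ' Ignore any extra fields beyond the number of header columns
                    For i As Integer = 0 To Math.Min(parts.Length, dt.Columns.Count) - 1
                        row(i) = parts(i)
                    Next
                    dt.Rows.Add(row)
                End If
            End While
        End Using
        Return dt
    End Function

    Sub Main()
        ' Hypothetical sample file with a quoted field containing the delimiter
        Dim path As String = System.IO.Path.GetTempFileName()
        File.WriteAllLines(path, New String() {"Name;Note", "x;""a;b"""})
        Dim dt As DataTable = CsvToDataTableQuoted(path, ";")
        Console.WriteLine(dt.Rows(0)(1))
    End Sub
End Module
```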
Let me explain it with an Excel sheet. I have a few txt files in a directory (f.txt, d.txt, s.txt, a.txt and q.txt). Each file has a few lines of text. I want to combine those files in a specific way, as shown in the screenshot.
The output should be:
I've already written some code, but it doesn't work and I don't know why.
Dim fileEntries As String() = Directory.GetFiles("D:\dir\", "*.txt")
' Process the list of .txt files found in the directory. '
Dim i As Integer = 0
Dim filesCount As Integer = Directory.GetFiles("D:\dir\", "*.txt").Count
Do Until i = filesCount
'do it for every file in folder'
i = i + 1
Dim reader As New System.IO.StreamReader(fileEntries(i))
Dim files() As String = reader.ReadToEnd.Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
Dim lineCount = File.ReadAllLines(fileEntries(i)).Length
Dim w As Integer = 0
Dim dt As DataTable
dt.Columns.Add(i)
'add column "0" for file 1, "1" for file 2 etc.'
Do Until w = lineCount
dt.Rows.Add(files(w))
'write each line in file 1 to column 0, etc.'
w = w + 1
Loop
Loop
Can somebody help me?
Read/write
If your goal is as shown in the last image, write back to a file named output.txt, then this can be done in a single line of code.
My.Computer.FileSystem.WriteAllText("D:\dir\output.txt", String.Join(Environment.NewLine, (From path As String In Directory.GetFiles("D:\dir", "*.txt") Where IO.Path.GetFileNameWithoutExtension(path) <> "output" Select My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8))), False, Encoding.UTF8)
You can of course make this a bit more readable if you don't like one-liners.
My.Computer.FileSystem.WriteAllText(
"D:\dir\output.txt",
String.Join(
Environment.NewLine,
(
From
path As String
In
Directory.GetFiles("D:\dir", "*.txt")
Where
IO.Path.GetFileNameWithoutExtension(path) <> "output"
Select
My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8)
)
),
False,
Encoding.UTF8
)
Iterate
If you need to iterate each line and/or each file, store the result in a local variable.
Dim files As IEnumerable(Of String()) = (
From
path As String
In
Directory.GetFiles("D:\dir", "*.txt")
Select
My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8).Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
)
For Each file As String() In files
For Each line As String In file
Next
Next
DataSet
If you need to create a DataSet from the result, then take advantage of anonymous types. This way you can store both the name of the file and its lines.
Dim files = (
From
path As String
In
Directory.GetFiles("D:\dir", "*.txt")
Select
New With {
Key .Name = IO.Path.GetFileNameWithoutExtension(path),
.Lines = My.Computer.FileSystem.ReadAllText(path, Encoding.UTF8).Split({Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
}
)
Dim data As New DataSet()
With data
.BeginInit()
For Each item In files
With data.Tables.Add(item.Name)
.BeginInit()
.Columns.Add("Column1", GetType(String))
.EndInit()
.BeginLoadData()
For Each line As String In item.Lines
.Rows.Add(line)
Next
.EndLoadData()
End With
Next
.EndInit()
End With
There are a few problems in your code:
Your DataTable was never initialized (Dim dt As DataTable without New).
i is incremented before it is used as an index, so the first file is skipped and the last iteration runs past the end of fileEntries.
w can exceed the size of the files array, because lineCount (from File.ReadAllLines) counts empty lines while files has them removed.
Note: I use a DataSet to collect each DataTable; you can remove it if it's not required.
Try the following code:
Dim fileEntries As String() = Directory.GetFiles("C:\dir\", "*.txt")
Dim ds As New DataSet()
' Process the list of .txt files found in the directory.
For i As Integer = 0 To fileEntries.Length - 1
' File.ReadAllText opens, reads and closes the file, so no reader is left open
Dim files As String() = File.ReadAllText(fileEntries(i)).Split(New String() {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries)
Dim dt As New DataTable()
'add column "0" for file 1, "1" for file 2 etc.
dt.Columns.Add(i.ToString())
'write each line in file 1 to column 0, etc.
For Each line As String In files
dt.Rows.Add(line)
Next
ds.Tables.Add(dt)
Next
I have a CSV file that we extract from a core system. A recent development change to the system has made the CSV file contain duplicate column/field names.
So some fields appear more than once.
Creator First Name
Creator Last Name
Creator Salary Number
Revenue
Contact
Sales Team
Channel
We then upload this data via an SSIS/DTS package. The package won't run with duplicate field names, so we need to remove or rename the duplicate fields.
So I was thinking of creating a C# or VB script that renames duplicate field names by appending a postfix of 1 whenever a field name occurs more than once.
Is there an easy way of doing this through Microsoft.VisualBasic.FileIO.TextFieldParser or something similar? I'm not sure if someone has encountered a similar issue.
I have added a Script Task that runs the Visual Basic 2008 code below (WIP):
Public Sub Main()
'Check if file exist
Dim filename As String = Dts.Variables.Item("fileName").Value
If Not System.IO.File.Exists(filename) Then
Exit Sub
End If
Dim csvFileArray(1, 1) As String
Dim newCsvFileArray(1, 1) As String
Dim streamReader As StreamReader = File.OpenText(filename)
Dim strlines() As String
strlines = streamReader.ReadToEnd().Split(Environment.NewLine)
' Redimension the array.
Dim num_rows As Long
Dim num_cols As Long
'Dim counter As Integer = 0 '' >> Idea is to add this to the array as a 3rd dimension so we can count each field
Dim fieldCounter As Integer = 0
Dim strline() As String
num_rows = UBound(strlines)
strline = strlines(0).Split(",")
num_cols = UBound(strline)
ReDim csvFileArray(num_rows, num_cols)
ReDim newCsvFileArray(num_rows, num_cols)
' Copy the data into the array.
Dim fields As Integer
Dim rows As Integer
For rows = 0 To num_rows - 1
strline = strlines(rows).Split(",")
For fields = 0 To num_cols - 1
csvFileArray(rows, fields) = strline(fields)
Next
Next
Dim currentField As String = ""
Dim comparrisionField As String = ""
Dim newRows As Integer = 0
Dim newFields As Integer = 0
rows = 0
' Compare the current array to if they match anything in the new array
For rows = 0 To num_rows - 1
For fields = 0 To num_cols - 1
currentField = csvFileArray(rows, fields)
If rows = 0 Then
' If we are dealing with Fields i.e Row=0
For newFields = 0 To num_cols - 1
comparrisionField = newCsvFileArray(newRows, newFields)
If String.IsNullOrEmpty(currentField) Then
Else
If currentField.Equals(comparrisionField) Then
If currentField <> "" Then
fieldCounter = fieldCounter + 1
' if we have already added this column, then append a number
If fieldCounter >= 1 Then
currentField = currentField + " " + CStr(fieldCounter)
End If
End If
End If
End If
Next
Else
' This means we are dealing with the Rows i/e not row = 0
currentField = currentField
End If
newRows = 0
newFields = 0
fieldCounter = 0
' save currentField in the same layout as the initial file
newCsvFileArray(rows, fields) = currentField
Next
Next
' Amend the duplicate field names
Dim sw As New System.IO.StreamWriter(Left(filename, Len(filename) - Len(Right(filename, 4))) + "_amended.csv")
rows = 0
Dim currentLine As String = ""
For rows = 0 To num_rows - 1
For fields = 0 To num_cols - 1
' Save the data back to the filesystem
If fields = 0 Then
currentLine = newCsvFileArray(rows, fields)
Else
currentLine = currentLine + "," + newCsvFileArray(rows, fields)
End If
Next
If currentLine <> "" Then
sw.WriteLine(currentLine.Replace(vbCr, "").Replace(vbLf, ""))
End If
Next
sw.Close()
Dts.TaskResult = ScriptResults.Success
End Sub
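Since only the header row needs changing, the renaming idea can probably be expressed much more compactly than with the two 2-D arrays above. The following is a rough sketch, not tested inside an SSIS Script Task: it counts each header name in a dictionary and appends " 1", " 2", ... to the second and later occurrences, then writes all other lines back unchanged. RenameDuplicates and AmendFile are hypothetical names, the header is assumed to contain no quoted fields, and a header that already contains a literal name such as "Revenue 1" could still collide.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module HeaderDedupSketch
    ' Returns a copy of the header names in which the second and later occurrences
    ' of a name get a numeric suffix: Name, Name 1, Name 2, ...
    Public Function RenameDuplicates(headers As String()) As String()
        Dim seen As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        Dim result(headers.Length - 1) As String
        For i As Integer = 0 To headers.Length - 1
            Dim name As String = headers(i).Trim()
            Dim count As Integer = 0
            If seen.TryGetValue(name, count) Then
                result(i) = name & " " & count.ToString()
                seen(name) = count + 1
            Else
                result(i) = name
                seen(name) = 1
            End If
        Next
        Return result
    End Function

    ' Rewrites only the first line of the file; data rows are copied as they are.
    Public Sub AmendFile(inPath As String, outPath As String)
        Dim lines As String() = File.ReadAllLines(inPath)
        If lines.Length = 0 Then Return
        lines(0) = String.Join(",", RenameDuplicates(lines(0).Split(","c)))
        File.WriteAllLines(outPath, lines)
    End Sub

    Sub Main()
        ' Hypothetical header in which two of the field names are repeated
        Dim headers As String() = New String() {"Creator First Name", "Revenue", "Contact", "Revenue", "Contact", "Revenue"}
        Console.WriteLine(String.Join(",", RenameDuplicates(headers)))
    End Sub
End Module
```

In the Script Task, AmendFile could be called with the fileName variable and the "_amended.csv" path that the code above already builds.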