CSV duplicate field names - vb.net

I have a CSV file that we extract from a core system. A recent development change to the system has made the CSV file data contain duplicate column/field names.
So the fields appear more than once.
Creator First Name
Creator Last Name
Creator Salary Number
Revenue
Contact
Sales Team
Channel
We then upload this data via an SSIS/DTS package. The package won't run with duplicate field names, so we need to remove or rename the duplicate fields.
I was thinking of creating a C# or VB script that renames the duplicate field names by appending a postfix of 1 when a field name occurs more than once.
Is there an easy way of doing this with Microsoft.VisualBasic.FileIO.TextFieldParser or something similar? Has anyone encountered a similar issue?
I have added a Script Task that runs the Visual Basic 2008 code below (work in progress):
Public Sub Main()
'Check if file exist
Dim filename As String = Dts.Variables.Item("fileName").Value
If Not System.IO.File.Exists(filename) Then
Exit Sub
End If
Dim csvFileArray(1, 1) As String
Dim newCsvFileArray(1, 1) As String
Dim streamReader As StreamReader = File.OpenText(filename)
Dim strlines() As String
strlines = streamReader.ReadToEnd().Split(Environment.NewLine)
' Redimension the array.
Dim num_rows As Long
Dim num_cols As Long
'Dim counter As Integer = 0 '' >> Idea is to add this to the array as a 3rd dimension so we can count each field
Dim fieldCounter As Integer = 0
Dim strline() As String
num_rows = UBound(strlines)
strline = strlines(0).Split(",")
num_cols = UBound(strline)
ReDim csvFileArray(num_rows, num_cols)
ReDim newCsvFileArray(num_rows, num_cols)
' Copy the data into the array.
Dim fields As Integer
Dim rows As Integer
For rows = 0 To num_rows - 1
strline = strlines(rows).Split(",")
For fields = 0 To num_cols - 1
csvFileArray(rows, fields) = strline(fields)
Next
Next
Dim currentField As String = ""
Dim comparrisionField As String = ""
Dim newRows As Integer = 0
Dim newFields As Integer = 0
rows = 0
' Compare the current array to if they match anything in the new array
For rows = 0 To num_rows - 1
For fields = 0 To num_cols - 1
currentField = csvFileArray(rows, fields)
If rows = 0 Then
' If we are dealing with Fields i.e Row=0
For newFields = 0 To num_cols - 1
comparrisionField = newCsvFileArray(newRows, newFields)
If String.IsNullOrEmpty(currentField) Then
Else
If currentField.Equals(comparrisionField) Then
If currentField <> "" Then
fieldCounter = fieldCounter + 1
' if we have already added this column, then append a number
If fieldCounter >= 1 Then
currentField = currentField + " " + CStr(fieldCounter)
End If
End If
End If
End If
Next
Else
' This means we are dealing with the Rows i/e not row = 0
currentField = currentField
End If
newRows = 0
newFields = 0
fieldCounter = 0
' save currentField in the same layout as the initial file
newCsvFileArray(rows, fields) = currentField
Next
Next
' Amend the duplicate field names
Dim sw As New System.IO.StreamWriter(Left(filename, Len(filename) - Len(Right(filename, 4))) + "_amended.csv")
rows = 0
Dim currentLine As String = ""
For rows = 0 To num_rows - 1
For fields = 0 To num_cols - 1
' Save the data back to the filesystem
If fields = 0 Then
currentLine = newCsvFileArray(rows, fields)
Else
currentLine = currentLine + "," + newCsvFileArray(rows, fields)
End If
Next
If currentLine <> "" Then
sw.WriteLine(currentLine.Replace(vbCr, "").Replace(vbLf, ""))
End If
Next
sw.Close()
Dts.TaskResult = ScriptResults.Success
End Sub
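One possible way to do the renaming, as a minimal sketch rather than a drop-in fix for the Script Task: keep a dictionary of how often each header name has been seen and append that count to every repeat. The module and function names (HeaderDeduplication, RenameDuplicateHeaders) are made up for illustration, the comparison is assumed to be case-insensitive, and the header line is assumed to have been split into fields already (for example with TextFieldParser.ReadFields).

```vbnet
Imports System
Imports System.Collections.Generic

Module HeaderDeduplication

    ' Returns a copy of the header names in which the 2nd, 3rd, ... occurrence
    ' of a name gets " 1", " 2", ... appended. The first occurrence is unchanged.
    ' Assumption: names are compared case-insensitively after trimming.
    Public Function RenameDuplicateHeaders(ByVal headers As String()) As String()
        Dim seen As New Dictionary(Of String, Integer)(StringComparer.OrdinalIgnoreCase)
        Dim result(headers.Length - 1) As String

        For i As Integer = 0 To headers.Length - 1
            Dim name As String = headers(i).Trim()
            Dim count As Integer = 0
            If seen.TryGetValue(name, count) Then
                ' Seen before: append the number of earlier occurrences.
                result(i) = name & " " & count.ToString()
                seen(name) = count + 1
            Else
                result(i) = name
                seen(name) = 1
            End If
        Next

        Return result
    End Function

    Sub Main()
        Dim headers As String() = {"Creator First Name", "Revenue", "Contact", "Revenue", "Revenue"}
        ' Prints: Creator First Name,Revenue,Contact,Revenue 1,Revenue 2
        Console.WriteLine(String.Join(",", RenameDuplicateHeaders(headers)))
    End Sub

End Module
```

Only the first line of the file needs this treatment; the data rows can be copied to the output unchanged. The sketch does not check whether a generated name such as "Revenue 1" already exists as a real column, so that case would need extra handling.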

Related

Cannot create a csv file from a datatable and dataview

I have code that produces a CSV file (with headers) containing fields that identify pixel locations of points of interest in selected images. Unfortunately, there are occasional duplicates that cause issues later in my production process. My plan to solve this is to open the CSV file in the midst of the application, read the data into a DataTable, and sort the DataTable with a DataView using 4 sort keys. I am using the default names (Column1, Column2, etc.) because I cannot figure out how to use the real column header names contained in the first row. I have attempted to write the resulting rows to a new file, and I get the correct number of rows in the output CSV file, but the data in each row is missing: I get the text string 'System.Data.DataRowView' instead of the 12 fields I was expecting.
In the code, my most recent attempt is to create a WriteOutPutLine containing the concatenated contents of the row, in my desired sort order.
I have attempted to use CsvHelper with no success, and have also tried about 10 other code samples posted on various websites.
Here is my code:
Sub New_Csv_Sort()
Dim linecount As Integer = 1
Dim dt As DataTable = New DataTable
Dim dgv As DataGridView = New DataGridView
Dim TextLine As String = ""
Dim SplitLine() As String
Dim objReader As New System.IO.StreamReader(TargetFileName, Encoding.ASCII)
Dim ColumnCount As Integer = 0
For ColumnCount = 1 To 13 Step 1
dt.Columns.Add()
Next
Dim firstline As Boolean = True
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
SplitLine = Split(TextLine, ",")
dt.Rows.Add(SplitLine)
If FirstTime = True Then
' Here I was trying to use the first line to set the column labels but no joy
' For ColumnCount = 1 To 12 Step 1
' dt.datacolumn.
' Next
End If
Loop
' The next line is commented out of the sort statement because I could not get the column names to be recognised
'.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc, ASC"
Dim dv As New DataView(dt) With {
.Sort = "Column5 ASC, Column1 ASC, Column12 ASC, Column11 ASC"
}
FileOpen(SortedTargetFileNumber, SortedTargetFileName, OpenMode.Output)
'I have looked at the data and it is read in correctly
'But my output line is trash!
For Each Rowview As DataRowView In dv
'WriteOutPutLine = dv.row
For Each DataColumn In dt.Columns
WriteOutPutLine = WriteOutPutLine & Row[col].tostring() & ","
Next
WriteOutPutLine = Rowview.ToString
PrintLine(SortedTargetFileNumber, WriteOutPutLine)
linecount = linecount + 1
Next
FileClose(SortedTargetFileNumber)
MsgBox("Sorted Lines = " & linecount)
End Sub
Sub New_Csv_Sort()
Dim dt As DataTable = New DataTable
Dim TextLine As String = ""
Dim SplitLine() As String
Dim objReader As New System.IO.StreamReader(TargetFileName, Encoding.ASCII)
Dim ColumnCount As Integer = 0
'Create columns for the data -
For ColumnCount = 1 To 13 Step 1
dt.Columns.Add()
Next
'Load the datatable
Do While objReader.Peek() <> -1
TextLine = objReader.ReadLine()
SplitLine = Split(TextLine, ",")
dt.Rows.Add(SplitLine)
Loop
'This is the sort order based on the header titles, but it is translated to the default column names
'.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc ASC"
Dim dv As New DataView(dt) With {
.Sort = "Column5 ASC, Column1 ASC, Column12 ASC, Column11 ASC"
}
'Open the output file
FileOpen(SortedTargetFileNumber, SortedTargetFileName, OpenMode.Output)
'Create a string of all the rows in the dataview separated by commas
Dim DataLine As String = ""
For Each r As DataRowView In dv
DataLine &= String.Join(",", r.Row.ItemArray) & Environment.NewLine
Next
' Write the data to a file
WriteOutPutLine = DataLine
PrintLine(SortedTargetFileNumber, WriteOutPutLine)
FileClose(SortedTargetFileNumber)
' and we are done
End Sub
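On the part about using the real header names: one way, sketched here under assumptions, is to create the DataTable columns from the first line instead of calling dt.Columns.Add() with no name. LoadCsvWithHeaders and the sample data are hypothetical; the sketch assumes no quoted fields, unique non-empty header names, and no row with more fields than the header has.

```vbnet
Imports System
Imports System.Data
Imports System.IO

Module CsvSortSketch

    ' Reads comma-separated text into a DataTable, naming the columns from the first line.
    ' Assumes no quoted fields and unique, non-empty header names.
    Public Function LoadCsvWithHeaders(ByVal reader As TextReader) As DataTable
        Dim dt As New DataTable()
        Dim line As String = reader.ReadLine()
        If line Is Nothing Then Return dt

        ' First line: one named column per header field.
        For Each header As String In line.Split(","c)
            dt.Columns.Add(header.Trim())
        Next

        ' Remaining lines: one row each, skipping empty lines.
        line = reader.ReadLine()
        While line IsNot Nothing
            If line.Length > 0 Then dt.Rows.Add(line.Split(","c))
            line = reader.ReadLine()
        End While
        Return dt
    End Function

    Sub Main()
        ' Made-up sample data using the header names from the question.
        Dim csv As String = "Picture,Datum,X_Loc,Y_Loc" & Environment.NewLine &
                            "img2,A,5,1" & Environment.NewLine &
                            "img1,A,3,2"
        Dim dt As DataTable = LoadCsvWithHeaders(New StringReader(csv))
        Dim dv As New DataView(dt) With {.Sort = "Datum ASC, Picture ASC, Y_Loc ASC, X_Loc ASC"}
        For Each rv As DataRowView In dv
            Console.WriteLine(String.Join(",", rv.Row.ItemArray))
        Next
    End Sub

End Module
```

Columns added this way hold strings, so the sort is textual ("10" sorts before "9"). If the pixel locations need to sort numerically, the columns would have to be created with a numeric type and the values converted while loading.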
Try this function:
Private Function ConvertToCSV(ByVal dt As DataTable) As String
    Dim sb As New Text.StringBuilder()
    For Each row As DataRow In dt.Rows
        ' Escape backslashes first so the backslashes added by the later replacements are not doubled.
        sb.AppendLine(String.Join(",", (From i As Object In row.ItemArray Select i.ToString().Replace("\", "\\").Replace("""", """""").Replace(",", "\,").Replace(Environment.NewLine, "\" & Environment.NewLine)).ToArray()))
    Next
    Return sb.ToString()
End Function

Adding rows to a DataGridView control causes crash but only on second attempt

I have a DataGridView (dgvNew) that is populated from a JSON file picked up by a FileSystemWatcher; the data is added row by row after being read. It works fine on the first file. But if I trigger a new file by copying and pasting the same JSON file, it adds the rows again row by row as I'd expect, and then the whole form crashes with no error.
I've tried Try...Catch with While loops for opening the files, which works in terms of opening them and adding rows; I just don't understand why it crashes. The code continues to step through even though the form is frozen. Is it thread related?
Public Sub subParseJSONs(strFilePath As String, strDesiredField As String)
Dim json As String
Dim strMachine As String
Dim read As New Newtonsoft.Json.Linq.JObject
Dim booErrorJSNOArrRead As Boolean
Dim i As Integer
Dim dgvIndex As Integer
Dim booOpened As Boolean
Dim k As Integer, j As Integer
booOpened = False
k = 1
j = 1
json = Nothing
While json Is Nothing
Try
j = j + 1
If j = 10 Then
MessageBox.Show("J integer reached 10")
Exit While
Exit Try
End If
json = Replace(Replace(System.IO.File.ReadAllText(strFilePath), vbLf, ""), vbTab, "")
read = Newtonsoft.Json.Linq.JObject.Parse(json)
Catch ex As IOException
'MessageBox.Show(ex.Message)
Threading.Thread.Sleep(300)
'GoTo EndOfSUb
Catch ex As Exception
'MessageBox.Show(ex.Message)
Threading.Thread.Sleep(300)
'GoTo EndOfSUb
Finally
booOpened = True
End Try
End While
booErrorJSNOArrRead = False
i = 0
dgvNew.ColumnCount = 6
dgvNew.Columns(0).Name = "TempID"
dgvNew.Columns(1).Name = "DriverName"
dgvNew.Columns(2).Name = "Seat"
dgvNew.Columns(3).Name = "RaceTime"
dgvNew.Columns(4).Name = "ResultTime"
dgvNew.Columns(5).Name = "CarDriven"
dgvNew.RefreshEdit()
dgvNew.Refresh()
Do Until i = read.Item("Result").Count
If Not read.Item("Result")(i)("DriverName") = "" Then
Dim milliseconds As Double = Convert.ToDouble(read.Item("Result")(i)("TotalTime"))
Dim ts As TimeSpan = TimeSpan.FromMilliseconds(milliseconds)
Dim strMMSSmmm As String = ts.Minutes.ToString & ":" & ts.Seconds.ToString & "." & ts.Milliseconds.ToString
Dim row As String() = New String() {i + 1,
read.Item("Result")(i)("DriverName"),
read.Item("Result")(i)("DriverName"),
strMMSSmmm,
DateTime.Now, read.Item("Result")(i)("CarModel")}
dgvNew.Rows.Add(row)
End If
i = i + 1
Loop
read = Nothing
End Sub
I'm expecting new rows to be added to the bottom of dgvNew, which they are, but then it crashes.
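It may well be thread related. Unless its SynchronizingObject property is set, a FileSystemWatcher raises its events on a thread-pool thread, and touching a DataGridView from a thread other than the UI thread can freeze or crash a form without a useful error. The following is only a sketch of marshalling the update back to the UI thread with Control.Invoke; the class WatcherForm, the handler OnCreated and the helper AddRowSafe are hypothetical, and the question does not show how the watcher is actually wired up.

```vbnet
Imports System
Imports System.IO
Imports System.Windows.Forms

Public Class WatcherForm
    Inherits Form

    Private ReadOnly dgvNew As New DataGridView() With {.Dock = DockStyle.Fill}
    Private ReadOnly watcher As New FileSystemWatcher()

    ' folder must be an existing directory (hypothetical parameter).
    Public Sub New(ByVal folder As String)
        Controls.Add(dgvNew)
        dgvNew.ColumnCount = 1
        watcher.Path = folder
        watcher.Filter = "*.json"
        AddHandler watcher.Created, AddressOf OnCreated
    End Sub

    ' Start watching only once the form (and its window handle) exists.
    Protected Overrides Sub OnShown(ByVal e As EventArgs)
        MyBase.OnShown(e)
        watcher.EnableRaisingEvents = True
    End Sub

    ' Runs on a thread-pool thread because SynchronizingObject is not set here.
    Private Sub OnCreated(ByVal sender As Object, ByVal e As FileSystemEventArgs)
        AddRowSafe(e.FullPath)
    End Sub

    ' Adds a row on the UI thread, re-invoking itself there when called from another thread.
    Private Sub AddRowSafe(ByVal text As String)
        If dgvNew.InvokeRequired Then
            dgvNew.Invoke(New Action(Of String)(AddressOf AddRowSafe), text)
        Else
            dgvNew.Rows.Add(text)
        End If
    End Sub
End Class
```

An alternative with the same effect is to set watcher.SynchronizingObject to the form, so the events are raised on the UI thread in the first place. Either way, the file reading and JSON parsing can stay on the background thread; only the calls that touch dgvNew need to be marshalled.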

vb.net how do i add long text into csv

Hello, this is my first thread.
I'm trying to extract the description from this page (https://www.tokopedia.com/indoislamicstore/cream-zaitun-arofah)
with a regex, replace each <br/> tag with a new line, and write the result to a CSV file.
The DataGridView looks all right, but the CSV comes out broken.
This is my code:
Dim dskrip As New System.Text.RegularExpressions.Regex("<p itemprop=""description"" class=""mt-20"">(.*?)\<\/p>\<\/div>")
Dim dskripm As MatchCollection = dskrip.Matches(rssourcecode0)
For Each itemdskrm As Match In dskripm
getdeskripsinew = itemdskrm.Groups(1).Value
Next
Dim deskripsinew As String = Replace(getdeskripsinew, ",", ";")
Dim deskripsitotal As String = Replace(deskripsinew, "<br/>", Environment.NewLine)
' ListView1.s = Environment.NewLine & deskripsinew
txtDeskripsi.Text = deskripsitotal
datascrapes.ColumnCount = 5
datascrapes.Columns(0).Name = "Title"
datascrapes.Columns(1).Name = "Price"
datascrapes.Columns(2).Name = "Deskripsi"
datascrapes.Columns(3).Name = "Gambar"
datascrapes.Columns(4).Name = "Total Produk"
Dim row As String() = New String() {getname, totalprice, deskripsitotal, directoryme + getfilename, "10"}
datascrapes.Rows.Add(row)
Dim filePath As String = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) & "\" & "Tokopedia_Upload.csv"
Dim delimeter As String = ","
Dim sb As New StringBuilder
For i As Integer = 0 To datascrapes.Rows.Count - 1
Dim array As String() = New String(datascrapes.Columns.Count - 1) {}
If i.Equals(0) Then
For j As Integer = 0 To datascrapes.Columns.Count - 1
array(j) = datascrapes.Columns(j).HeaderText
Next
sb.AppendLine(String.Join(delimeter, array))
End If
For j As Integer = 0 To datascrapes.Columns.Count - 1
If Not datascrapes.Rows(i).IsNewRow Then
array(j) = datascrapes(j, i).Value.ToString
End If
Next
If Not datascrapes.Rows(i).IsNewRow Then
sb.AppendLine(String.Join(delimeter, array))
End If
Next
File.WriteAllText(filePath, sb.ToString)
This is the CSV file:
I'm not sure from looking at the CSV file where your problem is, but there are certain cases where you'll want to quote the values for a CSV. There's no official spec, but RFC 4180 is often used as an unofficial standard. I would recommend using a library like CsvHelper.
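If a library is not an option, the quoting convention described in RFC 4180 is small enough to sketch by hand: wrap a value in double quotes when it contains a comma, a double quote or a line break, and double any quotes inside it. QuoteCsvField is a hypothetical helper name and the sample values are made up; this is an illustration of the convention, not a full CSV writer.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CsvQuoting

    ' Quotes a value when it contains a comma, a double quote, or a line break,
    ' doubling any embedded double quotes (the convention described in RFC 4180).
    Public Function QuoteCsvField(ByVal value As String) As String
        If value Is Nothing Then Return ""
        If value.IndexOfAny(New Char() {","c, """"c, ControlChars.Cr, ControlChars.Lf}) >= 0 Then
            Return """" & value.Replace("""", """""") & """"
        End If
        Return value
    End Function

    Sub Main()
        ' Made-up row: the second field spans two lines and contains a comma.
        Dim fields As String() = {"Sample product", "Line one" & Environment.NewLine & "Line two, with comma", "10"}
        Dim quoted(fields.Length - 1) As String
        For i As Integer = 0 To fields.Length - 1
            quoted(i) = QuoteCsvField(fields(i))
        Next
        Console.WriteLine(String.Join(",", quoted))
    End Sub

End Module
```

With quoting in place, commas in the description no longer need to be replaced by semicolons, and a multi-line description stays inside a single field for readers that follow the same convention.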

only read certain columns from a csv

I have a CSV file that has extra commas in it. I know I won't ever need anything after a specific column: any information after column 12 can be ignored. I don't have a say in how the CSV looks when it gets to me, so I can't change it at the source. Is there a way to read just the first 12 columns and ignore the rest of each line?
This is what the code looks like now.
Thank you for any help.
Private Sub GetData(ByVal Path As String, ByRef DG As DataGridView, Optional ByVal NoHeader As Boolean = False)
Dim Fields(100) As String
Dim Start As Integer = 1
If NoHeader Then Start = 0
If Not File.Exists(Path) Then
Return
End If
Dim Lines() As String = File.ReadAllLines(Path)
Lines(0) = Lines(0).Replace(Chr(34), "")
Fields = Lines(0).Split(",")
If NoHeader Then
For I = 1 To Fields.Count - 1
Fields(I) = Str(I)
Next
End If
dt = New DataTable()
For Each Header As String In Fields
dt.Columns.Add(New DataColumn(Header.Trim()))
Dim desiredSize As Integer = 11
While dt.Columns.Count > desiredSize
dt.Columns.RemoveAt(desiredSize)
End While
Next
For I = Start To Lines.Count - 1
Lines(I) = Lines(I).Replace(Chr(34), "")
Fields = Lines(I).Split(",")
Dim dr As DataRow = dt.Rows.Add()
For j = 0 To Fields.Count - 1
dr(j) = Fields(j).Trim()
Next
Next
DG.DataSource = dt
End Sub
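Really all you need to do is change the For loop at the bottom that iterates through Fields: replace For j = 0 To Fields.Count - 1 with For j = 0 To 11, so that only the first 12 fields are copied. Note that the table must then have 12 columns, so desiredSize needs to be 12 rather than 11, and every line must contain at least 12 fields.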
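A slightly more defensive variant, sketched under the assumption that fields never contain quoted commas, is to cap the number of fields with Math.Min so that short lines do not cause an index error. TakeFirstFields is a hypothetical helper name.

```vbnet
Imports System

Module FirstColumns

    ' Returns at most maxFields leading fields of a comma-separated line.
    ' Assumes fields contain no quoted commas.
    Public Function TakeFirstFields(ByVal line As String, ByVal maxFields As Integer) As String()
        Dim all As String() = line.Split(","c)
        Dim n As Integer = Math.Min(maxFields, all.Length)
        Dim result(n - 1) As String
        Array.Copy(all, result, n)
        Return result
    End Function

    Sub Main()
        ' Trailing commas produce extra empty fields, which are dropped here.
        Dim fields As String() = TakeFirstFields("a,b,c,,,", 3)
        Console.WriteLine(String.Join("|", fields)) ' prints a|b|c
    End Sub

End Module
```

In GetData the same idea would mean looping up to Math.Min(Fields.Length, dt.Columns.Count) - 1 for each line instead of using a fixed upper bound.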

vb.net - Why do I get this error when trying to bubble sort a csv file?

I have a CSV file which I'm trying to sort by date (stored in numerical form).
The csv file:
date, name, phone number, instructor name
1308290930,jim,041231232,sushi
123123423,jeremy,12312312,albert
The error I get is: Conversion from string "jeremy" to type 'Double' is not valid
even though I don't mention Double anywhere in my code...
My code:
Public Class Form2
Dim currentRow As String()
Dim count As Integer
Dim one As Integer
Dim two As Integer
Dim three As Integer
Dim four As Integer
'concatenation / and operator
'casting
Dim catchit(100) As String
Dim count2 As Integer
Dim arrayone(4) As Decimal
Dim arraytwo(4) As String
Dim arraythree(4) As Decimal
Dim arrayfour(4) As String
Dim array(4) As String
Dim bigstring As String
Dim builder As Integer
Dim twodata As Integer
Private Sub Form2_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("D:\completerecord.txt")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
Dim count As Integer
Dim currentField As String
count = 0
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
For Each currentField In currentRow
' makes one array to contain a record for each peice of text in the file
'MsgBox(currentField) '- test of Field Data
' builds a big string with new line-breaks for each line in the file
bigstring = bigstring & currentField + Environment.NewLine
'build two arrays for the two columns of data
If (count Mod 2 = 1) Then
arraytwo(two) = currentField
two = two + 1
'MsgBox(currentField)
ElseIf (count Mod 2 = 0) Then
arrayone(one) = currentField
one = one + 1
ElseIf (count Mod 2 = 2) Then
arraythree(three) = currentField
three = three + 1
ElseIf (count Mod 2 = 3) Then
arrayfour(four) = currentField
four = four + 1
End If
count = count + 1
'MsgBox(count)
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Error Occured, Please contact Admin.")
End Try
End While
End Using
RichTextBox1.Text = bigstring
' MsgBox("test")
End Sub
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim NoMoreSwaps As Boolean
Dim counter As Integer
Dim Temp As Integer
Dim Temp2 As String
Dim listcount As Integer
Dim builder As Integer
Dim bigString2 As String = ""
listcount = UBound(arraytwo)
'MsgBox(listcount)
builder = 0
'bigString2 = ""
counter = 0
Try
'this should sort the arrays using a Bubble Sort
Do Until NoMoreSwaps = True
NoMoreSwaps = True
For counter = 0 To (listcount - 1)
If arraytwo(counter) > arraytwo(counter + 1) Then
NoMoreSwaps = False
If arraytwo(counter + 1) > 0 Then
Temp = arraytwo(counter)
Temp2 = arrayone(counter)
arraytwo(counter) = arraytwo(counter + 1)
arrayone(counter) = arrayone(counter + 1)
arraytwo(counter + 1) = Temp
arrayone(counter + 1) = Temp2
End If
End If
Next
If listcount > -1 Then
listcount = listcount - 1
End If
Loop
'now we need to output arrays to the richtextbox first we will build a new string
'and we can save it to a new sorted file
Dim FILE_NAME As String = "D:\sorted.txt"
'Location of file^ that the new data will be saved to
If System.IO.File.Exists(FILE_NAME) = True Then
Dim objWriter As New System.IO.StreamWriter(FILE_NAME, True)
'If D:\sorted.txt exists then enable it to be written to
While builder < listcount
bigString2 = bigString2 & arraytwo(builder) & "," & arrayone(builder) + Environment.NewLine
objWriter.Write(arraytwo(builder) & "," & arrayone(builder) + Environment.NewLine)
builder = builder + 1
End While
RichTextBox2.Text = bigString2
objWriter.Close()
MsgBox("Text written to log file")
Else
MsgBox("File Does Not Exist")
End If
Catch ex As Exception
MsgBox(ex.Message)
End Try
End Sub
End Class
I think the problem is in this line:
arrayone(one) = currentField
Here it tries to convert the string to a number implicitly. You have to use something like this:
arrayone(one) = Double.Parse(currentField)
or, to do it in a safer way:
Dim dbl As Double
If Double.TryParse(currentField, dbl) Then
    arrayone(one) = dbl
Else
    arrayone(one) = 0.0
End If
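It may also be worth checking the sort itself. As far as I can tell, with Option Strict Off a comparison between a String and a number, such as arraytwo(counter + 1) > 0, converts the String to Double as well, which would give the same message for "jeremy". A way around both problems, sketched here with hypothetical names (BubbleSortByKey, keys, names), is to parse the numeric column once and then bubble sort on the parsed values while applying the same swaps to the text column.

```vbnet
Imports System
Imports System.Globalization

Module BubbleSortSketch

    ' Sorts keys ascending with a bubble sort, applying the same swaps to names
    ' so that each name stays paired with its key.
    Public Sub BubbleSortByKey(ByVal keys As Decimal(), ByVal names As String())
        Dim last As Integer = keys.Length - 1
        Dim swapped As Boolean = True
        While swapped
            swapped = False
            For i As Integer = 0 To last - 1
                If keys(i) > keys(i + 1) Then
                    Dim tmpKey As Decimal = keys(i)
                    keys(i) = keys(i + 1)
                    keys(i + 1) = tmpKey
                    Dim tmpName As String = names(i)
                    names(i) = names(i + 1)
                    names(i + 1) = tmpName
                    swapped = True
                End If
            Next
            ' The largest remaining key is now in place, so shorten the pass.
            last -= 1
        End While
    End Sub

    Sub Main()
        ' Values taken from the sample file in the question.
        Dim rawDates As String() = {"1308290930", "123123423"}
        Dim names As String() = {"jim", "jeremy"}
        Dim keys(rawDates.Length - 1) As Decimal
        For i As Integer = 0 To rawDates.Length - 1
            ' TryParse leaves 0 in keys(i) when the text is not numeric.
            Decimal.TryParse(rawDates(i), NumberStyles.Integer, CultureInfo.InvariantCulture, keys(i))
        Next
        BubbleSortByKey(keys, names)
        For i As Integer = 0 To keys.Length - 1
            Console.WriteLine(keys(i).ToString() & "," & names(i))
        Next
    End Sub

End Module
```

Because the comparison only ever sees Decimal values, no implicit String-to-number conversion happens during the sort. Turning on Option Strict would also make the compiler flag the implicit conversions in the original code.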