How to create a CSV string from a DataTable in VB.NET? - vb.net

I'm fetching data from a database as a DataTable and need to convert it into a CSV string in VB.NET.

Create a generic method that takes the DataTable, the CSV headers, and the DataTable column names as parameters:
Private Function CSVBuilder(dt As DataTable, headers As List(Of String), columns As List(Of String)) As String
Dim sCSV = New StringBuilder(String.Join(",", headers))
sCSV.Append(Environment.NewLine)
Dim view As New DataView(dt)
Dim tDt As DataTable = view.ToTable(True, columns.ToArray)
For Each row As DataRow In tDt.Rows
'-- Quote values that contain a comma (embedded quotes and line breaks are not handled here)
sCSV.Append(String.Join(",", (From rw In row.ItemArray Select If(rw.ToString.Trim.Contains(","), String.Format("""{0}""", rw.ToString.Trim), rw.ToString.Trim))))
sCSV.Append(Environment.NewLine)
Next
Return sCSV.ToString
End Function
And then call it in your code to get the CSV string:
CSVBuilder(dataTable,
New List(Of String) From {"Header Column 1", "Header Column 2", ...},
New List(Of String) From {"DataTableColumn1", "DataTableColumn2", ...})
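The row-building line above only quotes values that contain a comma. A minimal sketch of a fuller escaping rule follows, assuming the usual CSV convention that a field containing a comma, a double quote, or a line break is wrapped in quotes and embedded quotes are doubled. EscapeCsvField and JoinCsvLine are hypothetical helper names, not part of the original answer, and unlike the original they do not trim values.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module CsvEscaping
    ' Hypothetical helper: returns the field unchanged unless it contains a
    ' comma, a double quote, or a line break; in that case the field is wrapped
    ' in quotes and every embedded quote is doubled.
    Public Function EscapeCsvField(value As Object) As String
        Dim text As String = If(value Is Nothing OrElse value Is DBNull.Value, "", value.ToString())
        Dim special As Char() = {","c, """"c, Convert.ToChar(13), Convert.ToChar(10)}
        If text.IndexOfAny(special) >= 0 Then
            Return """" & text.Replace("""", """""") & """"
        End If
        Return text
    End Function

    ' Hypothetical helper: joins one row's values into a single CSV line.
    Public Function JoinCsvLine(values As IEnumerable(Of Object)) As String
        Return String.Join(",", values.Select(Function(v) EscapeCsvField(v)))
    End Function
End Module
```

Inside the loop, sCSV.AppendLine(JoinCsvLine(row.ItemArray)) could then replace the inline String.Join expression.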

In response to the comment, since this wouldn't fit in that space:
Private Function CSVBuilder(dt As DataTable) As String
Dim sCSV As New StringBuilder()
'Headers
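Dim delimiter As String = ""
'DataColumnCollection is not generic, so Cast is needed before Select
For Each name As String In dt.Columns.Cast(Of DataColumn)().Select(Function(c) c.ColumnName)
'Quote header names that contain a comma
Dim header As String = If(name.Contains(","), """" & name & """", name)
sCSV.Append(delimiter).Append(header)
delimiter = ","
Next
sCSV.AppendLine()
'Rows (the tDt projection is gone, so iterate dt directly)
For Each row As DataRow In dt.Rows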
sCSV.AppendLine(String.Join(",", (From rw In row.ItemArray Select If(rw.ToString.Trim.Contains(","), String.Format("""{0}""", rw.ToString.Trim), rw.ToString.Trim))))
Next
Return sCSV.ToString
End Function
Now, I did remove this code:
Dim view As New DataView(dt)
Dim tDt As DataTable = view.ToTable(True, columns.ToArray)
But I wouldn't do this as part of the CSVBuilder() method. If you want to project a specific view of a table, I would do that separately from creating the CSV data. You could make a separate method for it:
Public Function GetProjection(dt As DataTable, columns As IEnumerable(Of String)) As DataTable
Dim view As New DataView(dt)
Return view.ToTable(True, columns.ToArray())
End Function
And then you call them together like this:
Dim dt As DataTable = '.... original table here
Dim columns() As String = '... the columns you want
Dim csv As String = CSVBuilder(GetProjection(dt, columns))
or like this:
Dim dt As DataTable = '.... original table here
Dim columns() As String = '... the columns you want
Dim dt1 = GetProjection(dt, columns)
Dim csv As String = CSVBuilder(dt1)
This is called function composition, and it's a good thing to do.
Finally, I'll repeat my suggestion to think in terms of writing to a stream. Long strings with repeated append operations can cause real problems for the .NET garbage collector. Using StringBuilder can help, but won't fully eliminate these problems. Writing to a Stream, which is often connected to a file on disk, gives you the opportunity to completely eliminate this issue. Plus, it will likely save you work later on.
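To illustrate the stream suggestion, here is a minimal sketch that writes each line to a TextWriter as it is produced instead of accumulating one large string. WriteCsv and Quote are hypothetical names, and the quoting is the same comma-only rule used in the code above.

```vbnet
Imports System
Imports System.Data
Imports System.IO
Imports System.Linq

Module CsvStreaming
    ' Hypothetical variant of CSVBuilder: writes the header line and then one
    ' line per row to the supplied writer, so no large string is built up.
    Public Sub WriteCsv(dt As DataTable, writer As TextWriter)
        Dim headers = dt.Columns.Cast(Of DataColumn)().Select(Function(c) Quote(c.ColumnName))
        writer.WriteLine(String.Join(",", headers))
        For Each row As DataRow In dt.Rows
            Dim fields = row.ItemArray.Select(Function(v) Quote(v.ToString().Trim()))
            writer.WriteLine(String.Join(",", fields))
        Next
    End Sub

    ' Comma-only quoting, matching the rule in the original answer.
    Private Function Quote(text As String) As String
        Return If(text.Contains(","), """" & text & """", text)
    End Function
End Module
```

Passing a StreamWriter opened on a file sends the rows straight to disk; passing a StringWriter still produces a string for callers that need one.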

Related

get column names Jet OLE DB in vb.net

I've written a function that reads CSV files and parameterizes them accordingly. A second function, GetTypesSQL, first queries the SQL table to get the data types, so that the columns can be adjusted before they are inserted into SQL. My problem is that when I set HDR=Yes in Jet OLE DB I only get column names like F1, F2, F3. To work around this I set HDR=No and wrote some For loops, but now I only get empty strings. What is actually the problem? Here is my code:
Private Function GetCSVFile(ByVal file As String, ByVal min As Integer, ByVal max As Integer) As DataTable
Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=NO;IMEX=1;FMT=Delimited;CharacterSet=65001"""
Dim conn As New OleDb.OleDbConnection(ConStr)
Dim dt As New DataTable
Dim da As OleDb.OleDbDataAdapter = Nothing
getData = Nothing
Try
Dim CMD As String = "Select * from " & _table & ".csv"
da = New OleDb.OleDbDataAdapter(CMD, conn)
da.Fill(min, max, dt)
getData = New DataTable(_table)
Dim firstRow As DataRow = dt.Rows(0)
For i As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = firstRow(i).ToString()
Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
getData.Columns.Add(newColumn)
Next
For i As Integer = 1 To dt.Rows.Count - 1
Dim row As DataRow = dt.Rows(i)
Dim newRow As DataRow = getData.NewRow()
For j As Integer = 0 To getData.Columns.Count - 1
If row(j).GetType Is GetType(String) Then
Dim colValue As String = row(j).ToString()
colValue = ChangeEncoding(colValue)
colValue = ParseString(colValue)
colValue = ReplaceChars(colValue)
newRow(j) = colValue
Else
newRow(j) = row(j)
End If
Next
getData.Rows.Add(newRow)
Application.DoEvents()
Next
Catch ex As OleDbException
MessageBox.Show(ex.Message)
Catch ex As Exception
MessageBox.Show(ex.Message)
Finally
dt.Dispose()
da.Dispose()
End Try
Return getData
End Function
And here is GetTypesSQL; this one doesn't convert properly, especially doubles:
Private Sub GetTypesSQL()
If (mListOfTypes Is Nothing) Then
mListOfTypes = New List(Of Type)()
End If
mListOfTypes.Clear()
Dim dtTabelShema As DataTable = db.GetDataTable("SELECT TOP 0 * FROM " & _table)
Using dtTabelShema
For Each col As DataColumn In dtTabelShema.Columns
mListOfTypes.Add(col.DataType)
Next
End Using
End Sub
I think you have made it more complicated than it needs to be. For instance, you get the dbSchema by creating an empty DataTable and harvesting the Datatypes from it. Why not just use that first table rather than creating a new table from the Types? The table also need not be reconstructed over and over for each batch of rows imported.
Generally since OleDb will try to infer types from the data, it seems unnecessary and may even get in the way in some cases. Also, you are redoing everything that OleDB does and copying data to a different DT. Given that, I'd skip the overhead OleDB imposes and work with the raw data.
This creates the destination table using the CSV column name and the Type from the Database. If the CSV is not in the same column order as those delivered in a SELECT * query, it will fail.
The following uses a class to map csv columns to db table columns so the code is not depending on the CSVs being in the same order (since they may be generated externally). My sample data CSV is not in the same order:
Public Class CSVMapItem
Public Property CSVIndex As Int32
Public Property ColName As String = ""
'optional
Public Property DataType As Type
Public Sub New(ndx As Int32, csvName As String,
dtCols As DataColumnCollection)
CSVIndex = ndx
For Each dc As DataColumn In dtCols
If String.Compare(dc.ColumnName, csvName, True) = 0 Then
ColName = dc.ColumnName
DataType = dc.DataType
Exit For
End If
Next
If String.IsNullOrEmpty(ColName) Then
Throw New ArgumentException("Cannot find column: " & csvName)
End If
End Sub
End Class
The code to parse the CSV uses CsvHelper, but in this case the TextFieldParser could be used instead, since the code just reads the CSV rows into a string array.
Dim SQL = String.Format("SELECT * FROM {0} WHERE ID<0", DBTblName)
Dim rowCount As Int32 = 0
Dim totalRows As Int32 = 0
Dim sw As New Stopwatch
sw.Start()
Using dbcon As New MySqlConnection(MySQLConnStr)
Using cmd As New MySqlCommand(SQL, dbcon)
dtSample = New DataTable
dbcon.Open()
' load empty DT, create the insert command
daSample = New MySqlDataAdapter(cmd)
Dim cb = New MySqlCommandBuilder(daSample)
daSample.InsertCommand = cb.GetInsertCommand
dtSample.Load(cmd.ExecuteReader())
' dtSample is not only empty, but has the columns
' we need
Dim csvMap As New List(Of CSVMapItem)
Using sr As New StreamReader(csvfile, False),
parser = New CsvParser(sr)
' col names from CSV
Dim csvNames = parser.Read()
' create a map of CSV index to DT Columnname SEE NOTE
For n As Int32 = 0 To csvNames.Length - 1
csvMap.Add(New CSVMapItem(n, csvNames(n), dtSample.Columns))
Next
' line data read as string
Dim data As String()
data = parser.Read()
Dim dr As DataRow
Do Until data Is Nothing OrElse data.Length = 0
dr = dtSample.NewRow()
For Each item In csvMap
' optional/as needed type conversion
If item.DataType = GetType(Boolean) Then
' "1" wont convert to bool, but (int)1 will
dr(item.ColName) = Convert.ToInt32(data(item.CSVIndex).Trim)
Else
dr(item.ColName) = data(item.CSVIndex).Trim
End If
Next
dtSample.Rows.Add(dr)
rowCount += 1
data = parser.Read()
If rowCount = 50000 OrElse (data Is Nothing OrElse data.Length = 0) Then
totalRows += daSample.Update(dtSample)
' empty the table so it never holds more than one batch of rows
dtSample.Rows.Clear()
rowCount = 0
End If
Loop
End Using
End Using
End Using
sw.Stop()
Console.WriteLine("Parsed and imported {0} rows in {1}", totalRows,
sw.Elapsed.TotalMinutes)
The processing loop updates the DB every 50K rows in case there are many many rows. It also does it in one pass rather than reading N rows thru OleDB at a time. CsvParser will read one row at a time, so there should never be more than 50,001 rows worth of data on hand at a time.
There may be special cases to handle for type conversions, as shown with If item.DataType = GetType(Boolean) Then. A Boolean value read in as "1" can't be assigned directly to a Boolean column, so it is converted to an integer, which can. Other conversions may be needed, such as for unusual date formats.
Time to process 250,001 rows: 3.7 minutes. An app that needs to apply those string transforms to every single string column will take much longer. I'm pretty sure that with the CsvReader in CsvHelper you could have those applied as part of parsing to a Type.
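The TextFieldParser alternative mentioned earlier can be sketched as follows. This is only an illustration under stated assumptions: ReadCsvRows is a hypothetical name, the delimiter is assumed to be a comma, and for brevity it collects all rows in a list, whereas the import loop above would call ReadFields once per iteration.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvRows
    ' Hypothetical helper: reads comma-delimited text and returns each row
    ' as a String array, honoring fields enclosed in double quotes.
    Public Function ReadCsvRows(reader As TextReader) As List(Of String())
        Dim rows As New List(Of String())
        Using parser As New TextFieldParser(reader)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                rows.Add(parser.ReadFields())
            End While
        End Using
        Return rows
    End Function
End Module
```

The first array returned would play the role of csvNames, and the remaining arrays the role of data.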
There is a potential disaster waiting to happen since this is meant to be an all-purpose importer/scrubber.
For i As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = firstRow(i).ToString()
Dim newColumn As New DataColumn(columnName, mListOfTypes(i))
getData.Columns.Add(newColumn)
Next
Both the question and the self-answer build the new table using the column names from the CSV and the DataTypes from a SELECT * query on the destination table. So, it assumes the CSV Columns are in the same order that SELECT * will return them, and that all CSVs will always use the same names as the tables.
The answer above is marginally better in that it finds and matches based on name.
A more robust solution is to write a small utility app in which a user maps each DB column name to a CSV index. Save the results to a List(Of CSVMapItem) and serialize it. A whole collection of these could be saved to disk. Then, rather than creating a map based on dead reckoning, just deserialize the desired map as the csvMap in the above code.

Selecting different number of columns in a CSV file

The task is to extract data from multiple CSV files according to a criterion. Each file contains a sampleId (this is the criterion) and other columns. At the end of each line are the measurement values, in columns named 0...100 (the numbers are the actual column names). To make it a bit more interesting, there can be variations between CSV files, depending on customer needs: the measurement count can be 15, 25, 50, etc., but never more than 100, and it does not vary within one file. This data is always placed at the end of the line, so there is a set of fixed columns before the numbers.
I'd like to have a SQL statement which can accept parameters:
SELECT {0} FROM {1} WHERE sampleId = {2}
{0} is the numbered columns, {1} is the CSV file name, and {2} is the sampleId we are looking for. The other solution that came to mind is to read all the columns after the last fixed column. I don't know whether that is possible; I'm just thinking out loud.
Please be descriptive, my SQL knowledge is basic. Any help is really appreciated.
So finally managed to solve it. The code is in VB.NET, but the logic is quite clear.
Private Function GetDataFromCSV(sampleIds As Integer()) As List(Of KeyValuePair(Of String, List(Of Integer)))
Dim dataFiles() As String = System.IO.Directory.GetFiles(OutputFolder(), "*.CSV")
Dim results As List(Of KeyValuePair(Of String, List(Of Integer))) = New List(Of KeyValuePair(Of String, List(Of Integer)))
If dataFiles.Length > 0 And sampleIds.Length > 0 Then
For index As Integer = 0 To sampleIds.Length - 1
If sampleIds(index) > 0 Then
For Each file In dataFiles
If System.IO.File.Exists(file) Then
Dim currentId As String = sampleIds(index).ToString()
Dim filename As String = Path.GetFileName(file)
Dim strPath As String = Path.GetDirectoryName(file)
Dim conn As OleDb.OleDbConnection = New OleDb.OleDbConnection("Provider=Microsoft.Jet.OLEDB.4.0; Data Source=" & strPath & "; Extended Properties='text; HDR=Yes; FMT=Delimited'")
Dim command As OleDb.OleDbCommand = conn.CreateCommand()
command.CommandText = "SELECT * FROM [" & filename & "]" 'No WHERE clause; rows are filtered by Sample ID in the loop below
conn.Open()
Dim reader As OleDb.OleDbDataReader = command.ExecuteReader()
Dim numberOfFields = reader.FieldCount
While reader.Read()
If reader("Sample ID").ToString() = currentId Then 'If found write particle data into output file
Dim particles As List(Of Integer) = New List(Of Integer)
For field As Integer = 0 To numberOfFields - 1
particles.Add(CInt(reader(field.ToString())))
Next field
results.Add(New KeyValuePair(Of String, List(Of Integer))(currentId, particles))
End If
End While
conn.Close()
End If
Next file
End If
Next index
Return results
Else
MessageBox.Show("Missing csv files or invalid sample Id(s)", "Internal error", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
End If
End Function
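The idea from the question of taking whatever measurement columns a file happens to have can be sketched by selecting the columns whose names are whole numbers. This assumes the data has been loaded into a DataTable and that the provider preserves the numeric header names; MeasurementColumns is a hypothetical helper, not part of the code above.

```vbnet
Imports System.Collections.Generic
Imports System.Data
Imports System.Globalization

Module MeasurementSelection
    ' Hypothetical helper: returns the columns whose names consist only of
    ' digits (for example "0", "1", ... "100"), in their original order.
    Public Function MeasurementColumns(table As DataTable) As List(Of DataColumn)
        Dim result As New List(Of DataColumn)
        For Each col As DataColumn In table.Columns
            Dim n As Integer
            If Integer.TryParse(col.ColumnName, NumberStyles.None, CultureInfo.InvariantCulture, n) Then
                result.Add(col)
            End If
        Next
        Return result
    End Function
End Module
```

Because the selection depends only on the column names, the same call works whether a file has 15, 25, or 50 measurement columns.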

Converting a comma delimited string to a datatable row in VB.NET

I am trying to convert a string with data separated by commas into the first row of a datatable. The datatable is otherwise empty.
The datatable will then need to populate into datagridview, however nothing appears in it.
Dim plt As New System.Data.DataTable
Dim rowData As String() = output.Split(New Char() {","}, StringSplitOptions.RemoveEmptyEntries)
Dim row As DataRow = PLT.NewRow()
dataGridView1.DataSource = PLT
I don't get any errors, but it's also not populating the gridview so I don't know if it's just failing to populate or if the table itself never got populated from the string.
You need to add the columns and then the row:
Dim rowData As String() = output.Split(New Char() {","c}, StringSplitOptions.RemoveEmptyEntries)
Dim plt As New System.Data.DataTable()
For i As Int32 = 1 To rowData.Length
plt.Columns.Add(String.Format("Column {0}", i))
Next
Dim newRow As DataRow = plt.Rows.Add() ' already added now '
For col As Int32 = 0 To rowData.Length - 1
newRow.SetField(col, rowData(col))
Next
A DataTable needs a DataColumn for every item of the array.
If you just want to add the array of strings to the DataGridView, add it directly (the grid must already have columns defined):
Dim rowData As String() = output.Split(New Char() {","c},
StringSplitOptions.RemoveEmptyEntries)
Me.dataGridView1.Rows.Add(rowData)

Stop function from returning variable with same data type

I am calling a function that returns a data set. However, to generate the data set I have to grab a parameter from another table, which necessitates another data set being created in the function to retrieve that parameter. The issue is that once the first data set is created to get the parameter for the second data set, the function is returning to the calling code, rather than running to the return statement. Is there a way around this?
Example:
Private Sub Form_Load()
Dim dataset as new dataset
dataset = GetList(PathToDB, AccessCode)
End Sub
Function GetList(Path as String, AccessCode as String)
Dim ListConnectionString as String = "..."
Dim Listds as New DataSet
Dim Listcnn as OleDbConnection = New OleDbConnection(ListConnectionString)
Dim ListAdapter As New OleDbDataAdapter
Dim Parameterds As New DataSet
Dim ParameterAdapter As New OleDbDataAdapter
Dim ParameterSelectQuery As String = "..."
Dim ParameterSelectCommand As New OleDbCommand(ParameterSelectQuery, Listcnn)
ParameterAdapter.SelectCommand = ParameterSelectCommand
******** ParameterAdapter.Fill(Parameterds) ********
Dim Parameter As String = Parameterds.Tables(0).Rows(0).Item(0)
Dim ListSelectQuery As String = "...WHERE Value = '" & Parameter & "';"
Dim ListSelectCommand As New OleDbCommand(ListSelectQuery, Listcnn)
ListAdapter.SelectCommand = ListSelectCommand
ListAdapter.Fill(Listds)
Return Modelds
End Function
The code is returning on the line with ********, but should return on the return statement. I've also tried this with separate functions, but it still does the same thing.

Trimming duplicate rows in a DataGridView except the most recent one

I'm in VS2008, and I have a DataGridView with multiple columns; the last column contains a date and time value.
Lots of rows are nearly identical except for their date column.
What I want to do is remove the duplicate rows from the whole DataGridView, keeping only the most recent of each based on the date column.
I have something like this:
Administrator,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 23:11:59
Administrator,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 21:11:59
Administrator,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 22:11:59
Administrator,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 20:11:59
Administrator,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 11:11:59
Everyone ,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 17:11:59
Everyone ,192.168.137.221,2,file://C:\WMPub\WMRoot\industrial.wmv , 07.Jul.2014 - 14:11:59
The output I want should look like this:
Administrator 192.168.137.221 2 file://C:\WMPub\WMRoot\industrial.wmv 07.Jul.2014 - 23:11:59
Everyone 192.168.137.201 2 file://C:\WMPub\WMRoot\industrial.wmv 07.Jul.2014 - 17:11:59
....
Please treat "," as the column separator! (I don't know how to draw a table here, sorry again!)
I have this snippet that removes the duplicate rows in a DataGridView, but it doesn't preserve the latest entry:
Public Function RemoveDuplicateRows(ByVal dTable As DataTable, ByVal colName As String) As DataTable
Dim hTable As New Hashtable()
Dim duplicateList As New ArrayList()
For Each dtRow As DataRow In dTable.Rows
If hTable.Contains(dtRow(colName)) Then
duplicateList.Add(dtRow)
Else
hTable.Add(dtRow(colName), String.Empty)
End If
Next
For Each dtRow As DataRow In duplicateList
dTable.Rows.Remove(dtRow)
Next
Return dTable
End Function
What should I do?
Thanks in advance.
Here is some code that illustrates the approach:
Dim dict As New Dictionary(Of String, DataRow)
For Each dtRow As DataRow In dTable.Rows
Dim key As String = dtRow("column1") + "," + dtRow("column2") ' + etc.
Dim dictRow As DataRow = Nothing
If dict.TryGetValue(key, dictRow) Then
'check and update date
'you can skip this part, if your data is sorted
If dtRow("dateColumn") > dictRow("dateColumn") Then
dictRow("dateColumn") = dtRow("dateColumn")
End If
Else
dict.Add(key, dtRow)
End If
Next
In the end dict contains the rows you need, you can get them via dict.Values.ToArray()
EDIT: I found the error - dictRow should be dtRow in the above code (now fixed). Then it should work. Here is a full version of self contained example (console app), since I wrote it anyway - focus on RemoveDuplicates, the rest is just prepwork:
Sub Main()
Dim dt As New DataTable
With dt.Columns
.Add("PublishingPoint")
.Add("Username")
.Add("IP")
.Add("Status")
.Add("Req URL")
.Add("Last seen", GetType(Date))
End With
'this populates the initial data table, use your method
Dim _assembly As Assembly = Assembly.GetExecutingAssembly()
Dim _textStreamReader As New StreamReader(_assembly.GetManifestResourceStream("ConsoleApplication16.data.csv"))
While Not _textStreamReader.EndOfStream
Dim sLine As String = _textStreamReader.ReadLine().TrimEnd
If String.IsNullOrEmpty(sLine) Then Exit While
Dim values() As String = sLine.Split(",")
Dim newRow As DataRow = dt.NewRow
For iColumnIndex As Integer = 0 To dt.Columns.Count - 1
Dim columnName As String = dt.Columns(iColumnIndex).ColumnName
newRow.Item(columnName) = values(iColumnIndex)
Next
dt.Rows.Add(newRow)
End While
Console.WriteLine("Old count: " & dt.Rows.Count)
Dim newDt As DataTable = RemoveDuplicates(dt, "Last seen")
Console.WriteLine("New count: " & newDt.Rows.Count)
Console.ReadLine()
End Sub
Private Function RemoveDuplicates(dt As DataTable, colName As String) As DataTable
Dim keyColumnNames As New List(Of String)
Dim exceptColumnsHash As New HashSet(Of String)({colName})
For Each col As DataColumn In dt.Columns
Dim columnName As String = col.ColumnName
If Not exceptColumnsHash.Contains(col.ColumnName) Then
keyColumnNames.Add(columnName)
End If
Next
Dim dict As New Dictionary(Of String, DataRow)
For Each dtRow As DataRow In dt.Rows
Dim keyColumnValues As New List(Of String)
For Each keyColumnName In keyColumnNames
keyColumnValues.Add(dtRow.Item(keyColumnName))
Next
Dim key As String = String.Join(",", keyColumnValues)
Dim dictRow As DataRow = Nothing
If dict.TryGetValue(key, dictRow) Then
If dtRow(colName) > dictRow(colName) Then
dictRow(colName) = dtRow(colName)
End If
Else
dict.Add(key, dtRow)
End If
Next
Dim dtReturn As DataTable = dt.Clone
For Each dtRow As DataRow In dict.Values
dtReturn.ImportRow(dtRow)
Next
Return dtReturn
End Function
To make this code run, you need to manually add the data file to the project and set its Build Action to "Embedded Resource".