CSV string is being trimmed while importing with OleDb - vb.net

I am reading a CSV file through OleDb. My main issue is that string values in the CSV are trimmed on read (white space is removed at both the beginning and the end). Some of my data needs to keep that white space, and only in certain cases, so I cannot restore it after the import. It has to be preserved during the conversion itself.
Unfortunately it has to be done with OleDb and VB.NET, as our complex mechanism is based on those technologies.
Is there a hack or workaround so that OleDb does not trim my strings?
Below are my code, the actual results and the expected results:
csv file:
Column1|Column2|Column3|Column4
Text1 | Text2| Text3 |Text4
schema.ini
[test.csv]
Format=Delimited(|)
Col1=Column1 Text
Col2=Column2 Text
Col3=Column3 Text
Col4=Column4 Text
Code
Private conn As New OleDbConnection
Private cmd As New OleDbCommand
Private myAccessDataReader As OleDb.OleDbDataReader = Nothing
Sub Main()
Try
Dim dirInfo As String = "C:\csv"
If conn.State = ConnectionState.Open Then
conn.Close()
End If
conn.ConnectionString = "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & dirInfo & ";Extended Properties=""Text;HDR=Yes;"";"
conn.Open()
cmd = New OleDbCommand("SELECT * From [test.csv]", conn)
myAccessDataReader = cmd.ExecuteReader()
If myAccessDataReader.HasRows Then
myAccessDataReader.Read()
End If
Console.WriteLine("|" + myAccessDataReader.Item("Column1") + "|")
Console.WriteLine("|" + myAccessDataReader.Item("Column2") + "|")
Console.WriteLine("|" + myAccessDataReader.Item("Column3") + "|")
Console.WriteLine("|" + myAccessDataReader.Item("Column4") + "|")
Console.ReadKey()
Catch ex As Exception
Throw New Exception(ex.Message)
End Try
End Sub
Actual Results:
|Text1|
|Text2|
|Text3|
|Text4|
Expected Results:
|Text1 |
| Text2|
| Text3 |
|Text4|
P.S. I have tried different settings in schema.ini (encoding, MaxScanRows, fixed width), but nothing helped.

I guess there is a general issue with trailing spaces when dealing with databases: some char data types pad values with spaces to fill the remaining characters. For MSSQL there is an ANSI_PADDING option which you can turn ON/OFF, but I don't see a way to set that for the Microsoft Jet engine which we use for CSV files; we support both OLE DB and ODBC and this issue exists for both.
So, the answer is you can't. Trailing spaces will always be removed when you import data from a CSV data source, no matter whether you define a text/char/memo data type for your columns (e.g. using schema.ini) or enclose the strings in double quotes. You can put some special (non-space) character at the end, after the space(s), such as a tab, for instance.
microsoft website
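The "special character" idea above can be sketched as a pre-processing step: wrap each field in a marker before the file is handed to OleDb, then strip the marker from every value read back. This is only a sketch under assumptions. The names AddSentinels and StripSentinels and the "~" marker are hypothetical, and whether the provider leaves white space next to the marker untouched should be checked against your own data. The header row should be left unwrapped so the column names stay the same.

```vbnet
Imports System
Imports System.Linq

Module SentinelSketch
    ' Hypothetical marker; pick a character that never occurs in the real data.
    Private Const Sentinel As Char = "~"c

    ' Wraps every field of one delimited data line in the sentinel character.
    Function AddSentinels(line As String, delimiter As Char) As String
        Dim fields = line.Split(delimiter)
        Return String.Join(delimiter.ToString(),
                           fields.Select(Function(f) Sentinel & f & Sentinel))
    End Function

    ' Removes one leading and one trailing sentinel from a value read back.
    ' Values that are not wrapped are returned unchanged.
    Function StripSentinels(value As String) As String
        If value.Length >= 2 AndAlso value(0) = Sentinel AndAlso
           value(value.Length - 1) = Sentinel Then
            Return value.Substring(1, value.Length - 2)
        End If
        Return value
    End Function
End Module
```

For the sample row, AddSentinels("Text1 | Text2| Text3 |Text4", "|"c) produces "~Text1 ~|~ Text2~|~ Text3 ~|~Text4~", and StripSentinels("~ Text3 ~") gives back " Text3 ".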

Try this out. There's no guarantee, though, since I haven't added any error handling.
Function ReadCSVToTable(ByVal Schema As String) As DataTable
Dim file As New StreamReader("C:\dump\" & Schema)
Dim CSVName As String = file.ReadLine()
CSVName = Strings.Mid(CSVName, 2, CSVName.Length - 2)
Dim Delimiter As String = file.ReadLine
Delimiter = Strings.Mid(Delimiter, Strings.InStr(Delimiter, "(") + 1, Delimiter.Length - Strings.InStr(Delimiter, ")") + 1)
Dim Buffer As String = ""
Dim xtable As New DataTable
xtable.TableName = CSVName
'create table
Do
Buffer = file.ReadLine
Dim xCol As New DataColumn
With xCol
.ColumnName = Buffer.Split("=")(0)
.Caption = Buffer.Split("=")(1).Split(" ")(0)
Select Case Buffer.Split("=")(1).Split(" ")(1).ToLower
Case "text"
.DataType = GetType(String)
Case "integer"
.DataType = GetType(Integer)
Case "decimal"
.DataType = GetType(Decimal)
Case "boolean"
.DataType = GetType(Boolean)
Case Else
.DataType = GetType(String)
End Select
End With
xtable.Columns.Add(xCol)
Loop Until file.EndOfStream = True
file.Close()
file.Dispose()
'Fill the table
file = New StreamReader("C:\dump\" & CSVName)
'skip header
Buffer = file.ReadLine
Do
Buffer = file.ReadLine
Dim xCol(xtable.Columns.Count - 1)
Dim xCount As Integer = 0
For Each tCol As DataColumn In xtable.Columns
Select Case tCol.DataType
Case GetType(String)
xCol(xCount) = Convert.ToString(Buffer.Split(New String() {Delimiter}, StringSplitOptions.None)(xCount))
Case GetType(Integer)
xCol(xCount) = Convert.ToInt32(Buffer.Split(New String() {Delimiter}, StringSplitOptions.None)(xCount))
Case GetType(Decimal)
xCol(xCount) = Convert.ToDecimal(Buffer.Split(New String() {Delimiter}, StringSplitOptions.None)(xCount))
Case GetType(Boolean)
xCol(xCount) = Convert.ToBoolean(Buffer.Split(New String() {Delimiter}, StringSplitOptions.None)(xCount))
Case Else
xCol(xCount) = Convert.ToString(Buffer.Split(New String() {Delimiter}, StringSplitOptions.None)(xCount))
End Select
xCount = xCount + 1
Next
xtable.Rows.Add(xCol)
Loop Until file.EndOfStream = True
file.Close()
file.Dispose()
Return xtable
End Function
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
Dim CSVTable As DataTable = ReadCSVToTable("schema.ini")
End Sub

Related

Index was outside the bounds of the array. VB.NET

My problem: when I try to run the code, it throws "Index was outside the bounds of the array."
I have two forms, SIGN IN and SIGN UP. They don't work together and produce the error shown below.
Dim fs As New FileStream("C:\Users\Selmen\Desktop\vb\logs.txt", FileMode.Open, FileAccess.ReadWrite)
Dim sr As New StreamReader(fs)
Dim sw As New StreamWriter(fs)
Dim s As String
Dim t() As String
Dim trouve As Integer = 0
Dim tt() As String
Dim ch As String
ch = TextBox1.Text + "#" + TextBox2.Text + "#" + TextBox3.Text + "#" + TextBox4.Text + "#" + TextBox5.Text
tt = ch.Split("#")
Do While (trouve = 0) And (sr.Peek > -1)
s = sr.ReadLine
t = s.Split("#")
If String.Compare(t(2), tt(2)) = 0 Then
trouve = 1
End If
Loop
If (trouve = 1) Then
MsgBox("user existant")
Else
sw.WriteLine(ch)
Me.Hide()
Form4.Show()
End If
sw.Close()
sr.Close()
fs.Close()
End Sub
On the line If String.Compare(t(2), tt(2)) = 0 Then I get:
IndexOutOfRangeException was unhandled / Index was outside the bounds of the array.
Streams need to be disposed. Instead of using streams, you can easily access a text file with the .NET File class.
File.ReadAllLines returns an array of the lines in the file, which we can loop through with For Each. The lowercase c following "#" tells the compiler that you intend a Char, not a String; this overload of String.Split expects a Char. String.Compare is normally used to order strings alphabetically; to test equality you just need =. As soon as we find a match we leave the loop with Exit For.
We don't actually need the array of the text boxes' Text properties unless there is no match. Putting the elements in braces initializes and fills the array of String.
File.AppendAllLines does what it says. It expects a collection of strings, so, as with the text boxes, we put the line to append in braces.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim p = "path to file"
Dim lines = File.ReadAllLines(p)
Dim trouve As Integer
For Each line In lines
Dim thirdField = line.Split("#"c)(2)
If thirdField = TextBox3.Text Then
trouve = 1
Exit For
End If
Next
If trouve = 1 Then
MsgBox("user existant")
Else
Dim tt = {TextBox1.Text, TextBox2.Text, TextBox3.Text, TextBox4.Text, TextBox5.Text}
File.AppendAllLines(p, {String.Join("#", tt)})
Me.Hide()
Form4.Show()
End If
End Sub
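Indexing element 2 of a Split result still fails for any line that has fewer than three fields, for example a blank line at the end of the file. Assuming that is what triggers the exception here (the question does not show the contents of logs.txt), a length check before indexing avoids it. The following is a minimal sketch; the name UserExists is hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

Module UserLookupSketch
    ' Returns True when some line has at least three "#"-separated fields
    ' and its third field equals userName. Short or blank lines are skipped
    ' instead of raising IndexOutOfRangeException.
    Function UserExists(lines As IEnumerable(Of String), userName As String) As Boolean
        For Each line In lines
            Dim fields = line.Split("#"c)
            If fields.Length > 2 AndAlso fields(2) = userName Then
                Return True
            End If
        Next
        Return False
    End Function
End Module
```

In the click handler this could be called as UserExists(File.ReadAllLines(p), TextBox3.Text) in place of the For Each loop.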

Opening a CSV file as a OLEDB Connection converts file text to double

I'm opening a couple of CSV files and reading them in as a DataTable per the example I found here. The issue I am running into is that the basic query I'm using to import the data converts the column of IP addresses into Doubles. So I want to read in 10.0.0.1 and it shows up as 10.001. How can I get this column to read in as a string? I would prefer not to process the file twice if I can avoid it.
Query I'm using is basic and is as follows:
SELECT * FROM [ComputerList.csv]
Here is my function to open and read the CSV file into a DataTable
Public Function OpenFile(ByVal strFolderPath as String, ByVal strQuery as String) as DataTable
Dim strConn as String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strFolderPath & ";Extended Properties=""text; HDR=Yes;FMT=Delimited"""
Dim conn as OleDb.OleDbConnection = New OleDb.OleDbConnection(strConn)
Try
conn.Open()
Dim cmd as OleDb.OleDbCommand = New OleDb.OleDbCommand(strQuery, conn)
Dim da as OleDb.OleDbDataAdapter = New OleDb.OleDbDataAdapter()
da.SelectCommand = cmd
Dim ds as DataSet = New DataSet()
da.Fill(ds)
da.Dispose()
return ds.Tables(0)
Catch
return Nothing
Finally
conn.Close()
End Try
End Function
I use this to read CSVs and force every column to String.
Public Function convert_csv_to_data_table(ByVal File As String, ByVal separator As String) As DataTable
Dim dt As New DataTable
Dim firstLine As Boolean = True
If IO.File.Exists(File) Then
Using sr As New StreamReader(File)
While Not sr.EndOfStream
If firstLine Then
firstLine = False
Dim cols = sr.ReadLine.Split(separator)
For Each col In cols
dt.Columns.Add(New DataColumn(col, GetType(String)))
Next
Else
Dim data() As String = sr.ReadLine.Split(separator)
dt.Rows.Add(data.ToArray)
End If
End While
End Using
End If
Return dt
End Function
EDIT:- this will only work with a separator btw
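If staying with the OLE DB text driver is preferred, another route is a schema.ini file in the same folder as the CSV that declares each column as Text, so the driver does not guess a numeric type for the IP addresses. The sketch below only builds the file contents. The function name BuildTextSchema and the column names in the usage note are hypothetical, and it assumes a comma-delimited file with a header row.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module SchemaIniSketch
    ' Builds schema.ini text that declares every listed column as Text.
    ' Assumes a comma-delimited file whose first row holds the column names.
    Function BuildTextSchema(fileName As String, columnNames As IList(Of String)) As String
        Dim sb As New StringBuilder()
        sb.AppendLine("[" & fileName & "]")
        sb.AppendLine("ColNameHeader=True")
        sb.AppendLine("Format=CSVDelimited")
        For i As Integer = 0 To columnNames.Count - 1
            ' Column entries are numbered from 1: Col1, Col2, ...
            sb.AppendLine("Col" & (i + 1).ToString() & "=" & columnNames(i) & " Text")
        Next
        Return sb.ToString()
    End Function
End Module
```

For example, BuildTextSchema("ComputerList.csv", {"ComputerName", "IPAddress"}) yields a section with Col1=ComputerName Text and Col2=IPAddress Text. The result would be written with File.WriteAllText to a file named schema.ini next to the CSV before the connection is opened.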
OK, I tried variations of everyone's suggestions and have settled on a hybrid of all of them. My goal was to read in a CSV file, manipulate it in DataTable form and then write it back out. Some of my CSV files had multiple lines within a cell and some had delimiters within a cell. Below is my hybrid solution, which uses TextFieldParser to read in the file and break it up.
Public Function OpenFile(ByVal File as String, ByVal delim as String) as DataTable
Dim dt as New DataTable()
Dim firstline as Boolean = True
Using MyReader as New Microsoft.VisualBasic.FileIO.TextFieldParser(File)
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(delim)
Dim currentRow as String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
If firstline
firstline = false
For Each col in currentRow
dt.Columns.Add(New DataColumn(col.ToString(), System.Type.GetType("System.String")))
Next
Else
dt.Rows.Add(currentRow.ToArray())
End If
Catch ex as Microsoft.VisualBasic.FileIO.MalformedLineException
Console.WriteLine("Line " + ex.Message + " is not valid and will be skipped")
End Try
End While
End Using
return dt
End Function

Input String was not in the correct format sometimes

I found a snippet of code on the internet that I have adapted to write the results of a data reader to a .csv file. The tables that I am pulling range from 10 columns all the way up to 200 columns. On some queries I get the error "Input string was not in the correct format", which I believe occurs in the part of my code that converts the reader value to a string. The code is below.
Dim sw As New StreamWriter(filename)
Try
Using Conn As New Odbc.OdbcConnection(ConnStr)
Using Cmd As New Odbc.OdbcCommand(query, Conn)
Conn.Open()
Using dr As Odbc.OdbcDataReader = Cmd.ExecuteReader()
Dim fields As Integer = dr.FieldCount - 1
While dr.Read()
Dim sb As New StringBuilder()
Dim i As Integer = 0
While i <= fields
If i <> fields Then
sep = ","
Else
sep = ""
End If
sb.Append(dr(i) + sep)
i += 1
End While
sw.WriteLine(sb.ToString())
End While
End Using
End Using
sw.Close()
sw.Dispose()
Conn.Close()
Conn.Dispose()
End Using
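The most likely culprit is the line
sb.Append(dr(i) + sep)
dr(i) is an Object, and with Option Strict Off the + operator may be resolved as numeric addition when that value is a number. VB then tries to convert sep to a number and fails with this error. VB uses the & operator to concatenate strings, so try replacing it with
sb.Append(dr(i) & sep)
Or, better yet, use different logic. Instead of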
If i <> fields Then
sep = ","
Else
sep = ""
End If
sb.Append(dr(i) + sep)
do something like
sb.Append(dr(i))
If i <> fields Then sb.Append(",")
EDIT: Added check for possible Null values:
If dr(i) Is DBNull.Value OrElse dr(i) Is Nothing Then
sb.Append("[No Data]")
Else
sb.Append(dr(i))
End If
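Putting these points together, one row can be turned into a line of text without any + on mixed types. The following is a minimal sketch: FieldToText and ToCsvLine are hypothetical names, null values become empty strings here rather than "[No Data]", and values are formatted with the invariant culture so decimals do not pick up a locale-specific comma.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Linq

Module CsvLineSketch
    ' Converts one value to text; DBNull and Nothing become an empty string.
    Function FieldToText(value As Object) As String
        If value Is Nothing OrElse value Is DBNull.Value Then Return ""
        Return Convert.ToString(value, CultureInfo.InvariantCulture)
    End Function

    ' Joins the values of one row with commas. String.Join places the
    ' separator only between items, so no trailing comma is produced.
    Function ToCsvLine(values As IEnumerable(Of Object)) As String
        Return String.Join(",", values.Select(Function(v) FieldToText(v)))
    End Function
End Module
```

With a data reader, the row values can be copied into an Object array with dr.GetValues and then passed to ToCsvLine. Fields that themselves contain commas or quotes would still need quoting, which this sketch does not attempt.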

Split in VB.net

FASTER,WW0011,"CTR ,REURN,ALT TUBING HELIUM LEAK",DEFAULT test,1,3.81,test
I need to split the line above into the following:
Arr(0) =faster
Arr(1) =WW0011
Arr(2) =CTR ,REURN,ALT TUBING HELIUM LEAK
Arr(3) =DEFAULT test
Arr(4) =faster
Arr(5) = 1
Arr(6)=3.81
Arr(7) = test
I tried using Split, but the problem is with Arr(2).
Could anyone please give me a solution?
You could use the TextFieldParser class, which will take care of situations like this. Set the HasFieldsEnclosedInQuotes property to True. Here is an example from MSDN (slightly altered):
Using MyReader As New Microsoft.VisualBasic.FileIO.TextFieldParser("c:\logs\bigfile")
MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
MyReader.Delimiters = New String() {","}
'Set this to ignore commas in quoted fields.
MyReader.HasFieldsEnclosedInQuotes = True
Dim currentRow As String()
'Loop through all of the fields in the file.
'If any lines are corrupt, report an error and continue parsing.
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
' Include code here to handle the row.
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & " is invalid. Skipping")
End Try
End While
End Using
I use this function a lot myself:
Private Function splitQuoted(ByVal line As String, ByVal delimeter As Char) As String()
Dim list As New List(Of String)
Do While line.IndexOf(delimeter) <> -1
If line.StartsWith("""") Then
line = line.Substring(1)
Dim idx As Integer = line.IndexOf("""")
While line.IndexOf("""", idx) = line.IndexOf("""""", idx)
idx = line.IndexOf("""""", idx) + 2
End While
idx = line.IndexOf("""", idx)
list.Add(line.Substring(0, idx))
line = line.Substring(idx + 2)
Else
list.Add(line.Substring(0, Math.Max(line.IndexOf(delimeter), 0)))
line = line.Substring(line.IndexOf(delimeter) + 1)
End If
Loop
list.Add(line)
Return list.ToArray
End Function
Use a for loop to iterate the string char by char!
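A character-by-character scan along those lines might look like the sketch below. It is an illustration, not a full CSV parser: the name SplitQuotedLine is hypothetical, a While loop is used instead of For so the index can skip over a doubled quote, and line breaks inside quoted fields are not handled.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module QuotedSplitSketch
    ' Splits one line on delimiter, treating delimiters inside double quotes
    ' as data. A doubled quote ("") inside a quoted field yields one quote.
    Function SplitQuotedLine(line As String, delimiter As Char) As String()
        Dim fields As New List(Of String)
        Dim current As New StringBuilder()
        Dim inQuotes As Boolean = False
        Dim i As Integer = 0
        While i < line.Length
            Dim c As Char = line(i)
            If c = """"c Then
                If inQuotes AndAlso i + 1 < line.Length AndAlso line(i + 1) = """"c Then
                    current.Append(""""c)   ' escaped quote inside a quoted field
                    i += 1                  ' skip the second quote of the pair
                Else
                    inQuotes = Not inQuotes ' opening or closing quote
                End If
            ElseIf c = delimiter AndAlso Not inQuotes Then
                fields.Add(current.ToString())
                current.Clear()
            Else
                current.Append(c)
            End If
            i += 1
        End While
        fields.Add(current.ToString())      ' last field has no trailing delimiter
        Return fields.ToArray()
    End Function
End Module
```

Applied to the sample line from the question with "," as the delimiter, this returns seven fields, and the third one is CTR ,REURN,ALT TUBING HELIUM LEAK without the surrounding quotes.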

vb.net xls to csv with quotes?

I have an xls file, or a csv without quotes, and using VB.NET I need to turn it into a csv with quotes around every cell. If I open the xls/csv without quotes in MS Access, set every column to text and then export it, it's in the format I need. Is there an easier way? If not, how do I replicate this in VB.NET? Thanks.
If you use the .NET OLE DB provider, you can specify the .csv formatting details in a schema.ini file in the folder your data files live in. For the 'unquoted' .csv the specs should look like
[noquotes.csv] <-- file name
ColNameHeader=True <-- or False
CharacterSet=1252 <-- your encoding
Format=Delimited(,) <--
TextDelimiter= <-- important: no " in source file
Col1=VendorID Integer <-- your columns, of course
Col2=AccountNumber Char Width 15
For the 'quoted' .csv, just change the name and delete the TextDelimiter= line (putting quotes around text fields is the default).
Then connect to the Text Database and execute the statement
SELECT * INTO [quotes.csv] FROM [noquotes.csv]
(as this creates quotes.csv, you may want to delete the file before each experimental run)
Added to deal with "Empty fields must be quoted"
This is a VBScript demo, but as the important things are the parameters for .GetString(), you can port it to VB easily:
Dim sDir : sDir = resolvePath( "§LibDir§testdata\txt" )
Dim sSrc : sSrc = "noquotes.csv"
Dim sSQL : sSQL = "SELECT * FROM [" & sSrc & "]"
Dim oTxtDb : Set oTxtDb = New cADBC.openDb( Array( "jettxt", sDir ) )
WScript.Echo goFS.OpenTextFile( goFS.BuildPath( sDir, sSrc ) ).ReadAll()
Dim sAll : sAll = oTxtDb.GetSelectFRO( sSQL ).GetString( _
adClipString, , """,""", """" & vbCrlf & """", "" _
)
WScript.Echo """" & Left( sAll, Len( sAll ) - 1 )
and output:
VendorID;AccountNumber;SomethingElse
1;ABC 123 QQQ;1,2
2;IJK 654 ZZZ;2,3
3;;3,4
"1","ABC 123 QQQ","1,2"
"2","IJK 654 ZZZ","2,3"
"3","","3,4"
(German locale, therefore field separator ; and decimal symbol ,)
Same output from this VB.Net code:
Imports ADODB
...
Sub useGetString()
Console.WriteLine("useGetString")
Const adClipString As Integer = 2
Dim cn As New ADODB.Connection
Dim rs As ADODB.Recordset
Dim sAll As String
cn.ConnectionString = _
"Provider=Microsoft.Jet.OLEDB.4.0;" _
& "Data Source=M:\lib\kurs0705\testdata\txt\;" _
& "Extended Properties=""text;"""
cn.Open()
rs = cn.Execute("SELECT * FROM [noquotes.csv]")
sAll = rs.GetString( adClipString, , """,""", """" & vbCrLf & """", "" )
cn.Close()
sAll = """" & Left( sAll, Len( sAll ) - 1 )
Console.WriteLine( sAll )
End Sub
Check out the method at this link.
To make sure every value is quoted, you can append quotes to the beginning and end of each column value in the loop that writes the column data to the file.
For example, make the loop like this:
For InnerCount = 0 To ColumnCount - 1
Str &= """" & DS.Tables(0).Rows(OuterCount).Item(InnerCount) & ""","
Next
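One detail the loop above does not cover is a value that already contains a double quote; the usual CSV convention is to double it. The loop also leaves a trailing comma after the last column. A small sketch that handles both, with the hypothetical names QuoteField and ToQuotedCsvLine:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Module QuoteFieldsSketch
    ' Wraps a value in double quotes and doubles any quote it contains.
    Function QuoteField(value As String) As String
        Return """" & value.Replace("""", """""") & """"
    End Function

    ' Quotes every value and joins them with commas, without a trailing comma.
    Function ToQuotedCsvLine(values As IEnumerable(Of String)) As String
        Return String.Join(",", values.Select(Function(v) QuoteField(v)))
    End Function
End Module
```

For the values 1, ABC 123 QQQ and an empty string, ToQuotedCsvLine returns "1","ABC 123 QQQ","" on one line, so empty fields are quoted as well.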
Public Class clsTest
Public Sub Test
Dim s as string = "C:\!Data\Test1.csv"
Dim Contents As String = System.IO.File.ReadAllText(s)
Dim aryLines As String() = Contents.Split(New String() { Environment.Newline }, StringSplitOptions.None)
Dim aryParts() As String
Dim aryHeader() As String
Dim dt As System.Data.DataTable
For i As Integer = 0 To aryLines.Length - 1
aryParts = SplitCSVLine(aryLines(i))
If dt Is Nothing And aryHeader Is Nothing Then
aryHeader = CType(aryParts.Clone, String())
ElseIf dt Is Nothing And aryHeader IsNot Nothing Then
dt = DTFromStringArray(aryParts, 1000, "", aryHeader)
Else
DTAddStringArray(dt, aryParts)
End If
Next
dt.dump
End Sub
Public Shared Function SplitCSVLine(strCSVQuotedLine As String) As String()
Dim aryLines As String() = strCSVQuotedLine.Split(New String() {Environment.NewLine}, StringSplitOptions.None)
Dim aryParts As String() = Nothing
For i As Integer = 0 To aryLines.Length - 1
Dim regx As New Text.RegularExpressions.Regex(",(?=(?:[^\""]*\""[^\""]*\"")*(?![^\""]*\""))")
aryParts = regx.Split(aryLines(i))
For p As Integer = 0 To aryParts.Length - 1
aryParts(p) = aryParts(p).Trim(" "c, """"c)
Next
Next
Return aryParts
End Function
Public Shared Function DTFromStringArray(ByVal aryValues() As String, Optional ByVal intDefaultColumnWidth As Integer = 255, Optional ByVal strTableName As String = "tblArray", Optional ByVal aryColumnNames() As String = Nothing) As DataTable
If String.IsNullOrWhiteSpace(strTableName) Then strTableName = "tblArray"
Dim dt As DataTable = New DataTable(strTableName)
Dim colNew(aryValues.GetUpperBound(0)) As DataColumn
If aryColumnNames Is Nothing Then
ReDim aryColumnNames(aryValues.Length)
Else
If aryColumnNames.GetUpperBound(0) < aryValues.GetUpperBound(0) Then
ReDim Preserve aryColumnNames(aryValues.Length)
End If
End If
For x As Integer = aryColumnNames.GetLowerBound(0) To aryColumnNames.GetUpperBound(0)
If String.IsNullOrWhiteSpace(aryColumnNames(x)) Then
aryColumnNames(x) = "Field" & x.ToString
Else
aryColumnNames(x) = aryColumnNames(x)
End If
Next
For i As Integer = 0 To aryValues.GetUpperBound(0)
colNew(i) = New DataColumn
With colNew(i)
.ColumnName = aryColumnNames(i) '"Value " & i
.DataType = GetType(String)
.AllowDBNull = False
.DefaultValue = ""
.MaxLength = intDefaultColumnWidth
.Unique = False
End With
Next
dt.Columns.AddRange(colNew)
Dim pRow As DataRow = dt.NewRow
For i As Integer = aryValues.GetLowerBound(0) To aryValues.GetUpperBound(0)
pRow.Item(i) = aryValues(i)
Next
dt.Rows.Add(pRow)
Return dt
End Function
Public Shared Sub DTAddStringArray(ByRef dt As DataTable, ByVal aryRowValues() As String)
Dim pRow As DataRow
pRow = dt.NewRow
For i As Integer = aryRowValues.GetLowerBound(0) To aryRowValues.GetUpperBound(0)
pRow.Item(i) = aryRowValues(i)
Next
dt.Rows.Add(pRow)
End Sub
End Class