csv file reading problem - vb.net

So I have a csv file:
"453FDG","656HGH","54645","MARIA","V543","534","TRETCITY","R34",09094553,09094553,09094553,"21/01/10","RE"
"45er3FDG","656HGH","54645","M343ARIA","V543","534","TRETCITY","R34",090-94553,0909-4553,090-94553,"21/01/10","RE"
problem 1:
Connection string is this:
Dim strConnString As String = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" & System.IO.Path.GetDirectoryName(filediag.PostedFile.FileName).ToString & ";Extensions=asc,csv,tab,txt;Persist Security Info=False;HDR=NO;IMEX=1"
My problem is that with this schema.ini, the 9th, 10th, and 11th columns of the second row of the CSV file aren't read properly when they contain a special character (they are supposed to be telephone numbers). I think this is because the row above is purely numeric, so the column is typed as a number (integer):
[#42r.csv]:
ColNameHeader= false
Format=CSVDelimited
MaxScanRows=0
CharacterSet=ANSI
So what should I do about this?
problem 2:
Since I couldn't solve problem 1, I tried a second connection string:
Dim sConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & System.IO.Path.GetDirectoryName(filediag.PostedFile.FileName).ToString & ";Extended Properties='text;HDR=No;FMT=Delimited;IMEX=1'"
The problem with this one is that it treats the first row of the CSV file as a column header. Please help. Thanks.

The issue is that you're using Select * FROM CSVFILE.CSV. This is forcing ADO to infer the datatypes, for which it will probably just use the first row.
The best thing to do is probably to follow the schema suggested in this question:
When reading a CSV file using a DataReader and the OLEDB Jet data provider, how can I control column data types?
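For the file in the question, a schema.ini along these lines is one way to apply that advice: declaring the three telephone columns as Text stops the driver from guessing a numeric type from the first row. This is only a sketch and has not been tested against the asker's setup. The column names are made up for illustration, and typing every other column as Text is an assumption.

```ini
[#42r.csv]
ColNameHeader=False
Format=CSVDelimited
CharacterSet=ANSI
; Column names below are hypothetical; only the Text type matters here.
Col1=Code1 Text
Col2=Code2 Text
Col3=Code3 Text
Col4=PersonName Text
Col5=Code4 Text
Col6=Code5 Text
Col7=City Text
Col8=Code6 Text
; The telephone columns: Text keeps values such as 090-94553 intact.
Col9=Phone1 Text
Col10=Phone2 Text
Col11=Phone3 Text
Col12=EntryDate Text
Col13=Status Text
```

The schema.ini file has to sit in the same folder as the CSV file for the text driver to pick it up.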

Related

Using VB.Net to import my CSV file to my Access DB

I am using the below code to import a CSV file to my Access DB. I just have a couple of questions.
Con.Open()
Dim strSqlCommand = "SELECT F1 AS id, F2 AS firstname " &
"INTO MyNewTable " &
"FROM [Text;FMT=Delimited;HDR=No;CharacterSet=850;DATABASE=" & GlobalVariables.strDefaultDownloadPath & "].Airports.csv;"
Dim sqlCommand = New System.Data.OleDb.OleDbCommand(strSqlCommand, Con)
sqlCommand.ExecuteNonQuery()
Con.Close()
How can I change the Character Set to UTF-8? If I enter utf8 instead of 850 I get an error.
Also, the first line of my CSV file contains the column names. Can I amend the above code to take that into account?
Regards,
Andrew
You could run into trouble trying to import and select all at once. For one thing, you may not want to leave converting data types up to Access. To avoid that, you will need two connections and two SQL strings: one to select from the CSV and another to insert into Access.
The connection string will need to look something like this:
"Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Temp\Tmp;Extended Properties='TEXT;HDR=Yes;FMT=Delimited;CharacterSet=ANSI'"
Note that just the path is listed and the Extended Properties are enclosed in ticks. If the first line has headers/field names then HDR=Yes will skip them in the result set. One of the benefits of having field names as the first line is that OleDB will use them as column names (no need for F1 As foo, F2 As bar; in fact that will fail because they have been renamed from F1, F2...).
The SQL to read from the CSV:
"SELECT * FROM filename.csv"
There are several ways to process it. You could use a reader to read a row at a time to INSERT them into the Access database. This is probably simpler: get all the data from the CSV into a DataTable and use it to INSERT into Access:
Private myDT As DataTable ' form level variable
...
Dim csvStr As String = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Temp\Tmp;Extended Properties='TEXT;HDR=Yes;FMT=Delimited;CharacterSet=ANSI'"
Dim csvSQL = "SELECT * FROM Capitals.csv" ' use YOUR file name
Using csvCn = New OleDbConnection(csvStr),
      cmd As New OleDbCommand(csvSQL, csvCn)
    Using da As New OleDbDataAdapter(cmd)
        myDT = New DataTable
        da.Fill(myDT)
    End Using
End Using
For Each r As DataRow In myDT.Rows
    'ToDo: INSERT INTO Access
Next
The Connection, Command and DataAdapter are all resources, so they are in Using blocks to dispose of them when we are done with them. myDT will have a collection of Rows, each with a collection of Items representing the fields from the CSV. Just loop through the rows, adding the desired items to the Access DB.
You will very likely have to do some data type conversion from String to Integer, DateTime, etc.
As for the question about UTF8 - you can use the Codepage identifier. If you leave it off the connection string it will use whatever is in the Registry which may also work. For UTF8 use CharacterSet=65001.
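The 'ToDo: INSERT INTO Access step in the loop could be filled in roughly as follows. This is a sketch under several assumptions: the target table and columns (MyNewTable, id, firstname) are borrowed from the question, the first two CSV columns are taken to hold those values, id is assumed to be an integer, and the Access connection string is passed in by the caller. The module and procedure names are made up.

```vbnet
Imports System
Imports System.Data
Imports System.Data.OleDb

Module CsvToAccessSketch
    ' Inserts each DataTable row into Access with a parameterized command.
    ' accessConnStr is a hypothetical connection string to the Access file.
    Sub InsertRows(myDT As DataTable, accessConnStr As String)
        ' OleDb uses positional "?" placeholders; parameter order matters.
        Const sql As String = "INSERT INTO MyNewTable (id, firstname) VALUES (?, ?)"

        Using cn As New OleDbConnection(accessConnStr),
              cmd As New OleDbCommand(sql, cn)
            cmd.Parameters.Add("@id", OleDbType.Integer)
            cmd.Parameters.Add("@firstname", OleDbType.VarWChar, 255)
            cn.Open()

            For Each r As DataRow In myDT.Rows
                ' Explicit conversion instead of letting Access guess the type.
                ' Convert.ToInt32 throws on DBNull or non-numeric text, so real
                ' data may need validation first.
                cmd.Parameters(0).Value = Convert.ToInt32(r(0))
                cmd.Parameters(1).Value = r(1).ToString()
                cmd.ExecuteNonQuery()
            Next
        End Using
    End Sub
End Module
```

Reusing one command and only changing the parameter values keeps the type conversion in your own code and avoids building SQL strings by concatenation.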

How to use VB to create Excel style and format data sheet?

I want to write a program that will replace my current paper-based record. The paper record is basically many columns and rows with different widths, heights, and other properties. I know how to write a VB program that can save the information, but I don't know how to make it generate an XLS sheet that looks exactly like my paper record.
Could someone please point me to information about that?
Thanks :)
I would recommend http://epplus.codeplex.com/releases/view/42439.
It is very easy to use and integrates flawlessly with VB.NET.
I am not providing code as a sample because the samples which are included in the package are very good.
As a hint: internally I would use a DataTable to store your values and then use a separate module to load/store it to Excel.
An Excel file can be thought of as a simple database where each sheet is a different table.
Assuming you have Excel on your machine, you could create an empty XLS file and then use OleDB to fill the worksheets.
Imports System.Data.OleDb ' at the top of the file

Sub WriteToExcel()
    ' HDR=No exposes the worksheet columns as F1, F2, F3, ...
    Dim con As String = "Provider=Microsoft.Jet.OLEDB.4.0;" & _
        "Data Source=C:\temp\test.xls;" & _
        "Extended Properties='Excel 8.0;HDR=No;'"
    Using c As OleDbConnection = New OleDbConnection(con)
        c.Open()
        Dim commandString As String = "Insert into [Sheet1$] (F1, F2, F3) " & _
            "values('Column1Text', 'Column2Text', 'Column3Text')"
        Using cmd As OleDbCommand = New OleDbCommand(commandString)
            cmd.Connection = c
            cmd.ExecuteNonQuery()
        End Using
    End Using
End Sub
Other options include OpenXml (which I'd have thought is the "recommended" way to do it, but which brings a learning curve with it) or, at the cruder end of the scale, writing your data in comma-separated (CSV) format and manually importing it into Excel.
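For the comma-separated route, something like the following sketch would do. It assumes the records are already available as arrays of strings; the module name, the file name records.csv, and the sample values are made up. It only covers the quoting rules Excel normally expects: fields containing a comma, a double quote, or a line break are wrapped in quotes, and embedded quotes are doubled.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module CsvWriterSketch
    ' Quotes a field only when it contains a comma, a quote, or a line break.
    Function EscapeField(value As String) As String
        If value Is Nothing Then Return ""
        Dim special As Char() = {","c, """"c, ChrW(13), ChrW(10)}
        If value.IndexOfAny(special) >= 0 Then
            Return """" & value.Replace("""", """""") & """"
        End If
        Return value
    End Function

    ' Writes one CSV line per row.
    Sub WriteCsv(filePath As String, rows As IEnumerable(Of String()))
        Using w As New StreamWriter(filePath, False, Encoding.UTF8)
            For Each row As String() In rows
                Dim escaped(row.Length - 1) As String
                For i As Integer = 0 To row.Length - 1
                    escaped(i) = EscapeField(row(i))
                Next
                w.WriteLine(String.Join(",", escaped))
            Next
        End Using
    End Sub

    Sub Main()
        ' Hypothetical sample data standing in for the paper record.
        Dim rows As New List(Of String()) From {
            New String() {"Name", "Note"},
            New String() {"Smith, J.", "Said ""hello"""}
        }
        Dim target As String = Path.Combine(Path.GetTempPath(), "records.csv")
        WriteCsv(target, rows)
        Console.WriteLine("Wrote " & target)
    End Sub
End Module
```

This carries no formatting at all (column widths, fonts, borders), so it only suits the case where the layout is applied in Excel afterwards.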

VB.NET - Reading a blank string from Excel when trying to read the header of a data column?

A simple though very odd problem. I use an OLEDB connection to read from an excel database, and this loop to read in all of the data from each of the columns
While reader.Read()
    For i As Integer = 0 To reader.FieldCount - 1
        temp = reader(i).ToString + ControlChars.Tab
        output_file.Write(temp)
        'output_file.Write(reader(i).ToString() + ControlChars.Tab)
    Next
    output_file.WriteLine()
End While
Some of the columns contain date information, which are read in fine (usually as a string "2/20/2011" or so), but the headers of those columns are read in as a blank "". The headers for all the other columns read in fine, but not for the date containing columns. Any idea how I can fix this?
Is it because OLEDB is inferring a type for the date columns (DateTime or whatever) and the headers do not conform to this type? I've had similar issues with ODBC ignoring the odd alpha string in a column that is otherwise numeric.
Well, here's the solution, which I stumbled across accidentally. Your connection string needs "IMEX=1;" in it, which tells the driver to read columns with intermixed data types as text, so the header text is no longer dropped.
Dim jet_string As String = "provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + input_file_path + ";Extended Properties=""Excel 8.0;HDR=No;IMEX=1;"""

VB2005 Import Fixed Width text file into Access2007 table...almost?

I am trying to load a text file into an Access 2007 table. I know you can read the file line by line and then create a record out of each line. I was trying to see if this could be done with an INSERT INTO rather than cycling through all lines of text. My text file is not character-delimited but uses fixed column widths. For example:
Date Speed Weight CarID Fuel
1120 200 10000 T230 200
1112 215 11000 F3AE 160
The data in the example has spaces for readability but in reality the data are clumped together like so
112020010000T230200
111221511000F3AE160
Anyway, I was attempting:
Dim sImportFolder As String = "C:\MyData"
Dim sSource As String = "C:\data.accdb"
Dim sImportFile As String = "week.txt"
Dim AccessConn As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & sSource & ";Persist Security Info=True;Jet OLEDB:Database Password=blah")
AccessConn.Open() 'open the connection to the database
Dim AccessCommand As New System.Data.OleDb.OleDbCommand("INSERT INTO [tblData] ([PtDate], [PtSpeed], [PtWt], [PtCar], [PtFuel]) SELECT F1, F2, F3, F4, F5 FROM [Text;DATABASE=" & sImportFolder & ";].[" & sImportFile & "]")
AccessCommand.Connection = AccessConn
AccessCommand.ExecuteNonQuery()
AccessConn.Close()
I can't figure out how to tell the command how the data is structured. I know you can use a schema file, but there's got to be a way to do this all through code.
AGP
There is a similar question on SO here:
Read fixed width record from text file
Basically, the answer is that there isn't something simple you can do in the code to specify the schema and have the data broken up for you. You would need to either loop through each row, pulling out the data using Substring and doing one insert into Access per row (not terribly efficient), or build a DataTable in the loop and then insert into the Access database from the DataTable. To build the DataTable, you will still need to parse your data (using either Substring or a Regex).
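The Substring approach might look like the sketch below. The column widths (4, 3, 5, 4, 3) are inferred from the two sample rows in the question and are an assumption, as are the module and function names. Every field is kept as a String, so any conversion to numbers or dates still has to happen before the insert.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module FixedWidthSketch
    ' Assumed layout, inferred from the sample rows: Date, Speed, Weight, CarID, Fuel.
    Private ReadOnly Widths As Integer() = {4, 3, 5, 4, 3}
    Private ReadOnly Names As String() = {"PtDate", "PtSpeed", "PtWt", "PtCar", "PtFuel"}

    ' Cuts each line into fields by position and collects them in a DataTable.
    Function ParseLines(lines As IEnumerable(Of String)) As DataTable
        Dim dt As New DataTable("tblData")
        For Each colName As String In Names
            dt.Columns.Add(colName, GetType(String))
        Next

        Dim total As Integer = 0
        For Each w As Integer In Widths
            total += w
        Next

        For Each line As String In lines
            ' Skip blank or truncated lines rather than throwing in Substring.
            If line Is Nothing OrElse line.Length < total Then Continue For

            Dim values(Widths.Length - 1) As Object
            Dim pos As Integer = 0
            For i As Integer = 0 To Widths.Length - 1
                values(i) = line.Substring(pos, Widths(i))
                pos += Widths(i)
            Next
            dt.Rows.Add(values)
        Next
        Return dt
    End Function

    Sub Main()
        Dim sample As String() = {"112020010000T230200", "111221511000F3AE160"}
        Dim dt As DataTable = ParseLines(sample)
        For Each r As DataRow In dt.Rows
            Console.WriteLine(String.Join(" | ", r.ItemArray))
        Next
    End Sub
End Module
```

In real use the lines would come from something like File.ReadLines on the import file, and the resulting DataTable would feed the insert into tblData.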

Problem reading dBase DBF with non-English characters

I have a tool which reads dBase files and uploads the contents to SQL Server, part of a system to import shapefiles. It works but now we have a requirement to import files that include non-English characters (Norwegian in this case, could be other languages later) and they're being corrupted.
The dBase files are being read using an OleDbDataAdapter. Stepping through the code I can see that the text is wrong as it is read in. I'm assuming it's something to do with code pages or Unicode but I have no idea how to fix it.
A dBase Reader application tells me the DBFs are in code page 1252 - I don't know if this is correct. My upload tool runs on Win7 with English (UK) regional settings.
Examples:
ÅSGARD in DBF becomes +SGARD in VB.Net & SQL Server.
RINGHORNE ØST in DBF becomes RINGHORNE ÏST in VB.Net & SQL Server.
The code that reads the DBF:
dbfConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & ";Extended Properties=dBASE IV"
Cnn.ConnectionString = dbfConnectionString
Cnn.Open()
strSQL = "SELECT * FROM [" & strDBF & "]"
DA = New OleDb.OleDbDataAdapter(strSQL, Cnn)
DS = New DataSet
DA.Fill(DS)
If DS.Tables(0).Rows.Count > 0 Then
    dtDBF = DS.Tables(0)
Else
    dtDBF = Nothing
End If
Data is read like: Name = dtDBF.Rows(index)("NAME_1")
Is there a way to tell OleDbDataAdapter what code page to use or a better way to read dBase files from VB.Net?
Try adding this to your DSN:
CollatingSequence=Norwegian-Danish
You might also be able to use:
CollatingSequence=International
Check whether the shapefile contains code page information. There are two places to look:
Look in the language driver ID (LDID), which is found in the header of the shapefile’s DBF table (in the 29th byte).
Look for an associated separate file with extension .cpg.
If the code page is not specified in those locations, it defaults to the codepage on the PC that generated the shapefile. You will just have to know that :(
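Both checks can be scripted. The sketch below assumes that "the 29th byte" means zero-based offset 29 in the DBF header, which is my reading of the dBASE header layout and worth verifying against a file whose code page you already know. It also assumes the .cpg file shares the DBF's base name. The module and function names are made up, and no mapping from LDID value to code page is attempted.

```vbnet
Imports System
Imports System.IO

Module DbfCodePageSketch
    ' Reads the language driver ID from the DBF header.
    ' Assumption: the LDID is the byte at zero-based offset 29.
    Function ReadLdid(dbfPath As String) As Byte
        Using fs As New FileStream(dbfPath, FileMode.Open, FileAccess.Read)
            If fs.Length < 30 Then
                Throw New InvalidDataException("File is too short to be a DBF.")
            End If
            fs.Seek(29, SeekOrigin.Begin)
            Return CByte(fs.ReadByte())
        End Using
    End Function

    ' Returns the contents of the sibling .cpg file, or Nothing if there is none.
    Function ReadCpg(dbfPath As String) As String
        Dim cpgPath As String = Path.ChangeExtension(dbfPath, ".cpg")
        If Not File.Exists(cpgPath) Then Return Nothing
        Return File.ReadAllText(cpgPath).Trim()
    End Function

    Sub Main(args As String())
        If args.Length = 0 Then
            Console.WriteLine("Usage: DbfCodePageSketch <path-to-dbf>")
            Return
        End If
        Console.WriteLine("LDID: 0x" & ReadLdid(args(0)).ToString("X2"))
        Dim cpg As String = ReadCpg(args(0))
        Console.WriteLine("CPG: " & If(cpg, "(no .cpg file)"))
    End Sub
End Module
```

An LDID of 0x00 means the header does not name a code page, in which case the .cpg file or knowledge of the originating PC is all there is to go on.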
I've never used it, but maybe Shape2SQL takes care of this for you? Or shp2text? I believe the PostGIS shapefile loader handles code pages: maybe you could import into PostGIS and then export in another format??
Old question, but this may answer it for future readers...
You might try adding a property setting in your connection string:
Locale Identifier=1044
This property (and a list of values including this one) is documented for ADO in conjunction with Jet 4.0's OLE DB Provider, but I have no reason to believe it isn't also supported by ADO.NET. This value (1044) is the locale ID for Norwegian (Bokmål).
Untested, but something else to try.