I have a large CSV file to import (and I have to import it on a regular basis). The biggest problem is that one of the fields contains descriptions that use double quotes. So a field might have a value like this inside the raw CSV file:
...,...,"100/5 OZ 5/8" x 1/2" x 1/2" Cube",...,...
I currently have simple ADO code that pulls the CSV into a table, but that won't work because of these double quotes :( Currently I am just trying to pull the CSV into a DataTable, but the final goal is to push it into a SQL Server table.
Simple ADO Code:
' requires Imports System.Data.OleDb at the top of the file
Dim cnStr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='C:\csv\';Extended Properties='text;HDR=No;FMT=Delimited';"
Dim dt As New DataTable
Using tblAdp As New OleDbDataAdapter("SELECT * FROM [ZMES.csv]", cnStr)
    tblAdp.Fill(dt)
End Using
DataGridView1.DataSource = dt
When it hits those " characters in the middle of the string, it truncates that row and moves on to the next one. I need to do something similar to what I would do when inserting quotes into a database in PHP (escaping the double quotes), but I'm not really sure how. As a test, I also tried LumenWorks CSV Reader, and it faulted on those lines.
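One possible workaround is to pre-process each line so that every double quote inside a quoted field is doubled, which is the usual CSV escape. The following is a minimal VB.NET sketch that relies on a heuristic: it treats a quote as the closing quote of a field only when that quote is followed by a comma or the end of the line. `EscapeEmbeddedQuotes` is a hypothetical helper, not a library API. It assumes the file does not already escape quotes as `""`, that no field spans multiple lines, and that no description ever contains a quote immediately followed by a comma.

```vbnet
Imports System.Text

Module CsvQuoteRepair
    ' Hypothetical helper (an assumption, not a library API): doubles every
    ' double quote inside a quoted field except the one that closes it.
    ' Heuristic: a quote closes a field only when followed by a comma or the
    ' end of the line. Assumes the file does not already escape quotes as "".
    Function EscapeEmbeddedQuotes(line As String) As String
        Dim sb As New StringBuilder(line.Length + 8)
        Dim inQuotes As Boolean = False
        For i As Integer = 0 To line.Length - 1
            Dim c As Char = line(i)
            If c <> """"c Then
                sb.Append(c)
            ElseIf Not inQuotes Then
                ' A quote at the start of a field opens a quoted field
                If i = 0 OrElse line(i - 1) = ","c Then inQuotes = True
                sb.Append(c)
            ElseIf i = line.Length - 1 OrElse line(i + 1) = ","c Then
                ' Closing quote: keep it and leave the quoted field
                inQuotes = False
                sb.Append(c)
            Else
                ' Embedded quote: escape it by doubling it
                sb.Append(""""c, 2)
            End If
        Next
        Return sb.ToString()
    End Function
End Module
```

If you apply the helper to the sample line from the question, the field becomes `"100/5 OZ 5/8"" x 1/2"" x 1/2"" Cube"`. A parser that honors doubled quotes, such as TextFieldParser with HasFieldsEnclosedInQuotes = True, should then be able to read that field. You could also write the cleaned lines to a copy of the file before running the import.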
I need help working with a text file as if it were a database.
I created an Excel GUI (with macros) that searches sheets containing lots of data for an input string and displays every row containing that string (for people who have MS Office installed).
Now I must create an alternative VB.NET application that works only on tab-delimited text files (without ADO.NET) for people who don't have MS Office installed, and I don't know how to start.
Should I import them? If so, how?
Should I work on them directly? If so, how?
My text files are Excel files/sheets exported to tab-delimited .txt, with lots of columns (100+) with headers, and lots of rows (500+).
need help :)
thx
If you want to take the headers from the first line of the file, do this:
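' Requires: Imports System.Data, Imports System.IO, Imports System.Linq
Sub Main()
    Dim dt = New DataTable
    Dim lines = File.ReadAllLines("TextFile1.txt")
    ' The first line holds the column headers
    Dim headers = lines(0).Split(ControlChars.Tab)
    For Each header In headers
        dt.Columns.Add(header)
    Next
    ' Every remaining line is one data row
    For Each line In lines.Skip(1)
        Dim parts = line.Split(ControlChars.Tab)
        dt.Rows.Add(parts)
    Next
End Sub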
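The Excel macro showed every row that contains the typed text. You can reproduce that by searching the loaded DataTable in memory. The following is a minimal sketch. `FindMatchingRows` is a hypothetical name, and the function performs a case-insensitive substring match against every field of every row.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module RowSearch
    ' Hypothetical helper: returns each row in which at least one field
    ' contains searchText (case-insensitive substring match).
    Function FindMatchingRows(dt As DataTable, searchText As String) As List(Of DataRow)
        Dim matches As New List(Of DataRow)
        For Each r As DataRow In dt.Rows
            For Each v As Object In r.ItemArray
                ' DBNull.Value.ToString() is "", so empty cells never match a non-empty search
                If v.ToString().IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    matches.Add(r)
                    Exit For ' one match is enough for this row
                End If
            Next
        Next
        Return matches
    End Function
End Module
```

To display the results, you could copy the matches into `dt.Clone()` with `ImportRow` and bind that table to a DataGridView.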
The following code reads a tab-delimited file into a DataGridView. It works fine, but there are a couple of issues I'm not exactly sure how to address.
Dim query = From line In IO.File.ReadAllLines("C:\Temp\Temp.txt")
Let Data = line.Split(vbTab)
Let field1 = Data(0)
Let field2 = Data(1)
Let field3 = Data(2)
Let field4 = Data(3)
DataGridView1.DataSource = query.ToList
DataGridView1.Columns(0).Visible = False
How do I go about adding fields (columns) based on the number of fields in the header row? The header row currently contains 110 fields, which I'd hate to define one by one in the same manner as Let field1 = Data(0).
I also need to skip the header row and display only the lines after it.
Is there a better way to handle this than what I'm currently doing?
There are several tools to parse this type of file. One is OleDB.
I can't quite figure out how the (deleted) answer works, because HDR=No tells the Text Driver that the first row does not contain column names. However, this is sometimes ignored after the driver reads the first 8 lines without IMEX.
Also, FMT=Delimited\""" looks like it was copied from a C# answer, because VB doesn't use \ to escape characters. It also seems to confuse the column delimiter (comma or tab in this case) with the text delimiter (usually ").
If the file is tab delimited, the correct value would be FMT=TabDelimited. I am guessing that the fields are text delimited with quotes (e.g. "France" "Paris" "2.25") and OleDB is chopping the data by quotes rather than tabs to accidentally get the same result.
The correct ACE string would be:
Dim connstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='C:\Temp';Extended Properties='TEXT;HDR=Yes;FMT=TabDelimited';"
If you use only the connection string, each field is imported as a string. You can also have OleDB convert the data it reads to the datatype it is meant to be. That way you do not have to litter your code with lots of Convert.ToXXXX calls to convert the String data.
This requires a Schema.INI to define the file. The Schema.INI replaces most of the Extended Properties in the connection string, leaving only Extended Properties='TEXT' (which means use the Text Driver). Create a file named Schema.INI in the same folder as the data:
[Capitals.txt]
ColNameHeader=True
CharacterSet=437
Format=TabDelimited
TextDelimiter="
DecimalSymbol=.
CurrencySymbol=$
Col1="Country" Text Width 254
Col2="Capital City" Text Width 254
Col3="Population" Single
Col4="Fake" Integer
One Schema.INI can contain the layout for many files. Each file has its own section titled with the name of the file (e.g. [FooBar.CSV], [Capitals.txt], etc.).
Most of the entries should be self-explanatory. Format defines the column delimiter (TabDelimited, CSVDelimited or a custom Delimited(;)). TextDelimiter is the character used to enclose column data when it might contain spaces or other special characters. Settings such as CurrencySymbol let you allow for a foreign symbol and can be omitted.
The ColN= listings are where you can rename columns and specify the datatype. This might be tedious to enter for 100+ columns, however it would probably be mostly copy and paste. Once it is done you'd always have it and be able to easily use typed data.
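The column list is the tedious part, so one option is to generate the section from the header row. The following is a sketch of a hypothetical helper (not part of OleDB). It assumes a tab-delimited file whose first line holds the column names and declares every column as Text Width 254. You would then edit the types of the numeric columns by hand.

```vbnet
Imports System.Text
Imports Microsoft.VisualBasic

Module SchemaIniBuilder
    ' Hypothetical helper: builds a Schema.INI section for a tab-delimited
    ' file, naming each ColN after the header row and typing it as Text.
    Function BuildSchemaSection(fileName As String, headerLine As String) As String
        Dim sb As New StringBuilder()
        sb.AppendLine("[" & fileName & "]")
        sb.AppendLine("ColNameHeader=True")
        sb.AppendLine("Format=TabDelimited")
        sb.AppendLine("TextDelimiter=""") ' writes TextDelimiter="
        Dim headers As String() = headerLine.Split(ControlChars.Tab)
        For i As Integer = 0 To headers.Length - 1
            ' Col1, Col2, ... are numbered from 1 in Schema.INI
            sb.AppendLine(String.Format("Col{0}=""{1}"" Text Width 254", i + 1, headers(i).Trim()))
        Next
        Return sb.ToString()
    End Function
End Module
```

You could write the result to Schema.INI in the data folder, for example with `File.WriteAllText`, as a starting point. A header that itself contains a double quote would need its generated line fixed by hand.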
You do not need to specify the column names/sizes/types to use a Schema.INI. If the file includes column names as the first row (ColNameHeader=True), you can use the schema simply to specify the various parameters in a clear and readable fashion rather than squeezing them into the connection string.
OleDB looks for a Schema.INI in the same folder as the import file, and then looks for a section bearing the exact name of the "table" used in the SQL:
' form level DT var
Private capDT As DataTable
' procedure code to load the file:
Dim connstr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source='C:\Temp';Extended Properties='TEXT';"
Dim SQL = "SELECT * FROM Capitals.txt"
capDT = New DataTable
' USING will close and dispose of resources
Using cn As New OleDbConnection(connstr),
cmd As New OleDbCommand(SQL, cn)
cn.Open()
Using da As New OleDbDataAdapter(cmd)
da.Fill(capDT)
End Using
End Using ' close and dispose
The DataTable is now ready to use. If you iterate the columns, you can see that their types match the types specified in the schema:
' display data types
For n As Int32 = 0 To capDT.Columns.Count - 1
Console.WriteLine("name: {0}, datatype: {1}",
capDT.Columns(n).ColumnName,
capDT.Columns(n).DataType.ToString)
Next
Output:
name: Country, datatype: System.String
name: Capital City, datatype: System.String
name: Population, datatype: System.Single
name: Fake, datatype: System.Int32
See also:
Schema.INI for most legal settings
Code Page Identifiers for the values to use for CharacterSet
I am using the code below to import a CSV file into my Access DB. I just have a couple of questions.
Con.Open()
Dim strSqlCommand = "SELECT F1 AS id, F2 AS firstname " &
"INTO MyNewTable " &
"FROM [Text;FMT=Delimited;HDR=No;CharacterSet=850;DATABASE=" & GlobalVariables.strDefaultDownloadPath & "].Airports.csv;"
Dim sqlCommand = New System.Data.OleDb.OleDbCommand(strSqlCommand, Con)
sqlCommand.ExecuteNonQuery()
Con.Close()
How can I change the Character Set to UTF-8? If I enter utf8 instead of 850 I get an error.
Also, the first line of my CSV file contains the column names. Can I amend the code above to take that into account?
Regards,
Andrew
You could run into trouble trying to import and select all at once. For one thing, you may not want to leave data type conversion up to Access. For that, you will need two connections and two SQL strings: one to select from the CSV and one to insert into the Access database.
The connection string will need to look something like this:
"Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Temp\Tmp;Extended Properties='TEXT;HDR=Yes;FMT=Delimited;CharacterSet=ANSI'"
Note that only the folder path is listed and the Extended Properties are enclosed in single quotes (ticks). If the first line has headers/field names, then HDR=Yes will skip them in the result set. One of the benefits of having field names as the first line is that OleDB will use them as column names. You then have no need for F1 As foo, F2 As bar. In fact, that will fail, because the columns are no longer named F1, F2 and so on.
The SQL to read from the CSV:
"SELECT * FROM filename.csv"
There are several ways to process it. You could use a reader to read a row at a time to INSERT them into the Access database. This is probably simpler: get all the data from the CSV into a DataTable and use it to INSERT into Access:
Private myDT As DataTable ' form level variable
...
Dim csvStr As String = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Temp\Tmp;Extended Properties='TEXT;HDR=Yes;FMT=Delimited;CharacterSet=ANSI'"
Dim csvSQL = "SELECT * FROM Capitals.csv" ' use YOUR file name
Using csvCn = New OleDbConnection(csvStr),
cmd As New OleDbCommand(csvSQL, csvCn)
Using da As New OleDbDataAdapter(cmd)
myDT = New DataTable
da.Fill(myDT)
End Using
End Using
For Each r As DataRow In myDT.Rows
'ToDo: INSERT INTO Access
Next
The Connection, Command and DataAdapter are all resources, so they are in Using blocks to dispose of them when we are done with them. myDT will have a collection of Rows, each with a collection of Items representing the fields from the CSV. Just loop through the rows, adding the desired items to the Access DB.
You will very likely have to do some data type conversion from String to Integer, DateTime, etc.
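The following is a minimal sketch of the `'ToDo: INSERT INTO Access` step. It assumes the Access table is the question's `MyNewTable` with an integer `id` and a text `firstname` column, and that these are the first two CSV columns. The column types, the parameter names and the `InsertRows` sub are assumptions for illustration. The sketch reuses one parameterized OleDbCommand for every row and converts the id explicitly instead of leaving the conversion to Access.

```vbnet
Imports System
Imports System.Data
Imports System.Data.OleDb

Module AccessImport
    ' Sketch only: MyNewTable(id, firstname) and the column positions are
    ' assumptions taken from the question. OleDb parameters are positional,
    ' so they must be added in the same order as the ? placeholders.
    Sub InsertRows(accessConnStr As String, csvTable As DataTable)
        If csvTable.Rows.Count = 0 Then Return ' nothing to insert
        Dim sql As String = "INSERT INTO MyNewTable (id, firstname) VALUES (?, ?)"
        Using cn As New OleDbConnection(accessConnStr),
              cmd As New OleDbCommand(sql, cn)
            Dim pId As OleDbParameter = cmd.Parameters.Add("@id", OleDbType.Integer)
            Dim pName As OleDbParameter = cmd.Parameters.Add("@firstname", OleDbType.VarWChar, 255)
            cn.Open()
            For Each r As DataRow In csvTable.Rows
                ' Convert explicitly rather than leaving it to Access
                pId.Value = Convert.ToInt32(r(0))
                pName.Value = Convert.ToString(r(1))
                cmd.ExecuteNonQuery()
            Next
        End Using
    End Sub
End Module
```

You would call it as `InsertRows(accessConnStr, myDT)` after the Fill. Here `accessConnStr` is a hypothetical variable holding the connection string for your .accdb or .mdb file.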
As for the question about UTF-8: you can use the code page identifier. If you leave CharacterSet off the connection string, the driver will use whatever is set in the Registry, which may also work. For UTF-8, use CharacterSet=65001.
I have this txt file with the following information:
National_Insurence_Number;Name;Surname;Hours_Worked;Price_Per_Hour
For example: aa-12-34-56-a;Peter;Smith;36;12
This data was entered into the txt file through a VB form, which works fine. The problem is on another form. This is what I expect it to do:
The user will enter the employee's NI number into a text box.
The program will then search the file for that NI number and, if found,
it will fill in the appropriate text boxes with its data.
(Then the program calculates tax and national insurance, which I have working fine.)
So basically the problem is telling the program to search for that NI number and put each ;-delimited field into its corresponding text box.
Thanks for all.
You just need to parse the file like a CSV. You can use Microsoft.VisualBasic.FileIO.TextFieldParser to do this, or you can use CsvHelper: https://github.com/JoshClose/CsvHelper
I've used CsvHelper in the past and it works great. It allows you to create a class with the structure of the records in your data file, then imports the data into a list of those objects for searching.
If you want to use TextFieldParser instead, you can look here for more info:
Parse Delimited CSV in .NET
Using afile As New FileIO.TextFieldParser(FileName)
    Dim CurrentRecord As String() ' this array will hold each line of data
    afile.TextFieldType = FileIO.FieldType.Delimited
    afile.Delimiters = New String() {";"}
    afile.HasFieldsEnclosedInQuotes = True
    ' parse the actual file
    Do While Not afile.EndOfData
        Try
            CurrentRecord = afile.ReadFields()
            ' CurrentRecord(0) is the NI number, CurrentRecord(1) the name, etc.
        Catch ex As FileIO.MalformedLineException
            ' report and skip lines the parser cannot read
            Console.WriteLine("Skipping malformed line " & ex.LineNumber)
        End Try
    Loop
End Using
I'd recommend using CsvHelper, though. The documentation is pretty good, and working with objects is much easier than working with the raw string data.
Once you have found the record, you can manually set the text of each text box on your form or use a BindingSource.
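If you use TextFieldParser, the lookup itself could look like the following minimal sketch. It assumes the semicolon layout shown in the question, with the NI number as the first field. `FindEmployee` is a hypothetical name. It returns Nothing when the file is missing or no record matches.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module EmployeeLookup
    ' Hypothetical helper: returns the fields of the first record whose first
    ' field equals niNumber (case-insensitive), or Nothing if none matches.
    ' Assumed layout: NI number;Name;Surname;Hours_Worked;Price_Per_Hour
    Function FindEmployee(path As String, niNumber As String) As String()
        If Not File.Exists(path) Then Return Nothing
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.Delimiters = New String() {";"}
            parser.HasFieldsEnclosedInQuotes = True
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                If fields IsNot Nothing AndAlso fields.Length > 0 AndAlso
                   String.Equals(fields(0).Trim(), niNumber.Trim(), StringComparison.OrdinalIgnoreCase) Then
                    Return fields
                End If
            End While
        End Using
        Return Nothing
    End Function
End Module
```

On the form you might then write `Dim rec = FindEmployee(path, txtNI.Text)`. If `rec IsNot Nothing`, set `txtName.Text = rec(1)`, `txtSurname.Text = rec(2)` and so on. The text box names here are hypothetical.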
I am trying to load a text file into an Access 2007 table. I know you can read the file line by line and then create a record out of each line. I was trying to see if this could be done with an INSERT INTO rather than cycling through all the lines of text. My text file is not character-delimited but fixed-width. For example:
Date Speed Weight CarID Fuel
1120 200 10000 T230 200
1112 215 11000 F3AE 160
The data in the example has spaces for readability but in reality the data are clumped together like so
112020010000T230200
111221511000F3AE160
Anyway, I was attempting:
Dim sImportFolder As String = "C:\MyData"
Dim sSource As String = "C:\data.accdb"
Dim sImportFile As String = "week.txt"
Dim AccessConn As New System.Data.OleDb.OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & sSource & ";Persist Security Info=True;Jet OLEDB:Database Password=blah")
AccessConn.Open() 'open the connection to the database
Dim AccessCommand As New System.Data.OleDb.OleDbCommand("INSERT INTO [tblData] ([PtDate], [PtSpeed], [PtWt], [PtCar], [PtFuel]) SELECT F1, F2, F3, F4, F5 FROM [Text;DATABASE=" & sImportFolder & ";].[" & sImportFile & "]")
AccessCommand.Connection = AccessConn
AccessCommand.ExecuteNonQuery()
AccessConn.Close()
I can't figure out how to tell the command how the data is structured. I know you can use a schema file, but there's got to be a way to do this all through code.
AGP
There is a similar question on SO here:
Read fixed width record from text file
Basically, the answer is that there isn't anything simple you can do in code to specify the schema and have the data broken up for you. You would need to choose one of two approaches:
1. Loop through each row, pull out the data using Substring, and do one insert into Access per row (not terribly efficient).
2. Build a DataTable in the loop and then insert into the Access database from the DataTable.

To build the DataTable, you will still need to parse your data (using either Substring or a RegEx).
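The following is a minimal sketch of the DataTable approach described above, using Substring. The field widths (4, 3, 5, 4, 3) are read off the two sample lines in the question, and the column names are borrowed from the question's INSERT statement. Both are assumptions to adjust. A non-blank line shorter than 19 characters would throw ArgumentOutOfRangeException.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data

Module FixedWidthParser
    ' Assumed layout from the question's sample: Date(4) Speed(3) Weight(5) CarID(4) Fuel(3)
    Private ReadOnly ColumnNames As String() = {"PtDate", "PtSpeed", "PtWt", "PtCar", "PtFuel"}
    Private ReadOnly FieldWidths As Integer() = {4, 3, 5, 4, 3}

    ' Cuts each fixed-width line into fields with Substring and adds it to a
    ' DataTable whose columns are all strings; blank lines are skipped.
    Function ParseFixedWidth(lines As IEnumerable(Of String)) As DataTable
        Dim dt As New DataTable()
        For Each n As String In ColumnNames
            dt.Columns.Add(n, GetType(String))
        Next
        For Each line As String In lines
            If String.IsNullOrWhiteSpace(line) Then Continue For
            Dim values(FieldWidths.Length - 1) As Object
            Dim pos As Integer = 0
            For i As Integer = 0 To FieldWidths.Length - 1
                values(i) = line.Substring(pos, FieldWidths(i))
                pos += FieldWidths(i)
            Next
            dt.Rows.Add(values)
        Next
        Return dt
    End Function
End Module
```

You could call it as `ParseFixedWidth(File.ReadLines("C:\MyData\week.txt"))`. The returned table holds strings only, so convert Speed, Weight and Fuel before inserting them into Access.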