VB.NET - Reading a blank string from Excel when trying to read the header of a data column? - vb.net

A simple though very odd problem. I use an OLEDB connection to read from an Excel workbook, and this loop to read in all of the data from each of the columns:
While reader.Read()
    For i As Integer = 0 To reader.FieldCount - 1
        temp = reader(i).ToString + ControlChars.Tab
        output_file.Write(temp)
        'output_file.Write(reader(i).ToString() + ControlChars.Tab)
    Next
    output_file.WriteLine()
End While
Some of the columns contain date information, which are read in fine (usually as a string "2/20/2011" or so), but the headers of those columns are read in as a blank "". The headers for all the other columns read in fine, but not for the date containing columns. Any idea how I can fix this?

Is it because OLEDB is inferring a type for the date columns (DateTime or whatever) and the headers do not conform to this type? I've had similar issues with ODBC ignoring the odd alpha string in a column that is otherwise numeric.

Well, here's the solution, which I stumbled across accidentally. Your connection string needs "IMEX=1;" in it, which (with the default registry settings) tells the driver to read columns containing mixed data types as text.
Dim jet_string As String = "provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + input_file_path + ";Extended Properties=""Excel 8.0;HDR=No;IMEX=1;"""
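A minimal sketch of the whole read loop using this connection string, assuming the Jet 4.0 provider is installed and that you pass in the worksheet name. The module, function, and parameter names are illustrative and not from the original post. With HDR=No the header row arrives as an ordinary data row, and IMEX=1 asks the driver to return mixed-type columns as text, so the header of a date column should no longer come back blank.

```vbnet
Imports System
Imports System.Data.OleDb
Imports System.IO
Imports Microsoft.VisualBasic

Module ExcelToTabDelimited
    ' Builds a Jet connection string that treats the first row as data (HDR=No)
    ' and asks the driver to read mixed-type columns as text (IMEX=1).
    Function BuildExcelConnString(inputFilePath As String) As String
        Return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & inputFilePath &
               ";Extended Properties=""Excel 8.0;HDR=No;IMEX=1;"""
    End Function

    ' Copies every row of one worksheet (header row included) to a tab-delimited file.
    ' sheetName is the worksheet name without the trailing "$" (an assumption of this sketch).
    Sub ExportSheet(inputFilePath As String, sheetName As String, outputFilePath As String)
        Using conn As New OleDbConnection(BuildExcelConnString(inputFilePath))
            conn.Open()
            Using cmd As New OleDbCommand("SELECT * FROM [" & sheetName & "$]", conn),
                  reader As OleDbDataReader = cmd.ExecuteReader(),
                  outputFile As New StreamWriter(outputFilePath)
                While reader.Read()
                    Dim fields(reader.FieldCount - 1) As String
                    For i As Integer = 0 To reader.FieldCount - 1
                        ' Convert.ToString returns "" for DBNull, so empty cells stay empty.
                        fields(i) = Convert.ToString(reader(i))
                    Next
                    outputFile.WriteLine(String.Join(ControlChars.Tab, fields))
                End While
            End Using
        End Using
    End Sub
End Module
```

Unlike the loop in the question, this version joins the fields with tabs, so no trailing tab is written at the end of each line.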

Related

ADO Recordset - Text Field Returned as Double

My recordset object returns a field as a Double data type, even though the data source contains text. Because of this conversion, the recordset object returns a null for that field.
The data source is an Excel worksheet with static data. All records in that field contain text data of varying lengths (3-800 characters), and the field is never blank.
I randomly noticed that when I insert an empty column to the right of this field, the SQL query correctly recognizes the field as a text field (more specifically, an adLongVarWChar/Memo field). It's extremely bizarre, but I would appreciate it if someone could help me figure out what's going on, and whether there's a better solution.
I'm using the following connection string with Microsoft Excel 2016:
strConnString = "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & mstrFile & ";Extended Properties=""Excel 12.0 Macro;HDR=YES;IMEX=1"""

Multiple open commands oledb connections for querying Excel

Private Function CreateConnString(ByVal Str As String) As String
Return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & Str & ";Extended Properties=""Excel 12.0;HDR=Yes;IMEX=1"""
End Function
...
For Each sMatl_Num As String In alMaterialNumbers
    ifileNo = 1
    dbConnection.ConnectionString = CreateConnString(sExcelDBPath)
    dbCommand.Connection = dbConnection
    dbCommand.CommandText = "SELECT [col1], [col2], [col3], [col4], [col5], [col6] FROM [sheet$] WHERE [material]='" & sMatl_Num & "'" & " AND [col3] IS NOT NULL" & " AND [col6] IS NOT NULL"
    dbConnection.Open()
    dbReader = dbCommand.ExecuteReader()
    If dbReader.HasRows Then
        Do While dbReader.Read
            sCol1 = dbReader.GetString(0).ToString
            sCol2 = dbReader.GetString(1).ToString
            sCol3 = dbReader.GetString(2).ToString
            sCol4 = dbReader.GetString(3).ToString
            sCol5 = dbReader.GetString(4).ToString
            sCol6 = dbReader.GetString(5).ToString
            'Write txt file with name and content derived from these strings
        Loop
    End If
    dbReader.Close()
    dbConnection.Close()
Next
I am querying an Excel file using OLEDB. There are quite a few columns that I get data from for each row. This is running insanely slowly. Is there a way I can optimize this? The array list contains approximately 23k items.
Try to open the connection only once, process the rows it returns as a single recordset, and then close it.
So you could change the SQL to order the rows by material number and to have a WHERE clause that only selects the material numbers in alMaterialNumbers (this might be an IN clause or a subquery, depending on how many values there might be).
Having got a recordset, you can loop through it, writing out the rows for each material number; when the number changes, start writing to the next file.
Does this make sense?
The problem is that for every sMatl_Num (I assume there are 23k?) you are opening an OLEDB connection, reading a record in, and then grabbing the fields one by one, writing them, then closing the connection. Opening and closing the connection, querying for a single material number while filtering other fields, and picking up data out of the fields returned is expensive.
This would probably be faster if you brought the entire recordset in at once (copy and paste), filter out col3=null and col6=null using autofilters or something, then just use native vlookup functionality to pull in Col1 - Col6 values. And it could still all be done in VBA.
The method you are using would be better suited for a single quick look up against a huge file. Like if you had a UI in Excel where a user would put in a material number and you would OLEDB over to your material master data and grab the attributes you want and return them to the UI. Once.
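As a rough sketch of the "open once, read once" idea in VB.NET: the connection string, sheet name, and column names below are taken from the question, while the module and function names are hypothetical. Instead of an IN clause with 23k values, this variant reads every qualifying row in one pass and filters in memory with a HashSet, which is a swapped-in simplification rather than what either answer prescribes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Data.OleDb
Imports Microsoft.VisualBasic

Module MaterialExport
    ' Appends one output line to the list kept for a material number.
    Sub AddRow(groups As Dictionary(Of String, List(Of String)), material As String, line As String)
        Dim lines As List(Of String) = Nothing
        If Not groups.TryGetValue(material, lines) Then
            lines = New List(Of String)
            groups(material) = lines
        End If
        lines.Add(line)
    End Sub

    ' Opens the workbook once, runs one query, and groups the wanted rows by material number.
    Function LoadRowsByMaterial(excelPath As String,
                                wanted As HashSet(Of String)) As Dictionary(Of String, List(Of String))
        Dim groups As New Dictionary(Of String, List(Of String))
        Dim connString As String = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & excelPath &
                                   ";Extended Properties=""Excel 12.0;HDR=Yes;IMEX=1"""
        Dim sql As String = "SELECT [material], [col1], [col2], [col3], [col4], [col5], [col6] " &
                            "FROM [sheet$] WHERE [col3] IS NOT NULL AND [col6] IS NOT NULL"
        Using conn As New OleDbConnection(connString),
              cmd As New OleDbCommand(sql, conn)
            conn.Open()
            Using reader As OleDbDataReader = cmd.ExecuteReader()
                While reader.Read()
                    ' Convert.ToString copes with numeric cells and DBNull, unlike GetString.
                    Dim material As String = Convert.ToString(reader(0))
                    If Not wanted.Contains(material) Then Continue While
                    Dim cols(5) As String
                    For i As Integer = 0 To 5
                        cols(i) = Convert.ToString(reader(i + 1))
                    Next
                    AddRow(groups, material, String.Join(vbTab, cols))
                End While
            End Using
        End Using
        Return groups
    End Function
End Module
```

Each entry of the returned dictionary can then be written to its own text file in a second loop, with no further database access.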

Using VB.Net to import my CSV file to my Access DB

I am using the below code to import a CSV file to my Access DB. I just have a couple of questions.
Con.Open()
Dim strSqlCommand = "SELECT F1 AS id, F2 AS firstname " &
"INTO MyNewTable " &
"FROM [Text;FMT=Delimited;HDR=No;CharacterSet=850;DATABASE=" & GlobalVariables.strDefaultDownloadPath & "].Airports.csv;"
Dim sqlCommand = New System.Data.OleDb.OleDbCommand(strSqlCommand, Con)
sqlCommand.ExecuteNonQuery()
Con.Close()
How can I change the Character Set to UTF-8? If I enter utf8 instead of 850 I get an error.
Also, the first line of my CSV file contains the column names. Can I amend the above code to take that into account?
Regards,
Andrew
You could run into trouble trying to import and select all at once; for one thing, you may not want to leave converting data types up to Access. To control that, you will need two connections and two SQL strings: one to select from the CSV and another to insert into Access.
The connection string will need to look something like this:
"Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Temp\Tmp;Extended Properties='TEXT;HDR=Yes;FMT=Delimited;CharacterSet=ANSI'"
Note that just the path is listed and the Extended Properties are enclosed in ticks. If the first line has headers/field names then HDR=Yes will skip them in the result set. One of the benefits of having field names as the first line is that OleDB will use them as column names (no need for F1 As foo, F2 As bar; in fact that will fail because they have been renamed from F1, F2...).
The SQL to read from the CSV:
"SELECT * FROM filename.csv"
There are several ways to process it. You could use a reader to read a row at a time to INSERT them into the Access database. This is probably simpler: get all the data from the CSV into a DataTable and use it to INSERT into Access:
Private myDT As DataTable      ' form level variable
...
Dim csvStr As String = "Provider=Microsoft.Jet.OLEDB.4.0; Data Source=C:\Temp\Tmp;Extended Properties='TEXT;HDR=Yes;FMT=Delimited;CharacterSet=ANSI'"
Dim csvSQL = "SELECT * FROM Capitals.csv"     ' use YOUR file name
Using csvCn = New OleDbConnection(csvStr),
      cmd As New OleDbCommand(csvSQL, csvCn)
    Using da As New OleDbDataAdapter(cmd)
        myDT = New DataTable
        da.Fill(myDT)
    End Using
End Using
For Each r As DataRow In myDT.Rows
    'ToDo: INSERT INTO Access
Next
The Connection, Command and DataAdapter are all resources, so they are in USING blocks to dispose of them when we are done with them. myDT will have a collection of Rows, each with a collection of Items representing the fields from the CSV. Just loop thru the rows adding the desired items to the Access DB.
You will very likely have to do some data type conversion from String to Integer or DateTime etc.
As for the question about UTF8 - you can use the Codepage identifier. If you leave it off the connection string it will use whatever is in the Registry which may also work. For UTF8 use CharacterSet=65001.
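The "INSERT INTO Access" step left as a to-do above might look roughly like this. It is only a sketch: it assumes the Access table MyNewTable already exists with an integer id column and a text firstname column (names borrowed from the question), that the first two CSV columns hold those values, and that accessConnString points at your database. ParseId is a hypothetical helper showing one way to do the String to Integer conversion.

```vbnet
Imports System
Imports System.Data
Imports System.Data.OleDb

Module CsvToAccess
    ' Converts a CSV field to an Integer, or to DBNull when it is blank or not a number.
    Function ParseId(value As Object) As Object
        Dim text As String = Convert.ToString(value).Trim()
        Dim n As Integer
        If Integer.TryParse(text, n) Then Return n
        Return DBNull.Value
    End Function

    ' Inserts every row of the DataTable filled from the CSV into the Access table.
    Sub InsertRows(csvTable As DataTable, accessConnString As String)
        Const sql As String = "INSERT INTO MyNewTable (id, firstname) VALUES (?, ?)"
        Using conn As New OleDbConnection(accessConnString),
              cmd As New OleDbCommand(sql, conn)
            ' OleDb parameters are positional; the names are for readability only.
            cmd.Parameters.Add("@id", OleDbType.Integer)
            cmd.Parameters.Add("@firstname", OleDbType.VarWChar, 255)
            conn.Open()
            For Each r As DataRow In csvTable.Rows
                cmd.Parameters(0).Value = ParseId(r(0))
                cmd.Parameters(1).Value = Convert.ToString(r(1))
                cmd.ExecuteNonQuery()
            Next
        End Using
    End Sub
End Module
```

Reusing one command and only changing the parameter values keeps the conversion logic in your code rather than leaving it to Access.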

Using Dates from Cell or named Range in Sql Query

I have created a sheet to extract data from a Microsoft SQL database to produce a customer report between two dates, StartDate and EndDate.
I have been playing with a few things but have not been successful in any way. I have searched but have not been able to find anything that matched what I was after or that I was able to understand.
I believe the problem is the data type of the date I am using in Excel and trying to pass to the SQL query. I understand I need to convert this in some way to make it work.
If I manually enter dates into the query it works fine, but that is not practical for customer use.
I am not experienced with this and am just stumbling my way through it. If someone would be so kind as to help me with this, it would be much appreciated.
Below is the code I am trying to use
Sub DataExtract()
'
' DataExtract Macro
'
' Create a connection object.
Dim cni96X As ADODB.Connection
Set cni96X = New ADODB.Connection
' Set Database Range
' Provide the connection string.
Dim strConn As String
Dim Lan As Integer
Dim OS As Integer
Dim PointID As String
' Set Variables
Lan = Range("Lan").Value
OS = Range("OS").Value
PointID = Range("PointID").Value
StartDate = Range("StartDate").Value
EndDate = Range("EndDate").Value
'Use the SQL Server OLE DB Provider.
strConn = "PROVIDER=SQLOLEDB;"
'Connect to 963 database on the local server.
strConn = strConn & "DATA SOURCE=(local);INITIAL CATALOG=i96X;"
'Use an integrated login.
strConn = strConn & " INTEGRATED SECURITY=sspi;"
'Now open the connection.
cni96X.Open strConn
' Create a recordset object.
Dim rsi96X As ADODB.Recordset
Dim rsi96X1 As ADODB.Recordset
Set rsi96X = New ADODB.Recordset
Set rsi96X1 = New ADODB.Recordset
With rsi96X
' Assign the Connection object.
.ActiveConnection = cni96X
' Extract the required records1.
.Open "SELECT ModuleLabel, originalAlarmTime FROM LastAlarmDetailsByTime WHERE (os = " & OS & " And theModule = N'" & PointID & "'AND AlarmCode = N'DI=1' And lan = " & Lan & " And originalAlarmTime BETWEEN N'" & StartDate & "' AND N'" & EndDate & "') ORDER BY originalAlarmTime DESC"
' Copy the records into sheet.
Range("PointLabel, TimeCallInitiated").CopyFromRecordset rsi96X
With rsi96X1
.ActiveConnection = cni96X
' Assign the Connection object.
.Open "SELECT originalAlarmTime FROM LastAlarmDetailsByTime WHERE (os = " & OS & " And theModule = N'" & PointID & "'AND AlarmCode = N'CDI1' And lan = " & Lan & " And originalAlarmTime BETWEEN N'" & StartDate & "' AND N'" & EndDate & "')ORDER BY originalAlarmTime DESC"
' Copy the records into sheet.
Sheet1.Range("TimeCallEnded").CopyFromRecordset rsi96X1
' Tidy up
.Close
I hope this makes sense.
You cannot specify the data types; the Access database engine (formerly Jet) must guess. You can influence its guesswork by changing certain registry settings (e.g. TypeGuessRows) and including IMEX=1 in the connection string. For more details, see this knowledge base article.
Here's something I wrote on the subject many years ago (if you google for "ONEDAYWHEN=0" you can see it has been widely read though perhaps not carefully enough!):
The relevant registry keys (for Jet 4.0) are in:
Hkey_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/
The ImportMixedTypes registry key is always read (whether it is
honored is discussed later). You can test this by changing the key to
ImportMixedTypes=OneDayWhen and trying to use the ISAM: you get the
error, "Invalid setting in Excel key of the Engines section of the
Windows Registry." The only valid values are:
ImportMixedTypes=Text
ImportMixedTypes=Majority Type
Data type is determined column by column. 'Majority Type' means a
certain number of rows (more on this later) in each column are scanned
and the data types are counted. Both a cell's value and format are
used to determine data type. The majority data type (i.e. the one with
the most rows) decides the overall data type for the entire column.
There's a bias in favor of numeric in the event of a tie. Rows from
any minority data types found that can't be cast as the majority data
type will be returned with a null value.
For ImportMixedTypes=Text, the data type for the whole column will be:
Jet (MS Access UI): 'Text' data type
DDL: VARCHAR(255)
ADO: adWChar ('a null-terminated Unicode character string')
Note that this is distinct from:
Jet (MS Access UI): 'Memo' data type
DDL: MEMO
ADO: adLongVarWChar ('a long null-terminated Unicode string value')
ImportMixedTypes=Text will curtail text at 255 characters as Memo is
cast as Text. For a column to be recognized as Memo, majority type
must be detected, meaning the majority of rows detected must contain
256 or more characters.
But how many rows are scanned for each column before it is decided that
types are mixed and/or what the majority type is? There is a second registry
key, TypeGuessRows. This can be a value from 0-16 (decimal). A value
from 1 to 16 inclusive is the number of rows to scan. A value of zero
means all rows will be scanned.
There is one final twist. A setting of IMEX=1 in the connection
string's extended property determines whether the ImportMixedTypes
value is honored. IMEX refers to 'IMport EXport mode'. There are three
possible values. IMEX=0 and IMEX=2 result in ImportMixedTypes being
ignored and the default value of 'Majority Types' is used. IMEX=1 is
the only way to ensure ImportMixedTypes=Text is honored. The resulting
connection string might look like this:
Provider=Microsoft.Jet.OLEDB.4.0;
Data Source=C:\ db.xls;
Extended Properties='Excel 8.0;HDR=Yes;IMEX=1'
Finally, although it is mentioned in MSDN articles that MAXSCANROWS
can be used in the extended properties of the connection string to
override the TypeGuessRows registry keys, this seems to be a fallacy.
Using MAXSCANROWS=0 in this way never does anything under any
circumstances. Put another way, it has just the same effect as putting
ONEDAYWHEN=0 in the extended properties, being none (not even an
error!). The same applies to ImportMixedTypes, i.e. it can't be used in
the connection string to override the registry setting.
In summary, use TypeGuessRows to get Jet to detect whether a 'mixed
types' situation exists or use it to 'trick' Jet into detecting a
certain data type as being the majority type. In the event of a
'mixed types' situation being detected, use ImportMixedTypes to tell
Jet to either use the majority type or coerce all values as Text
(max 255 characters).
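To make the rules quoted above concrete, here is a toy model of them in VB.NET. It is emphatically not Jet's real algorithm: it only distinguishes "numeric" from "text", it ignores cell formats, dates and Memo detection, and every name in it is made up for illustration. It does show why a text header above a column of numeric (or date) values comes back as null under majority-type guessing and survives when mixed types are imported as text.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

Module MajorityTypeDemo
    ' Crude stand-in for "this cell holds a number".
    Function LooksNumeric(cell As String) As Boolean
        Dim parsed As Double
        Return Double.TryParse(cell, NumberStyles.Float, CultureInfo.InvariantCulture, parsed)
    End Function

    ' Simulates reading one column.
    ' typeGuessRows: rows to scan (0 means scan all rows, as described above).
    ' importMixedAsText: True models IMEX=1 with ImportMixedTypes=Text.
    Function ReadColumn(cells As IList(Of String),
                        typeGuessRows As Integer,
                        importMixedAsText As Boolean) As List(Of String)
        Dim limit As Integer = If(typeGuessRows = 0, cells.Count, Math.Min(typeGuessRows, cells.Count))
        Dim numericCount As Integer = 0
        For i As Integer = 0 To limit - 1
            If LooksNumeric(cells(i)) Then numericCount += 1
        Next
        Dim textCount As Integer = limit - numericCount
        Dim mixed As Boolean = numericCount > 0 AndAlso textCount > 0

        ' Mixed types detected and text import honored: everything comes back as text.
        If mixed AndAlso importMixedAsText Then Return New List(Of String)(cells)

        ' Otherwise the majority type wins, with ties going to numeric.
        Dim majorityNumeric As Boolean = numericCount >= textCount
        Dim result As New List(Of String)
        For Each cell As String In cells
            If majorityNumeric AndAlso Not LooksNumeric(cell) Then
                result.Add(Nothing)   ' minority value that cannot be cast: returned as null
            Else
                result.Add(cell)
            End If
        Next
        Return result
    End Function
End Module
```

For example, calling ReadColumn on a column holding "Order date", "40594", "40595" with typeGuessRows set to 8 yields a null first entry when importMixedAsText is False and the original header text when it is True.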
Try changing the date part of your SQL statement to:
"[...] originalAlarmTime BETWEEN '" & Format$(StartDate, "yyyy-mm-dd") & "' AND '" & Format$(EndDate, "yyyy-mm-dd") & "' [...]"
You might also try using a parameterized query.
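The question uses VBA with ADO, so the following is an adaptation rather than a drop-in fix: a hedged VB.NET sketch of both suggestions, using System.Data.OleDb with the same SQLOLEDB provider. The table and column names come from the question; the module and function names, and the choice of OleDbType values, are assumptions. Note that .NET format strings use "MM" for months ("mm" means minutes), unlike VBA's Format$.

```vbnet
Imports System
Imports System.Data.OleDb
Imports System.Globalization

Module AlarmQuery
    ' Formats a date as an unambiguous ISO string for use inside a SQL literal.
    Function ToIsoDate(value As Date) As String
        Return value.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
    End Function

    ' Builds a parameterized command so no date (or other value) is pasted into the SQL text.
    Function BuildAlarmCommand(conn As OleDbConnection,
                               os As Integer, lan As Integer,
                               pointId As String, alarmCode As String,
                               startDate As Date, endDate As Date) As OleDbCommand
        Dim sql As String =
            "SELECT ModuleLabel, originalAlarmTime FROM LastAlarmDetailsByTime " &
            "WHERE os = ? AND theModule = ? AND AlarmCode = ? AND lan = ? " &
            "AND originalAlarmTime BETWEEN ? AND ? ORDER BY originalAlarmTime DESC"
        Dim cmd As New OleDbCommand(sql, conn)
        ' OleDb binds parameters by position, so the order must match the ? markers.
        cmd.Parameters.Add("@os", OleDbType.Integer).Value = os
        cmd.Parameters.Add("@module", OleDbType.VarWChar, 255).Value = pointId
        cmd.Parameters.Add("@code", OleDbType.VarWChar, 255).Value = alarmCode
        cmd.Parameters.Add("@lan", OleDbType.Integer).Value = lan
        cmd.Parameters.Add("@start", OleDbType.DBTimeStamp).Value = startDate
        cmd.Parameters.Add("@end", OleDbType.DBTimeStamp).Value = endDate
        Return cmd
    End Function
End Module
```

With parameters the provider passes the dates as typed values, so the regional date format of the Excel cell no longer matters.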

csv file reading problem

So I have a csv file:
"453FDG","656HGH","54645","MARIA","V543","534","TRETCITY","R34",09094553,09094553,09094553,"21/01/10","RE"
"45er3FDG","656HGH","54645","M343ARIA","V543","534","TRETCITY","R34",090-94553,0909-4553,090-94553,"21/01/10","RE"
problem 1:
Connection string is this:
Dim strConnString As String = "Driver={Microsoft Text Driver (*.txt; *.csv)};Dbq=" & System.IO.Path.GetDirectoryName(filediag.PostedFile.FileName).ToString & ";Extensions=asc,csv,tab,txt;Persist Security Info=False;HDR=NO;IMEX=1"
My problem is that when I use this schema.ini, the 9th, 10th and 11th columns of the second row of the CSV file aren't read properly if they contain a special character (they are supposed to be telephone numbers). I think this is because the row above is returned as a number (integer), since it is purely numeric:
[#42r.csv]:
ColNameHeader= false
Format=CSVDelimited
MaxScanRows=0
CharacterSet=ANSI
So what should I do about this?
problem 2:
Since I can't solve problem 1, I tried a second connection string:
Dim sConnectionString As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & System.IO.Path.GetDirectoryName(filediag.PostedFile.FileName).ToString & ";Extended Properties='text;HDR=No;FMT=Delimited;IMEX=1"
The problem with this is that it treats the first row of the CSV file as a column header. Please help. Thanks.
The issue is that you're using Select * FROM CSVFILE.CSV. This is forcing ADO to infer the datatypes, for which it will probably just use the first row.
The best thing to do is probably to follow the schema suggested in this question:
When reading a CSV file using a DataReader and the OLEDB Jet data provider, how can I control column data types?
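One way to apply that advice is to declare every column explicitly in schema.ini so nothing is left to type guessing. The sketch below generates such a file from VB.NET; it assumes the CSV has no header row, that treating every column as Text is acceptable (you can convert the few numeric or date columns yourself afterwards), and that schema.ini is written to the same folder as the CSV. The generated column names F1, F2, ... and the module and function names are placeholders.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SchemaIniWriter
    ' Builds the schema.ini lines for one CSV file, declaring every column as Text.
    Function BuildSchemaIni(csvFileName As String, columnCount As Integer) As List(Of String)
        Dim lines As New List(Of String) From {
            "[" & csvFileName & "]",
            "ColNameHeader=False",
            "Format=CSVDelimited",
            "CharacterSet=ANSI"
        }
        For i As Integer = 1 To columnCount
            ' ColN=<name> <type>: an explicit Text type stops the driver guessing Integer.
            lines.Add("Col" & i.ToString() & "=F" & i.ToString() & " Text")
        Next
        Return lines
    End Function

    ' Writes schema.ini next to the CSV file, which is where the text driver looks for it.
    Sub WriteSchemaIni(folder As String, csvFileName As String, columnCount As Integer)
        File.WriteAllLines(Path.Combine(folder, "schema.ini"), BuildSchemaIni(csvFileName, columnCount))
    End Sub
End Module
```

For the 13-column file in the question, WriteSchemaIni(folder, fileName, 13) would declare the telephone columns as Text, so values such as 090-94553 and 09094553 (including the leading zero) should both come through as strings.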