I would like to query an UTF-8 encoded CSV file using VBA in Excel 2010 with the following database connection:
provider=Microsoft.Jet.OLEDB.4.0;;data source='xyz';Extended Properties="text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001"
All CSV files start with the BOM \xEF\xBB\xBF and the header line. Somehow the BOM isn't recognized correctly and the first column header gets read as "?header_name", i.e. a question mark gets prepended. I have tried different CharacterSets and I have also tried to use Microsoft.ACE.OLEDB.12.0, but everything was without success so far.
Is this a known bug or is there any way to get the right first column header name without changing the encoding of the source files?
The following procedure extracts the entire CSVfile into a new Sheet, clearing the BOM from the Header. It has the Path, Filename and BOM string as variables to provide flexibility.
Use this procedure to call the Query procedure
Sub Qry_Csv_Utf8()
Const kFile As String = "UTF8 .csv"
Const kPath As String = "D:\StackOverFlow\Temp\"
Const kBOM As String = "\xEF\xBB\xBF"
Call Ado_Qry_Csv(kPath, kFile, kBOM)
End Sub
This is the Query procedure
Sub Ado_Qry_Csv(sPath As String, sFile As String, sBOM As String)
Dim Wsh As Worksheet
Dim AdoConnect As ADODB.Connection
Dim AdoRcrdSet As ADODB.Recordset
Dim i As Integer
Rem Add New Sheet - Select option required
'With ThisWorkbook 'Use this if procedure is resident in workbook receiving csv data
'With Workbooks(WbkName) 'Use this if procedure is not in workbook receiving csv data
With ActiveWorkbook 'I used this for testing purposes
Set Wsh = .Sheets.Add(After:=.Sheets(.Sheets.Count))
'Wsh.Name = NewSheetName 'rename new Sheet
End With
Set AdoConnect = New ADODB.Connection
AdoConnect.Open "Provider=Microsoft.Jet.OLEDB.4.0;" & _
"Data Source=" & sPath & ";" & _
"Extended Properties='text;HDR=Yes;FMT=Delimited(,);CharacterSet=65001'"
Set AdoRcrdSet = New ADODB.Recordset
AdoRcrdSet.Open Source:="SELECT * FROM [" & sFile & "]", _
ActiveConnection:=AdoConnect, _
CursorType:=adOpenDynamic, _
LockType:=adLockReadOnly, _
Options:=adCmdText
Rem Enter Csv Records in Worksheet
For i = 0 To -1 + AdoRcrdSet.Fields.Count
Wsh.Cells(1, 1 + i).Value = _
WorksheetFunction.Substitute(AdoRcrdSet.Fields(i).Name, sBOM, "")
Next
Wsh.Cells(2, 1).CopyFromRecordset AdoRcrdSet
End Sub
The only solution for this problem I found is to use Schema.ini file.
my test csv file
Col_A;Col_B;Col_C
Some text example;123456789;3,14
Schema.ini for my test csv file
[UTF-8_Csv_With_BOM.csv]
Format=Delimited(;)
Col1=Col_A Text
Col2=Col_B Long
Col3=Col_C Double
This Schema.ini file contains the name of the source csv file and describes my columns. Each column is specified by its name and type but you can specify more informations. This file must be located in the same folder as your csv file. More info here.
Finally the VBA code which reads the csv file. Note that HDR=No. This is because the columns headers are defined in the Schema.ini.
' Add reference to Microsoft ActiveX Data Objects 6.1 Library
Sub ReadCsv()
Const filePath As String = "c:\Temp\StackOverflow\"
Const fileName As String = "UTF-8_Csv_With_BOM.csv"
Dim conn As ADODB.Connection
Dim rs As New ADODB.Recordset
Set conn = New ADODB.Connection
conn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source='" & filePath & _
"';Extended Properties='text;HDR=No;FMT=Delimited()';"
With rs
.ActiveConnection = conn
.Open "SELECT * FROM [" & fileName & "]"
If Not .BOF And Not .EOF Then
While (Not .EOF)
Debug.Print rs.Fields("Col_A") & " " & _
rs.Fields("Col_B") & " " & _
rs.Fields("Col_C")
.MoveNext
Wend
End If
.Close
End With
conn.Close
Set conn = Nothing
End Sub
Output
Some text example 123456789 3,14
Related
I have a folder with lots (hundreds) of locked .xls files.
I need to copy a specific range from one of the worksheets in each file into one big worksheet, which would be my data file for future analysis.
I tried to write a macro for this, but keep getting errors.
Please help me debug what I wrote:
Sub ProcessFiles()
' declarations & definitions
Dim Pathname As String
Dim Filename As String
Dim sourceWB As Workbook
Dim targetWB As Workbook
targetWB = ActiveWorkbook
Pathname = ActiveWorkbook.Path & "\Files\"
Filename = Dir(Pathname & "*.xls")
' loop through all files in folder
Do While Filename <> ""
Set sourceWB = Workbooks.Open(Pathname & Filename)
' unlock worksheets
sourceWB.Sheets(4).Visible = True
sourceWB.Sheets(4).Unprotect Password:="Password"
sourceWB.Sheets(2).Unprotect Password:="Password"
' create new worksheet
sourceWB.Sheets.Add After:=8
' copy required cells to new sheets
sourceWB.Sheets(2).Range("A14:FM663").Copy Destination:=sourceWB.Sheets(9).Range("C2")
' fill columns for all rows
sourceWB.Sheets(9).Range("A2:A663").Value = sourceWB.Name
sourceWB.Sheets(9).Range("B2:B663").Value = Worksheets(4).Range("C13").Value
'move AuxSheet to taget workbook
sourceWB.Sheets(9).Move Before:=Workbooks(targetWB).Sheets(1)
'add to full data worksheet
targetWB.Sheets(1).Range("A2:FO651").Copy Destination:=sourceWB.Sheets(2).Rows("3:" & Worksheets("Sheet2").UsedRange.Rows.Count)
'close file and repeat
sourceWB.Close SaveChanges:=False
Filename = Dir()
Loop
' save result
targetWB.Save
End Sub
Just to give you an idea of how tasks like this can be handled way more efficient... consider the following that I always use for tasks like this:
Option Explicit
' 1. Add reference to Microsoft Scripting Runtime and Access Data Objects Library via Extras>References
Sub ProcessFiles()
Dim strCon As String
Dim strSQL As String
Dim fso As New Scripting.FileSystemObject
Dim myfile As file
With ThisWorkbook
' 2. empty your outputsheet
.Sheets("out").Cells.Clear
' 3. loop the files in your folder
For Each myfile In fso.GetFolder(.Path & Application.PathSeparator & "Files").Files
' 3.1. no proper way to filter files like in Dir(), but we want to use the file objects
If myfile.Name Like "*.xls" Then
' 3.1.1. Construct the connection string, the only variable part is myfile.Path
strCon = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & myfile.Path & ";Extended Properties='Excel 8.0;HDR=YES';"
' 3.1.2. Construct the SQL String. Luckily, you already know where your data is
strSQL = "SELECT '" & myfile.Name & "' AS WorkbookName, * FROM [sheetData$A1:C5], (SELECT TOP 1 * FROM [sheetSchool$C12:C13])"
' 3.1.3. Call the get-data sub from below
GetData .Sheets("out"), strCon, strSQL
End If
Next myfile
End With
End Sub
Sub GetData(ByRef wsOut As Variant, strCon As String, strSQL As String)
Dim i As Integer
On Error GoTo skpError
Application.ScreenUpdating = False
' Create a new database connection
Dim objCon As New ADODB.Connection
With objCon
.ConnectionString = strCon
.Open
End With
' Create a new database command
Dim objCmd As New ADODB.Command
With objCmd
.ActiveConnection = objCon
.CommandType = adCmdText
.CommandText = strSQL
Debug.Print .CommandText
End With
' Create a new recordset
Dim objRS
Set objRS = New ADODB.Recordset
With objRS
.ActiveConnection = objCon
.Open objCmd
End With
' Print your FieldNames, in case they're not already there
With wsOut
If wsOut.Cells(1, 1).Value = vbNullString Then
For i = 1 To objRS.Fields.Count
.Cells(1, i).Value = _
objRS.Fields(i - 1).Name
Next i
End If
' Output your data - pretty ugly, but reliable
.Range("A1048576").End(xlUp).Offset(1, 0).CopyFromRecordset (objRS)
End With
skpNoError:
Application.ScreenUpdating = True
Exit Sub
skpError:
MsgBox "Error #" & Err & vbNewLine & Error, vbCritical
GoTo skpNoError
End Sub
Notes: (why use something like this?)
protected and hidden worksheets shouldn't be a problem with this. For protected workbooks, a password parameter can be added to the connection string
this will be considerably faster for a large number of files than opening, editing, copying would be. If you feel fancy, you can further speed things up by moving stuff from the GetData-Sub to ProcessFiles, so they won't get called repeatedly.
you use a database language for querying data instead of some clumsy copy/paste mechanism.
Edit:
Edited my code, for me this works with the example you gave.
From what i gather, you only got protected Worksheets, not a password-protected Workbook - therefore there's no need to unhide or unprotect your worksheets
adjust the line strSQL = "SELECT '" & myfile.Name & "' AS WorkbookName, * FROM [sheetData$A1:C5], (SELECT TOP 1 * FROM [sheetSchool$C12:C13])" to contain your actual Sheets(2) and Sheets(4) names
I am using the following code to read data from Sheet1 of SAME Excel sheet. I load the data into the return array. The Excel sheet file has "read only" checked and is always opened in "READ ONLY" mode.
The issue is that if I change any of the data on Sheet1, because the file is opened as "read only", it won't be reflected in the ADO query. ADO Continues to output what is in the "saved" file and ignores what has been updated in the temp read only version.
For example the below pulls value "Col5:6" from cell "E6". If I replace the value to be "test", ADO still outputs "Col5:6"
How can I make ADO read the current data on Sheet1 without having to "save as"?
Sub sbADO()
Dim sSQLSting As String
Dim Conn As New ADODB.Connection
Dim mrs As New ADODB.Recordset
Dim DBPath As String, sconnect As String
Dim returnArray
DBPath = ThisWorkbook.FullName
sconnect = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" & DBPath _
& ";Extended Properties=""Excel 12.0;HDR=Yes;IMEX=1"";"
Conn.Open sconnect
sSQLSting = "SELECT * From [Sheet1$] "
mrs.Open sSQLSting, Conn
returnArray = mrs.GetRows
mrs.Close
Conn.Close
Debug.Print returnArray(4, 4) '>> "Col5:6"
End Sub
You can't read unsaved changes from Excel worksheet with ADO since the unsaved data is located in the memory (RAM, and probably swap file), and ADO designed to connect to DB files or server-based DBs.
If you believe that SQL is the only way, and your WHERE clause is quite simple then you can use an ADO Recordset built in functionality for filtering and sorting, without making connection. Do the following:
Get the value of the source range in XML format, fix field names.
Create XML DOM Document and load the XML string.
Create ADO Recordset and convert the document.
Make necessary filtering and sorting. Note, there is some limitations on filter criteria syntax.
Process the resulting recordset further, e. g. output to another worksheet.
There is an example of the code:
Option Explicit
Sub FilterSortRecordset()
Dim arrHead
Dim strXML As String
Dim i As Long
Dim objXMLDoc As Object
Dim objRecordSet As Object
Dim arrRows
' get source in XML format
With Sheets("Sheet1")
arrHead = Application.Index(.Range("A1:G1").Value, 1, 0)
strXML = .Range("A2:G92").Value(xlRangeValueMSPersistXML)
End With
' fix field names
For i = 1 To UBound(arrHead)
strXML = Replace(strXML, "rs:name=""Field" & i & """", "rs:name=""" & arrHead(i) & """", 1)
Next
' load source XML into XML DOM Document
Set objXMLDoc = CreateObject("MSXML2.DOMDocument")
objXMLDoc.LoadXML strXML
' convert the document to recordset
Set objRecordSet = CreateObject("ADODB.Recordset")
objRecordSet.Open objXMLDoc
' filtering and sorting
objRecordSet.Filter = "City='London' OR City='Paris'"
objRecordSet.Sort = "ContactName ASC"
' populate another sheet with resulting recordset
arrRows = Application.Transpose(objRecordSet.GetRows)
With Sheets("Sheet2")
.Cells.Delete
.Cells.NumberFormat = "#"
For i = 1 To objRecordSet.Fields.Count
.Cells(1, i).Value = objRecordSet.Fields(i - 1).Name
Next
.Cells(2, 1).Resize(UBound(arrRows, 1), UBound(arrRows, 2)).Value = arrRows
.Columns.AutoFit
End With
End Sub
The sourse data on Sheet1 is as follows:
Then I got the result on Sheet2:
I have two Excel files, a master file and a regular file. Both files have a sheet with the same data structure (same fields, but not formatted as tables).
What I'm trying to do is use VBA to create data connections to both files, and then use SQL to update the master file with any changes in the regular file. The reason for using SQL and a data connection is to avoid opening the regular file and hopefully faster performance overall.
I'm having difficulty with the UPDATE statement, and I'm not sure I'm even going about this in the best manner. My code thus far:
Sub Main()
Dim cnn1 As New ADODB.Connection
Dim cnn2 As New ADODB.Connection
Dim rst1 As New ADODB.Recordset
Dim rst2 As New ADODB.Recordset
Dim arrData() As Variant
Dim fPath As String
With Application.FileDialog(msoFileDialogFilePicker)
.AllowMultiSelect = False
.Filters.Add "Files", "*.xls*"
.InitialFileName = ThisWorkbook.Path & "/Meeting Cuts"
.Title = "Please select the file that contains the values you'd like to import."
.Show
fPath = .SelectedItems(1)
End With
DoEvents
cnn1.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & "Data Source=" & CStr(fPath) & ";" & "Extended Properties=""Excel 12.0;HDR=Yes;"";"
cnn2.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & "Data Source=" & CStr(ThisWorkbook.FullName) & ";" & "Extended Properties=""Excel 12.0;HDR=Yes;"";"
rst1.Open "SELECT * FROM [Sheet1$];", cnn1, adOpenStatic, adLockReadOnly
rst2.Open "UPDATE [Account Data$]" & _
"SET Account_ID = (SELECT Account_ID FROM " & rst1.GetRows & " WHERE Empl_Name = 'Smith,John')" & _
"WHERE Empl_Name = 'Smith,John'", cnn2, adOpenStatic, adLockReadOnly
Set rst1 = Nothing
Set rst2 = Nothing
Set cnn1 = Nothing
Set cnn2 = Nothing
End Sub
When I execute the code, I get a Run-time error '13': Type Mismatch on the rst2.Open line. When I go into debug mode and try to execute that line again, I get a different error: Run-time error '3021': Either BOF or EOF is True, or the current record has been deleted. Requested operation requires a current record.
I know that I'm using GetRows improperly. Is there a way to reference the sheet (from the regular file) somehow in the UPDATE statement? If so, how would I do it?
After this question: get value/charts in another workbooks without opening it
I have coded this:
Sub test()
Dim oConn As New ADODB.Connection
Dim rst As New ADODB.Recordset
oConn.Provider = "Microsoft.Jet.OLEDB.4.0"
oConn.Properties("Extended Properties").Value = "Excel 8.0"
oConn.Open "C:\Workbook1.xlsm"
rst.Open "SELECT * FROM [A1:A2];", oConn, adOpenStatic
rst.MoveFirst
MsgBox rst.Fields(0)
rst.Close
oConn.Close
End Sub
For the moment my goal is to get the value in the cell A1 of the sheet 1 of workbook1.xlsm.
I've encountered two problems.
When the workbook1 is not opened I got a
Run time error '-214767259 (80004005)': Automation error Unspecified Error on the line oConn.Open "C:\Workbook1.xlsm`
This is annoying because I want to work without opening the workbooks. It works well when the workbook is open.
Second problem: I can't manage to only get a single cell value. I've tried to input only [A1] in rst.open but it doesn't work. How can I get a unique cell value with its address ? with its name ?
If you don't mind I'll provide you a bit different attempt to get your data. The difference is the way you connect with you database (excel sheet). However, you could possibly incorporate some important elements into your code. So, check comments inside the code below.
Sub Closed_excel_workbook()
Dim myConnection As String
Dim myRecordset As ADODB.Recordset
Dim mySQL As String
'connection string parameters
'CHANGE PATH TO YOUR CLOSED WORKBOOK
myConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=" & ThisWorkbook.Path & "\Dane\BazaDanych.xlsx;" & _
"Extended Properties=Excel 12.0"
'here is important, YOU CAN'T MISS SHEET NAME
mySQL = "SELECT * FROM [ARKUSZ1$a1:a2]"
'different way of getting data from excel sheet
Set myRecordset = New ADODB.Recordset
myRecordset.Open mySQL, myConnection, adOpenUnspecified, adLockUnspecified
'let's clear sheet before pasting data
'REMOVE IF NOT NEEDED
ActiveSheet.Cells.Clear
'HERE WE PASTING DATA WE HAVE RETRIEVED
ActiveSheet.Range("A2").CopyFromRecordset myRecordset
'OPTIONAL, IF REQUIRED YOU CAN ADD COLUMNS NAMES
Dim cell As Range, i!
With ActiveSheet.Range("A1").CurrentRegion
For i = 0 To myRecordset.Fields.Count - 1
.Cells(1, i + 1).Value = myRecordset.Fields(i).Name
Next i
.EntireColumn.AutoFit
End With
End Sub
My solution:
Function GetValue()
Path = "C:\Path\"
File = "Doc.xlsm"
Sheet = "Sheet_name"
Ref = "D4"
'Retrieves a value from a closed workbook
Dim Arg As String
'Make sure the file exists
If Right(Path, 1) <> "\" Then Path = Path & "\"
If Dir(Path & File) = "" Then
GetValue = "File not Found"
Exit Function
End If
'Create the argument
Arg = "'" & Path & "[" & File & "]" & CStr(Sheet) & "'!" & Range(Ref).Range("A1").Address(, , xlR1C1)
'Check the value
MsgBox Arg
'Execute XML
GetValue = ExecuteExcel4Macro(Arg)
End Function
It has the advantage of not using complex adodb connection, but may be less powerfull.
I need to create an access (2007) table from a text file. I know ahead of time what columns should exist, but from time to time the vendors slip up and submit a text file that contains an incorrect number of columns. So I don't want to specify the columns in advance. I want to load all data as text into whatever columns exist. Then I will do QC.
The columns are pipe delimited and there are over 200 columns per record. There are no column headers, but there is one line of header text for the file, and one line at the end that states how many records there are. There may be anywhere from 1 to over 5,000 records in a text file. Records are identified with CRLF (windows).
Here is what I have so far, and it works (in that it reads the file and places the correct information in the recordset (columns and records), and I can count the number of records), except that the SELECT INTO gives me an error:
Sub OpenTextADO(strFileName As String, strPath As String)
Dim cn As ADODB.Connection
Dim rs As ADODB.Recordset
Dim fld As ADODB.Field
Dim recs As Integer
Dim strRecord As String
Dim strSQL As String
recs = 0
Set cn = New ADODB.Connection
If Right(strFileName, 3) = "txt" Then
'cn.Open "DRIVER={Microsoft Text Driver (*.txt; *.csv)};" & "DBQ=" & strPath & "\" 'need schema.ini file
cn.Open "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & "\;Extended Properties='text;HDR=No;FMT=Delimited(|)'" 'need schema.ini file
End If
Set rs = New ADODB.Recordset
rs.Open "SELECT * INTO TESTTEXT FROM [" & strFileName & "]", cn, adOpenStatic, adLockOptimistic, adCmdText
'Do Until rs.EOF
' For Each fld In rs.Fields
' strRecord = strRecord & "|" & fld.Value
' Next fld
' strRecord = strRecord & vbCr
' recs = recs + 1
' rs.MoveNext
'Loop
'Debug.Print strRecord
'recs = rs.RecordCount
rs.Close
Set rs = Nothing
MsgBox "Text was opened and there are " & recs & " records in the table."
cn.Close
Set cn = Nothing
End Sub
NOTE: I included both the OLEDB version and the text driver version - they both seem to operate identically. I also created a schema.ini file that looks like this:
[test.txt]
Format=Delimited(|)
ColNameHeader=False
Both drivers seem to need this to desregard column headers, despite the "HDR=No" in the OLEDB version.
The error I get is: "Cannot update. Database or object is read-only".
I appreciate any help.
Could you do a sequential read of the text file, using the count of pipe-delimited fields in the first data line of the file to create a table with the proper number of columns, then just write subsequent lines into that table? I just threw the following together, but it seems to work.
Public Function import_txt_to_db(strFile As String) As Boolean
On Error GoTo ErrHandle
Dim strLine As String
Dim intFileNum As Integer
Dim blnFirstLine As Boolean
blnFirstLine = True
Dim varArray As Variant
intFileNum = FreeFile
Open strFile For Input Access Read As intFileNum
Do While Not EOF(intFileNum)
Line Input #intFileNum, strLine
varArray = Split(strLine, "|")
If blnFirstLine = True Then
'Use count of fields in first line to determine # of columns to create
Dim intColCount As Integer
intColCount = UBound(varArray)
Dim strQry As String
strQry = "CREATE TABLE tblImport ("
Dim intCtr As Integer
For intCtr = 1 To intColCount + 1
strQry = strQry & "[COLUMN_" & intCtr & "] TEXT(255),"
Next intCtr
strQry = Left(strQry, Len(strQry) - 1) & ")" 'get rid of terminal comma
CurrentDb.Execute strQry
blnFirstLine = False
End If
Dim strQry2 As String
strQry2 = "INSERT INTO tblImport VALUES('" & Replace(strLine, "|", "','") & "')"
CurrentDb.Execute strQry2
Loop
Close #intFileNum
import_txt_to_db = True
Exit Function
ErrHandle:
import_txt_to_db = False
End Function
I did a simple test with the folowing five-line text file
Thomas|Jefferson|Virginia
Bill|Clinton|Arkansas
Jimmy|Carter|Georgia
Lyndon|Johnson|Texas
George|Washington|Virginia
After running the code, here's my (simple) table: