Reading Excel file with SQL returns null due to empty column

I am attempting to read in an Excel file from a third party, but it falls over and returns null because of an empty column.
The file has 50 columns, then a blank column, then a further 48 columns. I only want the first 50 columns. If I delete or populate the empty column in Excel it works, but I need to read 50+ files one after the other on a regular basis, so opening/deleting/saving each one isn't an option.
Function read_excel_file(StrPath, StrFile, ByRef TotalFileArray() As Variant)
Dim ReadFileArray() As Variant
Dim connectionString As String
connectionString = _
"Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=""" & StrPath & StrFile & """;" & _
"Extended Properties=""Excel 12.0;HDR=Yes;"""
'This assumes the Excel file contains column headers -- HDR=Yes
'Routine to get unknown sheet name
Set conn = CreateObject("ADODB.Connection")
conn.connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=""" & StrPath & StrFile & """;" & _
"Extended Properties=""Excel 12.0;HDR=No"""
conn.Open
Set bs = conn.OpenSchema(20) ' 20 = adSchemaTables
Do Until bs.EOF = True
'Debug.Print bs.Fields!Table_Name.Value
SheetName = bs.Fields!Table_Name.Value
bs.MoveNext
Loop
bs.Close: conn.Close
Set bs = Nothing
Set conn = Nothing
Dim sql As String
'sql = "SELECT Meter Id, Date, 00:30, 01:00, 01:30, 02:00, 02:30, 03:00, 03:30, 04:00, 04:30, 05:00, 05:30, 06:00, 06:30, 07:00, 07:30, 08:00, 08:30, 09:00, 09:30, 10:00, 10:30, 11:00, 11:30, 12:00, 12:30, 13:00, 13:30, 14:00, 14:30, 15:00, 15:30, 16:00, 16:30, 17:00, 17:30, 18:00, 18:30, 19:00, 19:30, 20:00, 20:30, 21:00, 21:30, 22:00, 22:30, 23:00, 23:30, 24:00:00 FROM [" + SheetName + "]" '
sql = "SELECT * FROM [" + SheetName + "]" '
'Dim rs As Object 'New adodb.Recordset
'Set rs = CreateObject("ADODB.Connection")
Dim rs As New ADODB.Recordset
rs.Open sql, connectionString
ReadFileArray() = rs.GetRows 'Puts the data from the recordset into an array
rs.Close
Set rs = Nothing
I did try selecting only the specific columns I wanted, but this still returned null. I also tried a single column, but again it returned null.
Is there a way around the empty column, even if it means reading the file in a different way? (I nearly always read in CSV files, which are a breeze to get into an array and deal with.)
Picture of source data headers

Given that you have no choice but to work with the files in the format provided, the best workaround (rather than manually opening, deleting, saving and closing each file) appears to be to pre-process them with VBA, deleting the offending column before running your query.
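If the sheet is first read into a plain 2-D array (the CSV-style route mentioned in the question), that pre-processing step is only a few lines of code. The sketch below is in Python rather than VBA, and `drop_blank_columns` is a hypothetical name: it removes every column whose header cell is blank and pads short rows with None.

```python
def drop_blank_columns(rows):
    """Remove every column whose header (the first row) is blank.

    'rows' is a 2-D list such as rows read from a CSV export of the sheet.
    Short rows are padded with None so every output row has the same width.
    """
    if not rows:
        return []
    header = rows[0]
    # Indexes of the columns worth keeping: non-empty, non-whitespace headers.
    keep = [i for i, h in enumerate(header) if h is not None and str(h).strip()]
    return [[row[i] if i < len(row) else None for i in keep] for row in rows]
```

The same idea in VBA would loop over row 1, find the blank header and delete that column before the query runs.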

If the Excel worksheet contains column names in the first row and you specify HDR=Yes, then your first SQL statement would work if you enclosed the column names in brackets, since they contain special characters such as spaces and colons, and some start with digits.
' OPEN CONNECTION WITH HEADERS INDICATED
connectionString = _
"Provider=Microsoft.ACE.OLEDB.12.0;" & _
"Data Source=""" & StrPath & StrFile & """;" & _
"Extended Properties=""Excel 12.0;HDR=Yes;"""
conn.Open connectionString
' RUN SQL WITH COLUMNS SPECIFIED
sql = "SELECT [Meter Id], [Date]" _
& " , [00:30], [01:00], ..., [23:30], [24:00:00]" _
& " FROM [" + SheetName + "]"
rs.Open sql, conn   ' REUSE THE CONNECTION OPENED ABOVE
...
In fact, your original attempt should have raised an error when run. Also, the second SQL statement (SELECT * FROM ...) should raise an error because of the empty column, since one of the column names is zero-length, which is an invalid identifier. Therefore, you need to select the columns explicitly.
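Typing 50 bracketed names by hand is tedious, so the explicit column list can instead be generated from the header row. Below is a minimal sketch in Python (not VBA) that keeps only the headers before the first blank one (the 50 columns wanted here) and wraps each in brackets. `leading_headers` and `build_select` are hypothetical helpers. The header values are assumed to have already been read as text (for example via a HDR=No query on row 1), and names containing brackets are rejected rather than escaped.

```python
def leading_headers(headers):
    """Return the header names that come before the first blank header cell."""
    names = []
    for h in headers:
        if h is None or not str(h).strip():
            break  # stop at the empty column; ignore everything after it
        names.append(str(h).strip())
    return names


def build_select(headers, sheet_name):
    """Build an explicit, bracket-quoted SELECT for the columns before the gap."""
    cols = leading_headers(headers)
    if not cols:
        raise ValueError("no headers found before the first blank column")
    for name in cols + [sheet_name]:
        if "[" in name or "]" in name:
            raise ValueError(f"cannot safely bracket {name!r}")
    bracketed = ", ".join(f"[{c}]" for c in cols)
    return f"SELECT {bracketed} FROM [{sheet_name}]"
```

In VBA the same string can be assembled by putting the bracketed names in an array and calling Join() on it.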
After deeper review, it appears your time column headers may be stored as time values with a Custom format; these are really Excel decimal values (fractions of a day) formatted to look like times. Hence, the SQL engine cannot find an [00:30] column. To fix this, convert the time value headers to text.
Consider the subroutine below, which automates the conversion with Text(). You can call this sub (or a function version of it) from another main method on each worksheet before running any SQL query. Note that there is no such time value as 24:00:00, so that header may already be in text format.
Sub Clean_Time_Columns()
Dim i As Integer, lastcol As Integer
Dim tmp_time As Variant
With ThisWorkbook.Worksheets("DATA")
For i = 3 To 49 ' LEAVE OUT 24:00:00
tmp_time = .Cells(1, i).Value
.Cells(1, i).NumberFormat = "@" ' TEXT FORMAT, SO "00:30" IS KEPT AS TEXT AND NOT RE-PARSED AS A TIME
.Cells(1, i).Value = Application.WorksheetFunction.Text(tmp_time, "HH:MM")
Next i
End With
End Sub
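The conversion works because of how Excel stores times. As a quick check of the arithmetic (a Python sketch, not part of the VBA answer; `excel_time_to_hhmm` is a hypothetical name), 00:30 is stored as 1/48 of a day. Formatting with hours that wrap at 24 turns 1.0 into 00:00, which is why a literal 24:00:00 header can only exist as text. Rounding to whole minutes is an assumption made to absorb floating-point error.

```python
def excel_time_to_hhmm(serial):
    """Format an Excel time serial (a fraction of a day) as HH:MM text.

    Hours wrap at 24, as in Excel's HH:MM format, so 1.0 (a full day)
    comes out as "00:00"; rounding to whole minutes absorbs float error.
    """
    total_minutes = round(serial * 24 * 60)
    hours = (total_minutes // 60) % 24
    minutes = total_minutes % 60
    return f"{hours:02d}:{minutes:02d}"


# The half-hour headers 00:30 ... 23:30 as Excel stores them: k/48 of a day.
HEADERS = [excel_time_to_hhmm(k / 48) for k in range(1, 48)]
```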

Related

Push Excel Range to SQL Table via VBA

I need to push a range in Excel to a new row in an SQL table each time an associate executes a VBA macro. So far, I have arranged the data into a single row with multiple columns (110 cells of data in total) in Excel. My problem is how to insert each of these individual cells from the Excel sheet into the corresponding column of the first empty row in an SQL table. I've done some fairly extensive searches online and have found nothing remotely close to what I am trying to do.
Is there a correct procedure that would allow me to dump a 110-column row into the first empty row in an SQL table?
I have the tables written and I have the range set:
Set DataBaseData = ActiveSheet.Range("DO1:HT1")
Beyond this, I have no idea how to open a connection to the server, database and table. This is what I've winged so far:
Sub Connection()
Dim Conn As ADODB.Connection
Dim Command As ADODB.Command
Set Conn = New ADODB.Connection
Set Command = New ADODB.Command
Dim i As Integer
Dim columnnumber As Integer
i = 0
Conn.ConnectionString = "Provider=SQLOLEDB; Data Source=[Nope];Initial Catalog=[NopeNope];User ID=[NopeNopeNope];Password=[AbsolutelyNot]; Trusted_Connection=no;"
Conn.Open
Command.ActiveConnection = Conn
End Sub
Any help would be greatly appreciated.
If you're curious about what I'm trying to do: I'm pushing a series of data from a CMM to the database so I can store it for the required amount of time, and then pull that data back into PowerBI and Minitab.
I was able to successfully write an entire row from Excel to an SQL Database using the following:
Sub Connection()
Const Tbl As String = "NEIN"
Dim InsertQuery As String, xlRow As Long, xlCol As Integer
Dim DBconnection As Object
Set DBconnection = CreateObject("ADODB.Connection")
DBconnection.Open "Provider=SQLOLEDB.1;Password=NEIN" & _
";Persist Security Info=false;User ID=NEIN" & _
";Initial Catalog=NEIN;Data Source=NEIN"
InsertQuery = ""
xlRow = 1 'only one row being used *as of now*, and that is the top row in the excel sheet
xlCol = 119 'First column of data (DO)
If Cells(xlRow, xlCol) <> "" Then 'single row, so If rather than While (avoids an endless loop if the cell after the data is filled)
InsertQuery = InsertQuery & "INSERT INTO " & Tbl & " VALUES('"
For xlCol = 119 To 228 'columns DO1 to HT1 (110 cells)
InsertQuery = InsertQuery & Replace(Cells(xlRow, xlCol), "'", "''") & "', '" 'Includes mitigation for apostrophes in the data
Next xlCol
InsertQuery = InsertQuery & Format(Now(), "M/D/YYYY") & "')" & vbCrLf 'The last column is a date stamp; either way, don't forget to close that parenthesis
End If
DBconnection.Execute InsertQuery
DBconnection.Close
Set DBconnection = Nothing
End Sub
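Building the INSERT by concatenation works, but it relies on escaping every apostrophe by hand. A parameterized insert lets the driver do the quoting; in ADO the rough equivalent is an ADODB.Command whose CommandText uses ? placeholders, filled through its Parameters collection. The sketch below shows the idea in Python, with the standard-library sqlite3 module standing in for SQL Server. `insert_row` and the `cmm` table are hypothetical, and writing the date stamp as ISO text (YYYY-MM-DD) is a design choice that avoids month/day ambiguity.

```python
import sqlite3
from datetime import date


def insert_row(con, table, values, stamp=None):
    """Insert one spreadsheet row plus a date stamp using ? placeholders.

    The driver handles quoting, so values such as O'Brien need no manual
    apostrophe doubling. 'table' is interpolated directly (identifiers cannot
    be parameters), so it must come from trusted code, never from cell data.
    """
    stamp = stamp or date.today()
    row = list(values) + [stamp.isoformat()]
    placeholders = ", ".join("?" for _ in row)
    con.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
    con.commit()
```

With the real sheet, `values` would be the 110 cells of DO1:HT1, and the table would need one column per value plus one for the stamp.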

Executing SQL query stored in an excel cell and then transfer the result set into a .txt file

I have a SQL query stored in one of my excel sheet cells that I execute using the below VBA Code:
Sub run()
Dim dtStart As Date
Dim dtEnd As Date
Dim MRC As Variant
'Get the SQL text(s)
MRC = "" & Worksheets("SQL Text").Range("D4").Value & ""
'Check for UNDF queries
If MRC = 0 Then
MsgBox ("Query has not yet been defined, please make a new selection")
Exit Sub
Else
End If
'Set up query
Application.StatusBar = "Data Refresh: 1 of 1 "
'Update subTabs
Sheets("Summary").Select
With ActiveWorkbook.Connections("connection1").OLEDBConnection
.CommandText = MRC
.CommandType = xlCmdSql
End With
ActiveWorkbook.Connections("connection1").Refresh
End Sub
In addition to the above VBA code, I also have another VBA sub that executes a different SQL view, transfers the view's result set to a .txt file and saves it into a specific folder. Please see that code below:
Sub TEXT()
Dim cn As New ADODB.Connection
Dim rs As New ADODB.Recordset
Dim strCon, strSQL As String
strCon = "Provider=SQLOLEDB.1;" & _
"Integrated Security=SSPI;Persist Security Info=True;" & _
"Initial Catalog=master;Data Source=VRSQLADHOC;" & _
"Use Procedure for Prepare=1;" & _
"Auto Translate=True;" & _
"Packet Size=4096;" & _
"Use Encryption for Data=False;" & _
"Tag with column collation when possible=False"
strSQL = "SELECT * FROM table1" 'Sql Query
Folder = "U:\" 'Path in U drive
Filename = "file_name_" & Format(Now(), "YYYYMMDD") & ".txt" 'Name of Text document
fpath = Folder & Filename
cn.Open strCon
rs.ActiveConnection = cn
rs.Open strSQL
Set fs = CreateObject("Scripting.FileSystemObject")
Set A = fs.CreateTextFile(fpath)
A.Write (rs.GetString(adClipString, , , vbCrLf, ""))
rs.Close
cn.Close
Set cn = Nothing
MsgBox ("file name " + fpath)
End Sub
I would like to apply the logic of the second sub (transferring the result set to a .txt file) to the first sub, which executes the SQL query stored in one of the Excel sheet cells.
In short, I want to execute a SQL query stored in an Excel cell and then transfer the result set to a .txt file.
If I'm understanding this right, these two subs use very different ways to query the database.
In the first example, you've got a db connection in a sheet, and you're using a cell to update that connection.
In the second, you've got a VBA/ADODB connection to the database. If all you want to do is refer to a cell in Excel for the query, then simply change the line:
strSQL = "SELECT * FROM table1" 'Sql Query
to something more like:
strSQL = Worksheets("SQL Text").Range("D4").Value
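For illustration only, here is the same "query text in, tab-delimited file out" flow as a Python sketch, with sqlite3 standing in for the SQL Server connection; `query_to_text_file` is a hypothetical name. It writes tab-separated fields, CRLF row endings and empty strings for nulls, which is roughly what `rs.GetString(adClipString, , , vbCrLf, "")` produces. Like the first sub's check, it refuses to run an empty query.

```python
import sqlite3


def query_to_text_file(con, sql, path):
    """Run SQL text (e.g. read from a worksheet cell) and write rows to a text file.

    Fields are tab-separated, rows end with CRLF and NULLs become empty
    strings. Returns the number of rows written.
    """
    sql = (sql or "").strip()
    if not sql:
        raise ValueError("query has not yet been defined")
    rows = con.execute(sql).fetchall()
    # newline="" stops Python from translating the explicit \r\n endings.
    with open(path, "w", newline="") as f:
        for row in rows:
            f.write("\t".join("" if v is None else str(v) for v in row) + "\r\n")
    return len(rows)
```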

VBA Copy Whole Column From SQL Recordset

I have a fixed data set of 6 rows being queried in SQL. I want to transfer the data in this query into Excel. My question is in two parts:
1 - Can I pull back multiple columns in one go, or do I have to do it column by column? I have 17 columns and 6 rows. As this is transferred into my Excel template, I'm looking to break it up into thirds, so columns 1 to 7 go in one section, 8 to 11 in another section and 12 to 17 in another section in Excel.
2 - When using my current code of:
Range("F2").Value = rst.Fields("ACCOUNT")
I only pull back a single row. Is it possible to get the whole column (all 6 rows), or do I have to loop over all 6 rows to get the full column?
Thanks in advance
Use the .GetRows() method. It lets you set the number of records to retrieve, the record to begin with, and a single field name or ordinal position, or an array of field names or ordinal positions.
The example below shows how to get data from an external Excel workbook into a recordset, retrieve the specified fields into an array, and paste the resulting array into the worksheet.
Option Explicit
Sub Test()
Dim sConnection As String
Dim sQuery As String
Dim oConnection As Object
Dim oRecordset As Object
Dim aData()
sConnection = _
"Provider=Microsoft.ACE.OLEDB.12.0;" & _
"User ID=Admin;" & _
"Data Source='" & ThisWorkbook.FullName & "';" & _
"Mode=Read;" & _
"Extended Properties=""Excel 12.0 Macro;"";"
sQuery = _
"SELECT * FROM [Sheet1$] " & _
"IN '" & ThisWorkbook.Path & "\Src1.xlsx' " & _
"[Excel 12.0;Provider=Microsoft.ACE.OLEDB.12.0;Mode=Read;Extended Properties='HDR=YES;'] " & _
"WHERE Country='UK';"
Set oConnection = CreateObject("ADODB.Connection")
oConnection.Open sConnection
Set oRecordset = oConnection.Execute(sQuery)
oRecordset.MoveFirst
aData = oRecordset.GetRows(, , Array("CustomerID", "ContactName"))
With ThisWorkbook.Sheets(1)
.Cells.Delete
Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aData)
.Cells.EntireColumn.AutoFit
End With
oConnection.Close
End Sub
Sub Output2DArray(oDstRng As Range, aCells As Variant)
With oDstRng
.Parent.Select
With .Resize( _
UBound(aCells, 1) - LBound(aCells, 1) + 1, _
UBound(aCells, 2) - LBound(aCells, 2) + 1)
.NumberFormat = "#"
.Value = aCells
End With
End With
End Sub
There is also a Src1.xlsx workbook, containing Customers as the data source, in the same folder as this workbook:
The resulting worksheet is as follows, you can see there are CustomerID and ContactName fields only:
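To address part 1 of the question: you can fetch all 17 columns in one go and split them afterwards. The Python sketch below (not VBA; both helper names are hypothetical) slices each row into the 1-7, 8-11 and 12-17 column blocks. It also shows the field-major (field, record) layout that GetRows returns, which is why the answer transposes the array before pasting.

```python
def split_columns(rows, groups):
    """Split each row into column blocks, e.g. groups=[(1, 7), (8, 11), (12, 17)].

    Column numbers are 1-based and inclusive, matching the question's wording.
    Returns one list of row-slices per group, ready to paste into separate ranges.
    """
    return [[list(row[first - 1:last]) for row in rows] for first, last in groups]


def field_major(rows):
    """Rearrange row-major data into field-major order, the (field, record)
    layout that ADO's GetRows returns."""
    return [list(col) for col in zip(*rows)]
```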

Extracting data from over a million records

I have an Excel file in which I have set up a connection with an Access database. In the Excel file I have a list of names in column A, and I want to search these names in the Access database and return back two fields from that database. I need to do this for around 200-300 names.
Here is my code:
N = Cells(Rows.Count, "A").End(xlUp).Row
Application.DisplayAlerts = False
strDB = ThisWorkbook.Path & "file.accdb"
Set objConnection = New ADODB.Connection
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & strDB
For i = 2 To N
Dim rstTable As ADODB.Recordset
Set rstTable = New ADODB.Recordset
lookup = Range("A" & i).Value
strSQL = "SELECT NAME1,NAME2 FROM DATA WHERE [Field2]= """ & lookup & """;"
'Store query output
rstTable.Open Source:=strSQL, ActiveConnection:=objConnection
'Paste results to Transactions sheet
Worksheets("Sheet1").Range("B" & i).CopyFromRecordset rstTable
'Close the record set & connection
rstTable.Close
objConnection.Close
Next i
This works (kind of), but it takes an extremely long time and randomly crashes. Any ideas on how to improve this?
Making sure there is a key (index) on the lookup field will help. I would also suggest making a copy of the workbook and testing external data from Access or MS Query, to see whether that gives a performance gain over VBA.
When using MS Query or data from Access, you can modify the command text in the connection properties and use ? in the WHERE clause as a parameter that takes its value from the worksheet (so you don't lose that functionality).
I modified your SQL statement, replacing WHERE [Field2] = "xxx" with WHERE [Field2] IN ("xxx", "yyy", "zzz"), so that a single query fetches every name:
N = Cells(Rows.Count, "A").End(xlUp).Row
Application.DisplayAlerts = False
strDB = ThisWorkbook.Path & "\file.accdb" 'Path has no trailing separator
Set objConnection = New ADODB.Connection
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & strDB
For i = 2 To N
lookup = lookup & "'" & Range("A" & i).Value & "', "
Next i
lookup = left(lookup, len(lookup) - 2)
Dim rstTable As ADODB.Recordset
Set rstTable = New ADODB.Recordset
strSQL = "SELECT NAME1,NAME2 FROM DATA WHERE [Field2] IN (" & lookup & ");"
'Store query output
rstTable.Open Source:=strSQL, ActiveConnection:=objConnection
'Paste results to Transactions sheet
Worksheets("Sheet1").Range("B2").CopyFromRecordset rstTable 'i is N + 1 after the loop, so paste from B2
'Close the record set & connection
rstTable.Close
objConnection.Close
You close the connection after the first iteration, so your next iteration (which has no code to reopen the connection) would fail. So you should move objConnection.Close out of the loop.
But even then, running the same kind of query over and over again with just a different argument can be done in one go, using the IN (...) syntax:
' Declare all your variables
Dim N As Long
Dim i As Long
Dim lookup As String
Dim strDB As String
Dim objConnection As ADODB.Connection
Dim rstTable As ADODB.Recordset
Dim strSQL As String
N = Cells(Rows.Count, "A").End(xlUp).Row
Application.DisplayAlerts = False
strDB = ThisWorkbook.Path & "\file.accdb" ' Path has no trailing separator
Set objConnection = New ADODB.Connection
objConnection.Open "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & strDB
' collect the values in comma-separated string
lookup = ""
For i = 2 To N
lookup = lookup & ",""" & Range("A" & i).Value & """"
Next i
' Chop off the first comma
lookup = Mid(lookup, 2)
' Perform a single query, but also select the Field2 value
Set rstTable = New ADODB.Recordset
strSQL = "SELECT Field2, NAME1,NAME2 FROM DATA WHERE [Field2] IN (" & lookup & ");"
' query output
rstTable.Open Source:=strSQL, ActiveConnection:=objConnection
' Retrieve values
Do While Not rstTable.EOF
lookup = rstTable.Fields(0).Value
' Locate in which row to put the result
For i = 2 To N
If lookup = Range("A" & i).Value Then
Range("B" & i).Value = rstTable.Fields(1).Value
Range("C" & i).Value = rstTable.Fields(2).Value
End If
Next i
rstTable.MoveNext
Loop
' Close the record set & connection
rstTable.Close
objConnection.Close
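The single-query idea can be sketched in Python, with sqlite3 standing in for the Access database; the table and field names follow the question, and `lookup_names` is hypothetical. Using one ? placeholder per name avoids breaking on names that contain quotes, and a dictionary maps the results back to the input order instead of rescanning every sheet row for each record. Very long IN lists can run into engine limits on query length or parameter count, so test with your real volume.

```python
import sqlite3


def lookup_names(con, names):
    """Fetch NAME1, NAME2 for many Field2 values in a single query.

    Returns a list aligned with 'names'; names with no match get (None, None).
    If several rows share a Field2 value, the last one returned wins.
    """
    unique = list(dict.fromkeys(names))  # de-duplicate while keeping order
    if not unique:
        return []
    placeholders = ", ".join("?" for _ in unique)
    sql = f"SELECT Field2, NAME1, NAME2 FROM DATA WHERE Field2 IN ({placeholders})"
    found = {key: (n1, n2) for key, n1, n2 in con.execute(sql, unique)}
    return [found.get(name, (None, None)) for name in names]
```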
You can do what you described, but I think it's far more efficient to do this in Access itself. Just create a table with your names and do an INNER JOIN to the table you want the 2 fields from. It should take less than a minute, and probably less than 30 seconds.
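A sketch of the join approach, again in Python with sqlite3 as a stand-in for Access: the names go into a helper table, and an INNER JOIN returns only the matches. `lookup_by_join` and the `wanted_names` table are hypothetical. This sketch uses a SQLite TEMP table; in Access you would create an ordinary table instead.

```python
import sqlite3


def lookup_by_join(con, names):
    """Load the names into a helper table and INNER JOIN it to DATA.

    Returns (name, NAME1, NAME2) rows in the order the names were given;
    names without a match are simply absent, as with any inner join.
    """
    con.execute("CREATE TEMP TABLE IF NOT EXISTS wanted_names (name TEXT)")
    con.execute("DELETE FROM wanted_names")
    con.executemany("INSERT INTO wanted_names VALUES (?)", [(n,) for n in names])
    sql = (
        "SELECT w.name, d.NAME1, d.NAME2 "
        "FROM wanted_names AS w INNER JOIN DATA AS d ON d.Field2 = w.name "
        "ORDER BY w.rowid"
    )
    return con.execute(sql).fetchall()
```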

UNPIVOT columns using SQL query

I am pulling in another workbook containing a table that has columns like ItemCreationDate, and most of its columns (28 in total) begin with the word "Global". I want to:
1 - pull these "Global" columns (including ItemCreationDate) into an SQL recordset,
2 - add a new column called Old/New, and then
3 - UNPIVOT them, i.e. stack them one above the other, and
4 - in the next column, list their column items along with their counts.
The counts are based on ItemCreationDate: any date < 2015 is OLD and any date >= 2015 is NEW.
The final output should be as shown in the Output sheet.
I have attached a Sample.xlsx file showing how I get to the Output tab starting from the Data sheet. The Data sheet is the input that I want to pull into a recordset, producing the table shown in the Output sheet. I don't want to create a pivot table, as it's cumbersome and there is a lot of data; I want an alternative SQL approach where I can quickly aggregate the data and insert it into the sheet in one go.
I am not using SQL Server, so I cannot use the UNPIVOT command or dynamic SQL to loop through all the "Global" columns.
Basically, I want to build an SQL string something like this:
Dim arrSQL as variant
......
......
RS.Filter="Like Global*"
......
arrSQL = JOIN(RS.Fields, vbCr)
strSQL = "SELECT [arrSQL], IIF(YEAR([ITEM CREATION DATE])>=YEAR(DATE())-1,""NEW"",""OLD"") AS [New/Old] from [Data$] GROUP BY...."
strSQL = strSQL & " UNION ALL " & vbcr & _
strSQL = strSQL & " ......
Then run SQL on the same recordset to reduce the columns and get the required data format...
I know the above is not quite correct, but I want something along those lines, so that I can get the output shown in the Output tab.
Can anyone help quickly?
Edit for a_horse_with_no_name:
See the screenshots of the sample file:
Data sheet:
This is actually a table from an input workbook that I want to pull into a Recordset. Note the various "Global" column headings and their items that I want to unpivot.
These are the 2 intermediate sheets, "New" and "Old", that I have to create every time (and that I actually want to get rid of). Any items found in 2015 or later are put in New, whereas the rest are put in Old.
FYI, the formulae used manually in the Output columns are:
C column (New):
=COUNTIF(INDEX(New!$A:$D,0,MATCH($A2,New!$1:$1,0)),Output!$B2)
D Column (Old):
=COUNTIF(INDEX(Old!$A:$D,0,MATCH($A2,Old!$1:$1,0)),Output!$B2)
E Column (% New):
=Output!C2/SUM(C$2:C$6)
F Column (% Old):
=Output!D2/SUM(D$2:D$6)
G Column (Index):
=IF(AND(E2<=0,F2<=0),0,IF(AND(E2>0,F2>0),E2/F2,1))
Hope this helps.
Indeed, you can run SQL queries in MS Excel using the Jet/ACE SQL engine (a Windows .dll file), which is the same data engine MS Access uses by default. As such, this technology is not restricted to any one Office/Windows program.
Consider the following Excel VBA macro (if using Excel on a PC), which connects to ACE via ADO and runs one aggregate SQL query per column (e.g. GLOBAL VIT/CALC, GLOBAL FLAVOURS, GLOBAL FLAVOUR GROUP) with conditional New and Old counts/percentages, appending each result below the previous one so that together they form a union of the per-column queries. The percentage column pair requires subqueries.
For proper setup, do the following:
Make sure Item Creation Date holds real dates in MM-DD-YYYY (US-based) or DD-MM-YYYY (non-US-based) format, which is not how the date field is currently formatted in the screenshot or file above. For example, convert the column in place with:
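Sub FormatDates()
Dim i As Long
' CONVERT TEXT DATES IN COLUMN A TO REAL DATES (CDate INTERPRETS THEM USING THE SYSTEM LOCALE)
For i = 2 To Cells(Rows.Count, "A").End(xlUp).Row
Range("A" & i).Value = CDate(Range("A" & i).Value)
Next i
End Sub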
Run the macro from a different workbook than the one holding the data. The code below assumes the data workbook holds the source information in a worksheet named Data.
In the query-running workbook, create a blank worksheet named RESULTS, which will be populated with the query output, including column headers.
VBA script (two connection strings are available: a Driver version, commented out, and a Provider version)
Option Explicit
Sub RunSQL()
Dim cols As Object, datawbk As Workbook, datawks As Worksheet
Dim lastcol As Integer, i As Integer, j As Variant, output As Variant
Set cols = CreateObject("Scripting.Dictionary")
Set datawbk = Workbooks.Open("C:\Path\To\Data\Workbook.xlsx")
Set datawks = datawbk.Worksheets("Data")
lastcol = datawks.Cells(7, datawks.Columns.Count).End(xlToLeft).Column
For i = 2 To lastcol
cols.Add CStr(i - 1), datawks.Cells(1, i).Value
Next i
datawbk.Close False
Set datawks = Nothing
Set datawbk = Nothing
output = DataCapture(cols)
End Sub
Function DataCapture(datacols As Object)
On Error GoTo ErrHandle
Dim conn As Object, rst As Object
Dim strConnection As String
Dim classSQL As String, itemSQL As String, grpSQL As String, strSQL As String
Dim i As Integer, fld As Object, d As Variant, lastrow As Long
Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
' Hard code database location and name '
' strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
' & "DBQ=C:\Path\To\Data\Workbook.xlsx;"
strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
& "Data Source='C:\Path\To\Data\Workbook.xlsx';" _
& "Extended Properties=""Excel 12.0 XML;HDR=YES;IMEX=1;"";"
' OPEN DB CONNECTION '
conn.Open strConnection
For Each d In datacols.keys
strSQL = " SELECT '" & datacols(d) & "' AS [COLUMN], [Data$].[" & datacols(d) & "] AS ITEMS," _
& " SUM(IIF(Year([Item Creation Date]) >= Year(Date()) - 1, 1, 0)) AS NEW," _
& " " _
& " SUM(IIF(Year([Item Creation Date]) < Year(Date()) - 1, 1, 0)) AS OLD," _
& " " _
& " ROUND(SUM(IIF(Year([Item Creation Date]) >= Year(Date()) - 1, 1, 0)) / " _
& " (SELECT Count(*) FROM [Data$] AS sub" _
& " WHERE Year(sub.[Item Creation Date]) >= Year(Date()) - 1),2) AS NEWPCT," _
& " " _
& " ROUND(SUM(IIF(Year([Item Creation Date]) < Year(Date()) - 1, 1, 0)) / " _
& " (SELECT Count(*) FROM [Data$] AS sub" _
& " WHERE Year(sub.[Item Creation Date]) < Year(Date()) - 1),2) AS OLDPCT" _
& " FROM [Data$]" _
& " GROUP BY [Data$].[" & datacols(d) & "]"
' OPEN RECORDSET '
rst.Open strSQL, conn
' COLUMN HEADERS '
If d = 1 Then
i = 0
With Worksheets("RESULTS").Range("A1") ' NO NEED TO ACTIVATE THE SHEET FIRST
For Each fld In rst.Fields
.Offset(0, i).Value = fld.Name
i = i + 1
Next fld
End With
End If
' DATA ROWS '
lastrow = Worksheets("RESULTS").Cells(Worksheets("RESULTS").Rows.Count, "A").End(xlUp).Row
Worksheets("RESULTS").Range("A" & lastrow + 1).CopyFromRecordset rst
rst.Close
Next d
conn.Close
MsgBox "Successfully processed SQL query!", vbInformation
Exit Function
ErrHandle:
MsgBox Err.Number & " - " & Err.Description, vbCritical
Exit Function
End Function
Output
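Because the query-per-column loop above is really a manual UNPIVOT, it can also be written as one statement with UNION ALL. Below is a Python sketch using sqlite3 as a stand-in for ACE. In ACE you would keep Year() and IIF() as in the answer; this sketch uses strftime and CASE and assumes dates are stored as ISO text (YYYY-MM-DD). The function names, the `cutoff_year` parameter and the Data table layout are assumptions. The percentage and Index columns follow the question's Excel formulas.

```python
import sqlite3


def quote(name):
    """Bracket-quote an identifier; names containing ] are rejected, not escaped."""
    if "]" in name:
        raise ValueError(f"unsupported column name: {name!r}")
    return f"[{name}]"


def build_unpivot_sql(columns, table="Data", date_col="Item Creation Date"):
    """Stack one aggregate SELECT per column with UNION ALL (a manual UNPIVOT)."""
    year = f"CAST(strftime('%Y', {quote(date_col)}) AS INTEGER)"
    parts = []
    for col in columns:
        label = col.replace("'", "''")  # escape quotes inside the string literal
        parts.append(
            f"SELECT '{label}' AS [COLUMN], {quote(col)} AS ITEMS, "
            f"SUM(CASE WHEN {year} >= :cutoff THEN 1 ELSE 0 END) AS [NEW], "
            f"SUM(CASE WHEN {year} < :cutoff THEN 1 ELSE 0 END) AS [OLD] "
            f"FROM {quote(table)} GROUP BY {quote(col)}"
        )
    return " UNION ALL ".join(parts)


def unpivot_counts(con, columns, cutoff_year, **kwargs):
    """Return (column, item, new, old, pct_new, pct_old, index) tuples."""
    sql = build_unpivot_sql(columns, **kwargs)
    rows = con.execute(sql, {"cutoff": cutoff_year}).fetchall()
    # Per-column totals, used as the denominators of the percentages.
    totals = {}
    for col, _item, new, old in rows:
        t = totals.setdefault(col, [0, 0])
        t[0] += new
        t[1] += old
    result = []
    for col, item, new, old in rows:
        total_new, total_old = totals[col]
        pct_new = new / total_new if total_new else 0.0
        pct_old = old / total_old if total_old else 0.0
        # Same rule as the question's Index formula in column G.
        if pct_new <= 0 and pct_old <= 0:
            index = 0.0
        elif pct_new > 0 and pct_old > 0:
            index = pct_new / pct_old
        else:
            index = 1.0
        result.append((col, item, new, old, pct_new, pct_old, index))
    return result
```

Selecting only the "Global" columns is then a filter on the header row, e.g. `[h for h in headers if h.lower().startswith("global")]`.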