Can I parallelize a SQL query in Excel VBA?

I have an Excel workbook that connects to a PostgreSQL database via ODBC.
Using VBA it executes 27 SQL queries one by one and copies each resulting set to
a different worksheet.
I am content with the data I get, but the performance is mediocre. The database should have plenty of resources.
Can I parallelize/multi-thread the SQL queries? I have read that parallelization is not possible in VBA per se.

Following @JNevill's advice I concatenated the 27 queries into one mega-query
separated by " ; ", with a total length of 54,778 characters.
Execution time:
27 queries sequentially: 45 seconds
1 mega-query: 30 seconds
I learned that I am working on a well-seasoned database version, namely PostgreSQL 9.3.11.
Even if I had more cores, this version couldn't be bothered to use them (parallel query execution only arrived in PostgreSQL 9.6).
If I execute the mega-query in Squirrel it takes about 20 seconds.
The performance might be better with an up-to-date database.
Sketch of the solution:
Dim rs As ADODB.Recordset, i As Long
Set rs = New ADODB.Recordset
rs.Open sqlQuery, conn  ' sqlQuery holds all 27 statements joined by " ; "
For i = 1 To 27
    Call writeRecordSetToSheet(rs, sheetname)
    Set rs = rs.NextRecordset  '<- magic line for accessing the other 26 recordsets
Next i
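For reference, a minimal sketch of how the mega-query itself could be built, assuming the 27 statements already sit in a string array (the array name is illustrative):
Dim queries(1 To 27) As String
' ... fill queries(1) through queries(27) with the individual SELECT statements ...
sqlQuery = Join(queries, " ; ")  ' one round trip to the server instead of 27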

Related

How to use a SQL Select statement with Power Query against an Access database?

I've got a query that joins 4 tables that I need to run against 4 different Access .mdb files (all have the same schema) so I can compare the results in Excel. Instead of creating 16 Power Queries and joining them into 4 queries (20 total query objects) I want to write a SQL statement that joins the tables and run it against each of the 4 different data sources. There's a chance that the SQL statement may need to be updated, so having it stored in one place will make future maintenance easier.
I could not find examples of this online, and the M that Power Query writes for an Access connection works one table at a time. I did not want a solution that used VBA.
Poking around with the various Power Query connectors I found that I can use the ODBC connector to connect to an Access database. I was able to adjust the parameters and pass it a standard SQL statement.
I put the SQL statement in a cell (C16 in the image) and named that range Package_SQL. I also have 4 cells where I put the path and filename of the 4 Access .mdb files I want to query. I named those ranges Database1 through Database4.
(Image: the configuration screen used to set the database paths and the SQL statement.)
let
    // Get the Access database to work with.
    dbPath = Excel.CurrentWorkbook(){[Name="Database1"]}[Content]{0}[Column1],
    // Get the SQL statement from the named range.
    SQL = Excel.CurrentWorkbook(){[Name="Package_SQL"]}[Content]{0}[Column1],
    // Run the SQL against the Access file through the ODBC driver.
    Source = Odbc.Query("dbq=" & dbPath & ";defaultdir=C:\Temp;driverid=25;fil=MS Access;maxbuffersize=2048;pagetimeout=5;dsn=MS Access Database", SQL),
    #"Changed Type" = Table.TransformColumnTypes(Source,
        {{"Issue_Date", type date}, {"Revision_Issue_Date", type date}})
in
    #"Changed Type"
As you can see, the magic is done in the following line. I didn't want defaultdir hard-coded to a folder that not everyone may have, so I set it to C:\Temp. You may need to change it, or even remove it and see if it makes a difference.
Source = Odbc.Query("dbq=" & dbPath & ";defaultdir=C:\Temp;driverid=25;fil=MS Access;maxbuffersize=2048;pagetimeout=5;dsn=MS Access Database", SQL),
I made 4 instances of that query and created another query to combine the results (sketched below). The query runs as fast as most any other Access query. I am very satisfied with this solution: the query can be altered and/or repurposed from the Excel sheet without digging through the Power Query scripts.
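A minimal sketch of the combining query, assuming the four instances are named Package1 through Package4 (the names are illustrative):
let
    Source = Table.Combine({Package1, Package2, Package3, Package4})
in
    Source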
Note that this solution does not use any VBA.

Speed up Python executemany

I'm inserting data from one database to another, so I have 2 connections (Conn1 and Conn2). Below is the code (using pypyodbc).
import pypyodbc

# (Connection setup omitted in the original; Conn1_Cursor comes from Conn1.cursor().)
Conn1_Query = "SELECT column FROM Table"
Conn1_Cursor.execute(Conn1_Query)
Conn1_Data = Conn1_Cursor.fetchall()  # already returns a list of rows

Conn1_array = []
for row in Conn1_Data:
    Conn1_array.append(row)
The above part runs very quickly.
stmt = "INSERT INTO TABLE(column) values (?)"
Conn2_Cursor.executemany(stmt, Conn1_array)
Conn2.commit()
This part is extremely slow. I've also tried looping with cursor.execute to insert one row at a time, but that is also very slow. What am I doing wrong, and is there anything I can do to speed it up? Thanks for taking a look.
Thought I should also add that the Conn1 data is only ~50k rows. I also have some more setup code at the beginning that I didn't include because it's not pertinent to the question. It takes about 15 minutes to insert. As a comparison, it takes about 25 seconds to write the output to a csv file.
Yes, executemany under pypyodbc sends separate INSERT statements for each row. It acts just the same as making individual execute calls in a loop. Given that pypyodbc is no longer under active development, that is unlikely to change.
However, if you are using a compatible driver like "ODBC Driver xx for SQL Server" and you switch to pyodbc then you can use its fast_executemany option to speed up the inserts significantly. See this answer for more details.
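A minimal sketch of the pyodbc variant; the driver name, server, and credentials are illustrative assumptions:
import pyodbc

# Illustrative connection details - substitute your own.
conn2 = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=server_name;DATABASE=db_name;UID=user_name;PWD=password"
)
cursor = conn2.cursor()
cursor.fast_executemany = True  # send parameter arrays in bulk instead of one INSERT per row
cursor.executemany("INSERT INTO TABLE(column) VALUES (?)", Conn1_array)
conn2.commit()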

Resolving an ADO timeout issue in VB6

I am running into an issue when populating an ADO recordset in VB6. The query (hitting SQL Server 2008) takes only about 1 second to run in SSMS. It works fine when the result set is small, but once it reaches a few hundred records it takes a long time: 800+ records require about 5 minutes to return (the query still takes only 1 second in SSMS), and 6000+ takes well over 20 minutes. I have "fixed" the timeout exception by increasing the command timeout, but I was wondering if there is a way to make it faster, since it does not seem to be the actual query that takes so much time; something such as compressing the results so they don't take as long to transfer. The recordset is opened as follows:
myConnection.CommandTimeout = 2000
myConnection.ConnectionString = "Provider=SQLOLEDB;" & _
                                "Initial Catalog=DB_NAME;" & _
                                "Data Source=SERVER_NAME;" & _
                                "Network Library=DBMSSOCN;" & _
                                "User ID=USER_NAME;" & _
                                "Password=PASSWORD;" & _
                                "Use Encryption for Data=True;"
myConnection.Open

myRecordSet.CursorLocation = adUseClient  ' client-side cursor; required to disconnect the recordset below
myRecordSet.Open STORED_PROC_QUERY_STRING, myConnection, adOpenStatic, adLockReadOnly
Set myRecordSet.ActiveConnection = Nothing
myConnection.Close
The query returns 3 columns, which are used to fill a combo box.
UPDATE:
I ran SQL Profiler: the runs from the client machine show roughly 100x more reads and 100x longer duration than the same queries in SSMS. The text of the query is the same for both SSMS and the client machine according to the profiler, so I don't think it should be using a different execution plan. Could the network library or the provider have any impact on this?
Profiler stats:
From the client application: 7,041,720 reads, 59,458 ms duration, 3,900 rows
From SSMS: 30,802 reads, 238 ms duration, 3,900 rows
It seems like it is using a different execution plan, but the query is exactly the same and I am not sure how to check the execution plan the client might be using if it is different from what is shown in SSMS.
800+ records requiring about 5 minutes = query problem.
Look at your execution plan:
In SSMS, run:
SET SHOWPLAN_ALL ON
then run your query. It will not produce the expected result set, but rather an execution plan showing how the database retrieves your data. Most bad queries do a table scan (looking at every row in the table, which is slow), so look for the word "SCAN" in the StmtText column. Try to figure out why the index is not being used on that table (its name appears next to the word "SCAN"). If you join multiple tables and have multiple SCANs, concentrate on the largest tables first.
Without more info this is the best "generic" help you can get.
EDIT
From reading your question, I'm not sure if you mean it is always fast from SSMS no matter the rows, but slow from VB as the rows increase. If that is the case check this: http://www.google.com/search?q=sql+server+fast+from+ssms+slow+from+application&hl=en&num=100&lr=&ft=i&cr=&safe=images
Could be something like parameter sniffing or inconsistent connection settings (ANSI_NULLS, ARITHABORT, etc.).
For the connection settings, try running these from SSMS and from VB6 (add them to the result set) and see if there are any differences:
SELECT SESSIONPROPERTY ('ANSI_NULLS') --Specifies whether the SQL-92 compliant behavior of equals (=) and not equal to (<>) against null values is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_PADDING') --Controls the way the column stores values shorter than the defined size of the column, and the way the column stores values that have trailing blanks in character and binary data.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_WARNINGS') --Specifies whether the SQL-92 standard behavior of raising error messages or warnings for certain conditions, including divide-by-zero and arithmetic overflow, is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ARITHABORT') -- Determines whether a query is ended when an overflow or a divide-by-zero error occurs during query execution.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('CONCAT_NULL_YIELDS_NULL') --Controls whether concatenation results are treated as null or empty string values.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('NUMERIC_ROUNDABORT') --Specifies whether error messages and warnings are generated when rounding in an expression causes a loss of precision.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('QUOTED_IDENTIFIER') --Specifies whether SQL-92 rules about how to use quotation marks to delimit identifiers and literal strings are to be followed.
--1 = ON
--0 = OFF
Make your query like this (so you can see the connection settings in VB6):
SELECT
col1, col2
,SESSIONPROPERTY ('ARITHABORT') AS ARITHABORT
,SESSIONPROPERTY ('ANSI_WARNINGS') AS ANSI_WARNINGS
FROM ...
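If the settings do differ, you can align them from the client before opening the recordset. A minimal VB6 sketch, assuming ARITHABORT is the mismatched setting (a common culprit, since SSMS turns it ON by default while ADO connections leave it OFF):
myConnection.Execute "SET ARITHABORT ON"  ' match the SSMS session setting
myRecordSet.Open STORED_PROC_QUERY_STRING, myConnection, adOpenStatic, adLockReadOnly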

SQL bottleneck, how to fix

This is related to my previous thread: SQL Query takes about 10 - 20 minutes
However, I've kind of figured out the problem. The problem (as described in the previous thread) is not the insert (although that is still slow); the problem is looping through the data itself.
Consider the following code:
Dim rs As DAO.Recordset
Dim sngStart As Single, sngEnd As Single
Dim sngElapsed As Single

Set rs = CurrentDb().QueryDefs("select-all").OpenRecordset
MsgBox "All records retrieved"

sngStart = Timer
Do While Not rs.EOF
    rs.MoveNext
Loop
sngEnd = Timer

sngElapsed = Format(sngEnd - sngStart, "Fixed") ' Elapsed time.
MsgBox "The query took " & sngElapsed & " seconds to run."
As you can see, this loop does NOTHING. You'd expect it to finish in seconds; however, it takes about 857 seconds (roughly 15 minutes) to run. I don't know why it is so slow. Maybe the Lotus NotesSQL driver?
Any other ideas? (A Java-based solution, or any other solution?)
My goal: to get all the data from the remote server and insert it into a local Access table.
This document has some information about performance tuning in NotesSQL. If you aren't already, select your data from Notes Views instead of Notes Forms. NotesSQL will then leverage the indexes within the views for faster queries. You may need to create the view in the Notes database, but the performance benefit will make it worthwhile.
My recommendation is to create a Pass-Through query that gets the data from the remote server, then a Make Table query that uses the aforementioned query as its source. Your function is then reduced to a call to this second query, as sketched below.
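A minimal sketch of the second query's SQL, assuming the pass-through query is saved as qryRemoteData and the local table is LocalTable (both names illustrative):
SELECT qryRemoteData.* INTO LocalTable
FROM qryRemoteData;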
The loop isn't doing "nothing": it's calling MoveNext, which is potentially doing A LOT.

Is there a size limit for the SQL text in a PeopleSoft App Engine SQL Step/Action?

I'm getting the following error: AeSymResolveStatement [775] ... Meta-SQL error at or near position 34338 in statement (108,512). The SQL statement itself is over 40,000 chars long, hence the question.
The DB is Oracle, running on Tools 8.49.24.
I know that there is a limit on the size of the SQL used in an Application Engine SQL Step. I once received a similar error while trying to use an exceptionally long SQL in an Application Engine.
I wouldn't be surprised if that same limit applies to SQL Objects.
To fix the problem, I was able to split the SQL into two statements (it was an UPDATE). Hopefully that's possible in your case as well.
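For illustration, a single UPDATE that sets many columns can often be split into two shorter statements over the same rows; the table and column names here are made up:
-- before: one enormous UPDATE setting FIELD1 through FIELD4
UPDATE PS_MY_TBL SET FIELD1 = 'A', FIELD2 = 'B' WHERE PROCESS_INSTANCE = 123;
UPDATE PS_MY_TBL SET FIELD3 = 'C', FIELD4 = 'D' WHERE PROCESS_INSTANCE = 123;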
There is no such limit.
You can confirm this yourself by creating an SQL like:
select 'x' from PS_INSTALLATION where
1 = 1 and
1 = 1 and
1 = 1 and
1 = 1 and
/* ... copy-paste '1 = 1 and' 90,000 or so more times ... */
1 = 1
Although it makes pside quite slow, it saves and validates just fine.
There are limits within PeopleCode, mostly due to the limits on string length; however, I have never found a limit on stored SQL statements.
Personally I'd look at breaking the statement into pieces in some way.
You could:
Use the inbuilt looping mechanism of App Engines
Use a mixture of SQL and PeopleCode
Use a temporary table and perform intermediate SQLs, storing results in the temp table (sketched below)
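A minimal sketch of the temp-table approach; the table and column names are made up:
-- Step 1: stage an intermediate result in the temp table
INSERT INTO PS_MY_TMP_TBL (EMPLID, TOTAL_AMT)
SELECT EMPLID, SUM(AMT) FROM PS_SOURCE_TBL GROUP BY EMPLID;

-- Step 2: the final statement stays short by reading the staged rows
UPDATE PS_TARGET_TBL T
SET T.TOTAL_AMT = (SELECT M.TOTAL_AMT FROM PS_MY_TMP_TBL M WHERE M.EMPLID = T.EMPLID);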
Apart from sparing your database a heart seizure (not to mention the DBA when he sees the statement in the SQL monitor), you are saving yourself a world of pain if you ever have to look at the statement again.
I think the SQLs in App Engines are stored as LONGs, so the limit would be 4 GB under Oracle, and something similarly huge under DB2.