Excel VBA, how to do multiple database entries - sql

I need to populate a database with thousands of entries on a daily basis, but my code at the moment manually inserts each one into the database one at a time.
Do While lngSQLLoop < lngCurrentRecord
    lngSQLLoop = lngSQLLoop + 1
    sql = "INSERT INTO db (key1, key2) VALUES ('value1', 'value2');"
    result = bInsertIntoDatabase(sql, True)
    If result = False Then lngFailed = lngFailed + 1
Loop
This works, but takes about 5 seconds for each 100 entries. Would there be a more efficient way to put this into the database? I've tried
INSERT INTO db (key1, key2) VALUES ('value1-1', 'value2-1'), ('value1-2', 'value2-2'), ('value1-3', 'value2-3');
but this fails with a missing semicolon (;) error, suggesting the database doesn't like the values to be listed like that. Is there a way to do this from VBA?

Multi-row VALUES lists of the form (), () work in SQL Server only from version 2008 onward.
But you're in luck: you can batch these by simply concatenating your SQL statements and passing several of them in each call to bInsertIntoDatabase.
The only downside to this approach is that if one statement in the batch fails, so will every subsequent statement in the batch.
So, if failure is a regular issue (say, from key collisions), you would need to use another approach. One solution is to:
1. Insert batches into a temporary table first (without unique indexes, thus avoiding failures initially).
2. Do a final insert into the main table with a WHERE clause that prevents an error.
3. Get the result count and subtract it from the total number of records in the temporary table to get the number of failures.
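The three steps above can be sketched as follows. This is only an illustration in Python, with the standard-library sqlite3 module standing in for the real database. The table names target and staging, the unique key on key1, and the helper load_via_staging are assumptions made for the example and do not come from the question's code.

```python
import sqlite3

def load_via_staging(conn, rows):
    """Load (key1, key2) rows through a staging table; return (inserted, failed)."""
    cur = conn.cursor()
    # Step 1: the staging table has no unique index, so the bulk load cannot collide.
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS staging (key1 TEXT, key2 TEXT)")
    cur.execute("DELETE FROM staging")
    cur.executemany("INSERT INTO staging (key1, key2) VALUES (?, ?)", rows)

    before = cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    # Step 2: copy only keys that are not already in the target table.
    # GROUP BY collapses duplicate keys inside the batch itself.
    cur.execute(
        """
        INSERT INTO target (key1, key2)
        SELECT s.key1, MIN(s.key2)
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM target AS t WHERE t.key1 = s.key1)
        GROUP BY s.key1
        """
    )
    after = cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.commit()
    # Step 3: rows staged minus rows copied gives the failure count.
    inserted = after - before
    return inserted, len(rows) - inserted

# Small demonstration with one key ('a') that already exists in the target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (key1 TEXT PRIMARY KEY, key2 TEXT)")
conn.execute("INSERT INTO target VALUES ('a', 'existing')")
conn.commit()
inserted, failed = load_via_staging(
    conn, [("a", "1"), ("b", "2"), ("b", "3"), ("c", "4")]
)
print(inserted, failed)
```

Counting the target table before and after the copy avoids relying on any driver-specific "rows affected" value, which may or may not be available through bInsertIntoDatabase.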

If the source of your data can be accessed via a database driver (like ODBC) and your database framework supports heterogeneous queries, you should be able to do:
INSERT INTO targetDBtable (key1, key2)
SELECT key1, key2 FROM sourceDBtable;

Using .AddNew and .Update with an updateable recordset seems fast: takes about 0.25 seconds to add 10000 records with no errors, or 1.25 seconds to add 10000 records with 10000 errors, on my system.

Save the data to a CSV file first and then use Access' TransferText method (of the DoCmd object) to load it into the Access table in one go. Remember to delete the CSV file afterwards.
Even if you're running the code from Excel, you can still execute the TransferText method in Access via Automation.

Related

SSIS - How can you use a flat file of ID's in a Where statement?

I have a large list (25,000) of ID values that, at times, I'd like to use in a scripted query instead of running against the full data set (hundreds of millions of rows), but the hard limit of 8,000 characters in the OLE DB Source SQL Command Text field won't allow them all. The client's server is locked down: I can't run BULK INSERT and can only use temp tables.
It would be something like this:
Select ID, FieldA, FieldB, FieldC, ...
From TableX
Where ID in (list of ID's from flat file)
I was hoping there would be some way to refer to a flat file with all the IDs in it, either in the WHERE clause or via some kind of ForEach loop. I've set up other ForEach loops but am unsure A) if it can even be done and B) how to go about it.
I've poked around the web but not getting any direct hits. Any direction to go and research would be appreciated. Thanks in advance!

BigQuery data using SQL "INSERT INTO" is gone after some time

Today I noticed another strange behaviour of BigQuery.
I ran a standard SQL query with a UDF in the BQ web UI:
CREATE TEMPORARY FUNCTION ...
INSERT INTO projectid.dataset.inserttable...
All seems good: the results of the UDF query are inserted into the insert table correctly, as I can tell from "Number of rows". But the table size is not correct; it still shows the size from before the insert query ran. Furthermore, I found that all the inserted rows were gone an hour later.
Some more info: when I run a "DELETE FROM insert table true" or "SELECT ...", the number of deleted rows and the table size are consistent with the inserted data. I just cannot preview the insert table correctly in the web UI.
So I am guessing that the "Detail" or "Preview" info of the table is delayed? Do you have any idea about this behaviour?
The preview may have a delay, so SELECT * FROM YourTable; will give the most up-to-date results, or you can use COUNT(*) just to verify that the number of rows is correct. You can think of it as being similar to streaming, if you have tried that, where some rows may be in the streaming buffer for a while before they make it into regular storage.

Speed up Python executemany

I'm inserting data from one database to another, so I have 2 connections (Conn1 and Conn2). Below is the code (using pypyodbc).
import pypyodbc
Conn1_Query = "SELECT column FROM Table"
Conn1_Cursor.execute(Conn1_Query)
Conn1_Data = Conn1_Cursor.fetchall()
Conn1_array = []
for row in Conn1_Data:
    Conn1_array.append(row)
The above part runs very quickly.
stmt = "INSERT INTO TABLE(column) values (?)"
Conn2_Cursor.executemany(stmt, Conn1_array)
Conn2.commit()
This part is extremely slow. I've also tried to do a for loop to insert each row at a time using cursor.execute, but that is also very slow. What am I doing wrong and is there anything I can do to speed it up? Thanks for taking a look.
Thought I should also add that the Conn1 data is only ~50k rows. I also have some more setup code at the beginning that I didn't include because it's not pertinent to the question. It takes about 15 minutes to insert. As a comparison, it takes about 25 seconds to write the output to a csv file.
Yes, executemany under pypyodbc sends separate INSERT statements for each row. It acts just the same as making individual execute calls in a loop. Given that pypyodbc is no longer under active development, that is unlikely to change.
However, if you are using a compatible driver like "ODBC Driver xx for SQL Server" and you switch to pyodbc then you can use its fast_executemany option to speed up the inserts significantly. See this answer for more details.
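A minimal sketch of that switch, assuming pyodbc and a compatible Microsoft ODBC driver are installed. The helper name insert_rows and the chunk size are illustrative choices, not part of pyodbc; fast_executemany itself is an attribute set on the pyodbc cursor.

```python
def insert_rows(conn, stmt, rows, chunk_size=10_000):
    """Insert rows through a pyodbc connection using fast_executemany.

    conn is assumed to be an already open pyodbc connection (hypothetical setup).
    """
    cursor = conn.cursor()
    # pyodbc-specific: bind the rows as parameter arrays instead of
    # sending one INSERT round trip per row.
    cursor.fast_executemany = True
    for start in range(0, len(rows), chunk_size):
        # Chunking keeps the memory used by each parameter array bounded.
        cursor.executemany(stmt, rows[start:start + chunk_size])
    conn.commit()
    return len(rows)
```

With a real connection this might be called as insert_rows(Conn2, "INSERT INTO TABLE(column) VALUES (?)", Conn1_array), where Conn2 was opened with pyodbc.connect and your own connection string. Converting each fetched row to a plain tuple first may be needed, depending on the driver and pyodbc version.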

Moving from Access backend to SQL Server as be. Efficiency help needed

I am working on developing an application for my company. From the beginning we were planning on having a split DB with an access front end, and storing the back end data on our shared server. However, after doing some research we realized that storing the data in a back end access DB on a shared drive isn’t the best idea for many reasons (vpn is so slow to shared drive from remote offices, access might not be the best with millions of records, etc.). Anyways, we decided to still use the access front end, but host the data on our SQL server.
I have a couple questions about storing data on our SQL server. Right now when I insert a record I do it with something like this:
Private Sub addButton_Click()
    Dim rsToRun As DAO.Recordset
    Set rsToRun = CurrentDb.OpenRecordset("SELECT * FROM ToRun")
    rsToRun.AddNew
    rsToRun("MemNum").Value = memNumTextEntry.Value
    rsToRun.Update
    memNumTextEntry.Value = Null
End Sub
It seems like it is inefficient to have to use a sql statement like SELECT * FROM ToRun and then make a recordset, add to the recordset, and update it. If there are millions of records in ToRun will this take forever to run? Would it be more efficient just to use an insert statement? If so, how do you do it? Our program is still young in development so we can easily make pretty substantial changes. Nobody on my team is an access or SQL expert so any help is really appreciated.
If you're working with SQL Server, use ADO. It handles server access much better than DAO.
If you are inserting data into a SQL Server table, an INSERT statement can have (in SQL Server 2008 and later) up to 1000 comma-separated VALUES groups. You therefore need only one INSERT for each 1000 records. You can append further INSERT statements after the first and do your entire data transfer through one string:
INSERT INTO ToRun (MemNum) VALUES ('abc'),('def'),...,('xyz');
INSERT INTO ToRun (MemNum) VALUES ('abcd'),('efgh'),...,('wxyz');
...
You can assemble this in a string, then use an ADO Connection.Execute to do the work. It is frequently faster than multiple DAO or ADO .AddNew/.Update pairs. You just need to remember to requery your recordset afterwards if you need it to be populated with your newly-inserted data.
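As a rough sketch of the chunking described above, the following Python generator splits a list of values into multi-row INSERT statements of at most 1000 groups each. It uses ? placeholders rather than quoted literals, and the demonstration runs against the standard-library sqlite3 module with a smaller batch size purely so that it is self-contained. The function name multi_row_inserts is hypothetical, and the table and column names are assumed to be trusted identifiers, since they are interpolated into the SQL text.

```python
import sqlite3

def multi_row_inserts(table, column, values, batch_size=1000):
    """Yield (sql, params) pairs, each inserting at most batch_size values."""
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        # One "(?)" group per row; the values travel separately as parameters.
        groups = ",".join(["(?)"] * len(chunk))
        yield f"INSERT INTO {table} ({column}) VALUES {groups};", chunk

# Demonstration against an in-memory database (stand-in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ToRun (MemNum TEXT)")
values = [f"m{i}" for i in range(250)]
statements = list(multi_row_inserts("ToRun", "MemNum", values, batch_size=100))
for sql, params in statements:
    conn.execute(sql, params)
conn.commit()
row_count = conn.execute("SELECT COUNT(*) FROM ToRun").fetchone()[0]
print(len(statements), row_count)
```

Keeping the values out of the SQL string sidesteps the quoting problems that come with building literals such as ('abc') by hand.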
There are actually two questions in your post:
Will OpenRecordset("SELECT * FROM ToRun") immediately load all records?
No. By default, DAO's OpenRecordset opens a server-side cursor, so the data is not retrieved until you actually start to move around the recordset. Still, it's bad practice to select lots of rows if you don't need to. This leads to the next question:
How should I add records in an attached SQL Server database?
There are a few ways to do that (in order of preference):
Use an INSERT statement. That's the most elegant and direct solution: You want to insert something, so you execute INSERT, not SELECT and AddNew. As Monty Wild explained in his answer, ADO is preferred. In particular, ADO allows you to use parameterized commands, which means that you don't have to put your strings in quotes, escape them, and correctly format your dates, which is not so easy to do right.
(DAO also allows you to execute INSERT statements (via CurrentDb.Execute), but it does not allow you to use parameters.)
That said, ADO also supports the AddNew syntax familiar to you. This is a bit less elegant but requires fewer changes to your existing code.
And, finally, your old DAO code will still work. As always: If you think you have a performance problem, measure if you really have one. Clean code is great, but refactoring has a cost and it makes sense to optimize those places first where it really matters. Test, measure... then optimize.
It seems like it is inefficient to have to use a sql statement like SELECT * FROM ToRun and then make a recordset, add to the recordset, and update it. If there are millions of records in ToRun will this take forever to run?
Yes, you do need to load something from the table in order to get your Recordset, but you don't have to load any actual data.
Just add a WHERE clause to the query that doesn't return anything, like this:
Set rsToRun = CurrentDb.OpenRecordset("SELECT * FROM ToRun WHERE 1=0")
Both INSERT statements and Recordsets have their pros and cons.
With INSERTs, you can insert many records with relatively little code, as shown in Monty Wild's answer.
On the other hand, INSERTs in the basic form shown there are prone to SQL Injection and you need to take care of "illegal" characters like ' inside your values, ideally by using parameters.
With a Recordset, you obviously need to type more code to insert a record, as shown in your question.
But in exchange, a Recordset does some of the work for you:
For example, in the line rsToRun("MemNum").Value = memNumTextEntry.Value you don't have to care about:
characters like ' in the input, which would break an INSERT query unless you use parameters
SQL Injection
getting the date format right when inserting date/time values
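The quoting problem mentioned above is easy to reproduce. The sketch below uses Python's standard-library sqlite3 module as a stand-in for the real backend (an assumption made for the example, since the question uses Access and SQL Server): a concatenated INSERT breaks on a value containing ', while the parameterized form stores it unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ToRun (MemNum TEXT)")

value = "O'Brien"  # contains the character that breaks naive SQL strings

# Naive approach: the value is pasted into the SQL text, so the embedded
# quote ends the string literal early and the statement no longer parses.
naive_sql = "INSERT INTO ToRun (MemNum) VALUES ('" + value + "')"
try:
    conn.execute(naive_sql)
    naive_ok = True
except sqlite3.Error as exc:
    naive_ok = False
    print("concatenated INSERT failed:", exc)

# Parameterized approach: the driver passes the value separately from the
# SQL text, so no quoting or escaping is needed.
conn.execute("INSERT INTO ToRun (MemNum) VALUES (?)", (value,))
conn.commit()

stored = [row[0] for row in conn.execute("SELECT MemNum FROM ToRun")]
print(stored)
```

The same idea applies to ADO parameterized commands, although the object model there is different from the one shown here.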

is there a maximum number of inserts that can be run in a batch sql script?

I have a series of simple "INSERT INTO" type statements, but after running about 3 or 4 of them the script stops and I get empty sets when I try selecting from the appropriate tables. Aside from my specific code, I wonder whether there is an ideal way of running multiple insert-type queries.
Right now I just have a text file saved as a.sql with normal SQL commands separated by ";"
No, there is not. However, if it stops after 3 or 4 inserts, it's a good bet there's an error in the 3rd or 4th insert. Depending on which SQL engine you use, there are different ways of making it report errors during and after operations.
Additionally, if you have lots of inserts, it's a good idea to wrap them inside a transaction. This basically buffers all the insert commands until it sees the end command for the transaction, and then commits everything to your table. That way, if something goes wrong, your database doesn't get polluted with data that needs to first be deleted again. More importantly, every insert without a transaction counts as a single transaction, which makes them really slow: doing 100 inserts inside a transaction can be as fast as doing two or three normal inserts.
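To illustrate the all-or-nothing behaviour, here is a small sketch using Python's standard-library sqlite3 module. The table items and the helper insert_in_one_transaction are made up for the example, and other engines use their own syntax for starting and ending a transaction.

```python
import sqlite3

def insert_in_one_transaction(conn, rows):
    """Insert all rows or none of them; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if it raises.
        with conn:
            conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
        return True
    except sqlite3.Error as exc:
        print("batch rolled back:", exc)
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

ok_good = insert_in_one_transaction(conn, [(1, "a"), (2, "b")])
# The second row repeats id 3, so the whole batch, including the first
# (3, "c") row, is rolled back.
ok_bad = insert_in_one_transaction(conn, [(3, "c"), (3, "duplicate id")])

ids = [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]
print(ok_good, ok_bad, ids)
```

Printing the exception also addresses the first point: a script that silently stops after a few inserts usually has an error worth surfacing.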
Maximum Capacity Specifications for SQL Server
Max Batch size = 65,536 * Network Packet Size
However I doubt that Max Batch size is your problem.