Fastest Way to Update a Table in SQL Server

I have a VB.NET application that is updating a table in a SQL Server database very frequently. The table has 143 columns and about 10,000 rows. The same procedure is required to update the table for several different modules, so the data updated is different all the time; sometimes it could be just a few cells in a few rows, other times it may be several hundred rows and several columns.
At times it's taking 15 to 30 seconds to update the information. That seems really long given that the table can be totally re-written with a bulk import in a second or 2 (I realise that is beside the point). The database is set to simple recovery, and the table has only one index. I have tried playing around with the update batch size with no noticeable improvement.
I'm using the below code to do the update. Is there anything that I can do to improve the speed?
Dim oMainQueryR As String
If DBSelectionsDS.HasChanges Then
    Try
        oMainQueryR = "SELECT * FROM DBSelections"
        Using connection As New SqlConnection(RacingConStr)
            Using oDataSQL As New SqlDataAdapter(oMainQueryR, connection)
                oDataSQL.UpdateBatchSize = 100
                Using cbT As SqlCommandBuilder = New SqlCommandBuilder(oDataSQL)
                    connection.Open()
                    oDataSQL.Update(DBSelectionsDS, "DBSelectionsDetails")
                    connection.Close()
                End Using
            End Using
        End Using
        DBSelectionsDS.Tables("DBSelectionsDetails").AcceptChanges()
    Catch ex As Exception
        ErrMess = "ERROR - occured " & ex.ToString
        Call WriteError()
        Call ViewError()
    End Try
End If

I would be willing to bet the bottleneck lies in two places:
First, you are selecting everything from your table every time you need to run the update. This will take longer and longer as your table grows. The SqlCommandBuilder only needs the schema to work with, so change your query string to this:
oMainQueryR = "SELECT * FROM DBSelections WHERE 0 = 1"
This returns only the schema and column names for the table, but no rows; your DataSet already contains all the data the CommandBuilder needs to perform the update. If you are curious why this works: 0 never equals 1, so SQL Server finds no rows where 0 = 1 and just returns the schema.
Second, the UpdateBatchSize is limiting the batch size of the update. With 200 rows of changed data in your DataSet and a batch size of 100, it takes 2 trips to the database to finish the update. Setting UpdateBatchSize = 0 removes the limit and lets the adapter use the largest batch size the server can handle (the default is 1, which disables batching, so don't simply remove the line).
Otherwise, your bottleneck could be another transaction locking the DBSelections table. If you run any queries against that table while the update is happening, either use the WITH (NOLOCK) hint on those queries or make sure your update is the only transaction running at the time.
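Putting both changes together, the update routine from the question might look like this (a sketch only; the names are taken from the question's code):
oMainQueryR = "SELECT * FROM DBSelections WHERE 0 = 1"   'schema only, no rows
Using connection As New SqlConnection(RacingConStr)
    Using oDataSQL As New SqlDataAdapter(oMainQueryR, connection)
        oDataSQL.UpdateBatchSize = 0   '0 = use the largest batch size the server can handle
        Using cbT As New SqlCommandBuilder(oDataSQL)
            connection.Open()
            oDataSQL.Update(DBSelectionsDS, "DBSelectionsDetails")
        End Using
    End Using
End Using
DBSelectionsDS.Tables("DBSelectionsDetails").AcceptChanges()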

Related

Getting the number of affected records by an action query in VBA

I'm wondering whether it is possible to capture a value that MS Access uses internally when it inserts a certain number of new rows into a table during a VBA run.
Screenshot of the MS Access notification (in German):
In that case MS Access notifies me, in the middle of a running VBA script, that 2327 new rows are about to be added to the table. I'm not a programmer, but I feel that this number must be stored somewhere (at least temporarily).
Does anyone have an idea how to pick up this number from the window when it appears and then store it in a VBA Integer for further use?
Update:
Basically, I have a main database which has to be updated every day by an import file.
In the first step there is a check for already existing records, which are therefore only updated by an UPDATE query.
UPDATE ReiseMaster
INNER JOIN Update_Import
ON(ReiseMaster.Col1 = Update_Import.Col1)
SET ReiseMaster.Col2 = Update_Import.Col2,
ReiseMaster.Col3 = Update_Import.Col3;
Then, in the second step, records that do not yet exist in the main database are inserted into it with all the information they contain. This is the SQL query responsible for appending the new rows during the VBA procedure:
INSERT INTO ReiseMaster ([Col1],[Col2],[Col3])
SELECT [Col1],[Col2],[Col3] FROM Update_Import
WHERE NOT EXISTS(SELECT 1 FROM ReiseMaster
WHERE Update_Import.[Col1] = ReiseMaster.[Col1]);
I am struggling to identify the number of new records after the procedure, which is in fact already determined by MS Access itself (see the pop-up). So my idea was to simply reuse the number that MS Access has already determined.
All SQL queries are stored in strings and run with the "DoCmd.RunSQL" command.
Using DAO, it's really easy to get the number of records affected after executing a query.
You can just use the Database.RecordsAffected property:
Dim db As DAO.Database
Set db = CurrentDb 'Required, don't use CurrentDb.Execute else this won't work
Dim strSQL as String
strSQL = "UPDATE ReiseMaster INNER JOIN Update_Import" 'and the rest of your query
db.Execute strSQL, dbFailOnError 'Execute the query using Database.Execute
Dim recordsAffected As Long
recordsAffected = db.RecordsAffected 'Get the number of records affected
However, this won't let you see the count before the query is committed. To see it first and display a prompt, you can run the query inside a transaction and roll the transaction back if the user chooses not to commit.
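A minimal sketch of that pattern, assuming the default DAO workspace and the INSERT query from the question (the MsgBox wording is illustrative):
Dim ws As DAO.Workspace
Dim db As DAO.Database
Dim strSQL As String
Set ws = DBEngine.Workspaces(0)   'default workspace; its transaction wraps db.Execute below
Set db = CurrentDb
strSQL = "INSERT INTO ReiseMaster ([Col1],[Col2],[Col3]) SELECT ..."   'the INSERT query from above
ws.BeginTrans
db.Execute strSQL, dbFailOnError
If MsgBox(db.RecordsAffected & " new rows will be added. Continue?", vbYesNo) = vbYes Then
    ws.CommitTrans
Else
    ws.Rollback
End If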

Speed up Python executemany

I'm inserting data from one database to another, so I have 2 connections (Conn1 and Conn2). Below is the code (using pypyodbc).
import pypyodbc
Conn1_Query = "SELECT column FROM Table"
Conn1_Cursor.execute(Conn1_Query)
Conn1_Data = Conn1_Cursor.fetchall()
Conn1_array = []
for row in Conn1_Data:
    Conn1_array.append(row)
The above part runs very quickly.
stmt = "INSERT INTO TABLE(column) values (?)"
Conn2_Cursor.executemany(stmt, Conn1_array)
Conn2.commit()
This part is extremely slow. I've also tried to do a for loop to insert each row at a time using cursor.execute, but that is also very slow. What am I doing wrong and is there anything I can do to speed it up? Thanks for taking a look.
Thought I should also add that the Conn1 data is only ~50k rows. I also have some more setup code at the beginning that I didn't include because it's not pertinent to the question. It takes about 15 minutes to insert. As a comparison, it takes about 25 seconds to write the output to a csv file.
Yes, executemany under pypyodbc sends separate INSERT statements for each row. It acts just the same as making individual execute calls in a loop. Given that pypyodbc is no longer under active development, that is unlikely to change.
However, if you are using a compatible driver like "ODBC Driver xx for SQL Server" and you switch to pyodbc then you can use its fast_executemany option to speed up the inserts significantly. See this answer for more details.
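For reference, a minimal pyodbc sketch; the driver name and connection details are placeholders, and the INSERT statement is the one from the question:
import pyodbc

# Placeholders: adjust driver, server and database names to your environment.
conn2 = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=server_name;DATABASE=db_name;Trusted_Connection=yes;"
)
cursor2 = conn2.cursor()
cursor2.fast_executemany = True  # send the rows as a parameter array instead of one INSERT per row
cursor2.executemany("INSERT INTO TABLE(column) values (?)", Conn1_array)
conn2.commit()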

Insert Records repeatedly faster

I'm monitoring a folder for Jpg files and need to process the incoming files.
I decode the filename to get all the information I want and insert into a table and then move the file to another folder.
The file name already contains all the information I want. E.g.
2011--8-27_13:20:45_MyLocation_User1.jpg
Now I'm using an Insert statement
Private Function InsertToDB(ByVal SourceFile As String, ByVal Date_Time As DateTime, ByVal Loc As String, ByVal User As String) As Boolean
    Dim conn As SqlConnection = New SqlConnection(My.Settings.ConString)
    Dim sSQL As String = "INSERT INTO StageTbl ...."
    Dim cmd As SqlCommand
    cmd = New SqlCommand(sSQL, conn)
    ....Parameters Set ...
    conn.Open()
    cmd.ExecuteNonQuery()
    conn.Close()
    conn = Nothing
    cmd = Nothing
End Function
The function will be called for every single file found.
Is this the most efficient way? It looks like it is very slow. I need to process about 20~50 files/sec. Perhaps a stored procedure?
I need to do this as fast as possible. I guess bulk insert is not applicable here.
Please help.
Bulk insert could be applicable here - do you need them to be in the DB instantly, or could you just build up the records in memory then push them into the database in batches?
Are you multi-threading as well? Otherwise your end-to-end process could fall behind.
Another solution would be to use message queues - pop a message into the queue for every file, then have a process (on a different thread) that is continually reading the queue and adding to the database.
There are several things you can do to optimize the speed of this process:
Don't open and close the connection for every insert. That alone will yield a (very) significant performance improvement (unless you were already using connection pooling).
You may gain performance if you disable autocommit and perform inserts in blocks, committing the transaction after every N rows (100-1000 rows is a good number to start with). A sketch combining these first two points follows the list.
Some DB systems provide syntax to insert multiple rows in a single query. SQL Server 2008 added multi-row VALUES lists; for earlier versions you may be interested in the UNION ALL workaround described here: http://blog.sqlauthority.com/2007/06/08/sql-server-insert-multiple-records-using-one-insert-statement-use-of-union-all/
If there are many users/processes accessing this table, access can be slow depending on your transaction isolation level. In your case (20-50 inserts/sec) this shouldn't make a big difference. I don't recommend modifying this unless you understand well what you are doing: http://en.wikipedia.org/wiki/Isolation_%28database_systems%29 and http://technet.microsoft.com/es-es/library/ms173763.aspx .
I don't think a stored procedure will necessarily provide a big performance gain. You are only parsing/planning the insert 20-50 times per second. Use a stored procedure only if it fits your development model well; if all your other queries are in code, you can avoid it.
Make sure the database really is your bottleneck (i.e. that moving the files is not what takes the time); the OS should be good at that, so concentrate on the points above. If moving files does turn out to be the bottleneck, delaying the moves or doing them in the background (on another thread) can help to a certain extent.
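A minimal sketch of points 1 and 2 combined: one connection kept open, a parameterised command reused, and commits grouped into blocks. The column names after StageTbl and the filesToProcess collection are placeholders, since the original INSERT text is elided in the question; it assumes Imports System.Data and Imports System.Data.SqlClient.
Dim conn As New SqlConnection(My.Settings.ConString)
conn.Open()   'open once, not per file
Dim tran As SqlTransaction = conn.BeginTransaction()
Dim cmd As New SqlCommand("INSERT INTO StageTbl (SourceFile, DateTaken, Loc, UserName) VALUES (@f, @d, @l, @u)", conn, tran)
cmd.Parameters.Add("@f", SqlDbType.NVarChar, 260)
cmd.Parameters.Add("@d", SqlDbType.DateTime)
cmd.Parameters.Add("@l", SqlDbType.NVarChar, 50)
cmd.Parameters.Add("@u", SqlDbType.NVarChar, 50)
Dim pending As Integer = 0
For Each jpgPath As String In filesToProcess   'hypothetical list of files waiting to be processed
    cmd.Parameters("@f").Value = jpgPath
    ' ... decode jpgPath and assign @d, @l and @u here ...
    cmd.ExecuteNonQuery()
    pending += 1
    If pending >= 500 Then   'commit in blocks rather than per row
        tran.Commit()
        tran = conn.BeginTransaction()
        cmd.Transaction = tran
        pending = 0
    End If
Next
tran.Commit()   'commit whatever remains
conn.Close()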

Resolving an ADO timeout issue in VB6

I am running into an issue when populating an ADO recordset in VB6. The query (hitting SQLServer 2008) only takes about 1 second to run when I run it using SSMS. It works fine when the result set is small, but when it gets to be a few hundred records it takes a long time. 800+ records requires about 5 minutes to return (query still only takes 1 second in SSMS), and 6000+ takes well over 20 minutes. I have "fixed" the exception by increasing the command timeout, but I was wondering if there was a way to get it to work faster since it does not seem to be the actual query that requires so much time. Something such as compressing the results so it doesn't take as long. The recordset is opened as follows:
myConnection.CommandTimeout = 2000
myConnection.ConnectionString = "Provider=SQLOLEDB;" & _
"Initial Catalog=DB_NAME;" & _
"Data Source=SERVER_NAME" & _
"Network Library=DBMSSOCN;" & _
"User ID=USER_NAME;" & _
"Password=PASSWORD;" & _
"Use Encryption for Data=True;"
myConnection.Open
myRecordSet.Open STORED_PROC_QUERY_STRING, myConnection, adOpenStatic, adLockReadOnly
Set myRecordSet.ActiveConnection = Nothing
myConnection.Close
The query returns 3 columns, which are used to fill a combo box.
UPDATE:
I ran SQL Profiler, and the runs from the client machine do roughly 100 times more reads and take roughly 100 times longer than the same queries run from SSMS. The text of the query is the same for both SSMS and the client machine according to the profiler, so I don't think it should be using a different execution plan. Could the network library or the provider have any impact on this?
Profiler stats:
From the client application: 7041720 reads, 59458 ms duration, 3900 row count
From SSMS: 30802 reads, 238 ms duration, 3900 row count
It seems like it is using a different execution plan, but the query is exactly the same and I am not sure how to check the execution plan the client might be using if it is different from what is shown in SSMS.
800+ records requiring about 5 minutes = query problem.
Look at your execution plan:
In SSMS, run:
SET SHOWPLAN_ALL ON
then run your query. It will not produce the expected result set, but an execution plan showing how the database retrieves your data. Most bad queries do a table scan (looking at every row in the table, which is slow), so look for the word "SCAN" in the StmtText column. Try to figure out why the index is not being used on that table (the table name will be next to the word "SCAN"). If you join multiple tables and have multiple SCANs, concentrate on the largest tables first.
Without more info this is the best "generic" help you can get.
EDIT
From reading your question, I'm not sure if you mean it is always fast from SSMS no matter the rows, but slow from VB as the rows increase. If that is the case check this: http://www.google.com/search?q=sql+server+fast+from+ssms+slow+from+application&hl=en&num=100&lr=&ft=i&cr=&safe=images
It could be something like parameter sniffing or inconsistent connection settings (ANSI_NULLS, ARITHABORT, etc.).
For the connection settings, try running these from SSMS and from VB6 (add them to the result set) and see if there are any differences:
SELECT SESSIONPROPERTY ('ANSI_NULLS') --Specifies whether the SQL-92 compliant behavior of equals (=) and not equal to (<>) against null values is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_PADDING') --Controls the way the column stores values shorter than the defined size of the column, and the way the column stores values that have trailing blanks in character and binary data.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ANSI_WARNINGS') --Specifies whether the SQL-92 standard behavior of raising error messages or warnings for certain conditions, including divide-by-zero and arithmetic overflow, is applied.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('ARITHABORT') -- Determines whether a query is ended when an overflow or a divide-by-zero error occurs during query execution.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('CONCAT_NULL_YIELDS_NULL') --Controls whether concatenation results are treated as null or empty string values.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('NUMERIC_ROUNDABORT') --Specifies whether error messages and warnings are generated when rounding in an expression causes a loss of precision.
--1 = ON
--0 = OFF
SELECT SESSIONPROPERTY ('QUOTED_IDENTIFIER') --Specifies whether SQL-92 rules about how to use quotation marks to delimit identifiers and literal strings are to be followed.
--1 = ON
--0 = OFF
Make your query like this (so you can see the connection settings in VB6):
SELECT
col1, col2
,SESSIONPROPERTY ('ARITHABORT') AS ARITHABORT
,SESSIONPROPERTY ('ANSI_WARNINGS') AS ANSI_WARNINGS
FROM ...
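If ARITHABORT does turn out to differ (SSMS defaults it to ON, while most client libraries leave it OFF, a classic cause of "fast in SSMS, slow from the application"), one quick thing to test is forcing it on the ADO connection before opening the recordset. A sketch against the code from the question:
myConnection.Open
myConnection.Execute "SET ARITHABORT ON"   'match the SSMS session setting on this connection
myRecordSet.Open STORED_PROC_QUERY_STRING, myConnection, adOpenStatic, adLockReadOnly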

Excel VBA, how to do multiple database entries

I need to populate a database with thousands of entries on a daily basis, but my code at the moment manually inserts each one into the database one at a time.
Do While lngSQLLoop < lngCurrentRecord
    lngSQLLoop = lngSQLLoop + 1
    sql = "INSERT INTO db (key1, key2) VALUES ('value1', 'value2');"
    result = bInsertIntoDatabase(sql, True)
    If result = False Then lngFailed = lngFailed + 1
Loop
This works, but takes about 5 seconds for each 100 entries. Would there be a more efficient way to put this into the database? I've tried
INSERT INTO db (key1, key2) VALUES ('value1-1', 'value2-1'), ('value1-2', 'value2-2'), ('value1-3', 'value2-3');
but this fails with a missing semicolon (;) error, suggesting it doesn't like the values to be listed like that. Is there a way to get VBA to do this?
The multiple (...), (...) VALUES list syntax only works with SQL Server 2008 or later.
But you're in luck: you can batch these by concatenating your SQL statements and reducing the number of calls to bInsertIntoDatabase.
The only down side to this approach is that if one statement in the batch fails, so will every subsequent statement in the batch.
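A sketch of that, reusing the loop from the question; it assumes bInsertIntoDatabase passes the string through to a backend that accepts several semicolon-separated statements in one call (SQL Server does, Access SQL does not), and the batch size of 100 is just a starting point:
Dim sqlBatch As String
Dim batchCount As Long
Do While lngSQLLoop < lngCurrentRecord
    lngSQLLoop = lngSQLLoop + 1
    sqlBatch = sqlBatch & "INSERT INTO db (key1, key2) VALUES ('value1', 'value2');" & vbCrLf
    batchCount = batchCount + 1
    If batchCount = 100 Then   'send 100 statements per call instead of one
        result = bInsertIntoDatabase(sqlBatch, True)
        If result = False Then lngFailed = lngFailed + batchCount
        sqlBatch = ""
        batchCount = 0
    End If
Loop
If batchCount > 0 Then result = bInsertIntoDatabase(sqlBatch, True)   'flush the last partial batch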
So, if failure is a regular issue (say, from key collisions), you would need to use another approach. One solution, sketched after this list, is to:
Insert batches into a temporary table first (without unique indexes, thus avoiding failures initially)
Do a final insert into the main table with a WHERE clause that prevents an error
Get the result count and subtract from the total number of records in the temporary table to get the number of failures.
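A sketch of steps 2 and 3 in SQL, assuming the batched rows have already been loaded into a hypothetical db_staging work table (no unique indexes) and that key1 is the unique key:
INSERT INTO db (key1, key2)
SELECT s.key1, s.key2
FROM db_staging AS s
WHERE NOT EXISTS (SELECT 1 FROM db AS d WHERE d.key1 = s.key1);
-- failures = (row count of db_staging) - (records affected by the INSERT above)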
If the source of your data can be accessed via a database driver (like ODBC) and your database framework supports heterogeneous queries you should be able to do:
INSERT INTO targetDBtable (key1, key2)
SELECT key1, key2 FROM sourceDBtable;
Using .AddNew and .Update with an updateable recordset seems fast: takes about 0.25 seconds to add 10000 records with no errors, or 1.25 seconds to add 10000 records with 10000 errors, on my system.
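For example, a sketch with DAO from Excel; it assumes a reference to the Microsoft Office Access database engine object library, an Access back end at an illustrative path, and the table/field names from the question:
Dim dbs As DAO.Database
Dim rs As DAO.Recordset
Set dbs = DBEngine.OpenDatabase("C:\path\to\database.accdb")   'path is illustrative
Set rs = dbs.OpenRecordset("db", dbOpenTable)
Do While lngSQLLoop < lngCurrentRecord
    lngSQLLoop = lngSQLLoop + 1
    rs.AddNew
    rs!key1 = "value1"
    rs!key2 = "value2"
    rs.Update
Loop
rs.Close
dbs.Close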
Save the data to a CSV file first and then use Access's TransferText method (of the DoCmd object) to load it into the Access table in one go. Remember to delete the CSV file afterwards.
Even if you're running the code from Excel, you can still execute the TransferText method in Access via Automation.
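A sketch of the Automation route from Excel; the database and CSV paths are illustrative, and late binding is used, so the acImportDelim constant is declared locally:
Const acImportDelim As Long = 0   'Access constant, declared here because of late binding
Dim accApp As Object
Set accApp = CreateObject("Access.Application")
accApp.OpenCurrentDatabase "C:\path\to\database.accdb"
accApp.DoCmd.TransferText acImportDelim, , "db", "C:\path\to\upload.csv", True   'True = CSV has a header row
accApp.Quit
Set accApp = Nothing
Kill "C:\path\to\upload.csv"   'delete the CSV afterwards, as suggested above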