I would appreciate some ideas and discussion on the feasibility of bulk updates on a large table in SQL Server 2008+.
Currently I have a table with 10,000 rows and 160 columns. This table is updated very frequently, with 1 to 100+ column changes per row depending on the process. The 'standard newbie' approach of updating the table through a DataAdapter is very slow and unsuitable.
The quest is to find a faster way. I have tried fine-tuning DataAdapter.Update with batch sizes; regardless, the heavier updates still take 10-15 seconds. Meanwhile, SqlBulkCopy imports the whole table in (ballpark) 1-3 seconds. When the update procedure runs 30-50 times in a process, those 10-15 seconds add up!
Being internet self-taught, I have gaps in my experience; however, there are two possibilities I can think of that may accomplish the update faster:
Dump the table content from the database and repopulate the table using SqlBulkCopy.
Using a stored procedure with a table passed to it with a merge SQL statement.
The main issue is data safety; although this is a local, single-user application, there needs to be a way to handle errors and roll back. From my understanding the dump-and-replace would be simpler, but perhaps more prone to data loss? The stored procedure would be far more extensive to set up, as the UPDATE statement would have to list every updated column individually and be maintained as things change. Unless there is an 'UPDATE *' statement :).
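For the stored-procedure option, here is a rough sketch of the shape it could take (concept only; all names below - SelectionsType, UpdateSelections, Col1, Col2 - are placeholders, and yes, every updatable column would have to be listed by hand):

-- Hypothetical table type matching the columns being passed in
CREATE TYPE dbo.SelectionsType AS TABLE
(
    ID   INT PRIMARY KEY,
    Col1 INT,
    Col2 NVARCHAR(50)
    -- ... remaining columns
);
GO

CREATE PROCEDURE dbo.UpdateSelections
    @Rows dbo.SelectionsType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    MERGE dbo.DBSelections AS target
    USING @Rows AS source
        ON target.ID = source.ID
    WHEN MATCHED THEN
        UPDATE SET target.Col1 = source.Col1,
                   target.Col2 = source.Col2
                   -- ... every column to update must be listed here
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ID, Col1, Col2)
        VALUES (source.ID, source.Col1, source.Col2);
END

The table-valued parameter can be filled from the in-memory DataTable on the .NET side, so the whole change set goes over in one call instead of one round trip per row.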
To keep this short I want to stay at the concept level only, but I will appreciate any different ideas, links, or advice.
EDIT further info:
The table has only one index, the ID column. It's a simple process of storing incoming (and changing) data in a simple DataTable, and an update can affect anywhere between 1 and 1000 rows. The program stores the information to the database very often, and it can touch some or nearly all of the columns. Building a stored procedure for each update would be impossible, as I don't know in advance which data will be updated; you could say that all of the columns may be updated (except the ID column and a few 'hard' data columns), depending on what the update input is. So there is no fine-tuning the update to specific columns unless I list nearly all of them each time, in which case one stored procedure would do it.
I think the issue is the number of 'calls' to the database made by the current DataAdapter method.
EDIT:
What about a staging table that I bulk copy the data into and then have a stored procedure do the update? Wouldn't that cut down the SQL traffic? I think that is the problem with the DataAdapter update.
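For what it's worth, a rough T-SQL sketch of that staging idea (assuming the staging table DBSelectionsSTAGE mirrors the live table and shares the ID key; the procedure name and Col1/Col2 are placeholders). SqlBulkCopy fills the stage, then one set-based statement applies the changes:

CREATE PROCEDURE dbo.ApplyStagedSelections
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based update joining the stage to the live table
    UPDATE t
    SET    t.Col1 = s.Col1,
           t.Col2 = s.Col2
           -- ... list each updatable column
    FROM   dbo.DBSelections AS t
    INNER JOIN dbo.DBSelectionsSTAGE AS s
            ON s.ID = t.ID;

    -- Clear the stage for the next batch
    TRUNCATE TABLE dbo.DBSelectionsSTAGE;
END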
Edit: Posted an attempt at concept 1 in an answer to this thread.
Thank you
Dropping the table and reloading the entire thing with a bulk copy is not the correct way.
I suggest creating a stored procedure for each process that updates the table. The procedure should take as input only the columns that need to be updated for that specific process and then run a standard SQL update command to update those columns on the specified row. If possible, try to have indexes on the column(s) that you use to find the record(s) that need to be updated.
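As a hedged illustration only (the procedure and column names here are invented, the table name comes from the question's code), a per-process procedure would look something like this, touching just the columns that process changes:

CREATE PROCEDURE dbo.UpdateProcessA
    @ID   INT,
    @Col1 INT,
    @Col2 NVARCHAR(50)
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE dbo.DBSelections
    SET    Col1 = @Col1,
           Col2 = @Col2
    WHERE  ID = @ID;   -- the index on ID keeps the row lookup cheap
END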
Alternately, depending on which version of the .Net framework you're using, you might try using Entity Framework if you don't want to maintain a whole list of stored procedures.
I have coded the following mockup: dump all rows from the table, bulk copy the in-memory table into a SQL staging table, and then move the data back into the original table, thereby updating the data in that table.
Time taken: 1.1 to 1.3 seconds --
certainly a very attractive time compared to the 10-15 s the update takes. I have placed the truncate code for the staging table at the top so that there is always one copy of the information in the database, although the original table will not have the updated info until the process completes.
What are the pitfalls associated with this approach? What can I do about them? (One mitigation for the data-safety gap is sketched after the code below.) I must state that the table is unlikely to ever grow beyond 10,000 rows, so the process will work.
Try
    ' Step 0: clear the staging table
    ESTP = "Start Bulk DBselection Update"
    Dim oMainQueryT = "Truncate Table DBSelectionsSTAGE"
    Using con As New SqlClient.SqlConnection(RacingConStr)
        Using cmd As New SqlClient.SqlCommand(oMainQueryT, con)
            con.Open()
            cmd.ExecuteNonQuery()
            con.Close()
        End Using
    End Using

    ' Step 1: bulk copy the in-memory table into the staging table
    ESTP = "Step 1 Bulk DBselection Update"
    Using bulkCopy As SqlBulkCopy = New SqlBulkCopy(RacingConStr)
        bulkCopy.DestinationTableName = "DBSelectionsSTAGE"
        bulkCopy.WriteToServer(DBSelectionsDS.Tables("DBSelectionsDetails"))
        bulkCopy.Close()
    End Using

    ' Step 2: empty the live table
    ESTP = "Step 2 Bulk DBselection Update"
    oMainQueryT = "Truncate Table DBSelections"
    Using con As New SqlClient.SqlConnection(RacingConStr)
        Using cmd As New SqlClient.SqlCommand(oMainQueryT, con)
            con.Open()
            cmd.ExecuteNonQuery()
            con.Close()
        End Using
    End Using

    ' Step 3: copy the staged rows back into the live table
    ESTP = "Step 3 Bulk DBselection Update"
    oMainQueryT = "Insert INTO DBSelections Select * FROM DBSelectionsSTAGE"
    Using con As New SqlClient.SqlConnection(RacingConStr)
        Using cmd As New SqlClient.SqlCommand(oMainQueryT, con)
            con.Open()
            cmd.ExecuteNonQuery()
            con.Close()
        End Using
    End Using

    Data_Base.TextBox25.Text = "Deleting data - DONE "
    Data_Base.TextBox25.Refresh()

Catch ex As Exception
    ErrMess = "ERROR - occurred at " & ESTP & " " & ex.ToString
    Call WriteError()
    Call ViewError()
End Try
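On the data-safety pitfall: between step 2 and step 3 the live table is empty, so a failure at that point would leave it empty. One possible mitigation (a sketch only, not a tested drop-in) is to send steps 2 and 3 to the server as a single transaction; TRUNCATE is transactional in SQL Server, so a rollback restores the original rows:

BEGIN TRY
    BEGIN TRANSACTION;
        TRUNCATE TABLE DBSelections;
        INSERT INTO DBSelections
        SELECT * FROM DBSelectionsSTAGE;
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;   -- original rows come back because TRUNCATE can be rolled back
    RAISERROR('Reload of DBSelections failed', 16, 1);
END CATCH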
UPDATE: see end of OP.
This one has us scratching our heads.
We have written a method in VB.NET that fills a DataTable.
It does the following (simplified; code to instantiate or set items and error handling has been removed):
Dim procedureName As String
Dim mnCommandTimeOut As Integer
Dim parameters As System.Data.OleDb.OleDbParameter()
Dim oConnection As System.Data.OleDb.OleDbConnection
'... all the above will be set in intervening code. Then
Dim dataTable As System.Data.DataTable = Nothing
Dim oCommand As System.Data.OleDb.OleDbCommand = Nothing
oCommand = New System.Data.OleDb.OleDbCommand(procedureName, oConnection)
oCommand.CommandType = CommandType.StoredProcedure
oCommand.CommandTimeout = mnCommandTimeout
If parameters IsNot Nothing Then
oCommand.Parameters.AddRange(parameters)
End If
Dim oAdapter As System.Data.OleDb.OleDbDataAdapter = Nothing
oAdapter = New System.Data.OleDb.OleDbDataAdapter()
oAdapter.SelectCommand = oCommand
oAdapter.SelectCommand.CommandTimeout = 120
oAdapter.MissingSchemaAction = MissingSchemaAction.AddWithKey
oAdapter.MissingMappingAction = MissingMappingAction.Passthrough
dataTable = New System.Data.DataTable()
oAdapter.Fill(dataTable)
Please note that I have also left out the code to clean up after ourselves, disposing what we no longer need and so on. Our real code is cleaner than this!
Here is the problem
I have an SP that is called using the above method. Again I'm not going to go into any complexity but the code in the SP basically consists of something like
SELECT Column1,
Column2
FROM <somequery>
Now I needed to make a modification to this SP, and added a bit of complexity to it
DECLARE @Table TABLE
    (Column1 <some type here> PRIMARY KEY NOT NULL,
     Column2 <some type here> NOT NULL)
Column types match original types
INSERT INTO @Table
    (Column1, Column2)
SELECT <modified query>
followed by
SELECT [TAB].[Column1],
       [TAB].[Column2]
FROM @Table [TAB]
To summarise I have not yet made any significant changes to the SP, other than using the Table Variable. I don't do a direct SELECT, but instead I INSERT INTO the Table Variable and then I SELECT from that. I have not yet introduced any of the additional complexity I need; when I run the old SP and the new SP through SQL Server Management Studio I still get identical output there
But not through the above code in VB.NET
Using the old SP I get a System.Data.DataTable containing all the rows returned by the SP
Using the new SP I get a System.Data.DataTable containing 0 columns and 0 rows.
No errors are raised. The code runs perfectly happily. It just returns an empty table.
It gets worse. We have another method that fills a DataSet. The only difference between it and the original procedure is that we define
Dim dataSet As System.Data.DataSet = Nothing
dataSet = New System.Data.DataSet()
and
oAdapter.Fill(dataSet)
And here's the insane bit. The utterly, incomprehensible insane bit.
When I run my modified SP through the dataset method, it returns a dataset. The dataset contains one datatable and guess what, the datatable contains all the data my SP returns.
I'm baffled. What on earth is going on? Anyone have a clue? I suspect it has something to do with me using a Table Variable, but what, how?
I know what you're going to say: Why don't we just use the DataSet method? Of course. But we have commitments to Backward Compatibility. There may be old versions of the code, calling my SP, using the old method. I designed my SP so that it still returns all the same data that old versions of the code expect. But I can't change old versions of the code. They still use the method that uses the DataTable. So I can't give them an SP that won't work for them.
Of course there is another solution. Leave the old SP unchanged. Write a new version of the SP, originalnamev2 or something like that, that's going to be used by the new software. But I'd rather avoid that. Plus, of course, this gives me the creeps. I need to understand what is not working anymore so I can assess whether there is anything else in our code base that I need to draw attention to.
UPDATE start
Ok - here is what I tried
Use a Table Variable #Table, insert rows in there, SELECT rows. Result: empty DataTable
Use a Fixed Table [dbo].[TestTable]. Obviously NOT a solution for production but I'm trying things out now. SP now does DELETE [dbo].[TestTable], INSERT INTO [dbo].[TestTable] and finally SELECT rows. Result: empty DataTable
Finally I removed all the inserts and deletes from the SP, and only SELECT rows. Result: DataTable contains rows
Possible conclusion: it's the presence of DELETE and/or INSERT statements that causes this behaviour.
Now what do I need to do to make it work? Why does it not work when I use the Adapter to fill a DataTable, but it works if I use the same adapter to fill a DataSet?
This behaviour was caused by my SP lacking a SET NOCOUNT ON statement at the top. Once I added this, it worked again.
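For reference, this is all it takes (the column types and source table here are placeholders). The usual explanation is that without SET NOCOUNT ON, the INSERT's 'rows affected' message comes back as an extra result ahead of the real one; Fill(DataTable) only consumes the first result, whereas Fill(DataSet) creates a table per result set, which would account for the difference described above.

CREATE PROCEDURE dbo.MyProc
AS
BEGIN
    SET NOCOUNT ON;   -- suppress 'N rows affected' messages from the INSERT

    DECLARE @Table TABLE
        (Column1 INT PRIMARY KEY NOT NULL,
         Column2 INT NOT NULL);

    INSERT INTO @Table (Column1, Column2)
    SELECT Column1, Column2
    FROM   dbo.SomeSource;       -- placeholder for the modified query

    SELECT [TAB].[Column1], [TAB].[Column2]
    FROM   @Table AS [TAB];
END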
Currently my code structure (in VB.NET) is as follows -
Using reader As IfxDataReader = command.ExecuteReader()
    If reader.HasRows Then
        Do While reader.Read()
            Using transaction As IfxTransaction = conn.BeginTransaction(System.Data.IsolationLevel.ReadCommitted)
                'multiple update statements
                transaction.Commit()
            End Using
        Loop
    End If
End Using
The reader reads multiple records, and for every record there are multiple update statements to run. I figured it would be better to begin a transaction for each record, commit after it is done, then move on to the next record and create a new transaction for that, "rinse and repeat".
Everything works fine and is committed to the database, but when the reader checks for more rows after the last record, this peculiar error shows up -
ERROR [HY010][Informix .NET provider] Function sequence error.
After doing some research, the IBM website says that I would have to update to CSDK 3.5 or higher (http://www-01.ibm.com/support/docview.wss?uid=swg1IC58696). However, to me this seems a bit unnecessary since the code is working fine; it's just throwing that error at the end.
Would it be better to have the transaction OUTSIDE of the reader, go through all the records in the table and THEN commit all at once? Or is it most efficient/optimal the way it is now (in other words, going through each record, with all the necessary update statements for that record, and committing one at a time)? Secondly, would the former choice resolve the function sequence error?
If you plan for your application to target a 64-bit architecture or .NET Framework 4.x, then you may consider using CSDK 4.10 xC2 or above.
Within the code there was a data reader, and inside the data reader loop were some update statements. I changed the way the code was structured by separating these functions: first it reads all the data and stores it into objects; then, after the reader is done and closed, it runs the update statements while iterating through each object. That solved the function sequence error that was coming up.
I have an issue I'm running into with EF and SQL. I have a crazy stored proc that doesn't translate well to C# code (EF/LINQ). Basically, I call the stored proc with a SqlConnection and SqlCommand (System.Data.SqlClient) [see below], then pull the data from the table using EF. This happens over and over until a main table is depleted. (I have a main table with several hundred thousand records; the stored proc pulls a small portion of that and puts it in a table to be processed. Once processed, those records are removed from the main table, and it starts all over again until the main table has been completely processed.)
The issue is that the table never gets updated in C#, but it IS getting updated on the backend.
So here's the SQL Call:
SqlConnection sqlConn;
SqlCommand sqlCommand;

using (sqlConn = new SqlConnection(ConfigurationManager.ConnectionStrings["AppMRIConnection"].ConnectionString))
{
    using (sqlCommand = new SqlCommand(String.Format("EXEC sp_PullFinalDataSetPart '{0}', '{1}'", sLocation, sOutputFileType), sqlConn))
    {
        sqlConn.Open();
        sqlCommand.ExecuteNonQuery();
        sqlConn.Close();
    }
}
That truncates the FinalDataSetPart table and re-loads it with X new records.
This is the call in C#
List<FinalDataSetPart> lstFinalPart = db.FinalDataSetPart.ToList();
This call ALWAYS returns the first FinalDataSetPart load, regardless of what is actually in the table. The call is correctly inside the loop (I can break into the code and see it being executed on every loop iteration).
Has anyone seen anything like this before?!
Any thoughts/help/tips would be GREATLY appreciated.
Thanks!
Do the IDs change in the temporary table when you pull in new data? EF won't detect changes to the data if the primary IDs don't change.
Do you drop and recreate the context every time you grab new data?
I am new to Oracle. I need to process a large amount of data in a stored proc. I am considering using temporary tables. I am using connection pooling and the application is multi-threaded.
Is there a way to create temporary tables such that a different table instance is created for every call to the stored procedure, so that data from multiple stored procedure calls does not get mixed up?
You say you are new to Oracle. I'm guessing you are used to SQL Server, where it is quite common to use temporary tables. Oracle works differently so it is less common, because it is less necessary.
Bear in mind that using a temporary table imposes the following overheads:
read data to populate the temporary table
write the temporary table data to file
read data back from the temporary table as your process starts
Most of that activity is useless in terms of helping you get stuff done. A better idea is to see if you can do everything in a single action, preferably pure SQL.
Incidentally, your mention of connection pooling raises another issue. A process munging large amounts of data is not a good candidate for running in OLTP mode. You really should consider initiating a background (i.e. asynchronous) process, probably a database job, to run your stored procedure. This is especially true if you want to run this job on a regular basis, because we can use DBMS_SCHEDULER to automate the management of such things.
If you're using transaction-level (rather than session-level) temporary tables, then this may already do what you want... so long as each call only contains a single transaction. (You don't quite provide enough detail to make it clear whether this is the case or not.)
So, to be clear, so long as each call only contains a single transaction, then it won't matter that you're using a connection pool since the data will be cleared out of the temporary table after each COMMIT or ROLLBACK anyway.
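For illustration, the two flavours in Oracle (the table and column names are placeholders); a transaction-scoped GTT empties itself at every COMMIT or ROLLBACK, which is what makes it safe to use from a connection pool:

-- Cleared automatically at the end of each transaction
CREATE GLOBAL TEMPORARY TABLE my_gtt_txn
(
    id    NUMBER,
    value VARCHAR2(100)
)
ON COMMIT DELETE ROWS;

-- Kept until the session ends (risky with connection pooling)
CREATE GLOBAL TEMPORARY TABLE my_gtt_session
(
    id    NUMBER,
    value VARCHAR2(100)
)
ON COMMIT PRESERVE ROWS;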
(Another option would be to create a uniquely named temporary table in each call using EXECUTE IMMEDIATE. Not sure how performant this would be though.)
In Oracle, it's almost never necessary to create objects at runtime.
Global Temporary Tables are quite possibly the best solution for your problem, however since you haven't said exactly why you need a temp table, I'd suggest you first check whether a temp table is necessary; half the time you can do with one SQL what you might have thought would require multiple queries.
That said, I have used global temp tables in the past quite successfully in applications that needed to maintain a separate "space" in the table for multiple contexts within the same session; this is done by adding an additional ID column (e.g. "CALL_ID") that is initially set to 1, and subsequent calls to the procedure would increment this ID. The ID would necessarily be remembered using a global variable somewhere, e.g. a package global variable declared in the package body. E.G.:
PACKAGE BODY gtt_ex IS
    last_call_id integer;

    PROCEDURE myproc IS
        l_call_id integer;
    BEGIN
        last_call_id := NVL(last_call_id, 0) + 1;
        l_call_id := last_call_id;
        INSERT INTO my_gtt VALUES (l_call_id, ...);
        ...
        SELECT ... FROM my_gtt WHERE call_id = l_call_id;
    END;
END;
You'll find GTTs perform very well even with high concurrency, certainly better than using ordinary tables. Best practice is to design your application so that it never needs to delete the rows from the temp table - since the GTT is automatically cleared when the session ends.
I used a global temporary table recently and it behaved in a very unwanted manner.
I was using the temp table to format some complex data in a procedure call and, once the data was formatted, pass it to the front end (ASP.NET).
On the first call to the procedure I would get the proper data; any subsequent call would give me the data from the last procedure call in addition to the current call.
I investigated on the net and found the option to delete rows on commit.
I thought that would fix the problem. Guess what? When I used the ON COMMIT DELETE ROWS option, I always got 0 rows back from the database, so I had to go back to the original approach of ON COMMIT PRESERVE ROWS, which preserves the rows even after committing the transaction. That option clears rows from the temp table only after the session is terminated.
Then I found this post and came to know about the column to track the call_id of a session.
I implemented that solution and it still didn't fix the problem.
Then I wrote the following statement in my procedure before starting any processing:
Delete From Temp_table;
The above statement did the trick. My front end was using connection pooling; after each procedure call it committed the transaction but kept the connection in the pool, and the subsequent request used the same connection, so the database session was not terminated after every call.
Deleting the rows from the temp table before starting any processing made it work.
It drove me nuts till I found this solution.
I've been asked to implement some code that will update a row in a MS SQL Server database and then use a stored proc to insert the update in a history table. We can't add a stored proc to do this since we don't control the database. I know in stored procs you can do the update and then call execute on another stored proc. Can I set it up to do this in code using one SQL command?
Either run them both in the same batch (separate the commands with a semicolon), or use a transaction so you can roll back the first statement if the second fails.
You don't really need a stored proc for this. The question really boils down to whether or not you have control over all the inserts. If in fact you have access to all the inserts, you can simply wrap an insert into the data table and an insert into the history table in a single transaction. This ensures that both must complete for 'success' to occur. However, when accessing the two tables in sequence within a transaction, you need to make sure you always lock them in the same order (not historytable then datatable in one place and the reverse elsewhere), or else you could have a deadlock situation.
However, if you do not have control over the inserts, you can add a trigger to certain db systems that will give you access to the data that are modified, inserted or deleted. It may or may not give you all the data you need, like who did the insert, update or delete, but it will tell you what changed.
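A rough SQL Server sketch of the trigger idea (the table and column names here are invented; adjust to the real schema):

CREATE TRIGGER trg_DataTable_History
ON dbo.DataTable
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- 'deleted' holds the pre-update values of every affected row
    INSERT INTO dbo.HistoryTable (ID, Col1, Col2, ChangedAt)
    SELECT d.ID, d.Col1, d.Col2, GETDATE()
    FROM   deleted AS d;
END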
You can also create SQL triggers.
Insufficient information -- what SQL server? Why have a history table?
Triggers will do this sort of thing. MySQL's binlog might be more useful for you.
You say you don't control the database. Do you control the code that accesses it? Add logging there and keep it out of the SQL server entirely.
Thanks all for your replies; below is a synopsis of what I ended up doing. Now to test whether the transaction actually rolls back in the event of a failure.
sSQL = "BEGIN TRANSACTION;" & _
" Update table set col1 = #col1, col2 = #col2" & _
" where col3 = #col3 and " & _
" EXECUTE addcontacthistoryentry #parm1, #parm2, #parm3, #parm4, #parm5, #parm6; " & _
"COMMIT TRANSACTION;"
Depending on your library, you can usually just put both queries in one Command String, separated by a semi-colon.