Insert concurrently in database

I have a SQL Server database in which I use stored procedures to insert new data from the application. I wonder what I have to do to ensure that calling those stored procedures from multiple threads concurrently is correct and safe.
One possible concurrency issue concerns correctness. I call multiple stored procedures from the data access layer interface, and each stored procedure (and the other stored procedures it calls) performs its updates across multiple db tables in ways that can break the tables' consistency if done concurrently in any "interleaving way" (e.g. from SP_1 I insert different elements into Table1 or Table2 depending on conditions such as the existence of some element y in Table2).
It seems to me that (given the way the tables are defined), to make the program correct, each stored procedure's actions must run in isolation from any other operation that could be called concurrently through the data access layer interface.
Can one big transaction (covering everything done as part of a DAL interface call) make the program correct from the db's point of view in the presence of concurrent inserts? If that solution is viable, I cannot see how it would improve on, say, a mutual exclusion approach where only a single thread at a time is able to make the necessary inserts into the db tables...

if done concurrently in any "interleaving way"
This is exactly what transactions are used to avoid.
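For the conditional-insert example in the question, one common approach is to make each procedure's check-then-insert atomic by running it in a transaction at a sufficiently strict isolation level. A minimal sketch, assuming hypothetical columns on the question's Table1/Table2:

CREATE PROCEDURE SP_1 @y int, @val int
AS
BEGIN
    SET XACT_ABORT ON;  -- roll everything back on any error
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    BEGIN TRANSACTION;
        -- the existence check and the dependent insert succeed or fail as a
        -- unit; SERIALIZABLE range-locks the checked rows so no concurrent
        -- caller can sneak an insert in between the check and the write
        IF EXISTS (SELECT 1 FROM Table2 WHERE id = @y)
            INSERT INTO Table1 (val) VALUES (@val);
        ELSE
            INSERT INTO Table2 (id, val) VALUES (@y, @val);
    COMMIT TRANSACTION;
END;

Unlike a global mutex in the application, this serializes callers only when they actually touch overlapping rows or ranges, so unrelated inserts can still proceed in parallel.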

Related

Trigger calls Stored Procedure and if we do a select will the return values be the new or old?

Using MS SQL Server, if a Trigger calls a Stored Procedure which internally makes a select, will the return values be the new or old ones?
I know that inside the trigger I can access them via FROM INSERTED i INNER JOIN DELETED, but in this case I want to reuse (I cannot change it) an existing Stored Procedure that internally makes a select on the triggered table and processes some logic with the results. I just want to know whether I can be sure that the existing logic will work (by accessing the NEW values).
I could simply try to simulate it with one update... but maybe there are other cases (for example involving transactions or something else) that I may not be aware of and would never test, which could result in different behavior.
I decided to ask someone who might know better. Thank you.
AFTER triggers (the default) fire after the DML action. When the proc is called within the trigger, the tables will reflect changes made by the statement that fired the trigger, as well as changes made within the trigger before calling the proc.
Note that the changes are uncommitted until the trigger completes (or until a later COMMIT, if an explicit transaction is in play).
Since the procedure is running in the same transaction as the (presumably AFTER) trigger, it will see the uncommitted data.
I hope you see the implications of that: the trigger executes as part of the transaction started by the DML statement that fired it, so the stored procedure is part of the same transaction. A "complicated" stored procedure means that transaction stays open longer, holding locks longer, making responses back to users slower, etc.
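A quick sketch to convince yourself (all object names here are invented, and the result set from the trigger is for demonstration only):

CREATE TABLE dbo.Orders (Id int, Total money);
GO
CREATE PROCEDURE dbo.ProcessOrders AS
    SELECT Id, Total FROM dbo.Orders;  -- runs inside the trigger's transaction
GO
CREATE TRIGGER trg_Orders_Ins ON dbo.Orders AFTER INSERT AS
    EXEC dbo.ProcessOrders;
GO
-- the proc's select already sees (1, 10.00), even though nothing is committed yet
INSERT INTO dbo.Orders VALUES (1, 10.00);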
Also, you said
internally makes a select on the triggered table and processes some logic with them.
if you just mean that the procedure is selecting the data in order to do some complex processing and then write it somewhere else inside the database, OK, that's not great (for the reasons given above), but it will "work".
But just in case you mean you are doing some work on the data in the procedure and then returning that back to the client application: don't do that.
The ability to return results from triggers will be removed in a future version of SQL Server. Triggers that return result sets may cause unexpected behavior in applications that aren't designed to work with them. Avoid returning result sets from triggers in new development work, and plan to modify applications that currently do. To prevent triggers from returning result sets, set the disallow results from triggers option to 1.
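That option is set with sp_configure (it is an advanced option, so 'show advanced options' may need to be enabled first):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'disallow results from triggers', 1;
RECONFIGURE;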

What does "query" mean in SQL Server in "use query governor"

If I am executing a stored procedure containing a number of successive query statements, does "use query governor" apply to each statement executed in the stored procedure, or does it apply to a single executed statement - in this case the entire stored procedure?
I think you are mixing up some concepts.
The first concept is: what is a transaction? That depends on whether you are using explicit or implicit transactions.
By default, SQL Server autocommits: each individual statement runs as its own transaction and is committed when it completes.
If you want all the statements in a stored procedure to either commit or roll back together, you have to use BEGIN TRANSACTION, COMMIT and/or ROLLBACK statements with error checking in the stored procedure.
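A minimal skeleton of that pattern (table and column names are placeholders):

BEGIN TRY
    BEGIN TRANSACTION;
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE Id = 1;
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE Id = 2;
    COMMIT TRANSACTION;          -- both updates commit together...
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;    -- ...or neither does
    THROW;  -- re-raise the original error (SQL Server 2012+; use RAISERROR on older versions)
END CATCH;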
Now let's talk about the second concept. The Resource Governor is used to limit the amount of resources given to a particular user group.
Basically, a login id is mapped by a classifier function to a workload group and resource pool. This allows you to put all your users in a low-priority group, giving them only a small slice of the CPU and memory, while your production jobs can be in a high-priority group with the lion's share of the resources.
This prevents a typical user from writing a report that has a huge CROSS JOIN and causes a performance issue on the production database.
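A sketch of that wiring (pool, group, and login names are invented):

CREATE RESOURCE POOL LowPriorityPool WITH (MAX_CPU_PERCENT = 20);
CREATE WORKLOAD GROUP ReportUsers USING LowPriorityPool;
GO
-- classifier runs at login time and returns the workload group name
CREATE FUNCTION dbo.rg_classifier() RETURNS sysname
WITH SCHEMABINDING
AS
BEGIN
    IF SUSER_SNAME() = N'report_login'
        RETURN N'ReportUsers';
    RETURN N'default';
END;
GO
ALTER RESOURCE GOVERNOR WITH (CLASSIFIER_FUNCTION = dbo.rg_classifier);
ALTER RESOURCE GOVERNOR RECONFIGURE;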
I hope this clears up the confusion. If not, please ask exactly what you are looking for.
It appears the answer to my question is that a stored procedure counts as a query in this context:
We have spent some time examining an issue with a stored procedure comprising a number of EXEC'd DML statements, which timed out when "Use Query Governor" was selected, according to the number of seconds specified. Deselecting "Use Query Governor" resolved the problem.
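For reference, the "Use Query Governor" checkbox corresponds to a connection-level setting that can also be issued directly in T-SQL (the value is an estimated cost expressed in "seconds"; 0 turns the limit off):

SET QUERY_GOVERNOR_COST_LIMIT 300;  -- applies to the current connection only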

Best practices of structuring stored procedures

As a developer mainly writing C#, I have adopted some good practices for my C# code. When I occasionally write stored procedures, I have trouble applying those practices to the stored procedure code.
On several occasions I have inherited nightmare stored procedure code: first three or four layers of stored procedures setting up some temp tables and mostly calling each other, no real work done and just a few lines of code each. Then at last comes a call to "the final" stored procedure, a big monster of 3000-5000 lines of SQL code. That code usually has a lot of code smells, like code duplication, intricate control flow (a.k.a. spaghetti), and a method that does too many things stacked one after another with no clear separation of where one chunk of work starts and where it ends (not even a comment as a divider).
I have also noticed the use of commented-out select statements that select from intermediate temp tables. The selects can be turned back on for debugging purposes, but they need to be removed before any calling code that expects a specific order of the returned result sets runs.
Apparently my fellow teammates also share my lack of good SQL writing practices.
So... (and here comes the real question)... what are good practices for writing modular, maintainable stored procedures?
Both home made practices and references to books/blogs are welcome. Methods as well as tools that help with certain tasks.
Let's summarize some areas where I have not found good practices:
Modularization and encapsulation (is stored procedure communication via temp tables really the way to go?)
In C# I use assemblies, classes, and methods decorated with access modifiers to accomplish this.
Debugging/testing (better than modifying the target of debugging?)
Debug tools?
Debug traces?
Test fixtures?
Emphasizing code/logic/data/control flow using the structure of the code itself
In C# I refactor and break out smaller methods that do just one logical task each.
Code duplication
Mostly I encounter SQL Server as the DBMS, but DBMS-agnostic answers, or answers pointing out features of other DBMSes that help in the above cases, are also welcome.
To give some background: most large stored procedures I have encountered are in reporting scenarios, where the core job is just to create some summary values from a large table. But along the way you need to exclude some of the values that happen to be in some exception table, add some of the values from some not-yet-completed-stuff table, compare with last year (can you imagine the ugly code that handles products changing department between years?), etc.
I write a lot of complex stored procs. Some things I would consider best practices:
Don't use dynamic SQL in a stored proc unless you are writing a search proc with lots of parameters which may or may not be needed (then it is currently one of the best solutions). If you must use dynamic SQL in a proc, always have a debug input parameter, and if the debug parameter is set, print the SQL statement created rather than executing it. This will save hours of debugging time!
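A sketch of that debug pattern for a search proc (all names here are invented):

CREATE PROCEDURE dbo.SearchCustomers
    @name  nvarchar(100) = NULL,
    @city  nvarchar(100) = NULL,
    @debug bit = 0
AS
BEGIN
    DECLARE @sql nvarchar(max) =
        N'SELECT Id, Name FROM dbo.Customers WHERE 1 = 1';
    -- append only the filters the caller actually supplied
    IF @name IS NOT NULL SET @sql += N' AND Name = @name';
    IF @city IS NOT NULL SET @sql += N' AND City = @city';
    IF @debug = 1
        PRINT @sql;  -- inspect the generated statement instead of running it
    ELSE
        EXEC sp_executesql @sql,
             N'@name nvarchar(100), @city nvarchar(100)',
             @name = @name, @city = @city;
END;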
If you are performing more than one action query in a proc (insert/update/delete), use TRY...CATCH blocks and transaction processing. Add a test parameter to the input parameters and, when it is set to 1, always roll back the entire transaction. Before rolling back in test mode, I usually have a section that returns the values in the tables I'm affecting, to ensure that what I think I am doing to the database is in fact what I did do. Or you could have checks as you go, as shown below. That is as simple as putting the following code around your currently commented-out selects (and uncommenting them) once you have the @test parameter.
IF @test = 1
BEGIN
    SELECT * FROM table1 WHERE field1 = @myfirstparameter
END
Now you don't have to go through and comment and uncomment each time you test.
@test (or @debug) should always be set with a default value of 0 and placed last in the list. That way adding it won't break existing calls of the proc.
Consider having logging and/or error-logging tables for procs doing inserts/updates/deletes. If you record the steps and/or errors in table variables as you go, they are still available after a rollback to be inserted into the logging table. Knowing what part of a complex proc failed and what the error was can be invaluable later on.
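A sketch of that logging pattern; it works because table variables are not undone by a rollback (the work statement and the ProcLog table are placeholders):

DECLARE @log TABLE (StepName sysname, ErrorMsg nvarchar(4000) NULL);
BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO @log (StepName) VALUES (N'touch customers');
    UPDATE dbo.Customers SET LastTouched = SYSDATETIME();  -- placeholder work
    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;  -- @log keeps its rows anyway
    INSERT INTO @log (StepName, ErrorMsg) VALUES (N'failed', ERROR_MESSAGE());
END CATCH;
-- safe to persist even after the rollback
INSERT INTO dbo.ProcLog (StepName, ErrorMsg)
    SELECT StepName, ErrorMsg FROM @log;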
Where possible do not nest stored procs. If you need to run multiple records in a loop, replace the stored proc with one that has a table-valued parameter and set up the proc to run in a set-based fashion rather than record by record. This will work whether the table-valued parameter has one record or many.
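A hedged sketch of that set-based replacement (the table type, proc, and Customers table are all hypothetical):

CREATE TYPE dbo.IdList AS TABLE (Id int PRIMARY KEY);
GO
CREATE PROCEDURE dbo.DeactivateCustomers
    @Ids dbo.IdList READONLY   -- table-valued parameters must be READONLY
AS
    -- one set-based statement handles one row or thousands
    UPDATE c
    SET c.IsActive = 0
    FROM dbo.Customers AS c
    JOIN @Ids AS i ON i.Id = c.Id;
GO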
If you have a complex select with a lot of subqueries or derived tables, consider using CTEs instead. Refactor any correlated subqueries or cursors to better performing set-based code. Always think in terms of sets of data not one record.
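For instance, a derived table that would otherwise be buried or repeated can be named once as a CTE (tables here are invented):

WITH recent_sales AS (
    SELECT CustomerId, SUM(Amount) AS Total
    FROM dbo.Sales
    WHERE SaleDate >= '20240101'
    GROUP BY CustomerId
)
SELECT c.Name, r.Total
FROM dbo.Customers AS c
JOIN recent_sales AS r ON r.CustomerId = c.CustomerId
WHERE r.Total > 1000;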
Do not, under any conceivable circumstance, nest views. The performance hit is much worse than any small amount of saved development time. And trust me, nested views do not save maintenance time when the change needs to be to the view the furthest into the chain of views.
All stored procs (and other database code) should be in source control.
Table variables are good for smaller data sets, but temp tables (real ones that start with # or ## not staging tables) can be better for performance in large data sets. If using temp tables drop them when you don't need them anymore. Try to avoid the use of global temp tables.
Learn to write performant SQL. Once you know the techniques, it is usually just as easy to write SQL that will perform well as SQL that will not. If you write complex stored procs, there is no excuse for not knowing which techniques work better than others. Learn how to make sure your query is sargable. Avoid cursors, correlated subqueries, scalar functions and other things which run row-by-agonizing-row.
Communication via temp tables is sometimes a huge code smell. Such procedures often cannot be run by multiple users without interfering with each other (for example, if you re-use a temp table name for different procedures' inputs and outputs and the tables aren't re-created, or if you use the same name with two different table schemas). They can be hard to troubleshoot; like any feature, use them when necessary and when better alternatives don't exist. Using real tables temporarily can also be problematic.
Stored procs which pass data to each other in SQL Server at all (beyond parameters) can be problematic. There are table-valued parameters now, and many things which previously would have been done with procs can now be done with inline table-valued functions (usually the preferred option) or multi-statement table-valued functions.
In SQL Server, avoid heavy use of scalar functions and multi-statement table-valued functions on large rowsets - they do not perform very well, so modular techniques which may seem obvious in C# don't really apply here.
I would recommend you look at Ken Henderson's Guru's Guide to SQL Server Stored Procedures - published in 2002, it still has a wealth of useful information on database application design.
This is such a good question. As a C# dev myself having to dabble in SQL, it seems SQL by its very nature gets in the way of the best practices I'm used to with C#.
Common Table Expressions are great for isolating queries in a stored procedure, but you can only use them once (they exist only for the single statement that defines them)! That leads you to define views, but then you've lost your encapsulation.
A result set from one stored procedure is very difficult to use in another, so you might be tempted to write table-valued functions. That adds to your permissions-maintenance burden and forces you to write things 'twice': once as a function and again as a procedure that calls the function. Otherwise, you have different interfaces to your DAL depending on whether it's a procedure or not.
All of this has caused me, over time, to stick to simple CRUD stored procedures (that do not call each other) in the database, plus a few isolated queries where the relationships are complex - more BI stuff. Everything else is in the BLL.
Physically, the SQL is isolated in separate files by function, or by the table they revolve around, and managed in source control.
Avoid SELECT * and favor specifying columns. That saves you from run-time problems when you change a table and don't touch all the procs. Yes, procs get recompiled, but that WILL miss some, especially if views are involved. Plus, SELECT * almost always returns more columns than you really need, and that's a waste of bandwidth.
The comments above are great advice on the do's and don'ts of SQL code writing. If I understand your question correctly, you are asking whether it is normal for a SQL developer to write hundreds or even thousands of lines of code in a single stored procedure. In C# this is a big no-no: you encapsulate logic into small chunks using methods, assemblies, and classes. SQL developers tend to write the entire logic for a related task in one stored procedure. As HLGEM mentioned above, "If possible, do not nest stored procedures". And do not nest views.
For example, a simple Get and Insert design in C# looks like this:
Call Get Data Method
Call Transform Data Method
Call CheckAlphaNumeric Method
Call Data Enrichment Method
Call Load Transformed Data Method
A SQL developer will design it like this, in a single stored proc:
Get the data and transform it using either a temp table or a table variable, then load it into the final table.
If you were to change the way the SQL is written to match the structure a C# developer would use, you would instead do this:
Execute a Main stored procedure (which calls the sprocs below)
Execute a GetData stored procedure and load into a Stage table
Execute a Transform stored procedure which reads the Stage table and transforms the data
Execute a Load Data stored procedure to load the staged/transformed data into the final table

How to do a massive insert

I have an application in which I have to insert rows into a SQL Server 2008 database in groups of N tuples, and all the tuples of a group have to be inserted for the insert to succeed. My question is: how do I insert these tuples so that, if any of them fails, I can roll back and eliminate all the tuples that were inserted correctly?
Thanks
On SQL Server you might consider doing a bulk insert.
From .NET, you can use SqlBulkCopy.
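From T-SQL itself, a minimal BULK INSERT sketch (the file path, table, and format here are assumptions):

BULK INSERT dbo.Table1
FROM 'C:\data\tuples.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 10000);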
Table-valued parameters (TVPs) are a second route. In your insert statement, use WITH (TABLOCK) on the target table for minimal logging, e.g.:
INSERT Table1 WITH (TABLOCK) (Col1, Col2, ....)
SELECT Col1, Col2, .... FROM @tvp
Wrap it in a stored procedure that exposes @tvp as a parameter, add some transaction handling, and call this procedure from your app.
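One way that wrapper might look; the table type and column list are assumptions, and the transaction gives you the all-or-nothing behavior the question asks for:

CREATE TYPE dbo.Table1Type AS TABLE (Col1 int, Col2 nvarchar(50));
GO
CREATE PROCEDURE dbo.InsertTable1Batch
    @tvp dbo.Table1Type READONLY
AS
BEGIN
    SET XACT_ABORT ON;  -- any failure rolls back the whole batch
    BEGIN TRANSACTION;
    INSERT Table1 WITH (TABLOCK) (Col1, Col2)
    SELECT Col1, Col2 FROM @tvp;
    COMMIT TRANSACTION;
END;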
You might even try passing the data as XML if it has a nested structure, and shredding it to tables on the database side.
You should look into transactions. This is a good intro article that discusses rolling back and such.
If you are inserting the data directly from the program, it seems like what you need are transactions. You can start a transaction directly in a stored procedure or from a data adapter written in whatever language you are using (for instance, in C# you might be using ADO.NET).
Once all the data has been inserted, you can commit the transaction or do a rollback if there was an error.
See Scott Mitchell's "Managing Transactions in SQL Server Stored Procedures" for some details on creating, committing, and rolling back transactions.
For MySQL, look into LOAD DATA INFILE which lets you insert from a disk file.
Also see the general MySQL discussion on Speed of INSERT Statements.
For a more detailed answer please provide some hints as to the software stack you are using, and perhaps some source code.
You have two competing interests: doing one large transaction (which will have poor performance and a high risk of failure), or doing a rapid import (which is best not done all in one transaction).
If you are adding rows to a table, then don't run in a transaction. You should be able to identify which rows are new and delete them should you not like how they look on the first round.
If the transaction is complicated (each row affects dozens of tables, etc) then run them in transactions in small batches.
If you absolutely have to run a huge data import in one transaction, consider doing it when the database is in single-user mode, and consider using the CHECKPOINT statement.

Local Temporary table in Oracle 10 (for the scope of Stored Procedure)

I am new to Oracle. I need to process a large amount of data in a stored proc. I am considering using temporary tables. I am using connection pooling and the application is multi-threaded.
Is there a way to create temporary tables in a way that different table instances are created for every call to the stored procedure, so that data from multiple stored procedure calls does not mix up?
You say you are new to Oracle. I'm guessing you are used to SQL Server, where it is quite common to use temporary tables. Oracle works differently so it is less common, because it is less necessary.
Bear in mind that using a temporary table imposes the following overheads:
read data to populate the temporary table
write the temporary table data to file
read the data back from the temporary table as your process starts
Most of that activity is useless in terms of helping you get stuff done. A better idea is to see if you can do everything in a single action, preferably pure SQL.
Incidentally, your mention of connection pooling raises another issue. A process munging large amounts of data is not a good candidate for running in an OLTP mode. You really should consider initiating a background (i.e. asynchronous) process, probably a database job, to run your stored procedure. This is especially true if you want to run this job on a regular basis, because you can use DBMS_SCHEDULER to automate the management of such things.
If you're using transaction-level (rather than session-level) temporary tables, then this may already do what you want... so long as each call only contains a single transaction? (You don't quite provide enough detail to make it clear whether this is the case or not.)
So, to be clear: as long as each call only contains a single transaction, it won't matter that you're using a connection pool, since the data will be cleared out of the temporary table after each COMMIT or ROLLBACK anyway.
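For reference, a transaction-scoped global temporary table is declared like this (the columns are placeholders; ON COMMIT PRESERVE ROWS would make it session-scoped instead):

CREATE GLOBAL TEMPORARY TABLE my_gtt (
    call_id  NUMBER,
    payload  VARCHAR2(100)
) ON COMMIT DELETE ROWS;  -- rows vanish at each COMMIT or ROLLBACK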
(Another option would be to create a uniquely named temporary table in each call using EXECUTE IMMEDIATE. Not sure how performant this would be though.)
In Oracle, it's almost never necessary to create objects at runtime.
Global Temporary Tables are quite possibly the best solution for your problem, however since you haven't said exactly why you need a temp table, I'd suggest you first check whether a temp table is necessary; half the time you can do with one SQL what you might have thought would require multiple queries.
That said, I have used global temp tables in the past quite successfully in applications that needed to maintain a separate "space" in the table for multiple contexts within the same session. This is done by adding an additional ID column (e.g. "CALL_ID") that is initially set to 1; subsequent calls to the procedure increment this ID. The ID necessarily has to be remembered in a global variable somewhere, e.g. a package global variable declared in the package body, e.g.:
PACKAGE BODY gtt_ex IS
  last_call_id integer;  -- package state: persists across calls in the session

  PROCEDURE myproc IS
    l_call_id integer;
  BEGIN
    -- allocate a fresh id for this call so its rows don't mix with others'
    last_call_id := NVL(last_call_id, 0) + 1;
    l_call_id := last_call_id;
    INSERT INTO my_gtt VALUES (l_call_id, ...);
    ...
    -- read back only this call's rows
    SELECT ... FROM my_gtt WHERE call_id = l_call_id;
  END myproc;
END gtt_ex;
You'll find GTTs perform very well even with high concurrency, certainly better than using ordinary tables. Best practice is to design your application so that it never needs to delete the rows from the temp table - since the GTT is automatically cleared when the session ends.
I used a global temporary table recently and it behaved in a very unwanted manner.
I was using the temp table to format some complex data in a procedure call and, once the data was formatted, pass it to the front end (ASP.NET).
On the first call to the procedure I would get the proper data, but any subsequent call would give me the data from the previous procedure call in addition to the current one.
I investigated on the net and found the ON COMMIT DELETE ROWS option.
I thought that would fix the problem... guess what? When I used the ON COMMIT DELETE ROWS option, I always got 0 rows back from the database. So I had to go back to the original approach of ON COMMIT PRESERVE ROWS, which preserves the rows even after committing the transaction; that option clears rows from the temp table only after the session is terminated.
Then I found this post and learned about the column to track the call_id of a session.
I implemented that solution and it still didn't fix the problem.
So I wrote the following statement in my procedure before starting any processing:
Delete From Temp_table;
The above statement did the trick. My front end was using connection pooling; after each procedure call it committed the transaction but kept the connection in the pool, and subsequent requests used the same connection, so the database session was not terminated after every call.
Deleting the rows from the temp table before starting any processing made it work.
It drove me nuts till I found this solution...