Do I really need to use transactions in stored procedures? [MSSQL 2005] - sql

I'm writing a pretty straightforward e-commerce app in asp.net, do I need to use transactions in my stored procedures?
Read/Write ratio is about 9:1

Many people ask - do I need transactions? Why do I need them? When to use them?
The answer is simple: use them all the time, unless you have a very good reason not to (for instance, don't use atomic transactions for "long running activities" between businesses). The default should always be yes. You are in doubt? - use transactions.
Why are transactions beneficial? They help you deal with crashes, failures, data consistency, error handling, they help you write simpler code etc. And the list of benefits will continue to grow with time.
Here is some more info from http://blogs.msdn.com/florinlazar/

Remember in SQL Server all single statement CRUD operations are in an implicit transaction by default. You just need to turn on explict transactions (BEGIN TRAN) if you need to make multiple statements act as an atomic unit.

The answer is, it depends. You do not always need transaction safety. Sometimes it's overkill. Sometimes it's not.
I can see that, for example, when you implement a checkout process you only want to finalize it once you gathered all data, etc.. Think about a payment f'up, you can rollback - that's an example when you need a transaction. Or maybe when it's wise to use them.
Do you need a transaction when you create a new user account? Maybe, if it's across 10 tables (for whatever reason), if it's just a single table then probably not.
It also depends on what you sold your client on and who they are, and if they requested it, etc.. But if making a decision is up to you, then I'd say, choose wisely.
My bottom line is, avoid premature optimization. Build your application, keep in mind that you may want to go back and refactor/optimize later when you need it. Look at a couple opensource projects and see how they implemented different parts of their app, learn from that. You'll see that most of them don't use transactions at all, yet there are huge online stores that use them.

Of course, it depends.
It depends upon the work that the particular stored procedure performs and, perhaps, not so much the "read/write ratio" that you suggest. In general, you should consider enclosing a unit of work within a transaction if it is query that could be impacted by some other, simultaneously running query. If this sounds nondeterministic, it is. It is often difficult to predict under what circumstances a particular unit of work qualifies as a candidate for this.
A good place to start is to review the precise CRUD being performed within the unit of work, in this case within your stored procedure, and decide if it a) could be affected by some other, simultaneous operation and b) if that other work matters to the end result of this work being performed (or, even, vice versa). If the answer is "Yes" to both of these then consider wrapping the unit of work within a transaction.
What this is suggesting is that you can't always simply decide to either use or not use transactions, rather you should apply them when it makes sense. Use the properties defined by ACID (Atomicity, Consistency, Isolation, and Durability) to help decide when this might be the case.
One other thing to consider is that in some circumstances, particularly if the system must perform many operations in quick succession, e.g., a high-volume transaction processing application, you might need to weigh the relative performance cost of the transaction. Depending upon the size of the unit of work, a commit (or rollback) of a transaction can be resource expensive, perhaps negatively impacting the performance of your system unnecessarily or, at least, with limited benefit.
Unfortunately, this is not an easy question to precisely answer: "It depends."

Use them if:
There are some errors that you may want to test for and catch which won't be caught except by you going out and doing the work (looking things up, testing values, etc.), usually from within a transaction so that you can roll back the whole operation.
There are multi-step operations of any sort, which should, logically, be rolled back as a group if they fail.

Related

Concurrent issues in SQL Server

I have set of validations which decides record to be inserted into the database with valid status code, the issue we are facing is that many users are making requests at the same time and middle of one transaction another transaction comes and both are getting inserted with valid status, which it shouldn't. it should return an error that record already exists which can be easily handled by a simple query but at specific scenarios we are allowing them to insert duplicates, I have tried sp_getapplock which is solving my problem but it is compromising performance big time. Are there any optimal ways to handle concurrent requests?
Thanks.
sp_getapplock is pretty much the befiest and most arbitrary lock you can take. It functions more like the lock keyword does in OOO programming. Basically you name a resource, give it a scope (proc or transaction), then lock it. Pretty much nothing can bypass that lock, which is why it's solved your race conditions. It's also probably mad overkill for what you're trying to do.
The first code/architecture idea that comes to mind is to restructure this table. I'm going to assume you have high update volumes or you wouldn't be running into these violations. You could simply use a try/catch block, and have the catch block retry on a PK violation. Clumsy, but might just do the trick.
Next, you could consider altering the structure of the table which receives this stream of updates throughout the day. Make this table primary keyed off an identity column, and pretty much nothing else. Inserts will be lightning fast, so any blockage will be negligible. You can then move this data in batches into a table better suited for batch processing (as opposed to trying to batch-process in real time)
There are also a whole range of transaction isolation settings which adjust SQL's regular locking system to support different variants (whether at the batch level, or inline via query hints. I'd read up on those, but you might consider looking at Serialized isolation. Various settings will enforce different runtime rules to fit your needs.
Also be sure to check your transactions. You probably want to be locking the hell out of this table (and potentially during some other action) but once that need is gone, so should the lock.

Single Stored procedure call or multiple database calls?

I am developing a MVC application where the view can fetch data using a stored procedure call which has multiple different queries or calling all of them individually directly from model. I am really confused about which approach makes a good practice ?
Thanks
I would go for a single call to the stored procedure in almost all cases (I will talk of the exception at the end).
Reasons:
Whoever maintains the stored procedure may introduce some extra logic (be it for performance reasons or business reasons) that you will have to either duplicate in your own direct calls, or - even worse - might miss altogether due to miscommunication with whoever is maintaining the stored procedure. (Even if you maintain both, you will have to spend effort on duplication). This is #1 reason: using a dedicated interface ensures correctness and avoid duplication
Every time you interact with your DB you incur in a small (but not null) overhead to open the connection (or retrieve it from a pool), marshalling and unmarshalling data over the wire, network latency and so on. Having a single entry point (your Stored Procedure) will amortize these better.
It is possible (but it really depends on a lot of different factors so it is not a guarantee of anything) that the DB engine can further optimize its workload by having everything in a single transaction context. I.e. maybe it issues two consecutive queries which are similar enough that a some of the indexes/records are buffered in the DB cache and so can be accessed faster by the second query).
Possible exception: your application has a sort of "zooming" process where you load first the header of a multirecord structure, and need the lower level details only when/if the user requires those. In this case it might be better to access these on the fly. I would still prefer the previous solution unless I can prove (i.e. testing with realistic loads) that the detail records are big enough to make this a burden on the front-end.
Please resist the temptation to just decide that it is more efficient this way unless you have hard data to backup your "insight". Most of the time this will prove to be a mistake.

Trigger for updated date column vs explicitly setting it

I'm wondering if using a trigger to set the updated date column of a table (or all tables) is considered a better practice versus having the application explicitly set it. I realize this could devolve into a debate over preferences and design patterns, so in an effort to parameterize the question a bit - I'd like to get a take from a best practices stand point. Taking separation of concerns into consideration (keeping business logic out of the database type of thing) as well as making sure database columns have what is intended within them (actually updating the "last modified date" column), I'm more inclined to let the database handle this via a trigger. That said, I also tend to shy away from triggers since they tend to hide the consequences of an action in a database. I'm hoping that the many smarter people here on SO have a more concrete thought than mine.
It rather depends on what you want to do. If you want to know when any column has changed, then a trigger would be the most suitable. It would be a bit of a drag to have to change code every time you added a new column.
However, if you were only interested in certain columns, you might simply wish to handle this in the update SQL.
There are of course shades in between - you could choose to handle only certain columns in the trigger.
Another consideration is how many bits of code can update a given table. You might have some all singing all dancing update code, but it is perfectly possible this could be spread out.
One word of caution - triggers tend to be the last place you consider when tracking an issue. For this sake (and this is a personal preference), I tend to avoid them unless they're absolutely necessary.
Apart from the tuning, the issue with triggers is that if they are disabled, you will never know this, and transactions will be committed. This could cause a major problem if the update-date is important to your business logic (for instance for sync processes or for finance). I prefer to explicitly handle all business related data within the procedures. And use triggers mainly for auditing.

How far can you really go with "eventual" consistency and no transactions (aka SimpleDB)?

I really want to use SimpleDB, but I worry that without real locking and transactions the entire system is fatally flawed. I understand that for high-read/low-write apps it makes sense, since eventually the system becomes consistent, but what about that time in between? Seems like the right query in an inconsistent db would perpetuate havoc throughout the entire database in a way that's very hard to track down. Hopefully I'm just being a worry wart...
This is the pretty classic battle between consistency and scalability and - to some extent - availability. Some data doesn't always need to be that consistent. For instance, look at digg.com and the number of diggs against a story. There's a good chance that value is duplicated in the "digg" record rather than forcing the DB to do a join against the "user_digg" table. Does it matter if that number isn't perfectly accurate? Probably not. Then using something like SimpleDB might be a good fit. However if you are writing a banking system, you should probably value consistency above all else. :)
Unless you know from day 1 that you have to deal with massive scale, I would stick to simple more conventional systems like RDBMS. If you are working somewhere with a reasonable business model, you will hopefully see a big spike in revenue if there's a big spike in traffic. Then you can use that money to help solving the scaling problems. Scaling is hard and scaling is hard to predict. Most of the scaling problems that hurt you will be ones that you never expect.
I would much rather get a site off the ground and spend a few weeks fixing scale issues when traffic picks up then spend so much time worrying about scale that we never make it to production because we run out of money. :)
Assuming you're talking about this SimpleDB, you're not being a worrywart; there are real reasons not to use it as a real world DBMS.
The properties that you get from transaction support in a DBMS can be abbreviated by the acronym "A.C.I.D.": Atomicity, Consistency, Isolation, and Durability. The A and D have mostly to do with system crashes, and the C and I have to do with regular operation. They're all things people totally take for granted when working with commercial databases, so if you work with a database that doesn't have one or more of them, you might be in for any number of nasty surprises.
Atomicity: Any transaction will either complete fully or not at all (i.e. it will either commit or abort cleanly). This applies to single statements (like "UPDATE table ...") as well as longer, more complicated transactions. If you don't have this, then anything that goes wrong (like, the disk getting full, the computer crashing, etc.) might leave something half-done. In other words, you can't ever rely on the DBMS to really do the things you tell it to, because any number of real-world problems can get in the way, and even a simple UPDATE statement might get partially completed.
Consistency: Any rules you've set up about the database will always be enforced. Like, if you have a rule that says A always equals B, then nothing anybody does to the database system can break that rule - it'll fail any operation that tries. This isn't quite as important if all your code is perfect ... but really, when is that ever the case? Plus, if you're missing this safety net, things get really yucky when you lose ...
Isolation: Any actions taken on the database will execute as if they happened serially (one at a time), even if in reality they're happening concurrently (interleaved with each other). If more than one user is going to hit this database at the same time, and you don't have this, then things you can't even dream up will go wrong; even atomic statements can interact with each other in unforeseen ways and screw things up.
Durability: If you lose power or the software crashes, what happens to database transactions that were in progress? If you have durability, the answer is "nothing - they're all safe". Databases do this by using something called "Undo / Redo Logging", where every little thing you do to the database is first logged (typically on a separate disk for safety) in a way such that you can reconstruct the current state after a failure. Without that, the other properties above are sort of useless, because you can never be 100% sure that things will stay consistent after a crash.
Do any of these things matter to you? The answer has everything to do with the types of transactions you're doing, and what guarantees you want in a failure situation. There may well be cases (like a read-only database) where you don't need these, but as soon as you start doing anything non-trivial, and something bad happens, you'll wish you had 'em. Maybe it's OK for you to just revert to a backup anytime something unexpected happens, but my guess is that it isn't.
Also note that dropping all of these protections doesn't make it a given that your database will perform better; in fact, it's probably the opposite. That's because real-world DBMS software also has tons of code to optimize query performance. So, if you write a query that joins 6 tables on SimpleDB, don't assume that it'll figure out the optimal way to run that query - you might end up waiting hours for it to complete, when a commercial DBMS could use an indexed hash join and get it in .5 seconds. There are a zillion little tricks that you can do to optimize query performance, and believe me, you'll really miss them when they're gone.
None of this is meant as a knock on SimpleDB; take it from the author of the software: "Although it is a great teaching tool, I can't imagine that anyone would want to use it for anything else."

Zero SQL deadlock by design - any coding patterns?

I am encountering very infrequent yet annoying SQL deadlocks on a .NET 2.0 webapp running on top of MS SQL Server 2005. In the past, we have been dealing with the SQL deadlocks in the very empirical way - basically tweaking the queries until it work.
Yet, I found this approach very unsatisfactory: time consuming and unreliable. I would highly prefer to follow deterministic query patterns that would ensure by design that no SQL deadlock will be encountered - ever.
For example, in C# multithreaded programming, a simple design rule such as the locks must be taken following their lexicographical order ensures that no deadlock will ever happen.
Are there any SQL coding patterns guaranteed to be deadlock-proof?
Writing deadlock-proof code is really hard. Even when you access the tables in the same order you may still get deadlocks [1]. I wrote a post on my blog that elaborates through some approaches that will help you avoid and resolve deadlock situations.
If you want to ensure two statements/transactions will never deadlock you may be able to achieve it by observing which locks each statement consumes using the sp_lock system stored procedure. To do this you have to either be very fast or use an open transaction with a holdlock hint.
Notes:
Any SELECT statement that needs more than one lock at once can deadlock against an intelligently designed transaction which grabs the locks in reverse order.
Zero deadlocks is basically an incredibly costly problem in the general case because you must know all the tables/obj that you're going to read and modify for every running transaction (this includes SELECTs). The general philosophy is called ordered strict two-phase locking (not to be confused with two-phase commit) (http://en.wikipedia.org/wiki/Two_phase_locking ; even 2PL does not guarantee no deadlocks)
Very few DBMS actually implement strict 2PL because of the massive performance hit such a thing causes (there are no free lunches) while all your transactions wait around for even simple SELECT statements to be executed.
Anyway, if this is something you're really interested in, take a look at SET ISOLATION LEVEL in SQL Server. You can tweak that as necessary. http://en.wikipedia.org/wiki/Isolation_level
For more info, see wikipedia on Serializability: http://en.wikipedia.org/wiki/Serializability
That said -- a great analogy is like source code revisions: check in early and often. Keep your transactions small (in # of SQL statements, # of rows modified) and quick (wall clock time helps avoid collisions with others). It may be nice and tidy to do a LOT of things in a single transaction -- and in general I agree with that philosophy -- but if you're experiencing a lot of deadlocks, you may break the trans up into smaller ones and then check their status in the application as you move along. TRAN 1 - OK Y/N? If Y, send TRAN 2 - OK Y/N? etc. etc
As an aside, in my many years of being a DBA and also a developer (of multiuser DB apps measuring thousands of concurrent users) I have never found deadlocks to be such a massive problem that I needed special cognizance of it (or to change isolation levels willy-nilly, etc).
There is no magic general purpose solution to this problem that work in practice. You can push concurrency to the application but this can be very complex especially if you need to coordinate with other programs running in separate memory spaces.
General answers to reduce deadlock opportunities:
Basic query optimization (proper index use) hotspot avoidanant design, hold transactions for shortest possible times...etc.
When possible set reasonable query timeouts so that if a deadlock should occur it is self-clearing after the timeout period expires.
Deadlocks in MSSQL are often due to its default read concurrency model so its very important not to depend on it - assume Oracle style MVCC in all designs. Use snapshot isolation or if possible the READ UNCOMMITED isolation level.
I believe the following useful read/write pattern is dead lock proof given some constraints:
Constraints:
One table
An index or PK is used for read/write so engine does not resort to table locks.
A batch of records can be read using a single SQL where clause.
Using SQL Server terminology.
Write Cycle:
All writes within a single "Read Committed" transaction.
The first update in the transaction is to a specific, always-present record
within each update group.
Multiple records may then be written in any order. (They are "protected"
by the write to the first record).
Read Cycle:
The default read committed transaction level
No transaction
Read records as a single select statement.
Benefits:
Secondary write cycles are blocked at the write of first record until the first write transaction completes entirely.
Reads are blocked/queued/executed atomically between the write commits.
Achieve transaction level consistency w/o resorting to "Serializable".
I need this to work too so please comment/correct!!
As you said, always access tables in the same order is a very good way to avoid deadlocks. Furthermore, shorten your transactions as much as possible.
Another cool trick is to combine 2 sql statements in one whenever you can. Single statements are always transactional. For example use "UPDATE ... SELECT" or "INSERT ... SELECT", use "##ERROR" and "##ROWCOUNT" instead of "SELECT COUNT" or "IF (EXISTS ...)"
Lastly, make sure that your calling code can handle deadlocks by reposting the query a configurable amount of times. Sometimes it just happens, it's normal behaviour and your application must be able to deal with it.
In addition to consistent sequence of lock acquisition - another path is explicit use of locking and isolation hints to reduce time/resources wasted unintentionally acquiring locks such as shared-intent during read.
Something that none has mentioned (surprisingly), is that where SQL server is concerned many locking problems can be eliminated with the right set of covering indexes for a DB's query workload. Why? Because it can greatly reduce the number of bookmark lookups into a table's clustered index (assuming it's not a heap), thus reducing contention and locking.
If you have enough design control over your app, restrict your updates / inserts to specific stored procedures and remove update / insert privileges from the database roles used by the app (only explicitly allow updates through those stored procedures).
Isolate your database connections to a specific class in your app (every connection must come from this class) and specify that "query only" connections set the isolation level to "dirty read" ... the equivalent to a (nolock) on every join.
That way you isolate the activities that can cause locks (to specific stored procedures) and take "simple reads" out of the "locking loop".
Quick answer is no, there is no guaranteed technique.
I don't see how you can make any application deadlock proof in general as a design principle if it has any non-trivial throughput. If you pre-emptively lock all the resources you could potentially need in a process in the same order even if you don't end up needing them, you risk the more costly issue where the second process is waiting to acquire the first lock it needs, and your availability is impacted. And as the number of resources in your system grows, even trivial processes have to lock them all in the same order to prevent deadlocks.
The best way to solve SQL deadlock problems, like most performance and availability problems is to look at the workload in the profiler and understand the behavior.
Not a direct answer to your question, but food for thought:
http://en.wikipedia.org/wiki/Dining_philosophers_problem
The "Dining philosophers problem" is an old thought experiment for examining the deadlock problem. Reading about it might help you find a solution to your particular circumstance.