Inserting into a table with sequence number - sql

If I have the following:
Begin transaction
numberOfRecords = select count from table where foreignKey = "some value"
Insert into table (set SequenceNumber = numberOfRecords + 1)
End Transaction
and multiple users are executing the above code, would each insert have a unique increasing number?
In other words, does the begin transaction queue up other transactions even reads so that each insert will have the correct sequence number? or do I require Insert into..Select statement to achieve what I want?
Thanks.

No, a transaction with the default SQL Server isolation level (READ COMMITTED) is not sufficient. Putting it into one INSERT...SELECT statement won't fix it either. You basically have two options to fix this:
Option 1: Set your isolation level to SERIALIZABLE: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. This will ensure that transactions from two different users occur as if they occurred in sequence. This, however, might create deadlocks if many such transactions occur in parallel.
Option 2: Exclusively lock the table at the beginning of the transaction (SELECT * FROM table WITH (TABLOCKX, HOLDLOCK) WHERE 1=0). Note that this might impact performance if table is used frequently.
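To make option 2 concrete, here is a minimal sketch of the original pseudocode with the table lock in place; the table and column names (MyTable, ForeignKey, SequenceNumber) and the @SomeValue parameter are placeholders, not names from the question:
BEGIN TRANSACTION;

-- Take an exclusive table lock and hold it for the duration of the transaction
SELECT * FROM MyTable WITH (TABLOCKX, HOLDLOCK) WHERE 1 = 0;

DECLARE @NextSeq INT;
SELECT @NextSeq = COUNT(*) + 1
FROM MyTable
WHERE ForeignKey = @SomeValue;

INSERT INTO MyTable (ForeignKey, SequenceNumber)
VALUES (@SomeValue, @NextSeq);

COMMIT TRANSACTION;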
Both approaches have been analyzed in detail in my answer to the following SO question:
In tsql is an Insert with a Select statement safe in terms of concurrency?

No, a transaction does not queue up commands; it is not the same thing as a lock.
Usually you want to use an identity column, but in some cases, when you need to generate a SequenceNumber without gaps, you have to use code like the above with a unique constraint on the SequenceNumber column, and be prepared to retry if the commit throws an exception.
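As a rough sketch of that retry pattern (table and constraint names are placeholders), the caller retries the whole transaction whenever the commit fails with a unique-constraint violation:
-- One-time setup: the sequence must be unique per foreign key
ALTER TABLE MyTable
ADD CONSTRAINT UQ_MyTable_FK_Seq UNIQUE (ForeignKey, SequenceNumber);

-- Per insert, retried by the caller on error 2627 (unique constraint violation)
BEGIN TRANSACTION;
INSERT INTO MyTable (ForeignKey, SequenceNumber)
SELECT @SomeValue, COUNT(*) + 1
FROM MyTable
WHERE ForeignKey = @SomeValue;
COMMIT TRANSACTION;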

I once used a SQL database as a logger for a massive data export; to get a sequential "identity" I created an ON INSERT trigger that dealt with issuing the next number.
It worked well for me, but it was a single-user database, so I'm not sure whether there are any issues with multiple users and what I did.
Now that I've re-read the question, this may not be what you're looking for, but I think you could also do a trigger for a SELECT?
USE [ExportLog]
GO
/****** Object: Trigger [dbo].[Migration_Update_Event_Date] Script Date: 02/10/2011 17:06:11 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [dbo].[Migration_Update_LogState]
ON [dbo].[MigrationLog] FOR INSERT NOT FOR REPLICATION
AS
UPDATE [MIGRATION-DATA].dbo.MIGRATIONSTATE
SET LASTPROCESSID = ID
WHERE MACHINENAME IN (SELECT MACHINENAME FROM INSERTED)
GO

If you want each inserted record to get a unique number, first create a sequence and then insert the records through that sequence, like:
create sequence seq_num;
Now use seq_num to insert the records:
insert into <table name> (col1) values (seq_num.nextval);
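For reference, the syntax above is Oracle-style; on SQL Server 2012 and later the rough equivalent would be (table name hypothetical):
CREATE SEQUENCE seq_num START WITH 1 INCREMENT BY 1;

INSERT INTO MyTable (col1) VALUES (NEXT VALUE FOR seq_num);
Note that a sequence guarantees unique, increasing values but can leave gaps if a transaction rolls back, so it does not by itself meet a "no gaps" requirement.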

You need to set the transaction isolation level to one that provides the right guarantees. In your code, two processes could both execute the first line and then the second line; both would insert the same value.
Set the isolation level to serializable and perform the select statement WITH (UPDLOCK). This will reduce concurrency in your system but it will be safe.
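A brief sketch of that combination, using placeholder names consistent with the question's pseudocode (MyTable, ForeignKey, SequenceNumber and @SomeValue are assumptions):
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

DECLARE @numberOfRecords INT;
-- UPDLOCK keeps the read locked so a second session cannot read the same count
SELECT @numberOfRecords = COUNT(*)
FROM MyTable WITH (UPDLOCK)
WHERE ForeignKey = @SomeValue;

INSERT INTO MyTable (ForeignKey, SequenceNumber)
VALUES (@SomeValue, @numberOfRecords + 1);

COMMIT TRANSACTION;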
Other strategies are possible, but they are more time-consuming to implement (and test!).

Related

SQL Server delete performance

I have a routine in our .NET web application that allows a user on our platform to clear their account (i.e. delete all their data). This routine runs in a stored procedure and essentially loops through the relevant data tables and clears down all the various items they have created.
The stored procedure looks something like this.
ALTER procedure [dbo].[spDeleteAccountData](
    @accountNumber varchar(30) )
AS
BEGIN
SET ANSI_NULLS ON;
SET NOCOUNT ON;
BEGIN TRAN
BEGIN TRY
    DELETE FROM myDataTable1 WHERE accountNumber = @accountNumber
    DELETE FROM myDataTable2 WHERE accountNumber = @accountNumber
    DELETE FROM myDataTable3 WHERE accountNumber = @accountNumber
    -- Etc.........
END TRY
BEGIN CATCH
    -- CATCH ERROR
END CATCH
IF @@TRANCOUNT > 0
    COMMIT TRANSACTION;
SET ANSI_NULLS OFF;
SET NOCOUNT OFF;
END
The problem is that in some cases we can have over 10,000 rows on a table and the procedure can take up to 3-5 minutes. During this period all the other connections on the database get throttled causing time-out errors like the one below:
System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Are there any general changes I can make to improve performance? I appreciate there are many unknowns related to the design of our database schema, but general best-practice advice would be welcomed! I thought about scheduling this task to run during the early hours to minimise impact, but this is far from ideal, as the user wouldn't be able to regain access to their account until the task had completed.
Additional Information:
SQL Server 2008 R2 Standard
All tables have a clustered index
No triggers have been associated to any delete commands on any of the relevant tables
Foreign key references exist on a number of tables but the deletion order accounts for this.
Edit: 16:52 GMT
The delete proc affects around 20 tables. The largest one has approx 5 million records. The others have no more than 200,000, with some containing only 1000-2000 records.
Do you have an index on accountNumber in all tables?
Seeing that you delete using a WHERE clause on that column, this might help.
Another option (and probably an even better solution) would be to schedule deletion operations at night: when a user chooses to delete their account, you only set a flag, and a delete job runs at night actually deleting the accounts flagged for deletion.
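A rough sketch of that flag-and-purge idea, assuming a hypothetical MarkedForDeletion flag on a hypothetical Accounts table; the nightly job then deletes the flagged data from each table:
-- When the user requests deletion: only set a flag
UPDATE Accounts
SET MarkedForDeletion = 1
WHERE accountNumber = @accountNumber;

-- Nightly job: purge data belonging to flagged accounts
DELETE d
FROM myDataTable1 AS d
INNER JOIN Accounts AS a ON a.accountNumber = d.accountNumber
WHERE a.MarkedForDeletion = 1;
-- ...repeat for the remaining tables, then clear the flag or remove the account row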
If you have an index on the accountNumber field, then I guess the long deletion time is due to locks (generated by other processes) or to foreign keys on the affected tables.
If it is due to locks, then you should see whether you can reduce them, using NOLOCK where you can actually do that.
If it is a problem of foreign keys... well, you have to wait. If you do not want to wait, though, and your application logic does not rely on enforcing the FKs (such as surfacing FK-violation errors to the application and testing against them), or you feel your application is solid enough to do without FKs for a short period of time, then you can disable the related FKs prior to the deletions with ALTER TABLE xxx NOCHECK CONSTRAINT ALL and re-enable them afterwards.
Of course, purists will blame me for the latter, but I have used it many times when the need arose.
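For reference, the disable/re-enable pattern described above looks roughly like this, shown for one of the question's tables; the WITH CHECK on re-enable makes SQL Server re-validate the existing data:
-- Disable all FK/check constraints on the affected table
ALTER TABLE myDataTable1 NOCHECK CONSTRAINT ALL;

DELETE FROM myDataTable1 WHERE accountNumber = @accountNumber;

-- Re-enable and re-validate the constraints afterwards
ALTER TABLE myDataTable1 WITH CHECK CHECK CONSTRAINT ALL;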
One way you might want to try is this:
Create a SP.
For each table, delete rows in small batches of some size that works for you (say 10 rows per batch).
Put each batch deletion inside a transaction and add a custom delay between each transaction.
Example:
DECLARE @DeletedRowsCount INT = 1, @BatchSize INT = 300;
WHILE (@DeletedRowsCount > 0) BEGIN
    BEGIN TRANSACTION
        DELETE TOP (@BatchSize) dbo.Table
        FROM dbo.Table
        WHERE Id = @PortalId;
        SET @DeletedRowsCount = @@ROWCOUNT;
    COMMIT;
    WAITFOR DELAY '00:00:05';
END
I guess you can do the same without a SP as well.
In fact, it might be better like that.
SqlCommand.CommandTimeout is the short answer. Increase its value.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout.aspx
Note, the Connection Timeout is not the same thing as the CommandTimeout.
...
Do you have an index on "accountNumber" on each table?
You could have a clustered key on the surrogate-key of the table, but not the "accountNumber".
...
Basically, you're gonna have to look at the execution plan (or post the execution plan) here.
But here is some "starter code" for trying an index on that column(s).
if exists (select * from dbo.sysindexes where name = N'IX_myDataTable1_accountNumber' and id = object_id(N'[dbo].[myDataTable1]'))
DROP INDEX [dbo].[myDataTable1].[IX_myDataTable1_accountNumber]
GO
CREATE INDEX [IX_myDataTable1_accountNumber] ON [dbo].[myDataTable1]([accountNumber])
GO
It could be worth switching the database into Read Committed Snapshot mode. This will have a performance impact, how much depends on your application.
In Read Committed Snapshot mode, writers and readers no longer block each other, although writers still block writers. You don't say what sort of activity on the table is getting prevented by the delete, so it's a little hard to say if this will help?
http://msdn.microsoft.com/en-us/library/ms188277(v=sql.105).aspx
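Switching the database into that mode is a one-line change; a sketch, with a placeholder database name (the switch needs momentary exclusive access, hence the rollback option):
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;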
Having said that, 3-5 minutes for a deletion on tables with ~10k rows seems absurdly slow. You mention foreign keys, are the foreign keys indexed? If not, deletion can cause table scans on the other end to make sure you're not breaking RI, so maybe check that first? What does SQL Server Profiler say for reads/writes for these deletion queries?

How to overcome deadlocks

I am receiving deadlock errors when trying to run a sproc with a delete statement in it. What is happening is that a row in the FK constraint table is being updated at the same time as the related row I am deleting from the other table.
The data being updated in the constraint table is no longer important and will never be retrieved by anyone for any reason; it just so happens that this update and the delete can all happen at once. So I need the delete to be the principal operation.
What do I need to do to stop a deadlock like this?
DELETE FROM Storefront.Sidelite WHERE ID = @SideliteID;
Below is a screen shot of a Sidelite table and the Size constraint table.
OK, there are no reads taking place here. The only things taking place are many updates to the Size table while the Sidelite table is trying to delete a record whose Size row is being updated, and this is causing a deadlock.
I need to stop all operations on the Size table while a delete takes place on the Sidelite table; then I'll delete the related Size record in a trigger.
On the SELECT statement where it gets the initial value you can use a WITH (READUNCOMMITTED) or WITH (NOLOCK) hint. This will, however, give you dirty reads. See the following link for more information: Why use a READ UNCOMMITTED isolation level?
1.) Enable snapshot isolation at the database level: ALTER DATABASE <your database> SET ALLOW_SNAPSHOT_ISOLATION ON
2.) SET TRANSACTION ISOLATION LEVEL SNAPSHOT
ALTER proc [Storefront].[proc_DeleteSidelite]
    @SideliteID INT
AS
BEGIN
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT
    DECLARE @SizeID INT;
    BEGIN TRAN
        SELECT @SizeID = sl.SizeID FROM Storefront.Sidelite sl
        WITH (NOLOCK) WHERE sl.ID = @SideliteID
        DELETE FROM Storefront.Sidelite WHERE ID = @SideliteID;
        DELETE FROM Storefront.Size WHERE ID = @SizeID;
    COMMIT TRAN
END;

sql queries and inserts

I have a random question. If I were to do a SQL SELECT and, while the SQL server was processing my request, someone else executed an INSERT statement... could the data from that INSERT statement also be returned by my SELECT statement?
Queries are queued, so if the SELECT occurs before the INSERT there's no possibility of seeing the newly inserted data.
Using default isolation levels, SELECT is generally given higher privilege over others but still only reads COMMITTED data. So if the INSERT data has not been committed by the time the SELECT occurs--again, you wouldn't see the newly inserted data. If the INSERT has been committed, the subsequent SELECT will include the newly inserted data.
If the isolation level allowed reading UNCOMMITTED (AKA dirty) data, then yes--a SELECT occurring after the INSERT but before the INSERT data was committed would return that data. This is not recommended practice, because UNCOMMITTED data could be subject to a ROLLBACK.
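For reference, a dirty read in SQL Server is opted into either per session or per query; a minimal sketch, with a hypothetical table name:
-- Session-wide
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM Orders;

-- Or per query, via a table hint
SELECT * FROM Orders WITH (NOLOCK);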
If the SELECT statement is executed before the INSERT statement, the selected data will certainly not include the new inserted data.
What happens in MySQL with MyISAM, the default engine, is that all INSERT statements require a table lock; as a result, once an INSERT statement is executed, it first waits for all existing SELECTs to complete before locking the table, performs the INSERT, and then unlocks it.
For more information, see: Internal Locking Methods in the MySQL manual
No, a SELECT that is already executing at the moment of the INSERT will never gather new records that did not exist when the SELECT statement started executing.
Also if you use the transactional storage engine InnoDB, you can be assured that your SELECT will not include rows that are currently being inserted. That's the purpose of transaction isolation, or the "I" in ACID.
For more details see http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html because there are some nuances about read-committed and read-uncommitted transaction isolation modes.
I don't know particulars for MySQL, but in SQL Server it would depend on if there were any locking hints used, and the default behavior for locks. You have a couple of options:
Your SELECT locks the table, which means the INSERT won't process until your select is finished.
Your SELECT is able to do a "dirty read" which means the transaction doesn't care if you get slightly out-of-date data, and you miss the INSERT
Your SELECT is able to do a "dirty read" but the INSERT happens before the SELECT hits that row, and you get the result that was added.
The only way you do that is with a "dirty read".
Take a look at MYSql's documentation on TRANSACTION ISOLATION LEVELS to get a better understanding of what that is.

SQL problem - high volume trans and PK violation

I'm writing a high volume trading system. We receive messages at around 300-500 per second and these messages then need to be saved to the database as quickly as possible. These messages get deposited on a Message Queue and are then read from there.
I've implemented a Competing Consumer pattern, which reads from the queue and allows for multithreaded processing of the messages. However I'm getting a frequent primary key violation while the app is running.
We're running SQL 2008. The sample table structure would be:
TableA
{
MessageSequence INT PRIMARY KEY,
Data VARCHAR(50)
}
A stored procedure gets invoked to persist this message and looks something like this:
BEGIN TRANSACTION
INSERT TableA (MessageSequence, Data)
SELECT @MessageSequence, @Data
WHERE NOT EXISTS
(
    SELECT TOP 1 MessageSequence FROM TableA WHERE MessageSequence = @MessageSequence
)
IF (@@ROWCOUNT = 0)
BEGIN
    UPDATE TableA
    SET Data = @Data
    WHERE MessageSequence = @MessageSequence
END
COMMIT TRANSACTION
All of this is in a TRY...CATCH block so if there's an error, it rolls back the transaction.
I've tried using table hints, like ROWLOCK, but it hasn't made a difference. Since the Insert is evaluated as a single statement, it seems ludicrous that I'm still getting a 'Primary Key on insert' issue.
Does anyone have an idea why this is happening? And have you got ANY ideas which may point me in the direction of a solution?
Why is this happening?
SELECT TOP 1 MessageSequence FROM TableA WHERE MessageSequence = @MessageSequence
This SELECT will try to locate the row; if it is not found, the EXISTS operator returns FALSE and the INSERT proceeds. However, the decision to INSERT is based on a state that was true at the time of the SELECT but is no longer guaranteed to be true at the time of the INSERT. In other words, you have a race condition where two threads can both look up the same @MessageSequence, both find NOT EXISTS, and both try to INSERT; only the first one will succeed, and the second one will cause a PK violation.
How do I solve it?
The quickest fix is to add a WITH (UPDLOCK) hint to the SELECT; this forces the lock placed on the @MessageSequence key to be retained, so the INSERT...SELECT behaves atomically:
INSERT TableA (MessageSequence, Data)
SELECT @MessageSequence, @Data
WHERE NOT EXISTS (
    SELECT TOP 1 MessageSequence FROM TableA WITH (UPDLOCK) WHERE MessageSequence = @MessageSequence)
To prevent SQL from doing fancy stuff like page lock, you can also add the ROWLOCK hint.
However, that is not my recommendation. My recommendation may surprise you, but it is this: do the operation that is most likely to succeed and handle the error if it fails. I.e., if your business case makes it more likely for the @MessageSequence to be new, try an INSERT and handle the PK violation if it fails. This way you avoid the spurious look-ups, and the cost of the catch/retry is amortized over the many cases when it succeeds on the first try.
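A sketch of that insert-first approach, reusing the question's table and parameters; error 2627 is the duplicate-key error raised on a primary key violation, 2601 the unique-index equivalent:
BEGIN TRY
    INSERT INTO TableA (MessageSequence, Data)
    VALUES (@MessageSequence, @Data);
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2627, 2601)  -- PK or unique index violation: the row already exists
        UPDATE TableA
        SET Data = @Data
        WHERE MessageSequence = @MessageSequence;
    ELSE
        -- re-raise or log any other error as appropriate for your application
        RAISERROR('Insert into TableA failed', 16, 1);
END CATCH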
Also, it is perhaps worth investigating using the built-in queues that come with SQL Server.
Common problem. Explained here:
Defensive database programming: eliminating IF statements
It might be related to the transaction isolation level. You might need
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
before you start the transaction.
Also, if you have more updates than inserts, you should try the update first, check the row count, and do the insert second.
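The update-first variant would look roughly like this, again using the question's table; note that brand-new keys can still race, so the UPDLOCK hint or PK-violation handling discussed above is still needed:
UPDATE TableA
SET Data = @Data
WHERE MessageSequence = @MessageSequence;

IF @@ROWCOUNT = 0
    INSERT INTO TableA (MessageSequence, Data)
    VALUES (@MessageSequence, @Data);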
This is very similar to post 939831. Ultimately you want to use the hints (ROWLOCK, READPAST, UPDLOCK). READPAST tells sql server to skip to the next record if the current one is locked. UPDLOCK tells sql server that the read lock is going to escalate to an update lock.
When I implemented something similar I locked the next record by the threadID
UPDATE TOP (1)
foo
SET
ProcessorID = @PROCID
FROM
OrderTable foo WITH (ROWLOCK, READPAST, UPDLOCK)
WHERE
ProcessorID = 0
Then selected the record
SELECT *
FROM foo WITH (NOLOCK)
WHERE ProcessorID = @PROCID
Then marked it as processed
UPDATE foo
SET ProcessorID = -1
WHERE ProcessorID = @PROCID
Later in off hours I perform the relatively expensive operation of performing the delete operation to clear the queue of processed records.
The atomicity of the following statement is what you are after:
INSERT TableA(MessageSequence, Data )
SELECT @MessageSequence, @Data
WHERE NOT EXISTS
(
SELECT TOP 1 MessageSequence FROM TableA WHERE MessageSequence = @MessageSequence
)
According to this person, it depends on the current isolation level.
On a tangent, if you're thinking of a high volume trading system you might want to consider a tick database designed for such data [I'm not exactly sure what "message" you are storing here], such as discussed in this thread for example: http://www.elitetrader.com/vb/showthread.php?threadid=81345.
These are typically in-memory solutions with proprietary query languages. We use kdb+ at our shop.
Not sure what Messaging product you use - but it may be worth looking at the transactions not at the DB level, but at the MQ Level.
Of course, if you are using a TM (Transaction manager), the two operations : 1)Get from MQ and 2)Write to DB are both 'bracketed' under the same parent commit.
So I am not sure if you are using an implicit or explicit or any TM here (for example, Microsoft's DTC).
MessageSequence is the PK, so could the same message from the MQ be getting processed twice?
When you perform a 'GET" from MQ, make sure the GET is committed (i.e. not a db-commit, but a MQ-commit) - that will ensure the same MessageID cannot be 'popped' by the next thread that writes messages to the DB.

SQL Server Race Condition Question

(Note: this is for MS SQL Server)
Say you have a table ABC with a primary key identity column, and a CODE column. We want every row in here to have a unique, sequentially-generated code (based on some typical check-digit formula).
Say you have another table DEF with only one row, which stores the next available CODE (imagine a simple autonumber).
I know logic like below would present a race condition, in which two users could end up with the same CODE:
1) Run a select query to grab next available code from DEF
2) Insert said code into table ABC
3) Increment the value in DEF so it's not re-used.
I know that, two users could get stuck at Step 1), and could end up with same CODE in the ABC table.
What is the best way to deal with this situation? I thought I could just wrap a "begin tran" / "commit tran" around this logic, but I don't think that worked. I had a stored procedure like this to test, but I didn't avoid the race condition when I ran it from two different windows in MS:
begin tran
declare @x int
select @x = nextcode FROM def
waitfor delay '00:00:15'
update def set nextcode = nextcode + 1
select @x
commit tran
Can someone shed some light on this? I thought the transaction would prevent another user from being able to access my NextCodeTable until the first transaction completed, but I guess my understanding of transactions is flawed.
EDIT: I tried moving the wait to after the "update" statement, and I got two different codes... but I suspected that. I have the waitfor statement there to simulate a delay so the race condition can be easily seen. I think the key problem is my incorrect perception of how transactions work.
Set the Transaction Isolation Level to Serializable.
At lower isolation levels, other transactions can read the data in a row that is read, (but not yet modified) in this transaction. So two transactions can indeed read the same value. At very low isolation (Read Uncommitted) other transactions can even read data after it's been modified (but before committed)...
Review details about SQL Server Isolation Levels here
So the bottom line is that the isolation level is the critical piece here, controlling what level of access other transactions get into this one.
NOTE. From the link, about Serializable
Statements cannot read data that has been modified but not yet committed by other transactions.
This is because the locks are placed when the row is modified, not when BEGIN TRAN occurs, so what you have done may still allow another transaction to read the old value until the point where you modify it. I would therefore change the logic to modify the row in the same statement as you read it, thereby putting the lock on it at the same time.
begin tran
declare @x int
update def set @x = nextcode, nextcode += 1
waitfor delay '00:00:15'
select @x
commit tran
As other responders have mentioned, you can set the transaction isolation level to ensure that anything you 'read' using a SELECT statement cannot change within a transaction.
Alternatively, you could take out a lock specifically on the DEF table by adding the syntax WITH HOLDLOCK after the table name, e.g.,
SELECT nextcode FROM DEF WITH HOLDLOCK
It doesn't make much difference here, as your transaction is small, but it can be useful to take out locks for some SELECTs and not others within a transaction. It's a question of 'repeatability versus concurrency'.
A couple of relevant MS SQL docs:
Isolation levels
Table hints
Late answer. You want to avoid a race condition...
"SQL Server Process Queue Race Condition"
Recap:
You began a transaction. This doesn't actually "do" anything in and of itself; it modifies subsequent behavior.
You read data from a table. The default isolation level is Read Committed, so this SELECT does not hold its lock for the rest of the transaction.
You then wait 15 seconds
You then issue an update. With the declared transaction, this will generate a lock until the transaction is committed.
You then commit the transaction, releasing the lock.
So, guessing you ran this simultaneously in two windows (A and B):
A read the "next" value from table def, then went into wait mode
B read the same "next" value from the table, then went into wait mode. (Since A only did a read, the transaction did not lock anything.)
A then updated the table, and probably committed the change before B exited the wait state.
B then updated the table, after A's write was committed.
Try putting the wait statement after the update, before the commit, and see what happens.
It's not a real race condition. It's more a common problem with concurrent transactions. One solution is to set a read lock on the table and therefore have serialization in place.
This is actually a common problem in SQL databases, and that is why most (all?) of them have built-in features to take care of the issue of obtaining a unique identifier. Here are some things to look into if you are using MySQL or Postgres. If you are using a different database, I bet it provides something very similar.
A good example of this is postgres sequences which you can check out here:
Postgres Sequences
Mysql uses something called auto increments.
Mysql auto increment
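Since the question is about MS SQL Server, the closest equivalents there are IDENTITY columns or, on SQL Server 2012 and later, a sequence object; a sketch with a hypothetical sequence name:
CREATE SEQUENCE dbo.CodeSeq START WITH 1 INCREMENT BY 1;

-- Each caller gets a distinct value atomically, with no read-then-update race
SELECT NEXT VALUE FOR dbo.CodeSeq AS NextCode;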
You can set the column to a computed value that is persisted. This will take care of the race condition.
Persisted Computed Columns
NOTE
Using this method means you do not need to store the next code in a table. The code column becomes the reference point.
Implementation
Give the column the following properties under computed column specification.
Formula = dbo.GetNextCode()
Is Persisted = Yes
Create Function dbo.GetNextCode()
Returns VarChar(10)
As
Begin
    Declare @Return VarChar(10);
    Declare @MaxId Int
    Select @MaxId = Max(Id)
    From Table
    Select @Return = Code
    From Table
    Where Id = @MaxId;
    /* Generate New Code ... */
    Return @Return;
End