Singleton pattern in a stored procedure - sql

How can you implement the Singleton pattern in a SQL Server 2005/2008 stored procedure?
We want the stored procedure to return the next value from a table to a caller, and then update the value, so the next caller gets a different value ...
BUT there will be times when there are lots of callers!
AND we don't want blocking/time-out issues
PS. maybe singleton isn't the answer ... if not, how could you handle this?

By definition, Singleton is a LOCKING pattern.
Talking about databases, there are many DB professionals who get nervous when you mention the word "lock", but locks per se are not a problem; they are a fundamental mechanism of relational databases.
You must learn how locks work, what kinds of locks exist, and treat them with respect.
Always work with short transactions, lock as few rows as you can, and work with sets, not individual rows.
Locks become a problem when they are massive, when they last too long, and of course when you build a DEADLOCK.
So, the golden rule: when you must change data inside a transaction, first take an exclusive lock (UPDATE), never a shared lock (SELECT). Sometimes that means starting with a "fake" UPDATE, as in:
BEGIN TRAN
UPDATE table
SET col1 = col1
WHERE [Key] = @Key
.......
COMMIT TRAN
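The same "take the write lock first" idea can be sketched in runnable form. This is a Python/SQLite illustration only (table and column names are made up): SQLite locks at the database level rather than the row level, so BEGIN IMMEDIATE stands in for the exclusive row lock that the fake UPDATE acquires in SQL Server.

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
con.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, col1 INTEGER)")
con.execute("INSERT INTO t VALUES (1, 10)")

con.execute("BEGIN IMMEDIATE")                       # take the write lock up front
con.execute("UPDATE t SET col1 = col1 WHERE k = 1")  # the "fake" UPDATE from the text
value = con.execute("SELECT col1 FROM t WHERE k = 1").fetchone()[0]
con.execute("UPDATE t SET col1 = ? WHERE k = 1", (value + 1,))
con.execute("COMMIT")

print(con.execute("SELECT col1 FROM t WHERE k = 1").fetchone()[0])
```

Because the lock is held from the very first statement, a concurrent writer cannot sneak in between the read and the real update.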
Prior to SQL Server 2012, when I needed a serial I did it in one of two ways:
Create an IDENTITY column; after inserting you can get the value with the built-in function SCOPE_IDENTITY(). There is also @@IDENTITY, but if someone creates a trigger that inserts into another table with an identity column, the nightmare starts.
CREATE TABLE [table]
(
Id int IDENTITY(1,1) NOT NULL,
col2 ....
col3 ....
)
The second option is to add a serial column, usually in the parent table or in a table made for this purpose, plus a procedure (you can also use client code) to get the serial:
--IF YOU CREATE A SERIAL HERE YOU'LL SPEND SOME SPACE,
--BUT IT WILL KEEP YOUR BLOCKING VERY LOW
CREATE TABLE Parent
(
Id int not null,
ChildSerial int not null,
col2 ...
col3 ...
CONSTRAINT PK_Parent PRIMARY KEY (Id)
)
GO
--NAMED CONSTRAINT Auto names are random (avoid them)
ALTER TABLE Parent
ADD CONSTRAINT Parent_DF_ChildSerial DEFAULT(0) FOR ChildSerial;
GO
CREATE TABLE Child
(
Id int not null,
col2..
colN..
--PLUS PRIMARY KEY... INDEXES, etc.
)
CREATE PROC GetChildId
(
@ParentId int,
@ChildSerial int output --To use the proc from another proc
)
AS
Begin
BEGIN TRAN
--YOU START WITH A LOCK, SO YOU'LL NEVER GET A DEADLOCK
--NOR A DUPLICATE SERIAL (Two clients with the same number)
UPDATE Parent
SET ChildSerial = ChildSerial + 1
WHERE Id = @ParentId
If @@error != 0
Begin
SELECT @ChildSerial = -1
SELECT @ChildSerial
ROLLBACK TRAN
RETURN
End
SELECT @ChildSerial = ChildSerial
FROM Parent
WHERE Id = @ParentId
COMMIT TRAN
SELECT @ChildSerial --To use the proc easily from a program
End
Go
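Translated to a runnable sketch (Python with SQLite standing in for SQL Server; names are illustrative, and BEGIN IMMEDIATE plays the role of the exclusive lock taken by the leading UPDATE):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE Parent (Id INTEGER PRIMARY KEY, "
            "ChildSerial INTEGER NOT NULL DEFAULT 0)")
con.execute("INSERT INTO Parent (Id) VALUES (1)")

def get_child_id(con, parent_id):
    """UPDATE first (exclusive lock), then read back -- same order as the proc."""
    con.execute("BEGIN IMMEDIATE")
    con.execute("UPDATE Parent SET ChildSerial = ChildSerial + 1 WHERE Id = ?",
                (parent_id,))
    (serial,) = con.execute("SELECT ChildSerial FROM Parent WHERE Id = ?",
                            (parent_id,)).fetchone()
    con.execute("COMMIT")
    return serial

serials = [get_child_id(con, 1) for _ in range(3)]
print(serials)
```

Each caller increments before reading, so two callers can never observe the same serial.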

Related

Best way to generate a UniqueID for a group of rows?

This is very simplified but I have a web service array of items that look something like this:
[12345, 34131, 13431]
and I am going to be looping through the array and inserting the values one by one into a database, and I want that table to look like this, with the values tied to a unique identifier showing that they were inserted together:
1 12345
1 34131
1 13431
and then if another array came along it would then insert all of its numbers with unique ID 2.... basically this is to keep track of groups.
There will be multiple processes executing this potentially at the same time so what would be the best way to generate the unique identifier and also ensure that 2 processes couldn't have used the same one?
You should fix your data model. It is missing an entity, say, batches.
create table batches (
batch_id int identity(1, 1) primary key,
created_at datetime default getdate()
);
You might have other information as well.
And your table should have a foreign key reference, batch_id to batches.
Then your code should do the following:
Insert a new row into batches. A new batch has begun.
Fetch the id that was just created.
Use this id for the rows that you want to insert.
Although you could do this with a sequence, a separate table makes more sense to me. You are tying a bunch of rows together into something. That something should be represented in the data model.
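The batches approach can be sketched like this (Python with SQLite as an illustration; `lastrowid` plays the role of SCOPE_IDENTITY(), and the table names follow the answer above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE batches (batch_id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "created_at TEXT DEFAULT CURRENT_TIMESTAMP)")
con.execute("CREATE TABLE items (batch_id INTEGER REFERENCES batches(batch_id), "
            "value INTEGER)")

def insert_batch(con, values):
    cur = con.execute("INSERT INTO batches DEFAULT VALUES")  # a new batch has begun
    batch_id = cur.lastrowid        # the id generated for THIS session's insert
    con.executemany("INSERT INTO items (batch_id, value) VALUES (?, ?)",
                    [(batch_id, v) for v in values])
    con.commit()
    return batch_id

first = insert_batch(con, [12345, 34131, 13431])
second = insert_batch(con, [111, 222])
print(first, second)
```

Because each session reads back only the id it just created, two concurrent processes can never share a batch id.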
You can declare this :
DECLARE #UniqueID UNIQUEIDENTIFIER = NEWID();
and use this as your unique identifier when you insert your batch
Since it isn't a primary key, an identity column is out. Honestly I'd probably just track it using a separate id sequence table. Create a proc that grabs the next available ID and then increments it. If you open a transaction at the beginning of the proc, it should prevent a second thread from getting the number until the first thread is done with its update.
Something like:
CREATE PROCEDURE getNextID
@NextNumber INT OUTPUT
,@id_type VARCHAR(20)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @NextValue TABLE (NextNumber int);
BEGIN TRANSACTION;
UPDATE id_sequence
SET last_used_number = last_used_number + 1
OUTPUT inserted.last_used_number INTO @NextValue(NextNumber)
WHERE id_type = @id_type
SELECT @NextNumber = NextNumber FROM @NextValue
COMMIT TRANSACTION;
END

Creating JVM level and Thread Safe Sequence in DB

This is an old question, but I am asking again to be sure.
I have created a sequence, Oracle-style, backed by a table, and I want to use it from multiple threads and multiple JVMs, all hitting it in parallel.
The sequence stored procedure is below. Will this work with multiple JVMs and always provide a unique number to threads in all JVMs, or is there any chance of it returning the same sequence number to more than one call?
create table sequenceTable (id int)
insert into sequenceTable values (0)
create procedure mySequence
AS
BEGIN
declare @seqNum int
declare @rowCount int
select @rowCount = 0
while(@rowCount = 0)
begin
select @seqNum = id from sequenceTable
update sequenceTable set id = id + 1 where id = @seqNum
select @rowCount = @@rowcount
print 'number of rows updated %1!', @rowCount
end
SELECT @seqNum
END
If you choose to maintain your current design of updating the sequenceTable.id column each time you want to generate a new sequence number, you need to make sure:
the 'current' process gets an exclusive lock on the row containing the desired sequence number
the 'current' process then updates the desired row and retrieves the newly updated value
the 'current' process releases the exclusive lock
While the above can be implemented via a begin tran + update + select + commit tran, it's actually a bit easier with a single update statement, eg:
create procedure mySequence
AS
begin
declare @seqNum int
update sequenceTable
set @seqNum = id + 1,
id = id + 1
select @seqNum
end
The update statement is its own transaction, so the update of the id column and the assignment @seqNum = id + 1 are performed under an exclusive lock within the update's transaction.
Keep in mind that the exclusive lock will block other processes from obtaining a new id value; the net result is that the generation of new id values will be single-threaded/sequential.
While this is 'good' from the perspective of ensuring all processes obtain a unique value, it does mean this particular update statement becomes a bottleneck if you have multiple processes hitting the update concurrently.
In such a situation (a high volume of concurrent updates) you could alleviate some contention by calling the stored proc less often; this could be accomplished by having the calling processes request a range of new id values (e.g., pass @increment as an input parameter to the proc, then instead of id + 1 use id + @increment), with the calling process then knowing it can use sequence numbers (@seqNum - @increment + 1) through @seqNum.
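The range-reservation variant might look like this (a Python/SQLite sketch of the idea, not the ASE proc itself; names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE sequenceTable (id INTEGER NOT NULL)")
con.execute("INSERT INTO sequenceTable VALUES (0)")

def reserve_range(con, increment):
    """Reserve `increment` ids in one call; caller owns (hi - increment + 1) .. hi."""
    con.execute("BEGIN IMMEDIATE")
    con.execute("UPDATE sequenceTable SET id = id + ?", (increment,))
    (hi,) = con.execute("SELECT id FROM sequenceTable").fetchone()
    con.execute("COMMIT")
    return range(hi - increment + 1, hi + 1)

r1 = reserve_range(con, 3)
r2 = reserve_range(con, 3)
print(list(r1), list(r2))
```

One round-trip now yields a whole block of ids, so the hot row is touched once per batch instead of once per id.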
Obviously (?) any process that uses a stored proc to generate 'next id' values only works if *ALL* processes a) always call the proc for a new id value and b) *ALL* processes only use the id value returned by the proc (eg, they don't generate their own id values).
If there's a possibility of applications not following this process (call proc to get new id value), you may want to consider pushing the creation of the unique id values out to the table where these id values are being inserted; in other words, modify the target table's id column to include the identity attribute; this eliminates the need for applications to call the stored proc (to generate a new id) and it (still) ensures a unique id is generated for each insert.
You can emulate sequences in ASE. Use the reserve_identity function to achieve required type of activity:
create table sequenceTable (id bigint identity)
go
create procedure mySequence AS
begin
select reserve_identity('sequenceTable', 1)
end
go
This solution is non-blocking and does generate minimal transaction log activity.

updlock vs for update cursor

I need to update a column of all rows of a table and I need to use UPDLOCK to do it.
For example:
UPDATE table WITH (UPDLOCK)
SET column_name = '123'
Another alternative is to use a FOR UPDATE cursor and update each row. The advantage of the second approach is that the lock is not held until the end of the transaction, so concurrent updates of the same rows can happen sooner. At the same time, update cursors are said to have bad performance. Which is the better approach?
EDIT:
Assume the column is updated with a value that is derived from another column in the table. In other words, column_name = f(column_name_1)
You cannot give an UPDLOCK hint to a write operation, like UPDATE statement. It will be ignored, since all writes (INSERT/UPDATE/DELETE) take the same lock, an exclusive lock on the row being updated. You can quickly validate this yourself:
create table heap (a int);
go
insert into heap (a) values (1)
go
begin transaction
update heap
--with (UPDLOCK)
set a=2
select * from sys.dm_tran_locks
rollback
If you remove the comment -- on the with (UPDLOCK) you'll see that you get exactly the same locks (an X lock on the physical row). You can do the same experiment with a B-Tree instead of a heap:
create table btree (a int not null identity(1,1) primary key, b int)
go
insert into btree (b) values (1)
go
begin transaction
update btree
--with (UPDLOCK)
set b=2
select * from sys.dm_tran_locks
rollback
Again, the locks acquired will be identical with or w/o the hint (an exclusive lock on the row key).
Now back to your question: can this whole-table update be done in batches? (since this is basically what you're asking). Yes, if the table has a primary key (to be precise, what's required is a unique index to batch on, preferably the clustered index to avoid tipping-point issues). Here is an example of how:
create table btree (id int not null identity(1,1) primary key, b int, c int);
go
set nocount on;
insert into btree (b) values (rand()*1000);
go 1000
declare @id int = null, @rc int;
declare @inserted table (id int);
begin transaction;
-- first batch has no WHERE clause
with cte as (
select top(10) id, b, c
from btree
order by id)
update cte
set c = b+1
output INSERTED.id into @inserted (id);
set @rc = @@rowcount;
commit;
select @id = max(id) from @inserted;
delete from @inserted;
raiserror (N'Updated %d rows, up to id %d', 0,0,@rc, @id);
begin transaction;
while (1=1)
begin
-- update the next batch of 10 rows, now it has a where clause
with cte as (
select top(10) id, b, c
from btree
where id > @id
order by id)
update cte
set c = b+1
output INSERTED.id into @inserted (id);
set @rc = @@rowcount;
if (0 = @rc)
break;
commit;
begin transaction;
select @id = max(id) from @inserted;
delete from @inserted;
raiserror (N'Updated %d rows, up to id %d', 0,0,@rc, @id);
end
commit
go
If your table doesn't have a unique clustered index then it becomes really tricky to do this, you would need to do the same thing a cursor has to do. While from a logical point of view the index is not required, not having it would cause each batch to do a whole-table-scan, which would be pretty much disastrous.
In case you wonder what happens if someone inserts a value behind the current @id, the answer is very simple: exactly the same thing that would happen if someone inserted a value after the whole processing was complete.
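The keyset-batching idea above can be condensed into a small runnable sketch (Python with SQLite; the batch mechanics are the point, not the T-SQL specifics, and the table layout mirrors the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE btree (id INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")
con.executemany("INSERT INTO btree (b) VALUES (?)", [(i,) for i in range(25)])
con.commit()

last_id, batch = 0, 10
while True:
    # fetch the next `batch` keys above the high-water mark, in key order
    ids = [r[0] for r in con.execute(
        "SELECT id FROM btree WHERE id > ? ORDER BY id LIMIT ?", (last_id, batch))]
    if not ids:
        break
    con.execute("UPDATE btree SET c = b + 1 WHERE id IN (%s)" % ",".join("?" * len(ids)),
                ids)
    con.commit()                      # short transaction per batch, locks released
    last_id = ids[-1]                 # advance the high-water mark

print(con.execute("SELECT COUNT(*) FROM btree WHERE c = b + 1").fetchone()[0])
```

Each batch commits independently, so locks are held only briefly; the unique key makes each "next batch" seek cheap instead of a full scan.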
Personally I think the single UPDATE will be much better. There are very few cases where a cursor will be better overall, regardless of concurrent activity. In fact the only one that comes to mind is a very complex running-totals query; I don't think I've ever seen better overall performance from a cursor that modifies data, as opposed to read-only SELECT cursors. Of course, you have much better means of testing which is "a better approach": you have your hardware, your schema, your data, and your usage patterns right in front of you. All you have to do is perform some tests.
That all said, what is the point in the first place of updating that column so that every single row has the same value? I suspect that if the value in that column has no bearing on the rest of the row, it can be stored elsewhere, perhaps in a related table or a single-row table. Maybe the value in that column should be NULL (in which case you get it from the other table) unless it is overridden for a specific row. It seems to me there is a better solution here than touching every single row in the table every time.

SQL Table Locking

I have an SQL Server locking question regarding an application we have in house. The application takes submissions of data and persists them into an SQL Server table. Each submission is also assigned a special catalog number (unrelated to the identity field in the table) which is a sequential alpha numeric number. These numbers are pulled from another table and are not generated at run time. So the steps are
Insert Data into Submission Table
Grab next Unassigned Catalog Number from Catalog Table
Assign the Catalog Number to the Submission in the Submission table
All these steps happen sequentially in the same stored procedure.
It's rare, but sometimes we manage to get two submissions in the same second, and they both get assigned the same Catalog Number, which causes a localized version of the Apocalypse in our company for a little while.
What can we do to limit the over assignment of the catalog numbers?
When getting your next catalog number, use row locking to protect the time between you finding it and marking it as in use, e.g.:
set transaction isolation level REPEATABLE READ
begin transaction
select top 1 @catalog_number = catalog_number
from catalog_numbers with (updlock,rowlock)
where assigned = 0
update catalog_numbers set assigned = 1 where catalog_number = @catalog_number
commit transaction
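The find-and-mark pattern can be sketched as follows (Python with SQLite for illustration; BEGIN IMMEDIATE stands in for the UPDLOCK held across the select and update, and the catalog values are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE catalog_numbers (catalog_number TEXT PRIMARY KEY, "
            "assigned INTEGER NOT NULL DEFAULT 0)")
con.executemany("INSERT INTO catalog_numbers (catalog_number) VALUES (?)",
                [("A001",), ("A002",), ("A003",)])

def claim_catalog_number(con):
    con.execute("BEGIN IMMEDIATE")    # hold the write lock across find-and-mark
    row = con.execute("SELECT catalog_number FROM catalog_numbers "
                      "WHERE assigned = 0 ORDER BY catalog_number LIMIT 1").fetchone()
    if row is None:
        con.execute("ROLLBACK")
        return None                   # pool exhausted
    con.execute("UPDATE catalog_numbers SET assigned = 1 WHERE catalog_number = ?",
                (row[0],))
    con.execute("COMMIT")
    return row[0]

c1 = claim_catalog_number(con)
c2 = claim_catalog_number(con)
print(c1, c2)
```

Because the lock spans both the find and the mark, two concurrent claimers cannot both see the same unassigned number.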
You could use an identity field to produce the catalog numbers, that way you can safely create and get the number:
insert into Catalog default values
set @CatalogNumber = scope_identity()
The scope_identity function will return the id of the last record created in the same session, so separate sessions can create records at the same time and still end up with the correct id.
If you can't use an identity field to create the catalog numbers, you have to use a transaction to make sure that you can determine the next number and create it without another session accessing the table.
I like araqnid's response. You could also use an insert trigger on the submission table to accomplish this. The trigger would be in the scope of the insert, and you would effectively embed the logic to assign the catalog_number in the trigger. Just wanted to put your options up here.
Here's the easy solution. No race condition. No blocking from a restrictive transaction isolation level. Probably won't work in SQL dialects other than T-SQL, though.
I assume there is some outside force at work keeping your catalog number table populated with unassigned catalog numbers.
This technique should work for you: just do the same sort of "interlocked update" that retrieves a value, something like:
update top (1) CatalogNumber
set in_use = 1 ,
@newCatalogNumber = catalog_number
from CatalogNumber
where in_use = 0
Anyway, the following stored procedure just ticks up a number on each execution and hands back the previous one. If you want a fancier value, add a computed column that applies the transform of your choice to the incrementing value.
drop table dbo.PrimaryKeyGenerator
go
create table dbo.PrimaryKeyGenerator
(
id varchar(100) not null ,
current_value int not null default(1) ,
constraint PrimaryKeyGenerator_PK primary key clustered ( id ) ,
)
go
drop procedure dbo.GetNewPrimaryKey
go
create procedure dbo.GetNewPrimaryKey
@name varchar(100)
as
set nocount on
set ansi_nulls on
set concat_null_yields_null on
set xact_abort on
declare
@uniqueValue int
--
-- put the supplied key in canonical form
--
set @name = ltrim(rtrim(lower(@name)))
--
-- if the name isn't already defined in the table, define it.
--
insert dbo.PrimaryKeyGenerator ( id )
select id = @name
where not exists ( select *
from dbo.PrimaryKeyGenerator pkg
where pkg.id = @name
)
--
-- now, an interlocked update to get the current value and increment the table
--
update PrimaryKeyGenerator
set @uniqueValue = current_value ,
current_value = current_value + 1
where id = @name
--
-- return the new unique value to the caller
--
return @uniqueValue
go
To use it:
declare @pk int
exec @pk = dbo.GetNewPrimaryKey 'foobar'
select @pk
Trivial to mod it to return a result set or return the value via an OUTPUT parameter.

atomic compare and swap in a database

I am working on a work queueing solution. I want to query a given row in the database, where a status column has a specific value, modify that value and return the row, and I want to do it atomically, so that no other query will see it:
begin transaction
select * from table where pk = x and status = y
update table set status = z where pk = x
commit transaction
--(the row would be returned)
it must be impossible for 2 or more concurrent queries to return the row (one query execution would see the row while its status = y) -- sort of like an interlocked CompareAndExchange operation.
I know the code above runs (for SQL server), but will the swap always be atomic?
I need a solution that will work for SQL Server and Oracle
Is pk the primary key? Then this is a non-issue: if you already know the primary key, there is nothing to race over. And if pk is the primary key, that begs the obvious question: how do you know the pk of the item to dequeue?
The problem is if you don't know the primary key and want to dequeue the next 'available' (ie. status = y) and mark it as dequeued (delete it or set status = z).
The proper way to do this is to use a single statement. Unfortunately the syntax differs between Oracle and SQL Server. The SQL Server syntax is:
update top (1) [<table>]
set status = z
output DELETED.*
where status = y;
I'm not familiar enough with Oracle's RETURNING clause to give an example similar to SQL's OUTPUT one.
Other SQL Server solutions require lock hints on the SELECT (with UPDLOCK) to be correct.
In Oracle the preferred avenue is to use FOR UPDATE, but that does not work in SQL Server, where FOR UPDATE is used only in conjunction with cursors.
In any case, the behavior you have in the original post is incorrect. Multiple sessions can all select the same row(s) and even all update it, returning the same dequeued item(s) to multiple readers.
As a general rule, to make an operation like this atomic you'll need to ensure that you set an exclusive (or update) lock when you perform the select so that no other transaction can read the row before your update.
The typical syntax for this is something like:
select * from table where pk = x and status = y for update
but you'd need to look it up to be sure.
I have some applications that follow a similar pattern. There is a table like yours that represents a queue of work. The table has two extra columns: thread_id and thread_date. When the app asks for work from the queue, it submits a thread id. Then a single update statement updates all applicable rows, setting the thread id column to the submitted id and the thread date column to the current time. After that update, it selects all rows with that thread id. This way you don't need to declare an explicit transaction; the "locking" occurs in the initial update.
The thread_date column is used to ensure that you do not end up with orphaned work items. What happens if items are pulled from the queue and then your app crashes? You have to be able to retry those work items. So you might grab all items off the queue that have not been marked completed but have been assigned to a thread with a thread date in the distant past. It's up to you to define "distant."
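The thread-stamping claim pattern described above can be sketched like this (Python with SQLite; table and column names follow the description, the payloads are made up):

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE work_queue (id INTEGER PRIMARY KEY, payload TEXT, "
            "thread_id TEXT, thread_date REAL)")
con.executemany("INSERT INTO work_queue (payload) VALUES (?)",
                [("a",), ("b",), ("c",)])
con.commit()

def claim_work(con, thread_id, limit=2):
    # the single UPDATE does the "locking": rows are stamped with our id atomically
    con.execute(
        "UPDATE work_queue SET thread_id = ?, thread_date = ? "
        "WHERE id IN (SELECT id FROM work_queue WHERE thread_id IS NULL LIMIT ?)",
        (thread_id, time.time(), limit))
    con.commit()
    # now read back only the rows stamped with OUR id
    return con.execute("SELECT id, payload FROM work_queue WHERE thread_id = ?",
                       (thread_id,)).fetchall()

claimed = claim_work(con, "worker-1")
print(claimed)
```

A crashed worker's rows keep their thread_date, so a sweeper can later reclaim anything stamped long ago but never completed.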
Try this. The validation is in the UPDATE statement.
Code
IF EXISTS (SELECT * FROM sys.tables WHERE name = 't1')
DROP TABLE dbo.t1
GO
CREATE TABLE dbo.t1 (
ColID int IDENTITY,
[Status] varchar(20)
)
GO
DECLARE @id int
DECLARE @initialValue varchar(20)
DECLARE @newValue varchar(20)
SET @initialValue = 'Initial Value'
INSERT INTO dbo.t1 (Status) VALUES (@initialValue)
SELECT @id = SCOPE_IDENTITY()
SET @newValue = 'Updated Value'
BEGIN TRAN
UPDATE dbo.t1
SET
@initialValue = [Status],
[Status] = @newValue
WHERE ColID = @id
AND [Status] = @initialValue
SELECT ColID, [Status] FROM dbo.t1
COMMIT TRAN
SELECT @initialValue AS '@initialValue', @newValue AS '@newValue'
Results
ColID Status
----- -------------
1 Updated Value
@initialValue @newValue
------------- -------------
Initial Value Updated Value
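The compare-and-swap at the heart of this answer — re-checking the expected value inside the UPDATE's WHERE clause — can be sketched portably (Python with SQLite; `rowcount` tells you whether you won the swap, much as @@rowcount would):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t1 (ColID INTEGER PRIMARY KEY, Status TEXT)")
con.execute("INSERT INTO t1 (Status) VALUES ('Initial Value')")
con.commit()

def compare_and_swap(con, col_id, expected, new):
    """Atomic because the WHERE clause re-checks the expected value."""
    cur = con.execute("UPDATE t1 SET Status = ? WHERE ColID = ? AND Status = ?",
                      (new, col_id, expected))
    con.commit()
    return cur.rowcount == 1          # True only if we won the swap

won = compare_and_swap(con, 1, "Initial Value", "Updated Value")
lost = compare_and_swap(con, 1, "Initial Value", "Other Value")
print(won, lost)
```

Only one of two concurrent callers with the same expected value can match the row, so the second caller's update affects zero rows and it knows it lost.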