Best way to generate a UniqueID for a group of rows?


This is very simplified but I have a web service array of items that look something like this:
[12345, 34131, 13431]
I am going to loop through the array and insert the items one by one into a database, and I want the table to look like this. These values would be tied to a unique identifier showing that they were inserted together:
1 12345
1 34131
1 13431
If another array came along, it would insert all of its numbers with unique ID 2. Basically, this is to keep track of groups.
There will be multiple processes executing this potentially at the same time so what would be the best way to generate the unique identifier and also ensure that 2 processes couldn't have used the same one?

You should fix your data model. It is missing an entity, say, batches.
create table batches (
batch_id int identity(1, 1) primary key,
created_at datetime default getdate()
);
You might have other information as well.
And your table should have a foreign key reference, batch_id to batches.
Then your code should do the following:
Insert a new row into batches. A new batch has begun.
Fetch the id that was just created.
Use this id for the rows that you want to insert.
Although you could do this with a sequence, a separate table makes more sense to me. You are tying a bunch of rows together into something. That something should be represented in the data model.
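The steps above can be sketched roughly as follows. This is only a minimal sketch: it assumes the batches table from this answer already exists, and the child table batch_items with its columns is a hypothetical name that does not appear in the question.

```sql
-- Hypothetical child table; its name and columns are assumptions for illustration.
create table batch_items (
    batch_id   int not null references batches (batch_id),
    item_value int not null
);

declare @batch_id int;

begin transaction;

-- 1. Insert a new row into batches: a new batch has begun.
insert into batches default values;

-- 2. Fetch the id that was just created in this scope.
set @batch_id = scope_identity();

-- 3. Use this id for every row that belongs to the group.
insert into batch_items (batch_id, item_value)
values (@batch_id, 12345),
       (@batch_id, 34131),
       (@batch_id, 13431);

commit transaction;
```

Because the server hands out identity values itself, two processes running this at the same time should each get their own batch_id without any extra locking in the calling code.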

You can declare this:
DECLARE @UniqueID UNIQUEIDENTIFIER = NEWID();
and use it as the unique identifier when you insert your batch.

Since it isn't a primary key, an identity column is out. Honestly I'd probably just track it using a separate id sequence table. Create a proc that grabs the next available ID and then increments it. If you open a transaction at the beginning of the proc it should prevent the second thread from getting the number until the first thread is done with its update.
Something like:
CREATE PROCEDURE getNextID
    @NextNumber INT OUTPUT
   ,@id_type VARCHAR(20)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @NextValue TABLE (NextNumber int);
    BEGIN TRANSACTION;
    -- Increment the stored counter itself, not the (initially NULL) output parameter
    UPDATE id_sequence
    SET last_used_number = ISNULL(last_used_number, 0) + 1
    OUTPUT inserted.last_used_number INTO @NextValue(NextNumber)
    WHERE id_type = @id_type;
    SELECT @NextNumber = NextNumber FROM @NextValue;
    COMMIT TRANSACTION;
END

Related

SQL Server - Multiple Processes Inserting to table

I have a couple of stored procs (Add & Remove) which run some selects, inserts, deletes & updates on some tables. These seem fine.
Each of these procs uses a TRANSACTION.
I begin the transaction before I make any changes to data, and near the end of the proc I do:
IF @@TRANCOUNT > 0
COMMIT TRANSACTION @transName;
Within the Add and Remove procs, and within the TRANSACTION I call another stored procedure (Adjust) to update a table which keeps a running total of values.
I am finding that this is getting out of sync.....
Here is the body of that proc....
INSERT INTO L2(ProductId, LocationId, POId, StockMoveId, BasketId, OrderId, AdjusterValue, CurrentValue)
SELECT TOP 1
@ProductId, @LocationId, null, null, @BasketId, null, @Value, (CurrentValue + @Value)
FROM L2
WHERE 1=1
AND LocationId = @LocationId
AND ProductId = @ProductId
ORDER BY Id Desc
ProductId, LocationId, StockMoveId and OrderId are all foreign keys to the relevant tables but do allow nulls, so only the appropriate one needs to be populated with an actual value.
Here is an image showing an example of where it goes wrong....
The 19 should have been added to 324, making a new total of 343; however, as you can see, it seems to have been added to the 300 and 319 is inserted.
Questions...
Is this actually in the transaction that was begun in the calling stored proc?
How can I prevent this situation?
I've tried using MAX to get the right row to try and speed things up, but the execution plan on that isn't as cost effective as the simple TOP. Id, by the way, is an identity column and the primary key.
Do I need to lock the table, and if I do, will the other processes calling Adjust wait or will they error?
Any assistance much appreciated.
More info....
I have been experimenting and it would seem the only solution that consistently works as desired is to have the Id column as an INT field and simply increment it myself on the INSERT.
This doesn't sit well with me, as it doesn't make sense to me why the IDENTITY column doesn't seem to cope.
I've tried the posted Identity column solution, sequences and incrementing ID myself

After lots of searching, experimenting I SEEM to have a solution that is now very robust.
I now have the ID as a simple INT column and I manage the ID myself by getting the MAX + 1 for each new insert.
I now wrap the body of the Adjust proc in its own Transaction and use the following to get the next ID.....
DECLARE @trxNam Varchar(10) = 'tranNextId';
DECLARE @newId INT;
DECLARE @currentLevelId INT;
BEGIN TRANSACTION @trxNam;
SELECT @newId = MAX(id) + 1 FROM L2 WITH(updlock,serializable);
I then do my insert using the @newId and COMMIT the named transaction.
I have a scenario set up where a number of Win32 apps call my API hundreds of times; it was consistently failing due to intermittent PKEY violations.
Now it doesn't.
Happy days!
Still I'm looking to see if I can simply have an identity column again and use the Transaction in the adjust proc...That would be cleaner I think.
I found this article led me to the solution....
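For what it's worth, the remaining wish above (keep the IDENTITY column and only serialize the read of the latest running total) might look something like the sketch below. It is untested against the real schema: the procedure name Adjust_Sketch and the parameter types are assumptions, and unlike the original body it inserts a first row with a starting total of 0 when no earlier row exists. How narrowly the lock applies depends on whatever indexes exist on LocationId and ProductId.

```sql
-- Sketch only: assumes L2.Id is an IDENTITY column again.
-- The procedure name and parameter types are guesses, not taken from the post.
CREATE PROCEDURE Adjust_Sketch
    @ProductId  INT,
    @LocationId INT,
    @BasketId   INT,
    @Value      INT
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @current INT;

    BEGIN TRANSACTION;

    -- UPDLOCK + HOLDLOCK: a second caller asking for the same latest row
    -- has to wait here until this transaction commits.
    SELECT TOP (1) @current = CurrentValue
    FROM L2 WITH (UPDLOCK, HOLDLOCK)
    WHERE LocationId = @LocationId
      AND ProductId = @ProductId
    ORDER BY Id DESC;

    -- Id is left out so the IDENTITY column assigns it.
    INSERT INTO L2 (ProductId, LocationId, POId, StockMoveId, BasketId, OrderId, AdjusterValue, CurrentValue)
    VALUES (@ProductId, @LocationId, NULL, NULL, @BasketId, NULL, @Value, ISNULL(@current, 0) + @Value);

    COMMIT TRANSACTION;
END
```

The idea is the same as the MAX(id) + 1 approach: the read that decides the new running total takes its lock before the insert, so two callers cannot both base their total on the same previous row.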

Singleton pattern in a stored procedure

How can you implement the Singleton pattern in a SQL Server 2005/2008 stored procedure?
We want the stored procedure to return the next value from a table to a caller, and then update the value, so the next caller gets a different value ...
BUT there will be times when there are lots of callers!
AND we don't want blocking/time-out issues
PS. maybe singleton isn't the answer ... if not, how could you handle this?

By definition SINGLETON IS A LOCKING Pattern.
Speaking of databases, many DB professionals get nervous when you mention the word "lock", but locks per se are not a problem; they are a fundamental mechanism of relational databases.
You must learn how locks work and what kinds of locks exist, and treat them with respect.
Always work with short transactions, lock as few rows as you can, and work with sets rather than individual rows.
Locks become a problem when they are massive, when they last too long, and of course when you build a DEADLOCK.
So, the golden rule: when you must change data inside a transaction, first take an exclusive lock (UPDATE), never a shared lock (SELECT). This means you sometimes have to start with a fake update just to take the lock, as in:
BEGIN TRAN
UPDATE [table]
SET col1 = col1
WHERE [Key] = @Key
.......
COMMIT TRAN
Prior to SQL Server 2012, when I needed a serial I did it in one of two ways:
The first option is to create an IDENTITY column, so that after inserting you can get the value with the built-in function SCOPE_IDENTITY(). There is also @@IDENTITY, but if someone creates a trigger that inserts into another table with an identity column, the nightmare starts.
CREATE TABLE [table]
(
Id int IDENTITY(1,1) NOT NULL,
col2 ....
col3 ....
)
The second option is to add a serial column, usually in the parent table or in a table made for it, plus a procedure (you can also use client code) to get the serial:
--IF YOU CREATE A SERIAL HERE YOU'LL SPEND SOME SPACE,
--BUT IT WILL KEEP YOUR BLOCKING VERY LOW
CREATE TABLE Parent
(
Id int not null,
ChildSerial int not null,
col2 ...
col3 ...
CONSTRAINT PK_Parent PRIMARY KEY (Id)
)
GO
--NAMED CONSTRAINT: auto-generated names are random (avoid them)
ALTER TABLE Parent
ADD CONSTRAINT Parent_DF_ChildSerial DEFAULT(0) FOR ChildSerial;
GO
CREATE TABLE Child
(
Id int not null,
col2..
colN..
--PLUS PRIMARY KEY... INDEXES, etc.
)
CREATE PROC GetChildId
(
    @ParentId int,
    @ChildSerial int output --To use the proc from another proc
)
As
Begin
    BEGIN TRAN
    --YOU START WITH A LOCK, SO YOU'LL NEVER GET A DEADLOCK
    --NOR A FAKE SERIAL (Two clients with the same number)
    UPDATE Parent
    SET ChildSerial = ChildSerial + 1
    WHERE Id = @ParentId
    If @@error != 0
    Begin
        SELECT @ChildSerial = -1
        SELECT @ChildSerial
        ROLLBACK TRAN
        RETURN
    End
    SELECT @ChildSerial = ChildSerial
    FROM Parent
    WHERE Id = @ParentId
    COMMIT TRAN
    SELECT @ChildSerial --To Use the proc easily from a program
End
Go
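Since this answer only covers the options from before SQL Server 2012, here is a rough sketch of the built-in alternative that version introduced, a SEQUENCE object. The sequence name dbo.GroupIdSequence is made up for illustration; nothing in the thread names one.

```sql
-- Requires SQL Server 2012 or later. The sequence name is hypothetical.
CREATE SEQUENCE dbo.GroupIdSequence
    AS INT
    START WITH 1
    INCREMENT BY 1;
GO

-- Each call hands out a new value, even with many callers at once.
DECLARE @GroupId INT = NEXT VALUE FOR dbo.GroupIdSequence;
SELECT @GroupId AS GroupId;
```

One trade-off to keep in mind: a value taken from a sequence is not given back when the surrounding transaction rolls back, so the numbers are unique but can have gaps.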

Stored Procedure - Knowing the next ID

I am creating a stored procedure to create a new customer so for instance,
CREATE PROCEDURE Customer_Create
@customer_arg int, -- parameter types assumed; the original omitted them
@type_arg int
AS
BEGIN
INSERT INTO Customer (Customer_id, Type_id)
VALUES (@customer_arg, @type_arg)
End;
If I have several foreign keys in my statement and they are all ID's is there a way for me to pull the NEXT ID number automatically without having to know what it would be off the top of my head when I run the execute statement? I would like to just have it pull the fact that the ID will be 2 because the previous record was 1
EXECUTE Customer_Create 16,2
Is it something with OUTPUT? If so, how does this work code-wise?

I suspect that what you want to do is return the new id after the record is inserted. For that:
CREATE PROCEDURE Customer_Create (
    @customer_arg int, -- parameter types are assumed; the question omits them
    @type_arg int,
    @NewCustomerId int output
) AS
BEGIN
    INSERT INTO Customer(Customer_id, Type_id)
    VALUES (@customer_arg, @type_arg);
    SET @NewCustomerId = scope_identity();
End;
There are several other choices for getting the identity, which are explained here.

To get to the last inserted IDENTITY value you should use the OUTPUT clause like this:
DECLARE @IdentValues TABLE(v INT);
INSERT INTO dbo.IdentityTest
OUTPUT INSERTED.id INTO @IdentValues(v)
DEFAULT VALUES;
SELECT v AS IdentityValues FROM @IdentValues;
There are several other mechanisms like @@IDENTITY but they all have significant problems. See my Identity Crisis article for details.

In your case you can also experiment with @@IDENTITY like this:
DECLARE @NextID int
--insert statement goes here
SET @NextID = @@IDENTITY
Here are couple good resources for getting familiar with this
http://blog.sqlauthority.com/2007/03/25/sql-server-identity-vs-scope_identity-vs-ident_current-retrieve-last-inserted-identity-of-record/
http://blog.sqlauthority.com/2013/03/26/sql-server-identity-fields-review-sql-queries-2012-joes-2-pros-volume-2-the-sql-query-techniques-tutorial-for-sql-server-2012/

SQL Table Locking

I have an SQL Server locking question regarding an application we have in house. The application takes submissions of data and persists them into an SQL Server table. Each submission is also assigned a special catalog number (unrelated to the identity field in the table) which is a sequential alpha numeric number. These numbers are pulled from another table and are not generated at run time. So the steps are
Insert Data into Submission Table
Grab next Unassigned Catalog Number from Catalog Table
Assign the Catalog Number to the Submission in the Submission table
All these steps happen sequentially in the same stored procedure.
It's rare, but sometimes we manage to get two submissions in the same second and they both get assigned the same Catalog Number, which causes a localized version of the Apocalypse in our company for a small while.
What can we do to limit the over assignment of the catalog numbers?

When getting your next catalog number, use row locking to protect the time between you finding it and marking it as in use, e.g.:
set transaction isolation level REPEATABLE READ
begin transaction
select top 1 @catalog_number = catalog_number
from catalog_numbers with (updlock,rowlock)
where assigned = 0
update catalog_numbers set assigned = 1 where catalog_number = @catalog_number
commit transaction

You could use an identity field to produce the catalog numbers, that way you can safely create and get the number:
insert into Catalog default values
set @CatalogNumber = scope_identity()
The scope_identity function will return the id of the last record created in the same session, so separate sessions can create records at the same time and still end up with the correct id.
If you can't use an identity field to create the catalog numbers, you have to use a transaction to make sure that you can determine the next number and create it without another session accessing the table.

I like araqnid's response. You could also use an insert trigger on the submission table to accomplish this. The trigger would be in the scope of the insert, and you would effectively embed the logic to assign the catalog_number in the trigger. Just wanted to put your options up here.

Here's the easy solution. No race condition. No blocking from a restrictive transaction isolation level. Probably won't work in SQL dialects other than T-SQL, though.
I assume there is some outside force at work to keep your catalog number table populated with unassigned catalog numbers.
This technique should work for you: just do the same sort of "interlocked update" that retrieves a value, something like:
update top (1) CatalogNumber
set in_use = 1 ,
@newCatalogNumber = catalog_number
from CatalogNumber
where in_use = 0
Anyway, the following stored procedure just ticks up a number on each execution and hands back the previous one. If you want a fancier value, add a computed column that applies the transform of your choice to the incrementing value.
drop table dbo.PrimaryKeyGenerator
go
create table dbo.PrimaryKeyGenerator
(
    id varchar(100) not null ,
    current_value int not null default(1) ,
    constraint PrimaryKeyGenerator_PK primary key clustered ( id )
)
go
drop procedure dbo.GetNewPrimaryKey
go
create procedure dbo.GetNewPrimaryKey
    @name varchar(100)
as
    set nocount on
    set ansi_nulls on
    set concat_null_yields_null on
    set xact_abort on
    declare
        @uniqueValue int
    --
    -- put the supplied key in canonical form
    --
    set @name = ltrim(rtrim(lower(@name)))
    --
    -- if the name isn't already defined in the table, define it.
    --
    insert dbo.PrimaryKeyGenerator ( id )
    select id = @name
    where not exists ( select *
                       from dbo.PrimaryKeyGenerator pkg
                       where pkg.id = @name
                     )
    --
    -- now, an interlocked update to get the current value and increment the table
    --
    update PrimaryKeyGenerator
    set @uniqueValue = current_value ,
        current_value = current_value + 1
    where id = @name
    --
    -- return the new unique value to the caller
    --
    return @uniqueValue
go
To use it:
declare @pk int
exec @pk = dbo.GetNewPrimaryKey 'foobar'
select @pk
Trivial to mod it to return a result set or return the value via an OUTPUT parameter.

Does anyone know a neat trick for reusing identity values?

Typically when you specify an identity column you get a convenient interface in SQL Server for asking for a particular row.
SELECT * FROM TableName WHERE $IDENTITY = @pID
You don't really need to concern yourself with the name of the identity column because there can only be one.
But what if I have a table which mostly consists of temporary data. Lots of inserts and lots of deletes. Is there a simple way for me to reuse the identity values.
Preferably I would want to be able to write a function that would return say NEXT_SMALLEST($IDENTITY) as next identity value and do so in a fail-safe manner.
Basically find the smallest value that's not in use. That's not entirely trivial to do, but what I want is to be able to tell SQL Server that this is my function that will generate the identity values. But what I know is that no such function exists...
I want to...
Implement global data base IDs, I need to provide a default value that I'm in control of.
My idea was based around that I should be able to have a table with all known IDs and then every row ID from some other table that needed a global ID would reference that table. The default value would be provided by something like
INSERT INTO GlobalID
RETURN SCOPE_IDENTITY()

No; it's not unique if it can be reused.
Why do you want to re-use them? Why do you concern yourself with this field? If you want to be in control of it, don't make it an identity; create your own scheme and use that.

Don't reuse identities; you'll just shoot yourself in the foot. Use a type large enough that it never rolls over (a 64-bit bigint).
To find missing gaps in a sequence of numbers join the table against itself with a +/- 1 difference:
SELECT a.id
FROM [table] AS a
LEFT OUTER JOIN [table] AS b ON a.id = b.id+1
WHERE b.id IS NULL;
This query will find the numbers in the id sequence for which id-1 is not in the table, i.e. the numbers that start each contiguous run. You can then use SET IDENTITY_INSERT ... ON to insert a specific id and reuse a number. The cost of doing so is overwhelming (in both runtime and code complexity) compared with an ordinary identity-based insert.
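To make the self-join concrete, here is a small worked example. The table GapDemo and its sample values are invented for illustration only.

```sql
-- Hypothetical demo table; the name GapDemo and the values are assumptions.
CREATE TABLE GapDemo (id INT NOT NULL PRIMARY KEY);
INSERT INTO GapDemo (id) VALUES (1), (2), (3), (7), (8), (12);

-- Same shape as the query above: keep the ids whose predecessor is missing.
SELECT a.id
FROM GapDemo AS a
LEFT OUTER JOIN GapDemo AS b ON a.id = b.id + 1
WHERE b.id IS NULL
ORDER BY a.id;
-- Returns 1, 7 and 12: each one starts a contiguous run (1-3, 7-8, 12).
```

Note that the first id in the table always shows up too, because its predecessor is never present; the unused values are the ones sitting between the end of one run and the start of the next (4 to 6 and 9 to 11 here).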

If you really want to reset Identity value to the lowest,
here is the trick you can use through DBCC CHECKIDENT
Basically following sql statements resets identity value so that identity value restarts from the lowest possible number
create table TT (id int identity(1, 1))
GO
insert TT default values
GO 10
select * from TT
GO
delete TT where id between 5 and 10
GO
--; At this point, next ID will be 11, not 5
select * from TT
GO
insert TT default values
GO
--; as you can see here, next ID is indeed 11
select * from TT
GO
--; Now delete ID = 11
--; so that we can reseed next highest ID to 5
delete TT where id = 11
GO
--; Now, let's reseed identity value to the lowest possible identity number
declare @seedID int
select @seedID = max(id) from TT
print @seedID --; 4
--; We reseed identity column with "DBCC CheckIdent" and pass a new seed value
--; But we can't pass a seed number as argument, so let's use dynamic sql.
declare @sql nvarchar(200)
set @sql = 'dbcc checkident(TT, reseed, ' + cast(@seedID as varchar) + ')'
exec sp_sqlexec @sql
GO
--; Now the next
insert TT default values
GO
--; as you can see here, next ID is indeed 5
select * from TT
GO

I guess we would really need to know why you want to reuse your identity column. The only reason I can think of is because of the temporary nature of your data you might exhaust the possible values for the identity. That is not really likely, but if that is your concern, you can use uniqueidentifiers (guids) as the primary key in your table instead.
The function newid() will create a new guid and can be used in insert statements (or other statements). Then when you delete the row, you don't have any "holes" in your key because guids are not created in that order anyway.
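As a rough illustration of the guid approach, something like the following should work. The table TempData, its columns and the constraint names are all hypothetical.

```sql
-- Hypothetical table; all names here are assumptions for illustration.
CREATE TABLE TempData (
    Id UNIQUEIDENTIFIER NOT NULL
        CONSTRAINT TempData_DF_Id DEFAULT (NEWID())
        CONSTRAINT TempData_PK PRIMARY KEY,
    Payload VARCHAR(100) NOT NULL
);

-- Let the default supply the key...
INSERT INTO TempData (Payload) VALUES ('first row');

-- ...or generate it up front so the caller knows it before inserting.
DECLARE @Id UNIQUEIDENTIFIER = NEWID();
INSERT INTO TempData (Id, Payload) VALUES (@Id, 'second row');

SELECT Id, Payload FROM TempData;
```

Generating the value before the insert also means there is nothing to read back afterwards, which sidesteps the SCOPE_IDENTITY() versus @@IDENTITY question entirely.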

[Syntax assumes SQL2008....]
Yes, it's possible. You need two management tables, and two triggers on each participating table.
First, the management tables:
-- this table should only ever have one row
CREATE TABLE NextId (Id INT)
INSERT NextId VALUES (1)
GO
CREATE TABLE RecoveredIds (Id INT NOT NULL PRIMARY KEY)
GO
Then, the triggers, two on each table:
CREATE TRIGGER tr_TableName_RecoverId ON TableName
FOR DELETE AS BEGIN
    IF @@ROWCOUNT = 0 RETURN
    INSERT RecoveredIds (Id) SELECT Id FROM deleted
END
GO
CREATE TRIGGER tr_TableName_AssignId ON TableName
INSTEAD OF INSERT AS BEGIN
    DECLARE @rowcount INT = @@ROWCOUNT
    IF @rowcount = 0 RETURN
    DECLARE @required INT = @rowcount
    DECLARE @new_ids TABLE (Id INT PRIMARY KEY)
    DELETE TOP (@required) OUTPUT DELETED.Id INTO @new_ids (Id) FROM RecoveredIds
    SET @rowcount = @@ROWCOUNT
    IF @rowcount < @required BEGIN
        DECLARE @output TABLE (Id INT)
        UPDATE NextId SET Id = Id + (@required-@rowcount)
        OUTPUT DELETED.Id INTO @output
        -- this assumes you have a numbers table around somewhere
        INSERT @new_ids (Id)
        SELECT n.Number+o.Id-1 FROM Numbers n, @output o
        WHERE n.Number BETWEEN 1 AND @required-@rowcount
    END
    SET IDENTITY_INSERT TableName ON
    ;WITH inserted_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM inserted)
    , new_ids_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM @new_ids)
    INSERT TableName (Id, Attr1, Attr2)
    SELECT n.Id, i.Attr1, i.Attr2
    FROM inserted_CTE i JOIN new_ids_CTE n ON i._no = n._no
    SET IDENTITY_INSERT TableName OFF
END
You could script the triggers out easily enough from system tables.
You would want to test this for concurrency. It should work as is, syntax errors notwithstanding: The OUTPUT clause guarantees atomicity of id lookup->increment as one step, and the entire operation occurs within a transaction, thanks to the trigger.
TableName.Id is still an identity column. All the common idioms like $IDENTITY and SCOPE_IDENTITY() will still work.
There is no central table of ids by table, but you could create one easily enough.

I don't have any help for finding the values not in use but if you really want to find them and set them yourself, you can use
set identity_insert <your table> on
in your code to do so.
I'm with everyone else though. Why bother? Don't you have a business problem to solve?
