Refactoring SQL while loop to regular inserts

I am inserting parent records and child records at the same time in a stored procedure.
Rather than have outside code make nested calls to create each parent and then each child of that parent (which is even slower than my current approach), I am giving the SQL a comma-separated list of child types, which I put into a temp table (#TempParentChildUpdateTable); each list is associated with a parent record in a table-valued parameter (@PenguinParentChildUpdate).
Then I loop through both, insert the parent and then insert all related children.
The problem is this while loop is not performing very well at 13 requests per second.
How do I make this faster? How do I take out the while loop?
Is there a way to do this with non-looping inserts? If so, can the string parsing happen inside the inner insert?
DECLARE @RowCnt int = 0;
DECLARE @CounterId int = 1;
SELECT @RowCnt = COUNT(*) FROM @PenguinParentChildUpdate;
WHILE @CounterId <= @RowCnt
BEGIN
    SELECT @GrandparentId = WorkflowInstanceId,
           @ParentType = ParentType,
           @ChildIds = ChildIds
    FROM @PenguinParentChildUpdate
    -- insert parent record
    INSERT INTO WorkflowInstanceRole (ParentType, GrandparentId)
    VALUES (@ParentType, @GrandparentId)
    SET @ParentId = SCOPE_IDENTITY()
    -- identify children
    -- convert comma separated list (e.g. 4,91,12,6) into separate rows
    INSERT INTO #TempParentChildUpdateTable (ChildId, ParentId)
    SELECT CONVERT(int, value) AS ChildId, @ParentId AS RoleId FROM STRING_SPLIT(@ChildIds, CHAR(44))
    -- insert children
    IF (@ChildIds IS NOT NULL AND LEN(LTRIM(RTRIM(@ChildIds))) > 0)
    BEGIN
        INSERT INTO dbo.PenguinParentChild
        (
            grandparentId,
            childid
        )
        SELECT
            @ParentId,
            childId
        FROM
            #TempParentChildUpdateTable tsru
        WHERE
            tsru.ParentId = @ParentId
    END
    SET @CounterId = @CounterId + 1
END

I am struggling to follow some of the logic in the loop, but I think the premise is:
1. Insert into WorkflowInstanceRole.
2. Capture the inserted records, then insert further children based on them.
Since step 2 requires data that is not in WorkFlowInstanceRole you need to use MERGE to add the new rows, rather than a standard insert. What this allows you to do is capture the ChildIds from the source table, even though you aren't inserting them. Something like this should do it (I've had to guess at some data types):
DECLARE @Output TABLE
(
    ParentId INT,
    ParentType INT,
    GrandParentId INT,
    ChildIDs VARCHAR(MAX)
);
MERGE INTO WorkflowInstanceRole AS t
USING @PenguinParentChildUpdate AS s
    ON 1 = 0 -- Will never be true so will always insert
WHEN NOT MATCHED THEN
    INSERT (ParentType, GrandparentId)
    VALUES (s.ParentType, s.WorkflowInstanceId)
OUTPUT inserted.ParentId, inserted.ParentType, inserted.GrandParentId, s.ChildIds
    INTO @Output (ParentId, ParentType, GrandParentId, ChildIDs);
INSERT INTO dbo.PenguinParentChild (GrandParentId, ChildId)
SELECT o.ParentId,
       CONVERT(int, ss.value)
FROM @Output AS o
CROSS APPLY STRING_SPLIT(o.ChildIDs, CHAR(44)) AS ss;
The key part is really the MERGE*: since the condition is 1 = 0, it will always insert. Unlike INSERT, the OUTPUT clause on a MERGE allows you to capture both the newly inserted identity value and the non-inserted ChildIds column.
This is output into a table variable, which you can then use, along with CROSS APPLY STRING_SPLIT(), to insert the child records.
There may be some data errors, and the logic may not be 100% perfect, but this should hopefully be a big step in the right direction.
*MERGE has a number of known bugs, and I'd generally advise to steer clear, but I am not aware of any alternative that would allow you to capture the newly inserted identity value, and the non-inserted ChildIds column, and as far as I am aware none of these bugs affect this operation (Anecdotally, in that I have never encountered an issue using this method).
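For anyone who wants to try the MERGE ... OUTPUT pattern in isolation, here is a minimal self-contained sketch. The temp tables #ParentDemo and #ChildDemo and the sample rows are hypothetical stand-ins for the real tables, and STRING_SPLIT assumes SQL Server 2016 or later (compatibility level 130+). It also skips empty list items, which roughly mirrors the NULL/empty check in the original loop.

```sql
-- Hypothetical stand-ins for WorkflowInstanceRole and PenguinParentChild.
CREATE TABLE #ParentDemo (ParentId INT IDENTITY(1,1) PRIMARY KEY, ParentType INT, GrandparentId INT);
CREATE TABLE #ChildDemo (ParentId INT, ChildId INT);

-- Hypothetical input rows shaped like the table-valued parameter.
DECLARE @Input TABLE (WorkflowInstanceId INT, ParentType INT, ChildIds VARCHAR(MAX));
INSERT INTO @Input (WorkflowInstanceId, ParentType, ChildIds)
VALUES (10, 1, '4,91,12,6'), (10, 2, NULL), (11, 1, '7');

DECLARE @Output TABLE (ParentId INT, ChildIds VARCHAR(MAX));

-- ON 1 = 0 never matches, so every source row is inserted; OUTPUT can
-- return the new identity value next to the source-only ChildIds column.
MERGE INTO #ParentDemo AS t
USING @Input AS s
    ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (ParentType, GrandparentId)
    VALUES (s.ParentType, s.WorkflowInstanceId)
OUTPUT inserted.ParentId, s.ChildIds
    INTO @Output (ParentId, ChildIds);

-- Split each list into child rows; a NULL list produces no rows at all.
INSERT INTO #ChildDemo (ParentId, ChildId)
SELECT o.ParentId, CONVERT(INT, LTRIM(RTRIM(ss.value)))
FROM @Output AS o
CROSS APPLY STRING_SPLIT(o.ChildIds, ',') AS ss
WHERE LTRIM(RTRIM(ss.value)) <> '';

SELECT ParentId, ChildId FROM #ChildDemo;   -- five child rows for this sample

DROP TABLE #ChildDemo;
DROP TABLE #ParentDemo;
```

The whole batch is two set-based statements regardless of how many parents are passed in, which is where the gain over the loop should come from.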

Related

Prevent circular reference in MS-SQL table

I have an Account table with ID and ParentAccountID. Here are the scripts to reproduce the steps.
If ParentAccountID is NULL, the account is considered a top-level account.
Every account should finally end with a top-level account, i.e. one whose ParentAccountID is NULL.
Declare @Accounts table (ID INT, ParentAccountID INT )
INSERT INTO @Accounts values (1,NULL), (2,1), (3,2) ,(4,3), (5,4), (6,5)
select * from @Accounts
-- Request to update ParentAccountID to 6 for the ID 3
update @Accounts
set ParentAccountID = 6
where ID = 3
-- Now the above update will cause circular reference
select * from @Accounts
When a request comes in to update the ParentAccountID of an account, I need to identify, before the update, whether it would cause a circular reference.
Any ideas, folks?
It seems you've got some business rules defined for your table:
Every chain must end with a top-level account
A chain may not have a circular reference
You have two ways to enforce this.
You can create a trigger in your database, and check the logic in the trigger. This has the benefit of running inside the database, so it applies to every transaction, regardless of the client. However, database triggers are not always popular. I see them as a side effect, and they can be hard to debug. Triggers run as part of your SQL, so if they are slow, your SQL will be slow.
The alternative is to enforce this logic in the application layer - whatever is talking to your database. This is easier to debug, and makes your business logic explicit to new developers - but it doesn't run inside the database, so you could end up replicating the logic if you have multiple client applications.
Here is an example that you could use as a basis to implement a database constraint that should prevent circular references in singular row updates; I don't believe this will work to prevent a circular reference if multiple rows are updated.
/*
ALTER TABLE dbo.Test DROP CONSTRAINT chkTest_PreventCircularRef
GO
DROP FUNCTION dbo.Test_PreventCircularRef
GO
DROP TABLE dbo.Test
GO
*/
CREATE TABLE dbo.Test (TestID INT PRIMARY KEY,TestID_Parent INT)
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 1 AS TestID,NULL AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 2 AS TestID,1 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 3 AS TestID,2 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 4 AS TestID,3 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 5 AS TestID,4 AS TestID_Parent
GO
GO
CREATE FUNCTION dbo.Test_PreventCircularRef (@TestID INT,@TestID_Parent INT)
RETURNS INT
BEGIN
    --FOR TESTING:
    --SELECT * FROM dbo.Test;DECLARE @TestID INT=3,@TestID_Parent INT=4
    DECLARE @ParentID INT=@TestID
    DECLARE @ChildID INT=NULL
    DECLARE @RetVal INT=0
    DECLARE @Ancestors TABLE(TestID INT)
    DECLARE @Descendants TABLE(TestID INT)
    --Get all descendants
    INSERT INTO @Descendants(TestID) SELECT TestID FROM dbo.Test WHERE TestID_Parent=@TestID
    WHILE (@@ROWCOUNT>0)
    BEGIN
        INSERT INTO @Descendants(TestID)
        SELECT t1.TestID
        FROM dbo.Test t1
        LEFT JOIN @Descendants relID ON relID.TestID=t1.TestID
        WHERE relID.TestID IS NULL
        AND t1.TestID_Parent IN (SELECT TestID FROM @Descendants)
    END
    --Get all ancestors
    --INSERT INTO @Ancestors(TestID) SELECT TestID_Parent FROM dbo.Test WHERE TestID=@TestID
    --WHILE (@@ROWCOUNT>0)
    --BEGIN
    -- INSERT INTO @Ancestors(TestID)
    -- SELECT t1.TestID_Parent
    -- FROM dbo.Test t1
    -- LEFT JOIN @Ancestors relID ON relID.TestID=t1.TestID_Parent
    -- WHERE relID.TestID IS NULL
    -- AND t1.TestID_Parent IS NOT NULL
    -- AND t1.TestID IN (SELECT TestID FROM @Ancestors)
    --END
    --FOR TESTING:
    --SELECT TestID AS [Ancestors] FROM @Ancestors;SELECT TestID AS [Descendants] FROM @Descendants;
    IF EXISTS (
        SELECT *
        FROM @Descendants
        WHERE TestID=@TestID_Parent
    )
    BEGIN
        SET @RetVal=1
    END
    RETURN @RetVal
END
GO
ALTER TABLE dbo.Test
ADD CONSTRAINT chkTest_PreventCircularRef
CHECK (dbo.Test_PreventCircularRef(TestID,TestID_Parent) = 0);
GO
SELECT * FROM dbo.Test
--This is problematic as it creates a circular reference between TestID 3 and 4; it is now prevented
UPDATE dbo.Test SET TestID_Parent=4 WHERE TestID=3
Dealing with self-referencing tables / recursive relationships in SQL is not simple. I suppose this is evidenced by the fact that multiple people can't get their heads around the problem with just checking for single-depth cycles.
To enforce this with table constraints, you would need a check constraint based on a recursive query. At best that's DBMS-specific support, and it may not perform well if it has to be run on every update.
My advice is to have the code containing the UPDATE statement enforce this. That could take a couple of forms. In any case if it needs to be strictly enforced it may require limiting UPDATE access into the table to a service account used by a stored proc or external service.
Using a stored procedure would be very similar to a CHECK constraint, except that you could use procedural (iterative) logic to look for cycles before doing the update. It has become unpopular to put too much logic in stored procs, though, and whether this type of check should be done is a judgement call from team to team / organization to organization.
Likewise using a service-based approach would let you use procedural logic to look for cycles, and you could write it in a language better suited to such logic. The issue here is, if services aren't part of your architecture then it's a bit heavy-weight to introduce a whole new layer. But, a service layer is probably considered more modern/popular (at the moment at least) than funneling updates through stored procs.
With those approaches in mind - and understanding that both procedural and recursive syntax in databases is DBMS-specific - there are too many possible syntax options to really go into. But the idea is:
Examine the proposed parent.
Check its parent recursively.
Do you ever reach the proposed child before reaching a top-level account? If not, allow the update.
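In SQL Server specifically, those steps could be sketched with a recursive CTE that runs before the update. The table variable and the two variable names below are illustrative (they mirror the question's sample data), and the sketch assumes the existing rows contain no cycle; otherwise the recursion would hit the default MAXRECURSION limit of 100 and raise an error.

```sql
-- Illustrative data, copied from the question's example.
DECLARE @Accounts TABLE (ID INT PRIMARY KEY, ParentAccountID INT NULL);
INSERT INTO @Accounts (ID, ParentAccountID)
VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5);

DECLARE @AccountID INT = 3;          -- account whose parent would change
DECLARE @ProposedParentID INT = 6;   -- proposed new parent

-- Start at the proposed parent and walk upward until a top-level account.
WITH Ancestors AS
(
    SELECT a.ID, a.ParentAccountID
    FROM @Accounts AS a
    WHERE a.ID = @ProposedParentID
    UNION ALL
    SELECT p.ID, p.ParentAccountID
    FROM @Accounts AS p
    INNER JOIN Ancestors AS c ON p.ID = c.ParentAccountID
)
-- If the account being re-parented shows up in that chain, the update
-- would close a loop.
SELECT CASE WHEN EXISTS (SELECT 1 FROM Ancestors WHERE ID = @AccountID)
            THEN 'CircularReference'
            ELSE 'OK'
       END AS ResponseCode;   -- 'CircularReference' for this data
```

Because the check runs before the UPDATE, nothing has to be rolled back when a cycle is found; to be safe under concurrency, the check and the update would still need to run in one transaction with appropriate locking.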
Finally, after some failures, I have created this script; it's working fine for me.
-- To hold the Account table data
Declare @Accounts table (ID INT, ParentAccountID INT)
-- To be updated
Declare @AccountID int = 4;
Declare @ParentAccountID int = 7;
Declare @NextParentAccountID INT = @ParentAccountID
Declare @IsCircular int = 0
INSERT INTO @Accounts values (1, NULL), (2,1), (3,1) ,(4,3), (5,4), (6,5), (7,6), (8,7)
-- No circular reference value
--Select * from @Accounts
-- Request to update ParentAccountID to 7 for the Account ID 4
update @Accounts
set ParentAccountID = @ParentAccountID
where ID = @AccountID
Select * from @Accounts
WHILE(1=1)
BEGIN
    -- Take the ParentAccountID for @NextParentAccountID
    SELECT @NextParentAccountID = ParentAccountID from @Accounts WHERE ID = @NextParentAccountID
    -- If @NextParentAccountID is NULL, we have reached the top-level account: no circular reference, so break the loop
    IF (@NextParentAccountID IS NULL)
    BEGIN
        BREAK;
    END
    -- If @NextParentAccountID is equal to @AccountID (the account that was updated), the update created a circular reference
    -- Then set @IsCircular to 1 and break the loop
    IF (@NextParentAccountID = @AccountID )
    BEGIN
        SET @IsCircular = 1
        BREAK
    END
END
IF @IsCircular = 1
BEGIN
    select 'CircularReference' as 'ResponseCode'
END

Generate Hierarchy value automatically

I have a tree table with columns ID, ParentID and Hierarchy, and I want to generate the Hierarchy column value from ParentID. For this purpose I use triggers. Is there a better way to generate the Hierarchy column value?
ALTER TRIGGER [TR_MyTable_BeforInsert] ON [MyTable]
INSTEAD OF INSERT
AS BEGIN
    SET NOCOUNT ON;
    Declare @Name NVarChar(100),
            @ParentID Int
    Declare DACategory Cursor For
        Select A.Name, A.ParentID
        From Inserted A
    OPEN DACategory
    FETCH NEXT FROM DACategory INTO @Name, @ParentID
    While @@FETCH_STATUS=0 Begin
        Insert Into MyTable (Name, ParentID, Hierarchy)
        Values (@Name, @ParentID, dbo.F_MyTableGetHID(NULL, @ParentID))
        FETCH NEXT FROM DACategory INTO @Name, @ParentID
    End
    Close DACategory
    Deallocate DACategory
END
Function:
ALTER FUNCTION [F_MyTableGetHID]
(
    @ID int,
    @ParentID int
)
RETURNS HierarchyID
AS BEGIN
    Declare @RootHID HierarchyID,
            @LastHID HierarchyID
    IF (@ParentID IS NULL) Begin
        Set @RootHID = HierarchyID::GetRoot()
        Select @LastHID = Max(Hierarchy) From MyTable Where ParentID IS NULL
    End Else Begin
        Select @RootHID = Hierarchy From MyTable Where ID = @ParentID
        Select @LastHID = Max(Hierarchy) From MyTable Where ParentID = @ParentID
    End
    return @RootHID.GetDescendant(@LastHID, NULL)
END
For updates, this table also has a trigger that sets the Hierarchy column again when ParentID changes.
What are the best practices for this problem?
EDIT 1: I am looking for a solution that does not use triggers, if possible.
I have a different approach that answers both questions. I generally avoid triggers unless they are the last resort, as they add unnecessary overhead to the database.
Comparison between triggers and stored procedures:
It is easy to view table relationships, constraints, indexes and stored procedures in a database, but triggers are difficult to view.
Triggers execute invisibly to the client application. They are not visible and cannot be traced when debugging code.
It is easy to forget about triggers, and without documentation it will be difficult for new developers to discover that they exist.
Triggers run every time the database fields are updated, which is overhead on the system and makes it run slower.
Enough said; this is why I prefer stored procs. You can create a job (for example, one that executes every 30 minutes, or at any other interval) via SQL Server Agent, and put the insertion logic in that job. That way the data in your tree table would be near real time.
References for creating an Agent job:
http://msdn.microsoft.com/en-us/library/ms191128(v=sql.90).aspx
http://msdn.microsoft.com/en-us/library/ms181153(v=sql.105).aspx
You asked for the best practice.
The best practice is not to use the adjacency list model (which is what you have) and instead switch to the nested set model.
It's more difficult to code and understand, which is why it's not as popular, but it's much more flexible.
You should use a trigger without cursors in this context:
ALTER TRIGGER [TR_MyTable_BeforInsert] ON [MyTable]
INSTEAD OF INSERT
AS
BEGIN
SET NOCOUNT ON;
Insert Into MyTable (Name, ParentID, Hierarchy)
Select Name, ParentID, dbo.F_MyTableGetHID(NULL, ParentID)
From inserted
END
You can create a function that calculates the hierarchyid value, given the foreign-key to the parent. Then you can call this function as the default value calculator for the column. If you prevent the users from inserting into this column, the default value will always apply.
This solution will work, only if the parent-child relationship is not updatable.
There is no need for a cursor;
just use:
Insert Into MyTable (Name, ParentID, Hierarchy)
select Name, ParentID, dbo.F_MyTableGetHID(NULL, ParentID)
from inserted
Can you explain the function? I'm not sure what it's doing. It looks like it could be turned into a query and therefore combined with the above. Best practice is always to use set-based logic on relational databases.
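For readers who are likewise unsure what the function computes: it looks up the parent's hierarchyid and the largest existing child value, then asks GetDescendant for the next sibling after that child. A minimal sketch of just those built-in hierarchyid methods (the variable names are illustrative and no table is involved):

```sql
-- The root of the tree.
DECLARE @root HIERARCHYID = HIERARCHYID::GetRoot();

-- No existing children: GetDescendant(NULL, NULL) yields the first child.
DECLARE @firstChild HIERARCHYID = @root.GetDescendant(NULL, NULL);

-- Passing the current last child yields the next sibling after it, which is
-- what the question's function does with Max(Hierarchy).
DECLARE @secondChild HIERARCHYID = @root.GetDescendant(@firstChild, NULL);

-- The same call on a non-root parent creates a node one level deeper.
DECLARE @grandChild HIERARCHYID = @firstChild.GetDescendant(NULL, NULL);

SELECT @root.ToString()        AS RootPath,        -- /
       @firstChild.ToString()  AS FirstChildPath,  -- /1/
       @secondChild.ToString() AS SecondChildPath, -- /2/
       @grandChild.ToString()  AS GrandChildPath;  -- /1/1/
```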

Does anyone know a neat trick for reusing identity values?

Typically when you specify an identity column you get a convenient interface in SQL Server for asking for a particular row.
SELECT * FROM TableName WHERE $IDENTITY = @pID
You don't really need to concern yourself with the name of the identity column because there can only be one.
But what if I have a table which mostly consists of temporary data. Lots of inserts and lots of deletes. Is there a simple way for me to reuse the identity values.
Preferably I would want to be able to write a function that would return say NEXT_SMALLEST($IDENTITY) as next identity value and do so in a fail-safe manner.
Basically find the smallest value that's not in use. That's not entirely trivial to do, but what I want is to be able to tell SQL Server that this is my function that will generate the identity values. But what I know is that no such function exists...
I want to...
Implement global data base IDs, I need to provide a default value that I'm in control of.
My idea was based around that I should be able to have a table with all known IDs and then every row ID from some other table that needed a global ID would reference that table. The default value would be provided by something like
INSERT INTO GlobalID
RETURN SCOPE_IDENTITY()
No; it's not unique if it can be reused.
Why do you want to re-use them? Why do you concern yourself with this field? If you want to be in control of it, don't make it an identity; create your own scheme and use that.
Don't reuse identities; you'll just shoot yourself in the foot. Use a large enough type that it never rolls over (64-bit bigint).
To find missing gaps in a sequence of numbers join the table against itself with a +/- 1 difference:
SELECT a.id
FROM table AS a
LEFT OUTER JOIN table AS b ON a.id = b.id+1
WHERE b.id IS NULL;
This query will find the numbers in the id sequence for which id-1 is not in the table, i.e. the numbers that start a contiguous run. You can then use SET IDENTITY_INSERT ... ON to insert a specific id and reuse a number. The cost of doing so is overwhelming (both in runtime and code complexity) compared with an ordinary identity-based insert.
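To make the cost concrete, here is a rough sketch of reusing one gap. The table dbo.IdentityGapDemo and its payload column are made up for the illustration; the lookup finds the first free id after an existing row (so it would not notice a gap below the smallest id), and nothing here is safe against concurrent inserts without extra locking.

```sql
-- Hypothetical demo table.
CREATE TABLE dbo.IdentityGapDemo (id INT IDENTITY(1,1) PRIMARY KEY, payload VARCHAR(20));
INSERT INTO dbo.IdentityGapDemo (payload) VALUES ('a'), ('b'), ('c'), ('d');
DELETE FROM dbo.IdentityGapDemo WHERE id = 2;   -- leaves ids 1, 3, 4

-- Smallest id whose successor is missing, plus one = first reusable value.
DECLARE @freeId INT;
SELECT @freeId = MIN(a.id) + 1
FROM dbo.IdentityGapDemo AS a
LEFT OUTER JOIN dbo.IdentityGapDemo AS b ON b.id = a.id + 1
WHERE b.id IS NULL;                              -- 2 for this data

-- Explicit identity values are only accepted while IDENTITY_INSERT is ON,
-- and the column list must be spelled out.
SET IDENTITY_INSERT dbo.IdentityGapDemo ON;
INSERT INTO dbo.IdentityGapDemo (id, payload) VALUES (@freeId, 'reused');
SET IDENTITY_INSERT dbo.IdentityGapDemo OFF;

SELECT id, payload FROM dbo.IdentityGapDemo ORDER BY id;

DROP TABLE dbo.IdentityGapDemo;
```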
If you really want to reset Identity value to the lowest,
here is the trick you can use through DBCC CHECKIDENT
Basically following sql statements resets identity value so that identity value restarts from the lowest possible number
create table TT (id int identity(1, 1))
GO
insert TT default values
GO 10
select * from TT
GO
delete TT where id between 5 and 10
GO
--; At this point, next ID will be 11, not 5
select * from TT
GO
insert TT default values
GO
--; as you can see here, next ID is indeed 11
select * from TT
GO
--; Now delete ID = 11
--; so that we can reseed next highest ID to 5
delete TT where id = 11
GO
--; Now, let's reseed identity value to the lowest possible identity number
declare @seedID int
select @seedID = max(id) from TT
print @seedID --; 4
--; We reseed identity column with "DBCC CheckIdent" and pass a new seed value
--; But we can't pass a seed number as argument, so let's use dynamic sql.
declare @sql nvarchar(200)
set @sql = 'dbcc checkident(TT, reseed, ' + cast(@seedID as varchar) + ')'
exec sp_sqlexec @sql
GO
--; Now the next
insert TT default values
GO
--; as you can see here, next ID is indeed 5
select * from TT
GO
I guess we would really need to know why you want to reuse your identity column. The only reason I can think of is because of the temporary nature of your data you might exhaust the possible values for the identity. That is not really likely, but if that is your concern, you can use uniqueidentifiers (guids) as the primary key in your table instead.
The function newid() will create a new guid and can be used in insert statements (or other statements). Then when you delete the row, you don't have any "holes" in your key because guids are not created in that order anyway.
[Syntax assumes SQL2008....]
Yes, it's possible. You need two management tables, and two triggers on each participating table.
First, the management tables:
-- this table should only ever have one row
CREATE TABLE NextId (Id INT)
INSERT NextId VALUES (1)
GO
CREATE TABLE RecoveredIds (Id INT NOT NULL PRIMARY KEY)
GO
Then, the triggers, two on each table:
CREATE TRIGGER tr_TableName_RecoverId ON TableName
FOR DELETE AS BEGIN
    IF @@ROWCOUNT = 0 RETURN
    INSERT RecoveredIds (Id) SELECT Id FROM deleted
END
GO
CREATE TRIGGER tr_TableName_AssignId ON TableName
INSTEAD OF INSERT AS BEGIN
    DECLARE @rowcount INT = @@ROWCOUNT
    IF @rowcount = 0 RETURN
    DECLARE @required INT = @rowcount
    DECLARE @new_ids TABLE (Id INT PRIMARY KEY)
    DELETE TOP (@required) OUTPUT DELETED.Id INTO @new_ids (Id) FROM RecoveredIds
    SET @rowcount = @@ROWCOUNT
    IF @rowcount < @required BEGIN
        DECLARE @output TABLE (Id INT)
        UPDATE NextId SET Id = Id + (@required-@rowcount)
        OUTPUT DELETED.Id INTO @output
        -- this assumes you have a numbers table around somewhere
        INSERT @new_ids (Id)
        SELECT n.Number+o.Id-1 FROM Numbers n, @output o
        WHERE n.Number BETWEEN 1 AND @required-@rowcount
    END
    SET IDENTITY_INSERT TableName ON
    ;WITH inserted_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM inserted)
    , new_ids_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM @new_ids)
    INSERT TableName (Id, Attr1, Attr2)
    SELECT n.Id, i.Attr1, i.Attr2
    FROM inserted_CTE i JOIN new_ids_CTE n ON i._no = n._no
    SET IDENTITY_INSERT TableName OFF
END
You could script the triggers out easily enough from system tables.
You would want to test this for concurrency. It should work as is, syntax errors notwithstanding: The OUTPUT clause guarantees atomicity of id lookup->increment as one step, and the entire operation occurs within a transaction, thanks to the trigger.
TableName.Id is still an identity column. All the common idioms like $IDENTITY and SCOPE_IDENTITY() will still work.
There is no central table of ids by table, but you could create one easily enough.
I don't have any help for finding the values not in use but if you really want to find them and set them yourself, you can use
set identity_insert on ....
in your code to do so.
I'm with everyone else though. Why bother? Don't you have a business problem to solve?

Possible to implement a manual increment with just simple SQL INSERT?

I have a primary key that I don't want to auto increment (for various reasons) and so I'm looking for a way to simply increment that field when I INSERT. By simply, I mean without stored procedures and without triggers, so just a series of SQL commands (preferably one command).
Here is what I have tried thus far:
BEGIN TRAN
INSERT INTO Table1(id, data_field)
VALUES ( (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]');
COMMIT TRAN;
* Data abstracted to use generic names and identifiers
However, when executed, the command errors, saying that
"Subqueries are not allowed in this context. Only scalar expressions are allowed"
So, how can I do this/what am I doing wrong?
EDIT: Since it was pointed out as a consideration, the table to be inserted into is guaranteed to have at least 1 row already.
You understand that you will have collisions right?
you need to do something like this and this might cause deadlocks so be very sure what you are trying to accomplish here
DECLARE @id int
BEGIN TRAN
SELECT @id = MAX(id) + 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
INSERT INTO Table1(id, data_field)
VALUES (@id ,'[blob of data]')
COMMIT TRAN
To explain the collision thing, I have provided some code
first create this table and insert one row
CREATE TABLE Table1(id int primary key not null, data_field char(100))
GO
Insert Table1 values(1,'[blob of data]')
Go
Now open up two query windows and run this at the same time
declare @i int
set @i = 1
while @i < 10000
begin
    BEGIN TRAN
    INSERT INTO Table1(id, data_field)
    SELECT MAX(id) + 1, '[blob of data]' FROM Table1
    COMMIT TRAN;
    set @i = @i + 1
end
You will see a bunch of these
Server: Msg 2627, Level 14, State 1, Line 7
Violation of PRIMARY KEY constraint 'PK__Table1__3213E83F2962141D'. Cannot insert duplicate key in object 'dbo.Table1'.
The statement has been terminated.
Try this instead:
INSERT INTO Table1 (id, data_field)
SELECT id, '[blob of data]' FROM (SELECT MAX(id) + 1 as id FROM Table1) tbl
I wouldn't recommend doing it that way for any number of reasons though (performance, transaction safety, etc)
It could be because there are no records so the sub query is returning NULL...try
INSERT INTO tblTest(RecordID, Text)
VALUES ((SELECT ISNULL(MAX(RecordID), 0) + 1 FROM tblTest), 'asdf')
I don't know if somebody is still looking for an answer but here is a solution that seems to work:
-- Preparation: execute only once
CREATE TABLE Test (Value int)
CREATE TABLE Lock (LockID uniqueidentifier)
INSERT INTO Lock SELECT NEWID()
-- Real insert
BEGIN TRAN LockTran
-- Lock an object to block simultaneous calls.
UPDATE Lock WITH(TABLOCK)
SET LockID = LockID
INSERT INTO Test
SELECT ISNULL(MAX(T.Value), 0) + 1
FROM Test T
COMMIT TRAN LockTran
We have a similar situation where we needed to increment and could not have gaps in the numbers. (If you use an identity value and a transaction is rolled back, that number will not be inserted and you will have gaps because the identity value does not roll back.)
We created a separate table for last number used and seeded it with 0.
Our insert takes a few steps.
--increment the number
Update dbo.NumberTable
set number = number + 1
--find out what the incremented number is
select @number = number
from dbo.NumberTable
--use the number
insert into dbo.MyTable using the #number
commit or rollback
This causes simultaneous transactions to process in a single line as each concurrent transaction will wait because the NumberTable is locked. As soon as the waiting transaction gets the lock, it increments the current value and locks it from others. That current value is the last number used and if a transaction is rolled back, the NumberTable update is also rolled back so there are no gaps.
Hope that helps.
Another way to cause single file execution is to use a SQL application lock. We have used that approach for longer running processes like synchronizing data between systems so only one synchronizing process can run at a time.
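A rough sketch of that application-lock approach, applied to the question's Table1: sp_getapplock and its parameters are the documented system procedure, while the resource name 'Table1_NextId' and the five-second timeout are arbitrary choices made for this illustration.

```sql
DECLARE @lockResult INT, @nextId INT;

BEGIN TRAN;

-- Ask for an exclusive, transaction-owned lock on a named resource.
-- Every session that requests the same resource name queues up here.
EXEC @lockResult = sp_getapplock
     @Resource    = 'Table1_NextId',   -- hypothetical resource name
     @LockMode    = 'Exclusive',
     @LockOwner   = 'Transaction',
     @LockTimeout = 5000;              -- milliseconds

IF @lockResult >= 0                    -- 0 or 1 means the lock was granted
BEGIN
    SELECT @nextId = ISNULL(MAX(id), 0) + 1 FROM Table1;
    INSERT INTO Table1 (id, data_field) VALUES (@nextId, '[blob of data]');
    COMMIT TRAN;                       -- ending the transaction releases the lock
END
ELSE
BEGIN
    ROLLBACK TRAN;
    RAISERROR('Could not acquire the application lock.', 16, 1);
END
```

This only serializes callers that go through the same lock; any code path that inserts into Table1 without taking it can still collide.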
If you're doing it in a trigger, you could make sure it's an "INSTEAD OF" trigger and do it in a couple of statements:
DECLARE @next INT
SET @next = (SELECT (MAX(id) + 1) FROM Table1)
INSERT INTO Table1
SELECT @next, i.datablob FROM inserted AS i
The only thing you'd have to be careful about is concurrency - if two rows are inserted at the same time, they could attempt to use the same value for #next, causing a conflict.
Does this accomplish what you want?
It seems very odd to do this sort of thing w/o an IDENTITY (auto-increment) column, making me question the architecture itself. I mean, seriously, this is the perfect situation for an IDENTITY column. It might help us answer your question if you'd explain the reasoning behind this decision. =)
Having said that, some options are:
using an INSTEAD OF trigger for this purpose. So, you'd do your INSERT (the INSERT statement would not need to pass in an ID). The trigger code would handle inserting the appropriate ID. You'd need to use the WITH (UPDLOCK, HOLDLOCK) syntax used by another answerer to hold the lock for the duration of the trigger (which is implicitly wrapped in a transaction) & to elevate the lock type from "shared" to "update" lock (IIRC).
you can use the idea above, but have a table whose purpose is to store the last, max value inserted into the table. So, once the table is set up, you would no longer have to do a SELECT MAX(ID) every time. You'd simply increment the value in the table. This is safe provided that you use appropriate locking (as discussed). Again, that avoids repeated table scans every time you INSERT.
use GUIDs instead of IDs. It's much easier to merge tables across databases, since the GUIDs will always be unique (whereas records across databases will have conflicting integer IDs). To avoid page splitting, sequential GUIDs can be used. This is only beneficial if you might need to do database merging.
Use a stored proc in lieu of the trigger approach (since triggers are to be avoided, for some reason). You'd still have the locking issue (and the performance problems that can arise). But sprocs are preferred over dynamic SQL (in the context of applications), and are often much more performant.
Sorry about rambling. Hope that helps.
How about creating a separate table to maintain the counter? It performs better than MAX(id), as it will be O(1). MAX(id) is at best O(log n), depending on the implementation.
And then when you need to insert, simply lock the counter table for reading the counter and increment the counter. Then you can release the lock and insert to your table with the incremented counter value.
Have a separate table where you keep your latest ID and for every transaction get a new one.
It may be a bit slower but it should work.
DECLARE @NEWID INT
BEGIN TRAN
UPDATE TABLE SET ID=ID+1
SELECT @NEWID=ID FROM TABLE
COMMIT TRAN
PRINT @NEWID -- Do what you want with your new ID
Code without any transaction scope (I use it in my engineering course as an exercise):
-- Preparation: execute only once
CREATE TABLE increment (val int);
INSERT INTO increment VALUES (1);
-- Real insert
DECLARE @newIncrement INT;
UPDATE increment
SET @newIncrement = val,
    val = val + 1;
INSERT INTO Table1 (id, data_field)
SELECT @newIncrement, 'some data';
declare @nextId int
set @nextId = (select MAX(id)+1 from Table1)
insert into Table1(id, data_field) values (@nextId, '[blob of data]')
commit;
But perhaps a better approach would be using a scalar function getNextId('table1')
Any critiques of this? Works for me.
DECLARE @m_NewRequestID INT
      , @m_IsError BIT = 1
      , @m_CatchEndless INT = 0
WHILE @m_IsError = 1
BEGIN TRY
    SELECT @m_NewRequestID = (SELECT ISNULL(MAX(RequestID), 0) + 1 FROM Requests);
    INSERT INTO Requests ( RequestID
                         , RequestName
                         , Customer
                         , Comment
                         , CreatedFromApplication)
    SELECT RequestID = @m_NewRequestID
         , RequestName = dbo.ufGetNextAvailableRequestName(PatternName)
         , Customer = @Customer
         , Comment = [Description]
         , CreatedFromApplication = @CreatedFromApplication
    FROM RequestPatterns
    WHERE PatternID = @PatternID;
    SET @m_IsError = 0;
END TRY
BEGIN CATCH
    SET @m_IsError = 1;
    SET @m_CatchEndless = @m_CatchEndless + 1;
    IF @m_CatchEndless > 1000
        THROW 51000, '[upCreateRequestFromPattern]: Unable to get new RequestID', 1;
END CATCH
This should work:
INSERT INTO Table1 (id, data_field)
SELECT (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]';
Or this (substitute LIMIT for other platforms):
INSERT INTO Table1 (id, data_field)
SELECT TOP 1
    id + 1, '[blob of data]'
FROM
    Table1
ORDER BY
    [id] DESC;

Sql Server INSERT scope problem

This may have been asked before, but it's really hard to search for terms that limit the search results...
Take the following SQL snippet:
declare @source table (id int)
declare @target table(id int primary key, sourceId int)
set nocount on
insert into @target values (0,0)
insert into @source(id) values(1)
--insert into @source(id) values(2)
set nocount off
insert into @target select (select max(id)+1 from @target), s.id from @source s
select * from @target
This obviously executes without error, but now uncomment the second insert line and the following error occurs:
Msg 2627, Level 14, State 1, Line 15
Violation of PRIMARY KEY constraint 'PK__#7DB3CB72__7EA7EFAB'. Cannot insert duplicate key in object 'dbo.@target'.
I realise that the insert statement more than likely is effected against a snapshot of the #target table so (select max(id)+1 from #target) will always return a value of 1 - causing the violation error above...
Is there any way around this apart from resorting to a cursor?
Change your insert statement to the following:
insert into @target
select (select max(id) from @target) + (ROW_NUMBER() OVER(ORDER BY s.id)), s.id
from @source s
This should work for this specific case but I would be careful about generalizing it.
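One reason to be careful is the empty-table case: with no rows in the target, MAX(id) is NULL and every computed key becomes NULL. A small variant guarding against that with ISNULL, using the question's table variables but starting from an empty target (a sketch only, and still not safe if other sessions insert into a real table at the same time):

```sql
DECLARE @source TABLE (id INT);
DECLARE @target TABLE (id INT PRIMARY KEY, sourceId INT);

INSERT INTO @source (id) VALUES (1), (2);

-- MAX(id) is read once for the whole statement, so the per-row offset
-- has to come from ROW_NUMBER(); ISNULL covers the empty target.
INSERT INTO @target (id, sourceId)
SELECT ISNULL((SELECT MAX(id) FROM @target), 0) + ROW_NUMBER() OVER (ORDER BY s.id),
       s.id
FROM @source AS s;

SELECT id, sourceId FROM @target ORDER BY id;   -- (1, 1) and (2, 2)
```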
You could use an identity column (that's exactly what they are meant for)
declare @target table(id int IDENTITY(1,1), sourceId int)
If your problem is that the select clause is "computed" before the insert is executed, there's afaik no way around this using a single SQL request
I think it's by design ; For your insertion to avoid duplicates, the index id must be computed during the insert, not during the select. This is the exact purpose of the IDENTITY keyword.
If you want to insert one select at a time, you must write separate requests (using cursors for example, but you'll lose atomicity, and will have to use proper locking keywords to avoid race conditions)
The way you're determining your new PK value, is a race condition waiting to happen.
If your DB is under high load, and multiple records are being inserted at the same time, you're going to get unexpected results.
Why don't you just use an identity column, and let the database handle the assignment of a new primary ID?
Or, you can create some kind of meta-table, which holds a record for every table in your database, and this record contains the next value that should be used as a primary id in the table.
Then, you must make sure that every time you create a new record, you also update the next-value in your meta-table (and you should make sure that you do the appropriate locking), but, I see no added value in this approach vs making use of identity columns.