Constrain Sum(Column) to 1 by some group ID - sql

I have a table in which I'm trying to make sure that the values for each group add up to 1 (it's a mixture).
I want to constrain it so that the whole group with FKID = 2 fails, because its values add up to 1.1.
Currently my constraint is:
CREATE FUNCTION [dbo].[CheckSumTarget](@ID bigint)
RETURNS bit
AS BEGIN
DECLARE @Res BIT
SELECT @Res = Count(1)
FROM dbo.Test AS t
WHERE t.FKID = @ID
GROUP BY t.FKID
HAVING Sum(t.[Value])<>1
RETURN @Res
END
GO
ALTER TABLE dbo.Test WITH CHECK ADD CONSTRAINT [CK_Target_Sum] CHECK (([dbo].[CheckSumTarget]([FKID])<>(1)))
But it fails on the first insert, because the group doesn't add up to 1 yet. I was hoping that adding all the rows at once would avoid that.

This approach seems fraught with problems.
I would suggest another approach, starting with two tables:
aggregates (so "fkid" should really be aggregate_id)
components
Then, in aggregates, accumulate the sum() of the component values in a sum_value column using a trigger, and maintain another flag that is computed:
alter table aggregates add is_valid as (case when sum_value = 1.0 then 1 else 0 end)
Then, create views on the two tables to only show records where is_valid = 1. For instance:
create view v_aggregates as
select c.*
from aggregates a join
components c
on a.aggregate_id = c.aggregate_id
where a.is_valid = 1;
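The answer doesn't show the trigger itself, so here is a minimal sketch under stated assumptions: the column types, the `component_id` key and the trigger name `trg_components_sum` are hypothetical. The trigger recomputes `sum_value` for every aggregate touched by a statement, so multi-row inserts, updates and deletes are all covered, and `is_valid` follows automatically:

```sql
CREATE TABLE dbo.aggregates (
    aggregate_id int PRIMARY KEY,
    sum_value    decimal(9, 4) NOT NULL DEFAULT 0,   -- assumed type
    -- a bare comparison is not a valid column expression in T-SQL, so map it to 1/0
    is_valid AS (CASE WHEN sum_value = 1.0 THEN 1 ELSE 0 END)
);
GO
CREATE TABLE dbo.components (
    component_id int IDENTITY(1, 1) PRIMARY KEY,     -- hypothetical surrogate key
    aggregate_id int NOT NULL REFERENCES dbo.aggregates (aggregate_id),
    value        decimal(9, 4) NOT NULL
);
GO
CREATE TRIGGER dbo.trg_components_sum                -- hypothetical name
ON dbo.components
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- Recompute the total for every aggregate affected by this statement.
    UPDATE a
    SET sum_value = COALESCE((SELECT SUM(c.value)
                              FROM dbo.components AS c
                              WHERE c.aggregate_id = a.aggregate_id), 0)
    FROM dbo.aggregates AS a
    WHERE a.aggregate_id IN (SELECT aggregate_id FROM inserted
                             UNION
                             SELECT aggregate_id FROM deleted);
END
GO

INSERT dbo.aggregates (aggregate_id) VALUES (1), (2);
INSERT dbo.components (aggregate_id, value)
VALUES (1, 0.50), (1, 0.50), (2, 0.50), (2, 0.60);
SELECT aggregate_id, sum_value, is_valid FROM dbo.aggregates;
```

Because the total is recomputed from the component rows each time rather than incremented, it cannot drift out of sync, at the cost of one aggregate query per statement.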

Here is a working version of the solution.
Here is the table DDL:
create table dbo.test(
id int,
fkid bigint,
value decimal(4,2)
);
The function definition:
CREATE FUNCTION [dbo].[CheckSumTarget](@ID bigint)
RETURNS bit
AS BEGIN
DECLARE @Res bit
SELECT @Res = case when sum(value) > 1 then 1 else 0 end
FROM dbo.Test AS t
WHERE t.FKID = @ID
RETURN @Res
END
GO
And the constraint definition:
ALTER TABLE dbo.Test WITH CHECK ADD CONSTRAINT [CK_Target_Sum] CHECK ([dbo].[CheckSumTarget]([FKID]) <> 1)
With your example data:
insert into dbo.test values (1, 2, 0.5);
insert into dbo.test values (1, 2, 0.4);
-- The following insert will fail, like you expect
insert into dbo.test values (1, 2, 0.2);
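Note: this solution only rejects totals above 1; it cannot require a group to add up to exactly 1. It can also be bypassed by an UPDATE statement (as pointed out by 'Daniel Brughera'), because the constraint only references FKID; that is known behaviour. A better and more common approach is to use a trigger. You may want to explore that.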
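Following the suggestion to use a trigger, here is a minimal sketch of one, assuming the `dbo.test` table above without the `CK_Target_Sum` constraint (the DDL is repeated so the script runs on its own). The trigger name and error number 50001 are arbitrary choices. Unlike the function-based constraint, it runs for every INSERT and UPDATE regardless of which column changed, and it checks each affected `fkid` after all rows of the statement are applied. Like the function, it only caps the total at 1. `THROW` needs SQL Server 2012 or later.

```sql
-- Same dbo.test definition as above, without the CK_Target_Sum constraint.
CREATE TABLE dbo.test (
    id    int,
    fkid  bigint,
    value decimal(4, 2)
);
GO
CREATE TRIGGER dbo.trg_test_sum_max        -- hypothetical name
ON dbo.test
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Check only the groups touched by this statement, after all its rows are applied.
    IF EXISTS (
        SELECT 1
        FROM dbo.test AS t
        WHERE t.fkid IN (SELECT fkid FROM inserted)
        GROUP BY t.fkid
        HAVING SUM(t.value) > 1
    )
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50001, 'The values for one fkid may not add up to more than 1.', 1;
    END
END
GO

INSERT INTO dbo.test VALUES (1, 2, 0.5);
INSERT INTO dbo.test VALUES (2, 2, 0.4);
-- Both of these would now be rejected (total above 1):
-- INSERT INTO dbo.test VALUES (3, 2, 0.2);
-- UPDATE dbo.test SET value = 0.9 WHERE id = 1;
```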

With your current approach, it will work this way:
1. You insert the first component; its value must be 1.
2. You try to insert a second component; it is rejected because your sum is already 1.
3. You update the existing component to .85.
4. You insert the next component; its value must be .15.
5. You go back to step 2 with the third component.
Since your constraint only references the FKID column, this is possible, and you may think it is working...
But if you stop the process at step 3, your sum is not equal to 1, and the constraint cannot foresee whether you will insert the next value or not. Even worse, you can update any value to be greater than 1 and it will be accepted.
If you add the value column to your constraint, it will prevent those updates, but you will never be able to get past step 1.
Personally, I wouldn't do that, but here is an approach you can use:
Use the computed column suggested by Gordon on your parent table. With a computed column you will always get the current value, so the parent won't be valid until the sum is equal to one.
Use this solution to prevent the sum from being greater than 1. That way, at least you will be sure that any invalid parent is missing a component, which can be helpful for your business layer.
As I mentioned in a comment, the rest of the logic belongs in the business and UI layers.
Note: as you can see, the id and value parameters are not used in the function, but I need to pass them when I create the constraint; that way the constraint will validate updates too.
CREATE TABLE ttest (id int, fkid int, value float)
go
create FUNCTION [dbo].[CheckSumTarget](@id int, @fkid int, @value float)
RETURNS FLOAT
AS BEGIN
DECLARE @Res float
SELECT @Res = sum(value)
FROM dbo.ttest AS t
WHERE t.FKID = @fkid
RETURN @Res
END
GO
ALTER TABLE dbo.ttest WITH CHECK ADD CONSTRAINT [CK_Target_Sum] CHECK (([dbo].[CheckSumTarget](id,[FKID],value)<=(1.0)))


Prevent circular reference in MS-SQL table

I have an Account table with ID and ParentAccountID columns. Here is a script to reproduce the steps.
If ParentAccountID is NULL, that account is considered a top-level account.
Every account must eventually lead up to a top-level account, i.e. one whose ParentAccountID is NULL.
Declare @Accounts table (ID INT, ParentAccountID INT )
INSERT INTO @Accounts values (1,NULL), (2,1), (3,2) ,(4,3), (5,4), (6,5)
select * from @Accounts
-- Request to update ParentAccountID to 6 for the ID 3
update @Accounts
set ParentAccountID = 6
where ID = 3
-- Now the above update will cause circular reference
select * from @Accounts
When a request comes in to update the ParentAccountID of an account, I need to identify, before the update, whether it would cause a circular reference.
Any ideas, folks?
It seems you've got some business rules defined for your table:
Every chain must end with a top-level account
A chain may not have a circular reference
You have two ways to enforce this.
You can create a trigger in your database, and check the logic in the trigger. This has the benefit of running inside the database, so it applies to every transaction, regardless of the client. However, database triggers are not always popular. I see them as a side effect, and they can be hard to debug. Triggers run as part of your SQL, so if they are slow, your SQL will be slow.
The alternative is to enforce this logic in the application layer - whatever is talking to your database. This is easier to debug, and makes your business logic explicit to new developers - but it doesn't run inside the database, so you could end up replicating the logic if you have multiple client applications.
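A minimal sketch of the trigger option, assuming a permanent table `dbo.Account` with the question's two columns (a trigger cannot be attached to a table variable) and a hypothetical trigger name. It walks up the parent chain from every inserted or updated row and rolls the statement back if a walk returns to its starting row. Because it starts from every changed row, it also covers multi-row updates, provided the table contained no cycles beforehand. `THROW` needs SQL Server 2012 or later.

```sql
CREATE TABLE dbo.Account (
    ID              int PRIMARY KEY,
    ParentAccountID int NULL
);
GO
CREATE TRIGGER dbo.trg_Account_NoCycles       -- hypothetical name
ON dbo.Account
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- A walk longer than the row count must be going round a loop, so cap it there.
    DECLARE @maxDepth int = (SELECT COUNT(*) FROM dbo.Account);
    DECLARE @cycles int;

    WITH Ancestors AS (
        -- Start at each changed row's parent.
        SELECT i.ID AS StartID, i.ParentAccountID AS AncestorID, 1 AS Depth
        FROM inserted AS i
        WHERE i.ParentAccountID IS NOT NULL
        UNION ALL
        -- Step up one level until a top-level account, the start row, or the cap is reached.
        SELECT anc.StartID, a.ParentAccountID, anc.Depth + 1
        FROM Ancestors AS anc
        JOIN dbo.Account AS a ON a.ID = anc.AncestorID
        WHERE a.ParentAccountID IS NOT NULL
          AND anc.AncestorID <> anc.StartID
          AND anc.Depth < @maxDepth
    )
    SELECT @cycles = COUNT(*)
    FROM Ancestors
    WHERE AncestorID = StartID
    OPTION (MAXRECURSION 0);

    IF @cycles > 0
    BEGIN
        ROLLBACK TRANSACTION;
        THROW 50002, 'This change would create a circular reference.', 1;
    END
END
GO

INSERT dbo.Account (ID, ParentAccountID)
VALUES (1, NULL), (2, 1), (3, 2), (4, 3), (5, 4), (6, 5);
-- Rejected: 6 descends from 3, so 3 -> 6 -> 5 -> 4 -> 3 would form a loop.
-- UPDATE dbo.Account SET ParentAccountID = 6 WHERE ID = 3;
```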
Here is an example that you could use as a basis to implement a database constraint that should prevent circular references in single-row updates; I don't believe this will work to prevent a circular reference if multiple rows are updated.
/*
ALTER TABLE dbo.Test DROP CONSTRAINT chkTest_PreventCircularRef
GO
DROP FUNCTION dbo.Test_PreventCircularRef
GO
DROP TABLE dbo.Test
GO
*/
CREATE TABLE dbo.Test (TestID INT PRIMARY KEY,TestID_Parent INT)
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 1 AS TestID,NULL AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 2 AS TestID,1 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 3 AS TestID,2 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 4 AS TestID,3 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 5 AS TestID,4 AS TestID_Parent
GO
CREATE FUNCTION dbo.Test_PreventCircularRef (@TestID INT,@TestID_Parent INT)
RETURNS INT
BEGIN
--FOR TESTING:
--SELECT * FROM dbo.Test;DECLARE @TestID INT=3,@TestID_Parent INT=4
DECLARE @ParentID INT=@TestID
DECLARE @ChildID INT=NULL
DECLARE @RetVal INT=0
DECLARE @Ancestors TABLE(TestID INT)
DECLARE @Descendants TABLE(TestID INT)
--Get all descendants
INSERT INTO @Descendants(TestID) SELECT TestID FROM dbo.Test WHERE TestID_Parent=@TestID
WHILE (@@ROWCOUNT>0)
BEGIN
INSERT INTO @Descendants(TestID)
SELECT t1.TestID
FROM dbo.Test t1
LEFT JOIN @Descendants relID ON relID.TestID=t1.TestID
WHERE relID.TestID IS NULL
AND t1.TestID_Parent IN (SELECT TestID FROM @Descendants)
END
--Get all ancestors
--INSERT INTO @Ancestors(TestID) SELECT TestID_Parent FROM dbo.Test WHERE TestID=@TestID
--WHILE (@@ROWCOUNT>0)
--BEGIN
-- INSERT INTO @Ancestors(TestID)
-- SELECT t1.TestID_Parent
-- FROM dbo.Test t1
-- LEFT JOIN @Ancestors relID ON relID.TestID=t1.TestID_Parent
-- WHERE relID.TestID IS NULL
-- AND t1.TestID_Parent IS NOT NULL
-- AND t1.TestID IN (SELECT TestID FROM @Ancestors)
--END
--FOR TESTING:
--SELECT TestID AS [Ancestors] FROM @Ancestors;SELECT TestID AS [Descendants] FROM @Descendants;
IF EXISTS (
SELECT *
FROM @Descendants
WHERE TestID=@TestID_Parent
)
BEGIN
SET @RetVal=1
END
RETURN @RetVal
END
GO
ALTER TABLE dbo.Test
ADD CONSTRAINT chkTest_PreventCircularRef
CHECK (dbo.Test_PreventCircularRef(TestID,TestID_Parent) = 0);
GO
SELECT * FROM dbo.Test
--This is problematic as it creates a circular reference between TestID 3 and 4; it is now prevented
UPDATE dbo.Test SET TestID_Parent=4 WHERE TestID=3
Dealing with self-referencing tables / recursive relationships in SQL is not simple. I suppose this is evidenced by the fact that multiple people can't get their heads around the problem with just checking for single-depth cycles.
To enforce this with table constraints, you would need a check constraint based on a recursive query. At best that's DBMS-specific support, and it may not perform well if it has to be run on every update.
My advice is to have the code containing the UPDATE statement enforce this. That could take a couple of forms. In any case if it needs to be strictly enforced it may require limiting UPDATE access into the table to a service account used by a stored proc or external service.
Using a stored procedure would be very similar to a CHECK constraint, except that you could use procedural (iterative) logic to look for cycles before doing the update. It has become unpopular to put too much logic in stored procs, though, and whether this type of check should be done there is a judgement call from team to team / organization to organization.
Likewise using a service-based approach would let you use procedural logic to look for cycles, and you could write it in a language better suited to such logic. The issue here is, if services aren't part of your architecture then it's a bit heavy-weight to introduce a whole new layer. But, a service layer is probably considered more modern/popular (at the moment at least) than funneling updates through stored procs.
With those approaches in mind - and understanding that both procedural and recursive syntax in databases is DBMS-specific - there are too many possible syntax options to really go into. But the idea is:
Examine the proposed parent.
Check its parent, recursively.
Do you ever reach the proposed child before reaching a top-level account? If not, allow the update.
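The three steps above map onto a recursive CTE that runs before the update. A minimal sketch as a stored procedure, assuming a permanent table `dbo.Account(ID, ParentAccountID)` and the hypothetical name `dbo.SetParentAccount`. It does not lock anything between the check and the update, so two concurrent calls could still each add half of a loop; if that matters, serialize the calls or also enforce the rule in a trigger. `THROW` needs SQL Server 2012 or later.

```sql
CREATE TABLE dbo.Account (
    ID              int PRIMARY KEY,
    ParentAccountID int NULL
);
GO
CREATE PROCEDURE dbo.SetParentAccount          -- hypothetical name
    @AccountID   int,
    @NewParentID int
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @maxDepth int = (SELECT COUNT(*) FROM dbo.Account);
    DECLARE @hits int;

    -- Steps 1 and 2: start at the proposed parent and follow ParentAccountID upward.
    WITH Chain AS (
        SELECT ID, ParentAccountID, 1 AS Depth
        FROM dbo.Account
        WHERE ID = @NewParentID
        UNION ALL
        SELECT a.ID, a.ParentAccountID, c.Depth + 1
        FROM Chain AS c
        JOIN dbo.Account AS a ON a.ID = c.ParentAccountID
        WHERE c.ID <> @AccountID        -- stop once the proposed child is reached
          AND c.Depth < @maxDepth       -- guard against loops already in the data
    )
    -- Step 3: did the walk reach the proposed child before a top-level account?
    SELECT @hits = COUNT(*)
    FROM Chain
    WHERE ID = @AccountID
    OPTION (MAXRECURSION 0);

    IF @hits > 0
        THROW 50003, 'Update rejected: it would create a circular reference.', 1;

    UPDATE dbo.Account
    SET ParentAccountID = @NewParentID
    WHERE ID = @AccountID;
END
GO

INSERT dbo.Account (ID, ParentAccountID)
VALUES (1, NULL), (2, 1), (3, 1), (4, 3), (5, 4), (6, 5), (7, 6), (8, 7);
EXEC dbo.SetParentAccount @AccountID = 8, @NewParentID = 2;     -- allowed
-- EXEC dbo.SetParentAccount @AccountID = 4, @NewParentID = 7;  -- rejected: 7 descends from 4
```

Unlike the asker's script further down, nothing is written until the check has passed, so there is no update to undo.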
Finally, after some failed attempts, I created this script; it's working fine for me.
-- To hold the Account table data
Declare @Accounts table (ID INT, ParentAccountID INT)
-- To be updated
Declare @AccountID int = 4;
Declare @ParentAccountID int = 7;
Declare @NextParentAccountID INT = @ParentAccountID
Declare @IsCircular int = 0
INSERT INTO @Accounts values (1, NULL), (2,1), (3,1) ,(4,3), (5,4), (6,5), (7,6), (8,7)
-- No circular reference value
--Select * from @Accounts
-- Request to update ParentAccountID to 7 for the Account ID 4
update @Accounts
set ParentAccountID = @ParentAccountID
where ID = @AccountID
Select * from @Accounts
WHILE(1=1)
BEGIN
-- Take the ParentAccountID for @NextParentAccountID
-- (SET with a subquery yields NULL when no row matches, so a missing parent cannot loop forever)
SET @NextParentAccountID = (SELECT ParentAccountID from @Accounts WHERE ID = @NextParentAccountID)
-- If the @NextParentAccountID is NULL, then it reaches the top level account, no circular reference hence break the loop
IF (@NextParentAccountID IS NULL)
BEGIN
BREAK;
END
-- If the @NextParentAccountID is equal to @AccountID (to which the update was done) then it's creating a circular reference
-- Then set the @IsCircular to 1 and break the loop
IF (@NextParentAccountID = @AccountID )
BEGIN
SET @IsCircular = 1
BREAK
END
END
IF @IsCircular = 1
BEGIN
select 'CircularReference' as 'ResponseCode'
END

sql server cannot access inserted table in a trigger

I am trying to create a simple insert trigger that gets a count from one table and stores it in another, like this:
CREATE TABLE [poll-count](
id VARCHAR(100),
altid BIGINT,
option_order BIGINT,
uip VARCHAR(50),
[uid] VARCHAR(100),
[order] BIGINT,
PRIMARY KEY NONCLUSTERED([order]),
FOREIGN KEY ([order]) REFERENCES ord ([order])
)
GO
CREATE TRIGGER [get-poll-count]
ON [poll-count]
FOR INSERT
AS
BEGIN
DECLARE @count INT
SET @count = (SELECT COUNT (*) FROM [poll-count] WHERE option_order = i.option_order)
UPDATE [poll-options] SET [total] = @count WHERE [order] = i.option_order
END
GO
Whenever I try to run this, I get this error:
The multi-part identifier "i.option_order" could not be bound
What is the problem?
Thanks
Your trigger currently assumes that there will always be one-row inserts. Have you tried your trigger with anything like this?
INSERT dbo.[poll-options](option_order /*, ... */)
VALUES(1 /*, ... */),
(2 /*, ... */);
Also, you say that SQL Server "cannot access inserted table", yet this is your statement. Where do you reference inserted (even if this were a valid subquery structure)?
SET @count = (SELECT COUNT (*) FROM [poll-count]
WHERE option_order = i.option_order)
-----------------------^ "i" <> "inserted"
Here is a trigger that properly references inserted and also properly handles multi-row inserts:
CREATE TRIGGER dbo.pollupdate
ON dbo.[poll-options]
FOR INSERT
AS
BEGIN
SET NOCOUNT ON;
;WITH x AS
(
SELECT option_order, c = COUNT(*)
FROM dbo.[poll-options] AS p
WHERE EXISTS
(
SELECT 1 FROM inserted
WHERE option_order = p.option_order
)
GROUP BY option_order
)
UPDATE p SET total = x.c
FROM dbo.[poll-options] AS p
INNER JOIN x
ON p.option_order = x.option_order;
END
GO
However, why do you want to store this data on every row? You can always derive the count at runtime, know that it is perfectly up to date, and avoid the need for a trigger altogether. If it's about the performance aspect of deriving the count at runtime, a much easier way to implement this write-ahead optimization for about the same maintenance cost during DML is to create an indexed view:
CREATE VIEW dbo.[poll-options-count]
WITH SCHEMABINDING
AS
SELECT option_order, c = COUNT_BIG(*)
FROM dbo.[poll-options]
GROUP BY option_order;
GO
CREATE UNIQUE CLUSTERED INDEX oo ON dbo.[poll-options-count](option_order);
GO
Now the index is maintained for you, and you can derive very quick counts for any given (or all) option_order values. You'll have to test, of course, whether the improvement in query time is worth the increased maintenance (though you are already paying that price with the trigger, except that it can affect many more rows in any given insert, so...).
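A self-contained sketch of the indexed-view idea. The question doesn't show the DDL for `[poll-options]`, so the two-column table below is an assumption. On editions other than Enterprise, the optimizer generally uses the view's index only when the query names the view with the `NOEXPAND` hint, as done here. Indexed views also require certain SET options (such as `ANSI_NULLS` and `QUOTED_IDENTIFIER` ON), which are the defaults for most client connections.

```sql
-- Hypothetical minimal version of the question's table.
CREATE TABLE dbo.[poll-options] (
    id           int IDENTITY(1, 1) PRIMARY KEY,
    option_order bigint NOT NULL
);
GO
CREATE VIEW dbo.[poll-options-count]
WITH SCHEMABINDING
AS
SELECT option_order, c = COUNT_BIG(*)
FROM dbo.[poll-options]
GROUP BY option_order;
GO
CREATE UNIQUE CLUSTERED INDEX oo ON dbo.[poll-options-count] (option_order);
GO
-- The view's index is maintained as part of each insert; no trigger involved.
INSERT dbo.[poll-options] (option_order) VALUES (1), (1), (2);
SELECT option_order, c
FROM dbo.[poll-options-count] WITH (NOEXPAND)
ORDER BY option_order;
```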
As a final suggestion, don't use special characters like - in object names. It just forces you to always wrap it in [square brackets] and that's no fun for anyone.

Get IDENTITY value in the same T-SQL statement it is created in?

I was asked if you could have an insert statement, which had an ID field that was an "identity" column, and if the value that was assigned could also be inserted into another field in the same record, in the same insert statement.
Is this possible (SQL Server 2008r2)?
Thanks.
You cannot really do this, because the actual value that will be used for the IDENTITY column is only fixed and set once the INSERT has completed.
You could, however, use a trigger, e.g.:
CREATE TRIGGER trg_YourTableInsertID ON dbo.YourTable
AFTER INSERT
AS
UPDATE t2
SET OtherID = i.ID
FROM dbo.YourTable t2
INNER JOIN INSERTED i ON i.ID = t2.ID
This would fire right after any rows have been inserted, and would set the OtherID column to the values of the IDENTITY columns for the inserted rows. But it's strictly speaking not within the same statement - it's just after your original statement.
You can do this by having a computed column in your table:
DECLARE @QQ TABLE (ID INT IDENTITY(1,1), Computed AS ID PERSISTED, Letter VARCHAR (1))
INSERT INTO @QQ (Letter)
VALUES ('h'),
('e'),
('l'),
('l'),
('o')
SELECT *
FROM @QQ
Result:
ID Computed Letter
1 1 h
2 2 e
3 3 l
4 4 l
5 5 o
About the accepted answer:
"You cannot really do this - because the actual value that will be used for the IDENTITY column really only is fixed and set when the INSERT has completed."
marc_s, I suppose you are not actually right. Yes, he can!
The way to solution is IDENT_CURRENT():
CREATE TABLE TemporaryTable(
Id int PRIMARY KEY IDENTITY(1,1) NOT NULL,
FkId int NOT NULL
)
ALTER TABLE TemporaryTable
ADD CONSTRAINT [Fk_const] FOREIGN KEY (FkId) REFERENCES [TemporaryTable] ([Id])
INSERT INTO TemporaryTable (FkId) VALUES (IDENT_CURRENT('[TemporaryTable]'))
INSERT INTO TemporaryTable (FkId) VALUES (IDENT_CURRENT('[TemporaryTable]'))
INSERT INTO TemporaryTable (FkId) VALUES (IDENT_CURRENT('[TemporaryTable]'))
INSERT INTO TemporaryTable (FkId) VALUES (IDENT_CURRENT('[TemporaryTable]'))
UPDATE TemporaryTable
SET [FkId] = 3
WHERE Id = 2
SELECT * FROM TemporaryTable
DROP TABLE TemporaryTable
Moreover, you can even use IDENT_CURRENT() in a DEFAULT constraint, where it works in place of SCOPE_IDENTITY(), for example. Try this:
CREATE TABLE TemporaryTable(
Id int PRIMARY KEY IDENTITY(1,1) NOT NULL,
FkId int NOT NULL DEFAULT IDENT_CURRENT('[TemporaryTable]')
)
ALTER TABLE TemporaryTable
ADD CONSTRAINT [Fk_const] FOREIGN KEY (FkId) REFERENCES [TemporaryTable] ([Id])
INSERT INTO TemporaryTable (FkId) VALUES (DEFAULT)
INSERT INTO TemporaryTable (FkId) VALUES (DEFAULT)
INSERT INTO TemporaryTable (FkId) VALUES (DEFAULT)
INSERT INTO TemporaryTable (FkId) VALUES (DEFAULT)
UPDATE TemporaryTable
SET [FkId] = 3
WHERE Id = 2
SELECT * FROM TemporaryTable
DROP TABLE TemporaryTable
You can do both.
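To insert explicit values into a column with the "identity" property, you need to set IDENTITY_INSERT on for that table.
Note that you still can't insert duplicate values if the column has a PRIMARY KEY or UNIQUE constraint!
You can see the command here.
Be sure to set IDENTITY_INSERT off again afterwards.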
To store the same value in another column of the same record, you simply need to:
create new column;
insert it with null value or other thing;
update that column after inserts with the value of the identity column.
If you need to insert the value at the same time, you can use the @@IDENTITY system function. It gives you the last inserted identity value, so I think you need to use @@IDENTITY + 1. However, this can give wrong values, because @@IDENTITY is not tied to one table: it also changes when an insert in the same session goes into another table with an identity column (for example, from a trigger).
Another solution is to get the max id and add one :) and you get the needed value!
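To see how @@IDENTITY can pick up another table's value, here is a small sketch with hypothetical tables `dbo.Orders` and `dbo.OrdersAudit`, where a trigger on the first inserts into the second. `@@IDENTITY` reports the last identity value generated anywhere in the session (including inside the trigger), while `SCOPE_IDENTITY()` reports only the one generated in the current scope. Neither is known before the INSERT runs, so adding 1 to either is still a guess.

```sql
CREATE TABLE dbo.Orders (                  -- hypothetical table
    OrderID int IDENTITY(1, 1) PRIMARY KEY,
    Item    varchar(20)
);
-- The audit identity starts at 100 so the two values are easy to tell apart.
CREATE TABLE dbo.OrdersAudit (             -- hypothetical table
    AuditID int IDENTITY(100, 1) PRIMARY KEY,
    OrderID int
);
GO
CREATE TRIGGER dbo.trg_Orders_Audit ON dbo.Orders
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    INSERT dbo.OrdersAudit (OrderID) SELECT OrderID FROM inserted;
END
GO
INSERT dbo.Orders (Item) VALUES ('widget');
SELECT @@IDENTITY       AS last_in_session,   -- 100: the audit row written by the trigger
       SCOPE_IDENTITY() AS last_in_scope;     -- 1: the Orders row inserted by this batch
```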
Use this simple code:
SCOPE_IDENTITY()+1
I know the original post was a long while ago. But, the top-most solution is using a trigger to update the field after the record has been inserted and I think there is a more efficient method.
Using a trigger for this has always bugged me. It always has seemed like there must be a better way. That trigger basically makes every insert perform 2 writes to the database, (1) the insert, and then (2) the update of the 2nd int. The trigger is also doing a join back into the table. This is a bit of overhead to have especially for a large database and large tables. And I suspect that as the table gets larger, the overhead of this approach does also. Maybe I'm wrong on that. But, it just doesn't seem like a good solution on a large table.
I wrote a function fn_GetIdent that can be used for this. It's funny how simple it is, but it really was some work to figure out. I stumbled onto this eventually. It turns out that calling IDENT_CURRENT(@variableTableName) from within a function that is called from the INSERT statement's VALUES clause acts differently than calling IDENT_CURRENT(@variableTableName) from the INSERT statement directly. And it makes it possible to get the new identity value for the record that you are inserting.
There is one caveat. When the identity is NULL (ie - an empty table with no records) it acts a little differently since the sys.identity_columns.last_value is NULL. So, you have to handle the very first record entered a little differently. I put code in the function to address that, and now it works.
This works because each call to the function, even within the same INSERT statement, is in its own new "scope" within the function. (I believe that is the correct explanation.) So you can even insert multiple rows with one INSERT statement using this function. If you call IDENT_CURRENT(@variableTableName) from the INSERT statement directly, it will assign the same value for the newID in all rows. This is because the identity gets updated after the entire INSERT statement finishes processing (within the same scope). But calling IDENT_CURRENT(@variableTableName) from within a function causes each insert to update the identity value with each row entered. And it's all done in a function call from the INSERT statement itself, so it's easy to implement once you have the function created.
This approach is a call to a function (from the INSERT statement) which does one read from sys.identity_columns.last_value (to see if it is NULL and if a record exists) within the function, then calls IDENT_CURRENT(@variableTableName) and returns to the INSERT statement to insert the row. So it is one small read (for each row INSERTED) and then the one write of the insert, which is less overhead than the trigger approach, I think. The trigger approach could be rather inefficient if you use it for all tables in a large database with large tables. I haven't done any performance analysis on it compared to the trigger, but I think this would be a lot more efficient, especially on large tables.
I've been testing it out and this seems to work in all cases. I would welcome feedback as to whether anyone finds a case where this doesn't work, or any problem with this approach. Can anyone shoot holes in this approach? If so, please let me know. If not, could you vote it up? I think it is a better approach.
So, maybe being holed up due to COVID-19 out there, turned out to be productive for something. Thank you Microsoft for keeping me occupied. Anyone hiring? :) No, seriously, anyone hiring? OK, so now what am I going to do with myself now that I am done with this? :) Wishing everyone safe times out there.
Here is the code below. Wondering if this approach has any holes in it. Feedback welcomed.
IF OBJECT_ID('dbo.fn_GetIdent') IS NOT NULL
DROP FUNCTION dbo.fn_GetIdent;
GO
CREATE FUNCTION dbo.fn_GetIdent(@inTableName AS VARCHAR(MAX))
RETURNS Int
WITH EXECUTE AS CALLER
AS
BEGIN
DECLARE @tableHasIdentity AS Int
DECLARE @tableIdentitySeedValue AS Int
/*Check if the table's identity column is null - a special case*/
/*(a simple CASE x WHEN NULL never matches, so test with IS NULL)*/
SELECT
@tableHasIdentity = CASE WHEN identity_columns.last_value IS NULL THEN 0 ELSE 1 END,
@tableIdentitySeedValue = CONVERT(int, identity_columns.seed_value)
FROM sys.tables
INNER JOIN sys.identity_columns
ON tables.object_id = identity_columns.object_id
WHERE identity_columns.is_identity = 1
AND tables.type = 'U'
AND tables.name = @inTableName;
DECLARE @ReturnValue AS Int;
SET @ReturnValue = CASE @tableHasIdentity WHEN 0 THEN @tableIdentitySeedValue
ELSE IDENT_CURRENT(@inTableName)
END;
RETURN (@ReturnValue);
END
GO
/* The function above only has to be created the one time to be used in the example below */
DECLARE @TableHasRows AS Bit
DROP TABLE IF EXISTS TestTable
CREATE TABLE TestTable (ID INT IDENTITY(1,1),
New INT,
Letter VARCHAR (1))
INSERT INTO TestTable (New, Letter)
VALUES (dbo.fn_GetIdent('TestTable'), 'H')
INSERT INTO TestTable (New, Letter)
VALUES (dbo.fn_GetIdent('TestTable'), 'e')
INSERT INTO TestTable (New, Letter)
VALUES (dbo.fn_GetIdent('TestTable'), 'l'),
(dbo.fn_GetIdent('TestTable'), 'l'),
(dbo.fn_GetIdent('TestTable'), 'o')
INSERT INTO TestTable (New, Letter)
VALUES (dbo.fn_GetIdent('TestTable'), ' '),
(dbo.fn_GetIdent('TestTable'), 'W'),
(dbo.fn_GetIdent('TestTable'), 'o'),
(dbo.fn_GetIdent('TestTable'), 'r'),
(dbo.fn_GetIdent('TestTable'), 'l'),
(dbo.fn_GetIdent('TestTable'), 'd')
INSERT INTO TestTable (New, Letter)
VALUES (dbo.fn_GetIdent('TestTable'), '!')
SELECT * FROM TestTable
/*
Result
ID New Letter
1 1 H
2 2 e
3 3 l
4 4 l
5 5 o
6 6
7 7 W
8 8 o
9 9 r
10 10 l
11 11 d
12 12 !
*/

Fastest way to return a primary key value SQL Server 2005

I have a two column table with a primary key (int) and a unique value (nvarchar(255))
When I insert a value to this table, I can use Scope_identity() to return the primary key for the value I just inserted. However, if the value already exists, I have to perform an additional select to return the primary key for a follow up operation (inserting that primary key into a second table)
I'm thinking there must be a better way to do this. I considered using covering indexes, but the table only has two columns, and most of what I've read on covering indexes suggests they only help where the table is significantly larger than the index.
Is there any faster way to do this? Would a covering index be faster even if it's the same size as the table?
Building an index won't gain you anything, since you have already created your value column as unique (which builds an index in the background). Effectively, a full table scan is no different from an index scan in your scenario.
I assume you want some sort of insert-if-not-already-exists behaviour. There is no way around a second select:
if not exists (select ID from ... where name = @...)
begin
insert into ...
select SCOPE_IDENTITY()
end
else
select ID from ... where name = @...
If the value happens to exist, the query will usually have been cached, so there should be no performance hit for the second ID select.
[Update statement here]
IF (@@ROWCOUNT = 0)
BEGIN
[Insert statement here]
SELECT Scope_Identity()
END
ELSE
BEGIN
[SELECT id statement here]
END
I don't know about performance, but it has no big overhead.
As has already been mentioned this really shouldn't be a slow operation, especially if you index both columns. However if you are determined to reduce the expense of this operation then I see no reason why you couldn't remove the table entirely and just use the unique value directly rather than looking it up in this table. A 1-1 mapping like this is (theoretically) redundant. I say theoretically because there may be performance implications to using an nvarchar instead of an int.
I'll post this answer since everyone else seems to say you have to query the table twice in the event that the record exists... that's not true.
Step 1) Create a unique index on the other column.
I recommend this as the index:
-- We're including the "ID" column so that SQL will not have to look far once the "WHERE" clause is finished.
CREATE UNIQUE INDEX MyLilIndex ON dbo.MyTable (Column2) INCLUDE (ID)
Step 2)
DECLARE @TheID INT
SELECT @TheID = ID from MyTable WHERE Column2 = 'blah blah'
IF (@TheID IS NOT NULL)
BEGIN
-- See, you don't have to query the table twice!
SELECT @TheID AS TheIDYouWanted
END
ELSE
BEGIN
INSERT...
SELECT SCOPE_IDENTITY() AS TheIDYouWanted
END
Create a unique index for the second entry, then:
if not exists (select null from ...)
insert into ...
else
select x from ...
You can't get away from the index, and it isn't really much overhead: SQL Server supports index keys of up to 900 bytes, and does not discriminate.
The needs of your model are more important than any perceived performance issues. Symbolising a string (which is what you are doing) is a common method to reduce database size, and this indirectly (and generally) means better performance.
-- edit --
To appease timothy:
declare @x int = (select x from ...)
if (@x is not null)
return @x
else
...
You could use OUTPUT clause to return the value in the same statement. Here is an example.
DDL:
CREATE TABLE ##t (
id int PRIMARY KEY IDENTITY(1,1),
val varchar(255) NOT NULL
)
GO
-- no need for INCLUDE as PK column is always included in the index
CREATE UNIQUE INDEX AK_t_val ON ##t (val)
DML:
DECLARE @id int, @val varchar(255)
SET @val = 'test' -- or whatever you need here
SELECT @id = id FROM ##t WHERE val = @val
IF (@id IS NULL)
BEGIN
DECLARE @new TABLE (id int)
INSERT INTO ##t (val)
OUTPUT inserted.id INTO @new -- put new ID into table variable immediately
VALUES (@val)
SELECT @id = id FROM @new
END
PRINT @id
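If several sessions may insert the same value at once, the lookup-then-insert above can race: both sessions see no row, and one then fails on the unique index. A common remedy is to run the lookup and the insert in one transaction with `UPDLOCK, HOLDLOCK` on the lookup, so a second caller for the same value waits. A minimal sketch as a stored procedure, assuming a permanent table `dbo.t` with the same columns as `##t` and the hypothetical name `dbo.GetOrAddVal`:

```sql
CREATE TABLE dbo.t (
    id  int PRIMARY KEY IDENTITY(1, 1),
    val varchar(255) NOT NULL
);
CREATE UNIQUE INDEX AK_t_val ON dbo.t (val);
GO
CREATE PROCEDURE dbo.GetOrAddVal          -- hypothetical name
    @val varchar(255),
    @id  int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;                    -- any error rolls the transaction back
    SET @id = NULL;                       -- don't trust a value passed in by the caller
    BEGIN TRANSACTION;
    -- UPDLOCK + HOLDLOCK keep the looked-up key range locked until COMMIT,
    -- so a concurrent call for the same @val waits instead of racing.
    SELECT @id = id FROM dbo.t WITH (UPDLOCK, HOLDLOCK) WHERE val = @val;
    IF @id IS NULL
    BEGIN
        DECLARE @new TABLE (id int);
        INSERT INTO dbo.t (val)
        OUTPUT inserted.id INTO @new
        VALUES (@val);
        SELECT @id = id FROM @new;
    END
    COMMIT TRANSACTION;
END
GO

DECLARE @first int, @second int;
EXEC dbo.GetOrAddVal @val = 'test', @id = @first OUTPUT;    -- inserts the row
EXEC dbo.GetOrAddVal @val = 'test', @id = @second OUTPUT;   -- finds the existing row
SELECT @first AS first_call, @second AS second_call;
```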

Does anyone know a neat trick for reusing identity values?

Typically when you specify an identity column you get a convenient interface in SQL Server for asking for a particular row:
SELECT * FROM <table> WHERE $IDENTITY = @pID
You don't really need to concern yourself with the name of the identity column, because there can only be one.
But what if I have a table which mostly consists of temporary data, with lots of inserts and lots of deletes? Is there a simple way for me to reuse the identity values?
Preferably I would want to be able to write a function that would return say NEXT_SMALLEST($IDENTITY) as next identity value and do so in a fail-safe manner.
Basically find the smallest value that's not in use. That's not entirely trivial to do, but what I want is to be able to tell SQL Server that this is my function that will generate the identity values. But what I know is that no such function exists...
I want to implement global database IDs, so I need to provide a default value that I'm in control of.
My idea was that I should be able to have a table with all known IDs, and then every row ID from some other table that needed a global ID would reference that table. The default value would be provided by something like:
INSERT INTO GlobalID DEFAULT VALUES
RETURN SCOPE_IDENTITY()
No; it's not unique if it can be reused.
Why do you want to re-use them? Why do you concern yourself with this field? If you want to be in control of it, don't make it an identity; create your own scheme and use that.
Don't reuse identities; you'll just shoot yourself in the foot. Use a type large enough that it never rolls over (64-bit bigint).
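As a rough worked example of why bigint is enough, assume (hypothetically) a sustained one million inserts per second, counting from 1 upward. Deleted rows don't give their identity values back, so the insert rate, not the number of live rows, is what uses up the range. The positive bigint range then lasts about 290,000 years, versus about 36 minutes for int:

```latex
\frac{2^{63}-1}{10^{6}\,\text{inserts/s}} \approx 9.22\times10^{12}\,\text{s}
  \approx \frac{9.22\times10^{12}\,\text{s}}{3.156\times10^{7}\,\text{s/year}}
  \approx 2.9\times10^{5}\ \text{years},
\qquad
\frac{2^{31}-1}{10^{6}\,\text{inserts/s}} \approx 2.1\times10^{3}\,\text{s} \approx 36\ \text{minutes}.
```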
To find missing gaps in a sequence of numbers join the table against itself with a +/- 1 difference:
SELECT a.id
FROM table AS a
LEFT OUTER JOIN table AS b ON a.id = b.id+1
WHERE b.id IS NULL;
This query will find the numbers in the id sequence for which id-1 is not in the table, i.e. the start numbers of contiguous sequences. You can then use SET IDENTITY_INSERT ... ON to insert a specific id and reuse a number. The cost of doing so is overwhelming (both in runtime and code complexity) compared with an ordinary identity-based insert.
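A self-contained sketch that turns the self-join above into actual free ranges, using a hypothetical table variable `@ids` in place of the real table. An id whose successor is missing marks the start of a gap (except the current maximum), and the next id above it bounds the gap's end. A gap below the smallest id is not reported.

```sql
DECLARE @ids TABLE (id int PRIMARY KEY);          -- stand-in for the real table
INSERT @ids (id) VALUES (1), (2), (3), (6), (7), (10);

SELECT a.id + 1 AS gap_start,
       (SELECT MIN(b.id) FROM @ids AS b WHERE b.id > a.id) - 1 AS gap_end
FROM @ids AS a
LEFT OUTER JOIN @ids AS n ON n.id = a.id + 1      -- is the successor present?
WHERE n.id IS NULL                                -- no: a gap starts after a.id
  AND a.id < (SELECT MAX(id) FROM @ids)           -- skip the open range above the maximum
ORDER BY gap_start;
-- gap_start  gap_end
-- 4          5
-- 8          9
```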
If you really want to reset the identity value to the lowest possible number,
here is a trick you can use with DBCC CHECKIDENT.
Basically, the following SQL statements reset the identity value so that it restarts from the lowest possible number:
create table TT (id int identity(1, 1))
GO
insert TT default values
GO 10
select * from TT
GO
delete TT where id between 5 and 10
GO
--; At this point, next ID will be 11, not 5
select * from TT
GO
insert TT default values
GO
--; as you can see here, next ID is indeed 11
select * from TT
GO
--; Now delete ID = 11
--; so that we can reseed next highest ID to 5
delete TT where id = 11
GO
--; Now, let's reseed identity value to the lowest possible identity number
declare @seedID int
select @seedID = max(id) from TT
print @seedID --; 4
--; We reseed identity column with "DBCC CheckIdent" and pass a new seed value
--; But we can't pass a seed number as argument, so let's use dynamic sql.
declare @sql nvarchar(200)
set @sql = 'dbcc checkident(TT, reseed, ' + cast(@seedID as varchar) + ')'
exec sp_executesql @sql
GO
--; Now the next
insert TT default values
GO
--; as you can see here, next ID is indeed 5
select * from TT
GO
I guess we would really need to know why you want to reuse your identity column. The only reason I can think of is because of the temporary nature of your data you might exhaust the possible values for the identity. That is not really likely, but if that is your concern, you can use uniqueidentifiers (guids) as the primary key in your table instead.
The function newid() will create a new guid and can be used in insert statements (or other statements). Then when you delete the row, you don't have any "holes" in your key because guids are not created in that order anyway.
[Syntax assumes SQL2008....]
Yes, it's possible. You need two management tables, and two triggers on each participating table.
First, the management tables:
-- this table should only ever have one row
CREATE TABLE NextId (Id INT)
INSERT NextId VALUES (1)
GO
CREATE TABLE RecoveredIds (Id INT NOT NULL PRIMARY KEY)
GO
Then, the triggers, two on each table:
CREATE TRIGGER tr_TableName_RecoverId ON TableName
FOR DELETE AS BEGIN
IF @@ROWCOUNT = 0 RETURN
INSERT RecoveredIds (Id) SELECT Id FROM deleted
END
GO
CREATE TRIGGER tr_TableName_AssignId ON TableName
INSTEAD OF INSERT AS BEGIN
DECLARE @rowcount INT = @@ROWCOUNT
IF @rowcount = 0 RETURN
DECLARE @required INT = @rowcount
DECLARE @new_ids TABLE (Id INT PRIMARY KEY)
DELETE TOP (@required) OUTPUT DELETED.Id INTO @new_ids (Id) FROM RecoveredIds
SET @rowcount = @@ROWCOUNT
IF @rowcount < @required BEGIN
DECLARE @output TABLE (Id INT)
UPDATE NextId SET Id = Id + (@required-@rowcount)
OUTPUT DELETED.Id INTO @output
-- this assumes you have a numbers table around somewhere
INSERT @new_ids (Id)
SELECT n.Number+o.Id-1 FROM Numbers n, @output o
WHERE n.Number BETWEEN 1 AND @required-@rowcount
END
SET IDENTITY_INSERT TableName ON
;WITH inserted_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM inserted)
, new_ids_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM @new_ids)
INSERT TableName (Id, Attr1, Attr2)
SELECT n.Id, i.Attr1, i.Attr2
FROM inserted_CTE i JOIN new_ids_CTE n ON i._no = n._no
SET IDENTITY_INSERT TableName OFF
END
You could script the triggers out easily enough from system tables.
You would want to test this for concurrency. It should work as is, syntax errors notwithstanding: The OUTPUT clause guarantees atomicity of id lookup->increment as one step, and the entire operation occurs within a transaction, thanks to the trigger.
TableName.Id is still an identity column, so idioms like $IDENTITY still work. Be aware, though, that because the real INSERT happens inside the INSTEAD OF trigger, SCOPE_IDENTITY() in the calling code returns NULL.
There is no central table of ids by table, but you could create one easily enough.
I don't have any help for finding the values not in use, but if you really want to find them and set them yourself, you can use
set identity_insert <table> on
in your code to do so.
I'm with everyone else though. Why bother? Don't you have a business problem to solve?