How to maintain transaction integrity per common column value for inserting into two tables? - sql

Let's say I have two table variables declared as below:
DECLARE @Table1 TABLE (
A INT,
B NVARCHAR(100)
)
DECLARE @Table2 TABLE (
A INT,
C NVARCHAR(100)
)
Here are the contents of @Table1:
1, 'Hello'
2, 'Hi'
3, 'Ola'
These are the contents of @Table2:
1, 'my old friend'
1, 'sweetheart'
2, 'buddy'
4, 'the end'
Now I want to insert @Table1 into a table X and @Table2 into a table Y. The requirement is that I have to maintain transaction integrity for the insertion into both X and Y for every common value of column A.
For instance, let's say I am inserting the first row (1,'Hello') of @Table1 into X. This means I must also insert the first two rows ((1,'my old friend'), (1,'sweetheart')) of @Table2 into Y in the same transaction. So if any insert into Y fails for A=1, X also fails for A=1. Any value of column A that is not in both @Table1 and @Table2 is an individual transaction by itself (e.g. A=3 in @Table1 and A=4 in @Table2).
Here are the ways I see to deal with this problem:
I fetch all values of A in both @Table1 and @Table2, run a cursor over them, and then for each value of A insert into tables X and Y in a single transaction. The issue here is that, first of all, I want to avoid cursors as much as possible, and also this would mean a very large number of individual inserts.
I pre-validate my @Table1 and @Table2 values and then do one single insert of @Table1 into X and @Table2 into Y. This will be much faster than the above method. But the issues I see are that, first, not putting it in a 'transaction' somehow doesn't seem right, and also there is a small chance I might have missed a validation somewhere (unlikely, yet still possible).
Which approach should I go for? Is there a better solution?
P.S. Please also note that I do not want to fail the entire insert into X and Y if there is an issue inserting only one or a few values of A. Also, going back and deleting rows from my DB based on the failed inserts is not an option, as it breaks the running ID continuity that I am trying to preserve.

A DML statement is executed completely or not executed at all.
You can do a mix of your two options.
First, add as many validations as possible and attempt the single set-based insert; if it fails, fall back to processing one value of A at a time, using a temp table instead of a cursor:
BEGIN TRY
BEGIN TRAN Opt2
--Option 2: single set-based insert of @Table1 into X and @Table2 into Y
COMMIT TRAN Opt2
END TRY
BEGIN CATCH
ROLLBACK TRAN Opt2
DECLARE @A INT, @B NVARCHAR(100)
SELECT * INTO #TMP FROM @Table1
WHILE EXISTS (SELECT 1 FROM #TMP)
BEGIN
SELECT TOP 1 @A = A, @B = B FROM #TMP
-- Option 1: insert into X and Y for this value of A in its own transaction
DELETE #TMP WHERE A = @A
END
END CATCH
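For completeness, here is a minimal sketch of what the "-- Option 1" step inside that loop might look like, assuming X(A, B) and Y(A, C) mirror the shapes of @Table1 and @Table2 (the transaction name and error handling are illustrative):

BEGIN TRY
BEGIN TRAN PerKey
-- insert the X row for this value of A
INSERT INTO X (A, B)
SELECT A, B FROM @Table1 WHERE A = @A
-- insert every matching Y row for the same value of A
INSERT INTO Y (A, C)
SELECT A, C FROM @Table2 WHERE A = @A
COMMIT TRAN PerKey
END TRY
BEGIN CATCH
-- this A value fails as a unit; the loop moves on to the next one
ROLLBACK TRAN PerKey
END CATCH

This keeps each A value atomic without failing the whole batch, which matches the requirement in the question.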

Related

Prevent circular reference in MS-SQL table

I have an Account table with ID and ParentAccountID. Here is the script to reproduce the steps.
If the ParentAccountID is NULL, that account is considered a top-level account.
Every account chain should eventually end at a top-level account, i.e. one whose ParentAccountID is NULL.
Declare @Accounts table (ID INT, ParentAccountID INT )
INSERT INTO @Accounts values (1,NULL), (2,1), (3,2) ,(4,3), (5,4), (6,5)
select * from @Accounts
-- Request to update ParentAccountID to 6 for the ID 3
update @Accounts
set ParentAccountID = 6
where ID = 3
-- Now the above update will cause circular reference
select * from @Accounts
When a request comes in to update an account's ParentAccountID, I need to identify before the update whether it would cause a circular reference.
Any ideas, folks?
It seems you've got some business rules defined for your table:
All chains must end at a top-level account
A chain may not have a circular reference
You have two ways to enforce this.
You can create a trigger in your database, and check the logic in the trigger. This has the benefit of running inside the database, so it applies to every transaction, regardless of the client. However, database triggers are not always popular. I see them as a side effect, and they can be hard to debug. Triggers run as part of your SQL, so if they are slow, your SQL will be slow.
The alternative is to enforce this logic in the application layer - whatever is talking to your database. This is easier to debug, and makes your business logic explicit to new developers - but it doesn't run inside the database, so you could end up replicating the logic if you have multiple client applications.
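If you do go the trigger route, here is a minimal sketch of the idea, assuming a persistent dbo.Account table shaped like the question's example; the trigger name, error number, and message are illustrative, and THROW requires SQL Server 2012+:

CREATE TRIGGER trgAccount_PreventCircularRef
ON dbo.Account
AFTER INSERT, UPDATE
AS
BEGIN
SET NOCOUNT ON;
DECLARE @HasCycle bit = 0;
-- walk upward from each modified row's new parent; if we arrive back at
-- the row itself, this statement created a cycle (the default MAXRECURSION
-- limit of 100 acts as a safety stop if the data was already cyclic)
;WITH Chain AS (
SELECT i.ID AS StartID, i.ParentAccountID AS CurrentID
FROM inserted i
UNION ALL
SELECT c.StartID, a.ParentAccountID
FROM Chain c
JOIN dbo.Account a ON a.ID = c.CurrentID
WHERE c.CurrentID <> c.StartID
)
SELECT @HasCycle = 1 FROM Chain WHERE CurrentID = StartID;
IF @HasCycle = 1
BEGIN
ROLLBACK TRANSACTION;
THROW 51000, 'Update would create a circular account reference.', 1;
END
END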
Here is an example that you could use as a basis to implement a database constraint that should prevent circular references in singular row updates; I don't believe this will work to prevent a circular reference if multiple rows are updated.
/*
ALTER TABLE dbo.Test DROP CONSTRAINT chkTest_PreventCircularRef
GO
DROP FUNCTION dbo.Test_PreventCircularRef
GO
DROP TABLE dbo.Test
GO
*/
CREATE TABLE dbo.Test (TestID INT PRIMARY KEY,TestID_Parent INT)
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 1 AS TestID,NULL AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 2 AS TestID,1 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 3 AS TestID,2 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 4 AS TestID,3 AS TestID_Parent
INSERT INTO dbo.Test(TestID,TestID_Parent) SELECT 5 AS TestID,4 AS TestID_Parent
GO
CREATE FUNCTION dbo.Test_PreventCircularRef (@TestID INT,@TestID_Parent INT)
RETURNS INT
BEGIN
--FOR TESTING:
--SELECT * FROM dbo.Test;DECLARE @TestID INT=3,@TestID_Parent INT=4
DECLARE @ParentID INT=@TestID
DECLARE @ChildID INT=NULL
DECLARE @RetVal INT=0
DECLARE @Ancestors TABLE(TestID INT)
DECLARE @Descendants TABLE(TestID INT)
--Get all descendants
INSERT INTO @Descendants(TestID) SELECT TestID FROM dbo.Test WHERE TestID_Parent=@TestID
WHILE (@@ROWCOUNT>0)
BEGIN
INSERT INTO @Descendants(TestID)
SELECT t1.TestID
FROM dbo.Test t1
LEFT JOIN @Descendants relID ON relID.TestID=t1.TestID
WHERE relID.TestID IS NULL
AND t1.TestID_Parent IN (SELECT TestID FROM @Descendants)
END
--Get all ancestors
--INSERT INTO @Ancestors(TestID) SELECT TestID_Parent FROM dbo.Test WHERE TestID=@TestID
--WHILE (@@ROWCOUNT>0)
--BEGIN
-- INSERT INTO @Ancestors(TestID)
-- SELECT t1.TestID_Parent
-- FROM dbo.Test t1
-- LEFT JOIN @Ancestors relID ON relID.TestID=t1.TestID_Parent
-- WHERE relID.TestID IS NULL
-- AND t1.TestID_Parent IS NOT NULL
-- AND t1.TestID IN (SELECT TestID FROM @Ancestors)
--END
--FOR TESTING:
--SELECT TestID AS [Ancestors] FROM @Ancestors;SELECT TestID AS [Descendants] FROM @Descendants;
IF EXISTS (
SELECT *
FROM @Descendants
WHERE TestID=@TestID_Parent
)
BEGIN
SET @RetVal=1
END
RETURN @RetVal
END
GO
ALTER TABLE dbo.Test
ADD CONSTRAINT chkTest_PreventCircularRef
CHECK (dbo.Test_PreventCircularRef(TestID,TestID_Parent) = 0);
GO
SELECT * FROM dbo.Test
--This is problematic as it creates a circular reference between TestID 3 and 4; it is now prevented
UPDATE dbo.Test SET TestID_Parent=4 WHERE TestID=3
Dealing with self-referencing tables / recursive relationships in SQL is not simple. I suppose this is evidenced by the fact that several people here haven't spotted the problem with checking only for single-depth cycles.
To enforce this with table constraints, you would need a check constraint based on a recursive query. At best that's DBMS-specific support, and it may not perform well if it has to be run on every update.
My advice is to have the code containing the UPDATE statement enforce this. That could take a couple of forms. In any case if it needs to be strictly enforced it may require limiting UPDATE access into the table to a service account used by a stored proc or external service.
Using a stored procedure would be very similar to a CHECK constraint, except that you could use procedural (iterative) logic to look for cycles before doing the update. It has become unpopular to put too much logic in stored procs, though, and whether this type of check belongs there is a judgement call that varies from team to team and organization to organization.
Likewise using a service-based approach would let you use procedural logic to look for cycles, and you could write it in a language better suited to such logic. The issue here is, if services aren't part of your architecture then it's a bit heavy-weight to introduce a whole new layer. But, a service layer is probably considered more modern/popular (at the moment at least) than funneling updates through stored procs.
With those approaches in mind - and understanding that both procedural and recursive syntax in databases is DBMS-specific - there are too many possible syntax options to really go into. But the idea is:
Examine the proposed parent.
Check its parent, recursively.
Do you ever reach the proposed child before reaching a top-level account? If not, allow the update; see the sketch below.
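A minimal sketch of that walk as a recursive CTE, assuming a persistent Account table like the one in the question; @AccountID/@NewParentID are hypothetical variables holding the proposed change, and the check is meant to run before the UPDATE (so the existing data is still acyclic and the recursion terminates):

DECLARE @AccountID INT = 3, @NewParentID INT = 6;
WITH Ancestors AS (
-- start at the proposed parent...
SELECT ID, ParentAccountID
FROM Account
WHERE ID = @NewParentID
UNION ALL
-- ...and walk upward until a top-level account (ParentAccountID IS NULL)
SELECT a.ID, a.ParentAccountID
FROM Account a
JOIN Ancestors anc ON a.ID = anc.ParentAccountID
)
SELECT CASE WHEN EXISTS (SELECT 1 FROM Ancestors WHERE ID = @AccountID)
THEN 'Reject: would create a cycle'
ELSE 'Safe to update' END AS Verdict;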
Finally, after some failed attempts I have created the script below; it's working fine for me.
-- To hold the Account table data
Declare @Accounts table (ID INT, ParentAccountID INT)
-- To be updated
Declare @AccountID int = 4;
Declare @ParentAccountID int = 7;
Declare @NextParentAccountID INT = @ParentAccountID
Declare @IsCircular int = 0
INSERT INTO @Accounts values (1, NULL), (2,1), (3,1) ,(4,3), (5,4), (6,5), (7,6), (8,7)
-- No circular reference value
--Select * from @Accounts
-- Request to update ParentAccountID to 7 for the Account ID 4
update @Accounts
set ParentAccountID = @ParentAccountID
where ID = @AccountID
Select * from @Accounts
WHILE(1=1)
BEGIN
-- Take the ParentAccountID for @NextParentAccountID
SELECT @NextParentAccountID = ParentAccountID from @Accounts WHERE ID = @NextParentAccountID
-- If @NextParentAccountID is NULL, we reached the top-level account: no circular reference, hence break the loop
IF (@NextParentAccountID IS NULL)
BEGIN
BREAK;
END
-- If @NextParentAccountID equals @AccountID (the row that was updated), we have a circular reference,
-- so set @IsCircular to 1 and break the loop
IF (@NextParentAccountID = @AccountID )
BEGIN
SET @IsCircular = 1
BREAK
END
END
IF @IsCircular = 1
BEGIN
select 'CircularReference' as 'ResponseCode'
END

SQL: Update table from a text file

Here's what I have to do :
I have a text file which has 3 columns: PID, X, Y.
Now I have two tables in my database:
Table 1 contains 4 columns: UID, PID, X, Y
Table 2 contains multiple columns, required ones being UID, X, Y
I need to update Table 2 with corresponding X and Y values.
I think we can use BULK INSERT to populate table 1, then some WHILE loop or something, but I can't figure out the exact approach.
CREATE PROCEDURE [dbo].[BulkInsert]
(
@PID int,
@x int,
@y int
)
AS
BEGIN
SET NOCOUNT ON;
declare @query varchar(max)
CREATE TABLE #TEMP
(
[PID] [int] NOT NULL,
[x] int NOT NULL,
[y] int NOT NULL
)
SET @query = 'BULK INSERT #TEMP FROM ''' + PathOfYourTextFile + ''' WITH ( FIELDTERMINATOR = '','',ROWTERMINATOR = ''\n'')'
--print @query
--return
execute(@query)
BEGIN TRAN;
MERGE TableName AS Target
USING (SELECT * FROM #TEMP) AS Source
ON (Target.YourTableId = Source.YourTextFileFieldId)
-- The line above checks whether the row already exists in the table (Table1): if it does, update Table1; if not, insert the new row into Table1.
WHEN MATCHED THEN
UPDATE SET
Target.PID = Source.PID, Target.x = Source.x, Target.y = Source.y
WHEN NOT MATCHED BY TARGET THEN
INSERT (PID, x, y) VALUES (Source.PID, Source.x, Source.y);
COMMIT TRAN;
END
You can use the above approach to solve your problem. Hope this helps. :)
How are you going to run it? From a stored procedure?
To save some performance, I would do a BULK INSERT into a temp table, then insert from the temp table into Table 1 and Table 2.
It should look like this
INSERT INTO Table1 ( PID, X, Y)
SELECT PID, X, Y
FROM #tempTable
Some will tell you that temp tables are not good, but it really depends: if your file is big, reading it from disk will take time, and you don't want to do it twice.
You don't need any loop to update table 2; all you need is an insert from table 1.
Or, if you are trying to update existing rows in table 2, use an update query that joins on table 1 (see this question for an example, and the sketch below).
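A minimal sketch of such an update, assuming UID is the column linking the two tables (column names taken from the question):

UPDATE t2
SET t2.X = t1.X,
t2.Y = t1.Y
FROM Table2 t2
JOIN Table1 t1 ON t1.UID = t2.UID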
However, you should consider changing your database design, as it appears to be incorrect: you are storing X and Y in two places; they should only be stored in one table, and you should join to this table if you need to use them in conjunction with other data. If you did this, you wouldn't have to worry about messy issues of keeping the two tables in sync.
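For example, instead of copying X and Y into Table 2, you could keep them only in Table 1 and read them through a join when needed; a sketch, where OtherColumn stands in for whatever else Table 2 carries:

SELECT t2.UID, t2.OtherColumn, t1.X, t1.Y
FROM Table2 t2
JOIN Table1 t1 ON t1.UID = t2.UID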

Reference a column and update it in the same statement

I have a table where I need to move a value from one column to another and then set a new value in the original column.
declare @Foo table (A int, B int);
insert into @Foo select 1, 0;
update @Foo
set B = A,
A = 2;
After the update, does B always contain 1? Or is this non-deterministic behavior, and will it sometimes have a value of 2 due to A being updated first (with all my tests just never hitting the right conditions for it to be 2)?
As a followup question if the answer is "B will always be 1", if I do the following will I still get the same result?
update @Foo
set A = 2,
B = A;
When doing an update, SQL Server keeps two versions of the row: the old one (aka deleted) and the new one (aka inserted). The assignments only modify the inserted row, and every assignment reads its values from the deleted row. The order of the assignments does not matter.
You can visualize this using the output clause:
declare @t table (a int, b int)
insert @t values (1,2), (2,3)
update @t
set b=a
, a=b
output inserted.*
, deleted.*
select * from @t
B will always contain 1 and the order of the clauses makes no difference. Conceptually the operations happen "all at once".
The following also works in SQL to swap two values without requiring any intermediate variable.
update @Foo
set A = B,
B = A;

At what point should a Table Variable be abandoned for a Temp Table?

Is there a row count that makes table variables inefficient, or what? I understand the differences between the two, and I've seen some differing figures on when that point is reached, but I'm curious if anyone knows.
Switch to a temp table when you need indexes other than those that can be created on a table variable; for larger datasets (which are not likely to fit in available memory); or when the table width (number of bytes per row) exceeds some threshold, because the number of rows of data per I/O page shrinks and performance decreases. Also switch if the changes you plan on making to the dataset need to be part of a multi-statement transaction which may need to be rolled back (changes to table variables are not written to the transaction log, while changes to temp tables are). A sketch of the index difference follows.
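As an illustration of the index point, a sketch with hypothetical #Orders/@Orders tables: a temp table accepts CREATE INDEX at any time, while a table variable only gets indexes declared inline (PRIMARY KEY/UNIQUE constraints, or the inline INDEX syntax on SQL Server 2014+):

CREATE TABLE #Orders (OrderID int PRIMARY KEY, CustomerID int, Total money)
-- temp tables can be indexed after creation, e.g. once data is loaded
CREATE INDEX IX_Orders_CustomerID ON #Orders (CustomerID)

DECLARE @Orders TABLE (
OrderID int PRIMARY KEY, -- index via constraint is fine
CustomerID int INDEX IX_CustomerID, -- inline index, SQL Server 2014+ only
Total money
)
-- but a later CREATE INDEX on @Orders is not possible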
This code demonstrates that changes to table variables are not rolled back with a transaction:
create table #T (s varchar(128))
declare @T table (s varchar(128))
insert into #T select 'old value #'
insert into @T select 'old value @'
begin transaction
update #T set s='new value #'
update @T set s='new value @'
rollback transaction
select * from #T
select * from @T
Internally, table variables are instantiated in tempdb, just like temporary tables.
They differ in scope and persistence only.
Contrary to popular belief, operations on table variables do affect the transaction log, despite the fact that they are not subject to transaction control.
To check it, run this simple query:
DECLARE @mytable TABLE (id INT NOT NULL PRIMARY KEY)
;
WITH q(num) AS
(
SELECT 1
UNION ALL
SELECT num + 1
FROM q
WHERE num < 42
)
INSERT
INTO @mytable (id)
SELECT num
FROM q
OPTION (MAXRECURSION 0)
DBCC LOG(tempdb, -1)
GO
DBCC LOG(tempdb, -1)
GO
and browse the last entries from both recordsets.
In the first recordset, you will see 42 LOP_INSERT_ROWS entries.
In the second recordset (which is in another batch) you will see 42 LOP_DELETE_ROWS entries.
They are the result of the table variable going out of scope and its records being deleted.

Possible to implement a manual increment with just simple SQL INSERT?

I have a primary key that I don't want to auto increment (for various reasons) and so I'm looking for a way to simply increment that field when I INSERT. By simply, I mean without stored procedures and without triggers, so just a series of SQL commands (preferably one command).
Here is what I have tried thus far:
BEGIN TRAN
INSERT INTO Table1(id, data_field)
VALUES ( (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]');
COMMIT TRAN;
* Data abstracted to use generic names and identifiers
However, when executed, the command errors, saying:
"Subqueries are not allowed in this context. Only scalar expressions are allowed."
So, how can I do this/what am I doing wrong?
EDIT: Since it was pointed out as a consideration, the table to be inserted into is guaranteed to have at least 1 row already.
You understand that you will have collisions, right?
You need to do something like this, and it might cause deadlocks, so be very sure of what you are trying to accomplish here:
DECLARE @id int
BEGIN TRAN
SELECT @id = MAX(id) + 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
INSERT INTO Table1(id, data_field)
VALUES (@id ,'[blob of data]')
COMMIT TRAN
To explain the collision issue, I have provided some code.
First, create this table and insert one row:
CREATE TABLE Table1(id int primary key not null, data_field char(100))
GO
Insert Table1 values(1,'[blob of data]')
Go
Now open up two query windows and run this at the same time
declare @i int
set @i = 1
while @i < 10000
begin
BEGIN TRAN
INSERT INTO Table1(id, data_field)
SELECT MAX(id) + 1, '[blob of data]' FROM Table1
COMMIT TRAN;
set @i = @i + 1
end
You will see a bunch of these
Server: Msg 2627, Level 14, State 1, Line 7
Violation of PRIMARY KEY constraint 'PK__Table1__3213E83F2962141D'. Cannot insert duplicate key in object 'dbo.Table1'.
The statement has been terminated.
Try this instead:
INSERT INTO Table1 (id, data_field)
SELECT id, '[blob of data]' FROM (SELECT MAX(id) + 1 as id FROM Table1) tbl
I wouldn't recommend doing it that way for any number of reasons though (performance, transaction safety, etc)
It could be because there are no records, so the subquery is returning NULL. Try:
INSERT INTO tblTest(RecordID, Text)
VALUES ((SELECT ISNULL(MAX(RecordID), 0) + 1 FROM tblTest), 'asdf')
I don't know if somebody is still looking for an answer but here is a solution that seems to work:
-- Preparation: execute only once
CREATE TABLE Test (Value int)
CREATE TABLE Lock (LockID uniqueidentifier)
INSERT INTO Lock SELECT NEWID()
-- Real insert
BEGIN TRAN LockTran
-- Lock an object to block simultaneous calls.
UPDATE Lock WITH(TABLOCK)
SET LockID = LockID
INSERT INTO Test
SELECT ISNULL(MAX(T.Value), 0) + 1
FROM Test T
COMMIT TRAN LockTran
We have a similar situation where we needed to increment and could not have gaps in the numbers. (If you use an identity value and a transaction is rolled back, that number will not be inserted and you will have gaps because the identity value does not roll back.)
We created a separate table for last number used and seeded it with 0.
Our insert takes a few steps.
DECLARE @number int
BEGIN TRAN
--increment the number
Update dbo.NumberTable
set number = number + 1
--find out what the incremented number is
select @number = number
from dbo.NumberTable
--use the number (plus whatever other columns MyTable needs)
insert into dbo.MyTable (id) values (@number)
--commit, or rollback on error
COMMIT TRAN
This causes simultaneous transactions to process in single file, as each concurrent transaction waits because NumberTable is locked. As soon as a waiting transaction gets the lock, it increments the current value and locks it from others. That current value is the last number used, and if a transaction is rolled back, the NumberTable update is also rolled back, so there are no gaps.
Hope that helps.
Another way to cause single file execution is to use a SQL application lock. We have used that approach for longer running processes like synchronizing data between systems so only one synchronizing process can run at a time.
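A minimal sketch using sp_getapplock; the resource name is arbitrary, and @LockOwner = 'Transaction' releases the lock automatically at COMMIT/ROLLBACK:

BEGIN TRAN
DECLARE @rc int
-- serialize every caller that requests this resource name
EXEC @rc = sp_getapplock @Resource = 'Table1_IdGen',
@LockMode = 'Exclusive',
@LockOwner = 'Transaction',
@LockTimeout = 10000
IF @rc >= 0 -- 0 = granted, 1 = granted after wait; negative = failure
BEGIN
INSERT INTO Table1 (id, data_field)
SELECT ISNULL(MAX(id), 0) + 1, '[blob of data]' FROM Table1
END
COMMIT TRAN -- releases the application lock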
If you're doing it in a trigger, you could make sure it's an "INSTEAD OF" trigger and do it in a couple of statements:
DECLARE @next INT
SET @next = (SELECT (MAX(id) + 1) FROM Table1)
INSERT INTO Table1
SELECT @next, datablob FROM inserted
The only thing you'd have to be careful about is concurrency: if two rows are inserted at the same time, they could attempt to use the same value for @next, causing a conflict.
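A hedged variant for the trigger body that handles multi-row inserts and holds a lock while the base value is computed (a sketch only; the ROW_NUMBER ordering is arbitrary):

INSERT INTO Table1 (id, datablob)
SELECT base.maxid + ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), i.datablob
FROM inserted i
CROSS JOIN (SELECT ISNULL(MAX(id), 0) AS maxid
FROM Table1 WITH (UPDLOCK, HOLDLOCK)) base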
Does this accomplish what you want?
It seems very odd to do this sort of thing w/o an IDENTITY (auto-increment) column, making me question the architecture itself. I mean, seriously, this is the perfect situation for an IDENTITY column. It might help us answer your question if you'd explain the reasoning behind this decision. =)
Having said that, some options are:
using an INSTEAD OF trigger for this purpose. So, you'd do your INSERT (the INSERT statement would not need to pass in an ID). The trigger code would handle inserting the appropriate ID. You'd need to use the WITH (UPDLOCK, HOLDLOCK) syntax used by another answerer to hold the lock for the duration of the trigger (which is implicitly wrapped in a transaction) & to elevate the lock type from "shared" to "update" lock (IIRC).
you can use the idea above, but have a table whose purpose is to store the last, max value inserted into the table. So, once the table is set up, you would no longer have to do a SELECT MAX(ID) every time. You'd simply increment the value in the table. This is safe provided that you use appropriate locking (as discussed). Again, that avoids repeated table scans every time you INSERT.
use GUIDs instead of IDs. It's much easier to merge tables across databases, since the GUIDs will always be unique (whereas records across databases will have conflicting integer IDs). To avoid page splitting, sequential GUIDs can be used. This is only beneficial if you might need to do database merging.
Use a stored proc in lieu of the trigger approach (since triggers are to be avoided, for some reason). You'd still have the locking issue (and the performance problems that can arise). But sprocs are preferred over dynamic SQL (in the context of applications), and are often much more performant.
Sorry about rambling. Hope that helps.
How about creating a separate table to maintain the counter? It has better performance than MAX(id), as it will be O(1); MAX(id) is at best O(log n), depending on the implementation.
And then when you need to insert, simply lock the counter table for reading the counter and increment the counter. Then you can release the lock and insert to your table with the incremented counter value.
Have a separate table where you keep your latest ID and for every transaction get a new one.
It may be a bit slower but it should work.
DECLARE @NEWID INT
BEGIN TRAN
UPDATE [Table] SET ID=ID+1
SELECT @NEWID=ID FROM [Table]
COMMIT TRAN
PRINT @NEWID -- Do what you want with your new ID
Code without any transaction scope (I use it in my engineering course as an exercise):
-- Preparation: execute only once
CREATE TABLE increment (val int);
INSERT INTO increment VALUES (1);
-- Real insert
DECLARE @newIncrement INT;
UPDATE increment
SET @newIncrement = val,
val = val + 1;
INSERT INTO Table1 (id, data_field)
SELECT @newIncrement, 'some data';
declare @nextId int
begin tran
set @nextId = (select MAX(id)+1 from Table1)
insert into Table1(id, data_field) values (@nextId, '[blob of data]')
commit;
But perhaps a better approach would be using a scalar function getNextId('table1')
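Note that a T-SQL scalar function cannot modify table data, so a getNextId that actually advances a counter would have to be a stored procedure. A minimal sketch, assuming a hypothetical dbo.Counters table keyed by table name:

CREATE TABLE dbo.Counters (TableName sysname PRIMARY KEY, LastId int NOT NULL);
INSERT INTO dbo.Counters VALUES ('table1', 0);
GO
CREATE PROCEDURE dbo.GetNextId
@TableName sysname,
@NextId int OUTPUT
AS
BEGIN
-- increment and read back atomically in a single statement
UPDATE dbo.Counters
SET @NextId = LastId = LastId + 1
WHERE TableName = @TableName;
END
GO
-- usage
DECLARE @id int;
EXEC dbo.GetNextId 'table1', @id OUTPUT;
INSERT INTO Table1 (id, data_field) VALUES (@id, '[blob of data]');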
Any critiques of this? It works for me.
DECLARE @m_NewRequestID INT
, @m_IsError BIT = 1
, @m_CatchEndless INT = 0
WHILE @m_IsError = 1
BEGIN TRY
SELECT @m_NewRequestID = (SELECT ISNULL(MAX(RequestID), 0) + 1 FROM Requests)
INSERT INTO Requests ( RequestID
, RequestName
, Customer
, Comment
, CreatedFromApplication)
SELECT RequestID = @m_NewRequestID
, RequestName = dbo.ufGetNextAvailableRequestName(PatternName)
, Customer = @Customer
, Comment = [Description]
, CreatedFromApplication = @CreatedFromApplication
FROM RequestPatterns
WHERE PatternID = @PatternID
SET @m_IsError = 0
END TRY
BEGIN CATCH
SET @m_IsError = 1
SET @m_CatchEndless = @m_CatchEndless + 1
IF @m_CatchEndless > 1000
THROW 51000, '[upCreateRequestFromPattern]: Unable to get new RequestID', 1
END CATCH
This should work:
INSERT INTO Table1 (id, data_field)
SELECT (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]';
Or this (substitute LIMIT for other platforms):
INSERT INTO Table1 (id, data_field)
SELECT TOP 1
id + 1, '[blob of data]'
FROM
Table1
ORDER BY
[id] DESC;