At what point should a Table Variable be abandoned for a Temp Table?

Is there a row count that makes table variables inefficient, or something like that? I understand the differences between the two, and I've seen some different figures on when that point is reached, but I'm curious if anyone knows.

Switch to a temp table when you need indexes on the table beyond those that can be created on a table variable; for larger datasets (which are not likely to fit in available memory); when the table width (number of bytes per row) exceeds some threshold, because the number of rows of data per I/O page shrinks and performance decreases; or when the changes you plan on making to the dataset need to be part of a multi-statement transaction which may need to be rolled back (changes to table variables are not affected by a transaction rollback; changes to temp tables are).
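As an aside on the first point (indexes), a quick illustration with hypothetical names: a temp table accepts additional indexes after creation, while a classic table variable only gets the indexes implied by the PRIMARY KEY/UNIQUE constraints in its declaration:
create table #orders (id int primary key, customer_id int, total money)
create index IX_orders_customer on #orders (customer_id) -- fine on a temp table
-- declare @orders table (...) offers no CREATE INDEX equivalent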
This code demonstrates that changes to a table variable survive a transaction rollback:
create table #T (s varchar(128))
declare @T table (s varchar(128))
insert into #T select 'old value #'
insert into @T select 'old value @'
begin transaction
update #T set s='new value #'
update @T set s='new value @'
rollback transaction
select * from #T -- returns 'old value #': the update was rolled back
select * from @T -- returns 'new value @': the rollback did not touch the table variable

Internally, table variables are instantiated in tempdb, just like temporary tables.
They differ in scope and persistence only.
Contrary to popular belief, operations on table variables do affect the transaction log, despite the fact that they are not subject to transaction control.
To check it, run this simple query:
DECLARE @mytable TABLE (id INT NOT NULL PRIMARY KEY)
;
WITH q(num) AS
(
SELECT 1
UNION ALL
SELECT num + 1
FROM q
WHERE num < 42
)
INSERT
INTO @mytable (id)
SELECT num
FROM q
OPTION (MAXRECURSION 0)
DBCC LOG(tempdb, -1)
GO
DBCC LOG(tempdb, -1)
GO
and browse the last entries from both recordsets.
In the first recordset, you will see 42 LOP_INSERT_ROWS entries.
In the second recordset (which is in another batch) you will see 42 LOP_DELETE_ROWS entries.
They are the result of the table variable going out of scope and its records being deleted.

Related

How to maintain transaction integrity per common column value for inserting into two tables?

Let's say I have two table variables declared as below:
DECLARE @Table1 TABLE (
A INT,
B NVARCHAR(100)
)
DECLARE @Table2 TABLE (
A INT,
C NVARCHAR(100)
)
Here are the contents of @Table1:
1, 'Hello'
2, 'Hi'
3, 'Ola'
These are the contents of @Table2:
1, 'my old friend'
1, 'sweetheart'
2, 'buddy'
4, 'the end'
Now I want to insert @Table1 into a table X and @Table2 into a table Y. The scenario is that I have to maintain transaction integrity for the insertion into both X and Y for every same value of column A.
For instance, let's say I am inserting the first row (1, 'Hello') of @Table1 into X. This means I must also insert the first two rows ((1, 'my old friend'), (1, 'sweetheart')) of @Table2 into Y in the same transaction. So if any insert into Y fails for A=1, the insert into X must also fail for A=1. Any value of column A that is not in both @Table1 and @Table2 is an individual transaction by itself (e.g. A=3 in @Table1 and A=4 in @Table2).
Here are the ways I see to deal with this problem:
I fetch all values of A in both @Table1 and @Table2, run a cursor over them, and then for each value of A, I insert into tables X and Y in a single transaction. The issues here are that, first of all, I want to avoid cursors as much as possible, and also this would mean a very large number of individual inserts.
I pre-validate my @Table1 and @Table2 values and then do one single insert of @Table1 into X and @Table2 into Y. This will be much faster than the above method. But the issues I see here are that, first of all, not putting it in a transaction somehow doesn't seem right, and also there is a small chance I might have missed a validation somewhere (unlikely, yet still).
Which approach should I go for? Is there a better solution?
P.S. Please also note that I do not want to fail the entire insert into X and Y if there is an issue inserting only one or a few values of A. Also, going back and deleting rows from my DB based on the failed inserts is not an option, as it messes with the running-id continuity which I am trying to avoid.
A DML statement is executed completely or not executed at all.
You can do a mix of your two options:
First add as many validations as possible and attempt the single set-based insert (your option 2); if it fails, fall back to processing value by value (your option 1), using a temp table instead of a cursor:
BEGIN TRY
BEGIN TRAN Opt2
--Option 2: one set-based insert of everything
COMMIT TRAN Opt2
END TRY
BEGIN CATCH
ROLLBACK TRAN Opt2
DECLARE @A INT, @B NVARCHAR(100)
SELECT * INTO #TMP FROM @Table1
WHILE EXISTS (SELECT 1 FROM #TMP)
BEGIN
SELECT TOP 1 @A = A, @B = B FROM #TMP
-- Option 1: process this value of A on its own
DELETE #TMP WHERE A = @A
END
END CATCH
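For completeness, here is a minimal sketch of option 1 itself without a formal cursor, assuming the target tables X(A, B) and Y(A, C) already exist; each value of A gets its own transaction, so a failure rolls back only that group:
DECLARE @CurrentA INT
DECLARE @Keys TABLE (A INT PRIMARY KEY)
-- every A that appears in either table variable
INSERT INTO @Keys (A)
SELECT A FROM @Table1 UNION SELECT A FROM @Table2
WHILE EXISTS (SELECT 1 FROM @Keys)
BEGIN
SELECT TOP 1 @CurrentA = A FROM @Keys ORDER BY A
BEGIN TRY
BEGIN TRAN
INSERT INTO X (A, B) SELECT A, B FROM @Table1 WHERE A = @CurrentA
INSERT INTO Y (A, C) SELECT A, C FROM @Table2 WHERE A = @CurrentA
COMMIT TRAN
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0 ROLLBACK TRAN -- only this A's inserts are undone
END CATCH
DELETE @Keys WHERE A = @CurrentA
END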

Insert Values from Table Variable into already EXISTING Temp Table

I'm successfully inserting values from a Table Variable into a new (not yet existing) Temp Table. I have no issues when inserting a small number of rows (e.g. 10,000), but when inserting a lot of rows (e.g. 30,000) into the Table Variable, it throws a "Server ran out of memory and external resources" error.
To work around the issue:
I split my (60,000) Table Variable rows into small batches (e.g. 10,000 rows each), thinking I could insert the new data into an already existing Temp Table, but I'm getting this error message:
There is already an object named '##TempTable' in the database.
My code is:
USE MyDataBase;
Go
Declare @TableVariable TABLE
(
[ID] bigint PRIMARY KEY,
[BLD_ID] int NOT NULL
-- 25 more columns
)
Insert Into @TableVariable VALUES
(1,25),
(2,30)
-- 61,000 more rows
Select * Into ##TempTable From @TableVariable;
Select Count(*) From ##TempTable;
The problem is that SELECT INTO wants to create the destination table, so on the second run you get the error.
First you have to create the #TempTable:
/* this creates the temp table copying the @TableVariable structure */
Select *
Into #TempTable
From @TableVariable
where 1=0;
now you can loop through your batches and call this insert as many times as you want:
insert Into #TempTable
Select * From @TableVariable;
pay attention that #TempTable is different from ##TempTable (# = Local, ## = Global), and remember to drop it when you have finished. Also, you should NOT use ## for your temp table here; a local #TempTable is enough.
I hope this helps.
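Putting the answer together, a minimal sketch of the batching loop (the @batchStart and @batchSize names are mine, and the range scan assumes the ID values start at 1):
Declare @batchStart bigint = 1, @batchSize bigint = 10000
-- create the empty destination once
Select * Into #TempTable From @TableVariable where 1=0
While Exists (Select 1 From @TableVariable Where ID >= @batchStart)
Begin
Insert Into #TempTable
Select * From @TableVariable
Where ID >= @batchStart And ID < @batchStart + @batchSize
Set @batchStart = @batchStart + @batchSize
End
Drop Table #TempTable -- when you have finished with it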

Can someone give me a real-time example with the below temp table and table variable example that I found on Stack Exchange?

Difference between a temp table and a table variable, as stated:
Operations on @table_variables are carried out as system transactions, independent of any outer user transaction, whereas the equivalent #temp table operations would be carried out as part of the user transaction itself. For this reason a ROLLBACK command will affect a #temp table but leave the @table_variable untouched.
DECLARE @T TABLE(X INT)
CREATE TABLE #T(X INT)
BEGIN TRAN
INSERT #T
OUTPUT INSERTED.X INTO @T
VALUES(1),(2),(3)
/*Both have 3 rows*/
SELECT * FROM #T
SELECT * FROM @T
ROLLBACK
/*Only table variable now has rows*/
SELECT * FROM #T
SELECT * FROM @T
DROP TABLE #T
Can anyone tell me when this scenario would be used in real time? Can anyone give a real-time example? Thanks.
P.S. - Referred from this link: https://dba.stackexchange.com/questions/16385/whats-the-difference-between-a-temp-table-and-table-variable-in-sql-server/16386#16386
For a real example, just consider that you have a transaction, and for some reason it rolls back, but you still want to log why the transaction failed and keep that log even after the rollback.
In that case, you can capture all your logging information in a table variable.
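A minimal sketch of that pattern (the @ErrorLog shape and the dbo.PermanentLog table are hypothetical):
DECLARE @ErrorLog TABLE (LoggedAt datetime2 DEFAULT SYSDATETIME(), Message nvarchar(4000))
BEGIN TRY
BEGIN TRAN
-- ... work that might fail ...
COMMIT TRAN
END TRY
BEGIN CATCH
INSERT INTO @ErrorLog (Message) VALUES (ERROR_MESSAGE())
IF @@TRANCOUNT > 0 ROLLBACK TRAN -- @ErrorLog keeps its row through the rollback
INSERT INTO dbo.PermanentLog (LoggedAt, Message)
SELECT LoggedAt, Message FROM @ErrorLog
END CATCH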
What is going on is that the developer is demonstrating the use of a temporary table (which for most intents and purposes is the same as a regular table) and a variable that is a table. When a rollback occurs, any changes made to the temporary table are undone (the table is in the same state as before the transaction started), but the table variable keeps its changes: it is not affected by the rollback.

Does anyone know a neat trick for reusing identity values?

Typically, when you specify an identity column, you get a convenient interface in SQL Server for asking for a particular row:
SELECT * FROM [table] WHERE $IDENTITY = @pID
You don't really need to concern yourself with the name of the identity column, because there can only be one.
But what if I have a table which mostly consists of temporary data? Lots of inserts and lots of deletes. Is there a simple way for me to reuse the identity values?
Preferably, I would want to be able to write a function that would return, say, NEXT_SMALLEST($IDENTITY) as the next identity value, and do so in a fail-safe manner.
Basically, find the smallest value that's not in use. That's not entirely trivial to do, but what I want is to be able to tell SQL Server that this is my function that will generate the identity values. But as far as I know, no such function exists...
I want to implement global database IDs, so I need to provide a default value that I'm in control of.
My idea was that I should have a table with all known IDs, and then every row ID from some other table that needed a global ID would reference that table. The default value would be provided by something like
INSERT INTO GlobalID
RETURN SCOPE_IDENTITY()
No; it's not unique if it can be reused.
Why do you want to re-use them? Why do you concern yourself with this field? If you want to be in control of it, don't make it an identity; create your own scheme and use that.
Don't reuse identities; you'll just shoot yourself in the foot. Use a value large enough that it never rolls over (a 64-bit bigint).
To find gaps in a sequence of numbers, join the table against itself with a +/- 1 offset:
SELECT a.id
FROM table AS a
LEFT OUTER JOIN table AS b ON a.id = b.id+1
WHERE b.id IS NULL;
This query will find the numbers in the id sequence for which id-1 is not in the table, i.e. the start numbers of contiguous runs. You can then use SET IDENTITY_INSERT ... ON to insert a specific id and reuse a number. The cost of doing so is overwhelming (both runtime and code complexity) compared with an ordinary identity-based insert.
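For illustration, reusing one recovered value might look like this (the table, column, and variable names are hypothetical):
DECLARE @recovered_id int = 5 -- e.g. picked out of the gap query above
SET IDENTITY_INSERT dbo.MyTable ON
INSERT INTO dbo.MyTable (id, payload) VALUES (@recovered_id, 'reused slot')
SET IDENTITY_INSERT dbo.MyTable OFF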
If you really want to reset the identity value to the lowest available number,
here is the trick you can use with DBCC CHECKIDENT.
Basically, the following SQL statements reset the identity value so that it restarts from the lowest possible number:
create table TT (id int identity(1, 1))
GO
insert TT default values
GO 10
select * from TT
GO
delete TT where id between 5 and 10
GO
--; At this point, next ID will be 11, not 5
select * from TT
GO
insert TT default values
GO
--; as you can see here, next ID is indeed 11
select * from TT
GO
--; Now delete ID = 11
--; so that we can reseed next highest ID to 5
delete TT where id = 11
GO
--; Now, let's reseed the identity value to the lowest possible identity number
declare @seedID int
select @seedID = max(id) from TT
print @seedID --; 4
--; We reseed the identity column with "DBCC CheckIdent" and pass a new seed value.
--; But we can't pass a variable as the seed argument, so let's use dynamic SQL.
declare @sql nvarchar(200)
set @sql = 'dbcc checkident(TT, reseed, ' + cast(@seedID as varchar) + ')'
exec sp_sqlexec @sql
GO
--; Now the next insert picks up the reseeded value
insert TT default values
GO
--; as you can see here, next ID is indeed 5
select * from TT
GO
I guess we would really need to know why you want to reuse your identity column. The only reason I can think of is that, because of the temporary nature of your data, you might exhaust the possible values for the identity. That is not really likely, but if it is your concern, you can use uniqueidentifiers (GUIDs) as the primary key in your table instead.
The function NEWID() will create a new GUID and can be used in INSERT statements (or other statements). Then, when you delete a row, you don't have any "holes" in your key, because GUIDs are not created in that order anyway.
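A minimal sketch of the GUID approach (table and column names are mine):
CREATE TABLE dbo.Example
(
id uniqueidentifier NOT NULL PRIMARY KEY DEFAULT NEWID(),
payload nvarchar(100)
)
-- no id needs to be supplied, and deletes leave no meaningful "holes"
INSERT INTO dbo.Example (payload) VALUES (N'no identity values to reuse')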
[Syntax assumes SQL2008....]
Yes, it's possible. You need two management tables, and two triggers on each participating table.
First, the management tables:
-- this table should only ever have one row
CREATE TABLE NextId (Id INT)
INSERT NextId VALUES (1)
GO
CREATE TABLE RecoveredIds (Id INT NOT NULL PRIMARY KEY)
GO
Then, the triggers, two on each table:
CREATE TRIGGER tr_TableName_RecoverId ON TableName
FOR DELETE AS BEGIN
IF @@ROWCOUNT = 0 RETURN
INSERT RecoveredIds (Id) SELECT Id FROM deleted
END
GO
CREATE TRIGGER tr_TableName_AssignId ON TableName
INSTEAD OF INSERT AS BEGIN
DECLARE @rowcount INT = @@ROWCOUNT
IF @rowcount = 0 RETURN
DECLARE @required INT = @rowcount
DECLARE @new_ids TABLE (Id INT PRIMARY KEY)
DELETE TOP (@required) OUTPUT DELETED.Id INTO @new_ids (Id) FROM RecoveredIds
SET @rowcount = @@ROWCOUNT
IF @rowcount < @required BEGIN
DECLARE @output TABLE (Id INT)
UPDATE NextId SET Id = Id + (@required-@rowcount)
OUTPUT DELETED.Id INTO @output
-- this assumes you have a numbers table around somewhere
INSERT @new_ids (Id)
SELECT n.Number+o.Id-1 FROM Numbers n, @output o
WHERE n.Number BETWEEN 1 AND @required-@rowcount
END
SET IDENTITY_INSERT TableName ON
;WITH inserted_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM inserted)
, new_ids_CTE AS (SELECT _no = ROW_NUMBER() OVER (ORDER BY Id), * FROM @new_ids)
INSERT TableName (Id, Attr1, Attr2)
SELECT n.Id, i.Attr1, i.Attr2
FROM inserted_CTE i JOIN new_ids_CTE n ON i._no = n._no
SET IDENTITY_INSERT TableName OFF
END
You could script the triggers out easily enough from system tables.
You would want to test this for concurrency. It should work as is, syntax errors notwithstanding: The OUTPUT clause guarantees atomicity of id lookup->increment as one step, and the entire operation occurs within a transaction, thanks to the trigger.
TableName.Id is still an identity column. All the common idioms like $IDENTITY and SCOPE_IDENTITY() will still work.
There is no central table of ids by table, but you could create one easily enough.
I don't have any help for finding the values not in use, but if you really want to find them and set them yourself, you can use
SET IDENTITY_INSERT ... ON
in your code to do so.
I'm with everyone else though. Why bother? Don't you have a business problem to solve?

Possible to implement a manual increment with just simple SQL INSERT?

I have a primary key that I don't want to auto-increment (for various reasons), so I'm looking for a way to simply increment that field when I INSERT. By simply, I mean without stored procedures and without triggers: just a series of SQL commands (preferably one command).
Here is what I have tried thus far:
BEGIN TRAN
INSERT INTO Table1(id, data_field)
VALUES ( (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]');
COMMIT TRAN;
* Data abstracted to use generic names and identifiers
However, when executed, the command errors with:
"Subqueries are not allowed in this context. Only scalar expressions are allowed."
So, how can I do this/what am I doing wrong?
EDIT: Since it was pointed out as a consideration, the table to be inserted into is guaranteed to have at least 1 row already.
You understand that you will have collisions, right?
You need to do something like the following, and it might cause deadlocks, so be very sure of what you are trying to accomplish here:
DECLARE @id int
BEGIN TRAN
SELECT @id = MAX(id) + 1 FROM Table1 WITH (UPDLOCK, HOLDLOCK)
INSERT INTO Table1(id, data_field)
VALUES (@id, '[blob of data]')
COMMIT TRAN
To explain the collision problem, I have provided some code.
First, create this table and insert one row:
CREATE TABLE Table1(id int primary key not null, data_field char(100))
GO
Insert Table1 values(1,'[blob of data]')
Go
Now open up two query windows and run this at the same time:
declare @i int
set @i = 1
while @i < 10000
begin
BEGIN TRAN
INSERT INTO Table1(id, data_field)
SELECT MAX(id) + 1, '[blob of data]' FROM Table1
COMMIT TRAN;
set @i = @i + 1
end
You will see a bunch of these
Server: Msg 2627, Level 14, State 1, Line 7
Violation of PRIMARY KEY constraint 'PK__Table1__3213E83F2962141D'. Cannot insert duplicate key in object 'dbo.Table1'.
The statement has been terminated.
Try this instead:
INSERT INTO Table1 (id, data_field)
SELECT id, '[blob of data]' FROM (SELECT MAX(id) + 1 as id FROM Table1) tbl
I wouldn't recommend doing it that way for any number of reasons, though (performance, transaction safety, etc.).
It could be because there are no records, so the subquery returns NULL... try:
INSERT INTO tblTest(RecordID, Text)
VALUES ((SELECT ISNULL(MAX(RecordID), 0) + 1 FROM tblTest), 'asdf')
I don't know if somebody is still looking for an answer, but here is a solution that seems to work:
-- Preparation: execute only once
CREATE TABLE Test (Value int)
CREATE TABLE Lock (LockID uniqueidentifier)
INSERT INTO Lock SELECT NEWID()
-- Real insert
BEGIN TRAN LockTran
-- Lock an object to block simultaneous calls.
UPDATE Lock WITH(TABLOCK)
SET LockID = LockID
INSERT INTO Test
SELECT ISNULL(MAX(T.Value), 0) + 1
FROM Test T
COMMIT TRAN LockTran
We have a similar situation where we needed to increment and could not have gaps in the numbers. (If you use an identity value and a transaction is rolled back, that number will not be inserted, and you will have gaps, because the identity value does not roll back.)
We created a separate table for the last number used and seeded it with 0.
Our insert takes a few steps:
--increment the number
Update dbo.NumberTable
set number = number + 1
--find out what the incremented number is
select @number = number
from dbo.NumberTable
--use the number
insert into dbo.MyTable using the @number
commit or rollback
This causes simultaneous transactions to process in single file, as each concurrent transaction will wait because the NumberTable is locked. As soon as the waiting transaction gets the lock, it increments the current value and locks it from others. That current value is the last number used, and if a transaction is rolled back, the NumberTable update is also rolled back, so there are no gaps.
Hope that helps.
Another way to force single-file execution is to use a SQL application lock. We have used that approach for longer-running processes, like synchronizing data between systems, so only one synchronizing process can run at a time.
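A minimal sketch of the application-lock approach (the resource name is arbitrary, and the MAX(id) insert stands in for whatever work must be serialized):
BEGIN TRAN
-- only one session at a time may hold this named lock
EXEC sp_getapplock @Resource = 'Table1_NextId', @LockMode = 'Exclusive', @LockOwner = 'Transaction'
INSERT INTO Table1 (id, data_field)
SELECT ISNULL(MAX(id), 0) + 1, '[blob of data]' FROM Table1
COMMIT TRAN -- committing the transaction releases the lock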
If you're doing it in a trigger, you could make sure it's an "INSTEAD OF" trigger and do it in a couple of statements:
DECLARE @next INT
SET @next = (SELECT MAX(id) + 1 FROM Table1)
INSERT INTO Table1
SELECT @next, datablob FROM inserted
The only thing you'd have to be careful about is concurrency: if two rows are inserted at the same time, they could attempt to use the same value for @next, causing a conflict.
Does this accomplish what you want?
It seems very odd to do this sort of thing w/o an IDENTITY (auto-increment) column, making me question the architecture itself. I mean, seriously, this is the perfect situation for an IDENTITY column. It might help us answer your question if you'd explain the reasoning behind this decision. =)
Having said that, some options are:
using an INSTEAD OF trigger for this purpose. So, you'd do your INSERT (the INSERT statement would not need to pass in an ID). The trigger code would handle inserting the appropriate ID. You'd need to use the WITH (UPDLOCK, HOLDLOCK) syntax used by another answerer to hold the lock for the duration of the trigger (which is implicitly wrapped in a transaction) & to elevate the lock type from "shared" to "update" lock (IIRC).
you can use the idea above, but have a table whose purpose is to store the last, max value inserted into the table. So, once the table is set up, you would no longer have to do a SELECT MAX(ID) every time. You'd simply increment the value in the table. This is safe provided that you use appropriate locking (as discussed). Again, that avoids repeated table scans every time you INSERT.
use GUIDs instead of IDs. It's much easier to merge tables across databases, since the GUIDs will always be unique (whereas records across databases will have conflicting integer IDs). To avoid page splitting, sequential GUIDs can be used. This is only beneficial if you might need to do database merging.
Use a stored proc in lieu of the trigger approach (since triggers are to be avoided, for some reason). You'd still have the locking issue (and the performance problems that can arise). But sprocs are preferred over dynamic SQL (in the context of applications), and are often much more performant.
Sorry about rambling. Hope that helps.
How about creating a separate table to maintain the counter? It has better performance than MAX(id), as it will be O(1); MAX(id) is at best O(lg n), depending on the implementation.
Then, when you need to insert, lock the counter table to read and increment the counter. Then you can release the lock and insert into your table with the incremented counter value.
Have a separate table where you keep your latest ID, and for every transaction get a new one.
It may be a bit slower, but it should work.
DECLARE @NEWID INT
BEGIN TRAN
UPDATE [TABLE] SET ID=ID+1 -- [TABLE] stands for your counter table
SELECT @NEWID=ID FROM [TABLE]
COMMIT TRAN
PRINT @NEWID -- Do what you want with your new ID
Code without any transaction scope (I use it in my engineering course as an exercise):
-- Preparation: execute only once
CREATE TABLE increment (val int);
INSERT INTO increment VALUES (1);
-- Real insert
DECLARE @newIncrement INT;
UPDATE increment
SET @newIncrement = val,
val = val + 1;
INSERT INTO Table1 (id, data_field)
SELECT @newIncrement, 'some data';
declare @nextId int
set @nextId = (select MAX(id)+1 from Table1)
insert into Table1(id, data_field) values (@nextId, '[blob of data]')
commit;
But perhaps a better approach would be using a scalar function getNextId('table1')
Any critiques of this? Works for me.
DECLARE @m_NewRequestID INT
, @m_IsError BIT = 1
, @m_CatchEndless INT = 0
WHILE @m_IsError = 1
BEGIN TRY
SELECT @m_NewRequestID = (SELECT ISNULL(MAX(RequestID), 0) + 1 FROM Requests)
INSERT INTO Requests ( RequestID
, RequestName
, Customer
, Comment
, CreatedFromApplication)
SELECT RequestID = @m_NewRequestID
, RequestName = dbo.ufGetNextAvailableRequestName(PatternName)
, Customer = @Customer
, Comment = [Description]
, CreatedFromApplication = @CreatedFromApplication
FROM RequestPatterns
WHERE PatternID = @PatternID
SET @m_IsError = 0
END TRY
BEGIN CATCH
SET @m_IsError = 1
SET @m_CatchEndless = @m_CatchEndless + 1
IF @m_CatchEndless > 1000
THROW 51000, '[upCreateRequestFromPattern]: Unable to get new RequestID', 1
END CATCH
This should work:
INSERT INTO Table1 (id, data_field)
SELECT (SELECT (MAX(id) + 1) FROM Table1), '[blob of data]';
Or this (substitute LIMIT for other platforms):
INSERT INTO Table1 (id, data_field)
SELECT TOP 1
id + 1, '[blob of data]'
FROM
Table1
ORDER BY
[id] DESC;