Using temporary table in where clause - sql

I want to delete many rows with the same set of field values in some (6) tables. I could do this by deleting the result of one subquery in every table (Solution 1), which would be redundant, because the subquery would be the same every time; so I want to store the result of the subquery in a temporary table and delete the value of each row (of the temp table) in the tables (Solution 2). Which solution is the better one?
First solution:
DELETE FROM dbo.SubProtocols
WHERE ProtocolID IN (
SELECT ProtocolID
FROM dbo.Protocols
WHERE WorkplaceID = @WorkplaceID
)
DELETE FROM dbo.ProtocolHeaders
WHERE ProtocolID IN (
SELECT ProtocolID
FROM dbo.Protocols
WHERE WorkplaceID = @WorkplaceID
)
-- ...
DELETE FROM dbo.Protocols
WHERE WorkplaceID = @WorkplaceID
Second Solution:
DECLARE @Protocols table(ProtocolID int NOT NULL)
INSERT INTO @Protocols
SELECT ProtocolID
FROM dbo.Protocols
WHERE WorkplaceID = @WorkplaceID
DELETE FROM dbo.SubProtocols
WHERE ProtocolID IN (
SELECT ProtocolID
FROM @Protocols
)
DELETE FROM dbo.ProtocolHeaders
WHERE ProtocolID IN (
SELECT ProtocolID
FROM @Protocols
)
-- ...
DELETE FROM dbo.Protocols
WHERE WorkplaceID = @WorkplaceID
Is it possible to do solution 2 without the subquery? Say doing WHERE ProtocolID IN @Protocols (but syntactically correct)?
I am using Microsoft SQL Server 2005.

While you can avoid the subquery in SQL Server with a join, like so:
delete from sp
from subprotocols sp
inner join protocols p on
sp.protocolid = p.protocolid
and p.workplaceid = @WorkplaceID
You'll find that this gains you little, if any, performance over either of your approaches. SQL Server 2005 generally optimizes an uncorrelated subquery like yours into a join, since it doesn't depend on each outer row. Also, SQL Server will probably cache the subquery in your case, so shoving it into a temp table is most likely unnecessary.
The first approach, though, is susceptible to changes in Protocols between the individual DELETE statements, whereas the second, which captures the IDs once, is not. Just something to think about.

Can try this
DELETE FROM dbo.ProtocolHeaders
FROM dbo.ProtocolHeaders INNER JOIN
dbo.Protocols ON ProtocolHeaders.ProtocolID = Protocols.ProtocolID
WHERE Protocols.WorkplaceID = @WorkplaceID

DELETE ... FROM is a T-SQL extension to the standard SQL DELETE that provides an alternative to using a subquery. From the help:
D. Using DELETE based on a subquery and using the Transact-SQL extension
The following example shows the Transact-SQL extension used to delete records from a base table that is based on a join or correlated subquery. The first DELETE statement shows the SQL-2003-compatible subquery solution, and the second DELETE statement shows the Transact-SQL extension. Both queries remove rows from the SalesPersonQuotaHistory table based on the year-to-date sales stored in the SalesPerson table.
-- SQL-2003 Standard subquery
USE AdventureWorks;
GO
DELETE FROM Sales.SalesPersonQuotaHistory
WHERE SalesPersonID IN
(SELECT SalesPersonID
FROM Sales.SalesPerson
WHERE SalesYTD > 2500000.00);
GO
-- Transact-SQL extension
USE AdventureWorks;
GO
DELETE FROM Sales.SalesPersonQuotaHistory
FROM Sales.SalesPersonQuotaHistory AS spqh
INNER JOIN Sales.SalesPerson AS sp
ON spqh.SalesPersonID = sp.SalesPersonID
WHERE sp.SalesYTD > 2500000.00;
GO
You would want, in your second solution, something like
-- untested!
DELETE FROM
dbo.SubProtocols -- ProtocolHeaders, etc
FROM
dbo.SubProtocols
INNER JOIN @Protocols AS p ON SubProtocols.ProtocolID = p.ProtocolID -- a table variable needs an alias to qualify its columns
However!!
Is it not possible to alter your design so that all the subsidiary protocol tables have a FOREIGN KEY with ON DELETE CASCADE to the main Protocols table? Then you could just DELETE from Protocols and the rest would be taken care of...
edit to add:
If you already have FOREIGN KEYs set up, you would need to use DDL to alter them (I think a drop and recreate is required) in order for them to have ON DELETE CASCADE turned on. Once that is in place, a DELETE from the main table will automatically DELETE related records from the child tables.
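To illustrate the cascading idea, here is a minimal sketch. It runs against an in-memory SQLite database from Python rather than SQL Server, so the DDL syntax is SQLite's, and the two-table schema and sample rows are assumptions made up for the demonstration (only the table and column names are borrowed from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled (SQLite-specific).
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Protocols (
    ProtocolID  INTEGER PRIMARY KEY,
    WorkplaceID INTEGER NOT NULL
);
-- Hypothetical child table: its rows disappear when the parent row is deleted.
CREATE TABLE SubProtocols (
    SubProtocolID INTEGER PRIMARY KEY,
    ProtocolID    INTEGER NOT NULL
        REFERENCES Protocols (ProtocolID) ON DELETE CASCADE
);
INSERT INTO Protocols VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO SubProtocols VALUES (100, 1), (101, 2), (102, 3);
""")

# A single DELETE on the parent table; no explicit DELETE on SubProtocols.
conn.execute("DELETE FROM Protocols WHERE WorkplaceID = ?", (10,))
conn.commit()

remaining_parents = [r[0] for r in conn.execute(
    "SELECT ProtocolID FROM Protocols ORDER BY ProtocolID")]
remaining_children = [r[0] for r in conn.execute(
    "SELECT ProtocolID FROM SubProtocols ORDER BY ProtocolID")]
print(remaining_parents, remaining_children)
```

Only the rows belonging to workplace 20 should survive in both tables, even though the child table was never named in a DELETE.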

Without the temp table you risk deleting different rows in the second delete, but the temp-table approach takes three operations.
Instead, you could delete from the first table and use the OUTPUT ... INTO clause to capture all the deleted IDs in a temp table, and then use that temp table to delete from the second table. This makes sure both deletes hit the same keys, and it takes only two statements.
declare @x table(RowID int identity(1,1) primary key, ValueData varchar(3))
declare @y table(RowID int identity(1,1) primary key, ValueData varchar(3))
declare @temp table (RowID int)
insert into @x values ('aaa')
insert into @x values ('bab')
insert into @x values ('aac')
insert into @x values ('bad')
insert into @x values ('aae')
insert into @x values ('baf')
insert into @x values ('aag')
insert into @y values ('aaa')
insert into @y values ('bab')
insert into @y values ('aac')
insert into @y values ('bad')
insert into @y values ('aae')
insert into @y values ('baf')
insert into @y values ('aag')
DELETE @x
OUTPUT DELETED.RowID
INTO @temp
WHERE ValueData like 'a%'
DELETE y
FROM @y y
INNER JOIN @temp t ON y.RowID=t.RowID
select * from @x
select * from @y
SELECT OUTPUT:
RowID ValueData
----------- ---------
2 bab
4 bad
6 baf
(3 row(s) affected)
RowID ValueData
----------- ---------
2 bab
4 bad
6 baf
(3 row(s) affected)

Related

SQL Server : insert data set into table with an identity column

I have a table with an auto-incrementing identity column. Typically I might insert data as follows
INSERT INTO [dbo].[table]
DEFAULT VALUES;
SET @value = SCOPE_IDENTITY();
This way I know the identity value I've just inserted. However I need to insert a "set" of values into that table. Preferably also be able to identify the values I just inserted. I was hoping something similar to the following would be possible ...
INSERT INTO dbo.table DEFAULT VALUES
OUTPUT INSERTED.id INTO @output
SELECT SCOPE_IDENTITY() -- obviously this isn't possible and doesn't actually make sense
FROM #records
WHERE somecolumn IS NULL
I know I might need to set identity_insert on ... I would prefer not to if I don't have to. I am also aware that maybe I could also use some sort of recursive CTE, though I haven't used one of those in a while. Any help would be appreciated.
EDIT: to be clear the question I am asking is: how do I insert a "SET" of data into a table with an auto-incrementing identity column. And hopefully identify the values I just inserted in some way.
INSERT INTO [dbo].[table]
DEFAULT VALUES;
SET @value = SCOPE_IDENTITY();
One does not "typically" do any such thing. It would be highly unusual (to be gentle) to insert a single row that consisted of nothing but default values. And inserting hundreds or thousands of rows is even more suspicious. I think you have chosen a path that doesn't completely make sense.
But let's assume you have not lost your senses. Unfortunately, you cannot insert multiple rows using the "default values" syntax (directly or indirectly). But we can kludge together a script that "sort of" does this (with assumptions) using the output clause suggested by both Gordon and Sachin (using tally table logic here).
set nocount on;
declare @id int;
declare @outputtable table (id int);
create table #x (id int not null identity(1,1), descr varchar(20) null, dd int not null default(2));
insert #x (descr, dd) values ('test', 4), ('zork', 2), (null, 55); -- some extra fluff for demonstration
insert #x default values;
set @id = SCOPE_IDENTITY();
select @id;
select * from #x order by id;
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
E02(N) AS (SELECT 1 FROM E00 a, E00 b),
E04(N) AS (SELECT 1 FROM E02 a, E02 b),
E08(N) AS (SELECT 1 FROM E04 a, E04 b),
E16(N) AS (SELECT 1 FROM E08 a, E08 b),
E32(N) AS (SELECT 1 FROM E16 a, E16 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
insert #x (descr, dd)
output inserted.id into @outputtable(id)
select src.descr, src.dd
from #x as src cross join cteTally
where src.id = @id and cteTally.N < 5;
select x.*
from @outputtable as ids inner join #x as x on ids.id = x.id order by x.id;
if object_id('tempdb..#x') is not null drop table #x;
go
This might not work depending on your table DDL. I'll let you find the assumptions built into this logic.
For an identity column, there is only one way to do this that I am aware of. If you don't mind keeping the dummy around you can skip the alter table statements that add and remove it.
drop table if exists T;
create table T (
id int identity(1, 1) not null
);
alter table T add dummy bit;
insert into T (dummy)
select null
from (
values (42),(555),(911)
) v (v);
alter table T drop column dummy;
select * from T;
You are really close:
INSERT INTO dbo.table
OUTPUT INSERTED.id INTO @output
DEFAULT VALUES;
SELECT *
FROM @output;
The INSERT puts the values into @output. You can then reference them. Remember to define @output as a table variable with a column of the correct type.
Here is a rextester example of it working.
EDIT:
I thought the problem was using @output, because your sample doesn't do that correctly. If your table consists of only an identity column, then I don't think that SQL Server provides a single-query mechanism for inserting multiple values, unless you enable IDENTITY_INSERT and supply the values explicitly.
One option is a loop:
CREATE TABLE t (id int identity);
DECLARE @output table (id int);
DECLARE @i int = 1;
WHILE @i < 10
BEGIN
INSERT INTO t
OUTPUT INSERTED.id INTO @output
DEFAULT VALUES;
SET @i = @i + 1;
END;
SELECT *
FROM @output;
Another option would be to include another column (even a dummy) just so you can insert something.
And finally, perhaps you don't need a table at all. Perhaps a sequence will suffice for your purposes.
Try this query --
CREATE TABLE StudentPassMarks (ID INT identity(1, 1))
GO
-- Each run of the batch below inserts one row; GO 20 repeats that batch 20 times
DECLARE @OutputTable TABLE (ID INT)
INSERT INTO StudentPassMarks
OUTPUT inserted.ID
INTO @OutputTable(ID)
DEFAULT VALUES
SELECT * FROM @OutputTable
GO 20
SELECT * FROM StudentPassMarks

SQL Server: automatically add a unique identifier to all rows inserted at one time

The below SQL Server code successfully calculates and inserts the monthly pay for all employees along with their staffID number and inserts it into Tablepayroll.
INSERT INTO Tablepayroll (StaffID,Totalpaid)
(SELECT Tabletimelog.StaffID , Tabletimelog.hoursworked * Tablestaff.hourlypay
FROM Tabletimelog
JOIN Tablestaff ON
Tabletimelog.StaffID = Tablestaff.StaffID)
However, I also want to insert a batchID so that you can identify each time the above insert has been run and the records it inserted at that time. That is, all staff payroll calculated at the same time would have the same batchID number. Each subsequent batchID should just increase by 1.
Please see image below for visual explanation .
I think that Select MAX(batch_id) + 1 would work , but I don't know how to include it in the insert statement.
You can use subquery to find latest batch_id from your current table using this query:
INSERT INTO TablePayroll (StaffID, TotalPaid, batch_id)
SELECT T1.StaffID
, T1.HoursWorked * T2.HourlyPay
, ISNULL((SELECT MAX(batch_id) FROM TablePayRoll), 0) + 1 AS batch_id
FROM TableTimeLog AS T1
INNER JOIN TableStaff AS T2
ON T1.StaffID = T2.StaffID;
As you can see, I just add 1 to current MAX(batch_id) and that's it.
By the way, learn to use aliases. It will make your life easier
Yet another solution would be having your batch_id as a GUID, so you wouldn't have to create sequences or get MAX(batch_id) from current table.
DECLARE @batch_id UNIQUEIDENTIFIER = NEWID();
INSERT INTO TablePayroll (StaffID, TotalPaid, batch_id)
SELECT T1.StaffID, T1.HoursWorked * T2.HourlyPay, @batch_id
FROM TableTimeLog AS T1
INNER JOIN TableStaff AS T2
ON T1.StaffID = T2.StaffID;
Updated
First of all, obtaining the maximum value in a large table (judging by its name, this table must be big) can be very expensive, especially if there is no index on the batch_id column.
Secondly, pay attention: your SELECT MAX(batch_id) + 1 solution may behave incorrectly under concurrent inserts. The solution from @EvaldasBuinauskas, without an explicit transaction and the right isolation level, can also produce the same batch_id if two inserts run in parallel.
If your SQL Server version is 2012 or higher, you can try a SEQUENCE. This at least ensures that there are no duplicate batch_id values.
Creating SEQUENCE:
CREATE SEQUENCE dbo.BatchID
START WITH 1
INCREMENT BY 1 ;
-- DROP SEQUENCE dbo.BatchID
GO
And using it:
DECLARE @BatchID INT
SET @BatchID = NEXT VALUE FOR dbo.BatchID;
INSERT INTO Tablepayroll (StaffID,Totalpaid, batch_id)
(SELECT Tabletimelog.StaffID , Tabletimelog.hoursworked * Tablestaff.hourlypay, @BatchID
FROM Tabletimelog
JOIN Tablestaff ON Tabletimelog.StaffID = Tablestaff.StaffID)
An alternative to a SEQUENCE is an additional table:
CREATE TABLE dbo.Batch (
ID INT NOT NULL IDENTITY
CONSTRAINT PK_Batch PRIMARY KEY CLUSTERED
,DT DATETIME
CONSTRAINT DF_Batch_DT DEFAULT GETDATE()
);
This solution works even on older version of the server.
DECLARE @BatchID INT
INSERT INTO dbo.Batch (DT)
VALUES (GETDATE());
SET @BatchID = SCOPE_IDENTITY();
INSERT INTO Tablepayroll (StaffID,Totalpaid, batch_id)
(SELECT Tabletimelog.StaffID , Tabletimelog.hoursworked * Tablestaff.hourlypay, @BatchID
FROM Tabletimelog ...
And yes, all of these solutions do not guarantee the absence of holes in the numbering. This can happen during a transaction rollback (deadlock for ex.)
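The separate batch table pattern can be sketched end to end as follows. This is a minimal illustration run against in-memory SQLite from Python, not SQL Server: `lastrowid` stands in for SCOPE_IDENTITY(), the `run_payroll` helper and the sample staff rows are hypothetical, and the column types are simplified assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Batch (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,
    DT TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE Tablestaff   (StaffID INTEGER PRIMARY KEY, hourlypay REAL);
CREATE TABLE Tabletimelog (StaffID INTEGER, hoursworked REAL);
CREATE TABLE Tablepayroll (StaffID INTEGER, Totalpaid REAL, batch_id INTEGER);
INSERT INTO Tablestaff   VALUES (1, 10.0), (2, 20.0);
INSERT INTO Tabletimelog VALUES (1, 5.0), (2, 3.0);
""")

def run_payroll(conn):
    """Create one Batch row, then stamp every payroll row with its id."""
    with conn:  # one transaction: commits on success, rolls back on error
        batch_id = conn.execute("INSERT INTO Batch DEFAULT VALUES").lastrowid
        conn.execute(
            """
            INSERT INTO Tablepayroll (StaffID, Totalpaid, batch_id)
            SELECT t.StaffID, t.hoursworked * s.hourlypay, ?
            FROM Tabletimelog AS t
            JOIN Tablestaff AS s ON t.StaffID = s.StaffID
            """,
            (batch_id,),
        )
    return batch_id

first = run_payroll(conn)
second = run_payroll(conn)
per_batch = conn.execute(
    "SELECT batch_id, COUNT(*) FROM Tablepayroll "
    "GROUP BY batch_id ORDER BY batch_id"
).fetchall()
print(first, second, per_batch)
```

Because the id is generated by the database when the Batch row is inserted, two runs cannot pick the same value the way two concurrent MAX(batch_id) + 1 reads can; a rolled-back run may still leave a gap in the numbering.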

For each inserted row create row in other table with foreign key constrain

I have 2 tables with foreign key constraint:
Table A:
[id] int identity(1, 1) PK,
[b_id] INT
and
Table B:
[id] int identity(1, 1) PK
where [b_id] refers to [id] column of Table B.
The task is:
On each insert into table A, add a new record to table B and update [b_id].
Sql Server 2008 r2 is used.
Any help is appreciated.
Having misread this the first time, I am posting a totally different answer.
First if table B is the parent table, you insert into it first. Then you grab the id value and insert into table A.
It is best to do this in one transaction. Depending on what the other fields are, you can populate table A with a trigger from table B or you might need to write straight SQL code or a stored procedure to do the work.
It would be easier to describe what to do if you have a table schema for both tables. However, assuming table B only has one column and table A only has ID and B_id, this is the way the code could work (you would want to add explicit transactions for production code). The example is for a single record insert which would not happen from a trigger. Triggers should always handle multiple record inserts and it would have to be written differently then. But without knowing what the columns in the tables are it is hard to provide a good example of this.
create table #temp (id int identity)
create table #temp2 (Id int identity, b_id int)
declare @b_id int
insert into #temp default values
select @b_id = scope_identity()
insert into #temp2 (b_id)
values(@b_id)
select * from #temp2
Now the problem gets more complex if there are other columns, as you would have to provide values for them as well.
Without removing identity specification you can use the following option:
SET IDENTITY_INSERT B ON
Try this:
CREATE TRIGGER trgAfterInsert ON [dbo].[A]
FOR INSERT
AS
IF @@ROWCOUNT = 0 RETURN;
SET NOCOUNT ON;
SET IDENTITY_INSERT B ON
DECLARE @B_Id INT
SELECT @B_Id = ISNULL(MAX(Id), 0) FROM B;
WITH RES (ID, BIDS)
AS
(SELECT Id, @B_Id + ROW_NUMBER() OVER (ORDER BY Id) FROM INSERTED)
UPDATE A SET [b_Id] = BIDS
FROM A
INNER JOIN RES ON A.ID = RES.ID
INSERT INTO B (Id)
SELECT @B_Id + ROW_NUMBER() OVER (ORDER BY Id) FROM INSERTED
SET IDENTITY_INSERT B OFF
GO
Though Nadeem's answer is on the right track, his trigger for some reason takes max.id instead of NEW.id and doesn't update A accordingly.
For what you ask to be usable by a trigger, the FK in table A needs to be nullable; otherwise you have a race condition between the tables.
EDIT: As SpectralGhost pointed out, my original code didn't support multiple rows, this one will do:
CREATE TRIGGER trgAfterInsertA ON TableA
FOR INSERT
AS
DECLARE db_cursor CURSOR FOR
SELECT id FROM INSERTED
DECLARE @an_id int
OPEN db_cursor
FETCH NEXT FROM db_cursor INTO @an_id
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO TableB VALUES(VALUE_PLACEHOLDER)
UPDATE TableA
SET b_id = SCOPE_IDENTITY()
WHERE id = @an_id
FETCH NEXT FROM db_cursor INTO @an_id
END
-- release the cursor so the next firing of the trigger can declare it again
CLOSE db_cursor
DEALLOCATE db_cursor
GO
The VALUE_PLACEHOLDER are the values you initialize TableB with.

Optimizing deletes from table using UDT (tsql)

SQL Server.
I have a proc that takes a user defined table (readonly) and is about 7500 records large. Using that UDT, I run about 15 different delete statements:
delete from table1
where id in (select id from @table)
delete from table2
where id in (select id from @table)
delete from table3
where id in (select id from @table)
delete from table4
where id in (select id from @table)
....
This operation, as expected, does take a while (about 7-10 minutes). These columns are indexed. However, I suspect there is a more efficient way to do this. I know deletes are traditionally slower, but I wasn't expecting this slow.
Is there a better way to do this?
You can test/try "exists" instead of "IN". I really don't like IN clauses for anything besides casual lookup-queries. (Some people will argue about IN until they are blue in the face)
Delete deleteAlias
from table1 deleteAlias
where exists ( select null from @table vart where vart.Id = deleteAlias.Id )
You can populate a #temp table instead of a @variableTable. Again, over the years, this has been a matter of trial and error. @variable vs #temp, most of the time, doesn't make that big of a difference. But in about 4 situations I had, going to a #temp table made a big impact.
You can also experiment with putting an index on the #temp table (the "joining" column, 'Id' in this example )
IF OBJECT_ID('tempdb..#Holder') IS NOT NULL
begin
drop table #Holder
end
CREATE TABLE #Holder
(ID INT )
/* simulate your insert */
INSERT INTO #HOLDER (ID)
select 1 union all select 2 union all select 3 union all select 4
/* CREATE CLUSTERED INDEX IDX_TempHolder_ID ON #Holder (ID) */
/* optional, create an index on the "join" column of the #temp table */
CREATE INDEX IDX_TempHolder_ID ON #Holder (ID)
Delete deleteAlias
from table1 deleteAlias
where exists ( select null from #Holder holder where holder.Id = deleteAlias.Id )
IF OBJECT_ID('tempdb..#Holder') IS NOT NULL
begin
drop table #Holder
end
IMHO, there is no clear-cut answer; sometimes you gotta experiment a little.
And "how your tempdb is set up" is a huge fork in the road that can affect #temp table performance. But try the suggestions above first.
And one last experiment
Delete deleteAlias
from table1 deleteAlias
where exists ( select 1 from @table vart where vart.Id = deleteAlias.Id )
change the null to "1".... once I saw this affect something. Weird, right?

Deleting duplicate rows from sqlite database

I have a huge table - 36 million rows - in SQLite3. In this very large table, there are two columns:
hash - text
d - real
Some of the rows are duplicates. That is, both hash and d have the same values. If two hashes are identical, then so are the values of d. However, two identical d's does not imply two identical hash'es.
I want to delete the duplicate rows. I don't have a primary key column.
What's the fastest way to do this?
You need a way to distinguish the rows. Based on your comment, you could use the special rowid column for that.
To delete duplicates by keeping the lowest rowid per (hash,d):
delete from YourTable
where rowid not in
(
select min(rowid)
from YourTable
group by
hash
, d
)
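As a quick check of the rowid approach, here is a small sketch using Python's built-in sqlite3 module. The sample rows are made up, and the index on (hash, d) is an optional assumption that may help the GROUP BY on a table of the size described; it is not required for correctness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (hash TEXT, d REAL)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [("A", 1.0), ("A", 1.0), ("B", 2.0), ("C", 1.0), ("C", 1.0), ("C", 1.0)],
)
# Optional (assumption): an index on the grouping columns.
conn.execute("CREATE INDEX idx_hash_d ON YourTable (hash, d)")

# Keep the lowest rowid of every (hash, d) group and delete the rest.
deleted = conn.execute(
    """
    DELETE FROM YourTable
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM YourTable GROUP BY hash, d
    )
    """
).rowcount
conn.commit()

rows = conn.execute("SELECT hash, d FROM YourTable ORDER BY hash").fetchall()
print(deleted, rows)
```

Each distinct (hash, d) pair should remain exactly once, and `deleted` reports how many surplus copies were removed.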
I guess the fastest would be to let the database itself do the work: add a new table with the same columns, but with proper constraints (a unique index on the (hash, d) pair?), iterate through the original table and try to insert records into the new table, ignoring constraint violation errors (i.e. continue iterating when exceptions are raised).
Then delete the old table and rename the new to the old one.
If adding a primary key is not an option, then one approach would be to store the duplicates DISTINCT in a temp table, delete all of the duplicated records from the existing table, and then add the records back into the original table from the temp table.
For example (written for SQL Server 2008, but the technique is the same for any database):
DECLARE @original AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('A', 2)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('B', 1)
INSERT INTO @original VALUES('C', 1)
INSERT INTO @original VALUES('C', 1)
DECLARE @temp AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @temp
SELECT [hash], [d] FROM @original
GROUP BY [hash], [d]
HAVING COUNT(*) > 1
DELETE O
FROM @original O
JOIN @temp T ON T.[hash] = O.[hash] AND T.[d] = O.[d]
INSERT INTO @original
SELECT [hash], [d] FROM @temp
SELECT * FROM @original
I'm not sure if sqlite has a ROW_NUMBER() type function, but if it does you could also try some of the approaches listed here: Delete duplicate records from a SQL table without a primary key
The proposed solution was not working for me, so I ended up doing this:
CREATE TABLE temp_table as SELECT DISTINCT * FROM your_table
DROP TABLE your_table
ALTER TABLE temp_table RENAME TO your_table