Very slow DELETE query - sql

I have a problem with SQL performance. For some reason the following queries are very slow:
I have two lists which contain Ids of a certain table. I need to delete all records from the first list whose Ids already exist in the second list:
DECLARE @IdList1 TABLE(Id INT)
DECLARE @IdList2 TABLE(Id INT)
-- Approach 1
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
-- Approach 2
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
The two lists may contain more than 10,000 records. In that case each query takes more than 20 seconds to execute.
The execution plan also showed something I don't understand. Maybe that explains why it is so slow:
I filled both lists with 10,000 sequential integers, so both lists contained the values 1 to 10,000 as a starting point.
Both queries show an Actual Number of Rows of 50,005,000 for @IdList2! @IdList1 is correct (Actual Number of Rows is 10,000).
I know there are other ways to solve this, like filling a third list instead of removing from the first list. But my question is:
Why are these delete queries so slow, and why do I see these strange query plans?

Add a Primary key to your table variables and watch them scream
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY NOT NULL)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY NOT NULL)
Because there's no index on these table variables, any join or subquery must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values.
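The 50,005,000 figure from the question fits that kind of unindexed scan, if one assumes that for each row of the first list the scan of the second list stops at the first match. That early exit is an assumption here, not something the question's plan output confirms. A small Python sketch (the function name is made up for illustration) counts the rows read under that model and compares it with the all-pairs upper bound:

```python
def rows_read_with_early_exit(outer_ids, inner_ids):
    """Count inner rows read when every outer row scans the inner list
    from the start and stops at its first match (hypothetical model)."""
    # 1-based position of the first occurrence of each value in the inner list.
    first_pos = {}
    for pos, value in enumerate(inner_ids, start=1):
        first_pos.setdefault(value, pos)

    total = 0
    for value in outer_ids:
        # A value with no match would force a read of the whole inner list.
        total += first_pos.get(value, len(inner_ids))
    return total


ids = list(range(1, 10_001))      # both lists hold 1..10,000, as in the question
early_exit = rows_read_with_early_exit(ids, ids)
all_pairs = len(ids) * len(ids)   # upper bound: every pair compared
print(early_exit, all_pairs)
```

Under this model the count is 1 + 2 + ... + 10,000, which matches the row count reported in the question; with an index each lookup would touch only a handful of rows instead.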

SQL Server compiles the plan when the table variable is empty and does not recompile it when rows are added. Try
DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
OPTION (RECOMPILE)
This takes account of the actual number of rows contained in the table variable and gets rid of the nested loops plan.
Of course creating an index on Id via a constraint may well be beneficial for other queries using the table variable too.

The tables in table variables can have primary keys, so if your data supports uniqueness for these Ids, you may be able to improve performance by going for
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY)

Possible solutions:
1) Try to create indices thus
1.1) If the List{1|2}.Id column has unique values, then you could define a unique clustered index using a PK constraint like this:
DECLARE @IdList1 TABLE(Id INT PRIMARY KEY);
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY);
1.2) If the List{1|2}.Id column may have duplicate values, then you could define a unique clustered index using a PK constraint on Id plus a dummy IDENTITY column, like this:
DECLARE @IdList1 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (ID, DummyID) );
DECLARE @IdList2 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (ID, DummyID) );
2) Try to add a HASH JOIN query hint like this:
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
OPTION (HASH JOIN);

You are using table variables. Either add a primary key to them, or change them to temporary tables and add an index. Either should give much better performance. As a rule of thumb, if the table is small, use a table variable; if it grows and contains a lot of data, use a temp table.

I'd be tempted to try
DECLARE @IdList3 TABLE(Id INT);
INSERT @IdList3
SELECT Id FROM @IdList1
EXCEPT
SELECT Id FROM @IdList2
No deleting required.
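For anyone who wants to try the EXCEPT idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in: it has no table variables, so ordinary tables with the same names are assumed, and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IdList1 (Id INTEGER);
    CREATE TABLE IdList2 (Id INTEGER);
    CREATE TABLE IdList3 (Id INTEGER);
""")
# Invented sample data: list 1 holds 1..10, list 2 holds 4..20.
conn.executemany("INSERT INTO IdList1 (Id) VALUES (?)", [(i,) for i in range(1, 11)])
conn.executemany("INSERT INTO IdList2 (Id) VALUES (?)", [(i,) for i in range(4, 21)])

# Fill the third list with the Ids of list 1 that are not in list 2.
conn.execute("""
    INSERT INTO IdList3 (Id)
    SELECT Id FROM IdList1
    EXCEPT
    SELECT Id FROM IdList2
""")

remaining = [row[0] for row in conn.execute("SELECT Id FROM IdList3 ORDER BY Id")]
print(remaining)
```

Note that EXCEPT returns distinct values, so duplicates in the first list would collapse into a single row in the third one.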

Try this alternate syntax:
DELETE deleteAlias
FROM @IdList1 deleteAlias
WHERE EXISTS (
SELECT NULL
FROM @IdList2 innerList2Alias
WHERE innerList2Alias.id=deleteAlias.id
)
EDIT:
Try using #temp tables with indexes instead.
Here is a generic example where "DepartmentKey" is the PK and the FK.
IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
drop table #Department
end
CREATE TABLE #Department
(
DepartmentKey int ,
DepartmentName varchar(12)
)
CREATE INDEX IX_TEMPTABLE_Department_DepartmentKey ON #Department (DepartmentKey)
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
drop table #Employee
end
CREATE TABLE #Employee
(
EmployeeKey int ,
DepartmentKey int ,
SSN varchar(11)
)
CREATE INDEX IX_TEMPTABLE_Employee_DepartmentKey ON #Employee (DepartmentKey)
Delete deleteAlias
from #Department deleteAlias
where exists ( select null from #Employee innerE where innerE.DepartmentKey = deleteAlias.DepartmentKey )
IF OBJECT_ID('tempdb..#Employee') IS NOT NULL
begin
drop table #Employee
end
IF OBJECT_ID('tempdb..#Department') IS NOT NULL
begin
drop table #Department
end

Related

Can I insert into multiple related tables in a single statement?

I have two related tables something like this:
CREATE TABLE test.items
(
id INT identity(1,1) PRIMARY KEY,
type VARCHAR(max),
price NUMERIC(6,2)
);
CREATE TABLE test.books
(
id INT PRIMARY KEY REFERENCES test.items(id),
title VARCHAR(max),
author VARCHAR(max)
);
Is it possible to insert into both tables using a single SQL statement?
In PostgreSQL, I can use something like this:
-- PostgreSQL:
WITH item AS (INSERT INTO test.items(type,price) VALUES('book',12.5) RETURNING id)
INSERT INTO test.books(id,title) SELECT id,'Good Omens' FROM item;
but apparently SQL Server limits CTEs to SELECT statements, so that won’t work.
In principle, I could use the OUTPUT clause this way:
-- SQL Server:
INSERT INTO test.items(type, price)
OUTPUT inserted.id, 'Good Omens' INTO test.books(id,title)
VALUES ('book', 12.5);
but this doesn’t work if there’s a foreign key involved, as above.
I know about using variables and procedures, but I wondered whether there is a simple single-statement approach.
You can use dynamic SQL as follows, although it's awkward to construct a query like this.
CREATE TABLE dbo.items (
id INT identity(1,1) PRIMARY KEY,
type VARCHAR(max),
price NUMERIC(6,2)
);
CREATE TABLE dbo.books (
id INT PRIMARY KEY REFERENCES dbo.items(id),
title VARCHAR(max),
author VARCHAR(max)
);
insert into dbo.books(id,title)
exec ('insert into dbo.items(type,price) output inserted.id,''Good Omens'' VALUES(''book'',12.5)')

Having troubles with Identity field of SQL-SERVER

I'm doing a school project about a school theme where I need to create some tables for Students, Classes, Programmes...
I want to add a Group to particular classes with an auto-incrementing group_id. However, I want group_id to reset whenever any of these attributes changes (Classes_id, courses_acronym, year_Semesters). How can I reset it every time one of them changes?
Here is my table:
CREATE TABLE Classes_Groups(
Classes_id varchar(2),
Group_id INT IDENTITY(1,1),
courses_acronym varchar(4),
year_Semesters varchar(5),
FOREIGN KEY (Classes_id, year_Semesters,courses_acronym) REFERENCES Classes(id,year_Semesters, courses_acronym),
PRIMARY KEY(Classes_id,courses_acronym,year_Semesters,Group_id)
);
Normally, you do not (need to) reset the identity column of a table. An identity column is used to create unique values for every single record in a table.
So you want to generate entries in your groups table based on new entries in your classes table. You might create a trigger on your classes table for that purpose.
Since Group_id is already unique by itself (because of its IDENTITY), you do not need other fields in the primary key at all. Instead, you may create a separate UNIQUE constraint for the combination (Classes_id, courses_acronym, year_Semesters) if you need it.
And if the id field of your classes table is an IDENTITY column too, you could define a primary key in your classes table solely on that id field. Then the foreign key constraint in your new groups table needs to include only that Classes_id field.
So much for now. I guess that your database design needs some more additional tuning and tweaking. ;)
Where are you setting the values from? You could use a stored procedure in which the columns have an initial value set when the procedure is called (assuming there are values at the beginning), and then use an IF statement.
-- Placeholders: replace NULL with the initial values being inserted
DECLARE @initial_Classes_id varchar(2) = NULL;
DECLARE @initial_courses_acronym varchar(4) = NULL;
DECLARE @initial_year_Semesters varchar(5) = NULL;
-- Values of the last inserted row. DateAdded is a column I would add so that
-- the rows can be ordered by last insert date.
DECLARE @compare_Classes_id varchar(2) = (SELECT TOP 1 Classes_id FROM Classes_Groups ORDER BY DateAdded DESC);
DECLARE @compare_courses_acronym varchar(4) = (SELECT TOP 1 courses_acronym FROM Classes_Groups ORDER BY DateAdded DESC);
DECLARE @compare_year_Semesters varchar(5) = (SELECT TOP 1 year_Semesters FROM Classes_Groups ORDER BY DateAdded DESC);
IF (@initial_Classes_id != @compare_Classes_id OR @initial_courses_acronym != @compare_courses_acronym OR @initial_year_Semesters != @compare_year_Semesters)
BEGIN
-- DBCC CHECKIDENT takes the table name. With rows already in the table, the next
-- identity value is the reseed value plus the increment, so 0 yields 1.
DBCC CHECKIDENT ('Classes_Groups', RESEED, 0)
END
-- The insert is the same whether or not the identity was reseeded
INSERT INTO Classes_Groups (Classes_id, courses_acronym, year_Semesters)
VALUES (
@initial_Classes_id,
@initial_courses_acronym,
@initial_year_Semesters
)
NB: I would advise using an int for the primary key unless you have a specific reason not to.
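As an alternative to reseeding, the per-class numbering can be computed at insert time as "highest Group_id for this class plus one". The sketch below is an illustration under stated assumptions, not a drop-in answer: it uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, Group_id is a plain integer rather than an IDENTITY column, the foreign key to Classes is left out, and add_group is a made-up helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Classes_Groups (
        Classes_id      TEXT NOT NULL,
        courses_acronym TEXT NOT NULL,
        year_Semesters  TEXT NOT NULL,
        Group_id        INTEGER NOT NULL,
        PRIMARY KEY (Classes_id, courses_acronym, year_Semesters, Group_id)
    )
""")


def add_group(conn, classes_id, courses_acronym, year_semesters):
    """Insert the next group for one class and return its Group_id."""
    key = (classes_id, courses_acronym, year_semesters)
    with conn:  # commit on success, roll back on error
        # The aggregate query returns exactly one row even when the class has
        # no groups yet; COALESCE turns the NULL maximum into 0.
        conn.execute("""
            INSERT INTO Classes_Groups
                (Classes_id, courses_acronym, year_Semesters, Group_id)
            SELECT ?, ?, ?, COALESCE(MAX(Group_id), 0) + 1
            FROM Classes_Groups
            WHERE Classes_id = ? AND courses_acronym = ? AND year_Semesters = ?
        """, key + key)
        row = conn.execute("""
            SELECT MAX(Group_id) FROM Classes_Groups
            WHERE Classes_id = ? AND courses_acronym = ? AND year_Semesters = ?
        """, key).fetchone()
    return row[0]


first = add_group(conn, "1A", "CS", "2023S")
second = add_group(conn, "1A", "CS", "2023S")
other = add_group(conn, "2B", "CS", "2023S")   # numbering starts over for a new class
print(first, second, other)
```

On a busy server, concurrent inserts for the same class would need suitable locking or a retry on primary key violations; the sketch ignores that.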

SQL Server: Unique Index on single values of two columns (!!! Not Combination)

I have a table for teams where each team has two codes: a code for team members and a code for the team leader.
TeamId  Name   MemberCode  LeaderCode
--------------------------------------
1       Team1  CodeXY      CodeXYZ
2       Team2  CodeAB      CodeBC
...
There are two unique indexes, one on MemberCode and one on LeaderCode, ensuring that MemberCodes and LeaderCodes are each unique.
But how can I specify that the codes are unique not only within MemberCode, but across MemberCode and LeaderCode together?
No MemberCode should be a LeaderCode.
Does anyone have an idea?
P.S.: A unique index on the two columns, like Create Unique index UIDX_12 On tbl (MemberCode, LeaderCode), is not an option!
With this data structure, I think you would have to have a trigger.
You can reformat the data, so you have one table and (at least) three columns:
TeamId
Code
CodeType
Then you can add constraints:
codetype is only 'member' or 'leader'
code is unique
teamid is in the teamid table
teamid/codetype is unique
This will allow you to store exactly one of each of these values for each team (assuming that the values are not NULL).
In a create table statement, this might look something like:
create table . . .
check (codetype in ('member', 'leader')),
unique(code),
teamid references teams(teamid),
unique (teamid, codetype)
. . .
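To make the effect of those constraints concrete, here is a small runnable sketch using SQLite through Python's sqlite3 module. SQLite stands in for SQL Server here, and the table name team_codes, the helper try_insert, and the extra code values CodeZZ are assumptions made for the example; the other sample values come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE teams (teamid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE team_codes (
        teamid   INTEGER NOT NULL REFERENCES teams(teamid),
        code     TEXT NOT NULL UNIQUE,
        codetype TEXT NOT NULL CHECK (codetype IN ('member', 'leader')),
        UNIQUE (teamid, codetype)
    );
    INSERT INTO teams VALUES (1, 'Team1'), (2, 'Team2');
    INSERT INTO team_codes VALUES (1, 'CodeXY', 'member'), (1, 'CodeXYZ', 'leader');
""")


def try_insert(teamid, code, codetype):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO team_codes (teamid, code, codetype) VALUES (?, ?, ?)",
                (teamid, code, codetype),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(2, "CodeAB", "member"),  # new code: accepted
    try_insert(2, "CodeXY", "leader"),  # Team1's member code reused: rejected
    try_insert(2, "CodeBC", "leader"),  # new code: accepted
    try_insert(2, "CodeZZ", "leader"),  # second leader code for Team2: rejected
]
print(results)
```

Because every code lives in one column, a single unique constraint covers member and leader codes together.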
You can enforce this constraint with an indexed view. Something like:
create table dbo.MColumnUnique (
MemberName int not null,
LeaderName int not null
)
go
create table dbo.Two (ID int not null primary key,constraint CK_Two_ID CHECK (ID in (1,2)))
go
insert into dbo.Two(ID) values (1),(2)
go
create view dbo.MColumnUnique_Enforcer (Name)
with schemabinding
as
select
CASE WHEN ID = 1 THEN MemberName ELSE LeaderName END
from
dbo.MColumnUnique
cross join
dbo.Two
go
create unique clustered index IX_MColumnUnique_Enforcer on dbo.MColumnUnique_Enforcer (Name)
go
insert into dbo.MColumnUnique (MemberName,LeaderName) values (1,2),(3,4) --Works
go
insert into dbo.MColumnUnique (MemberName,LeaderName) values (4,5) --Fails
go
insert into dbo.MColumnUnique (MemberName,LeaderName) values (6,6) --Fails
Hopefully you can see the parallels between my structure above and your tables.
dbo.Two is just a generally helpful helper table that contains exactly two rows, and is used to perform a limited unpivot on the data into a single column.
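The idea behind the view can be modelled in a few lines of Python: each (member, leader) row contributes two values to one flat list, and the unique index is a uniqueness check on that list. This is only a conceptual sketch with invented function names, not how SQL Server maintains an indexed view; the three inserts mirror the ones in the script above.

```python
def unpivot(rows):
    """Turn (member, leader) rows into one flat list of names, two per row."""
    return [name for member, leader in rows for name in (member, leader)]


def insert_rows(table, new_rows):
    """Append new_rows only if the unpivoted names stay unique."""
    names = unpivot(table + new_rows)
    if len(names) != len(set(names)):
        return False  # a unique index on the flat list would reject this
    table.extend(new_rows)
    return True


table = []
outcomes = [
    insert_rows(table, [(1, 2), (3, 4)]),  # works
    insert_rows(table, [(4, 5)]),          # fails: 4 is already a leader
    insert_rows(table, [(6, 6)]),          # fails: same value in both columns
]
print(outcomes, table)
```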
You could do it with a trigger, but I would use a CHECK CONSTRAINT.
Create a function that takes a varchar parameter (or whatever the datatype you use for MemberCode and LeaderCode), and returns a bit: 0 if there is no LeaderCode or MemberCode that matches the parameter value, or 1 if there is a match.
Then put a check constraint on the table that specifies:
MemberCode <> LeaderCode AND
YourFunction(MemberCode) = 0 AND
YourFunction(LeaderCode) = 0
EDIT based on Damien's comment:
To prevent the function from including the row you just added, you need to also pass the [code] column (which you say is UNIQUE), and not count the row with that value for [code].

Derived table with an index

Please see the TSQL below:
DECLARE @TestTable table (reference int identity,
TestField varchar(10),
primary key (reference))
INSERT INTO @TestTable VALUES ('Ian')
select * from @TestTable as TestTable
INNER JOIN LiveTable on LiveTable.Reference=TestTable.Reference
Is it possible to create an index on @TestTable.TestField? The following webpage suggests it is not. However, I read on another webpage that it is possible.
I know I could create a physical table instead (for @TestTable). However, I want to see if I can do this with a derived table first.
You can create an index on a table variable as described in the top voted answer on this question:
SQL Server : Creating an index on a table variable
Sample syntax from that post:
DECLARE @TEMPTABLE TABLE (
[ID] [INT] NOT NULL PRIMARY KEY,
[Name] [NVARCHAR] (255) COLLATE DATABASE_DEFAULT NULL,
UNIQUE NONCLUSTERED ([Name], [ID])
)
Alternatively, you may want to consider using a temp table, which persists for the scope of the current operation (for example, during execution of a stored procedure), just like a table variable. Temp tables are structured and optimized just like regular tables, but they are stored in tempdb, so they can be indexed in the same way as a regular table.
Temp tables will generally offer better performance than table variables, but it's worth testing with your dataset.
More in depth details can be found here:
When should I use a table variable vs temporary table in sql server?
You can see a sample of creating a temp table with an index from:
SQL Server Planet - Create Index on Temp Table
One of the most valuable assets of a temp table (#temp) is the ability
to add either a clustered or non clustered index. Additionally, #temp
tables allow for the auto-generated statistics to be created against
them. This can help the optimizer when determining cardinality. Below
is an example of creating both a clustered and non-clustered index on
a temp table.
Sample code from site:
CREATE TABLE #Users
(
ID INT IDENTITY(1,1),
UserID INT,
UserName VARCHAR(50)
)
INSERT INTO #Users
(
UserID,
UserName
)
SELECT
UserID = u.UserID
,UserName = u.UserName
FROM dbo.Users u
CREATE CLUSTERED INDEX IDX_C_Users_UserID ON #Users(UserID)
CREATE INDEX IDX_Users_UserName ON #Users(UserName)

Insert data from one table to another table while the target table has a primary key

In SQL Server I have a table RawTable (temp) which is fed from a CSV file; let's say it has 22 columns. I need to copy the existing records (ONLY A FEW COLUMNS, NOT ALL) into another table, Visitors, which is not a temporary table.
The Visitors table has an ID column of type INT that is the primary key and incremental.
RawTable table
id PK, int, not null
VisitorDate Varchar(10)
VisitorTime Varchar(11)
Visitors table
VisitorID, PK, bigint, not null
VisitorDate, Varchar(10), null
VisitorTime, Varchar(11), null
So I did:
insert into [dbo].[Visitors] ( [VisitorDate], [VisitorTime])
select [VisitorDate], [VisitorTime]
from RawTable /*this is temp table */
Seems SQL Server doesn't like this method so it throws
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'VisitorID', table 'TS.dbo.Visitors'; column does not allow nulls. INSERT fails. The statement has been terminated.
How can I stop SQL Server from complaining about the primary key? This column, as you know, should be filled in by SQL Server itself.
Any idea?
Just because your Visitors table has an ID column that is the primary key doesn't mean that the server will supply the ID values for you. If you want SQL Server to provide the IDs, then you need to alter the table definition and make the VisitorID column an IDENTITY column.
Otherwise, you can pseudo-create these IDs during the insert with the ROW_NUMBER function:
DECLARE @maxId INT;
-- ISNULL covers an empty Visitors table, where MAX returns NULL
SELECT @maxId = ISNULL((SELECT MAX(VisitorID) FROM dbo.Visitors), 0);
INSERT INTO [dbo].[Visitors] ([VisitorID], [VisitorDate], [VisitorTime])
SELECT @maxId + ROW_NUMBER() OVER (ORDER BY VisitorDate), [VisitorDate], [VisitorTime]
FROM RawTable /* this is the temp table */
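The first option, letting the engine assign the key, can be tried out with SQLite through Python's sqlite3 module. This is a stand-in only: SQLite's INTEGER PRIMARY KEY auto-assigns values and plays the role that an IDENTITY column plays in SQL Server, and the dates and times below are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RawTable (
        id          INTEGER PRIMARY KEY,
        VisitorDate TEXT,
        VisitorTime TEXT
    );
    CREATE TABLE Visitors (
        VisitorID   INTEGER PRIMARY KEY,  -- engine-assigned, stand-in for IDENTITY
        VisitorDate TEXT,
        VisitorTime TEXT
    );
    INSERT INTO RawTable (VisitorDate, VisitorTime) VALUES
        ('2014-01-01', '09:00:00'),
        ('2014-01-02', '10:30:00');
""")

# The key column is left out of the column list; the engine fills it in.
conn.execute("""
    INSERT INTO Visitors (VisitorDate, VisitorTime)
    SELECT VisitorDate, VisitorTime FROM RawTable ORDER BY id
""")

visitors = conn.execute(
    "SELECT VisitorID, VisitorDate FROM Visitors ORDER BY VisitorID"
).fetchall()
print(visitors)
```

With an engine-generated key the original INSERT ... SELECT from the question works unchanged, because the key column no longer has to be supplied.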