When is the execution plan changed for DELETE TOP(X) in Microsoft SQL Server?

I am observing some strange behavior in Microsoft SQL Server 2017.
When I delete data row by row:
DELETE TOP(1)
FROM [table_A]
WHERE [id] IN (SELECT [i] FROM [table_B])
it takes about 4 minutes (11 GB database),
but when I execute:
DELETE TOP(1000)
FROM [table_A]
WHERE [id] IN (SELECT [i] FROM [table_B])
on the same table, execution is very fast (<< 1 second).
The difference is in the execution plan.
Question: how to force SQL Server to use the execution plan from the second statement (DELETE TOP(1000) ...) for the first statement (DELETE TOP(1) ...)?
You could force a specific join type with a hint:
DELETE TOP(1) FROM [table_A]
WHERE [id] IN (SELECT [i] FROM [table_B])
-- uncomment one of the hints below to force the join type:
-- OPTION(LOOP JOIN);
-- OPTION(HASH JOIN);
DBFiddle Demo

Related

How to optimize this SQL Delete statement for faster speed

Is there a faster way to run this SQL delete than this one:
DELETE FROM TABLE
WHERE TABLE.key NOT IN (SELECT DISTINCT(MAIN_TABLE.key) FROM MAIN_TABLE)
You can prefer using NOT EXISTS:
DELETE t
FROM [TABLE] t
WHERE NOT EXISTS (SELECT 0 FROM MAIN_TABLE m WHERE m.[key] = t.[key])
It is usually preferable to NOT IN from a performance point of view.
I think this is because NOT IN must compare each outer value against every value returned by the subquery (and it behaves unexpectedly when the subquery returns NULLs), while NOT EXISTS only has to check whether a matching row exists, so the optimizer can turn it into an anti-semi-join and stop probing as soon as a match is found.
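A quick way to see the NULL pitfall with NOT IN (the temp tables here are made up purely for illustration):
-- hypothetical demo tables
CREATE TABLE #outer_t ([key] INT NULL);
CREATE TABLE #inner_t ([key] INT NULL);
INSERT INTO #outer_t VALUES (1), (2);
INSERT INTO #inner_t VALUES (1), (NULL);
-- returns no rows: 2 NOT IN (1, NULL) evaluates to UNKNOWN
SELECT * FROM #outer_t WHERE [key] NOT IN (SELECT [key] FROM #inner_t);
-- returns the row with [key] = 2, as expected
SELECT * FROM #outer_t o
WHERE NOT EXISTS (SELECT 0 FROM #inner_t i WHERE i.[key] = o.[key]);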

How to change query status from suspended to runnable?

I have a query that needs to update 2 million records, but there was no space on the disk, so the query is suspended right now. I have since freed up some space, but the query is still suspended. How can I change the status to runnable, or is there any way to tell SQL Server that it has enough space now and can run the query?
SQL Server changes the query status from suspended to runnable automatically; it is not managed by you.
Your job here is to check why the query is suspended. The DMV below can help:
select session_id, blocking_session_id, wait_resource, wait_time,
       last_wait_type
from sys.dm_exec_requests
where session_id = <your session id>
There are many reasons why a query gets suspended; some of them include locking/blocking, rollback, and waiting for data from disk.
You will have to check the status with the DMV above, see what the reason is, and troubleshoot accordingly.
Below is a small sample that can help you understand what suspended means:
-- build a demo table with (number of objects)^2 rows
create table t1
(
    id int
)
insert into t1
select row_number() over (order by (select null))
from sys.objects c
cross join sys.objects c1
Now, in one tab of SSMS, run the query below:
begin tran
update t1
set id = id + 1
-- note: the transaction is intentionally left open so the next query blocks
Open another tab and run the query below:
select * from t1
Now open a third tab and run the query below:
select session_id,blocking_session_id,wait_resource,wait_time,
last_wait_type,status from sys.dm_exec_requests
where session_id = <session id of the SELECT>
Or run the query below to find all blocked sessions:
select session_id,blocking_session_id,wait_resource,wait_time,
last_wait_type,status from sys.dm_exec_requests
where blocking_session_id>0
You will see the status of the SELECT as suspended due to blocking. Once you clear the blocking (by committing the transaction in the first tab), SQL Server automatically resumes the suspended query.

WHERE or JOIN: which one is evaluated first in SQL Server?

Query
select * from TableA a join TableB b
on a.col1=b.col1
where b.col2 = 'SomeValue'
I expect the server to first filter col2 from TableB and then do the join, which would be more efficient.
Does SQL Server evaluate the WHERE clause first and then the JOIN?
Any link to know in which order SQL Server will process a query?
Thanks In Advance
Already answered ... read both answers ...
https://dba.stackexchange.com/questions/5038/sql-server-join-where-processing-order
To summarise: it depends on the server implementation and its execution plan ... so you will need to read up on your server in order to optimise your queries.
But simple joins like this get optimised by every server as best it can.
If you are not sure, measure execution time on a large dataset.
It is decided by the SQL Server query optimiser engine based on which execution plan has the lower cost.
If you think that filtering first will benefit your query's performance, you can take the subset of the table filtered by your desired value and wrap it in a CTE.
Then join the CTE with your other table, as sketched below.
You can check which query performs better in your case in SSMS and go with it :)
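A minimal sketch of that CTE rewrite, reusing the table and column names from the question:
WITH FilteredB AS
(
    -- filter TableB down to the desired value first
    SELECT col1, col2
    FROM TableB
    WHERE col2 = 'SomeValue'
)
SELECT *
FROM TableA a
JOIN FilteredB b ON a.col1 = b.col1;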
We will use this code:
IF OBJECT_ID(N'tempdb..#TableA',N'U') IS NOT NULL DROP TABLE #TableA;
IF OBJECT_ID(N'tempdb..#TableB',N'U') IS NOT NULL DROP TABLE #TableB;
CREATE TABLE #TableA (col1 INT NOT NULL,Col2 NVARCHAR(255) NOT NULL)
CREATE TABLE #TableB (col1 INT NOT NULL,Col2 NVARCHAR(255) NOT NULL)
INSERT INTO #TableA VALUES (1,'SomeValue'),(2,'SomeValue2'),(3,'SomeValue3')
INSERT INTO #TableB VALUES (1,'SomeValue'),(2,'SomeValue2'),(3,'SomeValue3')
select * from #TableA a join #TableB b
on a.col1=b.col1
where b.col2 = 'SomeValue'
Let's analyze the query plan in SQL Server Management Studio. Select the full SELECT statement, right-click, and choose Display Estimated Execution Plan. As the plan shows,
it first does a Table Scan for the WHERE clause, then the JOIN.
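If you prefer a textual plan to the graphical one, SET SHOWPLAN_TEXT can be used instead (while it is ON, statements are compiled and the plan is returned as text rather than executed):
SET SHOWPLAN_TEXT ON;
GO
select * from #TableA a join #TableB b
on a.col1 = b.col1
where b.col2 = 'SomeValue';
GO
SET SHOWPLAN_TEXT OFF;
GO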
1. Does SQL Server evaluate the WHERE clause first and then the JOIN?
First the WHERE clause, then the JOIN.
2. Any link to know in which order SQL Server will process a query?
I think you will find useful information here:
Execution Plan Basics
Graphical Execution Plans for Simple SQL Queries

SQL queries slow when running in sequence, but quick when running separately

I have a table that I will populate with values from an expensive calculation (an xquery over an immutable XML column). To speed up deployment to production, I have precalculated the values on a test server and saved them to a file with BCP.
My script is as follows:
-- Lots of other work, including modifying OtherTable
CREATE TABLE FOO (...)
GO
BULK INSERT FOO
FROM 'C:\foo.dat';
GO
-- rerun from here after the break
INSERT INTO FOO
(ID, TotalQuantity)
SELECT
e.ID,
SUM(e.Quantity) as TotalQuantity
FROM (select
o.ID,
h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
FROM dbo.OtherTable o
CROSS APPLY XmlColumn.nodes('(item/.../salesorder/)') h(n)
WHERE o.ID NOT IN (SELECT DISTINCT ID FROM FOO)
) as E
GROUP BY e.ID
When I run the script in Management Studio, the first two statements complete within seconds, but the last statement takes 4 hours to complete. Since no rows have been added to OtherTable since my foo.dat was computed, Management Studio reports (0 row(s) affected).
If I cancel the query execution after a couple of minutes, select just the last query, and run it separately, it completes within 5 seconds.
Notable facts:
OtherTable contains 200k rows and the data in XmlColumn is pretty large; the total table size is ~3 GB.
The FOO table ends up with 1.3M rows.
What could possibly make the difference?
Management Studio has implicit transactions turned off. As far as I can understand, each statement will then run in its own transaction.
Update:
If I first select and run the script up to -- rerun from here after the break, and then select and run just the last query, it is still slow until I cancel execution and try again. This at least rules out any effect of running "together" with the previous code in the script, and boils it down to the same query being slow on the first execution and fast on the second (with all other conditions the same).
Probably different execution plans. See Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
Could it possibly be related to the statistics being completely wrong on the newly created FOO table? If SQL Server automatically updates the statistics when it first runs the query, the second run would have its execution plan created from up-to-date statistics.
What if you check the statistics right after the bulk insert (with the STATS_DATE function) and then check them again after having cancelled the long-running query? Did the stats get updated, even though the query was cancelled?
In that case, an UPDATE STATISTICS on FOO right after the bulk insert could help.
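A minimal sketch of that check and fix (assuming FOO lives in the dbo schema):
-- when were the statistics on FOO last updated?
SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats s
WHERE s.object_id = OBJECT_ID('dbo.FOO');
-- refresh them manually right after the bulk insert
UPDATE STATISTICS dbo.FOO;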
Not sure exactly why it helped, but I rewrote the last query to use a left outer join instead, and suddenly the execution time dropped to 15 milliseconds.
INSERT INTO FOO
(ID, TotalQuantity)
SELECT
e.ID,
SUM(e.Quantity) as TotalQuantity
FROM (select
o.ID,
h.n.value('TotalQuantity[1]/.', 'int') as TotalQuantity
FROM dbo.OtherTable o
LEFT OUTER JOIN FOO f ON o.ID = f.ID
CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
WHERE f.ID IS NULL
) as E
GROUP BY e.ID

Executing an SQL query takes a lot of time

I have two tables. Table 2 contains more recent records.
Table 1 has 900K records and Table 2 about the same.
The query below takes about 10 minutes to execute, and while it is running most other queries against Table 1 get a timeout exception.
DELETE T1
FROM Table1 T1 WITH(NOLOCK)
LEFT OUTER JOIN Table2 T2
ON T1.ID = T2.ID
WHERE T2.ID IS NULL AND T1.ID IS NOT NULL
Could someone help me optimize the query above or write something more efficient?
Also, how can I fix the timeout problem?
The optimizer will likely choose to lock the whole table, as that is easier when it needs to delete that many rows. In a case like this I delete in chunks.
while (1 = 1)
begin
    with cte as
    (
        select *
        from Table1
        where Id not in (select Id from Table2)
    )
    delete top (1000) from cte;

    if @@rowcount = 0
        break;

    waitfor delay '00:00:01' -- give it some rest :)
end
So the query deletes 1000 rows at a time. The optimizer will likely lock just a page to delete the rows, not the whole table.
The total execution time will be longer, but it will not block other callers.
Disclaimer: assumed MS SQL.
Another approach is to use a SNAPSHOT transaction, as sketched below. This way, readers of the table will not be blocked while rows are being deleted.
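A minimal sketch of enabling snapshot isolation (the database name is a placeholder):
-- enable snapshot isolation once per database
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
-- readers then opt in per session
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
SELECT COUNT(*) FROM Table1; -- reads a consistent snapshot instead of blocking on the delete
COMMIT;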
Wait a second, are you trying to do this...
DELETE Table1 WHERE ID NOT IN (SELECT ID FROM Table2)
?
If so, that's how I would write it.
You could also try updating the statistics on both tables. And of course, indexes on Table1.ID and Table2.ID could speed things up considerably.
EDIT: If you're getting timeouts from the designer, increase the "Designer" timeout value in SSMS (the default is 30 seconds): Tools -> Options -> Designers -> "Override connection string time-out value for table designer updates" -> enter a reasonable number (in seconds).
Both ID columns need an index; see the sketch after the query below.
Then use simpler SQL:
DELETE Table1 WHERE NOT EXISTS (SELECT * FROM Table2 WHERE Table1.ID = Table2.ID)
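The indexes could be created like this (the index names are illustrative):
CREATE INDEX IX_Table1_ID ON Table1 (ID);
CREATE INDEX IX_Table2_ID ON Table2 (ID);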