DELETE query taking a long time to execute - how to optimise? [duplicate] - sql

I have a rather big table named FTPLog with around 3 million records. I wanted to add a delete mechanism to remove old logs, but the delete command takes a long time. I found that deleting through the clustered index takes a long time.
DECLARE @MaxFTPLogId as bigint
SELECT @MaxFTPLogId = Max(FTPLogId) FROM FTPLog WHERE LogTime <= DATEADD(day, -10 , GETDATE())
PRINT @MaxFTPLogId
DELETE FROM FTPLog WHERE FTPLogId <= @MaxFTPLogId
I want to know how I can improve the performance of the delete.

It might be slow because a large delete generates a big transaction log. Try to delete it in chunks, like:
WHILE 1 = 1
BEGIN
    DELETE TOP (256) FROM FTPLog WHERE FTPLogId <= @MaxFTPLogId
    IF @@ROWCOUNT = 0
        BREAK
END
This generates smaller transactions. And it mitigates locking issues by creating breathing space for other processes.
You might also look into partitioned tables. These potentially allow you to purge old entries by dropping an entire partition.
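As a rough sketch of that idea (assuming SQL Server 2016+ for partition-level TRUNCATE; the function, scheme and boundary values below are made up for illustration):
-- Hypothetical monthly partitioning of FTPLog on LogTime
CREATE PARTITION FUNCTION pf_FTPLogMonth (datetime)
    AS RANGE RIGHT FOR VALUES ('20230101', '20230201', '20230301');
CREATE PARTITION SCHEME ps_FTPLogMonth
    AS PARTITION pf_FTPLogMonth ALL TO ([PRIMARY]);
-- The table (or its clustered index) would have to be created ON ps_FTPLogMonth(LogTime).
-- Purging the oldest month is then a metadata-only operation:
TRUNCATE TABLE FTPLog WITH (PARTITIONS (1));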

Since it's a log table, there is no need to make it clustered.
It's unlikely that you will search it by Id.
Alter your PRIMARY KEY so that it's nonclustered. The table will then be stored as a heap, which is faster for DML:
ALTER TABLE FTPLog DROP CONSTRAINT Primary_Key_Name
ALTER TABLE FTPLog ADD CONSTRAINT Primary_Key_Name PRIMARY KEY NONCLUSTERED (FTPLogId)
and just issue:
DECLARE @MaxFTPLogTime as datetime
SELECT @MaxFTPLogTime = DATEADD(day, -10 , GETDATE())
PRINT @MaxFTPLogTime
DELETE FROM FTPLog WHERE LogTime <= @MaxFTPLogTime

Check the density of your table (use DBCC SHOWCONTIG to check density).
Scan Density [Best Count:Actual Count] should be close to 100%, and Logical Scan Fragmentation should be close to 0%, for best performance. If they are not, rebuild and defragment the indexes of that table to improve the performance of your query execution.
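For example (DBCC SHOWCONTIG still works but is deprecated; sys.dm_db_index_physical_stats is the newer way to get the same numbers):
DBCC SHOWCONTIG ('FTPLog');
-- If Scan Density is low / Logical Scan Fragmentation is high:
ALTER INDEX ALL ON FTPLog REORGANIZE;   -- lighter-weight, online defragmentation
-- or, for heavier fragmentation:
ALTER INDEX ALL ON FTPLog REBUILD;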

I assume that not only this table is huge in terms of number of rows, but also that it is really heavily used for logging new entries while you try to clean it up.
Andomar's suggestion should help, but I would try to clean it up when there are no inserts going on.
Alternative: when you write logs, you probably do not care about transaction isolation so much. Therefore I would change the transaction isolation level for the code/processes that write the log entries, so that you may avoid huge tempdb growth (by the way, check whether tempdb grows a lot during this DELETE operation).
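One way to watch tempdb while the DELETE runs (a sketch using the standard DMV; sample it before, during and after the operation):
SELECT SUM(unallocated_extent_page_count) * 8 / 1024     AS free_mb,
       SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;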
Also, I think that deletions from the clustered index should not really be slower than from a non-clustered one: you are still physically deleting rows. Rebuilding the index afterwards may take time, though.

Related

Efficiency UPDATE WHERE ... vs CASE ... WHEN

In the case of a big table (~5 million rows), which is more efficient for updating all rows that match a condition (circa 1000 rows):
1. A simple update statement?
UPDATE table SET last_modif_date = NOW() WHERE condition;
2. A CASE WHEN that performs the update if the condition matches?
UPDATE table SET
last_modif_date = (CASE WHEN CONDITION THEN NOW() ELSE last_modif_date END)
And why?
Thanks in advance
I've made a simple test, and the result was that the WHERE version is more efficient than the CASE version.
Here is the test I've made:
/*
-- Create a numbers (tally) table if you don't already have one
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Tally
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Tally ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (Number)
*/
-- Create a dates table with 10000 rows
SELECT Number As Id, DATEADD(DAY, Number, DATEADD(DAY, -5000, GETDATE())) As TheDate
INTO DatesTest
FROM Tally
-- Update using a where clause
UPDATE DatesTest
SET TheDate = GETDATE()
WHERE Id % 100 = 0
-- Drop and re-create the same dates table
DROP TABLE DatesTest
SELECT Number As Id, DATEADD(DAY, Number, DATEADD(DAY, -5000, GETDATE())) As TheDate
INTO DatesTest
FROM Tally
-- Update using case
UPDATE DatesTest
SET TheDate = CASE WHEN Id % 100 = 0 THEN GETDATE() ELSE TheDate END
As you can see from the execution plan, the WHERE clause version is only 7% of the total execution cost while the CASE version is 34%.
I'd say we have a winner.
A general best practice is not to update or delete without a where clause because people make mistakes and recovering from some of them can be a real pain.
Beyond that, even non-updating updates can have a significant impact on the database beyond just a poor performing query. Executing an update without a where can also lead to excessive locking and blocking of other concurrent operations.
I can not summarize this article any better than Paul White did for himself, here are some more things to consider:
SQL Server contains a number of optimisations to avoid unnecessary logging or page flushing when processing an UPDATE operation that will not result in any change to the persistent database.
Non-updating updates to a clustered table generally avoid extra logging and page flushing, unless a column that forms (part of) the cluster key is affected by the update operation.
If any part of the cluster key is ‘updated’ to the same value, the operation is logged as if data had changed, and the affected pages are marked as dirty in the buffer pool. This is a consequence of the conversion of the UPDATE to a delete-then-insert operation.
Heap tables behave the same as clustered tables, except they do not have a cluster key to cause any extra logging or page flushing. This remains the case even where a non-clustered primary key exists on the heap. Non-updating updates to a heap therefore generally avoid the extra logging and flushing (but see below).
Both heaps and clustered tables will suffer the extra logging and flushing for any row where a LOB column containing more than 8000 bytes of data is updated to the same value using any syntax other than ‘SET column_name = column_name’.
Simply enabling either type of row versioning isolation level on a database always causes the extra logging and flushing. This occurs regardless of the isolation level in effect for the update transaction.
~ The Impact of Non-Updating Updates - Paul White
The results are simply different, since the UPDATE ... SET with CASE will update ALL the rows, and will fire ON UPDATE triggers and so on, EVEN if the new value is in fact the previous value.
The UPDATE with a WHERE clause will UPDATE only the rows you really want to update and trigger only on these rows. Assuming you have an index, it is more efficient in most cases.
The only way for you to be sure is to analyse the actual execution plan of both queries and compare. Anyway the use of an UPDATE SET CASE over UPDATE WHERE is somewhat unnatural.
Usually, a WHERE clause will be more efficient than a CASE expression if you have an index on the column you are filtering on. The CASE version makes the search sequential, meaning that every row is checked against the condition. A WHERE clause with an index on the column (one that can be used for the condition you are applying) will only scan a subset of records and will use a faster search algorithm (in most cases).
Bottom line: the actual query (the condition you are using) and the indexes can make a big difference in the performance of the statement. The best way to analyse the query is to look at its execution plan in SQL Server Management Studio.
You can read this post to learn more about execution plans:
https://www.simple-talk.com/sql/performance/execution-plan-basics/

Copying timestamp columns within a Postgres table

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns: the primary key id, two text columns, one boolean, and two timestamps. There are indices on one of the text columns and, obviously, on the primary key.
I want to copy the values in the first timestamp column, call it timestamp_a into the second timestamp column, call it timestamp_b. To do this, I ran the following query:
UPDATE my_table SET timestamp_b = timestamp_a;
This worked, but it took an hour and 15 minutes to complete, which seems a really long time to me considering, as far as I know, it's just copying values from one column to the next.
I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file, most notably it increased the shared_buffers, work_mem, and maintenance_work_mem.
I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 mins).
What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into postgres? For context I'm running this on a macbook pro, osx, quadcore, 16 gigs of ram.
The reason this is slow is that internally PostgreSQL doesn't update the field. It actually writes new rows with the new values. This usually takes a similar time to inserting that many values.
If there are indexes on any column this can further slow the update down. Even if they're not on columns being updated, because PostgreSQL has to write a new row and write new index entries to point to that row. HOT updates can help and will do so automatically if available, but that generally only helps if the table is subject to lots of small updates. It's also disabled if any of the fields being updated are indexed.
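You can check how much of your normal workload benefits from HOT via pg_stat_user_tables, for example:
-- n_tup_hot_upd close to n_tup_upd means HOT is doing its job;
-- for a one-off full-table rewrite like this it will stay low.
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'my_table';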
Since you're basically rewriting the table, if you don't mind locking out all concurrent users while you do it you can do it faster with:
BEGIN
DROP all indexes
UPDATE the table
CREATE all indexes again
COMMIT
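A concrete version of that outline might look like the following (the index name and definition are placeholders; the real ones would come from \d my_table or pg_indexes):
BEGIN;
DROP INDEX my_table_text_col_idx;        -- drop secondary indexes, keep the PK if you need it
UPDATE my_table SET timestamp_b = timestamp_a;
CREATE INDEX my_table_text_col_idx ON my_table (text_col);
COMMIT;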
PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that you'd have to copy the data to a temp table, then TRUNCATE and copy it back. So there's no benefit.
@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip WAL entries because if the transaction fails, nobody will ever have seen the new table anyway.
The same optimization is true for tables created with CREATE TABLE AS:
What causes large INSERT to slow down and disk usage to explode?
Details are missing in your description, but if you can afford to write a new table (no concurrent transactions get in the way, no dependencies), then the fastest way might be (except if you have big TOAST table entries - basically big columns):
BEGIN;
LOCK TABLE my_table IN SHARE MODE;   -- only for concurrent access
SET LOCAL work_mem = '???? MB';      -- just for this transaction

CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
       -- columns in order, timestamp_a overwrites timestamp_b
FROM   my_table
ORDER  BY ??;  -- optionally cluster the table while being at it

DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;
ALTER TABLE my_table
   ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);
-- more constraints, indexes, triggers?
-- recreate views etc. if any
COMMIT;
The additional benefit: a pristine (optionally clustered) table without bloat. Related:
Best way to populate a new column in a large table?

resource busy while rebuilding an index

There is a table T with column a:
CREATE TABLE T (
    id_t integer not null,
    text varchar2(100),
    a integer
)
/
ALTER TABLE T ADD CONSTRAINT PK_T PRIMARY KEY (ID_T)
/
Index was created like this:
CREATE INDEX IDX_T$A ON T(a);
Also there's such a check constraint:
ALTER TABLE T ADD CHECK (a is null or a = 1);
Most of the records in T have a null value of a, so a query using the index works really fast as long as the index is in a consistent state and its statistics are up to date.
But the problem is that the value of a changes really frequently for some rows (some rows get a null value, some get 1), so I need to rebuild the index, let's say, every hour.
However, the job that tries to rebuild the index very often fails with an exception:
ORA-00054: resource busy and acquire with NOWAIT specified
Can anybody help me cope with this issue?
Index rebuild is not needed in most cases. Of course newly created indexes are efficient and their efficiency decreases over time. But this process stops after some time - it simply converges to some level.
If you really need to optimize indexes try to use less invasive DDL command "ALTER INDEX SHRINK SPACE COMPACT".
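Applied to the index from the question, that would be roughly:
-- Defragments the index blocks; use SHRINK SPACE (without COMPACT) to also release space.
ALTER INDEX IDX_T$A SHRINK SPACE COMPACT;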
PS: I would also recommend using a smaller block size (4K or 8K) for your tablespace storage.
Have you tried adding "ONLINE" to that index rebuild statement?
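That is, something along these lines (online rebuild is an Enterprise Edition feature, so it may not be available to you):
-- Allows DML on T to continue while the index is rebuilt.
ALTER INDEX IDX_T$A REBUILD ONLINE;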
Edit: If online rebuild is not available then you might look at a fast refresh on commit materialised view to store the rowids or primary keys of rows that have a 1 for column A.
Start with a look at the documentation:
http://docs.oracle.com/cd/B28359_01/server.111/b28326/repmview.htm
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6002.htm#SQLRF01302
You'd create a materialised view log on the table, and then a materialised view.
Think in particular about the resource requirements for this: changes to the master table require a change vector to be written to the materialised view log, which is effectively an additional insert for every change. Then the changes have to be propagated to another table (the materialised view storage table) with additional queries. It is by no means a low-impact option.
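A rough sketch of what that could look like (the view name is made up, and fast refresh on commit has strict, version-specific requirements, so verify with DBMS_MVIEW.EXPLAIN_MVIEW before relying on it):
CREATE MATERIALIZED VIEW LOG ON T
    WITH ROWID (a)
    INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW MV_T_A
    BUILD IMMEDIATE
    REFRESH FAST ON COMMIT
    AS SELECT T.rowid AS row_id, id_t
       FROM T
       WHERE a = 1;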
Rebuilding for Performance
Most Oracle experts are skeptical of frequently rebuilding indexes. For example, a quick glance at the presentation Rebuilding the Truth will show you that indexes do not behave in the naive way many people assume they do.
One of the relevant points in that presentation is "fully deleted blocks are recycled and are not generally problematic". If your values completely change, then your index should not grow infinitely large. Although your indexes are used in a non-typical way, that behavior is probably a good thing.
Here's a quick example. Create 1 million rows and index 100 of them.
--Create table, constraints, and index.
CREATE TABLE T
(
id_t integer primary key,
text varchar2(100),
a integer check (a is null or a = 1)
);
CREATE INDEX IDX_T$A ON T(a);
--Insert 1M rows, with 100 "1"s.
insert into t
select level, level, case when mod(level, 10000) = 0 then 1 else null end
from dual connect by level <= 1000000;
commit;
--Initial sizes:
select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('T', 'IDX_T$A');
SEGMENT_NAME MB
T 19
IDX_T$A 0.0625
Now completely shuffle the index rows around 1000 times.
--Move the 1s around 1000 times. Takes about 6 minutes.
begin
for i in 9000 .. 10000 loop
update t
set a = case when mod(id_t, i) = 0 then 1 else null end
--Don't update if the value is the same
where nvl(a,-1) <> nvl(case when mod(id_t,i) = 0 then 1 else null end,-1);
commit;
end loop;
end;
/
The index segment size is still the same.
--The index size is the same.
select segment_name, bytes/1024/1024 MB
from dba_segments
where segment_name in ('T', 'IDX_T$A');
SEGMENT_NAME MB
T 19
IDX_T$A 0.0625
Rebuilding for Statistics
It's good to worry about the statistics of objects whose data changes so dramatically. But again, although your system is unusual, it may work fine with the default Oracle behavior. Although the rows indexed may completely change, the relevant statistics may stay the same. If there are always 100 rows indexed, the number of rows, blocks, and distinctness will stay the same.
Perhaps the clustering factor will significantly change, if the 100 rows shift from being completely random to being very close to each other. But even that may not matter. If there are millions of rows, but only 100 indexed, the optimizer's decision will probably be the same regardless of the clustering factor. Reading 1 block (awesome clustering factor) or reading 100 blocks (worst-case clustering factor) will still look much better than doing a full table scan of millions of rows.
But statistics are complicated; I'm surely over-simplifying things. If you need to keep your statistics a specific way, you may want to lock them. Unfortunately you can't lock just an index, but you can lock the table and its dependent indexes.
begin
dbms_stats.lock_table_stats(ownname => user, tabname => 'T');
end;
/
Rebuilding anyway
If a rebuild is still necessary, @Robe Eleckers' idea to retry should work. Although, instead of catching the exception, it would be easier to set DDL_LOCK_TIMEOUT.
alter session set ddl_lock_timeout = 500;
The session will still need to get an exclusive lock on the table, but this will make it much easier to find the right window of opportunity.
Since the field in question has very low cardinality I would suggest using a bitmap index and skipping the rebuilds altogether.
CREATE BITMAP INDEX IDX_T$A ON T(a);
Note (as mentioned in comments): transactional performance is very low for bitmap indexes so this would only work well if there are very few overlapping transactions doing updates to the table.

Avoid deadlock for concurrent delete

I have a table called Products with many columns. It is a temporary table used for reporting purposes. The data will be processed to this table concurrently by multiple user requests. There are separate stored procedures to make DML operations to this table.
Table Structure:
CREATE TABLE Products (
instance uniqueidentifier,
inserted datetime,
col1,
col2,
...
)
The inserted column will be populated with GETDATE(), the time when each row was inserted, and the instance column will contain the value from NEWID(). One user request will have one unique id but may have a million rows. Below are the queries that are executed concurrently and are causing the deadlock. Please advise.
Query 1:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DELETE P
FROM Products P (NOLOCK)
WHERE instance = 'XXXX-xxx-xxx-xx'
Query 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
DELETE P
FROM Products P (NOLOCK)
WHERE inserted <= DATEADD(hh, -10, GETDATE())
Note: The nonclustered index is created on instance column.
Please advise which lock I can use in this scenario.
Note: I was not able to add a primary key, as it is too time-consuming when I insert 10 million rows into the table (that is for one transaction; there are 20 concurrent transactions).
The report should be generated sooner, and my procedure has around 35 DML statements; around 15 of them are DELETE statements filtering on the instance column together with other columns (DELETE FROM table WHERE instance = @instance AND col1 = @col1).
(1) You should stop using read uncommitted isolation. Use at least read committed.
(2) There are a number of things you could try to avoid deadlocks, like ensuring your different transactions access database objects in the same order, etc. This would be worth a read - http://support.microsoft.com/kb/169960
(3) Disable lock escalation for your table (more granular locks so better concurrency, but more lock overhead):
ALTER TABLE Products SET (lock_escalation = disable)
(4) Disallow Page Locks, and allow Row Locks on your indexes (will mean you can't defrag indexes, but you can still rebuild them):
ALTER INDEX [<YourIndex>] ON Products SET (allow_row_locks = on, allow_page_locks = off)
First, there's no lock that you can take on these delete statements besides an exclusive lock. Your isolation level and NOLOCK hints are being ignored by SQL Server:
(Nolock) Only applies to the SELECT statement.
Two suggestions:
Change your non-clustered index on instance to a clustered index. BUT, only do this if you can change NEWID() to NEWSEQUENTIALID().
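A sketch of that change (index name assumed; NEWSEQUENTIALID() can only be used as a column default, which is why the value-generation side has to change as well):
-- Only worthwhile if instance values are generated sequentially, so that inserts
-- append to the end of the clustered index instead of splitting pages all over it.
CREATE CLUSTERED INDEX CIX_Products_Instance ON Products (instance, inserted);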
Second, instead of performing a delete to remove records older than 10 hours, consider implementing rolling partitions. This will remove the contention between the cleanup and your other delete operations.
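An illustrative sketch of the purge side of rolling partitions, assuming Products is partitioned on inserted and a staging table with an identical schema exists on the same filegroup:
-- Switch the oldest partition out (metadata-only, near-instant), then empty it.
ALTER TABLE Products SWITCH PARTITION 1 TO Products_Staging;
TRUNCATE TABLE Products_Staging;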

Delete statement in SQL is very slow

I have statements like this that are timing out:
DELETE FROM [table] WHERE [COL] IN ( '1', '2', '6', '12', '24', '7', '3', '5')
I tried doing one at a time like this:
DELETE FROM [table] WHERE [COL] IN ( '1' )
and so far it's at 22 minutes and still going.
The table has 260,000 rows in it and has four columns.
Does anyone have any ideas why this would be so slow and how to speed it up?
I do have a non-unique, non-clustered index on [COL], the column I'm filtering on in the WHERE clause.
I'm using SQL Server 2008 R2
update: I have no triggers on the table.
Things that can cause a delete to be slow:
deleting a lot of records
many indexes
missing indexes on foreign keys in child tables. (thank you to #CesarAlvaradoDiaz for mentioning this in the comments)
deadlocks and blocking
triggers
cascade delete (those ten parent records you are deleting could mean millions of child records getting deleted)
Transaction log needing to grow
Many Foreign keys to check
So your choices are to find out what is blocking and fix it or run the deletes in off hours when they won't be interfering with the normal production load. You can run the delete in batches (useful if you have triggers, cascade delete, or a large number of records). You can drop and recreate the indexes (best if you can do that in off hours too).
Disable CONSTRAINT
ALTER TABLE [TableName] NOCHECK CONSTRAINT ALL;
Disable Index
ALTER INDEX ALL ON [TableName] DISABLE;
Rebuild Index
ALTER INDEX ALL ON [TableName] REBUILD;
Enable CONSTRAINT
ALTER TABLE [TableName] CHECK CONSTRAINT ALL;
Delete again
Deleting a lot of rows can be very slow. Try to delete a few at a time, like:
delete top (10) YourTable where col in ('1','2','3','4')
while @@rowcount > 0
begin
    delete top (10) YourTable where col in ('1','2','3','4')
end
In my case the database statistics had become corrupt. The statement
delete from tablename where col1 = 'v1'
was taking 30 seconds even though there were no matching records but
delete from tablename where col1 = 'rubbish'
ran instantly
running
update statistics tablename
fixed the issue
If the table you are deleting from has BEFORE/AFTER DELETE triggers, something in there could be causing your delay.
Additionally, if you have foreign keys referencing that table, additional UPDATEs or DELETEs may be occurring.
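A quick way to see which foreign keys reference the table, and whether they cascade (the table name is a placeholder):
SELECT fk.name,
       OBJECT_NAME(fk.parent_object_id)  AS referencing_table,
       fk.delete_referential_action_desc AS on_delete
FROM sys.foreign_keys AS fk
WHERE fk.referenced_object_id = OBJECT_ID('dbo.[table]');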
Preventive Action
Check for the root cause of this issue with the help of SQL Profiler. There may be triggers causing the delay in execution; it can be anything. Don't forget to filter on the database name and object name when starting the trace, to exclude unnecessary queries:
Database Name Filtering
Table/Stored Procedure/Trigger Name Filtering
Corrective Action
As you said, your table contains 260,000 records and the IN predicate contains eight values. Each of those 260,000 records is checked against each value in the IN predicate. Instead, it could be an inner join, like below...
Delete K From YourTable1 K
Inner Join YourTable2 T on T.id = K.id
Insert the IN predicate values into a temporary table or a table variable and join to it.
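For example (a sketch against the table and values from the question):
CREATE TABLE #ids (col varchar(10) PRIMARY KEY);
INSERT INTO #ids (col) VALUES ('1'), ('2'), ('6'), ('12'), ('24'), ('7'), ('3'), ('5');

DELETE T
FROM [table] AS T
INNER JOIN #ids AS I
    ON I.col = T.[COL];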
It's possible that other tables have FK constraints referencing your [table].
So the DB needs to check those tables to maintain referential integrity.
Even if you have all the needed indexes covering these FKs, check how many of them there are.
I had a situation where NHibernate had incorrectly created duplicate FKs on the same columns, but with different names (which is allowed by SQL Server).
That drastically slowed down the DELETE statement.
Check the execution plan of this delete statement. Have a look at whether an index seek is used. Also, what is the data type of col?
If you are using the wrong data type, change the delete statement (e.g. from '1' to 1 or N'1').
If an index scan is used, consider using a query hint.
If you're deleting all the records in the table rather than a select few it may be much faster to just drop and recreate the table.
Is [COL] really a character field that's holding numbers, or can you get rid of the single-quotes around the values? @Alex is right that IN is slower than =, so if you can do this, you'll be better off:
DELETE FROM [table] WHERE [COL] = '1'
But better still is using numbers rather than strings to find the rows (SQL likes numbers):
DELETE FROM [table] WHERE [COL] = 1
Maybe try:
DELETE FROM [table] WHERE CAST([COL] AS INT) = 1
In either event, make sure you have an index on column [COL] to speed up the table scan.
I read this article and it was really helpful for troubleshooting this kind of issue:
https://support.microsoft.com/en-us/kb/224453
This is a case of waitresource:
KEY: 16:72057595075231744 (ab74b4daaf17)
-- First, find the SPID (session ID) that is waiting, then identify the problem:
-- check status, open_tran, lastwaittype, waittype, waittime and, importantly, waitresource.
select * from sys.sysprocesses where spid = 57
select * from sys.databases where database_id = 16
-- With the waitresource value, check these to obtain the object id:
select * from sys.partitions where hobt_id = 72057595075231744
select * from sys.objects where object_id = 2105058535
After inspecting an SSIS package (because a SQL Server was executing commands really slowly) that had been set up at a client of ours about 4-5 years before the time of writing this, I found that it contained the following tasks:
1) Insert data from an XML file into a table called [ImportBarcodes].
2) Run a MERGE command on another target table, using the above-mentioned table as the source.
3) Run "DELETE FROM [ImportBarcodes]" to clear the table of the data that was inserted after the XML file was read by the SSIS package task.
After a quick inspection, all statements (SELECT, UPDATE, DELETE etc.) on the ImportBarcodes table, which held only 1 row, took about 2 minutes to execute.
Extended Events showed a whole lot of PAGEIOLATCH_EX wait notifications.
No indexes were present on the table and no triggers were registered.
Upon close inspection of the table's properties, in the Storage tab under the General section, the Data Space field showed more than 6 GIGABYTES of space allocated in pages.
What happened:
The package ran for a good portion of each day for the last 4 years, inserting and deleting data in the table and leaving unused pages behind without freeing them up.
That was the main reason for the wait events captured by the Extended Events session and for the slowly executed commands against the table.
Running ALTER TABLE ImportBarcodes REBUILD fixed the issue by freeing up all the unused space. TRUNCATE TABLE ImportBarcodes did a similar thing, with the only difference being that it removes all pages and data.
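To confirm this symptom before rebuilding, compare the reserved space with the row count, e.g.:
EXEC sp_spaceused 'ImportBarcodes';   -- huge reserved/unused space against a tiny row count
ALTER TABLE ImportBarcodes REBUILD;   -- reclaims the empty pages left behind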
Older topic, but still relevant.
Another issue occurs when an index has become fragmented to the point of being more of a problem than a help. In such a case, the answer would be to rebuild or drop and recreate the index and issue the delete statement again.
As an extension to Andomar's answer above, I had a scenario where the first 700,000,000 records (of ~1.2 billion) processed very quickly, with chunks of roughly 25,000 records processing per second. But then it started taking 15 minutes to do a batch of 25,000. I reduced the chunk size to 5,000 records and it went back to its previous speed. I'm not certain what internal threshold I hit, but the fix was to further reduce the number of records per batch to regain the speed.
Open CMD and run these commands:
NET STOP MSSQLSERVER
NET START MSSQLSERVER
This will restart the SQL Server instance.
Then try your delete command again.
I have these commands in a batch script and run them from time to time when I encounter problems like this. A normal PC restart is not the same, so restarting the instance is the most effective way if you are having issues like this with your SQL Server.