Performance issues with a DELETE operation - SQL

I have a simple delete statement like this:
DELETE FROM MY_TABLE WHERE ID='Something'
This deletes only one record for that ID, yet it takes more than a minute, because MY_TABLE is referenced by a foreign key on another table with around 40 million records, and the DELETE has to check for the existence of this ID in that large table.
One option is to drop and re-create this foreign key, but the DELETE is performed by application users, so I cannot drop/re-create the FK on the fly; that would only work as an off-hours operation.
Can anyone help?

There's an excellent SO question that you can refer to:
SQL delete performance.
Apart from the suggestions in that question, like:
Possible causes:
1) cascading delete operations
2) trigger(s)
3) the type of your primary key column is something other than an
integer, thereby forcing a type conversion on each pk value to do the
comparison. This requires a full table scan.
4) does your query really end in a dot like you posted it in the
question? If so, the number may be considered a floating point
number instead of an integer, thereby causing a type conversion
similar to 3)
5) your delete query is waiting for some other slow query to release a
lock
Importantly:
We should definitely look into triggers; try to do away with them if there are any.
Also check whether there is an index on the FK column in the other, large table.
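For example, a minimal sketch of that index (the index name is an assumption; FK stands for the foreign-key column on LARGE_TABLE used in the example below):
CREATE INDEX IX_LARGE_TABLE_FK ON LARGE_TABLE (FK)
GO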
See if you can turn this into a cascade-style delete: first delete from the large table, then delete from MY_TABLE, as in the queries below.
DELETE FROM LARGE_TABLE WHERE FK IN (SELECT PK FROM MY_TABLE WHERE ID='Something')
GO
DELETE FROM MY_TABLE WHERE ID='Something'
Also check the execution plan for the first query on LARGE_TABLE.
Try to replace WHERE ID='Something' with an integer key lookup, e.g. WHERE anotherID=34. In other words, prefer integers over strings in the WHERE clause when deleting; it is surprising that the ID column is a varchar. MY_TABLE should have a clustered primary key column, and that key should be used in the WHERE clause.
Another way to improve performance is to drop the FK constraint between the tables. You can then periodically do a garbage-collection delete from LARGE_TABLE, followed by index rebuilding.
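If you go that route, a rough sketch could look like the following (the constraint name is an assumption, the FK stays dropped afterwards, and ALTER INDEX assumes SQL Server 2005 or later):
ALTER TABLE LARGE_TABLE DROP CONSTRAINT FK_LARGE_TABLE_MY_TABLE
GO
-- periodic, off-hours garbage collection of orphaned rows, then index rebuild
DELETE FROM LARGE_TABLE WHERE FK NOT IN (SELECT PK FROM MY_TABLE)
GO
ALTER INDEX ALL ON LARGE_TABLE REBUILD
GO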
Another, simpler way is to introduce a mapping table between these tables: say a table named Map_My_Table_Large_Table that keeps two columns (the key from MY_TABLE and the key from LARGE_TABLE). You can then shift your CASCADE delete to this mapping table.
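A very rough sketch of such a mapping table (every name and type here is an assumption):
CREATE TABLE Map_My_Table_Large_Table (
    my_table_pk    INT NOT NULL REFERENCES MY_TABLE (PK) ON DELETE CASCADE,
    large_table_pk INT NOT NULL,
    PRIMARY KEY (my_table_pk, large_table_pk)
)
GO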

Related

Copying timestamp columns within a Postgres table

I have a table with about 30 million rows in a Postgres 9.4 db. This table has 6 columns: the primary key id, 2 text columns, one boolean, and two timestamps. There is an index on one of the text columns and, obviously, on the primary key.
I want to copy the values in the first timestamp column, call it timestamp_a into the second timestamp column, call it timestamp_b. To do this, I ran the following query:
UPDATE my_table SET timestamp_b = timestamp_a;
This worked, but it took an hour and 15 minutes to complete, which seems like a really long time considering that, as far as I know, it's just copying values from one column to the next.
I ran EXPLAIN on the query and nothing seemed particularly inefficient. I then used pgtune to modify my config file, most notably it increased the shared_buffers, work_mem, and maintenance_work_mem.
I re-ran the query and it took essentially the same amount of time, actually slightly longer (an hour and 20 mins).
What else can I do to improve the speed of this update? Is this just how long it takes to write 30 million timestamps into Postgres? For context, I'm running this on a MacBook Pro (OS X, quad-core, 16 GB of RAM).
The reason this is slow is that internally PostgreSQL doesn't update the field in place. It actually writes new row versions with the new values. This usually takes a similar time to inserting that many values.
If there are indexes on any column, this can slow the update down further, even if they're not on the columns being updated, because PostgreSQL has to write a new row and new index entries pointing to that row. HOT updates can help and will do so automatically if available, but that generally only helps if the table is subject to lots of small updates. It's also disabled if any of the fields being updated are indexed.
Since you're basically rewriting the table, if you don't mind locking out all concurrent users while you do it you can do it faster with:
BEGIN
DROP all indexes
UPDATE the table
CREATE all indexes again
COMMIT
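A concrete, hedged version of that sketch (the index name and indexed column are assumptions based on the question; the primary key index cannot simply be dropped this way):
BEGIN;
DROP INDEX my_table_text_idx;                           -- repeat for every secondary index
UPDATE my_table SET timestamp_b = timestamp_a;
CREATE INDEX my_table_text_idx ON my_table (text_col);  -- recreate them
COMMIT;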
PostgreSQL also has an optimisation for writes to tables that've just been TRUNCATEd, but to benefit from that you'd have to copy the data to a temp table, then TRUNCATE and copy it back, so there's no net benefit.
@Craig mentioned an optimization for COPY after TRUNCATE: Postgres can skip WAL entries because, if the transaction fails, nobody will ever have seen the new table anyway.
The same optimization is true for tables created with CREATE TABLE AS:
What causes large INSERT to slow down and disk usage to explode?
Details are missing in your description, but if you can afford to write a new table (no concurrent transactions get in the way, no dependencies), then the fastest way might be (except if you have big TOAST table entries - basically big columns):
BEGIN;
LOCK TABLE my_table IN SHARE MODE; -- only for concurrent access
SET LOCAL work_mem = '???? MB'; -- just for this transaction
CREATE TABLE my_table2 AS
SELECT ..., timestamp_a, timestamp_a AS timestamp_b
-- columns in order, timestamp_a overwrites timestamp_b
FROM my_table
ORDER BY ??; -- optionally cluster table while being at it.
DROP TABLE my_table;
ALTER TABLE my_table2 RENAME TO my_table;
ALTER TABLE my_table
  ADD CONSTRAINT my_table_id_pk PRIMARY KEY (id);
-- more constraints, indices, triggers?
-- recreate views etc. if any
COMMIT;
The additional benefit: a pristine (optionally clustered) table without bloat. Related:
Best way to populate a new column in a large table?

How to optimize the following delete SQL query?

I have the following delete query in Oracle. There will be about 1000 records to delete from the database at a time.
I have used "IN" in the query. Is there a better way to write it?
DELETE FROM BI_EMPLOYEE_ACTIVITY
WHERE EMPLOYEE_ID IN (
    SELECT EMP_ID
    FROM BI_EMPLOYEE
    WHERE PRODUCT_ID = IN_PRODUCT_ID
);
It is not really possible to answer this question as we're missing a description of the data distribution: How many rows are in each table? What's the relationship between the tables? How many rows are affected by the delete?
I'll be assuming that both tables are large (since this is an optimization question) and that BI_EMPLOYEE and BI_EMPLOYEE_ACTIVITY have a parent-child 1..N relationship.
If there are few rows affected by the delete, this means that not many employees have the same PRODUCT_ID and each employee has few activities. In this case it would make sense to index both BI_EMPLOYEE (product_id) and BI_EMPLOYEE_ACTIVITY (employee_id).
This is probably not the case though, the delete probably affects lots of rows. In that case the indexes could be a hindrance. If the delete affects lots of rows, the fastest access path probably is FULL TABLE SCAN + HASH JOIN.
We need some metrics here: how many rows are deleted? How long does it take? Large DML will always take time, especially DELETE, since it produces the largest amount of undo.
There are alternatives to a large DELETE, as explained in "Deleting many rows from a big table" from asktom:
recreate the table without the deleted rows
partition the data, do a parallel delete
partition the data so that the delete is done by dropping a partition (a sketch follows below)
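For the partition-drop option, a hedged Oracle sketch (the partition name is an assumption, and this only applies if BI_EMPLOYEE_ACTIVITY is partitioned along the delete key):
ALTER TABLE BI_EMPLOYEE_ACTIVITY DROP PARTITION p_obsolete UPDATE GLOBAL INDEXES;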
Putting an index on EMP_ID may help. I don't believe any other optimization is possible; the query is quite simple and straightforward.
Create an index on the PRODUCT_ID column; this would speed up the search. If the column is of varchar type and you convert values to uppercase or lowercase, make use of a function-based index.
Maybe you can try EXISTS instead of IN:
DELETE FROM BI_EMPLOYEE_ACTIVITY
WHERE EXISTS (
    SELECT EMP_ID
    FROM BI_EMPLOYEE
    WHERE PRODUCT_ID = IN_PRODUCT_ID
      AND EMP_ID = EMPLOYEE_ID
);
Create an index on the BI_EMPLOYEE table for the PRODUCT_ID, EMP_ID columns, in this order (PRODUCT_ID in first place).
And create an index on the BI_EMPLOYEE_ACTIVITY table for the column EMPLOYEE_ID
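For example (the index names are just placeholders):
CREATE INDEX bi_employee_prod_emp_ix ON BI_EMPLOYEE (PRODUCT_ID, EMP_ID);
CREATE INDEX bi_emp_activity_emp_ix ON BI_EMPLOYEE_ACTIVITY (EMPLOYEE_ID);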
I'll just add that, beyond creating an index for the query, you need to look at locking once your table grows really big. Try to lock the table in exclusive mode if possible, as that takes only a single lock from the database; if that's not possible, commit the delete every 2500 records or so, so that if you're stuck with row-level locking you don't end up starving the database of locks.
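A hedged PL/SQL sketch of committing every 2500 rows (purely illustrative; it assumes IN_PRODUCT_ID is a variable or parameter, as in the original query):
BEGIN
  LOOP
    DELETE FROM BI_EMPLOYEE_ACTIVITY
     WHERE EMPLOYEE_ID IN (SELECT EMP_ID FROM BI_EMPLOYEE WHERE PRODUCT_ID = IN_PRODUCT_ID)
       AND ROWNUM <= 2500;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
  COMMIT;
END;
/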

Please recommend the best bulk-delete option

I'm using PostgreSQL 8.1.4. I have 3 tables: one core table (table1) and two dependents (table2, table3). I inserted 70000 records into table1 and the appropriate related records into the other 2 tables. As I used CASCADE, I am able to delete the related records with DELETE FROM table1; and it works fine when the number of records is small in my current PostgreSQL version. With a huge volume of records, though, it tries to delete everything, but there is no sign of progress for many hours, whereas a bulk import finishes in a few minutes. I wish the bulk delete to finish in a reasonable number of minutes. I tried TRUNCATE as well, like TRUNCATE table3, table2, table1; but there was no change in performance: it just takes time with no sign of completion. From the net I got a few options, such as dropping all constraints and then recreating them, but no query seems to run successfully against table1 once it is loaded with more data.
Please recommend the best way to delete all the records in minutes.
CREATE TABLE table1 (
    t1_id        SERIAL PRIMARY KEY,
    disp_name    TEXT NOT NULL DEFAULT '',
    last_updated TIMESTAMP NOT NULL DEFAULT current_timestamp,
    UNIQUE (disp_name)
) WITHOUT OIDS;
CREATE UNIQUE INDEX disp_name_index ON table1 (upper(disp_name));
CREATE TABLE table2 (
    t2_id SERIAL PRIMARY KEY,
    t1_id INTEGER REFERENCES table1 ON DELETE CASCADE,
    type  TEXT
) WITHOUT OIDS;
CREATE TABLE table3 (
    t3_id        SERIAL PRIMARY KEY,
    t1_id        INTEGER REFERENCES table1 ON DELETE CASCADE,
    config_key   TEXT,
    config_value TEXT
) WITHOUT OIDS;
You can create an index on the columns of the child tables which reference the parent table:
on table2 create an index on the t1_id column
on table3 create an index on the t1_id column
that should speed things up slightly.
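For example (the index names are just placeholders):
CREATE INDEX table2_t1_id_idx ON table2 (t1_id);
CREATE INDEX table3_t1_id_idx ON table3 (t1_id);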
And/or, don't bother with ON DELETE CASCADE: write a delete stored procedure that deletes first from the child tables and then from the parent table. It may be faster than letting PostgreSQL do it for you.
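A hedged sketch of such a function, assuming (as in the question) you want to wipe everything and that plpgsql is installed:
CREATE OR REPLACE FUNCTION purge_all_tables() RETURNS void AS $$
BEGIN
    DELETE FROM table3;  -- child tables first
    DELETE FROM table2;
    DELETE FROM table1;  -- parent last; the cascade then finds nothing left to delete
END;
$$ LANGUAGE plpgsql;

SELECT purge_all_tables();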
In SQL, the TRUNCATE TABLE statement is a Data Definition Language
(DDL) operation that marks the extents of a table for deallocation
(empty for reuse). The result of this operation quickly removes all
data from a table, typically bypassing a number of integrity
enforcing mechanisms.
http://en.wikipedia.org/wiki/Truncate_(SQL)
So TRUNCATE should be very fast. In your case, it looks like you have a transaction that has been neither committed nor rolled back; in that case your delete transaction will never finish.
To solve this problem, check the active transactions in your database. The easiest way (it works at least under SQL Server) is to write "ROLLBACK COMMIT;" into the query window and execute it. If it executes without throwing an error, there actually was an active transaction; if no active transaction remains, it will give you an error.
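Since this question is about PostgreSQL, a hedged alternative is to look at the server's activity view (column names as in the 8.x catalogs):
SELECT procpid, usename, query_start, current_query FROM pg_stat_activity;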
I would bet that you are missing some indexes in the database too.
If you issue the delete command from the psql console, just hit Ctrl-C: the transaction will get interrupted, and psql should tell you which query was being executed when you interrupted it.
Then use EXPLAIN to check why the query takes so long.
I had a similar situation recently and adding an index solved the problem.

Improving performance of Sql Delete

We have a query to remove some rows from the table based on an id field (primary key). It is a pretty straightforward query:
delete all from OUR_TABLE where ID in (123, 345, ...)
The problem is that the number of IDs can be huge (e.g. 70k), so the query takes a long time. Is there any way to optimize this?
(We are using sybase - if that matters).
There are two ways to make statements like this one perform:
Create a new table and copy all but the rows to delete; swap the tables afterwards (alter table name ...). I suggest giving it a try even if it sounds stupid: some databases are much faster at copying than at deleting.
Partition your tables. Create N tables and use a view to join them into one. Sort the rows into different tables grouped by the delete criterion. The idea is to drop a whole table instead of deleting individual rows.
Consider running this in batches. A loop running 1000 records at a time may be much faster than one query that does everything, and in addition it will not keep the table locked out from other users for as long at a stretch.
If you have cascading deletes (and lots of foreign-key tables affected) or triggers involved, you may need to run in even smaller batches. You'll have to experiment to see which is the best number for your situation. I've had tables where I had to delete in batches of 100 and others where 50000 worked (fortunate in that case, as I was deleting a million records).
But in any event, I would put the key values I intend to delete into a temp table and delete from there.
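A hedged sketch of that batching, in Sybase/T-SQL style (the temp table name and batch size are assumptions):
set rowcount 1000
while 1 = 1
begin
    delete OUR_TABLE from #ids_to_delete
    where OUR_TABLE.ID = #ids_to_delete.ID
    if @@rowcount = 0 break
end
set rowcount 0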
I'm wondering if parsing an IN clause with 70K items in it is a problem. Have you tried a temp table with a join instead?
Can Sybase handle 70K arguments in an IN clause? All databases I have worked with have some limit on the number of arguments in an IN clause. For example, Oracle has a limit of around 1000.
Can you use a subselect instead of the IN list? That will shorten the SQL, and it might help with such a big number of values. Something like this:
DELETE FROM OUR_TABLE WHERE ID IN
(SELECT ID FROM somewhere WHERE some_condition)
Deleting a large number of records can be sped up with some interventions in the database, if the database model permits it. Here are some strategies:
you can speed things up by dropping indexes, deleting records and recreating indexes again. This will eliminate rebalancing index trees while deleting records.
drop all indexes on table
delete records
recreate indexes
if you have lots of relations to this table, try disabling constraints, provided you are absolutely sure the delete will not break any integrity constraint. The delete will go much faster because the database won't be checking integrity. Re-enable the constraints after the delete.
disable integrity constraints, disable check constraints
delete records
enable constraints
disable triggers on table, if you have any and if your business rules allow that. Delete records, then enable triggers.
lastly, do as others suggested: make a copy of the table containing the rows that are not to be deleted, drop the original, rename the copy, and recreate the integrity constraints, if there are any.
I would try a combination of 1, 2 and 3. If that does not work, then 4. If everything is slow, I would look for a bigger box - more memory, faster disks.
Find out what is using up the performance!
In many cases you might use one of the solutions provided, but there might be others (this is based on Oracle knowledge, so things will be different on other databases; edit: just saw that you mentioned Sybase):
Do you have foreign keys on that table? Make sure the referring IDs are indexed.
Do you have indexes on that table? Dropping them before the delete and recreating them afterwards might be faster.
Check the execution plan. Is it using an index where a full table scan might be faster, or the other way round? Hints might help.
Instead of a SELECT INTO new_table as suggested above, a CREATE TABLE AS SELECT might be even faster.
But remember: Find out what is using up the performance first.
When you use DDL statements, make sure you understand and accept the consequences they might have on transactions and backups.
Try sorting the IDs you pass into the IN list in the same order as the table or index is stored in. You may then get more hits in the disk cache.
Putting the IDs to be deleted into a temp table that has them sorted in the same order as the main table may let the database do a simple scan over the main table.
You could try using more than one connection and splitting the work across the connections, so as to use all the CPUs on the database server; however, think about which locks will be taken out, etc., first.
I also think that the temp table is likely the best solution.
If you were to do a "delete from .. where ID in (select id from ...)" it can still be slow with large queries, though. I thus suggest that you delete using a join - many people don't know about that functionality.
So, given this example table:
-- set up tables for this example
if exists (select id from sysobjects where name = 'OurTable' and type = 'U')
drop table OurTable
go
create table OurTable (ID integer primary key not null)
go
insert into OurTable (ID) values (1)
insert into OurTable (ID) values (2)
insert into OurTable (ID) values (3)
insert into OurTable (ID) values (4)
go
We can then write our delete code as follows:
create table #IDsToDelete (ID integer not null)
go
insert into #IDsToDelete (ID) values (2)
insert into #IDsToDelete (ID) values (3)
go
-- ... etc ...
-- Now do the delete - notice that we aren't using 'from'
-- in the usual place for this delete
delete OurTable from #IDsToDelete
where OurTable.ID = #IDsToDelete.ID
go
drop table #IDsToDelete
go
-- This returns only items 1 and 4
select * from OurTable order by ID
go
Does our_table have a reference on delete cascade?

Multi Rows Deletion from table in SQL Server

How can I delete 1.5 million rows from SQL Server 2000, and how much time will it take to complete this task?
I don't want to delete all records from the table; I just want to delete all records that fulfill a WHERE condition.
EDITED from a comment to an answer below.
"I fire the same query i.e. delete from table_name with Where Clause... Is it possible to Disable Indexing at the running Query, becuase Query is going on from past 20 hr.. Also help me out how i can disable Indexing.."
If (and only if) you want to delete all of the records in a table, you can use DROP TABLE or TRUNCATE TABLE.
DELETE removes one record at a time and records an entry in the transaction log for each deleted row.
TRUNCATE TABLE is much faster because it doesn't record the activity in the transaction log. It removes all rows from a table, but the table structure & its columns, constraints, indexes and so on remain. DROP TABLE would remove those.
Use caution if you decide to TRUNCATE. It's irreversible (unless you have a backup).
create a second table, inserting all rows from the first that you don't want to delete.
drop the first table
rename the second table to be the first
(or a variation on the above)
This can often be quicker than doing a delete of selected records from a big table.
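A rough SQL Server flavoured sketch of that approach (the names and the keep-condition are placeholders; indexes, constraints and permissions must be recreated afterwards):
SELECT * INTO MyTable_keep FROM MyTable WHERE NOT (<condition for rows to delete>)
GO
DROP TABLE MyTable
GO
EXEC sp_rename 'MyTable_keep', 'MyTable'
GO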
You may want to try deleting in batches too. I just tested this on a table I have and the delete operation went from 13 seconds to 3 seconds.
While Exists(Select * From YourTable Where YourCondition = True)
Delete Top (100000)
From YourTable
Where YourCondition = True
I don't think you can use the TOP predicate if you are running SQL2000, but it works with SQL2005 and up. If you are using SQL2000, then you can use this syntax instead:
Set RowCount 100000
While Exists(Select * From YourTable Where YourCondition = True)
Delete
From YourTable
Where YourCondition = True
DELETE FROM table WHERE a=b;
When deleting that many rows you may want to disable the indexes so they don't get updated on every delete. Rewriting the indexes on every deletion will significantly slow down the whole process.
You'll want to disable these indexes before beginning your deletion or else there may be table locks already in place.
--Disable Index
ALTER INDEX [IX_MyIndex] ON dbo.MyTable DISABLE
--Enable Index
ALTER INDEX [IX_MyIndex] ON dbo.MyTable REBUILD
If you wish to remove all entries in a table you can use TRUNCATE.
Does the table you are deleting from have multiple foreign keys, or cascaded deletes or triggers? All of these will impact performance.
Depending on what you want to do and on the required transactional integrity, can you delete things in small batches? E.g. if you are trying to delete 1.5 million records that represent 1 year's worth of data, can you do it 1 week at a time?
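For example, a hedged sketch of one weekly slice (the date column and boundaries are assumptions):
DELETE FROM MyTable
WHERE CreatedDate >= '20080101' AND CreatedDate < '20080108'
-- repeat with the next week's range until all 1.5 million rows are gone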
DELETE FROM table WHERE <the condition matching those 1.5 million rows>
The time depends.
On Oracle it is also possible to use
truncate table <table>
Not sure if that is standard SQL or available in SQL Server. It will, however, clear the whole table - but it is quicker than "delete from" (it also performs an implicit commit).
TRUNCATE will also ignore any referential integrity or triggers on the table. DELETE FROM ... WHERE will respect both. The time will depend on the indexing of your condition columns, your hardware, and any additional system load.
The delete SQL is exactly the same as a normal SQL delete
delete from table where [your condition ]
However, if you're worried about time, then I'll assume your question is a little deeper than this. If your table has a significant number of non-clustered indexes, then in some circumstances it may be faster to drop all these indexes first and rebuild them after the delete. This is unusual, but in cases where a straightforward delete is vulnerable to timeout issues it may be helpful.
CREATE TABLE new_table AS SELECT <data you want to keep> FROM old_table;
-- index new_table
-- grant on new_table
-- add constraints on new_table
-- etc. on new_table
DROP TABLE old_table;
RENAME new_table TO old_table;