The PK of table1 is a FK in table2.
I wish to update the status in table1 where no record exists in table2 and limit the number of updates. There may be no records in table2.
Something like:
UPDATE t1
SET status = 0
WHERE NOT EXISTS (
SELECT id
FROM t2
WHERE t1.id = t2.id
LIMIT 1000
)
This is a little complicated in Postgres, because there is no limit. Assuming that you have a primary key in t1 (which I'll assume is id), you can use a subquery to determine the rows to update and then match in the WHERE clause:
UPDATE t1
SET status = 0
FROM (SELECT tt1.*
FROM t1 tt1
WHERE NOT EXISTS (SELECT t2.id FROM t2 WHERE tt1.id = t2.id)
LIMIT 1000
) ttl
WHERE t1.id = tt1.id;
If you are doing this under concurrent write load, there is a race condition between the subquery (the SELECT to determine rows) and the outer UPDATE, which can lead to wrong results. To defend against this, add a row locking clause.
However, this needs to be done in a CTE to be reliable (at least in my test wit Postgres up to version 10). So:
WITH cte AS (
SELECT id -- PK
FROM t1
WHERE NOT EXISTS (SELECT FROM t2 WHERE t2.id = t1.id)
LIMIT 1000
FOR UPDATE -- SKIP LOCKED -- ?
)
UPDATE t1
SET status = 0
FROM cte
WHERE t1.id = cte.id
RETURNING id; -- optional
If you run multiple commands like this, possibly in parallel, add SKIP LOCKED, so that they don't block each other.
This only secures existing rows in t1. There is still the problem that conflicting rows might be added in t2 between SELECT and UPDATE.
You mentioned a FK constraint. I am not sure from the top of my head whether depending rows in t2 are blocked from being added by the FK constraint while there is a FOR UPDATE lock on the parent row. Would have to test, but out of time right now.
Postgres has no predicate-locking for user commands. (Users can only lock existing rows.) To be absolutely sure, you could also use the (more expensive) SERIALIZABLE transaction isolation.
See:
Postgres UPDATE … LIMIT 1
Related
I have a table(table1) which has id as the primary key and incremental in nature. The table has Updatedtm which has the last updated date and time. The table has around 300 million records. I have another table table2 which is in sync with table1. What is the best way to delete from table2 when an id is removed from table1? Is Left join an efficient way, do i have to compare 300 million records everytime to check for deletes?
Delete from table2
where id not in(Select id from table1)
Make sure that you have an index that covers this query. I.e. make sure that table1.id and table2.id are indexed (yes, having this index will speed up the delete even thought the indexes will need to be updated.) This will help with the JOIN.
Also, you might want to look into batching your deletes,
WHILE <some_condition> BEGIN
DELETE TOP (1000) t2
FROM table2 t2
LEFT OUTER JOIN table1 t1 ON t2.id = t1.id
WHERE t1.id IS NULL
END
Batching your deletes will reduce the number of locks that SQL server will have to take out on your table, clustered, and non-clustered indexes. If this is a production server with 300-million rows, I would definitely look at your indexes and count the number of records that you think you might be deleting before coming up with a deletion strategy.
SELECT COUNT(*) FROM table2 t2
LEFT OUTER JOIN table1 t1 ON t2.id = t1.id
WHERE t1.id IS NULL
Also, contact any server admins to see what they think about potential locking issues.
I would suggest you to use "Exists" of instead of "Not in"
Delete from table2 t2
where exists ( select 1 from table1 t1 where t1.id=t2.id)
I can't seem to ever remember this query!
I want to delete all rows in table1 whose ID's are the same as in Table2.
So:
DELETE table1 t1
WHERE t1.ID = t2.ID
I know I can do a WHERE ID IN (SELECT ID FROM table2) but I want to do this query using a JOIN if possible.
DELETE t1
FROM Table1 t1
JOIN Table2 t2 ON t1.ID = t2.ID;
I always use the alias in the delete statement as it prevents the accidental
DELETE Table1
caused when failing to highlight the whole query before running it.
DELETE Table1
FROM Table1
INNER JOIN Table2 ON Table1.ID = Table2.ID
There is no solution in ANSI SQL to use joins in deletes, AFAIK.
DELETE FROM Table1
WHERE Table1.id IN (SELECT Table2.id FROM Table2)
Later edit
Other solution (sometimes performing faster):
DELETE FROM Table1
WHERE EXISTS( SELECT 1 FROM Table2 Where Table1.id = Table2.id)
PostgreSQL implementation would be:
DELETE FROM t1
USING t2
WHERE t1.id = t2.id;
Try this:
DELETE Table1
FROM Table1 t1, Table2 t2
WHERE t1.ID = t2.ID;
or
DELETE Table1
FROM Table1 t1 INNER JOIN Table2 t2 ON t1.ID = t2.ID;
I think that you might get a little more performance if you tried this
DELETE FROM Table1
WHERE EXISTS (
SELECT 1
FROM Table2
WHERE Table1.ID = Table2.ID
)
This will delete all rows in Table1 that match the criteria:
DELETE Table1
FROM Table2
WHERE Table1.JoinColumn = Table2.JoinColumn And Table1.SomeStuff = 'SomeStuff'
Found this link useful
Copied from there
Oftentimes, one wants to delete some records from a table based on criteria in another table. How do you delete from one of those tables without removing the records in both table?
DELETE DeletingFromTable
FROM DeletingFromTable INNER JOIN CriteriaTable
ON DeletingFromTable.field_id = CriteriaTable.id
WHERE CriteriaTable.criteria = "value";
The key is that you specify the name of the table to be deleted from as the SELECT. So, the JOIN and WHERE do the selection and limiting, while the DELETE does the deleting. You're not limited to just one table, though. If you have a many-to-many relationship (for instance, Magazines and Subscribers, joined by a Subscription) and you're removing a Subscriber, you need to remove any potential records from the join model as well.
DELETE subscribers, subscriptions
FROM subscribers INNER JOIN subscriptions
ON subscribers.id = subscriptions.subscriber_id
INNER JOIN magazines
ON subscriptions.magazine_id = magazines.id
WHERE subscribers.name='Wes';
Deleting records with a join could also be done with a LEFT JOIN and a WHERE to see if the joined table was NULL, so that you could remove records in one table that didn't have a match (like in preparation for adding a relationship.) Example post to come.
Since the OP does not ask for a specific DB, better use a standard compliant statement.
Only MERGE is in SQL standard for deleting (or updating) rows while joining something on target table.
merge table1 t1
using (
select t2.ID
from table2 t2
) as d
on t1.ID = d.ID
when matched then delete;
MERGE has a stricter semantic, protecting from some error cases which may go unnoticed with DELETE ... FROM. It enforces 'uniqueness' of match : if many rows in the source (the statement inside using) match the same row in the target, the merge must be canceled and an error must be raised by the SQL engine.
To Delete table records based on another table
Delete From Table1 a,Table2 b where a.id=b.id
Or
DELETE FROM Table1
WHERE Table1.id IN (SELECT Table2.id FROM Table2)
Or
DELETE Table1
FROM Table1 t1 INNER JOIN Table2 t2 ON t1.ID = t2.ID;
I often do things like the following made-up example. (This example is from Informix SE running on Linux.)
The point of of this example is to delete all real estate exemption/abatement transaction records -- because the abatement application has a bug -- based on information in the real_estate table.
In this case last_update != nullmeans the account is not closed, and res_exempt != 'p' means the accounts are not personal property (commercial equipment/furnishings).
delete from trans
where yr = '16'
and tran_date = '01/22/2016'
and acct_type = 'r'
and tran_type = 'a'
and bill_no in
(select acct_no from real_estate where last_update is not null
and res_exempt != 'p');
I like this method, because the filtering criteria -- at least for me -- is easier to read while creating the query, and to understand many months from now when I'm looking at it and wondering what I was thinking.
Referencing MSDN T-SQL DELETE (Example D):
DELETE FROM Table1
FROM Tabel1 t1
INNER JOIN Table2 t2 on t1.ID = t2.ID
This is old I know, but just a pointer to anyone using this ass a reference. I have just tried this and if you are using Oracle, JOIN does not work in DELETE statements.
You get a the following message:
ORA-00933: SQL command not properly ended.
While the OP doesn't want to use an 'in' statement, in reply to Ankur Gupta, this was the easiest way I found to delete the records in one table which didn't exist in another table, in a one to many relationship:
DELETE
FROM Table1 as t1
WHERE ID_Number NOT IN
(SELECT ID_Number FROM Table2 as t2)
Worked like a charm in Access 2016, for me.
delete
table1
from
t2
where
table1.ID=t2.ID
Works on mssql
I am trying to write the following MySQL query in PostgreSQL 8.0 (specifically, using Redshift):
DELETE t1 FROM table t1
LEFT JOIN table t2 ON (
t1.field = t2.field AND
t1.field2 = t2.field2
)
WHERE t1.field > 0
PostgreSQL 8.0 does not support DELETE FROM table USING. The examples in the docs say that you can reference columns in other tables in the where clause, but that doesn't work here as I'm joining on the same table I'm deleting from. The other example is a subselect query, but the primary key of the table I'm working with has four columns so I can't see a way to make that work either.
Amazon Redshift was forked from Postgres 8.0, but is a very much different beast. The manual informs, that the USING clause is supported in DELETE statements:
Just use the modern form:
DELETE FROM tbl
USING tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey -- exclude self-join
AND tbl.field > 0;
This is assuming JOIN instead of LEFT JOIN in your MySQL statement, which would not make any sense. I also added the condition AND t2.pkey <> t1.pkey, to make it a useful query. This excludes rows joining itself. pkey being the primary key column.
What this query does:
Delete all rows where at least one other row exists in the same table with the same not-null values in field and field2. All such duplicates are deleted without leaving a single row per set.
To keep (for example) the row with the smallest pkey per set of duplicates, use t2.pkey < t2.pkey.
An EXISTS semi-join (as #wilplasser already hinted) might be a better choice, especially if multiple rows could be joined (a row can only be deleted once anyway):
DELETE FROM tbl
WHERE field > 0
AND EXISTS (
SELECT 1
FROM tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey
);
I don't understand the mysql syntax, but you probably want this:
DELETE FROM mytablet1
WHERE t1.field > 0
-- don't need this self-join if {field,field2}
-- are a candidate key for mytable
-- (in that case, the exists-subquery would detect _exactly_ the
-- same tuples as the ones to be deleted, which always succeeds)
-- AND EXISTS (
-- SELECT *
-- FROM mytable t2
-- WHERE t1.field = t2.field
-- AND t1.field2 = t2.field2
-- )
;
Note: For testing purposes, you can replace the DELETE keyword by SELECT * or SELECT COUNT(*), and see which rows would be affected by the query.
I have this statement:
update new_table t2
set t2.creation_date_utc =
(select creation_date from old_table t1 where t2.id = t1.id)
where exists
(select 1 from old_table t1 where t2.id = t1.id);
The cost according to the explain plan is 150959919. The explain plan shows some full table access for a total cost of like 3000, and then the update has that basically infinite cost. It indeed seems to go on forever if run.
FYI, these tables have no more than 300k rows each.
In addition, this query.
select
(select creation_date from old_table t1 where t2.id = t1.id)
from new_table t2;
finishes basically instantly.
What could be the cause of this?
if you have an update statement in the format that you mentioned, then it implies that the id field is unique across the old_table (otherwise the first inner query would have raised error returning multiple values for update when in fact, only one value can be processed). So, you can modify your first query to be(removing the where clause since it is redundant) :-
update new_table t2
set t2.creation_date_utc =
(select creation_date from old_table t1 where t2.id = t1.id);
it might be possible that the above query will still take a long time owing to full table scans. So, you have two options :-
apply an index to the old_table table on the id field using the following command.
Create index index_name on old_table(id);
modify your update query to the following (untested) :-
update new_table t2
set t2.creation_date_utc=
(select creation_date from old_table t1 where t2.id=t1.id and rownum=1);
The rownum=1 should instruct oracle not to do any more searching as soon as the first match is found in old_table.
I would recommend the first approach though.
You don't have an index on old_table.id and nested loop joins are very expensive. An index on old_table(id, creation_date) would be best.
Why not use merge statement? I've found it to be more efficient in similar cases.
merge into new_table t2
using old_table t1
on (t2.id = t1.id)
when matched then
update set t2.creation_date_utc = t1.creation_date;
This may be faster and equivalent:
update new_table t2
set t2.creation_date_utc =
(select creation_date from old_table t1 where t2.id = t1.id)
where t2.id in
(select id from old_table t1);
I can't seem to ever remember this query!
I want to delete all rows in table1 whose ID's are the same as in Table2.
So:
DELETE table1 t1
WHERE t1.ID = t2.ID
I know I can do a WHERE ID IN (SELECT ID FROM table2) but I want to do this query using a JOIN if possible.
DELETE t1
FROM Table1 t1
JOIN Table2 t2 ON t1.ID = t2.ID;
I always use the alias in the delete statement as it prevents the accidental
DELETE Table1
caused when failing to highlight the whole query before running it.
DELETE Table1
FROM Table1
INNER JOIN Table2 ON Table1.ID = Table2.ID
There is no solution in ANSI SQL to use joins in deletes, AFAIK.
DELETE FROM Table1
WHERE Table1.id IN (SELECT Table2.id FROM Table2)
Later edit
Other solution (sometimes performing faster):
DELETE FROM Table1
WHERE EXISTS( SELECT 1 FROM Table2 Where Table1.id = Table2.id)
PostgreSQL implementation would be:
DELETE FROM t1
USING t2
WHERE t1.id = t2.id;
Try this:
DELETE Table1
FROM Table1 t1, Table2 t2
WHERE t1.ID = t2.ID;
or
DELETE Table1
FROM Table1 t1 INNER JOIN Table2 t2 ON t1.ID = t2.ID;
I think that you might get a little more performance if you tried this
DELETE FROM Table1
WHERE EXISTS (
SELECT 1
FROM Table2
WHERE Table1.ID = Table2.ID
)
This will delete all rows in Table1 that match the criteria:
DELETE Table1
FROM Table2
WHERE Table1.JoinColumn = Table2.JoinColumn And Table1.SomeStuff = 'SomeStuff'
Found this link useful
Copied from there
Oftentimes, one wants to delete some records from a table based on criteria in another table. How do you delete from one of those tables without removing the records in both table?
DELETE DeletingFromTable
FROM DeletingFromTable INNER JOIN CriteriaTable
ON DeletingFromTable.field_id = CriteriaTable.id
WHERE CriteriaTable.criteria = "value";
The key is that you specify the name of the table to be deleted from as the SELECT. So, the JOIN and WHERE do the selection and limiting, while the DELETE does the deleting. You're not limited to just one table, though. If you have a many-to-many relationship (for instance, Magazines and Subscribers, joined by a Subscription) and you're removing a Subscriber, you need to remove any potential records from the join model as well.
DELETE subscribers, subscriptions
FROM subscribers INNER JOIN subscriptions
ON subscribers.id = subscriptions.subscriber_id
INNER JOIN magazines
ON subscriptions.magazine_id = magazines.id
WHERE subscribers.name='Wes';
Deleting records with a join could also be done with a LEFT JOIN and a WHERE to see if the joined table was NULL, so that you could remove records in one table that didn't have a match (like in preparation for adding a relationship.) Example post to come.
Since the OP does not ask for a specific DB, better use a standard compliant statement.
Only MERGE is in SQL standard for deleting (or updating) rows while joining something on target table.
merge table1 t1
using (
select t2.ID
from table2 t2
) as d
on t1.ID = d.ID
when matched then delete;
MERGE has a stricter semantic, protecting from some error cases which may go unnoticed with DELETE ... FROM. It enforces 'uniqueness' of match : if many rows in the source (the statement inside using) match the same row in the target, the merge must be canceled and an error must be raised by the SQL engine.
To Delete table records based on another table
Delete From Table1 a,Table2 b where a.id=b.id
Or
DELETE FROM Table1
WHERE Table1.id IN (SELECT Table2.id FROM Table2)
Or
DELETE Table1
FROM Table1 t1 INNER JOIN Table2 t2 ON t1.ID = t2.ID;
I often do things like the following made-up example. (This example is from Informix SE running on Linux.)
The point of of this example is to delete all real estate exemption/abatement transaction records -- because the abatement application has a bug -- based on information in the real_estate table.
In this case last_update != nullmeans the account is not closed, and res_exempt != 'p' means the accounts are not personal property (commercial equipment/furnishings).
delete from trans
where yr = '16'
and tran_date = '01/22/2016'
and acct_type = 'r'
and tran_type = 'a'
and bill_no in
(select acct_no from real_estate where last_update is not null
and res_exempt != 'p');
I like this method, because the filtering criteria -- at least for me -- is easier to read while creating the query, and to understand many months from now when I'm looking at it and wondering what I was thinking.
Referencing MSDN T-SQL DELETE (Example D):
DELETE FROM Table1
FROM Tabel1 t1
INNER JOIN Table2 t2 on t1.ID = t2.ID
This is old I know, but just a pointer to anyone using this ass a reference. I have just tried this and if you are using Oracle, JOIN does not work in DELETE statements.
You get a the following message:
ORA-00933: SQL command not properly ended.
While the OP doesn't want to use an 'in' statement, in reply to Ankur Gupta, this was the easiest way I found to delete the records in one table which didn't exist in another table, in a one to many relationship:
DELETE
FROM Table1 as t1
WHERE ID_Number NOT IN
(SELECT ID_Number FROM Table2 as t2)
Worked like a charm in Access 2016, for me.
delete
table1
from
t2
where
table1.ID=t2.ID
Works on mssql