I am trying to write the following MySQL query in PostgreSQL 8.0 (specifically, using Redshift):
DELETE t1 FROM table t1
LEFT JOIN table t2 ON (
t1.field = t2.field AND
t1.field2 = t2.field2
)
WHERE t1.field > 0
PostgreSQL 8.0 does not support DELETE FROM table USING. The examples in the docs say that you can reference columns in other tables in the where clause, but that doesn't work here as I'm joining on the same table I'm deleting from. The other example is a subselect query, but the primary key of the table I'm working with has four columns so I can't see a way to make that work either.
Amazon Redshift was forked from Postgres 8.0, but it is a very different beast. According to the manual, the USING clause is supported in DELETE statements.
Just use the modern form:
DELETE FROM tbl
USING tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey -- exclude self-join
AND tbl.field > 0;
This assumes JOIN instead of LEFT JOIN in your MySQL statement, since LEFT JOIN would not make any sense there. I also added the condition AND t2.pkey <> tbl.pkey to make it a useful query. It stops a row from joining to itself, pkey being the primary key column.
What this query does:
Delete all rows where at least one other row exists in the same table with the same not-null values in field and field2. All such duplicates are deleted without leaving a single row per set.
To keep (for example) the row with the smallest pkey per set of duplicates, use t2.pkey < tbl.pkey.
An EXISTS semi-join (as @wilplasser already hinted) might be a better choice, especially if multiple rows could be joined (a row can only be deleted once anyway):
DELETE FROM tbl
WHERE field > 0
AND EXISTS (
SELECT 1
FROM tbl t2
WHERE t2.field = tbl.field
AND t2.field2 = tbl.field2
AND t2.pkey <> tbl.pkey
);
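To see the keep-the-smallest-pkey variant of the EXISTS form in action, here is a minimal sketch. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for Redshift, and the sample rows are made up for illustration; the DELETE itself is plain SQL that should carry over unchanged.

```python
import sqlite3

# SQLite stands in for Redshift here; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (pkey INTEGER PRIMARY KEY, field INTEGER, field2 INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl (pkey, field, field2) VALUES (?, ?, ?)",
    [
        (1, 10, 1),  # duplicate set, smallest pkey: kept
        (2, 10, 1),  # duplicate: deleted
        (3, 10, 1),  # duplicate: deleted
        (4, 20, 2),  # no duplicate: kept
        (5, 0, 5),   # duplicate pair, but field > 0 is false: kept
        (6, 0, 5),   # duplicate pair, but field > 0 is false: kept
    ],
)

# Delete a row only if another row with the same (field, field2)
# and a smaller pkey exists, so one row per duplicate set survives.
conn.execute(
    """
    DELETE FROM tbl
    WHERE field > 0
      AND EXISTS (
            SELECT 1
            FROM tbl t2
            WHERE t2.field = tbl.field
              AND t2.field2 = tbl.field2
              AND t2.pkey < tbl.pkey
      )
    """
)

remaining = [row[0] for row in conn.execute("SELECT pkey FROM tbl ORDER BY pkey")]
print(remaining)
```

With t2.pkey <> tbl.pkey instead of <, rows 1 to 3 would all qualify, because each of them has at least one other matching row.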
I don't understand the MySQL syntax, but you probably want this:
DELETE FROM mytable t1
WHERE t1.field > 0
-- don't need this self-join if {field,field2}
-- are a candidate key for mytable
-- (in that case, the exists-subquery would detect _exactly_ the
-- same tuples as the ones to be deleted, which always succeeds)
-- AND EXISTS (
-- SELECT *
-- FROM mytable t2
-- WHERE t1.field = t2.field
-- AND t1.field2 = t2.field2
-- )
;
Note: For testing purposes, you can replace the DELETE keyword by SELECT * or SELECT COUNT(*), and see which rows would be affected by the query.
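A rough sketch of that dry-run idea: keep the WHERE condition in one place, count the matching rows first, then run the DELETE with the same condition. This uses Python's sqlite3 module with an in-memory database purely so it can be run anywhere; the table contents and the `condition` variable are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, field INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, field) VALUES (?, ?)",
    [(1, -5), (2, 0), (3, 7), (4, 12)],
)

# Shared predicate, so the preview and the DELETE cannot drift apart.
condition = "field > 0"

# Dry run: how many rows would the DELETE touch?
would_delete = conn.execute(
    f"SELECT COUNT(*) FROM mytable WHERE {condition}"
).fetchone()[0]

# Real run with the identical condition.
deleted = conn.execute(f"DELETE FROM mytable WHERE {condition}").rowcount

print(would_delete, deleted)
```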
I need to correct some data in a table (a correlation table was messed up, so I need to gather the correlation data and then update another table with it). To avoid messing up the production data, I want to collect all the data I need into a separate table (temptablecorrection) and then update the respective production DB tables from there.
The separate table already has some rows filled in by a previous SQL statement (t2.field4).
MERGE INTO temptablecorrection t1
USING (
SELECT DISTINCT
cfield1,
cfield2,
cfield3
FROM
maintable
WHERE
ccreationtime BETWEEN 1656396000000 AND 1656550800000
)
t2 ON ( t1.field3 = t2.field4 )
WHEN MATCHED THEN UPDATE
SET t1.field1 = t2.field1,
t1.field2 = t2.field2;
t1.field3 is unique in the timeframe (-> distinct count of field3 and row count are the same).
t2.field4 is unique in the timeframe (-> distinct count of field4 and row count are the same).
I get the following SQL error:
MERGE into dataUMCContrl t1 USING (select DISTINCT cfield1, cfield2,cfield3 from mainTable where ccrea...
ORA-30926: unable to get a stable set of rows in the source tables [SQL State=99999, DB Errorcode=30926]
1 statement failed.
I have no idea what the issue is. On my test system the SQL works without issues.
When I checked for the issue the pointers I found were in regard to an issue with not unique rows.
But from what I gather t1.field3 and t2.field4 are unique.
In total the timeframe covers almost 74000 rows.
Any ideas to point me in the right direction?
Since field4 is unique, use it in the SELECT list of the USING clause. Also, how are you using field4 in the ON clause without having field4 in the SELECT list?
MERGE INTO temptablecorrection t1
USING (
SELECT DISTINCT
cfield1,
cfield2,
cfield3,
field4,
rowid row_id
FROM
maintable
WHERE
ccreationtime BETWEEN 1656396000000 AND 1656550800000
)
t2 ON ( t1.field3 = t2.field4 )
WHEN MATCHED THEN UPDATE
SET t1.field1 = t2.cfield1,
t1.field2 = t2.cfield2;
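ORA-30926 is raised when a target row matches more than one row of the source, so it can be worth checking the join key on the source subquery itself rather than on the base table. A minimal sketch of such a check follows, run here on SQLite through Python's sqlite3 module as a stand-in for Oracle; the column names are borrowed from the question and the sample rows are invented. Note that SELECT DISTINCT only collapses rows that are identical in every selected column, so a key can still appear twice.

```python
import sqlite3

# SQLite stand-in for Oracle; rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE maintable (cfield1 TEXT, cfield2 TEXT, field4 INTEGER)")
conn.executemany(
    "INSERT INTO maintable VALUES (?, ?, ?)",
    [
        ("a", "x", 100),
        ("a", "x", 100),  # exact duplicate: DISTINCT collapses it
        ("b", "y", 200),
        ("b", "z", 200),  # same key, different payload: DISTINCT keeps both
        ("c", "w", 300),
    ],
)

# Any key listed here would match a target row more than once in the MERGE.
ambiguous_keys = conn.execute(
    """
    SELECT field4, COUNT(*) AS n
    FROM (SELECT DISTINCT cfield1, cfield2, field4 FROM maintable) AS src
    GROUP BY field4
    HAVING COUNT(*) > 1
    ORDER BY field4
    """
).fetchall()

print(ambiguous_keys)
```

If this kind of query returns rows on the production data but not on the test system, that would explain why the MERGE only fails in one place.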
The PK of table1 is a FK in table2.
I wish to update the status in table1 where no record exists in table2 and limit the number of updates. There may be no records in table2.
Something like:
UPDATE t1
SET status = 0
WHERE NOT EXISTS (
SELECT id
FROM t2
WHERE t1.id = t2.id
LIMIT 1000
)
This is a little complicated in Postgres, because UPDATE has no LIMIT clause. Assuming that you have a primary key in t1 (which I'll assume is id), you can use a subquery to determine the rows to update and then match in the WHERE clause:
UPDATE t1
SET status = 0
FROM (SELECT tt1.*
FROM t1 tt1
WHERE NOT EXISTS (SELECT t2.id FROM t2 WHERE tt1.id = t2.id)
LIMIT 1000
) tt1
WHERE t1.id = tt1.id;
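The same idea can also be written with id IN (subquery with LIMIT), which is a slightly different formulation from the UPDATE ... FROM above. Here is a small sketch of that variant, run on an in-memory SQLite database via Python's sqlite3 module as a stand-in for Postgres; the sample rows, the ORDER BY, and the batch size of 2 are assumptions made for the demo.

```python
import sqlite3

# SQLite stand-in for Postgres; data and batch size are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY, status INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("CREATE TABLE t2 (id INTEGER REFERENCES t1 (id))")
conn.executemany("INSERT INTO t1 (id) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO t2 (id) VALUES (?)", [(2,), (4,)])

batch_size = 2  # the question uses 1000

# Pick at most batch_size ids that have no row in t2, then update only those.
# ORDER BY makes the choice of rows deterministic for this demo.
conn.execute(
    """
    UPDATE t1
    SET status = 0
    WHERE id IN (
        SELECT tt1.id
        FROM t1 tt1
        WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = tt1.id)
        ORDER BY tt1.id
        LIMIT ?
    )
    """,
    (batch_size,),
)

statuses = dict(conn.execute("SELECT id, status FROM t1"))
print(statuses)
```

Ids 1, 3 and 5 have no row in t2, but only the first two are updated because of the limit.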
If you are doing this under concurrent write load, there is a race condition between the subquery (the SELECT to determine rows) and the outer UPDATE, which can lead to wrong results. To defend against this, add a row locking clause.
However, this needs to be done in a CTE to be reliable (at least in my tests with Postgres up to version 10). So:
WITH cte AS (
SELECT id -- PK
FROM t1
WHERE NOT EXISTS (SELECT FROM t2 WHERE t2.id = t1.id)
LIMIT 1000
FOR UPDATE -- SKIP LOCKED -- ?
)
UPDATE t1
SET status = 0
FROM cte
WHERE t1.id = cte.id
RETURNING t1.id; -- optional; qualified because cte also has an id column
If you run multiple commands like this, possibly in parallel, add SKIP LOCKED, so that they don't block each other.
This only secures existing rows in t1. There is still the problem that conflicting rows might be added in t2 between SELECT and UPDATE.
You mentioned a FK constraint. I am not sure off the top of my head whether dependent rows in t2 are blocked from being added by the FK constraint while there is a FOR UPDATE lock on the parent row. I would have to test, but I am out of time right now.
Postgres has no predicate-locking for user commands. (Users can only lock existing rows.) To be absolutely sure, you could also use the (more expensive) SERIALIZABLE transaction isolation.
See:
Postgres UPDATE … LIMIT 1
I have table1, table2 and table3,
they have a join condition that relates between them.
suppose it is the (ID) column.
So the question is,
in the case of using the merge statement, is it possible
to construct the syntax to be like this:
Merge into Table1
using table2 , table3
on (table1.ID = Table2.ID , Table2.ID = Table.ID)
when match then
update --(definitly table1)
where
table1.something between table2.something and table2.something -- whatever :)
when not match then
do_nothing --I think I should type NULL here
if this syntax is wrong, how should I call two tables and using them to update a row in table1?
how should I call two tables and using them to update a row in table1?
This can be achieved in several ways in Oracle :
correlated subquery
inline view
merge query
The following code gives a raw, commented example of the third solution (merge statement). As you did not show us your exact SQL attempt and the structure of your tables, you will have to adapt this to your exact use case :
MERGE INTO table1 target
-- prepare the dataset to use during the UPDATE
USING (
SELECT
-- following fields will be available in the UPDATE
t1.id,
t2.foo,
t3.bar
FROM
-- JOIN conditions between the 3 tables
table1 t1
INNER JOIN table2 t2 on t2.id = t1.id
INNER JOIN table3 t3 on t3.id = t1.id
WHERE
-- WHERE clause (if needed)
t1.zoo = 'blah'
) source
-- search records to UPDATE
ON (target.id = source.id)
WHEN MATCHED THEN
UPDATE SET
-- UPDATE table1 fields
target.value1 = source.foo,
target.value2 = source.bar
;
Note: while this query makes use of the Oracle MERGE statement, it conceptually does not implement a real merge operation. The concept of a merge is an update/insert query, whereas this query only does an update and ignores the insert part. Still, this is one of the simplest ways to perform such a correlated update in Oracle.
How do you write a update statement with a Sub-Select in an Oracle Environment (SQL Developer)?
Example: UPDATE table SET column = (SELECT....)
Every time I try this it gives me ORA-01427 "Sub select returns more than one row" even if there is no WHERE clause.
Based on my understanding of your question, I'd suggest using a MERGE statement.
Merge into Table1
Using
(SELECT * from table2 where condition) Temp
On (Table1.columname condition Temp.columname)
When matched Then update Set Table1.column_name = Temp.column_name;
Table1 is the table where you want to update the records.
Table2 is the table from which you want to get the data (The sub query which you are talking about )
Using this merge statement you will be able to update n number of rows.
If you want to update multiple rows, you can either use a MERGE statement (as in @jackkds7's answer above) or you can use a filter on your subselect:
UPDATE table t1
SET column = ( SELECT column FROM table2 t2 WHERE t2.key = t1.key );
If there aren't matches in table2 for all the records in table then column will be set to NULL for the non-matches. To avoid that, add a WHERE EXISTS clause:
UPDATE table t1
SET column = ( SELECT column FROM table2 t2 WHERE t2.key = t1.key )
WHERE EXISTS ( SELECT 1 FROM table2 t2 WHERE t2.key = t1.key );
Oh and in the event that key is not unique for table2, you can aggregate (up to you to figure out which function would be best):
UPDATE table t1
SET column = ( SELECT MAX(column) FROM table2 t2 WHERE t2.key = t1.key )
WHERE EXISTS ( SELECT 1 FROM table2 t2 WHERE t2.key = t1.key );
Hope this helps.
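For what it's worth, here is a small runnable sketch of the last variant (MAX plus WHERE EXISTS). It uses Python's sqlite3 module with an in-memory database as a stand-in for Oracle; the table names `target` and `source`, the columns `k` and `val`, and the rows are all made up for the demo.

```python
import sqlite3

# SQLite stand-in for Oracle; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (k INTEGER PRIMARY KEY, val INTEGER)")
conn.execute("CREATE TABLE source (k INTEGER, val INTEGER)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 111), (2, 222), (3, 333)])
# k = 1 appears twice in source, k = 2 not at all.
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, 10), (1, 30), (3, 5)])

conn.execute(
    """
    UPDATE target
    SET val = (SELECT MAX(s.val) FROM source s WHERE s.k = target.k)
    WHERE EXISTS (SELECT 1 FROM source s WHERE s.k = target.k)
    """
)

result = conn.execute("SELECT k, val FROM target ORDER BY k").fetchall()
print(result)
```

Row 2 keeps its old value because of the WHERE EXISTS filter; without it, that row would be set to NULL. MAX resolves the two source rows for k = 1 to a single value, which is what avoids the "more than one row" error.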
I think it would help if you posted your actual query.
In essence, the "inner" select would be executed for each row that would be updated. This inner select query is called a correlated subquery:
UPDATE table t SET t.column = (
select ot.othercolumn from othertable ot
where ot.fk = t.id --This is the correlation part, that finds
--the right value for the row you are currently updating
)
You must ensure that the subquery always returns exactly one row and one column each time it runs (that is, for every row that is going to be updated). If needed, you can use MAX() or ROWNUM to ensure you only ever get one value.
More examples:
Using Correlated Subqueries
I found a very useful delete query that will delete duplicates based on specific columns:
DELETE FROM table USING table alias
WHERE table.field1 = alias.field1 AND table.field2 = alias.field2 AND
table.max_field < alias.max_field
How to delete duplicate entries?
However, is there an equivalent SELECT query that will allow to filter the same way? Was trying USING but no success.
Thank you.
You can join your table with itself using the specific columns, field1 and field2, and then filter based on a comparison between max_field on both tables.
select t1.*
from mytable t1
join mytable t2 on (t1.field1 = t2.field1 and t1.field2 = t2.field2)
where t1.max_field < t2.max_field;
You will get all the duplicates whose max_field is not the greatest.
sqlfiddle here.
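One thing to watch with the self-join: a row that has several "greater" partners comes back once per partner. The sketch below shows this on an in-memory SQLite database through Python's sqlite3 module; the `id` column and the sample rows are assumptions added for the demo. Adding DISTINCT (or rewriting the filter as an EXISTS subquery) returns each duplicate once.

```python
import sqlite3

# In-memory SQLite database; id column and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, field1 TEXT, field2 TEXT, max_field INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "a", "x", 1),
        (2, "a", "x", 2),
        (3, "a", "x", 3),  # greatest max_field in its group: not returned
        (4, "b", "y", 1),  # no duplicate: not returned
    ],
)

join_sql = """
    SELECT {distinct} t1.id
    FROM mytable t1
    JOIN mytable t2
      ON t1.field1 = t2.field1 AND t1.field2 = t2.field2
    WHERE t1.max_field < t2.max_field
    ORDER BY t1.id
"""

# Plain join: id 1 matches both id 2 and id 3, so it appears twice.
plain_ids = [r[0] for r in conn.execute(join_sql.format(distinct=""))]

# DISTINCT collapses the repeats.
distinct_ids = [r[0] for r in conn.execute(join_sql.format(distinct="DISTINCT"))]

print(plain_ids, distinct_ids)
```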