postgresql delete rows based on inner join of complex subqueries - sql

I am trying to delete the rows that match between two complex subqueries. I am using PostgreSQL. Here is some sample code:
DELETE FROM complex_subquery1 as a
USING complex_subquery2 as b
WHERE a.column1 = b.column2
I read here (PostgreSQL: delete rows returned by subquery) that this is not really possible this way. Is there a shortcut for the case of deleting via an inner join?

The normal way to do that is:
DELETE FROM atable
USING complex_subquery1 as a,
complex_subquery2 as b
WHERE a.column1 = b.column2
AND a.column3 = atable.column4;
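For example, with the subqueries written inline it could look like this (every table and column name below is made up just to illustrate the shape):
-- illustrative only: atable, table1, table2 and their columns are assumptions
DELETE FROM atable
USING (SELECT id, group_id
       FROM table1
       WHERE created_at < now() - interval '30 days') AS a,
      (SELECT group_id
       FROM table2
       GROUP BY group_id
       HAVING count(*) > 1) AS b
WHERE a.group_id = b.group_id
  AND atable.ref_id = a.id;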

Related

SQL inner join and where performance comparison

If we have 2 tables, tableA (with column1, column2) and tableB (with column1, column2), what's the difference between the following two queries? Which one has better performance? What if we have indexing for both tables?
Query #1:
select
b.column2
from
tableA a,
tableB b
where
a.column1 = b.column1
and a.column2 = ?;
Query #2:
select
b.column2
from
tableA a
inner join
tableB b on a.column1 = b.column1
where
a.column2 = ?;
The 2nd query has better performance.
You are using a cross join in your first query and then filtering the results. Imagine having 10,000 records in both tables; it would produce 10000 * 10000 combinations.
Both will perform equally. One is ANSI style and the other is the old-fashioned style of joining.
You can compare the explain plans, and you will most likely find them to be the same.
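For example, assuming PostgreSQL and a literal value in place of the ? parameter, you could compare the two plans like this:
-- old-style join syntax
EXPLAIN ANALYZE
SELECT b.column2
FROM tableA a, tableB b
WHERE a.column1 = b.column1 AND a.column2 = 42;

-- ANSI join syntax
EXPLAIN ANALYZE
SELECT b.column2
FROM tableA a
INNER JOIN tableB b ON a.column1 = b.column1
WHERE a.column2 = 42;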

Deleting rows from a join of two tables and rownum condition in Oracle DB

Learning PL/SQL with Oracle DB and trying to accomplish the following:
I have two tables, a and b. I am joining them on id, adding several conditions, and then trying to remove the resulting rows only from table a in batches of 1000. The base query looks like this:
DELETE (SELECT *
        FROM SCHEMA.TABLEA a
        INNER JOIN SCHEMA.TABLEB b ON a.b_id = b.id
        WHERE par = 0 AND ROWNUM <= 1000);
This obviously doesn’t work as I am trying to manipulate a view: “data manipulation operation not legal on this view”
How can I rewrite this?
You can only delete from a table, so there is no need to do a join; you can handle it in a WHERE clause if you need to.
Your DELETE statement could be, e.g.:
DELETE FROM SCHEMA.TABLEA a
WHERE a.b_id IN (SELECT b.id FROM SCHEMA.TABLEB b)
  AND par = 0 AND ROWNUM <= 1000;
You can write a simple query which checks whether the rows in TABLEA that are to be deleted exist in TABLEB.
DELETE
FROM schema.tablea a
WHERE par = 0
AND EXISTS (SELECT 1 FROM schema.tableb b WHERE a.b_id = b.id)
AND rownum <= 1000;
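If the requirement is to keep deleting in batches of 1000 until no matching rows remain, a rough PL/SQL sketch built on the EXISTS query above could look like this (the per-batch commit is just an example of one strategy):
BEGIN
  LOOP
    DELETE FROM schema.tablea a
    WHERE par = 0
      AND EXISTS (SELECT 1 FROM schema.tableb b WHERE a.b_id = b.id)
      AND ROWNUM <= 1000;
    EXIT WHEN SQL%ROWCOUNT = 0;  -- stop when the last DELETE touched no rows
    COMMIT;                      -- commit each batch of up to 1000 rows
  END LOOP;
END;
/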

What is the efficient way to query a subset of a joined table

My query is somewhat like this
SELECT TableA.Column1
FROM TableA
LEFT JOIN TableB ON TableA.ForeignKey = TableB.PrimaryKey
LEFT JOIN TableC ON TableC.PrimaryKey = TableB.ForeignKey
WHERE TableC.SomeColumn = 'XXX'
In the above case Table A and Table B are large tables (may contain more than 1 million rows), but Table C is small, with just 25 rows.
I have applied indexes on primary keys of all the tables.
In our application scenario, I need to search in TableC for just two conditions, TableC.SomeColumn = 'XXX' or TableC.SomeColumn = 'YYY'.
My question is what is the most efficient way to do this. A straight join does work, but I am concerned about joining all the rows in TableB just to pick a small subset of them when joining to TableC.
Is it a good approach to have an indexed view?
For example,
CREATE INDEXED VIEW FOR TableB
JOIN TableC ON TableC.PrimaryKey = TableB.ForeignKey
WHERE TableC.SomeColumn IN ('XXX', 'YYY'))?
Your WHERE clause undoes the outer join, so you might as well write the query as:
SELECT a.Column1
FROM TableA a
JOIN TableB b ON a.ForeignKey = b.PrimaryKey
JOIN TableC c ON c.PrimaryKey = b.ForeignKey
WHERE c.SomeColumn = 'XXX';
For this query, you want these indexes:
TableC(SomeColumn, PrimaryKey)
TableB(ForeignKey, PrimaryKey)
TableA(ForeignKey, Column1)
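Spelled out as DDL (the index names are just illustrative), that would be something like:
CREATE INDEX IX_TableC_SomeColumn ON TableC (SomeColumn, PrimaryKey);
CREATE INDEX IX_TableB_ForeignKey ON TableB (ForeignKey, PrimaryKey);
CREATE INDEX IX_TableA_ForeignKey ON TableA (ForeignKey, Column1);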
You can create an indexed view. That would generally be the fastest for querying. However, it can incur a lot more overhead for updates and inserts into any of the base tables.
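A rough sketch of what that indexed view could look like in T-SQL (the view name and the selected column list are assumptions; indexed views need SCHEMABINDING, two-part table names, and a unique clustered index):
CREATE VIEW dbo.vw_TableB_TableC
WITH SCHEMABINDING
AS
SELECT b.PrimaryKey, b.ForeignKey
FROM dbo.TableB AS b
JOIN dbo.TableC AS c ON c.PrimaryKey = b.ForeignKey
WHERE c.SomeColumn IN ('XXX', 'YYY');
GO
-- PrimaryKey stays unique here because each TableB row joins at most one TableC row
CREATE UNIQUE CLUSTERED INDEX IX_vw_TableB_TableC
ON dbo.vw_TableB_TableC (PrimaryKey);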
I typically only use a JOIN when I need to SELECT or GROUP on the data, not when using it as a predicate. That said, I would be very curious to see if Gordon's answer or this one performs better.
I would also suggest getting in the habit of using aliases when referencing your tables; it's less typing and makes your code easier to read.
I would test and compare execution times:
SELECT A.Column1
FROM TableA A
WHERE EXISTS (SELECT 1
              FROM TableB B
              WHERE A.ForeignKey = B.PrimaryKey
                AND EXISTS (SELECT 1
                            FROM TableC C
                            WHERE C.PrimaryKey = B.ForeignKey
                              AND C.SomeColumn = 'XXX'))

Oracle semi-join with multiple tables in SQL subquery

This question is about how to work around the apparent Oracle limitation on semi-joins with multiple tables in the subquery. I have the following 2 UPDATE statements.
Update 1:
UPDATE
  (SELECT a.flag update_column
   FROM a, b
   WHERE a.id = b.id AND
         EXISTS (SELECT NULL
                 FROM c
                 WHERE c.id2 = b.id2 AND
                       c.time BETWEEN start_in AND end_in) AND
         EXISTS (SELECT NULL
                 FROM TABLE(update_in) d
                 WHERE b.time BETWEEN d.start_time AND d.end_time))
SET update_column = 'F'
The execution plan indicates that this correctly performs 2 semi-joins, and the update executes in seconds. These need to be semi-joins because c.id2 is not a unique foreign key on b.id2, unlike b.id and a.id. And update_in doesn't have any constraints at all, since it's an array.
Update 2:
UPDATE
  (SELECT a.flag update_column
   FROM a, b
   WHERE a.id = b.id AND
         EXISTS (SELECT NULL
                 FROM c, TABLE(update_in) d
                 WHERE c.id2 = b.id2 AND
                       c.time > d.time AND
                       b.time BETWEEN d.start_time AND d.end_time))
SET update_column = 'F'
This does not do a semi-join; I believe based on the Oracle documentation that's because the EXISTS subquery has 2 tables in it. Due to the sizes of the tables, and partitioning, this update takes hours. However, there is no way to relate d.time to the associated d.start_time and d.end_time other than being on the same row. And the reason we pass in the update_in array and join it here is because running this query in a loop for each time/start_time/end_time combination also proved to give poor performance.
Is there a reason other than the 2 tables that the semi-join might not be working? If not, is there a way around this limitation? Is there some simple solution I am missing that could make these criteria work without putting 2 tables in the subquery?
As Bob suggests you can use a Global Temporary Table (GTT) with the same structure as your update_in array, but the key difference is that you can create indexes on the GTT, and if you populate the GTT with representative sample data, you can also collect statistics on the table so the SQL query analyzer is better able to predict an optimal query plan.
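A rough sketch of that idea, with column names taken from the update_in fields used above (the data types and the index choice are only assumptions):
-- column types are assumptions; adjust to match the update_in collection
CREATE GLOBAL TEMPORARY TABLE update_in_gtt (
  time        DATE,
  start_time  DATE,
  end_time    DATE
) ON COMMIT PRESERVE ROWS;

CREATE INDEX update_in_gtt_ix ON update_in_gtt (start_time, end_time);

-- after populating it with representative rows in a session:
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(USER, 'UPDATE_IN_GTT');
END;
/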
That said there are also some other notable differences in your two queries:
In the first exists clause of your first query you refer to two columns start_in and end_in that don't have table references. My guess is that they are either columns in table a or b, or they are variables within the current scope of your sql statement. It's not clear which.
In your second query you refer to column d.time, however, you don't use that column in the first query.
Does updating your second query to the following improve its performance?
UPDATE
  (SELECT a.flag update_column
   FROM a, b
   WHERE a.id = b.id AND
         EXISTS (SELECT NULL
                 FROM c, TABLE(update_in) d
                 WHERE c.id2 = b.id2 AND
                       c.time BETWEEN start_in AND end_in AND
                       c.time > d.time AND
                       b.time BETWEEN d.start_time AND d.end_time))
SET update_column = 'F'

How can I compare two tables and delete the duplicate rows in SQL?

I have two tables and I need to remove rows from the first table if an exact copy of a row exists in the second table.
Does anyone have an example of how I would go about doing this in MSSQL server?
Well, at some point you're going to have to check all the columns - might as well get joining...
DELETE a
FROM a -- first table
INNER JOIN b -- second table
ON b.ID = a.ID
AND b.Name = a.Name
AND b.Foo = a.Foo
AND b.Bar = a.Bar
That should do it... there is also CHECKSUM(*), but this only helps - you'd still need to check the actual values to preclude hash-conflicts.
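For example, a rough sketch that uses CHECKSUM over the compared columns as a cheap pre-filter while still checking the actual values (the column list is assumed from the join above):
DELETE a
FROM a
INNER JOIN b
  -- hash comparison first, then the real columns to rule out hash collisions
  ON CHECKSUM(b.ID, b.Name, b.Foo, b.Bar) = CHECKSUM(a.ID, a.Name, a.Foo, a.Bar)
 AND b.ID = a.ID
 AND b.Name = a.Name
 AND b.Foo = a.Foo
 AND b.Bar = a.Bar;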
If you're using SQL Server 2005, you can use INTERSECT, wrapped in an EXISTS so it can drive the DELETE:
DELETE FROM table1
WHERE EXISTS (SELECT table1.* INTERSECT SELECT * FROM table2);
I think the pseudocode below would do it...
DELETE FirstTable, SecondTable
FROM FirstTable
FULL OUTER JOIN SecondTable
ON FirstTable.Field1 = SecondTable.Field1
... continue for all fields
WHERE FirstTable.Field1 IS NOT NULL
AND SecondTable.Field1 IS NOT NULL
Chris's INTERSECT post is far more elegant though and I'll use that in future instead of writing out all of the outer join criteria :)
I would try a DISTINCT query and do a union of the two tables.
You can use a scripting language like ASP/PHP to format the output into a series of INSERT statements to rebuild the table with the resulting unique data.
try this:
DELETE t1 FROM t1 INNER JOIN t2 ON t1.name = t2.name WHERE t1.id = t2.id