Delete rows in parent-child tables that are found in another pair of parent-child tables - SQL

I am loading data into a parent-child pair of tables in a "staging" database schema. If there are duplicate records that were previously loaded into a parent-child pair of tables in a "master" database schema, I want to delete them from the "staging" database tables.
This query
SELECT A.*, B.*
FROM STG.AUTO_REPR_PAR_STG A
JOIN STG.AUTO_REPR_CHLD_STG B
ON A.TEST_SEQ_NUM = B.TEST_SEQ_NUM
WHERE EXISTS ( SELECT 1
FROM MST.AUTO_REPR_PAR M
JOIN MST.AUTO_REPR_CHLD MC
ON M.TEST_SEQ_NUM = MC.TEST_SEQ_NUM
WHERE M.TEST_SEQ_NUM = A.TEST_SEQ_NUM
)
will show what's in staging that was previously loaded in master. But how do I delete from the parent-child pair of tables in the staging database? I am drawing a blank. I tried this, but it fails with "Tables not allowed in FROM clause":
DELETE FROM STG.AUTO_REPR_PAR_STG A
JOIN STG.AUTO_REPR_CHLD_STG B
ON A.TEST_SEQ_NUM=B.TEST_SEQ_NUM
WHERE EXISTS (SELECT A.*, B.*
FROM MST.AUTO_REPR_PAR A
JOIN MST.AUTO_REPR_CHLD B
ON A.TEST_SEQ_NUM=B.TEST_SEQ_NUM
)
The back end is Teradata v13. I am currently researching the CASCADE DELETE option, but I am not even sure it is supported. Any ideas?

In Teradata there's no way to delete from multiple tables in a single DELETE statement; you need one for each table. Deleting from the child table first also avoids referential-integrity errors if the staging tables have RI defined:
DELETE FROM STG.AUTO_REPR_CHLD_STG
WHERE TEST_SEQ_NUM IN (
SELECT P.TEST_SEQ_NUM FROM MST.AUTO_REPR_PAR P JOIN MST.AUTO_REPR_CHLD C
ON P.TEST_SEQ_NUM = C.TEST_SEQ_NUM )
;DELETE FROM STG.AUTO_REPR_PAR_STG
WHERE TEST_SEQ_NUM IN (
SELECT P.TEST_SEQ_NUM FROM MST.AUTO_REPR_PAR P JOIN MST.AUTO_REPR_CHLD C
ON P.TEST_SEQ_NUM = C.TEST_SEQ_NUM )
;
If you run this as a multi-statement request (in BTEQ, starting the second statement with the semicolon keeps both DELETEs in one request), the join will be done only once.
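A minimal sketch of the same two-DELETE idea, using Python's built-in sqlite3 module as a stand-in database so it can actually be run. The lowercase table names (stg_par, stg_chld, mst_par, mst_chld), the detail column and the sample rows are hypothetical. SQLite has no multi-statement requests, so a transaction groups the two deletes instead; that gives atomicity but not Teradata's single-join optimization.

```python
import sqlite3

# In-memory SQLite stands in for Teradata; schema prefixes (STG./MST.)
# are folded into hypothetical table names because SQLite has no schemas here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_par  (test_seq_num INTEGER PRIMARY KEY);
    CREATE TABLE stg_chld (test_seq_num INTEGER, detail TEXT);
    CREATE TABLE mst_par  (test_seq_num INTEGER PRIMARY KEY);
    CREATE TABLE mst_chld (test_seq_num INTEGER, detail TEXT);

    INSERT INTO stg_par  VALUES (1), (2), (3);
    INSERT INTO stg_chld VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO mst_par  VALUES (2), (3);
    INSERT INTO mst_chld VALUES (2, 'b'), (3, 'c');
""")

# Keys already loaded into master (parent joined to child).
dup_keys = """
    SELECT p.test_seq_num
    FROM mst_par p
    JOIN mst_chld c ON p.test_seq_num = c.test_seq_num
"""

# One DELETE per table, child first, inside one transaction so both
# deletes are committed together or rolled back together.
with conn:
    conn.execute(f"DELETE FROM stg_chld WHERE test_seq_num IN ({dup_keys})")
    conn.execute(f"DELETE FROM stg_par WHERE test_seq_num IN ({dup_keys})")

remaining_par = [r[0] for r in conn.execute("SELECT test_seq_num FROM stg_par ORDER BY 1")]
remaining_chld = [r[0] for r in conn.execute("SELECT test_seq_num FROM stg_chld ORDER BY 1")]
print(remaining_par, remaining_chld)
```

Only key 1, which was never loaded into master, survives in both staging tables.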

You may try something like this:
Instead of a subquery with EXISTS, use an OUTER JOIN and select the rows whose columns from the outer (master) table are NULL, i.e. the non-matching rows.
Save the result of that query into a temporary table, then run two DELETE statements.
An OUTER JOIN can be much more efficient than a subquery with EXISTS, especially with large data sets, although this depends on how the optimizer handles each form.
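A sketch of the temporary-table variant, again in SQLite via Python's sqlite3 with hypothetical table names and data. The LEFT OUTER JOIN keeps the staging keys whose master side comes back NULL (the rows to keep), and the two DELETEs remove everything else. In Teradata a VOLATILE table would be the usual home for the intermediate keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_par  (test_seq_num INTEGER PRIMARY KEY);
    CREATE TABLE stg_chld (test_seq_num INTEGER, detail TEXT);
    CREATE TABLE mst_par  (test_seq_num INTEGER PRIMARY KEY);
    CREATE TABLE mst_chld (test_seq_num INTEGER, detail TEXT);
    INSERT INTO stg_par  VALUES (1), (2), (3);
    INSERT INTO stg_chld VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO mst_par  VALUES (2), (3);
    INSERT INTO mst_chld VALUES (2, 'b'), (3, 'c');
""")

# Step 1: outer join staging to the master parent/child join and keep the
# staging keys whose master side is NULL, i.e. never loaded before.
conn.execute("""
    CREATE TEMP TABLE keep_keys AS
    SELECT s.test_seq_num
    FROM stg_par s
    LEFT JOIN (
        SELECT DISTINCT p.test_seq_num
        FROM mst_par p
        JOIN mst_chld c ON p.test_seq_num = c.test_seq_num
    ) m ON s.test_seq_num = m.test_seq_num
    WHERE m.test_seq_num IS NULL
""")

# Step 2: two DELETEs driven by the temporary table, child first.
# Caution: staging child rows that have no staging parent are removed too.
with conn:
    conn.execute("DELETE FROM stg_chld WHERE test_seq_num NOT IN (SELECT test_seq_num FROM keep_keys)")
    conn.execute("DELETE FROM stg_par WHERE test_seq_num NOT IN (SELECT test_seq_num FROM keep_keys)")

kept = [r[0] for r in conn.execute("SELECT test_seq_num FROM keep_keys")]
remaining_par = [r[0] for r in conn.execute("SELECT test_seq_num FROM stg_par ORDER BY 1")]
remaining_chld = [r[0] for r in conn.execute("SELECT test_seq_num FROM stg_chld ORDER BY 1")]
```

The join runs once to build keep_keys, and both DELETEs then read only that small table.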

Related

Best way to combine two tables, remove duplicates, but keep all other non-duplicate values in SQL

I am looking for the best way to combine two tables so that duplicate records, based on email, are removed, with priority given to the values in "Table 2". I have considered a full outer join and UNION ALL, but UNION ALL will be too large, as each table has over 1000 columns. I want to create this combined table as my full reference table and save it as a view, so I can reference it without always adding a UNION or something similar to my already complex statements. From my understanding, a full outer join will not necessarily remove duplicates. I want to:
a. Create table with ALL columns from both tables (fields that don't apply to records in one table will just have null values)
b. Remove duplicate records from this master table based on email field but only remove the table 1 records and keep the table 2 duplicates as they have the information that I want
c. A LEFT JOIN will not work, as both tables have unique records that I want to retain, and I would like all 1000+ columns from each table to be retained
I don't know how feasible this even is but thank you so much for any answers!
If I understand your question correctly, you want to join two large tables with thousands of columns that (hopefully) are the same in both tables, using the email column as the join condition, and replace duplicate records between the two tables with the records from Table 2.
I had to do something similar a few days ago so maybe you can modify my query for your purposes:
WITH only_in_table_1 AS(
SELECT *
FROM table_1 A
WHERE NOT EXISTS
(SELECT * FROM table_2 B WHERE B.email_field = A.email_field))
SELECT * FROM table_2
UNION ALL
SELECT * FROM only_in_table_1
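Note that UNION ALL with SELECT * requires both tables to have the same columns in the same order. If the columns/fields aren't the same between the tables, you can use a full outer join on only_in_table_1 and table_2 instead.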
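A runnable sketch of that query saved as a view, as the question asks, using SQLite through Python's sqlite3. The two-column tables and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (email_field TEXT, name TEXT);
    CREATE TABLE table_2 (email_field TEXT, name TEXT);
    INSERT INTO table_1 VALUES ('a@x.com', 'old A'), ('b@x.com', 'only in 1');
    INSERT INTO table_2 VALUES ('a@x.com', 'new A'), ('c@x.com', 'only in 2');

    -- Every table_2 row, plus the table_1 rows whose email is not in table_2.
    CREATE VIEW combined AS
    WITH only_in_table_1 AS (
        SELECT *
        FROM table_1 A
        WHERE NOT EXISTS (SELECT 1 FROM table_2 B WHERE B.email_field = A.email_field)
    )
    SELECT * FROM table_2
    UNION ALL
    SELECT * FROM only_in_table_1;
""")

combined = conn.execute(
    "SELECT email_field, name FROM combined ORDER BY email_field"
).fetchall()
print(combined)
```

The duplicate email a@x.com comes back once, with table_2's values.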
Try using a FULL OUTER JOIN between the two tables, and then a COALESCE function on each result-set column to determine which table's column populates it, e.g. COALESCE(t2.col, t1.col) to give Table 2 priority.
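A sketch of the FULL OUTER JOIN plus COALESCE idea in SQLite. Because older SQLite builds lack FULL JOIN, it is emulated here with a LEFT JOIN plus the table_2-only rows; the columns phone and city are hypothetical stand-ins for columns that exist in only one table. COALESCE falls back to table_1's value whenever table_2 holds a NULL, which is slightly different from replacing the whole record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (email TEXT, name TEXT, phone TEXT);
    CREATE TABLE table_2 (email TEXT, name TEXT, city TEXT);
    INSERT INTO table_1 VALUES ('a@x.com', 'old A', '555-1'), ('b@x.com', 'B', '555-2');
    INSERT INTO table_2 VALUES ('a@x.com', 'new A', 'Oslo'), ('c@x.com', 'C', 'Rome');
""")

# FULL OUTER JOIN emulated as: table_1 LEFT JOIN table_2, plus the
# table_2 rows with no match in table_1. COALESCE prefers table_2.
rows = conn.execute("""
    SELECT COALESCE(t2.email, t1.email) AS email,
           COALESCE(t2.name,  t1.name)  AS name,
           t1.phone,
           t2.city
    FROM table_1 t1
    LEFT JOIN table_2 t2 ON t1.email = t2.email
    UNION ALL
    SELECT t2.email, t2.name, NULL, t2.city
    FROM table_2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table_1 t1 WHERE t1.email = t2.email)
    ORDER BY email
""").fetchall()
print(rows)
```

On engines with native FULL OUTER JOIN, the two halves collapse into one join with the same COALESCE list.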

SQL Inner Join w/ Unique Vals

Questions similar to this one about using DISTINCT values in an INNER JOIN have been asked a few times, but I don't see my (simple) use case.
Problem Description:
I have two tables Table A and Table B. They can be joined via a variable ID. Each ID may appear on multiple rows in both Table A and Table B.
I would like to INNER JOIN Table A and Table B on the distinct values of ID that appear in Table B, and select all rows of Table A whose ID matches a row of Table B that satisfies some condition.
What I want:
I want to make sure I get only one copy of each row of Table A with a Table A.ID matching a Table B.ID which satisfies [some condition].
What I would like to do:
SELECT A.*
FROM TableA A
INNER JOIN (
SELECT DISTINCT ID FROM TableB WHERE [some condition]
) B ON A.ID = B.ID
Additionally:
As a further (really dumb) constraint, I can't say anything about the SQL standard in use, since I'm executing the SQL query through Stata's odbc load command on a database I have no information about beyond the variable names and the fact that "it does accept SQL queries," ( <- this is the extent of the information I have).
If you want all rows in a that match an id in b, then use exists:
select a.*
from a
where exists (select 1 from b where b.id = a.id);
Trying to use join just complicates matters, because it both filters and generates duplicates.
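A small demonstration of the difference, in SQLite via Python's sqlite3. The tables and the condition flag = 1 are hypothetical stand-ins for [some condition].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER, flag INTEGER);
    INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
    -- id 1 matches the condition twice in b.
    INSERT INTO b VALUES (1, 1), (1, 1), (2, 0), (3, 1);
""")

# Plain join: row 1 of a is repeated once per matching row of b.
joined = conn.execute("""
    SELECT a.* FROM a JOIN b ON b.id = a.id
    WHERE b.flag = 1 ORDER BY a.id
""").fetchall()

# EXISTS (a semi-join): each row of a appears at most once.
semi = conn.execute("""
    SELECT a.* FROM a
    WHERE EXISTS (SELECT 1 FROM b WHERE b.id = a.id AND b.flag = 1)
    ORDER BY a.id
""").fetchall()
print(joined, semi)
```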

What does it mean to INNER JOIN before an INSERT?

I have the following case where I'm doing an insert into a table; however, before I can do that, I need to grab a foreign key ID that's associated with another table. That foreign key ID is not a simple lookup, but rather requires an INNER JOIN of two other tables to get that ID.
So, what I'm currently doing is the following:
1. Inner join A and B and grab the ID that I need.
2. Once I resolve that value, insert into table C with the foreign key that I got from step 1.
Now, I was wondering if there is a better way for doing this. Could I do the join of table A and B and insert into table C all in one statement? This is where I was getting confused on what it means to INNER JOIN across tables and then INSERT. Are you potentially inserting into multiple tables?
You can use the insert-select syntax to insert the results of a query (which may or may not involve a join) to another table. E.g.:
INSERT INTO C
SELECT col_from_a, col_from_b
FROM a
JOIN b ON a.id = b.id
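A sketch of resolving the foreign key and inserting in one statement, in SQLite via Python's sqlite3. Table and column names (a.code, b.fk_id, c.note) and the parameter values are hypothetical. Only table c receives rows; the join just supplies the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, fk_id INTEGER);
    CREATE TABLE c (fk_id INTEGER, note TEXT);
    INSERT INTO a VALUES (1, 'ALPHA'), (2, 'BETA');
    INSERT INTO b VALUES (1, 100), (2, 200);
""")

# The join of a and b resolves the foreign key; the INSERT writes it into c.
with conn:
    conn.execute("""
        INSERT INTO c (fk_id, note)
        SELECT b.fk_id, ?
        FROM a
        JOIN b ON a.id = b.id
        WHERE a.code = ?
    """, ("hello", "BETA"))

rows = conn.execute("SELECT fk_id, note FROM c").fetchall()
print(rows)
```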

Need to use a join in the WHERE clause of an UPDATE statement in SQL?

I need to join multiple tables in the WHERE clause of an UPDATE statement. Specifically, there are two tables with a master-slave relationship. I need to update a row in the master table but need to check for the foreign key entry in its slave table.
Table A
TableId,Empid,EmpName,EmpAdd
Table B
TableId,Empid,DeptId,DeptName
When a row is inserted into Table A, a row is also inserted into Table B. Say I need to update EmpAdd in Table A based on the columns Empid, DeptId and DeptName from the two tables. Therefore I guess I need to join the two tables.
What about checking with EXISTS instead of a JOIN:
UPDATE tbl_master m
SET m.some_column = some_value
WHERE m.masterID = updatetable_id
AND EXISTS (SELECT * FROM tbl_slave s WHERE s.masterID = m.masterID)
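Applied to the question's Table A and Table B, a sketch in SQLite via Python's sqlite3 (sample data and parameter values hypothetical) that updates EmpAdd only when a matching Empid, DeptId and DeptName row exists in Table B:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (TableId INTEGER, Empid INTEGER, EmpName TEXT, EmpAdd TEXT);
    CREATE TABLE TableB (TableId INTEGER, Empid INTEGER, DeptId INTEGER, DeptName TEXT);
    INSERT INTO TableA VALUES (1, 10, 'Ann', 'Old St'), (2, 20, 'Bob', 'Old Rd');
    INSERT INTO TableB VALUES (1, 10, 7, 'Sales'), (2, 20, 8, 'IT');
""")

# The correlated EXISTS checks the slave table without joining it into the UPDATE.
with conn:
    conn.execute("""
        UPDATE TableA
        SET EmpAdd = ?
        WHERE Empid = ?
          AND EXISTS (SELECT 1 FROM TableB b
                      WHERE b.Empid = TableA.Empid
                        AND b.DeptId = ?
                        AND b.DeptName = ?)
    """, ("New Ave", 10, 7, "Sales"))

rows = conn.execute("SELECT Empid, EmpAdd FROM TableA ORDER BY Empid").fetchall()
print(rows)
```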

Delete Query using Inner joins on more than two tables

I want to delete records from a table using inner joins on more than two tables. Say I have tables A, B, C and D, with A's PK shared by all the other tables. How do I write a query that deletes records from table D using inner joins on tables B and A, since the conditions come from those two tables? I need this query for DB2. I am not using an IN clause or EXISTS because of their limitations.
From your description, I take the schema as:
A(pk_A, col1, col2, ...)
B(pk_B, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
C(pk_c, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
D(pk_d, fk_A, col1, col2, ..., foreign key fk_A references A(pk_A))
If your concern is that DB2 allows only 1000 rows to be deleted when an IN clause is used: I don't know about DB2, but Oracle limits an IN clause to 1000 literal values. There is no such limit on subquery results, in Oracle at least. EXISTS should not be a problem in any database, including Oracle and DB2, since it only checks for the existence of rows, be it one or a million.
There are three scenarios on deleting data from table D:
You want to delete data from table D in which fk_A (naturally) refers to a record in table A using column A.pk_A:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
WHERE a.pk_A = d.fk_A
);
You want to delete data from table D in which fk_A refers to a record in table A, and that record in table A is also referred to by column B.fk_A. We do not want to delete the data from D that is in A but not in B. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
WHERE a.pk_A = d.fk_A
);
The third scenario is when we have to delete data in table D that refers to a record in table A, and that record in A is also referred to by columns B.fk_A and C.fk_A. We want to delete only the data from table D that is common to all four tables - A, B, C and D. We can write:
DELETE FROM d
WHERE EXISTS (
SELECT 1
FROM a
INNER JOIN b ON a.pk_A = b.fk_A
INNER JOIN c ON a.pk_A = c.fk_A
WHERE a.pk_A = d.fk_A
);
Depending upon your requirement you can incorporate one of these queries.
Note that the "=" operator would return an error if the subquery retrieves more than one row. Also, I don't know whether DB2 supports the ANY or ALL keywords, so I used the simple but powerful EXISTS keyword, which often performs at least as well as IN, ANY and ALL.
Also, you can observe here that the subqueries inside the EXISTS clause use "SELECT 1", not "SELECT a.pk" or some other column. This is because EXISTS, in any database, looks for only existence of rows, not for any particular values inside the columns.
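The three scenarios can be compared side by side on a small sample. This sketch uses SQLite through Python's sqlite3 rather than DB2, with made-up rows; each scenario runs on a fresh copy of the data.

```python
import sqlite3

def setup():
    # Hypothetical sample: a = {1, 2, 3}, b refers to 1 and 2, c refers to 1.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE a (pk_A INTEGER PRIMARY KEY);
        CREATE TABLE b (pk_B INTEGER PRIMARY KEY, fk_A INTEGER REFERENCES a(pk_A));
        CREATE TABLE c (pk_c INTEGER PRIMARY KEY, fk_A INTEGER REFERENCES a(pk_A));
        CREATE TABLE d (pk_d INTEGER PRIMARY KEY, fk_A INTEGER REFERENCES a(pk_A));
        INSERT INTO a VALUES (1), (2), (3);
        INSERT INTO b (fk_A) VALUES (1), (2);
        INSERT INTO c (fk_A) VALUES (1);
        INSERT INTO d (fk_A) VALUES (1), (2), (3), (NULL);
    """)
    return conn

scenarios = {
    "in_a": """DELETE FROM d WHERE EXISTS (
                   SELECT 1 FROM a WHERE a.pk_A = d.fk_A)""",
    "in_a_and_b": """DELETE FROM d WHERE EXISTS (
                   SELECT 1 FROM a
                   INNER JOIN b ON a.pk_A = b.fk_A
                   WHERE a.pk_A = d.fk_A)""",
    "in_a_b_and_c": """DELETE FROM d WHERE EXISTS (
                   SELECT 1 FROM a
                   INNER JOIN b ON a.pk_A = b.fk_A
                   INNER JOIN c ON a.pk_A = c.fk_A
                   WHERE a.pk_A = d.fk_A)""",
}

# fk_A values left in d after each scenario.
remaining = {}
for name, sql in scenarios.items():
    conn = setup()
    with conn:
        conn.execute(sql)
    remaining[name] = [r[0] for r in conn.execute("SELECT fk_A FROM d ORDER BY pk_d")]
    conn.close()
print(remaining)
```

Each extra join narrows the set of deleted rows; the row with a NULL fk_A never matches and is always kept.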
Based on 'Using SQL to delete rows from a table using INNER JOIN to another table' (note that the DELETE ... FROM ... JOIN form below is vendor-specific; MySQL and SQL Server accept it, but check your DB2 version before relying on it):
The key is that you specify the name of the table to be deleted from as the SELECT. So, the JOIN and WHERE do the selection and limiting, while the DELETE does the deleting. You're not limited to just one table, though. If you have a many-to-many relationship (for instance, Magazines and Subscribers, joined by a Subscription) and you're removing a Subscriber, you need to remove any potential records from the join model as well.
DELETE subscribers
FROM subscribers INNER JOIN subscriptions
ON subscribers.id = subscriptions.subscriber_id
INNER JOIN magazines
ON subscriptions.magazine_id = magazines.id
WHERE subscribers.name='Wes';
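A portable sketch for engines without the DELETE ... FROM ... JOIN form, using SQLite through Python's sqlite3: remove the join-model rows first, then the subscriber. The data is hypothetical. Foreign keys are switched on, so deleting in the opposite order would fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the references below
conn.executescript("""
    CREATE TABLE subscribers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE magazines   (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE subscriptions (
        subscriber_id INTEGER REFERENCES subscribers(id),
        magazine_id   INTEGER REFERENCES magazines(id)
    );
    INSERT INTO subscribers VALUES (1, 'Wes'), (2, 'Ann');
    INSERT INTO magazines   VALUES (1, 'Wired'), (2, 'Byte');
    INSERT INTO subscriptions VALUES (1, 1), (1, 2), (2, 1);
""")

with conn:
    # Join-model rows first, so no subscription is left pointing at a deleted subscriber.
    conn.execute("""
        DELETE FROM subscriptions
        WHERE subscriber_id IN (SELECT id FROM subscribers WHERE name = ?)
    """, ("Wes",))
    conn.execute("DELETE FROM subscribers WHERE name = ?", ("Wes",))

subs_left = conn.execute("SELECT subscriber_id, magazine_id FROM subscriptions").fetchall()
people_left = conn.execute("SELECT name FROM subscribers").fetchall()
print(subs_left, people_left)
```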
delete from D
where fk in (select b.fk from A a inner join B b on a.pk = b.fk)
This should work. It uses IN rather than "=", because the subquery can return more than one row.