Anti Join based on multiple keys / columns (SQL)

The setting is simple: I want to retrieve all rows from table A that are not present in table B. Because a unique row is identified by 4 columns, I needed a way to write the WHERE clause so that it works correctly.
My solution is to concatenate the 4 columns and use the result as a single "key" for the comparison:
select *
from table_A
where filter_condition = 0
and (column1 || column2 || column3 || column4) not in (
select A.column1 || A.column2 || A.column3 || A.column4
from table_A A -- 1618727
inner join table_B B
on A.column1 = B.column1
and A.column2 = B.column2
and A.column3 = B.column3
and A.column4 = B.column4
and A.filter_condition = 0
)
My question is: is this a good way of doing it, or am I doing something fundamentally wrong?
To be clear, the desired result is simply the rows of table_A that I "lose" in an INNER JOIN between table_A and table_B.
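A quick way to see the main risk of the concatenation approach is a small in-memory SQLite sketch, run with Python's standard `sqlite3` module (the table contents are hypothetical and only two key columns are used). Two keys that differ column by column can produce the same string once concatenated without a separator, so the comparison treats them as equal. A separator that never occurs in the data reduces the risk, but comparing the columns individually avoids it entirely.

```python
import sqlite3

# Hypothetical data: ('ab', 'c') and ('a', 'bc') are different keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (column1 TEXT, column2 TEXT);
    CREATE TABLE table_B (column1 TEXT, column2 TEXT);
    INSERT INTO table_A VALUES ('ab', 'c');
    INSERT INTO table_B VALUES ('a', 'bc');
""")

# Concatenated comparison: 'ab' || 'c' and 'a' || 'bc' are both 'abc',
# so the table_A row is wrongly reported as present in table_B.
concat_match = conn.execute("""
    SELECT column1 || column2 FROM table_A
    WHERE column1 || column2 IN (SELECT column1 || column2 FROM table_B)
""").fetchall()

# Column-by-column comparison: no row of table_B has the same key.
column_match = conn.execute("""
    SELECT column1, column2 FROM table_A A
    WHERE EXISTS (SELECT 1 FROM table_B B
                  WHERE A.column1 = B.column1 AND A.column2 = B.column2)
""").fetchall()

print(concat_match)   # [('abc',)]
print(column_match)   # []
```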

You seem to be looking for not exists:
select a.*
from table_a a
where a.filter_condition = 0
and not exists (
select 1
from table_b b
where
a.column1 = b.column1
and a.column2 = b.column2
and a.column3 = b.column3
and a.column4 = b.column4
)
This will give you all records in table_a that do not have a corresponding record in table_b.
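The NOT EXISTS query can be tried out end to end with an in-memory SQLite database from Python's standard library. The rows below are made up, and filter_condition is assumed to live in table_a as in the question. Only the row with filter_condition = 0 and no four-column match in table_b comes back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (column1, column2, column3, column4, filter_condition);
    CREATE TABLE table_b (column1, column2, column3, column4);
    INSERT INTO table_a VALUES (1, 1, 1, 1, 0), (1, 1, 1, 2, 0), (2, 2, 2, 2, 1);
    INSERT INTO table_b VALUES (1, 1, 1, 1);
""")

# Anti-join: keep rows of table_a for which no table_b row matches on all 4 keys.
anti_join = conn.execute("""
    SELECT a.*
    FROM table_a a
    WHERE a.filter_condition = 0
      AND NOT EXISTS (
          SELECT 1
          FROM table_b b
          WHERE a.column1 = b.column1
            AND a.column2 = b.column2
            AND a.column3 = b.column3
            AND a.column4 = b.column4)
""").fetchall()

print(anti_join)  # [(1, 1, 1, 2, 0)]
```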

Using a LEFT JOIN between A and B and checking for a NULL row in B is probably easier:
SELECT A.*
FROM table_A A
LEFT JOIN table_B B ON A.column1 = B.column1
AND A.column2 = B.column2
AND A.column3 = B.column3
AND A.column4 = B.column4
WHERE B.column1 IS NULL
AND A.filter_condition = 0
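A minimal SQLite sketch of the LEFT JOIN variant, on the same kind of made-up data. It selects `A.*` so that only table_A's columns are returned; with `SELECT *` the all-NULL columns of B would be included as well. Testing `B.column1 IS NULL` is safe because column1 is one of the join columns, so any matched B row has a non-NULL column1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (column1, column2, column3, column4, filter_condition);
    CREATE TABLE table_B (column1, column2, column3, column4);
    INSERT INTO table_A VALUES (1, 1, 1, 1, 0), (1, 1, 1, 2, 0), (2, 2, 2, 2, 1);
    INSERT INTO table_B VALUES (1, 1, 1, 1);
""")

# Rows of A that found no partner in the LEFT JOIN have NULLs in all B columns.
left_anti = conn.execute("""
    SELECT A.*
    FROM table_A A
    LEFT JOIN table_B B
      ON A.column1 = B.column1
     AND A.column2 = B.column2
     AND A.column3 = B.column3
     AND A.column4 = B.column4
    WHERE B.column1 IS NULL
      AND A.filter_condition = 0
""").fetchall()

print(left_anti)  # [(1, 1, 1, 2, 0)]
```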

You should be able to use tuples (aka row constructors) in PostgreSQL:
select *
from table_a
where filter_condition = 0
and (column1, column2, column3, column4) not in
(
select column1, column2, column3, column4
from table_b
);
If any of the columns can be NULL, better use NOT EXISTS: NULL = NULL evaluates to UNKNOWN rather than to TRUE or FALSE, so a NULL in the subquery's result can make NOT IN silently drop rows you expected to keep.
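The NULL pitfall is easiest to see with a single key column (a hypothetical column k) in SQLite; row constructors follow the same three-valued logic column by column. One NULL in the subquery makes every NOT IN test either FALSE or UNKNOWN, so no rows come back, while NOT EXISTS simply finds no matching row for 2 and 3 and keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (k INTEGER);
    CREATE TABLE table_b (k INTEGER);
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) is UNKNOWN, not TRUE, so the row is filtered out.
not_in = conn.execute(
    "SELECT k FROM table_a WHERE k NOT IN (SELECT k FROM table_b) ORDER BY k"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists = conn.execute(
    "SELECT k FROM table_a a WHERE NOT EXISTS "
    "(SELECT 1 FROM table_b b WHERE b.k = a.k) ORDER BY k"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,), (3,)]
```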

Related

Join two table with Multiple cases in ON condition

I have two tables, A and B, and I need to perform a left join on them with multiple cases in the ON condition.
Is there an efficient way of doing this in BigQuery or SQL?
select * from table_A A
left join table_B B
where
[some condition OR some condition]
on
case1
A.column1 =B.column1
and A.column2= B.column2
and A.column3= B.column3
and A.column4= B.column4
and A.column5= B.column5
OR case2
A.column1 =B.column1
and A.column3= B.column3
and A.column4= B.column4
and A.column5= B.column5
OR case3
A.column1 =B.column1
and A.column2= B.column2
and A.column4= B.column4
OR case4
A.column1 =B.column1
and A.column3= B.column3
and A.column5= B.column5
My main goal is that, for a given row, if case1 matches, the other cases are not considered. Likewise, if the first case does not match, the second is checked, then the third, and so on, so that the best possible single match is found.
The cases are there to make sure every row of table A gets joined to table B.
The first case compares all 5 fields of both tables, but if some of those fields are null, the next case is checked, and so on.

If I understand correctly, the general approach in SQL is multiple left joins, where each join is only attempted when all the earlier ones found nothing (col stands for whichever table_B column you need):
select a.*, coalesce(b1.col, b2.col, b3.col, b4.col) as col
from table_A A left join
     table_B B1
     on A.column1 = B1.column1 and
        A.column2 = B1.column2 and
        A.column3 = B1.column3 and
        A.column4 = B1.column4 and
        A.column5 = B1.column5 left join
     table_B B2
     on B1.column1 is null and
        A.column1 = B2.column1 and
        A.column3 = B2.column3 and
        A.column4 = B2.column4 and
        A.column5 = B2.column5 left join
     table_B B3
     on B1.column1 is null and B2.column1 is null and
        A.column1 = B3.column1 and
        A.column2 = B3.column2 and
        A.column4 = B3.column4 left join
     table_B B4
     on B1.column1 is null and B2.column1 is null and B3.column1 is null and
        A.column1 = B4.column1 and
        A.column3 = B4.column3 and
        A.column5 = B4.column5
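Here is a small SQLite sketch of the cascading LEFT JOINs, run from Python. The id and col columns and all row values are hypothetical. Row 3 matches one table_B row through case 3 and another through case 4; because B4 is only joined when B1, B2 and B3 all found nothing, the case 3 match wins, and row 4, which matches nothing, still comes back with a NULL col.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (id, column1, column2, column3, column4, column5);
    CREATE TABLE table_B (column1, column2, column3, column4, column5, col);
    INSERT INTO table_A VALUES
        (1, 'k', 'a', 'b', 'c', 'd'),   -- case 1 match with 'match-all'
        (2, 'm', 'x', 'n', 'o', 'p'),   -- only a case 4 match with 'm-row'
        (3, 'k', 'a', 'r', 'c', 's'),   -- case 3 ('match-all') and case 4 ('k-case4')
        (4, 'q', 'q', 'q', 'q', 'q');   -- no match at all
    INSERT INTO table_B VALUES
        ('k', 'a', 'b', 'c', 'd', 'match-all'),
        ('m', 'y', 'n', 'z', 'p', 'm-row'),
        ('k', 'zz', 'r', 'zz', 's', 'k-case4');
""")

# Each later join only fires if every earlier join left its columns NULL.
cascade = conn.execute("""
    SELECT A.id, COALESCE(B1.col, B2.col, B3.col, B4.col) AS col
    FROM table_A A
    LEFT JOIN table_B B1
      ON A.column1 = B1.column1 AND A.column2 = B1.column2 AND A.column3 = B1.column3
     AND A.column4 = B1.column4 AND A.column5 = B1.column5
    LEFT JOIN table_B B2
      ON B1.column1 IS NULL
     AND A.column1 = B2.column1 AND A.column3 = B2.column3
     AND A.column4 = B2.column4 AND A.column5 = B2.column5
    LEFT JOIN table_B B3
      ON B1.column1 IS NULL AND B2.column1 IS NULL
     AND A.column1 = B3.column1 AND A.column2 = B3.column2 AND A.column4 = B3.column4
    LEFT JOIN table_B B4
      ON B1.column1 IS NULL AND B2.column1 IS NULL AND B3.column1 IS NULL
     AND A.column1 = B4.column1 AND A.column3 = B4.column3 AND A.column5 = B4.column5
    ORDER BY A.id
""").fetchall()

print(cascade)
# [(1, 'match-all'), (2, 'm-row'), (3, 'match-all'), (4, None)]
```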

You want to get the "best" matching B rows. I.e. if there are rows matching case 1, you want to stick with these, but if there are none, then you want to try with case 2, etc.
What you can do is combine the conditions, so as to join all possible matches first. Then look at the matches and dismiss all except the best ones. Ranking can be done with RANK.
select *
from
(
select
*,
rank() over (partition by A.id
order by
case when A.column2 = B.column2
and A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5 then 1
when A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5 then 2
when A.column2 = B.column2
and A.column4 = B.column4 then 3
else 4
end) as rnk
from table_A A
left join table_B B
on A.column1 = B.column1
and
(
(A.column2 = B.column2 and A.column4 = B.column4)
or
(A.column3 = B.column3 and A.column5 = B.column5)
)
where [some condition OR some condition]
) ranked
where rnk = 1;
(My query assumes some ID in table_A. If your table doesn't have a unique ID, use whatever column(s) uniquely identify a row in the table.)
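The ranking approach can be checked the same way in SQLite (version 3.25 or later, which added window functions). The tables and values are hypothetical, with id as the unique key of table_A that the answer assumes. Row 3 joins two table_B rows: the CASE ranks one of them as case 3 and sends the other to the ELSE branch, RANK keeps only the better one, and row 4 survives the LEFT JOIN with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_A (id, column1, column2, column3, column4, column5);
    CREATE TABLE table_B (column1, column2, column3, column4, column5, col);
    INSERT INTO table_A VALUES
        (1, 'k', 'a', 'b', 'c', 'd'), (2, 'm', 'x', 'n', 'o', 'p'),
        (3, 'k', 'a', 'r', 'c', 's'), (4, 'q', 'q', 'q', 'q', 'q');
    INSERT INTO table_B VALUES
        ('k', 'a', 'b', 'c', 'd', 'match-all'),
        ('m', 'y', 'n', 'z', 'p', 'm-row'),
        ('k', 'zz', 'r', 'zz', 's', 'k-case4');
""")

# Join every candidate once, rank candidates per A row, keep rank 1.
best = conn.execute("""
    SELECT id, col FROM (
        SELECT A.id AS id, B.col AS col,
               RANK() OVER (PARTITION BY A.id ORDER BY
                   CASE WHEN A.column2 = B.column2 AND A.column3 = B.column3
                             AND A.column4 = B.column4 AND A.column5 = B.column5 THEN 1
                        WHEN A.column3 = B.column3 AND A.column4 = B.column4
                             AND A.column5 = B.column5 THEN 2
                        WHEN A.column2 = B.column2 AND A.column4 = B.column4 THEN 3
                        ELSE 4
                   END) AS rnk
        FROM table_A A
        LEFT JOIN table_B B
          ON A.column1 = B.column1
         AND ((A.column2 = B.column2 AND A.column4 = B.column4)
           OR (A.column3 = B.column3 AND A.column5 = B.column5))
    ) ranked
    WHERE rnk = 1
    ORDER BY id
""").fetchall()

print(best)
# [(1, 'match-all'), (2, 'm-row'), (3, 'match-all'), (4, None)]
```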

One solution is to use temporary storage (a temp table, a cursor, or similar) and fill it with a parameterized loop. The catch is that plain SQL has no loops, so you would have to use BigQuery's scripting language; have a look here: https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting

Below are two options I see, both for BigQuery Standard SQL (thanks to @Thorsten-Kettner for helping me understand the OP's logic/requirements).
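Option 1 - separate joins for each case, plus a branch that keeps rows of A with no match at all (as a LEFT JOIN would); then combine them all and finally pick the winner for each record in A. Each row of A and its best match from B come back as the STRUCT columns A and B.
#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
  SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
  FROM (
    SELECT A, B, 1 AS priority, FORMAT('%t', A) AS identity
    FROM table_A A JOIN table_B B -- Case 1
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column3 = B.column3
      AND A.column4 = B.column4 AND A.column5 = B.column5
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT A, B, 2 AS priority, FORMAT('%t', A) AS identity
    FROM table_A A JOIN table_B B -- Case 2
    ON A.column1 = B.column1 AND A.column3 = B.column3
      AND A.column4 = B.column4 AND A.column5 = B.column5
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT A, B, 3 AS priority, FORMAT('%t', A) AS identity
    FROM table_A A JOIN table_B B -- Case 3
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column4 = B.column4
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT A, B, 4 AS priority, FORMAT('%t', A) AS identity
    FROM table_A A JOIN table_B B -- Case 4
    ON A.column1 = B.column1 AND A.column3 = B.column3 AND A.column5 = B.column5
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT A, NULL AS B, 5 AS priority, FORMAT('%t', A) AS identity -- no match at all
    FROM table_A A
    WHERE [SOME condition OR SOME condition]
  ) t
  GROUP BY identity
)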
Option 2 - pick all potential candidates in one query, calculating on the fly which case each entry belongs to, and finally pick the winner for each row in A
#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
FROM (
SELECT A.*,
B.* EXCEPT(column1,column2,column3,column4,column5),
FORMAT('%t', A) identity,
CASE
WHEN (A.column1,A.column2,A.column3,A.column4,A.column5) = (B.column1,B.column2,B.column3,B.column4,B.column5) THEN 1
WHEN (A.column1,A.column3,A.column4,A.column5) = (B.column1,B.column3,B.column4,B.column5) THEN 2
WHEN (A.column1,A.column2,A.column4) = (B.column1,B.column2,B.column4) THEN 3
WHEN (A.column1,A.column3,A.column5) = (B.column1,B.column3,B.column5) THEN 4
ELSE 5
END AS priority
FROM table_A A LEFT JOIN table_B B
ON A.column1 = B.column1 -- all four cases require column1 to match
WHERE [SOME condition OR SOME condition]
) t
WHERE priority < 5
GROUP BY identity
)
Note: the two versions above are similar in some ways and different in others; picking one over the other is a matter of preference. Also note that the above is untested and was written on the fly, so it might need additional tuning :o)

Alternative for correlated update of Oracle

How can I rewrite this query as a simple UPDATE? Can someone explain it to me step by step? Is the whole of table1 getting updated here?
UPDATE
(SELECT
A.COLUMN1 A_COLUMN1,
B.COLUMN2 B_COLUMN2
FROM TABLE1 A,TABLE2 B
WHERE A.COLUMN3=B.COLUMN3 AND A.COLUMN4=B.COLUMN4)
SET A_COLUMN1=B_COLUMN2;

Let me answer the question.
The subquery is using an inner join. Hence, the subquery will filter out rows that don't match the join conditions in the two tables. This happens before the update.
Hence, not all the rows get updated. If you want to update all rows, use a left join or:
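UPDATE TABLE1 A
SET A.COLUMN1 = (SELECT B.COLUMN2
FROM TABLE2 B
WHERE A.COLUMN3 = B.COLUMN3 AND A.COLUMN4 = B.COLUMN4
);
Rows of TABLE1 without a match in TABLE2 get COLUMN1 set to NULL, and the subquery must return at most one row for each row of TABLE1.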

This version only touches the rows that have a match in TABLE2:
UPDATE TABLE1 A
SET A.COLUMN1 = (SELECT B.COLUMN2
FROM TABLE2 B
WHERE A.COLUMN3 = B.COLUMN3 AND A.COLUMN4 = B.COLUMN4
)
WHERE EXISTS (SELECT 1
FROM TABLE2 B
WHERE A.COLUMN3 = B.COLUMN3 AND A.COLUMN4 = B.COLUMN4);
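The difference between the two UPDATE statements can be reproduced in SQLite from Python. The data is hypothetical, lower-case names are used, and the outer table is referenced by its full name (table1.column3) instead of through an alias A, to keep the sketch portable across SQLite versions. The plain correlated update sets column1 to NULL on the unmatched row; the WHERE EXISTS version leaves that row alone.

```python
import sqlite3


def run(update_sql):
    """Create fresh hypothetical tables, apply one UPDATE, return table1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, column1 TEXT, column3 INTEGER, column4 INTEGER);
        CREATE TABLE table2 (column2 TEXT, column3 INTEGER, column4 INTEGER);
        INSERT INTO table1 VALUES (1, 'old', 10, 20), (2, 'old', 30, 40);
        INSERT INTO table2 VALUES ('new', 10, 20);
    """)
    conn.execute(update_sql)
    return conn.execute("SELECT id, column1 FROM table1 ORDER BY id").fetchall()


# Correlated lookup of the new value; returns NULL when nothing matches.
lookup = ("(SELECT b.column2 FROM table2 b "
          "WHERE b.column3 = table1.column3 AND b.column4 = table1.column4)")
exists = ("EXISTS (SELECT 1 FROM table2 b "
          "WHERE b.column3 = table1.column3 AND b.column4 = table1.column4)")

all_rows = run(f"UPDATE table1 SET column1 = {lookup}")
matched_only = run(f"UPDATE table1 SET column1 = {lookup} WHERE {exists}")

print(all_rows)      # [(1, 'new'), (2, None)]
print(matched_only)  # [(1, 'new'), (2, 'old')]
```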

build where clause to verify only once first clause is met

I have the following SQL select:
select ...
from table1 a, table2 b
where
a.column = 'ABC' and
a.column2 = b.column2
I would like to only check if a.column2 = b.column2 when a.column = 'ABC'.
How do I do that?
Thanks

I'm not sure from your question tag if you're trying to figure out how to do this with a JOIN specifically (as opposed to how you did it with the WHERE clause), but anyway -- a couple of ways:
1) --with WHERE clause
select ...
from
table1 a
INNER JOIN table2 b
ON a.column2 = b.column2
where
a.column = 'ABC'
2) --WITHOUT WHERE CLAUSE
select ...
from
table1 a
INNER JOIN table2 b
ON a.column2 = b.column2
AND a.column = 'ABC'

Try this. It will check column2 only when column is 'ABC':
select ...
from table1 a, table2 b
where
(a.column = 'ABC' and
a.column2 = b.column2) or a.column <> 'ABC'
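A small SQLite sketch of this OR version, with hypothetical data; "column" is double-quoted because COLUMN is an SQL keyword. It shows two side effects worth knowing: each row whose column is not 'ABC' is paired with every row of table2, since no join condition applies to it, and a row whose column is NULL disappears, because NULL <> 'ABC' is not true. If such rows should be kept, the condition would also need an `OR a."column" IS NULL` branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, "column" TEXT, column2 INTEGER);
    CREATE TABLE table2 (column2 INTEGER);
    INSERT INTO table1 VALUES (1, 'ABC', 10), (2, 'XYZ', 99), (3, NULL, 10);
    INSERT INTO table2 VALUES (10), (20);
""")

# column2 is only compared for 'ABC' rows; other non-NULL rows match everything.
rows = conn.execute("""
    SELECT a.id, b.column2
    FROM table1 a, table2 b
    WHERE (a."column" = 'ABC' AND a.column2 = b.column2)
       OR a."column" <> 'ABC'
    ORDER BY a.id, b.column2
""").fetchall()

print(rows)  # [(1, 10), (2, 10), (2, 20)]
```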

Trouble with a conditional join

I'm trying to join depending on whether table1.column1 is null or not null.
For example, I have two tables, table1 and table2, and the query:
SELECT
A.column2
FROM table1 A , table2 B
WHERE if A.column1 is not null then (A.column1=B.column1)
else if A.column1 is null then (A.column3 = B.column1);

Try this:
SELECT
A.column2
FROM table1 A
JOIN table2 B ON
B.column1 = A.column1 OR
(A.column1 IS NULL AND B.column1 = A.column3)
Note that B.column1 = A.column1 will never be true if either B.column1 or A.column1 is NULL.
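The OR join condition can be tried out in SQLite with hypothetical rows. The first row of table1 joins only through column1, even though its column3 would also match; the second has a NULL column1, so it falls back to column3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 TEXT, column3 TEXT);
    CREATE TABLE table2 (column1 TEXT);
    INSERT INTO table1 VALUES ('k1', 'via column1', 'k2'), (NULL, 'via column3', 'k2');
    INSERT INTO table2 VALUES ('k1'), ('k2');
""")

# Join on column1, or on column3 only when column1 is NULL.
rows = conn.execute("""
    SELECT A.column2, B.column1
    FROM table1 A
    JOIN table2 B ON
        B.column1 = A.column1 OR
        (A.column1 IS NULL AND B.column1 = A.column3)
    ORDER BY A.column2
""").fetchall()

print(rows)  # [('via column1', 'k1'), ('via column3', 'k2')]
```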

Try this...
SELECT A.column2
FROM table1 A
JOIN table2 B ON NVL(A.column1, A.column3) = b.Column1
If b.Column1 can also be null, and you want two null columns to match, you can try this...
SELECT A.column2
FROM table1 A
JOIN table2 B ON NVL(b.Column1, 'X') = COALESCE(A.column1, A.column3, 'X')
This assumes that b.Column1 never contains the value 'X'.
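SQLite has no NVL, so this sketch uses COALESCE, which for two arguments returns the same value as Oracle's NVL. With hypothetical rows, the first join drops the row whose column1 and column3 are both NULL; the sentinel version matches it to the table2 row whose column1 is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 TEXT, column2 TEXT, column3 TEXT);
    CREATE TABLE table2 (column1 TEXT);
    INSERT INTO table1 VALUES ('k1', 'row1', NULL), (NULL, 'row2', 'k2'), (NULL, 'row3', NULL);
    INSERT INTO table2 VALUES ('k1'), ('k2'), (NULL);
""")

# Fall back to column3 when column1 is NULL.
plain = conn.execute("""
    SELECT A.column2 FROM table1 A
    JOIN table2 B ON COALESCE(A.column1, A.column3) = B.column1
    ORDER BY A.column2
""").fetchall()

# Map NULL to the sentinel 'X' on both sides so that NULL matches NULL
# (assumes 'X' never occurs as a real value).
with_nulls = conn.execute("""
    SELECT A.column2 FROM table1 A
    JOIN table2 B ON COALESCE(B.column1, 'X') = COALESCE(A.column1, A.column3, 'X')
    ORDER BY A.column2
""").fetchall()

print(plain)       # [('row1',), ('row2',)]
print(with_nulls)  # [('row1',), ('row2',), ('row3',)]
```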

Using a Oracle subselect to replace a CASE statement

Hi guys,
can anybody please help me with a subquery in Oracle Database 10g? For each value of a column in the first table, I need to retrieve the corresponding value of another column in the second table.
I currently use this statement:
SELECT
CASE WHEN A.column1 = 'A' THEN 'aaa'
WHEN A.column1 = 'B' THEN 'bbb'
.......
WHEN A.column1 = 'X' THEN 'xxx'
ELSE 'bad' END AS COLUMN1, A.*
FROM TRANSACTION_TABLE A
WHERE A.column1 IS NOT NULL
AND A.column1 <> ' '
This is not an elegant approach, so I'm trying to use a subselect from CATEGORY_TABLE B like the following:
SELECT A.column1, A.*
FROM TRANSACTION_TABLE A, CATEGORY_TABLE B
WHERE A.column1 IS NOT NULL
AND A.column1 = B.column_b_1
AND A.column1 <> ' '
AND A.column1 IN (SELECT B.column_b_1_descr FROM CATEGORY_TABLE B
WHERE B.FIELDNAME = 'column1' AND A.column1 = B.column_b_1)
So far I don't get any results with the subquery, but I don't want to keep using the CASE with so many conditions either. I just want to replace the A.column1 values with the descriptive values from B.column_b_1_descr, as they're easier to read.
I would appreciate any feedback.
Thanks

Unless I'm misunderstanding your question...
CATEGORY_TABLE:
name | value
A    | aaa
B    | bbb
C    | ccc
...
SELECT B.value AS COLUMN1, A.*
FROM TRANSACTION_TABLE A, CATEGORY_TABLE B
WHERE A.column1 = B.name
or
SELECT t2.value as COLUMN1, t1.*
FROM TRANSACTION_TABLE t1
INNER JOIN CATEGORY_TABLE t2 ON t1.column1 = t2.name;
The extra WHERE conditions from your query (IS NOT NULL and <> ' ') aren't needed, since an inner join automatically excludes rows whose column1 is null or has no match in CATEGORY_TABLE.
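A sketch of the lookup-table join in SQLite, with made-up codes and the simplified name/value columns used above. The inner join drops the NULL code and the unknown code 'Q'. If unknown codes should instead show up as 'bad', as the ELSE branch of the original CASE did, a LEFT JOIN with COALESCE gives that behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CATEGORY_TABLE (name TEXT, value TEXT);
    CREATE TABLE TRANSACTION_TABLE (id INTEGER, column1 TEXT);
    INSERT INTO CATEGORY_TABLE VALUES ('A', 'aaa'), ('B', 'bbb'), ('C', 'ccc');
    INSERT INTO TRANSACTION_TABLE VALUES (1, 'A'), (2, 'C'), (3, NULL), (4, 'Q');
""")

# Inner join: rows with a NULL or unknown code are dropped.
inner = conn.execute("""
    SELECT t1.id, t2.value AS COLUMN1
    FROM TRANSACTION_TABLE t1
    INNER JOIN CATEGORY_TABLE t2 ON t1.column1 = t2.name
    ORDER BY t1.id
""").fetchall()

# Left join + COALESCE: unknown codes map to 'bad', like the CASE's ELSE.
with_default = conn.execute("""
    SELECT t1.id, COALESCE(t2.value, 'bad') AS COLUMN1
    FROM TRANSACTION_TABLE t1
    LEFT JOIN CATEGORY_TABLE t2 ON t1.column1 = t2.name
    WHERE t1.column1 IS NOT NULL AND t1.column1 <> ' '
    ORDER BY t1.id
""").fetchall()

print(inner)         # [(1, 'aaa'), (2, 'ccc')]
print(with_default)  # [(1, 'aaa'), (2, 'ccc'), (4, 'bad')]
```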

We Keep Coding
