I have two tables, A and B, and I need to perform a left join on them with multiple cases in the ON condition.
Is there an efficient way of doing this in BigQuery or SQL?
select * from table_A A
left join table_B B
on
-- case1
(A.column1 = B.column1
and A.column2 = B.column2
and A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5)
OR
-- case2
(A.column1 = B.column1
and A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5)
OR
-- case3
(A.column1 = B.column1
and A.column2 = B.column2
and A.column4 = B.column4)
OR
-- case4
(A.column1 = B.column1
and A.column3 = B.column3
and A.column5 = B.column5)
where [some condition OR some condition]
My main goal is that, for a given row, if case1 matches, the other cases are not checked. If case1 does not match, it should check case2, then case3, and so on, so that each row gets the best possible match.
The cases are meant to get a 100% join between tables A and B, i.e. every row of A should find a match in B.
In the first case we check all 5 columns of both tables, but if some of those columns are null, it should fall back to the next case, and so on.
If I understand correctly, the general approach in SQL is multiple left joins:
select A.*, coalesce(B1.col, B2.col, B3.col, B4.col) as col -- col = whichever table_B column(s) you need
from table_A A left join
table_B B1
on A.column1 = B1.column1 and
A.column2 = B1.column2 and
A.column3 = B1.column3 and
A.column4 = B1.column4 and
A.column5 = B1.column5 left join
table_B B2
on B1.column1 is null and
A.column1 = B2.column1 and
A.column3 = B2.column3 and
A.column4 = B2.column4 and
A.column5 = B2.column5 left join
table_B B3
on B1.column1 is null and B2.column1 is null and
A.column1 = B3.column1 and
A.column2 = B3.column2 and
A.column4 = B3.column4 left join
table_B B4
on B1.column1 is null and B2.column1 is null and B3.column1 is null and
A.column1 = B4.column1 and
A.column3 = B4.column3 and
A.column5 = B4.column5
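A small, runnable way to check the chained LEFT JOIN fallback is Python's built-in sqlite3 module. The sketch below uses hypothetical sample data (the `id` and `val` columns are assumptions, not from the question). Each later join is gated on all earlier joins having found nothing, and each one uses exactly the columns of its case:

```python
import sqlite3

# Hypothetical sample data; "id" and "val" are illustrative columns only.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (column1, column2, column3, column4, column5, val TEXT);
INSERT INTO table_A VALUES
  (1, 'k', 2,    3,    4,    5),     -- matches case 1
  (2, 'k', NULL, 3,    4,    6),     -- column2 is NULL: falls back to case 2
  (3, 'k', 7,    NULL, 8,    NULL),  -- falls back to case 3
  (4, 'z', 2,    3,    4,    5),     -- no match in any case
  (5, 'k', NULL, 3,    NULL, 5);     -- falls back to case 4
INSERT INTO table_B VALUES
  ('k', 2, 3, 4, 5, 'full'),
  ('k', 9, 3, 4, 6, 'case2'),
  ('k', 7, 0, 8, 0, 'case3');
"""

# Each later join only fires when every earlier join found nothing.
QUERY = """
SELECT A.id, COALESCE(B1.val, B2.val, B3.val, B4.val) AS val
FROM table_A A
LEFT JOIN table_B B1
  ON A.column1 = B1.column1 AND A.column2 = B1.column2 AND A.column3 = B1.column3
 AND A.column4 = B1.column4 AND A.column5 = B1.column5
LEFT JOIN table_B B2
  ON B1.column1 IS NULL
 AND A.column1 = B2.column1 AND A.column3 = B2.column3
 AND A.column4 = B2.column4 AND A.column5 = B2.column5
LEFT JOIN table_B B3
  ON B1.column1 IS NULL AND B2.column1 IS NULL
 AND A.column1 = B3.column1 AND A.column2 = B3.column2 AND A.column4 = B3.column4
LEFT JOIN table_B B4
  ON B1.column1 IS NULL AND B2.column1 IS NULL AND B3.column1 IS NULL
 AND A.column1 = B4.column1 AND A.column3 = B4.column3 AND A.column5 = B4.column5
ORDER BY A.id
"""


def run_fallback():
    """Build the in-memory tables and return (id, val) for every row of table_A."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_fallback())
```

Row 4 has no match in any case, so its `val` stays NULL, which is exactly what the LEFT JOINs preserve.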
You want to get the "best" matching B rows. I.e. if there are rows matching case 1, you want to stick with these, but if there are none, then you want to try with case 2, etc.
What you can do is combine the conditions, so as to join all possible matches first. Then look at the matches and dismiss all except the best ones. Ranking can be done with RANK.
select *
from
(
select
*,
rank() over (partition by A.id
order by
case when A.column2 = B.column2
and A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5 then 1
when A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5 then 2
when A.column2 = B.column2
and A.column4 = B.column4 then 3
else 4
end) as rnk
from table_A A
left join table_B B
on A.column1 = B.column1
and
(
(A.column2 = B.column2 and A.column4 = B.column4)
or
(A.column3 = B.column3 and A.column5 = B.column5)
)
where [some condition OR some condition]
) ranked
where rnk = 1;
(My query assumes some ID in table_A. If your table doesn't have a unique ID, use whatever column(s) uniquely identify a row in the table.)
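The RANK approach can be sketched in SQLite with the same kind of hypothetical data (window functions need SQLite 3.25 or newer, which is an assumption about the Python build). Here `A.id` plays the role of the unique ID mentioned above, and only `A.id` and a hypothetical `B.val` column are selected to keep the output small:

```python
import sqlite3

# Hypothetical sample data; "id" and "val" are illustrative columns only.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (column1, column2, column3, column4, column5, val TEXT);
INSERT INTO table_A VALUES
  (1, 'k', 2, 3, 4, 5), (2, 'k', NULL, 3, 4, 6), (3, 'k', 7, NULL, 8, NULL),
  (4, 'z', 2, 3, 4, 5), (5, 'k', NULL, 3, NULL, 5);
INSERT INTO table_B VALUES
  ('k', 2, 3, 4, 5, 'full'), ('k', 9, 3, 4, 6, 'case2'), ('k', 7, 0, 8, 0, 'case3');
"""

# Join every candidate that satisfies at least one case, then keep rank 1 per A row.
QUERY = """
SELECT id, val
FROM (
  SELECT A.id, B.val,
         RANK() OVER (PARTITION BY A.id ORDER BY
           CASE WHEN A.column2 = B.column2 AND A.column3 = B.column3
                 AND A.column4 = B.column4 AND A.column5 = B.column5 THEN 1
                WHEN A.column3 = B.column3 AND A.column4 = B.column4
                 AND A.column5 = B.column5 THEN 2
                WHEN A.column2 = B.column2 AND A.column4 = B.column4 THEN 3
                ELSE 4
           END) AS rnk
  FROM table_A A
  LEFT JOIN table_B B
    ON A.column1 = B.column1
   AND ((A.column2 = B.column2 AND A.column4 = B.column4)
     OR (A.column3 = B.column3 AND A.column5 = B.column5))
) ranked
WHERE rnk = 1
ORDER BY id
"""


def run_ranked():
    """Return (id, val) of the best-ranked table_B match for each table_A row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_ranked())
```

Because RANK gives tied candidates the same rank, a row of A can come back more than once if several B rows match at the same best level. ROW_NUMBER would keep exactly one of them arbitrarily.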
One solution is to use temporary storage (a temp table, a cursor, or similar) and fill it with a parameterized loop. The problem is that pure SQL has no loops, so you have to use BigQuery's scripting language; take a look here: https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting
Below are two options I see, both for BigQuery Standard SQL (thanks to @Thorsten-Kettner for helping me understand the OP's logic/requirements).
Option 1: a separate join for each case; then combine them all and finally pick the winner for each record in A
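#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
FROM (
-- Every branch selects the same columns so UNION ALL lines up. Each branch is a LEFT JOIN,
-- so a row without a match for that case gets priority 5 and cannot beat a real match.
SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
IF(B.column1 IS NULL, 5, 1) priority, FORMAT('%t', A) identity
FROM table_A A LEFT JOIN table_B B -- Case 1
ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column3 = B.column3
AND A.column4 = B.column4 AND A.column5 = B.column5
WHERE [SOME condition OR SOME condition]
UNION ALL
SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
IF(B.column1 IS NULL, 5, 2) priority, FORMAT('%t', A) identity
FROM table_A A LEFT JOIN table_B B -- Case 2
ON A.column1 = B.column1 AND A.column3 = B.column3
AND A.column4 = B.column4 AND A.column5 = B.column5
WHERE [SOME condition OR SOME condition]
UNION ALL
SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
IF(B.column1 IS NULL, 5, 3) priority, FORMAT('%t', A) identity
FROM table_A A LEFT JOIN table_B B -- Case 3
ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column4 = B.column4
WHERE [SOME condition OR SOME condition]
UNION ALL
SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
IF(B.column1 IS NULL, 5, 4) priority, FORMAT('%t', A) identity
FROM table_A A LEFT JOIN table_B B -- Case 4
ON A.column1 = B.column1 AND A.column3 = B.column3 AND A.column5 = B.column5
WHERE [SOME condition OR SOME condition]
) t
GROUP BY identity
)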
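SQLite has no `ARRAY_AGG(... LIMIT 1)`, so this sketch swaps in `ROW_NUMBER()` to keep the lowest-priority candidate per row of A. The union of one LEFT JOIN per case is the same idea. Because each branch is a LEFT JOIN, a branch with no match still emits a row for that A row, so the sketch gives such rows priority 5 to stop them from beating a real match. Table contents and the `id`/`val` columns are hypothetical, and window functions assume SQLite 3.25 or newer:

```python
import sqlite3

# Hypothetical sample data; "id" and "val" are illustrative columns only.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (column1, column2, column3, column4, column5, val TEXT);
INSERT INTO table_A VALUES
  (1, 'k', 2, 3, 4, 5), (2, 'k', NULL, 3, 4, 6), (3, 'k', 7, NULL, 8, NULL),
  (4, 'z', 2, 3, 4, 5), (5, 'k', NULL, 3, NULL, 5);
INSERT INTO table_B VALUES
  ('k', 2, 3, 4, 5, 'full'), ('k', 9, 3, 4, 6, 'case2'), ('k', 7, 0, 8, 0, 'case3');
"""

# One LEFT JOIN per case; unmatched rows get priority 5 so they only win as a last resort.
QUERY = """
WITH candidates AS (
  SELECT A.id, B.val, CASE WHEN B.column1 IS NULL THEN 5 ELSE 1 END AS priority
  FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column3 = B.column3
   AND A.column4 = B.column4 AND A.column5 = B.column5
  UNION ALL
  SELECT A.id, B.val, CASE WHEN B.column1 IS NULL THEN 5 ELSE 2 END
  FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column3 = B.column3
   AND A.column4 = B.column4 AND A.column5 = B.column5
  UNION ALL
  SELECT A.id, B.val, CASE WHEN B.column1 IS NULL THEN 5 ELSE 3 END
  FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column4 = B.column4
  UNION ALL
  SELECT A.id, B.val, CASE WHEN B.column1 IS NULL THEN 5 ELSE 4 END
  FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column3 = B.column3 AND A.column5 = B.column5
),
ranked AS (
  SELECT id, val, ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
  FROM candidates
)
SELECT id, val FROM ranked WHERE rn = 1 ORDER BY id
"""


def run_union_priority():
    """Return (id, val) of the winning candidate for each table_A row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_union_priority())
```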
Option 2: pick all potential candidates in one query, calculating on the fly which case each entry belongs to, and finally pick the winner for each row in A
#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
FROM (
SELECT A.*,
B.* EXCEPT(column1,column2,column3,column4,column5),
FORMAT('%t', A) identity,
CASE
WHEN (A.column1,A.column2,A.column3,A.column4,A.column5) = (B.column1,B.column2,B.column3,B.column4,B.column5) THEN 1
WHEN (A.column1,A.column3,A.column4,A.column5) = (B.column1,B.column3,B.column4,B.column5) THEN 2
WHEN (A.column1,A.column2,A.column4) = (B.column1,B.column2,B.column4) THEN 3
WHEN (A.column1,A.column3,A.column5) = (B.column1,B.column3,B.column5) THEN 4
ELSE 5
END AS priority
FROM table_A A LEFT JOIN table_B B
-- every case requires column1 to match, so this equality is enough to find all candidates
ON A.column1 = B.column1
WHERE [SOME condition OR SOME condition]
) t
WHERE priority < 5
GROUP BY identity
)
Note: the two versions above are similar and different at the same time; which one to pick is a matter of preference. Also note that the above is untested and was written on the fly, so it might need additional tuning, but most likely not :o)
I have a database with many tables and I would like to create two columns from the same data with different filtering for each. Specifically, I have the following SQL query:
select count(*), A.Column1
from Table1 as A
join Table2 as B
on A.Column2 = B.Column2
where B.Column3 in (
select C.Column3
from Table3 as C
where (C.Column4=9 or C.Column4=4))
group by A.Column1
This creates a table with 2 columns. I would like a 3rd column (another count(*)) which only differs in that there will be a 3rd qualifier in the where clause. I would also like to create a column which computes the ratio of these 2 count columns.
Can this be done in SQL or must I get the data into R or Python and do the calculations there?
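If you want the ratio of the two counts, you can compute it directly: the average of a 1/0 flag is the number of rows that meet the condition divided by the total number of rows.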
select count(*), A.Column1,
sum(case when ? then 1 else 0 end) as count2,
avg(case when ? then 1.0 else 0 end) as ratio
from Table1 A join
Table2 B
on A.Column2 = B.Column2
where B.Column3 in (select C.Column3
from Table3 C
where C.Column4 in (4, 9)
)
group by A.Column1;
The ? is for the condition that you care about.
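A minimal sqlite3 sketch of this conditional aggregation, with hypothetical data and a hypothetical `B.flag = 1` standing in for the `?` condition. SUM of a 1/0 flag gives the second count, and AVG of the same flag gives the ratio of the second count to the first:

```python
import sqlite3

# Hypothetical data; B.flag = 1 stands in for the "?" condition.
SETUP = """
CREATE TABLE Table1 (Column1 TEXT, Column2 INTEGER);
CREATE TABLE Table2 (Column2 INTEGER, Column3 TEXT, flag INTEGER);
CREATE TABLE Table3 (Column3 TEXT, Column4 INTEGER);
INSERT INTO Table1 VALUES ('g1', 1), ('g1', 2), ('g1', 3), ('g2', 4);
INSERT INTO Table2 VALUES (1, 'a', 1), (2, 'a', 0), (3, 'b', 1), (4, 'a', 1);
INSERT INTO Table3 VALUES ('a', 4), ('b', 7);  -- 'b' fails the Column4 IN (4, 9) filter
"""

QUERY = """
SELECT A.Column1,
       COUNT(*) AS count1,
       SUM(CASE WHEN B.flag = 1 THEN 1 ELSE 0 END) AS count2,
       AVG(CASE WHEN B.flag = 1 THEN 1.0 ELSE 0 END) AS ratio
FROM Table1 A
JOIN Table2 B ON A.Column2 = B.Column2
WHERE B.Column3 IN (SELECT C.Column3 FROM Table3 C WHERE C.Column4 IN (4, 9))
GROUP BY A.Column1
ORDER BY A.Column1
"""


def run_ratio():
    """Return (Column1, count1, count2, ratio) per group."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_ratio())
```

Using `1.0` in the CASE keeps the average fractional in databases that would otherwise do integer arithmetic.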
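Change the CASE WHEN expressions to match the criteria of your second filter. Because the conditions reference Table3, it is joined instead of being used only in a subquery:
select count(case when C.Column4 = 9 or C.Column4 = 4 then 1 else null end) as Count1
, count(case when C.Column4 = 9 or C.Column4 = 22 then 1 else null end) as Count2
, A.Column1
from Table1 as A
join Table2 as B
on A.Column2 = B.Column2
join Table3 as C -- if Column3 is not unique in Table3, a row is counted once per matching Table3 row
on B.Column3 = C.Column3
where C.Column4 in (4, 9, 22)
group by A.Column1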
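The following sqlite3 sketch runs the two conditional counts over hypothetical data, with Table3 joined so that `C.Column4` is visible to the CASE expressions. It also derives their ratio (an addition for illustration), with `NULLIF` guarding against a zero Count1:

```python
import sqlite3

# Hypothetical data for the three tables.
SETUP = """
CREATE TABLE Table1 (Column1 TEXT, Column2 INTEGER);
CREATE TABLE Table2 (Column2 INTEGER, Column3 TEXT);
CREATE TABLE Table3 (Column3 TEXT, Column4 INTEGER);
INSERT INTO Table1 VALUES ('g1', 1), ('g1', 2), ('g1', 3), ('g1', 4), ('g2', 5);
INSERT INTO Table2 VALUES (1, 'a'), (2, 'a'), (3, 'b'), (4, 'b'), (5, 'c');
INSERT INTO Table3 VALUES ('a', 4), ('b', 9), ('c', 22);
"""

# COUNT ignores NULLs, so each CASE without ELSE counts only the rows meeting its filter.
QUERY = """
SELECT A.Column1,
       COUNT(CASE WHEN C.Column4 = 9 OR C.Column4 = 4 THEN 1 END) AS Count1,
       COUNT(CASE WHEN C.Column4 = 9 OR C.Column4 = 22 THEN 1 END) AS Count2,
       1.0 * COUNT(CASE WHEN C.Column4 = 9 OR C.Column4 = 22 THEN 1 END)
           / NULLIF(COUNT(CASE WHEN C.Column4 = 9 OR C.Column4 = 4 THEN 1 END), 0) AS ratio
FROM Table1 A
JOIN Table2 B ON A.Column2 = B.Column2
JOIN Table3 C ON B.Column3 = C.Column3
WHERE C.Column4 IN (4, 9, 22)
GROUP BY A.Column1
ORDER BY A.Column1
"""


def run_counts():
    """Return (Column1, Count1, Count2, ratio) per group."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_counts())
```

Group g2 has no rows for the first filter, so its ratio comes back NULL instead of raising a division-by-zero error.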
The setting is simple: I want to retrieve all rows from table A that are not present in table B. Because a row is uniquely identified by 4 columns, I need a way to write the WHERE clause so that it works correctly.
My solution is to concatenate the 4 columns and use the result as a single column/key for the comparison:
select *
from table_A
where filter_condition = 0
and (column1 || column2 || column3 || column4) not in (
select A.column1 || A.column2 || A.column3 || A.column4
from table_A A -- 1618727
inner join table_B B
on A.column1 = B.column1
and A.column2 = B.column2
and A.column3 = B.column3
and A.column4 = B.column4
and filter_condition = 0
)
My question is, is this a good way of doing this or am I doing something fundamentally wrong?
To be clear, the desired result is simply the rows of table_A that I "lose" due to the INNER JOIN between table_A and table_B.
You seem to be looking for not exists:
select a.*
from table_a a
where a.filter_condition = 0
and not exists (
select 1
from table_b b
where
a.column1 = b.column1
and a.column2 = b.column2
and a.column3 = b.column3
and a.column4 = b.column4
)
This will give you all records in table_a that do not have a corresponding record in table_b.
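To see why the concatenated key is fragile, this sqlite3 sketch runs the question's concatenation-based NOT IN next to NOT EXISTS on hypothetical rows. Two rows go missing from the concatenation version: ('a', 'bc', ...) concatenates to the same string as the matched ('ab', 'c', ...), and in SQLite `||` with a NULL operand yields NULL, so the NOT IN test is unknown for that row:

```python
import sqlite3

# Hypothetical data chosen to expose two weaknesses of the concatenated key.
SETUP = """
CREATE TABLE table_A (column1, column2, column3, column4, filter_condition);
CREATE TABLE table_B (column1, column2, column3, column4);
INSERT INTO table_A VALUES
  ('ab', 'c',  'x', 'y', 0),  -- has a real match in table_B
  ('a',  'bc', 'x', 'y', 0),  -- no match, but also concatenates to 'abcxy'
  ('n',  NULL, 'x', 'y', 0),  -- no match, but its concatenation is NULL
  ('q',  'r',  's', 't', 0);  -- no match
INSERT INTO table_B VALUES ('ab', 'c', 'x', 'y');
"""

# The approach from the question (filter_condition qualified to avoid ambiguity).
CONCAT_NOT_IN = """
SELECT column1, column2, column3, column4
FROM table_A
WHERE filter_condition = 0
  AND (column1 || column2 || column3 || column4) NOT IN (
    SELECT A.column1 || A.column2 || A.column3 || A.column4
    FROM table_A A
    INNER JOIN table_B B
      ON A.column1 = B.column1 AND A.column2 = B.column2
     AND A.column3 = B.column3 AND A.column4 = B.column4
     AND A.filter_condition = 0)
ORDER BY column1
"""

# NOT EXISTS compares each column separately.
NOT_EXISTS = """
SELECT a.column1, a.column2, a.column3, a.column4
FROM table_A a
WHERE a.filter_condition = 0
  AND NOT EXISTS (
    SELECT 1 FROM table_B b
    WHERE a.column1 = b.column1 AND a.column2 = b.column2
      AND a.column3 = b.column3 AND a.column4 = b.column4)
ORDER BY a.column1
"""


def run(query):
    """Run one of the queries above against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(CONCAT_NOT_IN))
    print(run(NOT_EXISTS))
```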
Using a LEFT JOIN between A and B and checking for a NULL row in B is probably easier:
SELECT *
FROM table_A A
LEFT JOIN table_B B ON A.column1 = B.column1
AND A.column2 = B.column2
AND A.column3 = B.column3
AND A.column4 = B.column4
WHERE B.column1 IS NULL
AND A.filter_condition = 0
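A quick sqlite3 check, on hypothetical rows, that the LEFT JOIN / IS NULL form returns the same rows as NOT EXISTS, even when table_B contains a duplicate match. Testing `B.column1` works because it is a join key, so it is never NULL on a matched row:

```python
import sqlite3

# Hypothetical data: 'k1' is matched twice, 'k2' is unmatched, 'k3' is filtered out.
SETUP = """
CREATE TABLE table_A (column1, column2, column3, column4, filter_condition);
CREATE TABLE table_B (column1, column2, column3, column4);
INSERT INTO table_A VALUES
  ('k1', 'a', 'b', 'c', 0),
  ('k2', 'a', 'b', 'c', 0),
  ('k3', 'a', 'b', 'c', 1);
INSERT INTO table_B VALUES ('k1', 'a', 'b', 'c'), ('k1', 'a', 'b', 'c');
"""

# Unmatched A rows survive the LEFT JOIN with all B columns NULL.
LEFT_JOIN = """
SELECT A.column1, A.column2, A.column3, A.column4
FROM table_A A
LEFT JOIN table_B B ON A.column1 = B.column1 AND A.column2 = B.column2
                   AND A.column3 = B.column3 AND A.column4 = B.column4
WHERE B.column1 IS NULL
  AND A.filter_condition = 0
ORDER BY A.column1
"""

NOT_EXISTS = """
SELECT a.column1, a.column2, a.column3, a.column4
FROM table_A a
WHERE a.filter_condition = 0
  AND NOT EXISTS (SELECT 1 FROM table_B b
                  WHERE a.column1 = b.column1 AND a.column2 = b.column2
                    AND a.column3 = b.column3 AND a.column4 = b.column4)
ORDER BY a.column1
"""


def run(query):
    """Run one of the queries above against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(LEFT_JOIN), run(NOT_EXISTS))
```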
You should be able to use tuples (aka row constructors) in PostgreSQL:
select *
from table_a
where filter_condition = 0
and (column1, column2, column3, column4) not in
(
select column1, column2, column3, column4
from table_b
);
If the columns can be null, NOT EXISTS is the better choice, because null = null evaluates to "unknown" rather than to true or false.
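The NULL pitfall is easiest to see with a single column, a simplification of the tuple query. Once the subquery returns a NULL, `x NOT IN (...)` can never be true, so the result is empty, while NOT EXISTS still returns the unmatched rows. A sqlite3 sketch with hypothetical data:

```python
import sqlite3

# Hypothetical data: table_b contains one NULL key.
SETUP = """
CREATE TABLE table_a (column1 INTEGER, filter_condition INTEGER);
CREATE TABLE table_b (column1 INTEGER);
INSERT INTO table_a VALUES (1, 0), (2, 0), (3, 0);
INSERT INTO table_b VALUES (1), (NULL);
"""

# 2 NOT IN (1, NULL) is unknown, not true, so no row passes.
NOT_IN = """
SELECT column1 FROM table_a
WHERE filter_condition = 0
  AND column1 NOT IN (SELECT column1 FROM table_b)
ORDER BY column1
"""

NOT_EXISTS = """
SELECT a.column1 FROM table_a a
WHERE a.filter_condition = 0
  AND NOT EXISTS (SELECT 1 FROM table_b b WHERE b.column1 = a.column1)
ORDER BY a.column1
"""


def run(query):
    """Run one of the queries above against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run(NOT_IN), run(NOT_EXISTS))
```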
I am trying to write a condition where, for a given ID, when either of two values from two different tables is greater than a number, a row with both values is displayed. Otherwise, no row should be displayed. What is the correct syntax for this?
if(select
a.Column1 > 2 or
b.Column2 > 2
from
Table1 a join Table2 b on a.ID = b.ID)
begin
select
a.Column1,
b.Column2
from
Table1 a join Table2 b on a.ID = b.ID
end
else
begin
-- don't select anything
end
You just need to add it as a WHERE condition. If the condition fails for a given row, that row is not selected.
select
a.Column1,
b.Column2
from
Table1 a join Table2 b on a.ID = b.ID
where a.column1 > 2 or b.column2 > 2
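A small sqlite3 sketch of this WHERE-clause version with hypothetical rows (`a.ID` is added to the output only to make the result easy to read). Only the IDs where at least one of the two values exceeds 2 are returned:

```python
import sqlite3

QUERY = """
SELECT a.ID, a.Column1, b.Column2
FROM Table1 a JOIN Table2 b ON a.ID = b.ID
WHERE a.Column1 > 2 OR b.Column2 > 2
ORDER BY a.ID
"""


def run(t1_rows, t2_rows):
    """Load (ID, Column1) and (ID, Column2) rows, then apply the per-row filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER, Column1 INTEGER)")
    conn.execute("CREATE TABLE Table2 (ID INTEGER, Column2 INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?)", t2_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    # ID 3 has Column1 = 1 and Column2 = 2, so it is filtered out.
    print(run([(1, 5), (2, 0), (3, 1)], [(1, 0), (2, 3), (3, 2)]))
```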
@vkp's answer is probably what you want, but the literal translation of the query you have written, without using control-flow statements, is this:
select
a.Column1,
b.Column2
from
Table1 a join Table2 b on a.ID = b.ID
where exists (select 1 from Table1 c join Table2 d on c.ID = d.ID where c.Column1 > 2 or d.Column2 > 2);
This returns either nothing at all (if no record in the join has Table1.Column1 > 2 or Table2.Column2 > 2) or all records (if at least one record does).
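In the EXISTS version, the filter does not look at the current row, so the result is all-or-nothing. The sqlite3 sketch below runs it against two hypothetical datasets, one with a qualifying row and one without (`a.ID` is added to the output for readability):

```python
import sqlite3

# The EXISTS subquery is uncorrelated: it is either true for every row or for none.
QUERY = """
SELECT a.ID, a.Column1, b.Column2
FROM Table1 a JOIN Table2 b ON a.ID = b.ID
WHERE EXISTS (SELECT 1 FROM Table1 c JOIN Table2 d ON c.ID = d.ID
              WHERE c.Column1 > 2 OR d.Column2 > 2)
ORDER BY a.ID
"""


def run(t1_rows, t2_rows):
    """Load (ID, Column1) and (ID, Column2) rows, then apply the EXISTS filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (ID INTEGER, Column1 INTEGER)")
    conn.execute("CREATE TABLE Table2 (ID INTEGER, Column2 INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?, ?)", t1_rows)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?)", t2_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run([(1, 5), (2, 0), (3, 1)], [(1, 0), (2, 3), (3, 2)]))  # every row
    print(run([(1, 1)], [(1, 1)]))  # nothing
```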
Update #A set Column1 = minC
from (select Ab.Column2, min(C.Column1) as minC
from #A Ab
inner join B on Ab.Column2 = B.Column2
inner join C on C.column2 = B.Column2 --No need to add again the A.col2 = B.col2
inner join D on D.column1 = B.column2
group by Ab.Column2) Grouped where Column2 = Grouped.Column2
and
Update #A set Column1 = minC
from (select Ab.Column2, min(C.Column1) as minC, B.column2 as tempcolumn
from #A Ab
inner join B on Ab.Column2 = B.Column2
inner join C on C.column2 = B.Column2 --No need to add again the A.col2=B.col2
group by Ab.Column2) Grouped
inner join D on D.column1 = Grouped.tempcolumn
where Column2 = Grouped.Column2
Is there any difference between the results of the two queries?