How to join two tables to get the following result? - sql

I'd like to join two tables.
TABLE_A
GROUP0  GROUP1  SUM_A
---------------------
01      A       100
01      B       200
04      D       700
TABLE_B
GROUP0  GROUP1  SUM_B
---------------------
01              300
01      A       350
02      B       400
03      C       500
How to join the tables to get the following result?
GROUP0  GROUP1  SUM_A  SUM_B
----------------------------
01              0      300
01      A       100    350
01      B       200    0
02      B       0      400
03      C       0      500
04      D       700    0

You want every row from the second table, plus the rows from the first table that either match it or have a group0 that is missing from it.
I think this is the join logic:
select coalesce(t1.group0, t2.group0) as group0,
coalesce(t1.group1, t2.group1) as group1,
t1.sum_a, t2.sum_b
from table1 t1 full outer join
table2 t2
on t1.group0 = t2.group0
where (t2.group0 is not null and (t1.group1 = t2.group1 or t1.group0 is null)) or
t2.group0 is null;
This logic is easier with union all:
select t2.group0, t2.group1, t1.sum_a, t2.sum_b
from table2 t2 left join
table1 t1
on t2.group0 = t1.group0 and t2.group1 = t1.group1
union all
select t1.group0, t1.group1, t1.sum_a, 0
from table1 t1
where not exists (select 1 from table2 t2 where t2.group0 = t1.group0);
EDIT:
The modified question is quite different from the original. That is a simple full outer join:
select coalesce(t1.group0, t2.group0) as group0,
coalesce(t1.group1, t2.group1) as group1,
coalesce(t1.sum_a, 0) as sum_a, coalesce(t2.sum_b, 0) as sum_b
from table1 t1 full outer join
table2 t2
on t1.group0 = t2.group0 and t1.group1 = t2.group1;
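For engines without FULL OUTER JOIN, the same result can be approximated with a LEFT JOIN plus an anti-join under UNION ALL. The following is only a sketch, run through Python's built-in sqlite3 module so it can be executed as-is. The table names table1/table2 follow the answer, and treating the blank GROUP1 in TABLE_B as NULL is an assumption.

```python
import sqlite3

# Sample data from the question; the blank GROUP1 is assumed to be NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (group0 TEXT, group1 TEXT, sum_a INTEGER);
    CREATE TABLE table2 (group0 TEXT, group1 TEXT, sum_b INTEGER);
    INSERT INTO table1 VALUES ('01','A',100), ('01','B',200), ('04','D',700);
    INSERT INTO table2 VALUES ('01',NULL,300), ('01','A',350),
                              ('02','B',400), ('03','C',500);
""")

FULL_JOIN_EMULATION = """
    -- every table1 row, with its table2 partner if one exists
    SELECT t1.group0, t1.group1, t1.sum_a, COALESCE(t2.sum_b, 0) AS sum_b
    FROM table1 t1
    LEFT JOIN table2 t2
      ON t1.group0 = t2.group0 AND t1.group1 = t2.group1
    UNION ALL
    -- table2 rows that have no partner in table1
    SELECT t2.group0, t2.group1, 0, t2.sum_b
    FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1
                      WHERE t1.group0 = t2.group0
                        AND t1.group1 = t2.group1)
    ORDER BY 1, 2
"""

rows = conn.execute(FULL_JOIN_EMULATION).fetchall()
for row in rows:
    print(row)
```

Because a NULL group1 never compares equal to anything, the ('01', NULL, 300) row falls into the second branch and is kept as its own output row.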

Related

In Bigquery: How to pick max(date) row while performing full outer join in case of duplicates?

I'm performing a full outer join to combine two tables in BigQuery in order to get all rows and columns from both tables.
select distinct t1.Org,t1.begindate,t1.enddate,<fetch unit based on enddate> as f_Unit
from table1 t1
full outer join table2 t2
on t1.Org = t2.Org
The problem is that both tables have some rows with the same values in all columns except the enddate and Unit columns.
table1
Org  Store  Product  begindate   enddate     FalUnit
01   12     xx       2020-04-16  9999-12-31  5
01   13     yy       2011-03-23  null        0
table2
Org  Store  Product  begindate  enddate  Unit
01   12     xx       null       null     1
01   14     zz       null       null     3
In that case I have to pick the row with max(enddate) and its corresponding Unit as well.
Output_Table
Org  Store  Product  begindate   enddate     FalUnit  Unit  f_Unit
01   12     xx       2020-04-16  9999-12-31  5        null  5
01   13     yy       2011-03-23  null        0        null  0
01   14     zz       null        null        null     3     3
How can I include this condition in the query, or is there another approach that does not use joins?
Any help solving this would be appreciated.
Hmmm . . . I am thinking a prioritization. Something like this:
select t1.*
from table1 t1
union all
select t2.*
from table2 t2
where not exists (select 1
from table1 t1
where t1.org = t2.org and t1.store = t2.store and t1.product = t2.product
);
At the very least, this will return your specified results for the specified data in the question.
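To check the prioritization idea against the question's data, here is a small sketch using Python's built-in sqlite3 module rather than BigQuery. The lowercase column names and storing dates as text are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (org TEXT, store TEXT, product TEXT,
                         begindate TEXT, enddate TEXT, falunit INTEGER);
    CREATE TABLE table2 (org TEXT, store TEXT, product TEXT,
                         begindate TEXT, enddate TEXT, unit INTEGER);
    INSERT INTO table1 VALUES
        ('01','12','xx','2020-04-16','9999-12-31',5),
        ('01','13','yy','2011-03-23',NULL,0);
    INSERT INTO table2 VALUES
        ('01','12','xx',NULL,NULL,1),
        ('01','14','zz',NULL,NULL,3);
""")

# table1 rows take priority; table2 rows are added only when
# table1 has no row for the same org/store/product.
rows = conn.execute("""
    SELECT t1.* FROM table1 t1
    UNION ALL
    SELECT t2.* FROM table2 t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 t1
                      WHERE t1.org = t2.org
                        AND t1.store = t2.store
                        AND t1.product = t2.product)
    ORDER BY 1, 2, 3
""").fetchall()
for row in rows:
    print(row)
```

The last column then holds FalUnit for rows that came from table1 and Unit for rows that came only from table2, which lines up with the f_Unit column of the desired output.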

Multiple left outer joins on Hive

In Hive, I have two tables as shown below:
SELECT * FROM p_test;
OK
p_test.id p_test.age
01 1
02 2
01 10
02 11
Time taken: 0.07 seconds, Fetched: 4 row(s)
SELECT * FROM p_test2;
OK
p_test2.id p_test2.height
02 172
01 170
Time taken: 0.053 seconds, Fetched: 2 row(s)
I need the age differences between rows of the same user in the p_test table, so I run the following HiveQL using the row_number function:
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t1
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t2
ON t2.id=t1.id AND t1.rn=(t2.rn+1)
LEFT JOIN
(SELECT * FROM p_test2) t_2
ON t_2.id = t1.id;
The result is:
t1.id t1.age t1.rn t2.id t2.age t2.rn t_2.id t_2.height
01 1 1 NULL NULL NULL 01 170
01 10 2 01 1 1 01 170
02 11 1 NULL NULL NULL 02 172
02 2 2 02 11 1 02 172
Time taken: 60.773 seconds, Fetched: 4 row(s)
All is fine so far. However, if I move the condition that joins t1 and t2 to the last line, as shown below:
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t1
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER(partition by id order by age asc) rn FROM p_test) t2
LEFT JOIN
(SELECT * FROM p_test2) t_2
ON t_2.id = t1.id
AND t2.id=t1.id AND t1.rn=(t2.rn+1);
I get the following unexpected result:
t1.id t1.age t1.rn t2.id t2.age t2.rn t_2.id t_2.height
01 1 1 01 1 1 NULL NULL
01 1 1 01 10 2 NULL NULL
01 1 1 02 11 1 NULL NULL
01 1 1 02 2 2 NULL NULL
01 10 2 01 1 1 01 170
01 10 2 01 10 2 NULL NULL
01 10 2 02 11 1 NULL NULL
01 10 2 02 2 2 NULL NULL
02 11 1 01 1 1 NULL NULL
02 11 1 01 10 2 NULL NULL
02 11 1 02 11 1 NULL NULL
02 11 1 02 2 2 NULL NULL
02 2 2 01 1 1 NULL NULL
02 2 2 01 10 2 NULL NULL
02 2 2 02 11 1 02 172
02 2 2 02 2 2 NULL NULL
It seems that the condition I moved to the last line no longer works. This has bothered me for a long time. I hope to get some helpful answers; thanks in advance to anyone who replies.
In your second query, the LEFT JOIN with t2 has no ON condition, so it is transformed into a CROSS JOIN. This is why you get duplication: subqueries t1 and t2 each have four rows, and the cross join pairs every t1 row with every t2 row regardless of id, giving 4x4=16 rows, which is what your output shows.
The ON condition does work, but it applies only to the last LEFT JOIN with the t_2 subquery. It is checked only to decide which t_2 rows to attach in that last join; it does not affect the first CROSS JOIN (the LEFT JOIN without an ON condition) at all.
Every join should have its own ON condition, except cross joins.
See also this answer about joins without ON condition behavior: https://stackoverflow.com/a/46843832/2700344
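The difference between the two query shapes can be reproduced outside Hive. This is a rough sketch using Python's built-in sqlite3 module (it needs an SQLite build with window functions). The ranked view is a hypothetical helper standing in for the row_number subquery, the first join of the second query is written as an explicit CROSS JOIN, and age is stored as an integer here, which is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p_test (id TEXT, age INTEGER);
    CREATE TABLE p_test2 (id TEXT, height INTEGER);
    INSERT INTO p_test VALUES ('01',1), ('02',2), ('01',10), ('02',11);
    INSERT INTO p_test2 VALUES ('02',172), ('01',170);
    -- hypothetical helper view replacing the repeated subquery
    CREATE VIEW ranked AS
        SELECT id, age,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY age) AS rn
        FROM p_test;
""")

# Each join carries its own ON condition (first query shape).
per_join_on = conn.execute("""
    SELECT t1.id, t1.age, t2.age AS prev_age, h.height
    FROM ranked t1
    LEFT JOIN ranked t2 ON t2.id = t1.id AND t1.rn = t2.rn + 1
    LEFT JOIN p_test2 h ON h.id = t1.id
""").fetchall()

# All conditions pushed onto the last join (second query shape).
single_on = conn.execute("""
    SELECT t1.id, t1.age, t2.age AS prev_age, h.height
    FROM ranked t1
    CROSS JOIN ranked t2
    LEFT JOIN p_test2 h
      ON h.id = t1.id AND t2.id = t1.id AND t1.rn = t2.rn + 1
""").fetchall()

print(len(per_join_on), len(single_on))
```

In the second shape the conditions on t2 only decide whether a height is attached, so every t1/t2 pair survives and only the pairs that satisfy all three conditions get a non-NULL height.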
BTW, you can do the same without the t2 join at all by using the lag or lead analytic functions to calculate values ordered by age.
Like this:
lag(age) over(partition by id order by age) -- to get the previous age
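A minimal sketch of the lag approach, applied to age because the question asks for age differences. It runs on Python's built-in sqlite3 module (an SQLite build with window functions is assumed) instead of Hive, and stores age as an integer, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p_test (id TEXT, age INTEGER);
    INSERT INTO p_test VALUES ('01',1), ('02',2), ('01',10), ('02',11);
""")

# LAG looks one row back within the same id, ordered by age,
# so no self-join is needed to compute the difference.
rows = conn.execute("""
    SELECT id, age,
           age - LAG(age) OVER (PARTITION BY id ORDER BY age) AS age_diff
    FROM p_test
    ORDER BY id, age
""").fetchall()
for row in rows:
    print(row)
```

The first row of each id has no previous row, so its difference comes back as NULL.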

If value in both table then assign 1

Table1
CompanyID  Location  #-of-employees
5234       NY        10
5268       DC        2
5879       NY        8
6897       KS        100
8789       CA        1
9992       OH        201
9877       TX        15
Table2
CompanyID  #-of-Shareholders
5234       5
5879       2
6897       4
8789       2
I have two tables with the column CompanyID. Table2 lists the companies that have shareholders, and Table1 lists all companies. In Table1 I want to add a dummy variable that is 1 if the CompanyID is in Table2 (which means the company has shareholders) and 0 if not.
Expected output:
Table1
CompanyID  Location  #-of-employees  Dummy
5234       NY        10              1
5268       DC        2               0
5879       NY        8               1
6897       KS        100             1
8789       CA        1               1
9992       OH        201             0
9877       TX        15              0
I tried using this query but it doesn't give me the output I expect.
SELECT CASE WHEN companyID IN table2 THEN 1
ELSE 0
END AS dummy
FROM table1
You have to use a subquery for this. The code below works:
SELECT CASE WHEN companyID in(select CompanyId from table2) THEN 1
ELSE 0
END AS dummy
FROM table1
You can use EXISTS:
SELECT CASE
WHEN EXISTS(SELECT 1 FROM Table2 AS T2 WHERE T1.CompanyID = T2.CompanyID) THEN 1
ELSE 0
END AS Dummy
FROM Table1 AS T1;
If you are on SQL Server 2012 or later, you can use IIF with a left join:
select t1.*, iif(#_of_Shareholders is null, 0, 1) as dummy
from table1 t1
left join table2 t2
on ( t1.CompanyID = t2.CompanyID );
Otherwise, use CASE:
select t1.*,
( case when #_of_Shareholders is null then 0 else 1 end )
as dummy
from table1 t1
left join table2 t2
on ( t1.CompanyID = t2.CompanyID );
Or use SIGN with COALESCE:
select t1.*,
sign(coalesce(#_of_Shareholders,0))
as dummy
from table1 t1
left join table2 t2
on ( t1.CompanyID = t2.CompanyID );
Rextester Demo
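For a quick check of the EXISTS variant against the question's data, here is a sketch run through Python's built-in sqlite3 module. The column names company_id, num_employees and num_shareholders are hypothetical stand-ins for the original headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (company_id INTEGER, location TEXT,
                         num_employees INTEGER);
    CREATE TABLE table2 (company_id INTEGER, num_shareholders INTEGER);
    INSERT INTO table1 VALUES
        (5234,'NY',10), (5268,'DC',2), (5879,'NY',8), (6897,'KS',100),
        (8789,'CA',1), (9992,'OH',201), (9877,'TX',15);
    INSERT INTO table2 VALUES (5234,5), (5879,2), (6897,4), (8789,2);
""")

# dummy is 1 when the company appears in table2, otherwise 0.
rows = conn.execute("""
    SELECT t1.company_id, t1.location, t1.num_employees,
           CASE WHEN EXISTS (SELECT 1 FROM table2 t2
                             WHERE t2.company_id = t1.company_id)
                THEN 1 ELSE 0 END AS dummy
    FROM table1 t1
    ORDER BY t1.company_id
""").fetchall()

dummy_by_company = {r[0]: r[3] for r in rows}
print(dummy_by_company)
```

Unlike the left-join variants, EXISTS cannot multiply table1 rows if a company were ever listed more than once in table2.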

Select Missing date only if it is Maximum from a table in Oracle

I have 2 tables. If table 1 has dates greater than those in table 2, only those records should be populated in the output.
Table 1:
ID Category Date
1 A 3/2/1990
1 A 3/5/2013
1 C 4/3/1979
2 D 4/3/1970
2 D 5/6/2016
3 E 8/8/2016
Table 2:
ID Category Date
1 A 3/2/1990
1 C 4/3/1979
1 C 4/3/1982
1 D 4/3/1982
2 D 5/6/2016
The expected Output is
ID Category Date
1 A 3/5/2013
3 E 8/8/2016
I tried the query below and it's giving me incorrect results.
select a.id,a.category,a.Date from table1 a where
a.Date > (select Max(b.Date) from table2 b where a.id=b.id and a.category =b.category group by b.id,b.category)
SQL Fiddle Demo
WITH cte AS (
SELECT ID, Category, MAX(Date) as mdate
FROM Table2
GROUP BY ID, Category
)
SELECT T1.* --, T2.*
FROM Table1 as T1
LEFT JOIN cte as T2
ON T1.ID = T2.ID
AND T1.Category = T2.Category
WHERE T1.Date > T2.mdate
OR T2.mdate is NULL
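SELECT T1.*
FROM Table1 AS T1 INNER JOIN
     (SELECT ID, Category, MAX(Date) AS mdate -- Table2 itself has no mdate column
      FROM Table2
      GROUP BY ID, Category) AS T2
ON T1.ID = T2.ID AND T1.Category = T2.Category
WHERE T1.Date > T2.mdate;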
As per the required output, you need to use a left outer join:
SELECT T1.*
FROM table1 T1
LEFT OUTER JOIN (
SELECT ID
,category
,MAX(Date) mdate
FROM Table2
GROUP BY ID
,category
) T2 ON (
T1.ID = T2.ID
AND T1.category = T2.category
)
WHERE T1.date > nvl(T2.mdate, DATE '1900-01-01');
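The left-join idea with a fallback for unmatched rows can be verified against the question's data. This is a sketch on Python's built-in sqlite3 module, not Oracle: the column is renamed to dt (hypothetical) and dates are stored as ISO text so that string comparison orders them correctly. The month/day/year reading of the original dates is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, category TEXT, dt TEXT);
    CREATE TABLE table2 (id INTEGER, category TEXT, dt TEXT);
    INSERT INTO table1 VALUES
        (1,'A','1990-03-02'), (1,'A','2013-03-05'), (1,'C','1979-04-03'),
        (2,'D','1970-04-03'), (2,'D','2016-05-06'), (3,'E','2016-08-08');
    INSERT INTO table2 VALUES
        (1,'A','1990-03-02'), (1,'C','1979-04-03'), (1,'C','1982-04-03'),
        (1,'D','1982-04-03'), (2,'D','2016-05-06');
""")

# Keep table1 rows that are newer than the latest table2 date for the
# same id/category, or that have no table2 counterpart at all.
rows = conn.execute("""
    SELECT t1.id, t1.category, t1.dt
    FROM table1 t1
    LEFT JOIN (SELECT id, category, MAX(dt) AS mdate
               FROM table2
               GROUP BY id, category) t2
      ON t1.id = t2.id AND t1.category = t2.category
    WHERE t2.mdate IS NULL OR t1.dt > t2.mdate
    ORDER BY t1.id, t1.category
""").fetchall()
for row in rows:
    print(row)
```

Testing for a NULL mdate plays the same role as the nvl fallback date: without one of them, the ID=3 row is filtered out by the comparison.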
Filtering Table2:
SELECT ID, Category,MAX(Date) as Date
FROM Table2
GROUP BY ID,Category;
| ID | Category | Date |
|----|----------|-------------------------|
| 1 | A | March, 02 1990 00:00:00 |
| 1 | C | April, 03 1982 00:00:00 |
| 1 | D | April, 03 1982 00:00:00 |
| 2 | D | May, 06 2016 00:00:00 |
Now using this to create a left join with Table1:
SELECT t1.*
FROM Table1 t1 LEFT JOIN
(SELECT ID, Category,MAX(Date) as Date
FROM Table2
GROUP BY ID,Category) AS t2part
ON t1.ID = t2part.ID
AND t1.Category = t2part.Category
WHERE t1.Date > t2part.Date;
| ID | Category | Date |
|----|----------|-------------------------|
| 1 | A | March, 05 2013 00:00:00 |
Please note that the row with ID=3, Category=E is not returned: it has no matching ID and Category in the joined subquery, so t2part.Date is NULL and the WHERE comparison is not true.
As good practice, if the entities are meant to interact, some normalization should be applied so that joins can make the best use of indexes.
fiddle with your provided data and queries.

Join tables with different data in rows

I have two tables like this
t1
id value1
BMC 16
EC 22
LLU 60
MC 274
UHC 54
UHS 28
t2
id value2
BMC 5
e900 4
EC 7
LLU 2
MC 1
How can I get this output using SQL Server? I have tried a full outer join, but it does not give me the correct results.
BMC   16   5
EC    22   7
LLU   60   2
MC    274  1
UHC   54
UHS   28
e900       4
Here is my outer join. It joins two SELECT statements rather than tables, but those SELECT statements produce the results shown above (t1, t2).
SELECT * FROM
(
SELECT b.EntityCode, COUNT('a') AS GroupCountUser1 FROM #TempUser a INNER JOIN OP_TB_TRN_Entity b
ON a.Entity=b.EntityID
GROUP BY b.EntityCode
) t1
FULL OUTER JOIN
(SELECT b.EntityCode, COUNT('a') AS GroupCountUser2 FROM #TempUser1 a INNER JOIN OP_TB_TRN_Entity b
ON a.Entity=b.EntityID
GROUP BY b.EntityCode) t2
ON t1.EntityCode = t2.EntityCode
Guessing you are forgetting to coalesce the IDs, try
Select coalesce( A.Id, B.Id) id,
A.Value1, B.Value2
From A Full Join B On A.Id = B.Id
Select concat( t1.value1, t2.value2) as totalvalue
From t1 join t2 on t1.Id = t2.Id
If I understand what you're asking, this should help.