Inner Join not giving results in Hive

I am trying to join 4 tables: one main table (TableA) and 3 other tables joined to it. The data looks as follows:
[Image: sample data]
Expected output:
[Image: expected output]
Here is the query I developed. It always returns 0 rows:
select COALESCE(B.Date, C.Date, D.Date),
COALESCE(count(key1),0),
COALESCE(count(key2),0),
COALESCE(count(key3),0)
FROM TableA A JOIN TableB B on A.Date = B.Date
JOIN TableC C on A.Date = C.Date
JOIN TableD D on A.Date = D.Date
Group by COALESCE(B.Date, C.Date, D.Date);
When I run the query for each table individually against TableA (see the query below), it returns data, but when I join all 3 tables, it does not return any data.
select B.Date, count(key1)
FROM TableA A JOIN TableB B on A.Date = B.Date
Group by B.Date;
I am not sure what's going wrong. Could someone help me understand where the issue in the join query is?
Thanks,
Babu

Do the joins in a subquery, then do the grouping, e.g.:
with join_table as (select COALESCE(B.Date, C.Date, D.Date) as Date,
key1, key2, key3
FROM TableA A JOIN TableB B on A.Date = B.Date
JOIN TableC C on A.Date = C.Date
JOIN TableD D on A.Date = D.Date
)
select Date,
COALESCE(count(key1),0),
COALESCE(count(key2),0),
COALESCE(count(key3),0)
from join_table
group by Date;
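In fact, you don't need to COALESCE the join keys with inner joins: every joined row already satisfies A.Date = B.Date = C.Date = D.Date, so all of these values are identical.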

You can try this:
select A.Date,
COALESCE(count(key1),0),
COALESCE(count(key2),0),
COALESCE(count(key3),0)
FROM TableA A
LEFT JOIN TableB B on A.Date = B.Date
LEFT JOIN TableC C on A.Date = C.Date
LEFT JOIN TableD D on A.Date = D.Date
Group by A.Date;
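I used LEFT JOIN instead of an inner join because we need all rows from TableA. Chained inner joins keep a date only if it exists in TableA, TableB, TableC and TableD at the same time, which is likely why your query returns 0 rows even though each pairwise join returns data. Also, the first column should come from TableA, which is the driver table.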
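To see why the inner-join version can return nothing while each pairwise join returns data, here is a small sketch that runs both versions side by side. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Hive, and the sample rows are hypothetical, chosen so that each TableA date matches only one of the other tables.

```python
import sqlite3

# Hypothetical sample data: each TableA date appears in only ONE of
# TableB/TableC/TableD, so no date is present in all four tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Date TEXT);
    CREATE TABLE TableB (Date TEXT, key1 INTEGER);
    CREATE TABLE TableC (Date TEXT, key2 INTEGER);
    CREATE TABLE TableD (Date TEXT, key3 INTEGER);
    INSERT INTO TableA VALUES ('2020-01-01'), ('2020-01-02'), ('2020-01-03');
    INSERT INTO TableB VALUES ('2020-01-01', 1);
    INSERT INTO TableC VALUES ('2020-01-02', 2);
    INSERT INTO TableD VALUES ('2020-01-03', 3);
""")

def counts_per_date(join_kind: str) -> list:
    """Run the per-date COUNT query with the given join type ('JOIN' or 'LEFT JOIN')."""
    sql = f"""
        SELECT A.Date, COUNT(key1), COUNT(key2), COUNT(key3)
        FROM TableA A
        {join_kind} TableB B ON A.Date = B.Date
        {join_kind} TableC C ON A.Date = C.Date
        {join_kind} TableD D ON A.Date = D.Date
        GROUP BY A.Date
        ORDER BY A.Date
    """
    return conn.execute(sql).fetchall()

inner_rows = counts_per_date("JOIN")       # a date survives only if it is in all four tables
left_rows = counts_per_date("LEFT JOIN")   # every TableA date survives; missing matches count as 0
print(inner_rows)
print(left_rows)
```

With this data the inner-join query returns no rows, while the LEFT JOIN version returns one row per TableA date, with 0 for the missing counts (COUNT never returns NULL, so the COALESCE around it is optional). One caveat: if a date has several rows in more than one of TableB/TableC/TableD, chained LEFT JOINs multiply those rows and inflate the counts; aggregating each table per date before joining avoids that.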

I think you are looking for a full outer join.
Here is some sample code:
select COALESCE(a.date, b.date, c.date, d.date),
COALESCE(sum(key1),0),
COALESCE(sum(key2),0),
COALESCE(sum(key3),0)
FROM (select distinct date as date from tableA) as a
full outer join
(select date, sum(key1) as key1 from tableB group by date) as b
on a.date = b.date
full outer join
(select date, sum(key2) as key2 from tableC group by date) as c
on COALESCE(a.date, b.date) = c.date
full outer join
(select date, sum(key3) as key3 from tableD group by date) as d
on COALESCE(a.date, b.date, c.date) = d.date
Group by COALESCE(a.date, b.date, c.date, d.date);

Related

SQL query inner join and where on the second table

I have an Oracle database and I'm trying to query data from table1, inner joined with table2, where one column of table2 (date) equals the most recent date and another column of table2 (built) equals 'yes'. The query below does not seem to apply the WHERE conditions, and I can't pinpoint why:
SELECT id, b, c, d
FROM table1 a
INNER JOIN table2 b on b.id = a.id
WHERE b.date =(SELECT MAX(date) FROM table2) AND b.built = 'yes'
Actual query
SELECT m_tp_str, m_tp_trn, m_tp_dte, m_tp_buy, m_tp_qtyeq, m_tp_nom, m_instr,
m_tp_p, m_tp_status2
FROM HA_PRD_DM.TP_ALL_REP a INNER JOIN HA_PRD_DM.UDF_CURR_REP b
ON a.m_udf_ref2 = b.m_nb
WHERE b.m_rep_date2 = (SELECT MAX(c.m_rep_date2) FROM HA_PRD_DM.UDF_CURR_REP c)
AND b.m_purpose = 'yes'
You can do this using analytic functions, computing the latest date per id over only the rows where built = 'yes':
SELECT a.id, b, c, d
FROM table1 a INNER JOIN
(SELECT b.*, MAX(date) OVER (PARTITION BY b.id) as max_date
FROM table2 b
WHERE built = 'yes'
) b
ON b.id = a.id AND b.max_date = b.date;
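A likely reason the original query returns nothing is that (SELECT MAX(date) FROM table2) is computed over all of table2, including rows where built is not 'yes'. The sketch below runs the question's shape and the analytic-function answer on the same hypothetical rows, using Python's sqlite3 module as a stand-in for Oracle (window functions need SQLite 3.25 or later).

```python
import sqlite3

# Hypothetical data: the most recent row in table2 overall has built = 'no'.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, name TEXT);
    CREATE TABLE table2 (id INTEGER, date TEXT, built TEXT);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES
        (1, '2021-01-01', 'yes'),
        (2, '2021-02-01', 'yes'),
        (2, '2021-03-01', 'no');
""")

# Question's shape: MAX(date) is taken over ALL of table2, ignoring built.
question_rows = conn.execute("""
    SELECT a.id, a.name
    FROM table1 a
    INNER JOIN table2 b ON b.id = a.id
    WHERE b.date = (SELECT MAX(date) FROM table2) AND b.built = 'yes'
""").fetchall()

# Answer's shape: latest date per id, computed only over built = 'yes' rows.
answer_rows = conn.execute("""
    SELECT a.id, a.name
    FROM table1 a
    INNER JOIN (
        SELECT b.*, MAX(date) OVER (PARTITION BY b.id) AS max_date
        FROM table2 b
        WHERE built = 'yes'
    ) b ON b.id = a.id AND b.max_date = b.date
    ORDER BY a.id
""").fetchall()

print(question_rows)
print(answer_rows)
```

Here the newest row overall has built = 'no', so the question's query finds no match, while the analytic version returns the latest 'yes' row for each id. The two are not equivalent in general: the answer picks the latest date per id rather than a single global latest date.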

Issue in SQL join query formation

I have two tables, say A and B. A is the master table and B is the child table, from which I need values as below:
select A.Id, A.Name, B.Path from A,B where A.Id=B.Id
Now I want to add a column from a third table, C, which is a child of table B: C.File.
C.File should be NULL when no row satisfies C.SubId = B.SubId, and should return the value when the condition is true.
This is the exact definition of a left join:
SELECT a.id, a.name, b.path, c.file
FROM a
JOIN b ON a.id = b.id
LEFT JOIN c ON b.subid = c.subid
You need to LEFT JOIN your third table from what I can gather.
SELECT A.Id, A.Name, B.Path, C.file
FROM tableA a
INNER JOIN tableB b ON a.id = b.id
LEFT JOIN tableC c ON b.subid = c.subid
Simply join all three tables using INNER JOIN:
select A.Id, A.Name, B.Path ,C.File
FROM A
INNER JOIN B
ON A.Id=B.Id
INNER JOIN C
ON C.SubId=B.SubId
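A quick way to see the difference between the LEFT JOIN answers and the INNER JOIN answer is to run both on a B row that has no match in C. This sketch uses Python's sqlite3 module with an in-memory database and hypothetical data.

```python
import sqlite3

# Hypothetical data: the B row with SubId 20 has no matching row in C.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Id INTEGER, Name TEXT);
    CREATE TABLE B (Id INTEGER, SubId INTEGER, Path TEXT);
    CREATE TABLE C (SubId INTEGER, File TEXT);
    INSERT INTO A VALUES (1, 'one'), (2, 'two');
    INSERT INTO B VALUES (1, 10, '/p1'), (2, 20, '/p2');
    INSERT INTO C VALUES (10, 'f1.txt');
""")

def fetch(c_join: str) -> list:
    """Join A to B with INNER JOIN, and B to C with the given join type."""
    return conn.execute(f"""
        SELECT A.Id, A.Name, B.Path, C.File
        FROM A
        INNER JOIN B ON A.Id = B.Id
        {c_join} C ON B.SubId = C.SubId
        ORDER BY A.Id
    """).fetchall()

left_rows = fetch("LEFT JOIN")    # unmatched B rows are kept, with C.File = NULL (None)
inner_rows = fetch("INNER JOIN")  # unmatched B rows are dropped entirely
print(left_rows)
print(inner_rows)
```

Only the LEFT JOIN version gives the "NULL when there is no match" behaviour the question asks for; the INNER JOIN version silently drops Id 2.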

SQL summations with multiple outer joins

I have tables a, b, c, and d whereby:
There are 0 or more b rows for each a row
There are 0 or more c rows for each a row
There are 0 or more d rows for each a row
If I try a query like the following:
SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
FROM a
LEFT JOIN b on a.id = b.a_id
LEFT JOIN c on a.id = c.a_id
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
I notice that I have created a cartesian product and therefore my sums are incorrect (much too large).
I see that there are other SO questions and answers on this; however, I'm still not grasping how to accomplish what I want in a single query. Is it possible in SQL to write one query that aggregates all of the following:
SELECT a.id, SUM(b.debit)
FROM a
LEFT JOIN b on a.id = b.a_id
GROUP BY a.id
SELECT a.id, SUM(c.credit)
FROM a
LEFT JOIN c on a.id = c.a_id
GROUP BY a.id
SELECT a.id, SUM(d.other)
FROM a
LEFT JOIN d on a.id = d.a_id
GROUP BY a.id
in a single query?
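Your analysis is correct: joining unrelated tables creates a cartesian product. For example, if an a row has 2 matching b rows and 3 matching c rows, the joins produce 2 × 3 = 6 rows for it, so each debit is summed 3 times and each credit twice.
You have to do the sums separately and then do a final addition. This is doable in one query, and you have several options: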
Subqueries in your SELECT: SELECT a.id, (SELECT SUM(b.debit) FROM b WHERE b.a_id = a.id) + ...
CROSS APPLY with a query similar to the first option, then SELECT a.id, b_sum + c_sum + d_sum
UNION ALL, as you suggested, with an outer SUM and GROUP BY on top of it.
LEFT JOIN to subqueries similar to the ones above.
And probably more. The performance of these solutions may differ slightly depending on how many rows of a you want to select.
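The UNION ALL option from the list above can be sketched as follows, using Python's sqlite3 module and hypothetical data (the tables a, b, c and d follow the question; the amounts are made up). It returns the three sums as separate columns; adding them gives the b_sum + c_sum + d_sum total if that is what you need.

```python
import sqlite3

# Hypothetical data: id 1 has 2 b rows and 3 c rows; id 2 has only a d row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, debit INTEGER);
    CREATE TABLE c (a_id INTEGER, credit INTEGER);
    CREATE TABLE d (a_id INTEGER, other INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (1, 20);
    INSERT INTO c VALUES (1, 5), (1, 5), (1, 5);
    INSERT INTO d VALUES (2, 7);
""")

# Stack the three child tables with UNION ALL (each stacked row carries one
# amount and zeros for the other two), then aggregate once per a.id.
# Child rows are never paired with each other, so nothing is multiplied.
rows = conn.execute("""
    SELECT a.id,
           COALESCE(SUM(u.debit), 0)  AS debit,
           COALESCE(SUM(u.credit), 0) AS credit,
           COALESCE(SUM(u.other), 0)  AS other
    FROM a
    LEFT JOIN (
        SELECT a_id, debit, 0 AS credit, 0 AS other FROM b
        UNION ALL
        SELECT a_id, 0, credit, 0 FROM c
        UNION ALL
        SELECT a_id, 0, 0, other FROM d
    ) u ON u.a_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(rows)
```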
SELECT a.ID, debit, credit, other
FROM a
LEFT JOIN (SELECT a_id, SUM(b.debit) as debit
FROM b
GROUP BY a_id) b ON a.ID = b.a_id
LEFT JOIN (SELECT a_id, SUM(credit) as credit
FROM c
GROUP BY a_id) c ON a.ID = c.a_id
LEFT JOIN (SELECT a_id, SUM(other) as other
FROM d
GROUP BY a_id) d ON a.ID = d.a_id
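To check the fan-out effect and the fix, this sketch runs the question's naive query and the pre-aggregated LEFT JOIN query on the same hypothetical data, using Python's sqlite3 module as a stand-in database.

```python
import sqlite3

# Hypothetical data: id 1 has 2 b rows and 3 c rows; id 2 has only a d row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (a_id INTEGER, debit INTEGER);
    CREATE TABLE c (a_id INTEGER, credit INTEGER);
    CREATE TABLE d (a_id INTEGER, other INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1, 10), (1, 20);
    INSERT INTO c VALUES (1, 5), (1, 5), (1, 5);
    INSERT INTO d VALUES (2, 7);
""")

# Naive version: b, c and d are joined to a at the same time, so for id 1
# every b row is paired with every c row (2 x 3 = 6 rows) before summing.
naive = conn.execute("""
    SELECT a.id, SUM(b.debit), SUM(c.credit), SUM(d.other)
    FROM a
    LEFT JOIN b ON a.id = b.a_id
    LEFT JOIN c ON a.id = c.a_id
    LEFT JOIN d ON a.id = d.a_id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# Pre-aggregated version: each child table is reduced to at most one row
# per a_id before joining, so no row is multiplied.
pre_aggregated = conn.execute("""
    SELECT a.id, debit, credit, other
    FROM a
    LEFT JOIN (SELECT a_id, SUM(debit) AS debit FROM b GROUP BY a_id) b ON a.id = b.a_id
    LEFT JOIN (SELECT a_id, SUM(credit) AS credit FROM c GROUP BY a_id) c ON a.id = c.a_id
    LEFT JOIN (SELECT a_id, SUM(other) AS other FROM d GROUP BY a_id) d ON a.id = d.a_id
    ORDER BY a.id
""").fetchall()

print(naive)
print(pre_aggregated)
```

For id 1 the naive query returns 90 and 30 instead of 30 and 15, while the pre-aggregated query returns the correct sums. Ids with no child rows get NULL (None) sums; wrap the columns in COALESCE(..., 0) if you prefer zeros.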
Can also be done with correlated subqueries:
SELECT a.id
, (SELECT SUM(debit) FROM b WHERE a.id = b.a_id)
, (SELECT SUM(credit) FROM c WHERE a.id = c.a_id)
, (SELECT SUM(other) FROM d WHERE a.id = d.a_id)
FROM a

I am trying to use a subquery with a LEFT OUTER JOIN and it's giving me an error. Can someone take a look, please?

In this query, besides the 5 joined tables, I am trying to use a 6th table, "Days", to compare a value with three of the joined tables. But it gives me an error saying that I can't use a subquery in joins.
select
a.ID, a.Name, a.AMT, b.Address, c.Date, c.Pay, d.Check
from
Table1 a
left outer join Table2 b on a.ID = b.ID
left outer join Table3 c on a.ID = c.ID and c.Date= (select Derived_date from Days where TODAY_DATE = TO_DATE(SYSDATE, 'YYYY/MM/DD'))
left outer join Table4 d on a.ID = d.ID and d.Date= (select Derived_date from Days where TODAY_DATE = TO_DATE(SYSDATE, 'YYYY/MM/DD'))
left outer join Table5 e on a.ID = e.ID and e.Date= (select Derived_date from Days where TODAY_DATE = TO_DATE(SYSDATE, 'YYYY/MM/DD'))
Trying to use a subselect in an ON clause isn't going to work too well. You'd need to JOIN to it like you would any other table. Since your subselect is the same for every JOIN, I'd put it in a (temp?) table first so you can JOIN to it normally and not have to SELECT the same data three times.
CREATE TABLE Derived_Dates AS SELECT Derived_date FROM Days WHERE TODAY_DATE = TO_DATE(SYSDATE, 'YYYY/MM/DD');
SELECT a.ID, a.Name, a.AMT, b.Address, c.Date, c.Pay, d.Check
FROM Table1 a LEFT OUTER JOIN Table2 b on a.ID = b.ID
LEFT OUTER JOIN Table3 c ON a.ID = c.ID
LEFT OUTER JOIN Table4 d ON a.ID = d.ID
LEFT OUTER JOIN Table5 e ON a.ID = e.ID
INNER JOIN Derived_Dates dt ON c.Date = dt.Derived_date
AND d.Date = dt.Derived_date
AND e.Date = dt.Derived_date
Here's how you can do it with your subselect:
SELECT a.ID, a.Name, a.AMT, b.Address, c.Date, c.Pay, d.Check
FROM Table1 a LEFT OUTER JOIN Table2 b on a.ID = b.ID
LEFT OUTER JOIN Table3 c ON a.ID = c.ID
LEFT OUTER JOIN Table4 d ON a.ID = d.ID
LEFT OUTER JOIN Table5 e ON a.ID = e.ID
INNER JOIN (SELECT Derived_date FROM Days WHERE TODAY_DATE = TO_DATE(SYSDATE, 'YYYY/MM/DD')) dt ON c.Date = dt.Derived_date
AND d.Date = dt.Derived_date
AND e.Date = dt.Derived_date
Instead of JOINing back to your derived dates, you could also just use a WHERE clause. You have some options, and you might want to make some changes for your particular implementation, but this is more or less how I'd approach this.
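One way to keep the outer-join behaviour while still joining to the derived dates (a variant of the answers above, not what they wrote) is to join the one-row derived table once and move each date comparison into that table's LEFT JOIN ... ON clause. The sketch uses Python's sqlite3 module with hypothetical data, only Table1, Table3 and Table4, a made-up Amt column in place of Check, and a :today parameter in place of the SYSDATE expression.

```python
import sqlite3

# Hypothetical data: today's Derived_date is 2024-05-01.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Days (TODAY_DATE TEXT, Derived_date TEXT);
    CREATE TABLE Table1 (ID INTEGER, Name TEXT);
    CREATE TABLE Table3 (ID INTEGER, Date TEXT, Pay INTEGER);
    CREATE TABLE Table4 (ID INTEGER, Date TEXT, Amt INTEGER);
    INSERT INTO Days VALUES ('2024-05-02', '2024-05-01');
    INSERT INTO Table1 VALUES (1, 'A'), (2, 'B');
    INSERT INTO Table3 VALUES (1, '2024-05-01', 100), (2, '2024-04-30', 200);
    INSERT INTO Table4 VALUES (2, '2024-05-01', 50);
""")

# The one-row derived table dt is joined once; each LEFT JOIN then compares
# against dt.Derived_date in its own ON clause, so unmatched IDs are kept.
rows = conn.execute("""
    SELECT a.ID, a.Name, c.Pay, d.Amt
    FROM Table1 a
    CROSS JOIN (SELECT Derived_date FROM Days WHERE TODAY_DATE = :today) dt
    LEFT OUTER JOIN Table3 c ON a.ID = c.ID AND c.Date = dt.Derived_date
    LEFT OUTER JOIN Table4 d ON a.ID = d.ID AND d.Date = dt.Derived_date
    ORDER BY a.ID
""", {"today": "2024-05-02"}).fetchall()
print(rows)
```

Here ID 1 keeps its Pay even though Table4 has no row for the derived date, and ID 2 keeps its Amt; an INNER JOIN that requires c.Date and d.Date to both match the derived date would return no rows for this data. If Days has no row for the given date, the CROSS JOIN produces no rows at all.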

Aliasing derived table which is a union of two selects

I can't get the syntax right for aliasing the derived table correctly:
SELECT * FROM
(SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.*
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
I'm getting a "Duplicate column name 'B_id'" error. Any suggestions?
The problem isn't the UNION; it's the SELECT a.*, b.* in each of the inner SELECT statements. Since a and b both have a B_id column, each SELECT returns two B_id columns, and a derived table cannot contain duplicate column names.
You can fix that by changing the selects to something like:
select a.*, b.col_1, b.col_2 -- repeat for columns of b you need
In general, I'd avoid using select table1.* in queries you're using from code (rather than just interactive queries). If someone adds a column to the table, various queries can suddenly stop working.
In your derived table, you are retrieving the column B_id, which exists in both table a and table b, so you need to choose one of them or give them aliases:
SELECT * FROM
(SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
WHERE a.flag IS NULL AND b.date < NOW()
UNION
SELECT a.*, b.[all columns except B_id]
FROM a INNER JOIN b ON a.B_id = b.B_id
INNER JOIN c ON a.C_id = c.C_id
WHERE a.flag IS NOT NULL AND c.date < NOW())
AS t1
ORDER BY RAND() LIMIT 1
First, you could use UNION ALL instead of UNION. The two subqueries will have no common rows because of the mutually exclusive conditions on a.flag, so UNION's duplicate elimination is unnecessary work.
Another way to write it is:
SELECT a.*, b.*
FROM a
INNER JOIN b
ON a.B_id = b.B_id
WHERE ( a.flag IS NULL
AND b.date < NOW()
)
OR
( a.flag IS NOT NULL
AND EXISTS
( SELECT *
FROM c
WHERE a.C_id = c.C_id
AND c.date < NOW()
)
)
ORDER BY RAND()
LIMIT 1
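To check that the UNION ALL form and the OR/EXISTS rewrite return the same rows, here is a sketch using Python's sqlite3 module with hypothetical data. It selects only a.id (which also sidesteps the duplicate B_id problem), replaces NOW() with a fixed :now parameter, and leaves out the random ordering; in SQLite that would be ORDER BY RANDOM() LIMIT 1.

```python
import sqlite3

# Hypothetical data; dates are ISO strings, so < compares them chronologically.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (B_id INTEGER, date TEXT);
    CREATE TABLE c (C_id INTEGER, date TEXT);
    CREATE TABLE a (id INTEGER, B_id INTEGER, C_id INTEGER, flag TEXT);
    INSERT INTO b VALUES (1, '2020-01-01'), (2, '2030-01-01');
    INSERT INTO c VALUES (1, '2020-01-01'), (2, '2030-01-01');
    INSERT INTO a VALUES
        (1, 1, 1, NULL),  -- flag NULL, b.date in the past   -> kept
        (2, 2, 1, NULL),  -- flag NULL, b.date in the future -> dropped
        (3, 2, 1, 'x'),   -- flag set,  c.date in the past   -> kept
        (4, 1, 2, 'x');   -- flag set,  c.date in the future -> dropped
""")
params = {"now": "2025-01-01"}  # stands in for NOW()

union_ids = sorted(conn.execute("""
    SELECT a.id FROM a INNER JOIN b ON a.B_id = b.B_id
    WHERE a.flag IS NULL AND b.date < :now
    UNION ALL
    SELECT a.id FROM a INNER JOIN b ON a.B_id = b.B_id
    INNER JOIN c ON a.C_id = c.C_id
    WHERE a.flag IS NOT NULL AND c.date < :now
""", params).fetchall())

or_exists_ids = sorted(conn.execute("""
    SELECT a.id FROM a INNER JOIN b ON a.B_id = b.B_id
    WHERE (a.flag IS NULL AND b.date < :now)
       OR (a.flag IS NOT NULL
           AND EXISTS (SELECT * FROM c WHERE a.C_id = c.C_id AND c.date < :now))
""", params).fetchall())

print(union_ids)
print(or_exists_ids)
```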