Ignore rows with NULL join columns in hive query - sql

I have three tables A, B and C. A is having 1 billion records, B is having 10 million records and C is having 5 million records.
My query is like
select *
from tableA a
left outer join tableB b on a.id=b.id
left outer join tableC c on b.id=c.id;
After first join i will be having more than 990 million NULL b.id columns. Now the second join on table C will require all 990 million NULL rows (b.Id) to be processed and this causes one reducer to be loaded for a very long time. Is there a way i can avoid rows with NULL join columns?

We have used rand() for NULL ; so our join condition will be
coalesce(b.id, rand()) = c.id
Thus null values got distributed by its own, but i am wondering why the skewjoin settings didnot help (we have tried coalesce(b.id, 'SomeString') = c.id with skewjoin enable )

Add b.id is not null condition to the ON clause. Depending on your Hive version this may help:
select *
from tableA a
left outer join tableB b on a.id=b.id
left outer join tableC c on b.id=c.id and b.id is not null;
But this is not a problem since 0.14 version as far as I know.
Also you can divide null rows and not null and join only not null rows.
In the first query only null rows selected. Add NULL as col for columns from C table. Then use UNION ALL + select all not null rows:
with a as(
select a.*, b.*
from tableA a
left outer join tableB b on a.id=b.id
)
select a.*, null as c_col1 --add all other columns(from c) as null to get same schema
from a where a.b_id_col is null
UNION ALL
select a.*, c.*
left outer join tableC c on a.b_id_col=c.id
from a where a.b_id_col is not null

Related

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A + 1 column from B in my select.
Unfortunately, B has multiples rows(all identicals) for 1 row in A
on the join condition.
I tried but I can't isolate one row in A for one row in B with left join for example while keeping my select.
How can I do this query ? Query in ORACLE SQL
Thanks in advance.
This is a good use for outer apply. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select top 1 b.col
from b
where b.? = a.?
) b;
Normally, you would only use top 1 with order by. In this case, it doesn't seem to make a difference which row you choose.
You can group by on all columns from A, and then use an aggregate (like max or min) to pick any of the identical B values:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id

SQL join on three tables, lines that exist in 2 tables but not the third

Please I need your help.
Suppose that we have 3 tables A, B and C as shown in the image below:
I want to get lines in the table A that exist or not exist in table B, and lines in table C that exist or not exist in table B, using one sql request.
I have tried this but doesn't work :
SELECT A.ATS0804, C.ATS0207, A.ATS0959, A.ATS0802, B.ATS0827
FROM
ISUT183.ENS0042 B
RIGHT JOIN ISUT183.ENS0038 A
ON B.ENS0038K = A.ATS0804
RIGHT JOIN ISUT183.EN00041 C
ON B.EN00041K = C.AT02812
WHERE ( C.ATS0207 = '0001757430'
AND B.ATS0823 = '9999-01-01'
AND A.ATS0803 = '9999-01-01'
AND A.ATS0959 = '61384352001'
AND A.ATS0802 ='01.01.2010'
) ;
you can do a cross join too:
with AB as (
select * from A left outer join B on A.ID1=B.ID1
),
AC as (
select * from C left outer join B on C.ID2=B.ID2
)
select * from AB CROSS JOIN AC
use where exists and where not exists clauses
If you test equality into table B in where clause, the left outer join or right outer join dont take null
You dont have join between A and C, then you can do a UNION ALL
but you must take columns of same type in selects clause (ID1 same type of ID2)
select * from (
select 'A-B' typejoin, A.ID1 as IDA_OR_C, B.ID1 as IDB from A left outer join B on A.ID1=B.ID1
union all
select 'A-C' typejoin, C.ID2 as IDA_OR_C, B.ID2 as IDB from C left outer join B on C.ID2=B.ID2
) tmp
where ....

SQL - not sure how to join tables

I'm trying to join two tables like this:
Table A
ID Value1
1 A
2 B
3 C
Table B
ID Value2
1 A
3 B
4 C
Result should be:
ID Value1 Value2
1 A A
2 B null
3 C B
4 null C
I.e. join Table A to Table B on ID. If ID doesn't exist in Table A, add the ID from Table B.
The closest I've come is:
SELECT
a.ID, a.Value1, b.Value2
FROM
TableA a
OUTER JOIN
TableB b ON a.ID = b.ID
That gives me the new rows from TableB, but the ID is null.
How can I accomplish this?
You are very close, you just need a little push in the right direction:
SELECT COALESCE(a.ID, B.ID) As ID, a.Value1, b.Value2
FROM TableA a
FULL OUTER JOIN TableB b ON a.ID=b.ID
The COALESCE function returns the first parameter it gets that is not null. since this is a full outer join, a.id will be null on one row and b.id would be null on a different row.
Try this:
SELECT *
FROM TableA A
FULL OUTER JOIN TableB B
ON A.ID = B.ID;
Just a note: you should not name your tables in SQL with spaces in them.
Remember the basic for joining different tables
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2
ON table1.column_name=table2.column_name;
For your case:
SELECT a.value1, b.value2
FROM TableA a
FULL OUTER JOIN TableB b ON a.ID=b.ID
remember full outer join
The FULL OUTER JOIN keyword returns all rows from the table (tableA) and from the table (tableB) and the FULL OUTER JOIN keyword combines the result of both LEFT and RIGHT joins.

Select using LEFT OUTER JOIN with condition

I have two tables Table A and Table B
Table A
1. *id*
2. *name*
Table B
1. *A.id*
2. *datetime*
I want to select
1. *A.id*
2. *A.name*
3. *B.datetime*
Even if table B do not contains a row with A.id for specific day and it should replace that column with NULL
e.g
Table A contains
1. *(1 , Haris)*
2. *(2, Hashsim)*
Table B Contains following for today's date.
1. *(1, '2014-12-26 08:00:00')*
I should show 2 results with id 1 and 2 instead of only id 1.
Using LEFT OUTER JOIN with WHERE Clause makes it a LEFT INNER JOIN, how to work around that ?
SELECT A.id, A.name, b.datetime
FROM A
LEFT Outer JOIN B on B.id = A.id
Use LEFT OUTER JOIN to get all the rows from Left table and one that does not have match will have NULL values in Right table columns
SELECT A.id,
A.name,
B.[datetime]
FROM tableA A
LEFT OUTER JOIN tableB B
ON A.Id = B.id
AND B.[datetime] < #date
SELECT a.id, a.name, b.datetime
FROM A
LEFT JOIN B on B.aid = a.id
WHERE coalesce(B.datetie, '1900-01-01') < #MyDateTime
Select A.id,A.name,B.datetime
from tableA A
Left join
(
SELECT B.ID,B.datetime
FROM tableB B
WHERE B.datetime <= 'myDateTime'
)B
ON A.aid = B.id

Bidirectional outer join

Suppose we have a table A:
itemid mark
1 5
2 3
and table B:
itemid mark
1 3
3 5
I want to join A*B on A.itemid=B.itemid both right and left ways. i.e. result:
itemid A.mark B.mark
1 5 3
2 3 NULL
3 NULL 5
Is there a way to do it in one query in MySQL?
It's called a full outer join and it's not supported natively in MySQL, judging from its docs. You can work around this limitation using UNION as described in the comments to the page I linked to.
[edit] Since others posted snippets, here you go. You can see explanation on the linked page.
SELECT *
FROM A LEFT JOIN B ON A.id = B.id
UNION ALL
SELECT *
FROM A RIGHT JOIN B ON A.id = B.id
WHERE A.id IS NULL
Could do with some work but here is some sql
select distinct T.itemid, A.mark as "A.mark", B.mark as "B.mark"
from (select * from A union select * from B) T
left join A on T.itemid = A.itemid
left join B on T.itemid = B.itemid;
This relies on the left join, which returns all the rows in the original table (in this case this is the subselect table T). If there are no matches in the joined table, then it will set the column to NULL.
This works for me on SQL Server:
select isnull(a.id, b.id), a.mark, b.mark
from a
full outer join b on b.id = a.id