Multiple outer joins semantics - sql

Some SQL code:
SELECT *
FROM table1 tab1
LEFT OUTER JOIN table2 tab2 ON (tab1.fg = tab2.fg)
LEFT OUTER JOIN table4 tab4 ON (tab1.ss = tab4.ss)
INNER JOIN table3 tab3 ON (tab4.xya = tab3.xya)
LEFT OUTER JOIN table5 tab5 ON (tab4.kk = tab5.kk)
I know what different types of JOINs do, but what I'd like to know is: for each JOIN, which table assumes the role of the "LEFT" table? Will table1 always have the role of the "LEFT" table?

They are processed in top-to-bottom order, with the joins all associating to the "whole" of the prior FROM clause.
All things being equal:
tab1 is the mandatory partner for the OUTER JOIN with the optional partner tab2
the above is the mandatory partner for the OUTER JOIN with the optional partner tab4
the above and tab3 are both mandatory partners in the INNER JOIN
the above is the mandatory partner for the OUTER JOIN with the optional partner tab5
However, the problem with this query
SELECT *
FROM table1 tab1
LEFT OUTER JOIN table2 tab2 ON tab1.fg = tab2.fg
LEFT OUTER JOIN table4 tab4 ON tab1.ss = tab4.ss
INNER JOIN table3 tab3 ON tab4.xya = tab3.xya
LEFT OUTER JOIN table5 tab5 ON tab4.kk = tab5.kk
is that the INNER JOIN with table3 uses a condition that REQUIRES tab4 to be involved, making tab4 virtually a mandatory link for retaining records from the left part. In total, tab1/tab4/tab3 have to join successfully, with tab2 and tab5 optional.
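The effect described above can be reproduced on a toy dataset. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for the asker's database; the column types and all sample rows are made up for illustration. It runs the query once as written and once with the INNER JOIN swapped for a LEFT OUTER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, fg INTEGER, ss INTEGER);
CREATE TABLE table2 (fg INTEGER);
CREATE TABLE table3 (xya INTEGER);
CREATE TABLE table4 (ss INTEGER, xya INTEGER, kk INTEGER);
CREATE TABLE table5 (kk INTEGER);

-- Hypothetical data: row 1 has partners in table2, table4 and table3;
-- row 2 has no partner in table4 (and none in table2 either).
INSERT INTO table1 VALUES (1, 10, 100), (2, 20, 200);
INSERT INTO table2 VALUES (10);
INSERT INTO table4 VALUES (100, 7, 70);
INSERT INTO table3 VALUES (7);
""")

QUERY = """
SELECT tab1.id
FROM table1 tab1
LEFT OUTER JOIN table2 tab2 ON tab1.fg = tab2.fg
LEFT OUTER JOIN table4 tab4 ON tab1.ss = tab4.ss
{join3} table3 tab3 ON tab4.xya = tab3.xya
LEFT OUTER JOIN table5 tab5 ON tab4.kk = tab5.kk
ORDER BY tab1.id
"""

def ids(join3):
    """Run the query with the given join keyword for table3."""
    return [row[0] for row in conn.execute(QUERY.format(join3=join3))]

inner_ids = ids("INNER JOIN")       # the query as posted
left_ids = ids("LEFT OUTER JOIN")   # table3 made optional as well

print(inner_ids)
print(left_ids)
```

With the INNER JOIN, the tab1 row that has no tab4 partner disappears, because tab4.xya is NULL for it and the comparison with tab3.xya cannot be true. With a LEFT OUTER JOIN in that position, both tab1 rows survive.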

Related

Hive SQL multiple left outer join query is missing records in its results

I am trying to join 7 tables and insert the joined data into one big joined table. To do this I am using the query below:
INSERT OVERWRITE TABLE databaseName.joinTab PARTITION (tran_date)
SELECT <180 cols across all 7 tables>
FROM databaseName.table1 tab1
LEFT OUTER JOIN databaseName.table2 tab2 ON (tab1.id = tab2.id and
tab1.tran_date='20171030' and tab2.tran_date='20171030')
LEFT OUTER JOIN databaseName.table3 tab3 ON (tab1.id = tab3.id and
tab1.tran_date='20171030' and tab3.tran_date='20171030')
LEFT OUTER JOIN databaseName.table4 tab4 ON (tab1.id = tab4.id and
tab1.tran_date='20171030' and tab4.tran_date='20171030')
LEFT OUTER JOIN databaseName.table5 tab5 ON (tab1.id = tab5.id and
tab1.tran_date='20171030' and tab5.tran_date='20171030')
LEFT OUTER JOIN databaseName.table6 tab6 ON (tab1.id = tab6.id and
tab1.tran_date='20171030' and tab6.tran_date='20171030')
LEFT OUTER JOIN databaseName.table7 tab7 ON (tab1.id = tab7.id and
tab1.tran_date='20171030' and tab7.tran_date='20171030')
WHERE (tab1.tran_date='20171030');
tran_date is the partition column for all of these tables. The reason I have a WHERE clause as well as the condition in the ON clause is that, without it, the Tez job would do a full table scan of table1.
My issue is that if I do a count(*) on table1 for tran_date=20171030, I get 11845917.
If I do a count(*) on the new joined table (joinTab) for that same partition, tran_date=20171030, I get only 97609, which is a very large difference. As I'm using left outer joins, I had thought that all the data from table1 would be moved into the joined table, with nulls populated for anything not in the other tables. I should mention that tran_date in joinTab is derived from when the table1 data is loaded.
Is there anything here that doesn't look right?
Thanks for your help
Dan
I couldn't test if this solution works because you haven't provided a reproducible example, but you can try something like this:
WITH tab1_temp AS (
SELECT <tab1 cols>
FROM databaseName.table1 tab1
WHERE tab1.tran_date='20171030'
)
INSERT OVERWRITE TABLE databaseName.joinTab PARTITION (tran_date)
SELECT <180 cols across all 7 tables>
FROM tab1_temp tab1
LEFT OUTER JOIN databaseName.table2 tab2 ON (tab1.id = tab2.id and
tab1.tran_date='20171030' and tab2.tran_date='20171030')
LEFT OUTER JOIN databaseName.table3 tab3 ON (tab1.id = tab3.id and
tab1.tran_date='20171030' and tab3.tran_date='20171030')
LEFT OUTER JOIN databaseName.table4 tab4 ON (tab1.id = tab4.id and
tab1.tran_date='20171030' and tab4.tran_date='20171030')
LEFT OUTER JOIN databaseName.table5 tab5 ON (tab1.id = tab5.id and
tab1.tran_date='20171030' and tab5.tran_date='20171030')
LEFT OUTER JOIN databaseName.table6 tab6 ON (tab1.id = tab6.id and
tab1.tran_date='20171030' and tab6.tran_date='20171030')
LEFT OUTER JOIN databaseName.table7 tab7 ON (tab1.id = tab7.id and
tab1.tran_date='20171030' and tab7.tran_date='20171030')
;
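The asker's expectation about LEFT OUTER JOIN can be checked on a small scale. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for Hive, with two tables instead of seven and made-up rows. It does not reproduce the missing-records problem. It only shows that, in plain SQL semantics, a filter on the left table inside the ON clause does not remove left-table rows, while the WHERE clause does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, tran_date TEXT);
CREATE TABLE table2 (id INTEGER, tran_date TEXT);
-- Hypothetical data: two table1 rows in the partition of interest, one outside it.
INSERT INTO table1 VALUES (1, '20171030'), (2, '20171030'), (3, '20171029');
INSERT INTO table2 VALUES (1, '20171030');
""")

JOIN = """
SELECT COUNT(*)
FROM table1 tab1
LEFT OUTER JOIN table2 tab2
  ON tab1.id = tab2.id
 AND tab1.tran_date = '20171030'
 AND tab2.tran_date = '20171030'
"""

# Rows of table1 in the partition.
base_count = conn.execute(
    "SELECT COUNT(*) FROM table1 WHERE tran_date = '20171030'"
).fetchone()[0]

# The question's join shape: filter in both ON and WHERE.
joined_count = conn.execute(
    JOIN + "WHERE tab1.tran_date = '20171030'"
).fetchone()[0]

# Same join without WHERE: the ON filter alone keeps every table1 row.
on_only_count = conn.execute(JOIN).fetchone()[0]

print(base_count, joined_count, on_only_count)
```

In this toy setup the join with the WHERE clause returns exactly as many rows as the filtered table1, so the join shape by itself does not lose rows. A reasonable first diagnostic on the real data would be to run the SELECT part alone with COUNT(*) and compare it with the table1 count before looking at the insert step.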

Joining tbl1 to select statement twice with join to tbl2 that also joins to tbl3

I'm using SQL Server manager.
I have 3 tables.
I need a query that pulls T1 and adds an Origin Basin and a Destination Basin.
So far I have the following:
select T1.[Country (destination)], T3.AreaName
From T1
left outer join T2 on
T1.[Country (destination)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID
Which returns:
Country | Area
However, I'm having trouble doing this for the second country column. I believe you use aliases. I've tried:
select (select AreaName
FROM T3
where T3.AreaID = T2.AreaID) as 'Area Imp',
(select AreaID
From T2
where T2.CountryName = T1.[Country (origin)]) as 'x',
(select AreaID
From T2
where T2.CountryName = T1.[Country (destination)]) as 'y'
FROM T1
But I can't get it to work.
This is what you need to do:
select t1.date, t1.country_destination, t1.country_origin, destination_area.AreaName as area_destination, origin_area.AreaName as area_origin
from t1 as t1 join t2 as destination on t1.country_destination = destination.countryname
join t2 as origin on t1.country_origin = origin.countryname
join t3 as destination_area on destination.areaid = destination_area.areaid
join t3 as origin_area on origin.areaid = origin_area.areaid
You will need to join with the same table twice, both for t2 and t3 so that you get the matching records for your needs.
It usually helps to choose aliases that match the purpose of each join (in this case, destination and origin) when writing the query.
I think what you're trying to do is something like this:
select T1.*, T3dest.AreaName, T3orig.AreaName
From
T1
inner join
T2 T2dest on
T1.[Country (destination)] = T2dest.CountryName
inner join
T3 T3dest on
T2dest.AreaID = T3dest.AreaID
inner join
T2 T2orig on
T1.[Country (origin)] = T2orig.CountryName
inner join
T3 T3orig on
T2orig.AreaID = T3orig.AreaID
Note that I've switched to inner joins throughout, at the moment. If you do want left join semantics, you either need to use those for all of the joins to the T2 and T3 tables or you need to change the join order (so that the relevant T3 joins to the T2 tables occur before the attempted join with T1). It's not clear from the sample data if that's required, however.
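The double-alias approach can be tried end to end. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) rather than SQL Server, so the bracketed column names are written with double quotes; the table contents and the output column names OriginBasin and DestinationBasin are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 ("Country (origin)" TEXT, "Country (destination)" TEXT);
CREATE TABLE T2 (CountryName TEXT, AreaID INTEGER);
CREATE TABLE T3 (AreaID INTEGER, AreaName TEXT);
-- Hypothetical sample rows.
INSERT INTO T1 VALUES ('Norway', 'Japan');
INSERT INTO T2 VALUES ('Norway', 1), ('Japan', 2);
INSERT INTO T3 VALUES (1, 'Atlantic'), (2, 'Pacific');
""")

# T2 and T3 each appear twice, once per country column, under distinct aliases.
rows = conn.execute("""
SELECT T1."Country (origin)",      T3orig.AreaName AS OriginBasin,
       T1."Country (destination)", T3dest.AreaName AS DestinationBasin
FROM T1
INNER JOIN T2 T2orig ON T1."Country (origin)" = T2orig.CountryName
INNER JOIN T3 T3orig ON T2orig.AreaID = T3orig.AreaID
INNER JOIN T2 T2dest ON T1."Country (destination)" = T2dest.CountryName
INNER JOIN T3 T3dest ON T2dest.AreaID = T3dest.AreaID
""").fetchall()

print(rows)
```

Each alias behaves like an independent copy of the lookup table, which is why the origin and destination lookups do not interfere with each other.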
Try this. You would still want to join on the area IDs:
select T1.Date,T1.[Country (destination)], null [Country (origin)], T3.AreaName [AreaName(Destination)], null [AreaName(Origin)]
From T1
left outer join T2 on
T1.[Country (destination)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID
union all
select T1.Date,null [Country (destination)], T1.[Country (origin)], null [AreaName(Destination)], T3.AreaName [AreaName(Origin)]
From T1
left outer join T2 on
T1.[Country (Origin)] = T2.CountryName
inner join T3 on
T2.AreaID = T3.AreaID

Can we get result of left outer join using right outer join

Can we get the results of a left outer join using a right outer join?
Yes, you can do this. A right (outer) join is equivalent to a left (outer) join with the position of the tables switched.
Hence, the following query:
SELECT *
FROM table1 t1
LEFT JOIN table2 t2
ON t1.col = t2.col
is equivalent (apart from the column order that SELECT * produces) to
SELECT *
FROM table2 t2
RIGHT JOIN table1 t1
ON t1.col = t2.col

Stuck on multiple Left Join and Inner Join

I have to write a query on Oracle and I'm a little bit stuck. TABLE1 has 287 records, and I want all the information from TABLE2 and TABLE3 that corresponds to those 287 records (that's why I use LEFT JOIN). But I also want all records that match between TABLE2 and TABLE4, and between TABLE4 and TABLE5 (that's why I use INNER JOIN).
But my query doesn't work and I don't know why. Can someone help me?
My query:
SELECT distinct(TABLE1.NUM_SIN),
TABLE1.LIBELLE,
TABLE1.DATE_FRAIS,
TABLE2.CODE_SIN,
TABLE2.PKPR,
TABLE1.MT,
TABLE4.POSTBUD,
TABLE3.VEENG
FROM TABLE1
LEFT JOIN TABLE2
ON TABLE2.NUM_SIN = TABLE1.NUM_SIN
INNER JOIN TABLE4
ON TABLE4.NUM_SIN = TABLE2.NUM_SIN
AND TABLE4.SCSO = TABLE2.SCSO
LEFT JOIN TABLE5
ON TABLE5.CDC = TABLE4.NO
AND TABLE5.CDEXE = TABLE4.CDEXE
AND TABLE5.SCSO = TABLE4.SCSO
LEFT JOIN TABLE3
ON TABLE3.CNCT = TABLE1.NUM_SIN
WHERE ... ;
A diagram to illustrate:
Thanks in advance!
I think the issue here is perhaps that you really don't want to use an inner join in your query, and perhaps that you don't know exactly what the difference is between an inner join and an outer join.
The inner join in your query keeps ONLY the rows for which TABLE2 has a match in TABLE4. Joins are sequential and cumulative, so any TABLE1 row without such a match is dropped at that point, and your remaining LEFT joins will have the reduced rowset on the left side of the join.
Thus, I believe you will want to use LEFT joins throughout your query, e.g.:
SELECT distinct(TABLE1.NUM_SIN),
TABLE1.LIBELLE,
TABLE1.DATE_FRAIS,
TABLE2.CODE_SIN,
TABLE2.PKPR,
TABLE1.MT,
TABLE4.POSTBUD,
TABLE3.VEENG
FROM TABLE1
LEFT JOIN TABLE2
ON TABLE2.NUM_SIN = TABLE1.NUM_SIN
LEFT JOIN TABLE4
ON TABLE4.NUM_SIN = TABLE2.NUM_SIN
AND TABLE4.SCSO = TABLE2.SCSO
LEFT JOIN TABLE5
ON TABLE5.CDC = TABLE4.NO
AND TABLE5.CDEXE = TABLE4.CDEXE
AND TABLE5.SCSO = TABLE4.SCSO
LEFT JOIN TABLE3
ON TABLE3.CNCT = TABLE1.NUM_SIN
WHERE ... ;
Are you sure you don't want to left join to TABLE4? The way it is written, only rows with a match in TABLE4 would be kept, which would limit the results from TABLE2 and TABLE5 (and, in turn, from TABLE1).
NB - the image CDN is filtered here so I can't see the image.
SELECT --
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE2.NUM_SIN = TABLE1.NUM_SIN
LEFT JOIN TABLE3 ON TABLE3.CNCT = TABLE1.NUM_SIN
-- Use LEFT JOIN here unless you want to reduce the number of TABLE2 rows.
LEFT JOIN TABLE4 ON TABLE4.NUM_SIN = TABLE2.NUM_SIN AND TABLE4.SCSO = TABLE2.SCSO
LEFT JOIN TABLE5 ON TABLE5.CDC = TABLE4.NO
AND TABLE5.CDEXE = TABLE4.CDEXE
AND TABLE5.SCSO = TABLE4.SCSO
WHERE ... ;
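To see how the choice of join for TABLE4 changes the row count, here is a minimal sketch, assuming SQLite (through Python's sqlite3 module) instead of Oracle, only three of the five tables, and made-up rows. The third variant is not taken from the answers above: it is one possible way, under the assumption that the asker wants every TABLE1 row but TABLE2 data only where TABLE4 also matches, to apply the inner join inside a derived table (alias m, hypothetical) before left joining it to TABLE1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TABLE1 (NUM_SIN INTEGER);
CREATE TABLE TABLE2 (NUM_SIN INTEGER, SCSO TEXT);
CREATE TABLE TABLE4 (NUM_SIN INTEGER, SCSO TEXT, POSTBUD TEXT);
-- Hypothetical data: 1 matches TABLE2 and TABLE4, 2 matches TABLE2 only, 3 matches nothing.
INSERT INTO TABLE1 VALUES (1), (2), (3);
INSERT INTO TABLE2 VALUES (1, 'A'), (2, 'B');
INSERT INTO TABLE4 VALUES (1, 'A', 'P1');
""")

CHAINED = """
SELECT TABLE1.NUM_SIN, TABLE2.SCSO, TABLE4.POSTBUD
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE2.NUM_SIN = TABLE1.NUM_SIN
{join4} TABLE4 ON TABLE4.NUM_SIN = TABLE2.NUM_SIN
              AND TABLE4.SCSO = TABLE2.SCSO
ORDER BY TABLE1.NUM_SIN
"""

# Assumption: the TABLE2/TABLE4 inner join is evaluated first, in a derived table.
DERIVED = """
SELECT TABLE1.NUM_SIN, m.SCSO, m.POSTBUD
FROM TABLE1
LEFT JOIN (
    SELECT TABLE2.NUM_SIN AS NUM_SIN, TABLE2.SCSO AS SCSO, TABLE4.POSTBUD AS POSTBUD
    FROM TABLE2
    INNER JOIN TABLE4 ON TABLE4.NUM_SIN = TABLE2.NUM_SIN
                     AND TABLE4.SCSO = TABLE2.SCSO
) m ON m.NUM_SIN = TABLE1.NUM_SIN
ORDER BY TABLE1.NUM_SIN
"""

left_then_inner = conn.execute(CHAINED.format(join4="INNER JOIN")).fetchall()
all_left = conn.execute(CHAINED.format(join4="LEFT JOIN")).fetchall()
derived = conn.execute(DERIVED).fetchall()

print(left_then_inner)
print(all_left)
print(derived)
```

The chained INNER JOIN keeps a single TABLE1 row, the all-LEFT version keeps all three, and the derived-table version also keeps all three but shows TABLE2 values only where TABLE4 matched. Which of the last two is wanted depends on requirements the question does not fully state.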

Rewrite left outer join involving multiple tables from Informix to Oracle

How do I write an Oracle query which is equivalent to the following Informix query?
select tab1.a,tab2.b,tab3.c,tab4.d
from table1 tab1,
table2 tab2 OUTER (table3 tab3,table4 tab4,table5 tab5)
where tab3.xya = tab4.xya
AND tab4.ss = tab1.ss
AND tab3.dd = tab5.dd
AND tab1.fg = tab2.fg
AND tab4.kk = tab5.kk
AND tab3.desc = "XYZ"
I tried:
select tab1.a,tab2.b,tab3.c,tab4.d
from table1 tab1,
table2 tab2 LEFT OUTER JOIN (table3 tab3,table4 tab4,table5 tab5)
where tab3.xya = tab4.xya
AND tab4.ss = tab1.ss
AND tab3.dd = tab5.dd
AND tab1.fg = tab2.fg
AND tab4.kk = tab5.kk
AND tab3.desc = "XYZ"
What is the correct syntax?
Write one table per join, like this:
select tab1.a,tab2.b,tab3.c,tab4.d
from
table1 tab1
inner join table2 tab2 on tab2.fg = tab1.fg
left join table3 tab3 on tab3.xxx = tab1.xxx and tab3.desc = 'XYZ'
left join table4 tab4 on tab4.xya = tab3.xya and tab4.ss = tab3.ss
left join table5 tab5 on tab5.dd = tab3.dd and tab5.kk = tab4.kk
Note that while my query contains actual left joins, your query apparently doesn't.
Since the conditions are in the where clause, your query should behave like inner joins. (Although I admit I don't know Informix, so maybe I'm wrong there.)
The specific Informix extension used in the question works a bit differently with regard to left joins. Apart from the exact syntax of the join itself, the main difference is that in Informix you can specify a list of outer-joined tables. These will be left outer joined, and the join conditions can be put in the where clause. Note that this is a specific extension to SQL. Informix also supports 'normal' left joins, but it seems you can't combine the two in one query.
In Oracle this extension doesn't exist, and with the join syntax shown here you can't put outer join conditions in the where clause, since where conditions are applied to every joined row regardless of whether a match was found.
So look what happens when you move conditions to the where clause:
select tab1.a,tab2.b,tab3.c,tab4.d
from
table1 tab1
inner join table2 tab2 on tab2.fg = tab1.fg
left join table3 tab3 on tab3.xxx = tab1.xxx
left join table4 tab4 on tab4.xya = tab3.xya
left join table5 tab5 on tab5.dd = tab3.dd and tab5.kk = tab4.kk
where
tab3.desc = 'XYZ' and
tab4.ss = tab3.ss
Now, only rows for which those two conditions are true will be returned. They cannot be true when no row is found, so if there is no matching row in table3 and/or table4, or if ss is null in either of the two, at least one of these conditions will not evaluate to true, and no row is returned. This effectively changes your outer join into an inner join, and as such changes the behavior significantly.
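The difference between filtering in ON and filtering in WHERE is easy to observe. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) rather than Oracle, two tables only, and invented rows; the column is named descr here as a hypothetical stand-in for desc, which is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (id INTEGER, xxx INTEGER);
CREATE TABLE table3 (xxx INTEGER, descr TEXT);
-- Hypothetical data: id 1 matches an 'XYZ' row, id 2 matches an 'ABC' row, id 3 matches nothing.
INSERT INTO table1 VALUES (1, 5), (2, 6), (3, 7);
INSERT INTO table3 VALUES (5, 'XYZ'), (6, 'ABC');
""")

# Filter on the optional table inside the ON clause: table1 rows are all kept.
filter_in_on = conn.execute("""
SELECT tab1.id, tab3.descr
FROM table1 tab1
LEFT JOIN table3 tab3 ON tab3.xxx = tab1.xxx AND tab3.descr = 'XYZ'
ORDER BY tab1.id
""").fetchall()

# Same filter moved to WHERE: rows whose tab3 columns are NULL are discarded.
filter_in_where = conn.execute("""
SELECT tab1.id, tab3.descr
FROM table1 tab1
LEFT JOIN table3 tab3 ON tab3.xxx = tab1.xxx
WHERE tab3.descr = 'XYZ'
ORDER BY tab1.id
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the filter in ON, unmatched table1 rows come back padded with NULL. With the filter in WHERE, the comparison against NULL is not true for those rows, so only the fully matched row remains.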
PS: left join and left outer join are the same. It means that you optionally join the second table to the first (the left one). Rows are returned even if there is data only in the 'left' part of the join. In Oracle you can also use right [outer] join to make the right table, rather than the left, the leading table. And there is even full [outer] join, which returns a row if there is data in either table.
I'm guessing that you want something like
SELECT tab1.a, tab2.b, tab3.c, tab4.d
FROM table1 tab1
JOIN table2 tab2 ON (tab1.fg = tab2.fg)
LEFT OUTER JOIN table4 tab4 ON (tab1.ss = tab4.ss)
LEFT OUTER JOIN table3 tab3 ON (tab4.xya = tab3.xya and tab3.desc = 'XYZ')
LEFT OUTER JOIN table5 tab5 on (tab4.kk = tab5.kk AND
tab3.dd = tab5.dd)