Rewrite left outer join involving multiple tables from Informix to Oracle - sql

How do I write an Oracle query which is equivalent to the following Informix query?
select tab1.a,tab2.b,tab3.c,tab4.d
from table1 tab1,
table2 tab2 OUTER (table3 tab3,table4 tab4,table5 tab5)
where tab3.xya = tab4.xya
AND tab4.ss = tab1.ss
AND tab3.dd = tab5.dd
AND tab1.fg = tab2.fg
AND tab4.kk = tab5.kk
AND tab3.desc = "XYZ"
I tried:
select tab1.a,tab2.b,tab3.c,tab4.d
from table1 tab1,
table2 tab2 LEFT OUTER JOIN (table3 tab3,table4 tab4,table5 tab5)
where tab3.xya = tab4.xya
AND tab4.ss = tab1.ss
AND tab3.dd = tab5.dd
AND tab1.fg = tab2.fg
AND tab4.kk = tab5.kk
AND tab3.desc = "XYZ"
What is the correct syntax?

Write one table per join, like this:
select tab1.a,tab2.b,tab3.c,tab4.d
from
table1 tab1
inner join table2 tab2 on tab2.fg = tab1.fg
left join table3 tab3 on tab3.xxx = tab1.xxx and tab3.desc = 'XYZ'
left join table4 tab4 on tab4.xya = tab3.xya and tab4.ss = tab3.ss
left join table5 tab5 on tab5.dd = tab3.dd and tab5.kk = tab4.kk
Note that while my query contains actual left joins, your query apparently doesn't.
Since the conditions are in the where clause, your query should behave like a series of inner joins. (Although I admit I don't know Informix, so maybe I'm wrong there.)
The specific Informix extension used in the question works a bit differently with regard to left joins. Apart from the exact syntax of the join itself, the main difference is that in Informix you can specify a list of outer-joined tables. These are left outer joined, and the join conditions can go in the where clause. Note that this is an Informix-specific extension to SQL. Informix also supports 'normal' left joins, but it seems you can't combine the two styles in one query.
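In Oracle this extension doesn't exist, and with the join syntax shown here you can't put outer join conditions in the where clause: where conditions are applied to the joined result, so they also filter out the rows for which the outer join found no match. (Oracle's older (+) notation is a separate mechanism and is not used here.)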
So look what happens when you move conditions to the where clause:
select tab1.a,tab2.b,tab3.c,tab4.d
from
table1 tab1
inner join table2 tab2 on tab2.fg = tab1.fg
left join table3 tab3 on tab3.xxx = tab1.xxx
left join table4 tab4 on tab4.xya = tab3.xya
left join table5 tab5 on tab5.dd = tab3.dd and tab5.kk = tab4.kk
where
tab3.desc = 'XYZ' and
tab4.ss = tab3.ss
Now rows are returned only when both conditions are true. They cannot be true when no matching row was found, so if there is no matching row in table3 and/or table4, or if ss is null in either of the two, at least one of the conditions evaluates to null rather than true, and the row is dropped. This effectively turns your outer join into an inner join and changes the behavior significantly.
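The effect described above can be reproduced on a small scale. The following is a minimal sketch that runs the SQL against an in-memory SQLite database from Python rather than against Oracle; the tables t1 and t3, the column descr, and the sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical two-table schema, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, a TEXT);
    CREATE TABLE t3 (id INTEGER, descr TEXT, c TEXT);
    INSERT INTO t1 VALUES (1, 'a1'), (2, 'a2'), (3, 'a3');
    INSERT INTO t3 VALUES (1, 'XYZ', 'c1'), (2, 'ABC', 'c2');
""")

# Filter inside ON: every t1 row survives; unmatched rows get NULL for c.
filter_in_on = conn.execute("""
    SELECT t1.a, t3.c
    FROM t1
    LEFT JOIN t3 ON t3.id = t1.id AND t3.descr = 'XYZ'
    ORDER BY t1.id
""").fetchall()

# Filter in WHERE: the NULL-extended rows fail the test and are dropped.
filter_in_where = conn.execute("""
    SELECT t1.a, t3.c
    FROM t1
    LEFT JOIN t3 ON t3.id = t1.id
    WHERE t3.descr = 'XYZ'
    ORDER BY t1.id
""").fetchall()

print(filter_in_on)     # [('a1', 'c1'), ('a2', None), ('a3', None)]
print(filter_in_where)  # [('a1', 'c1')]
```

With the filter in the ON clause all three rows of t1 come back; with the filter in the WHERE clause only the fully matching row remains, which is what an inner join would return.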
PS: left join and left outer join are the same. It means that you optionally join the second table to the first (the left one). Rows are returned even if there is data only in the 'left' part of the join. In Oracle you can also use right [outer] join to make the right table, rather than the left, the leading table. And there is even full [outer] join, which returns a row if there is data in either table.

I'm guessing that you want something like
SELECT tab1.a, tab2.b, tab3.c, tab4.d
FROM table1 tab1
JOIN table2 tab2 ON (tab1.fg = tab2.fg)
LEFT OUTER JOIN table4 tab4 ON (tab1.ss = tab4.ss)
LEFT OUTER JOIN table3 tab3 ON (tab4.xya = tab3.xya and tab3.desc = 'XYZ')
LEFT OUTER JOIN table5 tab5 on (tab4.kk = tab5.kk AND
tab3.dd = tab5.dd)

Related

Combining multiple tables with uncommon columns

How can I get my expected output? I join the first two tables with a left outer join, but as soon as I join the third one, my expected result goes away.
You want left joins:
select
t1.*,
t2.runno,
t3.blockno,
t4.quantity
from t1
left join t2
on t2.ordercode = t1.ordercode and t2.orderitem = t1.orderitem
left join t3
on t3.runno = t2.runno
left join t4
on t4.blockno = t3.blockno

Hive SQL multiple left outer join query is missing records in its results

I am trying to join 7 tables and insert the joined data into one big joined table, to do this I am using the query below
INSERT OVERWRITE TABLE databaseName.joinTab PARTITION (tran_date)
SELECT <180 cols across all 7 tables>
FROM databaseName.table1 tab1
LEFT OUTER JOIN databaseName.table2 tab2 ON (tab1.id = tab2.id and
tab1.tran_date='20171030' and tab2.tran_date='20171030')
LEFT OUTER JOIN databaseName.table3 tab3 ON (tab1.id = tab3.id and
tab1.tran_date='20171030' and tab3.tran_date='20171030')
LEFT OUTER JOIN databaseName.table4 tab4 ON (tab1.id = tab4.id and
tab1.tran_date='20171030' and tab4.tran_date='20171030')
LEFT OUTER JOIN databaseName.table5 tab5 ON (tab1.id = tab5.id and
tab1.tran_date='20171030' and tab5.tran_date='20171030')
LEFT OUTER JOIN databaseName.table6 tab6 ON (tab1.id = tab6.id and
tab1.tran_date='20171030' and tab6.tran_date='20171030')
LEFT OUTER JOIN databaseName.table7 tab7 ON (tab1.id = tab7.id and
tab1.tran_date='20171030' and tab7.tran_date='20171030')
WHERE (tab1.tran_date='20171030');
tran_date is the partition column for all of these tables. I have the condition both in the WHERE clause and in the ON clauses because otherwise the Tez job that was started did a full table scan of table1.
My issue is that a count(*) from table1 for tran_date=20171030 returns 11845917.
A count(*) from the new joined table (joinTab) for the same partition, tran_date=20171030, returns only 97609, which is a very large difference. Since I'm using left outer joins, I expected all the data from table1 to be moved into the join table, with nulls for anything not in the other tables. I should mention that tran_date in joinTab is derived from when the table1 data is loaded.
Is there anything here that doesn't look right?
Thanks for your help
Dan
I couldn't test if this solution works because you haven't provided a reproducible example, but you can try something like this:
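WITH tab1_temp AS (
-- <tab1 cols> must include id and tran_date, which the joins below use
SELECT <tab1 cols>
FROM databaseName.table1 tab1
WHERE tab1.tran_date='20171030'
)
INSERT OVERWRITE TABLE databaseName.joinTab PARTITION (tran_date)
SELECT <180 cols across all 7 tables>
FROM tab1_temp tab1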
LEFT OUTER JOIN databaseName.table2 tab2 ON (tab1.id = tab2.id and
tab1.tran_date='20171030' and tab2.tran_date='20171030')
LEFT OUTER JOIN databaseName.table3 tab3 ON (tab1.id = tab3.id and
tab1.tran_date='20171030' and tab3.tran_date='20171030')
LEFT OUTER JOIN databaseName.table4 tab4 ON (tab1.id = tab4.id and
tab1.tran_date='20171030' and tab4.tran_date='20171030')
LEFT OUTER JOIN databaseName.table5 tab5 ON (tab1.id = tab5.id and
tab1.tran_date='20171030' and tab5.tran_date='20171030')
LEFT OUTER JOIN databaseName.table6 tab6 ON (tab1.id = tab6.id and
tab1.tran_date='20171030' and tab6.tran_date='20171030')
LEFT OUTER JOIN databaseName.table7 tab7 ON (tab1.id = tab7.id and
tab1.tran_date='20171030' and tab7.tran_date='20171030')
;

Why does AND statement placement affect record count

OK, so my question is: there is a difference in record count depending on where the AND conditions are positioned within my query. For instance, if I have several INNER JOINs and a couple of LEFT JOINs, and I place all my AND conditions at the very end of the query, the record count is different than if I place each AND condition right below the matching JOIN table. Since the AND condition specifies the table, why would it matter where it's placed within the query?
Example 1:
FROM table (nolock)
INNER JOIN table2 (nolock) ON Table.ID = table2.ID
INNER JOIN table3 (nolock) ON table2.ID = table3.ID
LEFT JOIN table4 (nolock) ON table3.ID = table4.ID
where table.vendor = 1234
AND table.Active = 1
And table2.Active = 1
And table3.Active = 1
and table4.Active = 1
and table3.Name LIKE 'someName'
Example 2:
FROM table (nolock)
INNER JOIN table2 (nolock) ON Table.ID = table2.ID
And table2.Active = 1
INNER JOIN table3 (nolock) ON table2.ID = table3.ID
And table3.Active = 1
and table3.Name LIKE 'someName'
LEFT JOIN table4 (nolock) ON table3.ID = table4.ID
where table.vendor = 1234
and table4.Active = 1
AND table.Active = 1
With an INNER JOIN it makes no difference whether a criterion accompanies the JOIN or sits in the WHERE clause. With a LEFT JOIN it does: a criterion in the JOIN doesn't filter out non-joining records, but the same criterion in the WHERE clause will exclude them. In your case WHERE table4.Active = 1 excludes the records that have no match in table4, whereas moving that criterion to the JOIN will keep those records.
Here is a simple demonstration: SQL Fiddle
Note: I've made the assumption that your sample code isn't quite right and that this is the problem based on your description.
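As a rough stand-in for that demonstration, here is a minimal sketch using an in-memory SQLite database from Python (not SQL Server, so the nolock hints are omitted). The tables parent and child, their columns, and the helper function count are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: three parents, two children, one of them inactive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER);
    CREATE TABLE child (id INTEGER, active INTEGER);
    INSERT INTO parent VALUES (1), (2), (3);
    INSERT INTO child VALUES (1, 1), (2, 0);
""")

def count(join_type, placement):
    """Count result rows with the Active filter placed in ON or in WHERE."""
    base = f"SELECT COUNT(*) FROM parent p {join_type} JOIN child c ON c.id = p.id"
    if placement == "on":
        sql = base + " AND c.active = 1"
    else:
        sql = base + " WHERE c.active = 1"
    return conn.execute(sql).fetchone()[0]

counts = {
    (join_type, placement): count(join_type, placement)
    for join_type in ("INNER", "LEFT")
    for placement in ("on", "where")
}

print(counts[("INNER", "on")], counts[("INNER", "where")])  # 1 1
print(counts[("LEFT", "on")], counts[("LEFT", "where")])    # 3 1
```

The inner join gives the same count either way, while the left join keeps all three parent rows only when the filter is part of the ON clause.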

Stuck on multiple Left Join and Inner Join

I have to write a query on Oracle and I'm a little bit stuck with it. TABLE1 has 287 records, and I want all the information from TABLE2 and TABLE3 that matches those 287 records (that's why I use LEFT JOIN). But I also want all records that match between TABLE2 and TABLE4, and between TABLE4 and TABLE5 (that's why I use INNER JOIN).
But my query doesn't work and I don't know why. Can someone help me?
My query :
SELECT distinct(TABLE1.NUM_SIN),
TABLE1.LIBELLE,
TABLE1.DATE_FRAIS,
TABLE2.CODE_SIN,
TABLE2.PKPR,
TABLE1.MT,
TABLE4.POSTBUD,
TABLE3.VEENG
FROM TABLE1
LEFT JOIN TABLE2
ON TABLE2.NUM_SIN = TABLE1 .NUM_SIN
INNER JOIN TABLE4
ON TABLE4.NUM_SIN = TABLE2.NUM_SIN
AND TABLE4.SCSO = TABLE2.SCSO
LEFT JOIN TABLE5
ON TABLE5.CDC = TABLE4.NO
AND TABLE5.CDEXE = TABLE4.CDEXE
AND TABLE5.SCSO = TABLE4.SCSO
LEFT JOIN TABLE3
ON TABLE3.CNCT = TABLE1.NUM_SIN
WHERE ... ;
A diagram to help explain:
Thanks in advance!
I think the issue here is perhaps that you really don't want to use an inner join in your query, and perhaps that you don't know exactly what the difference is between an inner join and an outer join.
The inner join in your query keeps ONLY the rows for which TABLE2 has a match in TABLE4, so every TABLE1 row without such a match is dropped. Joins are sequential and cumulative, so your remaining LEFT joins will have the reduced rowset on the left side of the join.
Thus, I believe you will want to use LEFT joins throughout your query, e.g.:
SELECT distinct(TABLE1.NUM_SIN),
TABLE1.LIBELLE,
TABLE1.DATE_FRAIS,
TABLE2.CODE_SIN,
TABLE2.PKPR,
TABLE1.MT,
TABLE4.POSTBUD,
TABLE3.VEENG
FROM TABLE1
LEFT JOIN TABLE2
ON TABLE2.NUM_SIN = TABLE1 .NUM_SIN
LEFT JOIN TABLE4
ON TABLE4.NUM_SIN = TABLE2.NUM_SIN
AND TABLE4.SCSO = TABLE2.SCSO
LEFT JOIN TABLE5
ON TABLE5.CDC = TABLE4.NO
AND TABLE5.CDEXE = TABLE4.CDEXE
AND TABLE5.SCSO = TABLE4.SCSO
LEFT JOIN TABLE3
ON TABLE3.CNCT = TABLE1.NUM_SIN
WHERE ... ;
Are you sure you don't want to left join to table4? The way it is written only values in TABLE4 would be allowed which would limit the results from table2 and table5.
NB - the image CDN is filtered here so I can't see the image.
SELECT --
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE2.NUM_SIN = TABLE1 .NUM_SIN
LEFT JOIN TABLE3 ON TABLE3.CNCT = TABLE1.NUM_SIN
-- unless you want to reduce the number of table2 rows use left join here.
LEFT JOIN TABLE4 ON TABLE4.NUM_SIN = TABLE2.NUM_SIN AND TABLE4.SCSO = TABLE2.SCSO
LEFT JOIN TABLE5 ON TABLE5.CDC = TABLE4.NO
AND TABLE5.CDEXE = TABLE4.CDEXE
AND TABLE5.SCSO = TABLE4.SCSO
WHERE ... ;

Multiple outer joins semantics

Some SQL code:
SELECT *
FROM table1 tab1
LEFT OUTER JOIN table2 tab2 ON (tab1.fg = tab2.fg)
LEFT OUTER JOIN table4 tab4 ON (tab1.ss = tab4.ss)
INNER JOIN table3 tab3 ON (tab4.xya = tab3.xya)
LEFT OUTER JOIN table5 tab5 ON (tab4.kk = tab5.kk)
I know what different types of JOINs do, but what I'd like to know is: for each JOIN, which table assumes the role of the "LEFT" table? Will table1 always have the role of the "LEFT" table?
They are processed in top-to-bottom order, with the joins all associating to the "whole" of the prior FROM clause.
All things being equal:
tab1 is the mandatory partner for the OUTER JOIN with the optional partner tab2
the above is the mandatory partner for the OUTER JOIN with the optional partner tab4
the above and tab4 are both mandatory partners in the INNER JOIN
the above is the mandatory partner for the OUTER JOIN with the optional partner tab5
However, the problem with this query
SELECT *
FROM table1 tab1
LEFT OUTER JOIN table2 tab2 ON tab1.fg = tab2.fg
LEFT OUTER JOIN table4 tab4 ON tab1.ss = tab4.ss
INNER JOIN table3 tab3 ON tab4.xya = tab3.xya
LEFT OUTER JOIN table5 tab5 ON tab4.kk = tab5.kk
is that the INNER JOIN with table3 uses a condition that REQUIRES a row from tab4. That makes the tab4 link effectively mandatory for a record from the left part to be retained, so in total tab1/tab4/tab3 have to join successfully, while tab2 and tab5 remain optional.
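To make this concrete, here is a minimal sketch that runs a cut-down version of the query against an in-memory SQLite database from Python. The table and column names follow the question, but the id column, the sample rows, and the helper function ids are assumptions made only for this illustration; tab2 and tab5 are left out because they do not affect which rows survive.

```python
import sqlite3

# Sample data (hypothetical): only table1 row 1 has a full chain
# table1 -> table4 -> table3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, ss INTEGER);
    CREATE TABLE table4 (ss INTEGER, xya INTEGER);
    CREATE TABLE table3 (xya INTEGER);
    INSERT INTO table1 VALUES (1, 10), (2, 20), (3, 30);
    INSERT INTO table4 VALUES (10, 100), (20, 200);
    INSERT INTO table3 VALUES (100);
""")

def ids(sql):
    """Return the table1 ids produced by a query."""
    return [row[0] for row in conn.execute(sql)]

# LEFT JOIN followed by INNER JOIN: rows without tab4/tab3 matches are lost.
mixed = ids("""
    SELECT tab1.id
    FROM table1 tab1
    LEFT OUTER JOIN table4 tab4 ON tab1.ss = tab4.ss
    INNER JOIN table3 tab3 ON tab4.xya = tab3.xya
    ORDER BY tab1.id
""")

# LEFT JOINs throughout: every table1 row is retained.
all_left = ids("""
    SELECT tab1.id
    FROM table1 tab1
    LEFT OUTER JOIN table4 tab4 ON tab1.ss = tab4.ss
    LEFT OUTER JOIN table3 tab3 ON tab4.xya = tab3.xya
    ORDER BY tab1.id
""")

print(mixed)     # [1]
print(all_left)  # [1, 2, 3]
```

Row 3 has no tab4 match and row 2 has a tab4 match but no tab3 match; the inner join removes both, even though tab4 was attached with a left outer join.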