SQL, Self join on 3 joined tables - sql

[Image: tables and requested output]
I'm using the National Instruments TestStand default database setup. I've tried to simplify the DB layout in the picture above.
I can get what I want with some rather "complicated" SQL, but it's very slow.
I think there is a better way, and then I stumbled over SELF JOIN. Basically, what I want is to get data values from several different rows for one "serial number".
My problem is combining the self join with the "general" join of my tables.
I'm using an Access database at the moment.

This will give you the output you're aiming for with the sample data:
with x as (
select
row_number() over (partition by t1.Serial order by t3.Sub_Test_Name) as [RN], -- order by sub-test name so RN 1..4 follow AAA, BBB, CCC, DDD deterministically
t1.Serial,
case when t3.Sub_Test_Name = 'AAA' then t3.Value end as [AAA],
case when t3.Sub_Test_Name = 'BBB' then t3.Value end as [BBB],
case when t3.Sub_Test_Name = 'CCC' then t3.Value end as [CCC],
case when t3.Sub_Test_Name = 'DDD' then t3.Value end as [DDD]
from Table_1 t1
inner join Table_2 t2 on t2.Table_1_Id = t1.Id
inner join Table_3 t3 on t3.Table_2_Id = t2.Id
)
select
x.Serial,
AAA.AAA,
BBB.BBB,
CCC.CCC,
DDD.DDD
from x
left outer join x AAA on AAA.Serial = x.Serial and AAA.RN = x.rn + 0
left outer join x BBB on BBB.Serial = x.Serial and BBB.RN = x.rn + 1
left outer join x CCC on CCC.Serial = x.Serial and CCC.RN = x.rn + 2
left outer join x DDD on DDD.Serial = x.Serial and DDD.RN = x.rn + 3
where x.rn = 1
This uses self joins as you mentioned (where you see x being left joined to itself multiple times in the final select statement).
I've deliberately added extra columns CCC and DDD so it is easier to see how you would build this out for a larger data set, incrementing the row_number offset for each join.
I've tested this in SQL Fiddle and you're welcome to play around with it. If you need to apply additional filters, put your WHERE clause inside the CTE. Note that Access SQL does not support CTEs or ROW_NUMBER(), so this needs a database engine that does (such as SQL Server).
Note that you're effectively pivoting the data with this sort of query (except we're not aggregating anything, so we can't use the built-in PIVOT option). The downside of both this method and real pivots is that you have to manually specify every column header with its own CASE expression in the CTE, plus a left join in the final select statement. This can get unwieldy with medium-to-large data sets, so it is best suited to cases where your results will have a small number of known column headers.
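To see the self-join pivot run end to end, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is needed for ROW_NUMBER()). It assumes made-up sample data with only two sub-tests, AAA and BBB, and orders the row numbers by Sub_Test_Name so that RN 1 is always AAA and RN 2 is always BBB; the table and column names follow the answer, the values are hypothetical.

```python
import sqlite3

# Hypothetical sample data shaped like the question's three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table_1 (Id INTEGER PRIMARY KEY, Serial TEXT);
CREATE TABLE Table_2 (Id INTEGER PRIMARY KEY, Table_1_Id INTEGER);
CREATE TABLE Table_3 (Id INTEGER PRIMARY KEY, Table_2_Id INTEGER,
                      Sub_Test_Name TEXT, Value REAL);
INSERT INTO Table_1 VALUES (1, 'SN001'), (2, 'SN002');
INSERT INTO Table_2 VALUES (10, 1), (20, 2);
INSERT INTO Table_3 (Table_2_Id, Sub_Test_Name, Value) VALUES
  (10, 'AAA', 1.0), (10, 'BBB', 2.0),
  (20, 'AAA', 3.0), (20, 'BBB', 4.0);
""")

# Same structure as the answer's query, trimmed to two pivoted columns.
PIVOT_SQL = """
WITH x AS (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY t1.Serial ORDER BY t3.Sub_Test_Name) AS RN,
    t1.Serial,
    CASE WHEN t3.Sub_Test_Name = 'AAA' THEN t3.Value END AS AAA,
    CASE WHEN t3.Sub_Test_Name = 'BBB' THEN t3.Value END AS BBB
  FROM Table_1 t1
  JOIN Table_2 t2 ON t2.Table_1_Id = t1.Id
  JOIN Table_3 t3 ON t3.Table_2_Id = t2.Id
)
SELECT x.Serial, a.AAA, b.BBB
FROM x
LEFT JOIN x a ON a.Serial = x.Serial AND a.RN = x.RN + 0  -- row holding AAA
LEFT JOIN x b ON b.Serial = x.Serial AND b.RN = x.RN + 1  -- row holding BBB
WHERE x.RN = 1
ORDER BY x.Serial
"""

rows = conn.execute(PIVOT_SQL).fetchall()
for row in rows:
    print(row)  # ('SN001', 1.0, 2.0) then ('SN002', 3.0, 4.0)
```

Each additional column is one more CASE in the CTE and one more self join with the next offset, which is exactly the growth in boilerplate the answer warns about.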

Related

Stored procedure slowed all of a sudden

I have a procedure that inserts records into a table, selecting from a combination of a few tables and views.
The SQL looks like this:
ALTER PROCEDURE [dbo].[aka_spring_rep_sum]
AS
INSERT INTO tbl_spring (col1,col2,....colx)
SELECT col1,col2...colx
FROM vw_tbl_spring bk
LEFT JOIN
(SELECT col1,col2,..
FROM vw_tbl_prices_spring) sp ON bk.col1 = sp.col1
AND bk.col2 = sp.col2
LEFT JOIN
(SELECT col1,col2...
FROM tbl_xx) stock ON bk.col1 = stock.col1
AND bk.col2 = stock.col2
LEFT JOIN
(SELECT DISTINCT col1,col2,....
FROM tbl_v) sf ON bk.col1 = sf.col1
AND bk.colx = sf.colx
LEFT JOIN
(SELECT DISTINCT col1, col2
FROM tbl_bb) vr ON sf.col1 = vr.col1
LEFT JOIN
(SELECT col1, is
FROM tbl_ss) sh ON bk.col1 = sh.col1
OPTION (RECOMPILE)
The stored procedure used to take only 2-3 seconds, but today it suddenly took a very long time: over 30 minutes without ever finishing, and I had to stop it manually.
After breaking the different selections down one by one, I found that
select ..... FROM vw_tbl_spring bk
never finishes. All the other select statements in the stored procedure return results in less than 1 second.
ALTER VIEW [dbo].[vw_tbl_spring]
AS
SELECT col1, col2...., colx
FROM
(SELECT col1, col2, ....
FROM
(
SELECT DISTINCT col1,col2,... FROM tbl_pens s
INNER JOIN tbl_penh h ON s.col1 = h.col1 AND s.col2 = h.col2
WHERE s.col6 >= 21 AND h.col1 = 'X'
)
) b
LEFT JOIN
( SELECT col1,col2,...... FROM tbl_pens s
INNER JOIN tbl_penh h ON s.col1 = h.col1 AND s.col2 = h.col2
WHERE s.col6 >= 21 AND h.col1 = 'X'
) p ON b.col1 = p.col1 AND b.col1 = p.col1
LEFT JOIN vw_tbl_kk k ON p.col1 = k.col1 AND p.col1 = k.col2
Filtering the different selections inside this view again, I found that the last left join is what slows things down.
If we remove the last left join, i.e.
LEFT JOIN vw_tbl_kk k ON p.col1 = k.col1 AND p.col1 = k.col2
everything is back to normal, i.e. results come back in less than 2-3 seconds.
I can't find the reason behind this sudden slowness.
The same behaviour occurred a few months back; that time I deleted and recreated all the associated views and the stored procedure, and the issue was resolved. But this time that didn't help either.
Any way to check what is causing this slowness in SQL Server?
Your query is a 6-way join. This means that if there are n joining rows in each joined table, there will be n^6 resulting rows to insert. To highlight how quickly this grows:
joining rows per table (n) | resulting rows (n^6)
1 | 1
2 | 64
3 | 729
4 | 4,096
10 | 1,000,000
There are probably suddenly more joining rows in the tables, which not only makes the query slower but may also hit the transaction log's high-water mark, which means the database must roll back the transaction, and that is typically very slow.
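The n^6 growth can be reproduced directly. This minimal sketch uses Python's sqlite3 module with six hypothetical single-column tables (t0 to t5, standing in for the question's views), each holding n rows with the same join key, and counts what the chained join returns.

```python
import sqlite3

def join_row_count(num_tables: int, rows_per_key: int) -> int:
    """Count rows produced by joining `num_tables` tables on one key,
    where every table holds `rows_per_key` rows sharing that key value.
    Tables t0, t1, ... are hypothetical stand-ins for the real views."""
    conn = sqlite3.connect(":memory:")
    for i in range(num_tables):
        conn.execute(f"CREATE TABLE t{i} (k INTEGER)")
        conn.executemany(f"INSERT INTO t{i} VALUES (?)", [(1,)] * rows_per_key)
    # Build: t0 JOIN t1 ON t1.k = t0.k JOIN t2 ON t2.k = t0.k ...
    joins = " ".join(f"JOIN t{i} ON t{i}.k = t0.k" for i in range(1, num_tables))
    (count,) = conn.execute(f"SELECT COUNT(*) FROM t0 {joins}").fetchone()
    conn.close()
    return count

for n in (1, 2, 3, 4):
    print(n, join_row_count(6, n))  # 1 1, 2 64, 3 729, 4 4096
```

Every key match multiplies the row count, so a small increase in duplicate keys in one source can turn a fast insert into a very slow one.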

Refactoring slow SQL query

I currently have this very very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
generators.id IN (SELECT "generators"."id" FROM "generators" WHERE "generators"."client_id" = 5212 AND ("generators"."state" IN ('enabled'))) AND
(
generators.single_use = 'f' OR generators.single_use IS NULL OR
generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
)
GROUP BY generators.id;
And I'm trying to refactor/improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral(select case when exists(select * from run_generator_rows rgr where rgr.generator_row_id = gr.id) then 0 else 1 end as noRows) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
(g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For some reason it doesn't work as expected (it returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.
This appears to be the query, formatted so I can read it:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id
WHERE gr.generator_id IN (SELECT g.id
FROM generators g
WHERE g.client_id = 5212 AND
g.state = 'enabled'
) AND
(g.single_use = 'f' OR
g.single_use IS NULL OR
gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
)
GROUP BY gr.generator_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generator_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id JOIN
generators gg
on g.id = gg.id AND
gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
run_generator_rows rgr
ON gr.id = rgr.generator_row_id -- match on the generator row id, as the original NOT IN does
WHERE g.single_use = 'f' OR
g.single_use IS NULL OR
rgr.generator_row_id IS NULL
GROUP BY gr.generator_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
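To check that a LEFT JOIN ... IS NULL anti-join returns the same counts as the original IN / NOT IN query, here is a minimal sketch in Python's sqlite3 (not PostgreSQL) with made-up rows. It assumes, as the answer does, that generators.id and run_generator_rows.generator_row_id are unique, drops the redundant second generators join (gg), and joins run_generator_rows on the generator row id (gr.id), which is what the original NOT IN compares.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE generators (id INTEGER PRIMARY KEY, client_id INTEGER,
                         state TEXT, single_use TEXT);
CREATE TABLE generator_rows (id INTEGER PRIMARY KEY, generator_id INTEGER);
CREATE TABLE run_generator_rows (generator_row_id INTEGER);
INSERT INTO generators VALUES
  (1, 5212, 'enabled', 't'), (2, 5212, 'enabled', 'f'),
  (3, 5212, 'disabled', 'f'), (4, 9999, 'enabled', NULL);
INSERT INTO generator_rows VALUES
  (100, 1), (101, 1), (102, 1), (200, 2), (201, 2), (202, 2), (300, 3), (400, 4);
INSERT INTO run_generator_rows VALUES (100), (200);
""")

# The question's original IN / NOT IN formulation.
ORIGINAL = """
SELECT generators.id, COUNT(*)
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE generators.id IN (SELECT id FROM generators
                        WHERE client_id = 5212 AND state = 'enabled')
  AND (generators.single_use = 'f' OR generators.single_use IS NULL
       OR generator_rows.id NOT IN (SELECT generator_row_id FROM run_generator_rows))
GROUP BY generators.id ORDER BY generators.id
"""

# Anti-join: a NULL rgr.generator_row_id means "this row was never run".
REWRITE = """
SELECT gr.generator_id, COUNT(*)
FROM generators g
JOIN generator_rows gr ON g.id = gr.generator_id
LEFT JOIN run_generator_rows rgr ON gr.id = rgr.generator_row_id
WHERE g.client_id = 5212 AND g.state = 'enabled'
  AND (g.single_use = 'f' OR g.single_use IS NULL OR rgr.generator_row_id IS NULL)
GROUP BY gr.generator_id ORDER BY gr.generator_id
"""

original = conn.execute(ORIGINAL).fetchall()
rewrite = conn.execute(REWRITE).fetchall()
print(original)  # [(1, 2), (2, 3)]
print(rewrite)   # [(1, 2), (2, 3)]
```

Joining on g.id instead of gr.id would compare generator ids with row ids; with this data it finds no matches and counts all three rows of generator 1.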
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(generator_row_id)
generator_rows(generator_id)
Generally, avoid inner selects such as
WHERE ... IN (SELECT ...)
as they can be slow.
As already shown for your problem, it's a good idea to think of SQL in terms of set theory.
You do NOT join tables on their identity alone.
Conceptually, SQL takes the set (that is, all rows) of the first table and "multiplies" it with the set of the second table, ending up with n times m rows.
The ON clause then (often drastically) reduces that result by evaluating its condition for each of those combinations: true means keep, false (or NULL) means drop. This way you can use any logic you like to select the combinations you want.
Things get trickier with LEFT JOIN and RIGHT JOIN, but one can think of them as taking every row of one side for granted. For each such row:
if the ON condition is true for at least one combination, output those combinations, exactly as JOIN does;
otherwise, output exactly ONE row, with 'the other side' (the right side for LEFT JOIN and vice versa) being NULL in every column.
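The set-theory model above can be checked with a small example. This minimal sketch uses Python's sqlite3 and two hypothetical tables, a and b: the cross product has n times m rows, filtering it with the join condition gives the same rows as an inner join, and the LEFT JOIN adds one all-NULL right side for the unmatched row. Real engines do not materialize the product; this only illustrates the logical result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, name TEXT);
CREATE TABLE b (a_id INTEGER, tag TEXT);
INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
INSERT INTO b VALUES (1, 'x'), (1, 'y'), (3, 'z');
""")

# Every combination: 3 * 3 rows before any condition is applied.
product = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

# Filtering the product by the join condition gives the inner join's rows.
filtered = conn.execute(
    "SELECT a.id, b.tag FROM a CROSS JOIN b WHERE a.id = b.a_id ORDER BY 1, 2").fetchall()
inner = conn.execute(
    "SELECT a.id, b.tag FROM a JOIN b ON a.id = b.a_id ORDER BY 1, 2").fetchall()

# LEFT JOIN: the same combinations, plus one NULL-extended row for a.id = 2.
left = conn.execute(
    "SELECT a.id, b.tag FROM a LEFT JOIN b ON a.id = b.a_id ORDER BY 1, 2").fetchall()

print(product)  # 9
print(inner)    # [(1, 'x'), (1, 'y'), (3, 'z')]
print(left)     # [(1, 'x'), (1, 'y'), (2, None), (3, 'z')]
```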
COUNT(*) is fine too, but when things get complicated, don't stick to it: use subselects for the keys only, and once all the hard work is done, join the "fun stuff" to it. For example:
SELECT sub.ID, sub.VALID, details.*
FROM (SELECT ID, SUM(CASE WHEN X THEN 1 ELSE 0 END) AS VALID
FROM ...
GROUP BY ID) AS sub
JOIN ... AS details ON sub.ID = details.ID
The difference is that the inner query is executed only once. The outer query usually has no indexes left to work with and can be slow, but as long as the inner select doesn't make the data explode, this is often many times faster than SELECT ... WHERE ... IN (SELECT ...) constructs.
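Here is a minimal, runnable sketch of the "aggregate the keys first, join the details last" pattern, using Python's sqlite3. The events and details tables and the ok flag are hypothetical stand-ins for the "..." and X placeholders above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, ok INTEGER);               -- hypothetical narrow table
CREATE TABLE details (id INTEGER PRIMARY KEY, label TEXT);  -- hypothetical wide table
INSERT INTO events VALUES (1, 1), (1, 0), (1, 1), (2, 0), (3, 1);
INSERT INTO details VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
""")

rows = conn.execute("""
SELECT sub.id, sub.valid, details.label
FROM (
    -- heavy lifting on the keys only: one row per id
    SELECT id, SUM(CASE WHEN ok = 1 THEN 1 ELSE 0 END) AS valid
    FROM events
    GROUP BY id
) AS sub
JOIN details ON details.id = sub.id   -- attach the wide columns once, at the end
ORDER BY sub.id
""").fetchall()
print(rows)  # [(1, 2, 'alpha'), (2, 0, 'beta'), (3, 1, 'gamma')]
```

Because the subquery collapses to one row per key before the join, the detail columns are joined once per key rather than once per event.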

Improving performance: Very Slow Oracle SQL Join

I am a newbie at SQL querying, and it takes 3 hours to get the full result of joining 2 queries.
After some research, I have focused on using left joins and avoided subqueries in the select list. However, it is still extremely slow. I have no close friends who know SQL well enough to explain what's wrong or what approach I should take.
I am also new here so if this question is not allowed please inform me and I will remove it immediately.
This is the structure of the query...
The first query will get the member details.
The second query will get the transaction details.
The relationship is:
one product has many sub-plans, which have many members;
one product also has many transactions, which are made on a per-product basis.
I am required to show all transactions and duplicate each line for each member.
I joined the queries using the product primary key.
Before joining, I tested both queries individually and they were fine: each returned its result in only 1-2 seconds.
But when I join the two, I end up waiting 3 hours.
SELECT
MPPFF.N_DX,
MPPFF.PM_A_P,
MPPFF.FEE1,
MPPFF.FEE2,
MPPFF.FEE3,
MPPFF.FEE4,
MPPFF.FEE11,
MPPFF.FEE12,
MPPFF.FEE5,
MPPFF.N_NO,
MPPFF.SETN_DX,
MPPFF.PRIME_NO,
MPPFF.SECN_NO,
MPPFF.COMM_A,
MPPFF.TYX_NO,
MPPFF.P_NAME,
MPPFF.B_BFX,
MPPFF.B_FM,
MPPFF.B_TO,
MPPFF.BB_NAME_P,
MPPFF.BB_NAME_S,
MPPFF.REVERSE_BFX,
MPPFF.TYX_REF_NO,
MPPFF.BB_NO_AX,
MPPFF.BB_NAME_AX,
MPPFF.DXC,
MPPFF.ST,
MPPFF.DAY,
MPPFF.CE_D_PRODUCT,
MPPFF.CE_H,
MPPFF.AS_C_E,
MPPFF.BCH,
MPPFF.RCPY_NO,
MPPFF.RE_BFX,
MPPFF.A_END,
MPPFF.PLACE,
MPPFF.MEMB_DX,
MPPFF.MBR_NO,
MPPFF.MBR_TR_BFX,
MPPFF.CE_D_TERM_CE,
MPPFF.MEMBER_AS,
MPPFF.C_USER,
MPPFF.C_BFX,
MPPFF.U_USER,
MPPFF.U_BFX
FROM (
SELECT
FF.N_DX,
FF.PM_A_P,
FF.FEE1,
FF.FEE2,
FF.FEE3,
FF.FEE4,
FF.FEE11,
FF.FEE12,
FF.FEE5,
FF.N_NO,
FF.SETN_DX,
FF.PRIME_NO,
FF.SECN_NO,
FF.COMM_A,
FF.TYX_NO,
FF.P_NAME,
FF.B_BFX,
FF.B_FM,
FF.B_TO,
FF.BB_NAME_P,
FF.BB_NAME_S,
FF.REVERSE_BFX,
FF.TYX_REF_NO,
FF.BB_NO_AX,
FF.BB_NAME_AX,
FF.DXC,
FF.ST,
FF.DAY,
FF.CE_D_PRODUCT,
FF.CE_H,
FF.AS_C_E,
FF.RCPY_NO,
FF.RE_BFX,
FF.A_END,
FF.BCH,
MPP.MBR_NO,
MPP.MBR_TR_BFX,
MPP.CE_D_TERM_CE,
MPP.C_USER,
MPP.C_BFX,
MPP.U_USER,
MPP.U_BFX,
MPP.PLACE,
MPP.MEMBER_AS,
MPP.TYX_DX,
MPP.AS_DX,
MPP.PRODUCT,
MPP.POPL_DX,
MPP.MEMB_DX,
FF.TYX_DX
FROM (
SELECT
MBR.MEMB_DX,
MBR.MBR_NO,
MBR.MBR_TR_BFX,
MBR.CE_D_TERM_CE,
MBR.C_USER,
MBR.C_BFX,
MBR.U_USER,
MBR.U_BFX,
MPP.PLACE,
MPP.MEMBER_AS,
MPP.TYX_DX,
MPP.AS_DX,
MPP.PRODUCT,
MPP.POPL_DX
FROM (
SELECT
MPP.PLACE,
MPP.MEMBER_AS,
MPP.TYX_DX,
MPP.AS_DX,
MPP.PRODUCT,
MPP.POPL_DX,
MMP.MEMB_DX
FROM(
SELECT
MPP.PLACE,
MPP.TYX_AS_DXC MEMBER_AS,
MPP.TYX_DX,
MPP.AS_DX,
MPP.POPL_DX,
RPT.PRODUCT
FROM
TABLE1 MPP
LEFT JOIN (
SELECT
SUBSTR(CE_D_PRODUCT,9) PRODUCT,
AS_DX
FROM
TABLE6 RPT,
TABLE7 PP
WHERE
PP.PRTY_DX = RPT.PRTY_DX
) RPT
ON MPP.AS_DX = RPT.AS_DX
) MPP
LEFT JOIN (
SELECT
POPL_DX,
MEMB_DX
FROM
TABLE4
)MMP
ON MPP.POPL_DX=MMP.POPL_DX
) MPP,
(
SELECT
MBR.MEMB_DX,
MBR.MBR_NO,
MBR.TERM_BFX MBR_TR_BFX,
MBR.CE_D_TERM_CE,
MBR.C_USER,
MBR.C_BFX,
MBR.U_USER,
MBR.U_BFX
FROM
TABLE8 MBR
) MBR
WHERE
MPP.MEMB_DX = MBR.MEMB_DX
) MPP
INNER JOIN
(
SELECT
FF.N_DX,
ROUND(CB.FEE5 * FF.RATE,2) PM_A_P,
CB.FEE1,
CB.FEE2,
CB.FEE3,
CB.FEE4,
CB.FEE11,
CB.FEE12,
CB.FEE5,
FF.N_NO,
FF.SETN_DX,
FF.PRIME_NO,
FF.SECN_NO,
FF.COMM_A,
FF.TYX_NO,
FF.P_NAME_1||', '||FF.P_NAME_2||' '||FF.P_NAME_3 P_NAME,
FF.B_BFX,
FF.B_FM,
FF.B_TO,
FF.BB_NAME_1_P||', '||FF.BB_NAME_2_P BB_NAME_P,
FF.BB_NAME_1_S||', '||FF.BB_NAME_2_S BB_NAME_S,
CB.REVERSE_BFX,
FF.TYX_REF_NO,
FF.BB_NO_AX,
FF.BB_NAME_1_AX||' '|| FF.BB_NAME_2_AX BB_NAME_AX,
CASE
WHEN FF.CE_D_ST IN ('A', 'B', 'C') THEN 'AC'
WHEN FF.DAY >1 THEN 'NEW'
ELSE 'AB'
END DXC,
FF.CE_D_ST ST,
FF.DAY,
FF.CE_D_PRODUCT,
FF.CE_D_COMP CE_H,
FF.AS_C AS_C_E,
FF.RCPY_NO,
FF.RE_BFX,
ROUND(CB.A_S,2) A_END,
FF.TYX_DX,
MP.BCH
FROM
TABLE2 CB,
TABLE3 FF
LEFT JOIN (
SELECT
SUBSTR(CE_D_BCH_O,13) BCH,
TYX_DX
FROM
TABLE5 MP
)MP
ON MP.TYX_DX = FF.TYX_DX
WHERE
FF.SETN_DX = CB.SETN_DX AND
EXTRACT( YEAR FROM FF.EFF_BFX) >=2013
) FF
ON MPP.TYX_DX = FF.TYX_DX
)MPPFF
;
Use ROWNUM to prevent optimizer transformations from degrading the performance.
You are encountering a common problem - two queries run fast separately but run slow when put together. Oracle does not have to run the queries in the order they are written. It can merge views, push predicates around, and generally completely re-write the query to run in a different order. Normally this is a great thing because you don't want to have to worry about which physical order to join tables. But sometimes Oracle applies the wrong transformations and the results are disastrous.
There are two ways to solve these problems.
1. Look at table structures, the statements, the execution plans, SQL monitoring or traces, statistics, etc. Try to find out which operation is slow, and why (use cardinality as your guide), and then try to fix it. This process can easily take hours, maybe even days, but it's the best way to learn.
2. Stop the optimizer from combining the queries with a simple trick. There are a few ways to do this, but in my experience the simplest is to add the pseudo-column ROWNUM to any inline view that you do not want transformed. ROWNUM is a special column that tells Oracle "this query block must be returned in a specific way, don't do anything to it".
Change this:
--This is slow:
select ...
from
(
--This is fast:
select ...
) inline_view1
join
(
--This is fast:
select ...
) inline_view2
on ...
to this:
--Now this is fast.
select ...
from
(
--This is fast:
select rownum /*add rownum to prevent slow transformations*/, ...
) inline_view1
join
(
--This is fast:
select rownum /*add rownum to prevent slow transformations*/, ...
) inline_view2
on ...
In your code I believe the two inline views to modify would be the outer-most MPP and FF.
On a side note, I disagree with some of the other comments and answers:
A CTE will not help here, since none of the tables are used twice.
You don't always need to know a million details about the query to tune it, unless you have the time and want to improve your skills.
I think your overall query structure is good. You are on the right path to building great SQL statements. Inline views are the key to writing SQL: build small units of code, combine them in simple steps, repeat. Putting all the tables together in one massive join is a recipe for spaghetti code. That said, I agree with others that you should avoid the old-fashioned join syntax, and the query would really benefit from some comments and more meaningful names. And don't be afraid to put all the select list items on one line. Having a 500-column line isn't ideal, but you want to focus on the joins, not the simple list of columns.
Your query is almost unreadable because of all the nesting, and you are mixing pre-1992 style joins with current join syntax. Don't use the outdated comma-separated join syntax; it is prone to errors. All your outer joins are void, because at some point you always have a criterion that dismisses the outer-joined records, such as inner-joining table8 on the outer-joined table4's memb_dx.
Your query seems to translate to
select
<several fields from the tables>
from table1 mpp
join table6 rpt on rpt.as_dx = mpp.as_dx
join table7 pp on pp.prty_dx = rpt.prty_dx
join table4 mmp on mmp.popl_dx = mpp.popl_dx
join table8 mbr on mbr.memb_dx = mmp.memb_dx
join table3 ff on ff.tyx_dx = mpp.tyx_dx and extract(year from ff.eff_bfx) >= 2013
join table2 cb on ff.setn_dx = cb.setn_dx
left join table5 mp on mp.tyx_dx = ff.tyx_dx;
and maybe you want it to be
select
<several fields from the tables>
from table1 mpp
left join table6 rpt on rpt.as_dx = mpp.as_dx
left join table7 pp on pp.prty_dx = rpt.prty_dx
left join table4 mmp on mmp.popl_dx = mpp.popl_dx
left join table8 mbr on mbr.memb_dx = mmp.memb_dx
join table3 ff on ff.tyx_dx = mpp.tyx_dx and extract(year from ff.eff_bfx) >= 2013
join table2 cb on ff.setn_dx = cb.setn_dx
left join table5 mp on mp.tyx_dx = ff.tyx_dx;
instead, or something along those lines. Get rid of all the nesting and stick with a clear, easy-to-read FROM clause.
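The "void outer join" point can be shown with three tiny tables. This minimal sketch uses Python's sqlite3 rather than Oracle, with hypothetical rows in table1, table4 and table8: once table8 is inner-joined on the outer-joined table4's memb_dx, the row that had no table4 match disappears, exactly as if table4 had been inner-joined too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (popl_dx INTEGER);
CREATE TABLE table4 (popl_dx INTEGER, memb_dx INTEGER);
CREATE TABLE table8 (memb_dx INTEGER);
INSERT INTO table1 VALUES (1), (2);   -- popl_dx 2 has no member in table4
INSERT INTO table4 VALUES (1, 10);
INSERT INTO table8 VALUES (10);
""")

# LEFT JOIN followed by an INNER JOIN on the outer-joined column.
mixed = conn.execute("""
SELECT mpp.popl_dx, mbr.memb_dx
FROM table1 mpp
LEFT JOIN table4 mmp ON mmp.popl_dx = mpp.popl_dx
JOIN table8 mbr ON mbr.memb_dx = mmp.memb_dx
ORDER BY mpp.popl_dx
""").fetchall()

# Keeping the whole chain as outer joins preserves the unmatched row.
all_left = conn.execute("""
SELECT mpp.popl_dx, mbr.memb_dx
FROM table1 mpp
LEFT JOIN table4 mmp ON mmp.popl_dx = mpp.popl_dx
LEFT JOIN table8 mbr ON mbr.memb_dx = mmp.memb_dx
ORDER BY mpp.popl_dx
""").fetchall()

print(mixed)     # [(1, 10)]
print(all_left)  # [(1, 10), (2, None)]
```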
One thing others haven't mentioned is the use of
EXTRACT( YEAR FROM FF.EFF_BFX) >=2013
This applies the EXTRACT function to every row selected from TABLE3 (I believe that's what FF refers to at this point in the query). I suggest replacing the above with
FF.EFF_BFX >= TO_DATE('01-JAN-2013', 'DD-MON-YYYY')
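or something similar. This requires only a single call to TO_DATE to generate the date constant, which is then compared directly to FF.EFF_BFX (which appears to be a column of type DATE). Because the column is no longer wrapped in a function, an ordinary index on FF.EFF_BFX can also be used for this predicate.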
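The effect of wrapping the column in a function can be seen in a query plan. This minimal sketch uses Python's sqlite3 as a stand-in for Oracle: SQLite has no EXTRACT, so strftime('%Y', ...) plays that role, and the ff table with ISO-8601 date strings is hypothetical. Both predicates return the same rows, but only the bare comparison lets the planner search the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ff (id INTEGER PRIMARY KEY, eff_date TEXT, amount REAL);
CREATE INDEX ff_eff_date ON ff (eff_date);
INSERT INTO ff (eff_date, amount) VALUES
  ('2012-12-31', 1.0), ('2013-01-01', 2.0), ('2014-06-15', 3.0);
""")

# Function applied to every row's column value (like EXTRACT(YEAR FROM ...)).
wrapped = "SELECT * FROM ff WHERE CAST(strftime('%Y', eff_date) AS INTEGER) >= 2013"
# Column compared directly against a constant (like >= TO_DATE(...)).
bare = "SELECT * FROM ff WHERE eff_date >= '2013-01-01'"

def plan(sql: str) -> str:
    """Join the 'detail' column (index 3) of EXPLAIN QUERY PLAN output."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(wrapped))  # a full SCAN of ff
print(plan(bare))     # a SEARCH of ff using index ff_eff_date
```

The exact plan wording differs between SQLite versions, and Oracle's plans look different again, but the principle carries over: a predicate on the bare column can use a plain index, while a predicate on a function of the column cannot (without a matching function-based index).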
This query also uses the same table alias (e.g. FF, MPP, etc.) multiple times for different entities in different contexts. In my opinion this is bad practice, and I suggest you rework your query to use a unique alias for each entity, which will make the query easier to understand.
As others have mentioned, getting rid of the pre-1992 joins in the WHERE clause would also help clarify what's going on, as would getting rid of the long column lists. A couple of the subqueries could be eliminated as well, which would make the query cleaner and clearer.
After dealing with all the above I get the following:
SELECT *
FROM (SELECT *
FROM TABLE1 MPP
LEFT OUTER JOIN (SELECT SUBSTR(CE_D_PRODUCT, 9) PRODUCT,
AS_DX
FROM TABLE6 RPT
INNER JOIN TABLE7 PP
ON PP.PRTY_DX = RPT.PRTY_DX) RPT
ON MPP.AS_DX = RPT.AS_DX
LEFT OUTER JOIN TABLE4 MMP
ON MPP.POPL_DX = MMP.POPL_DX) MPP
INNER JOIN TABLE8 MBR
ON MPP.MEMB_DX = MBR.MEMB_DX
INNER JOIN (SELECT FF.*,
CB.*,
ROUND(CB.FEE5 * FF.RATE,2) PM_A_P,
FF.P_NAME_1 || ', ' || FF.P_NAME_2 || ' ' || FF.P_NAME_3 P_NAME,
FF.BB_NAME_1_P || ', ' || FF.BB_NAME_2_P BB_NAME_P,
FF.BB_NAME_1_S || ', ' || FF.BB_NAME_2_S BB_NAME_S,
FF.BB_NAME_1_AX || ' ' || FF.BB_NAME_2_AX BB_NAME_AX,
CASE
WHEN FF.CE_D_ST IN ('A', 'B', 'C') THEN 'AC'
WHEN FF.DAY > 1 THEN 'NEW'
ELSE 'AB'
END DXC,
ROUND(CB.A_S,2) A_END,
SUBSTR(MP.CE_D_BCH_O, 13) AS BCH
FROM TABLE2 CB
INNER JOIN TABLE3 FF
ON FF.SETN_DX = CB.SETN_DX
LEFT OUTER JOIN TABLE5 MP
ON MP.TYX_DX = FF.TYX_DX
WHERE FF.EFF_BFX >= TO_DATE('01-JAN-2013', 'DD-MON-YYYY')) FF
ON MPP.TYX_DX = FF.TYX_DX
Best of luck.
I tried to make your query more readable:
SELECT MPPFF.*
FROM
(SELECT FF.*, MPP.*
FROM
(SELECT MBR.*, MPP.*
FROM
(SELECT MPP.*, MMP.*
FROM
(SELECT MPP.*, RPT.*
FROM TABLE1 MPP
LEFT JOIN (SELECT * FROM TABLE6 RPT, TABLE7 PP WHERE PP.PRTY_DX = RPT.PRTY_DX) RPT ON MPP.AS_DX = RPT.AS_DX) MPP
LEFT JOIN (SELECT * FROM TABLE4) MMP ON MPP.POPL_DX=MMP.POPL_DX) MPP,
(SELECT MBR.* FROM TABLE8 MBR) MBR
WHERE MPP.MEMB_DX = MBR.MEMB_DX) MPP
INNER JOIN (SELECT FF.*, CB.* FROM TABLE2 CB, TABLE3 FF
LEFT JOIN (SELECT * FROM TABLE5 MP ) MP ON MP.TYX_DX = FF.TYX_DX
WHERE FF.SETN_DX = CB.SETN_DX
AND EXTRACT( YEAR FROM FF.EFF_BFX) >=2013) FF ON MPP.TYX_DX = FF.TYX_DX) MPPFF
;
You select from 8 different tables, and the only WHERE condition is EXTRACT(YEAR FROM FF.EFF_BFX) >= 2013.
Unless the tables are tiny, it will always take some time to query them all together.
Why do you mix ANSI join syntax and old-style Oracle join syntax?

Oracle left outer join, only want the null values

I'm working on a problem with two tables, Charge and ChargeHistory. I want to display a selection of columns from both tables where either the matching row in ChargeHistory has a different value and/or date from Charge, or there is no matching entry in ChargeHistory at all.
I'm using a left outer join declared with ANSI-standard syntax, and while it does show the rows correctly where there is a difference, it isn't showing the null entries.
I've read that there can sometimes be issues if you use the WHERE clause as well as the ON clause. However, when I try to put all the conditions in the ON clause, the query takes too long: over 15 minutes, so long that I have just cancelled the runs.
To make things worse both tables use a three part compound key.
Does anyone have any ideas as to why the null values are being left out?
SELECT values...
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND (charge.value <> history.value OR charge.date <> history.date)
ORDER BY key1, key2, key
You probably want to explicitly select the null values:
SELECT values...
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND ((charge.value <> history.value or history.value is null) OR (charge.date <> history.date or history.date is null))
ORDER BY key1, key2, key
You can explicitly check for a missing match in the WHERE clause. I would recommend testing one of the keys used for the join, because a join key from history can only be NULL when no history row matched:
SELECT . . .
FROM bcharge charge LEFT OUTER JOIN
chgHist history
ON charge.key1 = history.key1 AND charge.key2 = history.key2 AND
charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2' AND
(charge.value <> history.value OR charge.date <> history.date OR history.key1 is null)
ORDER BY key1, key2, key;
Expressions such as charge.value <> history.value effectively turn the left outer join into an inner join: for an unmatched row the comparison is NULL rather than true, so those rows are filtered out.
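A small reproduction makes the NULL behaviour concrete. This minimal sketch uses Python's sqlite3 (not Oracle) with three hypothetical bcharge rows: one identical to its history row, one with a changed value, and one with no history at all. The original WHERE clause drops the third row; adding an IS NULL test on a join key brings it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bcharge (key1 INT, key2 INT, key3 INT, chargeType TEXT, value INT, date TEXT);
CREATE TABLE chgHist (key1 INT, key2 INT, key3 INT, chargeType TEXT, value INT, date TEXT);
INSERT INTO bcharge VALUES
  (1, 1, 1, '2', 100, '2020-01-01'),  -- identical history row
  (1, 1, 2, '2', 200, '2020-01-01'),  -- history has a different value
  (1, 1, 3, '2', 300, '2020-01-01');  -- no history row at all
INSERT INTO chgHist VALUES
  (1, 1, 1, '2', 100, '2020-01-01'),
  (1, 1, 2, '2', 250, '2020-01-01');
""")

QUERY = """
SELECT charge.key3
FROM bcharge charge
LEFT OUTER JOIN chgHist history
  ON charge.key1 = history.key1 AND charge.key2 = history.key2
 AND charge.key3 = history.key3 AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2' AND ({cond})
ORDER BY charge.key3
"""

# For key3 = 3 both comparisons are NULL (not true), so the row is filtered out.
broken = conn.execute(QUERY.format(
    cond="charge.value <> history.value OR charge.date <> history.date")).fetchall()
# Testing a join key for NULL keeps the rows that had no match.
fixed = conn.execute(QUERY.format(
    cond="charge.value <> history.value OR charge.date <> history.date"
         " OR history.key1 IS NULL")).fetchall()

print(broken)  # [(2,)]
print(fixed)   # [(2,), (3,)]
```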
A WHERE clause filters the data returned by a join. Therefore when your inner table has null data for a particular column, the corresponding rows get filtered out based on your specified condition. That is why you should move that logic to the ON clause instead.
For the performance issues, you could consider adding indexes on the columns used for joining and filtering.
Have a look at this site; it will be very helpful for you, with a visual illustration of all the join types and code samples:
blog.codinghorror.com
Quoting the relevant information from the link above:
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
Sample output:
id name id name
-- ---- -- ----
1 Pirate 2 Pirate
2 Monkey null null
3 Ninja 4 Ninja
4 Spaghetti null null
A left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.
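The quoted sample output can be reproduced locally. This minimal sketch uses Python's sqlite3; TableA's rows come from the output above, while the two TableB rows that match nothing ('Rutabaga', 'Darth Vader') are filler chosen for illustration. ORDER BY is added so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (id INTEGER, name TEXT);
CREATE TABLE TableB (id INTEGER, name TEXT);
INSERT INTO TableA VALUES (1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
INSERT INTO TableB VALUES (1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

rows = conn.execute("""
SELECT * FROM TableA
LEFT OUTER JOIN TableB
  ON TableA.name = TableB.name
ORDER BY TableA.id
""").fetchall()

for row in rows:
    print(row)
# (1, 'Pirate', 2, 'Pirate')
# (2, 'Monkey', None, None)
# (3, 'Ninja', 4, 'Ninja')
# (4, 'Spaghetti', None, None)
```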
For any field from an outer joined table used in the where clause you must also permit an IS NULL option for that same field, otherwise you negate the effect of the outer join and the result is the same as if you had used an inner join.
SELECT
*
FROM bcharge charge
LEFT OUTER JOIN chgHist history
ON charge.key1 = history.key1
AND charge.key2 = history.key2
AND charge.key3 = history.key3
AND charge.chargeType = history.chargeType
WHERE charge.chargeType = '2'
AND (
(charge.value <> history.value OR history.value IS NULL)
OR
(charge.date <> history.date OR history.date IS NULL)
)
ORDER BY
key1, key2, key3
Edit: This appears to be the same query structure that Rene used above, so please treat this one as support for that answer.

Which is the best practice to write sql

The two queries below give the same results.
I just want to know which one is better in terms of performance.
Query 1:
SELECT N.*
FROM NOTIFICATIONS N
JOIN NOTIFICATION_COMPANY_GROUPS NCG
ON ( N.COMPANY_ID = NCG.COMPANY_ID
AND N.ID = NCG.NOTIFICATION_ID )
JOIN COMPANY_USER_GROUPS CUG
ON ( N.COMPANY_ID = CUG.COMPANY_ID
AND CUG.COMPANY_GROUP_ID = NCG.COMPANY_GROUP_ID )
JOIN NOTIFICATION_PROPERTIES NP ON ( N.COMPANY_ID = NP.COMPANY_ID )
JOIN COMPANY_USER_PROPERTIES CUP
ON ( N.COMPANY_ID = CUP.COMPANY_ID
AND CUP.PROPERTY_ID = NP.PROPERTY_ID )
WHERE N.COMPANY_ID = 2138
AND CUG.COMPANY_USER_ID = 41422
AND CUP.COMPANY_USER_ID = 41422;
Query 2:
SELECT N.*
FROM NOTIFICATIONS N
JOIN NOTIFICATION_COMPANY_GROUPS NCG
ON ( N.COMPANY_ID = 2138
AND N.COMPANY_ID = NCG.COMPANY_ID
AND N.ID = NCG.NOTIFICATION_ID )
JOIN COMPANY_USER_GROUPS CUG
ON ( CUG.COMPANY_USER_ID = 41422
AND N.COMPANY_ID = CUG.COMPANY_ID
AND CUG.COMPANY_GROUP_ID = NCG.COMPANY_GROUP_ID )
JOIN NOTIFICATION_PROPERTIES NP ON ( N.COMPANY_ID = NP.COMPANY_ID )
JOIN COMPANY_USER_PROPERTIES CUP
ON ( CUP.COMPANY_USER_ID = 41422
AND N.COMPANY_ID = CUP.COMPANY_ID
AND CUP.PROPERTY_ID = NP.PROPERTY_ID );
I expect the performance should be the same, but you can use EXPLAIN to verify that the query plan is the same.
However, the first version is the "proper" way to write it. Generally, ON clauses should only contain conditions that relate the tables being joined, while conditions on single tables should be in WHERE clauses.
The only exception to this is LEFT JOIN, where conditions on the table being joined should be in the ON clause. This is because if you put them in the WHERE clause, the NULL-extended rows produced for main-table rows that have no match in the joined table will be filtered out, unless you explicitly check for NULL. As an example:
SELECT ...
FROM T1
LEFT JOIN T2 ON T2.T1_id = T1.id AND T2.someCol = 3
versus
SELECT ...
FROM T1
LEFT JOIN T2 ON T2.T1_id = T1.id
WHERE T2.someCol = 3
In the first version, the test of T2.someCol is done before joining; the result will contain all rows from T1, but the ones with no matching row in T2 will have NULL for all the T2 columns. But the second version won't have any of these non-matching rows, because the join is done first, and then it performs the T2.someCol = 3 test; if there was no matching T2 row, T2.someCol will be NULL, and this test will fail and the row will be filtered out by WHERE.
In the case of an inner join, it doesn't matter whether you do the comparison before or after joining, the results are equivalent. The query planner should order these in whichever way takes best advantage of indexes.
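Both claims can be checked on a toy data set. This minimal sketch uses Python's sqlite3 with hypothetical T1 and T2 rows: for the LEFT JOIN, moving T2.someCol = 3 from ON to WHERE removes the non-matching T1 rows, while for the inner join the two placements return identical results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (id INTEGER);
CREATE TABLE T2 (T1_id INTEGER, someCol INTEGER);
INSERT INTO T1 VALUES (1), (2), (3);
INSERT INTO T2 VALUES (1, 3), (2, 5);   -- T1 row 3 has no T2 row at all
""")

def run(sql: str) -> list:
    return conn.execute(sql).fetchall()

# Filter in ON: applied while matching, so every T1 row survives.
left_on = run("SELECT T1.id, T2.someCol FROM T1 "
              "LEFT JOIN T2 ON T2.T1_id = T1.id AND T2.someCol = 3 ORDER BY T1.id")
# Filter in WHERE: applied after the join, so NULL-extended rows are dropped.
left_where = run("SELECT T1.id, T2.someCol FROM T1 "
                 "LEFT JOIN T2 ON T2.T1_id = T1.id WHERE T2.someCol = 3 ORDER BY T1.id")
# For an inner join the placement makes no difference to the result.
inner_on = run("SELECT T1.id, T2.someCol FROM T1 "
               "JOIN T2 ON T2.T1_id = T1.id AND T2.someCol = 3 ORDER BY T1.id")
inner_where = run("SELECT T1.id, T2.someCol FROM T1 "
                  "JOIN T2 ON T2.T1_id = T1.id WHERE T2.someCol = 3 ORDER BY T1.id")

print(left_on)                  # [(1, 3), (2, None), (3, None)]
print(left_where)               # [(1, 3)]
print(inner_on == inner_where)  # True
```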