SQL - Consecutive "ON" Statements

As I was cleaning up some issues in an old view in our database, I came across this "strange" join condition:
from
tblEmails [e]
join tblPersonEmails [pe]
on (e.EmailID = pe.EmailID)
right outer join tblUserAccounts [ua]
join People [p]
on (ua.PersonID = p.Id)
join tblChainEmployees [ce]
on (ua.PersonID = ce.PersonID)
on (pe.PersonID = p.Id)
Table tblUserAccounts is referenced in a right outer join, but the ON condition for it is not declared until after tblChainEmployees is referenced; then two ON clauses appear back to back.
I couldn't find a relevant answer anywhere on the Internet, because I didn't know what this kind of join is called.
So the questions:
1. Does this kind of "deferred conditional" join have a name?
2. How can this be rewritten to produce the same result set without consecutive ON clauses?
3. Is this just a "clever" solution when there has always been a simpler, clearer way?

(1) This is just syntax; I've never heard of a special name for it. If you read this MSDN article carefully, you'll see that a (LEFT|RIGHT) JOIN has to be paired with an ON clause. If the ON does not follow immediately, the expression in between is parsed as a nested <table_source>. You can add parentheses to make it more readable:
from
tblEmails [e]
join tblPersonEmails [pe]
on (e.EmailID = pe.EmailID)
right outer join
(
tblUserAccounts [ua]
join People [p]
on (ua.PersonID = p.Id)
join tblChainEmployees [ce]
on (ua.PersonID = ce.PersonID)
) on (pe.PersonID = p.Id)
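The pairing rule in (1) behaves much like bracket matching: each ON closes the most recent JOIN that is still waiting for one. The following is a simplified model, not SQL Server's actual parser. It assumes the FROM clause has already been split into hypothetical ("join", label) and ("on", condition) tokens, labels each join by the first table after the keyword, and ignores CROSS JOIN and APPLY, which take no ON.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # ("join", table label) or ("on", condition text)


def pair_on_clauses(tokens: List[Token]) -> Dict[str, str]:
    """Pair every ON with the most recent JOIN that has not received one yet."""
    pending: List[str] = []  # joins still waiting for their ON, innermost last
    pairs: Dict[str, str] = {}
    for kind, value in tokens:
        if kind == "join":
            pending.append(value)
        elif kind == "on":
            if not pending:
                raise ValueError(f"ON without a matching JOIN: {value}")
            pairs[pending.pop()] = value
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if pending:
        raise ValueError(f"JOIN(s) without ON: {pending}")
    return pairs


# The FROM clause from the question, one token per JOIN keyword or ON clause.
from_clause = [
    ("join", "pe"), ("on", "e.EmailID = pe.EmailID"),
    ("join", "ua"),  # right outer join tblUserAccounts [ua]: its ON is deferred
    ("join", "p"), ("on", "ua.PersonID = p.Id"),
    ("join", "ce"), ("on", "ua.PersonID = ce.PersonID"),
    ("on", "pe.PersonID = p.Id"),  # closes the right outer join to ua
]

for table, condition in pair_on_clauses(from_clause).items():
    print(f"{table}: {condition}")
```

The last ON lands on the right outer join to ua, which is exactly what the parenthesized version makes explicit.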
(2) I would prefer the LEFT JOIN syntax with explicit parentheses (I know, it's a matter of taste). This produces the same execution plan:
FROM tblUserAccounts ua
JOIN People p ON ua.PersonID = p.Id
JOIN tblChainEmployees ce ON ua.PersonID = ce.PersonID
LEFT JOIN
(
tblEmails e
JOIN tblPersonEmails pe ON e.EmailID = pe.EmailID
) ON pe.PersonID = p.Id
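To sanity-check that (2) returns the same rows as the original, here is a small in-memory model of nested-loop joins in Python with made-up sample data. It is a sketch of standard inner/left/right join semantics, not how SQL Server executes the query. The helper names (eq, table, join, bag) are hypothetical.

```python
from collections import Counter


def eq(a, b):
    """SQL-style '=': any comparison involving NULL (None) is not true."""
    return a is not None and b is not None and a == b


def table(alias, cols, *rows):
    """Build a (columns, rows) pair whose column names are qualified by alias."""
    names = [f"{alias}.{c}" for c in cols]
    return names, [dict(zip(names, r)) for r in rows]


def join(kind, left, right, on):
    """Nested-loop join; kind is 'inner', 'left' or 'right'."""
    lcols, lrows = left
    rcols, rrows = right
    out, matched_right = [], set()
    for l in lrows:
        hit = False
        for i, r in enumerate(rrows):
            row = {**l, **r}
            if on(row):
                out.append(row)
                hit = True
                matched_right.add(i)
        if not hit and kind == "left":
            out.append({**l, **dict.fromkeys(rcols)})  # pad right side with NULLs
    if kind == "right":
        for i, r in enumerate(rrows):
            if i not in matched_right:
                out.append({**dict.fromkeys(lcols), **r})  # pad left side
    return lcols + rcols, out


def bag(result):
    """Rows as a multiset, so row order does not matter when comparing."""
    cols, rows = result
    key = sorted(cols)
    return Counter(tuple(r[c] for c in key) for r in rows)


# Made-up data: person 3 has an email but no account; person 2 has no email.
e = table("e", ["EmailID"], (10,), (11,), (12,))
pe = table("pe", ["EmailID", "PersonID"], (10, 1), (11, 1), (12, 3))
ua = table("ua", ["PersonID"], (1,), (2,))
p = table("p", ["Id"], (1,), (2,), (3,))
ce = table("ce", ["PersonID"], (1,), (2,))

emails = join("inner", e, pe, lambda r: eq(r["e.EmailID"], r["pe.EmailID"]))
staff = join(
    "inner",
    join("inner", ua, p, lambda r: eq(r["ua.PersonID"], r["p.Id"])),
    ce,
    lambda r: eq(r["ua.PersonID"], r["ce.PersonID"]),
)

# Original: (e JOIN pe) RIGHT OUTER JOIN (ua JOIN p JOIN ce) ON pe.PersonID = p.Id
original = join("right", emails, staff, lambda r: eq(r["pe.PersonID"], r["p.Id"]))
# Answer (2): (ua JOIN p JOIN ce) LEFT JOIN (e JOIN pe) ON pe.PersonID = p.Id
rewritten = join("left", staff, emails, lambda r: eq(r["pe.PersonID"], r["p.Id"]))

print(bag(original) == bag(rewritten))  # True
print(len(original[1]))                 # 3
```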
(3) Yes, it's clever, just like some C++ expressions (e.g. (i++)*(*t)[0]<<p->a) in interviews. The language is flexible. Expressions and queries can be tricky, but some 'arrangements' lead to degraded readability and to errors.

It looks to me like you have tblEmail and tblPerson with their own independent IDs (emailID and ID, respectively) and a linking table tblPersonEmail with the valid emailID/ID pairs. The person table may then have a 1-1 relationship with UserAccount, which may in turn have a 1-1 relationship with chainEmployee. So, to get rid of the RIGHT OUTER JOIN in favor of LEFT joins, I'd use:
FROM
((tblPerson AS p INNER JOIN
(tblEmail AS e INNER JOIN
tblPersonEmail AS pe ON
e.emailID = pe.emailID) ON
p.ID = pe.personID) LEFT JOIN
tblUserAccount AS ua ON
p.ID = ua.personID) LEFT JOIN
tblChainEmployee AS ce ON
ua.personID = ce.personID

I can't think of a great practical example of this off the top of my head, so I'll give you a generic example that hopefully makes sense. Unfortunately, I'm not aware of a generic name for this either.
Many people will start off with a query like this:
select ...
from
A a left outer join
B b on b.id = a.id left outer join
C c on c.id2 = b.id2;
They look at the results and realize that they really need to eliminate the rows in B that don't have a corresponding C, but if you tried to say where b.id2 is not null and c.id2 is not null, you'd defeat the whole purpose of the left join from A.
So next you try this, but it doesn't take long to figure out it's not going to work: the inner join at the tail end of the chain has effectively converted both joins into inner joins.
select ...
from
A a left outer join
B b on b.id = a.id inner join
C c on c.id2 = b.id2;
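A quick way to see the problem is to run both queries against a toy data set. This sketch uses Python's built-in sqlite3 module as a stand-in database. The table contents are made up: one B row has no matching C, and one A row has no B at all.

```python
import sqlite3


def run(sql):
    """Run a query against a tiny in-memory sample of A, B and C."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE A (id INTEGER);
        CREATE TABLE B (id INTEGER, id2 INTEGER);
        CREATE TABLE C (id2 INTEGER);
        INSERT INTO A VALUES (1), (2), (3);
        INSERT INTO B VALUES (1, 100), (2, 200);  -- B row 2 has no matching C
        INSERT INTO C VALUES (100);
    """)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


left_left = run("""
    SELECT a.id, b.id2, c.id2
    FROM A a LEFT OUTER JOIN B b ON b.id = a.id
             LEFT OUTER JOIN C c ON c.id2 = b.id2
    ORDER BY a.id""")

left_inner = run("""
    SELECT a.id, b.id2, c.id2
    FROM A a LEFT OUTER JOIN B b ON b.id = a.id
             INNER JOIN C c ON c.id2 = b.id2
    ORDER BY a.id""")

print(left_left)   # [(1, 100, 100), (2, 200, None), (3, None, None)]
print(left_inner)  # [(1, 100, 100)]
```

The left/left query keeps all three A rows. The left/inner query keeps only A row 1, because the NULLs produced for rows 2 and 3 can never satisfy c.id2 = b.id2.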
The problem seems simple, yet it doesn't work right. After pondering for a while, you discover that you need to control the join order and do the inner join first. The three queries below are equivalent ways to accomplish that; the first one is probably the one you're most familiar with:
select ...
from
A a left outer join
-- select b.* plus any C columns you need by name: "select *" would return id2 twice, which SQL Server rejects in a derived table
(select b.* from B b inner join C c on c.id2 = b.id2) bc
on bc.id = a.id;
select ...
from
A a left outer join
B b inner join
C c on c.id2 = b.id2
on b.id = a.id;
select ...
from
B b inner join
C c on c.id2 = b.id2 right outer join -- now they can be done in order
A a on a.id = b.id;
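Running the first (derived-table) form against made-up data shows the intended result: every A row survives, and B's row without a C is gone. This sketch again uses Python's sqlite3 module. It only exercises the derived-table form. The other two forms are best checked on the target engine (RIGHT JOIN, for instance, only arrived in SQLite 3.39.0).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INTEGER);
    CREATE TABLE B (id INTEGER, id2 INTEGER);
    CREATE TABLE C (id2 INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1, 100), (2, 200);  -- B row 2 has no matching C
    INSERT INTO C VALUES (100);
""")

# Derived-table form: B and C are inner-joined first, and only then is the
# result left-joined to A, so every A row survives.
rows = con.execute("""
    SELECT a.id, bc.id2
    FROM A a LEFT OUTER JOIN
         (SELECT b.id, b.id2 FROM B b INNER JOIN C c ON c.id2 = b.id2) bc
         ON bc.id = a.id
    ORDER BY a.id""").fetchall()
con.close()

print(rows)  # [(1, 100), (2, None), (3, None)]
```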
Your query is a little more complicated, but ultimately the same issues came into play, which is where the odd syntax came from. SQL has evolved, and you have to remember that platforms didn't always have conveniences like derived tables, scalar subqueries, and CTEs, so sometimes people had to write things this way. Then there were graphical query builders in older versions of tools like Crystal Reports, which had a lot of limitations and didn't allow complex join conditions...

Related

Entity Framework with Multiple Joins and Subquery

I have a complex query with multiple left joins and subqueries that I need to implement in Entity Framework. I've received the monster SQL, and my goal is to do it in an elegant way with EF. The query consumes multiple tables and creates a "WITH" subquery on top, which is used in the joins later. I've made a first attempt with EF, but when I inspect the SQL that EF sends to the DB, it contains INNER JOINs where I expect LEFT JOINs.
A summary of the SQL follows:
WITH SUB_QUERY
AS ( SELECT FIELD_A,
FIELD_B,
FIELD_C,
MAX (FIELD_D) MAX_FIELD_D
FROM TABLE_X
WHERE FIELD_A = 'WHATEVER'
GROUP BY FIELD_A, FIELD_B, FIELD_C)
SELECT C.FIELD_A,
C.FIELD_B,
B.FIELD_X,
D.FIELD_S,
E.FIELD_J,
F.FIELD_Y
FROM TABLE_A A
LEFT JOIN SUB_QUERY B
ON A.FIELD_C = B.FIELD_C
LEFT JOIN TABLE_C C
ON B.FIELD_A = C.FIELD_A
LEFT JOIN TABLE_D D
ON A.FIELD_C = D.FIELD_C
LEFT JOIN TABLE_E E
ON A.FIELD_X = E.FIELD_X
LEFT JOIN TABLE_F F
ON A.FIELD_W = F.FIELD_W
WHERE A.FIELD_H = D.FIELD_H
AND A.FIELD_D = B.MAX_FIELD_D
As you see, a subquery on top filters and groups some data to be consumed in a join below. Then all the joins take place
and some fields are taken from different tables as the output of the query.
Which approach would you recommend for this task? I've tried different approaches, and none of them works (they either retrieve nothing or many more rows than the SQL query on the DB, etc.).
Please note that the domain model in Entity Framework is properly set up (primary keys, collections, nested objects, etc.), so I believe some of these joins are not even required, because my EF entities already contain references to the child collections and parent objects (navigation properties).
Thanks a lot!!
If you really need a left join, you should move the WHERE conditions that reference a left-joined table into the proper ON clause:
FROM TABLE_A A
LEFT JOIN SUB_QUERY B
ON A.FIELD_C = B.FIELD_C AND A.FIELD_D = B.MAX_FIELD_D
LEFT JOIN TABLE_C C
ON B.FIELD_A = C.FIELD_A
LEFT JOIN TABLE_D D
ON A.FIELD_C = D.FIELD_C AND A.FIELD_H = D.FIELD_H
LEFT JOIN TABLE_E E
ON A.FIELD_X = E.FIELD_X
LEFT JOIN TABLE_F F
ON A.FIELD_W = F.FIELD_W
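Using a column of a left-joined table in the WHERE clause forces the relation to work as an INNER JOIN: for rows where that table had no match, the column is NULL, so the comparison is not true and the row is filtered out.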
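Here is a minimal sketch of the difference using Python's sqlite3 module. It uses two hypothetical stand-in tables (A for TABLE_A, D for TABLE_D, and column h for FIELD_H), and the data is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (id INTEGER, h TEXT);
    CREATE TABLE D (id INTEGER, h TEXT);
    INSERT INTO A VALUES (1, 'x'), (2, 'y');
    INSERT INTO D VALUES (1, 'x');
""")

# Extra condition in WHERE: where D did not match, d.h is NULL, the
# comparison is not true, and the A row is filtered out.
in_where = con.execute("""
    SELECT a.id, d.h FROM A a LEFT JOIN D d ON a.id = d.id
    WHERE a.h = d.h ORDER BY a.id""").fetchall()

# Same condition in ON: it only decides which D rows match; all A rows survive.
in_on = con.execute("""
    SELECT a.id, d.h FROM A a LEFT JOIN D d ON a.id = d.id AND a.h = d.h
    ORDER BY a.id""").fetchall()
con.close()

print(in_where)  # [(1, 'x')]
print(in_on)     # [(1, 'x'), (2, None)]
```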

Strange / esoteric join syntax

I've been provided this old SQL code (table names changed) to replicate, and the JOIN syntax isn't something I've seen before, and is proving hard to google:
select <stuff>
from A
inner join B
on A.ID = B.A_ID
inner join C -- eh? No ON?
inner join D
ON C.C_ID = D.C_ID
ON B.C_ID = D.C_ID -- a second ON here? what?
When I saw the code, I assumed I'd be sent broken code and it wouldn't run.
But it does (SQL Server 2012).
What does it do? Is there a more sensible / standard way of writing it? What's happening here?
While unusual, this is perfectly valid T-SQL. Typically you see this approach when you have an outer join to a set of related tables that are inner joined to one another. A better way to write this, IMHO, would be:
inner join B
on A.ID = B.A_ID
inner join (C inner join D ON C.C_ID = D.C_ID)
ON B.C_ID = D.C_ID
This makes the join logic clear, and it also lets the reader know that the developer did this intentionally. So let the original be an example of poor coding: comment things that are unusual, explain things, and have someone review your code periodically to help improve your style and usage.
You could also write this in a "typical" style by rearranging the order of tables in the FROM clause, but I'd guess that the current version makes more logical sense with the real table names.
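Because every join here is an INNER JOIN, the parentheses don't change the result; only the textual order does. The sketch below uses Python's sqlite3 module and made-up data for one such reordering. D is joined before C so that each ON directly follows its JOIN, and the result is compared with a brute-force filter over the cross product.

```python
import sqlite3
from itertools import product

A = [1, 2]
B = [(1, 10), (2, 20), (2, 30)]  # (A_ID, C_ID)
C = [10, 30]
D = [10, 20, 30]

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (ID INTEGER);
    CREATE TABLE B (A_ID INTEGER, C_ID INTEGER);
    CREATE TABLE C (C_ID INTEGER);
    CREATE TABLE D (C_ID INTEGER);
""")
con.executemany("INSERT INTO A VALUES (?)", [(x,) for x in A])
con.executemany("INSERT INTO B VALUES (?, ?)", B)
con.executemany("INSERT INTO C VALUES (?)", [(x,) for x in C])
con.executemany("INSERT INTO D VALUES (?)", [(x,) for x in D])

# "Typical" flat style: tables reordered so every ON follows its own JOIN.
flat = con.execute("""
    SELECT A.ID, B.C_ID
    FROM A
    INNER JOIN B ON A.ID = B.A_ID
    INNER JOIN D ON B.C_ID = D.C_ID
    INNER JOIN C ON C.C_ID = D.C_ID
    ORDER BY A.ID, B.C_ID""").fetchall()
con.close()

# With inner joins only, any nesting equals filtering the cross product by
# all of the ON conditions, so compute that directly in Python.
expected = sorted(
    (a, b_cid)
    for a, (a_id, b_cid), c, d in product(A, B, C, D)
    if a == a_id and c == d and b_cid == d
)

print(flat)              # [(1, 10), (2, 30)]
print(flat == expected)  # True
```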
I ran it by a colleague who figured it out:
select <stuff>
from A
inner join B
on A.ID = B.A_ID
inner join ( C -- put a bracket here...
inner join D
ON C.C_ID = D.C_ID
) -- and one here
ON B.C_ID = D.C_ID
or to format it a little nicer:
select <stuff>
from A
inner join B
on A.ID = B.A_ID
inner join (
C
inner join D
ON C.C_ID = D.C_ID
)
ON B.C_ID = D.C_ID
I wasn't familiar with this kind of "sub-join" (I don't know what it's called), but this version is much more readable and clear.

How to use Left Outer Join or Right Outer Join in Oracle 11g

I have a query using "=" in the WHERE clause, but it takes a long time to execute when there is a lot of data. How can I use a LEFT OUTER JOIN, a RIGHT OUTER JOIN, or something similar to improve performance?
This is query:
select sum(op.quantity * op.unit_amount) into paid_money
from tableA op , tableB ssl, tableC ss, tableD pl, tableE p
where (op.id = ssl.id and ssL.id = ss.id and ss.type='A')
or
(op.id = pl.id and pl.id = p.id and p.type='B');
Your problem is not left or right joins. It is cross joins. You are doing many unnecessary cartesian products. I'm guessing this query will never finish. If it did, you'd get the wrong answer anyway.
Split this into two separate joins and then bring the results together. Only use the tables you need for each set of joins:
select SUM(val) into paid_money
from (select sum(op.quantity * op.unit_amount) as val
from tableA op, tableB ssl, tableC ss
where (op.id = ssl.id and ssl.id = ss.id and ss.type='A')
union all
select sum(op.quantity * op.unit_amount) as val
from tableA op, tableD pl, tableE p
where (op.id = pl.id and pl.id = p.id and p.type='B')
) t
I haven't fixed your join syntax, but you should learn to use the JOIN keyword and put the join conditions in an ON clause rather than in the WHERE clause.
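To see how much the OR version over-counts, here is a sketch using Python's sqlite3 module and a handful of made-up rows. The column types are assumptions, and SELECT ... INTO is dropped because that is PL/SQL rather than plain SQL.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tableA (id INTEGER, quantity INTEGER, unit_amount INTEGER);
    CREATE TABLE tableB (id INTEGER);
    CREATE TABLE tableC (id INTEGER, type TEXT);
    CREATE TABLE tableD (id INTEGER);
    CREATE TABLE tableE (id INTEGER, type TEXT);
    INSERT INTO tableA VALUES (1, 2, 5), (2, 1, 10);
    INSERT INTO tableB VALUES (1);
    INSERT INTO tableC VALUES (1, 'A');
    INSERT INTO tableD VALUES (2), (3);
    INSERT INTO tableE VALUES (2, 'B'), (3, 'C');
""")

# Original shape: one FROM list with an OR between two unrelated join paths.
with_or = con.execute("""
    SELECT SUM(op.quantity * op.unit_amount)
    FROM tableA op, tableB ssl, tableC ss, tableD pl, tableE p
    WHERE (op.id = ssl.id AND ssl.id = ss.id AND ss.type = 'A')
       OR (op.id = pl.id AND pl.id = p.id AND p.type = 'B')""").fetchone()[0]

# Split version: each branch only touches the tables it needs.
split = con.execute("""
    SELECT SUM(val) FROM (
        SELECT SUM(op.quantity * op.unit_amount) AS val
        FROM tableA op, tableB ssl, tableC ss
        WHERE op.id = ssl.id AND ssl.id = ss.id AND ss.type = 'A'
        UNION ALL
        SELECT SUM(op.quantity * op.unit_amount) AS val
        FROM tableA op, tableD pl, tableE p
        WHERE op.id = pl.id AND pl.id = p.id AND p.type = 'B'
    ) t""").fetchone()[0]
con.close()

print(with_or, split)  # 50 20
```

With these rows, the first branch's single match is repeated once for each of the 2 × 2 pl/p combinations, so it is counted four times.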
Are you sure that this query returns the required data? To me it looks like it will return each op, ssl & ss match combined with every pl, p row combination, and vice versa.
I would advise that you split it into two separate queries, union them together, and then sum over the top.

Sybase 12 LEFT JOIN Performance issue for CrystalReports

I am trying to optimize a Crystal Report that is used very frequently here. I succeeded in optimizing lots of queries, but I still have one last bottleneck: this is the main query, generated from the report.
SELECT
A.*,
B.*,
C.*,
D.*,
E."N",
F."N",
G."N"
FROM
A
LEFT OUTER JOIN B ON
A."PK" = B."FK"
LEFT OUTER JOIN C ON
A."PK" = C."FK"
LEFT OUTER JOIN D ON
A."FK" = D."PK"
LEFT OUTER JOIN E ON
A."PK" = E."FK"
LEFT OUTER JOIN F ON
A."PK" = F."FK"
LEFT OUTER JOIN G ON
A."PK" = G."FK"
WHERE A.PK = ####
A,B,C and D are tables. E,F,G are simple views.
As you see, the report generated multiple LEFT JOINs. This query takes 2.28 seconds to complete (from the Plan Viewer stats). I identified three joins that seem problematic: if I remove E, F and G from the query, it becomes almost instant (0.0009s from the same stats).
SELECT
A.*,
B.*,
C.*,
D.*
FROM
A
LEFT OUTER JOIN B ON
A."PK" = B."FK"
LEFT OUTER JOIN C ON
A."PK" = C."FK"
LEFT OUTER JOIN D ON
A."FK" = D."PK"
WHERE A.PK = ####
I thought it might be the views that are slow, but if I do, for example...
SELECT *
FROM E
WHERE E.FK = ####
... it is also almost instant (0.0009s)
Tables all have indexes on PKs-FKs.
Views E,F,G all return one or no row with [FK|N] as columns, so the resulting column is NULL or a number.
Do you know how I could make this query fast?
PS: If I replace the LEFT OUTER JOINs with INNER JOINs, the main query becomes fast... :-/
Or would splitting this query into multiple queries in the report be a better solution?
Thank you!
I would create functions for the lookup against E, F and G instead of joining them.
That way there is little chance the optimiser gets confused and tries to do stupid things.
SELECT
A.*,
B.*,
C.*,
D.*,
GET_E(A."PK"),
GET_F(A."PK"),
GET_G(A."PK")
FROM
A
LEFT OUTER JOIN B ON
A."PK" = B."FK"
LEFT OUTER JOIN C ON
A."PK" = C."FK"
LEFT OUTER JOIN D ON
A."FK" = D."PK"
WHERE A.PK = ####
The problem is probably that you are creating a huge cartesian product of the 5 tables that are all joined to A in some way (A and D will only contribute one record to the product). Such a big cartesian product will consume quite a bit of memory internally in Sybase. It is likely that your query is just wrong.
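The row multiplication is easy to reproduce. Here is a sketch using Python's sqlite3 module and made-up data, assuming B and C each hold several rows for the same A.PK (the column n is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE A (PK INTEGER);
    CREATE TABLE B (FK INTEGER, n INTEGER);
    CREATE TABLE C (FK INTEGER, n INTEGER);
    INSERT INTO A VALUES (1);
    INSERT INTO B VALUES (1, 1), (1, 2), (1, 3);
    INSERT INTO C VALUES (1, 1), (1, 2), (1, 3), (1, 4);
""")

rows = con.execute("""
    SELECT A.PK, B.n, C.n
    FROM A
    LEFT OUTER JOIN B ON A.PK = B.FK
    LEFT OUTER JOIN C ON A.PK = C.FK
    WHERE A.PK = 1""").fetchall()
con.close()

print(len(rows))  # 12 = 3 B rows x 4 C rows for the single A row
```

Each additional one-to-many LEFT JOIN on A.PK multiplies the row count again.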

SQL joining three tables, join precedence

I have three tables: R, S and P.
Table R joins with S through a foreign key; there should be at least one record in S, so I can JOIN:
SELECT
*
FROM
R
JOIN S ON (S.id = R.fks)
If there's no record in S then I get no rows, that's fine.
Then table S joins with P, where records in P may or may not be present and joined with S.
So I do
SELECT
*
FROM
R
JOIN S ON (S.id = R.fks)
LEFT JOIN P ON (P.id = S.fkp)
What if I wanted the second JOIN to be tied to S and not to R, as if I could use parentheses:
SELECT
*
FROM
R
JOIN (S ON (S.id = R.fks) JOIN P ON (P.id = S.fkp))
Or is that already a natural behaviour of the cartesian product between R, S and P?
All kinds of outer and normal joins are in the same precedence class, and operators take effect left-to-right at a given nesting level of the query. You can put the join expression on the right side in parentheses to make it take effect first. Remember that you will have to move the ON clauses around so that they stay with their joins: the join in parentheses takes its ON clause with it into the parentheses, so that ON clause now comes textually before the other ON clause, which follows the parentheses in the outer join.
(PostgreSQL example)
In
SELECT * FROM a LEFT JOIN b ON (a.id = b.id) JOIN c ON (b.ref = c.id);
the a-b join takes effect first, but we can force the b-c join to take effect first by putting it in parentheses, which looks like:
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
Often you can express the same thing without extra parentheses by moving the joins around and changing the direction of the outer joins, e.g.
SELECT * FROM b JOIN c ON (b.ref = c.id) RIGHT JOIN a ON (a.id = b.id);
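The three PostgreSQL queries can be checked with a small Python model of nested-loop joins. This is a sketch with made-up rows and hypothetical helper names, not PostgreSQL itself, but it follows the same NULL rule: a comparison with NULL is never true.

```python
from collections import Counter


def eq(a, b):
    """SQL-style '=': any comparison involving NULL (None) is not true."""
    return a is not None and b is not None and a == b


def table(alias, cols, *rows):
    """Build a (columns, rows) pair whose column names are qualified by alias."""
    names = [f"{alias}.{c}" for c in cols]
    return names, [dict(zip(names, r)) for r in rows]


def join(kind, left, right, on):
    """Nested-loop join; kind is 'inner', 'left' or 'right'."""
    lcols, lrows = left
    rcols, rrows = right
    out, matched_right = [], set()
    for l in lrows:
        hit = False
        for i, r in enumerate(rrows):
            row = {**l, **r}
            if on(row):
                out.append(row)
                hit = True
                matched_right.add(i)
        if not hit and kind == "left":
            out.append({**l, **dict.fromkeys(rcols)})
    if kind == "right":
        for i, r in enumerate(rrows):
            if i not in matched_right:
                out.append({**dict.fromkeys(lcols), **r})
    return lcols + rcols, out


def bag(result):
    """Rows as a multiset, so row order does not matter when comparing."""
    cols, rows = result
    key = sorted(cols)
    return Counter(tuple(r[c] for c in key) for r in rows)


a = table("a", ["id"], (1,), (2,))
b = table("b", ["id", "ref"], (1, 10), (2, 20))  # b row 2 points at a missing c
c = table("c", ["id"], (10,))


def a_b(r):
    return eq(r["a.id"], r["b.id"])


def b_c(r):
    return eq(r["b.ref"], r["c.id"])


# 1) a LEFT JOIN b ON (a.id = b.id) JOIN c ON (b.ref = c.id)
q1 = join("inner", join("left", a, b, a_b), c, b_c)
# 2) a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id)
q2 = join("left", a, join("inner", b, c, b_c), a_b)
# 3) b JOIN c ON (b.ref = c.id) RIGHT JOIN a ON (a.id = b.id)
q3 = join("right", join("inner", b, c, b_c), a, a_b)

print(len(q1[1]), len(q2[1]), len(q3[1]))  # 1 2 2
print(bag(q2) == bag(q3))                  # True
```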
When you join the third table, your first query
SELECT
*
FROM
R
JOIN S ON (S.id = R.fks)
is like a derived table to which you're joining the third table. So if R JOIN S produces no rows, then joining P will never yield any rows (because you're trying to join to an empty table).
So, if you're looking for precedence rules then in this case it's just set by using LEFT JOIN as opposed to JOIN.
However, I may be misunderstanding your question, because if I were writing the query, I would swap S and R around, e.g.:
SELECT
*
FROM
S
JOIN R ON (S.id = R.fks)
The second join is tied to S, as you explicitly state JOIN P ON (P.id = S.fkp); no column from R is referenced in the join.
with a as (select 1 as test union select 2)
select * from a left join
a as b on a.test=b.test and b.test=1 inner join
a as c on b.test=c.test
go
with a as (select 1 as test union select 2)
select * from a inner join
a as b on a.test=b.test right join
a as c on b.test=c.test and b.test=1
Ideally, we would hope that the two queries above are the same. However, they are not, so anybody who says that a right join can be replaced with a left join in all cases is wrong. Only by using the right join can we get the required result.
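Here is a sketch that reproduces the two SQL Server queries with a small Python model of nested-loop joins. The helpers are hypothetical, and the same two-row table stands in for the CTE a. It shows that the first query returns one row, while the second also keeps c.test = 2 with NULLs on the left.

```python
from collections import Counter


def eq(x, y):
    """SQL-style '=': any comparison involving NULL (None) is not true."""
    return x is not None and y is not None and x == y


def table(alias, cols, *rows):
    """Build a (columns, rows) pair whose column names are qualified by alias."""
    names = [f"{alias}.{c}" for c in cols]
    return names, [dict(zip(names, r)) for r in rows]


def join(kind, left, right, on):
    """Nested-loop join; kind is 'inner', 'left' or 'right'."""
    lcols, lrows = left
    rcols, rrows = right
    out, matched_right = [], set()
    for l in lrows:
        hit = False
        for i, r in enumerate(rrows):
            row = {**l, **r}
            if on(row):
                out.append(row)
                hit = True
                matched_right.add(i)
        if not hit and kind == "left":
            out.append({**l, **dict.fromkeys(rcols)})
    if kind == "right":
        for i, r in enumerate(rrows):
            if i not in matched_right:
                out.append({**dict.fromkeys(lcols), **r})
    return lcols + rcols, out


def bag(result):
    """Rows as a multiset keyed by sorted column names (a.test, b.test, c.test)."""
    cols, rows = result
    key = sorted(cols)
    return Counter(tuple(r[c] for c in key) for r in rows)


t = [(1,), (2,)]  # with a as (select 1 as test union select 2)
a = table("a", ["test"], *t)
b = table("b", ["test"], *t)
c = table("c", ["test"], *t)

# Query 1: a LEFT JOIN b ON a.test=b.test AND b.test=1 INNER JOIN c ON b.test=c.test
q1 = join(
    "inner",
    join("left", a, b, lambda r: eq(r["a.test"], r["b.test"]) and eq(r["b.test"], 1)),
    c,
    lambda r: eq(r["b.test"], r["c.test"]),
)
# Query 2: a INNER JOIN b ON a.test=b.test RIGHT JOIN c ON b.test=c.test AND b.test=1
q2 = join(
    "right",
    join("inner", a, b, lambda r: eq(r["a.test"], r["b.test"])),
    c,
    lambda r: eq(r["b.test"], r["c.test"]) and eq(r["b.test"], 1),
)

print(bag(q1) == Counter({(1, 1, 1): 1}))                     # True
print(bag(q2) == Counter({(1, 1, 1): 1, (None, None, 2): 1}))  # True
```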