I am trying to optimize a Crystal Report that is used very frequently here. I succeeded to optimize lots of queries but I still have one last bottleneck: This is the main query, generated from the report.
SELECT
A.*,
B.*,
C.*,
D.*,
E."N",
F."N",
G."N"
FROM
A
LEFT OUTER JOIN B ON
A."PK" = B."FK"
LEFT OUTER JOIN C ON
A."PK" = C."FK"
LEFT OUTER JOIN D ON
A."FK" = D."PK"
LEFT OUTER JOIN E ON
A."PK" = E."FK"
LEFT OUTER JOIN F ON
A."PK" = F."FK"
LEFT OUTER JOIN G ON
A."PK" = G."FK"
WHERE A.PK = ####
A,B,C and D are tables. E,F,G are simple views.
As you see, the report generated multiple LEFT JOINS. This query takes 2.28 seconds to complete (From the Plan Viewer stats). I identified three joins that seem problematic. If I remove E,F,G from the query, it becomes almost instant (0.0009s from the same stats)
SELECT
A.*,
B.*,
C.*,
D.*
FROM
A
LEFT OUTER JOIN B ON
A."PK" = B."FK"
LEFT OUTER JOIN C ON
A."PK" = C."FK"
LEFT OUTER JOIN D ON
A."FK" = D."PK"
WHERE A.PK = ####
I tought it might be the views that are slow, but if I do for example ...
SELECT *
FROM E
WHERE E.FK = ####
... it is also almost instant (0.0009s)
Tables all have indexes on PKs-FKs.
Views E,F,G all return one or no row with [FK|N] as columns, so the resulting column is NULL or a number.
Do you know how I could make this query fast?
PS: If I replace LEFT OUTER JOINS by INNER JOINS the main query becomes fast... :-/
Or trying to split this query into multiple queries on the report would be a better solution?
Thank you!
I would create functions for the lookup against E, F and G instead of joining them.
That way there is little chance the optimiser gets confused and tries to do stupid things.
SELECT
A.*,
B.*,
C.*,
D.*,
GET_E(A."PK"),
GET_F(A."PK"),
GET_G(A."PK")
FROM
A
LEFT OUTER JOIN B ON
A."PK" = B."FK"
LEFT OUTER JOIN C ON
A."PK" = C."FK"
LEFT OUTER JOIN D ON
A."FK" = D."PK"
WHERE A.PK = ####
The problem is probably because you are creating a huge cartesian product of 5 tables all joined to A in some way (A and D will only contribute one record to the product). Having such a big cartesian product will consume quite a bit of memory internally in Sybase. It is likely that your query is just wrong.
Related
I have a complex query with multiple left joins and subqueries which I need to implement in Entify Framework. I've received the monster SQL and
my goal is to do it on a elegant way with EF. The query consumes multiple tables and creates a "WITH" subquery on top which
is included in the joins later. I've done a first approach with EF but when I inspect the output that EF sends to the DB, inner joins are sent when
I am expecting LEFT JOINs.
A summary of the SQL follows:
WITH SUB_QUERY
AS ( SELECT FIELD_A,
FIELD_B,
FIELD_C,
MAX (FIELD_D) MAX_FIELD_D
FROM TABLE_X
WHERE SOME FIELD_A = 'WHATEVER'
GROUP BY FIELD_A, FIELD_B, FIELD_C)
SELECT C.FIELD_A,
C.FIELD_B,
B.FIELD_X,
D.FIELD_S,
E.FIELD_J,
F.FIELD_Y
FROM TABLE_A A
LEFT JOIN SUB_QUERY B
ON A.FIELD_C = B.FIELD_C
LEFT JOIN TABLE_C C
ON B.FIELD_A = C.FIELD_A
LEFT JOIN TABLE_D D
ON A.FIELD_C = D.FIELD_C
LEFT JOIN TABLE_E E
ON A.FIELD_X = E.FIELD_X
LEFT JOIN TABLE_F F
ON A.FIELD_W = F.FIELD_W
WHERE A.FIELD_H = D.FIELD_H
AND A.FIELD_D = B.MAX_FIELD_D
As you see, a subquery on top filters and groups some data to be consumed in a join below. Then all the joins take place
and some fields are taken from different tables as the output of the query.
Which approach would you recommend me to accomplish this task? I've tried different approaches and no one of them works (either retrieve nothing, or many more rows than the SQL query on the DB, etc..)
Please note that the Domain Model in Entity Framework is properly setup: Primary Keys, collections, nested objects etc.. so I believe some of these
joins are not even required because my EF entities contain already references to the child collections and parent objects (navigation properties).
Thanks a lot!!
if you really need a left join you should mode the where condition related to a left joined table in the proper on clause
FROM TABLE_A A
LEFT JOIN SUB_QUERY B
ON A.FIELD_C = B.FIELD_C
LEFT JOIN TABLE_C C
ON B.FIELD_A = C.FIELD_A
LEFT JOIN TABLE_D D
ON A.FIELD_C = D.FIELD_C AND A.FIELD_D = B.MAX_FIELD_D
LEFT JOIN TABLE_E E
ON A.FIELD_X = E.FIELD_X
LEFT JOIN TABLE_F F
ON A.FIELD_W = F.FIELD_W
the use of a left join table column in where force the relation to work as a INNER JOIN
As I was cleaning up some issues in an old view in our database I came across this "strange" join condition:
from
tblEmails [e]
join tblPersonEmails [pe]
on (e.EmailID = pe.EmailID)
right outer join tblUserAccounts [ua]
join People [p]
on (ua.PersonID = p.Id)
join tblChainEmployees [ce]
on (ua.PersonID = ce.PersonID)
on (pe.PersonID = p.Id)
Table tblUserAccounts is referenced as a right outer join, but the on condition for it is not declared until after tblChainEmployees is referenced; then there are two consecutive on statements in a row.
I couldn't find a relevant answer anywhere on the Internet, because I didn't know what this kind of join is called.
So the questions:
Does this kind of "deferred conditional" join have a name?
How can this be rewritten to produce the same result set where the on statements are not consecutive?
Maybe this is a "clever" solution when there has always been a simpler/clearer way?
(1) This is just syntax and I've never heard of some special name. If you read carefully this MSDN article you'll see that (LEFT|RIGHT) JOIN has to be paired with ON statement. If it's not, expression inside is parsed as <table_source>. You can put parentheses to make it more readable:
from
tblEmails [e]
join tblPersonEmails [pe]
on (e.EmailID = pe.EmailID)
right outer join
(
tblUserAccounts [ua]
join People [p]
on (ua.PersonID = p.Id)
join tblChainEmployees [ce]
on (ua.PersonID = ce.PersonID)
) on (pe.PersonID = p.Id)
(2) I would prefer LEFT syntax, with explicit parentheses (I know, it's a matter of taste). This produces the same execution plan:
FROM tblUserAccounts ua
JOIN People p ON ua.PersonID = p.Id
JOIN tblChainEmployees ce ON ua.PersonID = ce.PersonID
LEFT JOIN
(
tblEmails e
JOIN tblPersonEmails pe ON e.EmailID = pe.EmailID
) ON pe.PersonID = p.Id
(3) Yes, it's clever, just like some C++ expressions (i.e. (i++)*(*t)[0]<<p->a) on interviews. Language is flexible. Expressions and queries can be tricky, but some 'arrangements' lead to readability degradation and errors.
Looks to me like you have tblEmail and tblPerson with their own independent IDs, emailID and ID (person), a linking table tblPersonEmail with the valid pairs of emailID/IDs, and then the person table may have a 1-1 relationship with UserAccount, which may then have a 1-1 relationship with chainEmployee, so to get rid of the RIGHT OUTER JOIN in favor of LEFT, I'd use:
FROM
((tblPerson AS p INNER JOIN
(tblEmail AS e INNER JOIN
tblPersonEmail AS pe ON
e.emailID = pe.emailID) ON
p.ID = pe.personID) LEFT JOIN
tblUserAccount AS ua ON
p.ID = ua.personID) LEFT JOIN
tblChainEmployee AS ce ON
ua.personID = ce.personID
I can't think of a great practical example of this off the top of my head so I'll give you a generic example that hopefully makes sense. Unfortunately I'm not aware of a generic name for this either.
Many people will start off with a query like this:
select ...
from
A a left outer join
B b on b.id = a.id left outer join
C c on c.id2 = b.id2;
The look at the results and realize that they really need to eliminate the rows in B that don't have a corresponding C but if you tried to say where b.id2 is not null and c.id2 is not null you've defeated the whole purpose of the left join from A.
So next you try to do this but it doesn't take long to figure out it's not going to work. The inner join at the tail end of the chain has basically converted both the joins to inner joins.
select ...
from
A a left outer join
B b on b.id = a.id inner join
C c on c.id2 = b.id2;
The problem seems simple yet it doesn't work right. Essentially after you ponder for a while you discover that you need to control the join order and do the inner join first. So the three queries below are equivalent ways to accomplish that. The first one is probably the one you're more familiar with:
select ...
from
A a left outer join
(select * from B b inner join C c on c.id2 = b.id2) bc
on bc.id = a.id
select ...
from
A a left outer join
B b inner join
C c on c.id2 = b.id2
on b.id = a.id
select ...
from
B b inner join
C c on c.id2 = b.id2 right outer join -- now they can be done in order
A a on a.id = b.id
You query is a little more complicated but ultimately the same issues came into play which is where the odd stuff came from. SQL has evolved and you have to remember that platforms didn't always have the fancy things like derived tables, scalar subqueries, CTEs so sometimes people had to write things this way. And then there were graphical query builders with a lot of limitations in older versions of tools like Crystal Report that didn't allow for complex join conditions...
I need to join at least 4 tables. Table A is an Association table that contains a guid for Table B and C, Parentguid (B), Childguid (C). Table D contains information just for table C.
I need the results to look like this.
B - C - D
Monitor - Computer Name - Active
So the main thing is to show all of B table, only C table that is connected to B, and only D table this is associated with C.
I suspect I will need sub joins ( ). I am still a novice, it makes sense in my head but I can't seem to make the code work. I have played with joins for the past 2 days.
FROM vHWDesktopMonitor mon -- [Symantec_CMDB2].[dbo].[ResourceAssociation]
join ResourceAssociation RM on mon._ResourceGuid = RM.ParentResourceGuid
full outer join vComputer comp on RM.ChildResourceGuid = comp.Guid
full outer join vAsset on RM.ChildResourceGuid = vAsset._ResourceGuid
FROM vHWDesktopMonitor A
FULL OUTER JOIN ResourceAssociation B
on A._ResourceGuid = B.ParentResourceGuid
LEFT JOIN vComputer C
on B.ChildResourceGuid = C.Guid
LEFT JOIN vAsset D
on C.ChildResourceGuid = D._ResourceGuid
So the above will return
All FROM A and ALL records from B (Full Outer between A, B)
Only records from C that are in B (LEFT Between B and C)
Only records from D that are in C (LEFT Between C and D)
However, if you apply any where clause limits, it could reduce records which otherwise would be kept due to the left or outer joins...
For example if A._ResourceGuid ='7' exists in A but isn't in B;
and you set where B._ResourceGuid ='7' then the A record would otherwise would be kept due to the full outer join would then be excluded (making the full outer join the same as an INNER JOIN)!
a Full outer would return data like this:
A B
7 7
2
3
if you add a where clause where B=7 then you may be expecting to get because of the full outer since you said return all records from both...
A B
7 7
2
But you would end up getting
A B
7 7
Because the where clause occurs AFTER the full outer and therefore reduces the A.2 record.
To compensate for this you either have to put teh limits on teh join before the full outer executes or handle it in a where clause (but this method is VERY messy and prone to error and performance issues)
So when using outer joins, you MUST put the limiting criteria on the JOIN itself like below..
FROM vHWDesktopMonitor A
FULL OUTER JOIN ResourceAssociation B
on A._ResourceGuid = B.ParentResourceGuid
and B._resourceGuid = '7'
LEFT JOIN vComputer C
on B.ChildResourceGuid = C.Guid
LEFT JOIN vAsset D
on C.ChildResourceGuid = D._ResourceGuid
You could also put it in the where clause but you must remember to account for all the outer joins on the table and include null values for the other (this is just messy and slow)
FROM vHWDesktopMonitor A
FULL OUTER JOIN ResourceAssociation B
on A._ResourceGuid = B.ParentResourceGuid
LEFT JOIN vComputer C
on B.ChildResourceGuid = C.Guid
LEFT JOIN vAsset D
on C.ChildResourceGuid = D._ResourceGuid
WHERE (A._ResourceGuid is null OR B.ParentResourceGuid ='7')
If I understand you correctly either of these should work"
FROM vHWDesktopMonitor mon -- [Symantec_CMDB2].[dbo].[ResourceAssociation]
left join ResourceAssociation RM on mon._ResourceGuid = RM.ParentResourceGuid
left join vComputer comp on RM.ChildResourceGuid = comp.Guid
left join vAsset on comp.Guid = vAsset._ResourceGuid
or
FROM vHWDesktopMonitor mon -- [Symantec_CMDB2].[dbo].[ResourceAssociation]
left join ResourceAssociation RM on mon._ResourceGuid = RM.ParentResourceGuid
left join (select [list fields here] from vComputer comp
join vAsset on comp.Guid = vAsset._ResourceGuid) comp2
on RM.ChildResourceGuid = comp2.Guid
this should get you all the records from vHWDesktopMonitor and teh asscoiated records from ResourceAssociation with nulls for any records in vHWDesktopMonitor but not in ResourceAssociation. Then you get al teh records in vComputer that are also in ResourceAssociation. Finally you get al teh records in vAsset that are in vComputer. as alawys when you are getting all the records in teh first table, tehere will be nulls inteh fileds from other tables if you do not have an associated record.
If this doesn't work, perhaps you need to show us some sample data and expected results.
I have the following tables with the following attributes:
Op(OpNo, OpName, Date)
OpConvert(OpNo, M_OpNo, Source_ID, Date)
Source(Source_ID, Source_Name, Date)
Fleet(OpNo, S_No, Date)
I have the current multiple JOIN query which gives me the results that I want:
SELECT O.OpNo AS Op_NO, O.OpName, O.Date AS Date_Entered, C.*
FROM Op O
LEFT OUTER JOIN OpConvert C
ON O.OpNo = C.OpNo
LEFT OUTER JOIN Source D
ON C.Source_ID = D.Source_ID
WHERE C.OpNo IS NOT NULL
The problem is this. I need to join the Fleet table on the previous multiple JOIN statement to attach the relevant S_No to the multiple JOIN table. Would I still be able to accomplish this using a LEFT OUTER JOIN or would I have to use a different JOIN statement? Also, which table would I JOIN on?
Please note that I am only familiar with LEFT OUTER JOINS.
Thanks.
I guess in your case you could use INNER JOIN or LEFT JOIN (which is the same thing as LEFT OUTER JOIN in SQL Server.
INNER JOIN means that it will only return records from other tables only if there are corresponding records (based on the join condition) in the Fleet table.
LEFT JOIN means that it will return records from other tables even if there are no corresponding records (based on the join condition) in the Fleet table. All columns from Fleet will return NULL in this case.
As for which table to join, you should really join the table that makes more logical sense based on your data structure.
Yes, you can use all tables mentioned before in your join conditions. Actually, JOINS (no matter of INNER, LEFT OUTER, RIGHT OUTER, CROSS, FULL OUTER or whatever) are left- associative, i. e. they are implicitly evaluated as if they would have been included in parentheses from the left as follows:
FROM ( ( ( Op O
LEFT OUTER JOIN OpConvert C
ON O.OpNo = C.OpNo
)
LEFT OUTER JOIN Source D
ON C.Source_ID = D.Source_ID
)
LEFT OUTER JOIN Fleet
ON ...
)
This is similar to how + or - would implicitly use parentheses, i. e.
2 + 3 - 4 - 5
is evaluated as
(((2 + 3) - 4) - 5)
By the way: If you use C.OpNo IS NOT NULL, then the LEFT OUTER JOIN Source D is treated as if it were an INNER JOIN, as you are explicitly removing all the "OUTER" rows.
I have three tables: R, S and P.
Table R Joins with S through a foreign key; there should be at least one record in S, so I can JOIN:
SELECT
*
FROM
R
JOIN S ON (S.id = R.fks)
If there's no record in S then I get no rows, that's fine.
Then table S joins with P, where records is P may or may not be present and joined with S.
So I do
SELECT
*
FROM
R
JOIN S ON (S.id = R.fks)
LEFT JOIN P ON (P.id = S.fkp)
What if I wanted the second JOIN to be tied to S not to R, like if I could use parentheses:
SELECT
*
FROM
R
JOIN (S ON (S.id = R.fks) JOIN P ON (P.id = S.fkp))
Or is that already a natural behaviour of the cartesian product between R, S and P?
All kinds of outer and normal joins are in the same precedence class and operators take effect left-to-right at a given nesting level of the query. You can put the join expression on the right side in parentheses to cause it to take effect first. Remember that you will have to move the ON clauses around so that they stay with their joins—the join in parentheses takes its ON clause with it into the parentheses, so it now comes textually before the other ON clause which will be after the parentheses in the outer join statement.
(PostgreSQL example)
In
SELECT * FROM a LEFT JOIN b ON (a.id = b.id) JOIN c ON (b.ref = c.id);
the a-b join takes effect first, but we can force the b-c join to take effect first by putting it in parentheses, which looks like:
SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
Often you can express the same thing without extra parentheses by moving the joins around and changing the direction of the outer joins, e.g.
SELECT * FROM b JOIN c ON (b.ref = c.id) RIGHT JOIN a ON (a.id = b.id);
When you join the third table, your first query
SELECT
*
FROM
R
JOIN S ON (S.id = R.fks)
is like a derived table to which you're joining the third table. So if R JOIN S produces no rows, then joining P will never yield any rows (because you're trying to join to an empty table).
So, if you're looking for precedence rules then in this case it's just set by using LEFT JOIN as opposed to JOIN.
However, I may be misunderstanding your question, because if I were writing the query, I would swap S and R around. eg.
SELECT
*
FROM
S
JOIN R ON (S.id = R.fks)
The second join is tied to S as you explicity state JOIN P ON (P.id = S.fkp) - no column from R is referenced in the join.
with a as (select 1 as test union select 2)
select * from a left join
a as b on a.test=b.test and b.test=1 inner join
a as c on b.test=c.test
go
with a as (select 1 as test union select 2)
select * from a inner join
a as b on a.test=b.test right join
a as c on b.test=c.test and b.test=1
Ideally, we would hope that the above two queries are the same. However, they are not - so anybody that says a right join can be replaced with a left join in all cases is wrong. Only by using the right join can we get the required result.