Query with combined WHERE clause is slower than two individual WHERE clauses - sql

I'm having a performance problem with a SQL query that is generated by a .NET application.
Basically what the query is doing is:
(query1) left join (query2) right join (queries3 to 30) WHERE (query1.ID IS NULL) OR (query3.ID IS NULL AND query4.ID IS NULL AND… queryN.ID IS NULL)
When the query filters only on condition A (query1.ID IS NULL), it is fast.
When the query filters only on condition B (query3.ID to queryN.ID are all NULL), it is also fast.
When A and B are combined in one WHERE clause with an OR, the query is very slow.
I'm looking for a way to optimize this query without variables or stored procedures.
The query:
SELECT DISTINCT [Table0].[FIELD]
FROM /*8*/ ([Table0] AS [Table0]
INNER JOIN
[XTABLE] AS [XTABLE0]
ON [Table0].ID = [XTABLE0].ID1
AND [XTABLE0].ID3 = 52)
RIGHT OUTER /*10*/ JOIN
[Table1] AS [Table1]
/*21*/ /*11*/ ON [XTABLE0].ID2 = [Table1].ID
AND [XTABLE0].ID3 = 52
LEFT OUTER JOIN
([XTABLE] AS [XTABLE1]
INNER JOIN
[Table2] AS [Table2]
ON [XTABLE1].ID1 = [Table2].ID
AND [XTABLE1].ID3 = 19
/*20a*/ INNER JOIN
[XTABLE] AS [XTABLE2]
ON [Table2].ID = [XTABLE2].ID1
AND [XTABLE2].ID3 = 8
INNER JOIN
[Table3] AS [Table3]
ON [XTABLE2].ID2 = [Table3].ID
AND [XTABLE2].ID3 = 8/*22*/ )
ON [Table1].ID = [XTABLE1].ID2
AND [XTABLE1].ID3 = 19
/*26 */ LEFT OUTER JOIN
([XTABLE] AS [XTABLE3]
... and tens of similar INNER JOIN blocks
WHERE (/*13*/ [XTABLE0].ID IS NULL)
OR (/*25*/ [XTABLE1].ID IS NULL
AND /*27b*/ [XTABLE3].ID IS NULL
AND /*27b*/ [XTABLE5].ID IS NULL
... and tens of similar lines
AND /*27b*/ [XTABLE131].ID IS NULL);

You are OUTER JOINing the queries, so once you filter in the WHERE clause on columns that come from the outer-joined table expressions (derived tables in this case), the join will more than likely be treated as an INNER JOIN whenever that predicate rejects NULLs. You can see this by checking the query plan.
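A minimal sketch of that effect, using Python's sqlite3 module as a stand-in for the real database. The parent and child tables and their data are hypothetical and not taken from the question. It compares a NULL-rejecting WHERE predicate with an IS NULL test on the outer-joined side:

```python
import sqlite3

# Hypothetical two-table schema; SQLite stands in for the real engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child  (id INTEGER PRIMARY KEY, parent_id INTEGER, kind INTEGER);
    INSERT INTO parent (id) VALUES (1), (2), (3);
    INSERT INTO child (id, parent_id, kind) VALUES (10, 1, 52), (11, 2, 19);
""")

def ids(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]

left_join = "SELECT p.id FROM parent p LEFT JOIN child c ON c.parent_id = p.id "

# No filter: every parent survives the LEFT JOIN.
all_rows = ids(left_join + "ORDER BY p.id")

# NULL-rejecting predicate on the outer-joined side: same rows as an INNER JOIN.
rejecting = ids(left_join + "WHERE c.kind = 52 ORDER BY p.id")
inner = ids(
    "SELECT p.id FROM parent p INNER JOIN child c ON c.parent_id = p.id "
    "WHERE c.kind = 52 ORDER BY p.id"
)

# IS NULL does not reject NULLs: it keeps exactly the unmatched parents.
anti = ids(left_join + "WHERE c.id IS NULL ORDER BY p.id")

print(all_rows, rejecting, inner, anti)
```

Note that the IS NULL tests in the question's WHERE clause are of the second kind, so they do not by themselves collapse the outer joins.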

Related

LEFT OUTER JOIN with isnull

This is a pretty straightforward LEFT OUTER JOIN, but I get zero returned rows:
SELECT All_varieties.*,variety_join.*
FROM All_varieties LEFT JOIN
variety_join
ON All_varieties.varietyID = variety_join.varietyID
WHERE (variety_join.sourceID = Null);
However, All_varieties is itself a query:
SELECT plant.plantName, plant.plantGenus, plant.plantSpecies, variety.varietyLatin, variety.varietyEnglish, variety.varietyID
FROM plant LEFT JOIN
variety
ON plant.plantID = variety.plantID;
Is the problem maybe with the query All_varieties?
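It should be variety_join.sourceID IS NULL instead of variety_join.sourceID = Null. A comparison with NULL using = never evaluates to true, so the original WHERE clause filters out every row: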
SELECT All_varieties.*,variety_join.*
FROM All_varieties LEFT JOIN
variety_join
ON All_varieties.varietyID = variety_join.varietyID
WHERE variety_join.sourceID is Null
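The difference between = NULL and IS NULL can be checked with a small sketch. It uses Python's sqlite3 module rather than the asker's database, and the table contents are made up for illustration:

```python
import sqlite3

# Hypothetical data; All_varieties is modelled as a plain table here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_varieties (varietyID INTEGER PRIMARY KEY, varietyEnglish TEXT);
    CREATE TABLE variety_join  (varietyID INTEGER, sourceID INTEGER);
    INSERT INTO all_varieties VALUES (1, 'Gala'), (2, 'Fuji'), (3, 'Braeburn');
    INSERT INTO variety_join  VALUES (1, 100);
""")

base = (
    "SELECT a.varietyID FROM all_varieties a "
    "LEFT JOIN variety_join j ON a.varietyID = j.varietyID "
)

# "= NULL" evaluates to unknown for every row, so nothing passes the filter.
equals_null = conn.execute(base + "WHERE j.sourceID = NULL").fetchall()

# "IS NULL" is the test that actually matches the unmatched rows.
is_null = conn.execute(base + "WHERE j.sourceID IS NULL ORDER BY a.varietyID").fetchall()

print(equals_null, is_null)
```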

Simplifying where clause when a column is mutual for the tables at from clause

I would like to learn if there is any more efficient way to write the query below:
SELECT *
FROM requests srp
INNER JOIN surgeons rpsur
ON rpsur.id = srp.surgeon_id
LEFT OUTER JOIN #usersurgeons usersurgeons
ON usersurgeons.surgeon_id = srp.surgeon_id
LEFT OUTER JOIN surgeons LOsurgeons
ON usersurgeons.surgeon_id = LOsurgeons.id
LEFT OUTER JOIN provsurgeons LOprovsurgeons
ON LOprovsurgeons.id = LOsurgeons.provsurgeon_id
INNER JOIN #selectedsurgeons up
ON up.surgeon_id = rpsur.id
LEFT OUTER JOIN provsurgeons ps
ON ps.id = rpsur.provsurgeon_id
WHERE rpsur.isprimary = 0
AND usersurgeons.isprimary = 0
AND LOsurgeons.isprimary = 0
AND LOprovsurgeons.isprimary = 0
AND up.isprimary = 0
AND ps.isprimary = 0
I am not happy with the WHERE clause here. Is there a more professional way to write this, other than adding the conditions to the join lines (such as ON xx.id = yy.id AND xx.isPrimary = 0)?
From this query alone there are not many things that can be said. You should consider adding some more context (how do you get data into those temporary tables and the structure of %surgeons tables):
1) SELECT * makes it almost impossible for a covering index to be used and also returns a lot of columns (requests.*, surgeons.*, provsurgeons.*, etc.) in your final result. Return only the columns that you need.
2) If isPrimary = 0 filtering is performed often in your queries (not just this one), you can consider creating a view that fetches data filtered by isPrimary = 0. E.g. vwSurgeons, vwProvsurgeons. Then, you can just JOIN directly to the view instead of the table.
3) [already mentioned in the comments] Any condition that excludes NULL values for the OUTER JOINed table will transform the OUTER into INNER.
Instead of joining all the tables and filtering in a WHERE clause at the end, join to derived tables that contain only the filtered records. This may improve performance, and it also keeps the LEFT OUTER JOINs behaving as outer joins.
SELECT *
FROM requests srp
INNER JOIN surgeons rpsur
ON rpsur.id = srp.surgeon_id
LEFT OUTER JOIN
(
SELECT *
FROM #usersurgeons
WHERE isprimary = 0
)usersurgeons
ON usersurgeons.surgeon_id = srp.surgeon_id
LEFT OUTER JOIN
(
SELECT *
FROM surgeons
WHERE isprimary = 0
)LOsurgeons
ON usersurgeons.surgeon_id = LOsurgeons.id
LEFT OUTER JOIN
(
SELECT *
FROM provsurgeons
WHERE isprimary = 0
)LOprovsurgeons
ON LOprovsurgeons.id = LOsurgeons.provsurgeon_id
INNER JOIN
(
SELECT *
FROM #selectedsurgeons
WHERE isprimary = 0
)up
ON up.surgeon_id = rpsur.id
LEFT OUTER JOIN
(
SELECT *
FROM provsurgeons
WHERE isprimary = 0
) ps
ON ps.id = rpsur.provsurgeon_id
WHERE rpsur.isprimary = 0
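The rewrite above is not row-for-row equivalent to the original query, which is worth checking before adopting it. A minimal sketch, assuming a cut-down two-table version of the schema with made-up data and using Python's sqlite3 module as a stand-in engine:

```python
import sqlite3

# Hypothetical, simplified versions of requests and #usersurgeons.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests     (id INTEGER PRIMARY KEY, surgeon_id INTEGER);
    CREATE TABLE usersurgeons (surgeon_id INTEGER, isprimary INTEGER);
    INSERT INTO requests     VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO usersurgeons VALUES (1, 0), (2, 1);
""")

def rows(sql):
    return conn.execute(sql).fetchall()

# Filter in WHERE: unmatched and non-matching requests disappear.
where_filter = rows("""
    SELECT r.id, u.surgeon_id
    FROM requests r
    LEFT OUTER JOIN usersurgeons u ON u.surgeon_id = r.surgeon_id
    WHERE u.isprimary = 0
    ORDER BY r.id
""")

# Filter inside a derived table: every request is kept.
derived = rows("""
    SELECT r.id, u.surgeon_id
    FROM requests r
    LEFT OUTER JOIN (SELECT * FROM usersurgeons WHERE isprimary = 0) u
        ON u.surgeon_id = r.surgeon_id
    ORDER BY r.id
""")

# Filter in the ON clause: same rows as the derived-table form.
on_filter = rows("""
    SELECT r.id, u.surgeon_id
    FROM requests r
    LEFT OUTER JOIN usersurgeons u
        ON u.surgeon_id = r.surgeon_id AND u.isprimary = 0
    ORDER BY r.id
""")

print(where_filter, derived, on_filter)
```

In this sketch the derived-table and ON-clause forms return the same rows, so choosing between them is mostly a matter of readability; which result set is the intended one depends on what the original query was meant to return.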

LEFT JOIN ON COALESCE(a, b, c) - very strange behavior

I have encountered very strange behavior in my query, and I have wasted a lot of time trying to understand what causes it, in vain. So I am asking for your help.
SELECT count(*) FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = COALESCE(info_table.fk_key_table, front_table.fk_key_table_1, front_table.fk_key_table_2)
LEFT JOIN side_table ON side_table.fk_front_table = front_table.pk
WHERE side_table.pk = (SELECT MAX(pk) FROM side_table WHERE fk_front_table = front_table.pk)
OR side_table.pk IS NULL
It seems like a simple join query with COALESCE. I've used this technique before (not too many times) and it worked correctly.
In this query I don't ever get nulls for side_table.pk. If I remove coalesce or just don't use key_table, then the query returns rows with many null side_table.pk, but if I add coalesce join, I can't get those nulls.
It seems key_table and side_table don't have anything in common, but the result is so weird.
Also, when I don't use side_table and WHERE clause, the count(*) result with coalesce and without differs, but I can't see any pattern in rows missing, it seems random!
Real query:
SELECT EXCHANGE.EXC_AUTO_KEY, STOCK_RESERVATIONS.STR_AUTO_KEY FROM EXCHANGE
LEFT JOIN WO_BOM ON WO_BOM.WOB_AUTO_KEY = EXCHANGE.WOB_AUTO_KEY
LEFT JOIN VIEW_WO_SUB ON VIEW_WO_SUB.WOO_AUTO_KEY = WO_BOM.WOO_AUTO_KEY
LEFT JOIN STOCK stock3 ON stock3.STM_AUTO_KEY = EXCHANGE.STM_AUTO_KEY
LEFT JOIN STOCK stock2 ON stock2.STM_AUTO_KEY = EXCHANGE.ORIG_STM
LEFT JOIN CONSIGNMENT_CODES con2 ON con2.CNC_AUTO_KEY = stock2.CNC_AUTO_KEY
LEFT JOIN CONSIGNMENT_CODES con3 ON con3.CNC_AUTO_KEY = stock3.CNC_AUTO_KEY
LEFT JOIN CI_UTL ON CI_UTL.CUT_AUTO_KEY = EXCHANGE.CUT_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc2 ON pcc2.PCC_AUTO_KEY = stock2.PCC_AUTO_KEY
LEFT JOIN PART_CONDITION_CODES pcc3 ON pcc3.PCC_AUTO_KEY = stock3.PCC_AUTO_KEY
LEFT JOIN STOCK_RESERVATIONS ON STOCK_RESERVATIONS.STM_AUTO_KEY = stock3.STM_AUTO_KEY
LEFT JOIN WAREHOUSE wh2 ON wh2.WHS_AUTO_KEY = stock2.WHS_ORIGINAL
LEFT JOIN SM_HISTORY ON (SM_HISTORY.STM_AUTO_KEY = EXCHANGE.ORIG_STM AND SM_HISTORY.WOB_REF = EXCHANGE.WOB_AUTO_KEY)
LEFT JOIN RC_DETAIL ON stock3.RCD_AUTO_KEY = RC_DETAIL.RCD_AUTO_KEY
LEFT JOIN RC_HEADER ON RC_HEADER.RCH_AUTO_KEY = RC_DETAIL.RCH_AUTO_KEY
LEFT JOIN WAREHOUSE wh3 ON wh3.WHS_AUTO_KEY = COALESCE(RC_DETAIL.WHS_AUTO_KEY, stock3.WHS_ORIGINAL, stock3.WHS_AUTO_KEY)
WHERE STOCK_RESERVATIONS.STR_AUTO_KEY = (SELECT MAX(STR_AUTO_KEY) FROM STOCK_RESERVATIONS WHERE STM_AUTO_KEY = stock3.STM_AUTO_KEY)
OR STOCK_RESERVATIONS.STR_AUTO_KEY IS NULL
Removing LEFT JOIN WAREHOUSE wh3 gives me about unique EXC_AUTO_KEY values with a lot of NULL STR_AUTO_KEY, while leaving this row removes all NULL STR_AUTO_KEY.
I recreated simple tables with numbers with the same structure and query works without any problems o.0
I have a feeling COALESCE is acting as a REQUIRED flag for the joined table, hence shooting the LEFT JOIN to become an INNER JOIN.
Try this:
SELECT COUNT(*)
FROM main_table
LEFT JOIN front_table ON front_table.pk = main_table.fk_front_table
LEFT JOIN info_table ON info_table.pk = front_table.fk_info_table
LEFT JOIN key_table ON key_table.pk = NVL(info_table.fk_key_table, NVL(front_table.fk_key_table_1, front_table.fk_key_table_2))
LEFT JOIN (SELECT fk_, MAX(pk) as pk FROM side_table GROUP BY fk_) st ON st.fk_ = front_table.pk
NVL might behave just the same though...
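The derived-table idea in the suggestion above (pick the MAX(pk) per group first, then LEFT JOIN to that) can be sketched as follows. This uses Python's sqlite3 module with made-up data, and it assumes the truncated fk_ column in the suggestion refers to fk_front_table, which the answer does not confirm:

```python
import sqlite3

# Hypothetical cut-down front_table / side_table with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE front_table (pk INTEGER PRIMARY KEY);
    CREATE TABLE side_table  (pk INTEGER PRIMARY KEY, fk_front_table INTEGER);
    INSERT INTO front_table VALUES (1), (2), (3);
    INSERT INTO side_table  VALUES (10, 1), (11, 1), (12, 2);
""")

# Original shape: LEFT JOIN, then keep the max row or the unmatched row in WHERE.
where_form = conn.execute("""
    SELECT f.pk, s.pk
    FROM front_table f
    LEFT JOIN side_table s ON s.fk_front_table = f.pk
    WHERE s.pk = (SELECT MAX(pk) FROM side_table WHERE fk_front_table = f.pk)
       OR s.pk IS NULL
    ORDER BY f.pk
""").fetchall()

# Suggested shape: aggregate first, so the join needs no WHERE clause at all.
derived_form = conn.execute("""
    SELECT f.pk, st.pk
    FROM front_table f
    LEFT JOIN (
        SELECT fk_front_table, MAX(pk) AS pk
        FROM side_table
        GROUP BY fk_front_table
    ) st ON st.fk_front_table = f.pk
    ORDER BY f.pk
""").fetchall()

print(where_form, derived_form)
```

On this toy data both shapes return the same rows. The derived-table form avoids the OR and the correlated subquery in the WHERE clause, which is the part the original query was struggling with.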
I understood what the problem was (though not entirely): there is a LEFT JOIN VIEW_WO_SUB in the original query, on the 3rd line. It causes the query to act in a weird way.
When I replaced the view with another table that contained the information I needed, the query started returning the right results.
Basically, with this view join in place, a join on NVL, COALESCE or CASE with certain combinations of arguments did not work together with the OR clause and the subquery in the WHERE clause; everything else was fine. I could still get the query to work with the view join by changing the order of the joined tables: I had to move the table that participates in the WHERE subquery to the bottom.

How to improve the performance of a SQL query even after adding indexes?

I am trying to execute the following SQL query, but it takes 22 seconds to run. It returns 554192 rows. I need to make it faster and have already put indexes on all the tables involved.
SELECT mc.name AS MediaName,
lcc.name AS Country,
i.overridedate AS Date,
oi.rating,
bl1.firstname + ' ' + bl1.surname AS Byline,
b.id BatchNo,
i.numinbatch ItemNumberInBatch,
bah.changedatutc AS BatchDate,
pri.code AS IssueNo,
pri.name AS Issue,
lm.neptunemessageid AS MessageNo,
lmt.name AS MessageType,
bl2.firstname + ' ' + bl2.surname AS SourceFullName,
lst.name AS SourceTypeDesc
FROM profiles P
INNER JOIN profileresults PR
ON P.id = PR.profileid
INNER JOIN items i
ON PR.itemid = I.id
INNER JOIN batches b
ON b.id = i.batchid
INNER JOIN itemorganisations oi
ON i.id = oi.itemid
INNER JOIN lookup_mediachannels mc
ON i.mediachannelid = mc.id
LEFT OUTER JOIN lookup_cities lc
ON lc.id = mc.cityid
LEFT OUTER JOIN lookup_countries lcc
ON lcc.id = mc.countryid
LEFT OUTER JOIN itembylines ib
ON ib.itemid = i.id
LEFT OUTER JOIN bylines bl1
ON bl1.id = ib.bylineid
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
INNER JOIN itemorganisationissues ioi
ON ioi.itemorganisationid = oi.id
INNER JOIN projectissues pri
ON pri.id = ioi.issueid
LEFT OUTER JOIN itemorganisationmessages iom
ON iom.itemorganisationid = oi.id
LEFT OUTER JOIN lookup_messages lm
ON iom.messageid = lm.id
LEFT OUTER JOIN lookup_messagetypes lmt
ON lmt.id = lm.messagetypeid
LEFT OUTER JOIN itemorganisationsources ios
ON ios.itemorganisationid = oi.id
LEFT OUTER JOIN bylines bl2
ON bl2.id = ios.bylineid
LEFT OUTER JOIN lookup_sourcetypes lst
ON lst.id = ios.sourcetypeid
WHERE p.id = @profileID
AND b.statusid IN ( 6, 7 )
AND bah.batchactionid = 6
AND i.statusid = 2
AND i.isrelevant = 1
When looking at the execution plan, I can see a step that accounts for 42% of the cost. Is there any way I could lower that, or otherwise improve the performance of the whole query?
Remove the profiles table as it is not needed and change the WHERE clause to
WHERE PR.profileid = @profileID
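A small sketch of why the profiles join can be dropped, using Python's sqlite3 module and made-up data. It assumes every profileresults.profileid refers to an existing profiles row (for example via a foreign key), which the question does not state:

```python
import sqlite3

# Hypothetical cut-down profiles / profileresults tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profiles       (id INTEGER PRIMARY KEY);
    CREATE TABLE profileresults (profileid INTEGER REFERENCES profiles(id),
                                 itemid INTEGER);
    INSERT INTO profiles       VALUES (1), (2);
    INSERT INTO profileresults VALUES (1, 100), (1, 101), (2, 200);
""")

profile_id = 1

# Original shape: join to profiles only to filter on its key.
with_join = conn.execute("""
    SELECT pr.itemid
    FROM profiles p
    INNER JOIN profileresults pr ON p.id = pr.profileid
    WHERE p.id = ?
    ORDER BY pr.itemid
""", (profile_id,)).fetchall()

# Simplified shape: filter the referencing column directly.
without_join = conn.execute("""
    SELECT pr.itemid
    FROM profileresults pr
    WHERE pr.profileid = ?
    ORDER BY pr.itemid
""", (profile_id,)).fetchall()

print(with_join, without_join)
```

If orphaned profileresults rows can exist, the two shapes may differ, so the referential-integrity assumption is worth verifying first.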
You have a left outer join on the batchactionhistory table but also have a condition in your WHERE clause which turns it back into an inner join. Change your code to this:
LEFT OUTER JOIN batchactionhistory bah
ON b.id = bah.batchid
AND bah.batchactionid = 6
You don't need the batches table: it is only used to join other tables that could be joined directly, and to show the id in your SELECT, which is also available in other tables. Make the following changes:
i.batchid AS BatchNo,
LEFT OUTER JOIN batchactionhistory bah
ON i.batchid = bah.batchid
Are any of the fields used in joins or in the WHERE clause from tables that contain large amounts of data but are not indexed? If so, try adding indexes one at a time, starting with the largest table.
Do you need every field in the result? If you could lose one or two, you could perhaps reduce the number of tables further.
First, if this is not a stored procedure, make it one. That's a lot of text for SQL Server to compile.
Next, my experience is that "worst practices" are occasionally a good idea. Specifically, I have been able to improve performance by splitting large queries into a couple or three small ones and assembling the results.
If this query is associated with a .net, coldfusion, java, etc application, you might be able to do the split/re-assemble in your application code. If not, a temporary table might come in handy.
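The split-and-reassemble approach could look roughly like this in application code. This is only a sketch in Python with the sqlite3 module; the two-table schema and its rows are invented, and a real version would use the application's own data access layer:

```python
import sqlite3

# Hypothetical cut-down items / itembylines tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE itembylines (itemid INTEGER, byline TEXT);
    INSERT INTO items       VALUES (1, 'Item A'), (2, 'Item B');
    INSERT INTO itembylines VALUES (1, 'Ann'), (1, 'Bob');
""")

# Query 1: the core rows, keyed by id.
assembled = {
    item_id: {"name": name, "bylines": []}
    for item_id, name in conn.execute("SELECT id, name FROM items ORDER BY id")
}

# Query 2: the optional detail rows, restricted to the ids found above.
placeholders = ",".join("?" for _ in assembled)
detail_sql = (
    "SELECT itemid, byline FROM itembylines "
    f"WHERE itemid IN ({placeholders}) ORDER BY itemid, byline"
)
for item_id, byline in conn.execute(detail_sql, list(assembled)):
    assembled[item_id]["bylines"].append(byline)

print(assembled)
```

Each small query stays simple for the optimiser, at the cost of extra round trips and assembly logic in the application, so it is worth measuring both variants.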

How order of joins affect performance of a query

I'm experiencing big differences in the execution time of my query, and it seems that the order in which the joins (inner and left outer) occur in the query makes all the difference.
Are there some "ground rules" in what order joins should be in?
Both of them are part of a bigger query.
The difference between them is that the left join is placed last in the faster query.
Slow query: (> 10 minutes)
SELECT [t0].[Ref], [t1].[Key], [t1].[Name],
(CASE
WHEN [t3].[test] IS NULL THEN CONVERT(NVarChar(250),@p0)
ELSE CONVERT(NVarChar(250),[t3].[Key])
END) AS [value],
(CASE
WHEN 0 = 1 THEN CONVERT(NVarChar(250),@p1)
ELSE CONVERT(NVarChar(250),[t4].[Key])
END) AS [value2]
FROM [dbo].[tblA] AS [t0]
INNER JOIN [dbo].[tblB] AS [t1] ON [t0].[RefB] = [t1].[Ref]
LEFT OUTER JOIN (
SELECT 1 AS [test], [t2].[Ref], [t2].[Key]
FROM [dbo].[tblC] AS [t2]
) AS [t3] ON [t0].[RefC] = ([t3].[Ref])
INNER JOIN [dbo].[tblD] AS [t4] ON [t0].[RefD] = ([t4].[Ref])
Faster query: (~ 30 seconds)
SELECT [t0].[Ref], [t1].[Key], [t1].[Name],
(CASE
WHEN [t3].[test] IS NULL THEN CONVERT(NVarChar(250),@p0)
ELSE CONVERT(NVarChar(250),[t3].[Key])
END) AS [value],
(CASE
WHEN 0 = 1 THEN CONVERT(NVarChar(250),@p1)
ELSE CONVERT(NVarChar(250),[t4].[Key])
END) AS [value2]
FROM [dbo].[tblA] AS [t0]
INNER JOIN [dbo].[tblB] AS [t1] ON [t0].[RefB] = [t1].[Ref]
INNER JOIN [dbo].[tblD] AS [t4] ON [t0].[RefD] = ([t4].[Ref])
LEFT OUTER JOIN (
SELECT 1 AS [test], [t2].[Ref], [t2].[Key]
FROM [dbo].[tblC] AS [t2]
) AS [t3] ON [t0].[RefC] = ([t3].[Ref])
Generally INNER JOIN order won't matter because inner joins are commutative and associative. In both cases, you still have t0 inner join t4, so it should make no difference.
Rephrasing that, SQL is declarative: you say "what you want", not "how". The optimiser works out the "how" and will reorder JOINs as needed, looking at WHEREs etc. too in practice.
In complex queries, a cost-based query optimiser won't exhaust all permutations, so the order could matter occasionally.
So, I'd check for these:
You said these are part of a bigger query, so this section matters less because the whole query matters.
Complexity can be hidden using views too if any of the tables are actually views
Is this repeatable, no matter what order code runs in?
What are the query plan differences?
See some other SO questions:
how to best organize the Inner Joins in (select) statement
SQL Server 2005 - Order of Inner Joins
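The point that inner join order changes the plan at most, never the result, can be checked with a small sketch. It uses Python's sqlite3 module, so the plans come from SQLite's optimiser rather than SQL Server's, and the tables are simplified, hypothetical versions of tblA, tblB and tblD:

```python
import sqlite3

# Hypothetical cut-down tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (Ref INTEGER PRIMARY KEY, RefB INTEGER, RefD INTEGER);
    CREATE TABLE tblB (Ref INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tblD (Ref INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO tblA VALUES (1, 10, 20), (2, 11, 20), (3, 12, 21);
    INSERT INTO tblB VALUES (10, 'b10'), (11, 'b11');
    INSERT INTO tblD VALUES (20, 'd20'), (21, 'd21');
""")

b_then_d = """
    SELECT t0.Ref, t1.Name, t4.Name
    FROM tblA AS t0
    INNER JOIN tblB AS t1 ON t0.RefB = t1.Ref
    INNER JOIN tblD AS t4 ON t0.RefD = t4.Ref
"""
d_then_b = """
    SELECT t0.Ref, t1.Name, t4.Name
    FROM tblA AS t0
    INNER JOIN tblD AS t4 ON t0.RefD = t4.Ref
    INNER JOIN tblB AS t1 ON t0.RefB = t1.Ref
"""

# Same logical query, so the row sets must match regardless of written order.
rows_b_then_d = sorted(conn.execute(b_then_d))
rows_d_then_b = sorted(conn.execute(d_then_b))

def plan(sql):
    """Return the human-readable steps of SQLite's chosen plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_b_then_d = plan(b_then_d)
plan_d_then_b = plan(d_then_b)

print(rows_b_then_d == rows_d_then_b)
print(plan_b_then_d)
print(plan_d_then_b)
```

Comparing the two printed plans is the SQLite analogue of comparing the execution plans of the slow and fast queries, which is the most direct way to see what the optimiser actually did differently.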
If you have more than 2 tables, the order of the table joins is important. It can make a big difference. The first table should get a leading hint. The first table should be the object with the most selective rows. For example: if you have a member table with 1.000.000 people, you only want to select the female members, and it is the first table, you only join 500.000 records to the next table. If this table is at the end of the join order (maybe table 4, 5 or 6), then each record (worst case 1.000.000) will be joined. This applies to both inner and outer joins.
The Rule: Start with most selective table, then join next logical most selective table.
Converting functions and beautifying should be done last. Sometimes it is better to wrap the whole SQL in brackets and apply expressions and functions in the outer select statement.
In the case of a left join, the order affects performance a lot. I was having a problem with a select query that looked like this:
select distinct count(p0_.id) over () as col_0_0_,
p0_.id as col_1_0_,
p0_.lp as col_2_0_,
0
as col_3_0_,
max(coalesce(i6_.cft, i7_.rfo,
'')) as col_4_0_,
p0_.pdv as col_5_0_,
(s8_.qer)
as col_6_0_,
cf1_.ests as col_7_0_
from Produit p0_
left outer join CF cf1_ on p0_.fk_cf = cf1_.id
left outer join CA c2_ on cf1_.fk_ca = c2_.id
left outer join ml mt on c2_.fk_m = mt.id
left outer join sk s8_ on p0_.id = s8_.fk_p
left outer join rf r5_ on
rp4_.fk_r = r5_.id
left outer join
in i6_ on r5_.fk_ireftc = i6_.id
left outer join r_p_r rp4_ on p0_.id = rp4_.fk_p
left outer join
ir i7_ on r5_.fk_if = i7_.id
left outer join re_p_g gc9_ on p0_.id = gc9_.fk_p
left outer join gc g10_ on gc9_.fk_g = g10_.id
where
and (p0_.lC is null or p0_.lS = 'E')
and g10_.id is null or g10_.id
and r5_.fk_i is null
group by col_1_0_, col_2_0_, col_3_0_, col_5_0_, col_6_0_, col_7_0_
order by col_2_0_ asc, p0_.id
limit 10;
This query takes 13 to 15 seconds to execute; when I change the order, it takes 1 to 2 seconds:
select distinct count(p0_.id) over () as col_0_0_,
p0_.id as col_1_0_,
p0_.lp as col_2_0_,
0
as col_3_0_,
max(coalesce(i6_.cft, i7_.rfo,
'')) as col_4_0_,
p0_.pdv as col_5_0_,
(s8_.qer)
as col_6_0_,
cf1_.ests as col_7_0_
from Produit p0_
left outer join CF cf1_ on p0_.fk_cf = cf1_.id
left outer join sk s8_ on p0_.id = s8_.fk_p
left outer join r_p_r rp4_ on p0_.id = rp4_.fk_p
left outer join re_p_g gc9_ on p0_.id = gc9_.fk_p
left outer join CA c2_ on cf1_.fk_ca = c2_.id
left outer join ml mt on c2_.fk_m = mt.id
left outer join rf r5_ on
rp4_.fk_r = r5_.id
left outer join
in i6_ on r5_.fk_ireftc = i6_.id
left outer join
ir i7_ on r5_.fk_if = i7_.id
left outer join gc g10_ on gc9_.fk_g = g10_.id
where
and (p0_.lC is null or p0_.lS = 'E')
and(g10_.id is null
and r5_.fk_i is null
group by col_1_0_, col_2_0_, col_3_0_, col_5_0_, col_6_0_, col_7_0_
order by col_2_0_ asc, p0_.id
limit 10;
In my case, I changed the order so that once a table is loaded, all the joins that use that table follow immediately, rather than appearing in a later block. For example, for my p0_ table I put all its left joins in the first 4 lines, unlike in the first query.
PS: to test performance in PostgreSQL I use this website: http://tatiyants.com/pev/#/plans/new
At least in SQLite, I found out that it makes a huge difference. Actually it didn't need to be a very complex query for the difference to show itself. My JOIN statements were inside an embedded clause however.
Basically, you should start with the most specific limitations first, as Christian has pointed out.