How can I improve the performance of a query that runs for 10 minutes? - sql

I have a query that takes approximately 10 minutes to execute and produce results. When I break it into parts and run them separately, each part finishes within seconds.
I tried modifying the subselects in the top and bottom portions of the query to determine whether one of them was causing the issue, but neither was; each returned results within 3 seconds.
I am trying to learn to read the Estimated Execution Plan, but it is confusing and I am finding it hard to trace the issue.
Can anyone please point out any mistakes I have made that are making the query take so long?
Select Distinct
PostExtended.BatchNum,
post.ControlNumStatus,
post.AccountSeg,
Post.PostDat
From
Post
Post Records
join (Select Post, MAX(Dist) as Dist, COUNT(fkglDist) as RecordCount From PostExtend WITH (NOLOCK) Group By flPost) as PostExtender on Post.PK = PostExtender.flPost
join glPostExtended WITH (NOLOCK) on glPostExtendedLimiter.Post = glPostExtended.Post and (PostExtendedLimiter.fkglDist = PostExtend.Dist or PostExtend.Dist is null)
join (select lP.fkosControlNumberStatus, lP.SourceJENumber, AccountSegment,
sum(case
............
from Post WITH (NOLOCK)
join AccountingPeriod WITH (NOLOCK) on AccountingPeriod.pk = lP.fkglAccountingPeriod
join FiscalYear WITH (NOLOCK) on FiscalYear.pk = AccountingPeriod.FiscalYear
join Account WITH (NOLOCK) on Account.pk = FiscalYear.Account
where FiscalYear.Period = #Date
and glP.fkMLSosCodeEntryType = 2202
group by glP.fkosControlNumberStatus, glP.SourceNumber, AccountSeg) post on post.ControlNumStatus = Post.fkControlNumberStatus and postdata.SourceJENumber = glPost.SourceJENumber
where post.AmountT <> 0)......
Group by

Subqueries are very often the source of problems.
I would try to:
separate the postdata subquery from the main query,
save the result in a temporary table or even in a table variable,
put a clustered index on the fkosControlNumberStatus and SourceJENumber fields,
join this temporary table back to the main query.
Sometimes the result of these simple steps is a pleasant surprise.
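A rough sketch of the idea (the table and column names are copied from the obfuscated query in the question, so they are only placeholders, and the SUM is a stand-in for the original SUM(CASE ...) expressions, which are elided):

SELECT lP.fkosControlNumberStatus,
       lP.SourceJENumber,
       lP.AccountSegment,
       SUM(lP.Amount) AS AmountT        -- placeholder: keep the question's SUM(CASE ...) expressions here
INTO   #post                            -- materialize the aggregate once
FROM   Post lP
JOIN   AccountingPeriod ON AccountingPeriod.pk = lP.fkglAccountingPeriod
JOIN   FiscalYear       ON FiscalYear.pk = AccountingPeriod.FiscalYear
JOIN   Account          ON Account.pk = FiscalYear.Account
WHERE  FiscalYear.Period = @Date        -- stands in for the #Date placeholder in the question
  AND  lP.fkMLSosCodeEntryType = 2202
GROUP BY lP.fkosControlNumberStatus, lP.SourceJENumber, lP.AccountSegment;

CREATE CLUSTERED INDEX IX_post ON #post (fkosControlNumberStatus, SourceJENumber);

-- ...then join #post into the main query in place of the "post" subselect.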

This is a fairly complex query. You are joining on Aggregate Queries (with GROUP BY).
The first thing I would do is see how long each of the joined queries takes to run on its own. One may run very fast while another takes very long, so you may not really need to optimize the entire query--just one of the joined queries.
Another way is to start eliminating joins one by one, running the entire query each time to see how fast it goes. When you see a really significant decrease in time, you've found the culprit.
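One way to time each piece in isolation (a sketch; the derived-table query below is copied from the question's obfuscated SQL, so the names are only placeholders):

SET STATISTICS TIME ON;   -- prints CPU and elapsed time per statement
SET STATISTICS IO ON;     -- prints logical reads per table

-- Run each derived table on its own, e.g. the aggregate over PostExtend:
SELECT flPost, MAX(Dist) AS Dist, COUNT(fkglDist) AS RecordCount
FROM PostExtend
GROUP BY flPost;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

Whichever piece shows the largest elapsed time or read count is the one worth rewriting first.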
Typically, one thing that can add a lot of CPU is comparisons, so the SUMs with CASE statements might be the biggest suspect.
Have you used the Database Engine Tuning Advisor? If all else fails, run that and see what it tells you.
So, maybe try this approach:
Take away the CASE Statements inside the SUM expressions on that last join.
Remove the last JOIN with all the sums.
Remove the first join with that GROUP BY and the MAX expression.
That would be my strategy.

Related

MariaDB SQL sentence: force order of tables for optimizer

I have a rather complex query that takes some 5 seconds to run.
EDITED: complete code for better understanding:
SELECT
AGEID, AGECLIID, CLINombre, CLIEstado,AGEOnline, AGEOnlineCancelacion,
IF(CANID>0,'Y','N') AS AGECancelado, ACTIOrden, ACTIColor, ACTINombre,
ACTICapacidad, 'Y' AS slotactivo
FROM
( agenda JOIN cliente on AGECLIID = CLIID AND CLICentro = 'Madrid'
JOIN actividad on ACTIID = AGEACTIID
LEFT JOIN agcambio on CAMAGEID = AGEID
LEFT JOIN agcancelacion on CANAGEID = AGEID
) RIGHT JOIN horario on DIAOrden = COALESCE( CAMDia, AGEDia )
ORDER BY HORHora, MINMinuto, DIAOrden
NOTE: Originally, it was "SELECT FROM horario LEFT JOIN all_the_rest of the query".
This produces the following explain plan:
NOTE: I tried to upload the picture here, but I cannot; it only creates a link to it:
[image: explain plan with right join]
It takes like 5 seconds to execute.
The key here is that the part of the query that retrieves the busy slots, executed alone, takes almost no time. The timeslots (horario) table has only 336 rows (all possible time slots), so the RIGHT JOIN should not take that long.
The explain plan says that first it does an ALL access to timeslots, and after that all the other tables are accessed via INDEX.
So I wanted to change the table access order, forcing timeslots to be accessed last. I changed the order in the FROM clause, turning the JOIN into a RIGHT JOIN, but the explain plan says the same (and the query still takes 5 seconds).
Testing things, I tried a STRAIGHT_JOIN to force the table access order, and it works: it took less than a second to return the rows (probably because it is an inner join, but it is still significantly faster). However, I cannot use STRAIGHT_JOIN because I need the RIGHT JOIN, and I've read that there is no way to do a STRAIGHT_JOIN as an OUTER JOIN.
The explain plan of the execution with straight join is the following:
[image: explain plan with straight_join]
I have tried to create a view but the result was the same.
So,
a) Can I somehow force the optimizer to access the timeslots table last?
b) More importantly, why is it taking so long if there are only 336 rows to RIGHT JOIN with (150 rows of busy slots)? Maybe the answer would help me rewrite the query.
Many thanks in advance for any hint... Meanwhile I'm rewriting the query again and again.... :)
Xavi.
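One rewrite that may be worth experimenting with (a sketch only, not a tested answer): move all of the busy-slot joins into a derived table and LEFT JOIN horario onto it, which is equivalent to the original RIGHT JOIN. If the optimizer merges the derived table back into the outer query, disabling derived-table merging for the session forces it to be computed first, which is effectively the access order asked about in (a).

SET SESSION optimizer_switch = 'derived_merge=off';  -- assumption: materializing the derived table is what helps

SELECT b.AGEID, b.AGECLIID, b.CLINombre, b.CLIEstado, b.AGEOnline, b.AGEOnlineCancelacion,
       b.AGECancelado, b.ACTIOrden, b.ACTIColor, b.ACTINombre, b.ACTICapacidad,
       'Y' AS slotactivo
FROM horario
LEFT JOIN (
    SELECT AGEID, AGECLIID, CLINombre, CLIEstado, AGEOnline, AGEOnlineCancelacion,
           IF(CANID > 0, 'Y', 'N') AS AGECancelado,
           ACTIOrden, ACTIColor, ACTINombre, ACTICapacidad,
           COALESCE(CAMDia, AGEDia) AS slot_dia
    FROM agenda
    JOIN cliente   ON AGECLIID = CLIID AND CLICentro = 'Madrid'
    JOIN actividad ON ACTIID = AGEACTIID
    LEFT JOIN agcambio      ON CAMAGEID = AGEID
    LEFT JOIN agcancelacion ON CANAGEID = AGEID
) AS b ON b.slot_dia = DIAOrden
ORDER BY HORHora, MINMinuto, DIAOrden;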

Does Sql JOIN order affect performance?

I was just tidying up some sql when I came across this query:
SELECT
jm.IMEI ,
jm.MaxSpeedKM ,
jm.MaxAccel ,
jm.MaxDeccel ,
jm.JourneyMaxLeft ,
jm.JourneyMaxRight ,
jm.DistanceKM ,
jm.IdleTimeSeconds ,
jm.WebUserJourneyId ,
jm.lifetime_odo_metres ,
jm.[Descriptor]
FROM dbo.Reporting_WebUsers AS wu WITH (NOLOCK)
INNER JOIN dbo.Reporting_JourneyMaster90 AS jm WITH (NOLOCK) ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j WITH (NOLOCK) ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE ( wu.isActive = 1 )
AND ( j.JourneyDuration > 2 )
AND ( j.JourneyDuration < 1000 )
AND ( j.JourneyDistance > 0 )
My question is: does the order of the joins make any performance difference? For the above query, I would have started with
FROM dbo.Reporting_JourneyMaster90 AS jm
and then joined the other two tables to that one.
Join order in SQL Server 2008 R2 does unquestionably affect query performance, particularly in queries with a large number of table joins and WHERE clauses applied against multiple tables.
Although the join order is changed during optimisation, the optimiser doesn't try all possible join orders. It stops when it finds what it considers a workable solution, as the very act of optimisation uses precious resources.
We have seen queries that were performing like dogs (1 min+ execution time) come down to sub-second performance just by changing the order of the join expressions. Note, however, that these are queries with 12 to 20 joins and WHERE clauses on several of the tables.
The trick is to set your order to help the query optimiser figure out what makes sense. You can use FORCE ORDER, but that can be too rigid. Try to make sure that your join order starts with the tables whose WHERE clauses will reduce the data the most.
No, the JOIN order is changed during optimization.
The only caveat is OPTION (FORCE ORDER), which will force the joins to happen in the exact order you have specified.
I have a clear example of an inner join affecting performance. It is a simple join between two tables. One has 50+ million records, the other has 2,000. If I select from the smaller table and join the larger one, it takes 5+ minutes.
If I select from the larger table and join the smaller one, it takes 2 minutes 30 seconds.
This is with SQL Server 2012.
To me this is counter intuitive since I am using the largest dataset for the initial query.
Usually not. I'm not 100% sure this applies verbatim to SQL Server, but in Postgres the query planner reserves the right to reorder the inner joins as it sees fit. The exception is when you exceed a threshold beyond which it becomes too expensive to investigate changing their order.
JOIN order doesn't matter; the query engine will reorganize the order based on index statistics and other factors.
To test this, do the following:
select Show Actual Execution Plan and run the first query,
change the JOIN order and run the query again,
compare the execution plans.
They should be identical, as the query engine will reorganize them according to other factors.
As noted in another answer, you could use OPTION (FORCE ORDER) to enforce exactly the order you want, but it may not be the most efficient one.
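As a quick sketch of that comparison, using the query from the question (remaining SELECT columns omitted for brevity): the version below only changes which table the FROM clause starts with, and the hint on the last line pins that written order if you really want to.

SELECT jm.IMEI, jm.MaxSpeedKM, jm.DistanceKM            -- ...plus the other jm columns from the question
FROM dbo.Reporting_JourneyMaster90 AS jm
INNER JOIN dbo.Reporting_WebUsers AS wu ON wu.WebUsersId = jm.WebUsersId
INNER JOIN dbo.Reporting_Journeys AS j ON jm.WebUserJourneyId = j.WebUserJourneyId
WHERE wu.isActive = 1
  AND j.JourneyDuration > 2
  AND j.JourneyDuration < 1000
  AND j.JourneyDistance > 0
OPTION (FORCE ORDER);   -- optional: forces the joins to be evaluated in the order written

Run it with and without the OPTION (FORCE ORDER) line and compare the actual execution plans; without the hint they will normally be identical to the original ordering.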
As a general rule of thumb, JOIN order should go from the table with the fewest records to the table with the most records, as in some DBMS engines the order can make a difference, especially when FORCE ORDER is used.
Wrong. In SQL Server 2005 it definitely matters, since you are limiting the dataset from the beginning of the FROM clause. If you start with 2,000 records instead of 2 million, your query is faster.

ORDER BY on a column that allows NULL is slow. Why?

So really my question is WHY this worked.
Anyway, I had this query that does a few inner joins, has a WHERE clause, and does an ORDER BY on an nvarchar column. If I run the query WITHOUT the ORDER BY, it takes less than a second. If I run the query WITH the ORDER BY, it takes 12 seconds.
Then I had a great idea and changed all the INNER JOINs to LEFT JOINs, keeping the ORDER BY clause. That took less than a second. I remembered the difference between LEFT JOINs and INNER JOINs: INNER JOINs check for NULL and LEFT JOINs don't. So I went into the table design and unchecked "Allow Nulls". Now I run the query WITH the INNER JOINs and the ORDER BY clause and it takes less than a second. WHY?
From what I understand, the FROM, JOIN, WHERE and SELECT clauses run first and return a result set, and the ORDER BY clause runs at the very end on the resulting record set. Therefore the query should have taken at most a second or so, even with the column allowing nulls. So why does the query take less than a second WITHOUT the ORDER BY clause, but 12 seconds WITH it? That doesn't make sense to me.
Query below:
SELECT PlanInfo.PlanId, PlanName, COALESCE(tResponsible, '') AS tResponsible, Processor, CustName, TaskCategoryId, MapId, tEnd,
CASE MapId WHEN 9 THEN 1 ELSE 2 END AS sor
FROM PlanInfo INNER JOIN [orders].dbo.BaanOrders_Ext ON PlanInfo.PlanName = [orders].dbo.BaanOrders_Ext.OrderNo
INNER JOIN [orders].dbo.BaanOrders ON PlanInfo.PlanName = [orders].dbo.BaanOrders.OrderNo
INNER JOIN Tasks ON PlanInfo.PlanId = Tasks.PlanId
INNER JOIN EngSchedToTimingMap ON Tasks.CatId = EngSchedToTimingMap.TaskCategoryId
WHERE (MapId = 9 OR MapId = 11 or MapId = 13 or MapId = 15)
AND([orders].dbo.BaanOrders_Ext.Processor = 'metest' OR tResponsible = 'metest')
ORDER BY PlanInfo.PlanId
I would have to guess that it is due to having an index on PlanInfo.PlanId, the column you are sorting on.
SQL Server could stream the results by following that index and building the rest of the columns in that order. When the field is NULLable, it decides it cannot use the index for the sort (NULLs, which incidentally sort first, need special handling), so it optimizes along a different path.
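One way to test that guess (a sketch): capture the actual plan for the query with the column NULLable and again after marking it NOT NULL, and check whether a Sort operator disappears in favour of an ordered index scan.

SET STATISTICS XML ON;   -- each statement also returns its actual execution plan as XML
-- run the ORDER BY query here
SET STATISTICS XML OFF;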
Showing the execution plan always helps. Either paste images of the plans, or just show the text-mode plans, i.e. put this line above the query and then execute it:
SET SHOWPLAN_TEXT ON;
<the query>
When you use the ORDER BY clause, you force the database engine to sort the results. This takes time (especially if the result contains many rows), so it is entirely possible that a query that runs in 1 second without an ORDER BY clause takes 12 seconds with it. Note that sorting generally takes O(N*log(N)) time, where N is the number of rows.
The reason NULLs are generally slow is that they must be treated specially. Sorting with NULLs adds more complex comparison conditions and slows the sort down.
If your question is "Why does the ORDER BY clause cause my query to run longer?" the answer is because sorting the results is added to the query execution plan.
If you use the "Display Estimated Execution Plan" tool in SQL Server Management Studio, it will show you exactly what it thinks the SQL Server engine will do.

Aggregating two selects with a group by in SQL is really slow

I am currently working with a query in MSSQL that looks like:
SELECT
...
FROM
(SELECT
...
)T1
JOIN
(SELECT
...
)T2
GROUP BY
...
The inner selects are relatively fast, but the outer select aggregates the inner selects and takes an incredibly long time to execute, often timing out. Removing the group by makes it run somewhat faster and changing the join to a LEFT OUTER JOIN speeds things up a bit as well.
Why would doing a group by on a select which aggregates two inner selects cause the query to run so slow? Why does an INNER JOIN run slower than a LEFT OUTER JOIN? What can I do to troubleshoot this further?
EDIT: What makes this even more perplexing is that the two inner queries are date-limited, and the overall query only runs slowly when looking at date ranges between the start of July and any other day in July; if the date range is anything from before July 1 up to today, it runs fine.
Without more detail of your query it's impossible to offer any hints as to what might speed it up. A possible guess is that the two inner queries block the use of any indexes that might have been used to perform the join, resulting in large scans, but there are many other possible reasons.
To check where the time is used in the query check the execution plan, there is a detailed explanation here
http://www.sql-server-performance.com/tips/query_execution_plan_analysis_p1.aspx
The basic rundown is: run the query, display the execution plan, then look for any large percentages -- they are what is slowing your query down.
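If you prefer a text view of the same information, SET STATISTICS PROFILE returns the actual plan, operator by operator, as an extra result set next to the query's output:

SET STATISTICS PROFILE ON;   -- each statement also returns its actual plan as rows
-- run the slow aggregate query here
SET STATISTICS PROFILE OFF;

The Rows and Executes columns in that output usually make it obvious which operator is doing far more work than expected.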
Try rewriting your query without the nested SELECTs, which are rarely necessary. When using nested SELECTs - except for trivial cases - the inner SELECT resultsets are not indexed, which makes joining them to anything slow.
As Tetraneutron said, post details of your query -- we may help you rewrite it in a straight-through way.
Have you given a join predicate? I.e., JOIN tableB ON tableA.ColA = tableB.ColB. If you don't give a predicate, SQL may be forced to use nested loops, so if you have a lot of rows in that range it would explain the slowdown.
Have a look at the plan in SQL Server Management Studio if you have MS SQL Server to play with.
After your T2 statement, add a join condition on t1.joinfield = t2.joinfield.
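A sketch of the shape of that fix, with hypothetical Orders/OrderLines tables and a CustomerId key standing in for the real inner selects:

SELECT t1.CustomerId, SUM(t1.Total) AS Total, SUM(t2.Qty) AS Qty
FROM
    (SELECT CustomerId, SUM(Amount) AS Total
     FROM Orders
     GROUP BY CustomerId) AS t1
JOIN
    (SELECT CustomerId, SUM(Quantity) AS Qty
     FROM OrderLines
     GROUP BY CustomerId) AS t2
    ON t1.CustomerId = t2.CustomerId   -- the join predicate; without it the join is a Cartesian product
GROUP BY t1.CustomerId;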
The issue was with fragmented data. After the data was defragmented the query started running within reasonable time constraints.
A JOIN without a join predicate is a Cartesian product: every row from one side is combined with every row from the other. The inner queries against the separate tables run fine, but once they hit the join it becomes a Cartesian product and is much more expensive to manage. This happens at the outer SELECT statement.
Have a look at INNER JOINs with an explicit ON clause, as Tetraneutron recommended.