How to diagnose slow/inconsistent SQL Server query? - sql-server-2012

Running Windows Server 2012, Hyper-V, and a SQL Server 2012 active/passive failover cluster with two 8-processor, 60 GB nodes, a single instance, and 300 databases. This query produces inconsistent results, running anywhere between 10 and 30 seconds.
DECLARE @OrgID BIGINT = 780246
DECLARE @ActiveOnly BIT = 0
DECLARE @RestrictToOrgID BIT = 0;
WITH og (OrgID, GroupID) AS
(
SELECT ID, ID FROM Common.com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, og.GroupID FROM Common.com.Organizations o JOIN og ON og.OrgID = o.ParentID
)
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
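One pattern worth flagging in passing: the catch-all flag predicates (ActiveOnly / RestrictToOrgID) force the optimizer to build a single plan for every flag combination. A sketch of the usual workaround, OPTION (RECOMPILE), simplified here to just books.Entities:

```sql
DECLARE @OrgID BIGINT = 780246;
DECLARE @ActiveOnly BIT = 0;
DECLARE @RestrictToOrgID BIT = 0;

-- OPTION (RECOMPILE) lets the optimizer see the current values of the
-- flag variables and prune the unused branches of the catch-all
-- predicates instead of compiling one generic plan for all combinations.
SELECT e.*
FROM books.Entities e
WHERE (@ActiveOnly = 0 OR e.Active = 1)
  AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
OPTION (RECOMPILE);
```

This trades a recompile on every execution for a plan tailored to the actual flag values, which may or may not be a good trade at this call frequency.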
Replacing the LEFT JOIN Addresses with JOIN Addresses
JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
or reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
reduces the execution time to about 0.5 seconds.
In addition, using SELECT DISTINCT and joining books.Entities to Vendors
SELECT DISTINCT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN Vendors v
JOIN books.Entities e ON v.ID = e.ID
ON e.OrgID = bo.ID OR (e.OrgID = po.ID AND v.DistrictWide = 1)
reduces the time to about 0.75 seconds.
Summary
These conditions indicate there is some kind of resource limitation in the SQL Server instance that is causing these erratic results, and I don't know how to go about diagnosing it. If I copy the offending database to my laptop running SQL Server 2012, the problem does not present. I can continue to change the SQL around and hope for the best, but I would prefer to find a more definitive solution.
Any suggestions are appreciated.
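Before guessing at resource limits, it can help to measure where the time goes. A generic diagnostic sketch, not specific to this schema:

```sql
-- Per-query I/O and CPU/elapsed time around the suspect statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- ... run the slow query here ...
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Top waits on the instance since the last restart: a rough first look
-- at whether the time is spent on CPU, I/O, memory grants, or locking.
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP')
ORDER BY wait_time_ms DESC;
```

Comparing the logical-reads numbers between the fast and slow variants usually narrows things down faster than comparing elapsed times alone.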
Update 2/27/18
The execution plan for the unmodified query shows a Clustered Index Seek against the Addresses table as the problem.
Reducing the length of the columns selected from Addresses to less than 100 bytes
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.Fax
replaced the Clustered Index Seek with a Clustered Index Scan to retrieve a.Fax and a Hash Match to join this value to the results.
The Addresses table primary key is created as follows:
ALTER TABLE dbo.Addresses
ADD CONSTRAINT PK_Addresses PRIMARY KEY CLUSTERED (ID ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF,
ONLINE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
ON PRIMARY
This index is defragged and optimized, as needed, every day.
So far, I can find nothing helpful as to why the Clustered Index Seek adds so much time to the query.

Ok, as is so often the case, there was not one problem but two. This is an example of how complex problem analysis can lead to the wrong conclusions.
The primary problem turned out to be the recursive CTE og, which returns a pivot table giving the parent/child relationships between organizations. However, analysis of the execution plans appeared to indicate the problem was some kind of glitch in the optimizer related to the amount of data being returned from a left-joined table. This may be entirely the result of my inability to properly analyze an execution plan, but there does appear to be some issue in how SQL Server 2012 SP4 creates an execution plan under these circumstances.
While far more significant on our production server, the problem with SQL Server's optimization of the recursive CTE was apparent both on my localhost, running 2012 SP4, and on the staging server, running SP2. But it took further analysis and some guesswork to see it.
The Solution
I replaced the recursive CTE with a pivot table and added a trigger to the Organizations table to maintain it.
USE Common
GO
CREATE VIEW com.OrganizationGroupsCTE
AS
WITH cte (OrgID, GroupID) AS
(
SELECT ID, ID FROM com.Organizations WHERE ISNULL(ParentID, 0) <> ID
UNION ALL
SELECT o.ID, cte.GroupID FROM com.Organizations o JOIN cte ON cte.OrgID = o.ParentID
)
SELECT OrgID, GroupID FROM cte
GO
CREATE TABLE com.OrganizationGroups
(
OrgID BIGINT,
GroupID BIGINT
)
INSERT com.OrganizationGroups
SELECT OrgID, GroupID
FROM com.OrganizationGroupsCTE
GO
CREATE TRIGGER TR_OrganizationGroups ON com.Organizations AFTER INSERT,UPDATE,DELETE
AS
DELETE og
FROM com.OrganizationGroups og
JOIN deleted d ON d.ID IN (og.GroupID, og.OrgID);
INSERT com.OrganizationGroups
SELECT cte.OrgID, cte.GroupID
FROM inserted i
JOIN com.OrganizationGroupsCTE cte ON i.ID IN (cte.OrgID, cte.GroupID)
GO
After modifying the query to use the pivot table,
SELECT e.*, v.Type AS VendorType, v.F1099, v.F1099Type, v.TaxID, v.TaxPercent,
v.ContactName, v.ContactPhone, v.ContactEMail, v.DistrictWide,
a.*
FROM Common.com.OrganizationGroups og
JOIN books.Organizations bo ON bo.CommonID = og.OrgID
JOIN books.Organizations po ON po.CommonID = og.GroupID
JOIN books.Entities e ON e.OrgID = po.ID
JOIN Vendors v ON v.ID = e.ID
AND (e.OrgID = bo.ID OR v.DistrictWide = 1)
LEFT JOIN Addresses a ON a.ID = e.AddressID
WHERE bo.ID = @OrgID
AND (@ActiveOnly = 0 OR e.Active = 1)
AND (@RestrictToOrgID = 0 OR e.OrgID = @OrgID)
ORDER BY e.EntityName
SQL Server performance was improved, and consistent, in all three environments. Problems on the production server have now been eliminated.
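As a sanity check on this design, the trigger-maintained table can be compared against the CTE view it replaces. An empty result from this sketch means the two are in sync:

```sql
-- Rows present in either set but not the other; an empty result means
-- the trigger-maintained table matches the recursive CTE view exactly.
(SELECT OrgID, GroupID FROM com.OrganizationGroups
 EXCEPT
 SELECT OrgID, GroupID FROM com.OrganizationGroupsCTE)
UNION ALL
(SELECT OrgID, GroupID FROM com.OrganizationGroupsCTE
 EXCEPT
 SELECT OrgID, GroupID FROM com.OrganizationGroups);
```

Running this periodically (or after bulk loads that bypass the trigger) guards against the lookup table silently drifting from the source hierarchy.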

Related

Date comparison slow for large number of joined rows

The following query groups Snippets by ChannelId and returns an UnreadSnippetCount.
To determine the UnreadSnippetCount, Channel is joined onto ChannelUsers to fetch the date the User last read the Channel, and this LastReadDate is used to limit the count to rows where the snippet was created after the user last read the channel.
SELECT c.Id, COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
LEFT JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
WHERE cu.UserId = @UserId
AND (cu.LastReadDate IS NULL OR s.CreatedDate > cu.LastReadDate)
AND c.Id IN (select value from STRING_SPLIT(@ChannelIds, ','))
GROUP BY c.Id
The query works well logically, but for Channels that have a large number of Snippets (97,691), the query can take 10 minutes or more to return.
The following index is created:
CREATE NONCLUSTERED INDEX [IX_Snippets_CreatedDate] ON [dbo].[Snippets]
(
[CreatedDate] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
GO
Update:
Query execution plan (original query):
https://www.brentozar.com/pastetheplan/?id=B19sI105F
Update 2
Moving the where clause into the join as suggested:
SELECT c.Id, COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
LEFT JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
AND s.CreatedDate > cu.LastReadDate
WHERE cu.UserId = @UserId
AND c.Id IN (select value from STRING_SPLIT(@ChannelIds, ','))
Produces this execution plan:
https://www.brentozar.com/pastetheplan/?id=HkqwFk0ct
Is there a better date comparison method I can use?
Update 3 - Solution
Index
CREATE NONCLUSTERED INDEX [IX_Snippet_Created] ON [dbo].[Snippets]
(ChannelId ASC, CreatedDate ASC) INCLUDE (CreatedByUserId);
Stored Proc
ALTER PROCEDURE [dbo].[GetUnreadSnippetCounts2]
(
@ChannelIds ChannelIdsType READONLY,
@UserId nvarchar(36)
)
AS
SET NOCOUNT ON
SELECT
c.Id,
COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
JOIN @ChannelIds cid
ON cid.Id = c.Id
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
AND cu.UserId = @UserId
JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
AND (cu.LastReadDate IS NULL OR s.CreatedDate > cu.LastReadDate)
GROUP BY c.Id;
This gives the correct results logically and returns quickly.
Resulting execution plan:
https://www.brentozar.com/pastetheplan/?id=S1GwRCCcK
There are a number of inefficiencies I can see in the query plan.
Using STRING_SPLIT means the compiler does not know how many values will be returned, or that they are unique, and the data type is mismatched. Ideally you would pass in a table-valued parameter; if you cannot do so, another solution is to dump the values into a table variable first:
DECLARE @tmp TABLE (Id int PRIMARY KEY);
INSERT @tmp (Id)
select value
from STRING_SPLIT(@ChannelIds, ',');
You need better indexing on Snippets. I would suggest the following
CREATE NONCLUSTERED INDEX [IX_Snippet_Created] ON [dbo].[Snippets]
(ChannelId ASC, CreatedDate ASC) INCLUDE (CreatedByUserId);
It doesn't make sense to place CreatedByUserId in the key, because it's an inequality predicate. Keep it in the INCLUDE.
As you have already been told, it's better to move the conditions on left-joined tables into the ON clause. I don't know whether you then still need the cu.LastReadDate IS NULL check; I've left it in.
I must say, I'm unclear on your schema, but INNER JOIN ChannelUsers cu feels wrong here; perhaps it should be a LEFT JOIN? I cannot say more without seeing your full setup and required output.
SELECT
c.Id,
COUNT(s.Id) as [UnreadSnippetCount]
FROM Channels c
JOIN @tmp t
ON t.Id = c.Id
INNER JOIN ChannelUsers cu
ON cu.ChannelId = c.Id
AND cu.UserId = @UserId
LEFT JOIN Snippets s
ON cu.ChannelId = s.ChannelId
AND s.CreatedByUserId <> @UserId
AND (cu.LastReadDate IS NULL OR s.CreatedDate > cu.LastReadDate)
GROUP BY c.Id;

Azure SQL DB causing connection time out for stored procedures

We have hosted our database in Azure and are running stored procedures on this DB. The stored procedures had been running fine until last week but suddenly started giving a connection timeout error.
Our database size is 14 GB, the stored procedures in general return 2k to 20k records, and we are using the S3 pricing tier (50 DTU) of Azure DB.
What I found interesting was that the first time the stored procedure is executed it takes a long time, 2 to 3 minutes, and this is what causes the timeout. Later executions are fast (maybe it caches the execution plan).
Also, when I run it on the same DB with the same number of records on a machine with 8 GB of RAM running Windows 10, it runs in 15 seconds.
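To confirm the compile-on-first-run theory, the cached-plan statistics can be inspected. A sketch (the LIKE filter on the procedure name is illustrative):

```sql
-- Average CPU and elapsed time per execution for cached statements
-- matching the procedure; a large gap between the first run and the
-- average suggests compile or cold-cache cost rather than the plan itself.
SELECT TOP (10)
    qs.execution_count,
    qs.total_worker_time  / qs.execution_count AS avg_cpu_microsec,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microsec,
    st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%PRSP%'
ORDER BY qs.total_elapsed_time DESC;
```

If the slow first run disappears once the plan is cached, the DTU ceiling (compilation is CPU-heavy) is a likely suspect alongside the estimate problems discussed below.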
This is my stored procedure:
CREATE PROCEDURE [dbo].[PRSP]
@CompanyID INT,
@fromDate DATETIME,
@toDate DATETIME,
@ListMailboxId as MailboxIds READONLY,
@ListConversationType as ConversationTypes READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
[Conversation] as C
INNER JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER JOIN
[Message] AS M ON (M.ConversationID = C.ID)
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = C.MailboxID)
INNER JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
JOIN
@ListConversationType AS convType ON convType.ID = C.[ConversationType]
JOIN
@ListMailboxId AS mailboxIds ON mailboxIds.ID = Mb.ID
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
END
This is the execution plan :
Execution Plan
Please give your suggestions as to what improvement is needed. Also, do we need to upgrade our pricing tier?
Ideally, for a 14 GB DB, what should the Azure pricing tier be?
That query should take 1 to 3 seconds to complete on your Windows 10 8 GB RAM machine. It takes 15 seconds because SQL Server chose a poor execution plan. In this case, the root cause of the poor plan is bad estimates: several operators in the plan show a big difference between estimated rows and actual rows. For example, SQL Server estimated it would only need to perform one seek into the pk_customer clustered index, but it performed 16,522 seeks. The same thing occurs with [ConversationHistory].[IX_ConversationID_CreatedOn_Activity_ByWhom] and with [Message].[IX_ConversationID_ID_ArrivalDt_From_RStatus_Type].
Here you have some hints you could follow to improve the performance of the query:
Update statistics
Try OPTION (HASH JOIN) at the end of the query. It might improve performance or it might slow it down; it can even cause the query to error.
Store table variable data in temporary tables and use them in the query (SELECT * INTO #temp_table FROM @table_variable). Table variables don't have statistics, which causes bad estimates.
Identify the first operator where the difference between estimated rows and actual rows is big enough. Split the query. Query 1: SELECT * INTO #operator_result FROM (query equivalent to the operator). Query 2: write the query using #operator_result. Because #operator_result is a temporary table, SQL Server is forced to re-evaluate the estimates. In this case, the offending operator is the hash match (inner join).
There are other things you can do to improve the performance of this query:
Avoid key lookups. There are 16,522 key lookups into the Conversation.PK_dbo.Conversation clustered index. They can be avoided by creating the appropriate covering index. In this case, the covering index is the following:
DROP INDEX [IX_MailboxID] ON [dbo].[Conversation]
GO
CREATE INDEX IX_MailboxID ON [dbo].[Conversation](MailboxID)
INCLUDE (ArrivalDate, Status, ClosureDate, CustomerID, ConversationType)
Split OR predicate into UNION or UNION ALL. For example:
instead of:
SELECT *
FROM table
WHERE <predicate1> OR <predicate2>
use:
SELECT *
FROM table
WHERE <predicate1>
UNION
SELECT *
FROM table
WHERE <predicate2>
Sometimes it improves performance.
Apply each hint individually and measure performance.
EDIT: You can try the following and see if it improves performance:
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
@ListConversationType AS convType
INNER JOIN (
@ListMailboxId AS mailboxIds
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = mailboxIds.MailboxID)
INNER JOIN
[Conversation] as C
ON C.MailboxID = Mb.ID
) ON convType.ID = C.[ConversationType]
INNER HASH JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
INNER HASH JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER HASH JOIN
[Message] AS M ON (M.ConversationID = C.ID)
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
And this:
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
@ListConversationType AS convType
INNER JOIN (
@ListMailboxId AS mailboxIds
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = mailboxIds.MailboxID)
INNER JOIN
[Conversation] as C
ON C.MailboxID = Mb.ID
) ON convType.ID = C.[ConversationType]
INNER MERGE JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
INNER MERGE JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER MERGE JOIN
[Message] AS M ON (M.ConversationID = C.ID)
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
50 DTU is equivalent to 1/2 logical core.
See more: Using the Azure SQL Database DTU Calculator
I had the same issue this week, and the end users claimed slowness in using the application connected to the VM hosted in Azure. I also have almost the same VM (4 CPUs, 14 GB of RAM, and S3, but with 100 DTUs).
In my case, I had a lot of indexes with avg_fragmentation_in_percent greater than 30, and this caused poor performance in executing stored procedures.
Run this in SSMS, and if the indexes of the tables your stored procedure runs against show up, you might want to take care of them:
SELECT dbschemas.[name] as 'Schema',
dbtables.[name] as 'Table',
dbindexes.[name] as 'Index',
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
WHERE indexstats.database_id = DB_ID()
AND indexstats.index_id = dbindexes.index_id
AND indexstats.avg_fragmentation_in_percent >30
--AND dbindexes.[name] like '%CLUSTER%'
ORDER BY indexstats.avg_fragmentation_in_percent DESC
More info here.
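If that query does flag heavy fragmentation, the usual follow-up is REORGANIZE for moderate fragmentation and REBUILD beyond roughly 30 percent. A sketch against one of the tables from the question (the table and index names here are placeholders to adapt to whatever the query above reports):

```sql
-- Moderate fragmentation (roughly 5-30%): reorganize in place, always online.
ALTER INDEX ALL ON dbo.Conversation REORGANIZE;

-- Heavy fragmentation (over ~30%): rebuild; on Azure SQL DB this can
-- usually be done online so the table stays available.
ALTER INDEX ALL ON dbo.Conversation REBUILD WITH (ONLINE = ON);
```

A rebuild also refreshes the index's statistics with a full scan, which ties in with the stale-statistics check below.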
Edit:
Check also how old the statistics are:
SELECT
sys.objects.name AS table_name,
sys.indexes.name as index_name,
sys.indexes.type_desc as index_type,
stats_date(sys.indexes.object_id,sys.indexes.index_id)
as last_update_stats_date,
DATEDIFF(d,stats_date(sys.indexes.object_id,sys.indexes.index_id),getdate())
as stats_age_in_days
FROM
sys.indexes
INNER JOIN sys.objects on sys.indexes.object_id=sys.objects.object_id
WHERE
sys.objects.type = 'U'
AND
sys.indexes.index_id > 0
--AND sys.indexes.name Like '%CLUSTER%'
ORDER BY
stats_age_in_days DESC;
GO
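If the statistics do turn out to be stale, they can be refreshed database-wide or per table. A sketch (the table name is illustrative):

```sql
-- Refresh all statistics in the current database (default sampling).
EXEC sp_updatestats;

-- Or target one table with a full scan for more accurate estimates,
-- at the cost of a longer-running statistics update.
UPDATE STATISTICS dbo.Conversation WITH FULLSCAN;
```

On lower DTU tiers a FULLSCAN update can itself be slow, so it is worth scheduling off-peak.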

Why do I get this unexpected SQL performance gain?

This is more a quiz question rather than me panicking over a deadline, however understanding how/why would no doubt let me scratch my head a little less!
So I have this UPDATE statement:
/*** @Table is a TABLE variable ***/
UPDATE O
SET O.PPTime = T.PPTime
FROM @Table AS [O]
INNER JOIN
(SELECT O.OSID, O.STID, DATEDIFF(SECOND, O.StartDateTime, O.EndDateTime) AS [PPTime]
FROM tblO AS [O]
INNER JOIN tblS AS [S] ON O.OSID = S.OSID
INNER JOIN tblE AS [E] ON S.EID = E.EID
INNER JOIN tblEF AS [EF] ON E.EFID = EF.EFID
GROUP BY O.OSID, O.STID, O.StartDateTime, O.EndDateTime) AS [T]
ON O.OSID = T.OSID
WHERE O.PPTime IS NULL
The execution time is approximately 12 seconds.
Now below I have added a small WHERE predicate which does not have any impact on how many rows of data are returned to the user:
/*** @Table is a TABLE variable ***/
UPDATE O
SET O.PPTime = T.PPTime
FROM @Table AS [O]
INNER JOIN
(SELECT O.OSID, O.STID, DATEDIFF(SECOND, O.StartDateTime, O.EndDateTime) AS [PPTime]
FROM tblO AS [O]
INNER JOIN tblS AS [S] ON O.OSID = S.OSID
INNER JOIN tblE AS [E] ON S.EID = E.EID
INNER JOIN tblEF AS [EF] ON E.EFID = EF.EFID
WHERE O.OSID >= 0 /*** Somehow fixes performance slow down! ***/
GROUP BY O.OSID, O.STID, O.StartDateTime, O.EndDateTime) AS [T]
ON O.OSID = T.OSID
WHERE O.PPTime IS NULL
The execution time is now less than a second. If I run both SELECT statements individually, they execute in the same time and return the same data.
Why do I get such a performance gain?
After reviewing the code, I noticed that adding a primary key and/or an index to the table variable did the trick! One for me to remember!
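For the record, the fix described above might look like this. The column list is a guess based on the columns the query actually reads from the table variable:

```sql
-- Declaring the table variable with a primary key gives the optimizer
-- a unique index on OSID to drive the join, instead of treating the
-- table variable as an unindexed heap with a one-row estimate.
DECLARE @Table TABLE
(
    OSID   INT NOT NULL PRIMARY KEY,
    PPTime INT NULL
);
```

The PRIMARY KEY both enforces uniqueness on the join column and gives the join something to seek on.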

Stored Procedure Optimisation

I have written the following SQL stored procedure, and (I think) because of all the nested SELECT statements it runs really slowly now that the database has been populated with lots of data. Is there a way to optimise it so that it runs much quicker? Currently, on an Azure S0 DB, it takes around 1:40 min to process. Here's the stored procedure:
USE [cmb2SQL]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE PROCEDURE [dbo].[spStockReport] @StockReportId AS INT
AS
select
ProductId,
QtyCounted,
LastStockCount,
Purchases,
UnitRetailPrice,
CostPrice,
GrossProfit,
Consumed,
(Consumed * UnitRetailPrice) as ValueOfSales,
(QtyCounted * CostPrice) as StockOnHand,
StockCountId
from (
select
ProductId,
QtyCounted,
LastStockCount,
Purchases,
UnitRetailPrice,
CostPrice,
GrossProfit,
(LastStockCount + Purchases) - QtyCounted as Consumed,
StockCountId
from (
select
distinct
sci.StockCountItem_Product as ProductId,
(Select ISNULL(SUM(Qty), 0) as tmpQty from
(Select Qty from stockcountitems
join stockcounts on stockcountitems.stockcountitem_stockcount = stockcounts.id
where stockcountitem_Product = p.Id and stockcountitem_stockcount = sc.id and stockcounts.stockcount_pub = sc.StockCount_Pub
) as data
) as QtyCounted,
(Select ISNULL(SUM(Qty), 0) as LastStockCount from
(Select Qty from StockCountItems as sci
join StockCounts on sci.StockCountItem_StockCount = StockCounts.Id
join Products on sci.StockCountItem_Product = Products.id
where sci.StockCountItem_Product = p.id and sci.stockcountitem_stockcount =
(select top 1 stockcounts.id from stockcounts
join stockcountitems on stockcounts.id = stockcountitem_stockcount
where stockcountitems.stockcountitem_product = p.id and stockcounts.id < sc.id and StockCounts.StockCount_Pub = sc.StockCount_Pub
order by stockcounts.id desc)
) as data
) as LastStockCount,
(Select ISNULL(SUM(Qty * CaseSize), 0) as Purchased from
(select Qty, Products.CaseSize from StockPurchaseItems
join Products on stockpurchaseitems.stockpurchaseitem_product = products.id
join StockPurchases on stockpurchaseitem_stockpurchase = stockpurchases.id
join Periods on stockpurchases.stockpurchase_period = periods.id
where Products.id = p.Id and StockPurchases.StockPurchase_Period = sc.StockCount_Period and StockPurchases.StockPurchase_Pub = sc.StockCount_Pub) as data
) as Purchases,
sci.RetailPrice as UnitRetailPrice,
sci.CostPrice,
(select top 1 GrossProfit from Pub_Products where Pub_Products.Pub_Product_Product = p.id and Pub_Products.Pub_Product_Pub = sc.StockCount_Pub) as GrossProfit,
sc.Id as StockCountId
from StockCountItems as sci
join StockCounts as sc on sci.StockCountItem_StockCount = sc.Id
join Pubs on sc.StockCount_Pub = pubs.Id
join Periods as pd on sc.StockCount_Period = pd.Id
join Products as p on sci.StockCountItem_Product = p.Id
join Pub_Products as pp on p.Id = pp.Pub_Product_Product
Where StockCountItem_StockCount = @StockReportId and pp.Pub_Product_Pub = sc.StockCount_Pub
Group By sci.CostPrice, sci.StockCountItem_Product, sci.Qty, sc.Id, p.Id, sc.StockCount_Period, pd.Id, sci.RetailPrice, pp.CountPrice, sc.StockCount_Pub
) as data
) as final
GO
As requested here is the execution plan in XML (had to upload it to tinyupload as it exceeds the message character length):
execusionplan.xml
Schema:
Row Counts:
Table row_count
StockPurchaseItems 57511
Products 3116
StockCountItems 60949
StockPurchases 6494
StockCounts 240
Periods 30
Pub_Products 5694
Pubs 7
Without getting into a query rewrite (the most expensive option, and probably the last thing you should do), try these steps first, one by one, and measure the impact: time, execution plan, and SET STATISTICS IO ON output. Create a baseline for these metrics first. Stop when you achieve acceptable performance.
First of all, update statistics on relevant tables, I see some of the estimates are way off. Check the exec plan for estimated vs actual rows - any better now?
Create indexes on StockPurchaseItems(StockPurchaseItem_Product) and on StockCountItems(StockCountItem_Product, StockCountItem_StockCount). Check the execution plan, did optimizer consider using the indexes at all?
Add (include) other referenced columns to those two indexes in order to cover the query. Are they used now?
If nothing of above helped, consider breaking the query into smaller ones. Would be nice to have some real data to experiment with to be more specific.
** That "select distinct" smells real bad, are you sure the joins are all ok?
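Steps 2 and 3 combined might look like the following. The INCLUDE lists are guesses based on the columns the query reads from each table, so verify them against the actual plan:

```sql
-- Covering index for the StockCountItems lookups: key on the two join
-- columns, carry the measure columns the query reads in the INCLUDE.
CREATE NONCLUSTERED INDEX IX_StockCountItems_Product_StockCount
ON dbo.StockCountItems (StockCountItem_Product, StockCountItem_StockCount)
INCLUDE (Qty, RetailPrice, CostPrice);

-- Covering index for the purchases subquery.
CREATE NONCLUSTERED INDEX IX_StockPurchaseItems_Product
ON dbo.StockPurchaseItems (StockPurchaseItem_Product)
INCLUDE (Qty);
```

If the plan still scans after adding these, the correlated subqueries themselves are the next thing to break apart.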

Very slow stored procedure

I have a hard time with query optimization, and currently I'm very close to the point of a database redesign, so Stack Overflow is my last hope. I don't think just showing you the query is enough, so I've linked not only the database script but also attached a database backup, in case you don't want to generate the data by hand.
Here you can find both the script and the backup
The problems start when you try to do the following...
exec LockBranches @count=64, @lockedBy='034C0396-5C34-4DDA-8AD5-7E43B373AE5A', @lockedOn='2011-07-01 01:29:43.863', @unlockOn='2011-07-01 01:32:43.863'
The main problems occur in this part:
UPDATE B
SET B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
FROM Objectives AS O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
INNER JOIN Branches AS B ON B.GenerationID = G.ID
INNER JOIN
(
SELECT SB.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
/* ----> */ INNER JOIN Results AS R ON P.ID = R.ProbeID /* SSMS Estimated execution plan says this operation is the roughest */
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SB.BranchID
) AS X ON X.BranchID = B.ID
WHERE
(O.Active = 1)
AND (B.Sealed = 0)
AND (B.GenerationNo < O.BranchGenerations)
AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
AND (B.Complete = 1 OR X.SuitableProbes = O.BranchSize * O.EstimateCount * O.ProbeCount)
) AS B
EDIT: Here are the amounts of rows in each table:
Spicies 71536
Results 10240
Probes 10240
SpicieBranches 4096
Branches 256
Estimates 5
Generations 1
Versions 1
Objectives 1
Somebody else might be able to explain better than I can why this is much quicker. Experience tells me that when you have a bunch of queries that collectively run slowly but should be quick in their individual parts, it's worth trying a temporary table.
This is much quicker
ALTER PROCEDURE LockBranches
-- Add the parameters for the stored procedure here
@count INT,
@lockedOn DATETIME,
@unlockOn DATETIME,
@lockedBy UNIQUEIDENTIFIER
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
--Create Temp Table
SELECT SpicieBranches.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
INTO #BranchSuitableProbeCount
FROM SpicieBranches
INNER JOIN Probes AS P ON P.SpicieID = SpicieBranches.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
INNER JOIN Results AS R ON P.ID = R.ProbeID
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SpicieBranches.BranchID
UPDATE B SET
B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) Branches.LockedBy, Branches.LockedOn, Branches.UnlockOn, Branches.Complete
FROM Objectives
INNER JOIN Generations ON Generations.ObjectiveID = Objectives.ID
INNER JOIN Branches ON Branches.GenerationID = Generations.ID
INNER JOIN #BranchSuitableProbeCount ON Branches.ID = #BranchSuitableProbeCount.BranchID
WHERE
(Objectives.Active = 1)
AND (Branches.Sealed = 0)
AND (Branches.GenerationNo < Objectives.BranchGenerations)
AND (Branches.LockedBy IS NULL OR DATEDIFF(SECOND, Branches.UnlockOn, GETDATE()) > 0)
AND (Branches.Complete = 1 OR #BranchSuitableProbeCount.SuitableProbes = Objectives.BranchSize * Objectives.EstimateCount * Objectives.ProbeCount)
) AS B
END
This is much quicker with an average execution time of 54ms compared to 6 seconds with the original one.
EDIT
Had a look and combined my ideas with those from RBarryYoung's solution. If you use the following to create the temporary table
SELECT SB.BranchID AS BranchID, COUNT(*) AS SuitableProbes
INTO #BranchSuitableProbeCount
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
WHERE EXISTS(SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
GROUP BY SB.BranchID
then you can get this down to 15ms which is 400x better than we started with. Looking at the execution plan shows that there is a table scan happening on the temp table. Normally you avoid table scans as best you can but for 128 rows (in this case) it is quicker than whatever it was doing before.
This is basically a complete guess, but in times past I've found that joining onto the results of a sub-query can be horrifically slow; that is, the subquery was being evaluated far more times than it really needed to be.
The way around this was to move the subqueries into CTEs and to join onto those instead. Good luck!
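Applied to this query, that suggestion would look roughly like the following sketch; only the innermost aggregation moves into a CTE, shown here as a standalone SELECT:

```sql
-- The per-branch count of probes that have at least one result,
-- expressed as a CTE instead of a nested derived table.
WITH SuitableProbeCounts AS
(
    SELECT SB.BranchID, COUNT(*) AS SuitableProbes
    FROM SpicieBranches AS SB
    INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
    WHERE EXISTS (SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
    GROUP BY SB.BranchID
)
SELECT BranchID, SuitableProbes
FROM SuitableProbeCounts;
```

Note that in SQL Server a CTE is not materialized, so this mainly helps readability; the temporary-table approach above is what actually forces single evaluation.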
It appears the join on the two uniqueidentifier columns is the source of the problem. One is a clustered index, the other non-clustered (on the FK table). It's good that there are indexes on them; unfortunately, GUIDs are notoriously poor-performing when joining large numbers of rows.
As troubleshooting steps:
what state are the indexes in? When was the last time the statistics were updated?
how performant is that subquery onto itself, when executed adhoc? i.e. when you run this statement by itself, how fast does the resultset return? acceptable?
after rebuilding the 2 indexes, and updating statistics, is there any measurable difference?
SELECT P.ID, 1 AS SuitableProbes FROM Probes AS P
INNER JOIN Results AS R ON P.ID = R.ProbeID
GROUP BY P.ID HAVING COUNT(R.ID) > 0
The following runs about 15x faster on my system:
UPDATE B
SET B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
FROM Objectives AS O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
INNER JOIN Branches AS B ON B.GenerationID = G.ID
INNER JOIN
(
SELECT SB.BranchID AS BranchID, COUNT(*) AS SuitableProbes
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
WHERE EXISTS(SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
GROUP BY SB.BranchID
) AS X ON X.BranchID = B.ID
WHERE
(O.Active = 1)
AND (B.Sealed = 0)
AND (B.GenerationNo < O.BranchGenerations)
AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
AND (B.Complete = 1 OR X.SuitableProbes = O.BranchSize * O.EstimateCount * O.ProbeCount)
) AS B
Insert the sub-query into a local temporary table:
SELECT SB.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
into #temp FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
/* ----> */ INNER JOIN Results AS R ON P.ID = R.ProbeID /* SSMS Estimated execution plan says this operation is the roughest */
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SB.BranchID
The query below pushes the filters down so that each join works on a pre-filtered subset of its table instead of the complete table:
UPDATE B
SET B.LockedBy = #lockedBy,
B.LockedOn = #lockedOn,
B.UnlockOn = #unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
From
(
SELECT ID, BranchGenerations, (BranchSize * EstimateCount * ProbeCount) as MultipliedFactor
FROM Objectives AS O WHERE (O.Active = 1)
)O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
Inner Join
(
Select Sealed, GenerationNo, GenerationID, LockedBy, LockedOn, UnlockOn, ID, Complete
From Branches
Where Sealed = 0 AND (LockedBy IS NULL OR DATEDIFF(SECOND, UnlockOn, GETDATE()) > 0)
)B ON B.GenerationID = G.ID
INNER JOIN
(
Select * from #temp
) AS X ON X.BranchID = B.ID
WHERE
(B.GenerationNo < O.BranchGenerations)
AND (B.Complete = 1 OR X.SuitableProbes = O.MultipliedFactor)
) AS B