We have hosted our database in Azure and are running stored procedures on it. The stored procedures had been running fine until last week, but they suddenly started failing with connection timeout errors.
Our database is 14 GB, the stored procedures generally return 2,000 to 20,000 records, and we are using the S3 pricing tier (50 DTU) of Azure SQL Database.
What I found interesting is that the first execution of a stored procedure takes a long time, 2-3 minutes, and this is what causes the timeout. Later executions are fast (maybe the execution plan gets cached).
Also, when I run it against the same DB with the same number of records on a machine with 8 GB of RAM running Windows 10, it completes in 15 seconds.
This is my stored procedure:
CREATE PROCEDURE [dbo].[PRSP]
@CompanyID INT,
@fromDate DATETIME,
@toDate DATETIME,
@ListMailboxId AS MailboxIds READONLY,
@ListConversationType AS ConversationTypes READONLY
AS
BEGIN
SET NOCOUNT ON;
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
[Conversation] as C
INNER JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER JOIN
[Message] AS M ON (M.ConversationID = C.ID)
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = C.MailboxID)
INNER JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
JOIN
@ListConversationType AS convType ON convType.ID = C.[ConversationType]
JOIN
@ListMailboxId AS mailboxIds ON mailboxIds.ID = Mb.ID
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
END
This is the execution plan: [Execution Plan]
Please give your suggestions as to what improvements are needed. Also, do we need to upgrade our pricing tier?
Ideally, what should the Azure pricing tier be for a 14 GB database?
That query should take 1 to 3 seconds to complete on your Windows 10 machine with 8 GB of RAM. It takes 15 seconds because SQL Server chose a poor execution plan. In this case, the root cause of the poor plan is bad estimates: several operators in the plan show a big difference between estimated rows and actual rows. For example, SQL Server estimated it only needed to perform one seek into the pk_customer clustered index, but it performed 16,522 seeks. The same thing occurs with [ConversationHistory].[IX_ConversationID_CreatedOn_Activity_ByWhom] and with [Message].[IX_ConversationID_ID_ArrivalDt_From_RStatus_Type].
Here are some hints you could follow to improve the performance of the query:
Update statistics.
Try OPTION (HASH JOIN) at the end of the query. It might improve performance, or it might slow it down; it can even cause the query to error.
Store the table variable data in temporary tables and use those in the query (SELECT * INTO #temp_table FROM @table_variable). Table variables don't have statistics, which causes bad estimates. See the sketch after this list.
Identify the first operator where the difference between estimated rows and actual rows is big. Split the query. Query 1: SELECT * INTO #operator_result FROM (query equivalent to the operator). Query 2: the original query rewritten to use #operator_result. Because #operator_result is a temporary table, SQL Server is forced to re-evaluate the estimates. In this case, the offending operator is the hash match (inner join).
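A minimal sketch of the hash-join hint and the temp-table trick, written against the procedure above (the temp table names are illustrative):
-- Inside the procedure body: materialize the table-valued parameters into
-- temp tables, which do carry statistics, and join against those instead.
SELECT * INTO #MailboxIds FROM @ListMailboxId;
SELECT * INTO #ConversationTypes FROM @ListConversationType;
-- To test the hash-join hint instead, append it to the main SELECT:
-- ...
-- OPTION (HASH JOIN);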
There are other things you can do to improve the performance of this query:
Avoid key lookups. There are 16,522 key lookups into the Conversation.PK_dbo.Conversation clustered index. They can be avoided by creating an appropriate covering index. In this case, the covering index is the following:
DROP INDEX [IX_MailboxID] ON [dbo].[Conversation]
GO
CREATE INDEX IX_MailboxID ON [dbo].[Conversation](MailboxID)
INCLUDE (ArrivalDate, Status, ClosureDate, CustomerID, ConversationType)
Split the OR predicate into a UNION or UNION ALL. For example:
instead of:
SELECT *
FROM table
WHERE <predicate1> OR <predicate2>
use:
SELECT *
FROM table
WHERE <predicate1>
UNION
SELECT *
FROM table
WHERE <predicate2>
Sometimes it improves performance.
Apply each hint individually and measure performance; a measurement sketch follows.
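One way to measure each variant fairly (a sketch; the parameter values are made up, and the single-column shape of the TVP types is an assumption):
SET STATISTICS TIME ON; -- reports compile and execution CPU/elapsed times
SET STATISTICS IO ON;   -- reports logical reads per table

DECLARE @mb MailboxIds;
DECLARE @ct ConversationTypes;
INSERT INTO @mb (ID) VALUES (1); -- sample mailbox
INSERT INTO @ct (ID) VALUES (1); -- sample conversation type

EXEC [dbo].[PRSP]
    @CompanyID = 1,
    @fromDate = '2017-01-01',
    @toDate = '2017-02-01',
    @ListMailboxId = @mb,
    @ListConversationType = @ct;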
EDIT: You can try the following and see if it improves performance:
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
@ListConversationType AS convType
INNER JOIN (
@ListMailboxId AS mailboxIds
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = mailboxIds.ID)
INNER JOIN
[Conversation] as C
ON C.MailboxID = Mb.ID
) ON convType.ID = C.[ConversationType]
INNER HASH JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
INNER HASH JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER HASH JOIN
[Message] AS M ON (M.ConversationID = C.ID)
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
And this:
SELECT
C.ID,
C.MailboxID,
C.Status,
C.CustomerID,
Cust.FName,
Cust.LName,
C.ArrivalDate as ConversationArrivalDate,
C.[ClosureDate],
C.[ConversationType],
M.[From],
M.ArrivalDate as MessageArrivalDate,
M.ID as MessageID
FROM
@ListConversationType AS convType
INNER JOIN (
@ListMailboxId AS mailboxIds
INNER JOIN
[Mailbox] AS Mb ON (Mb.ID = mailboxIds.ID)
INNER JOIN
[Conversation] as C
ON C.MailboxID = Mb.ID
) ON convType.ID = C.[ConversationType]
INNER MERGE JOIN
[Customer] AS Cust ON (Cust.ID = C.CustomerID)
INNER MERGE JOIN
[ConversationHistory] AS CHis ON (CHis.ConversationID = C.ID)
INNER MERGE JOIN
[Message] AS M ON (M.ConversationID = C.ID)
WHERE
Mb.CompanyID = @CompanyID
AND ((CHis.CreatedOn > @fromDate
AND CHis.CreatedOn < @toDate
AND CHis.Activity = 1
AND CHis.TagData = '3')
OR (M.ArrivalDate > @fromDate
AND M.ArrivalDate < @toDate))
50 DTU is equivalent to 1/2 logical core.
See more: Using the Azure SQL Database DTU Calculator
I had the same issue this week, and the end users complained of slowness in the application connected to the VM hosted in Azure. I also have almost the same setup (4 CPUs, 14 GB of RAM, and S3 but with 100 DTUs).
In my case, I had a lot of indexes with avg_fragmentation_in_percent greater than 30, and this caused poor performance when executing stored procedures.
Run this in SSMS, and if the indexes of the tables your stored procedure runs against show up in the results, you should take care of them:
SELECT dbschemas.[name] as 'Schema',
dbtables.[name] as 'Table',
dbindexes.[name] as 'Index',
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL) AS indexstats
INNER JOIN sys.tables dbtables on dbtables.[object_id] = indexstats.[object_id]
INNER JOIN sys.schemas dbschemas on dbtables.[schema_id] = dbschemas.[schema_id]
INNER JOIN sys.indexes AS dbindexes ON dbindexes.[object_id] = indexstats.[object_id]
WHERE indexstats.database_id = DB_ID()
AND indexstats.index_id = dbindexes.index_id
AND indexstats.avg_fragmentation_in_percent >30
--AND dbindexes.[name] like '%CLUSTER%'
ORDER BY indexstats.avg_fragmentation_in_percent DESC
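If fragmented indexes show up, REORGANIZE (for moderate fragmentation) or REBUILD (for heavy fragmentation) takes care of them. A sketch with placeholder names; substitute the table/index pairs the query above returns:
ALTER INDEX [IX_SomeIndex] ON [dbo].[SomeTable] REORGANIZE; -- roughly 5-30% fragmentation
ALTER INDEX [IX_SomeIndex] ON [dbo].[SomeTable] REBUILD;    -- above roughly 30%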
More info here.
Edit:
Check also how old the statistics are:
SELECT
sys.objects.name AS table_name,
sys.indexes.name as index_name,
sys.indexes.type_desc as index_type,
stats_date(sys.indexes.object_id,sys.indexes.index_id)
as last_update_stats_date,
DATEDIFF(d,stats_date(sys.indexes.object_id,sys.indexes.index_id),getdate())
as stats_age_in_days
FROM
sys.indexes
INNER JOIN sys.objects on sys.indexes.object_id=sys.objects.object_id
WHERE
sys.objects.type = 'U'
AND
sys.indexes.index_id > 0
--AND sys.indexes.name Like '%CLUSTER%'
ORDER BY
stats_age_in_days DESC;
GO
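If the statistics turn out to be old, refresh them. A sketch; the table name is only an example borrowed from the question above:
EXEC sp_updatestats; -- refreshes statistics across the whole database
-- or target one table with a full scan:
UPDATE STATISTICS [dbo].[Conversation] WITH FULLSCAN;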
I am attempting to include a new table with values that need to be checked in a stored procedure. Statement 1 checks against the existing table, while Statement 2 checks against the new table.
I currently have two EXISTS conditions that work independently and produce the results I expect. By this I mean that if I comment out Statement 1, Statement 2 works, and vice versa. When I put them together, the query doesn't complete; there is no error, but it times out, which is unexpected because each statement on its own takes only a few seconds.
I understand there is likely a better way to do this, but before I change it, I would like to know why I can't seem to use multiple EXISTS conditions like this. Is the WHERE clause not meant to have multiple EXISTS conditions?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Tables 1-5 represent different tables; there are no repeated tables.
Too long for a comment.
Your query as written seems correct. The timeout can only really be troubleshot from the execution plan, but here are a few things that could be happening or that you could benefit from:
Parameter sniffing on @Date. Try hard-coding this value and see if you still get the same slowness.
No covering index on P.OTHER_ID, P.DATE, P.ID, or SA.ID, which would cause a table scan for these predicates.
Indexes for the above columns that aren't optimal (including too many columns, etc.).
Your query running serially when it might benefit from parallelism.
Using the LOWER function on a database that doesn't have a case-sensitive collation (most don't, and this function doesn't slow things down much anyway).
A bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new plan. This is also done when comparing the speed of two queries, to make sure neither is using a cached plan (or that one is while the other isn't), which would skew the results. A sketch follows.
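A sketch of the first and last bullets combined (predicates abbreviated; the table shapes are taken from the question, and the date value is made up):
SELECT *
FROM table1 S
WHERE EXISTS (SELECT 1
              FROM table2 P
              INNER JOIN table3 SA ON SA.ID = P.ID
              WHERE P.DATE = '2019-01-01' -- hard-coded instead of @Date to rule out sniffing
                AND P.OTHER_ID = S.ID)
   OR EXISTS (SELECT 1
              FROM table4 P
              INNER JOIN table5 SA ON SA.ID = P.ID
              WHERE P.DATE = '2019-01-01'
                AND P.OTHER_ID = S.ID)
OPTION (RECOMPILE); -- compiles a fresh plan instead of reusing a cached one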
Since your query is timing out, try generating the estimated execution plan and post it for us at Paste The Plan.
I found that putting two EXISTS in the WHERE clause made the whole process take significantly longer. What fixed it was using UNION and keeping each EXISTS in a separate query. The final result looks like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
SELECT C.CompanyName,
B.BranchName,
E.EmployerName,
FE.EmployeeUniqueID,
pcr.EmployerUniqueID,
Case when FE.Status_id= 1 then 1 else 0 end IsUnPaid,
Case when re.EmployeeUniqueID IS NULL OR re.EmployeeUniqueID= '' then 0 else 1 end AS 'EmployeeRegistration',
FE.IncomeFixedComponent,
FE.IncomeVariableComponent,
Convert(varchar(11), Fe.PayStartDate, 106) as PayStartDate,
Convert(varchar(11), Fe.PayEndDate, 106) as PayEndDate,
S.StatusDescription,
FE.IsRejected,
FE.ID 'EdrID',
Convert(varchar(20), tr.TransactionDateTime, 113) as TransactionDateTime,
tr.BatchNo,
tr.IsDIFCreated,
Convert(varchar(20),tr.DIFFileCreationDateTime,113) as DiffDateTime
From File_EdrEntries FE
JOIN PAFFiles pe ON pe.ID = FE.PAFFile_ID
inner Join RegisteredEmployees RE
ON RE.EmployeeUniqueID= FE.EmployeeUniqueID
inner join File_PCREntries pcr on pe.ID=pcr.PAFFile_ID
JOIN Employers E ON E.EmployerID = pcr.EmployerUniqueID
JOIN Branches B ON B.BranchID = E.Branch_ID
JOIN companies C ON C.COMPANYID = B.COMPANY_ID
JOIN Statuses S ON S.StatusID = FE.Status_ID
JOIN Transactions tr on tr.EDRRecord_ID= fe.ID
where E.Branch_id=3
AND FE.IsRejected=0 AND FE.Status_id= 3 and tr.BatchNo is not null
AND Re.Employer_ID= re.Employer_ID;
This query is supposed to return 10 million or more records, and it usually times out because of the large number of records. How can I improve its performance? I have already done what I could in the WHERE condition.
First of all, you need to:
optimize the query more
add the required indexes to the tables involved in the query
Then you can use the following to raise the lock timeout, i.e. how long a statement waits on a blocked resource (the value is in milliseconds):
SET LOCK_TIMEOUT 1800;
SELECT @@LOCK_TIMEOUT AS [Lock Timeout];
Also, refer to this post.
Find out which combination of tables filters the most data. For example, if the following query filters out the majority of the data, you could consider creating a temp table with the data needed, indexing it, and then using it in your bigger query.
SELECT fe.*,re.*
From File_EdrEntries FE
inner Join RegisteredEmployees RE
ON RE.EmployeeUniqueID= FE.EmployeeUniqueID
Breaking the query into smaller chunks is likely the best way to go; a sketch follows. Also make sure you have proper indexes in place.
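A sketch of that approach; the column list and index key are assumptions based on the bigger query:
SELECT FE.ID, FE.EmployeeUniqueID, FE.Status_id, FE.IsRejected,
       FE.IncomeFixedComponent, FE.IncomeVariableComponent,
       FE.PayStartDate, FE.PayEndDate, FE.PAFFile_ID
INTO #FilteredEdr
FROM File_EdrEntries FE
INNER JOIN RegisteredEmployees RE
    ON RE.EmployeeUniqueID = FE.EmployeeUniqueID
WHERE FE.IsRejected = 0 AND FE.Status_id = 3;

CREATE CLUSTERED INDEX IX_FilteredEdr_ID ON #FilteredEdr (ID);
-- Then join #FilteredEdr in the main query instead of
-- File_EdrEntries/RegisteredEmployees.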
I need to improve my query, especially its execution time. This is my query:
SELECT SQL_CALC_FOUND_ROWS p.*,v.type,v.idName,v.name as etapaName,m.name AS manager,
c.name AS CLIENT,
(SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(duration)))
FROM activities a
WHERE a.projectid = p.projectid) AS worked,
(SELECT SUM(TIME_TO_SEC(duration))
FROM activities a
WHERE a.projectid = p.projectid) AS worked_seconds,
(SELECT SUM(TIME_TO_SEC(remain_time))
FROM tasks t
WHERE t.projectid = p.projectid) AS remain_time
FROM projects p
INNER JOIN users m
ON p.managerid = m.userid
INNER JOIN clients c
ON p.clientid = c.clientid
INNER JOIN `values` v
ON p.etapa = v.id
WHERE 1 = 1
ORDER BY idName
ASC
The execution time is approx. 5 seconds. If I remove this part: (SELECT SUM(TIME_TO_SEC(remain_time)) FROM tasks t WHERE t.projectid = p.projectid) AS remain_time,
the execution time is reduced to 0.3 seconds. Is there a way to get the values of remain_time while reducing the execution time?
The SQL is invoked from PHP (if this is relevant to any proposed solution).
It sounds like you need an index on tasks.
Try adding this one:
create index idx_tasks_projectid_remaintime on tasks(projectid, remain_time);
The correlated subquery should just use the index and go much faster.
Optimizing the query as it is written would give significant performance benefits (see below). But the FIRST QUESTION TO ASK when approaching any optimization is whether you really need to see all the data; there is no filtering of the result set implemented here. This has a HUGE impact on how you optimize a query.
Adding an index as suggested above will only help if the optimizer is opening a new cursor on the tasks table for every row returned by the main query. In the absence of any filtering, it will be much faster to do a full table scan of the tasks table.
SELECT ilv.*, remaining.rtime
FROM (
    SELECT p.*, v.type, v.idName, v.name AS etapaName,
           m.name AS manager, c.name AS CLIENT,
           SEC_TO_TIME(asbq.worked) AS worked, asbq.worked AS worked_seconds
    FROM projects p
    INNER JOIN users m
        ON p.managerid = m.userid
    INNER JOIN clients c
        ON p.clientid = c.clientid
    INNER JOIN `values` v
        ON p.etapa = v.id
    LEFT JOIN (
        SELECT a.projectid, SUM(TIME_TO_SEC(duration)) AS worked
        FROM activities a
        GROUP BY a.projectid
    ) asbq
        ON asbq.projectid = p.projectid
) ilv
LEFT JOIN (
    SELECT t.projectid, SUM(TIME_TO_SEC(remain_time)) AS rtime
    FROM tasks t
    GROUP BY t.projectid
) remaining
    ON ilv.projectid = remaining.projectid
I am trying to join 14 tables, a few of which need to be joined using LEFT JOIN.
With the existing data, which is around 7,000 records, the query below takes around 10 seconds to execute. I am afraid of what will happen once there are more than a million records. Please help me improve the performance of the query below.
CREATE proc [dbo].[GetTodaysActualInvoiceItemSoldHistory]
#fromdate datetime,
#todate datetime
as
Begin
select SDID.InvoiceDate as [Sold Date],Cust.custCompanyName as [Sold To] ,
case SQBD.TransferNo when '0' then IVM.VendorName else SQBD.TransferNo end as [Purchase From],
SQBD.BatchSellQty as SoldQty,SQID.SellPrice,
SDID.InvoiceNo as [Sales Invoice No],INV.PRInvoiceNo as [PO Invoice No],INV.PRInvoiceDate as [PO Invoice Date],
SQID.ItemDesc as [Item Description],SQID.NetPrice,SDHM.DeliveryHeaderMasterName as DeliveryHeaderName,
SQID.ItemCode as [Item Code],
SQBD.BatchNo,SQBD.ExpiryDate,SQID.Amount,
SQID.Dept_ID as Dept_ID,
Dept_Name as [Department],SQID.Catg_ID as Catg_ID,
Category_Name as [Category],SQID.Brand_ID as Brand_ID,
BrandName as BrandName, SQID.Manf_Id as Manf_Id,
Manf.ManfName as [Manufacturer],
STM.TaxName, SQID.Tax_ID as Tax_ID,
INV.VendorID as VendorID,
SQBD.ItemID,SQM.Isdeleted,
SDHM.DeliveryHeaderMasterID,Cust.CustomerMasterID
from SD_QuotationMaster SQM
inner join SD_InvoiceDetails SDID on SQM.QuoteID = SDID.QuoteID
inner join SD_QuoteItemDetails SQID on SDID.QuoteID = SQID.QuoteID
inner join SD_QuoteBatchDetails SQBD on SDID.QuoteID = SQBD.QuoteID and SQID.ItemID=SQBD.ItemID
inner join INV_ProductInvoice INV on SQBD.InvoiceID=INV.ProductInvoiceID
inner join INV_VendorMaster IVM on INV.VendorID = IVM.VendorID
inner join Sys_TaxMaster STM on SQID.Tax_ID = STM.Tax_ID
inner join Cust_CustomerMaster Cust on SQM.CustomerMasterID = Cust.CustomerMasterID
left join INV_DeptartmentMaster Dept on SQID.Dept_ID = Dept.Dept_ID
left join INV_BrandMaster BRD on SQID.Brand_ID = BRD.Brand_ID
left join INV_ManufacturerMaster Manf on SQID.Manf_Id = Manf.Manf_Id
left join INV_CategoryMaster CAT on SQID.Catg_ID = CAT.Catg_ID
left join SLRB_DeliveryCustomerMaster SDCM on SQM.CustomerMasterID=SDCM.CustomerMasterID and SQM.DeliveryHeaderMasterID=SDCM.DeliveryHeaderMasterID
left join SLRB_DeliveryHeaderMaster SDHM on SDCM.DeliveryHeaderMasterID=SDHM.DeliveryHeaderMasterID
where (SQM.IsDeleted=0) and SQBD.BatchSellQty > 0
and SDID.InvoiceDate between #fromdate and #todate
order by ItemDesc
End
Only the tables below contain more data; the other tables have fewer than 20 records each:
InvoiceDetails, QuoteMaster, QuoteItemDetails, QuoteBatchDetails, ProductInvoice
Below is the link to the execution plan:
http://jmp.sh/CSZc2x2
Thanks.
Let's start with an obvious error:
(isnull(SQBD.BatchSellQty,0) > 0)
That predicate is not indexable, so it should not be there. Seriously, BatchSellQty should not be unknown (nullable) in most cases, or you had better handle NULL properly. That field should be indexed, and I am not sure the index would be usable with an ISNULL wrapped around the column; there are likely tons of batches. Also note that a filtered index (condition > 0) may work here; see the sketch after these points.
Second, check that you have all the proper indexes and that the execution plan makes sense.
Third, you have to test with a ton of data. Index statistics may make a difference. Check where the time is spent; it may be tempdb, in which case you really need good tempdb I/O speed, and that is not related to the input side.
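A sketch of that filtered index; the key and INCLUDE columns are guesses based on the query in the question:
CREATE INDEX IX_QuoteBatchDetails_Sold
ON SD_QuoteBatchDetails (QuoteID, ItemID)
INCLUDE (BatchSellQty, TransferNo, BatchNo, ExpiryDate, InvoiceID)
WHERE BatchSellQty > 0; -- matches the SQBD.BatchSellQty > 0 predicate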
You can try using query hints to help the SQL Server optimizer build an optimal query execution plan, for example by forcing the order in which tables are joined with the FORCE ORDER hint. If you order your tables so that each join step produces the minimum result size, the query will execute faster (maybe; you need to try). Example:
We need A join B join C.
If A join B = 2,000 records x 1,000 records = ~400 records (we expect this result),
and A join C = 2,000 records x 10 records = ~3 records,
and B join C = 1,000 records x 10 records = ~10,000 records,
then the optimal order will be:
A join C join B = ~3 records x 1,000 records = ~3,000 records
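A sketch with placeholder tables A, B, and C and assumed key columns:
SELECT A.ID
FROM A
INNER JOIN C ON C.A_ID = A.ID -- smallest intermediate result (~3 rows) first
INNER JOIN B ON B.A_ID = A.ID
OPTION (FORCE ORDER);         -- the optimizer keeps the written join order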
I'm running the following query, but it is taking too long. Is there a way to make it faster, or should the query be written differently? Please help.
SELECT *
FROM ProductGroupLocUpdate WITH (nolock)
WHERE CmStatusFlag > 2
AND EngineID IN ( 0, 1 )
AND NOT EXISTS (SELECT DISTINCT APGV.LocationID
FROM CM_ST_ActiveProductGroupsView AS APGV WITH(nolock)
WHERE APGV.LocationID = ProductGroupLocUpdate.Locationid);
Try rewriting the query with a join:
SELECT PGLU.* from ProductGroupLocUpdate PGLU WITH (NOLOCK)
LEFT JOIN CM_ST_ActiveProductGroupsView APGV WITH (NOLOCK)
ON PGLU.LocationId = APGV.LocationID
WHERE APGV.LocationID IS NULL AND CmStatusFlag>2 AND EngineID IN (0,1)
Depending on how much data is in your tables, consider adding indexes on LocationID (in both tables), CmStatusFlag, and EngineID; a sketch follows.
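For example (a sketch; if CM_ST_ActiveProductGroupsView is a regular view, the index belongs on the LocationID column of its underlying table instead):
CREATE INDEX IX_ProductGroupLocUpdate_Location
ON ProductGroupLocUpdate (LocationID)
INCLUDE (CmStatusFlag, EngineID);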