Help improving SQL join

I have a stored procedure that runs to update gaming points for user balances. It's an insert with 5 subqueries. I have isolated one of the subqueries as the query that slows the entire batch down. Without it, the stored procedure runs in under 2 seconds. With it, it takes as much as 8 seconds. 8 seconds isn't the end of the world, but for the sake of scalability, I need it to complete faster. Here is the isolated subquery:
(SELECT IsNull(Sum(A.TransAmount) + Sum(Case When A.BetResult = 1 Then (A.BetWinAmount + (A.TransAmount * -1)) End), 0)
FROM User_T A
LEFT OUTER JOIN User_TD B on A.TID = B.TID
LEFT OUTER JOIN Lines_BL C ON B.LID = C.LID
LEFT OUTER JOIN Lines_BM D ON C.BMID = D.BMID
LEFT OUTER JOIN Event_M E ON D.EID = E.EID
LEFT OUTER JOIN Event_KB F ON A.TransReason = F.BID
LEFT OUTER JOIN Event_M G ON F.BID = G.EID
where A.UserID = U.UserID AND (A.IsSettled = 1)
AND
(
(A.TransReason = 1 AND (datediff(dd, Convert(datetime, E.EDate, 101), Convert(datetime, @EndDate, 101)) = @DaysAgo)) OR
(A.TransReason >= 3000 AND (datediff(dd, Convert(datetime, G.EDate, 101), Convert(datetime, @EndDate, 101)) = @DaysAgo)
AND [dbo].[Event_CEAFKBID](A.TransReason) = 1) OR
(A.TransReason BETWEEN 3 and 150 AND (datediff(dd, Convert(datetime, A.TransDT, 101), Convert(datetime, @EndDate, 101)) = @DaysAgo))
)
What I have done to further isolate: when I run a SELECT * on just the joins (without the where clauses), the performance is very good: > 100000 rows in under a second. As I add in the where clauses, I believe the big slowdown comes from the 'or' clause and/or the function that needs to be evaluated.
As I understand it, a function inside the where clause is evaluated for each row, as opposed to somehow caching the definition of the function and evaluating it that way. I do have indexes on the tables, but I am wondering if some of them are not correct.
Without you knowing the full database structure, I am sure it's very difficult to pin down where the problem is, but I would like to get pointed in a direction to begin to further isolate.

I suspect your biggest performance hits are from the correlated subquery (whatever table is behind U.UserID) and from the embedded function call dbo.Event_CEAFKBID. Much, of course, depends upon how big the tables are (how many rows are being read). All those datetime conversions won't help and give off a very strong "bad design" smell, but I don't think they'd impact performance too much.
Those left outer joins are ugly, as the optimizer has to check them all for every row, so if "A" is big, all the joins on all the rows have to be performed, even if there's no data there. If they can be replaced with inner joins, do so, but I'm guessing not, because of that "table E or table G" logic. Let's see, it sure looks like what you've got is three separate queries mashed into one; if you broke it out into three, unioned together, it'd look something like the Frankenstein query below. I've no idea if this would run faster or not (heck, I can't even debug the query and make sure the parentheses balance), but if you've got sparse data relative to your logic this should run pretty fast. (I took out the date conversions to make the code more legible; you'd have to plug them back in.)
SELECT isnull(sum(Total), 0) FinalTotal from (
SELECT
sum(A.TransAmount + Case When A.BetResult = 1 Then A.BetWinAmount - A.TransAmount else 0 End) Total
FROM User_T A
INNER JOIN User_TD B on A.TID = B.TID
INNER JOIN Lines_BL C ON B.LID = C.LID
INNER JOIN Lines_BM D ON C.BMID = D.BMID
INNER JOIN Event_M E ON D.EID = E.EID
where A.UserID = U.UserID
AND A.IsSettled = 1
AND A.TransReason = 1
AND (datediff(dd, E.EDate, @EndDate) = @DaysAgo)
UNION ALL SELECT
sum(A.TransAmount + Case When A.BetResult = 1 Then A.BetWinAmount - A.TransAmount else 0 End) Total
FROM User_T A
INNER JOIN Event_KB F ON A.TransReason = F.BID
INNER JOIN Event_M G ON F.BID = G.EID
where A.UserID = U.UserID
AND A.IsSettled = 1
AND A.TransReason >= 3000
AND (datediff(dd, G.EDate, @EndDate) = @DaysAgo)
AND [dbo].[Event_CEAFKBID](A.TransReason) = 1
UNION ALL SELECT
sum(A.TransAmount + Case When A.BetResult = 1 Then A.BetWinAmount - A.TransAmount else 0 End) Total
FROM User_T A
where A.UserID = U.UserID
AND A.IsSettled = 1
AND A.TransReason BETWEEN 3 and 150
AND datediff(dd, A.TransDT, @EndDate) = @DaysAgo
) ThreeWayUnion

You can put the CASE condition in the WHERE clause instead of directly in the first line of the SELECT.
Why do you need so many joins if this statement only uses tables A, E and G?
To get better-performing queries, look at the execution plan in Management Studio.
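Alongside the graphical plan, one way to put numbers on each variant is to have SQL Server report I/O and timing per statement. This is only a minimal sketch, assuming it is run in a Management Studio query window; the `sys.objects` query is a placeholder standing in for the subquery being tuned.

```sql
-- Report logical reads per table and CPU/elapsed time for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: replace this with the query you are tuning.
SELECT COUNT(*) AS ObjectCount
FROM sys.objects;

-- Switch the reporting off again for the rest of the session.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear on the Messages tab, which makes it easier to compare the query with and without the OR clause or the function call.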

Correlated subqueries are a very poor programming technique which equates to using a cursor in the query. Make it a derived table instead.
And yes those functions are slowing you down. If you have to convert to datetime, your database structure needs to be fixed and the data stored correctly as datetime.
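A rough sketch of what "make it a derived table" could look like. The table variables `@Users` and `@Trans` and their columns are hypothetical stand-ins, not the asker's schema, and the inline `DECLARE` and multi-row `VALUES` syntax assumes SQL Server 2008 or later.

```sql
-- Hypothetical stand-ins for the outer table (alias U) and User_T.
DECLARE @Users TABLE (UserID int PRIMARY KEY);
DECLARE @Trans TABLE (UserID int, TransAmount money, IsSettled bit);

INSERT INTO @Users VALUES (1), (2), (3);
INSERT INTO @Trans VALUES (1, 10, 1), (1, 5, 1), (2, 7, 0), (2, 3, 1);

-- Correlated form: the subquery is written once per outer row.
SELECT U.UserID,
       (SELECT ISNULL(SUM(T.TransAmount), 0)
        FROM @Trans T
        WHERE T.UserID = U.UserID AND T.IsSettled = 1) AS Points
FROM @Users U
ORDER BY U.UserID;

-- Derived-table form: aggregate once per user, then join.
SELECT U.UserID, ISNULL(P.Points, 0) AS Points
FROM @Users U
LEFT JOIN (SELECT T.UserID, SUM(T.TransAmount) AS Points
           FROM @Trans T
           WHERE T.IsSettled = 1
           GROUP BY T.UserID) P ON P.UserID = U.UserID
ORDER BY U.UserID;
```

Both statements return the same totals per user on this sample data; whether the second is actually faster on the real tables is something to check in the execution plan.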

Do you need to do the conversions to datetime for the DATEDIFF functions? Are you storing the dates as text, or are you reconverting to get rid of the time? If it's the latter, you don't need to, as the day difference will be correct even with the time included.
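To illustrate the point about the time portion: DATEDIFF with the dd datepart counts day boundaries crossed, so stripping the time first should not change the answer. A small sketch with made-up values (the variable names are illustrative only; inline `DECLARE` initialisation assumes SQL Server 2008 or later):

```sql
-- Two made-up timestamps two minutes apart, on either side of midnight.
DECLARE @lateEvening datetime = '2011-07-01T23:59:00';
DECLARE @earlyMorning datetime = '2011-07-02T00:01:00';

SELECT
    -- Time left in place.
    DATEDIFF(dd, @lateEvening, @earlyMorning) AS WithTime,
    -- Time stripped by a round trip through a style 101 (mm/dd/yyyy) string.
    DATEDIFF(dd,
             CONVERT(datetime, CONVERT(varchar(10), @lateEvening, 101), 101),
             CONVERT(datetime, CONVERT(varchar(10), @earlyMorning, 101), 101)) AS TimeStripped;
```

Both columns come back as 1, because one midnight lies between the two values either way.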

You should review whether the outer joins are necessary - they are more expensive than inner joins. You have some values that come from the dominant table, tagged A. You also have an OR condition that references E, and an OR condition that references G. I'd look to restructure the query along the lines of:
SELECT SUM(x.result)
FROM (SELECT A.TransAmount + CASE WHEN A.BetResult = 1
THEN (A.BetWinAmount + (A.TransAmount * -1))
ELSE 0 END AS result
FROM User_T A
WHERE A.TransReason BETWEEN 3 AND 150
AND datediff(dd, Convert(datetime, A.TransDT, 101),
Convert(datetime, @EndDate, 101)) = @DaysAgo
AND A.UserID = U.UserID -- Where does alias U come from?
AND A.IsSettled = 1
UNION ALL -- a plain UNION would drop duplicate result values before the SUM
SELECT A.TransAmount + CASE WHEN A.BetResult = 1
THEN (A.BetWinAmount + (A.TransAmount * -1))
ELSE 0 END AS result
FROM User_T A
JOIN User_TD B ON A.TID = B.TID
JOIN Lines_BL C ON B.LID = C.LID
JOIN Lines_BM D ON C.BMID = D.BMID
JOIN Event_M E ON D.EID = E.EID
WHERE A.TransReason = 1
AND datediff(dd, Convert(datetime, E.EDate, 101),
Convert(datetime, @EndDate, 101)) = @DaysAgo
AND A.UserID = U.UserID -- Where does alias U come from?
AND A.IsSettled = 1
UNION ALL
SELECT A.TransAmount + CASE WHEN A.BetResult = 1
THEN (A.BetWinAmount + (A.TransAmount * -1))
ELSE 0 END AS result
FROM User_T A
JOIN User_TD B ON A.TID = B.TID
JOIN Lines_BL C ON B.LID = C.LID
JOIN Lines_BM D ON C.BMID = D.BMID
JOIN Event_M E ON D.EID = E.EID
JOIN Event_KB F ON A.TransReason = F.BID
JOIN Event_M G ON F.BID = G.EID
WHERE A.TransReason >= 3000
AND datediff(dd, Convert(datetime, G.EDate, 101),
Convert(datetime, @EndDate, 101)) = @DaysAgo
AND [dbo].[Event_CEAFKBID](A.TransReason) = 1
AND A.UserID = U.UserID -- Where does alias U come from?
AND A.IsSettled = 1
) AS x
The thinking here is that the inner join queries will each be quicker than the outer join queries, and summing intermediate results is not a hardship to the DBMS (it was doing that anyway). It probably also avoids the need for ISNULL.
The alias U is, presumably, a reference to the outer query of which this is a part.

Related

Convert query select to a cursor

I need to convert my SQL Server query into a cursor.
I tried following the example in the question "Select statement in cursor", but I got an error.
Here is my query:
SELECT DISTINCT
U.IDSocio, U.IDUtente,
U.Cognome, U.Nome, U.Sesso,
U.Luogo_Nascita,
U.Provincia_Nascita,
U.Stato_Nascita,
COALESCE(C.codice, SS.codice) as Nascita_CodCatastale,
CONVERT(DATE, U.Data_Nascita) AS Data_Nascita_Orig,
REPLACE(CONVERT(VARCHAR, U.Data_Nascita, 111), '/', '-') as Data_Nascita,
U.Indirizzo_Via as Residenza_Indirizzo,
U.Indirizzo_NumeroCivico as Residenza_NumeroCivico,
U.Indirizzo_Cap as Residenza_Cap,
U.Indirizzo_Citta as Residenza_Citta,
U.Indirizzo_Pv as Residenza_Provincia,
U.CodCatastaleResidenza as Residenza_CodCatastale,
U.Indirizzo_Stato as Residenza_Stato,
U.PIVA,
U.CodiceFiscale,
U.Documento,
U.Telefono_1,
U.Telefono_2,
U.SMS,
U.Email,
U.Note,
UG.Descrizione as Categoria,
U.AutorizzaSMS,
U.AutorizzaEmail,
U.AutorizzaCartaceo,
U.CFRicevuta,
U.CFRicevutaUtente
FROM [dbo].[Utenti] U
LEFT JOIN dbo.UtenteCustom UC on UC.IDUtente = U.IDUtente
LEFT JOIN dbo.UtentiCategorie UG on UG.IDCategoria = UC.IDCategoriaUtente
LEFT JOIN dbo.Comuni C on C.Comune = U.Luogo_Nascita and C.PV = U.Provincia_Nascita
LEFT JOIN dbo.Comuni SS on SS.Comune = U.Stato_Nascita
INNER JOIN dbo.AbbonamentiIscrizione AI ON U.IDUtente = AI.IDUtente
INNER JOIN dbo.AbbonamentiDurata AD ON AI.IDDurata = AD.IDDurata
INNER JOIN dbo.Abbonamenti A ON A.IDAbbonamento = AD.IDAbbonamento
INNER JOIN dbo.AziendeAbbonamenti AA On AA.IDAbbonamentoCategoria = A.IDCategoria
INNER JOIN dbo.TesseramentiAbbonamentiDurata TAD On TAD.IDDurata = AD.IDDurata
WHERE
COALESCE(AA.IDRicevutaAzienda, 2) = 2
AND (U.Cognome <> '' AND U.Nome <> '')
AND CONVERT(DATE, DataInizio) <= CONVERT(DATE, GETDATE())
AND CONVERT(DATE, DataFine) >= CONVERT(DATE, GETDATE())
ORDER BY U.IDUtente
Could you please help me convert it into a cursor? I need to use a cursor because the query returns many rows and the code that uses it is too slow to execute.

How to prevent timeout in query?

SELECT C.CompanyName,
B.BranchName,
E.EmployerName,
FE.EmployeeUniqueID,
pcr.EmployerUniqueID,
Case when FE.Status_id= 1 then 1 else 0 end IsUnPaid,
Case when re.EmployeeUniqueID IS NULL OR re.EmployeeUniqueID= '' then 0 else 1 end AS 'EmployeeRegistration',
FE.IncomeFixedComponent,
FE.IncomeVariableComponent,
Convert(varchar(11), Fe.PayStartDate, 106) as PayStartDate,
Convert(varchar(11), Fe.PayEndDate, 106) as PayEndDate,
S.StatusDescription,
FE.IsRejected,
FE.ID 'EdrID',
Convert(varchar(20), tr.TransactionDateTime, 113) as TransactionDateTime,
tr.BatchNo,
tr.IsDIFCreated,
Convert(varchar(20),tr.DIFFileCreationDateTime,113) as DiffDateTime
From File_EdrEntries FE
JOIN PAFFiles pe ON pe.ID = FE.PAFFile_ID
inner Join RegisteredEmployees RE
ON RE.EmployeeUniqueID= FE.EmployeeUniqueID
inner join File_PCREntries pcr on pe.ID=pcr.PAFFile_ID
JOIN Employers E ON E.EmployerID = pcr.EmployerUniqueID
JOIN Branches B ON B.BranchID = E.Branch_ID
JOIN companies C ON C.COMPANYID = B.COMPANY_ID
JOIN Statuses S ON S.StatusID = FE.Status_ID
JOIN Transactions tr on tr.EDRRecord_ID= fe.ID
where E.Branch_id=3
AND FE.IsRejected=0 AND FE.Status_id= 3 and tr.BatchNo is not null
AND Re.Employer_ID= re.Employer_ID;
This query is supposed to return 10 million or more records, and it usually times out because of the large number of records. How can I improve its performance? I have already done what I could in the WHERE condition.

First of all, you need to:
optimize the query further
add the required indexes to the tables involved in the query
Then you can use this to raise the lock timeout (the value is in milliseconds):
SET LOCK_TIMEOUT 1800;
SELECT @@LOCK_TIMEOUT AS [Lock Timeout];
Also, refer to this post.

Find out which combination of tables filters out the most data. For example, if the following query filters out the majority of the data, you could consider creating a temp table with the data needed, indexing it, and then using that in your bigger query.
SELECT fe.*,re.*
From File_EdrEntries FE
inner Join RegisteredEmployees RE
ON RE.EmployeeUniqueID= FE.EmployeeUniqueID
Breaking the query out into smaller chunks is likely the best way to go. Also make sure you have proper indexes in place.
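The temp-table idea might look roughly like this. It is a sketch only: `#Edr` and `#Tr` are tiny hypothetical stand-ins for File_EdrEntries and Transactions, so the script runs on its own, and the index name is made up.

```sql
-- Hypothetical stand-ins for the large base tables.
CREATE TABLE #Edr (ID int, EmployeeUniqueID varchar(20), Status_ID int, IsRejected bit);
CREATE TABLE #Tr (EDRRecord_ID int, BatchNo int NULL);

INSERT INTO #Edr VALUES (1, 'E1', 3, 0), (2, 'E2', 1, 0), (3, 'E3', 3, 1), (4, 'E4', 3, 0);
INSERT INTO #Tr VALUES (1, 500), (2, 501), (3, 502), (4, NULL);

-- Step 1: materialise only the rows that survive the most selective filters.
SELECT ID, EmployeeUniqueID
INTO #FilteredEdr
FROM #Edr
WHERE IsRejected = 0 AND Status_ID = 3;

-- Step 2: index the temp table on the column the bigger query joins on.
CREATE CLUSTERED INDEX IX_FilteredEdr_ID ON #FilteredEdr (ID);

-- Step 3: run the rest of the query against the smaller, indexed set.
SELECT F.ID, F.EmployeeUniqueID, T.BatchNo
FROM #FilteredEdr F
INNER JOIN #Tr T ON T.EDRRecord_ID = F.ID
WHERE T.BatchNo IS NOT NULL;

DROP TABLE #FilteredEdr;
DROP TABLE #Tr;
DROP TABLE #Edr;
```

On the real tables, the filters in step 1 would be whichever combination removes the most rows, which is the thing to find out first.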

How to optimise an SQL query that use multiple subqueries and aggregate functions?

I have an SQL query as follows. It serves its purpose, but it is very slow and a bit complicated, as it has many aggregate functions and subqueries.
Here is the query:
SELECT
dd.period,
DATEADD(day,1,DATEADD(month,-12,MAX(dso.date_clearing))) AS startdate,
MAX(dso.date_clearing) AS lastdate,
ROUND(SUM(dso.DSO_actual_calc)/SUM(dso.amount_received_group_currency),1) AS dso,
ROUND(SUM(dso.DSO_overdue_calc)/SUM(dso.amount_received_group_currency),1) AS dsooverdue,
(SELECT
ROUND(SUM(dso1.DSO_actual_calc)/SUM(dso1.amount_received_group_currency),1)
FROM fact_dso_cleared_items as dso1
INNER JOIN dim_date dd1
ON dso1.date_clearing = dd1.the_date
WHERE dso1.date_clearing BETWEEN
DATEADD(day,1,DATEADD(month,-12,MAX(dso.date_clearing)))
AND MAX(dso.date_clearing)) AS dso_rltm,
(SELECT
ROUND(SUM(dso2.DSO_overdue_calc)/SUM(dso2.amount_received_group_currency), 1)
FROM fact_dso_cleared_items as dso2
INNER JOIN dim_date as dd2
ON dso2.date_clearing = dd2.the_date
WHERE dso2.date_clearing BETWEEN
DATEADD(day,1,DATEADD(month,-12,MAX(dso.date_clearing)))
AND MAX(dso.date_clearing)) AS dso_overdue_rltm
FROM fact_dso_cleared_items AS dso
INNER JOIN dim_date dd
ON dso.date_clearing = dd.the_date
WHERE dd.period IN('2012/01','2012/02','2012/03','2012/04','2012/05','2012/06',
'2012/07','2012/08','2012/09','2012/10','2012/11','2012/12')
GROUP BY dd.period
ORDER BY dd.period
Here is diagram for this query that shows tables and fields.
And here are results of this query.
What are the areas I should start improving in this query for simplification and performance?

First attempt. I'm trying to get rid of the repeated aggregate expressions, because you are using DATEADD(Day, 1, DATEADD(Month, -12, MAX(dso.date_clearing))) multiple times. I think a WITH query would help too.
SELECT *
,dso_rltm = (SELECT ROUND(SUM(dso1.DSO_actual_calc) / SUM(dso1.amount_received_group_currency), 1)
FROM fact_dso_cleared_items as dso1
INNER JOIN dim_date dd1 ON dso1.date_clearing = dd1.the_date
WHERE dso1.date_clearing BETWEEN data.startdate AND data.lastdate)
,dso_overdue_rltm = (SELECT ROUND(SUM(dso2.DSO_overdue_calc) / SUM(dso2.amount_received_group_currency), 1)
FROM fact_dso_cleared_items as dso2
INNER JOIN dim_date as dd2 ON dso2.date_clearing = dd2.the_date
WHERE dso2.date_clearing BETWEEN data.startdate AND data.lastdate)
FROM (
SELECT
period = dd.period
,startdate = DATEADD(Day, 1, DATEADD(Month, -12, MAX(dso.date_clearing)))
,lastdate = MAX(dso.date_clearing)
,dso = ROUND(SUM(dso.DSO_actual_calc) / SUM(dso.amount_received_group_currency), 1)
,dsooverdue = ROUND(SUM(dso.DSO_overdue_calc) / SUM(dso.amount_received_group_currency), 1)
FROM fact_dso_cleared_items AS dso
INNER JOIN dim_date dd ON dso.date_clearing = dd.the_date
WHERE dd.period IN('2012/01','2012/02','2012/03','2012/04','2012/05','2012/06','2012/07','2012/08','2012/09','2012/10','2012/11','2012/12')
GROUP BY dd.period
) data
ORDER BY data.period

SQL Server query optimisation

I inherited this hellish query designed for pagination in SQL Server.
It's only getting 25 records, but according to SQL Profiler, it does 8091 reads, 208 writes and takes 74 milliseconds. Would prefer it to be a bit faster. There is an index on the ORDER BY column deployDate.
Anyone have any ideas on how to optimise it?
SELECT TOP 25
textObjectPK, textObjectID, title, articleCredit, mediaCredit,
commentingAllowed,deployDate,
container, mediaID, mediaAlign, fileName AS fileName, fileName_wide AS fileName_wide,
width AS width, height AS height,title AS mediaTitle, extension AS extension,
embedCode AS embedCode, jsArgs as jsArgs, description as description, commentThreadID,
totalRows = Count(*) OVER()
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY textObjects.deployDate DESC) AS RowNumber,
textObjects.textObjectPK, textObjects.textObjectID, textObjects.title,
textObjects.commentingAllowed, textObjects.credit AS articleCredit,
textObjects.deployDate,
containers.container, containers.mediaID, containers.mediaAlign,
media.fileName AS fileName, media.fileName_wide AS fileName_wide,
media.width AS width, media.height AS height, media.credit AS mediaCredit,
media.title AS mediaTitle, media.extension AS extension,
mediaTypes.embedCode AS embedCode, media.jsArgs as jsArgs,
media.description as description, commentThreadID,
TotalRows = COUNT(*) OVER ()
FROM textObjects WITH (NOLOCK)
INNER JOIN containers WITH (NOLOCK)
ON containers.textObjectPK = textObjects.textObjectPK
AND (containers.containerOrder = 0 or containers.containerOrder = 1)
INNER JOIN LUTextObjectTextObjectGroup tog WITH (NOLOCK)
ON textObjects.textObjectPK = tog.textObjectPK
AND tog.textObjectGroupID in (3)
LEFT OUTER JOIN media WITH (NOLOCK)
ON containers.mediaID = media.mediaID
LEFT OUTER JOIN mediaTypes WITH (NOLOCK)
ON media.mediaTypeID = mediaTypes.mediaTypeID
WHERE (((version = 1)
AND (textObjects.textObjectTypeID in (6))
AND (DATEDIFF(minute, deployDate, GETDATE()) >= 0)
AND (DATEDIFF(minute, expireDate, GETDATE()) <= 0))
OR ( (version = 1) AND (textObjects.textObjectTypeID in (6))
AND (DATEDIFF(minute, deployDate, GETDATE()) >= 0)
AND (expireDate IS NULL)))
AND deployEnglish = 1
) tmpInlineView
WHERE RowNumber >= 51
ORDER BY deployDate DESC

I am in a similar position, with the same sort of queries. Here are some tips:
Look at the query plans to make sure you have the right indexes.
I'm not sure if MSSQL optimizes around DATEDIFF(), but if it doesn't you can precompute threshold dates and turn it into a BETWEEN clause.
If you don't need to order by all those columns in your ROW_NUMBER() clause, get rid of them. That may allow you to do the pagination on a much simpler query, then just grab the extra data you need for the 25 rows you are returning.
Also, rewrite the two LEFT OUTER JOINs like this:
LEFT OUTER JOIN
(
media WITH (NOLOCK)
LEFT OUTER JOIN mediaTypes WITH (NOLOCK)
ON media.mediaTypeID = mediaTypes.mediaTypeID
)
ON containers.mediaID = media.mediaID
which should make the query optimizer behave a little better.
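For the DATEDIFF() tip, precomputing the thresholds could look something like the sketch below. The table variable `@Articles` and its rows are invented for illustration; the two minute-boundary variables are there because DATEDIFF(minute, ...) counts minute boundaries rather than exact elapsed time.

```sql
DECLARE @now datetime = GETDATE();
-- Start of the current minute, and start of the next one.
DECLARE @minuteStart datetime = DATEADD(minute, DATEDIFF(minute, 0, @now), 0);
DECLARE @nextMinute datetime = DATEADD(minute, 1, @minuteStart);

-- Invented sample rows standing in for textObjects.
DECLARE @Articles TABLE (id int, deployDate datetime, expireDate datetime NULL);
INSERT INTO @Articles VALUES
    (1, DATEADD(hour, -1, @now), NULL),                      -- live, never expires
    (2, DATEADD(hour,  1, @now), NULL),                      -- not deployed yet
    (3, DATEADD(hour, -2, @now), DATEADD(hour, -1, @now)),   -- already expired
    (4, DATEADD(hour, -2, @now), DATEADD(hour,  1, @now));   -- live, expires later

-- Original style: a function wrapped around each column.
SELECT id
FROM @Articles
WHERE DATEDIFF(minute, deployDate, @now) >= 0
  AND (DATEDIFF(minute, expireDate, @now) <= 0 OR expireDate IS NULL);

-- Precomputed thresholds: bare columns compared to variables,
-- which leaves an index on deployDate usable for a range seek.
SELECT id
FROM @Articles
WHERE deployDate < @nextMinute
  AND (expireDate >= @minuteStart OR expireDate IS NULL);
```

Both queries return ids 1 and 4 on this sample. On the real table, the second form is the one that gives the optimizer a chance to seek on the deployDate index instead of evaluating DATEDIFF for every row.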

Very slow stored procedure

I'm having a hard time with query optimization; I'm currently very close to the point of redesigning the database, and Stack Overflow is my last hope. I don't think just showing you the query is enough, so I've linked the database script and also attached a database backup in case you don't want to generate the data by hand.
Here you can find both the script and the backup
The problems start when you try to do the following...
exec LockBranches @count=64,@lockedBy='034C0396-5C34-4DDA-8AD5-7E43B373AE5A',@lockedOn='2011-07-01 01:29:43.863',@unlockOn='2011-07-01 01:32:43.863'
The main problems occur in this part:
UPDATE B
SET B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
FROM Objectives AS O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
INNER JOIN Branches AS B ON B.GenerationID = G.ID
INNER JOIN
(
SELECT SB.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
/* ----> */ INNER JOIN Results AS R ON P.ID = R.ProbeID /* SSMS Estimated execution plan says this operation is the roughest */
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SB.BranchID
) AS X ON X.BranchID = B.ID
WHERE
(O.Active = 1)
AND (B.Sealed = 0)
AND (B.GenerationNo < O.BranchGenerations)
AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
AND (B.Complete = 1 OR X.SuitableProbes = O.BranchSize * O.EstimateCount * O.ProbeCount)
) AS B
EDIT: Here are the amounts of rows in each table:
Spicies 71536
Results 10240
Probes 10240
SpicieBranches 4096
Branches 256
Estimates 5
Generations 1
Versions 1
Objectives 1

Somebody else might be able to explain better than I can why this is much quicker. Experience tells me that when you have a bunch of queries that run slowly together but should be quick in their individual parts, it's worth trying a temporary table.
This version is much quicker:
ALTER PROCEDURE LockBranches
-- Add the parameters for the stored procedure here
@count INT,
@lockedOn DATETIME,
@unlockOn DATETIME,
@lockedBy UNIQUEIDENTIFIER
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON
--Create Temp Table
SELECT SpicieBranches.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
INTO #BranchSuitableProbeCount
FROM SpicieBranches
INNER JOIN Probes AS P ON P.SpicieID = SpicieBranches.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
INNER JOIN Results AS R ON P.ID = R.ProbeID
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SpicieBranches.BranchID
UPDATE B SET
B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) Branches.LockedBy, Branches.LockedOn, Branches.UnlockOn, Branches.Complete
FROM Objectives
INNER JOIN Generations ON Generations.ObjectiveID = Objectives.ID
INNER JOIN Branches ON Branches.GenerationID = Generations.ID
INNER JOIN #BranchSuitableProbeCount ON Branches.ID = #BranchSuitableProbeCount.BranchID
WHERE
(Objectives.Active = 1)
AND (Branches.Sealed = 0)
AND (Branches.GenerationNo < Objectives.BranchGenerations)
AND (Branches.LockedBy IS NULL OR DATEDIFF(SECOND, Branches.UnlockOn, GETDATE()) > 0)
AND (Branches.Complete = 1 OR #BranchSuitableProbeCount.SuitableProbes = Objectives.BranchSize * Objectives.EstimateCount * Objectives.ProbeCount)
) AS B
END
This is much quicker with an average execution time of 54ms compared to 6 seconds with the original one.
EDIT
Had a look and combined my ideas with those from RBarryYoung's solution. If you use the following to create the temporary table
SELECT SB.BranchID AS BranchID, COUNT(*) AS SuitableProbes
INTO #BranchSuitableProbeCount
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
WHERE EXISTS(SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
GROUP BY SB.BranchID
then you can get this down to 15ms which is 400x better than we started with. Looking at the execution plan shows that there is a table scan happening on the temp table. Normally you avoid table scans as best you can but for 128 rows (in this case) it is quicker than whatever it was doing before.

This is basically a complete guess here, but in times past I've found that joining onto the results of a sub-query can be horrifically slow. That is, the subquery was being evaluated way too many times when it really didn't need to.
The way around this was to move the subqueries into CTEs and to join onto those instead. Good luck!
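A sketch of the CTE shape, using small table variables as hypothetical stand-ins for SpicieBranches, Probes, Results and Branches (the sample rows are invented). One caveat worth stating: SQL Server is free to treat a CTE like an inline view rather than computing it once, so whether this helps is something to measure.

```sql
-- Hypothetical stand-ins with only the columns the join needs.
DECLARE @SpicieBranches TABLE (BranchID int, SpicieID int);
DECLARE @Probes TABLE (ID int, SpicieID int);
DECLARE @Results TABLE (ID int, ProbeID int);
DECLARE @Branches TABLE (ID int);

INSERT INTO @SpicieBranches VALUES (1, 10), (1, 11), (2, 12);
INSERT INTO @Probes VALUES (100, 10), (101, 11), (102, 12);
INSERT INTO @Results VALUES (1000, 100), (1001, 100), (1002, 101);
INSERT INTO @Branches VALUES (1), (2);

-- The subquery moved into a CTE; probes count only if they have a result.
WITH SuitableProbeCount AS
(
    SELECT SB.BranchID, COUNT(*) AS SuitableProbes
    FROM @SpicieBranches AS SB
    INNER JOIN @Probes AS P ON P.SpicieID = SB.SpicieID
    WHERE EXISTS (SELECT * FROM @Results AS R WHERE R.ProbeID = P.ID)
    GROUP BY SB.BranchID
)
SELECT B.ID AS BranchID, X.SuitableProbes
FROM @Branches AS B
INNER JOIN SuitableProbeCount AS X ON X.BranchID = B.ID;
```

On this sample only branch 1 comes back, with two suitable probes, since the single probe for branch 2 has no results.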

It appears the join on the two uniqueidentifier columns is the source of the problem. One has a clustered index; the other has a non-clustered index (on the FK table). It's good that there are indexes on them. Unfortunately, GUIDs are notoriously poor performers when joining large numbers of rows.
As troubleshooting steps:
What state are the indexes in? When were the statistics last updated?
How well does that subquery perform on its own, when executed ad hoc? That is, when you run the statement below by itself, how fast does the result set return? Is it acceptable?
After rebuilding the two indexes and updating statistics, is there any measurable difference?
SELECT P.ID, 1 AS SuitableProbes FROM Probes AS P
INNER JOIN Results AS R ON P.ID = R.ProbeID
GROUP BY P.ID HAVING COUNT(R.ID) > 0
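The index and statistics checks could be run along these lines. This is a sketch that assumes the two tables live in the dbo schema of the current database and that you have permission to query the DMVs; run the rebuild at a quiet time.

```sql
-- Fragmentation and last statistics update for every index on the two tables.
SELECT OBJECT_NAME(ips.object_id) AS TableName,
       i.name AS IndexName,
       ips.avg_fragmentation_in_percent,
       STATS_DATE(i.object_id, i.index_id) AS StatsLastUpdated
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
INNER JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id = ips.index_id
WHERE ips.object_id IN (OBJECT_ID(N'dbo.Probes'), OBJECT_ID(N'dbo.Results'));

-- Rebuild the indexes and refresh statistics, then time the subquery again.
ALTER INDEX ALL ON dbo.Probes REBUILD;
ALTER INDEX ALL ON dbo.Results REBUILD;
UPDATE STATISTICS dbo.Probes;
UPDATE STATISTICS dbo.Results;
```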

The following runs about 15x faster on my system:
UPDATE B
SET B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
FROM Objectives AS O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
INNER JOIN Branches AS B ON B.GenerationID = G.ID
INNER JOIN
(
SELECT SB.BranchID AS BranchID, COUNT(*) AS SuitableProbes
FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
WHERE EXISTS(SELECT * FROM Results AS R WHERE R.ProbeID = P.ID)
GROUP BY SB.BranchID
) AS X ON X.BranchID = B.ID
WHERE
(O.Active = 1)
AND (B.Sealed = 0)
AND (B.GenerationNo < O.BranchGenerations)
AND (B.LockedBy IS NULL OR DATEDIFF(SECOND, B.UnlockOn, GETDATE()) > 0)
AND (B.Complete = 1 OR X.SuitableProbes = O.BranchSize * O.EstimateCount * O.ProbeCount)
) AS B

Insert the subquery into a local temporary table:
SELECT SB.BranchID AS BranchID, SUM(X.SuitableProbes) AS SuitableProbes
into #temp FROM SpicieBranches AS SB
INNER JOIN Probes AS P ON P.SpicieID = SB.SpicieID
INNER JOIN
(
SELECT P.ID, 1 AS SuitableProbes
FROM Probes AS P
/* ----> */ INNER JOIN Results AS R ON P.ID = R.ProbeID /* SSMS Estimated execution plan says this operation is the roughest */
GROUP BY P.ID
HAVING COUNT(R.ID) > 0
) AS X ON P.ID = X.ID
GROUP BY SB.BranchID
The query below then joins against pre-filtered subsets of each table instead of the complete tables:
UPDATE B
SET B.LockedBy = @lockedBy,
B.LockedOn = @lockedOn,
B.UnlockOn = @unlockOn,
B.Complete = 1
FROM
(
SELECT TOP (@count) B.LockedBy, B.LockedOn, B.UnlockOn, B.Complete
From
(
SELECT ID, BranchGenerations, (BranchSize * EstimateCount * ProbeCount) as MultipliedFactor
FROM Objectives AS O WHERE (O.Active = 1)
)O
INNER JOIN Generations AS G ON G.ObjectiveID = O.ID
Inner Join
(
Select Sealed, GenerationNo, GenerationID, LockedBy, LockedOn, UnlockOn, ID, Complete
From Branches
Where Sealed = 0 AND (LockedBy IS NULL OR DATEDIFF(SECOND, UnlockOn, GETDATE()) > 0)
)B ON B.GenerationID = G.ID
INNER JOIN
(
Select * from #temp
) AS X ON X.BranchID = B.ID
WHERE
(B.GenerationNo < O.BranchGenerations)
AND (B.Complete = 1 OR X.SuitableProbes = O.MultipliedFactor)
) AS B
