SQL Server query optimisation

I inherited this hellish query designed for pagination in SQL Server.
It's only returning 25 records, but according to SQL Profiler it does 8091 reads and 208 writes and takes 74 milliseconds. I would prefer it to be a bit faster. There is an index on the ORDER BY column, deployDate.
Does anyone have any ideas on how to optimise it?
SELECT TOP 25
textObjectPK, textObjectID, title, articleCredit, mediaCredit,
commentingAllowed,deployDate,
container, mediaID, mediaAlign, fileName AS fileName, fileName_wide AS fileName_wide,
width AS width, height AS height, mediaTitle, extension AS extension,
embedCode AS embedCode, jsArgs as jsArgs, description as description, commentThreadID,
totalRows = Count(*) OVER()
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY textObjects.deployDate DESC) AS RowNumber,
textObjects.textObjectPK, textObjects.textObjectID, textObjects.title,
textObjects.commentingAllowed, textObjects.credit AS articleCredit,
textObjects.deployDate,
containers.container, containers.mediaID, containers.mediaAlign,
media.fileName AS fileName, media.fileName_wide AS fileName_wide,
media.width AS width, media.height AS height, media.credit AS mediaCredit,
media.title AS mediaTitle, media.extension AS extension,
mediaTypes.embedCode AS embedCode, media.jsArgs as jsArgs,
media.description as description, commentThreadID,
TotalRows = COUNT(*) OVER ()
FROM textObjects WITH (NOLOCK)
INNER JOIN containers WITH (NOLOCK)
ON containers.textObjectPK = textObjects.textObjectPK
AND (containers.containerOrder = 0 or containers.containerOrder = 1)
INNER JOIN LUTextObjectTextObjectGroup tog WITH (NOLOCK)
ON textObjects.textObjectPK = tog.textObjectPK
AND tog.textObjectGroupID in (3)
LEFT OUTER JOIN media WITH (NOLOCK)
ON containers.mediaID = media.mediaID
LEFT OUTER JOIN mediaTypes WITH (NOLOCK)
ON media.mediaTypeID = mediaTypes.mediaTypeID
WHERE (((version = 1)
AND (textObjects.textObjectTypeID in (6))
AND (DATEDIFF(minute, deployDate, GETDATE()) >= 0)
AND (DATEDIFF(minute, expireDate, GETDATE()) <= 0))
OR ( (version = 1) AND (textObjects.textObjectTypeID in (6))
AND (DATEDIFF(minute, deployDate, GETDATE()) >= 0)
AND (expireDate IS NULL)))
AND deployEnglish = 1
) tmpInlineView
WHERE RowNumber >= 51
ORDER BY deployDate DESC

I am in a similar position, with the same sort of queries. Here are some tips:
Look at the query plans to make sure you have the right indexes.
SQL Server generally can't seek on an index when the column is wrapped in a function such as DATEDIFF(), so precompute the threshold dates and compare the bare columns against them, for example with a BETWEEN clause.
If you don't need all those columns to compute ROW_NUMBER(), leave them out of the paging query. That lets you do the pagination on a much simpler query (just the keys and deployDate), then grab the extra data you need for only the 25 rows you are returning.
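One way to see the effect of wrapping a column in a function is to compare query plans. This is a sketch using Python's built-in sqlite3 module, with SQLite standing in for SQL Server: DATEDIFF() and GETDATE() don't exist there, so julianday() plays the wrapping function and a fixed NOW string replaces GETDATE(). The cut-down textObjects table and the index name ix_deployDate are hypothetical. The plan text differs between engines, but the principle carries over: the wrapped predicate forces a scan, while the bare comparison lets the index be used.

```python
import sqlite3

# Hypothetical, cut-down stand-in for the textObjects table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE textObjects (textObjectPK INTEGER PRIMARY KEY, title TEXT, deployDate TEXT)"
)
conn.execute("CREATE INDEX ix_deployDate ON textObjects (deployDate)")
conn.executemany(
    "INSERT INTO textObjects (title, deployDate) VALUES (?, ?)",
    [("a", "2023-12-31 23:59:00"), ("b", "2024-01-01 00:00:00"), ("c", "2024-01-02 08:00:00")],
)

# The threshold is computed once, outside the query (it stands in for GETDATE()).
NOW = "2024-01-01 00:00:00"

# Non-sargable: the indexed column is inside a function call, so the
# expression has to be evaluated for every row.
WRAPPED = ("SELECT textObjectPK, title FROM textObjects "
           "WHERE julianday(?) - julianday(deployDate) >= 0")

# Sargable: the bare column is compared with the precomputed threshold.
BARE = "SELECT textObjectPK, title FROM textObjects WHERE deployDate <= ?"


def plan(sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (NOW,))]


def run(sql):
    """Return the query's rows in a stable order for comparison."""
    return sorted(conn.execute(sql, (NOW,)).fetchall())


wrapped_plan = plan(WRAPPED)
bare_plan = plan(BARE)
print(wrapped_plan)
print(bare_plan)
```

Both predicates return the same rows here because the dates are stored as ISO-formatted strings; only the bare comparison can use ix_deployDate.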
Also, rewrite the two LEFT OUTER JOINs like this:
LEFT OUTER JOIN
(
media WITH (NOLOCK)
LEFT OUTER JOIN mediaTypes WITH (NOLOCK)
ON media.mediaTypeID = mediaTypes.mediaTypeID
)
ON containers.mediaID = media.mediaID
which should make the query optimizer behave a little better.
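The "paginate a narrow query first, fetch the wide columns afterwards" tip can be sketched like this, again with Python's sqlite3 module standing in for SQL Server (SQLite 3.25 or later is needed for ROW_NUMBER()). The tables are hypothetical, cut-down versions of textObjects, containers and media, with at most one container per article. In the real query, every filter that decides which rows exist (the group, type, version, expiry and date conditions) has to stay in the numbering step; otherwise the row numbers and the total change.

```python
import sqlite3

# Hypothetical, cut-down versions of the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE textObjects (textObjectPK INTEGER PRIMARY KEY, deployDate TEXT, title TEXT);
CREATE INDEX ix_deployDate ON textObjects (deployDate);
CREATE TABLE containers (textObjectPK INTEGER, mediaID INTEGER);
CREATE TABLE media (mediaID INTEGER PRIMARY KEY, fileName TEXT);
""")
conn.executemany("INSERT INTO textObjects VALUES (?, ?, ?)",
                 [(i, f"2024-01-{i:02d}", f"article {i}") for i in range(1, 11)])
conn.executemany("INSERT INTO containers VALUES (?, ?)",
                 [(i, 100 + i) for i in range(1, 11)])
conn.executemany("INSERT INTO media VALUES (?, ?)",
                 [(100 + i, f"file{i}.jpg") for i in range(1, 11)])

PAGED_SQL = """
WITH page AS (
    -- Step 1: number only the narrow key columns; filters that decide
    -- which rows exist belong here.
    SELECT textObjectPK,
           ROW_NUMBER() OVER (ORDER BY deployDate DESC) AS rn,
           COUNT(*) OVER () AS totalRows
    FROM textObjects
    WHERE deployDate <= :now
)
-- Step 2: join the wide columns only for the rows on the requested page.
SELECT p.rn, t.textObjectPK, t.title, m.fileName, p.totalRows
FROM page AS p
JOIN textObjects AS t ON t.textObjectPK = p.textObjectPK
LEFT JOIN containers AS c ON c.textObjectPK = t.textObjectPK
LEFT JOIN media AS m ON m.mediaID = c.mediaID
WHERE p.rn BETWEEN :first AND :last
ORDER BY p.rn
"""


def get_page(page_number, page_size, now):
    """Return one page of rows; page_number is 1-based."""
    first = (page_number - 1) * page_size + 1
    last = page_number * page_size
    return conn.execute(PAGED_SQL, {"now": now, "first": first, "last": last}).fetchall()


rows = get_page(page_number=2, page_size=3, now="2024-01-09")
for row in rows:
    print(row)
```

In the question's terms, get_page(3, 25, now) covers rows 51 to 75, which is what RowNumber >= 51 combined with TOP 25 selects.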

Related

How to optimize query to reduce execution time

The ORDER BY clause and the BETWEEN datetime comparison in my query increase its execution time, even though I have indexed the datetime column.
SELECT TOP(1)
@PeriodStart = DATEADD(SECOND, 1, dbo.tbl_WPT_AttendanceLog.ATDateTime)
FROM
dbo.tbl_WPT_EmployeeMachineLink
INNER JOIN
dbo.tbl_WPT_Machine ON dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_Machine.ID
RIGHT OUTER JOIN
dbo.tbl_WPT_AttendanceLog ON dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Machine_ID
AND dbo.tbl_WPT_EmployeeMachineLink.MachineEnrollmentNo = dbo.tbl_WPT_AttendanceLog.ATEnrollmentNo
WHERE
(dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Employee_ID = @EmpID)
AND (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode in (1,2,5)
OR (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND (dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Employee_ID = @EmpID)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode in (1,2,5)
ORDER BY
dbo.tbl_WPT_AttendanceLog.ATDateTime DESC
It looks like you're trying to get an employee's info from multiple sources (EmployeeMachineLink and AttendanceLog). Is that correct? If so, I think you just need to clean up the WHERE clause logic:
SELECT TOP(1)
@PeriodStart = DATEADD(SECOND, 1, dbo.tbl_WPT_AttendanceLog.ATDateTime)
FROM dbo.tbl_WPT_EmployeeMachineLink eml
INNER JOIN dbo.tbl_WPT_Machine ON eml.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_Machine.ID
RIGHT OUTER JOIN dbo.tbl_WPT_AttendanceLog ON eml.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Machine_ID
AND eml.MachineEnrollmentNo = dbo.tbl_WPT_AttendanceLog.ATEnrollmentNo
WHERE (
eml.FK_tbl_WPT_Employee_ID = @EmpID OR
dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Employee_ID = @EmpID
)
AND (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode IN (1,2,5)
ORDER BY dbo.tbl_WPT_AttendanceLog.ATDateTime DESC
Changes
- added table alias eml for readability
- removed duplicate reference to dbo.tbl_WPT_AttendanceLog.ATInOutMode IN (1,2,5)
- removed duplicate BETWEEN ... AND ... reference
- grouped OR conditions together
You have to be careful when mixing OR with AND without parentheses: AND binds more tightly than OR, so a OR b AND c is evaluated as a OR (b AND c). Getting this wrong leads to unexpected results and possibly poor performance.
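The precedence rule is easy to check on a toy table. This sketch uses Python's sqlite3 module (SQLite applies the same AND-before-OR precedence as SQL Server). The attendance table and its columns are hypothetical: linkEmpId and logEmpId stand in for the two employee ID columns in the query, and an inShift flag stands in for the BETWEEN on the shift window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attendance (id INTEGER PRIMARY KEY, linkEmpId INTEGER, "
    "logEmpId INTEGER, inOutMode INTEGER, inShift INTEGER)"
)
conn.executemany("INSERT INTO attendance VALUES (?, ?, ?, ?, ?)", [
    (1, 7, 7, 1, 1),      # matches on both IDs, valid mode, inside the window
    (2, 7, None, 9, 1),   # invalid mode: should be excluded
    (3, None, 7, 2, 1),   # matches via the log ID only: should be included
    (4, 7, None, 1, 0),   # outside the shift window: should be excluded
    (5, 8, 8, 1, 1),      # a different employee
])

# Without parentheses this parses as linkEmpId = :emp OR (logEmpId = :emp AND ...),
# so the mode and window filters only apply to the second branch.
NAIVE = """SELECT id FROM attendance
WHERE linkEmpId = :emp OR logEmpId = :emp
  AND inShift = 1 AND inOutMode IN (1, 2, 5)
ORDER BY id"""

# Grouping the OR applies the mode and window filters to both branches.
GROUPED = """SELECT id FROM attendance
WHERE (linkEmpId = :emp OR logEmpId = :emp)
  AND inShift = 1 AND inOutMode IN (1, 2, 5)
ORDER BY id"""

naive_ids = [row[0] for row in conn.execute(NAIVE, {"emp": 7})]
grouped_ids = [row[0] for row in conn.execute(GROUPED, {"emp": 7})]
print(naive_ids, grouped_ids)
```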
Let me know if that helps.

Sort & Parallelism costing my query too much time

My SQL query is taking a long time to run. I wrote a similar query and pitted the two against each other: this one runs faster on a small dataset (10K rows), but about 20-30x slower than the other one on a large dataset (500K+ rows). However, my first query does not have one column that I need, and I cannot add it without taking this different approach.
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] as a
LEFT OUTER JOIN (
SELECT [DESIGNATION], [CONTAINER_ID], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_CONTAINER]
) c ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN (
SELECT [DESIGNATION], [TYPE], [BLDG], [LOCATION_ID]
FROM [LTS].[dbo].[LTS_LOCATION]
) b ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN (
SELECT [PBG], [PLANNED_MFG_DELIVERY_DATE], [EXTENSION_DATE], [DISCRETE_JOB_NUMBER], [PROJECT_NUMBER]
FROM [LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY]
)d ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];
According to the execution plan, 100% is roughly 20,000 for this query, whereas my other query is about 900.
Is there something I can do to speed up my query, or where did I bog it down?
Remove inner selects and join directly to the tables:
SELECT a.[RFIDTAGID], a.[JOB_NUMBER], d.[PROJECT_NUMBER], a.[PART_NUMBER], a.[QUANTITY], b.[DESIGNATION] as LOCATION,
c.[DESIGNATION] as CONTAINER, a.[LAST_SEEN_TIME], b.[TYPE], b.[BLDG], d.[PBG], d.[PLANNED_MFG_DELIVERY_DATE], d.[EXTENSION_DATE], a.[ORG_ID]
FROM [LTS].[dbo].[LTS_PACKAGE] a
LEFT OUTER JOIN [LTS].[dbo].[LTS_CONTAINER]
c ON a.[CONTAINER_ID] = c.[CONTAINER_ID]
LEFT OUTER JOIN [LTS].[dbo].[LTS_LOCATION]
b ON a.[LAST_SEEN_LOC_ID] = b.[LOCATION_ID] OR b.[LOCATION_ID] = c.[LOCATION_ID]
INNER JOIN
[LTS].[dbo].[LTS_DISCRETE_JOB_SUMMARY]
d ON a.[JOB_NUMBER] = d.[DISCRETE_JOB_NUMBER]
WHERE
d.[PLANNED_MFG_DELIVERY_DATE] <= GETDATE()
AND b.[TYPE] NOT IN('MFG', 'Manufacturing')
AND (b.[DESIGNATION] IS NOT NULL OR c.[DESIGNATION] IS NOT NULL)
ORDER BY [JOB_NUMBER], d.[PLANNED_MFG_DELIVERY_DATE] desc, [RFIDTAGID];

SQL not efficient enough, tuning assistance required

We have some SQL that is OK on smaller data volumes but performs poorly once we scale up to selecting from larger volumes. Is there a faster alternative style to achieve the same output as below? The idea is to pull back a single unique row to get the latest version of the data. The SQL does reference another view, but this view runs very fast, so we expect the issue is in the query below and want to try a different approach.
SELECT *
FROM
(SELECT (select CustomerId from PremiseProviderVersionsToday
where PremiseProviderId = b.PremiseProviderId) as CustomerId,
c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
ROW_NUMBER() OVER (PARTITION BY b.PremiseProviderId
ORDER BY a.effectiveDate DESC) AS rowNumber
FROM PremiseMeterProviderVersions a, PremiseProviders b,
PremiseMeterProviders c
WHERE (a.TransactionDateTimeEnd IS NULL
AND a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND b.PremiseProviderId = c.PremiseProviderId)
) data
WHERE data.rowNumber = 1
As Bilal Ayub pointed out, the correlated subquery can result in performance issues. See here for more detail. Below are my suggestions:
Change all to explicit joins (ANSI standard)
Use aliases that are more descriptive than single characters (this is mostly to help readers understand what each table does)
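Convert the data subquery to a temp table or CTE (a temp table materialises the intermediate result and gets its own statistics, which can help the optimizer; a CTE mainly improves readability, since SQL Server expands it inline just like a subquery)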
Note: normally, you should explicitly create and insert into your temp table but I chose not to do that here as I do not know the data types of your columns.
SELECT d.CustomerId
, c.D3001_MeterId
, b.CoreSPID
, a.EnteredBy
, rowNumber = ROW_NUMBER() OVER(PARTITION BY b.PremiseProviderId ORDER BY a.effectiveDate DESC)
INTO #tmp_RowNum
FROM PremiseMeterProviderVersions a
JOIN PremiseMeterProviders c ON c.PremiseMeterProviderId = a.PremiseMeterProviderId
JOIN PremiseProviders b ON b.PremiseProviderId = c.PremiseProviderId
JOIN PremiseProviderVersionsToday d ON d.PremiseProviderId = b.PremiseProviderId
WHERE a.TransactionDateTimeEnd IS NULL
SELECT *
FROM #tmp_RowNum
WHERE rowNumber = 1
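The idea of replacing the correlated subquery with a join can be checked on toy data. This is a sketch with Python's sqlite3 module and hypothetical, cut-down versions of the tables, with PremiseProviderVersionsToday modelled as a plain table rather than a view. It uses a CTE instead of SELECT ... INTO #tmp_RowNum, which is T-SQL syntax. One design choice: LEFT JOIN rather than JOIN to the view keeps providers that have no matching customer row, as the scalar subquery did (it returned NULL for them); an inner join silently drops them. Like the scalar subquery, it assumes at most one view row per provider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PremiseProviders (PremiseProviderId INTEGER PRIMARY KEY, CoreSPID TEXT);
CREATE TABLE PremiseMeterProviders (PremiseMeterProviderId INTEGER PRIMARY KEY,
                                    PremiseProviderId INTEGER, D3001_MeterId TEXT);
CREATE TABLE PremiseMeterProviderVersions (PremiseMeterProviderId INTEGER, EnteredBy TEXT,
                                           EffectiveDate TEXT, TransactionDateTimeEnd TEXT);
-- A plain table standing in for the PremiseProviderVersionsToday view.
CREATE TABLE PremiseProviderVersionsToday (PremiseProviderId INTEGER, CustomerId TEXT);

INSERT INTO PremiseProviders VALUES (1, 'SPID-A'), (2, 'SPID-B');
INSERT INTO PremiseMeterProviders VALUES (10, 1, 'M1'), (11, 1, 'M2'), (20, 2, 'M3');
INSERT INTO PremiseMeterProviderVersions VALUES
    (10, 'alice', '2024-01-01', NULL),
    (11, 'bob',   '2024-02-01', NULL),
    (11, 'carol', '2024-03-01', '2024-03-02'),  -- closed version, filtered out
    (20, 'dave',  '2024-01-15', NULL);
INSERT INTO PremiseProviderVersionsToday VALUES (1, 'CUST-1'), (2, 'CUST-2');
""")

# Original shape: scalar correlated subquery plus implicit joins.
CORRELATED = """
SELECT CustomerId, D3001_MeterId, CoreSPID, EnteredBy
FROM (SELECT (SELECT CustomerId FROM PremiseProviderVersionsToday
              WHERE PremiseProviderId = b.PremiseProviderId) AS CustomerId,
             c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
             ROW_NUMBER() OVER (PARTITION BY b.PremiseProviderId
                                ORDER BY a.EffectiveDate DESC) AS rowNumber
      FROM PremiseMeterProviderVersions a, PremiseProviders b, PremiseMeterProviders c
      WHERE a.TransactionDateTimeEnd IS NULL
        AND a.PremiseMeterProviderId = c.PremiseMeterProviderId
        AND b.PremiseProviderId = c.PremiseProviderId) AS data
WHERE rowNumber = 1
ORDER BY CustomerId
"""

# Rewritten shape: explicit joins, with the view joined instead of subqueried.
JOINED = """
WITH ranked AS (
    SELECT t.CustomerId, c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
           ROW_NUMBER() OVER (PARTITION BY b.PremiseProviderId
                              ORDER BY a.EffectiveDate DESC) AS rowNumber
    FROM PremiseMeterProviderVersions AS a
    JOIN PremiseMeterProviders AS c ON c.PremiseMeterProviderId = a.PremiseMeterProviderId
    JOIN PremiseProviders AS b ON b.PremiseProviderId = c.PremiseProviderId
    LEFT JOIN PremiseProviderVersionsToday AS t ON t.PremiseProviderId = b.PremiseProviderId
    WHERE a.TransactionDateTimeEnd IS NULL
)
SELECT CustomerId, D3001_MeterId, CoreSPID, EnteredBy
FROM ranked
WHERE rowNumber = 1
ORDER BY CustomerId
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
joined_rows = conn.execute(JOINED).fetchall()
print(correlated_rows)
print(joined_rows)
```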
You are running a correlated subquery, which can end up executing once per outer row, like a loop. If the table is small it will be fast, but I would suggest changing it and joining the table to get CustomerId rather than using this subquery:
(select CustomerId from PremiseProviderVersionsToday where PremiseProviderId = b.PremiseProviderId) as CustomerId
Consider derived tables: an aggregate query that calculates the maximum EffectiveDate per PremiseProviderId, and a unit-level query, each using explicit joins (the current ANSI SQL standard) rather than the implicit joins you currently use:
SELECT data.*
FROM
(SELECT t.CustomerId, c.D3001_MeterId, b.CoreSPID, a.EnteredBy,
b.PremiseProviderId, a.EffectiveDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
INNER JOIN PremiseProviderVersionsToday t
ON t.PremiseProviderId = b.PremiseProviderId
) data
INNER JOIN
(SELECT b.PremiseProviderId, MAX(a.EffectiveDate) As MaxEffDate
FROM PremiseMeterProviders c
INNER JOIN PremiseMeterProviderVersions a
ON a.PremiseMeterProviderId = c.PremiseMeterProviderId
AND a.TransactionDateTimeEnd IS NULL
INNER JOIN PremiseProviders b
ON b.PremiseProviderId = c.PremiseProviderId
GROUP BY b.PremiseProviderId
) agg
ON data.PremiseProviderId = agg.PremiseProviderId
AND data.EffectiveDate = agg.MaxEffDate
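The MAX()-and-join approach and the ROW_NUMBER() approach differ when two current versions share the same EffectiveDate: the join returns every tied row, while filtering on ROW_NUMBER() = 1 returns exactly one, chosen by whatever tie-breaker you add to the ORDER BY. A minimal sketch with Python's sqlite3 module and a hypothetical single versions table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE versions (providerId INTEGER, enteredBy TEXT, effectiveDate TEXT)")
conn.executemany("INSERT INTO versions VALUES (?, ?, ?)", [
    (1, "alice", "2024-01-01"),
    (1, "bob", "2024-02-01"),
    (1, "carol", "2024-02-01"),  # ties with bob on the latest date
    (2, "dave", "2024-01-15"),
])

# Aggregate + join: every row equal to the per-provider maximum survives.
MAX_JOIN = """
SELECT v.providerId, v.enteredBy
FROM versions AS v
JOIN (SELECT providerId, MAX(effectiveDate) AS maxEffDate
      FROM versions GROUP BY providerId) AS agg
  ON v.providerId = agg.providerId AND v.effectiveDate = agg.maxEffDate
ORDER BY v.providerId, v.enteredBy
"""

# ROW_NUMBER: exactly one row per provider; enteredBy is an explicit tie-breaker.
ROW_NUM = """
SELECT providerId, enteredBy
FROM (SELECT providerId, enteredBy,
             ROW_NUMBER() OVER (PARTITION BY providerId
                                ORDER BY effectiveDate DESC, enteredBy) AS rn
      FROM versions) AS ranked
WHERE rn = 1
ORDER BY providerId
"""

max_join_rows = conn.execute(MAX_JOIN).fetchall()
row_number_rows = conn.execute(ROW_NUM).fetchall()
print(max_join_rows)
print(row_number_rows)
```

Which behaviour is right depends on whether "a single unique row" must hold even when dates tie; if it must, the ROW_NUMBER() form with a deterministic tie-breaker is the safer choice.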

SQL - select only newest record with WHERE clause

I have been trying to get some data off our database but got stuck when I needed to only get the newest file upload for each file type. I have done this before using the WHERE clause but this time there is an extra table involved that is needed to determine the file type.
My query looks like this so far, and I am getting six records for this user (2x filetypeNo4 and 4x filetypeNo2).
SELECT db_file.fileID
,db_profile.NAME
,db_applicationFileType.fileTypeID
,db_file.dateCreated
FROM db_file
LEFT JOIN db_applicationFiles
ON db_file.fileID = db_applicationFiles.fileID
LEFT JOIN db_profile
ON db_applicationFiles.profileID = db_profile.profileID
LEFT JOIN db_applicationFileType
ON db_applicationFiles.fileTypeID = db_applicationFileType.fileTypeID
WHERE db_profile.profileID IN ('19456')
AND db_applicationFileType.fileTypeID IN ('2','4')
I have the WHERE clause looking like this, which is not working:
(db_file.dateCreated IS NULL
OR db_file.dateCreated = (
SELECT MAX(db_file.dateCreated)
FROM db_file left join
db_applicationFiles on db_file.fileID = db_applicationFiles.fileID
WHERE db_applicationFileType.fileTypeID = db_applicationFiles.FiletypeID
))
Sorry, I am a noob, so this may be really simple, but I am learning this stuff on my own as I go.
SELECT
ff.fileID,
pf.NAME,
ff.fileTypeID,
ff.dateCreated
FROM db_profile pf
OUTER APPLY
(
SELECT TOP 1 af.fileTypeID, df.dateCreated, df.fileID
FROM db_file df
INNER JOIN db_applicationFiles af
ON df.fileID = af.fileID
WHERE af.profileID = pf.profileID
AND af.fileTypeID IN ('2','4')
ORDER BY df.dateCreated DESC
) ff
WHERE pf.profileID IN ('19456')
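It also looks like all of your joins are effectively INNER, because the WHERE clause filters on columns from the outer-joined tables, which discards the NULL rows a LEFT JOIN would add. The exception is a profile that may have no files, which is why I used OUTER APPLY instead of CROSS APPLY.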
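Note that TOP 1 inside the OUTER APPLY above returns a single newest file per profile across both file types. If you want the newest file for each file type, one option is ROW_NUMBER() partitioned by file type. This is a sketch with Python's sqlite3 module and hypothetical, cut-down versions of the question's tables (integer IDs, ISO date strings, and db_applicationFileType left out because fileTypeID is already on db_applicationFiles):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db_profile (profileID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE db_file (fileID INTEGER PRIMARY KEY, dateCreated TEXT);
CREATE TABLE db_applicationFiles (fileID INTEGER, profileID INTEGER, fileTypeID INTEGER);

INSERT INTO db_profile VALUES (19456, 'Jo'), (2, 'Sam');
INSERT INTO db_file VALUES
    (1, '2024-01-01'), (2, '2024-02-01'), (3, '2024-03-01'), (4, '2024-01-15'),
    (5, '2024-01-10'), (6, '2024-03-05'), (7, '2024-04-01');
INSERT INTO db_applicationFiles VALUES
    (1, 19456, 2), (2, 19456, 2), (3, 19456, 2), (4, 19456, 2),
    (5, 19456, 4), (6, 19456, 4),
    (7, 2, 2);  -- another profile's newer file must not win
""")

# Number each profile's files per file type, newest first, then keep number 1.
NEWEST_PER_TYPE = """
SELECT fileID, NAME, fileTypeID, dateCreated
FROM (SELECT df.fileID, pf.NAME, af.fileTypeID, df.dateCreated,
             ROW_NUMBER() OVER (PARTITION BY af.profileID, af.fileTypeID
                                ORDER BY df.dateCreated DESC) AS rn
      FROM db_profile AS pf
      JOIN db_applicationFiles AS af ON af.profileID = pf.profileID
      JOIN db_file AS df ON df.fileID = af.fileID
      WHERE pf.profileID = :profile AND af.fileTypeID IN (2, 4)) AS ranked
WHERE rn = 1
ORDER BY fileTypeID
"""

newest = conn.execute(NEWEST_PER_TYPE, {"profile": 19456}).fetchall()
print(newest)
```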
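What about the obvious approach? (It relies on MySQL's non-standard GROUP BY handling: SQL Server rejects both SELECT * with GROUP BY and ORDER BY in a derived table without TOP, and even in MySQL which row is returned for each group is not guaranteed.)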
SELECT * FROM
(SELECT * FROM db_file ORDER BY dateCreated DESC) AS files1
GROUP BY fileTypeID ;

Help improving SQL join

I have a stored procedure that runs to update gaming points for user balances. It's an insert with 5 subqueries. I have isolated one of the subqueries as the one that slows the entire batch down. Without it, the stored procedure runs in under 2 seconds. With it, it takes as much as 8 seconds. 8 seconds isn't the end of the world, but for the sake of scalability, I need it to complete faster. Here is the isolated subquery:
(SELECT IsNull(Sum(A.TransAmount) + Sum(Case When A.BetResult = 1 Then (A.BetWinAmount + (A.TransAmount * -1)) End), 0)
FROM User_T A
LEFT OUTER JOIN User_TD B on A.TID = B.TID
LEFT OUTER JOIN Lines_BL C ON B.LID = C.LID
LEFT OUTER JOIN Lines_BM D ON C.BMID = D.BMID
LEFT OUTER JOIN Event_M E ON D.EID = E.EID
LEFT OUTER JOIN Event_KB F ON A.TransReason = F.BID
LEFT OUTER JOIN Event_M G ON F.BID = G.EID
where A.UserID = U.UserID AND (A.IsSettled = 1)
AND
(
(A.TransReason = 1 AND (datediff(dd, Convert(datetime, E.EDate, 101), Convert(datetime, @EndDate, 101)) = @DaysAgo)) OR
(A.TransReason >= 3000 AND (datediff(dd, Convert(datetime, G.EDate, 101), Convert(datetime, @EndDate, 101)) = @DaysAgo)
AND [dbo].[Event_CEAFKBID](A.TransReason) = 1) OR
(A.TransReason BETWEEN 3 and 150 AND (datediff(dd, Convert(datetime, A.TransDT, 101), Convert(datetime, @EndDate, 101)) = @DaysAgo))
)
What I have done to isolate it further: when I run a SELECT * on just the joins (without the WHERE clauses), performance is very good, more than 100,000 rows in under a second. As I add the WHERE clauses, I believe the big slowdown comes from the OR clause and/or the function that needs to be evaluated.
As I understand it, a function inside the WHERE clause is evaluated for each row, as opposed to somehow caching the definition of the function. I do have indexes on the tables, but I am wondering if some of them are not correct.
Without knowing the full database structure, I am sure it's very difficult to pin down where the problem is, but I would like to be pointed in a direction so I can begin to isolate it further.
I suspect your biggest performance hits are from the correlated subquery (whatever table is behind U.UserId) and from the embedded function call dbo.Event_CEAFKBID. Much of course depends upon how big the tables are (how many rows are being read). All those datetime conversions won’t help and generate a very strong “bad design” smell, but I don’t think they’d impact performance too much.
Those left outer joins are ugly, as the optimizer has to check them all for every row, so if “A” is big, all the joins on all the rows have to be performed, even if there’s no data there. If they can be replaced with inner joins, do so, but I’m guessing not, because of that “table E or table G” logic. Let’s see, it sure looks like what you’ve got is three separate queries mashed into one; if you broke it out into three, unioned together, it’d look something like the Frankenstein query below. I’ve no idea if this would run faster or not (heck, I can’t even debug the query and make sure the parentheses balance), but if you’ve got sparse data relative to your logic this should run pretty fast. (I took out the date conversions to make the code more legible; you’d have to plug them back in.)
SELECT isnull(sum(Total), 0) FinalTotal from (
SELECT
sum(A.TransAmount + Case When A.BetResult = 1 Then A.BetWinAmount - A.TransAmount else 0 End) Total
FROM User_T A
INNER JOIN User_TD B on A.TID = B.TID
INNER JOIN Lines_BL C ON B.LID = C.LID
INNER JOIN Lines_BM D ON C.BMID = D.BMID
INNER JOIN Event_M E ON D.EID = E.EID
where A.UserID = U.UserID
AND A.IsSettled = 1
AND A.TransReason = 1
AND (datediff(dd, E.EDate, @EndDate) = @DaysAgo)
UNION ALL SELECT
sum(A.TransAmount + Case When A.BetResult = 1 Then A.BetWinAmount - A.TransAmount else 0 End) Total
FROM User_T A
INNER JOIN Event_KB F ON A.TransReason = F.BID
INNER JOIN Event_M G ON F.BID = G.EID
where A.UserID = U.UserID
AND A.IsSettled = 1
AND A.TransReason >= 3000
AND (datediff(dd, G.EDate, @EndDate) = @DaysAgo)
AND [dbo].[Event_CEAFKBID](A.TransReason) = 1
UNION ALL SELECT
sum(A.TransAmount + Case When A.BetResult = 1 Then A.BetWinAmount - A.TransAmount else 0 End) Total
FROM User_T A
where A.UserID = U.UserID
AND A.IsSettled = 1
AND A.TransReason BETWEEN 3 and 150
AND datediff(dd, A.TransDT, @EndDate) = @DaysAgo
) ThreeWayUnion
You can put the CASE in the WHERE clause rather than directly in the first line of the SELECT.
Why do you need so many joins if this statement only uses tables A, E and G?
To tune queries, you can use the execution plan in SQL Server Management Studio.
Correlated subqueries are a very poor programming technique which equates to using a cursor in the query. Make it a derived table instead.
And yes those functions are slowing you down. If you have to convert to datetime, your database structure needs to be fixed and the data stored correctly as datetime.
Do you need to do the conversions on the datetime for the DATEDIFF functions? Are you storing the dates as text, or are you reconverting to get rid of the time? If it's the latter, you don't need to: DATEDIFF(dd, ...) counts the day boundaries crossed between the two values, so the day difference is correct even with the time portion included.
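Building on that, the DATEDIFF(dd, x, @EndDate) = @DaysAgo predicates can be made sargable. Because DATEDIFF(dd, ...) only compares calendar dates, it equals @DaysAgo exactly when x falls on the day @DaysAgo days before @EndDate, i.e. in a half-open range [lo, hi) that can be computed once before the query. A minimal Python sketch of the bound calculation, assuming the columns hold real datetime values rather than text; datediff_days, day_range, and the variables @lo/@hi mentioned below are illustrative names, not existing objects:

```python
from datetime import datetime, time, timedelta
from typing import Tuple


def datediff_days(start: datetime, end: datetime) -> int:
    """Mimic SQL Server's DATEDIFF(dd, start, end): count midnights crossed.

    Only the calendar dates matter, so the time portion never changes the result.
    """
    return (end.date() - start.date()).days


def day_range(end: datetime, days_ago: int) -> Tuple[datetime, datetime]:
    """Return (lo, hi) such that lo <= x < hi exactly when DATEDIFF(dd, x, end) = days_ago."""
    lo = datetime.combine(end.date() - timedelta(days=days_ago), time.min)
    return lo, lo + timedelta(days=1)


end_date = datetime(2024, 3, 10, 14, 30)
lo, hi = day_range(end_date, days_ago=2)
print(lo, hi)  # 2024-03-08 00:00:00 2024-03-09 00:00:00

# Brute-force check of the equivalence over every 30-minute step in a ten-day window.
step = datetime(2024, 3, 3)
while step < datetime(2024, 3, 13):
    assert (datediff_days(step, end_date) == 2) == (lo <= step < hi)
    step += timedelta(minutes=30)
```

In the query, a predicate such as datediff(dd, E.EDate, @EndDate) = @DaysAgo would then become E.EDate >= @lo AND E.EDate < @hi, which leaves the column bare so an index on EDate can be used.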
You should review whether the outer joins are necessary - they are more expensive than inner joins. You have some values that come from the dominant table, tagged A. You also have an OR condition that references E, and an OR condition that references G. I'd look to restructure the query along the lines of:
SELECT SUM(x.result)
FROM (SELECT A.TransAmount + CASE WHEN A.BetResult = 1
THEN (A.BetWinAmount + (A.TransAmount * -1))
ELSE 0 END AS result
FROM User_T A
WHERE A.TransReason BETWEEN 3 AND 150
AND datediff(dd, Convert(datetime, A.TransDT, 101),
Convert(datetime, @EndDate, 101)) = @DaysAgo
AND A.UserID = U.UserID -- Where does alias U come from?
AND A.IsSettled = 1
UNION ALL
SELECT A.TransAmount + CASE WHEN A.BetResult = 1
THEN (A.BetWinAmount + (A.TransAmount * -1))
ELSE 0 END AS result
FROM User_T A
JOIN User_TD B ON A.TID = B.TID
JOIN Lines_BL C ON B.LID = C.LID
JOIN Lines_BM D ON C.BMID = D.BMID
JOIN Event_M E ON D.EID = E.EID
WHERE A.TransReason = 1
AND datediff(dd, Convert(datetime, E.EDate, 101),
Convert(datetime, @EndDate, 101)) = @DaysAgo
AND A.UserID = U.UserID -- Where does alias U come from?
AND A.IsSettled = 1
UNION ALL
SELECT A.TransAmount + CASE WHEN A.BetResult = 1
THEN (A.BetWinAmount + (A.TransAmount * -1))
ELSE 0 END AS result
FROM User_T A
JOIN Event_KB F ON A.TransReason = F.BID
JOIN Event_M G ON F.BID = G.EID
WHERE A.TransReason >= 3000
AND datediff(dd, Convert(datetime, G.EDate, 101),
Convert(datetime, @EndDate, 101)) = @DaysAgo
AND [dbo].[Event_CEAFKBID](A.TransReason) = 1
AND A.UserID = U.UserID -- Where does alias U come from?
AND A.IsSettled = 1
) AS x
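The thinking here is that the inner join queries will each be quicker than the outer join queries, and summing intermediate results is not a hardship to the DBMS (it was doing that anyway). The ELSE 0 keeps NULLs out of each row's result, but SUM over an empty set still returns NULL, so you still need to wrap the outer SUM in ISNULL(..., 0) if you want 0 when nothing matches.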
The alias U is, presumably, a reference to the outer query of which this is a part.
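Splitting the OR into branches only gives the same total when the branches are combined with UNION ALL and don't overlap (here the TransReason ranges are disjoint). Plain UNION removes duplicate rows, so two transactions with the same amount collapse into one. A sketch with Python's sqlite3 module and a hypothetical trans table (COALESCE plays the role of SQL Server's ISNULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trans (id INTEGER PRIMARY KEY, transReason INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO trans VALUES (?, ?, ?)", [
    (1, 1, 10),
    (2, 1, 10),     # same amount as id 1: a genuine second transaction
    (3, 5, 25),
    (4, 3000, 40),
    (5, 200, 99),   # matches none of the three branches
])

# One query with the OR conditions, like the original subquery.
OR_TOTAL = """SELECT COALESCE(SUM(amount), 0) FROM trans
WHERE transReason = 1 OR transReason BETWEEN 3 AND 150 OR transReason >= 3000"""

# The same total split into three branches; {op} is UNION ALL or UNION.
SPLIT_TOTAL = """SELECT COALESCE(SUM(amount), 0) FROM (
    SELECT amount FROM trans WHERE transReason = 1
    {op}
    SELECT amount FROM trans WHERE transReason BETWEEN 3 AND 150
    {op}
    SELECT amount FROM trans WHERE transReason >= 3000
) AS x"""


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


or_total = scalar(OR_TOTAL)
union_all_total = scalar(SPLIT_TOTAL.format(op="UNION ALL"))
union_total = scalar(SPLIT_TOTAL.format(op="UNION"))  # loses the duplicate 10
# SUM over no rows is NULL (None in Python), hence the COALESCE/ISNULL wrapper.
empty_sum = scalar("SELECT SUM(amount) FROM trans WHERE 1 = 0")
print(or_total, union_all_total, union_total, empty_sum)
```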