How to optimize query to reduce execution time - sql

The ORDER BY clause and the BETWEEN comparison on the datetime column make my query slow, even though I have indexed the datetime column:
SELECT TOP(1)
@PeriodStart = DATEADD(SECOND, 1, dbo.tbl_WPT_AttendanceLog.ATDateTime)
FROM
dbo.tbl_WPT_EmployeeMachineLink
INNER JOIN
dbo.tbl_WPT_Machine ON dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_Machine.ID
RIGHT OUTER JOIN
dbo.tbl_WPT_AttendanceLog ON dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Machine_ID
AND dbo.tbl_WPT_EmployeeMachineLink.MachineEnrollmentNo = dbo.tbl_WPT_AttendanceLog.ATEnrollmentNo
WHERE
(dbo.tbl_WPT_EmployeeMachineLink.FK_tbl_WPT_Employee_ID = @EmpID)
AND (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode in (1,2,5)
OR (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND (dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Employee_ID = @EmpID)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode in (1,2,5)
ORDER BY
dbo.tbl_WPT_AttendanceLog.ATDateTime DESC

It looks like you're trying to get an employee's info from multiple sources (EmployeeMachineLink and AttendanceLog). Is that correct? If so, I think you just need to clean up the WHERE clause logic:
SELECT TOP(1)
@PeriodStart = DATEADD(SECOND, 1, dbo.tbl_WPT_AttendanceLog.ATDateTime)
FROM dbo.tbl_WPT_EmployeeMachineLink eml
INNER JOIN dbo.tbl_WPT_Machine ON eml.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_Machine.ID
RIGHT OUTER JOIN dbo.tbl_WPT_AttendanceLog ON eml.FK_tbl_WPT_Machine_ID = dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Machine_ID
AND eml.MachineEnrollmentNo = dbo.tbl_WPT_AttendanceLog.ATEnrollmentNo
WHERE (
eml.FK_tbl_WPT_Employee_ID = @EmpID OR
dbo.tbl_WPT_AttendanceLog.FK_tbl_WPT_Employee_ID = @EmpID
)
AND (dbo.tbl_WPT_AttendanceLog.ATDateTime BETWEEN @ShiftEndPreviousInstance AND @ShiftStart)
AND dbo.tbl_WPT_AttendanceLog.ATInOutMode IN (1,2,5)
ORDER BY dbo.tbl_WPT_AttendanceLog.ATDateTime DESC
Changes
- added table alias eml for readability
- removed duplicate reference to dbo.tbl_WPT_AttendanceLog.ATInOutMode IN (1,2,5)
- removed duplicate BETWEEN ... AND ... reference
- grouped OR conditions together
Be careful when mixing OR with AND without parentheses. AND binds more tightly than OR, so the conditions may not group the way you intend, which can lead to unexpected results and possibly poor performance.
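To see how SQL Server groups conditions that have no parentheses, here is a small self-contained sketch. The table variable @t and its columns a, b and c are made up for illustration and do not come from the question:

```sql
-- Hypothetical sample data, for illustration only.
DECLARE @t TABLE (id int, a bit, b bit, c bit);
INSERT INTO @t VALUES (1, 1, 0, 0), (2, 0, 1, 1), (3, 0, 1, 0);

-- AND is evaluated before OR, so this is read as: a = 1 OR (b = 1 AND c = 1).
-- Returns ids 1 and 2.
SELECT id FROM @t WHERE a = 1 OR b = 1 AND c = 1;

-- Explicit parentheses change the grouping: (a = 1 OR b = 1) AND c = 1.
-- Returns id 2 only.
SELECT id FROM @t WHERE (a = 1 OR b = 1) AND c = 1;
```

Writing the parentheses out, as in the rewritten WHERE clause above, keeps the intended grouping visible to the next reader.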
Let me know if that helps.

Related

Why do multiple EXISTS break a query

I am attempting to include a new table with values that need to be checked and included in a stored procedure. Statement 1 is the existing table that needs to be checked against, while statement 2 is the new table to check against.
I currently have 2 EXISTS conditions that function independently and produce the results I am expecting. By this I mean that if I comment out Statement 1, Statement 2 works, and vice versa. When I put them together the query doesn't complete: there is no error, but it times out, which is unexpected because each statement only takes a few seconds on its own.
I understand there is likely a better way to do this, but before I change it I would like to know why I cannot seem to use multiple EXISTS conditions like this. Are multiple EXISTS conditions not meant to be used in the WHERE clause?
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
OR
(
--Statement 2
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)
)
EDIT: I have included the query details. Table 1-5 represent different tables, there are no repeated tables.
Too long for a comment.
Your query as written looks correct. The timeout can only really be diagnosed from the execution plan, but here are a few things that could be happening, or that you could benefit from checking:
- Parameter sniffing on @Date. Try hard-coding this value and see whether you still get the same slowness.
- No covering index on P.OTHER_ID, P.DATE, P.ID or SA.ID, which would cause a table scan for these predicates.
- Indexes on the above columns that aren't optimal (including too many columns, etc.).
- Your query running serially when it may benefit from parallelism.
- Using the LOWER function on a database that doesn't have a case-sensitive collation (most don't, though this function doesn't slow things down that much).
- A bad query plan in cache. Try adding OPTION (RECOMPILE) at the bottom so you get a new query plan. This is also worth doing when comparing the speed of two queries, to ensure that neither (or only one) is using a cached plan, which would skew the results.
Since your query is timing out, try capturing the estimated execution plan and posting it for us at Paste The Plan.
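As a rough sketch of the OPTION (RECOMPILE) suggestion, the hint goes at the very end of the statement. The query below is a cut-down version of Statement 2 from the question, and the date value is a hypothetical placeholder:

```sql
-- Hypothetical value; substitute the date you are actually testing with.
DECLARE @Date date = '20200101';

SELECT S.*
FROM table1 S
WHERE EXISTS
(
    SELECT 1
    FROM table4 P
    INNER JOIN table5 SA ON SA.ID = P.ID
    WHERE P.DATE = @Date
      AND P.OTHER_ID = S.ID
)
OPTION (RECOMPILE);  -- compile a fresh plan for this execution instead of reusing a cached one
```

If the query is fast with the hint and slow without it, that points towards a cached plan or parameter value as the cause rather than the EXISTS conditions themselves.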
I found putting 2 EXISTS in the WHERE condition made the whole process take significantly longer. What I found fixed it was using UNION and keeping the EXISTS in separate queries. The final result looked like the following:
SELECT *
FROM table1 S
WHERE
--Statement 1
EXISTS
(
SELECT 1
FROM table2 P WITH (NOLOCK)
INNER JOIN table3 SA ON SA.ID = P.ID
WHERE P.DATE = @Date AND P.OTHER_ID = S.ID
AND
(
SA.FILTER = ''
OR
(
SA.FILTER = 'bar'
AND
LOWER(S.OTHER) = 'foo'
)
)
)
UNION
--Statement 2
SELECT *
FROM table1 S
WHERE
EXISTS
(
SELECT 1
FROM table4 P WITH (NOLOCK)
INNER JOIN table5 SA ON SA.ID = P.ID
WHERE P.DATE = @Date
AND P.OTHER_ID = S.ID
AND LOWER(S.OTHER) = 'foo'
)

SQL - select only newest record with WHERE clause

I have been trying to get some data off our database but got stuck when I needed to only get the newest file upload for each file type. I have done this before using the WHERE clause but this time there is an extra table involved that is needed to determine the file type.
My query looks like this so far, and I am getting six records for this user (2x filetypeNo4 and 4x filetypeNo2).
SELECT db_file.fileID
,db_profile.NAME
,db_applicationFileType.fileTypeID
,db_file.dateCreated
FROM db_file
LEFT JOIN db_applicationFiles
ON db_file.fileID = db_applicationFiles.fileID
LEFT JOIN db_profile
ON db_applicationFiles.profileID = db_profile.profileID
LEFT JOIN db_applicationFileType
ON db_applicationFiles.fileTypeID = db_applicationFileType.fileTypeID
WHERE db_profile.profileID IN ('19456')
AND db_applicationFileType.fileTypeID IN ('2','4')
I have the WHERE clause looking like this which is not working:
(db_file.dateCreated IS NULL
OR db_file.dateCreated = (
SELECT MAX(db_file.dateCreated)
FROM db_file left join
db_applicationFiles on db_file.fileID = db_applicationFiles.fileID
WHERE db_applicationFileType.fileTypeID = db_applicationFiles.FiletypeID
))
Sorry, I am a noob so this may be really simple, but I just learn this stuff on my own as I go.
SELECT
ff.fileID,
pf.NAME,
ff.fileTypeID,
ff.dateCreated
FROM db_profile pf
OUTER APPLY
(
SELECT TOP 1 af.fileTypeID, df.dateCreated, df.fileID
FROM db_file df
INNER JOIN db_applicationFiles af
ON df.fileID = af.fileID
WHERE af.profileID = pf.profileID
AND af.fileTypeID IN ('2','4')
ORDER BY df.dateCreated DESC
) ff
WHERE pf.profileID IN ('19456')
Also, it looks like all of your joins are effectively INNER joins, because the WHERE clause filters on columns from the joined tables. I used OUTER APPLY rather than CROSS APPLY in case there are profiles without files.
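Note that TOP 1 inside the APPLY yields at most one row per profile. If the goal is the newest upload for each file type, as the question describes, one possible variant is to number the rows per profile and file type with ROW_NUMBER() and keep only the first of each. This is a sketch that reuses the table and column names from the question; the aliases r and rn are made up here:

```sql
SELECT r.fileID, r.NAME, r.fileTypeID, r.dateCreated
FROM
(
    SELECT df.fileID, pf.NAME, af.fileTypeID, df.dateCreated,
           -- Number each profile's files within a file type, newest first.
           ROW_NUMBER() OVER (PARTITION BY af.profileID, af.fileTypeID
                              ORDER BY df.dateCreated DESC) AS rn
    FROM db_file df
    INNER JOIN db_applicationFiles af
        ON df.fileID = af.fileID
    INNER JOIN db_profile pf
        ON af.profileID = pf.profileID
    WHERE pf.profileID IN ('19456')
      AND af.fileTypeID IN ('2','4')
) r
WHERE r.rn = 1;  -- keep only the newest row in each (profile, file type) group
```

With the data described in the question, this should return two rows for the user: one for file type 2 and one for file type 4.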
What about an obvious:
SELECT * FROM
(SELECT * FROM db_file ORDER BY dateCreated DESC) AS files1
GROUP BY fileTypeID ;

Horrible sql server performance when capturing result in variable

I'm using SQL Server 2012.
When I run this query...
select
count(*)
from
MembershipStatusHistory msh
join
gym.Account a on msh.AccountID = a.AccountID
join
gym.MembershipType mt on a.MembershipTypeID = mt.MembershipTypeID
join
MemberTypeGroups mtg on mt.MemberTypeGroupID = mtg.MemberTypeGroupID
where
mtg.MemberTypeGroupID IN (1,2)
and msh.NewMembershipStatus = 'Cancelled'
and year(msh.ChangeDate) = year(getdate())
and month(msh.ChangeDate) = month(getdate())
and day(msh.ChangeDate) = day(getdate())
...it returns almost instantly. Great. Now, when I run the same exact query like this:
declare @CancellationsToday int
SET @CancellationsToday = (
select
count(*)
from MembershipStatusHistory msh
join gym.Account a
on msh.AccountID = a.AccountID
join gym.MembershipType mt
on a.MembershipTypeID = mt.MembershipTypeID
join MemberTypeGroups mtg
on mt.MemberTypeGroupID = mtg.MemberTypeGroupID
where mtg.MemberTypeGroupID IN (1,2)
and msh.NewMembershipStatus = 'Cancelled'
and year(msh.ChangeDate) = year(getdate())
and month(msh.ChangeDate) = month(getdate())
and day(msh.ChangeDate) = day(getdate())
)
...it takes 1.5 MINUTES to return. Consistently, every time.
What the **** is going on? I have to use a variable because I need to sum the result later on in my stored proc. I am storing the results of other queries in the same proc and they are fast. I am stumped.
Here is the execution plan from the SLOW query:
And here is the execution plan from the FAST query:
I'll be honest, I don't know what these execution plans mean or what I need to correct.
Very strange but try something like this....
declare @CancellationsToday int;
select @CancellationsToday = count(*)
from MembershipStatusHistory msh
join gym.Account a
on msh.AccountID = a.AccountID
join gym.MembershipType mt
on a.MembershipTypeID = mt.MembershipTypeID
join MemberTypeGroups mtg
on mt.MemberTypeGroupID = mtg.MemberTypeGroupID
where mtg.MemberTypeGroupID IN (1,2)
and msh.NewMembershipStatus = 'Cancelled'
and year(msh.ChangeDate) = year(getdate())
and month(msh.ChangeDate) = month(getdate())
and day(msh.ChangeDate) = day(getdate())
Mmmm strange, try this:
SELECT @CancellationsToday = COUNT(*) FROM ......
Another thing worth mentioning: don't apply functions to columns in the WHERE clause, because that can stop SQL Server from using an index on the column.
I think you have only the date in msh.ChangeDate, so make a variable with today's date like this:
DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()))
and use that in the WHERE clause.
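Put together, that suggestion might look like the sketch below. The variable name @Today is made up for illustration, and the half-open range (>= today, < tomorrow) is used so the filter also works if ChangeDate turns out to carry a time portion:

```sql
-- Midnight at the start of today, computed once instead of per row.
DECLARE @Today datetime = DATEADD(dd, 0, DATEDIFF(dd, 0, GETDATE()));
DECLARE @CancellationsToday int;

SELECT @CancellationsToday = COUNT(*)
FROM MembershipStatusHistory msh
INNER JOIN gym.Account a
    ON msh.AccountID = a.AccountID
INNER JOIN gym.MembershipType mt
    ON a.MembershipTypeID = mt.MembershipTypeID
INNER JOIN MemberTypeGroups mtg
    ON mt.MemberTypeGroupID = mtg.MemberTypeGroupID
WHERE mtg.MemberTypeGroupID IN (1,2)
  AND msh.NewMembershipStatus = 'Cancelled'
  -- The column is compared directly, with no function wrapped around it.
  AND msh.ChangeDate >= @Today
  AND msh.ChangeDate < DATEADD(dd, 1, @Today);
```

Because ChangeDate is no longer wrapped in year(), month() and day(), an index on that column (if one exists) becomes usable for a range seek.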
You need to look at the execution plans for both queries in SQL Server Management Studio to understand what's going on and why. There may be an index you can add that will fix things, or the plan itself may tell you what's going sideways and how to fix it. Without that info, it's hard to know what to say here.
As I commented above, adjusting your WHERE clause to get rid of the six function calls and just compare the "date" portion of the database column with a constant variable should help some.
Another minor suggestion would be to be explicit about INNER JOIN if that's what you want: always specify exactly the type of join you want (INNER JOIN, LEFT OUTER JOIN, CROSS JOIN, etc.) instead of just 'join'. It makes things clearer.

SQL Server query optimisation

I inherited this hellish query designed for pagination in SQL Server.
It's only getting 25 records, but according to SQL Profiler, it does 8091 reads, 208 writes and takes 74 milliseconds. Would prefer it to be a bit faster. There is an index on the ORDER BY column deployDate.
Anyone have any ideas on how to optimise it?
SELECT TOP 25
textObjectPK, textObjectID, title, articleCredit, mediaCredit,
commentingAllowed,deployDate,
container, mediaID, mediaAlign, fileName AS fileName, fileName_wide AS fileName_wide,
width AS width, height AS height,title AS mediaTitle, extension AS extension,
embedCode AS embedCode, jsArgs as jsArgs, description as description, commentThreadID,
totalRows = Count(*) OVER()
FROM
(SELECT
ROW_NUMBER() OVER (ORDER BY textObjects.deployDate DESC) AS RowNumber,
textObjects.textObjectPK, textObjects.textObjectID, textObjects.title,
textObjects.commentingAllowed, textObjects.credit AS articleCredit,
textObjects.deployDate,
containers.container, containers.mediaID, containers.mediaAlign,
media.fileName AS fileName, media.fileName_wide AS fileName_wide,
media.width AS width, media.height AS height, media.credit AS mediaCredit,
media.title AS mediaTitle, media.extension AS extension,
mediaTypes.embedCode AS embedCode, media.jsArgs as jsArgs,
media.description as description, commentThreadID,
TotalRows = COUNT(*) OVER ()
FROM textObjects WITH (NOLOCK)
INNER JOIN containers WITH (NOLOCK)
ON containers.textObjectPK = textObjects.textObjectPK
AND (containers.containerOrder = 0 or containers.containerOrder = 1)
INNER JOIN LUTextObjectTextObjectGroup tog WITH (NOLOCK)
ON textObjects.textObjectPK = tog.textObjectPK
AND tog.textObjectGroupID in (3)
LEFT OUTER JOIN media WITH (NOLOCK)
ON containers.mediaID = media.mediaID
LEFT OUTER JOIN mediaTypes WITH (NOLOCK)
ON media.mediaTypeID = mediaTypes.mediaTypeID
WHERE (((version = 1)
AND (textObjects.textObjectTypeID in (6))
AND (DATEDIFF(minute, deployDate, GETDATE()) >= 0)
AND (DATEDIFF(minute, expireDate, GETDATE()) <= 0))
OR ( (version = 1) AND (textObjects.textObjectTypeID in (6))
AND (DATEDIFF(minute, deployDate, GETDATE()) >= 0)
AND (expireDate IS NULL)))
AND deployEnglish = 1
) tmpInlineView
WHERE RowNumber >= 51
ORDER BY deployDate DESC
I am in a similar position, with the same sort of queries. Here are some tips:
- Look at the query plans to make sure you have the right indexes.
- I'm not sure if MSSQL optimizes around DATEDIFF(), but if it doesn't you can precompute threshold dates and turn it into a BETWEEN clause.
- If you don't need to order by all those columns in your ROW_NUMBER() clause, get rid of them. That may allow you to do the pagination on a much simpler query, then just grab the extra data you need for the 25 rows you are returning.
Also, rewrite the two LEFT OUTER JOINs like this:
LEFT OUTER JOIN
(
media WITH (NOLOCK)
LEFT OUTER JOIN mediaTypes WITH (NOLOCK)
ON media.mediaTypeID = mediaTypes.mediaTypeID
)
ON containers.mediaID = media.mediaID
which should make the query optimizer behave a little better.
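The tip about precomputing threshold dates could be sketched as follows. A column wrapped in DATEDIFF() generally cannot be matched to an index seek, so the idea is to compute the minute boundary once and compare the bare columns against it. The variable name @NowMinute is made up, and the sketch assumes that version, deployEnglish and expireDate are columns of textObjects, which the original query does not make explicit:

```sql
-- Current time truncated to the minute, computed once.
DECLARE @NowMinute datetime = DATEADD(minute, DATEDIFF(minute, 0, GETDATE()), 0);

SELECT textObjectPK, deployDate
FROM textObjects
WHERE version = 1
  AND textObjectTypeID IN (6)
  AND deployEnglish = 1
  -- Same rows as DATEDIFF(minute, deployDate, GETDATE()) >= 0
  AND deployDate < DATEADD(minute, 1, @NowMinute)
  -- Same rows as DATEDIFF(minute, expireDate, GETDATE()) <= 0, or no expiry at all
  AND (expireDate >= @NowMinute OR expireDate IS NULL);
```

This also folds the two OR branches of the original WHERE clause into one, since they differ only in the expireDate condition.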

SQL query help, conditional join

SQL newbie here :)
Here are my tables if anyone's interested.
AHH, cant post image yet
http://img832.imageshack.us/img832/442/72098588.jpg
What I'm trying to do is query the tblPatientStats table within a date interval (@StartDate, @EndDate)
and group the rows accordingly in a data grid in WinForms.
Each row in tblPatientStats has either a RefDoctor, a RefMode, both, or neither.
The query should return a table with the name of the patient from tblPatient, the RefMode from tblRefMode, the name of the RefDoctor (Title + FirstName + LastName) and the SessionDate from tblPatientStats.
==> yfrog dot com/0yhi2dj
Here is my attempt so far.
INSERT #Final(Name, Doctor, Mode, SessionDate)
SELECT DISTINCT (FirstNames + LastName) as Name,
(tblRefDoctor.RefDTitle + ' ' + tblRefDoctor.RefDFNames + ' ' + tblRefDoctor.RefDName) AS Doctor,
tblRefMode.RefMode AS Mode, SessionDate
FROM tblPatientStats, tblPatient
left outer join tblRefDoctor on (RefDoctor = tblRefDoctor.RefDoctor)
left outer join tblRefMode on (RefModeID = tblRefMode.RefModeID)
WHERE
tblPatientStats.RefDoctor IS NOT NULL or tblPatientStats.RefModeID IS NOT NULL
AND
tblPatient.PatientID = tblPatientStats.PatientID
AND tblPatientStats.SessionDate between @StartDate AND @EndDate
What am I doing wrong? The query times out every single time, the tables are small, less than 10K records each.
Any help would be much appreciated.
I suspect the issue is the Cartesian join on
tblPatientStats, tblPatient
Although there is a join condition in the WHERE clause, there is a problem with the precedence of the boolean operators. The order is NOT, then AND, then OR, so I think you need brackets around the ORed conditions.
The WHERE condition on the original query with brackets applied to show the effective operator precedence is
WHERE
tblPatientStats.RefDoctor IS NOT NULL or
(tblPatientStats.RefModeID IS NOT NULL
AND tblPatient.PatientID = tblPatientStats.PatientID
AND tblPatientStats.SessionDate between @StartDate AND @EndDate)
This is almost certainly not the desired semantics and will likely bring back too many rows.
I've moved the join condition between tblPatientStats and tblPatient up into the JOIN clauses and added brackets around the ORed conditions.
FROM tblPatientStats
inner join tblPatient on tblPatient.PatientID = tblPatientStats.PatientID
left outer join tblRefDoctor on tblPatientStats.RefDoctor = tblRefDoctor.RefDoctor
left outer join tblRefMode on tblPatientStats.RefModeID = tblRefMode.RefModeID
WHERE
(tblPatientStats.RefDoctor IS NOT NULL or tblPatientStats.RefModeID IS NOT NULL)
AND tblPatientStats.SessionDate between @StartDate AND @EndDate
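For reference, here is a sketch of the complete SELECT with that FROM and WHERE clause in place. The select list is taken from the question, the date values are hypothetical placeholders, and the join columns are qualified with tblPatientStats on the assumption (suggested by the WHERE clause) that RefDoctor and RefModeID live in that table:

```sql
-- Hypothetical date range, for illustration only.
DECLARE @StartDate datetime = '20100101',
        @EndDate   datetime = '20101231';

SELECT DISTINCT
       (FirstNames + LastName) AS Name,
       (tblRefDoctor.RefDTitle + ' ' + tblRefDoctor.RefDFNames + ' ' + tblRefDoctor.RefDName) AS Doctor,
       tblRefMode.RefMode AS Mode,
       tblPatientStats.SessionDate
FROM tblPatientStats
INNER JOIN tblPatient
    ON tblPatient.PatientID = tblPatientStats.PatientID
LEFT OUTER JOIN tblRefDoctor
    ON tblPatientStats.RefDoctor = tblRefDoctor.RefDoctor
LEFT OUTER JOIN tblRefMode
    ON tblPatientStats.RefModeID = tblRefMode.RefModeID
-- The brackets keep the OR from swallowing the date filter.
WHERE (tblPatientStats.RefDoctor IS NOT NULL OR tblPatientStats.RefModeID IS NOT NULL)
  AND tblPatientStats.SessionDate BETWEEN @StartDate AND @EndDate;
```

The INSERT from the question can be put back in front of the SELECT once the row count looks right.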