Hello, I have the SQL query below that takes on average 40 minutes to run; one of the tables it references has over 7 million records.
I have run it through the Database Tuning Advisor and applied all recommendations. I have also assessed it in the Activity Monitor in SQL Server, and no further indexes etc. have been recommended.
Any suggestions would be great, thanks in advance.
WITH CTE AS
(
SELECT r.Id AS ResultId,
r.JobId,
r.CandidateId,
r.Email,
CAST(0 AS BIT) AS EmailSent,
NULL AS EmailSentDate,
'PICKUP' AS EmailStatus,
GETDATE() AS CreateDate,
C.Id AS UserId,
C.Email AS UserEmail,
NULL AS Subject
FROM Result R
INNER JOIN Job J ON R.JobId = J.Id
INNER JOIN User C ON J.UserId = C.Id
WHERE
ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)
AND ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT)
AND ISNULL(R.Email,'') <> '' -- has an email address
AND ISNULL(R.EmailSent, CAST(0 AS BIT)) = CAST(0 AS BIT) -- email has not been sent
AND R.EmailSentDate IS NULL -- email has not been sent
AND ISNULL(R.EmailStatus,'') = '' -- email has not been sent
AND ISNULL(R.IsEmailSubscribe, 'True') <> 'False' -- not unsubscribed
-- not already been emailed for this job
AND NOT EXISTS (
SELECT SMTP.Email
FROM SMTP_Production SMTP
WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
)
-- not unsubscribed
AND NOT EXISTS (
SELECT u.Id FROM Unsubscribe u
WHERE ISNULL(u.EmailAddress, '') = ISNULL(R.Email, '')
)
AND NOT EXISTS (
SELECT SMTP.Id FROM SMTP_Production SMTP
WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
)
AND C.Id NOT IN (
-- list of ids
)
AND J.Id NOT IN (
-- list of ids
)
AND J.ClientId NOT IN
(
-- list of ids
)
)
INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, ConsultantId, ConsultantEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT
CTE.ResultId,
CTE.JobId,
CTE.CandidateId,
CTE.Email,
CTE.EmailSent,
CTE.EmailSentDate,
CTE.EmailStatus,
CTE.CreateDate,
CTE.UserId,
CTE.UserEmail,
NULL
FROM CTE
INNER JOIN
(
SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
FROM CTE
) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1
Please see my updated query below:
WITH CTE AS
(
SELECT R.Id AS ResultId,
r.JobId,
r.CandidateId,
R.Email,
CAST(0 AS BIT) AS EmailSent,
NULL AS EmailSentDate,
'PICKUP' AS EmailStatus,
GETDATE() AS CreateDate,
C.Id AS UserId,
C.Email AS UserEmail,
NULL AS Subject
FROM RESULTS R
INNER JOIN JOB J ON R.JobId = J.Id
INNER JOIN Consultant C ON J.UserId = C.Id
WHERE
J.DCApproved = 1
AND (J.Closed = 0 OR J.Closed IS NULL)
AND R.Email <> '' -- has an email address (a NULL email also fails this test)
AND (R.EmailSent = 0 OR R.EmailSent IS NULL)
AND R.EmailSentDate IS NULL -- email has not been sent
AND (R.EmailStatus = '' OR R.EmailStatus IS NULL)
AND (R.IsEmailSubscribe = 'True' OR R.IsEmailSubscribe IS NULL)
-- not already been emailed for this job
AND NOT EXISTS (
SELECT SMTP.Email
FROM SMTP_Production SMTP
WHERE SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId
)
-- not unsubscribed
AND NOT EXISTS (
SELECT u.Id FROM Unsubscribe u
WHERE (u.EmailAddress = R.Email OR (u.EmailAddress IS NULL AND R.Email IS NULL))
)
AND NOT EXISTS (
SELECT SMTP.Id FROM SMTP_Production SMTP
WHERE SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId
)
AND C.Id NOT IN (
-- LIST OF IDS
)
AND J.Id NOT IN (
-- LIST OF IDS
)
AND J.ClientId NOT IN
(
-- LIST OF IDS
)
)
INSERT INTO smtp_production (ResultId, JobId, CandidateId, Email, EmailSent, EmailSentDate, EmailStatus, CreateDate, UserId, UserEmail, Subject)
OUTPUT INSERTED.ResultId,GETDATE() INTO ResultstoUpdate
SELECT
CTE.ResultId,
CTE.JobId,
CTE.CandidateId,
CTE.Email,
CTE.EmailSent,
CTE.EmailSentDate,
CTE.EmailStatus,
CTE.CreateDate,
CTE.UserId,
CTE.UserEmail,
NULL
FROM CTE
INNER JOIN
(
SELECT *, row_number() over(partition by CTE.Email, CTE.CandidateId order by CTE.EmailSentDate desc) as rn
FROM CTE
) DCTE ON CTE.ResultId = DCTE.ResultId AND DCTE.rn = 1
GO
Using ISNULL in your WHERE and JOIN clauses is probably the main cause here. Applying functions to columns in your query makes it non-SARGable (meaning that it can't use any of the indexes on your table(s), so it has to scan the whole thing). Note: using functions against variables in the WHERE is normally fine, for example WHERE SomeColumn = DATEADD(DAY, @n, @SomeDate). Things like WHERE SomeColumn = ISNULL(@Variable,0) have the smell of a "catch-all query", so they can hurt performance, depending on your setup. That isn't the discussion at hand though.
Clauses like ISNULL(J.Closed, CAST(0 AS BIT)) = CAST(0 AS BIT) are therefore a big headache for the query optimiser, and your query is riddled with them. You'll need to replace these with clauses like:
WHERE (J.Closed = 0 OR J.Closed IS NULL)
Although it makes no difference, there's no need to CAST the 0 there either. SQL Server can see you're making a comparison to a bit and will therefore interpret the 0 as one as well.
You also have an EXISTS with the WHERE clause ISNULL(u.EmailAddress, '') = ISNULL(R.Email, ''). This will need to become:
WHERE (u.EmailAddress = R.Email
OR (u.EmailAddress IS NULL AND R.Email IS NULL))
You'll need to change all of your ISNULL usage in your WHERE clauses (the CTE and the subqueries) and you should see a decent performance increase.
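Because the outer query keeps only rows where R.Email is non-empty (and therefore not NULL), the NULL branch of the unsubscribe check can never match, so that predicate can likely be reduced to a plain equality. A sketch, assuming the R.Email filter stays in the same WHERE clause:

```sql
-- Fragment of the CTE's WHERE clause; R is the Result alias from the question.
AND R.Email <> ''                      -- also rejects NULL, since NULL <> '' is not true
AND NOT EXISTS (
    SELECT 1
    FROM Unsubscribe u
    WHERE u.EmailAddress = R.Email     -- plain equality; can use an index on EmailAddress if one exists
)
```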
Generally, 7 million records are a joke for modern databases. If you talk about problems, you are supposed to be talking about billions of rows, not 7 million.
Which indicates a problem with the query. High CPU is generally a sign of mismatched field types (comparing a string in one table to a number in another) or functions called too often. A long running time is normally a sign of either missing indices or non-sargeability, which you really do a lot to force.
Non-sargeability means that indices CAN NOT be used. An example of this:
ISNULL(J.Approved, CAST(0 AS BIT)) = CAST(1 AS BIT)
The ISNULL(field, value) means that an index on field is not usable - basically "goodbye index, hello table scan". Since a NULL that is replaced by 0 can never equal 1, the plain comparison
J.Approved = 1
has the same meaning, but it is sargeable. Pretty much EVERY one of your conditions is written in a non-sargeable way - welcome to db hell. Start rewriting.
You may want to read up more on sargeability at https://www.techopedia.com/definition/28838/sargeable
Also make sure you have indices on all relevant foreign keys (and the referenced primary keys) - otherwise, again, welcome table scans.
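As a rough sketch of what that could look like for the tables in the question: the index names below are made up, and the column choices are assumptions read off the join and NOT EXISTS predicates, so check them against the actual execution plan and against the indexes that already exist.

```sql
-- Hypothetical index names; columns taken from the predicates in the question.
CREATE INDEX IX_Result_JobId ON Result (JobId);
CREATE INDEX IX_Job_UserId ON Job (UserId);
-- Supports NOT EXISTS (... SMTP.JobId = R.JobId AND SMTP.CandidateId = R.CandidateId)
CREATE INDEX IX_SMTP_Production_JobId_CandidateId
    ON SMTP_Production (JobId, CandidateId);
-- Supports NOT EXISTS (... SMTP.EmailStatus = 'PICKUP' AND SMTP.CandidateId = R.CandidateId)
CREATE INDEX IX_SMTP_Production_CandidateId_EmailStatus
    ON SMTP_Production (CandidateId, EmailStatus);
-- Supports the unsubscribe lookup by e-mail address
CREATE INDEX IX_Unsubscribe_EmailAddress ON Unsubscribe (EmailAddress);
```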
I would like to write a stored procedure that returns all rows from the Tournaments table plus a boolean column. If the given user is registered for a tournament, that column should be true.
Call:
exec TournamentsWithLoggedUser @user = 'asd123'
Procedure:
CREATE PROCEDURE [dbo].[TournamentsWithLoggedUser]
@user nvarchar(128)
AS
SELECT
t.Id, t.Info, BIT(r.Id)
FROM
Tournaments AS t
LEFT JOIN
Registrations AS r ON t.Id = r.TournamentId
WHERE
r.UserId IS NULL OR r.UserId = @user
RETURN
It should return something like:
1, 'some info', true //1
2, 'some info2', false //2
Why not just use a CASE expression?
CASE WHEN r.Id IS NULL THEN 0 ELSE 1 END
Change the 0 and 1 to whatever you want for false and true.
SELECT t.Id, t.Info,
-- this works in SQL Server
CAST ((CASE WHEN r.UserId IS NOT NULL THEN 1 ELSE 0 END) AS BIT) AS IsRegistered
FROM Tournaments as t
LEFT JOIN Registrations as r ON t.Id = r.TournamentId
where (r.UserId = '' OR r.UserId = @user)
-- I think this will help you.
You are looking for this query
SELECT t.id,
t.info,
Cast (CASE
WHEN r.userid IS NOT NULL THEN 1
ELSE 0
END AS BIT) AS IsRegistered
FROM tournaments AS t
LEFT JOIN registrations AS r
ON t.id = r.tournamentid
AND r.userid = @user
You should clarify which SQL dialect you are actually using, but an answer can be provided anyway:
CREATE PROCEDURE [dbo].[TournamentsWithLoggedUser]
@user nvarchar(128)
AS
BEGIN
SELECT t.Id, t.Info,
-- this works in SQL Server
CAST ((CASE WHEN r.UserId IS NOT NULL THEN 1 ELSE 0 END) AS BIT) AS IsRegistered
FROM Tournaments as t
LEFT JOIN Registrations as r ON t.Id = r.TournamentId
where r.UserId IS NULL OR r.UserId = @user
-- this should not be required
RETURN
END
However, there is a problem with the logic:
@user is a required parameter, so your procedure gives the impression that it looks up data for a single user. However, your OR operator selects the tournaments that have no registrations at all together with those the provided user is registered for (if any); a tournament that only other users have registered for is dropped from the result.
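One way to sidestep that, sketched here for SQL Server, is to drop the join and compute the flag with EXISTS, so every tournament comes back exactly once regardless of how many other users are registered. The column alias IsRegistered is only illustrative.

```sql
-- Body of the procedure; @user is its parameter.
SELECT t.Id,
       t.Info,
       CAST(CASE WHEN EXISTS (SELECT 1
                              FROM Registrations AS r
                              WHERE r.TournamentId = t.Id
                                AND r.UserId = @user)
                 THEN 1 ELSE 0 END AS BIT) AS IsRegistered
FROM Tournaments AS t;
```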
Are both queries the same?
Do both return the same result?
1)
IF EXISTS(
SELECT
1
FROM
Users u
WHERE
u.UPIN = @AttendingDoctorID)
BEGIN
SELECT
u.UserId, 1
FROM
Users u WITH(nolock)
WHERE
u.UPIN = @AttendingDoctorID
END ELSE BEGIN
SELECT
u.UserId,
1
FROM
Users u (nolock)
WHERE
u.FirstName = @AttendingDoctorFirstName AND
u.LastName = @AttendingDoctorLastName
END
2)
SELECT
u.UserId, 1
FROM
Users u (nolock)
WHERE
(u.UPIN = @AttendingDoctorID)
OR
(u.FirstName = @AttendingDoctorFirstName AND
u.LastName = @AttendingDoctorLastName)
They are not the same.
The 2nd returns data for both conditions.
The 1st one tests first and applies only one condition
They're not semantically the same. The second query may return records that satisfy either predicate, (u.UPIN = @AttendingDoctorID) or (u.FirstName = @AttendingDoctorFirstName AND u.LastName = @AttendingDoctorLastName), so it can return matches for both at once, whereas the first only ever applies one of them.
Whether or not this will ever occur depends on your data.
Assuming you're running under the default transaction isolation level, you also need to be aware that:
IF EXISTS(
SELECT
1
FROM
Users u
WHERE
u.UPIN = @AttendingDoctorID) --<-- Query 1
BEGIN
SELECT
u.UserId, 1
FROM
Users u WITH(nolock)
WHERE
u.UPIN = @AttendingDoctorID --<-- Query 2
END ELSE BEGIN
SELECT
u.UserId,
1
FROM
Users u (nolock)
WHERE
u.FirstName = @AttendingDoctorFirstName AND
u.LastName = @AttendingDoctorLastName
END
Another transaction might update Users between query 1 executing and query 2 executing, so you might get an empty result set from query 2. Your second version runs everything as a single query, so it will not have this issue (but others have pointed out other differences between the queries).
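If the intent is the first version's behaviour (match on UPIN, and fall back to the name only when no UPIN match exists), that can be sketched as a single statement. This is only an illustration using the question's parameter names; whether it fully closes the window described above still depends on the isolation level and locking hints in use.

```sql
SELECT u.UserId, 1
FROM Users u
WHERE u.UPIN = @AttendingDoctorID
   OR (    u.FirstName = @AttendingDoctorFirstName
       AND u.LastName  = @AttendingDoctorLastName
       -- fall back to the name only when nobody has this UPIN
       AND NOT EXISTS (SELECT 1
                       FROM Users x
                       WHERE x.UPIN = @AttendingDoctorID));
```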
SELECT *
FROM myTable m
WHERE m.userId = :userId
AND m.X = (SELECT MAX(X)
FROM myTable m
WHERE m.userId = :userId
AND m.contactNumber = :contactNumber)
The problem is that the second part of the statement evaluates to NULL when no such row is present, and the statement fails to execute. I want the result to be empty in such a case.
One way to solve this problem is to do an expensive filesort (ORDER BY) and then fetch the required field at the code level. Is there any better solution to this problem?
Can you use ISNULL?
and m.X = ISNULL(, '')
I'm not sure why you're getting NULLs here, but try this:
SELECT myTable.*, COALESCE(myTableMax.myMaxX, '') AS myMaxX
FROM myTable
LEFT OUTER JOIN
(SELECT userID, contactNumber, MAX(X) AS myMaxX
FROM myTable
GROUP BY userID, contactNumber) AS myTableMax
ON myTable.userID = myTableMax.userID
AND myTable.contactNumber = myTableMax.contactNumber
WHERE myTable.userID = :userID
AND myTable.contactNumber = :contactNumber
If you're concerned about performance, add an index on mytable (userID, contactNumber).
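Another option, sketched here with the question's table and parameter names, is to compute the maximum once in a derived table and join to it. When no row matches, MAX returns NULL, and since NULL never compares equal to anything the join simply produces an empty result. The alias names mx and maxX are made up for the example.

```sql
SELECT m.*
FROM myTable m
JOIN (SELECT MAX(X) AS maxX
      FROM myTable
      WHERE userId = :userId
        AND contactNumber = :contactNumber) mx
  ON m.X = mx.maxX
WHERE m.userId = :userId;
```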
Goal
Select distinct ids from blog_news where
active = 1
title is not empty
has at least one picture unless picture is logo, or at least one video
The statement so far
select distinct n.id from blog_news n
left join blog_pics p ON n.id = p.blogid and active = '1' and trim(n.title) != ''
left join blog_vdos v ON n.id = v.blogid
where (p.islogo = '0' and p.id is not null) OR (v.id is not null)
order by `newsdate` desc, `createdate` desc
The issue
selects blog_news ids that have pictures, unless they're logos [correct]
selects blog_news ids that have both videos and pictures [correct]
does not select blog_news ids that have only videos [wrong]
How about this:
SELECT DISTINCT n.id
FROM blog_news n
WHERE n.active = '1'
AND trim(n.title) != ''
AND (EXISTS (SELECT 1
FROM blog_pics p
WHERE p.blogid = n.id
AND p.islogo = 0)
OR EXISTS (SELECT 1
FROM blog_vdos v
WHERE v.blogid = n.id)
)
ORDER BY n.newsdate desc, n.createdate desc
Where you are just interested in the existence (or not) of child rows then it is often clearer and easier to use EXISTS.
I can't see any obvious problem in your query.
I assume active is a column in the blog_news table; you should write it as n.active. If this column is in the blog_pics table, then that is the problem.
I would move the conditions on n.active and n.title to the WHERE clause, as they are not related to the left join on blog_pics. This reads better, and it also matters for the result: in the ON clause they only restrict which pictures are joined, so news rows that qualify through a video are not filtered by them.
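A sketch of the question's query with those two conditions moved to the WHERE clause and qualified with n. This assumes active and title are columns of blog_news; the ORDER BY is left out for brevity.

```sql
SELECT DISTINCT n.id
FROM blog_news n
LEFT JOIN blog_pics p ON n.id = p.blogid
LEFT JOIN blog_vdos v ON n.id = v.blogid
WHERE n.active = '1'
  AND TRIM(n.title) != ''
  AND ((p.islogo = '0' AND p.id IS NOT NULL) OR v.id IS NOT NULL);
```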
You can write the query using sub selects as well:
SELECT n.id FROM blog_news n
WHERE n.active = 1 AND TRIM(n.title) != '' AND n.id IN (
SELECT DISTINCT p.blogid FROM blog_pics p WHERE p.islogo = 0 UNION
SELECT DISTINCT v.blogid FROM blog_vdos v
);