Query timing out / making query more efficient (exacttarget) - sql

Here is my query. Can anyone see a way to make it more efficient so it doesn't time out? I'm using ExactTarget (Salesforce Marketing Cloud), which has a 30-minute timeout limit. I've tried moving things around, but it always seems to error. I'm fairly new to SQL, though I've been hitting it fairly hard for the last week. My query is below. Thanks!
SELECT DISTINCT c.Email, c.FName
FROM ENT.Contacts c WITH(NOLOCK)
INNER JOIN ENT.RegistrationData r WITH(NOLOCK)
ON c.Email = r.RegistrationContactEmail
LEFT Join ENT._Subscribers s WITH(NOLOCK)
ON c.Email = s.SubscriberKey
AND s.status NOT IN ('unsubscribed','held')
WHERE
(
(
(
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate < '2014-05-31'
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Prom' AND
r.RegistrationEventRole ='Prom' AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2014-01-01' AND '2015-12-31'
)
)
AND
(
(
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Open s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
)
OR
(
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Click s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
)
)
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2015-05-01' AND '2015-05-31'
)
)

I agree with Karl that your main performance hit is in the sub-queries that reference the _Open and _Click system data views. But, based on my experience with ExactTarget (Salesforce Marketing Cloud), you are limited to running SELECT statements and will not be able to declare a variable this way.
I recommend running a separate query on the _Open and _Click data views and then referencing the resulting data extension in your query. This may require more steps, but you'll find the overall processing time is less.
For the first query, I would create a data extension of everyone who has either opened or clicked in the last 3 months. Then, in the second query, I would reference the resulting data extension with an "IN" statement. This will eliminate one of the "OR" conditions in your query, which can be expensive. If the query still performs poorly, I would suggest rewriting the conditional logic on the RegistrationData data extension in a way that avoids "OR" conditions.
Query1:
SELECT DISTINCT s.SubscriberKey AS Email
FROM _Open s WITH(NOLOCK)
WHERE datediff(mm,s.EventDate, getdate()) <= 3
UNION
SELECT DISTINCT s.SubscriberKey AS Email
FROM _Click s WITH(NOLOCK)
WHERE datediff(mm,s.EventDate, getdate()) <= 3
Query2:
SELECT DISTINCT c.Email, c.FName
FROM ENT.Contacts c WITH(NOLOCK)
INNER JOIN ENT.RegistrationData r WITH(NOLOCK)
ON c.Email = r.RegistrationContactEmail
LEFT Join ENT._Subscribers s WITH(NOLOCK)
ON c.Email = s.SubscriberKey
AND s.status NOT IN ('unsubscribed','held')
WHERE
(
(
(
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate < '2014-05-31'
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Prom' AND
r.RegistrationEventRole ='Prom' AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2014-01-01' AND '2015-12-31'
)
)
AND
(
c.Email in (
select s.Email
from OpenOrClickDE s
where s.Email = c.Email
)
)
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2015-05-01' AND '2015-05-31'
)
)

I'll take a shot. There may be some minor stuff, but the only thing that looks to me like it should make the query spin for a long time is
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Open s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
OR
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Click s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
There are two issues in there. First, you are doing date math a bajillion times, and second, using IN (SELECT ...) here is almost certainly inefficient.
To address the first, calculate a single test date and use that. For the second, prefer checking with EXISTS.
DECLARE @testDate DATE = DATEADD(mm,-3,GETDATE())
...
EXISTS(SELECT 1 FROM _Open s WHERE s.EventDate>@testDate AND c.Email = s.SubscriberKey)
OR EXISTS(SELECT 1 FROM _Click s WHERE s.EventDate>@testDate AND c.Email = s.SubscriberKey)
You could also probably unwind the EXISTS and use joins to _Open and _Click, but that feels more complex.
Give this a shot and let us know if it helps.
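Where only SELECT statements are allowed and a variable cannot be declared, the same idea can be sketched with the cutoff computed inline. This is a minimal sketch, not a tested ExactTarget query: the aliases o and k are made up for illustration, the registration filters are omitted, and whether the platform accepts `--` comments is an assumption (strip them if not). The cutoff expression is the first day of the month three months back, which keeps the same rows as the original DATEDIFF(mm, ...) <= 3 test while leaving EventDate bare so an index on it could be used.

```sql
-- Sketch only: the cutoff is computed inline, so no DECLARE is required.
-- DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) - 3, 0) is the first day of the
-- month three months back, matching DATEDIFF(mm, EventDate, GETDATE()) <= 3.
SELECT DISTINCT c.Email, c.FName
FROM ENT.Contacts c
WHERE EXISTS (SELECT 1
              FROM _Open o
              WHERE o.SubscriberKey = c.Email
                AND o.EventDate >= DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) - 3, 0))
   OR EXISTS (SELECT 1
              FROM _Click k
              WHERE k.SubscriberKey = c.Email
                AND k.EventDate >= DATEADD(mm, DATEDIFF(mm, 0, GETDATE()) - 3, 0));
```

Note that DATEADD(mm, -3, GETDATE()) is a slightly different window: it counts back three months from today, rather than from the start of the current month.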

Related

Slow query when using NOT EXIST in Query

I would like to seek some help regarding the query below.
Running this script causes the system to time out. The query is so slow that it took 5 minutes to run for just 22 records. I believe this has something to do with the "NOT IN" statement. I already looked for answers here on Stack Overflow, and some suggest using LEFT OUTER JOIN or WHERE NOT EXISTS, but I can't seem to incorporate either into this query.
SELECT a.UserId, COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' AND a.ID
NOT IN (
SELECT DISTINCT(COALESCE(a.activitylogid, 0))
FROM [CustomerNoteInteractions] a WITH(NOLOCK)
WHERE a.reason IN ('20', '36') AND CAST(a.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' AND a.UserId IN (SELECT b.Id
FROM [User] b
WHERE b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1)
)
AND a.UserId IN (
SELECT b.Id
FROM [User] b
WHERE b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1)
GROUP BY a.UserId
Here is what should be an equivalent query using EXISTS and NOT EXISTS:
SELECT a.UserId,
COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
AND EXISTS (SELECT *
FROM [User] b
WHERE b.Id = a.UserId
AND b.UserType = 'EpicUser'
AND b.IsEpicEmployee = 1
AND b.IsActive = 1)
AND NOT EXISTS (SELECT *
FROM [CustomerNoteInteractions] b WITH(NOLOCK)
JOIN [User] c
ON c.Id = b.UserId
AND c.UserType = 'EpicUser'
AND c.IsEpicEmployee = 1
AND c.IsActive = 1
WHERE b.activitylogid = a.ID
AND b.reason IN ('20', '36')
AND CAST(b.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' )
GROUP BY a.UserId
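One detail worth seeing in isolation is why the NOT IN version needed COALESCE and the NOT EXISTS version does not. The following is a small self-contained sketch using made-up inline values rather than the real tables: a single NULL in the subquery result makes NOT IN return nothing, while NOT EXISTS is unaffected.

```sql
-- Hypothetical values: 1 and 2 stand in for UserActivityLog.ID,
-- and NULL stands in for a note row with no activitylogid.
SELECT a.ID
FROM (VALUES (1), (2)) AS a (ID)
WHERE a.ID NOT IN (SELECT n.activitylogid
                   FROM (VALUES (1), (NULL)) AS n (activitylogid));
-- Returns no rows: "2 NOT IN (1, NULL)" evaluates to UNKNOWN, not TRUE.

SELECT a.ID
FROM (VALUES (1), (2)) AS a (ID)
WHERE NOT EXISTS (SELECT 1
                  FROM (VALUES (1), (NULL)) AS n (activitylogid)
                  WHERE n.activitylogid = a.ID);
-- Returns ID 2: the NULL row simply never matches.
```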
Obviously, it's hard to understand what will truly help your performance without understanding your data. But here is what I expect:
I think the EXISTS/NOT EXISTS version of the query will help.
I think your conditions on UserActivityLog.ActivityDateTime and CustomerNoteInteractions.datecreated are a problem. Why are you casting? Is it not a date type? If not, why not? You would probably get big gains if you could take advantage of an index on those columns. But with the cast, I don't think you can use an index there. Can you do something about it?
You'll also probably benefit from indexes on User.Id (probably the PK anyways), and CustomerNoteInteractions.ActivityLogId.
Also, I'm not a big fan of using WITH (NOLOCK) to improve performance (see "Bad habits: Putting NOLOCK everywhere").
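As a rough illustration of the indexing suggestions above, the statements below sketch what such indexes might look like. The index names and the INCLUDE column choices are assumptions made for this example, not something taken from the actual schema; check the existing indexes and the execution plan before creating anything.

```sql
-- Hypothetical index names; the column lists are guesses based on the query.
-- Supports the NOT EXISTS lookup on activitylogid.
CREATE INDEX IX_CustomerNoteInteractions_ActivityLogId
    ON [CustomerNoteInteractions] (activitylogid)
    INCLUDE (reason, datecreated, UserId);

-- Supports the date filter, but only if the column is filtered without CAST.
CREATE INDEX IX_UserActivityLog_ActivityDatetime
    ON [UserActivityLog] (ActivityDatetime)
    INCLUDE (UserId, CustomerId);
```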
EDIT
If your date columns are of type DateTime as you mention in the comments, and so you are using the CAST to eliminate the time portion, a much better alternative for performance is to not cast, but instead modify the way you filter the column. Doing this will allow you to take advantage of any index on the date column. It could make a very big difference.
The query could then be further improved like this:
SELECT a.UserId,
COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE a.ActivityDatetime >= '2015-09-28'
AND a.ActivityDatetime < dateadd(day, 1, '2015-09-30')
AND EXISTS (SELECT *
FROM [User] b
WHERE b.Id = a.UserId
AND b.UserType = 'EpicUser'
AND b.IsEpicEmployee = 1
AND b.IsActive = 1)
AND NOT EXISTS (SELECT *
FROM [CustomerNoteInteractions] b WITH(NOLOCK)
JOIN [User] c
ON c.Id = b.UserId
AND c.UserType = 'EpicUser'
AND c.IsEpicEmployee = 1
AND c.IsActive = 1
WHERE b.activitylogid = a.ID
AND b.reason IN ('20', '36')
AND b.datecreated >= '2015-09-28'
AND b.datecreated < dateadd(day, 1, '2015-09-30'))
GROUP BY a.UserId
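A LEFT OUTER JOIN version should get you pretty close. Note that it tests only c.activitylogid, so it excludes any activity that has a matching note in the date range, whether or not the note's user passes the [User] filter: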
SELECT a.UserId, COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
inner join [User] b with (Nolock) on a.userid = b.id
and b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1
left outer join [CustomerNoteInteractions] c with (nolock) on a.id = c.activitylogid
and c.reason IN ('20', '36') AND CAST(c.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
left outer join [User] d with (nolock) on c.userid = d.id
and d.UserType = 'EpicUser' AND d.IsEpicEmployee = 1 AND d.IsActive = 1
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
and c.activitylogid is null
GROUP BY a.UserId

How to optimize the sql query

This query takes dynamic input in place of cg.ownerid IN (294777, 228649, 188464). As the number of values in the IN condition grows, the query takes too long to execute. Please suggest a way to optimize it.
For example, the query below takes 4 seconds; if I reduce the list to just IN (188464), it takes only 1 second.
SELECT *
FROM
(SELECT *,
Row_number() OVER(
ORDER BY datecreated DESC) AS rownum
FROM
(SELECT DISTINCT c.itemid,
(CASE WHEN (Isnull(c.password, '') <> '') THEN 1 ELSE 0 END) AS password,
c.title,
c.encoderid,
c.type,
(CASE WHEN c.author = 'education' THEN 'Discovery' ELSE c.type END) AS TYPE,
c.publisher,
c.description,
c.author,
c.duration,
c.copyright,
c.rating,
c.userid,
Stuff(
(SELECT DISTINCT ' ' + NAME AS [text()]
FROM firsttable SUB
LEFT JOIN secondtable AS rgc ON thirdtable = rgc.id
WHERE SUB.itemid = c.itemid
FOR xml path('')), 1, 1, '')AS [Sub_Categories]
FROM fourthtable AS cg
LEFT JOIN item AS c ON c.itemid = cg.itemid
WHERE Isnull(title, '') <> ''
AND c.active = '1'
AND c.systemid = '20'
AND cg.ownerid IN (294777,
228649,
188464)) AS a) AS b
WHERE rownum BETWEEN 1 AND 32
ORDER BY datecreated DESC
Without further information, I would just suggest a first change: move the WHERE conditions into subqueries, so that each table is filtered before the LEFT JOIN.
SELECT *
FROM(
SELECT *,
Row_number() OVER(
ORDER BY datecreated DESC) AS rownum
FROM
(SELECT DISTINCT c.itemid,
(CASE WHEN (Isnull(c.password, '') <> '') THEN 1 ELSE 0 END) AS password,
c.title,
c.encoderid,
c.type,
(CASE WHEN c.author = 'education' THEN 'Discovery' ELSE c.type END) AS TYPE,
c.publisher,
c.description,
c.author,
c.duration,
c.copyright,
c.rating,
c.userid,
Stuff(
(
SELECT DISTINCT ' ' + NAME AS [text()]
FROM firsttable SUB
LEFT JOIN secondtable AS rgc ON thirdtable = rgc.id
WHERE SUB.itemid = c.itemid
FOR xml path('')
), 1, 1, ''
) AS [Sub_Categories]
FROM (
SELECT cg.itemid
FROM fourthtable as cg
WHERE cg.ownerid IN (294777,228649, 188464)
) AS cg
LEFT JOIN (
SELECT DISTINCT c.itemid, c.password, c.title, c.encoderid, c.type, c.publisher, c.description, c.author, c.duration, c.copyright, c.rating, c.userid
FROM item as c
WHERE Isnull(c.title, '') <> ''
AND c.active = '1'
AND c.systemid = '20'
) AS c
ON c.itemid = cg.itemid
) AS a
) AS b
WHERE rownum BETWEEN 1 AND 32
ORDER BY datecreated DESC
I'm not quite sure everything is connected correctly, though; you're missing some aliases, which makes it hard for me to follow your query. But I think you'll get the idea. :-)
With this little information it's impossible to give any specific ideas, but the normal general things apply:
Turn on statistics io and check what's causing most of the logical I/O and try to solve that
Look at the actual plan and check if there's something that doesn't look ok, for example:
Clustered index / table scans (new index could solve this)
Key lookups with a huge amount of rows (adding more columns to index could solve this, either as normal or included fields)
Spools (new index could solve this)
Big difference between estimated and actual number of rows (10x, 100x and so on)
To give any better hints, you should really include the actual plan and the table / index structure, at least for the essential parts, and say what "too much time" means (seconds, minutes, hours?).
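As a minimal sketch of the first step in that checklist, the session settings below make SQL Server report logical reads and timings in the Messages output. The SELECT in the middle is only a stand-in built from the table and column names in the question; replace it with the real slow query.

```sql
-- Report per-table logical reads and CPU/elapsed time for what follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for the slow query (substitute the full statement here).
SELECT COUNT(*)
FROM fourthtable AS cg
WHERE cg.ownerid IN (294777, 228649, 188464);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the "logical reads" figures for the one-value and three-value IN lists should show which table accounts for the extra work.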

Looking at SQL Code for Validation

I have two queries that appear to be the same. We are trying to get quarter-to-date data. One uses a hard-coded date and one does not. One query returns 2,100 fewer records. The first and last records are the same. All of February matches in both, and the last 5 days of January match too.
Query 1 with hard coded date.
SELECT c.Account_RecordType
,timewait
,a.convotime
,DATENAME(Month, timewait) AS 'Mnth'
,DATENAME(year, timewait) AS 'yr'
,DATENAME(quarter, timewait) AS 'qrt'
FROM SalesForce.dbo.SalesForceContact AS b
INNER JOIN Dossier_Replication.dbo.vwSF_DATA_Contact c
ON b.ContactID = c.CONTACTID__C
RIGHT OUTER JOIN satVRS.dbo.rptNECACallHistory AS a
ON b.UserID = a.UserID_Caller
WHERE ( b.Platform = 'HandsonVRS' )
AND ( a.timeWait BETWEEN '2014-01-01' AND '2014-02-24' )
AND ( a.isReport = '1' )
AND ( a.NECA_isReport = '1' )
AND ( a.ConvoTime > '0' )
AND ( c.Account_RecordType = 'Enterprise Account' )
GROUP BY c.Account_RecordType
,timewait
,a.convotime
Second query is pulling quarter,year, and day from the date field in question.
SELECT c.Account_RecordType
,timewait
,a.convotime
,DATENAME(Month, timewait) AS 'Mnth'
,DATENAME(year, timewait) AS 'yr'
,DATENAME(quarter, timewait) AS 'qrt'
FROM SalesForce.dbo.SalesForceContact AS b
INNER JOIN Dossier_Replication.dbo.vwSF_DATA_Contact c
ON b.ContactID = c.CONTACTID__C
RIGHT OUTER JOIN satVRS.dbo.rptNECACallHistory AS a
ON b.UserID = a.UserID_Caller
WHERE ( b.Platform = 'HandsonVRS' )
AND ( a.timeWait BETWEEN '2014-01-01' AND '2014-02-24' )
AND ( a.isReport = '1' )
AND ( a.NECA_isReport = '1' )
AND ( a.ConvoTime > '0' )
AND ( c.Account_RecordType = 'Enterprise Account' )
GROUP BY c.Account_RecordType
,timewait
,a.convotime
Does anyone see anything obviously wrong with these two? Any suggestions are appreciated; thank you in advance. We are running SQL Server 2012.
Sorry my bad, here is the second one...
select
c.Account_RecordType,timewait,a.convotime,(datename(year,timewait))as twy,(datename(quarter,timewait))as twq
FROM SalesForce.dbo.SalesForceContact AS b INNER JOIN
Dossier_Replication.dbo.vwSF_DATA_Contact c ON b.ContactID = c.CONTACTID__C RIGHT OUTER JOIN
satVRS.dbo.rptNECACallHistory AS a ON b.UserID = a.UserID_Caller
WHERE (b.Platform = 'HandsonVRS')
AND (datename(year,timewait))=(datename(year,getdate()-1)) AND (datename(quarter,timewait))=(datename(quarter,getdate()-1))
AND (a.isReport = '1')
AND (a.NECA_isReport = '1')
AND (a.ConvoTime > '0')
AND (c.Account_RecordType = 'Enterprise Account')
Group by c.Account_RecordType,timewait,a.convotime
Order by timewait

Inner join that ignore singlets

I have to do a self join on a table. I am trying to return several columns showing how many of each type of drug test were performed on the same day (MM/DD/YYYY), for days on which at least two tests were done and at least one of them had a result code of 'UN'.
I am joining other tables to get the information, as shown below. The problem is that I do not quite understand how to exclude someone who has a single result row with a 'UN' result on a day but no other tests that day.
Query Results (Columns)
County, DrugTestID, ID, Name, CollectionDate, DrugTestType, Results, Count(DrugTestType)
I have several rows for ID 12345, which are correct. But ID 12346 has a single row showing a count of 1. They had a result of 'UN' on this day but did not have any other tests that day. I want to exclude this.
I tried the following query
select
c.desc as 'County',
dt.pid as 'PID',
dt.id as 'DrugTestID',
p.id as 'ID',
bio.FullName as 'Participant',
CONVERT(varchar, dt.CollectionDate, 101) as 'CollectionDate',
dtt.desc as 'Drug Test Type',
dt.result as Result,
COUNT(dt.dru_drug_test_type) as 'Count Of Test Type'
from
dbo.Test as dt with (nolock)
join dbo.History as h on dt.pid = h.id
join dbo.Participant as p on h.pid = p.id
join BioData as bio on bio.id = p.id
join County as c with (nolock) on p.CountyCode = c.code
join DrugTestType as dtt with (nolock) on dt.DrugTestType = dtt.code
inner join
(
select distinct
dt2.pid,
CONVERT(varchar, dt2.CollectionDate, 101) as 'CollectionDate'
from
dbo.DrugTest as dt2 with (nolock)
join dbo.History as h2 on dt2.pid = h2.id
join dbo.Participant as p2 on h2.pid = p2.id
where
dt2.result = 'UN'
and dt2.CollectionDate between '11-01-2011' and '10-31-2012'
and p2.DrugCourtType = 'AD'
) as derived
on dt.pid = derived.pid
and convert(varchar, dt.CollectionDate, 101) = convert(varchar, derived.CollectionDate, 101)
group by
c.desc, dt.pid, p.id, dt.id, bio.fullname, dt.CollectionDate, dtt.desc, dt.result
order by
c.desc ASC, Participant ASC, dt.CollectionDate ASC
This is a little complicated because your query has a separate row for each test. You need to use window/analytic functions to get the information you want. These let you calculate aggregate functions but put the values on each row.
The following query starts with your query. It then calculates the number of UN results on each date for each participant and the total number of tests. It applies the appropriate filter to get what you want:
with base as (<your query here>)
select b.*
from (select b.*,
sum(isUN) over (partition by Participant, CollectionDate) as NumUNs,
count(*) over (partition by Participant, CollectionDate) as NumTests
from (select b.*,
(case when result = 'UN' then 1 else 0 end) as IsUN
from base b
) b
) b
where NumUNs <> 1 or NumTests <> 1
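To see the window-function pattern run on its own, here is a small self-contained sketch. The three rows are made-up sample data standing in for the output of the original query, and the filter is written as the stated requirement (at least two tests that day, at least one 'UN') rather than as the negated form above.

```sql
-- Hypothetical sample rows: Ann has two tests on one day (one 'UN'),
-- Bob has a single 'UN' test and should be excluded.
WITH base AS (
    SELECT *
    FROM (VALUES
        ('Ann', '11/05/2011', 'UN'),
        ('Ann', '11/05/2011', 'NEG'),
        ('Bob', '11/06/2011', 'UN')
    ) AS v (Participant, CollectionDate, Result)
)
SELECT t.Participant, t.CollectionDate, t.Result, t.NumUNs, t.NumTests
FROM (
    SELECT b.*,
           -- Per participant and day: how many 'UN' results, how many tests.
           SUM(CASE WHEN b.Result = 'UN' THEN 1 ELSE 0 END)
               OVER (PARTITION BY b.Participant, b.CollectionDate) AS NumUNs,
           COUNT(*)
               OVER (PARTITION BY b.Participant, b.CollectionDate) AS NumTests
    FROM base b
) t
WHERE t.NumTests >= 2
  AND t.NumUNs >= 1;
```

With this sample data, both of Ann's rows come back and Bob's single row is filtered out.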
Without the with clause or window functions, you can create a particularly ugly query to do the same thing:
select b.*
from (<your query>) b join
(select Participant, CollectionDate, count(*) as NumTests,
sum(case when result = 'UN' then 1 else 0 end) as NumUNs
from (<your query>) b
group by Participant, CollectionDate
) bsum
on b.Participant = bsum.Participant and
b.CollectionDate = bsum.CollectionDate
where NumUNs <> 1 or NumTests <> 1
If I understand the problem, the basic pattern for this sort of query is simply to include negating or exclusionary conditions in your join. I.E., self-join where columnA matches, but columns B and C do not:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
and t1.PkId != t2.PkId
and t1.category != t2.category
)
Put the conditions in the WHERE clause if it benchmarks better:
select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
And it's often easiest to start with the self-join, treating it as a "base table" on which to join all related information:
select
[columns]
from
(select
[columns]
from
table t1
join table t2 on (
t1.NonPkId = t2.NonPkId
)
where
t1.PkId != t2.PkId
and t1.category != t2.category
) bt
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
join [othertable] on (<whatever>)
This can allow you to focus on getting that self-join right, without interference from other tables.

SQL input needed to get correct output

Select A.SubscriberKey, A.EventDate,B.CreatedDate
From _Click A
JOIN _ListSubscribers B
ON A.SubscriberKey = B.SubscriberKey
Where B.ListID = '10630' AND B.CreatedDate > (Select DATEADD(day,-180,getdate())) AND A.EventDate IS NULL
Group By A.SubscriberKey,B.CreatedDate, A.EventDate
At the moment, nothing is being returned. I want to return each subscriber's SubscriberKey (which is their email), the EventDate (the date the click took place), and the date they were added (CreatedDate) for subscribers who have not clicked on anything. Can anyone help point me in the right direction? Thanks everyone!
Try an outer join, so that subscribers with no rows in _Click are kept:
SELECT B.SubscriberKey, A.EventDate, B.CreatedDate
FROM _ListSubscribers B
LEFT JOIN _Click A
ON A.SubscriberKey = B.SubscriberKey
WHERE B.ListID = '10630'
AND B.CreatedDate > DATEADD(day,-180,getdate())
AND A.EventDate IS NULL
GROUP BY B.SubscriberKey, B.CreatedDate, A.EventDate
I don't see the purpose of a GROUP BY in this case. You aren't doing any aggregation.
I believe this will give the result you are looking for.
select SubscriberKey,
null as EventDate,
CreatedDate
from _ListSubscribers ls
where ListID='10630'
and CreatedDate > DateAdd( day, -180, getdate())
and not exists( select 1 from _Click c where c.SubscriberKey = ls.SubscriberKey )
The following query using an outer join should give the same results, and it is more similar to your original query.
select ls.SubscriberKey,
c.EventDate,
ls.CreatedDate
from _ListSubscribers ls
left outer join _Click c
on c.SubscriberKey = ls.SubscriberKey
where ListID='10630'
and CreatedDate > DateAdd( day, -180, getdate())
and c.EventDate is null