SQL input needed to get correct output - sql

Select A.SubscriberKey, A.EventDate,B.CreatedDate
From _Click A
JOIN _ListSubscribers B
ON A.SubscriberKey = B.SubscriberKey
Where B.ListID = '10630' AND B.CreatedDate > (Select DATEADD(day,-180,getdate())) AND A.EventDate IS NULL
Group By A.SubscriberKey,B.CreatedDate, A.EventDate
At the moment, nothing is being returned. I want to return each subscriber's SubscriberKey (which is their email), the EventDate (the date the click took place), and the date they were added (CreatedDate) for subscribers who have not clicked on anything. Can anyone help point me in the right direction? Thanks everyone!

Try this (the (+) outer-join marker and nvl are Oracle syntax):
SELECT A.SubscriberKey, A.EventDate, B.CreatedDate
FROM _Click A, _ListSubscribers B
WHERE A.SubscriberKey(+) = B.SubscriberKey
AND B.ListID = '10630'
AND B.CreatedDate > (Select DATEADD(day,-180,getdate()))
AND nvl(A.EventDate,null) IS NULL
GROUP BY A.SubscriberKey,B.CreatedDate, A.EventDate

I don't see the purpose of a GROUP BY in this case. You aren't doing any aggregation.
I believe this will give the result you are looking for.
select SubscriberKey,
null as EventDate,
CreatedDate
from _ListSubscribers ls
where ListID='10630'
and CreatedDate > DateAdd( day, -180, getdate())
and not exists( select 1 from _Click c where c.SubscriberKey = ls.SubscriberKey )
The following query using an outer join should give the same results, and it is more similar to your original query.
select ls.SubscriberKey,
c.EventDate,
ls.CreatedDate
from _ListSubscribers ls
left outer join _Click c
on c.SubscriberKey = ls.SubscriberKey
where ls.ListID='10630'
and ls.CreatedDate > DateAdd( day, -180, getdate())
and c.EventDate is null
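
To see why the original inner join returned nothing while the two rewrites above return the non-clickers, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for the real platform, and the table names list_subscribers and click, the sample rows, and the omission of the 180-day filter are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical miniature stand-ins for _ListSubscribers and _Click.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE list_subscribers (SubscriberKey TEXT, ListID TEXT, CreatedDate TEXT);
CREATE TABLE click (SubscriberKey TEXT, EventDate TEXT);
INSERT INTO list_subscribers VALUES
  ('ann@example.com', '10630', '2024-01-05'),
  ('bob@example.com', '10630', '2024-01-06'),
  ('cat@example.com', '99999', '2024-01-07');
INSERT INTO click VALUES ('ann@example.com', '2024-02-01');
""")

# Anti-join written with NOT EXISTS.
not_exists = conn.execute("""
SELECT ls.SubscriberKey, NULL AS EventDate, ls.CreatedDate
FROM list_subscribers ls
WHERE ls.ListID = '10630'
  AND NOT EXISTS (SELECT 1 FROM click c
                  WHERE c.SubscriberKey = ls.SubscriberKey)
""").fetchall()

# Same anti-join written as LEFT OUTER JOIN ... IS NULL.
left_join = conn.execute("""
SELECT ls.SubscriberKey, c.EventDate, ls.CreatedDate
FROM list_subscribers ls
LEFT OUTER JOIN click c ON c.SubscriberKey = ls.SubscriberKey
WHERE ls.ListID = '10630'
  AND c.EventDate IS NULL
""").fetchall()

# The shape of the original query: an inner join can only produce rows
# that have a click, so filtering on EventDate IS NULL leaves nothing.
inner_join = conn.execute("""
SELECT ls.SubscriberKey
FROM click c
JOIN list_subscribers ls ON c.SubscriberKey = ls.SubscriberKey
WHERE ls.ListID = '10630' AND c.EventDate IS NULL
""").fetchall()

print(not_exists, left_join, inner_join)
```

In this toy data only bob@example.com is on the list without a click, so both rewrites return that one row and the inner join returns none.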

SQL: get the same column twice with different WHERE clauses

I am trying to select the same column from the same table twice, with a different WHERE clause each time:
My query:
SELECT
*
FROM
(SELECT TOP 10
CONVERT(DATE, attemptdate) AS Date,
Max(currentcount) AS A
FROM
logintracking
INNER JOIN
maxuser ON logintracking.loginid = maxuser.loginid
INNER JOIN
site ON site.siteid = maxuser.defsite
WHERE
attemptdate BETWEEN #dateDebut AND #dateFin
AND logintracking.clientaddr IN ('10.118.254.21', '10.118.254.156')
GROUP BY
CONVERT(DATE, attemptdate)
ORDER BY
CONVERT(DATE, attemptdate) ASC
) AS T1,
(SELECT TOP 10
CONVERT(DATE, attemptdate) AS Date,
MAX(currentcount) AS B
FROM
logintracking
INNER JOIN
maxuser ON logintracking.loginid = maxuser.loginid
INNER JOIN
site ON site.siteid = maxuser.defsite
WHERE
attemptdate BETWEEN #dateDebut AND #dateFin
AND logintracking.clientaddr = '10.118.254.35'
GROUP BY
CONVERT(DATE, attemptdate)
ORDER BY
CONVERT(DATE, attemptdate) ASC) AS T2
My objective is to get the same column, MAX(currentcount), twice with different WHERE clauses applied, giving two columns named A and B. I also need to show the date in the first column. Can you please help? Thanks
Since the only difference between A and B is the condition on logintracking.clientaddr, you can put that condition in a CASE expression inside the MAX function:
SELECT CONVERT(DATE, attemptdate) AS Date,
MAX(CASE WHEN logintracking.clientaddr IN ( '10.118.254.21', '10.118.254.156' ) THEN currentcount END) AS A,
MAX(CASE WHEN logintracking.clientaddr IN ( '10.118.254.35' ) THEN currentcount END) AS B
FROM logintracking
INNER JOIN maxuser
ON logintracking.loginid = maxuser.loginid
INNER JOIN site
ON site.siteid = maxuser.defsite
WHERE attemptdate BETWEEN #dateDebut AND #dateFin
GROUP BY CONVERT(DATE, attemptdate)
ORDER BY CONVERT(DATE, attemptdate) ASC
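
The conditional-aggregation idea can be tried on a few rows with Python's sqlite3 module. This is only a sketch: SQLite stands in for the real database, the joins to maxuser and site are left out, and the sample rows are made up.

```python
import sqlite3

# Hypothetical cut-down logintracking table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logintracking (attemptdate TEXT, clientaddr TEXT, currentcount INTEGER);
INSERT INTO logintracking VALUES
  ('2024-03-01', '10.118.254.21',  5),
  ('2024-03-01', '10.118.254.156', 9),
  ('2024-03-01', '10.118.254.35',  4),
  ('2024-03-02', '10.118.254.21',  7);
""")

# A CASE without ELSE yields NULL for non-matching rows, and MAX ignores
# NULLs, so each column only "sees" the addresses it was given.
rows = conn.execute("""
SELECT attemptdate AS Date,
       MAX(CASE WHEN clientaddr IN ('10.118.254.21', '10.118.254.156')
                THEN currentcount END) AS A,
       MAX(CASE WHEN clientaddr = '10.118.254.35'
                THEN currentcount END) AS B
FROM logintracking
GROUP BY attemptdate
ORDER BY attemptdate
""").fetchall()

print(rows)
```

A date with no rows for one of the address groups gets NULL (None in Python) in that column rather than being dropped.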

Are two inner joins best for optimizing a query?

I just got a challenge from school: optimise this query. This is a theoretical question.
Challenge :
SELECT TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date"),'YYYY-MM') AS "date_month",
COUNT(DISTINCT CASE WHEN (tableB."date" IS NOT NULL) THEN tableB._id ELSE NULL END) AS "tableB.countB",
COUNT(DISTINCT CASE WHEN (tableC."date" IS NOT NULL) THEN tableC._id ELSE NULL END) AS "tableC.countC"
FROM tableA AS tableA
LEFT JOIN tableB AS tableB ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableB."date"))) = (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date")))
LEFT JOIN tableC AS tableC ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableC."date"))) = (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date")))
WHERE tableA."date" >= CONVERT_TIMEZONE ('America/Los_Angeles','UTC',DATEADD (month,-17,DATE_TRUNC('month',DATE_TRUNC('day',CONVERT_TIMEZONE ('UTC','America/Los_Angeles',GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
To optimize it, I just removed the CASE expressions from the query above. I think this will also improve the efficiency of the query:
SELECT To_char(Convert_timezone ('UTC','America/Los_Angeles',tablea."date"),'YYYY-MM') AS "date_month",
Count(DISTINCT
decode(tableb."date", not null,tableb._id,null)
AS "tableB.countB",
Count(DISTINCT
decode(tablec."date", not null,tablec._id ,null)
AS "tableC.countC"
FROM tablea AS tablea
LEFT JOIN tableb AS tableb
ON (
Date (Convert_timezone ('UTC','America/Los_Angeles',tableb."date"))) = (Date (Convert_timezone ('UTC','America/Los_Angeles',tablea."date")))
LEFT JOIN tablec AS tablec
ON (
Date (Convert_timezone ('UTC','America/Los_Angeles',tablec."date"))) = (Date (Convert_timezone ('UTC','America/Los_Angeles',tablea."date")))
WHERE tablea."date" >= convert_timezone ('America/Los_Angeles','UTC',Dateadd (month,-17,Date_trunc('month',Date_trunc('day',Convert_timezone ('UTC','America/Los_Angeles',Getdate ())))) group BY 1 ORDER BY 1 DESC limit 500;
What would you suggest? If we remove one LEFT JOIN and merge the statements,
is that fine for optimization?
... or, use a shorter alias that actually makes the SQL shorter and cleaner. This also helps readability. Also, format it to separate the clauses (SELECT, FROM, JOIN, WHERE, ORDER BY, GROUP BY, HAVING, etc.) so they are easy to distinguish by eye, and use indentation consistent with the logical structure, so that it supports, rather than hinders, your ability to tell those sections apart.
Just as an example, here's your first SQL query reformatted, but identical in logical structure to what you posted:
SELECT TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles', a.date),'YYYY-MM') date_month,
COUNT(DISTINCT CASE WHEN (b."date" IS NOT NULL) THEN b._id ELSE NULL END) countB,
COUNT(DISTINCT CASE WHEN (c."date" IS NOT NULL) THEN c._id ELSE NULL END) countC
FROM tableA a
LEFT JOIN tableB b
ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',b.date))) =
(DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',a.date)))
LEFT JOIN tableC c
ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',c.date))) =
(DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',a.date)))
WHERE a.date >= CONVERT_TIMEZONE ('America/Los_Angeles', 'UTC',
DATEADD (month,-17,DATE_TRUNC('month',
DATE_TRUNC('day',CONVERT_TIMEZONE ('UTC','America/Los_Angeles',
GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
Here is an optimized version
SELECT DatePart(month, a.Date-8/24) date_month,
sum(case when b.date is Not null then 1 else 0 end) countb,
sum(case when c.date is Not null then 1 else 0 end) countc
FROM tableA a
LEFT JOIN tableB b
ON b.Date = a.Date -- Timezone offsets are not necessary,
LEFT JOIN tableC c
ON c.date = a.date -- both in same timezone
WHERE a.date >= DateAdd(hour, 8,
DATEADD (month,-17,DATE_TRUNC('month',
GETDATE () )))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
Presumably, the _id columns are unique. So:
SELECT TO_CHAR(CONVERT_TIMEZONE('UTC','America/Los_Angeles', a."date"), 'YYYY-MM') AS date_month,
SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableB_countB,
SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableC_countC
FROM tableA a LEFT JOIN
tableB b
ON DATE(CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles', b."date")) = DATE(CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles', a."date")) LEFT JOIN
tableC c
ON DATE(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', c."date")) = DATE(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', a."date"))
WHERE a."date" >= CONVERT_TIMEZONE('America/Los_Angeles', 'UTC',
DATEADD(month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
Then, the date conversions in the ON clause don't seem necessary, because the two sides are being converted from the same time zone. If the values have no time component (as suggested by a name like date), then the DATE() is not needed either:
SELECT TO_CHAR(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', a."date"), 'YYYY-MM') AS date_month,
SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableB_countB,
SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableC_countC
FROM tableA a LEFT JOIN
tableB b
ON b."date" = a."date" LEFT JOIN
tableC c
ON c."date" = a."date"
WHERE a."date" >= CONVERT_TIMEZONE('America/Los_Angeles', 'UTC',
DATEADD(month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
The WHERE clause is fine. It can take advantage of an index on a(date).
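
One thing that may be worth checking before swapping COUNT(DISTINCT ...) for SUM(CASE ...): when tableA is joined to both tableB and tableC on the same date, each tableB row is paired with each tableC row for that date, so a plain SUM counts the pairs. The sketch below uses Python's sqlite3 module with made-up rows and a column named d in place of "date"; it only illustrates the effect and does not describe the real data.

```python
import sqlite3

# One date in tableA, two matching rows in tableB, three in tableC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (d TEXT);
CREATE TABLE tableB (_id INTEGER PRIMARY KEY, d TEXT);
CREATE TABLE tableC (_id INTEGER PRIMARY KEY, d TEXT);
INSERT INTO tableA VALUES ('2024-01-01');
INSERT INTO tableB VALUES (1, '2024-01-01'), (2, '2024-01-01');
INSERT INTO tableC VALUES (10, '2024-01-01'), (11, '2024-01-01'), (12, '2024-01-01');
""")

# The two LEFT JOINs produce 1 * 2 * 3 = 6 joined rows for this date.
row = conn.execute("""
SELECT COUNT(DISTINCT b._id)                             AS distinct_b,
       COUNT(DISTINCT c._id)                             AS distinct_c,
       SUM(CASE WHEN b.d IS NOT NULL THEN 1 ELSE 0 END)  AS sum_b,
       SUM(CASE WHEN c.d IS NOT NULL THEN 1 ELSE 0 END)  AS sum_c
FROM tableA a
LEFT JOIN tableB b ON b.d = a.d
LEFT JOIN tableC c ON c.d = a.d
""").fetchone()

print(row)
```

With this data the DISTINCT counts are 2 and 3, while both sums are 6. If each date matches at most one row in tableB and in tableC, the two forms agree.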

Slow query when using NOT EXIST in Query

I would like to seek some help regarding the query below.
Running this script causes the system to time out. The query is so slow that it took 5 minutes to run for just 22 records. I believe this has something to do with the NOT IN clause. I have already looked for answers here on Stack Overflow, and some suggest using LEFT OUTER JOIN or WHERE NOT EXISTS, but I can't seem to incorporate either into this query.
SELECT a.UserId, COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' AND a.ID
NOT IN (
SELECT DISTINCT(COALESCE(a.activitylogid, 0))
FROM [CustomerNoteInteractions] a WITH(NOLOCK)
WHERE a.reason IN ('20', '36') AND CAST(a.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' AND a.UserId IN (SELECT b.Id
FROM [User] b
WHERE b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1)
)
AND a.UserId IN (
SELECT b.Id
FROM [User] b
WHERE b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1)
GROUP BY a.UserId
Here is what should be an equivalent query using EXISTS and NOT EXISTS:
SELECT a.UserId,
COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
AND EXISTS (SELECT *
FROM [User] b
WHERE b.Id = a.UserId
AND b.UserType = 'EpicUser'
AND b.IsEpicEmployee = 1
AND b.IsActive = 1)
AND NOT EXISTS (SELECT *
FROM [CustomerNoteInteractions] b WITH(NOLOCK)
JOIN [User] c
ON c.Id = b.UserId
AND c.UserType = 'EpicUser'
AND c.IsEpicEmployee = 1
AND c.IsActive = 1
WHERE b.activitylogid = a.ID
AND b.reason IN ('20', '36')
AND CAST(b.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30' )
GROUP BY a.UserId
Obviously, it's hard to understand what will truly help your performance without understanding your data. But here is what I expect:
I think the EXISTS/NOT EXISTS version of the query will help.
I think your conditions on UserActivityLog.ActivityDateTime and CustomerNoteInteractions.datecreated are a problem. Why are you casting? Is it not a date type? If not, why not? You would probably get big gains if you could take advantage of an index on those columns. But with the cast, I don't think you can use an index there. Can you do something about it?
You'll also probably benefit from indexes on User.Id (probably the PK anyways), and CustomerNoteInteractions.ActivityLogId.
Also, not a big fan of using with (nolock) to improve performance (Bad habits : Putting NOLOCK everywhere).
EDIT
If your date columns are of type DateTime as you mention in the comments, and so you are using the CAST to eliminate the time portion, a much better alternative for performance is to not cast, but instead modify the way you filter the column. Doing this will allow you to take advantage of any index on the date column. It could make a very big difference.
The query could then be further improved like this:
SELECT a.UserId,
COUNT(DISTINCT a.CustomerId) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
WHERE a.ActivityDatetime >= '2015-09-28'
AND a.ActivityDatetime < dateadd(day, 1, '2015-09-30')
AND EXISTS (SELECT *
FROM [User] b
WHERE b.Id = a.UserId
AND b.UserType = 'EpicUser'
AND b.IsEpicEmployee = 1
AND b.IsActive = 1)
AND NOT EXISTS (SELECT *
FROM [CustomerNoteInteractions] b WITH(NOLOCK)
JOIN [User] c
ON c.Id = b.UserId
AND c.UserType = 'EpicUser'
AND c.IsEpicEmployee = 1
AND c.IsActive = 1
WHERE b.activitylogid = a.ID
AND b.reason IN ('20', '36')
AND b.datecreated >= '2015-09-28'
AND b.datecreated < dateadd(day, 1, '2015-09-30'))
GROUP BY a.UserId
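
The half-open date range used above can be checked against the CAST version on a few boundary rows. This is a sketch with Python's sqlite3 module: SQLite's date() function stands in for both CAST(... AS DATE) and dateadd(day, 1, ...), timestamps are stored as ISO text, and the rows are made up.

```python
import sqlite3

# Hypothetical activity log with rows right at the range boundaries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserActivityLog (ID INTEGER PRIMARY KEY, ActivityDatetime TEXT);
INSERT INTO UserActivityLog VALUES
  (1, '2015-09-27 23:59:59'),
  (2, '2015-09-28 00:00:00'),
  (3, '2015-09-30 23:15:00'),
  (4, '2015-10-01 00:00:00');
""")

def ids(where_clause):
    """Return the IDs matching a WHERE clause, in ID order."""
    sql = f"SELECT ID FROM UserActivityLog WHERE {where_clause} ORDER BY ID"
    return [r[0] for r in conn.execute(sql)]

# Strip the time on every row, then compare (the original approach).
cast_ids = ids("date(ActivityDatetime) BETWEEN '2015-09-28' AND '2015-09-30'")

# Compare the raw column against a half-open range [start, end + 1 day).
half_open_ids = ids("ActivityDatetime >= '2015-09-28' "
                    "AND ActivityDatetime < date('2015-09-30', '+1 day')")

# BETWEEN on the raw column silently drops most of the last day.
naive_ids = ids("ActivityDatetime BETWEEN '2015-09-28' AND '2015-09-30'")

print(cast_ids, half_open_ids, naive_ids)
```

The first two filters select the same rows, but only the half-open form leaves the column untouched, which is what lets an index on it be used. The plain BETWEEN misses the 23:15 row on the last day.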
This should get you pretty close, if it doesn't work exactly:
SELECT a.UserId, COUNT(DISTINCT(a.CustomerId)) AS TotalUniqueContact
FROM [UserActivityLog] a WITH(NOLOCK)
inner join [User] b with (Nolock) on a.userid = b.id
and b.UserType = 'EpicUser' AND b.IsEpicEmployee = 1 AND b.IsActive = 1
left outer join [CustomerNoteInteractions] c with (nolock) on a.id = c.activitylogid
and c.reason IN ('20', '36') AND CAST(c.datecreated AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
left outer join [User] d with (nolock) on c.userid = d.id
and d.UserType = 'EpicUser' AND d.IsEpicEmployee = 1 AND d.IsActive = 1
WHERE CAST(a.ActivityDatetime AS DATE) BETWEEN '2015-09-28' AND '2015-09-30'
and c.activitylogid is null
GROUP BY a.UserId

Query timing out / making query more efficient (exacttarget)

Here is my query. Can anyone see a way to make it more efficient so it doesn't time out? I'm using ExactTarget (Salesforce Marketing Cloud), which has a 30-minute timeout limit. I've tried moving things around, but it always seems to error. I'm kind of a beginner with SQL, but I've been hitting it fairly hard the last week. My query is below. Thanks!
SELECT DISTINCT c.Email, c.FName
FROM ENT.Contacts c WITH(NOLOCK)
INNER JOIN ENT.RegistrationData r WITH(NOLOCK)
ON c.Email = r.RegistrationContactEmail
LEFT Join ENT._Subscribers s WITH(NOLOCK)
ON c.Email = s.SubscriberKey
AND s.status NOT IN ('unsubscribed','held')
WHERE
(
(
(
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate < '2014-05-31'
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Prom' AND
r.RegistrationEventRole ='Prom' AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2014-01-01' AND '2015-12-31'
)
)
AND
(
(
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Open s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
)
OR
(
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Click s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
)
)
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2015-05-01' AND '2015-05-31'
)
)
I agree with Karl that your main performance hit is in the sub-query that references the _Open and _Click system data views. But, based on my experience with the ExactTarget (Salesforce Marketing Cloud), you are limited to only running 'SELECT' statements and will not be able to declare a variable this way.
I recommend running a separate query on the _Open and _Click data view and then reference the resulting data extension in your query. This may require more steps. But, you'll find the overall processing time is less.
For the first query, I would create a data extension of everyone who has either opened or clicked in the last 3 months. Then, in the second query, I would reference the resulting data extension with an IN clause. This will eliminate one of the OR conditions in your query, which can be expensive. If the query still performs poorly, I would suggest rewriting the conditional logic on the RegistrationData data extension in a way that avoids OR conditions.
Query1:
SELECT DISTINCT s.SubscriberKey AS Email
FROM _Open s WITH(NOLOCK)
WHERE datediff(mm,s.EventDate, getdate()) <= 3
union all
SELECT DISTINCT s.SubscriberKey AS Email
FROM _Click s WITH(NOLOCK)
WHERE datediff(mm,s.EventDate, getdate()) <= 3
Query2:
SELECT DISTINCT c.Email, c.FName
FROM ENT.Contacts c WITH(NOLOCK)
INNER JOIN ENT.RegistrationData r WITH(NOLOCK)
ON c.Email = r.RegistrationContactEmail
LEFT Join ENT._Subscribers s WITH(NOLOCK)
ON c.Email = s.SubscriberKey
AND s.status NOT IN ('unsubscribed','held')
WHERE
(
(
(
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate < '2014-05-31'
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Prom' AND
r.RegistrationEventRole ='Prom' AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2014-01-01' AND '2015-12-31'
)
)
AND
(
c.Email in (
select s.SubscriberKey
from OpenOrClickDE s
where s.SubscriberKey = c.Email
)
)
)
OR
(
r.RegistrationEmailOptStatus = '1' AND
r.RegistrationEventType = 'Wedding' AND
r.RegistrationEventRole IN ('Bride','Other','Bridesmaid','Mother Of the Bride') AND
r.RegistrationCountry IN ('USA') AND
r.RegistrationEventDate BETWEEN '2015-05-01' AND '2015-05-31'
)
)
I'll take a shot. There may be some minor stuff, but the only thing that looks to me like it should make the query spin for a long time is
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Open s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
OR
c.Email IN
(
SELECT DISTINCT
s.SubscriberKey AS Email
FROM
_Click s
WHERE
datediff(mm,s.EventDate, getdate()) <= 3
)
There are two issues in there. First you are doing date math a bajillion times and second using IN (SELECT ...) here is almost certainly inefficient.
To address the first, calculate a single test date once and use that. For the second, prefer checking with EXISTS.
DECLARE @testDate DATE = DATEADD(mm,-3,GETDATE())
...
EXISTS(SELECT 1 FROM _Open s WHERE s.EventDate>@testDate AND c.Email = s.SubscriberKey)
OR EXISTS(SELECT 1 FROM _Click s WHERE s.EventDate>@testDate AND c.Email = s.SubscriberKey)
You could also probably unwind the EXISTS and use joins to _Open and _Click, but that feels more complex.
Give this a shot and let us know if it helps.
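
A small sketch of the "compute the cutoff once, then test with EXISTS" idea, using Python's sqlite3 module. Everything here is an assumption for illustration: the table names contacts, open_events and click_events, the sample rows, the fixed "today", and the 90-day window used as a rough stand-in for three months. The cutoff is passed in as a query parameter, which may not be possible on a platform that only accepts SELECT statements.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-ins for Contacts, _Open and _Click.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (Email TEXT);
CREATE TABLE open_events (SubscriberKey TEXT, EventDate TEXT);
CREATE TABLE click_events (SubscriberKey TEXT, EventDate TEXT);
INSERT INTO contacts VALUES
  ('a@example.com'), ('b@example.com'), ('c@example.com'), ('d@example.com');
INSERT INTO open_events VALUES
  ('a@example.com', '2024-06-01'),
  ('c@example.com', '2023-01-01');
INSERT INTO click_events VALUES ('b@example.com', '2024-05-20');
""")

# The cutoff is computed once, outside the query, instead of doing date
# math for every event row. The date is fixed so the example is repeatable.
today = date(2024, 6, 15)
cutoff = (today - timedelta(days=90)).isoformat()

recent = [r[0] for r in conn.execute("""
SELECT c.Email
FROM contacts c
WHERE EXISTS (SELECT 1 FROM open_events o
              WHERE o.SubscriberKey = c.Email AND o.EventDate > ?)
   OR EXISTS (SELECT 1 FROM click_events k
              WHERE k.SubscriberKey = c.Email AND k.EventDate > ?)
ORDER BY c.Email
""", (cutoff, cutoff))]

print(recent)
```

Comparing the bare EventDate column to a constant also keeps the filter in a form an index on EventDate could serve, unlike datediff(mm, s.EventDate, getdate()), which wraps the column in a function.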

Count based on subset of data

I have a join to a table and I want to include all users who have a record after a certain date, but to only include records after another date in the count.
Here is my SQL :
select a.userid, count(ce.codeentryid)
from [account] a
inner join [profile] p
on a.userid = p.userid
inner join codesentered ce
on a.userid = ce.userid and ce.entrydate > '2011-01-01 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid
So here I want to view a list of all users who have entered a code after 1st Jan 2011, but to only include in the count codes entered after 1st Jan 2013. How would I do this?
EDIT : So this would give me all users who have entered a code after 01/01/2011, but only include codes entered after 01/01/2013 in the count?
select a.userid, count(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
from [account] a
inner join [profile] p
on a.userid = p.userid
inner join codesentered ce
on a.userid = ce.userid and ce.entrydate > '2011-01-01 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid
Remove the date condition from the ON clause, and use this in the SELECT clause instead of COUNT(ce.codeentryid):
SUM(CASE WHEN ce.entrydate > '2011-01-01 00:00:00' THEN 1 ELSE 0 END)
Your question doesn't make sense, because using two dates is redundant. Unless I assume that you want users whose first count is after 2011-01-01 and then only count what happens after 2013-01-01.
If that is what you want, then use a having clause:
select a.userid, sum(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
from [account] a inner join
[profile] p
on a.userid = p.userid inner join
codesentered ce
on a.userid = ce.userid
where a.camp = 0
group by a.userid
having min(ce.entrydate) > '2011-01-01 00:00:00'
order by a.userid;
Note that count(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END) is the same as count(*). count() counts non-null values. Use sum() instead.
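
The difference between COUNT and SUM over a CASE expression is easy to confirm on a few rows. A minimal sketch with Python's sqlite3 module follows; the codesentered rows are made up and the joins to account and profile are left out.

```python
import sqlite3

# Hypothetical code entries: user 1 has one old and two recent entries,
# user 2 has a single entry before 2013.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE codesentered (userid INTEGER, entrydate TEXT);
INSERT INTO codesentered VALUES
  (1, '2011-06-01 00:00:00'),
  (1, '2013-02-01 00:00:00'),
  (1, '2013-03-01 00:00:00'),
  (2, '2012-05-01 00:00:00');
""")

rows = conn.execute("""
SELECT userid,
       COUNT(*)                                                    AS all_rows,
       COUNT(CASE WHEN entrydate > '2013-01-01 00:00:00'
                  THEN 1 ELSE 0 END)                               AS count_else_0,
       COUNT(CASE WHEN entrydate > '2013-01-01 00:00:00'
                  THEN 1 END)                                      AS count_no_else,
       SUM(CASE WHEN entrydate > '2013-01-01 00:00:00'
                THEN 1 ELSE 0 END)                                 AS sum_else_0
FROM codesentered
GROUP BY userid
ORDER BY userid
""").fetchall()

print(rows)
```

COUNT with ELSE 0 matches COUNT(*) because 0 is not NULL. Dropping the ELSE (so non-matching rows become NULL) or switching to SUM gives the intended count.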