Count based on subset of data - sql

I have a join to a table and I want to include all users who have a record after a certain date, but to only include records after another date in the count.
Here is my SQL :
select a.userid, count(ce.codeentryid)
from [account] a
inner join [profile] p
on a.userid = p.userid
inner join codesentered ce
on a.userid = ce.userid and ce.entrydate > '2011-01-01 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid
So here I want to view a list of all users who have entered a code after 1st Jan 2011, but to only include in the count codes entered after 1st Jan 2013. How would I do this?
EDIT : So this would give me all users who have entered a code after 01/01/2011, but only include codes entered after 01/01/2013 in the count?
select a.userid, count(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
from [account] a
inner join [profile] p
on a.userid = p.userid
inner join codesentered ce
on a.userid = ce.userid and ce.entrydate > '2011-01-01 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid

Remove the date condition from the ON clause, and use this in the SELECT clause instead of COUNT(ce.codeentryid):
SUM(CASE WHEN ce.entrydate > '2011-01-01 00:00:00' THEN 1 ELSE 0 END)

Your question doesn't make sense as written, because using two dates is redundant, unless I assume that you want users whose first code entry is after 2011-01-01 and then only count the entries made after 2013-01-01.
If that is what you want, then use a having clause:
select a.userid, sum(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)
from [account] a inner join
[profile] p
on a.userid = p.userid inner join
codesentered ce
on a.userid = ce.userid
where a.camp = 0
group by a.userid
having min(ce.entrydate) > '2011-01-01 00:00:00'
order by a.userid;
Note that count(CASE WHEN ce.entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END) is the same as count(*) here. count() counts non-NULL values, and both 1 and 0 are non-NULL, so every row is counted. Use sum() instead.
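To make the count() versus sum() point concrete, here is a minimal sketch that runs the three variants side by side. It assumes SQLite (via Python's sqlite3) as a stand-in for the asker's database, a single simplified codesentered table, and made-up rows; none of that comes from the original post.

```python
import sqlite3

# Hypothetical data: user 1 has three codes (one after 2013), user 2 has one (before 2013).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codesentered (
        codeentryid INTEGER PRIMARY KEY,
        userid      INTEGER,
        entrydate   TEXT
    );
    INSERT INTO codesentered (userid, entrydate) VALUES
        (1, '2011-06-01 00:00:00'),
        (1, '2012-03-15 00:00:00'),
        (1, '2013-05-20 00:00:00'),
        (2, '2011-02-10 00:00:00');
""")

rows = conn.execute("""
    SELECT userid,
           -- counts every row: both 1 and 0 are non-NULL
           COUNT(CASE WHEN entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END) AS count_else_0,
           -- adds up the 1s only
           SUM(CASE WHEN entrydate > '2013-01-01 00:00:00' THEN 1 ELSE 0 END)   AS sum_else_0,
           -- without ELSE the CASE yields NULL, which COUNT skips
           COUNT(CASE WHEN entrydate > '2013-01-01 00:00:00' THEN 1 END)        AS count_no_else
    FROM codesentered
    WHERE entrydate > '2011-01-01 00:00:00'
    GROUP BY userid
    ORDER BY userid
""").fetchall()

for row in rows:
    print(row)
```

Dropping the ELSE branch is the other common way to get a conditional count, since a CASE with no matching branch evaluates to NULL.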

Related

How to use count('condition') over(partition by xx) in SQL

Here is the database:
The first table is named Trip and the second table is Users.
CLIENT_ID and DRIVER_ID are foreign keys for USERS_ID in the Users table. For each day, I want to find how many orders were cancelled where both the driver and the passenger are not banned (Trip.Status != 'completed' and Users.Banned = 'No').
My code is :
SELECT
t.Request_at AS 'Day',
COUNT(Status != 'completed') OVER (PARTITION BY t.Request_at) AS 'Cancellation Num'
FROM
Trips t
JOIN
Users u1 ON t.Client_Id = u1.Users_Id AND u1.Banned = 'No'
JOIN
Users u2 ON t.Driver_Id = u2.Users_Id AND u2.Banned = 'No'
WHERE
t.Request_at >= '2013-10-01' AND t.Request_at <= '2013-10-03'
GROUP BY
t.Request_at
The results for '2013-10-01' and '2013-10-03' are right (both equal 1), but the result for '2013-10-02' is wrong: it comes out as 1 when it should be 0. I do not know where the mistake in my code is. Could someone help me?
I suspect that you really want conditional aggregation, not a window function:
SELECT t.Request_at AS Day,
SUM(CASE WHEN Status <> 'completed' THEN 1 ELSE 0 END) as num_cancellations
FROM Trips t JOIN
Users u1
ON t.Client_Id = u1.Users_Id AND
u1.Banned = 'No' JOIN
Users u2
ON t.Driver_Id = u2.Users_Id AND u2.Banned = 'No'
WHERE t.Request_at >= '2013-10-01' AND t.Request_at <= '2013-10-03'
GROUP BY t.Request_at;
Note: Only use single quotes for string and date constants. Don't use them for column aliases.
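As a rough illustration of why COUNT over a comparison misbehaves, the sketch below compares it with the conditional SUM. It assumes SQLite through Python's sqlite3, a cut-down Trips table without the Users joins, and invented rows, so treat it as a demonstration of the principle rather than a reproduction of the asker's data.

```python
import sqlite3

# Hypothetical trips: one cancellation on 2013-10-01, none on 2013-10-02.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Trips (Id INTEGER PRIMARY KEY, Status TEXT, Request_at TEXT);
    INSERT INTO Trips (Status, Request_at) VALUES
        ('completed',           '2013-10-01'),
        ('cancelled_by_driver', '2013-10-01'),
        ('completed',           '2013-10-02'),
        ('completed',           '2013-10-02');
""")

rows = conn.execute("""
    SELECT Request_at,
           -- the comparison yields 0 or 1, never NULL, so COUNT counts every row
           COUNT(Status <> 'completed') AS count_of_comparison,
           -- conditional aggregation only adds the rows that satisfy the condition
           SUM(CASE WHEN Status <> 'completed' THEN 1 ELSE 0 END) AS num_cancellations
    FROM Trips
    GROUP BY Request_at
    ORDER BY Request_at
""").fetchall()

for row in rows:
    print(row)
```

The same thing happens in the original query: a false comparison is still a non-NULL value, so COUNT includes it.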
Try this:
SELECT
t.Request_at AS 'Day',
COUNT(case when Status != 'completed' then 1 else null end) OVER (PARTITION BY t.Request_at) AS 'Cancellation Num'
FROM
Trips t
JOIN
Users u1 ON t.Client_Id = u1.Users_Id AND u1.Banned = 'No'
JOIN
Users u2 ON t.Driver_Id = u2.Users_Id AND u2.Banned = 'No'
WHERE
t.Request_at >= '2013-10-01' AND t.Request_at <= '2013-10-03'
GROUP BY
t.Request_at

Are two joins the best way to optimize this query?

I just got a challenge from school to optimise this query. This is a theoretical question.
Challenge :
SELECT TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date"),'YYYY-MM') AS "date_month",
COUNT(DISTINCT CASE WHEN (tableB."date" IS NOT NULL) THEN tableB._id ELSE NULL END) AS "tableB.countB",
COUNT(DISTINCT CASE WHEN (tableC."date" IS NOT NULL) THEN tableC._id ELSE NULL END) AS "tableC.countC"
FROM tableA AS tableA
LEFT JOIN tableB AS tableB ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableB."date"))) = (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date")))
LEFT JOIN tableC AS tableC ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableC."date"))) = (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',tableA."date")))
WHERE tableA."date" >= CONVERT_TIMEZONE ('America/Los_Angeles','UTC',DATEADD (month,-17,DATE_TRUNC('month',DATE_TRUNC('day',CONVERT_TIMEZONE ('UTC','America/Los_Angeles',GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
To optimize it, I just removed the CASE expressions from the query above. I think this will also improve the efficiency of the query:
SELECT To_char(Convert_timezone ('UTC','America/Los_Angeles',tablea."date"),'YYYY-MM') AS "date_month",
Count(DISTINCT decode(tableb."date", not null,tableb._id,null)) AS "tableB.countB",
Count(DISTINCT decode(tablec."date", not null,tablec._id,null)) AS "tableC.countC"
FROM tablea AS tablea
LEFT JOIN tableb AS tableb
ON (Date (Convert_timezone ('UTC','America/Los_Angeles',tableb."date"))) = (Date (Convert_timezone ('UTC','America/Los_Angeles',tablea."date")))
LEFT JOIN tablec AS tablec
ON (Date (Convert_timezone ('UTC','America/Los_Angeles',tablec."date"))) = (Date (Convert_timezone ('UTC','America/Los_Angeles',tablea."date")))
WHERE tablea."date" >= convert_timezone ('America/Los_Angeles','UTC',Dateadd (month,-17,Date_trunc('month',Date_trunc('day',Convert_timezone ('UTC','America/Los_Angeles',Getdate ())))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
What would you suggest? If we remove one left join and merge the statements, is that fine for optimization?
... or, use a shorter alias that actually makes the SQL shorter and cleaner. This also helps readability. Also, format it to separate the clauses (Select, From, Join, Where, Order By, Group By, Having, etc.) so they are easy to distinguish by eye, and use indentation that is consistent with the logical structure and supports, rather than hinders, your ability to tell those sections apart.
Just as an example, here's your first SQL query reformatted, but identical in logical structure to what you posted:
SELECT TO_CHAR(CONVERT_TIMEZONE ('UTC','America/Los_Angeles', a.date),'YYYY-MM') date_month,
COUNT(DISTINCT CASE WHEN (b."date" IS NOT NULL) THEN b._id ELSE NULL END) countB,
COUNT(DISTINCT CASE WHEN (c."date" IS NOT NULL) THEN c._id ELSE NULL END) countC
FROM tableA a
LEFT JOIN tableB b
ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',b.date))) =
(DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',a.date)))
LEFT JOIN tableC c
ON (DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',c.date))) =
(DATE (CONVERT_TIMEZONE ('UTC','America/Los_Angeles',a.date)))
WHERE a.date >= CONVERT_TIMEZONE ('America/Los_Angeles', 'UTC',
DATEADD (month,-17,DATE_TRUNC('month',
DATE_TRUNC('day',CONVERT_TIMEZONE ('UTC','America/Los_Angeles',
GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
Here is an optimized version
SELECT DatePart(month, DateAdd(hour, -8, a.Date)) date_month, -- shift by 8 hours explicitly; 8/24 is integer division in many databases
sum(case when b.date is Not null then 1 else 0 end) countb,
sum(case when c.date is Not null then 1 else 0 end) countc
FROM tableA a
LEFT JOIN tableB b
ON b.Date = a.Date -- Timezone offsets are not necessary,
LEFT JOIN tableC c
ON c.date = a.date -- both in same timezone
WHERE a.date >= DateAdd(hour, 8,
DATEADD (month,-17,DATE_TRUNC('month',
GETDATE () )))
GROUP BY 1
ORDER BY 1 DESC LIMIT 500;
Presumably, the _id columns are unique. So:
SELECT TO_CHAR(CONVERT_TIMEZONE('UTC','America/Los_Angeles', a."date"), 'YYYY-MM') AS date_month,
SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableB_countB,
SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableC_countC
FROM tableA a LEFT JOIN
tableB b
ON DATE(CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles', b."date")) = DATE(CONVERT_TIMEZONE ('UTC', 'America/Los_Angeles', a."date")) LEFT JOIN
tableC c
ON DATE(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', c."date")) = DATE(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', a."date"))
WHERE a."date" >= CONVERT_TIMEZONE('America/Los_Angeles', 'UTC',
DATEADD(month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
Then, the date conversions in the ON clause don't seem necessary, because the two sides are being converted from the same time zone. If the values have no time component (as suggested by a name like date), then the DATE() is not needed either:
SELECT TO_CHAR(CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', a."date"), 'YYYY-MM') AS date_month,
SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableB_countB,
SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END) AS tableC_countC
FROM tableA a LEFT JOIN
tableB b
ON b."date" = a."date" LEFT JOIN
tableC c
ON c."date" = a."date"
WHERE a."date" >= CONVERT_TIMEZONE('America/Los_Angeles', 'UTC',
DATEADD(month, -17, DATE_TRUNC('month', DATE_TRUNC('day', CONVERT_TIMEZONE('UTC', 'America/Los_Angeles', GETDATE ())))))
GROUP BY 1
ORDER BY 1 DESC
LIMIT 500;
The WHERE clause is fine. It can take advantage of an index on a(date).
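One caveat worth checking before swapping COUNT(DISTINCT ...) for SUM(CASE ...): the two only agree when each tableA row matches at most one row in tableB and one in tableC. The sketch below assumes SQLite through Python's sqlite3, drops the time zone handling, and uses invented rows in which one date matches two tableB rows and three tableC rows.

```python
import sqlite3

# Hypothetical data: a single tableA date that matches 2 rows in tableB and 3 in tableC.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (_id INTEGER PRIMARY KEY, "date" TEXT);
    CREATE TABLE tableB (_id INTEGER PRIMARY KEY, "date" TEXT);
    CREATE TABLE tableC (_id INTEGER PRIMARY KEY, "date" TEXT);
    INSERT INTO tableA ("date") VALUES ('2020-01-15');
    INSERT INTO tableB ("date") VALUES ('2020-01-15'), ('2020-01-15');
    INSERT INTO tableC ("date") VALUES ('2020-01-15'), ('2020-01-15'), ('2020-01-15');
""")

row = conn.execute("""
    SELECT COUNT(*)                                               AS joined_rows,
           COUNT(DISTINCT b._id)                                  AS distinct_b,
           SUM(CASE WHEN b."date" IS NOT NULL THEN 1 ELSE 0 END)  AS sum_b,
           COUNT(DISTINCT c._id)                                  AS distinct_c,
           SUM(CASE WHEN c."date" IS NOT NULL THEN 1 ELSE 0 END)  AS sum_c
    FROM tableA a
    LEFT JOIN tableB b ON b."date" = a."date"
    LEFT JOIN tableC c ON c."date" = a."date"
""").fetchone()

# The two joins multiply: 2 x 3 = 6 joined rows, so the SUMs overcount.
print(row)
```

If several rows can share a date, aggregating tableB and tableC separately (for example in subqueries grouped by date) before joining would avoid the multiplication; whether that applies depends on data the question does not show.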

Group By with Where

I want to do a group by on a dataset with a where clause based upon a datetime, but I need to return a count of 0 for any users in the Account table that do not meet the where date requirement. Here is my SQL statement:
select a.userid, count(c.codeentryid)
from [account] a
left join codesentered c
on a.userid = c.userid
where a.camp = 0 and c.entrydate > '2013-12-03 00:00:00'
group by a.userid
order by a.userid
Currently I get counts for all the users who meet the entrydate requirement, but how would I also return the users who don't meet this requirement with a count of 0?
You can include the condition in the join. Since it is a left outer join, it will always show all records from account, and only those of codesentered which match the condition:
select a.userid, count(c.codeentryid)
from [account] a
left outer join codesentered c
on a.userid = c.userid
/* here */ and c.entrydate > '2013-12-03 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid
When you are using a left join, all conditions on the second table should go into the on clause. Otherwise, the outer join becomes an inner join. So, try this:
select a.userid, count(c.codeentryid)
from [account] a left join
codesentered c
on a.userid = c.userid and c.entrydate > '2013-12-03 00:00:00'
where a.camp = 0
group by a.userid
order by a.userid;
Conditions on the first table in the on clause do not filter out rows of the first table. A left join returns all rows from the first table, even when the on clause evaluates to false or NULL; those rows simply get NULL for every column of the second table.
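Here is a small sketch of the difference between putting the date condition in WHERE and putting it in ON. It assumes SQLite via Python's sqlite3 and a handful of invented rows; the table and column names follow the question.

```python
import sqlite3

# Hypothetical data: user 1 has one qualifying code, user 2 only an older code, user 3 has camp = 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (userid INTEGER PRIMARY KEY, camp INTEGER);
    CREATE TABLE codesentered (codeentryid INTEGER PRIMARY KEY, userid INTEGER, entrydate TEXT);
    INSERT INTO account (userid, camp) VALUES (1, 0), (2, 0), (3, 1);
    INSERT INTO codesentered (userid, entrydate) VALUES
        (1, '2013-12-05 10:00:00'),
        (1, '2013-11-01 09:00:00'),
        (2, '2013-11-20 12:00:00');
""")

# Date filter in WHERE: rows with a NULL or old entrydate are discarded after the join.
where_rows = conn.execute("""
    SELECT a.userid, COUNT(c.codeentryid)
    FROM account a
    LEFT JOIN codesentered c ON a.userid = c.userid
    WHERE a.camp = 0 AND c.entrydate > '2013-12-03 00:00:00'
    GROUP BY a.userid
    ORDER BY a.userid
""").fetchall()

# Date filter in ON: unmatched accounts survive with NULLs, and COUNT skips the NULLs.
on_rows = conn.execute("""
    SELECT a.userid, COUNT(c.codeentryid)
    FROM account a
    LEFT JOIN codesentered c
           ON a.userid = c.userid AND c.entrydate > '2013-12-03 00:00:00'
    WHERE a.camp = 0
    GROUP BY a.userid
    ORDER BY a.userid
""").fetchall()

print(where_rows)
print(on_rows)
```

Note that the zero for user 2 relies on COUNT(c.codeentryid) rather than COUNT(*), since COUNT(*) would count the NULL-extended row as 1.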
Something like this maybe. It would be easier to test with some sample data.
select a.userid, SUM(CASE WHEN c.entrydate > '2013-12-03 00:00:00' THEN 1 ELSE 0 END)
from [account] a
left join codesentered c
on a.userid = c.userid
where a.camp = 0
group by a.userid
order by a.userid

Left Join is not working

I wrote following query
SELECT
us.Id as Id, us.Name as Name,
SUM(CASE WHEN c.isPublish = 0 THEN 1 ELSE 0 END) AS PendingCoupons,
SUM(CASE WHEN c.isPublish = 1 and convert(date,c.PublishedDate,101) >= convert(date, GETDATE(), 101) THEN 1 ELSE 0 END) AS ApprovedCouponsToday,
SUM(CASE WHEN c.isPublish = 0 and convert(date,c. CreateDate, 101) = convert(date, GETDATE(), 101) THEN 1 ELSE 0 END) AS PendingCouponsToday,
SUM(CASE WHEN c.isPublish = 1 THEN 1 ELSE 0 END) AS ApprovedCoupons,
SUM(CASE WHEN c.isPublish = 1 and c.Userid = us.Id and convert(date, c.PublishedDate, 101) >= convert(date, GETDATE(), 101) THEN 1 ELSE 0 END) AS ApprovedByUserToday,
SUM(CASE WHEN c.isPublish = 1 and c.Userid = us.Id THEN 1 ELSE 0 END) AS ApprovedByUser,
SUM(CASE WHEN c.ReviewVerify = 1 and convert(date, c.PublishedDate, 101) >= convert(date, GETDATE(), 101) THEN 1 ELSE 0 END) AS ProcessToday,
COUNT(*) AS Total
FROM
Users AS us
LEFT JOIN
Coupon c ON Userid = us.Id
GROUP BY
us.Name , us.Id
I have the following two tables
and after running the above query the result is always this
Is there an error in this query? It always returns a count of 0, even though I have almost 100 coupons for every user.
It would seem that since RIGHT JOIN works and gives correct answers (and probably a bogus user id) while a LEFT JOIN doesn't give any results related to coupons at all, the coupons are registered on a non existing user.
A LEFT JOIN keeps every row from the left table and attaches rows from the right table only where a match exists, while a RIGHT JOIN does the reverse. In other words, your rightmost table (Coupon) contains rows that have no connection to the left side (Users).
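One way to test the orphaned-coupon theory is sketched below. It assumes SQLite via Python's sqlite3, trimmed-down Users and Coupon tables, and invented rows in which every coupon points at a user id that does not exist; the real tables were not shown in the question.

```python
import sqlite3

# Hypothetical data: two users, three coupons that all reference a missing user id 99.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users  (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Coupon (Id INTEGER PRIMARY KEY, Userid INTEGER, isPublish INTEGER);
    INSERT INTO Users (Id, Name) VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Coupon (Userid, isPublish) VALUES (99, 1), (99, 0), (99, 1);
""")

# The LEFT JOIN keeps both users, but no coupon matches them, so every SUM is 0.
rows = conn.execute("""
    SELECT us.Id, us.Name,
           SUM(CASE WHEN c.isPublish = 1 THEN 1 ELSE 0 END) AS ApprovedCoupons,
           COUNT(*)    AS Total,           -- counts the NULL-extended row too
           COUNT(c.Id) AS MatchedCoupons   -- counts only real coupon rows
    FROM Users us
    LEFT JOIN Coupon c ON c.Userid = us.Id
    GROUP BY us.Name, us.Id
    ORDER BY us.Id
""").fetchall()

# Coupons whose Userid has no matching user.
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM Coupon c
    WHERE NOT EXISTS (SELECT 1 FROM Users u WHERE u.Id = c.Userid)
""").fetchone()[0]

print(rows)
print(orphans)
```

If the orphan count on the real data is close to the total number of coupons, the problem is in the data (or in which column holds the user id), not in the join type.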
I think you may need to use a RIGHT JOIN to get all records from the Coupon table.
You are not using the correct join.
Do you want to get the coupons for a user id?
Use this join:
FROM
Users AS us
LEFT JOIN
Coupon c ON us.Id = c.Userid
if this does not work then use:
FROM
Users AS us
LEFT OUTER JOIN
Coupon c ON us.Id = c.Userid

SQL input needed to get correct output

Select A.SubscriberKey, A.EventDate,B.CreatedDate
From _Click A
JOIN _ListSubscribers B
ON A.SubscriberKey = B.SubscriberKey
Where B.ListID = '10630' AND B.CreatedDate > (Select DATEADD(day,-180,getdate())) AND A.EventDate IS NULL
Group By A.SubscriberKey,B.CreatedDate, A.EventDate
At the moment, nothing is being returned. For subscribers who have not clicked on anything, I want to return the SubscriberKey (which is their email), the EventDate (the date that the click took place), and the date they were added (CreatedDate). Can anyone help point me in the right direction? Thanks everyone!
Try:
SELECT A.SubscriberKey, A.EventDate, B.CreatedDate
FROM _Click A, _ListSubscribers B
WHERE A.SubscriberKey(+) = B.SubscriberKey
AND B.ListID = '10630'
AND B.CreatedDate > (Select DATEADD(day,-180,getdate()))
AND nvl(A.EventDate,null) IS NULL
GROUP BY A.SubscriberKey,B.CreatedDate, A.EventDate
I don't see the purpose of a GROUP BY in this case. You aren't doing any aggregation.
I believe this will give the result you are looking for.
select SubscriberKey,
null as EventDate,
CreatedDate
from _ListSubscribers ls
where ListID='10630'
and CreatedDate > DateAdd( day, -180, getdate())
and not exists( select 1 from _Click c where c.SubscriberKey = ls.SubscriberKey )
The following query using an outer join should give the same results, and it is more similar to your original query.
select ls.SubscriberKey,
c.EventDate,
ls.CreatedDate
from _ListSubscribers ls
left outer join _Click c
on c.SubscriberKey = ls.SubscriberKey
where ListID='10630'
and CreatedDate > DateAdd( day, -180, getdate())
and c.EventDate is null
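The two anti-join forms above can be compared on a toy dataset. This sketch assumes SQLite via Python's sqlite3 and invented subscribers, and it leaves out the 180-day CreatedDate filter because SQLite has no DateAdd or getdate.

```python
import sqlite3

# Hypothetical data: a@ has clicked, b@ has not, c@ is on a different list.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE _ListSubscribers (SubscriberKey TEXT, ListID TEXT, CreatedDate TEXT);
    CREATE TABLE _Click (SubscriberKey TEXT, EventDate TEXT);
    INSERT INTO _ListSubscribers VALUES
        ('a@example.com', '10630', '2024-01-10'),
        ('b@example.com', '10630', '2024-01-12'),
        ('c@example.com', '999',   '2024-01-15');
    INSERT INTO _Click VALUES ('a@example.com', '2024-02-01');
""")

# Form 1: NOT EXISTS, with the subquery correlated on the outer alias ls.
not_exists_rows = conn.execute("""
    SELECT ls.SubscriberKey, NULL AS EventDate, ls.CreatedDate
    FROM _ListSubscribers ls
    WHERE ls.ListID = '10630'
      AND NOT EXISTS (SELECT 1 FROM _Click c WHERE c.SubscriberKey = ls.SubscriberKey)
    ORDER BY ls.SubscriberKey
""").fetchall()

# Form 2: LEFT JOIN, then keep only the rows where no click was found.
left_join_rows = conn.execute("""
    SELECT ls.SubscriberKey, c.EventDate, ls.CreatedDate
    FROM _ListSubscribers ls
    LEFT JOIN _Click c ON c.SubscriberKey = ls.SubscriberKey
    WHERE ls.ListID = '10630'
      AND c.EventDate IS NULL
    ORDER BY ls.SubscriberKey
""").fetchall()

print(not_exists_rows)
print(left_join_rows)
```

The LEFT JOIN form matches the NOT EXISTS form only as long as EventDate is never NULL on a real click row; testing c.SubscriberKey IS NULL instead would sidestep that assumption.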