Cohort Analysis in SQL while recounting users - sql

I'm trying to create a cohort query using SQL.
Usually with cohort analysis we look at users and check if a user who performed a specific action at a specific time and count if that user performs the same action over time.
WITH by_week
AS (SELECT
user_id,
TD_DATE_TRUNC('week', login_time) AS login_week
FROM logins
GROUP BY 1, 2),
with_first_week
AS (SELECT
user_id,
login_week,
FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
FROM by_week),
with_week_number
AS (SELECT
user_id,
login_week,
first_week,
(login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
FROM with_first_week)
SELECT
TD_TIME_FORMAT(first_week, 'yyyy-MM-dd') AS first_week,
SUM(CASE WHEN week_number = 1 THEN 1 ELSE 0 END) AS week_1,
SUM(CASE WHEN week_number = 2 THEN 1 ELSE 0 END) AS week_2,
SUM(CASE WHEN week_number = 3 THEN 1 ELSE 0 END) AS week_3,
SUM(CASE WHEN week_number = 4 THEN 1 ELSE 0 END) AS week_4,
SUM(CASE WHEN week_number = 5 THEN 1 ELSE 0 END) AS week_5,
SUM(CASE WHEN week_number = 6 THEN 1 ELSE 0 END) AS week_6,
SUM(CASE WHEN week_number = 7 THEN 1 ELSE 0 END) AS week_7,
SUM(CASE WHEN week_number = 8 THEN 1 ELSE 0 END) AS week_8,
SUM(CASE WHEN week_number = 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1
But let say now I don't care that much about first time/user-level analysis and I only want to see if my login action increases over time (i.e I want to add up logins of the first cohort during week 2 with logins of the second cohort in week 1). Is there a simple/elegant way to do this?
Edit:
Giving an example below
WeekStart Week1 Week2 Week 3
2017/05/03 66 **53** **49**
2017/05/10 (**53**+74) (**49**+70) **65**
2017/05/17 (**49**+ 70 + 45) (**65** + 80) etc.

I think you need to group by login_week instead of first_week so you count all logins during the given week in every row, not by cohort, and then you have to use >= instead of = so it will sum up this week's cohort with all older cohorts in any given row.
WITH
by_week AS (
SELECT
user_id,
TD_DATE_TRUNC('week', login_time) AS login_week
FROM logins
GROUP BY 1, 2
)
,with_first_week AS (
SELECT
user_id,
login_week,
FIRST_VALUE(login_week) OVER (PARTITION BY user_id ORDER BY login_week) AS first_week
FROM by_week
)
,with_week_number AS (
SELECT
user_id,
login_week,
first_week,
(login_week - first_week) / (24 * 60 * 60 * 7) AS week_number
FROM with_first_week
)
SELECT
TD_TIME_FORMAT(login_week, 'yyyy-MM-dd') AS login_week,
SUM(CASE WHEN week_number>= 1 THEN 1 ELSE 0 END) AS week_1,
SUM(CASE WHEN week_number>= 2 THEN 1 ELSE 0 END) AS week_2,
SUM(CASE WHEN week_number>= 3 THEN 1 ELSE 0 END) AS week_3,
SUM(CASE WHEN week_number>= 4 THEN 1 ELSE 0 END) AS week_4,
SUM(CASE WHEN week_number>= 5 THEN 1 ELSE 0 END) AS week_5,
SUM(CASE WHEN week_number>= 6 THEN 1 ELSE 0 END) AS week_6,
SUM(CASE WHEN week_number>= 7 THEN 1 ELSE 0 END) AS week_7,
SUM(CASE WHEN week_number>= 8 THEN 1 ELSE 0 END) AS week_8,
SUM(CASE WHEN week_number>= 9 THEN 1 ELSE 0 END) AS week_9
FROM with_week_number
GROUP BY 1
ORDER BY 1;

Related

SQL Basic: Group by function

I am trying to get a table that would outline two group by functions but having some minor difficulty.
select
to_char("CreateTime", 'YYYY-MM') as MonthYear,
floor(sum("Time"))::integer / 60 as "#MinutesWorkouts",
sum(case when "Type" = 29 then 1 else 0 end) as "#Streaming",
(sum(case when "Type" = 29 then "Time" else 0 end))::integer / 60 as "StreamingMinutes",
sum(case when "Type" = 9 then 1 else 0 end) as "#GuidedProgram",
sum(case when "Type" = 28 then 1 else 0 end) as "#Tall"
from
match_history
WHERE
"MachineId" = {{Machine_Id}}
GROUP by
to_char("CreateTime", 'YYYY-MM')
Ideally - I'd like to showcase Machine ID in the column along with the dates. Currently the result shows as (image link) as it only shows the monthyear as the group function. I would like to ensure it shows machine ID on the column too as I'd like to add multiple machine IDs.
Why not just add machine_id ?
select
to_char("CreateTime", 'YYYY-MM') as MonthYear,
"MachineId",
floor(sum("Time"))::integer / 60 as "#MinutesWorkouts",
sum(case when "Type" = 29 then 1 else 0 end) as "#Streaming",
(sum(case when "Type" = 29 then "Time" else 0 end))::integer / 60 as "StreamingMinutes",
sum(case when "Type" = 9 then 1 else 0 end) as "#GuidedProgram",
sum(case when "Type" = 28 then 1 else 0 end) as "#Tall"
from
match_history
GROUP by
to_char("CreateTime", 'YYYY-MM'), "MachineId"

Get data of 2 different dates in 2 different columns sql

I have a sql table having columns Name, VisitingDate, StayTime
I want a query which can give me data in which in 1 column I can get data of thismonthvisit and other column I can get data of lastmonthvisit and in 3rd column I can data of summation of StayTime of particular person .
Database Table : --
Name
VisitingDate
StayTime(in minutes)
A
2021-04-20
5
A
2021-04-21
15
A
2021-03-20
10
B
2021-03-20
5
Result Wanted : --
Name
Thismonthvisit
TotalStayTimeThismonth(in minutes)
LastmonthVisit
TotalStayTimelastmonth(in minutes)
A
2
20
1
10
B
0
0
1
5
Here is what you are looking for :
select name,
SUM(CASE WHEN FORMAT(VisitingDate, 'YYYYMM') = FORMAT(getdate(),'YYYYMM') THEN 1 ELSE 0 END) AS ThisMonthVisit,
SUM(CASE WHEN FORMAT(VisitingDate, 'YYYYMM') = FORMAT(getdate(),'YYYYMM') THEN StayTime ELSE 0 END) AS TotalStayTimeThisMonth,
SUM(CASE WHEN FORMAT(VisitingDate, 'YYYYMM') = FORMAT(dateadd(month, -1, getdate()),'YYYYMM') THEN 1 ELSE 0 END) AS LastMonthVisit,
SUM(CASE WHEN FORMAT(VisitingDate, 'YYYYMM') = FORMAT(dateadd(month, -1, getdate()),'YYYYMM') THEN StayTime ELSE 0 END) AS TotalStayTimeLastMonth
from MyTable
where FORMAT(VisitingDate, 'YYYYMM') > FORMAT(dateadd(month, -2, getdate()),'YYYYMM')
group by Name
SEE DEMO HERE
You can use aggregation:
select name,
sum(case when month(visitingdate) = month(getdate())
then 1 else 0
end) as cnt_thismonth,
sum(case when month(visitingdate) = month(getdate())
then staytime else 0
end) staytime_thismonth,
sum(case when month(visitingdate) <> month(getdate())
then 1 else 0
end) as cnt_lastmonth,
sum(case when month(visitingdate) <> month(getdate())
then staytime else 0
end) staytime_lastmonth
from t
where visitingdate >= dateadd(month, -1, datefromparts(year(getdate()), month(getdate()), 1))
group by name;

SQL row in column

I have this SQL query
SELECT COUNT(*)*100/(SELECT COUNT(*) FROM tickets WHERE status = 'closed')
FROM tickets
WHERE closed_at <= due_at
UNION
SELECT COUNT(*)*100/(SELECT COUNT(*) FROM tickets WHERE status = 'closed')
FROM tickets
WHERE closed_at > due_at;
and it returns this
ROW 1 - 35
ROW 2 - 47
but I need the return like this:
1 | 2 |
35 47
I need the returns in columns, not rows.
Thanks.
Use conditional aggregation. I would recommend:
SELECT (SUM(CASE WHEN closed_at <= due_at THEN 100.0 ELSE 0 END) /
SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
),
(SUM(CASE WHEN closed_at > due_at THEN 100.0 ELSE 0 END) /
SUM(CASE WHEN status = 'closed' THEN 1 ELSE 0 END)
)
FROM tickets ;
It seems strange that you are filtering on status = 'closed' in the denominator, but not in the numerator. If status = closed should be the filter for both, then you can simplify this to:
SELECT AVG(CASE WHEN closed_at <= due_at THEN 100.0 ELSE 0 END),
AVG(CASE WHEN closed_at > due_at THEN 100.0 ELSE 0 END)
FROM tickets
WHERE status = 'closed';

Select distinct count usage divided by month

I do have a table license_Usage which works like a log of the usage of licenses in a day
ID User license date
1 1 A 22/1/2015
2 1 A 23/1/2015
3 1 B 23/1/2015
4 1 A 24/1/2015
5 2 A 22/2/2015
6 2 A 23/2/2015
7 1 B 23/2/2015
Where I want it to return the count of licenses of the day of the month with most usage of licenses the result should look like:
User Jan Feb
1 2 1 ...
2 0 2
I know I can get the total of licenses in a month using this query:
SELECT vlu.[Userkey],
COUNT(CASE WHEN MONTH = 1 THEN 1 END) as JAN,
COUNT(CASE WHEN MONTH = 2 THEN 1 END) as FEB,
COUNT(CASE WHEN MONTH = 3 THEN 1 END) as MAR,
COUNT(CASE WHEN MONTH = 4 THEN 1 END) as APR,
COUNT(CASE WHEN MONTH = 5 THEN 1 END) as MAY,
COUNT(CASE WHEN MONTH = 6 THEN 1 END) as JUN,
COUNT(CASE WHEN MONTH = 7 THEN 1 END) as JUL,
COUNT(CASE WHEN MONTH = 8 THEN 1 END) as AUG,
COUNT(CASE WHEN MONTH = 9 THEN 1 END) as SEP,
COUNT(CASE WHEN MONTH = 10 THEN 1 END) as OCT,
COUNT(CASE WHEN MONTH = 11 THEN 1 END) as NOV,
COUNT(CASE WHEN MONTH = 12 THEN 1 END) as DEC
FROM license_usage vlu
CROSS APPLY (SELECT MONTH(vlu.EndDate)) AS CA(Month)
WHERE vlu.[EndDate] >='2015-01-01'
AND vlu.[EndDate] < '2016-01-01'
GROUP BY vlu.[Userkey]
How can I get it to return my results?
Example:
http://sqlfiddle.com/#!3/be0b4/1
Got it by using distinct on the Count (*)
select umd.pbrUserkey,
max(case when mm = 1 then cnt else 0 end) as Jan,
max(case when mm = 2 then cnt else 0 end) as Feb,
max(case when mm = 3 then cnt else 0 end) as Mar,
max(case when mm = 4 then cnt else 0 end) as Apr,
max(case when mm = 5 then cnt else 0 end) as May
from (select vluk.pbrUserkey, month(vluk.EndDate) as mm, day(vluk.EndDate) as dd,
count(distinct vluk.idPackage) as cnt
from [license_usage] as vluk
where vluk.[EndDate] >= '2015-01-01' AND vluk.[EndDate] < '2016-01-01'
group by vluk.Userkey, month(vluk.EndDate), day(vluk.EndDate)
) umd
group by umd.Userkey;
If I understand correctly, you want the maximum by day usage per month for each user. The basic data you want is:
select UserKey, month(license_usage) as mm, day(license_usage) as dd,
count(distinct license) as cnt
from license_usage vlu
where vlu.EndDate] >= '2015-01-01' and vlu.EndDate < '2016-01-01'
group by UserKey, month(license_usage), day(license_usage);
Then you can pivot this in several ways, such as using conditional aggregation:
select UserKey,
max(case when mm = 1 then cnt else 0 end) as Jan,
. . .
from (select UserKey, month(license_usage) as mm, day(license_usage) as dd,
count(distinct license) as cnt
from license_usage vlu
where vlu.EndDate] >= '2015-01-01' AND vlu.EndDate < '2016-01-01'
group by UserKey, month(license_usage), day(license_usage)
) umd
group by UserKey;
CROSS APPLY is an interesting approach, but I can't think of a simpler way to get this information.

Group by year in sql

I am trying to group by year but was not able to do.I can get the column count but not year wise. this is what i tried.
select t_contract ,
sum(CASE t_contract when '18' then 1 else 0 end) as XL,
sum(CASE t_contract when '01' then 1 else 0 end) as VC,
sum(CASE t_contract when '75' then 1 else 0 end) as AN,
sum(CASE t_contract when '48' then 1 else 0 end) as CS
from icps.dbo.tickets
WHERE
t_date_time_issued >= DATEADD(year, -6, GETDATE())
GROUP BY contract
.. but i want to add year .. where i have t_date_time _issued column.
My another query is I have a column called t_zone_name and I want to sum all the rows where t_zone_anme like '%ICeland%' an i tried this:
sum(CASE t_zone_name like '%ICeland%' then 1 else 0 end) as ICELAND
but I get an error on statement like... thanks in advance.
LIKE
YEAR XL VC AN CS total
2010 50 50 50 50 200
2011 5 5 5 5 20
Try the below query:
SELECT t_contract, YEAR(t_date_time_issued) As Yr, SUM(CASE WHEN t_zone_name like '%ICeland%' THEN 1 ELSE 0 END) AS ICELAND
SUM(CASE t_contract when '18' then 1 else 0 end) as XL,
SUM(CASE t_contract when '01' then 1 else 0 end) as VC,
SUM(CASE t_contract when '75' then 1 else 0 end) as AN,
SUM(CASE t_contract when '48' then 1 else 0 end) as CS
FROM icps.dbo.tickets
WHERE YEAR(t_date_time_issued) >= (YEAR(GetDate()) - 6)
GROUP BY t_contract, YEAR(t_date_time_issued)
You might need change the order of t_contract and YEAR(t_date_time_issued) depending on which grouping you want to apply first.
As suggested by #ray I have replaced DATEPART(yyyy, t_date_time_issued) >= DATEPART(yyyy, DATEADD(year, -6, GETDATE())) with year(t_date_time_issued) >= (year(GetDate()) - 6)
If you want to group by year, in sql server, you might
GROUP BY DATEDIFF(year,t_date_time_issued, GETDATE())
In other DB engine, usually has method to get year part, or use substring to get year part from a time string.