Month number retention cohort calculation issue with Redshift

I'm trying to calculate user retention cohorts in Redshift by month for the last 9 months, but the month cohorts in the query below aren't being rolled into the correct month.
The columns I'm querying have these data types:
userid - varchar
activated - varchar
Here is the query I'm trying to run:
with by_month as
(SELECT
userid,
DATE_TRUNC('month', cast ("activated" as date)) AS joined_month
FROM customers
GROUP BY 1, 2),
first_month as
(select userid,
joined_month,
FIRST_VALUE(order_month) OVER (PARTITION BY userid ORDER BY
joined_month asc rows unbounded preceding) AS first
FROM by_month),
months as (select userid,
joined_month,
first,
extract(month from (joined_month - first)) as month_number
from first_month)
SELECT
first as "cohort",
SUM(CASE WHEN month_number = '0' THEN 1 ELSE 0 END) AS " Month 0",
SUM(CASE WHEN month_number = '1' THEN 1 ELSE 0 END) AS " Month 1",
SUM(CASE WHEN month_number = '2' THEN 1 ELSE 0 END) AS " Month 2",
SUM(CASE WHEN month_number = '3' THEN 1 ELSE 0 END) AS " Month 3",
SUM(CASE WHEN month_number = '4' THEN 1 ELSE 0 END) AS " Month 4",
SUM(CASE WHEN month_number = '5' THEN 1 ELSE 0 END) AS " Month 5",
SUM(CASE WHEN month_number = '6' THEN 1 ELSE 0 END) AS " Month 6",
SUM(CASE WHEN month_number = '7' THEN 1 ELSE 0 END) AS " Month 7",
SUM(CASE WHEN month_number = '8' THEN 1 ELSE 0 END) AS " Month 8",
SUM(CASE WHEN month_number = '9' THEN 1 ELSE 0 END) AS " Month 9"
from months
where first >= '2018-08-01'
GROUP BY 1
ORDER BY 1 desc
When I get the results back, I get an impossible number for a couple of cohorts, such as:
Cohort        Month 0  Month 1
'2019-01-01'  95       120
I did some digging and found that the month numbers aren't being counted correctly. For instance, for the cohort of '2019-01-01', month_number captures 0, 1, and 3 correctly, but 2 is being misattributed to month 1. Any help on the fix would be much appreciated, thank you!

To narrow it down, try:
SELECT userid, joined_month, first, month_number FROM months
WHERE first = '2019-01-01'
Feel free to add other columns (activated, order_month, etc.) to drill down until you get a handle on what is causing the problem.
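One possible cause, consistent with the symptom in the question (months 0, 1 and 3 correct, month 2 counted as month 1), is that subtracting two dates yields a number of days, and the month number is then derived from that day count rather than from the calendar. The Python sketch below only illustrates the arithmetic. The 30-days-per-month conversion is an assumption about how the day interval gets turned into months, not confirmed engine behavior, and both function names are made up for this example.

```python
from datetime import date


def months_by_days(start, end, days_per_month=30):
    """Month number derived from the day count (assumed 30 days per month)."""
    return (end - start).days // days_per_month


def months_by_calendar(start, end):
    """Month number derived from the calendar year and month parts."""
    return (end.year - start.year) * 12 + (end.month - start.month)


cohort = date(2019, 1, 1)
for month in (1, 2, 3, 4):
    active = date(2019, month, 1)
    # Columns: activity month, day-based month number, calendar month number
    print(active, months_by_days(cohort, active), months_by_calendar(cohort, active))
```

January to March 2019 spans 59 days, so the day-based version puts March in month 1 while the calendar version puts it in month 2. In Redshift, DATEDIFF(month, first, joined_month) counts calendar month boundaries and may be a safer way to compute month_number.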

Related

infuse a sum of the value in the another column with a different filter than the total count column

First here's a sample table.
Provider_name  patient     date        status  length
AF             AGUIR00001  07/05/2018  3       30
AF             ABBOT00001  07/05/2018          30
BB             ADAMS00001  07/05/2018  3       30
BB             ACEVE00001  07/06/2018  3       30
I have created a query that counts the total number of appointments versus the number of appointments with a certain status (e.g. checked out), grouped by provider.
select provider_name,
count(patient) total,
sum(case when status = 3 then 1 else 0 end) as Checkedout
from appointment
group by provider_name
Then I moved on to the next phase, which was to get the total length of the appointments with checked-out status. I made this query, but the last column is not broken down by provider.
select provider_name,
count(patient) total,
sum(case when status = 3 then 1 else 0 end) as Checkedout,
(select sum(length) from appointment where status = 3
and date between '06/01/2018' and '07/06/2018')
from appointment where date between '06/01/2018' and '07/06/2018'
group by provider_name
I need it so that the last column in the query is segregated per provider_name.
Thank you in advance for helping me out.

You were on the right track; try this:
select provider_name,
count(patient) total,
sum(case when status = 3 then 1 else 0 end) as Checkedout,
sum(case when status = 3 then length else 0 end) as len_status3
from appointment
where date between '2018-01-06' and '2018-06-07'
group by provider_name;
According to your last comment, you need the WITH ROLLUP modifier for GROUP BY, as in the following:
select coalesce(provider_name,'Total') as provider_name,
count(patient) total,
sum(case when status = 3 then 1 else 0 end) as Checkedout,
sum(case when status = 3 then length else 0 end) as len_status3
from appointment
where date between '2018-01-06' and '2018-06-07'
group by provider_name with rollup;

You should do the same as you did for Checkedout:
select provider_name,
count(patient) total,
sum(case when status = 3 then 1 else 0 end) as Checkedout,
sum(case when status = 3 then length else 0 end) as total_length
from appointment where date between '06/01/2018' and '07/06/2018'
group by provider_name
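The conditional-aggregation pattern from the answers can be tried on the sample rows with a small in-memory SQLite database. This is only a sketch: the table layout is inferred from the sample data, the dates are stored as ISO strings on the assumption that the question's dates are MM/DD/YYYY, and SQLite stands in for whatever engine the question uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointment ("
    "provider_name TEXT, patient TEXT, date TEXT, status INTEGER, length INTEGER)"
)
conn.executemany(
    "INSERT INTO appointment VALUES (?, ?, ?, ?, ?)",
    [
        ("AF", "AGUIR00001", "2018-07-05", 3, 30),
        ("AF", "ABBOT00001", "2018-07-05", None, 30),  # no status recorded
        ("BB", "ADAMS00001", "2018-07-05", 3, 30),
        ("BB", "ACEVE00001", "2018-07-06", 3, 30),
    ],
)

# Each SUM(CASE ...) only adds up the rows that match its own condition,
# so the count and the length total are both split per provider.
rows = conn.execute(
    """
    SELECT provider_name,
           COUNT(patient) AS total,
           SUM(CASE WHEN status = 3 THEN 1 ELSE 0 END) AS checkedout,
           SUM(CASE WHEN status = 3 THEN length ELSE 0 END) AS len_status3
    FROM appointment
    WHERE date BETWEEN '2018-06-01' AND '2018-07-06'
    GROUP BY provider_name
    ORDER BY provider_name
    """
).fetchall()
print(rows)  # [('AF', 2, 1, 30), ('BB', 2, 2, 60)]
```

The scalar subquery in the question is evaluated once over the whole table, which is why it returned the same total on every row; moving the condition inside the aggregate ties it to the GROUP BY.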

Counts based on hour SQL

How can I get the count per hour?
select count_hr_1, count_hr_2, count_hr_3 from db.table where year=2018 and month=01 and day=02 and hour=01 OR hour=02 OR hour=03;
This query is probably invalid, but I want to get the counts for hours 1, 2, and 3.

If you are able to process the result set as multiple rows instead of one, you could use GROUP BY:
select hour, count(*)
from db.table
where
year = 2018
and month = 1
and day = 2
and hour in (1, 2, 3)
group by hour

select sum(case when hour = 1 then 1 else 0 end) as count_hr_1,
sum(case when hour = 2 then 1 else 0 end) as count_hr_2,
sum(case when hour = 3 then 1 else 0 end) as count_hr_3
from db.table
where year = 2018
and month = 1
and day = 2
and hour in (1,2,3)
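Both shapes of result (one row per hour, or one row with a column per hour) can be compared on a small in-memory SQLite table. This is a sketch under assumptions: the table name events and its rows are invented for illustration, since the question does not show its data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (year INTEGER, month INTEGER, day INTEGER, hour INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (2018, 1, 2, 1), (2018, 1, 2, 1),
        (2018, 1, 2, 2),
        (2018, 1, 2, 3), (2018, 1, 2, 3), (2018, 1, 2, 3),
        (2018, 1, 2, 4),   # same day, hour outside the requested range
        (2018, 1, 3, 2),   # requested hour, but a different day
    ],
)

# IN (...) keeps the hour filter inside the AND chain; a bare
# "... AND hour = 1 OR hour = 2" would also match hour 2 on any other day,
# because AND binds more tightly than OR.
where = "WHERE year = 2018 AND month = 1 AND day = 2 AND hour IN (1, 2, 3)"

# One row per hour
by_hour = conn.execute(
    f"SELECT hour, COUNT(*) FROM events {where} GROUP BY hour ORDER BY hour"
).fetchall()

# One row, one column per hour
pivot = conn.execute(
    f"""
    SELECT SUM(CASE WHEN hour = 1 THEN 1 ELSE 0 END) AS count_hr_1,
           SUM(CASE WHEN hour = 2 THEN 1 ELSE 0 END) AS count_hr_2,
           SUM(CASE WHEN hour = 3 THEN 1 ELSE 0 END) AS count_hr_3
    FROM events {where}
    """
).fetchone()

print(by_hour)  # [(1, 2), (2, 1), (3, 3)]
print(pivot)    # (2, 1, 3)
```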

SQL Query - Group and SUM of Values

I have the following database structure:
ID | Payment (Decimal) | PaymentDate (DateTime) | PaymentStatus(int)
I am currently able to get a grouping of all of the payments over time by year and month, and get the total across all payment statuses, using the following query:
Select
YEAR = YEAR(DueDate),
MONTH = MONTH(DueDate),
MMM = UPPER(left(DATENAME(MONTH,DueDate),3)),
Totals = sum(Payment)
from
PaymentSchedules
Where DueDate IS NOT NULL
Group by
YEAR(DueDate),
MONTH(DueDate),
DATENAME(Month,DueDate)
Order By
YEAR,
MONTH
This gives me the results I expect; so far so good.
What I would like to be able to do is have added totals for the splits in each section. So for example if each payment could be Paid (1) or Unpaid (2) or Overdue (3) I would like to not only get the number of paid / unpaid / overdue but I would also like to get the total value of unpaid items / paid items / Overdue items for each Year / Month combination.

You just need to add SUMs with CASE statements inside to only sum payments when the correct status is detected, like this:
Select YEAR = YEAR(DueDate),
MONTH = MONTH(DueDate),
MMM = UPPER(left(DATENAME(MONTH,DueDate),3)),
TotalPaid = sum(case when PaymentStatus = 1 then Payment else 0 end),
TotalUnpaid = sum(case when PaymentStatus = 2 then Payment else 0 end),
TotalOverdue = sum(case when PaymentStatus = 3 then Payment else 0 end),
Totals = sum(Payment)
from PaymentSchedules
Where DueDate IS NOT NULL
Group by YEAR(DueDate),
MONTH(DueDate),
DATENAME(Month,DueDate)
Order By YEAR,
MONTH

Since there are only 3 categories, I would suggest using a CASE expression directly.
Select
YEAR = YEAR(DueDate),
MONTH = MONTH(DueDate),
MMM = UPPER(left(DATENAME(MONTH,DueDate),3)),
Paid_sum = sum(CASE When PaymentStatus = 1 THEN Payment ELSE 0 END),
Unpaid_sum = sum(CASE When PaymentStatus = 2 THEN Payment ELSE 0 END),
Overdue_sum = sum(CASE When PaymentStatus = 3 THEN Payment ELSE 0 END),
Totals = sum(Payment)
from
PaymentSchedules
Where DueDate IS NOT NULL
Group by
YEAR(DueDate),
MONTH(DueDate),
DATENAME(Month,DueDate)
Order By
YEAR,
MONTH
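The answers above return the totals per status; the question also asks for the number of payments per status, which can follow the same pattern with 1 in place of Payment. The sketch below tries this on an in-memory SQLite table. The rows are invented, the amounts are whole numbers for simplicity, and strftime replaces the SQL Server YEAR() and MONTH() functions, so treat it as an illustration rather than a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PaymentSchedules ("
    "ID INTEGER, Payment INTEGER, DueDate TEXT, PaymentStatus INTEGER)"
)
conn.executemany(
    "INSERT INTO PaymentSchedules VALUES (?, ?, ?, ?)",
    [
        (1, 100, "2018-01-05", 1),  # paid
        (2, 50, "2018-01-20", 2),   # unpaid
        (3, 25, "2018-01-25", 2),   # unpaid
        (4, 80, "2018-02-10", 3),   # overdue
        (5, 20, "2018-02-11", 1),   # paid
    ],
)

rows = conn.execute(
    """
    SELECT CAST(strftime('%Y', DueDate) AS INTEGER) AS yr,
           CAST(strftime('%m', DueDate) AS INTEGER) AS mon,
           SUM(CASE WHEN PaymentStatus = 1 THEN 1 ELSE 0 END)       AS paid_count,
           SUM(CASE WHEN PaymentStatus = 1 THEN Payment ELSE 0 END) AS paid_total,
           SUM(CASE WHEN PaymentStatus = 2 THEN 1 ELSE 0 END)       AS unpaid_count,
           SUM(CASE WHEN PaymentStatus = 2 THEN Payment ELSE 0 END) AS unpaid_total,
           SUM(CASE WHEN PaymentStatus = 3 THEN 1 ELSE 0 END)       AS overdue_count,
           SUM(CASE WHEN PaymentStatus = 3 THEN Payment ELSE 0 END) AS overdue_total,
           SUM(Payment) AS totals
    FROM PaymentSchedules
    WHERE DueDate IS NOT NULL
    GROUP BY yr, mon
    ORDER BY yr, mon
    """
).fetchall()
for row in rows:
    print(row)
```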

Select distinct count usage divided by month

I have a table license_usage that works like a log of the licenses used each day:
ID  User  license  date
1   1     A        22/1/2015
2   1     A        23/1/2015
3   1     B        23/1/2015
4   1     A        24/1/2015
5   2     A        22/2/2015
6   2     A        23/2/2015
7   1     B        23/2/2015
For each user and month, I want to return the number of licenses used on the day of that month with the highest license usage. The result should look like:
User  Jan  Feb
1     2    1    ...
2     0    2
I know I can get the total of licenses in a month using this query:
SELECT vlu.[Userkey],
COUNT(CASE WHEN MONTH = 1 THEN 1 END) as JAN,
COUNT(CASE WHEN MONTH = 2 THEN 1 END) as FEB,
COUNT(CASE WHEN MONTH = 3 THEN 1 END) as MAR,
COUNT(CASE WHEN MONTH = 4 THEN 1 END) as APR,
COUNT(CASE WHEN MONTH = 5 THEN 1 END) as MAY,
COUNT(CASE WHEN MONTH = 6 THEN 1 END) as JUN,
COUNT(CASE WHEN MONTH = 7 THEN 1 END) as JUL,
COUNT(CASE WHEN MONTH = 8 THEN 1 END) as AUG,
COUNT(CASE WHEN MONTH = 9 THEN 1 END) as SEP,
COUNT(CASE WHEN MONTH = 10 THEN 1 END) as OCT,
COUNT(CASE WHEN MONTH = 11 THEN 1 END) as NOV,
COUNT(CASE WHEN MONTH = 12 THEN 1 END) as DEC
FROM license_usage vlu
CROSS APPLY (SELECT MONTH(vlu.EndDate)) AS CA(Month)
WHERE vlu.[EndDate] >='2015-01-01'
AND vlu.[EndDate] < '2016-01-01'
GROUP BY vlu.[Userkey]
How can I get it to return my results?
Example:
http://sqlfiddle.com/#!3/be0b4/1

I got it working by using DISTINCT inside the COUNT:
select umd.Userkey,
max(case when mm = 1 then cnt else 0 end) as Jan,
max(case when mm = 2 then cnt else 0 end) as Feb,
max(case when mm = 3 then cnt else 0 end) as Mar,
max(case when mm = 4 then cnt else 0 end) as Apr,
max(case when mm = 5 then cnt else 0 end) as May
from (select vluk.Userkey, month(vluk.EndDate) as mm, day(vluk.EndDate) as dd,
count(distinct vluk.idPackage) as cnt
from [license_usage] as vluk
where vluk.[EndDate] >= '2015-01-01' AND vluk.[EndDate] < '2016-01-01'
group by vluk.Userkey, month(vluk.EndDate), day(vluk.EndDate)
) umd
group by umd.Userkey;

If I understand correctly, you want the maximum by day usage per month for each user. The basic data you want is:
select UserKey, month(vlu.EndDate) as mm, day(vlu.EndDate) as dd,
count(distinct license) as cnt
from license_usage vlu
where vlu.EndDate >= '2015-01-01' and vlu.EndDate < '2016-01-01'
group by UserKey, month(vlu.EndDate), day(vlu.EndDate);
Then you can pivot this in several ways, such as using conditional aggregation:
select UserKey,
max(case when mm = 1 then cnt else 0 end) as Jan,
. . .
from (select UserKey, month(vlu.EndDate) as mm, day(vlu.EndDate) as dd,
count(distinct license) as cnt
from license_usage vlu
where vlu.EndDate >= '2015-01-01' AND vlu.EndDate < '2016-01-01'
group by UserKey, month(vlu.EndDate), day(vlu.EndDate)
) umd
) umd
group by UserKey;
CROSS APPLY is an interesting approach, but I can't think of a simpler way to get this information.
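The two-step approach (count distinct licenses per user and day, then take the monthly maximum) can be checked against the sample rows from the question using an in-memory SQLite table. This is a sketch: the column names user_id and used_on are chosen for this example, the dates are rewritten as ISO strings, and strftime stands in for the SQL Server month() and day() functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE license_usage (id INTEGER, user_id INTEGER, license TEXT, used_on TEXT)")
conn.executemany(
    "INSERT INTO license_usage VALUES (?, ?, ?, ?)",
    [
        (1, 1, "A", "2015-01-22"),
        (2, 1, "A", "2015-01-23"),
        (3, 1, "B", "2015-01-23"),
        (4, 1, "A", "2015-01-24"),
        (5, 2, "A", "2015-02-22"),
        (6, 2, "A", "2015-02-23"),
        (7, 1, "B", "2015-02-23"),
    ],
)

rows = conn.execute(
    """
    SELECT user_id,
           MAX(CASE WHEN mm = 1 THEN cnt ELSE 0 END) AS jan,
           MAX(CASE WHEN mm = 2 THEN cnt ELSE 0 END) AS feb
    FROM (
        -- inner step: distinct licenses per user per day
        SELECT user_id,
               CAST(strftime('%m', used_on) AS INTEGER) AS mm,
               CAST(strftime('%d', used_on) AS INTEGER) AS dd,
               COUNT(DISTINCT license) AS cnt
        FROM license_usage
        WHERE used_on >= '2015-01-01' AND used_on < '2016-01-01'
        GROUP BY user_id, mm, dd
    ) AS umd
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
print(rows)  # [(1, 2, 1), (2, 0, 1)]
```

With these rows, user 2 gets 1 for February because both February entries use license A on different days, so no single day has more than one distinct license.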

Group by year in sql

I am trying to group by year but have not been able to. I can get the column counts, but not year-wise. This is what I tried:
select t_contract ,
sum(CASE t_contract when '18' then 1 else 0 end) as XL,
sum(CASE t_contract when '01' then 1 else 0 end) as VC,
sum(CASE t_contract when '75' then 1 else 0 end) as AN,
sum(CASE t_contract when '48' then 1 else 0 end) as CS
from icps.dbo.tickets
WHERE
t_date_time_issued >= DATEADD(year, -6, GETDATE())
GROUP BY contract
But I want to add the year, which I have in the t_date_time_issued column.
My other question: I have a column called t_zone_name and I want to count all the rows where t_zone_name is like '%ICeland%'. I tried this:
sum(CASE t_zone_name like '%ICeland%' then 1 else 0 end) as ICELAND
but I get an error on the LIKE expression. Thanks in advance.
The output I want looks like:
YEAR  XL  VC  AN  CS  total
2010  50  50  50  50  200
2011  5   5   5   5   20

Try the below query:
SELECT t_contract, YEAR(t_date_time_issued) As Yr, SUM(CASE WHEN t_zone_name like '%ICeland%' THEN 1 ELSE 0 END) AS ICELAND,
SUM(CASE t_contract when '18' then 1 else 0 end) as XL,
SUM(CASE t_contract when '01' then 1 else 0 end) as VC,
SUM(CASE t_contract when '75' then 1 else 0 end) as AN,
SUM(CASE t_contract when '48' then 1 else 0 end) as CS
FROM icps.dbo.tickets
WHERE YEAR(t_date_time_issued) >= (YEAR(GetDate()) - 6)
GROUP BY t_contract, YEAR(t_date_time_issued)
You might need to change the order of t_contract and YEAR(t_date_time_issued) depending on which grouping you want to apply first.
As suggested by @ray, I have replaced DATEPART(yyyy, t_date_time_issued) >= DATEPART(yyyy, DATEADD(year, -6, GETDATE())) with year(t_date_time_issued) >= (year(GetDate()) - 6).

If you want to group by year in SQL Server, you might use:
GROUP BY DATEDIFF(year, t_date_time_issued, GETDATE())
Other database engines usually have a function to extract the year part, or you can use a substring to get the year from a date string.
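To get one row per year, as in the desired output, the year expression can be the only grouping key. The LIKE test also needs the searched form CASE WHEN condition THEN ..., because the simple form CASE column WHEN value only compares for equality. The sketch below tries both on an in-memory SQLite table. The rows are invented, strftime stands in for YEAR(), and the "last 6 years" filter is left out because the sample dates are fixed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (t_contract TEXT, t_zone_name TEXT, t_date_time_issued TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("18", "North Iceland", "2010-03-01"),
        ("01", "Reykjavik", "2010-05-01"),
        ("18", "ICeland East", "2011-01-15"),
        ("75", "Oslo", "2011-02-01"),
        ("48", "Iceland", "2011-07-04"),
        ("18", "Oslo", "2011-08-01"),
    ],
)

rows = conn.execute(
    """
    SELECT CAST(strftime('%Y', t_date_time_issued) AS INTEGER) AS yr,
           SUM(CASE t_contract WHEN '18' THEN 1 ELSE 0 END) AS XL,
           SUM(CASE t_contract WHEN '01' THEN 1 ELSE 0 END) AS VC,
           SUM(CASE t_contract WHEN '75' THEN 1 ELSE 0 END) AS AN,
           SUM(CASE t_contract WHEN '48' THEN 1 ELSE 0 END) AS CS,
           -- searched CASE: the condition follows WHEN
           SUM(CASE WHEN t_zone_name LIKE '%ICeland%' THEN 1 ELSE 0 END) AS ICELAND,
           COUNT(*) AS total
    FROM tickets
    GROUP BY yr
    ORDER BY yr
    """
).fetchall()
for row in rows:
    print(row)
```

SQLite's LIKE is case-insensitive for ASCII letters by default, so 'Iceland' and 'ICeland' both match here; other engines may depend on the column's collation.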
