Calculating monthly churn

I am trying to calculate a monthly churn rate (for a given month: number_of_people_who_unsubscribed / number_of_subscribers_at_beginning_of_month).
I have a subscriptions table that looks like this:
id | start_date | end_date
----+------------+------------
1  | 2020-03-17 | null
2  | 2020-06-21 | 2020-09-03
I can calculate churn for a single month with a query like this:
select
    (
        /* Subscriptions that ended during January */
        select count(*)::decimal from subscriptions
        where end_date is not null
        and end_date >= '2021-01-01'
        and end_date <= '2021-01-31'
    ) /
    (
        /* Subscriptions that were active at the beginning of January */
        select count(*)::decimal from subscriptions
        where end_date is null
        or end_date >= '2021-01-01'
    ) as churn
That gives me a single percentage of users who unsubscribed during January. However, I'd like to output this percentage for every month, so I can display it as a line chart. I'm not sure where to start - it feels like I need to loop and run the query for each month but that feels wrong. How could I make the same calculation, but without specifying the month manually? We can assume that there is at least one start_date and one end_date per month, so some kind of group by might work.
Ultimately I'm looking for an output that looks something like:
month   | churn
---------+-------
2020-03 | 0.076
2020-04 | 0.081
2020-05 | 0.062

Using your data logic and a sequence of months:
select to_char(running_month, 'yyyy-mm') as "month",
    (
        /* Subscriptions that ended during the running month */
        select count(*)::numeric from subscriptions
        where end_date is not null
        and end_date >= running_month
        and end_date <= running_month + interval '1 month - 1 day'
    ) /
    (
        /* Subscriptions that were active at the beginning of the running month */
        select count(*)::numeric from subscriptions
        where end_date is null
        or end_date >= running_month
    ) as churn
from generate_series('2020-01-01'::date, '2020-12-01'::date, interval '1 month') as running_month;
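
For readers who want to check the logic outside the database, here is a minimal Python sketch of the same month-by-month calculation. The sample rows and the names `SUBSCRIPTIONS`, `next_month` and `monthly_churn` are made up for illustration. Unlike the query above, it assumes that a subscription only counts once its start date lies before the first day of the month, and it reports None for a month with no active subscribers instead of dividing by zero.

```python
from datetime import date
from typing import Optional

# Hypothetical sample rows: (start_date, end_date); None means still subscribed.
SUBSCRIPTIONS: list[tuple[date, Optional[date]]] = [
    (date(2020, 1, 10), None),
    (date(2020, 1, 15), date(2020, 2, 20)),
    (date(2020, 2, 5), date(2020, 3, 3)),
    (date(2020, 2, 25), None),
]


def next_month(d: date) -> date:
    """Return the first day of the month after d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def monthly_churn(subs, first_month: date, last_month: date) -> dict[str, Optional[float]]:
    """Churn per 'YYYY-MM' for every month from first_month to last_month."""
    result: dict[str, Optional[float]] = {}
    month = first_month
    while month <= last_month:
        following = next_month(month)
        # Assumption: "active" means started before the month and not ended before it.
        active = [
            (start, end) for start, end in subs
            if start < month and (end is None or end >= month)
        ]
        # Of those, the ones whose end_date falls inside the month (half-open range).
        ended = [end for _, end in active if end is not None and end < following]
        result[month.strftime("%Y-%m")] = len(ended) / len(active) if active else None
        month = following
    return result


churn = monthly_churn(SUBSCRIPTIONS, date(2020, 1, 1), date(2020, 3, 1))
```

The loop plays the role of generate_series: each iteration evaluates the two counts for one month, so no month has to be typed in by hand.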

Related

BigQuery SQL to change start date and end date into groups of months

I work with a hotel client that has a BigQuery database containing hotel booking data. I've shared the relevant columns in the image below, which list the name of each hotel, the arrival date of the guest, the departure date, and the revenue generated from each booking:
My problem statement is that I have to showcase how many rooms have been booked, and how much revenue has been made for each hotel every month where my final grid would look similar to this:
The important points to remember are:
the depart_dt - arrival_dt are the number of nights that the guest is staying
the Rez_rate_total / (depart_dt - arrival_dt) is the revenue made per night
My problem here is trying to figure out how to change the start date and end date columns into groups of months. The challenge comes when a guest arrives in one month and leaves in the next month. For example, Row 5 in the original data has the guest coming in on 18th July and leaving on 1st Aug - so 13 days of his stay and 13 days of revenue has to be included in July and 1 day has to be included in August.
I haven't used SQL in a while so this is as far as I got:
WITH
  temp_table AS (
    SELECT
      hotel_long_nm,
      arrival_dt,
      depart_dt,
      DATE_DIFF(depart_dt, arrival_dt, day) AS room_nights,
      rez_rate_total
    FROM
      `DATABASE.analytics.bookings` )
SELECT
  *
FROM
  temp_table
Any help would be greatly appreciated!

Consider the following approach:
with bookings as (
    select hotel_long_nm, date(arrival_dt) as arrival_dt, date(depart_dt) as depart_dt, rez_rate_total
    from project.dataset.bookings
),
tmp as (
    -- expose the dates in the reservation (excluding last day of reservation)
    select *, generate_date_array(arrival_dt, date_sub(depart_dt, interval 1 day)) as stay_dates
    from bookings
),
calc as (
    -- unnest and calculate the daily rate
    select
        hotel_long_nm,
        stay_dt,
        1 as stay_nights,
        rez_rate_total / array_length(stay_dates) as rez_rate_daily
    from tmp
    left join unnest(stay_dates) as stay_dt
),
agg as (
    -- aggregate to the year-month level
    select
        date_trunc(stay_dt, month) as year_month,
        hotel_long_nm,
        sum(stay_nights) as room_nights,
        round(sum(rez_rate_daily), 2) as rez_rate_total
    from calc
    group by 1, 2
)
select * from agg
order by hotel_long_nm, year_month
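
The same idea (expand a stay into individual nights, then group the nights by month) can be sketched in plain Python to sanity-check a single booking. The function name `nights_by_month` and the sample dates and rate are hypothetical; like the query, the sketch assumes the departure day is not a billed night and that revenue is spread evenly over the nights.

```python
from collections import defaultdict
from datetime import date, timedelta


def nights_by_month(arrival: date, depart: date, total_rate: float) -> dict[str, tuple[int, float]]:
    """Split one stay into (nights, revenue) per 'YYYY-MM', excluding the departure day."""
    nights = (depart - arrival).days
    if nights <= 0:
        # Same-day or inverted bookings have no nights to distribute.
        return {}
    nightly_rate = total_rate / nights
    counts: dict[str, int] = defaultdict(int)
    for offset in range(nights):
        night = arrival + timedelta(days=offset)
        counts[night.strftime("%Y-%m")] += 1
    return {month: (n, round(n * nightly_rate, 2)) for month, n in counts.items()}


# Hypothetical booking: arrive 30 July, leave 2 August, 300.0 in total.
split = nights_by_month(date(2021, 7, 30), date(2021, 8, 2), 300.0)
```

Here the nights of 30 and 31 July land in July and the night of 1 August lands in August, which mirrors what the generate_date_array and unnest steps do per row.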

You can consider this approach, which follows this logic:
Check whether both dates are in the same month.
If they are not in the same month, take the last date of the arrival month and subtract the arrival date from it.
Then take the first date of the departure month and subtract it from the departure date.
In this code you can see an example:
SELECT
    /* arrival date */
    CURRENT_DATE() AS the_arrival,
    /* departure date */
    DATE_ADD(CURRENT_DATE(), INTERVAL 30 DAY) AS the_depart,
    /* total nights between the arrival date and the departure date */
    DATE_DIFF(DATE_ADD(CURRENT_DATE(), INTERVAL 30 DAY), CURRENT_DATE(), DAY) AS total_room_nights,
    /* month boundaries crossed: 0 means same month, greater than 0 means a different month */
    DATE_DIFF(DATE_ADD(CURRENT_DATE(), INTERVAL 30 DAY), CURRENT_DATE(), MONTH) AS same_month,
    /* in this case the dates are in different months */
    /* last date of the arrival month minus the arrival date */
    DATE_DIFF(DATE_SUB(DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 1 MONTH), MONTH), INTERVAL 1 DAY), CURRENT_DATE(), DAY) AS total_room_nights_first_month,
    /* departure date minus the first date of the departure month; +1 for the night between the last day of the arrival month and the first day of the next month */
    DATE_DIFF(DATE_ADD(CURRENT_DATE(), INTERVAL 30 DAY), DATE_TRUNC(DATE_ADD(CURRENT_DATE(), INTERVAL 30 DAY), MONTH), DAY) + 1 AS total_room_nights_second_month
You can find more information in the documentation for the date functions.

Get average duration per week-day from a list of records with start and end date

I have an input table with three columns:
id => string
start_date => timestamptz
end_date => timestamptz
I want to get the average duration in seconds (end_date - start_date) per weekday number across all records.
My problem is: if a record spans 4 days between start_date and end_date, I want its duration split across each of those days, not attributed only to the start_date or the end_date. And if a weekday has no records at all (over 3 weeks, for example), I want that weekday to count as zero in the average.
Example :
id                   | start_date               | end_date
----------------------+--------------------------+--------------------------
1 (Friday to Sunday) | 2021-03-12T01:00:00.000Z | 2021-03-14T01:00:00.000Z
2 (Friday)           | 2021-03-12T01:00:00.000Z | 2021-03-12T05:00:00.000Z
3 (Wed.)             | 2021-03-03T16:00:00.000Z | 2021-03-03T17:00:00.000Z
Expected result (using European weekday numbering, so Sunday is 7):
weekday | avg_duration_seconds
---------+----------------------
1       | 0
2       | 0
3       | 1800
4       | 0
5       | 48600
6       | 86400
7       | 3600
Thanks for your help!

Note: the following works on Postgres as you tagged that as well. I have no idea if this works on CockroachDB as well.
You can "expand" the start/end timestamps to days by using generate_series(). To calculate the effective duration on each day, the full days need to be treated differently from the partial days at the start and end. Once those timestamps are calculated, it's easy to get the duration per day. Then do a left join on all weekdays and group by them:
select x.weekday,
avg(extract(epoch from real_end - real_start)) as duration
from generate_series(1,7) as x(weekday)
left join (
select t.id,
extract(isodow from g.dt) as weekday,
case
when start_date < g.dt then date_trunc('day', g.dt)
else start_date
end as real_start,
case
when end_date::date > g.dt then date_trunc('day', g.dt::date + 1)
else end_date
end as real_end
from the_table t
cross join generate_series(start_date, end_date, interval '1 day') as g(dt)
) t on x.weekday = t.weekday
group by x.weekday
order by x.weekday;
I am not 100% sure my expressions for "real_start" and "real_end" cover all corner cases, but it should be enough to get you started.
This gives a slightly different result than your expected one, because you have the weekdays wrong for 2021-03-02 and 2021-03-11.
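
To make the "split at day boundaries" step concrete, here is a small Python sketch that cuts one record at each midnight and sums the seconds per ISO weekday. The function name `seconds_per_weekday` is hypothetical, and the sketch assumes timezone-aware UTC timestamps, so day boundaries are UTC midnights. It only does the per-record split; averaging across records is left out.

```python
from datetime import datetime, timedelta, timezone


def seconds_per_weekday(start: datetime, end: datetime) -> dict[int, float]:
    """Split [start, end) at each midnight and sum seconds per ISO weekday (Mon=1 .. Sun=7)."""
    totals = {weekday: 0.0 for weekday in range(1, 8)}
    cursor = start
    while cursor < end:
        # Midnight that begins the day after the cursor's day.
        next_midnight = (cursor + timedelta(days=1)).replace(
            hour=0, minute=0, second=0, microsecond=0
        )
        piece_end = min(next_midnight, end)
        totals[cursor.isoweekday()] += (piece_end - cursor).total_seconds()
        cursor = piece_end
    return totals


# Record 1 from the question: Friday 01:00 UTC to Sunday 01:00 UTC.
totals = seconds_per_weekday(
    datetime(2021, 3, 12, 1, 0, tzinfo=timezone.utc),
    datetime(2021, 3, 14, 1, 0, tzinfo=timezone.utc),
)
```

For that record the Friday gets the 23 hours from 01:00 to midnight, the Saturday gets a full day, and the Sunday gets the first hour.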

how to generate_date_array unnest with end_date after current_date but in results show me till current_date

WITH dates AS (
SELECT `day`
FROM UNNEST(GENERATE_DATE_ARRAY('2020-11-11', CURRENT_DATE(), INTERVAL 1 DAY)) `day`
)
The above gets the dates till current day.
The below where I add +60 at the end_date gets dates till after 60 days from current date.
WITH dates AS (
SELECT `day`
FROM UNNEST(GENERATE_DATE_ARRAY('2020-11-11', CURRENT_DATE()+60, INTERVAL 1 DAY)) `day`
)
I want to count records whose set_at_date lies between the current date and a future date, for example the number of bookings from the current day until 60 days later, but without showing the future dates in the results. I only want dates and counts up to today, like this:
date       | bookings
------------+----------
2021-02-26 | 30
2021-02-25 | 32
2021-02-24 | 28

Customizing the range of a week with date_trunc

I've been trying for hours now to write a date_trunc statement to be used in a group by where my week starts on a Friday and ends the following Thursday.
So something like
SELECT
DATE_TRUNC(...) sales_week,
SUM(sales) sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
Which would return the results for the last complete week (by those standards) as 09-13-2019.

You can subtract 4 days and then add 4 days:
SELECT DATE_TRUNC('week', <whatever> - INTERVAL '4 DAY') + INTERVAL '4 DAY' as sales_week,
SUM(sales) as sales
FROM table
GROUP BY 1
ORDER BY 1 DESC
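
The shift-truncate-shift trick works because a week truncated to Monday plus 4 days is a Friday. As a rough cross-check outside SQL, the equivalent computation in Python looks like this; the function name `friday_week_start` is made up for the example.

```python
from datetime import date, timedelta


def friday_week_start(d: date) -> date:
    """Return the Friday that starts the Friday-to-Thursday week containing d."""
    # date.weekday(): Monday=0 ... Friday=4 ... Sunday=6
    days_since_friday = (d.weekday() - 4) % 7
    return d - timedelta(days=days_since_friday)


# Every day from Friday 2019-09-13 to Thursday 2019-09-19 maps to the same week start.
week = [friday_week_start(date(2019, 9, 13) + timedelta(days=i)) for i in range(7)]
```

Grouping rows by this value gives the same buckets as grouping by the truncated expression in the query.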

The expression
select current_date - ((extract(dow from current_date)::int + 2) % 7);
should give you the most recent Friday (the current date itself when today is a Friday).

If by any chance you have gaps in the data (for example with breakdowns more granular than per week), you can generate a set of custom weeks and left join to that:
drop table if exists sales_weeks;
create table sales_weeks as
with
dates as (
select generate_series('2019-01-01'::date,current_date,interval '1 day')::date as date
)
,week_ids as (
select
date
,sum(case when extract('dow' from date)=5 then 1 else 0 end) over (order by date) as week_id
from dates
)
select
week_id
,min(date) as week_start_date
,max(date) as week_end_date
from week_ids
group by 1
order by 1
;

Count full months between two dates

I've been working on this for a few hours with no luck and have hit a wall. My data looks like this:
Date1 Date2
2012-05-06 2012-05-05
2012-03-20 2012-01-05
What I'm trying to do is add 1 to the count for every month between two dates. So my output would ideally look like this:
Year Month Sum
2012 2 1
In other words, it should check for "empty" months between two dates and add 1 to them.
This is the code I've worked out so far. It will basically count the number of months between the two dates and group them into months and years.
SELECT
EXTRACT(YEAR FROM Date2::date) as "Year",
EXTRACT(MONTH FROM Date2::date) as "Month",
SUM(DATE_PART('year', Date1::date) - DATE_PART('year', Date2::date)) * 12 +
(DATE_PART('month', Date1::date) - DATE_PART('month', Date2::date))
FROM
test
GROUP BY
"Year",
"Month"
ORDER BY
"Year" DESC,
"Month" DESC;
This is where I'm stuck - I don't know how to actually add 1 for each of the "empty" months.

Test setup
With some sample rows (should be provided in the question):
CREATE TABLE test (
test_id serial PRIMARY KEY
, date1 date NOT NULL
, date2 date NOT NULL
);
INSERT INTO test(date1, date2)
VALUES
('2012-03-20', '2012-01-05') -- 2012-02 lies in between
, ('2012-01-20', '2012-03-05') -- 2012-02 (reversed)
, ('2012-05-06', '2012-05-05') -- nothing
, ('2012-05-01', '2012-06-30') -- still nothing
, ('2012-08-20', '2012-11-05') -- 2012-09 - 2012-10
, ('2012-11-20', '2013-03-05') -- 2012-12 - 2013-02
;
Postgres 9.3 or newer
Use a LATERAL join:
SELECT to_char(mon, 'YYYY') AS year
, to_char(mon, 'MM') AS month
, count(*) AS ct
FROM (
SELECT date_trunc('mon', least(date1, date2)::timestamp) + interval '1 mon' AS d1
, date_trunc('mon', greatest(date1, date2)::timestamp) - interval '1 mon' AS d2
FROM test
) sub1
, generate_series(d1, d2, interval '1 month') mon -- implicit CROSS JOIN LATERAL
WHERE d2 >= d1 -- exclude ranges without gap right away
GROUP BY mon
ORDER BY mon;
What is the difference between LATERAL and a subquery in PostgreSQL?
Postgres 9.2 or older
No LATERAL, yet. Use a subquery instead:
SELECT to_char(mon, 'YYYY') AS year
, to_char(mon, 'MM') AS month
, count(*) AS ct
FROM (
SELECT generate_series(d1, d2, interval '1 month') AS mon
FROM (
SELECT date_trunc('mon', least(date1, date2)::timestamp) + interval '1 mon' AS d1
, date_trunc('mon', greatest(date1, date2)::timestamp) - interval '1 mon' AS d2
FROM test
) sub1
WHERE d2 >= d1 -- exclude ranges without gap right away
) sub2
GROUP BY mon
ORDER BY mon;
Result
year | month | ct
------+-------+----
2012 | 02    | 2
2012 | 09    | 1
2012 | 10    | 1
2012 | 12    | 1
2013 | 01    | 1
2013 | 02    | 1
Explanation
You are looking for complete calendar months between the two dates.
These queries work with any dates or timestamps in ascending or descending order and should perform well.
The WHERE clause is optional, since generate_series() returns no row if start > end. But it should be a bit faster to exclude empty ranges a priori.
The cast to timestamp makes it a bit cleaner and faster. Rationale:
Generating time series between two dates in PostgreSQL
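
As a way to double-check the expected counts without a database, the same "complete calendar months in between" rule can be sketched in Python. The names `full_months_between`, `ROWS` and `counts` are hypothetical; the rows are the sample rows from the test setup.

```python
from collections import Counter
from datetime import date


def full_months_between(a: date, b: date) -> list[str]:
    """List the complete calendar months strictly between the months of a and b."""
    lo, hi = min(a, b), max(a, b)
    first = lo.year * 12 + (lo.month - 1) + 1  # index of the month after lo's month
    last = hi.year * 12 + (hi.month - 1) - 1   # index of the month before hi's month
    return [f"{idx // 12:04d}-{idx % 12 + 1:02d}" for idx in range(first, last + 1)]


# Sample rows from the test setup above.
ROWS = [
    (date(2012, 3, 20), date(2012, 1, 5)),
    (date(2012, 1, 20), date(2012, 3, 5)),
    (date(2012, 5, 6), date(2012, 5, 5)),
    (date(2012, 5, 1), date(2012, 6, 30)),
    (date(2012, 8, 20), date(2012, 11, 5)),
    (date(2012, 11, 20), date(2013, 3, 5)),
]

counts = Counter(month for a, b in ROWS for month in full_months_between(a, b))
```

An empty range simply yields no months, which corresponds to generate_series() returning no row when the start is after the end.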

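AFAIK you can simply subtract/add timestamps in PostgreSQL:
'2001-06-27 14:43:21'::timestamp - '2001-06-27 14:33:21'::timestamp = '00:10:00'::interval
Such a difference only carries days and time, so its month field is always 0. To extract months, that part of your query should use age(), which returns an interval with years, months and days:
DATE_PART('month', age(Date1::timestamp, Date2::timestamp)) as "MonthInterval"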

age(timestamp1, timestamp2) returns an interval.
Then we extract the year and month from the interval and add them up accordingly:
select extract(year from age(timestamp1, timestamp2)) * 12 + extract(month from age(timestamp1, timestamp2))
