Calculate Churn by aggregating by date range in SQL - sql

I am trying to calculate the churn rate from a data that has customer_id, group, date. The aggregation is going to be by id, group and date. The churn formula is (customers in previous cohort - customers in last cohort)/customers in previous cohort
customers in previous cohort refers to cohorts in before 28 days
customers in last cohort refers to cohorts in last 28 days
I am not sure how to aggregate them by date range to calculate the churn.
Here is sample data that I copied from SQL Group by Date Range:
Date Group Customer_id
2014-03-01 A 1
2014-04-02 A 2
2014-04-03 A 3
2014-05-04 A 3
2014-05-05 A 6
2015-08-06 A 1
2015-08-07 A 2
2014-08-29 XXXX 2
2014-08-09 XXXX 3
2014-08-10 BB 4
2014-08-11 CCC 3
2015-08-12 CCC 2
2015-03-13 CCC 3
2014-04-14 CCC 5
2014-04-19 CCC 4
2014-08-16 CCC 5
2014-08-17 CCC 3
2014-08-18 XXXX 2
2015-01-10 XXXX 3
2015-01-20 XXXX 4
2014-08-21 XXXX 5
2014-08-22 XXXX 2
2014-01-23 XXXX 3
2014-08-24 XXXX 2
2014-02-25 XXXX 3
2014-08-26 XXXX 2
2014-06-27 XXXX 4
2014-08-28 XXXX 1
2014-08-29 XXXX 1
2015-08-30 XXXX 2
2015-09-31 XXXX 3
The goal is to calculate the churn rate every 28 days in between 2014 and 2015 by the formula given above. So, it is going to be aggregating the data by rolling it by 28 days and calculating the churn by the formula.
Here is what I tried to aggregate the data by date range:
SELECT COUNT(distinct customer_id) AS count_ids, Group,
DATE_SUB(CAST(Date AS DATE), INTERVAL 56 DAY) AS Date_min,
DATE_SUB(CURRENT_DATE, INTERVAL 28 DAY) AS Date_max
FROM churn_agg
GROUP BY count_ids, Group, Date_min, Date_max
Hope someone will help me with aggregation and churn calculation. I want to simply deduct the aggregated count_ids to deduct it from the next aggregated count_ids which is after 28 days. So this is going to be successive deduction of the same column value (count_ids). I am not sure if I have to use rolling window or simple aggregation to find the churn.

As corrected by #jarlh, it's not 2015-09-31 but 2015-09-30
You can use this to create 28 days calendar:
create table daysby28 (i int, _Date date);
insert into daysby28 (i, _Date)
SELECT i, cast('01-01-2014'as date) + i*INTERVAL '28 day'
from generate_series(0,50) i
order by 1;
After you use #jarlh churn_agg table creation he sent with the fiddle, with this query, you get what you want:
with cte as
(
select count(Customer) as TotalCustomer, Cohort, CohortDateStart From
(
select distinct a.Customer_id as Customer, b.i as Cohort, b._Date as CohortDateStart
from churn_agg a left join daysby28 b on a._Date >= b._Date and a._Date < b._Date + INTERVAL '28 day'
) a
group by Cohort, CohortDateStart
)
select a.CohortDateStart,
1.0*(b.TotalCustomer - a.TotalCustomer)/(1.0*b.TotalCustomer) as Churn from cte a
left join cte b on a.cohort > b.cohort
and not exists(select 1 from cte c where c.cohort > b.cohort and c.cohort < a.cohort)
order by 1
The fiddle of all together is here

Related

count number of records by month over the last five years where record date > select month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month end date. The below SQL code is text of the query to count valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE (dbo_Insp_Type.CERT_EXP_DTE)>=#2/1/2017#);
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query) are there other methods I can use that call for less manual input?
From this sample:
Id
CERT_EXP_DTE
1
2022-01-15
2
2022-01-23
3
2022-02-01
4
2022-02-03
5
2022-05-01
6
2022-06-06
7
2022-06-07
8
2022-07-21
9
2022-02-20
10
2021-11-05
11
2021-12-01
12
2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth
AllInspectors
ValidInspectors
2021-11
1
1
2021-12
2
1
2022-01
2
2
2022-02
3
2
2022-05
1
0
2022-06
2
2
2022-07
1
1
ID
Cert_Iss_Dte
Cert_Exp_Dte
1
1/15/2020
1/15/2022
2
1/23/2020
1/23/2022
3
2/1/2020
2/1/2022
4
2/3/2020
2/3/2022
5
5/1/2020
5/1/2022
6
6/6/2020
6/6/2022
7
6/7/2020
6/7/2022
8
7/21/2020
7/21/2022
9
2/20/2020
2/20/2022
10
11/5/2021
11/5/2023
11
12/1/2021
12/1/2023
12
12/24/2021
12/24/2023
A UNION query could calculate a record for each of 50 months but since you want 60, UNION is out.
Or a query with 60 calculated fields using IIf() and Count() referencing a textbox on form for start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null) AS Dt2,
...
FROM dbo_Insp_Type
Using the above data, following is output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in criteria and it did not make a difference for this sample data.
Dt1
Dt2
10
8
Or a report with 60 textboxes and each calls a DCount() expression with criteria same as used in query.
Or a VBA procedure that writes data to a 'temp' table.

(SQL)How to Get absence "day" from date under 1-31 column using PIVOT

I have a table abs_details that give data like follows -
PERSON_NUMBER ABS_DATE ABS_TYPE_NAME ABS_DAYS
1010 01-01-2022 PTO 1
1010 06-01-2022 PTO 0.52
1010 02-02-2022 VACATION 1
1010 03-02-2022 VACATION 0.2
1010 01-12-2021 PTO 1
1010 02-12-2021 sick 1
1010 30-12-2021 sick 1
1010 30-01-2022 SICK 1
I want this data to be displayed in the following way:
PERSON_NUMBER ABS_TYPE_NAME 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
1010 PTO 2 0.52
1010 VACATION 1 0.2
1010 SICK 1 2
For the days, 1-31 should should come in the header, if there is any absence taken on say 01st of the month or quarter passed then the value should go under 1 , if there is no value for date of the month, say no value is there from 07th-11th in the above case, then output should display the numbers but no value should be provided under it.
Is this feasible in SQL? I have an idea we can use pivot, but how to fix 1-31 header and give values underneath each day.
Any suggestions?
If I pass multiple quarter that is Q1(JAN-MAR), Q2(APR-JUN) it should sum up the values between the dates between those two quarters. if Just q1 then only q1 result
If I pass multiple month then it should display the sum of the values for an absence type in those multiple months.
I will be passing the year in the parameter and the above two should consider the year I pass.
Create a column which has all the dates, and pivot up using pivot function in oracle.
SELECT *
FROM
(
SELECT PERSON_NUMBER,
EXTRACT(DAY FROM TO_DATE(ABS_DATE)) AS DAY_X,
ABS_TYPE_NAME,
ABS_DAYS
FROM TABLE
-- Add additional filter here which you want
)
PIVOT(SUM(ABS_DAYS)
FOR DAY_X IN (0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31))
Db fiddle - https://dbfiddle.uk/?rdbms=oracle_21&fiddle=ad3af639235f7a6db415ec714a3ee0d9

SQL how to count but only count one instance if two columns match?

Wondering how to select from a table:
FIELDID personID purchaseID dateofPurchase
--------------------------------------------------
2 13 147 2014-03-21 00:00:00
3 15 165 2015-03-23 00:00:00
4 13 456 2018-03-24 00:00:00
5 1 133 2018-03-21 00:00:00
6 23 123 2013-03-22 00:00:00
7 25 456 2013-03-21 00:00:00
8 25 456 2013-03-23 00:00:00
9 22 456 2013-03-28 00:00:00
10 25 589 2013-03-21 00:00:00
11 82 147 1991-10-22 00:00:00
12 82 453 2003-03-22 00:00:00
I'd like to get a result table of two columns: weekday and the number of purchases of each weekday, but only count the distinct days of purchases if done by the same person on the same day - for example since personID 25 purchased two things on 2013-03-21, that should only count as one 'thursday' instead of 2.
Basically, if the personID and the dateofPurchase are the same for more than one row, only count it once is what I want.
Here is what I have currently: It does everything correctly except it will count the above scenario under the thursday twice, when I would only want to add one:
SELECT v.wkday as day, COUNT(*) as 'absences'
FROM dbo.AttendanceRecord pr CROSS APPLY
(VALUES (CASE WHEN DATEPART(WEEKDAY, date) IN (1, 7)
THEN 'Weekend'
ELSE DATENAME(WEEKDAY, date)
END)
) v(wkday)
GROUP BY v.wkday;
to clarify:
If an item is purchased for at least one puchaseID on a specific day they will be counted as purchased for that day, and do not need to be counted again for each new purchase ID on that day.
I think you want to count distinct persons, so that would be:
COUNT(DISTINCT personid) as absences
Note that single quotes are not appropriate around column aliases. If you need to escape them, use square braces.
EDIT:
If you want to count distinct person-days, then you can use:
COUNT(DISTINCT CONCAT(personid, ':', dateofpurchase) as absences

Take one month back in teradata sql

I have some tables (samples are brought here) like this
scores (the score is calculated once in each month for each branch_cust in the 28 for specific month)
Branch_cust model_date score
1 28/12/2013 4
1 28/01/2014 3
1 28/02/2014 2
1 28/03/2014 7
1 28/04/2014 3
1 28/05/2014 5
1 28/06/2014 6
2 28/12/2013 9
2 28/01/2014 10
2 28/02/2014 12
2 28/03/2014 11
2 28/04/2014 10
2 28/05/2014 7
2 28/06/2014 8
loans:
Branch_cust agreement_date
1 05-01-2014
1 29-01-2014
2 27-02-2014
2 28-02-2014
Loans:
desired output:
Branch_cust agreement_date loan_open_score
1 05-01-2014 4
1 29-01-2014 3
2 27-02-2014 10
2 28-02-2014 12
Logic to create the loan_open_score :
If the day in the month of the agreement_date is less then "28" then bring the score of the month previous to the month of the agreement date.
If the day is greater or equal to "28" then bring the score for the month equal to the month of the agreement date.
Example: In the sample data for branch_cust = 1 the agreement_date was 05-01-2014 - meaning - day = 5 so I need to go back to Dec 2013 and take the score from there.
Any help how to do this? thank's. I was thinking of "join" and then substract 1 in "case of.." but I don't know how to handle the case when the date is 'dd-01-YYYY' in sql-teradata.
updated : column data type of the dates are dates.
trunc(agreement_date,'mon') + 27 returns the 28th of the current month. Now you can apply some logic and join on this calculated date:
case when trunc(agreement_date,'mon') + 27 > agreement_date
then add_months(trunc(agreement_date,'mon') + 27,-1)
else trunc(agreement_date,'mon') + 27
end
Another option would be to get the latest model_date per agreement date and join it to the scores table. This way you don't have to manipulate dates.
select t.branch_cust,t.agreement_Date,s.score
from scores s
join (select distinct l.branch_cust,l.agreement_Date
,max(s.model_Date) over(partition by l.branch_cust,l.agreement_Date) as max_model_Date
from scores s
join loans l on s.branch_cust=l.branch_cust and l.agreement_Date >= s.model_Date
) t
on s.branch_cust=t.branch_cust and s.model_Date=t.max_model_Date
select *
from scores as s
join loans as l
on l.Branch_cust =
s.Branch_cust
and l.model_date =
add_months
(
trunc(S.agreement_date,'mm')+27
,case when extract(day from s.agreement_date) < 28 then -1 else 0 end
)

Usage compared to last year

I have a table that contains two columns - date an amount (usage). I want to compare my usage with last year's
Data
Date Amount
01-01-2015 23
02-01-2015 24
03-01-2015 19
04-01-2015 11
05-01-2015 5
06-01-2015 23
07-01-2015 21
08-01-2015 6
09-01-2015 26
10-01-2015 9
[...]
01-01-2016 30
02-01-2016 19
03-01-2016 5
04-01-2016 8
05-01-2016 15
06-01-2016 27
07-01-2016 19
08-01-2016 21
09-01-2016 24
10-01-2016 27
[until today's date]
The sql statement should return results up to today's date and the diff should be a running total - the amount on a day in 2016 minus the amount on the same day in 2015, plus the previous difference. Assuming today is Jan 10, the result would be:
Day Diff
01-01-2016 7 (30-23)
02-01-2016 2 (19-24+7)
03-01-2016 -12 (5-19+2)
04-01-2016 -15 (8-11-12)
05-01-2016 -5 (and so on...)
06-01-2016 -1
07-01-2016 -3
08-01-2016 12
09-01-2016 10
10-01-2016 28
I can't figure out to do this when the data is all in one table...
Thanks
Try this
SELECT t1.date Day,SUM(t2.Diff) FROM
amounts t1
INNER JOIN
(SELECT amounts.date Day, amounts.amount-prevamounts.amount Diff
FROM amounts inner join amounts prevamounts
ON amounts.date=date(prevamounts.date,'+1 years')) t2
ON t2.Day<=t1.date
group by t1.date;
demo here :
http://sqlfiddle.com/#!7/9bd4c/41