Repeated select statement for retention analysis - sql

I am working on retention analysis for a subscription product using a temporary table that logs all customer entitlements. Each line in the table has:
customer_id, product_id, start_date, expiration_date, product_serial_number, ...
customer_id is unique. If a customer renews the product at the expiration date, the new expiration date is updated in the table daily.
For the monthly analysis, I usually count customer_id grouped by YYYY-MM (extracted from start_date) as the cohort base. Then I add SUM/CASE expressions such as:
sum(case when expiration_date >= ADD_MONTHS(start_date,1) then 1 else 0 end) as 'T+1M'
sum(case when expiration_date >= ADD_MONTHS(start_date,2) then 1 else 0 end) as 'T+2M'
....
I hardcode a "sum + case when" line for every period I want to check for retention. This works well for the monthly view. But now I want to look more closely, at the daily level, at customer behavior during the 60 days before the expiration date. Is there any way more efficient than writing 60 lines such as:
sum(case when opt_out_date >= expiration_date - 1 then 1 else 0 end) as 'OO_T-1'
sum(case when opt_out_date >= expiration_date - 2 then 1 else 0 end) as 'OO_T-2'
....
sum(case when opt_out_date >= expiration_date - 60 then 1 else 0 end) as 'OO_T-60'
opt_out_date is the date on which a user cancels the auto-billing feature. The goal of this analysis is to see how many users manually cancel their auto-billing shortly before the expiration date.
The environment is Teradata.
Thanks a lot.
Sample data and the expected result were added as recommended.
[Images: sample data; expected result from query]
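One common alternative to 60 hand-written columns is to unpivot the problem: cross join the entitlements with a small set of day offsets (1 to 60) and group by the offset, so each offset becomes a row instead of a column. The sketch below runs that idea against an in-memory SQLite database through Python's standard sqlite3 module. The table name entitlements, its columns, and the rows are hypothetical, and SQLite's date(x, '-N days') stands in for Teradata's date arithmetic. In Teradata, the offsets could come from any table holding the integers 1 to 60 rather than the recursive CTE used here.

```python
import sqlite3

# Hypothetical table and rows; the question's real sample data is not shown.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entitlements (
        customer_id     INTEGER PRIMARY KEY,
        expiration_date TEXT,  -- ISO dates compare correctly as text
        opt_out_date    TEXT   -- NULL if auto-billing was never cancelled
    )
""")
conn.executemany(
    "INSERT INTO entitlements VALUES (?, ?, ?)",
    [
        (1, "2020-03-31", "2020-03-30"),  # opted out 1 day before expiry
        (2, "2020-03-31", "2020-03-01"),  # opted out 30 days before expiry
        (3, "2020-03-31", None),          # never opted out
    ],
)

# The recursive CTE acts as a numbers table holding 1..60.
# Each offset n yields one row with the same test as the hand-written 'OO_T-n' column.
query = """
WITH RECURSIVE offsets(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM offsets WHERE n < 60
)
SELECT o.n AS days_before_expiry,
       SUM(CASE WHEN e.opt_out_date >= date(e.expiration_date, '-' || o.n || ' days')
                THEN 1 ELSE 0 END) AS opted_out
FROM offsets AS o
CROSS JOIN entitlements AS e
GROUP BY o.n
ORDER BY o.n
"""
rows = conn.execute(query).fetchall()
for n, opted_out in rows[:3]:
    print(f"OO_T-{n}: {opted_out}")
```

The result has one row per offset rather than one column per offset; adding the cohort (YYYY-MM) to the SELECT and GROUP BY would give one row per cohort and offset. If the report really needs 60 columns, the SUM(CASE ...) list can instead be generated as text by a script rather than typed by hand.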

Related

Converting dates into weekdays then correlating it and summing it

The query is simple but does not work the way I want.
I am trying to check whether the date I inspect falls on the weekday I am checking against.
Input
SELECT TO_CHAR(date '1982.03.09', 'DAY'),
(CASE When lower(TO_CHAR(date '1982.03.09', 'DAY')) like lower('TUESDAY')
then 1 else 0 end)
Output
The CASE expression should have returned 1.
I added lower() to check whether it had something to do with capitalization.
Reason
I use a CASE expression because, when a student has an after-school activity on Monday, I want to put a 1 or 0 in the table and then sum how many students have an after-school activity on Monday, and so on.
Need eventually
I am doing this so that I can create a table of the week with the number of children doing after-school activities on each day.
Any help regarding fixing my query would be greatly appreciated!
Thanks
to_char() blank-pads the DAY output to nine characters (the length of WEDNESDAY), so it returns TUESDAY followed by two trailing spaces. You can trim() them away, or use the FM prefix ('FMDAY') to suppress the padding. But instead of relying on a string representation (which might change with the locale), you are better off using extract() to get the day of the week as a number: 0 for Sunday, 1 for Monday, and so on.
SELECT to_char(DATE '1982.03.09', 'DAY'),
CASE
WHEN trim(to_char(DATE '1982.03.09', 'DAY')) = 'TUESDAY' THEN
1
ELSE
0
END,
CASE extract(dow FROM DATE '1982.03.09')
WHEN 2 THEN
1
ELSE
0
END;
I'm a personal fan of extract (<datepart> from <date>) in lieu of to_char for problems like this.
Based on the output you are trying to achieve, I might also recommend a poor man's pivot table:
select
student_id,
max (case when extract (dow from activity_date) = 1 then 1 else 0 end) as mo,
max (case when extract (dow from activity_date) = 2 then 1 else 0 end) as tu,
max (case when extract (dow from activity_date) = 3 then 1 else 0 end) as we,
max (case when extract (dow from activity_date) = 4 then 1 else 0 end) as th,
max (case when extract (dow from activity_date) = 5 then 1 else 0 end) as fr
from activities
where activity_date between :FROM_DATE and :THRU_DATE
group by
student_id
Normally this would be a good use case for the FILTER (WHERE ...) clause, but that would leave NULL values on student records with no activity on a given weekday. Depending on how you render your output, that may or may not be okay (Excel would handle it fine).
select
student_id,
max (1) filter (where extract (dow from activity_date) = 1) as mo,
max (1) filter (where extract (dow from activity_date) = 2) as tu,
max (1) filter (where extract (dow from activity_date) = 3) as we,
max (1) filter (where extract (dow from activity_date) = 4) as th,
max (1) filter (where extract (dow from activity_date) = 5) as fr
from activities
group by
student_id
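The pivots above return one row per student. For the weekly summary the asker ultimately wants (number of children per weekday), one option is to count distinct students inside each conditional column, which gives the same totals as summing the per-student MAX columns. The sketch below runs this in SQLite via Python's sqlite3 module; the rows are hypothetical, and SQLite's strftime('%w', ...) is used as its counterpart of extract(dow ...) (both return 0 for Sunday).

```python
import sqlite3

# Hypothetical rows; table and column names follow the answers above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (student_id INTEGER, activity_date TEXT)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [
        (1, "1982-03-08"),  # Monday
        (1, "1982-03-09"),  # Tuesday
        (2, "1982-03-09"),  # Tuesday
        (2, "1982-03-16"),  # another Tuesday: student 2 still counts once
        (3, "1982-03-12"),  # Friday
    ],
)

# COUNT(DISTINCT ...) ignores the NULLs produced by CASE without ELSE,
# so each column counts the students active at least once on that weekday.
query = """
SELECT
  COUNT(DISTINCT CASE WHEN dow = 1 THEN student_id END) AS mo,
  COUNT(DISTINCT CASE WHEN dow = 2 THEN student_id END) AS tu,
  COUNT(DISTINCT CASE WHEN dow = 3 THEN student_id END) AS we,
  COUNT(DISTINCT CASE WHEN dow = 4 THEN student_id END) AS th,
  COUNT(DISTINCT CASE WHEN dow = 5 THEN student_id END) AS fr
FROM (
  SELECT student_id,
         CAST(strftime('%w', activity_date) AS INTEGER) AS dow
  FROM activities
)
"""
week = conn.execute(query).fetchone()
print(dict(zip(["mo", "tu", "we", "th", "fr"], week)))
```

In PostgreSQL the inner expression would be extract(dow FROM activity_date), as in the answers above.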

DB2 use of labeled duration not valid with multiple date intervals

I'm trying to refactor a MySQL query to run on DB2/iSeries and I'm getting the error Use of labeled duration not valid.
Looking at the documentation I feel like the usage below should be working.
Am I missing something?
SELECT
IFNULL(SUM(CASE WHEN CURDATE() BETWEEN n.start_date AND n.expire_date
THEN 1 ELSE 0 END), 0) AS current,
IFNULL(SUM(CASE WHEN CURDATE() - 365 DAY BETWEEN n.start_date AND n.expire_date
THEN 1 ELSE 0 END), 0) AS prior,
IFNULL(SUM(CASE WHEN '2018-12-31' - 7 DAY BETWEEN n.start_date AND n.expire_date
THEN 1 ELSE 0 END), 0) AS full
FROM salesnumbers;
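The issue is likely your date arithmetic. Try using CURRENT DATE instead of CURDATE(). Also, in DB2 you can add or subtract a labeled duration (such as 1 YEAR or 7 DAY) directly from a date value; below, the string literal is wrapped in DATE() so the duration is applied to a date rather than to a string.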
SELECT
COUNT(CASE WHEN CURRENT DATE BETWEEN n.start_date AND n.expire_date
THEN 1 END) AS current,
COUNT(CASE WHEN CURRENT DATE - 1 YEAR BETWEEN n.start_date AND n.expire_date
THEN 1 END) AS prior,
COUNT(CASE WHEN DATE('2018-12-31') - 7 DAY BETWEEN n.start_date AND n.expire_date
THEN 1 END) AS full
FROM salesnumbers;
Note that I replaced your conditional sums with conditional counts. This leaves the code slightly more terse, because we don't have to spell out an explicit ELSE condition (the default being NULL).
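A side effect of switching to COUNT is worth spelling out: on an empty table, SUM(...) returns NULL while COUNT(...) returns 0, which is why the original query needed IFNULL and the COUNT version does not. The sketch below shows both behaviors in SQLite via Python's sqlite3 module. The rows and the fixed as_of date (standing in for CURRENT DATE so the output is repeatable) are hypothetical, and SQLite's date(x, '-1 year') is used in place of DB2's labeled durations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesnumbers (start_date TEXT, expire_date TEXT)")
conn.executemany(
    "INSERT INTO salesnumbers VALUES (?, ?)",
    [("2018-01-01", "2018-12-31"), ("2019-01-01", "2019-12-31")],
)

# Conditional counts: CASE without ELSE yields NULL, which COUNT skips.
query = """
SELECT
  COUNT(CASE WHEN :as_of BETWEEN start_date AND expire_date THEN 1 END) AS current_cnt,
  COUNT(CASE WHEN date(:as_of, '-1 year') BETWEEN start_date AND expire_date THEN 1 END) AS prior_cnt,
  COUNT(CASE WHEN date('2018-12-31', '-7 days') BETWEEN start_date AND expire_date THEN 1 END) AS full_cnt
FROM salesnumbers
"""
counts = conn.execute(query, {"as_of": "2019-06-15"}).fetchone()

# With no rows at all, SUM gives NULL (None in Python) but COUNT gives 0.
conn.execute("DELETE FROM salesnumbers")
empty = conn.execute(
    """
    SELECT SUM(CASE WHEN :as_of BETWEEN start_date AND expire_date THEN 1 ELSE 0 END),
           COUNT(CASE WHEN :as_of BETWEEN start_date AND expire_date THEN 1 END)
    FROM salesnumbers
    """,
    {"as_of": "2019-06-15"},
).fetchone()
print(counts, empty)
```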

Transact SQL - Table with different Record types requiring calculation

I have a table of invoices with Record_Types that I need to reconcile to an open invoice report. I have the process down and know what I need to do; I just don't know how to structure the query properly, and I would prefer not to create 3 tables.
Record types: Invoice = 1, Credit = 5, Payment = 7
Columns: Invoice_Number, Record_Type, Dollar figure
Outstanding_Balance = Invoice(1) - (Payment(7) - Credit(5))
Invoice_number Record_type Gen_Numeric_3
Basically, I need to take the record type 1 amount and subtract the total of the record type 7 amounts, as in the rows below:
Invoice_Num Rec_Type Dollar_Amt
00820437 1 536.7700000000
00820437 7 469.6200000000
00820437 7 67.1500000000
Any advice would be great.
You can do this with aggregation and case statements:
SELECT invoice_num,
SUM(CASE WHEN rec_type = 1 THEN dollar_amt ELSE 0 END)
- (SUM(CASE WHEN rec_type = 7 THEN dollar_amt ELSE 0 END)
- SUM(CASE WHEN rec_type = 5 THEN dollar_amt ELSE 0 END)) AS outstanding_balance
FROM yourtable
GROUP BY invoice_num
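To check the query against the sample rows from the question, the sketch below loads them into an in-memory SQLite table (hypothetically named invoices) via Python's sqlite3 module and runs the same conditional aggregation in a single pass. ROUND is added only because SQLite stores these amounts as binary floats; with an exact DECIMAL column in SQL Server it should not be needed.

```python
import sqlite3

# Sample rows from the question; invoice_num is TEXT to keep the leading zeros.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (invoice_num TEXT, rec_type INTEGER, dollar_amt REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("00820437", 1, 536.77), ("00820437", 7, 469.62), ("00820437", 7, 67.15)],
)

# Invoice(1) - (Payment(7) - Credit(5)), computed per invoice in one GROUP BY.
query = """
SELECT invoice_num,
       ROUND(SUM(CASE WHEN rec_type = 1 THEN dollar_amt ELSE 0 END)
             - (SUM(CASE WHEN rec_type = 7 THEN dollar_amt ELSE 0 END)
                - SUM(CASE WHEN rec_type = 5 THEN dollar_amt ELSE 0 END)), 2) AS outstanding_balance
FROM invoices
GROUP BY invoice_num
"""
balances = conn.execute(query).fetchall()
print(balances)
```

The two payments (469.62 + 67.15) cover the invoice amount of 536.77, so the outstanding balance for 00820437 comes out as zero.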

How to write this SQL without duplicating the customer?

I have a table that lists every customer every month with an active_indicator. For each customer, I want to pull the active indicator for just two months (Dec 2014 and Dec 2015), but the code below gives me a table where each customer is listed twice. I know I could add another step to roll the table up to the customer level using max, but is there any way to do this in one simple SQL query?
select distinct
customer
,case when date='2015-12-01' then active_indicator else 0 end as Dec2015_active_ind
,case when date='2014-12-01' then active_indicator else 0 end as Dec2014_active_ind
from monthly_account_cust
where date in ('2015-12-01', '2014-12-01')
order by customer
Pretty sure you are looking for something like this: group by customer and wrap each CASE in max(), so the two rows per customer collapse into one.
select
customer
, max(case when date = '2015-12-01' then active_indicator else 0 end) as Dec2015_active_ind
, max(case when date = '2014-12-01' then active_indicator else 0 end) as Dec2014_active_ind
from monthly_account_cust
where date in ('2015-12-01','2014-12-01')
group by customer
order by customer
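The sketch below shows the collapse on a few hypothetical rows, using an in-memory SQLite database through Python's sqlite3 module. Each customer has one row per month; after GROUP BY customer, MAX picks the matching month's indicator in each column, and the ELSE 0 branches from the other month's row never win over a 1. The column date is quoted defensively, since date is also a SQLite function name.

```python
import sqlite3

# Hypothetical monthly snapshot rows.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE monthly_account_cust (customer TEXT, "date" TEXT, active_indicator INTEGER)')
conn.executemany(
    "INSERT INTO monthly_account_cust VALUES (?, ?, ?)",
    [
        ("A", "2014-12-01", 1),
        ("A", "2015-12-01", 1),
        ("B", "2014-12-01", 0),
        ("B", "2015-12-01", 1),
        ("B", "2015-11-01", 1),  # filtered out by the WHERE clause
    ],
)

# One output row per customer: MAX folds the two monthly rows together.
query = """
SELECT customer,
       MAX(CASE WHEN "date" = '2015-12-01' THEN active_indicator ELSE 0 END) AS Dec2015_active_ind,
       MAX(CASE WHEN "date" = '2014-12-01' THEN active_indicator ELSE 0 END) AS Dec2014_active_ind
FROM monthly_account_cust
WHERE "date" IN ('2015-12-01', '2014-12-01')
GROUP BY customer
ORDER BY customer
"""
result = conn.execute(query).fetchall()
print(result)
```

Note that with ELSE 0, a customer who has no row for one of the months shows 0 for it, the same as an inactive month.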

SQL sum of column value, unique per user per day

I have a postgres table that looks like this:
id | user_id | state | created_at
The state can be any of the following:
new, paying, paid, completing, complete, payment_failed, completion_failed
I need a statement that returns a report with the following:
sum of all paid states by date
sum of all completed states by date
sum of all new, paying, completing states by date with only one per user per day to be counted
sum of all payment_failed, completion_failed by date with only one per user per day to be counted
So far I have this:
SELECT
DATE(created_at) AS date,
SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete,
SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)
A sum of the in progress and failed states is easy enough by adding this to the select:
SUM(CASE WHEN state IN('new','paying','completing') THEN 1 ELSE 0 END) AS in_progress,
SUM(CASE WHEN state IN('payment_failed','completion_failed') THEN 1 ELSE 0 END) AS failed
But I'm having trouble figuring out how to count the in_progress and failed states only once per user_id per day.
The reason I need this is to correct the failure rate in our stats: many users who trigger a failure or an incomplete order go on to trigger more, which inflates our failure rate.
Thanking you in advance.
SELECT created_at::date AS the_date
,SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete
,SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid
,COUNT(DISTINCT CASE WHEN state IN('new','paying','completing')
THEN user_id ELSE NULL END) AS in_progress
,COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed')
THEN user_id ELSE NULL END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY created_at::date
I use the_date as the alias, since it is unwise (though allowed) to use the keyword date as an identifier.
You could use a similar technique for complete and paid; either form works there:
COUNT(CASE WHEN state = 'complete' THEN 1 ELSE NULL END) AS complete
Try something like:
SELECT
DATE(created_at) AS date,
SUM(CASE WHEN state = 'complete' THEN 1 ELSE 0 END) AS complete,
SUM(CASE WHEN state = 'paid' THEN 1 ELSE 0 END) AS paid,
COUNT(DISTINCT CASE WHEN state IN('new','paying','completing') THEN user_id ELSE NULL END) AS in_progress,
COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed') THEN user_id ELSE NULL END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at);
The main idea: COUNT(DISTINCT ...) counts each user_id only once and does not count NULL values.
Details: see the PostgreSQL documentation on aggregate functions and section 4.2.7, Aggregate Expressions.
The whole query with same style counts and simplified CASE WHEN ...:
SELECT
DATE(created_at) AS date,
COUNT(CASE WHEN state = 'complete' THEN 1 END) AS complete,
COUNT(CASE WHEN state = 'paid' THEN 1 END) AS paid,
COUNT(DISTINCT CASE WHEN state IN('new','paying','completing') THEN user_id END) AS in_progress,
COUNT(DISTINCT CASE WHEN state IN('payment_failed','completion_failed') THEN user_id END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at);
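To see the one-per-user-per-day effect, the sketch below runs the same query shape in SQLite via Python's sqlite3 module on hypothetical orders. On 2020-01-01, user 1 fails twice and user 2 fails once: a plain conditional SUM would report 3 failures, while COUNT(DISTINCT CASE ... THEN user_id END) reports 2. The timestamps, users, and date range are made up for illustration, and SQLite's DATE() stands in for PostgreSQL's created_at::date.

```python
import sqlite3

# Hypothetical orders; created_at uses 'YYYY-MM-DD HH:MM:SS' text timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER, state TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, "payment_failed", "2020-01-01 09:00:00"),
        (2, 1, "payment_failed", "2020-01-01 09:05:00"),  # same user, same day
        (3, 2, "payment_failed", "2020-01-01 10:00:00"),
        (4, 1, "paid", "2020-01-01 11:00:00"),
        (5, 3, "complete", "2020-01-01 12:00:00"),
        (6, 1, "new", "2020-01-02 08:00:00"),
        (7, 1, "paying", "2020-01-02 08:01:00"),  # same user, same day
        (8, 2, "completion_failed", "2020-01-02 09:00:00"),
    ],
)

# COUNT counts every matching row; COUNT(DISTINCT ... user_id) counts users.
query = """
SELECT DATE(created_at) AS the_date,
       COUNT(CASE WHEN state = 'complete' THEN 1 END) AS complete,
       COUNT(CASE WHEN state = 'paid' THEN 1 END) AS paid,
       COUNT(DISTINCT CASE WHEN state IN ('new', 'paying', 'completing') THEN user_id END) AS in_progress,
       COUNT(DISTINCT CASE WHEN state IN ('payment_failed', 'completion_failed') THEN user_id END) AS failed
FROM orders
WHERE created_at BETWEEN ? AND ?
GROUP BY DATE(created_at)
ORDER BY the_date
"""
daily = conn.execute(query, ("2020-01-01", "2020-01-02 23:59:59")).fetchall()
for row in daily:
    print(row)
```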