Count Records Prior to Date for Whole Year - sql

I have a historical table of about 9000 records, each with a unique UserID and the date the account was created (CreatedDate). It looks like this:
UserID CreatedDate
1 5/12/2019
2 1/1/2018
3 4/2/2015
4 8/9/2016
. ..
I would like to know how many accounts were created UP TO a certain date, but for multiple months.
For example, how many accounts were there in Jan 2020, Feb 2020, Mar 2020, so on and so forth.
The manual way would be to do this for each month but it would be tedious:
select count(*)
from SCHEMA
--KEEP REPLACING THE MONTH TO GET COUNTS
where CreatedDate <= '2020-01-31'
Just wondering if there is a more efficient way? A group by wouldn't work because it just totals for each month, but I'm trying to get a historical count. Thanks!

You seem to need a running total for each month. If so, use group by to compute the count per month, then accumulate those counts with the analytic (window) sum function.
This is how you would do it in Postgres (db fiddle). Other vendors may differ in how the month is extracted, but the principle is the same.
with schema(UserID, CreatedDate) as (values
(1, date '2019-12-05'),
(2, date '2018-01-01'),
(3, date '2015-01-04'),
(4, date '2016-09-08')
)
select month, sum(cnt) over (order by month) from (
select date_trunc('month', CreatedDate)::date as month, count(*) as cnt
from schema
group by date_trunc('month', CreatedDate)::date
) x
Note that if the data has gaps in the month sequence and you want a continuous sequence (for example, all months between 2015-01 and 2019-12), you have to pregenerate a calendar (a relation with all months) and left join the table schema to it. (It is not in my example yet because of YAGNI.)
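For reference, a minimal sketch of that calendar approach in Postgres, assuming generate_series is available and that 2015-01 through 2019-12 is the range wanted. The names calendar, monthly and accounts_to_date are made up for illustration:

```sql
with schema(UserID, CreatedDate) as (values
    (1, date '2019-12-05'),
    (2, date '2018-01-01'),
    (3, date '2015-01-04'),
    (4, date '2016-09-08')
),
calendar as (
    -- one row per month in the wanted range (the range itself is an assumption)
    select g::date as month
    from generate_series(date '2015-01-01', date '2019-12-01', interval '1 month') as g
),
monthly as (
    -- accounts created in each month that actually has data
    select date_trunc('month', CreatedDate)::date as month, count(*) as cnt
    from schema
    group by 1
)
select c.month,
       -- months with no new accounts contribute 0 to the running total
       sum(coalesce(m.cnt, 0)) over (order by c.month) as accounts_to_date
from calendar c
left join monthly m on m.month = c.month
order by c.month;
```

Months without any new accounts then repeat the previous running total instead of disappearing from the result.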

Related

How to spread annual amount and then add by month in SQL

Currently I'm working with a table that looks like this:
Month | Transaction | amount
2021-07-01| Annual Membership Fee| 45
2021-08-01| Annual Membership Fee| 145
2021-09-01| Annual Membership Fee| 2940
2021-10-01| Annual Membership Fee| 1545
The amount in that table is the total monthly amount (e.g., if 100 customers each paid $15 for the annual membership, the total for that month would be $1500).
What I would like to do (and I have no clue how) is divide each amount by 12 and spread it over the following months in order to get the revenue per month. As an example, for 2021-09-01 I would get the following:
$2490/12 = $207.5 (dollars per month for the next 12 months)
in 2021-09-01 I would only get $207.5 for that specific month.
On 2021-10-01 I would get $1545/12 = $128.75 plus $207.5 from the previous month (total = $336.25 for 2021-10-01)
And the same operation would repeat onwards. The last period that I would collect my $207.5 from 2021-09-01 would be in 2022-08-01.
I was wondering if someone could give me an idea of how to perform this in a SQL query/CTE?
Assuming all the months you care about exist in your table, I would suggest something like:
SELECT
month,
(SELECT SUM(m2.amount/12) FROM mytable m2 WHERE m2.month BETWEEN ADD_MONTHS(m1.month, -11) AND m1.month) as monthlyamount
FROM mytable m1
GROUP BY month
ORDER BY month
For each month that exists in the table, this sums 1/12th of the current amount plus the previous 11 months (using the add_months function). I think that's what you want.
A few notes/thoughts:
I'm assuming (based on the column name) that all the dates in the month column fall on the 1st, so we don't need to worry about matching days or having the group by return multiple rows for the same month.
You might want to round the SUMs I did, since in some cases dividing by 12 might give you more digits after the decimal than you want for money (although, in that case, you might also have to consider remainders).
If you really only have one transaction per month (like in your example), you don't need to do the group by.
If the months you care about don't exist in your table, then this won't work, but you could do the same thing by generating a table of months. E.g., if you have an amount on 2020-01-01 but nothing on 2020-02-01, then this won't return a row for 2020-02-01.
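If every month is present (the assumption stated at the top of this answer), the same trailing 12-month sum can also be sketched with a window frame instead of a correlated subquery. This assumes exactly one row per month in mytable and a database that supports ROWS frames. Dividing by 12.0 rather than 12 is only a precaution against integer division in some databases:

```sql
-- Assumes one row per month with no gaps; otherwise "11 preceding"
-- would reach back further than 11 calendar months.
select month,
       sum(amount / 12.0) over (order by month
                                rows between 11 preceding and current row) as monthlyamount
from mytable
order by month;
```

With gaps in the months, the frame counts rows rather than calendar months, so the correlated subquery (or a generated table of months) is the safer choice.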
CTE = set up dataset
CTE_2 = pro-rate dataset
FINAL SQL = select future_cal_month,sum(pro_rated_amount) from cte_2 group by 1
with cte as (
select '2021-07-01' cal_month,'Annual Membership Fee' transaction ,45 amount
union all select '2021-08-01' cal_month,'Annual Membership Fee' transaction ,145 amount
union all select '2021-09-01' cal_month,'Annual Membership Fee' transaction ,2940 amount
union all select '2021-10-01' cal_month,'Annual Membership Fee' transaction ,1545 amount)
, cte_2 as (
select
-- subtract 1 so the first of the 12 instalments lands on cal_month itself
dateadd('month', row_number() over (partition by cal_month order by 1) - 1, cal_month) future_cal_month
,amount/12 pro_rated_amount
from
cte
,table(generator(rowcount => 12)) v)
select
future_cal_month
, sum(pro_rated_amount)
from
cte_2
group by
future_cal_month

Teradata loop for dates, column adding within loop

I have a table where every row is a transaction, with a few columns: client IDs and the date of each transaction.
I am trying to write a query that returns a table where column N shows, for the clients whose first transaction happened in month N, how many of them made transactions in months N, N+1, N+2, ...
For example (desired table for 3 months data):
1 2 3
100 90 78
80 80
60
The first row of column 1 shows the number of clients whose first transaction happened in month 1, the second row shows how many of these clients were still transacting one month later, the third row two months later, and so on.
My current query (Year is a column with the year of the date, like 2017; month is the month number, like 1 for January):
WITH not_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date < date '2017-01-01'),
ID_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date BETWEEN date '2017-01-01' AND date '2017-01-31'
),
from_this AS(
SELECT ID, Year, month
FROM table
)
SELECT Year, Month, count(distinct ID)
FROM from_this
WHERE ID IN (select ID from ID_in)
AND
ID NOT IN (select ID from not_in)
GROUP BY 1,2
ORDER BY 1,2
But this gives only one column (for January 2017) of the desired table. I would need to change the dates manually for the other months in 2017, 2018 and so on.
How can I avoid this?
I guess it should be looped somehow. I think I should create a volatile table, add columns to it within a loop, and then select * from it.
Also, I cannot find instructions for variable declaration and while loops in Teradata; any clarifications are appreciated.

Group Data by Year, Oracle SQL

I am trying to create a query that counts records that existed within a year. The table looks like this:
Title_ID ISSUE_DATE EXPIRY_DATE CLIENT_NUMBER
123 '26-JUN-19' '17-AUG-20' 8529
124 '04-APR-19' '17-SEP-22' 8529
125 '09-MAY-15' '11-SEP-19' 3654
126 '31-DEC-19' '25-NOV-22' 9852
127 '27-OCT-18' '26-FEB-21' 2254
128 '05-OCT-11' '01-JAN-19' 9852
Specifically, I want to count the number of distinct CLIENT_NUMBERS of the records that existed in a given calendar year.
The record (title) exists from the ISSUE_DATE until the EXPIRY_DATE. If the record existed at any point within a year (Let's say 2019), then we are interested in including it in our client count.
So, if the record was issued in 2019 or if the record expired in 2019 or if the record was issued before 2019 and expired after 2019, then we are interested in including it in the client count for the year it existed.
I have built the following query that does this, but only for one specific year (2019). I'd like to build the query further so it looks at each calendar year and counts the distinct client numbers when the client has an active title:
SELECT *
-- count(distinct client_number)
FROM
TITLE
WHERE
issue_date between '01-Jan-19' and '31-Dec-19'
or expiry_date between '01-Jan-19' and '31-Dec-19'
or (issue_date < '01-Jan-19' and expiry_date > '31-Dec-19')
Where I am having trouble is, my data is much larger than the subset I have provided. I would like to recursively get counts of distinct client numbers by year using the same kind of logic to include a record within a calendar year as I have outlined above. So, I'd like to have a table like this:
YEAR COUNT_OF_CLIENT_NUMBERS
2020 5469
2019 5587
2018 4852
2017 4501
2016 3265
etc
I think I've stretched my current SQL abilities at this point, so I thought I'd ask whether there are any suggestions to make this happen.
Thanks.
EDIT: to clarify, the issue date and the expiry date apply to the title, not the client. So, the title is issued on the issue date and expires on the expiry date. A client can own one or more title(s).
So, I am looking to get a count of how many distinct clients own active titles within a given year, counting a client if one or more of their titles is active within that year. The key is that a title is considered active if it was issued in that year OR it expired within that year OR it was issued before that year and expired after that year. A title CAN be active in multiple years (e.g., for a title issued on Feb. 4, 2014 that expires on Apr. 7, 2017, I want to include the client in the count for each year that title exists: 2014, 2015, 2016 and 2017).
So, I created a table to join to (thanks @GMB for the suggestion):
with calendar_year (y) as
(
select 2010 from dual
union all select y + 1 from calendar_year where y < 2020
)
select * from calendar_year
Which returns:
2010
2011
2012
2013
2014
etc
I want to join that to my titles table, but I am having issues recursively looking at the issue date and expiry date to join up the title to each year it existed in. Any help in that area, would be great!
You can use a recursive query to generate the years, then bring the table with a left join, and aggregate:
with dates (dt) as (
select date '2016-01-01' from dual
union all select add_months(dt, 12) from dates where dt < date '2020-01-01'
)
select d.dt, count(distinct t.client_number) count_of_client_numbers
from dates d
left join title t
on t.issue_date <= d.dt
and t.expiry_date > d.dt
group by d.dt
The upside of this approach is that you get results for each and every year, even those where no title started or ended.
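To count a title in every year it overlaps (issued before the year ends and expiring on or after the year starts), the join condition can be written as an interval-overlap test. A sketch under those assumptions for Oracle, stepping one year at a time. The names years, yr_start and yr are made up for illustration, and the 2016 to 2020 range is arbitrary:

```sql
with years (yr_start) as (
    select date '2016-01-01' from dual
    union all
    select add_months(yr_start, 12) from years where yr_start < date '2020-01-01'
)
select extract(year from y.yr_start) as yr,
       count(distinct t.client_number) as count_of_client_numbers
from years y
left join title t
    on t.issue_date < add_months(y.yr_start, 12)  -- issued before the year ended
   and t.expiry_date >= y.yr_start                -- expired on or after the year began
group by extract(year from y.yr_start)
order by yr desc;
```

The two comparisons together cover all three cases from the question: issued during the year, expired during the year, or spanning the whole year.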
You can get number of clients on any day by unpivoting the data, so there is one row per date. Then keep track of the "ins" and "outs".
You don't specify the database, but here is one approach:
select dte, sum(inc),
sum(sum(inc)) over (order by dte) as active_on_date
from ((select issue_date as dte, 1 as inc
from t
) union all
(select expiry_date as dte, -1 as inc
from t
)
) t
group by dte
order by dte;
EDIT:
Hmmm, the above may not do exactly what you want. If you want to count distinct client numbers rather than overall rows, then it might be simpler to just list the dates and join:
select d.dte, count(distinct t.client_number)
from (select date '2020-01-01' as dte from dual union all
select date '2019-01-01' as dte from dual union all
select date '2018-01-01' as dte from dual union all
. . .
) d left join
t
on d.dte between t.issue_date and t.expiry_date
group by d.dte
order by d.dte;

I want to find customers transacting for any 3 consecutive months from 2017 to 2018

I want to know the trick to finding the list of customers who transacted for 3 consecutive months; that could be any 3 consecutive months, with any number of occurrences.
Example: suppose there is a customer who transacts in January, keeps transacting until March, and then stops. I want the list of such customers from my database.
I am working on AWS Athena.
One method uses aggregation and window functions:
-- yyyymm is the third month of a run of 3 consecutive months
select customer_id, yyyymm
from (select customer_id, yyyymm,
             lag(yyyymm, 2) over (partition by customer_id order by yyyymm) as prev_yyyymm_2
      -- one row per customer per month with at least one transaction
      from (select distinct customer_id, date_trunc('month', transactdate) as yyyymm
            from t
            where transactdate >= date '2017-01-01' and
                  transactdate < date '2019-01-01'
           ) m
     ) x
where prev_yyyymm_2 = yyyymm - interval '2' month;
This reduces the transactions to one row per customer per month and looks at the month two rows earlier. The outer filter checks that that month is exactly 2 months earlier, which can only happen if the month in between is present too.

how to produce a customer retention table /cohort analysis with SQL

I'm trying to write an SQL query (Presto SQL syntax) to produce a customer retention table (see sample below).
A customer who makes at least one transaction in a month is considered as retained for that month.
this is the table
user_id     transaction_date
bdcff651-   2018-01-01
bdcff641    2018-03-15
this is the result I would like to get
The first row should be understood as follows:
Out of all customers who made their first transaction in the month of Jan 2018 (defined as “Jan Activation Cohort”), 35% subsequently made a transaction during the one month period following their first transaction date, 23% in the next month, 15% in the next month and so on.
Date        1st Month  2nd Month  3rd Month
2018-01-01  35%        23%        15%
2018-02-01  33%        26%        13%
2018-03-01  36%        27%        12%
As an example, if person XYZ makes his first transaction on 10th February 2018, his 1st month will be from 11th February 2018 to 10th March 2018, 2nd month will be from 11th March 2018 to 10th April 2018 and so on. This person’s details need to appear in the Feb 2018 cohort in the Customer Retention Table.
would appreciate any help! thanks.
You can use conditional aggregation. However, I am not sure what your real calculations are.
If I just use the built-in definitions of date_diff(), then the logic looks like:
select date_trunc('month', first_td) as yyyymm,
       count(distinct user_id) as cnt,
       -- cast to double so the ratio is not truncated by integer division
       (cast(count(distinct case when date_diff('month', first_td, transaction_date) = 1
                                 then user_id
                            end) as double) /
        count(distinct user_id)
       ) as month_1_ratio,
       (cast(count(distinct case when date_diff('month', first_td, transaction_date) = 2
                                 then user_id
                            end) as double) /
        count(distinct user_id)
       ) as month_2_ratio
from (select t.*,
             min(transaction_date) over (partition by user_id) as first_td
      from t
     ) t
group by date_trunc('month', first_td)
order by yyyymm;
I am not familiar with Presto exactly, and do not have a way to test Presto code. However, it looks like from searching around a bit that it wouldn't be too hard to convert to Presto syntax from something like SQL Server syntax. Here is what I would do in SQL Server and you should be able to carry the concept over to Presto:
with transactions_info_per_user as (
    select user_id, min(transaction_date) as first_transaction,
           cast(datepart(year, min(transaction_date)) as varchar(4)) + cast(datepart(month, min(transaction_date)) as varchar(2)) as activation_cohort
    from my_table
    group by user_id
),
users_per_activation_cohort as (
    select activation_cohort, count(*) as number_of_users
    from transactions_info_per_user
    group by activation_cohort
),
months_after_activation_per_purchase as (
    -- datediff(month, start, end): the first transaction is the start date
    select distinct mt.user_id, ti.activation_cohort, datediff(month, ti.first_transaction, mt.transaction_date) AS months_after_activation
    from my_table mt
    left join transactions_info_per_user as ti
        on mt.user_id = ti.user_id
),
final as (
    select activation_cohort, months_after_activation, count(*) as user_count_per_cohort_with_purchase_per_month_after_activation
    from months_after_activation_per_purchase
    group by activation_cohort, months_after_activation
)
-- join back to the cohort sizes to get the denominator
select f.activation_cohort, f.months_after_activation,
       cast(f.user_count_per_cohort_with_purchase_per_month_after_activation as decimal(9,2)) / cast(u.number_of_users as decimal(9,2)) * 100 as retention_pct
from final f
join users_per_activation_cohort u
    on u.activation_cohort = f.activation_cohort
--Then pivot months_after_activation into columns
I was very explicit with the naming of things so you could follow the thought process. Here is an example of how to pivot in Presto. Hopefully this helps you!
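As a sketch of that final pivot step using plain conditional aggregation, which should work in Presto as well as SQL Server: this assumes the long-format result of the query above is available as a relation called retention_long with columns activation_cohort, months_after_activation and retention_pct. Those names are hypothetical, and only the first three months are shown:

```sql
-- retention_long is a hypothetical name for the long-format result:
-- one row per (activation_cohort, months_after_activation).
select activation_cohort,
       max(case when months_after_activation = 1 then retention_pct end) as month_1,
       max(case when months_after_activation = 2 then retention_pct end) as month_2,
       max(case when months_after_activation = 3 then retention_pct end) as month_3
from retention_long
group by activation_cohort
order by activation_cohort;
```

Each case expression picks out at most one row per cohort, so max() simply carries that value into its own column; cohorts with no purchases in a given month get NULL there.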