Presenting cumulative average in time series - sql

I am trying to present a time series of a score to view the trend.
The score is an average of all of the scores from the first date in the table until the end of the Year-Month.
ie. Jan 2018 = where date < Jan 2018
Feb 2018 = where date < Feb 2018
I would like to present this as a Monthly score for each Year-Month (Dec 2017, Jan 2018)
If the score were not an average, I could use the Cumulative option in the time series; however, this does not work when introducing Avg(Metric).
I am really scratching my head on this one. Any advice on how to structure the data and present this in Google Data Studio would be greatly appreciated.
I have access to the database, and we are using BigQuery to create the views.

avg() should work. Something like this:
select t.*,
avg(val) over (partition by format_date('%Y%m', date))
from t;
Oops, this is the average for the current month. If you want the running average:
select format_date('%Y%m', date) as yyyymm,
(sum(sum(val)) over (order by min(date)) /
sum(count(*)) over (order by min(date))
) as running_avg
from t
group by yyyymm
order by yyyymm;
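To see why the running average divides a cumulative sum by a cumulative count (rather than averaging the monthly averages), here is a small sketch. It runs an equivalent query in SQLite through Python's sqlite3 module as a stand-in for BigQuery, so strftime replaces format_date; the sample rows and the helper name running_average are made up for illustration, and window functions assume SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data for a table t(date, val); not from the question.
ROWS = [
    ("2017-12-05", 10.0),
    ("2017-12-20", 20.0),
    ("2018-01-10", 60.0),
    ("2018-02-03", 30.0),
    ("2018-02-25", 30.0),
]

# Aggregate per month first, then divide the cumulative sum by the cumulative count.
QUERY = """
with monthly as (
    select strftime('%Y%m', date) as yyyymm,
           sum(val) as month_sum,
           count(*) as month_cnt
    from t
    group by strftime('%Y%m', date)
)
select yyyymm,
       sum(month_sum) over (order by yyyymm) * 1.0 /
       sum(month_cnt) over (order by yyyymm) as running_avg
from monthly
order by yyyymm
"""


def running_average(rows):
    """Load rows into an in-memory SQLite table and return (yyyymm, running_avg) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (date text, val real)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

With this sample data, running_average(ROWS) gives 15.0 for 201712 and 30.0 for 201801. Averaging the two monthly averages (15 and 60) would give 37.5 instead, which is why the sums and counts are accumulated separately.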

Related

A different kind of running total in Teradata

I have seen tickets about running totals, but this is a little different.
Let's say I have claims from January 2020 to max(date). I want to write a query to give me the claims totals for January 2020, then January to February 2020, then January to March 2020.... all the way to January to max(date), and all in the same query.
An additional month of data gets added each month. I would like the query to account for that and not hardcode anything.
This is a cumulative sum. Something like this:
select date, sum(claims) over (order by date) as running_claims
from t;
If you need to aggregate by month, then:
select extract(year from date), extract(month from date),
sum(claims) as claims_in_month,
sum(sum(claims)) over (order by min(date)) as running_claims
from t
group by extract(year from date), extract(month from date);

Count Records Prior to Date for Whole Year

I have a historical database with about 9000 records with unique UserID and date they created an account CreatedDate that looks like this:
UserID CreatedDate
1 5/12/2019
2 1/1/2018
3 4/2/2015
4 8/9/2016
. ..
I would like to know how many accounts were created UP TO a certain date, but for multiple months.
For example, how many accounts were there in Jan 2020, Feb 2020, Mar 2020, so on and so forth.
The manual way would be to do this for each month but it would be tedious:
select count(*)
from SCHEMA
--KEEP REPLACING THE MONTH TO GET COUNTS
where CreatedDate <= '2020-01-31'
Just wondering if there is a more efficient way? A group by wouldn't work because it just totals for each month, but I'm trying to get a historical count. Thanks!
You seem to need a running total for each month. If so, you need GROUP BY to compute the total count per month, and then you sum those counts using the analytic sum function.
This is how you would do it in Postgres (db fiddle). Other vendors may differ in how the month is extracted, but the principle is the same.
with schema(UserID, CreatedDate) as (values
(1, date '2019-12-05'),
(2, date '2018-01-01'),
(3, date '2015-01-04'),
(4, date '2016-09-08')
)
select month, sum(cnt) over (order by month) from (
select date_trunc('month', CreatedDate)::date as month, count(*) as cnt
from schema
group by date_trunc('month', CreatedDate)::date
) x
Note that if the data has gaps in the month sequence and you want a continuous sequence (for example, all months between 2015-01 and 2019-12), you have to pregenerate a calendar (a relation with all months) and left join table schema to it. (It is not in my example yet because of YAGNI.)
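The calendar idea from the note can be sketched as follows. This is only an illustration under assumptions: it uses SQLite through Python's sqlite3 module instead of Postgres, builds the month list with a recursive CTE, and the table name accounts and the helper cumulative_accounts are hypothetical names, not part of the answer.

```python
import sqlite3

# Same sample rows as the Postgres example above; the table name is hypothetical.
ACCOUNTS = [(1, "2019-12-05"), (2, "2018-01-01"), (3, "2015-01-04"), (4, "2016-09-08")]

QUERY = """
with recursive bounds as (
    select min(date(CreatedDate, 'start of month')) as first_month,
           max(date(CreatedDate, 'start of month')) as last_month
    from accounts
),
calendar(month) as (
    -- one row per month between the first and last month present in the data
    select first_month from bounds
    union all
    select date(month, '+1 month') from calendar, bounds
    where month < last_month
),
per_month as (
    select date(CreatedDate, 'start of month') as month, count(*) as cnt
    from accounts
    group by date(CreatedDate, 'start of month')
)
select c.month,
       sum(coalesce(p.cnt, 0)) over (order by c.month) as accounts_to_date
from calendar c
left join per_month p on p.month = c.month
order by c.month
"""


def cumulative_accounts(rows):
    """Return (month, accounts_to_date) for every month, including months with no sign-ups."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table accounts (UserID integer, CreatedDate text)")
    conn.executemany("insert into accounts values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Months without new accounts contribute 0 through coalesce, so the running total simply carries forward. In Postgres, generate_series is a common way to build the same calendar.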

Cumulative total from a table

I have a table which calculates the headcount (HC) based on the date people are hired, but I want to see a cumulative HC for that year. For example, I might have hired only 20 in 2016, but I should show my overall HC through 2015 plus the 20 in 2016, and then it should go on.
If my requirement is from 2019 onwards, it should show the cumulative HC through 2019 and continue from there.
select FISC_YR_ID,ASSOC_TYPE_NM,
count(ASSOC_BDGE_NBR) over(order by FISC_YR_ID,FISC_MTH_ID rows between unbounded preceding and current row) as CUM_HC
--order by FISC_YR_ID asc )
from HC_table
where FISC_YR_ID >2018
This is the table:
I can't quite tell what the query has to do with the question or sample data, so this focuses on the question.
You can use a cumulative sum:
select year, hired, sum(hired) over (order by year)
from t;
If you want to filter this, then use a subquery:
select t.*
from (select year, hired, sum(hired) over (order by year) as cumulative
from t
) t
where year > 2018
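The subquery matters because WHERE is evaluated before window functions, so filtering in the same query level drops the earlier years from the running total. A minimal sketch of the difference, using SQLite through Python's sqlite3 module with made-up headcount numbers (the data and the helper name run are assumptions for illustration):

```python
import sqlite3

# Hypothetical yearly hires; not taken from the question.
HIRES = [(2015, 100), (2016, 20), (2017, 30), (2019, 10), (2020, 5)]

# Filter in the same query level: rows before 2019 never reach the window function.
FILTER_FIRST = """
select year, sum(hired) over (order by year) as cumulative
from t
where year > 2018
order by year
"""

# Compute the running total over all rows, then filter in the outer query.
FILTER_LAST = """
select year, cumulative
from (select year, sum(hired) over (order by year) as cumulative
      from t
     ) x
where year > 2018
order by year
"""


def run(query, rows=HIRES):
    """Execute a query against an in-memory table t(year, hired)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (year integer, hired integer)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

Here run(FILTER_FIRST) restarts the total at 2019 and returns [(2019, 10), (2020, 15)], while run(FILTER_LAST) keeps the history and returns [(2019, 160), (2020, 165)].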

SQL Server - Cumulative Sum over Last 12 Months, but starting from the Last Month (SQL Server 18)

I need to run a cumulative sum of a value over the last 12 months. So far, my cumulative calculation is working, but it starts from the current month.
I need the total of the last 12 months, starting from the previous month.
Currently, I'm using the OVER clause, which runs the cumulative total starting from the current row/month.
Please see my code example below:
SELECT *,
SUM(Amount) OVER (PARTITION BY ID ORDER BY Date_Month ROWS BETWEEN 11 PRECEDING AND CURRENT ROW) AS TwelveMoTtl
FROM (
SELECT DISTINCT
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, TransactionDt), 0) AS DATE) AS Date_Month,
ID,
SUM(Amount) AS Amount
FROM MyTable
WHERE TransactionDt >= '2019-01-01'
GROUP BY
ID,
CAST(DATEADD(MONTH, DATEDIFF(MONTH, 0, TransactionDt), 0) AS DATE)
) AS monthly
Here are my results (using only one ID to simplify the example):
In my example, the calculation starts from the current row and runs over the last 12 months.
If we take the February row, for example, I need the cumulative sum from February 2019 to January 2020.
Any suggestions on how I could do it?
Thanks,
You seem to understand window functions pretty well. You just have to adjust the window frame:
SUM(Amount) OVER (PARTITION BY ID
ORDER BY Date_Month
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)
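A quick way to check the two frames side by side is the sketch below. It uses SQLite through Python's sqlite3 module rather than SQL Server, and the table name monthly, the 14 months of data (amounts 1 to 14), and the helper trailing_sums are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical data: 14 consecutive months for one ID, with Amount = 1, 2, ..., 14.
MONTHS = [f"2019-{m:02d}-01" for m in range(1, 13)] + ["2020-01-01", "2020-02-01"]
ROWS = [(1, month, amount) for amount, month in enumerate(MONTHS, start=1)]

QUERY = """
select Date_Month,
       sum(Amount) over (partition by ID order by Date_Month
                         rows between 11 preceding and current row) as incl_current,
       sum(Amount) over (partition by ID order by Date_Month
                         rows between 12 preceding and 1 preceding) as prior_12
from monthly
order by Date_Month
"""


def trailing_sums(rows):
    """Return (Date_Month, incl_current, prior_12) for each monthly row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table monthly (ID integer, Date_Month text, Amount integer)")
    conn.executemany("insert into monthly values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

For the February 2020 row, incl_current sums March 2019 through February 2020 (102), while prior_12 sums February 2019 through January 2020 (90). The first row has no preceding rows, so its prior_12 is NULL. Both frames count rows, not calendar months, so they only line up with the calendar when every month has a row.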
I forgot that I may have missing (NULL) rows in my table. So the solution has to compute the cumulative sum even if there are missing dates. For example:
I need the sum to run over the last 12 calendar months, whether or not there are amounts in those months.
Any ideas?
Thanks,
Rafael
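One possible way to handle missing months, offered only as a sketch and not as an accepted answer from this thread, is to replace the row-based frame with a correlated subquery over a date range, so the window is defined by calendar months instead of by row positions. The example runs in SQLite through Python's sqlite3 module; in SQL Server the date arithmetic would presumably use DATEADD(MONTH, -12, ...). The table name monthly, the sample data, and the helper prior_12_calendar_months are hypothetical.

```python
import sqlite3

# Hypothetical monthly totals for one ID, with several months missing.
ROWS = [
    (1, "2019-01-01", 10),
    (1, "2019-02-01", 20),
    (1, "2019-06-01", 30),
    (1, "2020-01-01", 40),
    (1, "2020-02-01", 50),
]

QUERY = """
select m.Date_Month,
       (select sum(p.Amount)
        from monthly p
        where p.ID = m.ID
          and p.Date_Month >= date(m.Date_Month, '-12 months')
          and p.Date_Month < m.Date_Month
       ) as prior_12_calendar
from monthly m
order by m.ID, m.Date_Month
"""


def prior_12_calendar_months(rows):
    """Sum Amount over the 12 calendar months before each row's month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table monthly (ID integer, Date_Month text, Amount integer)")
    conn.executemany("insert into monthly values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

For the 2020-02-01 row this returns 90 (February 2019 through January 2020), whereas a ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING frame on the same data would also pull in January 2019 and give 100. Months with no row at all still do not appear in the output; joining to a calendar table would be needed for that.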

how to produce a customer retention table /cohort analysis with SQL

I'm trying to write an SQL query (Presto SQL syntax) to produce a customer retention table (see sample below).
A customer who makes at least one transaction in a month is considered as retained for that month.
This is the table:
user_id     transaction_date
bdcff651-   2018-01-01
bdcff641    2018-03-15
This is the result I would like to get:
The first row should be understood as follows:
Out of all customers who made their first transaction in the month of Jan 2018 (defined as “Jan Activation Cohort”), 35% subsequently made a transaction during the one month period following their first transaction date, 23% in the next month, 15% in the next month and so on.
Date        1st Month   2nd Month   3rd Month
2018-01-01  35%         23%         15%
2018-02-0   33%         26%         13%
2018-03-0   36%         27%         12%
As an example, if person XYZ makes his first transaction on 10th February 2018, his 1st month will be from 11th February 2018 to 10th March 2018, 2nd month will be from 11th March 2018 to 10th April 2018 and so on. This person’s details need to appear in the Feb 2018 cohort in the Customer Retention Table.
I would appreciate any help! Thanks.
You can use conditional aggregation. However, I am not sure what your real calculations are.
If I just use the built-in definitions of date_diff(), then the logic looks like:
select date_trunc('month', first_td) as yyyymm,
count(distinct user_id) as cnt,
-- Presto takes the unit as a string; cast to double to avoid integer division
(cast(count(distinct case when date_diff('month', first_td, transaction_date) = 1
then user_id
end) as double) /
count(distinct user_id)
) as month_1_ratio,
(cast(count(distinct case when date_diff('month', first_td, transaction_date) = 2
then user_id
end) as double) /
count(distinct user_id)
) as month_2_ratio
from (select t.*,
min(transaction_date) over (partition by user_id) as first_td
from t
) t
group by date_trunc('month', first_td)
order by yyyymm;
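The conditional aggregation can be tried out on a tiny data set. The sketch below is an approximation under assumptions: it uses SQLite through Python's sqlite3 module instead of Presto, it measures the month offset by calendar-month boundaries (a simplification that differs from the anniversary-based months described in the question), and the user IDs, dates, and the helper cohort_retention are invented for illustration.

```python
import sqlite3

# Hypothetical transactions: users a and b start in Jan 2018, user c in Feb 2018.
TRANSACTIONS = [
    ("a", "2018-01-05"), ("a", "2018-02-10"), ("a", "2018-03-01"),
    ("b", "2018-01-20"), ("b", "2018-03-15"),
    ("c", "2018-02-02"), ("c", "2018-03-30"),
]

QUERY = """
with firsts as (
    select user_id, transaction_date,
           min(transaction_date) over (partition by user_id) as first_td
    from t
),
diffs as (
    -- month offset counted by calendar-month boundaries (simplifying assumption)
    select user_id,
           strftime('%Y-%m', first_td) as cohort,
           (cast(strftime('%Y', transaction_date) as integer)
              - cast(strftime('%Y', first_td) as integer)) * 12
           + cast(strftime('%m', transaction_date) as integer)
           - cast(strftime('%m', first_td) as integer) as month_diff
    from firsts
)
select cohort,
       count(distinct user_id) as cnt,
       count(distinct case when month_diff = 1 then user_id end) * 1.0
           / count(distinct user_id) as month_1_ratio,
       count(distinct case when month_diff = 2 then user_id end) * 1.0
           / count(distinct user_id) as month_2_ratio
from diffs
group by cohort
order by cohort
"""


def cohort_retention(rows):
    """Return (cohort, cnt, month_1_ratio, month_2_ratio) per activation cohort."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user_id text, transaction_date text)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

In the January cohort, only user a returns in the first following month (ratio 0.5) and both users return in the second (ratio 1.0). Multiplying by 1.0 plays the same role as the cast to double: it keeps the division from truncating to an integer.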
I am not very familiar with Presto and do not have a way to test Presto code. However, from searching around a bit, it looks like it wouldn't be too hard to convert something like SQL Server syntax to Presto syntax. Here is what I would do in SQL Server; you should be able to carry the concept over to Presto:
with transactions_info_per_user as (
-- first transaction and activation cohort (year + month) per user
select user_id, min(transaction_date) as first_transaction,
cast(datepart(year, min(transaction_date)) as varchar(4)) + cast(datepart(month, min(transaction_date)) as varchar(2)) as activation_cohort
from my_table
group by user_id
),
users_per_activation_cohort as (
select activation_cohort, count(*) as number_of_users
from transactions_info_per_user
group by activation_cohort
),
months_after_activation_per_purchase as (
-- datediff(month, start, end): the first transaction is the start, so the result is non-negative
select distinct mt.user_id, ti.activation_cohort, datediff(month, ti.first_transaction, mt.transaction_date) AS months_after_activation
from my_table mt
left join transactions_info_per_user as ti
on mt.user_id = ti.user_id
),
final as (
select activation_cohort, months_after_activation, count(*) as user_count_per_cohort_with_purchase_per_month_after_activation
from months_after_activation_per_purchase
group by activation_cohort, months_after_activation
)
-- join back to the cohort sizes to turn the counts into percentages
select f.activation_cohort, f.months_after_activation,
cast(f.user_count_per_cohort_with_purchase_per_month_after_activation as decimal(9,2)) / cast(u.number_of_users as decimal(9,2)) * 100 as retention_pct
from final f
join users_per_activation_cohort u
on u.activation_cohort = f.activation_cohort
--Then pivot months_after_activation into columns
I was very explicit with the naming of things so you could follow the thought process. Here is an example of how to pivot in Presto. Hopefully this helps you!