Create column for rolling total for the previous month of a current rows date - sql

Context
Using Presto syntax, I'm trying to create an output table that has rolling totals of an 'amount' column value for each day in a month. In each row there will also be a column with a rolling total for the previous month, and also a column with the difference between the totals.
Output Requirements
completed: create month_to_date_amount column that stores rolling total from
sum of amount column. The range for the rolling total is between 1st of month and current row date column value. Restart rolling
total each month. I already have a working query below that creates this column.
SELECT
*,
SUM(amount) OVER (
PARTITION BY
team,
month_id
ORDER BY
date ASC
) month_to_date_amount
FROM (
SELECT -- this subquery is required to handle duplicate dates
date,
SUM(amount) AS amount,
team,
month_id
FROM input_table
GROUP BY
date,
team,
month_id
) AS t
create prev_month_to_date_amount column that:
a. stores previous months rolling amount for the current rows date and team and add to same
output row.
b. Return 0 if there is no record matching the previous month date. (Ex. Prev months date for March 31 is Feb 31 so does not exist). Also a record will not exist for days that have no amount values. Example output table is below.
create movement column that stores the difference
amount between month_to_date_amount column and
prev_month_to_date_amount column from current row.
Question
Could someone assist with my 2nd and 3rd requirements above to achieve my desired output shown below? By either adding on to my current query above, or creating another more efficient one if necessary. A solution with multiple queries is fine.
Input Table
team
date
amount
month_id
A
2022-04-01
1
2022-04
A
2022-04-01
1
2022-04
A
2022-04-02
1
2022-04
B
2022-04-01
3
2022-04
B
2022-04-02
3
2022-04
B
2022-05-01
4
2022-05
B
2022-05-02
4
2022-05
C
2022-05-01
1
2022-05
C
2022-05-02
1
2022-05
C
2022-06-01
5
2022-06
C
2022-06-02
5
2022-06

This answer is a good example of using the window function LAG. In summary the query partitions the data by Team and Day of Month, and uses LAG to get the previous months amount and calculate the movement value.
e.g. for Team B data. The window function will create two partition sets: one with the Team B 01/04/2022 and 01/05/2022 rows, and one with the Team B 02/04/2022 and 02/05/2022 rows, order each partition set by date. Then for each set for each row, use LAG to get the data from the previous row (if one exists) to enable calculation of the movement and retrieve the previous months amount.
I hope this helps.
;with
totals
as
(
select
*,
sum(amount) over(
partition by team, month_id
order by date, team) monthToDateAmount
from
( select
date,
sum(amount) as amount,
team,
month_id
from input_table
group by
date,
team,
month_id
) as x
),
totalsWithMovement
as
(
select
*,
monthToDateAmount
- coalesce(lag(monthToDateAmount) over(
partition by team,day(date(date))
order by team, date),0)
as movement,
coalesce(lag(monthToDateAmount) over
(partition by team, day(date(date))
order by team,month_id),0)
as prevMonthToDateAmount
from
totals
)
select
date, amount, team, monthToDateAmount,
prevMonthToDateAmount, movement
from
totalswithmovement
order by
team, date;

Related

Extract previous row calculated value for use in current row calculations - Postgres

Have a requirement where I would need to rope the calculated value of the previous row for calculation in the current row.
The following is a sample of how the data currently looks :-
ID
Date
Days
1
2022-01-15
30
2
2022-02-18
30
3
2022-03-15
90
4
2022-05-15
30
The following is the output What I am expecting :-
ID
Date
Days
CalVal
1
2022-01-15
30
2022-02-14
2
2022-02-18
30
2022-03-16
3
2022-03-15
90
2022-06-14
4
2022-05-15
30
2022-07-14
The value of CalVal for the first row is Date + Days
From the second row onwards it should take the CalVal value of the previous row and add it with the current row Days
Essentially, what I am looking for is means to access the previous rows calculated value for use in the current row.
Is there anyway we can achieve the above via Postgres SQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
id,
date,
coalesce(
days - (lag(days, 1) over (order by date, days))
, days) as days,
first_date + cast(days as integer) as newdate
from
(
select
-- get a running sum of days
id,
first_date,
date,
sum(days) over (order by date, days) as days
from
(
select
-- get the first date
id,
(select min(date) from table1) as first_date,
date,
days
from
table1
) A
) B
This query get the exact output you described. I'm not at all ready to say it is the best solution but the strategy employed is to essential create a running total of the "days" ... this means that we can just add this running total to the first date and that will always be the next date in the desired sequence. One finesse: to put the "days" back into the result, we calculated the current running total less the previous running total to arrive at the original amount.
assuming that table name is table1
select
id,
date,
days,
first_value(date) over (order by id) +
(sum(days) over (order by id rows between unbounded preceding and current row))
*interval '1 day' calval
from table1;
We just add cumulative sum of days to first date in table. It's not really what you want to do (we don't need date from previous row, just cumulative days sum)
Solution with recursion
with recursive prev_row as (
select id, date, days, date+ days*interval '1 day' calval
from table1
where id = 1
union all
select t.id, t.date, t.days, p.calval + t.days*interval '1 day' calval
from prev_row p
join table1 t on t.id = p.id+ 1
)
select *
from prev_row

prestosql get average from last 7 days for each day

The question I have is very similar to the question here, but I am using Presto SQL (on aws athena) and couldn't find information on loops in presto.
To reiterate the issue, I want the query that:
Given table that contains: Day, Number of Items for this Day
I want: Day, Average Items for Last 7 Days before "Day"
So if I have a table that has data from Dec 25th to Jan 25th, my output table should have data from Jan 1st to Jan 25th. And for each day from Jan 1-25th, it will be the average number of items from last 7 days.
Is it possible to do this with presto?
maybe you can try this one
calendar Common Table Expression (CTE) is used to generate dates between two dates range.
with calendar as (
select date_generated
from (
values (sequence(date'2021-12-25', date'2022-01-25', interval '1' day))
) as t1(date_array)
cross join unnest(date_array) as t2(date_generated)),
temp CTE is basically used to make a date group which contains last 7 days for each date group.
temp as (select c1.date_generated as date_groups
, format_datetime(c2.date_generated, 'yyyy-MM-dd') as dates
from calendar c1, calendar c2
where c2.date_generated between c1.date_generated - interval '6' day and c1.date_generated
and c1.date_generated >= date'2021-12-25' + interval '6' day)
Output for this part:
date_groups
dates
2022-01-01
2021-12-26
2022-01-01
2021-12-27
2022-01-01
2021-12-28
2022-01-01
2021-12-29
2022-01-01
2021-12-30
2022-01-01
2021-12-31
2022-01-01
2022-01-01
last part is joining day column from your table with each date and then group it by the date group
select temp.date_groups as day
, avg(your_table.num_of_items) avg_last_7_days
from your_table
join temp on your_table.day = temp.dates
group by 1
You want a running average (AVG OVER)
select
day, amount,
avg(amount) over (order by day rows between 6 preceding and current row) as avg_amount
from mytable
order by day
offset 6;
I tried many different variations of getting the "running average" (which I now know is what I was looking for thanks to Thorsten's answer), but couldn't get the output I wanted exactly with my other columns (that weren't included in my original question) in the table, but this ended up working:
SELECT day, <other columns>, avg(amount) OVER (
PARTITION BY <other columns>
ORDER BY date(day) ASC
ROWS 6 PRECEDING) as avg_7_days_amount FROM table ORDER BY date(day) ASC

How to spread annual amount and then add by month in SQL

Currently I'm working with a table that looks like this:
Month | Transaction | amount
2021-07-01| Annual Membership Fee| 45
2021-08-01| Annual Membership Fee| 145
2021-09-01| Annual Membership Fee| 2940
2021-10-01| Annual Membership Fee| 1545
the amount on that table is the total monthly amount (ex. I have 100 customers who paid $15 for the annual membership, so my total monthly amount would be $1500).
However what I would like to do (and I have no clue how) is divide the amount by 12 and spread it into the future in order to have a monthly revenue per month. As an example for 2021-09-01 I would get the following:
$2490/12 = $207.5 (dollars per month for the next 12 months)
in 2021-09-01 I would only get $207.5 for that specific month.
On 2021-10-01 I would get $1545/12 = $128.75 plus $207.5 from the previous month (total = $336.25 for 2021-10-01)
And the same operation would repeat onwards. The last period that I would collect my $207.5 from 2021-09-01 would be in 2022-08-01.
I was wondering if someone could give me an idea of how to perform this in a SQL query/CTE?
Assuming all the months you care about exist in your table, I would suggest something like:
SELECT
month,
(SELECT SUM(m2.amount/12) FROM mytable m2 WHERE m2.month BETWEEN ADD_MONTHS(m1.month, -11) AND m1.month) as monthlyamount
FROM mytable m1
GROUP BY month
ORDER BY month
For each month that exists in the table, this sums 1/12th of the current amount plus the previous 11 months (using the add_months function). I think that's what you want.
A few notes/thoughts:
I'm assuming (based on the column name) that all the dates in the month column end on the 1st, so we don't need to worry about matching days or having the group by return multiple rows for the same month.
You might want to round the SUMs I did, since in some cases dividing by 12 might give you more digits after the decimal than you want for money (although, in that case, you might also have to consider remainders).
If you really only have one transaction per month (like in your example), you don't need to do the group by.
If the months you care about don't exist in your table, then this won't work, but you could do the same thing generating a table of months. e.g. If you have an amount on 2020-01-01 but nothing in 2020-02-01, then this won't return a row for 2021-02-01.
CTE = set up dataset
CTE_2 = pro-rate dataset
FINAL SQL = select future_cal_month,sum(pro_rated_amount) from cte_2 group by 1
with cte as (
select '2021-07-01' cal_month,'Annual Membership Fee' transaction ,45 amount
union all select '2021-08-01' cal_month,'Annual Membership Fee' transaction ,145 amount
union all select '2021-09-01' cal_month,'Annual Membership Fee' transaction ,2940 amount
union all select '2021-10-01' cal_month,'Annual Membership Fee' transaction ,1545 amount)
, cte_2 as (
select
dateadd('month', row_number() over (partition by cal_month order by 1), cal_month) future_cal_month
,amount/12 pro_rated_amount
from
cte
,table(generator(rowcount => 12)) v)
select
future_cal_month
, sum(pro_rated_amount)
from
cte_2
group by
future_cal_month

Teradata loop for dates, column adding within loop

I have a table where every row is transaction and there are few columns: clients IDs and dates for every transaction.
I am trying to write a query which will give a table where column N shows number of clients whose first transaction happened in month N made transactions in months: N, N+1, N+2, ...
For example (desired table for 3 months data):
1 2 3
100 90 78
80 80
60
First row of the column 1 shows number of clients whose first transaction happened in month 1, second row shows how many of this clients stayed after 1 month, third row - after two month etc
My current query (Year is a column wit year for the date, like 2017, month is a number of month like 1 for January):
WITH not_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date<date "2017-01-01"),
ID_in AS(
SELECT ID, Year, month
FROM table
WHERE trans_date BETWEEN date "2017-01-01" AND date "2017-01-31"
),
from_this AS(
SELECT ID, Year, month
FROM table
)
SELECT Year, Month, count(distinct ID)
FROM from_this
WHERE ID IN (select ID from ID_in)
AND
ID NOT IN (select ID from not_in)
GROUP BY 1,2
ORDER BY 1,2
But this gives only one column (for January 2017) of the desired table. I need to change dates for other months in 2017, 2018 and so on manually.
How to avoid this?
I guess, it should be looped somehow. And I think, I should create volatile table and add columns to it within loop, then select * from it.
Also I can not find an instruction for variables declaration and while loops in Teradata, any clearifications are appreciated.

How do I sum values between a date and that same date plus 12 weeks?

I have data at the Cust_ID and Week_Date level with sales_dollars. Each Week_Date represents the sales_dollars for that week. I want to sum the dollars for each Cust_ID for the next 8 weeks. I want the final data at the Cust_ID and Week_Date level. Is that possible using sql?
You seem to want a window functions:
select t.*,
sum(sales) over (partition by cust_id
order by week_date
rows between current row and 7 following
) as sales_8weeks
from t;
This assumes that all weeks have data. If you want to put a range in by date, you can do that but the exact syntax varies by database.