Getting Average for Weekdays and Weekends within 30 days in BQ - google-bigquery

I'm trying to get the average number of repairs on weekdays and on weekends within the last 30 days. Each day is tagged as either a weekday or a weekend. Holidays are tagged as weekends.
If I use:
AVG(Completed_Repairs) OVER(PARTITION BY day_type ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW)
I only get either the average repairs for all weekdays or for all weekends in the last 30 days depending on what type of day the date is. But I also need the average for the opposite to compute a prorated monthly number. I basically would need another column with the value of the opposite day type.

If I understood correctly, not partitioning might be the way:
with
input as (
select cast('2022-10-11' as date) as WORK_DT, "weekday" as day_type, 307 as completed_repairs union all
select cast('2022-10-12' as date) as WORK_DT, "weekday" as day_type, 100 as completed_repairs union all
select cast('2022-10-09' as date) as WORK_DT, "weekend" as day_type, 750 as completed_repairs union all
select cast('2022-10-10' as date) as WORK_DT, "weekend" as day_type, 647 as completed_repairs
)
select
*,
avg(if(day_type = 'weekday', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekday,
avg(if(day_type = 'weekend', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekend,
from input
order by work_dt
You can replace the 0 with null if you don't want the weekends to impact the average of the weekdays and vice versa: avg ignores nulls, whereas a 0 counts as a row and pulls the average down.
If you'd rather have a "matching" column and an "opposite" column, you can then use the result of this query and write a condition on day_type that picks avg_weekday or avg_weekend for each.
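To make the difference between the 0 and null variants concrete, here is a minimal Python sketch that mimics the two window expressions on the sample rows above. The function name trailing_avg and its skip_other flag are hypothetical, introduced only for illustration; this is not BigQuery code.

```python
from datetime import date, timedelta

# Sample rows from the answer above: (WORK_DT, day_type, completed_repairs)
rows = [
    (date(2022, 10, 9), "weekend", 750),
    (date(2022, 10, 10), "weekend", 647),
    (date(2022, 10, 11), "weekday", 307),
    (date(2022, 10, 12), "weekday", 100),
]

def trailing_avg(rows, current, day_type, days=30, skip_other=True):
    """Average over rows dated from `current - days` up to and including `current`.

    skip_other=True mimics if(..., completed_repairs, null): other day types are ignored.
    skip_other=False mimics if(..., completed_repairs, 0): other day types count as 0.
    """
    start = current - timedelta(days=days)
    values = []
    for work_dt, row_type, repairs in rows:
        if not (start <= work_dt <= current):
            continue  # outside the RANGE frame
        if row_type == day_type:
            values.append(repairs)
        elif not skip_other:
            values.append(0)
    return sum(values) / len(values) if values else None

today = date(2022, 10, 12)
print(trailing_avg(rows, today, "weekday"))                    # 203.5 (null variant)
print(trailing_avg(rows, today, "weekday", skip_other=False))  # 101.75 (0 variant)
print(trailing_avg(rows, today, "weekend"))                    # 698.5
```

With the null variant each row carries both averages, so the "opposite" value is simply the other column.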

Related

Extract previous row calculated value for use in current row calculations - Postgres

I have a requirement where I need to carry the calculated value of the previous row into the calculation for the current row.
The following is a sample of how the data currently looks:
ID  Date        Days
1   2022-01-15  30
2   2022-02-18  30
3   2022-03-15  90
4   2022-05-15  30
The following is the output I am expecting:
ID  Date        Days  CalVal
1   2022-01-15  30    2022-02-14
2   2022-02-18  30    2022-03-16
3   2022-03-15  90    2022-06-14
4   2022-05-15  30    2022-07-14
The value of CalVal for the first row is Date + Days.
From the second row onwards, it should take the CalVal of the previous row and add the current row's Days to it.
Essentially, what I am looking for is a means to access the previous row's calculated value for use in the current row.
Is there any way to achieve the above in Postgres SQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
    id,
    date,
    coalesce(
        days - (lag(days, 1) over (order by date, days))
        , days) as days,
    first_date + cast(days as integer) as newdate
from
    (
    select
        -- get a running sum of days
        id,
        first_date,
        date,
        sum(days) over (order by date, days) as days
    from
        (
        select
            -- get the first date
            id,
            (select min(date) from table1) as first_date,
            date,
            days
        from
            table1
        ) A
    ) B
This query gets the exact output you described. I'm not at all ready to say it is the best solution, but the strategy is essentially to create a running total of the "days". We can then add this running total to the first date, and that will always be the next date in the desired sequence. One finesse: to put the original "days" back into the result, we subtract the previous running total from the current running total.
Assuming that the table name is table1:
select
id,
date,
days,
first_value(date) over (order by id) +
(sum(days) over (order by id rows between unbounded preceding and current row))
*interval '1 day' calval
from table1;
We just add the cumulative sum of days to the first date in the table. It's not exactly what you asked for: we don't need the date from the previous row, just the cumulative sum of days.
Solution with recursion:
with recursive prev_row as (
select id, date, days, date+ days*interval '1 day' calval
from table1
where id = 1
union all
select t.id, t.date, t.days, p.calval + t.days*interval '1 day' calval
from prev_row p
join table1 t on t.id = p.id+ 1
)
select *
from prev_row
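As a sanity check on the running-total idea used in the answers above, here is a small Python sketch that reproduces the expected CalVal column from the sample data. It only illustrates the arithmetic (first date plus cumulative days) and assumes the rows are already in ID order.

```python
from datetime import date, timedelta
from itertools import accumulate

# Sample rows from the question: (ID, Date, Days), already ordered by ID
rows = [
    (1, date(2022, 1, 15), 30),
    (2, date(2022, 2, 18), 30),
    (3, date(2022, 3, 15), 90),
    (4, date(2022, 5, 15), 30),
]

# CalVal for row n is the first row's date plus the running total of Days up to row n,
# which equals "previous CalVal + current Days" from the second row onwards.
first_date = rows[0][1]
running_days = accumulate(days for _, _, days in rows)
cal_vals = [first_date + timedelta(days=total) for total in running_days]

for (row_id, _, days), cal_val in zip(rows, cal_vals):
    print(row_id, days, cal_val.isoformat())
# 1 30 2022-02-14
# 2 30 2022-03-16
# 3 90 2022-06-14
# 4 30 2022-07-14
```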

prestosql get average from last 7 days for each day

The question I have is very similar to the question here, but I am using Presto SQL (on aws athena) and couldn't find information on loops in presto.
To reiterate the issue, I want the query that:
Given table that contains: Day, Number of Items for this Day
I want: Day, Average Items for Last 7 Days before "Day"
So if I have a table that has data from Dec 25th to Jan 25th, my output table should have data from Jan 1st to Jan 25th. And for each day from Jan 1-25th, it will be the average number of items from last 7 days.
Is it possible to do this with presto?
Maybe you can try this one.
The calendar Common Table Expression (CTE) generates every date in the range between two dates.
with calendar as (
select date_generated
from (
values (sequence(date'2021-12-25', date'2022-01-25', interval '1' day))
) as t1(date_array)
cross join unnest(date_array) as t2(date_generated)),
The temp CTE builds the date groups: for each date, it lists that date and the 6 days before it (7 days in total).
temp as (select c1.date_generated as date_groups
, format_datetime(c2.date_generated, 'yyyy-MM-dd') as dates
from calendar c1, calendar c2
where c2.date_generated between c1.date_generated - interval '6' day and c1.date_generated
and c1.date_generated >= date'2021-12-25' + interval '6' day)
Output for this part:
date_groups  dates
2022-01-01   2021-12-26
2022-01-01   2021-12-27
2022-01-01   2021-12-28
2022-01-01   2021-12-29
2022-01-01   2021-12-30
2022-01-01   2021-12-31
2022-01-01   2022-01-01
The last part joins the day column from your table to each date and then groups by the date group:
select temp.date_groups as day
, avg(your_table.num_of_items) avg_last_7_days
from your_table
join temp on your_table.day = temp.dates
group by 1
You want a running average (AVG OVER):
select
day, amount,
avg(amount) over (order by day rows between 6 preceding and current row) as avg_amount
from mytable
order by day
offset 6;
I tried many different variations of getting the "running average" (which I now know is what I was looking for thanks to Thorsten's answer), but couldn't get the output I wanted exactly with my other columns (that weren't included in my original question) in the table, but this ended up working:
SELECT day, <other columns>, avg(amount) OVER (
PARTITION BY <other columns>
ORDER BY date(day) ASC
ROWS 6 PRECEDING) as avg_7_days_amount FROM table ORDER BY date(day) ASC

How to calculate 7 day average with dates instead of rows in SQL?

I have the below script for calculating the 7-day average of new cases using a partition and the 6 preceding rows. Is there any way to do this by using dates instead? For example, on 2020-01-26 the average is calculated as 0.8, whereas it would be 0.57 if all 7 preceding dates were included in the data. I know it's not a material difference, but I am wondering if there is a more accurate way.
select country, date,
d_confirmed,
avg(d_confirmed) over(partition by country ORDER BY date rows 6 preceding) As "7_day_avg"
from coronavirusdata_country_combined
where (country = 'Canada' or country = 'Australia')
order by country, date
Postgres prior to version 11 doesn't support RANGE BETWEEN with an interval offset the way Oracle does (Oracle can have varying window lengths, such as "all rows where date column X is between 7 days before the current row's X and the current row's X"). If your PG is up to date, you can adjust your query to use this instead of the rows clause (note that 7 days preceding plus the current row spans 8 calendar days; use INTERVAL '6 DAY' to match rows 6 preceding):
RANGE BETWEEN INTERVAL '7 DAY' PRECEDING AND CURRENT ROW
If your PG is older and doesn't support this, you could consider reducing your dataset so it has only one row per day by summing the individual records. The 7-day average can then be done with an absolute row count, provided every date is actually present for each country:
select
country,
date,
d_confirmed,
avg(d_confirmed) over(partition by country ORDER BY date rows 6 preceding) As "7_day_avg"
from (
select country, date, sum(d_confirmed) as d_confirmed
from coronavirusdata_country_combined
group by country, date
) x
where (country = 'Canada' or country = 'Australia')
order by country, date
Try using RANGE instead of ROWS in the OVER clause:
RANGE BETWEEN 7 PRECEDING AND CURRENT ROW
Then the 7 will denote 7 days rather than 7 rows.
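One caveat worth illustrating: a date-based frame still averages only the rows that exist inside the window, so missing dates are skipped rather than counted as zero. The Python sketch below uses made-up numbers, chosen only to reproduce the 0.8 versus 0.57 figures from the question, and hypothetical function names. It assumes the series simply has no rows before 2020-01-22.

```python
from datetime import date, timedelta

# Hypothetical data: the series starts on 2020-01-22, so on 2020-01-26
# only 5 of the 7 calendar days in the window have a row.
daily = {
    date(2020, 1, 22): 0,
    date(2020, 1, 23): 1,
    date(2020, 1, 24): 1,
    date(2020, 1, 25): 1,
    date(2020, 1, 26): 1,
}

def avg_present_rows(daily, current, days=7):
    """Average of the rows that exist in the `days`-day window ending at `current`."""
    start = current - timedelta(days=days - 1)
    values = [v for d, v in daily.items() if start <= d <= current]
    return sum(values) / len(values) if values else None

def avg_calendar_days(daily, current, days=7):
    """Average over every calendar day in the window, counting missing dates as 0."""
    window = (current - timedelta(days=offset) for offset in range(days))
    return sum(daily.get(d, 0) for d in window) / days

current = date(2020, 1, 26)
print(avg_present_rows(daily, current))             # 0.8
print(round(avg_calendar_days(daily, current), 2))  # 0.57
```

If the second behaviour is the one you want, the SQL equivalent is either to fill in the missing dates with zero rows first, or to divide a windowed sum by 7 instead of using avg.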

Average over rolling date period

I have 4 dimensions, one of which is date. For each date, I need to calculate the average over the last 30 days for each combination of the other dimension values.
I have tried to run average over a partition by the 4 dimensions in a form of:
SELECT
Date, Produce,Company, Song, Revenues,
Average(case when Date between Date -Interval '31' day and Date - Interval '1' Day then Revenues else null End) over (partition by Date,Company,Song,Revenues order by Date) as "Running Average"
From
Base_Table
I get only nulls with every aggregation I tried.
Help is appreciated. Thanks
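You can try the query below. Note that the rows frame counts rows, not days, so it only covers 30 days back if there is one row per day in each partition: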
SELECT
Date, Produce,Company, Song, Revenues,
Average(Revenues) over (partition by Company,Song order by Date rows between 30 preceding and current row) as "Running Average"
From
Base_Table

Calculate MAX for value over a relative date range

I am trying to calculate the max of a value over a relative date range. Suppose I have these columns: Date, Week, Category, Value. Note: The Week column is the Monday of the week of the corresponding Date.
I want to produce a table which gives the MAX value within the last two weeks for each Date, Week, Category combination so that the output produces the following: Date, Week, Category, Value, 2WeeksPriorMAX.
How would I go about writing that query? I don't think the following would work:
SELECT Date, Week, Value,
MAX(Value) OVER (PARTITION BY Category
ORDER BY Week
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as 2WeeksPriorMAX
The above query doesn't account for cases where there are missing values for a given Category, Week combination within the last 2 weeks, and therefore it would span further than 2 weeks when it analyzes the 2 preceding rows.
Left joining or using a lateral join/subquery might be expensive. You can do this with window functions, but you need to have a bit more logic:
select t.*,
(case when lag(date, 1) over (partition by category order by date) < date - interval '2 week'
then value
when lag(date, 2) over (partition by category order by date) < date - interval '2 week'
then max(value) over (partition by category order by date rows between 1 preceding and current row)
else max(value) over (partition by category order by date rows between 2 preceding and current row)
end) as TwoWeekMax
from t;
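To check what the case expression above should return, here is a brute-force Python sketch of the same rule: for each row, take the max of all values in the same category dated within the two weeks up to and including that row's date. The sample rows and the function name two_week_max are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical rows: (date, category, value)
rows = [
    (date(2023, 1, 2), "A", 5),
    (date(2023, 1, 9), "A", 3),
    (date(2023, 1, 30), "A", 4),  # gap: category A has no rows in the two weeks before
    (date(2023, 1, 9), "B", 7),
]

def two_week_max(rows):
    """Return (date, category, value, max over the trailing two weeks) for each row."""
    result = []
    for current, category, value in rows:
        start = current - timedelta(weeks=2)
        # The current row always falls inside its own window, so the list is never empty.
        in_window = [v for d, c, v in rows if c == category and start <= d <= current]
        result.append((current, category, value, max(in_window)))
    return result

for current, category, value, window_max in two_week_max(rows):
    print(current.isoformat(), category, value, window_max)
# 2023-01-02 A 5 5
# 2023-01-09 A 3 5
# 2023-01-30 A 4 4
# 2023-01-09 B 7 7
```

The row dated 2023-01-30 shows the gap case: its previous row is more than two weeks old, so only its own value counts, which is what the first branch of the case expression handles.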