Is there a way to find the highest 90 day total value for any 90 day period in the last year? - sql

I'm thinking the only way to do it is to sum the values between (today - 365) and (today -65 + 90) then move on by 1 day each time, but that would be impractical. Is there a way around it?

If you have one row on each day:
select top (1) t.*
from (select t.*, sum(x) over (order by date rows between 89 preceding and current row) as sum_90
from t
) t
order by sum_90 desc;

Related

How do I use SQL to perform a cumulative sum where the increments have an expiration?

Say the scenario is this:
I have a database of student infractions. When a student is late to class, or misses a homework assignment they get an infraction.
student_id
infraction_type
day
1
tardy
0
2
missed_assignment
0
1
tardy
29
2
missed_assignment
15
1
tardy
99
2
missed_assignment
29
The school has three strike system, at each infraction disciplinary action is taken. Call them D0,D1,D2.
Infractions expire after 30 days.
I want to be able to perform a query to calculate the total counts of disciplinary actions taken in a given time period.
So the number of disciplinary actions taken in the last 100 days (at day 99) would be
disciplinary_action
count
D0
3
D1
2
D2
1
A table generated showing the disciplinary actions taken would look like:
student_id
infraction_type
day
disciplinary_action_gen
1
tardy
0
D0
2
missed_assignment
0
D0
1
tardy
29
D1
2
missed_assignment
15
D1
1
tardy
99
D0
2
missed_assignment
29
D2
What SQL query could I use to do such a cumulative sum?
You can solve your problem by checking in the following order:
if <30 days have passed from the last two infractions, assign D2
if <30 days have passed from last infraction, assign D1
assign D0 (given its the first infraction)
This will work assuming your DBMS supports the tools used for this solution, namely:
the CASE expression, to conditionally assign infraction values
the LAG window function, to retrieve the previous "day" values
SELECT *,
CASE WHEN day - LAG(day,2) OVER(PARTITION BY student_id
ORDER BY day ) < 30 THEN 'D2'
WHEN day - LAG(day,1) OVER(PARTITION BY student_id
ORDER BY day ) < 30 THEN 'D1'
ELSE 'D0'
END AS disciplinary_action_gen
FROM tab
Check a MySQL demo here.
A similar approach using COUNT() as a window function and a frame definition -
SELECT
*,
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY day ASC
RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action_gen
FROM infractions;
The frame definition (RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) tells the server that we want to include all rows with a day value between (current row's value of day - 30) and (the current row's value of day). So, if the current row has a day value of 99, the count will be for all rows in the partition with a day value between 69 and 99.
To get the disciplinary counts, we can simply wrap this in a normal GROUP BY -
SELECT disciplinary_action, COUNT(*) AS count
FROM (
SELECT
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY day ASC
RANGE BETWEEN 30 PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action
FROM infractions
) t
GROUP BY disciplinary_action;
If your infractions are stored with a date, as opposed to the days in your example, this can be easily updated to use a date interval in the frame definition. And, if looking at counts of disciplinary actions in the last 100 days we need to include the previous 30 days, as these could impact the action (D0, D1 or D2) on the first day we are interested in.
SELECT disciplinary_action, COUNT(*) AS count
FROM (
SELECT
`date`,
CONCAT(
'D',
LEAST(
3,
COUNT(*) OVER (
PARTITION BY student_id
ORDER BY `date` ASC
RANGE BETWEEN INTERVAL 30 DAY PRECEDING AND CURRENT ROW
)
) - 1
) AS disciplinary_action
FROM infractions
WHERE `date` >= CURRENT_DATE - INTERVAL 130 DAY
) t
WHERE `date` >= CURRENT_DATE - INTERVAL 100 DAY
GROUP BY disciplinary_action;
Here's a db<>fiddle

Getting Average for Weekdays and Weekends within 30 days in BQ

I'm trying to get the average repairs in the weekdays and weekends within the last 30 days. Each day is tagged whether it's a weekday or a weekend. Holidays are tagged as weekends.
If I use:
AVG(Completed_Repairs) OVER(PARTITION BY day_type ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW)
I only get either the average repairs for all weekdays or for all weekends in the last 30 days depending on what type of day the date is. But I also need the average for the opposite to compute a prorated monthly number. I basically would need another column with the value of the opposite day type.
If I understood correctly, not partitioning might be the way:
with
input as (
select cast('2022-10-11' as date) as WORK_DT, "weekday" as day_type, 307 as completed_repairs union all
select cast('2022-10-12' as date) as WORK_DT, "weekday" as day_type, 100 as completed_repairs union all
select cast('2022-10-09' as date) as WORK_DT, "weekend" as day_type, 750 as completed_repairs union all
select cast('2022-10-10' as date) as WORK_DT, "weekend" as day_type, 647 as completed_repairs
)
select
*,
avg(if(day_type = 'weekday', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekday,
avg(if(day_type = 'weekend', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekend,
from input
order by work_dt
You can replace the 0 by null if you don't want the weekends to impact the average of the weekdays and vice-versa.
If you'd rather have a column "matching" and a column "opposite" you can then use the result of this to write a condition depending on the day_type and the column name.

Extract previous row calculated value for use in current row calculations - Postgres

Have a requirement where I would need to rope the calculated value of the previous row for calculation in the current row.
The following is a sample of how the data currently looks :-
ID
Date
Days
1
2022-01-15
30
2
2022-02-18
30
3
2022-03-15
90
4
2022-05-15
30
The following is the output What I am expecting :-
ID
Date
Days
CalVal
1
2022-01-15
30
2022-02-14
2
2022-02-18
30
2022-03-16
3
2022-03-15
90
2022-06-14
4
2022-05-15
30
2022-07-14
The value of CalVal for the first row is Date + Days
From the second row onwards it should take the CalVal value of the previous row and add it with the current row Days
Essentially, what I am looking for is means to access the previous rows calculated value for use in the current row.
Is there anyway we can achieve the above via Postgres SQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
id,
date,
coalesce(
days - (lag(days, 1) over (order by date, days))
, days) as days,
first_date + cast(days as integer) as newdate
from
(
select
-- get a running sum of days
id,
first_date,
date,
sum(days) over (order by date, days) as days
from
(
select
-- get the first date
id,
(select min(date) from table1) as first_date,
date,
days
from
table1
) A
) B
This query get the exact output you described. I'm not at all ready to say it is the best solution but the strategy employed is to essential create a running total of the "days" ... this means that we can just add this running total to the first date and that will always be the next date in the desired sequence. One finesse: to put the "days" back into the result, we calculated the current running total less the previous running total to arrive at the original amount.
assuming that table name is table1
select
id,
date,
days,
first_value(date) over (order by id) +
(sum(days) over (order by id rows between unbounded preceding and current row))
*interval '1 day' calval
from table1;
We just add cumulative sum of days to first date in table. It's not really what you want to do (we don't need date from previous row, just cumulative days sum)
Solution with recursion
with recursive prev_row as (
select id, date, days, date+ days*interval '1 day' calval
from table1
where id = 1
union all
select t.id, t.date, t.days, p.calval + t.days*interval '1 day' calval
from prev_row p
join table1 t on t.id = p.id+ 1
)
select *
from prev_row

How to calculate moving average value using every nth row (e.g. 24th,48th and 72nd) in sql?

Here is the snip of my database I want to calculate average energy consumption for the last three days in the exact hour. So if I have consumption at 24.10.2016. 10h, I want to add column with average consumption for the last three days at the same hour, so for 23.10.2016. 10h, 22.10.2016. 10h and 21.10.2016. 10h. My records are measured every hour, so in order to calculate this average I have to look at every 24th row and haven't found any way. How can I modify my query to get what I want:
select avg(consumption) over (order by entry_date rows between 72
preceding and 24 preceding) from my_data;
Or is there some other way?
Maybe try this one:
select entry_date, EXTRACT(HOUR FROM entry_date),
avg(consumption) over (PARTITION BY EXTRACT(HOUR FROM entry_date)
order by entry_date rows between 72 preceding and 24 preceding)
from my_data;
and you may use RANGE BETWEEN INTERVAL '72' HOUR PRECEDING AND INTERVAL '24' HOUR PRECEDING instead of ROWS. This covers situation when you have gaps or duplicate time values.
I think you can do this another way by using filters.
Select avg(consumption) from my_data
where
entry_date between #StartDate and #EndDate
and datepart(HOUR, entry_date)=#hour
If you're on MySQL
Select avg(consumption) from my_data
where
entry_date between #StartDate and #EndDate
and HOUR(entry_date)=#hour

Calculate MAX for value over a relative date range

I am trying to calculate the max of a value over a relative date range. Suppose I have these columns: Date, Week, Category, Value. Note: The Week column is the Monday of the week of the corresponding Date.
I want to produce a table which gives the MAX value within the last two weeks for each Date, Week, Category combination so that the output produces the following: Date, Week, Category, Value, 2WeeksPriorMAX.
How would I go about writing that query? I don't think the following would work:
SELECT Date, Week, Value,
MAX(Value) OVER (PARTITION BY Category
ORDER BY Week
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as 2WeeksPriorMAX
The above query doesn't account for cases where there are missing values for a given Category, Week combination within the last 2 weeks, and therefore it would span further than 2 weeks when it analyzes the 2 preceding rows.
Left joining or using a lateral join/subquery might be expensive. You can do this with window functions, but you need to have a bit more logic:
select t.*,
(case when lag(date, 1) over (partition by category order by date) < date - interval '2 week'
then value
when lag(date, 2) over (partition by category order by date) < date - interval '2 week'
then max(value) over (partition by category order by date rows between 1 preceding and current row)
else max(value) over (partition by category order by date rows between 2 preceding and current row)
end) as TwoWeekMax
from t;