How to calculate a dynamic average value between rows?
For the first 12 months the status_flag should be 'N'. From the 13th month onward, take the average of sales for the first 13 rows and compare it with the min and max values; if it lies between min and max, set status_flag to 'Y', otherwise 'N'.
Likewise for the 14th row: take the average of the first 14 rows and compare it with min and max, and so on.
How can I do this?

I think the challenging part is getting the average sales. You can use analytic (window) functions:
select Storeid, Months, Min, Max, sales,
       avg(sales) over (order by Months RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as avg_sales
from your_table;
The rest should be easier. Note that RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is the default frame when ORDER BY is present, so you can simply omit it.
with a as (
    select Storeid, Months, Min, Max, sales,
           avg(sales) over (order by Months) as avg_sales
    from your_table
)
select Storeid, Months, Min, Max, sales, avg_sales,
       case
           when Months <= 12 then 'N'
           when avg_sales between Min and Max then 'Y'
           else 'N'
       end as Status_flag
from a;
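If the table holds more than one store and the flag should be evaluated per store (an assumption; the question does not say), the same query works with a partition added to the window:
with a as (
    select Storeid, Months, Min, Max, sales,
           avg(sales) over (partition by Storeid order by Months) as avg_sales
    from your_table
)
select Storeid, Months, Min, Max, sales, avg_sales,
       case
           when Months <= 12 then 'N'
           when avg_sales between Min and Max then 'Y'
           else 'N'
       end as Status_flag
from a;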

If you want to store the flag rather than compute it on the fly, a correlated update can do the same comparison:
update your_table t set status_flag =
    case when (select count(*)
               from your_table
               where Months <= t.Months) > 12
           and (select avg(sales)
                from your_table
                where Months <= t.Months)
               between t.Min and t.Max
         then 'Y' else 'N' end;

Related

Getting Average for Weekdays and Weekends within 30 days in BQ

I'm trying to get the average repairs on weekdays and on weekends within the last 30 days. Each day is tagged as either a weekday or a weekend; holidays are tagged as weekends.
If I use:
AVG(Completed_Repairs) OVER(PARTITION BY day_type ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW)
I only get the average repairs for all weekdays or for all weekends in the last 30 days, depending on which type of day the current row is. But I also need the average for the opposite day type so I can compute a prorated monthly number. I basically need another column holding the value for the opposite day type.
If I understood correctly, not partitioning might be the way:
with
input as (
select cast('2022-10-11' as date) as WORK_DT, "weekday" as day_type, 307 as completed_repairs union all
select cast('2022-10-12' as date) as WORK_DT, "weekday" as day_type, 100 as completed_repairs union all
select cast('2022-10-09' as date) as WORK_DT, "weekend" as day_type, 750 as completed_repairs union all
select cast('2022-10-10' as date) as WORK_DT, "weekend" as day_type, 647 as completed_repairs
)
select
*,
avg(if(day_type = 'weekday', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekday,
avg(if(day_type = 'weekend', completed_repairs,0)) OVER(ORDER BY UNIX_DATE(WORK_DT) RANGE BETWEEN 30 PRECEDING AND CURRENT ROW) as avg_weekend,
from input
order by work_dt
You can replace the 0 with null if you don't want the weekends to impact the average of the weekdays and vice versa.
If you'd rather have a "matching" column and an "opposite" column, you can use the result of this query in a condition on day_type, as sketched below.
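For example, a sketch that reuses the input CTE from the query above (avg_matching / avg_opposite are placeholder names; null is used instead of 0 so each average reflects only its own day type):
select
  *,
  if(day_type = 'weekday', avg_weekday, avg_weekend) as avg_matching,
  if(day_type = 'weekday', avg_weekend, avg_weekday) as avg_opposite
from (
  select
    *,
    avg(if(day_type = 'weekday', completed_repairs, null)) over(order by unix_date(WORK_DT) range between 30 preceding and current row) as avg_weekday,
    avg(if(day_type = 'weekend', completed_repairs, null)) over(order by unix_date(WORK_DT) range between 30 preceding and current row) as avg_weekend
  from input
)
order by work_dt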

A cumulative sum of consecutive workdays that resets to 1 when consecutive days = 0, per ID

I have 3 columns:
Employee ID(numerical)
Day of work(a date yyyy-mm-dd when employee had a shift)
is_consecutive_work_day (1 if days of work are consecutive, else 0)
I need a 4th: Consecutive_work_days (a cumulative sum of is_consecutive_work_day, which resets to 1 when is_consecutive_work_day = 0). So this will go to a maximum of 5 for any employee id. Some will have 1,2,3 others 1,2...etc.
What I am failing to figure out is how to write the 4th column (consecutive_work_days). Not how to write a cumulative sum per employee id, but specifically how to reset it to 1 when is_consecutive_work_day = 0, per employee id.
May I ask for your help regarding this 4th column please? Thanks.
You can use window functions. lag() lets you access the previous day_of_work for the same employee, which you can compare to the current day_of_work: if there is a one day difference, then you can set is_consecutive_work_day to 1.
select
    employee_id,
    day_of_work,
    case
        when day_of_work = lag(day_of_work) over(partition by employee_id order by day_of_work) + interval 1 day
            then 1
        else 0
    end as is_consecutive_work_day
from mytable
To compute the cumulative sum, it is a bit more complicated. We can use a gaps-and-islands technique to put each record in the group it belongs to: basically, every time an is_consecutive_work_day of 0 is met, a new group starts; we can then do a window sum() over each group:
select
    employee_id,
    day_of_work,
    is_consecutive_work_day,
    sum(is_consecutive_work_day)
        over(partition by employee_id, grp order by day_of_work) as consecutive_work_days
from (
    select
        t.*,
        sum(1 - is_consecutive_work_day)
            over(partition by employee_id order by day_of_work) as grp
    from (
        select
            t.*,
            case
                when day_of_work = lag(day_of_work) over(partition by employee_id order by day_of_work) + interval 1 day
                    then 1
                else 0
            end as is_consecutive_work_day
        from mytable t
    ) t
) t
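Since the question says is_consecutive_work_day already exists as a column, the inner lag() step can be dropped. A sketch assuming the same table and column names, with + 1 added so the count starts at 1 on the reset day as the question requires:
select
    employee_id,
    day_of_work,
    is_consecutive_work_day,
    sum(is_consecutive_work_day)
        over(partition by employee_id, grp order by day_of_work) + 1 as consecutive_work_days
from (
    select
        t.*,
        sum(1 - is_consecutive_work_day)
            over(partition by employee_id order by day_of_work) as grp
    from mytable t
) t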
Although this seems like a gaps-and-islands problem, there is a simpler solution: simply find the most recent previous day whose value is 0 and take the date difference.
The only caveat is handling the case where there is none.
That would be:
select t.*,
       datediff(day_of_work,
                coalesce(max(case when is_consecutive_work_day = 0 then day_of_work end)
                             over (partition by employee_id order by day_of_work),
                         date_sub(min(day_of_work) over (partition by employee_id), interval 1 day))
               ) as fourth_column
from t;
If the streak should start at 1 on the reset day, as the question describes, add 1 to the difference.

Average over rolling date period

I have 4 dimensions, one of which is date. For each date, I need to calculate the average over the last 30 days, per each dimension value.
I have tried to run an average over a partition by the 4 dimensions, in the form of:
SELECT
Date, Produce,Company, Song, Revenues,
Average(case when Date between Date -Interval '31' day and Date - Interval '1' Day then Revenues else null End) over (partition by Date,Company,Song,Revenues order by Date) as "Running Average"
From
Base_Table
I get only nulls with every aggregation I tried.
Help is appreciated. Thanks
You can try the below:
SELECT
    Date, Produce, Company, Song, Revenues,
    AVG(Revenues) OVER (PARTITION BY Company, Song ORDER BY Date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) AS "Running Average"
FROM
    Base_Table
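Note that ROWS BETWEEN 30 PRECEDING counts rows, not days, so it only matches "the last 30 days" if there is exactly one row per day and dimension combination. If the database supports RANGE frames with interval bounds (e.g. PostgreSQL 11+ or Oracle; the question does not say which engine is used), a sketch with a frame measured in actual days would be:
SELECT
    Date, Produce, Company, Song, Revenues,
    AVG(Revenues) OVER (PARTITION BY Company, Song
                        ORDER BY Date
                        RANGE BETWEEN INTERVAL '30' DAY PRECEDING AND CURRENT ROW) AS "Running Average"
FROM
    Base_Table
To exclude the current day, as the original CASE expression between Date - 31 days and Date - 1 day intended, end the frame at INTERVAL '1' DAY PRECEDING instead of CURRENT ROW.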

Postgres SQL: Sum of ids greater than a day, computed day by day over a series

Looking to compute a moving sum day by day over a date range, i.e. for each row, sum all values for dates greater than or equal to that row's date. I know that a window function is needed, but I need some help with the actual function.
Note that the sum is over dates greater than or equal to each row's date: notice that on 2017-08-02 I do not count the value from the day before.
Example data:
2017-08-01, 1
2017-08-02, 5
2017-08-03, 4
2017-08-04, 3
2017-08-05, 2
Desired Result:
2017-08-01, 15
2017-08-02, 14
2017-08-03, 9
2017-08-04, 5
2017-08-05, 2
Here is what I have to produce this data.
SELECT DATE_TRUNC('day', created_at),
COUNT(*)
FROM table
GROUP BY 1
ORDER BY 1 DESC
Just use cumulative sums:
SELECT DATE_TRUNC('day', created_at),
COUNT(*),
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at) DESC) as sum_greater_than
FROM table
GROUP BY 1
ORDER BY 1 DESC;
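If you prefer the output in ascending date order, the same reverse running total can be written with an explicit frame. A sketch, using a hypothetical table name events since the question only shows the placeholder "table":
SELECT DATE_TRUNC('day', created_at) AS day,
       COUNT(*) AS cnt,
       SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', created_at)
                           ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sum_greater_than
FROM events
GROUP BY 1
ORDER BY 1;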

Calculate MAX for value over a relative date range

I am trying to calculate the max of a value over a relative date range. Suppose I have these columns: Date, Week, Category, Value. Note: The Week column is the Monday of the week of the corresponding Date.
I want to produce a table which gives the MAX value within the last two weeks for each Date, Week, Category combination so that the output produces the following: Date, Week, Category, Value, 2WeeksPriorMAX.
How would I go about writing that query? I don't think the following would work:
SELECT Date, Week, Value,
MAX(Value) OVER (PARTITION BY Category
ORDER BY Week
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as 2WeeksPriorMAX
The above query doesn't account for cases where there are missing values for a given Category, Week combination within the last 2 weeks, and therefore it would span further than 2 weeks when it analyzes the 2 preceding rows.
Left joining or using a lateral join/subquery might be expensive. You can do this with window functions, but you need to have a bit more logic:
select t.*,
       (case when lag(date, 1) over (partition by category order by date) < date - interval '2 week'
                 then value
             when lag(date, 2) over (partition by category order by date) < date - interval '2 week'
                 then max(value) over (partition by category order by date rows between 1 preceding and current row)
             else max(value) over (partition by category order by date rows between 2 preceding and current row)
        end) as TwoWeekMax
from t;
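Alternatively, if the database supports RANGE frames with interval bounds (e.g. PostgreSQL 11+; the question does not specify the engine), the window itself can be limited to the current week and the two prior weeks, which sidesteps the missing-week problem entirely. A sketch, assuming Week is a date column:
select t.*,
       max(Value) over (partition by Category
                        order by Week
                        range between interval '2 week' preceding and current row) as TwoWeekMax
from t;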