Calculate MAX for value over a relative date range - sql

I am trying to calculate the max of a value over a relative date range. Suppose I have these columns: Date, Week, Category, Value. Note: The Week column is the Monday of the week of the corresponding Date.
I want to produce a table which gives the MAX value within the last two weeks for each Date, Week, Category combination so that the output produces the following: Date, Week, Category, Value, 2WeeksPriorMAX.
How would I go about writing that query? I don't think the following would work:
SELECT Date, Week, Value,
MAX(Value) OVER (PARTITION BY Category
ORDER BY Week
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as 2WeeksPriorMAX
The above query doesn't account for cases where there are missing values for a given Category, Week combination within the last 2 weeks, and therefore it would span further than 2 weeks when it analyzes the 2 preceding rows.

Left joining or using a lateral join/subquery might be expensive. You can do this with window functions, but you need to have a bit more logic:
select t.*,
(case when lag(date, 1) over (partition by category order by date) < date - interval '2 week'
then value
when lag(date, 2) over (partition by category order by date) < date - interval '2 week'
then max(value) over (partition by category order by date rows between 1 preceding and current row)
else max(value) over (partition by category order by date rows between 2 preceding and current row)
end) as TwoWeekMax
from t;
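If your database supports RANGE frames with an offset, the frame can be defined by date distance instead of row count, which handles missing weeks directly. PostgreSQL 11 and later accept `RANGE BETWEEN INTERVAL '14 days' PRECEDING AND CURRENT ROW` over a date column. Below is a minimal sketch of the same idea run through Python's sqlite3 so it is self-contained. SQLite has no interval type, so `julianday()` with a numeric offset stands in for it; the table `t` and its rows are made-up sample data, and SQLite 3.28+ is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, category TEXT, value INTEGER)")
# Hypothetical sample data: note the three-week gap before 2024-01-29.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2024-01-01", "A", 5),
        ("2024-01-08", "A", 3),
        ("2024-01-29", "A", 1),
        ("2024-02-05", "A", 4),
    ],
)

rows = conn.execute(
    """
    SELECT date, category, value,
           MAX(value) OVER (
               PARTITION BY category
               ORDER BY julianday(date)
               -- frame = rows dated within 14 days before the current row
               RANGE BETWEEN 14 PRECEDING AND CURRENT ROW
           ) AS two_week_max
    FROM t
    ORDER BY category, date
    """
).fetchall()

two_week_max = [r[3] for r in rows]
for r in rows:
    print(r)
```

With a ROWS frame the 2024-01-29 row would still see the value 5 from four weeks earlier; the RANGE frame only looks at rows whose date is within 14 days (inclusive) of the current row.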

Related

PostgreSQL subquery - calculating average of lagged values

I am looking at Sales Rates by month, and was able to query the 1st table. I am quite new to PostgreSQL and am trying to figure out how I can query the second (I had to do the 2nd one in Excel for now)
I have the current Sales Rate and I would like to compare it to the Sales Rate 1 and 2 months ago, as an averaged rate.
I am not asking for an answer how exactly to solve it because this is not the point of getting better, but just for hints for functions to use that are specific to PostgreSQL. What I am trying to calculate is the 2 month average in the 2nd table based on the lagged values of the 2nd table. Thanks!
Here is the query for the 1st table:
with t1 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_1M_before
from data
where date between '2019-07-01' and '2019-11-30'
group by 1),
t2 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_2M_before
from data
where date between '2019-07-01' and '2019-10-31'
group by 1)
select t0.date,
count(t0.sales)::numeric/count(t0.poss_sales) as Sales_Rate,
t1.SR_1M_before,
t2.SR_2M_before
from data as t0
left join t1 on t0.date=t1.date
left join t2 on t0.date=t2.date
where t0.date between '2019-07-01' and '2019-12-31'
group by 1,3,4
order by 1;
As commented by a_horse_with_no_name, you can use window functions to take the average of the two previous months with a range clause (interval offsets in a range frame need PostgreSQL 11 or later):
select
date,
count(sales)::numeric/count(poss_sales) as Sales_Rate,
avg(count(sales)::numeric/count(poss_sales)) over(
order by date
range between interval '2 month' preceding and interval '1 month' preceding
) as Sales_Rate_2M_Avg,
count(sales)::numeric/count(poss_sales)
- avg(count(sales)::numeric/count(poss_sales)) over(
order by date
range between interval '2 month' preceding and interval '1 month' preceding
) as PercentDeviation
from data
where date between '2019-07-01' and '2019-12-31'
group by date
order by date;
Your data is a bit confusing -- it would be less confusing if you had decimal places (that is, 58% being the average of 57% and 58% is not obvious).
Because you want to have NULL values on the first two rows, I'm going to calculate the values using sum() and count():
with q as (
<whatever generates the data you have shown>
)
select q.*,
(sum(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
) /
nullif(count(*) over (order by date
rows between 2 preceding and 1 preceding
), 0)
) as two_month_average
from q;
You could also express this using case and avg():
select q.*,
(case when row_number() over (order by date) > 2
then avg(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
)
end) as two_month_average
from q;
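To see what the case/avg() version returns, here is a small sketch run through Python's sqlite3 (the same window syntax works in PostgreSQL). The table `q`, its `month` column, and the integer percentages are invented sample data, not the asker's figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE q (month TEXT, sales_rate INTEGER)")
# Hypothetical monthly sales rates, in percent.
conn.executemany(
    "INSERT INTO q VALUES (?, ?)",
    [("2019-07", 50), ("2019-08", 60), ("2019-09", 70), ("2019-10", 80)],
)

rows = conn.execute(
    """
    SELECT month, sales_rate,
           CASE WHEN ROW_NUMBER() OVER (ORDER BY month) > 2
                THEN AVG(sales_rate) OVER (
                         ORDER BY month
                         -- the two rows before the current one, excluding it
                         ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING
                     )
           END AS two_month_average
    FROM q
    ORDER BY month
    """
).fetchall()

averages = [r[2] for r in rows]
for r in rows:
    print(r)
```

The first two rows come back as NULL because of the row_number() guard; without it, the second row would average over the single preceding row.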

PostgreSQL: RANGE BETWEEN INTERVAL '10 DAY' AND CURRENT ROW

I have a table which stores, for every item, the daily price. If the price hasn't been updated, there isn't a record for that item on that day.
I need to write a query which retrieves, for every item, the most recent price with a lookback window of 10 days from the current row date otherwise return NULL. I was thinking to achieve that using a RANGE BETWEEN INTERVAL statement. Something like:
SELECT
DATE(datetime),
item_id,
LAST(price) OVER(
PARTITION BY item_id
ORDER BY datetime DESC
RANGE BETWEEN INTERVAL '10 DAY' PRECEDING AND CURRENT ROW
) AS most_recent_price_within_last_10days
FROM ...
GROUP BY
date,
item_id,
price
Unfortunately this query raises an error:
LINE 20: RANGE BETWEEN INTERVAL '10 DAY' PRECEDING AND CURRENT ROW
^
I came across an old blog post saying such operation is not possible in Postgres. Is this still accurate?
You could use ROW_NUMBER() to pull out the most recent record within the last 10 days for each item:
SELECT *
FROM (
SELECT
DATE(datetime),
item_id,
price AS most_recent_price_within_last_10days,
ROW_NUMBER() OVER(PARTITION BY item_id ORDER BY datetime DESC) rn
FROM ...
WHERE datetime > NOW() - INTERVAL '10 DAY'
) x WHERE rn = 1
In the subquery, the WHERE clause does the filtering on the date range; ROW_NUMBER() assigns a rank to each record within groups of records having the same item_id, with the most recent record first. Then, the outer query just filters in records having row number 1.
One method is to use LAG() and some comparison:
(CASE WHEN LAG(datetime) OVER (PARTITION BY item_id ORDER BY datetime) > datetime - interval '10 days'
THEN LAG(price) OVER (PARTITION BY item_id ORDER BY datetime)
END) as most_recent_price_within_last_10days
That is, the price you are looking for is on the preceding row. The only question is whether the date on that row is recent enough.
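A minimal sketch of that LAG() check, run through Python's sqlite3 so it can be tried without a PostgreSQL server. SQLite has no interval type, so `julianday()` arithmetic replaces `datetime - interval '10 days'`; the table `prices`, the column `ts`, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item_id INTEGER, ts TEXT, price INTEGER)")
# Hypothetical data: a 4-day gap, then a 15-day gap.
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?)",
    [
        (1, "2024-03-01", 10),
        (1, "2024-03-05", 12),
        (1, "2024-03-20", 15),
    ],
)

rows = conn.execute(
    """
    SELECT item_id, ts, price,
           CASE WHEN julianday(LAG(ts) OVER w) > julianday(ts) - 10
                THEN LAG(price) OVER w   -- previous row is recent enough
           END AS most_recent_price_within_last_10days
    FROM prices
    WINDOW w AS (PARTITION BY item_id ORDER BY ts)
    ORDER BY item_id, ts
    """
).fetchall()

prev_prices = [r[3] for r in rows]
for r in rows:
    print(r)
```

The first row has no predecessor and the last row's predecessor is 15 days old, so both yield NULL; only the middle row picks up the earlier price.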

Subtract the value of the most recent row with the value of the previous row (day -1)

I have a table with an incremental value for each day. I'd like to subtract the value of the previous row (day -1) from the value of the most recent row.
For example, this would be perfect :
SUM(value) OVER (PARTITION BY item_name ORDER BY date ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
However, I would need to apply a DIFF function instead of a SUM function.
Simply use lag():
select value - lag(value) over (partition by item_name order by date) as diff
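For a quick check of the lag() approach, here is a sketch using Python's sqlite3; the table name `readings` and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (item_name TEXT, date TEXT, value INTEGER)")
# Hypothetical cumulative values for one item.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01", 100),
        ("a", "2024-01-02", 130),
        ("a", "2024-01-03", 180),
    ],
)

rows = conn.execute(
    """
    SELECT item_name, date,
           value - LAG(value) OVER (PARTITION BY item_name ORDER BY date) AS diff
    FROM readings
    ORDER BY item_name, date
    """
).fetchall()

diffs = [r[2] for r in rows]
for r in rows:
    print(r)
```

The first row of each item has no previous row, so its difference is NULL.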

Last value from previous quarter, given that its date is less than or equal to the current row's date - SQL

I have the following table data:
I generated the last two columns with the following:
SELECT PublishDate, QuarterEndDate, Value, FiscalYear, FiscalQuarter,
FIRST_VALUE(Value) OVER(PARTITION BY FiscalYear ORDER BY FiscalQuarter, PublishDate
ROWS 1 PRECEDING) as LAST_VAL,
Value - FIRST_VALUE(Value) OVER(PARTITION BY FiscalYear ORDER BY FiscalQuarter,PublishDate
ROWS 1 PRECEDING) as QTR_DIFF
FROM tabledata
I am trying to calculate what the differences are between quarters given that the information was published.
Basically, I want to calculate the current row value minus the last value of the previous fiscal quarter (in the same fiscal year) given that its PublishDate is less than or equal to the current row's PublishDate.
If it is the first quarter, then the first-quarter numbers should be retained without any change.
In the above figure, the highlighted rows show a couple of the problems:
1) The first quarter shows a zero value even though it should be the value itself (i.e. 19461)
2) The query takes the value of the immediately preceding row, not the last value from the previous quarter whose publish date is less than or equal to the current row's.
Any help would be greatly appreciated... Thanks!
I have not tested this yet but looking at your code you are partitioning by the year only, I wonder if this will work:
SELECT PublishDate, QuarterEndDate, Value, FiscalYear, FiscalQuarter,
FIRST_VALUE(Value) OVER(PARTITION BY FiscalYear,FiscalQuarter ORDER BY FiscalQuarter, PublishDate
ROWS 1 PRECEDING) as LAST_VAL,
Value - FIRST_VALUE(Value) OVER(PARTITION BY FiscalYear,FiscalQuarter ORDER BY FiscalQuarter,PublishDate
ROWS 1 PRECEDING) as QTR_DIFF
FROM tabledata

How can I select one row for each week in a date range that spans more than a year?

In my PostgreSQL database, I have a table with columns of dates and prices ('transdate' and 'price').
I would like to form a query which selects one row for each week over a date range which spans more than one year.
From another question/answer here, I implemented this code which works for date ranges of less than a year:
;with cte as
(
select *,
row_number() over (partition by Extract (week from transdate) order by transdate desc) as rn
from "tablename" where transdate between '06-01-1999' and '06-01-1999'::timestamp + '50 week'::interval
)
select transdate, price from cte where rn = 1 order by transdate;
However, when I extend the interval greater than 50 weeks, it still only selects a max of 12 months.
How can I re-write this code to select one date/price from every week in the range?
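Your problem is that week numbers wrap around at year boundaries but you want to look at the week number and the year at the same time. Lucky for you, you can PARTITION BY several things at once. Use isoyear rather than year here: extract(week ...) returns the ISO week, and the first days of January can belong to the last ISO week of the previous year:
row_number() over (
partition by extract(week from transdate),
extract(isoyear from transdate)
order by transdate desc
) as rn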
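Another way to sidestep the week/year pairing is to partition by the start date of the week, which in PostgreSQL would be `date_trunc('week', transdate)`. The sketch below shows the idea with Python's sqlite3, where `date(transdate, 'weekday 0', '-6 days')` computes the Monday of each row's week. The table `prices` and its rows are made-up sample data chosen to straddle a year boundary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (transdate TEXT, price INTEGER)")
# Hypothetical data: the first three rows fall in the same Monday-to-Sunday
# week even though that week spans 1999 and 2000.
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [
        ("1999-12-27", 10),
        ("1999-12-30", 11),
        ("2000-01-02", 12),
        ("2000-01-03", 13),
        ("2000-12-25", 14),
        ("2000-12-31", 15),
    ],
)

rows = conn.execute(
    """
    WITH cte AS (
        SELECT transdate, price,
               ROW_NUMBER() OVER (
                   -- 'weekday 0' moves to the Sunday on or after the date;
                   -- subtracting 6 days gives that week's Monday
                   PARTITION BY date(transdate, 'weekday 0', '-6 days')
                   ORDER BY transdate DESC
               ) AS rn
        FROM prices
    )
    SELECT transdate, price FROM cte WHERE rn = 1 ORDER BY transdate
    """
).fetchall()

for r in rows:
    print(r)
```

Each week contributes its latest row, and weeks that cross New Year are kept as a single group because the partition key is a full date rather than a week number.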