Redshift - Max window function with Condition - sql

I'm trying to get the max value of a specific field over the last 3 months, the last 6 months and since inception, relative to a specific date and partitioned by part_id.
For max since inception, I used the query below:
select
part_id,
date_field,
MAX(val) OVER(partition by part_id order by date_field rows unbounded preceding) as max_since_inception
FROM my_table;
How do I add a condition to get the max value only within the last 3 months of my date_field?
E.g. if date_field is 2020-09-25:
max_l3m must be the max value between 2020-06-25 and 2020-09-25;
max_l6m must be the max value between 2020-03-25 and 2020-09-25;
max_since_inception must be the max value from inception up to 2020-09-25;
all partitioned by part_id.

Redshift only supports ROWS-based window frames; it has no RANGE frames with an interval offset, so "the last 3 months" can't be expressed as a frame. You are stuck with a self join or some other complicated construct:
select t.part_id, t.date_field,
max(case when tprev.date_field >= t.date_field - interval '3 month' then tprev.val end) as max_l3m,
max(case when tprev.date_field >= t.date_field - interval '6 month' then tprev.val end) as max_l6m
from my_table t join
my_table tprev
on tprev.part_id = t.part_id and tprev.date_field <= t.date_field
group by t.part_id, t.date_field;
You might also want to limit the lookback period to 6 months, if that is the longest timeframe you really need.
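To sanity-check the self-join logic, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. It is an adaptation, not Redshift: SQLite has no interval literals, so date(t.date_field, '-3 months') stands in for t.date_field - interval '3 month', dates are stored as ISO-8601 text, and the table contents are hypothetical. Unlike the answer, it also computes max_since_inception from the same join instead of using the window function from the question.

```python
import sqlite3

# SQLite adaptation of the Redshift self-join: date(x, '-3 months') replaces
# x - interval '3 month'. ISO-8601 date strings compare correctly as text.
QUERY = """
select t.part_id, t.date_field,
       max(case when tprev.date_field >= date(t.date_field, '-3 months')
                then tprev.val end) as max_l3m,
       max(case when tprev.date_field >= date(t.date_field, '-6 months')
                then tprev.val end) as max_l6m,
       max(tprev.val) as max_since_inception
from my_table t
join my_table tprev
  on tprev.part_id = t.part_id and tprev.date_field <= t.date_field
group by t.part_id, t.date_field
order by t.part_id, t.date_field
"""


def rolling_maxes(rows):
    """rows: iterable of (part_id, date_field, val) tuples (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table my_table (part_id text, date_field text, val integer)")
        con.executemany("insert into my_table values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    sample = [  # hypothetical data
        ("A", "2020-01-10", 12),
        ("A", "2020-04-01", 9),
        ("A", "2020-07-01", 3),
        ("A", "2020-09-25", 4),
    ]
    for row in rolling_maxes(sample):
        print(row)
```

With these sample rows, 2020-09-25 comes out as (4, 9, 12): the 12 from 2020-01-10 lies outside both the 3- and 6-month windows, and the 9 from 2020-04-01 lies only inside the 6-month one.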

Related

Extract previous row calculated value for use in current row calculations - Postgres

I have a requirement to use the calculated value of the previous row in the calculation for the current row.
Here is a sample of how the data currently looks:
ID | Date       | Days
1  | 2022-01-15 | 30
2  | 2022-02-18 | 30
3  | 2022-03-15 | 90
4  | 2022-05-15 | 30
And this is the output I am expecting:
ID | Date       | Days | CalVal
1  | 2022-01-15 | 30   | 2022-02-14
2  | 2022-02-18 | 30   | 2022-03-16
3  | 2022-03-15 | 90   | 2022-06-14
4  | 2022-05-15 | 30   | 2022-07-14
The CalVal of the first row is Date + Days.
From the second row onwards, CalVal should be the previous row's CalVal plus the current row's Days.
Essentially, I am looking for a way to access the previous row's calculated value in the current row.
Is there any way to achieve this in PostgreSQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
id,
date,
coalesce(
days - (lag(days, 1) over (order by date, days))
, days) as days,
first_date + cast(days as integer) as newdate
from
(
select
-- get a running sum of days
id,
first_date,
date,
sum(days) over (order by date, days) as days
from
(
select
-- get the first date
id,
(select min(date) from table1) as first_date,
date,
days
from
table1
) A
) B
This query gets the exact output you described. I'm not at all ready to say it is the best solution, but the strategy is essentially to create a running total of "days": adding that running total to the first date always gives the next date in the desired sequence. One finesse: to put the original "days" back into the result, we subtract the previous running total from the current one.
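The running-total idea can be checked against the sample data outside the database. This is a minimal Python sketch, not the SQL itself: it assumes the rows arrive as (id, date, days) tuples and are processed in id order (the query above orders by date, days, which gives the same order for this sample).

```python
from datetime import date, timedelta
from itertools import accumulate


def calval(rows):
    """rows: (id, date, days) tuples. Returns (id, date, days, calval) tuples.

    CalVal = first Date + running total of Days, which is the recurrence
    "previous CalVal + current Days" unrolled.
    """
    rows = sorted(rows)  # process in id order
    first_date = rows[0][1]
    running_days = accumulate(days for _, _, days in rows)
    return [
        (row_id, d, days, first_date + timedelta(days=total))
        for (row_id, d, days), total in zip(rows, running_days)
    ]


if __name__ == "__main__":
    sample = [
        (1, date(2022, 1, 15), 30),
        (2, date(2022, 2, 18), 30),
        (3, date(2022, 3, 15), 90),
        (4, date(2022, 5, 15), 30),
    ]
    for row in calval(sample):
        print(row)
```

The CalVal column it produces matches the expected output in the question: 2022-02-14, 2022-03-16, 2022-06-14, 2022-07-14.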
Assuming the table name is table1:
select
id,
date,
days,
first_value(date) over (order by id) +
(sum(days) over (order by id rows between unbounded preceding and current row))
*interval '1 day' calval
from table1;
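We just add the cumulative sum of days to the first date in the table. It's not literally what you asked for (we don't need the previous row's calculated value, just the cumulative sum of days), but it produces the same result.
A solution with recursion (this assumes the ids are consecutive and start at 1):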
with recursive prev_row as (
select id, date, days, date+ days*interval '1 day' calval
from table1
where id = 1
union all
select t.id, t.date, t.days, p.calval + t.days*interval '1 day' calval
from prev_row p
join table1 t on t.id = p.id+ 1
)
select *
from prev_row
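For comparison, the recursive CTE can be run almost unchanged in SQLite from Python. In this sketch, date(x, '+N days') replaces x + N * interval '1 day', the date column is renamed dt for clarity, and the table is an in-memory copy of the sample data; like the original, it assumes ids are 1, 2, 3, ... with no gaps.

```python
import sqlite3

# Anchor row: CalVal = Date + Days for id 1.
# Recursive step: CalVal = previous CalVal + current Days, joining on id + 1.
QUERY = """
with recursive prev_row(id, dt, days, calval) as (
    select id, dt, days, date(dt, '+' || days || ' days')
    from table1
    where id = 1
    union all
    select t.id, t.dt, t.days, date(p.calval, '+' || t.days || ' days')
    from prev_row p
    join table1 t on t.id = p.id + 1
)
select id, calval from prev_row order by id
"""


def recursive_calval(rows):
    """rows: (id, dt, days) tuples with dt as an ISO-8601 date string."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table table1 (id integer, dt text, days integer)")
        con.executemany("insert into table1 values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

The recursion carries the previous CalVal forward explicitly, which is what the question asked for; the cumulative-sum answer reaches the same values without recursion.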

Rank with Condition in Oracle

I'm looking for a query that will calculate a rank based on a condition, as follows:
The "rank index" is the value I'm trying to calculate: if the difference between the previous time and the current time is less than 6 hours, the counter should stay the same; if more than 6 hours have passed, the index should be incremented by 1.
Any ideas?
Based on your explanation, the last value should be 4 not 5.
Use lag() and a cumulative sum. Assuming the datetime column is stored as a date:
select t.*,
sum(case when prev_datetime > datetime - interval '6' hour then 0 else 1 end) over
(order by datetime) as rank_index
from (select t.*,
lag(datetime) over (order by datetime) as prev_datetime
from t
) t;
Note: If you want this for each key1/key2 combination, then you want to include partition by key1, key2 in the window specifications.
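The lag() plus running sum() pattern is not Oracle-specific. This sketch runs the same logic in SQLite from Python, assuming the bundled SQLite is 3.25 or newer (window functions). Timestamps are hypothetical ISO-8601 strings in a table named t; strftime('%s', ...) turns them into integer epoch seconds, so "less than 6 hours" becomes "less than 6 * 3600 seconds" and the comparison is exact.

```python
import sqlite3

# Each row contributes 0 if the previous timestamp is within 6 hours,
# otherwise 1 (including the first row, where prev_secs is NULL).
# The running sum of those flags is the rank index.
QUERY = """
select id, dt,
       sum(case when prev_secs > secs - 6 * 3600 then 0 else 1 end)
           over (order by dt) as rank_index
from (select id, dt,
             cast(strftime('%s', dt) as integer) as secs,
             lag(cast(strftime('%s', dt) as integer)) over (order by dt) as prev_secs
      from t) as x
order by dt
"""


def rank_index(rows):
    """rows: (id, dt) tuples with dt like '2021-01-01 03:00:00'."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (id integer, dt text)")
        con.executemany("insert into t values (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

For events at 00:00, 03:00, 10:00, 12:00 and 20:00 on the same day, the gaps are 3, 7, 2 and 8 hours, so the rank index runs 1, 1, 2, 2, 3.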

PostgreSQL subquery - calculating average of lagged values

I am looking at Sales Rates by month, and was able to query the 1st table. I am quite new to PostgreSQL and am trying to figure out how I can query the second (I had to do the 2nd one in Excel for now)
I have the current Sales Rate and I would like to compare it to the Sales Rate 1 and 2 months ago, as an averaged rate.
I am not asking for exactly how to solve it, since that wouldn't help me get better, just for hints about PostgreSQL-specific functions to use. What I am trying to calculate is the 2-month average in the 2nd table, based on the lagged values of the 2nd table. Thanks!
Here is the query for the 1st table:
with t1 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_1M_before
from data
where date between '2019-07-01' and '2019-11-30'
group by 1),
t2 as
(select date,
count(sales)::numeric/count(poss_sales) as SR_2M_before
from data
where date between '2019-07-01' and '2019-10-31'
group by 1)
select t0.date,
count(t0.sales)::numeric/count(t0.poss_sales) as Sales_Rate,
t1.SR_1M_before,
t2.SR_2M_before
from data as t0
left join t1 on t0.date=t1.date
left join t2 on t0.date=t2.date
where t0.date between '2019-07-01' and '2019-12-31'
group by 1,3,4
order by 1;
As commented by a_horse_with_no_name, you can use window functions to take the average of the two previous months with a range clause (RANGE frames with offsets need PostgreSQL 11 or later):
select
date,
count(sales)::numeric/count(poss_sales) as Sales_Rate,
avg(count(sales)::numeric/count(poss_sales)) over(
order by date
range between interval '2 month' preceding and interval '1 month' preceding
) as Avg_Sales_Rate_2M,
count(sales)::numeric/count(poss_sales)
- avg(count(sales)::numeric/count(poss_sales)) over(
order by date
range between interval '2 month' preceding and interval '1 month' preceding
) as PercentDeviation
from data
where date between '2019-07-01' and '2019-12-31'
group by date
order by date;
Your data is a bit confusing; it would be clearer if it showed decimal places (for instance, it is not obvious that 58% is the average of 57% and 58%).
Because you want to have NULL values on the first two rows, I'm going to calculate the values using sum() and count():
with q as (
<whatever generates the data you have shown>
)
select q.*,
(sum(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
) /
-- row 1: empty frame, so the sum is NULL; row 2: nullif turns the count of 1 into NULL
nullif(count(*) over (order by date
rows between 2 preceding and 1 preceding
), 1)
) as two_month_average
from q;
You could also express this using case and avg():
select q.*,
(case when row_number() over (order by date) > 2
then avg(sales_rate) over (order by date
rows between 2 preceding and 1 preceding
)
end) as two_month_average
from q;
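Here is a minimal runnable sketch of the case/avg() variant, using SQLite from Python and assuming SQLite 3.25 or newer for window functions. The table q, its month column and the sales rates (stored as whole percentages to keep the arithmetic exact) are hypothetical. Note that a ROWS frame counts rows, so this assumes exactly one row per month with no gaps; the RANGE version above handles gaps.

```python
import sqlite3

# The first two rows have fewer than two preceding months, so CASE leaves
# them NULL; every later row averages the two rows before it.
QUERY = """
select month, sales_rate,
       case when row_number() over (order by month) > 2
            then avg(sales_rate) over (order by month
                                       rows between 2 preceding and 1 preceding)
       end as two_month_average
from q
order by month
"""


def two_month_average(rows):
    """rows: (month, sales_rate) tuples, e.g. ('2019-07', 50)."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table q (month text, sales_rate integer)")
        con.executemany("insert into q values (?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

For monthly rates of 50, 60, 55, 65, 70 and 60, the two_month_average column is NULL, NULL, 55.0, 57.5, 60.0, 67.5.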

Calculate MAX for value over a relative date range

I am trying to calculate the max of a value over a relative date range. Suppose I have these columns: Date, Week, Category, Value. Note: The Week column is the Monday of the week of the corresponding Date.
I want to produce a table which gives the MAX value within the last two weeks for each Date, Week, Category combination so that the output produces the following: Date, Week, Category, Value, 2WeeksPriorMAX.
How would I go about writing that query? I don't think the following would work:
SELECT Date, Week, Value,
MAX(Value) OVER (PARTITION BY Category
ORDER BY Week
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as 2WeeksPriorMAX
The above query doesn't account for cases where there are missing values for a given Category, Week combination within the last 2 weeks, and therefore it would span further than 2 weeks when it analyzes the 2 preceding rows.
Left joining or using a lateral join/subquery might be expensive. You can do this with window functions, but you need to have a bit more logic:
select t.*,
(case when lag(date, 1) over (partition by category order by date) < date - interval '2 week'
then value
when lag(date, 2) over (partition by category order by date) < date - interval '2 week'
then max(value) over (partition by category order by date rows between 1 preceding and current row)
else max(value) over (partition by category order by date rows between 2 preceding and current row)
end) as TwoWeekMax
from t;
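To see why the extra lag() checks matter, this sketch runs the answer's CASE logic next to the naive ROWS 2 PRECEDING frame in SQLite from Python (SQLite 3.25 or newer assumed). It is an adaptation: date(dt, '-14 days') replaces date - interval '2 week', the table t(category, dt, val) and its rows are hypothetical, and like the answer it assumes at most one row per category per week.

```python
import sqlite3

QUERY = """
select category, dt, val,
       case when lag(dt, 1) over (partition by category order by dt) < date(dt, '-14 days')
            then val
            when lag(dt, 2) over (partition by category order by dt) < date(dt, '-14 days')
            then max(val) over (partition by category order by dt
                                rows between 1 preceding and current row)
            else max(val) over (partition by category order by dt
                                rows between 2 preceding and current row)
       end as two_week_max,
       -- naive version from the question, for comparison
       max(val) over (partition by category order by dt
                      rows between 2 preceding and current row) as naive_max
from t
order by category, dt
"""


def two_week_max(rows):
    """rows: (category, dt, val) tuples with dt as an ISO-8601 date string."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table t (category text, dt text, val integer)")
        con.executemany("insert into t values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    sample = [  # hypothetical weekly data; the week of 2021-01-18 is missing
        ("X", "2021-01-04", 5),
        ("X", "2021-01-11", 8),
        ("X", "2021-01-25", 2),
        ("X", "2021-02-01", 1),
    ]
    for row in two_week_max(sample):
        print(row)
```

On these rows the CASE logic gives 5, 8, 8, 2, while naive_max gives 5, 8, 8, 8: on 2021-02-01 the naive three-row frame reaches back to 2021-01-11 because 2021-01-18 is missing.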

How do I get a rolling view of counts?

The goal is to count anyone who fits a criterion within the three months before a specified date. The (BetweenDate - 3 months) part is the tricky part. I am operating within a yearly window, so instead of three months back from getDate(), I need it to be three months back from each date Y within that window. Any ideas?
CREATE TABLE MONTH3LOOK AS Select
to_CHAR(DATE_OF_SERVICE_3013,'YYYY-MM') "Date"
,COUNT(DISTINCT case when (regexp_instr(IS_CONCAT,'(2957|29570|29571|29572|29573|29574|29575|29576|29577|29578|29579)')>0)
and
(DATE_OF_SERVICE_3013 between trunc(DATE_OF_SERVICE_3013,'MM') and add_months(trunc(DATE_OF_SERVICE_3013,'MM'),-3))
then USER end) AS Recip
FROM .NET_SERVICE
WHERE DATE_OF_SERVICE_3013 BETWEEN
TO_DATE('2013-10','YYYY-MM') AND
TO_DATE('2014-03','YYYY-MM')
group by to_CHAR(DATE_OF_SERVICE_3013,'YYYY-MM')
You will likely need to use analytic functions to get your counts and the distinct operator to simulate the group by since including the group by operator interferes with the operation of the analytic functions:
select distinct trunc(date_of_service_3013,'MM') "Date"
, count(case when regexp_like(IS_CONCAT, '(1234|5678|etc)') then user end)
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 BETWEEN TO_DATE('2013-10','YYYY-MM')
AND TO_DATE('2014-03','YYYY-MM');
Another way to account for the effect of the group by operation is to use both analytic and aggregate functions:
select trunc(date_of_service_3013,'MM') "Date"
, sum(count(case when regexp_like(IS_CONCAT, '1234|5678|etc') then user end))
over (order by trunc(date_of_service_3013, 'MM')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 BETWEEN TO_DATE('2013-10','YYYY-MM')
AND TO_DATE('2014-03','YYYY-MM')
group by trunc(date_of_service_3013,'MM');
Here the aggregate count works on a month by month basis as per the group by clause, then it uses the analytic sum to add up those counts.
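The "aggregate first, then analytic sum" idea can be tried in SQLite from Python. This sketch splits it into two explicit steps (a CTE with monthly counts, then a windowed sum) rather than nesting count() inside sum() as the Oracle query does. Assumptions: SQLite 3.28 or newer (RANGE frames with numeric offsets), a hypothetical service(svc_date, code, user_id) table, and a simple IN list standing in for regexp_like, which SQLite lacks by default. month_idx = year * 12 + month - 1, so RANGE 3 PRECEDING means "this month and the three before it", even when some months have no rows.

```python
import sqlite3

QUERY = """
with monthly as (
    -- step 1: the aggregate count per month
    select cast(strftime('%Y', svc_date) as integer) * 12
           + cast(strftime('%m', svc_date) as integer) - 1 as month_idx,
           strftime('%Y-%m', svc_date) as month,
           count(case when code in ('1234', '5678') then user_id end) as n
    from service
    group by month_idx, month
)
-- step 2: the analytic sum of those monthly counts over a rolling window
select month,
       sum(n) over (order by month_idx
                    range between 3 preceding and current row) as recip
from monthly
order by month_idx
"""


def rolling_recip(rows):
    """rows: (svc_date, code, user_id) tuples with ISO-8601 dates."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table service (svc_date text, code text, user_id text)")
        con.executemany("insert into service values (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()
```

Because the frame is a RANGE over month numbers, a month with no rows (such as a missing 2013-11) still counts as one of the three months back, which a ROWS 3 PRECEDING frame would not do.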
One caveat about both solutions: the where clause prevents any records before 2013-10 from being counted. If you want records before 2013-10 to be included in the counts but only output 2013-10 through 2014-03, you'll need to do it in two stages: put either of the two queries above inside a with t1 as (...) subquery factoring clause with the starting date moved back appropriately, then filter in the outer query:
with t1 as (
select distinct trunc(date_of_service_3013,'MM') "Date"
, count(case when regexp_like(IS_CONCAT, '1234|5678|etc') then user end)
over (order by trunc(date_of_service_3013, 'mm')
range between interval '3' month preceding
and current row) recip
from your_table
where DATE_OF_SERVICE_3013 BETWEEN TO_DATE('2013-07','YYYY-MM')
AND TO_DATE('2014-03','YYYY-MM')
)
select * from t1 where "Date" >= TO_DATE('2013-10','YYYY-MM');
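The difference between filtering before and after the window can be shown with a small SQLite sketch from Python (SQLite 3.28 or newer assumed for RANGE offsets). The monthly table of pre-aggregated counts, with month_idx = year * 12 + month - 1, is hypothetical; the two queries differ only in where the month filter sits.

```python
import sqlite3

# WHERE runs before the window sum, so earlier months never reach the frame.
FILTER_FIRST = """
select month,
       sum(n) over (order by month_idx range between 3 preceding and current row)
from monthly
where month >= ?
order by month_idx
"""

# Filtering in the outer query runs after the window sum, so earlier months
# still feed the rolling totals but are not output themselves.
FILTER_AFTER = """
with t1 as (
    select month_idx, month,
           sum(n) over (order by month_idx range between 3 preceding and current row) as recip
    from monthly
)
select month, recip from t1 where month >= ? order by month_idx
"""


def run(query, monthly, start):
    """monthly: (month_idx, month, n) tuples; start: first month to output."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("create table monthly (month_idx integer, month text, n integer)")
        con.executemany("insert into monthly values (?, ?, ?)", monthly)
        return con.execute(query, (start,)).fetchall()
    finally:
        con.close()
```

With counts of 2, 1, 3 and 1 for 2013-09, 2013-10, 2013-12 and 2014-02, the 2013-10 total is 1 when filtering first but 3 when filtering after, because the 2013-09 count is only visible in the second case.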