PostgreSQL: RANGE BETWEEN INTERVAL '10 DAY' AND CURRENT ROW

I have a table which stores, for every item, the daily price. If the price hasn't been updated, there isn't a record for that item on that day.
I need to write a query which retrieves, for every item, the most recent price within a lookback window of 10 days from the current row's date, or NULL if there is none. I was thinking of achieving that with a RANGE BETWEEN INTERVAL frame. Something like:
SELECT
    DATE(datetime),
    item_id,
    LAST(price) OVER(
        PARTITION BY item_id
        ORDER BY datetime DESC
        RANGE BETWEEN INTERVAL '10 DAYS' PRECEDING AND CURRENT ROW
    ) AS most_recent_price_within_last_10days
FROM ...
GROUP BY
    date,
    item_id,
    price
Unfortunately this query raises an error:
LINE 20: RANGE BETWEEN INTERVAL '10 DAY' PRECEDING AND CURRENT ROW
^
I came across an old blog post saying such an operation is not possible in Postgres. Is this still accurate?

You could use ROW_NUMBER() to pull out the most recent record within the last 10 days for each item:
SELECT *
FROM (
    SELECT
        DATE(datetime),
        item_id,
        price AS most_recent_price_within_last_10days,
        ROW_NUMBER() OVER(PARTITION BY item_id ORDER BY datetime DESC) rn
    FROM ...
    WHERE datetime > NOW() - INTERVAL '10 DAY'
) x
WHERE rn = 1
In the subquery, the WHERE clause filters on the date range; ROW_NUMBER() ranks the records within each group of rows sharing the same item_id, most recent first. The outer query then keeps only the rows ranked 1.
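Postgres's DISTINCT ON is a common shorthand for the same pick-the-first-row-per-group pattern. A minimal equivalent sketch (the table name prices is an assumption, since the original query only shows FROM ...):
SELECT DISTINCT ON (item_id)
    DATE(datetime),
    item_id,
    price AS most_recent_price_within_last_10days
FROM prices                                 -- hypothetical table name
WHERE datetime > NOW() - INTERVAL '10 DAY'  -- same 10-day filter as above
ORDER BY item_id, datetime DESC;            -- DISTINCT ON keeps the first row per item_id
Both forms return one row per item; DISTINCT ON just folds the ranking and the rn = 1 filter into a single clause.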

One method is to use LAG() and some comparison:
(CASE WHEN LAG(datetime) OVER (PARTITION BY item_id ORDER BY datetime) > datetime - interval '10 days'
      THEN LAG(price) OVER (PARTITION BY item_id ORDER BY datetime)
 END) as most_recent_price_within_last_10days
That is, the price you are looking for is on the preceding row. The only question is whether the date on that row is recent enough.
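For context, here is that expression embedded in a complete query; a sketch assuming the same hypothetical prices table as above:
SELECT
    DATE(datetime),
    item_id,
    (CASE WHEN LAG(datetime) OVER (PARTITION BY item_id ORDER BY datetime) > datetime - interval '10 days'
          THEN LAG(price) OVER (PARTITION BY item_id ORDER BY datetime)
     END) AS most_recent_price_within_last_10days
FROM prices;  -- hypothetical table name
Unlike the ROW_NUMBER() version, this returns a value for every row, with NULL whenever the preceding price is older than 10 days.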

Related

IF in SQL to choose which values to select

I am trying to use an IF or CASE statement in sql to choose when to select a value in a column. Essentially I have some data in a table like so:
My goal is to see which items are ordered multiple weeks in a row by the same customer. I have one month of dates; I could fall back to 7 separate queries, one for each day of the week, but I'm trying to do something like:
Select item, date, customer, truck
If customer, item combo appears in multiple weeks
Please let me know if you have any idea how I can do this!
Assuming you have at most one row per week per customer and item (as in the sample data), you can use lead() and lag(). The following assumes that you mean exactly 7 days apart:
select t.*
from (select t.*,
             lag(orderdate) over (partition by customer, itemid order by orderdate) as prev_orderdate,
             lead(orderdate) over (partition by customer, itemid order by orderdate) as next_orderdate
      from t
     ) t
where prev_orderdate = orderdate - interval '7 day' or
      next_orderdate = orderdate + interval '7 day';
Note that date/time functionality is highly database dependent, so you might have to adjust for your database functions.
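As a quick, hypothetical illustration of the query above (PostgreSQL syntax; table and values invented for the example):
-- Hypothetical setup to try the query
CREATE TABLE t (customer text, itemid int, orderdate date);
INSERT INTO t VALUES
    ('acme', 1, DATE '2020-01-06'),
    ('acme', 1, DATE '2020-01-13'),  -- exactly 7 days later: both rows qualify
    ('acme', 2, DATE '2020-01-06');  -- no adjacent week: filtered out
Running the query returns the two item-1 rows, since each has a neighbor exactly 7 days away.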

PostgreSQL subquery - calculating average of lagged values

I am looking at Sales Rates by month, and was able to query the 1st table. I am quite new to PostgreSQL and am trying to figure out how I can query the second (for now I had to do the 2nd one in Excel).
I have the current Sales Rate and I would like to compare it to the Sales Rate 1 and 2 months ago, as an averaged rate.
I am not asking for an exact solution, because that is not how I'll get better, but just for hints about which PostgreSQL-specific functions to use. What I am trying to calculate is the 2-month average in the 2nd table based on the lagged values of the 1st table. Thanks!
Here is the query for the 1st table:
with t1 as (
    select date,
           count(sales)::numeric/count(poss_sales) as SR_1M_before
    from data
    where date between '2019-07-01' and '2019-11-30'
    group by 1
),
t2 as (
    select date,
           count(sales)::numeric/count(poss_sales) as SR_2M_before
    from data
    where date between '2019-07-01' and '2019-10-31'
    group by 1
)
select t0.date,
       count(t0.sales)::numeric/count(t0.poss_sales) as Sales_Rate,
       t1.SR_1M_before,
       t2.SR_2M_before
from data as t0
left join t1 on t0.date = t1.date
left join t2 on t0.date = t2.date
where date between '2019-07-01' and '2019-12-31'
group by 1, 3, 4
order by 1;
As commented by a_horse_with_no_name, you can use window functions to take the average of the two previous months with a range clause:
select
    date,
    count(sales)::numeric/count(poss_sales) as Sales_Rate,
    avg(count(sales)::numeric/count(poss_sales)) over(
        order by date
        range between interval '2 month' preceding and interval '1 month' preceding
    ) as Avg_Sales_Rate,
    count(sales)::numeric/count(poss_sales)
      - avg(count(sales)::numeric/count(poss_sales)) over(
            order by date
            range between interval '2 month' preceding and interval '1 month' preceding
        ) as PercentDeviation
from data
where date between '2019-07-01' and '2019-12-31'
group by date
order by date;
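Note that RANGE frames with an interval offset, as used here, require PostgreSQL 11 or later; the ROWS-based frames in the queries below also work on older versions.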
Your data is a bit confusing -- it would be less confusing if you had decimal places (that is, 58% being the average of 57% and 58% is not obvious).
Because you want to have NULL values on the first two rows, I'm going to calculate the values using sum() and count():
with q as (
      <whatever generates the data you have shown>
     )
select q.*,
       (sum(sales_rate) over (order by date
                              rows between 2 preceding and 1 preceding
                             ) /
        nullif(count(*) over (order by date
                              rows between 2 preceding and 1 preceding
                             ), 0)
       ) as two_month_average
from q;
You could also express this using case and avg():
select q.*,
       (case when row_number() over (order by date) > 2
             then avg(sales_rate) over (order by date
                                        rows between 2 preceding and 1 preceding
                                       )
        end) as two_month_average
from q;
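In the case version, the first two rows come out as NULL because their row_number() is 1 or 2, so the avg() branch is never reached.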

Calculate MAX for value over a relative date range

I am trying to calculate the max of a value over a relative date range. Suppose I have these columns: Date, Week, Category, Value. Note: The Week column is the Monday of the week of the corresponding Date.
I want to produce a table which gives the MAX value within the last two weeks for each Date, Week, Category combination so that the output produces the following: Date, Week, Category, Value, 2WeeksPriorMAX.
How would I go about writing that query? I don't think the following would work:
SELECT Date, Week, Value,
       MAX(Value) OVER (PARTITION BY Category
                        ORDER BY Week
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) as 2WeeksPriorMAX
The above query doesn't account for cases where a given Category, Week combination is missing within the last 2 weeks; in that case the 2 preceding rows span more than 2 weeks.
Left joining or using a lateral join/subquery might be expensive. You can do this with window functions, but you need to have a bit more logic:
select t.*,
       (case when lag(date, 1) over (partition by category order by date) < date - interval '2 week'
             then value
             when lag(date, 2) over (partition by category order by date) < date - interval '2 week'
             then max(value) over (partition by category order by date rows between 1 preceding and current row)
             else max(value) over (partition by category order by date rows between 2 preceding and current row)
        end) as TwoWeekMax
from t;
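As a side note, if this is PostgreSQL 11 or later (an assumption, since the question does not name the engine or version), a RANGE frame with an interval offset expresses the rolling window directly and is unaffected by missing weeks:
select t.*,
       max(value) over (partition by category
                        order by date
                        range between interval '2 week' preceding and current row) as TwoWeekMax
from t;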

Standard deviation of a set of dates

I have a table of transactions with columns id | client_id | datetime and I have calculated the mean of days between transactions to know how often these transactions are made by each client:
SELECT *, ((date_last_transaction - date_first_transaction) / total_transactions) AS frequency
FROM (
    SELECT client_id,
           COUNT(id) AS total_transactions,
           MIN(datetime) AS date_first_transaction,
           MAX(datetime) AS date_last_transaction
    FROM transactions
    GROUP BY client_id
) AS t;
What methods exist to calculate the standard deviation (in days) of a set of dates in PostgreSQL? Preferably with only one query, if it is possible :-)
I have found this way:
SELECT extract(day from date_trunc('day', (
           CASE WHEN COUNT(*) <= 1 THEN
               0
           ELSE
               SUM(time_since_last_invoice)/(COUNT(*)-1)
           END
       ) * '1 day'::interval)) AS days_between_purchases,
       extract(day from date_trunc('day', (
           CASE WHEN COUNT(*) <= 2 THEN
               0
           ELSE
               STDDEV(time_since_last_invoice)
           END
       ) * '1 day'::interval)) AS range_of_days
FROM (
    SELECT client_id, datetime, COALESCE(datetime - lag(datetime)
               OVER (PARTITION BY client_id ORDER BY client_id, datetime
                     ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING
               ), 0
           ) AS time_since_last_invoice
    FROM my_table
    GROUP BY client_id, datetime
    ORDER BY client_id, datetime
) AS t
Explanation:
This query groups by client and date, calculates the difference between each pair of consecutive transaction dates (datetime) per client_id, and returns a table with these results. The outer query then processes that table and calculates the average over the differences greater than 0 (the first value in each group is excluded because it is the first transaction, and therefore its interval is 0).
The standard deviation is only calculated when there are 2 or more transaction dates for the same client, to avoid division-by-zero errors.
All differences are returned in PostgreSQL interval format.
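For comparison, a more compact per-client version can lean directly on STDDEV over the day gaps; a sketch against the question's transactions table, assuming datetime is a date type so that subtraction yields whole days:
SELECT client_id,
       AVG(diff)    AS avg_days_between_transactions,
       STDDEV(diff) AS stddev_days_between_transactions
FROM (
    SELECT client_id,
           datetime - lag(datetime) OVER (PARTITION BY client_id ORDER BY datetime) AS diff
    FROM transactions
) AS gaps
WHERE diff IS NOT NULL   -- each client's first transaction has no previous row
GROUP BY client_id;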

How can I create a week-to-date metric in Vertica?

I have a table which stores year-to-date metrics once per client per day. The simplified schema looks roughly like this; let's call this table history:
bus_date | client_id | ytd_costs
I'd like to create a view that adds week-to-date costs; essentially, any cost that occurs after the prior Friday would be considered part of the week-to-date. Currently I have the following, but I'm concerned about the switch-case logic.
Here is an example of the logic I have right now to show that this works.
I also got to use the TIMESERIES clause, which I'd never used before...
;with history as (
    select bus_date, client_id, ts_first_value(value, 'linear') "ytd_costs"
    from (select {ts '2016-10-07'} t, 1 client_id, 5.0 "value"
          union all
          select {ts '2016-10-14'}, 1, 15) k
    timeseries bus_date as '1 day' over (partition by client_id order by t)
),
history_with_wtd as (
    select bus_date,
           client_id,
           ytd_costs,
           ytd_costs - decode(
               dayofweek(bus_date),
               6, first_value(ytd_costs) over (partition by client_id order by bus_date range '1 week' preceding),
               first_value(ytd_costs) over (partition by client_id, date_trunc('week', bus_date + 3) order by bus_date)
           ) as "wtd_costs",
           ytd_costs - 5 "expected_wtd"
    from history
)
select *
from history_with_wtd
where date_trunc('week', bus_date) = '2016-10-10'
In SQL Server, I could just use the LAG function, since I can pass a variable as the look-back offset, but in Vertica no such option exists.
How about partitioning by week starting on Saturday? First grab the first day of the week, then offset it to start on Saturday: trunc(bus_date + 1, 'D') - 1
Also notice the window frame runs from the start of the partition (Saturday, unbounded preceding) to the current row.
select bus_date,
       client_id,
       ytd_costs,
       ytd_costs - first_value(ytd_costs) over (
           partition by client_id, trunc(bus_date + 1, 'D') - 1
           order by bus_date
           range between unbounded preceding and current row) wtd_costs
from sos.history
order by client_id, bus_date
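As a hypothetical spot check of the partition key (relying, as the answer does, on TRUNC(..., 'D') returning the start of the week):
-- 2016-10-14 is a Friday; the expression maps it to the prior Saturday, 2016-10-08
select trunc(date '2016-10-14' + 1, 'D') - 1 as week_start_saturday;
Every bus_date from that Saturday through the following Friday yields the same value, which is what makes it usable as a partition key.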