PostgreSQL window function: minimum period to calculate average

I want to calculate a rolling average over a rolling period of 252 days, but only if 252 days of data are available in the table; otherwise the value should be null for those rows.
Currently I am using this query:
SELECT datestamp, symbol, avg(close) OVER (PARTITION BY symbol ORDER BY datestamp ROWS BETWEEN 251 PRECEDING AND CURRENT ROW) FROM daily_prices;
It returns an average even when fewer than 252 days of data are available.
I want to achieve the same result as the pandas rolling function gives when min_periods is set.

"i want to calculate rolling average over a rolling period of 252 days"
The clause ROWS BETWEEN 251 PRECEDING AND CURRENT ROW doesn't refer to a period of time. It refers to a number of rows: the 251 rows that precede the current row according to the ORDER BY datestamp clause, plus the current row.
To make the window cover a period of time instead, I would suggest a slightly different frame (RANGE with an offset requires PostgreSQL 11 or later):
SELECT datestamp, symbol, avg(close) OVER (PARTITION BY symbol ORDER BY datestamp RANGE BETWEEN INTERVAL '251 days' PRECEDING AND CURRENT ROW) FROM daily_prices
I don't understand in which case you expect a null value, though. The window always contains at least the current row, so avg() won't be null unless close itself is null.
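A minimal sketch of the ROWS versus RANGE difference described above. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so the ordering column is a plain integer day number rather than a date with an interval offset. The prices table, its columns and the data are made up for illustration, and a 3-row / 2-day window stands in for 252 days.

```python
import sqlite3

def rows_vs_range():
    """Compare a row-count frame with a value-based frame on data with a gap."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: 'day' is an integer day number, with a gap between 3 and 10.
    conn.execute("CREATE TABLE prices (day INTEGER, close REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?)",
        [(1, 10.0), (2, 20.0), (3, 30.0), (10, 40.0)],
    )
    query = """
        SELECT day,
               -- last 3 rows, however far apart they are
               avg(close) OVER (ORDER BY day ROWS  BETWEEN 2 PRECEDING AND CURRENT ROW) AS by_rows,
               -- rows whose day lies in [day - 2, day]
               avg(close) OVER (ORDER BY day RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) AS by_range
        FROM prices
        ORDER BY day
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in rows_vs_range():
        print(row)
```

On day 10 the ROWS frame still averages the three most recent rows (days 2, 3 and 10), while the RANGE frame only sees day 10 because nothing else falls within the previous two days.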

Just do a count over the same window and return the average only when that count reaches the full window size.
I used a named window to avoid specifying the same window repeatedly.
WITH daily_prices AS (
    -- sample data: one symbol, one row per day for the past year
    SELECT 1 AS symbol, 5 AS close, t AS datestamp
    FROM generate_series(now() - interval '1 year', now(), interval '1 day') f(t)
)
SELECT
    datestamp,
    symbol,
    -- the CASE has no ELSE, so it yields NULL until the window holds 252 rows
    CASE WHEN count(close) OVER w = 252 THEN
        avg(close) OVER w
    END AS avg_252
FROM daily_prices
WINDOW w AS (PARTITION BY symbol ORDER BY datestamp ROWS BETWEEN 251 PRECEDING AND CURRENT ROW);
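The count-over-the-same-window idea can be tried out on a small scale. This is a sketch only: it runs against SQLite through Python's sqlite3 module rather than PostgreSQL, the function name rolling_avg_min_period and the integer day column are assumptions for illustration, and a window of 3 rows stands in for 252.

```python
import sqlite3

def rolling_avg_min_period(rows, n):
    """Rolling average over the last n rows per symbol; None until n rows exist."""
    n = int(n)  # the frame size is interpolated into the SQL text, so force an integer
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_prices (symbol TEXT, day INTEGER, close REAL)")
    conn.executemany("INSERT INTO daily_prices VALUES (?, ?, ?)", rows)
    query = f"""
        SELECT symbol, day,
               -- no ELSE branch: rows with a short window get NULL
               CASE WHEN count(close) OVER w = {n}
                    THEN avg(close) OVER w
               END AS rolling_avg
        FROM daily_prices
        WINDOW w AS (PARTITION BY symbol ORDER BY day
                     ROWS BETWEEN {n - 1} PRECEDING AND CURRENT ROW)
        ORDER BY symbol, day
    """
    return conn.execute(query).fetchall()

# Symbol A has five rows, symbol B only two (never enough for a 3-row window).
sample = [("A", d, float(d)) for d in range(1, 6)] + [("B", 1, 7.0), ("B", 2, 9.0)]

if __name__ == "__main__":
    for row in rolling_avg_min_period(sample, 3):
        print(row)
```

This mirrors what pandas does with rolling(window=3, min_periods=3): the first two rows of each symbol come back empty, and a symbol with too little history never gets a value.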

Related

How to use current row data in a condition inside a window function on a partition with range between unbounded preceding and current row

I need to use the current row's values in a condition inside a window function over a partition with a range frame. The example query below shows what I am looking for.
select
count(case when orderdate >=
(**current_row.orderdate** -30) then 1 end)
over (partition by customerid order by orderdate range between unbounded preceding and current row)
from xyz
I can't get the syntax right.
Please see the required example output below. This is just an example; the 30 days and the logic are only a sample.

How to use a limited moving RANGE window with multiple ORDER BY expressions?

This is my table:
userID
Year
Month
Day
NbOfVisits
I would like to calculate the 7-day moving average; my query is as follows:
select userID,year,month,day, sum(nbofvisits) OVER (Partition by userID order by year,month,day RANGE BETWEEN 7 PRECEDING AND CURRENT ROW) as nbVisits7days
from table
order by userID, year, month, day;
But I keep getting this error: "A range window frame with value boundaries cannot be used in a window specification with multiple order by expressions". I understand I have multiple "Order Bys", but I can't think of a straight way other than this.
Following Jon Armstrong's comment, I managed to run my intended query as follows:
select userID,year,month,day, sum(nbofvisits) OVER (Partition by userID order by TIMESTAMP(concat(year,'-',month,'-',day)) RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW) as nbVisits7days
from table
order by userID, year, month, day;
Thank you!

Grafana, postgresql: aggregate function calls cannot contain window function calls

In Grafana, we want to show bars indicating the maximum of the 15-minute averages in the chosen time interval. Our data has regular 1-minute intervals. The database is PostgreSQL.
To show the 15-minute averages, we use the following query:
SELECT
timestamp AS time,
AVG(rawvalue) OVER(ORDER BY timestamp ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING) AS value,
'15-min Average' AS metric
FROM database.schema
WHERE $__timeFilter(timestamp) AND device = '$Device'
ORDER BY time
To show bars indicating the maximum of the raw values in the chosen time interval, we use the following query:
SELECT
$__timeGroup(timestamp,'$INTERVAL') AS time,
MAX(rawvalue) AS value,
'Interval Max' AS metric
FROM database.schema
WHERE $__timeFilter(timestamp) AND device = '$Device'
GROUP BY $__timeGroup(timestamp,'$INTERVAL')
ORDER BY time
A naive combination of both solutions does not work:
SELECT
$__timeGroup(timestamp,'$INTERVAL') AS time,
MAX(AVG(rawvalue) OVER(ORDER BY timestamp ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING)) AS value,
'Interval Max 15-min Average' AS metric
FROM database.schema
WHERE $__timeFilter(timestamp) AND device = '$Device'
GROUP BY $__timeGroup(timestamp,'$INTERVAL')
ORDER BY time
We get error: "pq: aggregate function calls cannot contain window function calls".
There is a suggestion on SO to use "with" (Count by criteria over partition), but I do not know how to use it in our case.
Use the first query as a CTE (or with) for the second one. The order by clause of the CTE and the where clause of the second query as well as the metric column of the CTE are no longer needed. Alternatively you can use the first query as a derived table in the from clause of the second one.
with t as
(
SELECT
timestamp AS time,
AVG(rawvalue) OVER(ORDER BY timestamp ROWS BETWEEN 7 PRECEDING AND 7 FOLLOWING) AS value
FROM database.schema
WHERE $__timeFilter(timestamp) AND device = '$Device'
)
SELECT
$__timeGroup(time,'$INTERVAL') AS time,
MAX(value) AS value,
'Interval Max 15-min Average' AS metric
FROM t
GROUP BY 1 ORDER BY 1;
Unrelated, but what are $__timeFilter and $__timeGroup? Their semantics are clear, but where do they come from? BTW you may find this function useful.
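A small sketch of the same two-step pattern (window average in a CTE, then a plain aggregate on top of it), runnable with Python's sqlite3 module. It leaves out the Grafana macros entirely: the readings table, the integer minute column and the minute / bucket_size grouping are assumptions standing in for the real schema and for $__timeGroup, and a 3-row window stands in for the 15-row one.

```python
import sqlite3

def max_of_moving_avg(values, half_width=1, bucket_size=5):
    """Max of a centred moving average per bucket of `bucket_size` minutes."""
    half_width, bucket_size = int(half_width), int(bucket_size)
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one reading per minute, minute numbers 0, 1, 2, ...
    conn.execute("CREATE TABLE readings (minute INTEGER, rawvalue REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", list(enumerate(values)))
    query = f"""
        WITH t AS (
            -- step 1: the window function runs on its own
            SELECT minute,
                   avg(rawvalue) OVER (ORDER BY minute
                       ROWS BETWEEN {half_width} PRECEDING AND {half_width} FOLLOWING) AS avg_value
            FROM readings
        )
        -- step 2: the aggregate sees avg_value as an ordinary column
        SELECT minute / {bucket_size} AS bucket, max(avg_value) AS max_value
        FROM t
        GROUP BY bucket
        ORDER BY bucket
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    print(max_of_moving_avg([0, 9, 0, 0, 0, 3, 3, 3, 0, 0]))
```

Because the moving average is computed before grouping, the aggregate never wraps a window function call, which is what the original error complained about.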

Is there a way to calculate percentile using percentile_cont() function over a rolling window in Big Query?

I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings,0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support window framing in percentile_cont. Can anyone please help me with a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
       (select percentile_cont(earning, 0.5) over ()
        from unnest(ar_earnings) earning
        limit 1
       ) as median_2months
from (select t.*,
             array_agg(earnings) over (partition by city
                                       order by month
                                       range between 1 preceding and current row
                                      ) as ar_earnings
      from t
     ) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
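To make the logic of that workaround concrete, here is a plain-Python sketch of what the query computes: for each row, collect the earnings of the same city for the current and previous month, then take the median of that collection. The function name rolling_median and the sample rows are made up, and month is assumed to be an incrementing integer as stated above.

```python
from statistics import median

def rolling_median(rows, span=1):
    """rows: list of (city, month, earnings); month is an increasing integer.

    For each row, gather earnings of the same city with month in
    [month - span, month] (the array_agg step), then take the median
    (the percentile_cont(..., 0.5) step).
    """
    out = []
    for city, month, earnings in rows:
        window = [e for c, m, e in rows if c == city and month - span <= m <= month]
        out.append((city, month, earnings, median(window)))
    return out

sample_rows = [("X", 1, 10), ("X", 1, 30), ("X", 2, 50), ("Y", 2, 5)]

if __name__ == "__main__":
    for row in rolling_median(sample_rows):
        print(row)
```

As with a RANGE frame, rows sharing the same month are peers, so both month-1 rows of city X see the same window.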

BigQuery - counting number of events within a sliding time frame

I would like to count the number of events within a sliding time frame.
For example, say I would like to know how many bids were in the last 1000 seconds for the Google stock (GOOG).
I'm trying the following query:
SELECT
symbol,
start_date,
start_time,
bid_price,
count(if(max(start_time)-start_time<1000,1,null)) over (partition by symbol order by start_time asc) cnt
FROM [bigquery-samples:nasdaq_stock_quotes.quotes]
where symbol = 'GOOG'
The logic is as follows: the partition window (by symbol) is ordered by the bid time (leaving the bid date aside for the sake of simplicity).
For each window (defined by the row at the "head" of the window) I would like to count the number of rows whose start_time is less than 1000 seconds before the "head" row's time.
I'm trying to use max(start_time) to get the top row in the window. This doesn't seem to work and I get an error:
Error: MAX is an analytic function and must be accompanied by an OVER clause.
Is it possible to have two analytic functions in one column (both count and max in this case)? Is there a different solution to the problem presented?
Try using a RANGE frame.
SELECT
symbol,
start_date,
start_time,
bid_price,
count(market_center) over (partition by symbol order by start_time RANGE 1000 PRECEDING) cnt
FROM [bigquery-samples:nasdaq_stock_quotes.quotes]
where symbol = 'GOOG'
order by 2, 3
I used market_center just as a counter, additional fields can be used as well.
Note: the RANGE clause is not documented in the BigQuery Query Reference; however, it is standard SQL and appears to work in this case.
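A quick way to check what a RANGE frame counts, sketched with Python's sqlite3 module instead of BigQuery. The quotes table here is a made-up miniature with an integer start_time in seconds, and count_recent is a hypothetical helper name. Note that the frame is inclusive at both ends, so a quote exactly 1000 seconds older than the current row is still counted.

```python
import sqlite3

def count_recent(times, span):
    """For each quote, count quotes with start_time in [start_time - span, start_time]."""
    span = int(span)  # interpolated into the SQL text, so force an integer
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (symbol TEXT, start_time INTEGER)")
    conn.executemany("INSERT INTO quotes VALUES ('GOOG', ?)", [(t,) for t in times])
    query = f"""
        SELECT start_time,
               count(*) OVER (PARTITION BY symbol ORDER BY start_time
                              RANGE BETWEEN {span} PRECEDING AND CURRENT ROW) AS cnt
        FROM quotes
        ORDER BY start_time
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    print(count_recent([0, 500, 900, 1500, 3000], 1000))
```

The frame follows the values of start_time rather than the number of rows, so no second analytic function such as max() is needed to find the "head" of the window.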