I’m trying to calculate a 3 month rolling average grouped by region and month, as in
Region Month Avg(var_a) Avg(var_b)
Northland Dec-Jan-Feb 7.1 5.9
Southland Dec-Jan-Feb 7.2 6.1
Northland Nov-Dec-Jan 7.4 6.1
Southland Nov-Dec-Jan 7.5 6.2
Northland Oct-Nov-Dec 7.5 6.2
Southland Oct-Nov-Dec 7.5 6.1
Note that month is expanded for illustrative purposes, I’d really expect the output to just say a single month.
Now I can do this by creating a CTE grouping by region and month, then joining to it a couple times like
With month_rollup_cte as
(Select region,month,sum(var_a) a_sum, sum(var_b) b_sum, count(1) cnt
From vw_score_by_region
Group by region,month)
Select c1.region, c1.month,sum(c1.a_sum + c2.a_sum + c3.a_sum) / sum(c1.cnt + c2.cnt + c3.cnt) a_avg, sum(c1.b_sum + c2.b_sum + c3.b_sum) / sum(c1.cnt + c2.cnt + c3.cnt) b_avg
From month_rollup_cte c1
Join month_rollup_cte c2 on c1.region = c2. Region and c1.month = dateadd(mm,1,c2.month)
Join month_rollup_cte c3 on c1.region = c3. Region and c1.month = dateadd(mm,2,c3.month)
Group by c1.region, c1.month;
But that’s ugly, imagine if you had to do a 6 month rolling average or 12 month rolling average… I’m trying to use the t-sql 2012 analytic functions, specifically the RANGE option. I’ve used ROWS preceding before, but never range.
What I tried was
select region,avg(var_a) OVER (order by (year(entry_month) * 100 + month(entry_month)) range between 2 preceding and 1 following)
from [dbo].[vw_score_by_region]
group by region
But I get a syntax error:
*Msg 8120, Level 16, State 1, Line 2
Column 'dbo.vw_score_by_region.month' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.*
Clearly I'm doing something silly, but I'm not sure what.
First of all RANGE is only supported with UNBOUNDED and CURRENT ROW frame delimiters, It cannot be used with N PRECEDING or N FOLLOWING. From your title, looks like your want to get 3 months rolling avg (sliding avg), then you'd better to use ROWS
Using ROWS (This is more likely what you need) SQl Fiddle Demo
select region,
avg(var_a) OVER (partition by region
order by (entry_month)
rows between 2 preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
Note: No need to calcuate year+month, if entry_month is date or datetime, it is sortable already, thanks for Steve's correction.
Using RANGE:
select region,
avg(var_a) OVER (partition by region,(year(entry_month) * 12 + month(entry_month))/3
order by (entry_month)
range between unbounded preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
Note: Using RANGE you have to control the partition width, since you want to agg by 3 month, and range doesn't support N PRECEDING and N FOLLOWING, it only supports following:
| UNBOUNDED PRECEDING | Starts the window at first row of the partition
| UNBOUNDED FOLLOWING | Ends the window at last row of the partition
| CURRENT ROW | Starts or Ends the window at current row
Related
I'm trying to compute a rolling average using SQL and I was able to compute said average using the OVER and ROWS BETWEEN commands. I obtained a rolling average using the rows between the preceding 1 and following 1 rows, code seen below along with output in figure 1.
How can I modify my SQL query below so that I compute this rolling average using a 6 hour window for the observed row value CHARTTIME (i.e., 3 hours before and 3 hours after). Any help or advice would be greatly appreciated.
sql_query <- "SELECT SUBJECT_ID, HADM_ID, chartevents.ITEMID, VALUE, CHARTTIME, LABEL, AVG(SAFE_CAST(VALUE AS INT)) OVER(
PARTITION BY SUBJECT_ID
ORDER BY CHARTTIME
ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING
) AS MV_AVG
FROM chartevents
INNER JOIN d_items ON chartevents.ITEMID = d_items.ITEMID
WHERE chartevents.ITEMID IN (211,220045)
ORDER BY SUBJECT_ID, CHARTTIME
"
Figure 1
I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate a 50th % from percentile_cont(earnings,0.5) over (partition by city order by month range between 1 preceding and current row). But Big query doesn't support window framing in percentile_cont. Can anyone please help me if there is a work around this problem.
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
(select percentile_cont(earning) over ()
from unnest(ar_earnings) earning
limit 1
) as median_2months
from (select t.*,
array_agg(earnings) over (partition by city
order by month
range between 1 preceding and current month
) as ar_earnings
from t
) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
I'm using SQL Server and I need to get the sum of the previous 6 rows of my table and place the results in its own column.
I'm able to get the 6th row back with the below query:
SELECT id
,FileSize
,LAG(FileSize,6) OVER (ORDER BY DAY(CompleteTime)) previous
FROM Jobs_analytics
group by id, CompleteTime, Jobs_analytics.FileSize
which gives me the six row back, but what I need is the sum of all six rows previous.
any help would be appreciate
Mike
You can use:
SELECT ja.id, ja.FileSize, CompleteTime,
SUM(FileSize) OVER (ORDER CompleteTime ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) as previous
FROM Jobs_analytics ja;
I don't see why GROUP BY is necessary. There are no aggregation functions.
Note that this takes 6 days including the current day. If you want the six preceding rows:
SELECT ja.id, ja.FileSize, DATE,
SUM(FileSize) OVER (ORDER BY CompleteTime ja.id ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) as previous
FROM Jobs_analytics ja
I have the following SQL statement, which I think should update 1 field, using some pretty simple standard deviation logic, and based on ID and Date. I think the ID and Date has to be included to get everything aligned right. So, here is the code that I'm testing.
UPDATE Price_Test2
SET Vol30Days = STDEV(PX_BID) OVER (ORDER BY ID_CUSIP, AsOfDate ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) FROM Price_Test2
WHERE ID_CUSIP in (SELECT DISTINCT ID_CUSIP FROM Price_Test2)
It seems like it should work fine, but something is off because I'm getting an error that says: Cannot use both a From clause and a subquery in the where clause or in the data values list in an Update statement.
I am using SQL Server 2019.
You are using aggregation functions in an update. What you want is an updatable subquery (or CTE):
UPDATE p
SET Vol30Days = new_Vol30Days,
Vol60Days = new_Vol60Days,
Vol90Days = new_Vol90Days
FROM (SELECT p.*,
STDEV(PX_BID) OVER (ORDER BY Date ROWS BETWEEN 30 PRECEDING AND CURRENT ROW) as new_Vol30day,
STDEV(PX_BID) OVER (ORDER BY Date ROWS BETWEEN 60 PRECEDING AND CURRENT ROW) as new_Vol60day,
STDEV(PX_BID) OVER (ORDER BY Date ROWS BETWEEN 90 PRECEDING AND CURRENT ROW) as new_Vol60day
FROM prices p
) p;
Hi I am a newbie when it comes to SQL and was hoping someone can help me in this matter. I've been using the lag function here and there but was wondering if there is a way to rewrite it to make it into a sum range. So instead of prior one month, i want to take the prior 12 months and sum them together for each period. I don't want to write 12 lines of lag but was wondering if there is a way to get it with less lines of code. Note there will be nulls and if one of the 12 records is null then it should be null.
I know you can write write subquery to do this, but was wondering if this is possible. Any help would be much appreciated.
You want the "window frame" part of the window function. A moving 12-month average would look like:
select t.*,
sum(balance) over (order by period rows between 11 preceding and current row) as moving_sum_12
from t;
You can review window frames in the documentation.
If you want a cumulative sum, you can leave out the window frame entirely.
I should note that you can also do this using lag(), but it is much more complicated:
select t.*,
(balance +
lag(balance, 1, 0) over (order by period) +
lag(balance, 2, 0) over (order by period) +
. . .
lag(balance, 11, 0) over (order by period) +
) as sum_1112
from t;
This uses the little known third argument to lag(), which is the default value to use if the record is not available. It replaces a coalesce().
EDIT:
If you want NULL if 12 values are not available, then use case and count() as well:
select t.*,
(case when count(*) over (order by period rows between 11 preceding and current row) = 12
then sum(balance) over (order by period rows between 11 preceding and current row)
end) as moving_sum_12
from t;