Rolling average in SQL using window function over DATETIME range

I'm trying to compute a rolling average in SQL. I was able to compute it with the OVER clause and a ROWS BETWEEN frame, using the rows from 1 preceding to 1 following; the code is below, and its output is shown in Figure 1.
How can I modify the query below so that it computes this rolling average over a 6-hour window around each row's CHARTTIME (i.e., 3 hours before and 3 hours after)? Any help or advice would be greatly appreciated.
sql_query <- "SELECT SUBJECT_ID, HADM_ID, chartevents.ITEMID, VALUE, CHARTTIME, LABEL, AVG(SAFE_CAST(VALUE AS INT)) OVER(
PARTITION BY SUBJECT_ID
ORDER BY CHARTTIME
ROWS BETWEEN 1 PRECEDING and 1 FOLLOWING
) AS MV_AVG
FROM chartevents
INNER JOIN d_items ON chartevents.ITEMID = d_items.ITEMID
WHERE chartevents.ITEMID IN (211,220045)
ORDER BY SUBJECT_ID, CHARTTIME
"
Figure 1
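A minimal sketch of a time-based frame, assuming SQLite 3.28 or later (which supports numeric RANGE offsets) driven from Python's sqlite3 module. The sample rows and lowercase column names are hypothetical, and CAST stands in for BigQuery's SAFE_CAST. ROWS counts physical rows, whereas RANGE compares the ORDER BY value itself, so ordering by CHARTTIME converted to epoch seconds makes 10800 PRECEDING / 10800 FOLLOWING mean three hours on either side. Both bounds are inclusive, so a reading exactly three hours away is counted.

```python
import sqlite3

# Hypothetical sample rows: (subject_id, charttime, value stored as text like VALUE)
rows = [
    (1, "2020-01-01 00:00:00", "80"),
    (1, "2020-01-01 02:00:00", "90"),
    (1, "2020-01-01 05:00:00", "100"),
    (1, "2020-01-01 09:00:00", "70"),
    (2, "2020-01-01 00:00:00", "60"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chartevents (subject_id INTEGER, charttime TEXT, value TEXT)")
conn.executemany("INSERT INTO chartevents VALUES (?, ?, ?)", rows)

# RANGE compares ORDER BY values, not row positions: ordering by epoch seconds
# makes 10800 PRECEDING / 10800 FOLLOWING mean 3 hours before and after.
query = """
SELECT subject_id, charttime, value,
       AVG(CAST(value AS INTEGER)) OVER (
           PARTITION BY subject_id
           ORDER BY CAST(strftime('%s', charttime) AS INTEGER)
           RANGE BETWEEN 10800 PRECEDING AND 10800 FOLLOWING
       ) AS mv_avg
FROM chartevents
ORDER BY subject_id, charttime
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With these rows, the 02:00 reading averages 80, 90 and 100 (all within three hours) to 90.0, while the 09:00 reading has no neighbours in range and keeps 70.0. In BigQuery the analogous form would presumably order by a numeric expression such as UNIX_SECONDS(TIMESTAMP(CHARTTIME)) with the same RANGE frame; that translation is an untested assumption.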

Related

How to use a window function in snowflake to look back 12 months

I am having trouble using window functions in Snowflake to look at historical data (from 12 months prior). When I add a dimension, this code doesn't work.
SELECT
DATE_TRUNC('MONTH',pl.DATE) AS MONTH,
COUNT(DISTINCT PL.ID) AS CURRENT_,
PL.DIMENSION,
FIRST_VALUE(count(DISTINCT pl.ID)) OVER (PARTITION BY PL.DIMENSION ORDER BY MONTH ASC ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING) AS "1_YR_AGO"
from table1 pl
group by MONTH, PL.DIMENSION
ORDER BY MONTH
Here are the results if I filter on the dimension (screenshot not included):
I want more rows. For example, for month = 2019-10-01, CURRENT_ would be NULL and 1_YR_AGO should be 1, and so on. What am I missing? (I put examples of this in the highlighted section of the picture; the actual results are unhighlighted.)
NOTE: I've also tried LAG, and it does the same thing.
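A window function only sees rows that exist, so ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING (or LAG(..., 12)) looks 12 rows back rather than 12 months back, and a month with no data never appears at all. One common fix, not taken from this thread, is to build a calendar of months, cross join it with the dimensions, left join the monthly counts, and only then apply LAG. The sketch below does this in SQLite via Python's sqlite3 module; the table contents and date range are hypothetical, and in Snowflake the month list would have to be generated differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, date TEXT, dimension TEXT)")
# Hypothetical data: dimension 'A' has activity in Oct 2018 and Nov 2019 only
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, "2018-10-15", "A"),
    (2, "2019-11-03", "A"),
])

query = """
WITH RECURSIVE months(month) AS (          -- one row per calendar month, no gaps
    SELECT '2018-10-01'
    UNION ALL
    SELECT date(month, '+1 month') FROM months WHERE month < '2019-11-01'
),
dims AS (SELECT DISTINCT dimension FROM table1),
counts AS (                                -- monthly distinct counts, only where data exists
    SELECT strftime('%Y-%m-01', date) AS month, dimension, COUNT(DISTINCT id) AS cnt
    FROM table1
    GROUP BY strftime('%Y-%m-01', date), dimension
)
SELECT m.month, d.dimension,
       c.cnt AS current_,
       -- the spine has exactly one row per month, so 12 rows back is 12 months back
       LAG(c.cnt, 12) OVER (PARTITION BY d.dimension ORDER BY m.month) AS yr_ago_1
FROM months m
CROSS JOIN dims d
LEFT JOIN counts c ON c.month = m.month AND c.dimension = d.dimension
ORDER BY d.dimension, m.month
"""
result = conn.execute(query).fetchall()
```

Here the 2019-10-01 row exists even though nothing happened that month: current_ is NULL and yr_ago_1 picks up the October 2018 count, which is the shape the question asks for.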

Is there a way to calculate percentile using percentile_cont() function over a rolling window in Big Query?

I have a dataset with the following columns:
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support a window frame clause with percentile_cont. Is there a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
(select percentile_cont(earning, 0.5) over ()
from unnest(ar_earnings) earning
limit 1
) as median_2months
from (select t.*,
array_agg(earnings) over (partition by city
order by month
range between 1 preceding and current row
) as ar_earnings
from t
) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
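To check what the array approach computes, here is a plain-Python sketch of the same logic on hypothetical rows, assuming (as the answer does) that month is an integer month number. For each row it collects the earnings in the same city whose month is the current month or the one before, which is the set that array_agg over that RANGE frame gathers, and takes the median. PERCENTILE_CONT at 0.5 interpolates halfway between the two middle values for an even count, which matches statistics.median.

```python
from statistics import median

# Hypothetical rows: (city, month as an integer month number, earnings)
rows = [
    ("A", 1, 10.0),
    ("A", 1, 30.0),
    ("A", 2, 20.0),
    ("A", 3, 50.0),
    ("B", 1, 5.0),
]

def rolling_median(rows):
    """Mimic RANGE BETWEEN 1 PRECEDING AND CURRENT ROW per city: every row whose
    month lies in [month - 1, month] is in the frame, including ties on month."""
    out = []
    for city, month, earnings in rows:
        window = [e for c, m, e in rows if c == city and month - 1 <= m <= month]
        out.append((city, month, earnings, median(window)))
    return out

result = rolling_median(rows)
for row in result:
    print(row)
```

Both month-1 rows for city A get the same median because RANGE treats rows with equal month as peers; a ROWS frame would not.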

SQL Server need the total of the previous 6 rows

I'm using SQL Server, and I need to get the sum of the previous 6 rows of my table and place the result in its own column.
I'm able to get the value from 6 rows back with the query below:
SELECT id
,FileSize
,LAG(FileSize,6) OVER (ORDER BY DAY(CompleteTime)) previous
FROM Jobs_analytics
group by id, CompleteTime, Jobs_analytics.FileSize
This gives me the value from six rows back, but what I need is the sum of all six previous rows.
Any help would be appreciated.
Mike
You can use:
SELECT ja.id, ja.FileSize, CompleteTime,
SUM(FileSize) OVER (ORDER BY CompleteTime ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) as previous
FROM Jobs_analytics ja;
I don't see why GROUP BY is necessary. There are no aggregation functions.
Note that this sums 6 rows, including the current row. If you want only the six preceding rows:
SELECT ja.id, ja.FileSize, ja.CompleteTime,
SUM(ja.FileSize) OVER (ORDER BY ja.CompleteTime, ja.id ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) as previous
FROM Jobs_analytics ja;
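To see the difference between the two frames, here is a sketch run in SQLite through Python's sqlite3 module on hypothetical data (eight jobs, one per day, FileSize 1 to 8). ROWS BETWEEN 5 PRECEDING AND CURRENT ROW sums six rows including the current one, whereas ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING sums the six rows before it and returns NULL for the first row, whose frame is empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jobs_analytics (id INTEGER, FileSize INTEGER, CompleteTime TEXT)")
# Hypothetical data: 8 jobs, one per day, FileSize 1..8
conn.executemany(
    "INSERT INTO Jobs_analytics VALUES (?, ?, ?)",
    [(i, i, f"2021-01-{i:02d}") for i in range(1, 9)],
)

query = """
SELECT id, FileSize,
       -- current row plus the 5 before it (6 rows in total)
       SUM(FileSize) OVER (ORDER BY CompleteTime, id
                           ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) AS incl_current,
       -- the 6 rows strictly before the current row
       SUM(FileSize) OVER (ORDER BY CompleteTime, id
                           ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) AS previous_six
FROM Jobs_analytics
ORDER BY CompleteTime, id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The id tiebreaker in ORDER BY keeps the frame deterministic if two jobs share a CompleteTime.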

Rolling averages in SQL Server 2012 using range

I’m trying to calculate a 3-month rolling average grouped by region and month, as in:
Region Month Avg(var_a) Avg(var_b)
Northland Dec-Jan-Feb 7.1 5.9
Southland Dec-Jan-Feb 7.2 6.1
Northland Nov-Dec-Jan 7.4 6.1
Southland Nov-Dec-Jan 7.5 6.2
Northland Oct-Nov-Dec 7.5 6.2
Southland Oct-Nov-Dec 7.5 6.1
Note that the month is expanded for illustrative purposes; I’d really expect the output to show just a single month.
I can do this by creating a CTE that groups by region and month, then joining to it a couple of times, like this:
With month_rollup_cte as
(Select region,month,sum(var_a) a_sum, sum(var_b) b_sum, count(1) cnt
From vw_score_by_region
Group by region,month)
Select c1.region, c1.month,sum(c1.a_sum + c2.a_sum + c3.a_sum) / sum(c1.cnt + c2.cnt + c3.cnt) a_avg, sum(c1.b_sum + c2.b_sum + c3.b_sum) / sum(c1.cnt + c2.cnt + c3.cnt) b_avg
From month_rollup_cte c1
Join month_rollup_cte c2 on c1.region = c2.region and c1.month = dateadd(mm,1,c2.month)
Join month_rollup_cte c3 on c1.region = c3.region and c1.month = dateadd(mm,2,c3.month)
Group by c1.region, c1.month;
But that’s ugly: imagine if you had to do a 6-month or 12-month rolling average. I’m trying to use the T-SQL 2012 analytic functions, specifically the RANGE option. I’ve used ROWS PRECEDING before, but never RANGE.
What I tried was:
select region,avg(var_a) OVER (order by (year(entry_month) * 100 + month(entry_month)) range between 2 preceding and 1 following)
from [dbo].[vw_score_by_region]
group by region
But I get this error:
Msg 8120, Level 16, State 1, Line 2
Column 'dbo.vw_score_by_region.month' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
Clearly I'm doing something silly, but I'm not sure what.
First of all, in SQL Server RANGE is only supported with the UNBOUNDED and CURRENT ROW frame delimiters; it cannot be used with N PRECEDING or N FOLLOWING. From your title, it looks like you want a 3-month rolling (sliding) average, so you'd be better off using ROWS.
Using ROWS (this is more likely what you need; SQL Fiddle demo):
select region,
avg(var_a) OVER (partition by region
order by (entry_month)
rows between 2 preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
Note: there is no need to calculate year + month. If entry_month is a date or datetime, it is already sortable (thanks to Steve for the correction).
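A sketch combining the question's per-month pre-aggregation with the ROWS frame from this answer, run in SQLite through Python's sqlite3 module on hypothetical data. Dividing the windowed sum of sums by the windowed sum of counts weights each month by its number of rows, as the question's CTE does. It assumes exactly one aggregated row per region per month and no missing months, because ROWS counts rows rather than calendar months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vw_score_by_region (region TEXT, entry_month TEXT, var_a REAL)")
# Hypothetical scores: one or two raw rows per region per month
conn.executemany("INSERT INTO vw_score_by_region VALUES (?, ?, ?)", [
    ("Northland", "2020-10-01", 6.0), ("Northland", "2020-10-01", 8.0),
    ("Northland", "2020-11-01", 10.0),
    ("Northland", "2020-12-01", 5.0), ("Northland", "2020-12-01", 7.0),
    ("Northland", "2021-01-01", 10.0),
])

query = """
WITH monthly AS (                 -- one row per region and month
    SELECT region, entry_month, SUM(var_a) AS a_sum, COUNT(*) AS cnt
    FROM vw_score_by_region
    GROUP BY region, entry_month
)
SELECT region, entry_month,
       -- current month plus the two before it, weighted by row counts
       SUM(a_sum) OVER (PARTITION BY region ORDER BY entry_month
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
     / SUM(cnt)   OVER (PARTITION BY region ORDER BY entry_month
                        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS a_avg_3mo
FROM monthly
ORDER BY region, entry_month
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Widening to a 6- or 12-month average only changes the 2 PRECEDING offset, which addresses the question's concern about stacking self-joins.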
Using RANGE:
select region,
avg(var_a) OVER (partition by region,(year(entry_month) * 12 + month(entry_month))/3
order by (entry_month)
range between unbounded preceding and current row) as ThreeMonthSlidingAvg
from [dbo].[vw_score_by_region]
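Note: with RANGE you have to control the partition width yourself, since you want to aggregate over 3 months and RANGE doesn't support N PRECEDING or N FOLLOWING. Partitioning by (year * 12 + month) / 3 splits the months into fixed three-month buckets, so this gives a running average within each bucket rather than a true sliding window. RANGE supports only the following delimiters:
| Delimiter | Meaning |
| UNBOUNDED PRECEDING | Starts the window at the first row of the partition |
| UNBOUNDED FOLLOWING | Ends the window at the last row of the partition |
| CURRENT ROW | Starts or ends the window at the current row |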

Optimizing a Vertica SQL query to do running totals

I have a table S with time series data like this:
key day delta
For a given key, it's possible but unlikely that days will be missing.
I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:
SELECT key, day,
SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
delta
FROM S
In my SQL flavor, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I included it to be explicit.
This query is really slow, roughly an order of magnitude slower than the old, broken query, which filled in 0s for the cumulative count. Any suggestions for other ways to generate the cumulative numbers?
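One semantic difference between the RANGE frame above and a ROWS frame is how ties are handled: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW includes every peer row that shares the current ORDER BY value, while ROWS stops at the current physical row. The sketch below shows this in SQLite via Python's sqlite3 module, using hypothetical data with a duplicated day; whether peer handling accounts for the slowdown in Vertica is not established here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE S ("key" TEXT, "day" TEXT, delta INTEGER)')
# Hypothetical data; the duplicated day for key 'a' exposes peer handling
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", [
    ("a", "2021-01-01", 1),
    ("a", "2021-01-02", 2),
    ("a", "2021-01-02", 3),
    ("a", "2021-01-03", 4),
])

query = """
SELECT "key", "day", delta,
       -- RANGE: both 2021-01-02 rows are peers, so both get the same total
       SUM(delta) OVER (PARTITION BY "key" ORDER BY "day"
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total,
       -- ROWS: stops at the current row (delta breaks the tie deterministically)
       SUM(delta) OVER (PARTITION BY "key" ORDER BY "day", delta
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total
FROM S
ORDER BY "key", "day", delta
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

When (key, day) is unique, as the question implies, the two frames produce identical running totals.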
I did look at the solutions here:
Running total by grouped records in table
The RDBMS I'm using is Vertica. Vertica SQL rules out the first (subselect) solution there, and its query planner estimates that the second (left outer join) solution is about 100 times more costly than the analytic form shown above.
I think you're essentially there. You may just need to update the syntax a bit, for example by using a ROWS frame instead of RANGE:
SELECT s_qty,
Sum(s_price)
OVER(
partition BY NULL
ORDER BY s_qty ASC rows UNBOUNDED PRECEDING ) "Cumulative Sum"
FROM sample_sales;
Output:
S_QTY | Cumulative Sum
------+----------------
1 | 1000
100 | 11000
150 | 26000
200 | 28000
250 | 53000
300 | 83000
2000 | 103000
(7 rows)
Reference:
https://dwgeek.com/vertica-cumulative-sum-average-and-example.html/
Sometimes it's faster to just use a correlated subquery:
SELECT
t1."key"
, t1."day"
, t1.delta
, (SELECT SUM(t2.delta) FROM S t2 WHERE t2."key" = t1."key" AND t2."day" <= t1."day") AS DeltaSum
FROM S t1;
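A correlated subquery computes a running total only if it restricts to the same key and to days up to the current one. This sketch, run in SQLite through Python's sqlite3 module on hypothetical data, checks that such a subquery matches the ROWS-framed window sum; it says nothing about which one Vertica executes faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE S ("key" TEXT, "day" TEXT, delta INTEGER)')
# Hypothetical data with one row per (key, day); key 'b' is missing a day
conn.executemany("INSERT INTO S VALUES (?, ?, ?)", [
    ("a", "2021-01-01", 1), ("a", "2021-01-02", 2), ("a", "2021-01-03", 4),
    ("b", "2021-01-01", 10), ("b", "2021-01-03", 5),
])

# Running total via an analytic (window) function with a ROWS frame
window_sql = """
SELECT "key", "day",
       SUM(delta) OVER (PARTITION BY "key" ORDER BY "day"
                        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM S ORDER BY "key", "day"
"""
# Running total via a correlated subquery: same key, days up to the current one
subquery_sql = """
SELECT t1."key", t1."day",
       (SELECT SUM(t2.delta) FROM S t2
        WHERE t2."key" = t1."key" AND t2."day" <= t1."day")
FROM S t1 ORDER BY t1."key", t1."day"
"""
window_rows = conn.execute(window_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(window_rows == subquery_rows)
```

If a key could have two rows on the same day, the subquery would behave like the RANGE frame (peers included) rather than like ROWS.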