When defining the "frame" for a windowed analytic function, one can specify a literal number of rows to "look back" over. E.g., the following will get the trailing 26 weeks of weekly sales for a household.
,sum(sales) over (partition by household_id order by week_id rows 26 preceding) as x26
But... what if you wanted to look back (or forward) with an offset? E.g., if for week n, you wanted the sales for the 26 weeks that ended 8 weeks before week n? As I was typing this, it occurred to me that I could probably do it in parts. I.e.,
,sum(sales) over (partition by household_id order by week_id rows 34 preceding) as x34
,sum(sales) over (partition by household_id order by week_id rows 8 preceding) as x8
...and have trailing26_offset8 = x34 - x8
Hm... Glad I asked. But anyway, do you know if there's a feature that will let me specify the offset right in the window specification itself?
Thanks!
Try using between in the window frame specification:
sum(sales) over (partition by household_id
order by week_id
rows between 34 preceding and 8 preceding
) as x34
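To check exactly which rows each frame covers, here is a minimal sketch that runs the same frames in SQLite through Python's sqlite3 module (assuming a SQLite build with window-function support, 3.25 or later). The table weekly_sales and its data are hypothetical: every week has sales = 1, so each sum is simply the number of rows in its frame. Both frame bounds are inclusive, which is why the BETWEEN 34 PRECEDING AND 8 PRECEDING frame holds 27 rows while x34 - x8 holds 26.

```python
import sqlite3

# Hypothetical data: one household, 40 weeks, sales = 1 per week,
# so each windowed SUM equals the number of rows in its frame.
conn = sqlite3.connect(":memory:")
conn.execute("create table weekly_sales (household_id int, week_id int, sales int)")
conn.executemany(
    "insert into weekly_sales values (1, ?, 1)",
    [(w,) for w in range(1, 41)],
)

query = """
select week_id
      ,sum(sales) over (partition by household_id order by week_id
                        rows 34 preceding) as x34
      ,sum(sales) over (partition by household_id order by week_id
                        rows 8 preceding) as x8
      ,sum(sales) over (partition by household_id order by week_id
                        rows between 34 preceding and 8 preceding) as between_34_8
      ,sum(sales) over (partition by household_id order by week_id
                        rows between 34 preceding and 9 preceding) as between_34_9
from weekly_sales
order by week_id
"""
rows = conn.execute(query).fetchall()

# Week 40 is far enough in that every frame is fully populated.
last = rows[-1]
print(last)  # (40, 35, 9, 27, 26)
```

So if the goal is to reproduce x34 - x8 in a single expression, the upper bound would be 9 preceding rather than 8 preceding; which one is "right" depends on whether the week 8 back should itself be included.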
I am having trouble using window functions in Snowflake to look at historical data (from 12 months prior). When I add a dimension, this code doesn't work.
SELECT
DATE_TRUNC('MONTH',pl.DATE) AS MONTH,
COUNT(DISTINCT PL.ID) AS CURRENT,
PL.DIMENSION,
FIRST_VALUE(count(DISTINCT pl.ID)) OVER (PARTITION BY PL.DIMENSION ORDER BY MONTH ASC ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING) AS 1_YEAR_AGO
from table1 pl
group by MONTH, PL.DIMENSION
ORDER BY MONTH
Here are the results if I filter on the dimension:
I want more rows. For example, for month = 2019-10-01, CURRENT_ would be NULL and 1_YR_AGO should be 1, and so on. What am I missing? (I put examples of this in the highlighted section of the picture; the results are unhighlighted.)
NOTE: I've also tried a lag and it does the same thing here.
I have a dataset with the following columns
city
user
week
month
earnings
Ideally I want to calculate the 50th percentile with percentile_cont(earnings, 0.5) over (partition by city order by month range between 1 preceding and current row). But BigQuery doesn't support window framing in percentile_cont. Can anyone suggest a workaround for this problem?
If I understand correctly, you can aggregate into an array and then unnest:
select t.*,
(select percentile_cont(earning, 0.5) over ()
from unnest(ar_earnings) earning
limit 1
) as median_2months
from (select t.*,
array_agg(earnings) over (partition by city
order by month
range between 1 preceding and current row
) as ar_earnings
from t
) t;
You don't provide sample data, but this version assumes that month is an incrementing integer that represents the month. You may need to adjust the range depending on the type.
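As a rough illustration of what that query is meant to compute, here is a plain-Python sketch rather than BigQuery SQL. The sample rows are made up, and month is assumed to be an incrementing integer as noted above. For each row it gathers the earnings from the same city in the current and previous month (the "array" step), then takes the median of that collection (the "unnest and percentile" step). For the 0.5 percentile, statistics.median gives the same result as linear interpolation.

```python
from statistics import median

# Hypothetical rows: (city, month, earnings); month is an incrementing integer.
rows = [
    ("A", 1, 10.0),
    ("A", 1, 30.0),
    ("A", 2, 20.0),
    ("A", 3, 50.0),
    ("B", 1, 5.0),
    ("B", 2, 7.0),
]

def median_2months(rows):
    """For each row, median of earnings in the same city over months [month - 1, month]."""
    out = []
    for city, month, earnings in rows:
        # Equivalent of: range between 1 preceding and current row, ordered by month.
        window = [e for c, m, e in rows if c == city and month - 1 <= m <= month]
        out.append((city, month, earnings, median(window)))
    return out

result = median_2months(rows)
for row in result:
    print(row)
```

Note that a range frame includes all rows that share the current month, so both month-1 rows for city A see the same two-value window.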
Update. Caius Jard provided what I needed.
This query works to create a 30 row moving average but I need it to calculate the average monthly settle prices for PRODUCT_SYMBOL IN ('BK','CL','CY','WJ') instead of the last 30 rows and I can't figure it out.
SELECT PRODUCT_SYMBOL
,CONTRACT_YEAR
,CONTRACT_DETAIL
,TRADEDATE
,SETTLE
,AVG(SETTLE) OVER (
PARTITION BY CONTRACT_DETAIL
ORDER BY TRADEDATE
ROWS BETWEEN 29 PRECEDING and CURRENT ROW
) AS MA30
FROM Pricing.dbo.MasterReport$
ORDER BY Tradedate ASC
Can you try this:
SELECT PRODUCT_SYMBOL
,CONTRACT_YEAR
,CONTRACT_DETAIL
,TRADEDATE
,SETTLE
,AVG(SETTLE) OVER (
PARTITION BY CONTRACT_DETAIL
ORDER BY TRADEDATE ASC
ROWS 29 PRECEDING
) AS MA30
FROM Pricing.dbo.MasterReport$
ORDER BY Tradedate ASC
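For comparison, a small sketch run in SQLite through Python's sqlite3 module (table name, contract name, and prices are all hypothetical, and a 2-row window stands in for the 30-row one). It shows that ROWS n PRECEDING is shorthand for ROWS BETWEEN n PRECEDING AND CURRENT ROW, so the short form is still a row-count moving average. The third column is one possible reading of "average monthly settle price", assuming it means the average over each calendar month: partition by the month instead of using a row frame. This is not necessarily what the accepted solution did, and strftime is SQLite-specific; SQL Server would need its own date functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table prices (contract_detail text, tradedate text, settle real)")
# Hypothetical contract and settle prices.
conn.executemany(
    "insert into prices values ('CLZ9', ?, ?)",
    [("2019-01-02", 10.0), ("2019-01-03", 20.0),
     ("2019-02-01", 40.0), ("2019-02-04", 60.0)],
)

query = """
select tradedate
      ,avg(settle) over (partition by contract_detail order by tradedate
                         rows between 1 preceding and current row) as ma_long_form
      ,avg(settle) over (partition by contract_detail order by tradedate
                         rows 1 preceding) as ma_short_form
      ,avg(settle) over (partition by contract_detail,
                         strftime('%Y-%m', tradedate)) as month_avg
from prices
order by tradedate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```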
I am having some performance issues with a query in SQL Server 2012.
The query is used to insert data in a table using window functions to aggregate sales data in different ways (Previous month, previous year month, Cycle to date, YTD, MAT).
After doing some pretty extensive research into window functions, I think that an appropriate index on the table from which the data is read would help a lot, but I am struggling to find the correct one (too many columns involved)...
The source table from which the query reads the data has around 50 million rows and is truncated and reloaded on a daily basis by an SSIS package that can be modified to drop and create the indexes in each execution.
Could somebody suggest what index might work (if any) or any other performance improvement method?
The select statement is as follows:
SELECT
PERIOD,
CUENTA_ID,
PROD_ID,
TIPO_VENTA,
VENTA_EUROS,
CICLO,
DELEGADO_B2B,
SUM(VENTA_EUROS) OVER (PARTITION BY CUENTA_ID, PROD_ID, TIPO_VENTA,DELEGADO_B2B ORDER BY PERIOD ROWS BETWEEN 12 PRECEDING AND 12 PRECEDING) AS VENTA_EUROS_PREV,
SUM(VENTA_EUROS) OVER (PARTITION BY CUENTA_ID, PROD_ID, TIPO_VENTA,DELEGADO_B2B,YEAR ORDER BY PERIOD ROWS UNBOUNDED PRECEDING) AS VENTA_EUROS_YTD,
SUM(VENTA_EUROS) OVER (PARTITION BY CUENTA_ID, PROD_ID, TIPO_VENTA,DELEGADO_B2B,YEAR, CICLO ORDER BY PERIOD ROWS UNBOUNDED PRECEDING) AS VENTA_EUROS_CTD,
SUM(VENTA_EUROS) OVER (PARTITION BY CUENTA_ID, PROD_ID, TIPO_VENTA,DELEGADO_B2B ORDER BY PERIOD ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS VENTA_EUROS_MONTH_PREV,
SUM(VENTA_EUROS) OVER (PARTITION BY CUENTA_ID, PROD_ID, TIPO_VENTA,DELEGADO_B2B ORDER BY PERIOD ROWS 11 PRECEDING) AS VENTA_EUROS_MAT
FROM _REPORTING.[dbo].[RPT_VENTA_MENSUAL_STEP_1]
WHERE YEAR>=YEAR(DATEADD(day,-1,GETDATE()))-1
I checked the execution plan and the parts that take the biggest percentages are the three sorts for the three different "OVER (PARTITION BY ...)" specifications.
Here is the plan:
https://www.brentozar.com/pastetheplan/?id=B1fsgwjBE
Thanks & Regards
The first thing the index needs to resolve is the WHERE clause. Unfortunately, it has an inequality, which pretty much makes it impossible for the optimizer to help with the windowing clauses.
If you had:
WHERE YEAR = YEAR(DATEADD(day, -1, GETDATE())) - 1
Then the optimizer could take advantage of an index on (YEAR, CUENTA_ID, PROD_ID, TIPO_VENTA, DELEGADO_B2B, PERIOD).
I need some help with windowing functions.
I have been playing around with SQL Server 2012 windowing functions recently. I know that you can calculate the sum within a window and the running total within a window. But I was wondering: is it possible to calculate the previous running total, i.e. the running total not including the current row? I assume you would need to use the ROWS or RANGE argument, and I know there is a CURRENT ROW option, but I would need a CURRENT ROW - 1, which is invalid syntax. My knowledge of the ROWS and RANGE arguments is limited, so any help would be gratefully received.
I know that there are many solutions to this problem, but I am looking to understand the ROWS and RANGE arguments and I assume the problem can be cracked with these. I have included one possible way to calculate the previous running total, but I wonder if there is a better way.
USE AdventureWorks2012
SELECT s.SalesOrderID
, s.SalesOrderDetailID
, s.OrderQty
, SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID) AS RunningTotal
, SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID) - s.OrderQty AS PreviousRunningTotal
-- Pseudocode - I know this does not work
--, SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
-- ORDER BY SalesOrderDetailID
-- ROWS BETWEEN UNBOUNDED PRECEDING
-- AND CURRENT ROW - 1)
-- AS SudoCodePreviousRunningTotal
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID
, s.SalesOrderDetailID
, s.OrderQty
Thanks in advance
You could subtract the current row's value:
SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID) - s.OrderQty
Or according to the syntax at MSDN and ypercube's answer:
<window frame preceding> ::=
{
UNBOUNDED PRECEDING
| <unsigned_value_specification> PRECEDING
| CURRENT ROW
}
-->
SUM(s.OrderQty) OVER (PARTITION BY SalesOrderID
ORDER BY SalesOrderDetailID
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
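The two approaches differ on the first row of each partition, which a small sketch makes visible. It runs in SQLite through Python's sqlite3 module rather than SQL Server, and the table order_detail and its data are hypothetical stand-ins for Sales.SalesOrderDetail. With the frame ending at 1 PRECEDING, the first row has an empty frame and the sum is NULL; the subtraction version gives 0 there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table order_detail (order_id int, detail_id int, qty int)")
# Hypothetical data: order 1 has three detail rows, order 2 has one.
conn.executemany(
    "insert into order_detail values (?, ?, ?)",
    [(1, 1, 3), (1, 2, 5), (1, 3, 2), (2, 1, 4)],
)

query = """
select order_id
      ,detail_id
      ,qty
      ,sum(qty) over (partition by order_id
                      order by detail_id) - qty as prev_by_subtraction
      ,sum(qty) over (partition by order_id
                      order by detail_id
                      rows between unbounded preceding and 1 preceding) as prev_by_frame
from order_detail
order by order_id, detail_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 1, 3, 0, None)
# (1, 2, 5, 3, 3)
# (1, 3, 2, 8, 8)
# (2, 1, 4, 0, None)
```

If 0 is wanted instead of NULL for the first row, wrapping the frame version in COALESCE(..., 0) is one option.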