How to Calculate Exponential Moving Average Using SQL Server 2012 Window Functions

I know that it is easy to calculate a simple moving average using SQL Server 2012 window functions and the OVER() clause. But how can I calculate an exponential moving average using this approach? Thanks!

The formula for EMA(x) is:

EMA(x_1) = x_1
EMA(x_n) = α * x_n + (1 - α) * EMA(x_(n-1))

With β := 1 - α, that is equivalent to:

EMA(x_n) = β^(n-1) * x_1 + α * β^(n-2) * x_2 + α * β^(n-3) * x_3 + ... + α * x_n
In that form it is easy to implement with LAG. For a 4-row EMA it would look like this:

SELECT LAG(x,3) OVER(ORDER BY ?) * POWER(@beta,3) +
       LAG(x,2) OVER(ORDER BY ?) * POWER(@beta,2) * @alpha +
       LAG(x,1) OVER(ORDER BY ?) * POWER(@beta,1) * @alpha +
       x * @alpha
FROM ...

Note that LAG() returns NULL for the first three rows, which makes the whole expression NULL there; wrapping each term in COALESCE(..., 0) is one way to handle the warm-up rows.
OK, as you seem to be after the EWMA_Chart, I created a SQL Fiddle showing how to get there. However, be aware that it uses a recursive CTE that requires one recursion per row returned, so on a big dataset you will most likely get disastrous performance. The recursion is necessary as each row depends on all rows that came before it. While you could get all preceding rows with LAG(), you cannot also reference preceding calculations, as LAG() cannot reference itself.
Also, the formula in the spreadsheet you attached below does not make sense. It seems to be trying to calculate the EWMA_Chart value, but it is failing at that. In the above SQL Fiddle I included a column [Wrong] that calculates the same value the spreadsheet is calculating.
Either way, if you need to use this on a big dataset, you are probably better off writing a cursor; a sketch follows.
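A minimal sketch of that cursor approach, assuming the same vSMA view used in the code below and the recurrence from the top of this answer (the table-variable buffering is an illustrative choice, not from the original fiddle):

DECLARE @alpha NUMERIC(20,5) = 0.1818;
DECLARE @Date DATE, @SMA NUMERIC(20,5), @ema NUMERIC(20,5);
DECLARE @result TABLE (Date DATE, EMA NUMERIC(20,5));

DECLARE c CURSOR LOCAL FAST_FORWARD FOR
    SELECT Date, SMA FROM vSMA WHERE SMA IS NOT NULL ORDER BY Date;

OPEN c;
FETCH NEXT FROM c INTO @Date, @SMA;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- seed with the first value, then EMA(x_n) = alpha * x_n + (1 - alpha) * EMA(x_(n-1))
    SET @ema = CASE WHEN @ema IS NULL THEN @SMA
                    ELSE @alpha * @SMA + (1 - @alpha) * @ema END;
    INSERT INTO @result (Date, EMA) VALUES (@Date, @ema);
    FETCH NEXT FROM c INTO @Date, @SMA;
END;
CLOSE c;
DEALLOCATE c;

SELECT Date, EMA FROM @result ORDER BY Date;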
This is the code that does the calculation in the above SQL Fiddle. It references the vSMA view that calculates the 10-row moving average.
WITH
smooth AS (
    -- the smoothing factor alpha
    SELECT CAST(0.1818 AS NUMERIC(20,5)) AS alpha
),
numbered AS (
    SELECT Date, Price, SMA, ROW_NUMBER() OVER(ORDER BY Date) AS Rn
    FROM vSMA
    WHERE SMA IS NOT NULL
),
EWMA AS (
    -- anchor: seed the EWMA (and the [Wrong] column) with the first SMA value
    SELECT Date, Price, SMA, CAST(SMA AS NUMERIC(20,5)) AS EWMA, Rn,
           CAST(SMA AS NUMERIC(20,5)) AS Wrong
    FROM numbered
    WHERE Rn = 1
    UNION ALL
    -- recursive step: each row builds on the previous row's EWMA
    SELECT numbered.Date, numbered.Price, numbered.SMA,
           CAST(EWMA.EWMA * smooth.alpha + CAST(numbered.SMA AS NUMERIC(20,5)) * (1 - smooth.alpha) AS NUMERIC(20,5)),
           numbered.Rn,
           CAST((numbered.Price - EWMA.EWMA) * smooth.alpha + EWMA.EWMA AS NUMERIC(20,5))
    FROM EWMA
    JOIN numbered
      ON EWMA.Rn + 1 = numbered.Rn
    CROSS JOIN smooth
)
SELECT Date, Price, SMA, EWMA, Wrong
FROM EWMA
ORDER BY Date;
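One caveat worth noting: by default SQL Server aborts a recursive CTE after 100 recursion levels, i.e. after roughly 100 rows here, so for longer series the outer query needs an explicit override:

SELECT Date, Price, SMA, EWMA, Wrong
FROM EWMA
ORDER BY Date
OPTION (MAXRECURSION 0); -- 0 removes the default 100-level recursion limit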

Related

How to write SQL to calculate running average with some additional formulae?

Following is the image that shows the running average I calculated. But the requirement is a bit extra on top of the running average.
Following is the image where the requirement is shown in the Microsoft Excel sheet.
So, in order to calculate a running average with formulae like =(3*C4+2*C5+1*C6)/6, as gathered in the Excel sheet, what SQL query could be written?
Also, if it's not feasible through SQL, how could I use column D from the second image as my measure in SSAS?
Use LAG() with an offset and follow your formula accordingly. As a complete query (the table name yourTable is an assumption), this might look like:

SELECT *,
       ( 3.0 * LAG(Open_, 2) OVER (ORDER BY M, [WEEK])
       + 2.0 * LAG(Open_, 1) OVER (ORDER BY M, [WEEK])
       + 1.0 * Open_ ) / 6.0 AS avg_val
FROM yourTable;

Calculate a specific moving average using a SQL query

Consider that I have a table with one column "A" and I would like to create another column called "B" such that
B[i] = 0.2*A[i] + 0.8*B[i-1]
where B[0]=0.
My problem is that I cannot use the OVER() function because I want to use the values in B while I am trying to construct B. Any idea would be appreciated. Thanks
This is a rather complex mathematical exercise. You want to accumulate exponentially decreasing amounts from previous rows.
It is a little confusing because the amount going in on each row is 20%, but that is just a factor in the formula.
The key observation is that B[i] = 0.2 * (0.8^(i-1) * A[1] + ... + 0.8 * A[i-1] + A[i]); scaling each term by 0.8^(-n) turns the decaying sum into a plain running SUM() that a window function can compute, and dividing the result by 0.8^(-n) scales it back.
In any case, this seems to do what you want:

select x.*,
       sum(power(0.8, -n) * a * 0.2) over (order by id) / power(0.8, -n) as b
from (select t.*,
             row_number() over (order by id) - 1 as n
      from t
     ) x;
Here is a db<>fiddle using Postgres.
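As a quick sanity check of the scaling trick: with A = (10, 20), row 1 has n = 0, so the running sum is 0.8^0 * 10 * 0.2 = 2 and B[1] = 2 / 0.8^0 = 2; row 2 has n = 1, so the running sum is 2 + 0.8^(-1) * 20 * 0.2 = 7 and B[2] = 7 / 0.8^(-1) = 5.6, which matches the recurrence: 0.2 * 20 + 0.8 * 2 = 5.6.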

How to find neighboring records in the SQL table in terms of month and year?

Please help me optimize my SQL query.
I have a table with the fields date, commodity_id, exp_month_id, exp_year, price, where the first four fields form the primary key. The months are designated with alphabet-ordered letters: e.g. F (for Jan.), G (for Feb.), H (for March), etc. Thus the letter of a month more distant from Jan. is larger than the letter of a less distant one (F < G < H < ...). Some commodity_ids have all 12 months in the table, some only 5 or 3; these sets are constant across years.
I need to calculate the difference between the prices (the gradient) of neighboring records in terms of (exp_month_id, exp_year). As the first step, I want to define for every couple (exp_month_id, exp_year) the valid couple (next_month_id, next_year). The main problem here is that if the current exp_month_id is the last one in the year, then next_year = exp_year + 1 and next_month_id should be the first one in the year.
I have written the following query to do the job:
WITH trading_months AS (
SELECT DISTINCT commodity_id,
exp_month_id
FROM futures
ORDER BY exp_month_id
)
SELECT DISTINCT f.commodity_id,
f.exp_month_id,
f.exp_year,
(
WITH [temp] AS (
SELECT exp_month_id
FROM trading_months
WHERE commodity_id = f.commodity_id
)
SELECT exp_month_id
FROM [temp]
WHERE exp_month_id > f.exp_month_id
UNION ALL
SELECT exp_month_id
FROM [temp]
LIMIT 1
)
AS next_month_id,
(
SELECT CASE WHEN EXISTS (
SELECT commodity_id,
exp_month_id
FROM trading_months
WHERE commodity_id = f.commodity_id AND
exp_month_id > f.exp_month_id
LIMIT 1
)
THEN f.exp_year ELSE f.exp_year + 1 END
)
AS next_year
FROM futures AS f
This query serves as the base for a view which is subsequently used for calculating the gradient. However, the execution of this query takes more than one second, and thus the whole process takes minutes. I wonder if you could help me optimize the query.
Note: the following requires SQLite 3.25 or newer for window function support.
Lack of sample data (preferably as CREATE TABLE and INSERT statements for easy importing) and expected results makes it hard to test, but if your end goal is computing the difference in prices between expiration dates (making your question a bit of an XY problem), maybe something like:
SELECT date, commodity_id, price, exp_year, exp_month_id
, price - lag(price, 1) OVER (PARTITION BY commodity_id ORDER BY exp_year, exp_month_id) AS "change from last price"
FROM futures;
Thanks to the hint of #Shawn to use window functions I could rewrite the query in much shorter form:
CREATE VIEW "futures_nextmonths_win" AS
WITH trading_months AS (
SELECT DISTINCT commodity_id,
exp_month_id,
exp_year
FROM futures)
SELECT commodity_id,
exp_month_id,
exp_year,
lead(exp_month_id) OVER w AS next_month_id,
lead(exp_year) OVER w AS next_year
FROM trading_months
WINDOW w AS (PARTITION BY commodity_id ORDER BY exp_year, exp_month_id);
which is also slightly faster than the original one.
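For the follow-up gradient step, a minimal sketch (the join back to futures on date plus the three key columns is an assumption about the schema, not something stated in the post):

SELECT f.date,
       f.commodity_id,
       f.exp_month_id,
       f.exp_year,
       nxt.price - f.price AS gradient
FROM futures AS f
JOIN futures_nextmonths_win AS v
  ON v.commodity_id = f.commodity_id
 AND v.exp_month_id = f.exp_month_id
 AND v.exp_year = f.exp_year
JOIN futures AS nxt
  ON nxt.date = f.date
 AND nxt.commodity_id = f.commodity_id
 AND nxt.exp_month_id = v.next_month_id
 AND nxt.exp_year = v.next_year;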

How to calculate percentage in SQL without using traditional methods

        A          B          A/B
        461264     307638     66.7178
        334673     217099     64.869
        372045     220354     59.2278
        427186     181755     42.547
        435871     214099     49.1198
Total   2031039    1140945    56.1754

-> each row's A/B is calculated like the last row's: (214099/435871) * 100 = 49.1198.
I want a solution that uses only one column, i.e. A/B, and calculates the average percentage from it alone.
For the average percentage (not overall percentage), perform the calc, then select the average:
select avg(x.pc)
from
(
select a/b as pc
from MyTable
) x
For the overall percentage, you need:
select x.a/x.b
from
(
select sum(a) a, sum(b) b
from MyTable
) x
Are you looking for this?
select avg(a / b)
from t;
Or this:
select avg(a) / avg(b)
from t;
The two answers are different: with the sample data above, the average of the row percentages is about 56.50, while the overall percentage is 56.18. I can't easily tell from your question which version you want.
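One caveat for both versions: if A and B are integer columns, a / b performs integer division in SQL Server and yields 0 whenever a < b. Multiplying by 1.0 (or casting) forces decimal arithmetic, e.g.:

select avg(1.0 * a / b)  -- 1.0 forces decimal rather than integer division
from t;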

SQL Server: how to select a fixed amount of rows (select every x-th value)

A short description: I have a table with data that is updated over a certain time period. Now the problem is that, depending on the nature of the sensor which sends the data, in this time period there could be either 50 data sets or 50,000. As I want to visualize this data (using ASP.NET / C#), for a first preview I would like to SELECT just 1000 values from the table.
I already have an approach for doing this: I count the rows in the time period of interest, with a simple WHERE clause to specify the sensor id, save the result in a variable, and then divide the count by 1000. I tried it in MS Access, where it works just fine:

set @divider = select count(*) from table where [...]

SELECT (Int([RowNumber]/@divider)), First(Value)
FROM myTable
GROUP BY (Int([RowNumber]/@divider));
The trick in Access was that I simply have a data field ("RowNumber"), which is my PK/ID and goes from 0 up. I tried to accomplish the same in SQL Server using the ROW_NUMBER() method, which works more or less. I've got the right syntax for the method, but I cannot use it in the GROUP BY statement:

Windowed functions can only appear in the SELECT or ORDER BY clauses.

meaning ROW_NUMBER() can't appear in the GROUP BY statement.
Now I'm kind of stuck. I've tried to save the ROW_NUMBER value into a char or a separate column and GROUP BY it later on, but I couldn't get it done. And somehow I start to think that my strategy might have its weaknesses...? :/
To clarify once more: I don't need SELECT TOP 1000 from my table, because that would just select the first 1000 values (depending on the sorting). I need to SELECT every x-th value, where I can compute the x (and I could even round it to an INT, if that would help to get it done). I hope I was able to describe the problem understandably...
This is my first post here on StackOverflow, I hope I didn't forget anything essential or important, if you need any further information (table structure, my queries so far, ...) please don't hesitate to ask. Any help or hint is highly appreciated - thanks in advance! :)
Update: SOLUTION! Big thanks to https://stackoverflow.com/users/52598/lieven!!!
Here is how I did it in the end:
I declare two variables: I count my rows and SET the result into the first one. Then I use ROUND() on that variable and divide it by 1000 (because in the end I want ABOUT 1000 values!). I split this operation across two variables because basing the ROUND operation directly on the value from the COUNT function led to some mistakes.
declare @myvar decimal(10,2)
declare @myvar2 decimal(10,2)

set @myvar = (select COUNT(*)
              from value_table
              where channel_id = 135
                and myDate >= '2011-01-14 22:00:00.000'
                and myDate <= '2011-02-14 22:00:00.000')

set @myvar2 = ROUND(@myvar / 1000, 0)
Now I have the rounded value, which I want to be my step size (take every x-th value -> this is our "x" ;)), stored in @myvar2. Next I subselect the data of the desired timespan and channel, add ROW_NUMBER() as column "rn", and finally add a WHERE clause to the outer SELECT that takes ROW_NUMBER modulo @myvar2 - when the modulus is 0, the row is SELECTed.
select *
from (
    select ROW_NUMBER() over (order by id desc) as rn, myValue, myDate
    from value_table
    where channel_id = 135
      and myDate >= '2011-01-14 22:00:00.000'
      and myDate <= '2011-02-14 22:00:00.000'
) d
WHERE rn % @myvar2 = 0
Works like a charm - once again all my thanks to https://stackoverflow.com/users/52598/lieven, see the comment below for the original posting!
In essence, all you need to do to select every x-th value is retain all rows where the modulus of the row number divided by x is 0:

WHERE rn % @x_thValues = 0

Now, to be able to use your ROW_NUMBER's result, you'll need to wrap the entire statement in a subselect:

SELECT *
FROM (
    SELECT *
         , rn = ROW_NUMBER() OVER (ORDER BY Value)
    FROM DummyData
) d
WHERE rn % @x_thValues = 0
Combined with a variable for which x-th values you need, you might use a test script like this:

DECLARE @x_thValues INTEGER = 2

;WITH DummyData AS (SELECT * FROM (VALUES (1), (2), (3), (4)) v (Value))
SELECT *
FROM (
    SELECT *
         , rn = ROW_NUMBER() OVER (ORDER BY Value)
    FROM DummyData
) d
WHERE rn % @x_thValues = 0
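With @x_thValues = 2, the script keeps the rows whose row number is divisible by 2, i.e. the rows with Value 2 and Value 4.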
One more option to consider:

Select Top 1000 *
From dbo.SomeTable
Where ....
Order By NewID()

but to be honest, I like the previous answer more than this one. Ordering by NEWID() gives a random sample of 1000 rows rather than every x-th one, and the question could also be about performance.