Count(*) - 1 in the denominator

Can someone please explain the count(*) - 1 in the denominator here? Why is the -1 needed in the query below?
Q: The query finds the average number of days between orders for each customer.
A: select CustomerID
, cast(DATEDIFF(dd, min(OrderDate), max(OrderDate)) as decimal) / (count(*) - 1) as [Avg_day]
from Orders
group by CustomerID
having count(*) > 1

Consider a sequence of times such as:
A........B........C........D
You want to find the average time between two events. Well, this is defined as:
( (B - A) + (C - B) + (D - C) ) / 3
You can expand this out:
B/3 - A/3 + C/3 - B/3 + D/3 - C/3
Notice that Bs and Cs cancel out, so you are left with:
-A/3 + D/3
which is
(D - A) / 3
That is your original expression. The 3 is one less than the number of points you started with.
This generalizes to any number of events. The divisor is one less than the total number of events (really, the number of adjacent pairs).
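The cancellation can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the function name and the sample order dates are made up for illustration. It computes the average gap once from the adjacent pairs and once as (max - min) / (count - 1).

```python
from datetime import date

def average_gap_days(order_dates):
    """Return the average gap between consecutive dates, computed two ways."""
    ordered = sorted(order_dates)
    # Pairwise definition: average of the gaps between adjacent orders.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    pairwise = sum(gaps) / len(gaps)
    # Shortcut used by the query: total span divided by (count - 1).
    shortcut = (ordered[-1] - ordered[0]).days / (len(ordered) - 1)
    return pairwise, shortcut

# Hypothetical orders for one customer: gaps of 10, 5 and 15 days.
orders = [date(2024, 1, 1), date(2024, 1, 11), date(2024, 1, 16), date(2024, 1, 31)]
pairwise, shortcut = average_gap_days(orders)
print(pairwise, shortcut)
```

Both values agree because four orders produce only three gaps, which is also why the query needs having count(*) > 1: a single order has no gap and would give a zero divisor.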

Related

MDX - Sum one measure at leaf lvl where another measure exists at leaf lvl

I need to work out Stockturn at any level in an item hierarchy.
So far the calculation is: SUM of the last 3 months of [COGS] * 4, divided by [SOH] (SOH value).
The expression below is embedded in a BI tool and works. It is used in the Item grid and is evaluated for every member of the Item dimension at the displayed level:
((SUM({.lag(3):.lag(1)}, ([Type].[Type].[Actual], [Measures].[Cogs]))) * 4) /
([Type].[Type].[Actual], [Measures].[SOH], )
What needs to change is that [SOH] should only be returned for items where the [COGS] measure has values.
So items that do not have COGS should not be included:
Item  SOH  COGS
A     10   20
B     15   40
C     20
Do not include C, as it will throw off the calculation.
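To make the requirement concrete, here is a minimal Python sketch of the intended arithmetic using the three sample items. It is only an illustration, not MDX: the dictionary layout and function name are hypothetical, and treating the COGS column as the already summed three-month figure is an assumption.

```python
# Sample items from the question; None marks an item without COGS.
items = {
    "A": {"soh": 10, "cogs": 20},
    "B": {"soh": 15, "cogs": 40},
    "C": {"soh": 20, "cogs": None},
}

def stockturn(rows, require_cogs):
    """(sum of COGS * 4) / (sum of SOH), optionally skipping items with no COGS."""
    kept = [r for r in rows.values() if r["cogs"] is not None or not require_cogs]
    total_cogs = sum(r["cogs"] or 0 for r in kept)
    total_soh = sum(r["soh"] for r in kept)
    return total_cogs * 4 / total_soh

print(stockturn(items, require_cogs=True))   # SOH summed over A and B only
print(stockturn(items, require_cogs=False))  # SOH of C inflates the divisor
```

With item C included, the divisor grows from 25 to 45 while the numerator stays the same, which is the distortion the question wants to avoid.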
The simplest approach - assuming you are using SSAS 2012 or newer - would be to use Divide instead of the division operator /:
Divide( ((SUM({.lag(3):.lag(1)}, ([Type].[Type].[Actual], [Measures].[Cogs]))) * 4),
([Type].[Type].[Actual], [Measures].[SOH], )
)
Divide returns NULL (empty value) instead of infinity when dividing by zero or null.
Edit: I see you want to avoid the calculation if the numerator is null or 0, not the denominator. In this case, you would use Iif:
IIF((SUM({.lag(3):.lag(1)}, ([Type].[Type].[Actual], [Measures].[Cogs]))) <> 0,
((SUM({.lag(3):.lag(1)}, ([Type].[Type].[Actual], [Measures].[Cogs]))) * 4)
/
([Type].[Type].[Actual], [Measures].[SOH], )
, NULL)
Iif is a function with three arguments: a condition, the value to use if the condition is true, and the value to use if the condition is false.

One-dimensional earth mover's distance in BigQuery/SQL

Let P and Q be two finite probability distributions on integers, with support between 0 and some large integer N. The one-dimensional earth mover's distance between P and Q is the minimum cost you have to pay to transform P into Q, considering that it costs r*|n-m| to "move" a probability r associated to integer n to another integer m.
There is a simple algorithm to compute this. In pseudocode:
previous = 0
sum = 0
for i from 0 to N:
    previous = P(i) - Q(i) + previous
    sum = sum + abs(previous) // abs = absolute value
return sum
Now, suppose you have two tables that each contain a probability distribution. Column n contains integers, and column p contains the corresponding probability. The tables are correct (all probabilities are between 0 and 1, their sum is 1). I want to compute the earth mover's distance between these two tables in BigQuery (Standard SQL).
Is it possible? I feel like one would need to use analytic functions, but I don't have much experience with them, so I don't know how to get there.
What if N (the maximum integer) is very large, but my tables are not? Can we adapt the solution to avoid doing a computation for each integer i?
Hopefully I fully understand your problem. This seems to be what you're looking for:
WITH Aggr AS (
  SELECT rp.n AS n, SUM(rp.p - rq.p)
    OVER(ORDER BY rp.n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS emd
  FROM P rp
  LEFT JOIN Q rq
    ON rp.n = rq.n
)
SELECT SUM(ABS(a.emd)) AS total_emd
FROM Aggr a;
Regarding question #2, note that we only scan what is actually in the tables, regardless of N, assuming a one-to-one match for every n in P with an n in Q.
I adapted Michael's answer to fix its issues; here is the solution I ended up with. Suppose the integers are stored in column i and the probability in column p. First I join the two tables, then I compute EMD(i) for all i using the window, then I sum all absolute values.
WITH
joined_table AS (
  SELECT
    IFNULL(table1.i, table2.i) AS i,
    IFNULL(table1.p, 0) AS p,
    IFNULL(table2.p, 0) AS q
  FROM table1
  FULL OUTER JOIN table2
    ON table1.i = table2.i
),
aggr AS (
  SELECT
    -- running sum of (p - q) up to i, weighted by the gap to the next stored i
    (SUM(p - q) OVER win) * (LEAD(i, 1) OVER (ORDER BY i) - i) AS emd
  FROM joined_table
  WINDOW win AS (
    ORDER BY i
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  )
)
SELECT SUM(ABS(emd)) AS total_emd
FROM aggr
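A small Python sketch can check the sparse idea against the pseudocode from the question. The running sum of P - Q stays constant between consecutive stored integers, so each running value can be weighted by the distance to the next stored integer instead of being added once per integer. The function names and sample distributions are hypothetical, and the sketch assumes both distributions sum to the same total so that the running sum is zero after the last stored integer.

```python
def emd_dense(p, q, n_max):
    """Direct translation of the pseudocode: one step per integer 0..n_max."""
    previous = 0.0
    total = 0.0
    for i in range(n_max + 1):
        previous += p.get(i, 0.0) - q.get(i, 0.0)
        total += abs(previous)
    return total

def emd_sparse(p, q):
    """Same quantity, visiting only integers stored in either distribution."""
    support = sorted(set(p) | set(q))
    running = 0.0
    total = 0.0
    for current, following in zip(support, support[1:]):
        running += p.get(current, 0.0) - q.get(current, 0.0)
        # The running sum is unchanged until the next stored integer.
        total += abs(running) * (following - current)
    return total

# Hypothetical distributions stored as {integer: probability}.
p = {0: 0.5, 1000: 0.5}
q = {10: 0.25, 500: 0.75}
print(emd_dense(p, q, 1000), emd_sparse(p, q))
```

The sparse version does work proportional to the number of stored rows rather than to N, which is the property asked about in the second part of the question.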

Add new column that finds percentage off value in existing column

In Oracle, I want to write a SELECT query that takes 1.75 percent off an existing column that holds whole numbers in each row, and rounds the result to the nearest dollar.
I have part of my query written, but I can't figure out how to write the formula that performs the percentage-off calculation.
SELECT R.LAST_NAME, O.RENT_FEE, ROUND(RENT_FEE,0) AS DISCOUNT
FROM ROOM_UNIT R, OWNER O
WHERE R.OWNER_NUM = O.OWNER_NUM
From math lessons we know that discounting by x percent is equivalent to multiplying by
(1 - x/100)
In your case, discounting by 1.75% means multiplying by (1 - 0.0175), or 0.9825:
SELECT R.LAST_NAME, O.RENT_FEE, ROUND((1 - 0.0175) * RENT_FEE,0) AS DISCOUNT
FROM ROOM_UNIT R, OWNER O          -- ^^^^^^^^^^^^^^
WHERE R.OWNER_NUM = O.OWNER_NUM
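The arithmetic can be checked outside the database with a short Python sketch. The function name and the sample fees are made up for illustration. Decimal with ROUND_HALF_UP is used because Python's built-in round() rounds halves to the nearest even number, whereas Oracle's ROUND on a NUMBER rounds halves away from zero.

```python
from decimal import Decimal, ROUND_HALF_UP

def discounted_fee(rent_fee, percent_off="1.75"):
    """Apply a percentage discount and round to the nearest whole dollar."""
    factor = 1 - Decimal(percent_off) / 100          # 0.9825 for 1.75 percent
    discounted = Decimal(rent_fee) * factor
    return int(discounted.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

# Hypothetical whole-dollar rent fees.
for fee in (800, 1000, 1250):
    print(fee, discounted_fee(fee))
```

For non-negative fees, half-up and half-away-from-zero rounding give the same result, so the sketch matches the query for the whole-number values described in the question.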

How to Calc Exponential Moving Average using SQL Server 2012 Window Functions

I know that it is easy to calculate simple moving average using SQL Server 2012 window functions and OVER() clause. But how can I calculate exponential moving average using this approach? Thanks!
The formula for EMA(x) is:
EMA(x_1) = x_1
EMA(x_n) = α * x_n + (1 - α) * EMA(x_{n-1})
With β := 1 - α that is equivalent to
EMA(x_n) = β^{n-1} * x_1 + α * β^{n-2} * x_2 + α * β^{n-3} * x_3 + ... + α * x_n
In that form it is easy to implement with LAG. For a 4 row EMA it would look like this:
SELECT LAG(x,3) OVER(ORDER BY ?) * POWER(@beta,3) +
       LAG(x,2) OVER(ORDER BY ?) * POWER(@beta,2) * @alpha +
       LAG(x,1) OVER(ORDER BY ?) * POWER(@beta,1) * @alpha +
       x * @alpha
FROM ...
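The equivalence between the recursive definition and the expanded sum can be checked numerically. This is a minimal Python sketch with hypothetical function names and sample values; it is not the SQL implementation, only a check of the algebra for a fixed-length series.

```python
def ema_recursive(xs, alpha):
    """EMA via the recursive definition: the first value seeds the average."""
    ema = xs[0]
    for x in xs[1:]:
        ema = alpha * x + (1 - alpha) * ema
    return ema

def ema_expanded(xs, alpha):
    """EMA via the expanded sum with beta = 1 - alpha."""
    beta = 1 - alpha
    n = len(xs)
    total = beta ** (n - 1) * xs[0]          # the seed carries no alpha factor
    for k in range(1, n):                    # xs[k] is the (k+1)-th value
        total += alpha * beta ** (n - 1 - k) * xs[k]
    return total

values = [10.0, 11.0, 12.0, 13.0]            # made-up four-row window
print(ema_recursive(values, 0.5), ema_expanded(values, 0.5))
```

The expanded form needs one term per row in the window, which is why the LAG version has to be written out for a fixed number of rows.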
OK, as you seem to be after the EWMA_Chart I created a SQL Fiddle showing how to get there. However, be aware that it is using a recursive CTE that requires one recursion per row returned. So on a big dataset you will most likely get disastrous performance. The recursion is necessary as each row depends on all rows that happened before. While you could get all preceding rows with LAG() you cannot also reference preceding calculations as LAG() cannot reference itself.
Also, the formula in the spreadsheet you attached below does not make sense. It seems to be trying to calculate the EWMA_Chart value, but it is failing at that. In the above SQL Fiddle I included a column [Wrong] that calculates the same value that the spreadsheet is calculating.
Either way, if you need to use this on a big dataset, you are probably better off writing a cursor.
This is the code that does the calculation in the above SQL Fiddle. It references the vSMA view that calculates the 10-row moving average.
WITH
smooth AS (
    SELECT CAST(0.1818 AS NUMERIC(20,5)) AS alpha
),
numbered AS (
    SELECT Date, Price, SMA, ROW_NUMBER() OVER (ORDER BY Date) Rn
    FROM vSMA
    WHERE SMA IS NOT NULL
),
EWMA AS (
    SELECT Date, Price, SMA, CAST(SMA AS NUMERIC(20,5)) AS EWMA, Rn
        , CAST(SMA AS NUMERIC(20,5)) AS Wrong
    FROM numbered
    WHERE Rn = 1
    UNION ALL
    SELECT numbered.Date, numbered.Price, numbered.SMA,
        CAST(EWMA.EWMA * smooth.alpha + CAST(numbered.SMA AS NUMERIC(20,5)) * (1 - smooth.alpha) AS NUMERIC(20,5)),
        numbered.Rn
        , CAST((numbered.Price - EWMA.EWMA) * smooth.alpha + EWMA.EWMA AS NUMERIC(20,5))
    FROM EWMA
    JOIN numbered
        ON EWMA.rn + 1 = numbered.rn
    CROSS JOIN smooth
)
SELECT Date, Price, SMA, EWMA
    , Wrong
FROM EWMA
ORDER BY Date;

How do I incorporate a Double Moving Average in straight SQL in a single query?

EDIT: I need to do this in ACCESS.
I am an SQL virgin and would greatly appreciate any magical assistance!
For a simple 12 month forecast, I am utilizing a 12 month Double Moving Average.
I have managed to pull the Single Moving Average through Query 1 (below).
Based on the table created by Query 1, I have written another query (Query 2) to get the Double Moving Average.
As such, my current process requires two queries. My efforts so far to combine these two steps into a single query have not been successful.
My Question: Is there any way to calculate Double Moving Average in a single query?
QUERY 1 - For Single Moving Average:
SELECT A.*, IIf([A].[VOL]>0,
    (SELECT AVG(B.[VOL])
     FROM [Turnover] as B
     WHERE (B.Code = A.Code) AND (B.YM Between A.YM - 1 AND A.YM - ([12] * 31))),
    (SELECT AVG(B.[VOL])
     FROM [Turnover] as B
     WHERE (B.Code = A.Code) AND (B.YM Between Now() - 31 AND A.YM - ([12] * 31)))) AS [Mvg Avg 1]
INTO [Model 12m]
FROM [Turnover] AS A;
QUERY 2 - Double Moving Average (currently this refers to QUERY 1):
SELECT A.*, IIf([A].[Mvg Avg 1]>0,
    (SELECT AVG(B.[Mvg Avg 1])
     FROM [Model 12m] as B
     WHERE (B.Code = A.Code) AND (B.YM Between A.YM - 1 AND A.YM - ([12] * 31))),
    (SELECT AVG(B.[Mvg Avg 1])
     FROM [Model 12m] as B
     WHERE (B.Code = A.Code) AND (B.YM Between Now() - 31 AND A.YM - ([12] * 31)))) AS [2 Mvg Avg]
INTO [Model 12m - 2MA]
FROM [Model 12m] AS A;
"My Question: Is there any way to calculate Double Moving Average in a single query?"
Since you're using Microsoft Access, there should be no need to do that. Save the first query as a new query. (For people who don't use MS Access, saving the SQL statement as a new query is equivalent to the SQL statement CREATE VIEW ....) Then use the second query as-is, or save it as another new query.
MS Access is really good at optimizing queries built on queries (views built on views).
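The layering idea (a second moving average computed over the output of the first) can be sketched in Python. This is only an illustration of composing the two steps, not Access SQL: the function name, the window size, and the sample volumes are hypothetical, and the window here is a simple trailing row count rather than the date arithmetic used in the queries.

```python
def moving_average(values, window):
    """Trailing moving average; None until a full window of values exists."""
    result = []
    for k in range(len(values)):
        chunk = [v for v in values[max(0, k - window + 1):k + 1] if v is not None]
        result.append(sum(chunk) / window if len(chunk) == window else None)
    return result

volumes = [1, 2, 3, 4, 5, 6]                  # made-up monthly volumes
single = moving_average(volumes, 2)           # plays the role of Query 1
double = moving_average(single, 2)            # Query 2 reads Query 1's output
print(single)
print(double)
```

The second call simply consumes the result of the first, in the same way a saved query can select from another saved query.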