Moving Average in SQL Server

I have the below table with values:
Row_ID  FQFY     Average
1       2018-Q1  70%
2       2018-Q2  60%
3       2018-Q3  50%
4       2018-Q4  90%
5       2019-Q1  70%
6       2019-Q2  80%
7       2019-Q3  20%
8       2019-Q4  NULL
9       2020-Q1  30%
Starting from the 4th row, I have a requirement to calculate the moving average over 4 rows' values (the current row and the 3 preceding it). If there is a NULL value, the requirement is to ignore it while computing the average.
Can someone please help me with the code in SQL Server?

Use AVG with an appropriate window frame:
SELECT *,
       AVG(Average) OVER (ORDER BY FQFY
                          ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS rollingAverage
FROM yourTable;
Regarding the NULL requirement, AVG by default already ignores NULL values.
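You can convince yourself of the NULL behavior with a self-contained sketch (the VALUES list below is adapted from the question's data; note it uses numeric values, since AVG cannot operate on strings like '70%'):

SELECT FQFY, Average,
       AVG(Average) OVER (ORDER BY FQFY
                          ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS rollingAverage
FROM (VALUES ('2019-Q2', 80.0),
             ('2019-Q3', 20.0),
             ('2019-Q4', NULL),
             ('2020-Q1', 30.0)) v(FQFY, Average);
-- For 2020-Q1 the frame holds 80, 20, NULL, 30; AVG divides by the count of
-- non-NULL values, so the result is (80 + 20 + 30) / 3 = 43.333...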

Related

Partition rows based on percent value of the data range

What would the SELECT query be if I want to rank and partition rows based on a percent range of the partitioning column?
For example, let's say I have the below table structure (the column 'Percent_Rank' needs to be populated).
I want to rank the rows based on the order of score, but only within +/-10% of the current row's amount. That is, for the first row the amount is 2.3, and +/-10% of 2.3 is 2.07 - 2.53. So while ranking the first row, I should rank based on the score and consider only those rows whose amount falls in the range 2.07 - 2.53 (in this case IDs 1, 5 and 11). Based on this logic the percentile rank is populated in the last column, and the rank for the first row is 0.5. The same steps are then performed for each row.
The question is how to do this with PERCENT_RANK() or RANK() or NTILE() with a partition clause as part of a SELECT query. The original table does not have the last column; that is the column that needs to be populated. I need the percentile ranking of each row based on its score within the 10% amount range.
PRODUCT_ID  Amount  Score  Percent_Rank
1           2.3     45     0.5
2           2.7     30     0
3           2.0     40     0.5
4           2.6     50     1
5           2.2     35     0
6           5.1     25     0
7           4.8     40     1
8           6.1     60     0
9           22.1    70     0.33
10          8.2     20     0
11          2.1     50     1
12          22.2    60     0
13          22.3    80     1
14          22.4    75     0.66
I tried using PERCENT_RANK() with an OVER (PARTITION BY ...) clause, but it doesn't consider the range. I cannot use a RANGE UNBOUNDED PRECEDING AND FOLLOWING frame because I need the range to be within 10% of the current row's amount.
You may try PERCENT_RANK() with a self-join, as follows:
SELECT PRODUCT_ID, Amount, Score, Percent_Rank
FROM
(
  -- join each row A to every row B whose amount lies within +/-10% of A's amount,
  -- then rank the joined scores within each A row's neighborhood
  SELECT A.PRODUCT_ID, A.Amount, A.Score, B.Amount AS B_Amount,
         PERCENT_RANK() OVER (PARTITION BY A.PRODUCT_ID ORDER BY B.Score) AS Percent_Rank
  FROM table_name A
  JOIN table_name B
    ON B.Amount BETWEEN A.Amount - A.Amount*0.1 AND A.Amount + A.Amount*0.1
) T
-- keep only the pair where B is the row itself, i.e. the row's own rank within its neighborhood
WHERE Amount = B_Amount
See a demo.
I think you can just nest your percent_rank in a subquery once you have calculated the bucket number based on equally spaced scores.
The trickiest part of this example is actually getting the fixed-width buckets. It would be simpler if we could use width_bucket(), but some databases don't support it, so I had to compute the buckets manually (in the 'bucketed' inline table).
Here is the example. I used Postgres to create the mock-up test table, because it has a very nice generate_series(), but the actual example SQL should run on any database.
create table product_scores as (
  select
    product_id,
    score
  from
    generate_series(1,2) product_id,
    generate_series(1,50) score);
This created a table with two product ids and 50 scores for each one.
with ranges as (
  select
    product_id,
    (max(score) - min(score)) * (1 + 1e-10) as range,
    min(score) as minscore
  from product_scores
  group by product_id),
bucketed as (
  select
    ranges.product_id,
    score,
    floor((score - minscore) * 10.0 / range) as bucket
  from ranges
  inner join product_scores
    on ranges.product_id = product_scores.product_id)
select
  product_id,
  score,
  bucket,
  percent_rank() over (partition by product_id, bucket order by score)
from bucketed;
No, the 1e-10 is not a joke. Unfortunately, round-off error would assign the highest value to a bucket all by itself unless we expand the range by a tiny amount. But once we have a workable range, we can calculate the bucket easily enough by dividing each score's offset by it.
Then, having the partition (bucket) number, you can do the percent_rank() as usual, as shown.
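For databases that do have width_bucket() (Postgres does), the bucketing could collapse into the query itself. A sketch using the same product_scores table and 10 buckets; the tiny epsilon on the upper bound plays the same role as the 1e-10 above, keeping the maximum score out of width_bucket's overflow bucket:

with ranges as (
  select product_id,
         min(score) as minscore,
         max(score) as maxscore
  from product_scores
  group by product_id)
select p.product_id,
       p.score,
       -- width_bucket(value, low, high, count) returns a bucket number 1..count
       width_bucket(p.score, r.minscore, r.maxscore + 1e-10, 10) as bucket,
       percent_rank() over (
         partition by p.product_id,
                      width_bucket(p.score, r.minscore, r.maxscore + 1e-10, 10)
         order by p.score) as pct_rank
from product_scores p
join ranges r on r.product_id = p.product_id;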

In a Hive window, what happens if the value of CURRENT ROW is smaller than that of UNBOUNDED PRECEDING?

When I use RANGE to specify the window frame in Hive, I get some confusing results.
There is a test table, which I query as follows:
select
  id,
  val,
  sum(val) over(order by val rows between unbounded preceding and current row) rows_sum,
  sum(val) over(order by val range between unbounded preceding and current row) range_sum
from test
Here is the result of the above query, which is also the result I expect:
id  val  rows_sum  range_sum
1   1    1         2
2   1    2         2
3   3    5         5
4   6    11        23
5   6    17        23
6   6    23        23
But for the range_sum field, if I change the ordering from ASC to DESC, say:
sum(val) over(order by val desc range between unbounded preceding and current row) range_sum
here is the result:
val  range_sum
6    18
6    18
6    18
3    21
1    23
1    23
but my expected result for range_sum is:
val  range_sum
6    18
6    18
6    18
3    NULL
1    NULL
1    NULL
There are two ways of defining a frame in Hive: ROWS and RANGE. For example, SUM(val) RANGE BETWEEN 100 PRECEDING AND 200 FOLLOWING selects rows by the distance from the current row's value. Say the current val is 200; then this frame will include rows whose val values range from 100 to 400.
So in my example above, with range between unbounded preceding and current row:
when val is 6, the frame includes the 3 rows with val = 6, so the sum is 18.
But consider the 4th row, whose val is 3. Because the ordering is DESC, the val of the UNBOUNDED PRECEDING row is 6 and that of the CURRENT ROW is 3. The frame should include rows whose val is between 6 and 3, and no rows satisfy this condition, yet the query result is 21. It behaves like between 3 and 6, not between 6 and 3.
It works as designed and according to the standard; the same behavior appears in other databases.
It is easier to find the specification for Hive and other databases like Oracle than the standard document (for free). For example, see "Windowing Specifications in HQL" and Oracle's "Window Function Frame Specification".
First the partition is ordered, then the bounds are calculated, and the frame between the bounds is used. The frame is taken according to the ORDER BY direction, not always >= bound1 and <= bound2:
For ORDER BY DESC: bound1 >= row value >= bound2.
For ORDER BY ASC: bound1 <= row value <= bound2.
UNBOUNDED PRECEDING: the bound (bound1) is the first partition row (according to the order).
CURRENT ROW: for ROWS, the bound (bound2) is the current row. For RANGE, the bound is the peers of the current row (rows with the same value as the current row according to the ORDER BY clause).
So the frame includes rows from the partition start through the current row, including all peers of the current row.
Also read this excellent explanation from Sybase:
"The sort order of the ORDER BY values is a critical part of the test for qualifying rows in a value-based frame; the numeric values alone do not determine exclusion or inclusion."
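To make the DESC case concrete, here is the query from the question annotated with the frame each row actually sees (the frame contents are derived from the question's data):

select
  val,
  sum(val) over(order by val desc range between unbounded preceding and current row) range_sum
from test;
-- val = 6: frame is partition start through the peers of 6 -> {6, 6, 6},          sum = 18
-- val = 3: frame is all rows with 6 >= val >= 3            -> {6, 6, 6, 3},       sum = 21
-- val = 1: frame is all rows with 6 >= val >= 1            -> {6, 6, 6, 3, 1, 1}, sum = 23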

Snowflake NTILE of engagement rate over month

I'm trying to write a query that ranks a customer's engagement rate by decile within a month, against all other customers.
I tried:
ntile(10) over (partition by rec_month order by engagement_rate) as decile
but I don't think that is getting me what I need. It appears to just be splitting the vhosts up into 10 equally sized groups. I want percentiles.
I also tried:
ntile(10) over (partition by rec_month, vhost order by engagement_rate) as decile
But that is only calculating it within customer (vhost) within the month.
How do I calculate the engagement_rate decile against all other customers (Vhosts) within the month?
At first I suspected you wanted a rank function, like DENSE_RANK or RANK (which is sparse, and the one I'd use), turning that into a percentage by dividing by COUNT(*) and then truncating into deciles via trunc(V/10)*10. But I suspect you actually want the output of the PERCENT_RANK function; the Snowflake documentation example is not as clarifying as I would hope, so I can't be certain it solves your problem.
select column1,
       column2,
       round(percent_rank() over (order by column2), 3) as p_rank,
       trunc(p_rank * 10) * 10 as decile
from values ('a',1),('b',2),('c',3),('d',4),('e',5),('f',6),('g',7);
gives
COLUMN1  COLUMN2  P_RANK  DECILE
a        1        0       0
b        2        0.167   10
c        3        0.333   30
d        4        0.5     50
e        5        0.667   60
f        6        0.833   80
g        7        1       100
But maybe you want to use NTILE instead of truncating the percentage. The ROUND is just there to make the output above less verbose.
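Applied to the question's own columns, the PERCENT_RANK approach might look like this (a sketch; customer_engagement is a hypothetical table name, the other identifiers come from the question):

select vhost,
       rec_month,
       engagement_rate,
       round(percent_rank() over (partition by rec_month
                                  order by engagement_rate), 3) as p_rank,
       trunc(p_rank * 10) * 10 as decile  -- 0, 10, ..., 100 within each month
from customer_engagement;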

SQL DB calculation: moving summary

I would like to calculate a moving summary:
Total amount: 100
First receipt: 20
Second receipt: 10
The first row in the calculation column is the difference between the total amount and the first receipt: 100 - 20 = 80.
The second row in the calculation column is the difference between the first calculated row and the second receipt: 80 - 10 = 70.
The output is supposed to present receipt_amount and balance:
receipt_amount | balance
20             | 80
10             | 70
I'll be glad to use your help
Thanks :-)
You didn't really give us much information about your tables and how they are structured.
I'm assuming that there is an orders table that contains the total_amount and a receipt_table that contains each receipt (as a positive value):
As you also didn't specify your DBMS, this is ANSI SQL:
select sum(amount) over (order by receipt_nr) as running_sum
from (
  -- the order total sorts first (receipt_nr 0); each receipt is then subtracted from it
  select 0 as receipt_nr, total_amount as amount
  from orders
  where order_no = 1
  union all
  select receipt_nr, -1 * receipt_amount
  from the_receipt_table
  where order_no = 1
) t
First of all, thanks for your response.
I work with Caché DB, which accepts both SQL and Oracle syntax.
Basically, the data is located in two different tables, but I have them in one join query:
a couple of rows with different receipt amounts, where each row (receipt) carries the same total amount.
For example:
Receipt_no  Receipt_amount  Total_amount  Balance
1           20              100           80
1           10              100           70
1           30              100           40
2           20              50            30
2           10              50            20
So the calculation is supposed to work in such a way that for the first receipt the difference is taken from the total_amount, and all subsequent receipts (within the same receipt_no) are subtracted from the running balance.
Thanks!
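With the joined layout above, a window-function sketch of that per-receipt_no balance might look like this (assuming the joined result is available as receipts_joined and has some column, here called receipt_line, that orders the receipts within each receipt_no):

select receipt_no,
       receipt_amount,
       total_amount,
       -- subtract the running total of receipts from the fixed total_amount
       total_amount - sum(receipt_amount) over (
           partition by receipt_no
           order by receipt_line
           rows between unbounded preceding and current row
       ) as balance
from receipts_joined

For receipt_no 1 this yields 100-20=80, 100-30=70, 100-60=40, matching the Balance column in the example.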

Optimizing a Vertica SQL query to do running totals

I have a table S with time series data like this:
key | day | delta
For a given key, it's possible but unlikely that days will be missing.
I'd like to construct a cumulative column from the delta values (positive INTs), for the purposes of inserting this cumulative data into another table. This is what I've got so far:
SELECT key, day,
SUM(delta) OVER (PARTITION BY key ORDER BY day asc RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
delta
FROM S
In my SQL flavor, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, but I left it in there to be explicit.
This query is really slow, like an order of magnitude slower than the old broken query, which filled in 0s for the cumulative count. Any suggestions for other methods to generate the cumulative numbers?
I did look at the solutions here:
Running total by grouped records in table
The RDBMs I'm using is Vertica. Vertica SQL precludes the first subselect solution there, and its query planner predicts that the 2nd left outer join solution is about 100 times more costly than the analytic form I show above.
I think you're essentially there. You may just need to update the syntax a bit:
SELECT s_qty,
       SUM(s_price) OVER (
           PARTITION BY NULL
           ORDER BY s_qty ASC
           ROWS UNBOUNDED PRECEDING
       ) AS "Cumulative Sum"
FROM sample_sales;
Output:
S_QTY | Cumulative Sum
------+----------------
1 | 1000
100 | 11000
150 | 26000
200 | 28000
250 | 53000
300 | 83000
2000 | 103000
(7 rows)
reference link:
https://dwgeek.com/vertica-cumulative-sum-average-and-example.html/
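Adapted to the original table, the ROWS-based frame might look like the sketch below (ROWS avoids the peer-group handling that a RANGE frame requires, which can be cheaper; the quoted identifiers are just to guard against key and day being treated as keywords):

SELECT "key",
       "day",
       delta,
       SUM(delta) OVER (
           PARTITION BY "key"
           ORDER BY "day" ASC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_delta
FROM S;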
Sometimes it's faster to just use a correlated subquery:
SELECT
    t1."key"
  , t1."day"
  , t1.delta
  -- sum every delta for the same key up to and including this row's day
  , (SELECT SUM(s2.delta)
     FROM S s2
     WHERE s2."key" = t1."key"
       AND s2."day" <= t1."day") AS DeltaSum
FROM S t1;