PostgreSQL calculate rolling average with group and order

I have a table as follows
id | x | y | value
------+--------+-------+------------
1 | 1 | 1 | 25
1 | 1 | 2 | 42
1 | 2 | 3 | 98
1 | 2 | 4 | 54
1 | 3 | 5 | 67
2 | 1 | 1 | 78
2 | 1 | 2 | 45
2 | 2 | 3 | 96
I have to group this by id, while maintaining the order by id, x, and y (in that order), and calculate the rolling average over the previous n rows. For example, if n = 3:
id | x | y | value | rollingAvg
------+--------+-------+--------+-----------
1 | 1 | 1 | 25 | 25
1 | 1 | 2 | 42 | (25 / 1) = 25
1 | 2 | 3 | 98 | ((25+42)/2) = 33.5
1 | 2 | 4 | 54 | ((25+42+98)/3) = 55
1 | 3 | 5 | 67 | ((42+98+54)/3) = 64.67
2 | 1 | 1 | 78 | 78
2 | 1 | 2 | 45 | (78/1) = 78
2 | 2 | 3 | 96 | ((78+45)/2) = 61.5
Logic is
1) If the row is the 1st when grouped by id, the value should be the average
2) The average should not include the current row
Thanks in advance

We can use the AVG() function with a window frame to cover the previous three rows only:
select
    id,
    x,
    y,
    value,
    coalesce(avg(value) over (partition by id order by y
                              rows between 3 preceding and 1 preceding),
             value) as rollingAvg
from your_table
order by id, y;
The call to COALESCE() is necessary because the window frame is empty for the first row of each id group, so AVG() returns NULL there; in that case you seem to expect the current row's value to be used instead.
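If x and y together define the order within each id (as the question states), both columns can go into the window's ORDER BY; a minimal variant of the query above, still assuming the table is named your_table:
select
    id,
    x,
    y,
    value,
    coalesce(avg(value) over (partition by id order by x, y
                              rows between 3 preceding and 1 preceding),
             value) as rollingAvg
from your_table
order by id, x, y;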

How to minus by period in another column

I need results in the Minus column like below.
For example, we take the first result for category A, which is 23 (period 1),
and compute 34 (period 2) - 23 (period 1) = 11, then 23 (period 3) - 23 (period 1) = 0...
And so on, for each category.
+--------+----------+--------+-------+
| Period | Category | Result | Minus |
+--------+----------+--------+-------+
| 1 | A | 23 | n/a |
| 1 | B | 24 | n/a |
| 1 | C | 25 | n/a |
| 2 | A | 34 | 11 |
| 2 | B | 23 | -1 |
| 2 | C | 1 | -24 |
| 3 | A | 23 | 0 |
| 3 | B | 90 | 66 |
| 3 | C | 21 | -4 |
+--------+----------+--------+-------+
Could you help me?
Could we use partitions or lead here?
SELECT
*,
Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period) AS Minus
FROM
yourTable
This doesn't create the n/a values, but returns 0 instead. I'm not sure returning an arbitrary string in an integer column makes sense, so I didn't do it.
If you really need to avoid the 0 you could just use a CASE statement...
CASE WHEN 1 = Period
THEN NULL
ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
Or, even more robustly...
CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Period)
THEN NULL
ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
(Apologies for any typos, etc, I'm on my phone.)
You can do:
select b.*, b.result - a.result as "minus"
from t a
join t b on b.category = a.category and a.period = 1
Result:
period category result minus
------- --------- ------- -----
1 A 23 0
1 B 24 0
1 C 25 0
2 A 34 11
2 B 23 -1
2 C 1 -24
3 A 23 0
3 B 90 66
3 C 21 -4
See running example at DB Fiddle.
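If the n/a rows should come back as NULL rather than 0, a CASE on top of the same join works too; a small sketch, using the same placeholder table t:
select b.*,
       case when b.period = 1 then null
            else b.result - a.result
       end as "minus"
from t a
join t b on b.category = a.category and a.period = 1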
OK, just so I don't duplicate my question: how would I do it if, for each new Sub period, we should repeat the first_value logic? Expected output:
+------------+----------+------------+----------+---------+
| Sub period | Period | Category | Result | Minus |
+------------+----------+------------+----------+---------+
| SA | 1 | A | 23 | n/a |
| SA | 2 | A | 34 | 11 |
| SA | 3 | A | 35 | 12 |
| SA | 4 | A | 36 | 13 |
| KS | 1 | A | 23 | n/a |
| KS | 2 | A | 21 | -2 |
| KS | 3 | A | 23 | 0 |
| KS | 4 | A | 21 | -2 |
+------------+----------+------------+----------+---------+
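Assuming the new column is named Sub_period, repeating the first_value logic for each sub period is just a matter of adding that column to the partition; a sketch based on the earlier CASE/ROW_NUMBER() answer:
SELECT
    *,
    CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Sub_period, Category ORDER BY Period)
         THEN NULL
         ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Sub_period, Category ORDER BY Period)
    END AS Minus
FROM
    yourTable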

Running Count by Group and Flag in BigQuery?

I have a table that looks like the below:
Row | Fullvisitorid | Visitid | New_Session_Flag
1 | A | 111 | 1
2 | A | 120 | 0
3 | A | 128 | 0
4 | A | 133 | 0
5 | A | 745 | 1
6 | A | 777 | 0
7 | B | 388 | 1
8 | B | 401 | 0
9 | B | 420 | 0
10 | B | 777 | 1
11 | B | 784 | 0
12 | B | 791 | 0
13 | B | 900 | 1
14 | B | 904 | 0
What I want to do: if it's the first row for a fullvisitorid, mark the field as 1; otherwise use the value from the row above, but if new_session_flag = 1, use the row above plus 1. An example of the output I'm looking for is below:
Row | Fullvisitorid | Visitid | New_Session_Flag | Rank_Session_Order
1 | A | 111 | 1 | 1
2 | A | 120 | 0 | 1
3 | A | 128 | 0 | 1
4 | A | 133 | 0 | 1
5 | A | 745 | 1 | 2
6 | A | 777 | 0 | 2
7 | B | 388 | 1 | 1
8 | B | 401 | 0 | 1
9 | B | 420 | 0 | 1
10 | B | 777 | 1 | 2
11 | B | 784 | 0 | 2
12 | B | 791 | 0 | 2
13 | B | 900 | 1 | 3
14 | B | 904 | 0 | 3
As you can see:
Row 1 is 1 because it's the first time fullvisitorid A appears
Row 2 is 1 because it's not the first time fullvisitorid A appears and new_session_flag <> 1 therefore it uses the above row (i.e. 1)
Row 5 is 2 because it's not the first time fullvisitorid A appears and new_session_Flag = 1 therefore it uses the above row (i.e 1) plus 1
Row 7 is 1 because it's the first time fullvisitorid B appears
etc.
I believe this can be done through a retain statement in SAS, but is there an equivalent in Google BigQuery?
Hopefully the above makes sense, let me know if not.
Thanks in advance
Below is for BigQuery Standard SQL
#standardSQL
SELECT *,
COUNTIF(New_Session_Flag = 1) OVER(PARTITION BY Fullvisitorid ORDER BY Visitid) Rank_Session_Order
FROM `project.dataset.table`
The answer by Mikhail Berlyant using a conditional window count is correct and works. I am answering because I find that a window sum is even simpler (and possibly more efficient on a large dataset):
select
    t.*,
    sum(new_session_flag) over (partition by fullvisitorid order by visitid) as rank_session_order
from mytable t
This works because new_session_flag contains only 0s and 1s, so counting the 1s is equivalent to summing all the values.
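Either query is easy to sanity-check against inline data; a quick sketch in BigQuery Standard SQL, with a hypothetical CTE standing in for project.dataset.table (only a few of the sample rows shown):
#standardSQL
WITH sample AS (
  SELECT 'A' AS fullvisitorid, 111 AS visitid, 1 AS new_session_flag UNION ALL
  SELECT 'A', 120, 0 UNION ALL
  SELECT 'A', 745, 1 UNION ALL
  SELECT 'B', 388, 1 UNION ALL
  SELECT 'B', 401, 0
)
SELECT *,
  SUM(new_session_flag) OVER (PARTITION BY fullvisitorid ORDER BY visitid) AS rank_session_order
FROM sample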

Using a GROUP BY or PARTITION BY on multiple columns where records are grouped together if ANY of the columns return a match

Let's say I have the following table:
Record_ID | Match_criteria_1 | Match_criteria_2 | Match_criteria_3 | Dollars
1 | A | V | F | 10
2 | A | W | G | 20
3 | B | W | H | 30
4 | B | X | I | 40
5 | C | Y | F | 50
6 | C | Z | J | 60
If I try to use a 'GROUP BY' or 'OVER (PARTITION BY)' on Match_criteria_1, Match_criteria_2, and Match_criteria_3, I would end up with 6 separate groups/partitions.
SELECT *, sum(Dollars) OVER (PARTITION BY Match_criteria_1, Match_criteria_2, Match_Criteria_3) AS Total_Dollars
FROM My_table
Record_ID | Match_criteria_1 | Match_criteria_2 | Match_criteria_3 | Dollars | Total_Dollars
1 | A | V | F | 10 | 10
2 | A | W | G | 20 | 20
3 | B | W | H | 30 | 30
4 | B | X | I | 40 | 40
5 | C | Y | F | 50 | 50
6 | C | Z | J | 60 | 60
As you can see, none of the records have the same Match_criteria_1, Match_criteria_2, and Match_criteria_3.
But what if I wanted to group records that have the same Match_criteria_1, Match_criteria_2 OR Match_criteria_3?
So using my example, Record 1 matches with Record 2 due to Match_criteria_1, Record 2 matches with Record 3 due to Match_criteria_2, Record 3 matches with Record 4 due to Match_criteria_1, Record 5 matches with Record 1 due to Match_criteria_3, and Record 6 matches with Record 5 due to Match_criteria_1 (so there's sort of a transitive property thing going on). The desired result then is:
Record_ID | Match_criteria_1 | Match_criteria_2 | Match_criteria_3 | Dollars | Total_Dollars
1 | A | V | F | 10 | 210
2 | A | W | G | 20 | 210
3 | B | W | H | 30 | 210
4 | B | X | I | 40 | 210
5 | C | Y | F | 50 | 210
6 | C | Z | J | 60 | 210
where Total_Dollars is the sum of every record, because all six records match each other through transitivity. So Records 1 and 6 may have no match criteria in common, but they are still grouped together because they both match with Record 5.
Perhaps I am understanding this incorrectly, but I think you could just get the total dollars for all rows that match other rows and join to that? I'm not sure exactly what your expected output should be if a row doesn't match. This answer will still include that row, but the total dollars would not include the amount in that row.
SELECT test.*, total_dollars
FROM test,
     (SELECT sum(dollars) AS total_dollars
      FROM (SELECT DISTINCT test.*
            FROM test
            JOIN test test2
              ON (ARRAY[test.match_criteria_1, test.match_criteria_2, test.match_criteria_3]
                  && ARRAY[test2.match_criteria_1, test2.match_criteria_2, test2.match_criteria_3])
             AND test.record_id != test2.record_id
            ORDER BY 1
           ) sub
     ) sub2;
I added another row that shouldn't match any of the others:
insert into test VALUES (7, 'M', 'N', 'O', 100);
and here are the results:
record_id | match_criteria_1 | match_criteria_2 | match_criteria_3 | dollars | total_dollars
-----------+------------------+------------------+------------------+---------+---------------
1 | A | V | F | 10 | 210
2 | A | W | G | 20 | 210
3 | B | W | H | 30 | 210
4 | B | X | I | 40 | 210
5 | C | Y | F | 50 | 210
6 | C | Z | J | 60 | 210
7 | M | N | O | 100 | 210
(7 rows)
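Note that the && array-overlap operator is PostgreSQL-specific, and it also counts a value in one criteria column matching a value in a different criteria column of the other row. If only like-for-like columns should be compared (or if portability matters), the inner self-join can be written with plain OR conditions instead; a rough sketch of just that inner query:
SELECT DISTINCT test.*
FROM test
JOIN test test2
  ON (   test.match_criteria_1 = test2.match_criteria_1
      OR test.match_criteria_2 = test2.match_criteria_2
      OR test.match_criteria_3 = test2.match_criteria_3)
 AND test.record_id != test2.record_id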

SQL Increment number in select statement

I have an issue where I need to group a set of values and increase the group number when the variance between two columns is greater than or equal to 4; please see below.
UPDATE: I added a date column so you can view the order, but I need the group to update based on the variance, not the date.
+--------+-------+-------+----------+--------------+
| Date | Col 1 | Col 2 | Variance | Group Number |
+--------+-------+-------+----------+--------------+
| 1-Jun | 2 | 1 | 1 | 1 |
| 2-Jun | 1 | 1 | 0 | 1 |
| 3-Jun | 3 | 2 | 1 | 1 |
| 4-Jun | 4 | 1 | 3 | 1 |
| 5-Jun | 5 | 1 | 4 | 2 |
| 6-Jun | 1 | 1 | 0 | 2 |
| 7-Jun | 23 | 12 | 11 | 3 |
| 8-Jun | 12 | 11 | 1 | 3 |
| 9-Jun | 2 | 1 | 1 | 3 |
| 10-Jun | 13 | 4 | 9 | 4 |
| 11-Jun | 2 | 1 | 1 | 4 |
+--------+-------+-------+----------+--------------+
The group number is one more than the number of times a variance of 4 or greater appears on or before the current row. You can get this using a correlated subquery:
select t.*,
       (select 1 + count(*)
        from your_table t2
        where t2.date <= t.date and t2.variance >= 4
       ) as GroupNumber
from your_table t;
In SQL Server 2012+, you can also do this using a cumulative sum:
select t.*,
       1 + sum(case when variance >= 4 then 1 else 0 end) over (order by date) as GroupNumber
from your_table t;
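A quick way to verify the cumulative-sum version against a few of the sample rows, using a hypothetical inline table built with VALUES (the dates are given an arbitrary year, since the question omits one):
with t(dt, variance) as (
    select * from (values
        (cast('20230604' as date), 3),
        (cast('20230605' as date), 4),
        (cast('20230606' as date), 0),
        (cast('20230607' as date), 11)
    ) v(dt, variance)
)
select t.*,
       1 + sum(case when variance >= 4 then 1 else 0 end) over (order by dt) as GroupNumber
from t;
-- expected GroupNumber: 1, 2, 2, 3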

How can I increment counter when the value in another column changes?

I have the following table
ID
12
12
25
25
78
78
78
And I need to be able to increment the counter value when the ID changes.
ID COUNTER
12 1
12 1
25 2
25 2
78 3
78 3
78 3
How can this be done? Is it even possible?
You can use dense_rank():
select id,
dense_rank() over(order by id) Counter
from yourtable
See SQL Fiddle with Demo
Result:
| ID | COUNTER |
----------------
| 12 | 1 |
| 12 | 1 |
| 25 | 2 |
| 25 | 2 |
| 78 | 3 |
| 78 | 3 |
| 78 | 3 |
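For a quick check without a fiddle, the same query can be run against inline values in databases that support a VALUES list in FROM (e.g. PostgreSQL or SQL Server); a small sketch:
select id,
       dense_rank() over (order by id) as Counter
from (values (12), (12), (25), (25), (78), (78), (78)) v(id);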