I have a dataset as shown below, wondering how I can do a rolling average with its current record followed by next two records. Example: lets consider the first record whose total is 3 followed by 4 and 7 ,Now the rolling 3 day average for first record would be 4.6 and so on.
Date Total
1 3
2 4
3 7
4 1
5 2
6 4
Expected output:
Date Total 3day_rolling_Avg
1 3 4.6
2 4 4
3 7 3.3
4 1 2.3
5 2 null
6 4 null
PS: Having "null" value isn't important. This is just a sample data where I need to look at more than 3 days(Ex: 30 days rolling)
I think that the simplest approach is a window avg(), with the poper window frame:
select
t.*,
avg(total)
over(order by date rows between current row and 2 following) as "3d_rolling_avg"
from mytable t
If you want to return a null value when there is less than 2 leading rows, as show in your expected results, then you can use row_number() on top of it:
select
t.*,
case when rank() over(order by date desc) <= 2
then avg(total)
over(order by date rows between current row and 2 following)
end as "3d_rolling_avg"
from mytable t
Related
I have some data like the following in a Snowflake database
DEVICE_SERIAL
REASON_CODE
VERSION
MESSAGE_CREATED_AT
NEXT_REASON_CODE
BA1254862158
1
4
2022-06-23 02:06:03
4
BA1254862158
4
4
2022-06-23 02:07:07
1
BA1110001111
1
5
2022-06-16 16:19:04
4
BA1110001111
4
5
2022-06-16 17:43:04
1
BA1110001111
5
5
2022-06-20 14:37:45
4
BA1110001111
4
5
2022-06-20 17:31:12
1
that's the result of a previous query. I'm trying to get the difference between message_created_at timestamps where the device_serial is the same between subsequent rows, and the first row (of the pair for the difference) has reason_code of 1 or 5, and the second row of the pair has reason_code 4.
For this example, my desired output would be
DEVICE_SERIAL
VERSION
DELTA_SECONDS
BA1254862158
4
64
BA1110001111
5
5040
BA1110001111
5
10407
It's easy to calculate the time difference between every pair of rows (just lead or lag + datediff). But I'm not sure how to structure a query to select only the desired rows so that I can get a datediff between them, without calculating spurious datediffs.
My ultimate goal is to see how these datediffs change between versions. I am but a lowly C programmer, my SQL-fu is weak.
with data as (
select *,
count(case when reason_code in (1, 5) then 1 end)
over (partition by device_serial order by message_created_at) as grp
/* or alternately bracket by the end code */
-- count(case when reason_code = 4 then 1 end)
-- over (partition by device_serial order by message_created_at desc) as grp
from T
)
select device_serial, min(version) as version,
datediff(second, min(message_created_at), max(message_created_at)) as delta_seconds
from data
group by device_serial, grp
Say I have a table with two columns: the time and the value. I want to be able to get a table with :
for each time get the max values of every next n seconds.
If I want the max value of every next 3 seconds, the following table:
time
value
1
6
2
1
3
4
4
2
5
5
6
1
7
1
8
3
9
7
Should return:
time
value
max
1
6
6
2
1
4
3
4
5
4
2
5
5
5
5
6
1
3
7
1
7
8
3
NULL
9
7
NULL
Is there a way to do this directly with an sql query?
You can use the max window function:
select *,
case
when row_number() over(order by time desc) > 2 then
max(value) over(order by time rows between current row and 2 following)
end as max
from table_name;
Fiddle
The case expression checks that there are more than 2 rows after the current row to calculate the max, otherwise null is returned (for the last 2 rows ordered by time).
Similar Version to Zakaria, but this solution uses about 40% less CPU resources (scaled to 3M rows for benchmark) as the window functions both use the same exact OVER clause so SQL can better optimize the query.
Optimized Max Value of Rolling Window of 3 Rows
SELECT *,
MaxValueIn3SecondWindow = CASE
/*Check 3 rows exists to compare. If 3 rows exists, then calculate max value*/
WHEN 3 = COUNT(*) OVER (ORDER BY [Time] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
/*Returns max [Value] between the current row and the next 2 rows*/
THEN MAX(A.[Value]) OVER (ORDER BY [Time] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
END
FROM #YourTable AS A
i have a tricky task,
lets assume we have table "Racings", and there we have columns TRACK, CAR, CIRCLE_TIME
here is an example how data could be look like:
id
track
car
circle_time
10
1
10
15
9
1
10
14
8
1
10
16
7
1
10
15
6
1
10
13
5
2
10
7
4
2
10
4
3
2
10
5
2
3
10
8
1
3
10
10
what i need, i to add one more coumn like avg3_circle_time which will show me an average time from last 3 circle_time from each track, example:
id
track
car
circle_time
avg3_circle_time
10
1
10
15
15
9
1
10
14
15
8
1
10
16
14.6
7
1
10
15
null
6
1
10
13
null
5
2
10
7
5.3
4
2
10
4
null
3
2
10
5
null
2
3
10
8
null
1
3
10
10
null
I know how it could works in oracle, you could use something like rowid, but in case of postgresql i don't know, i have a draft like .....avg(circle_time) OVER(PARTITION BY track,car.....) as avg3_circle_time..... help me to solve that task please
You can use window functions to calculate moving averages:
SELECT track, id, car, circle_time, AVG(circle_time) OVER (
PARTITION BY track
ORDER BY id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
FROM t
ORDER BY track, id
Depending on your definition of previous three, the window could be ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.
If you want only values when at least 3 circles available
select *
, case when lag(id, 2) over(partition by TRACK, CAR order by id) is not null then
avg(CIRCLE_TIME) over(partition by TRACK, CAR order by id rows between 2 preceding and current row) end a
from Racing
order by id desc;
db<>fiddle
Output
id track car circle_time a
10 1 10 15 15.0000000000000000
9 1 10 14 15.0000000000000000
8 1 10 16 14.6666666666666667
7 1 10 15 null
6 1 10 13 null
5 2 10 7 5.3333333333333333
4 2 10 4 null
3 2 10 5 null
2 3 10 8 null
1 3 10 10 null
Use LAED() then checking one of the next 2 rows is NULL or not. THEN sum of three values for calculating average.
-- PostgreSQL
SELECT *
, CASE WHEN next_circle_time IS NULL OR next_next_circle_time IS NULL
THEN NULL
ELSE ((t.circle_time + COALESCE(next_circle_time, 0) + COALESCE(next_next_circle_time, 0)) / 3 :: DECIMAL) :: DECIMAL(10, 1)
END avg_circle_time
FROM (SELECT *
, LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) next_circle_time
, LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) next_next_circle_time
FROM Racings) t
Another way Use AVG()
SELECT *
, CASE WHEN LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
OR LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
THEN NULL
ELSE AVG(circle_time) OVER (PARTITION BY track ORDER BY id DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
END :: DECIMAL(10, 2) avg_circle_time
FROM Racings
Please check from url where both query exists https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f0cd868623725a1b92bf988cfb2deba3
Several of the posted answers end up repeating the window definition. You can avoid this with the window clause:
select *,
case when row_number() over(track_window) > 2
then trunc(avg(CIRCLE_TIME) over(track_window rows 2 preceding), 1)
end a
from Racing
window track_window as (partition by track order by id)
order by id desc
Note how, in this sample, track_window is defined once, then reused for both row_number and avg. In the latter case, the window clause is embellished with a frame as well (rows 2 preceding).
I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum . . . with lag():
select t.*,
sum(case when prev_id = id and num_of_days % 5 <> 0
then 0 else 1
end) over (order by id, num_of_days)
from (select t.*,
lag(id) over (order by id, num_of_days) as prev_id
from t
) t;
Here is a db<>fiddle.
If you have a different ordering column, then just use that in the order by clauses.
The first six balls mean first over, next six balls mean second over & so on than how to get average runs for each over.
input as
Ball no Runs
1 4
2 6
3 3
4 2
5 6
6 1
1 2
2 4
3 6
4 3
5 1
6 1
1 2
output should be:
Over no avg runs
1 3.66
2 2.83
As Gordon Linoff suggested, SQL table represents unordered sets, So you have to use an ordered column in your table. If you can use such a column you may use below query -
SELECT Over_no AVG(Runs) avg_runs
FROM (SELECT Ball_no, Runs, CEIL(ROW_NUMBER() OVER(ORDER BY ORDER_COLUMN, Ball_no) RN / 6) Over_no
FROM YOUR_TABLE)
GROUP BY Over_no;
I have managed to solve my problem with the following query:
SELECT ROWNUM OVER_NO, AVG_RUNS
FROM(
SELECT ROWNUM RN,
ROUND(AVG(RUNS)OVER(ORDER BY ROWNUM RANGE BETWEEN CURRENT ROW AND 5 FOLLOWING),2) AVG_RUNS
FROM TABLE_NAME
)
WHERE RN=1 OR RN=7;