Snowflake SQL: trying to calculate time difference between subsets of subsequent rows

I have some data like the following in a Snowflake database:
DEVICE_SERIAL  REASON_CODE  VERSION  MESSAGE_CREATED_AT   NEXT_REASON_CODE
BA1254862158   1            4        2022-06-23 02:06:03  4
BA1254862158   4            4        2022-06-23 02:07:07  1
BA1110001111   1            5        2022-06-16 16:19:04  4
BA1110001111   4            5        2022-06-16 17:43:04  1
BA1110001111   5            5        2022-06-20 14:37:45  4
BA1110001111   4            5        2022-06-20 17:31:12  1
That's the result of a previous query. I'm trying to get the difference between message_created_at timestamps for pairs of consecutive rows that share a device_serial, where the first row of the pair has reason_code 1 or 5 and the second row has reason_code 4.
For this example, my desired output would be:
DEVICE_SERIAL  VERSION  DELTA_SECONDS
BA1254862158   4        64
BA1110001111   5        5040
BA1110001111   5        10407
It's easy to calculate the time difference between every pair of rows (just lead or lag + datediff). But I'm not sure how to structure a query to select only the desired rows so that I can get a datediff between them, without calculating spurious datediffs.
My ultimate goal is to see how these datediffs change between versions. I am but a lowly C programmer, my SQL-fu is weak.

with data as (
    select *,
        count(case when reason_code in (1, 5) then 1 end)
            over (partition by device_serial order by message_created_at) as grp
        /* or alternately bracket by the end code */
        -- count(case when reason_code = 4 then 1 end)
        --     over (partition by device_serial order by message_created_at desc) as grp
    from T
)
select device_serial, min(version) as version,
    datediff(second, min(message_created_at), max(message_created_at)) as delta_seconds
from data
group by device_serial, grp
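
To see what the grp column does, here is a minimal sketch that runs the same grouping against the sample rows from the question. It is an assumption-laden port, not Snowflake: it uses Python's sqlite3 module, and since SQLite has no datediff, the difference of strftime('%s', ...) values stands in for datediff(second, ...). The table name T comes from the answer; the function name delta_seconds is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (device_serial, reason_code, version, message_created_at)
ROWS = [
    ("BA1254862158", 1, 4, "2022-06-23 02:06:03"),
    ("BA1254862158", 4, 4, "2022-06-23 02:07:07"),
    ("BA1110001111", 1, 5, "2022-06-16 16:19:04"),
    ("BA1110001111", 4, 5, "2022-06-16 17:43:04"),
    ("BA1110001111", 5, 5, "2022-06-20 14:37:45"),
    ("BA1110001111", 4, 5, "2022-06-20 17:31:12"),
]

QUERY = """
with data as (
    select *,
           -- running count of "start" codes: each 1 or 5 opens a new group
           count(case when reason_code in (1, 5) then 1 end)
               over (partition by device_serial order by message_created_at) as grp
    from T
)
select device_serial,
       min(version) as version,
       -- SQLite stand-in for Snowflake's datediff(second, min, max)
       cast(strftime('%s', max(message_created_at)) as integer)
         - cast(strftime('%s', min(message_created_at)) as integer) as delta_seconds
from data
group by device_serial, grp
order by device_serial, min(message_created_at)
"""


def delta_seconds():
    """Load the sample rows into an in-memory table and run the grouping query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table T (device_serial text, reason_code integer, "
        "version integer, message_created_at text)"
    )
    con.executemany("insert into T values (?, ?, ?, ?)", ROWS)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in delta_seconds():
        print(row)
```

Each row with reason_code 1 or 5 bumps the running count, so the following reason_code 4 row lands in the same group and min/max of the group are exactly the pair of timestamps to subtract.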

Related

SQL Get max value of n next rows

Say I have a table with two columns: the time and the value. For each time, I want to get the max value over the next n seconds, starting at that time.
If I want the max value over every next 3 seconds, the following table:
time value
1 6
2 1
3 4
4 2
5 5
6 1
7 1
8 3
9 7
Should return:
time value max
1 6 6
2 1 4
3 4 5
4 2 5
5 5 5
6 1 3
7 1 7
8 3 NULL
9 7 NULL
Is there a way to do this directly with an sql query?
You can use the max window function:
select *,
    case
        when row_number() over(order by time desc) > 2 then
            max(value) over(order by time rows between current row and 2 following)
    end as max
from table_name;
The case expression checks that there are more than 2 rows after the current row to calculate the max, otherwise null is returned (for the last 2 rows ordered by time).
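
As a quick check of that logic, here is a small sketch that runs the query against the sample table using Python's sqlite3 module (an assumption; the question does not name a database). The alias max_next3 and the function name max_of_next_rows are made up for illustration.

```python
import sqlite3

# (time, value) rows from the question
VALUES = [(1, 6), (2, 1), (3, 4), (4, 2), (5, 5), (6, 1), (7, 1), (8, 3), (9, 7)]

QUERY = """
select time, value,
       case
           -- counting from the end, rows 1 and 2 have fewer than 2 rows after them
           when row_number() over (order by time desc) > 2
           then max(value) over (order by time rows between current row and 2 following)
       end as max_next3
from table_name
order by time
"""


def max_of_next_rows():
    """Return (time, value, max over the current row and the next 2 rows)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table table_name (time integer, value integer)")
    con.executemany("insert into table_name values (?, ?)", VALUES)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in max_of_next_rows():
        print(row)
```

Note that the frame counts rows, not seconds, so this only matches "next n seconds" when there is exactly one row per second, as in the sample.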
This is similar to Zakaria's version, but it uses about 40% less CPU (benchmarked at 3M rows) because both window functions use the exact same OVER clause, which lets the optimizer do a better job.
Optimized Max Value of Rolling Window of 3 Rows
SELECT *,
    MaxValueIn3SecondWindow = CASE
        /*Check that 3 rows exist to compare. If they do, calculate the max value*/
        WHEN 3 = COUNT(*) OVER (ORDER BY [Time] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
        /*Returns the max [Value] between the current row and the next 2 rows*/
        THEN MAX(A.[Value]) OVER (ORDER BY [Time] ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
    END
FROM #YourTable AS A
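
The shared-frame idea can be sketched outside T-SQL as well. The version below is an assumption-based port to SQLite (via Python's sqlite3), where a named WINDOW clause makes it explicit that COUNT and MAX run over the same frame. The table name your_table, the alias max_next3, and the function name are hypothetical; it says nothing about the CPU figures quoted above.

```python
import sqlite3

# (time, value) rows from the question
VALUES = [(1, 6), (2, 1), (3, 4), (4, 2), (5, 5), (6, 1), (7, 1), (8, 3), (9, 7)]

QUERY = """
select time, value,
       -- only report a max when the frame really holds 3 rows
       case when count(*) over w = 3 then max(value) over w end as max_next3
from your_table
window w as (order by time rows between current row and 2 following)
order by time
"""


def max_with_shared_window():
    """Return (time, value, max of a full 3-row frame or None)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table your_table (time integer, value integer)")
    con.executemany("insert into your_table values (?, ?)", VALUES)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in max_with_shared_window():
        print(row)
```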

Resetting a Count in SQL

I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum . . . with lag():
select t.*,
       sum(case when prev_id = id and num_of_days % 5 <> 0
                then 0 else 1
           end) over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t
     ) t;
If you have a different ordering column, then just use that in the order by clauses.
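
Here is a minimal sketch of that cumulative sum run against the sample data, assuming SQLite through Python's sqlite3 module (the question does not name a database). The function name reset_count is made up for illustration.

```python
import sqlite3

# (id, num_of_days) rows from the question
DATA = [(1, 0), (2, 0), (2, 8), (2, 9), (2, 10), (2, 15), (3, 10), (3, 20)]

QUERY = """
select id, num_of_days,
       -- add 1 when the id changes or num_of_days is divisible by 5, else add 0
       sum(case when prev_id = id and num_of_days % 5 <> 0 then 0 else 1 end)
           over (order by id, num_of_days) as row_num
from (select t.*,
             lag(id) over (order by id, num_of_days) as prev_id
      from t) t
order by id, num_of_days
"""


def reset_count():
    """Return (id, num_of_days, row_num) for the sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("create table t (id integer, num_of_days integer)")
    con.executemany("insert into t values (?, ?)", DATA)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in reset_count():
        print(row)
```

On the very first row prev_id is NULL, so the comparison is not true and the else branch contributes the initial 1.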

SQLite - Rolling Average/Sum

I have a dataset as shown below and am wondering how I can compute a rolling average over the current record and the next two records. Example: consider the first record, whose total is 3, followed by 4 and 7. The rolling 3-day average for the first record would be 4.6, and so on.
Date Total
1 3
2 4
3 7
4 1
5 2
6 4
Expected output:
Date Total 3day_rolling_Avg
1 3 4.6
2 4 4
3 7 3.3
4 1 2.3
5 2 null
6 4 null
PS: Having the "null" values isn't important. This is just sample data; I need to look at more than 3 days (e.g. a 30-day rolling average).
I think that the simplest approach is a window avg(), with the proper window frame:
select
    t.*,
    avg(total)
        over(order by date rows between current row and 2 following) as "3d_rolling_avg"
from mytable t
If you want to return a null value when fewer than 2 rows follow the current one, as shown in your expected results, then you can use row_number() on top of it:
select
    t.*,
    case when row_number() over(order by date desc) > 2
        then avg(total)
            over(order by date rows between current row and 2 following)
    end as "3d_rolling_avg"
from mytable t
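
A minimal sketch of the null-guarded rolling average, run against the sample data with Python's sqlite3 module. It keeps the average only for rows that have two rows after them, i.e. where row_number() counted from the end is greater than 2. The function name rolling_average is made up for illustration.

```python
import sqlite3

# (date, total) rows from the question
DATA = [(1, 3), (2, 4), (3, 7), (4, 1), (5, 2), (6, 4)]

QUERY = """
select date, total,
       -- the last 2 rows (numbered 1 and 2 from the end) get NULL
       case when row_number() over (order by date desc) > 2
            then avg(total) over (order by date rows between current row and 2 following)
       end as "3d_rolling_avg"
from mytable
order by date
"""


def rolling_average():
    """Return (date, total, 3-row forward average or None)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table mytable (date integer, total integer)")
    con.executemany("insert into mytable values (?, ?)", DATA)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in rolling_average():
        print(row)
```

For a 30-day window the same shape should apply with "29 following" and "> 29" in place of the two 2s.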

How to get average runs for each over in SQL?

The first six balls make up the first over, the next six balls the second over, and so on. How can I get the average runs for each over?
Input:
Ball no Runs
1 4
2 6
3 3
4 2
5 6
6 1
1 2
2 4
3 6
4 3
5 1
6 1
1 2
output should be:
Over no avg runs
1 3.66
2 2.83
As Gordon Linoff suggested, SQL tables represent unordered sets, so you need a column in your table that defines the order. If you have such a column, you can use the query below:
SELECT Over_no, AVG(Runs) avg_runs
FROM (SELECT Ball_no, Runs,
             CEIL(ROW_NUMBER() OVER (ORDER BY ORDER_COLUMN, Ball_no) / 6) Over_no
      FROM YOUR_TABLE)
GROUP BY Over_no;
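
To illustrate the bucketing, here is a rough sketch using SQLite through Python's sqlite3 module instead of Oracle. The table name deliveries and the ordering column seq are hypothetical (the question has no ordering column), and integer division (row_number - 1) / 6 + 1 replaces CEIL, since SQLite divides integers without a remainder.

```python
import sqlite3

# Runs per ball in delivery order, from the question (13 balls: 2 full overs + 1 ball)
RUNS = [4, 6, 3, 2, 6, 1, 2, 4, 6, 3, 1, 1, 2]

QUERY = """
select over_no, round(avg(runs), 2) as avg_runs
from (select runs,
             -- balls 1-6 -> over 1, balls 7-12 -> over 2, ...
             (row_number() over (order by seq) - 1) / 6 + 1 as over_no
      from deliveries)
group by over_no
order by over_no
"""


def average_runs_per_over():
    """Return (over_no, average runs rounded to 2 decimals)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table deliveries (seq integer, runs integer)")
    # seq is a hypothetical column recording the order in which balls were bowled
    con.executemany(
        "insert into deliveries values (?, ?)",
        list(enumerate(RUNS, start=1)),
    )
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in average_runs_per_over():
        print(row)
```

The incomplete third over shows up as its own group; filter it out with a having count(*) = 6 clause if only full overs are wanted.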
I have managed to solve my problem with the following query:
SELECT ROWNUM OVER_NO, AVG_RUNS
FROM(
SELECT ROWNUM RN,
ROUND(AVG(RUNS)OVER(ORDER BY ROWNUM RANGE BETWEEN CURRENT ROW AND 5 FOLLOWING),2) AVG_RUNS
FROM TABLE_NAME
)
WHERE RN=1 OR RN=7;

How to find the SQL medians for a grouping

I am working with SQL Server 2008
If I have a Table as such:
Code Value
-----------------------
4 240
4 299
4 210
2 NULL
2 3
6 30
6 80
6 10
4 240
2 30
How can I find the median AND group by the Code column please?
To get a resultset like this:
Code Median
-----------------------
4 240
2 16.5
6 30
I really like this solution for median, but unfortunately it doesn't include Group By:
https://stackoverflow.com/a/2026609/106227
The solution using rank works nicely when each group has an odd number of members, i.e. when the median exists within the sample. With an even number of members the rank method falls down, e.g.
1
2
3
4
The median here is 2.5 (i.e. half the group is smaller, and half the group is larger) but the rank method will return 3. To get around this you essentially need to take the top value from the bottom half of the group, and the bottom value of the top half of the group, and take an average of the two values.
WITH CTE AS
( SELECT Code,
Value,
[half1] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value),
[half2] = NTILE(2) OVER(PARTITION BY Code ORDER BY Value DESC)
FROM T
WHERE Value IS NOT NULL
)
SELECT Code,
(MAX(CASE WHEN Half1 = 1 THEN Value END) +
MIN(CASE WHEN Half2 = 1 THEN Value END)) / 2.0
FROM CTE
GROUP BY Code;
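
As a sanity check, the NTILE approach can be run against the table from the question. This is a sketch using SQLite through Python's sqlite3 module rather than SQL Server, so the [alias] = expression syntax is replaced with plain "as" aliases; the function name ntile_median is made up for illustration.

```python
import sqlite3

# (Code, Value) rows from the question, including the NULL
DATA = [
    (4, 240), (4, 299), (4, 210), (2, None), (2, 3),
    (6, 30), (6, 80), (6, 10), (4, 240), (2, 30),
]

QUERY = """
with cte as (
    select code, value,
           ntile(2) over (partition by code order by value)      as half1,
           ntile(2) over (partition by code order by value desc) as half2
    from T
    where value is not null
)
select code,
       -- top of the lower half plus bottom of the upper half, averaged
       (max(case when half1 = 1 then value end)
        + min(case when half2 = 1 then value end)) / 2.0 as median
from cte
group by code
order by code
"""


def ntile_median():
    """Return (code, median) per group."""
    con = sqlite3.connect(":memory:")
    con.execute("create table T (code integer, value integer)")
    con.executemany("insert into T values (?, ?)", DATA)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in ntile_median():
        print(row)
```

For odd-sized groups NTILE(2) puts the extra row in tile 1 in both directions, so both expressions pick the same middle value and the average is that value.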
In SQL Server 2012 and later you can use PERCENTILE_CONT:
SELECT DISTINCT
Code,
Median = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Value) OVER(PARTITION BY Code)
FROM T;
SQL Server 2008 does not have a built-in function to calculate medians, but you could use the ROW_NUMBER function like this:
WITH RankedTable AS (
SELECT Code, Value,
ROW_NUMBER() OVER (PARTITION BY Code ORDER BY VALUE) AS Rnk,
COUNT(*) OVER (PARTITION BY Code) AS Cnt
FROM MyTable
)
SELECT Code, Value
FROM RankedTable
WHERE Rnk = Cnt / 2 + 1
To elaborate a bit on this solution, consider the output of the RankedTable CTE:
Code Value Rnk Cnt
---------------------------
4 240 2 3 -- Median
4 299 3 3
4 210 1 3
2 NULL 1 2
2 3 2 2 -- Median
6 30 2 3 -- Median
6 80 3 3
6 10 1 3
Now from this result set, if you only return those rows where Rnk equals Cnt / 2 + 1 (integer division), you get only the rows with the median value for each group.
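
A short sketch of the Rnk = Cnt / 2 + 1 filter, run with Python's sqlite3 module against the ten rows from the question. One assumption goes beyond the answer's query: NULL values are filtered out before ranking. The function name row_number_median is made up for illustration.

```python
import sqlite3

# (Code, Value) rows from the question, including the NULL
DATA = [
    (4, 240), (4, 299), (4, 210), (2, None), (2, 3),
    (6, 30), (6, 80), (6, 10), (4, 240), (2, 30),
]

QUERY = """
with RankedTable as (
    select code, value,
           row_number() over (partition by code order by value) as rnk,
           count(*) over (partition by code) as cnt
    from MyTable
    where value is not null  -- assumption: NULLs excluded, unlike the answer's query
)
select code, value
from RankedTable
where rnk = cnt / 2 + 1      -- integer division picks the (upper) middle row
order by code
"""


def row_number_median():
    """Return (code, middle value) per group."""
    con = sqlite3.connect(":memory:")
    con.execute("create table MyTable (code integer, value integer)")
    con.executemany("insert into MyTable values (?, ?)", DATA)
    return con.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in row_number_median():
        print(row)
```

With this data, Code 2 has two non-NULL values (3 and 30), and the filter returns 30 rather than 16.5: for even-sized groups this method yields the upper of the two middle values, not their average.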