Comparing time difference for every other row - sql

I'm trying to determine the number of days between AR_Event_Creation_Date_Time values for every other row: for example, between the 1st and 2nd rows, the 3rd and 4th, the 5th and 6th, and so on. In other words, every even row should have a number-of-days value and every odd row should be NULL. My code below works if there are only two rows per borrower number but breaks down when there are more than two. In the results, notice the change in 1002092539.
SELECT Borrower_Number,
Workgroup_Name,
FORMAT(AR_Event_Creation_Date_Time,'d','en-us') AS Tag_Date,
Usr_Usrnm,
DATEDIFF(day, LAG(AR_Event_Creation_Date_Time,1) OVER(PARTITION BY
Borrower_Number Order By Borrower_Number), AR_Event_Creation_Date_Time) Diff
FROM Control_Mail

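You need to add in a row number. Also, the ORDER BY in your window is non-deterministic: ordering by Borrower_Number inside a partition on Borrower_Number leaves the row order undefined, so order by the date instead: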
SELECT Borrower_Number,
Workgroup_Name,
FORMAT(AR_Event_Creation_Date_Time,'d','en-us') AS Tag_Date,
Usr_Usrnm,
DATEDIFF(day, LAG(AR_Event_Creation_Date_Time,1) OVER(PARTITION BY Borrower_Number, (rn - 1) / 2 ORDER BY AR_Event_Creation_Date_Time),
AR_Event_Creation_Date_Time) Diff
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Borrower_Number ORDER BY AR_Event_Creation_Date_Time) AS rn
FROM Control_Mail
) C
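To see the pairing behaviour without a SQL Server instance, here is a minimal sketch that runs the same idea in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later). The sample rows are invented for illustration, only the two relevant columns are kept, and julianday() subtraction stands in for DATEDIFF(day, ...), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Control_Mail ("
    "Borrower_Number INTEGER, AR_Event_Creation_Date_Time TEXT)"
)
# Hypothetical sample data: borrower 1 has four rows, borrower 2 has two.
conn.executemany(
    "INSERT INTO Control_Mail VALUES (?, ?)",
    [
        (1, "2019-01-01"), (1, "2019-01-04"),
        (1, "2019-02-01"), (1, "2019-02-11"),
        (2, "2019-03-01"), (2, "2019-03-03"),
    ],
)

pair_query = """
SELECT Borrower_Number,
       AR_Event_Creation_Date_Time,
       -- (rn - 1) / 2 is integer division, so rows 1-2, 3-4, ... share a partition
       CAST(julianday(AR_Event_Creation_Date_Time)
            - julianday(LAG(AR_Event_Creation_Date_Time, 1) OVER (
                  PARTITION BY Borrower_Number, (rn - 1) / 2
                  ORDER BY AR_Event_Creation_Date_Time)) AS INTEGER) AS Diff
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Borrower_Number
                              ORDER BY AR_Event_Creation_Date_Time) AS rn
    FROM Control_Mail
) C
ORDER BY Borrower_Number, AR_Event_Creation_Date_Time
"""
pair_diffs = [row[2] for row in conn.execute(pair_query)]
print(pair_diffs)  # [None, 3, None, 10, None, 2]
```

Every odd row within a borrower is the first row of its two-row partition, so LAG has nothing to look back at and the difference is NULL.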

Related

SQL - Update column based on date comparison with previous occurrence

I have a huge table;
I want to create a third column based on the time difference between two dates for the same id. If the difference is less than a month, it's active; if it is between 1 and 2 months, it's inactive; and anything more than 2 months is dormant. The expected outcome is below (note that the last entries don't have activity definitions, as I don't have previous occurrences for them).
My question is how to do such an operation.
case
    when date_ >= date_add((select max(date_) from schema.table), -30) then 'Active'
    when date_ < date_add((select max(date_) from schema.table), -30)
         and date_ >= date_add((select max(date_) from schema.table), -60) then 'Inactive'
    when date_ < date_add((select max(date_) from schema.table), -60) then 'Dormant3'
end as Activity
The code I came up with is not what I need, as it only checks against the final entry date in the table. What I need is more akin to a for loop that checks each row and compares it to the previous occurrence.
edit:
By partitioning over id and dense ranking them, I reached something that almost works. I just need to compare to the previous element in the dense rank groups.
Create the base data first using LAG(), which fetches the previous date for the same id.
Then compare that with the original row.
SELECT ID, DATE,
       CASE
           WHEN PREVIOUS_DATE IS NULL THEN NULL  -- no previous occurrence for this id
           WHEN DATEDIFF(DATE, PREVIOUS_DATE) <= 30 THEN 'Active'
           WHEN DATEDIFF(DATE, PREVIOUS_DATE) BETWEEN 31 AND 60 THEN 'Inactive'
           ELSE 'Dormant'
       END AS Activity
FROM (SELECT ID, DATE, LAG(DATE) OVER (PARTITION BY ID ORDER BY DATE) AS PREVIOUS_DATE FROM MYTABLE) RS
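As a rough check of the previous-row comparison, the sketch below runs it in SQLite through Python's sqlite3 module. The table name, column names and sample dates are assumptions made for the example, LAG() supplies the previous date per id, and julianday() subtraction replaces DATEDIFF, which SQLite lacks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, date_ TEXT)")
# Hypothetical sample data; id 2 has a single row and so no previous occurrence.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        (1, "2020-01-01"), (1, "2020-01-15"),
        (1, "2020-03-01"), (1, "2020-06-01"),
        (2, "2020-02-01"),
    ],
)

activity_query = """
SELECT id, date_,
       CASE
           WHEN previous_date IS NULL THEN NULL
           WHEN julianday(date_) - julianday(previous_date) <= 30 THEN 'Active'
           WHEN julianday(date_) - julianday(previous_date) <= 60 THEN 'Inactive'
           ELSE 'Dormant'
       END AS activity
FROM (
    SELECT id, date_,
           LAG(date_) OVER (PARTITION BY id ORDER BY date_) AS previous_date
    FROM mytable
) rs
ORDER BY id, date_
"""
activities = [row[2] for row in conn.execute(activity_query)]
print(activities)  # [None, 'Active', 'Inactive', 'Dormant', None]
```

The gaps for id 1 are 14, 46 and 92 days, which land in the three buckets in turn; the first row of each id has no previous date and stays NULL.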

How to compare the value of one row with the upper row in one column of an ordered table?

I have a table in PostgreSQL that contains the GPS points from cell phones. It has an integer column that stores epoch (the number of seconds from 1960). I want to order the table by time (the epoch column), then split the trips into sub-trips wherever there is no GPS record for more than 2 minutes.
I did it with GeoPandas. However, it is too slow, so I want to do it inside PostgreSQL. How can I compare each row of the ordered table with the previous row (to see whether the epoch differs by 2 minutes or more)?
In short, I do not know how to compare each row with the row above it.
You can use lag():
select t.*
from (select t.*,
lag(timestamp_epoch) over (partition by trip order by timestamp_epoch) as last_timestamp_epoch
from t
) t
where last_timestamp_epoch < timestamp_epoch - 120
"I want to order the table based on time (epoch column), then, break the trips to sub trips when there is no GPS record for more than 2 minutes."
After comparing to the previous (or next) row, with the window function lag() (or lead()), form groups based on the gaps to get sub trip numbers:
SELECT *, count(*) FILTER (WHERE step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
FROM (
SELECT *
, (timestamp_epoch - lag(timestamp_epoch) OVER (PARTITION BY trip ORDER BY timestamp_epoch)) > 120 AS step
FROM tbl
) sub;
Further reading:
Select longest continuous sequence
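The gap-then-running-count idea can be tried out with the sketch below, which uses SQLite through Python's sqlite3 module rather than PostgreSQL. The sample epochs are invented, and because the sketch avoids the boolean column and the FILTER clause, the step flag is written as a 0/1 CASE expression and summed instead of counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (trip INTEGER, timestamp_epoch INTEGER)")
# Hypothetical GPS timestamps in seconds; gaps above 120 s start a new sub-trip.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [
        (1, 0), (1, 60), (1, 100), (1, 400), (1, 450), (1, 1000),
        (2, 0), (2, 121),
    ],
)

sub_trip_query = """
SELECT trip, timestamp_epoch,
       -- running total of gap flags = sub-trip number within the trip
       SUM(step) OVER (PARTITION BY trip ORDER BY timestamp_epoch) AS sub_trip
FROM (
    SELECT *,
           CASE WHEN timestamp_epoch
                     - LAG(timestamp_epoch) OVER (PARTITION BY trip
                                                  ORDER BY timestamp_epoch) > 120
                THEN 1 ELSE 0 END AS step
    FROM tbl
) sub
ORDER BY trip, timestamp_epoch
"""
sub_trips = [row[2] for row in conn.execute(sub_trip_query)]
print(sub_trips)  # [0, 0, 0, 1, 1, 2, 0, 1]
```

The first row of each trip has no predecessor, so its comparison is NULL and the flag falls through to 0; numbering therefore starts at 0 within each trip.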

SQL Server need the total of the previous 6 rows

I'm using SQL Server and I need to get the sum of the previous 6 rows of my table and place the results in its own column.
I'm able to get the 6th row back with the below query:
SELECT id
,FileSize
,LAG(FileSize,6) OVER (ORDER BY DAY(CompleteTime)) previous
FROM Jobs_analytics
group by id, CompleteTime, Jobs_analytics.FileSize
which gives me the sixth row back, but what I need is the sum of all six previous rows.
Any help would be appreciated.
Mike
You can use:
SELECT ja.id, ja.FileSize, CompleteTime,
       SUM(FileSize) OVER (ORDER BY CompleteTime ROWS BETWEEN 5 PRECEDING AND CURRENT ROW) as previous
FROM Jobs_analytics ja;
I don't see why GROUP BY is necessary. There are no aggregation functions.
Note that this takes six rows including the current row. If you want the six preceding rows:
SELECT ja.id, ja.FileSize, CompleteTime,
       SUM(FileSize) OVER (ORDER BY CompleteTime, ja.id ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) as previous
FROM Jobs_analytics ja;
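The effect of the ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING frame is easy to check on a small table. The sketch below uses SQLite through Python's sqlite3 module as a stand-in for SQL Server, with eight invented rows whose FileSize values are 10, 20, ..., 80.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Jobs_analytics (id INTEGER, FileSize INTEGER, CompleteTime TEXT)"
)
# Hypothetical sample data: one job per day, sizes 10, 20, ..., 80.
conn.executemany(
    "INSERT INTO Jobs_analytics VALUES (?, ?, ?)",
    [(i, i * 10, f"2020-01-{i:02d}") for i in range(1, 9)],
)

previous_six_query = """
SELECT id, FileSize,
       -- frame covers up to six rows before the current row, excluding it
       SUM(FileSize) OVER (ORDER BY CompleteTime, id
                           ROWS BETWEEN 6 PRECEDING AND 1 PRECEDING) AS previous
FROM Jobs_analytics
ORDER BY CompleteTime, id
"""
previous_sums = [row[2] for row in conn.execute(previous_six_query)]
print(previous_sums)  # [None, 10, 30, 60, 100, 150, 210, 270]
```

The first row has an empty frame, so its sum is NULL; until six earlier rows exist, the sum covers however many there are.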

SQL Server Determine the Amount of Time Above a Threshold

I am trying to determine the amount of time my data spends above a certain threshold. I have a SQL table of values that looks like this:
The first column is a datetime and the second column is the value. This is time series data, so it is a large table and cannot be changed. I want to know the first value that crosses over the threshold (say it is 50 for the example), which is my beginning; the last value that crosses back over the threshold, which is the end; and the duration spent over the threshold.
In my data example the Beginning would be 9/20/2019 19:18, the end would be 9/20/2019 19:46 and the duration would be 28 minutes.
This needs to be written in one sql statement due to the requirements of the project. I am just wondering if this is possible and how to do it. Thanks!
You can use lead() and some aggregation:
select t.*
from (select t.*,
             datediff(minute,
                      ts, lead(ts) over (order by ts)
                     ) as diff_minutes
      from (select t.*,
                   lead(value) over (order by ts) as next_value
            from t
           ) t
      where (value < 50 and next_value >= 50) or
            (value >= 50 and next_value < 50)
     ) t
where value < 50;
Your question is a little tricky because you want the time span to start just before the period in question. That is actually a simplification. The above implements:
1. Identify the next value.
2. Keep a row when the threshold is crossed between it and the next row, in either direction. These are the row just before the period and the last row at or above the threshold.
3. Then use lead() to get the ending timestamp.
4. Finally, filter down to just the first row of each pair.
Another approach is perhaps simpler. Define the groups based on the count of rows that are under the threshold up to and including the current row. This keeps the last below-threshold row together with the above-threshold rows that follow it.
Then aggregate:
select min(ts), max(ts),
datediff(minute, min(ts), max(ts)) as diff_minute
from (select t.*,
sum(case when value < 50 then 1 else 0 end) over (order by ts) as grp
from t
) t
group by grp;
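To make the grouping approach concrete, here is a sketch run in SQLite through Python's sqlite3 module instead of SQL Server. The table name, the one-minute sample readings and the threshold of 50 are assumptions for the example, and the minute difference is computed from strftime('%s', ...) because SQLite has no DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ts TEXT, value INTEGER)")
# Hypothetical readings, one per minute; values rise above 50 at 10:02-10:04.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2020-01-01 10:00:00", 40), ("2020-01-01 10:01:00", 45),
        ("2020-01-01 10:02:00", 55), ("2020-01-01 10:03:00", 60),
        ("2020-01-01 10:04:00", 52), ("2020-01-01 10:05:00", 30),
        ("2020-01-01 10:06:00", 20),
    ],
)

threshold_query = """
SELECT MIN(ts), MAX(ts),
       (CAST(strftime('%s', MAX(ts)) AS INTEGER)
        - CAST(strftime('%s', MIN(ts)) AS INTEGER)) / 60 AS diff_minute
FROM (
    SELECT t.*,
           -- running count of below-threshold rows; each such row opens a group
           SUM(CASE WHEN value < 50 THEN 1 ELSE 0 END) OVER (ORDER BY ts) AS grp
    FROM t
) t
GROUP BY grp
ORDER BY grp
"""
threshold_groups = conn.execute(threshold_query).fetchall()
for group in threshold_groups:
    print(group)
```

On this data the only group with a non-zero duration runs from 10:01 (the last reading below 50) to 10:04 (the last reading above it), three minutes in total. Below-threshold rows that are not followed by a rise form single-row groups with a duration of 0, which can be filtered out if they are not wanted.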
It looks like you are sampling every 10 seconds. If that is pretty solid, you can just count how many records are above 50 during a selected interval, and multiply by 10 seconds, that will be the duration that exceeds 50.

Group records for hourly count

My goal is to build an hourly count for records that have a start date/time and an end date/time. The actual records are never more than 24 hours from start to finish but many times are less. It works if I bounce every record against my "clock" which has 24 slots for every date up to "today". But it can take forever to run as there can be 2000 records in a day.
This is the detail I get:
The date/times in green are what I want as the start date/time for a group. The blue date/times are what I want as the end date time for the group.
Like this:
I have tried partitioning but because, in the second pic, the 4th row has the same values as the 2nd row, it groups them together even though there is a time span between them - the third row.
This is a gaps-and-islands problem. The start and end dates match on adjacent rows, so a difference of row numbers seems sufficient:
select id, min(startdatetime), max(enddatetime),
       d_id, class, location
from (select t.*,
             row_number() over (partition by id order by startdatetime) as seqnum,
             row_number() over (partition by id, d_id, class, location order by startdatetime) as seqnum_2
      from t
     ) t
group by id, d_id, class, location, (seqnum - seqnum_2)
order by id, min(startdatetime);
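The difference-of-row-numbers trick can be checked on a tiny example. The sketch below runs it in SQLite through Python's sqlite3 module; the sample rows are invented, and only d_id is kept as the grouping attribute (class and location are left out for brevity). Both ROW_NUMBER() calls are ordered by startdatetime so that the difference stays constant within a run of adjacent rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, startdatetime TEXT, enddatetime TEXT, d_id TEXT)"
)
# Hypothetical rows: d_id 'A' appears twice, then 'B', then 'A' again.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        (1, "08:00", "09:00", "A"),
        (1, "09:00", "10:00", "A"),
        (1, "10:00", "11:00", "B"),
        (1, "11:00", "12:00", "A"),
    ],
)

islands_query = """
SELECT id, MIN(startdatetime), MAX(enddatetime), d_id
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY startdatetime) AS seqnum,
           ROW_NUMBER() OVER (PARTITION BY id, d_id ORDER BY startdatetime) AS seqnum_2
    FROM t
) t
-- adjacent rows with the same d_id share the same seqnum - seqnum_2
GROUP BY id, d_id, (seqnum - seqnum_2)
ORDER BY id, MIN(startdatetime)
"""
islands = conn.execute(islands_query).fetchall()
print(islands)
# [(1, '08:00', '10:00', 'A'), (1, '10:00', '11:00', 'B'), (1, '11:00', '12:00', 'A')]
```

The fourth row has the same d_id as the first two but a different row-number difference, so it ends up in its own group instead of being merged with them.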