How to filter out multiple downtime events in SQL Server? - sql

There is a query I need to write that will filter out multiples of the same downtime event. These records get created at the exact same time with multiple different timestealers, which I don't need. Also, when there are multiple timestealers for a downtime event, I need to make the timestealer NULL instead.
Example table:
Id  TimeStealer  Start                End                  Is_Downtime  Downtime_Event
1   Machine 1    2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
2   Machine 2    2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
3   NULL         2022-01-01 00:01:00  2022-01-01 00:59:59  0            Operating
What I need the query to return:
Id  TimeStealer  Start                End                  Is_Downtime  Downtime_Event
1   NULL         2022-01-01 01:00:00  2022-01-01 01:01:00  1            Malfunction
2   NULL         2022-01-01 00:01:00  2022-01-01 00:59:59  0            Operating

This looks like a top-1-row-per-group problem, but with the added logic of making a column NULL when there are multiple rows. You can achieve that by also using a windowed COUNT, and then a CASE expression in the outer SELECT to only return the value of TimeStealer when there was a single event:
WITH CTE AS (
    SELECT V.Id,
           V.TimeStealer,
           V.Start,
           V.[End],
           V.Is_Downtime,
           V.Downtime_Event,
           ROW_NUMBER() OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime, V.Downtime_Event ORDER BY V.Id) AS RN,
           COUNT(V.Id) OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime, V.Downtime_Event) AS Events
    FROM (VALUES ('1', 'Machine 1', CONVERT(datetime2(0), '2022-01-01 01:00:00'), CONVERT(datetime2(0), '2022-01-01 01:01:00'), '1', 'Malfunction'),
                 ('2', 'Machine 2', CONVERT(datetime2(0), '2022-01-01 01:00:00'), CONVERT(datetime2(0), '2022-01-01 01:01:00'), '1', 'Malfunction'),
                 ('3', NULL,        CONVERT(datetime2(0), '2022-01-01 00:01:00'), CONVERT(datetime2(0), '2022-01-01 00:59:59'), '0', 'Operating')) V (Id, TimeStealer, [Start], [End], Is_Downtime, Downtime_Event)
)
SELECT ROW_NUMBER() OVER (ORDER BY C.Id) AS Id,
       CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,
       C.Start,
       C.[End],
       C.Is_Downtime,
       C.Downtime_Event
FROM CTE C
WHERE C.RN = 1;
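Against your real table, the inline VALUES constructor simply drops out; a minimal sketch of the same query, assuming a placeholder table name of dbo.DowntimeEvents:

WITH CTE AS (
    SELECT D.Id,
           D.TimeStealer,
           D.Start,
           D.[End],
           D.Is_Downtime,
           D.Downtime_Event,
           -- number the duplicates within each identical downtime window
           ROW_NUMBER() OVER (PARTITION BY D.Start, D.[End], D.Is_Downtime, D.Downtime_Event ORDER BY D.Id) AS RN,
           -- count how many rows share that window
           COUNT(D.Id) OVER (PARTITION BY D.Start, D.[End], D.Is_Downtime, D.Downtime_Event) AS Events
    FROM dbo.DowntimeEvents D   -- placeholder table name
)
SELECT ROW_NUMBER() OVER (ORDER BY C.Id) AS Id,
       CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,   -- NULL when several timestealers share the event
       C.Start,
       C.[End],
       C.Is_Downtime,
       C.Downtime_Event
FROM CTE C
WHERE C.RN = 1;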

Related

Row number with condition

I want to increase the row number of a partition based on a condition. This question refers to the same problem, but in my case, the column I want to condition on is another window function.
I want to identify the session number of each user (id) depending on how long ago was their last recorded action (ts).
My table looks as follows:
id ts
1 2022-08-01 09:00:00 -- user 1, first session
1 2022-08-01 09:10:00
1 2022-08-01 09:12:00
1 2022-08-03 12:00:00 -- user 1, second session
1 2022-08-03 12:03:00
2 2022-08-01 11:04:00 -- user 2, first session
2 2022-08-01 11:07:00
2 2022-08-25 10:30:00 -- user 2, second session
2 2022-08-25 10:35:00
2 2022-08-25 10:36:00
I want to assign each user a session identifier based on the following conditions:
If the user's last action was 30 or more minutes ago (or doesn't exist), then increase (or initialize) the row number.
If the user's last action was less than 30 minutes ago, don't increase the row number.
I want to get the following result:
id ts session_id
1 2022-08-01 09:00:00 1
1 2022-08-01 09:10:00 1
1 2022-08-01 09:12:00 1
1 2022-08-03 12:00:00 2
1 2022-08-03 12:03:00 2
2 2022-08-01 11:04:00 1
2 2022-08-01 11:07:00 1
2 2022-08-25 10:30:00 2
2 2022-08-25 10:35:00 2
2 2022-08-25 10:36:00 2
If I had a separate column with the seconds since their last session, I could simply add 1 to each user's partitioned sum. However, this column is a window function itself. Hence, the following query doesn't work:
select
id
,ts
,extract(
epoch from (
ts - lag(ts, 1) over(partition by id order by ts)
)
) as seconds_since -- Number of seconds since last action (works well)
,sum(
case
when coalesce(
extract(
epoch from (
ts - lag(ts, 1) over (partition by id order by ts)
)
), 1800
) >= 1800 then 1
else 0 end
) over (partition by id order by ts) as session_id -- Window inside window (crashes)
from
t
order by
id
,ts
ERROR: Aggregate window functions with an ORDER BY clause require a frame clause
Use the LAG() window function to get the previous ts of each row and create a flag column indicating whether the difference between the two timestamps is at least 30 minutes.
Then use the SUM() window function over that flag:
SELECT
id
,ts
,SUM(flag) OVER (
PARTITION BY id
ORDER BY ts
rows unbounded preceding -- necessary in aws-redshift
) as session_id
FROM (
SELECT
*
,COALESCE((LAG(ts) OVER (PARTITION BY id ORDER BY ts) <= ts - INTERVAL '30 minute')::int, 1) flag
FROM
tablename
) t
;
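For anyone doing this in SQL Server rather than Redshift/Postgres, a rough T-SQL sketch of the same approach, assuming the same tablename; DATEDIFF counts minute boundaries crossed, which is fine for whole-minute timestamps:

SELECT id,
       ts,
       SUM(flag) OVER (
           PARTITION BY id
           ORDER BY ts
           ROWS UNBOUNDED PRECEDING
       ) AS session_id
FROM (
    SELECT *,
           -- 1 = first row for the user or a gap of at least 30 minutes, otherwise 0
           CASE WHEN DATEDIFF(minute, LAG(ts) OVER (PARTITION BY id ORDER BY ts), ts) < 30
                THEN 0 ELSE 1
           END AS flag
    FROM tablename
) t;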

Specific grouping elements in SQL Server

I've got a problem with my SQL task and haven't found any answers yet.
I've got a table with this sample data:
ID  Value  Date
1   1      2020-01-01
1   2      2020-03-02
1   1      2020-03-21
1   1      2020-04-14
1   3      2020-05-01
1   1      2020-08-09
1   1      2020-09-12
1   1      2020-10-12
1   3      2020-12-04
All I want to get is:
ID  Value  Date
1   1      2020-01-01
1   2      2020-03-02
1   1      2020-03-21
1   3      2020-05-01
1   1      2020-08-09
1   3      2020-12-04
Some kind of value change history, but only where the value actually changed - when the value of a new record is the same as the previous one, keep the record with the minimum date.
I tried grouping and ROW_NUMBER, but got no positive results. Any ideas how to do that?
One way to articulate your logic is to say that you want to retain a record when the previous record, as ordered by the date (within a given ID), has a different value than the current record.
WITH cte AS (
SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) LagValue
FROM yourTable
)
SELECT ID, Value, Date
FROM cte
WHERE LagValue <> Value OR LagValue IS NULL
ORDER BY Date;
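On SQL Server 2022 or later, the NULL check can also be folded into the NULL-aware IS DISTINCT FROM comparison; a sketch of the same query:

WITH cte AS (
    SELECT *, LAG(Value) OVER (PARTITION BY ID ORDER BY Date) AS LagValue
    FROM yourTable
)
SELECT ID, Value, Date
FROM cte
WHERE LagValue IS DISTINCT FROM Value   -- NULL-aware, so the first row (LagValue NULL) is kept
ORDER BY Date;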

How to number rows according to values in columns

Imagine I have an event log (ordered by UserID and Start, Start_of_previous_event is added using LAG(), inactive time = Start - Start_of_previous_event):
UserID Event Start Start_of_previous_event inactive_time
1 Onboarding 2024-01-01 01:00:00 null null
1 Main 2024-01-01 01:01:00 2024-01-01 01:00:00 1
1 Cart 2024-01-01 01:05:00 2024-01-01 01:01:00 4
1 Main 2024-01-01 02:00:00 2024-01-01 01:05:00 55
2 Onboarding 2024-01-01 01:00:00 null null
How can I add a column with session ids? A new session starts after 30 minutes of inactive time, and for each new UserID.
Session_id column for the above example:
1
1
1
2
3
Is there a way to avoid it if I want to group the resulting table like this:
Select Event, Count(distinct session_id)
from sessions
group by Event
You can assign the session with date arithmetic and a cumulative sum. Date arithmetic varies by database, but this should give you the idea:
select el.*,
       sum(case when start_of_previous_event > start - interval '30 minute'
                then 0 else 1
           end) over (order by userid, start) as session_cnt
from eventlog el;
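To then get the per-event distinct session counts from the question, the cumulative sum can sit in a derived table; a sketch keeping the same generic date arithmetic and the assumed eventlog table name:

select Event,
       count(distinct session_cnt) as sessions
from (
    select el.*,
           -- same cumulative flag as above: 1 starts a new session, 0 continues it
           sum(case when start_of_previous_event > start - interval '30 minute'
                    then 0 else 1
               end) over (order by userid, start) as session_cnt
    from eventlog el
) s
group by Event;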

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
On any given day, I need to find the highest number of fixes made on a given bay number, and if that highest count is tied it should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I am struggling to get the max while keeping the tied highest values. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.
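If you want an actual FixDateTime in the output, as in the expected result set, the earliest fix of each day can be pulled in with a plain MIN in the same grouped subquery (a sketch; the first query above gets the same value with a windowed MIN):

select bayno, FixDateTime, numFixes
from (select bayno,
             min(fixdatetime) as FixDateTime,   -- earliest fix on that day
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;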

Need to get the minimum start date and maximum end date, when there is no break in months

I have the rows shown below,
Column1 Start_date end_date Row_number
1 2014-02-01 2014-02-28 1
1 2014-03-01 2014-03-31 2
1 2014-04-01 2014-04-30 3
1 2014-05-01 2014-05-31 4
1 2014-07-01 2014-07-31 5
1 2015-02-01 2015-02-28 6
1 2015-03-01 2015-03-31 7
I need result like below,
Column1 Start_date end_date
1 2014-02-01 2014-05-31
1 2014-07-01 2014-07-31
1 2015-02-01 2015-03-31
So when the end_date of one row is one day before the start_date of the next row, I need to group all such continuous rows together and get the result shown above. I need to do this in SQL only. Please let me know if anyone has an idea how to solve this.
In the input, you can see that the first four rows are continuous, the fifth row is not, and the sixth and seventh rows are continuous with each other.
Thanks in advance.
The trick here is that you need to first filter down to just the entries that start or end an interval, and then merge them together, rather than trying to keep a running count in one pass.
I don't know which flavour of SQL you're running, and I'm not sure what Column1 is meant to signify, but this should do the trick (written in SQL Server flavour; the only functions you may need to adjust are DATEADD and ISNULL):
SELECT DISTINCT
CASE WHEN Q1.IsStart = 1
THEN Q1.start_date
ELSE LAG(start_date) OVER(ORDER BY Q1.Row_number) END AS start_date,
CASE WHEN Q1.IsEnding = 1
THEN Q1.end_date
ELSE LEAD(end_date) OVER(ORDER BY Q1.Row_number) END AS end_date
FROM
(SELECT
start_date,
end_date,
Row_number,
CASE WHEN DATEADD(day,1,end_date) =
ISNULL(LEAD(start_date) OVER(ORDER BY Row_number),
end_date)
THEN 0
ELSE 1 END AS IsEnding,
CASE WHEN DATEADD(day,-1,start_date) =
ISNULL(LAG(end_date) OVER(ORDER BY Row_number),
start_date)
THEN 0
ELSE 1 END AS IsStart
FROM table1) Q1
WHERE Q1.IsEnding = 1 OR Q1.IsStart = 1
For ANSI SQL, or for those of you without LAG or LEAD:
SELECT
StartDates.start_date,
MIN(EndDates.end_date)
FROM
(SELECT
MainEntry.start_date,
MainEntry.row_number
FROM
mytable MainEntry
LEFT OUTER JOIN mytable PrevEntry ON PrevEntry.row_number + 1 = MainEntry.row_number
WHERE
PrevEntry.end_date IS NULL OR
EXTRACT(day FROM (MainEntry.start_date - PrevEntry.end_date)) > 1) StartDates
INNER JOIN
(SELECT
MainEntry.end_date,
MainEntry.row_number
FROM
mytable MainEntry
LEFT OUTER JOIN mytable NextEntry ON NextEntry.row_number - 1 = MainEntry.row_number
WHERE
NextEntry.start_date IS NULL OR
EXTRACT(day FROM (NextEntry.start_date - MainEntry.end_date)) > 1) EndDates
ON StartDates.row_number <= EndDates.row_number
GROUP BY
StartDates.start_date
Note that the GROUP BY could contain StartDates.row_number if that takes advantage of an index. Also note that this ANSI solution initially missed the edge cases of rows without any pairs (had INNER JOINs inside the subqueries).
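If LAG is available anyway, another option is the usual gaps-and-islands pattern: flag each row that does not continue the previous one, turn a running total of those flags into a group number, and aggregate per group. A sketch in the same SQL Server flavour, using the table1 and Column1 names from above:

SELECT Column1,
       MIN(start_date) AS start_date,
       MAX(end_date)   AS end_date
FROM (
    SELECT f.Column1,
           f.start_date,
           f.end_date,
           -- running total of breaks = island number
           SUM(f.brk) OVER (PARTITION BY f.Column1 ORDER BY f.start_date
                            ROWS UNBOUNDED PRECEDING) AS grp
    FROM (
        SELECT t.*,
               -- 1 when this row does not continue the previous one (or is the first row)
               CASE WHEN DATEADD(day, 1, LAG(t.end_date) OVER (PARTITION BY t.Column1 ORDER BY t.start_date)) = t.start_date
                    THEN 0 ELSE 1
               END AS brk
        FROM table1 t
    ) f
) x
GROUP BY Column1, grp
ORDER BY Column1, MIN(start_date);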