Count consecutive recurring values - sql

I am struggling to find any info on this on the internet after a couple of hours of searching, trial, error and failure. We have the following table structure:
Name   EventDateTime         Mark
------ --------------------- --------
Dave   2021-03-24 09:00:00   Present
Dave   2021-03-24 14:00:00   Absent
Dave   2021-03-25 09:00:00   Absent
Dave   2021-03-26 09:00:00   Absent
Dave   2021-03-27 09:00:00   Present
Dave   2021-03-27 14:00:00   Absent
Dave   2021-03-28 09:00:00   Absent
Dave   2021-03-29 10:00:00   Absent
Dave   2021-03-30 13:00:00   Absent
Jane   2021-03-30 13:00:00   Absent
These are basically attendance registers for people at events. We need to pull a report showing who we have not had contact from for more than x consecutive days. "Consecutive" here means consecutive days on which they have events in the data, not consecutive calendar days. Also, if there is a Present on a day where they were also Absent, the count needs to start again from the next day they were absent.
The first issue I've got is getting distinct dates where there are only absences, then the 2nd is getting the number of consecutive days of absences - I've done the 2nd in MySQL with variables but struggled to migrate this over to PostgreSQL where the reporting is done from.
An example of the output I'd want is:
Name   EventDateTime         Mark      ConsecCount
------ --------------------- --------- ------------
Dave   2021-03-24 09:00:00   Present   0
Dave   2021-03-24 14:00:00   Absent    0
Dave   2021-03-25 09:00:00   Absent    1
Dave   2021-03-26 09:00:00   Absent    2
Dave   2021-03-27 09:00:00   Present   0
Dave   2021-03-27 14:00:00   Absent    0
Dave   2021-03-28 09:00:00   Absent    1
Dave   2021-03-29 10:00:00   Absent    2
Dave   2021-03-30 13:00:00   Absent    3
Jane   2021-03-30 13:00:00   Absent    0
This table is currently at 639931 records and they have been generated since 1st October and will continue to grow at this rate.
Any help or advice on where to start would be great.

This can be achieved using window functions as follows:
WITH with_row_numbers AS (
    SELECT
        *,
        -- position of each event within the person's timeline
        ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) AS this_row_number,
        -- same position, but only for Present rows (0 otherwise)
        (CASE WHEN Mark = 'Present' THEN ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) ELSE 0 END) AS row_number_if_present
    FROM events
)
SELECT
    Name,
    EventDateTime,
    Mark,
    -- rows since the most recent Present, minus one, floored at 0
    GREATEST(0, this_row_number - MAX(row_number_if_present) OVER (PARTITION BY Name ORDER BY EventDateTime) - 1) AS ConsecCount
FROM with_row_numbers
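To sanity-check the window-function approach against the sample data from the question, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. That substitution is an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions. SQLite has no GREATEST, so its two-argument scalar MAX is used instead, and the running maximum is computed in an extra CTE called with_last_present, which is not part of the query above.

```python
import sqlite3

# Sample rows copied from the question: (Name, EventDateTime, Mark)
ROWS = [
    ("Dave", "2021-03-24 09:00:00", "Present"),
    ("Dave", "2021-03-24 14:00:00", "Absent"),
    ("Dave", "2021-03-25 09:00:00", "Absent"),
    ("Dave", "2021-03-26 09:00:00", "Absent"),
    ("Dave", "2021-03-27 09:00:00", "Present"),
    ("Dave", "2021-03-27 14:00:00", "Absent"),
    ("Dave", "2021-03-28 09:00:00", "Absent"),
    ("Dave", "2021-03-29 10:00:00", "Absent"),
    ("Dave", "2021-03-30 13:00:00", "Absent"),
    ("Jane", "2021-03-30 13:00:00", "Absent"),
]

QUERY = """
WITH with_row_numbers AS (
    SELECT Name, EventDateTime, Mark,
           ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime) AS this_row_number
    FROM events
),
with_last_present AS (  -- hypothetical helper CTE, added for readability
    SELECT *,
           MAX(CASE WHEN Mark = 'Present' THEN this_row_number ELSE 0 END)
               OVER (PARTITION BY Name ORDER BY EventDateTime) AS last_present_row_number
    FROM with_row_numbers
)
SELECT Name, EventDateTime, Mark,
       -- scalar MAX(a, b) plays the role of PostgreSQL's GREATEST(a, b)
       MAX(0, this_row_number - last_present_row_number - 1) AS ConsecCount
FROM with_last_present
ORDER BY Name, EventDateTime
"""


def consecutive_counts(rows):
    """Load rows into an in-memory table and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (Name TEXT, EventDateTime TEXT, Mark TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in consecutive_counts(ROWS):
    print(row)
```

Each printed row ends with its ConsecCount, which can be compared line by line with the expected output in the question.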
Original answer with LATERAL join
WITH with_row_numbers AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Name ORDER BY EventDateTime)
    FROM events e
)
SELECT
    t1.Name,
    t1.EventDateTime,
    t1.Mark,
    GREATEST(0, t1.ROW_NUMBER - COALESCE(sub.prev_present_row_number, 0) - 1) AS ConsecCount
FROM with_row_numbers AS t1
CROSS JOIN LATERAL (
    SELECT MAX(row_number) AS prev_present_row_number
    FROM with_row_numbers t2
    WHERE t2.Name = t1.Name
      AND t2.EventDateTime <= t1.EventDateTime
      AND t2.Mark = 'Present'
) sub

How can I create a new column count in SQL table where count=1 if hours column >=6 else count=0

I aim to first achieve this
id   employee   Datelog      TimeIn     TimeOut    Hours      Count
---- ---------- ------------ ---------- ---------- ---------- ------
5    Two        2022-08-10   09:00:00   16:00:00   07:00:00   1
4    Two        2022-08-09   09:00:00   16:00:00   07:00:00   1
3    Two        2022-08-08   09:00:00   16:00:00   07:00:00   1
2    One        2022-08-05   09:00:00   16:00:00   07:00:00   1
1    Two        2022-08-04   09:00:00   10:00:00   01:00:00   0
My main objective is then to give a bonus of 2k to employees whose TotalCount per month is >= 3.
employee   Month    TotalCount   Bonus
---------- -------- ------------ ------
Two        August   3            2000
One        August   1            0
Here's the answer using Postgres. It's pretty much generic other than extracting the month out of datelog that might have a slightly different syntax.
select employee
,max(date_part('month', datelog ))
,count(*)
,case when count(*) >= 3 then 2000 else 0 end as bonus
from t
where hours >= time '06:00:00'
group by employee
employee   max   count   bonus
---------- ----- ------- ------
Two        8     3       2000
One        8     1       0
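The query above groups by employee only, which is fine while the table holds a single month. If the data spans several months, one option is to group by employee and month and count qualifying days with a conditional sum, so that employees with no qualifying days still appear with a count of 0. The sketch below tries that idea with Python's sqlite3 module as a stand-in engine (an assumption for self-containment): strftime replaces date_part, and hours is assumed to be stored as zero-padded HH:MM:SS text so that string comparison orders correctly.

```python
import sqlite3

# (id, employee, datelog, timein, timeout, hours) from the question's first table
ROWS = [
    (5, "Two", "2022-08-10", "09:00:00", "16:00:00", "07:00:00"),
    (4, "Two", "2022-08-09", "09:00:00", "16:00:00", "07:00:00"),
    (3, "Two", "2022-08-08", "09:00:00", "16:00:00", "07:00:00"),
    (2, "One", "2022-08-05", "09:00:00", "16:00:00", "07:00:00"),
    (1, "Two", "2022-08-04", "09:00:00", "10:00:00", "01:00:00"),
]

QUERY = """
SELECT employee,
       strftime('%Y-%m', datelog) AS month,
       -- the per-row "count" flag (1 if hours >= 6, else 0), summed per month
       SUM(CASE WHEN hours >= '06:00:00' THEN 1 ELSE 0 END) AS total_count,
       CASE WHEN SUM(CASE WHEN hours >= '06:00:00' THEN 1 ELSE 0 END) >= 3
            THEN 2000 ELSE 0 END AS bonus
FROM t
GROUP BY employee, strftime('%Y-%m', datelog)
ORDER BY employee, month
"""


def monthly_bonus(rows):
    """Return (employee, month, total_count, bonus) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (id INTEGER, employee TEXT, datelog TEXT, "
        "timein TEXT, timeout TEXT, hours TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in monthly_bonus(ROWS):
    print(row)
```

In PostgreSQL the same shape should carry over by swapping the strftime call for date_trunc('month', datelog) or a pair of date_part calls.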

SQL: Calculate duration based on dates and parameters (Change Log)

I have a dataset that is like a ticketing system change log. I am trying to calculate something like an SLA time across the records: how long did a specific ticket sit with Group 2 before it was either resolved or moved to another group to resolve?
The data looks like so:
ID   field              value              start                 end
---- ------------------ ------------------ --------------------- ---------------------
1    assignment_group   Group 1            2022-03-21 08:00:00   2022-03-21 08:05:00
1    incident_state     Work in Progress   2022-03-21 08:05:00   2022-03-21 08:30:00
1    assignment_group   Group 2            2022-03-21 08:35:00   2022-03-21 08:50:00
1    assigned_to        User 1             2022-03-21 08:50:00   2022-03-21 08:51:00
1    incident_state     Work in Progress   2022-03-21 09:00:00   2022-03-21 09:30:00
1    incident_state     Resolved           2022-03-21 09:30:00   2022-03-21 09:31:00
2    assignment_group   Group 2            2022-01-21 11:30:00   2022-01-21 11:35:00
2    assigned_to        User 1             2022-01-21 11:35:00   2022-01-21 11:37:00
2    incident_state     Work in Progress   2022-01-21 11:40:00   2022-01-21 11:55:00
2    assignment_group   Group 3            2022-01-21 11:58:00   2022-01-21 12:00:00
2    assigned_to        User 2             2022-01-21 12:05:00   2022-01-21 12:06:00
2    incident_state     Resolved           2022-01-21 12:10:00   2022-01-21 12:07:00
The issue I am having is calculating the duration from the start time at which the ticket was assigned to a specific group to the end time at which the ticket was either resolved by that group or moved to another group to resolve. I am only interested in Group 2. For ticket 1, the period from when the ticket was assigned to Group 2 until it was resolved by Group 2 runs from 2022-03-21 08:35:00 to 2022-03-21 09:31:00, so the duration is 56 minutes. For ticket 2, the ticket sat with Group 2 from 2022-01-21 11:30:00 until it was transferred to another group at 2022-01-21 11:58:00, so the duration is 28 minutes.
My code currently looks like this. I join two tables to pull in the ticket information and the ticket state changes (one row every time an action is taken on a ticket), which leaves me with the table above. I guess I need to use a lead function, but I can't figure out how to get the correct end time for the correct record (when incident_state = Resolved OR assignment_group = another group):
WITH incidents as
(
    SELECT number, sys_id AS SYS_ID_INCIDENT
    FROM tables.ServiceIncidents
),
changes as
(
    SELECT id, start, field, field_value, value, `end`
    FROM tables.IncidentInstances
),
incident_changes as
(
    SELECT *, TIMESTAMP_DIFF(changes.`end`, changes.start, MINUTE) as Duration, row_number() over (partition by number order by start) as RN
    FROM incidents
    LEFT JOIN changes
    ON (incidents.SYS_ID_INCIDENT = changes.id)
),
IAMtickets as
(
    SELECT i.number, i.SYS_ID_INCIDENT, start, field, value, `end`, Duration, RN
    FROM incident_changes i
    INNER JOIN
    (SELECT DISTINCT number FROM incident_changes WHERE value = 'Group 2') r
    ON i.number = r.number
),
cte as
(
    SELECT *, lead(value) over (partition by IAMtickets.number order by start)
    FROM IAMtickets
)
SELECT * FROM cte
I want the output to be something like this:
ID   Assigned to   Duration     Outcome
---- ------------- ------------ -------------
1    User 1        56 minutes   Resolved
2    User 1        28 minutes   Transferred
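The rule described in the question can be sketched procedurally before translating it into window functions. The Python below is only an illustration of that rule under stated assumptions, not a BigQuery solution: a stint with the group starts at the start of its assignment_group row, it ends at the end of a Resolved row or at the start of the next assignment_group row for a different group, only the first stint per ticket is reported, and the assignee is the last assigned_to value seen during the stint. The function names time_with_group and minutes_between are made up for this sketch.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (ticket id, field, value, start, end) copied from the question's sample rows
CHANGES = [
    (1, "assignment_group", "Group 1", "2022-03-21 08:00:00", "2022-03-21 08:05:00"),
    (1, "incident_state", "Work in Progress", "2022-03-21 08:05:00", "2022-03-21 08:30:00"),
    (1, "assignment_group", "Group 2", "2022-03-21 08:35:00", "2022-03-21 08:50:00"),
    (1, "assigned_to", "User 1", "2022-03-21 08:50:00", "2022-03-21 08:51:00"),
    (1, "incident_state", "Work in Progress", "2022-03-21 09:00:00", "2022-03-21 09:30:00"),
    (1, "incident_state", "Resolved", "2022-03-21 09:30:00", "2022-03-21 09:31:00"),
    (2, "assignment_group", "Group 2", "2022-01-21 11:30:00", "2022-01-21 11:35:00"),
    (2, "assigned_to", "User 1", "2022-01-21 11:35:00", "2022-01-21 11:37:00"),
    (2, "incident_state", "Work in Progress", "2022-01-21 11:40:00", "2022-01-21 11:55:00"),
    (2, "assignment_group", "Group 3", "2022-01-21 11:58:00", "2022-01-21 12:00:00"),
    (2, "assigned_to", "User 2", "2022-01-21 12:05:00", "2022-01-21 12:06:00"),
    (2, "incident_state", "Resolved", "2022-01-21 12:10:00", "2022-01-21 12:07:00"),
]


def minutes_between(start, end):
    """Whole minutes from start to end, both given as FMT strings."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def time_with_group(changes, group):
    """Return {ticket_id: (assignee, minutes, outcome)} for the first stint with group."""
    opened = {}    # ticket id -> start of its stint with the group
    assignee = {}  # ticket id -> last assigned_to value seen during the stint
    results = {}
    for tid, field, value, start, end in sorted(changes, key=lambda r: (r[0], r[3])):
        if tid in results:
            continue  # the stint is already closed for this ticket
        if field == "assignment_group":
            if value == group:
                opened.setdefault(tid, start)
            elif tid in opened:
                # moved to a different group: the stint ends when the new one starts
                results[tid] = (assignee.get(tid), minutes_between(opened[tid], start), "Transferred")
        elif tid in opened:
            if field == "assigned_to":
                assignee[tid] = value
            elif field == "incident_state" and value == "Resolved":
                results[tid] = (assignee.get(tid), minutes_between(opened[tid], end), "Resolved")
    return results


for tid, (user, mins, outcome) in time_with_group(CHANGES, "Group 2").items():
    print(tid, user, mins, outcome)
```

In SQL, the same idea would likely come down to flagging the "closing" rows (Resolved, or an assignment to another group) and taking the earliest one after each Group 2 assignment per ticket.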

SQL query to select the start and end datetime of a value with system versioned tables

Basically, I want to use system versioned tables to find out the start and end date all users held a position within a company.
I'm struggling with the amount of other changes made to the record (Other field changes that create a new versioned record).
I originally tried to Group By UserId, CompanyId, Position and then take the min SysStartTime and max SysEndTime. Which at first glance did work. However it does not work if a position is changed back to its original value.
SELECT DISTINCT
    cu.UserId,
    cu.CompanyId,
    cu.Position,
    MIN(cu.SysStartTime) AS StartTime,
    MAX(cu.SysEndTime) AS EndTime
FROM dbo.CompanyUser FOR SYSTEM_TIME ALL cu
GROUP BY cu.UserId, cu.CompanyId, cu.Position
Focusing on UserId 1, they were an 'Assistant', then a 'Manager', then back to an 'Assistant' again. I want to get the start and end date of each of these positions regardless of how many Other changes are made between positions.
UserId CompanyId Position Other SysStartTime SysEndTime
-------- ----------- ----------- ------- ---------------------- ---------------------
1 1 Assistant A 2019-12-01 13:00:00 2019-12-01 14:00:00
2 1 Manager A 2019-12-01 13:00:00 2019-12-01 20:00:00
1 1 Assistant B 2019-12-01 14:00:00 2019-12-01 17:00:00
1 1 Manager A 2019-12-01 17:00:00 2019-12-01 20:00:00
2 1 Executive A 2019-12-01 20:00:00 9999-12-31 23:59:59
3 1 CEO A 2019-12-01 13:00:00 9999-12-31 23:59:59
1 1 Assistant A 2019-12-01 20:00:00 9999-12-31 23:59:59
I want a query that will return the following:
UserId CompanyId Position SysStartTime SysEndTime
-------- ----------- ----------- ---------------------- ---------------------
1 1 Assistant 2019-12-01 13:00:00 2019-12-01 17:00:00
2 1 Manager 2019-12-01 13:00:00 2019-12-01 20:00:00
1 1 Manager 2019-12-01 17:00:00 2019-12-01 20:00:00
2 1 Executive 2019-12-01 20:00:00 9999-12-31 23:59:59
3 1 CEO 2019-12-01 13:00:00 9999-12-31 23:59:59
1 1 Assistant 2019-12-01 20:00:00 9999-12-31 23:59:59
Thanks
This should do what you need (Fiddle).
WITH T
     AS (SELECT *,
                LAG(Position) OVER (PARTITION BY UserId ORDER BY SysStartTime) AS PrevPosition
         FROM   dbo.CompanyUser FOR SYSTEM_TIME ALL cu)
SELECT UserId,
       CompanyId,
       Position,
       Other,
       SysStartTime,
       SysEndTime = LEAD(SysStartTime, 1, SysEndTime) OVER (PARTITION BY UserId ORDER BY SysStartTime)
FROM   T
WHERE  EXISTS (SELECT PrevPosition
               EXCEPT
               SELECT Position)
ORDER  BY UserId,
          SysStartTime
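The EXISTS (... EXCEPT ...) test in the query above is a NULL-safe "PrevPosition differs from Position" check, so only the first row of each run of identical positions survives, and LEAD then reads the start of the next run as the end time. To try the idea on the sample rows, here is a sketch ported to SQLite via Python's sqlite3 module. The port is an assumption for the sake of a runnable example: the plain table CompanyUserHistory stands in for dbo.CompanyUser FOR SYSTEM_TIME ALL, SQLite's IS NOT replaces the EXCEPT idiom, and the output alias PositionEndTime is introduced to avoid clashing with the source column.

```python
import sqlite3

# (UserId, CompanyId, Position, Other, SysStartTime, SysEndTime) from the question
ROWS = [
    (1, 1, "Assistant", "A", "2019-12-01 13:00:00", "2019-12-01 14:00:00"),
    (2, 1, "Manager",   "A", "2019-12-01 13:00:00", "2019-12-01 20:00:00"),
    (1, 1, "Assistant", "B", "2019-12-01 14:00:00", "2019-12-01 17:00:00"),
    (1, 1, "Manager",   "A", "2019-12-01 17:00:00", "2019-12-01 20:00:00"),
    (2, 1, "Executive", "A", "2019-12-01 20:00:00", "9999-12-31 23:59:59"),
    (3, 1, "CEO",       "A", "2019-12-01 13:00:00", "9999-12-31 23:59:59"),
    (1, 1, "Assistant", "A", "2019-12-01 20:00:00", "9999-12-31 23:59:59"),
]

QUERY = """
WITH T AS (
    SELECT *,
           LAG(Position) OVER (PARTITION BY UserId ORDER BY SysStartTime) AS PrevPosition
    FROM CompanyUserHistory
)
SELECT UserId, CompanyId, Position, SysStartTime,
       -- WHERE runs before the window function, so LEAD sees only run-starting rows
       LEAD(SysStartTime, 1, SysEndTime)
           OVER (PARTITION BY UserId ORDER BY SysStartTime) AS PositionEndTime
FROM T
WHERE PrevPosition IS NOT Position   -- NULL-safe inequality in SQLite
ORDER BY UserId, SysStartTime
"""


def position_periods(rows):
    """Return one (UserId, CompanyId, Position, start, end) row per position run."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE CompanyUserHistory (UserId INTEGER, CompanyId INTEGER, "
        "Position TEXT, Other TEXT, SysStartTime TEXT, SysEndTime TEXT)"
    )
    con.executemany("INSERT INTO CompanyUserHistory VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in position_periods(ROWS):
    print(row)
```

One thing to keep in mind with this pattern: for the last run of a user, the end time falls back to the SysEndTime of that run's first row, so a final run made of several versions would report the end of its first version rather than of its last.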
You should use LAG to achieve this.
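SELECT UserId, CompanyId, Position,
       MIN(SysStartTime) AS StartTime,
       MAX(SysEndTime) AS EndTime
FROM
(
    SELECT F.*,
           -- running total of position changes numbers each run of identical positions
           SUM(IsNewPosition) OVER (PARTITION BY UserId, CompanyId ORDER BY SysStartTime) AS PositionGroup
    FROM
    (
        SELECT
            cu.UserId,
            cu.CompanyId,
            cu.Position,
            cu.SysStartTime,
            cu.SysEndTime,
            -- 1 when the position differs from the previous version (or there is none)
            CASE WHEN LAG(cu.Position) OVER (PARTITION BY cu.UserId, cu.CompanyId ORDER BY cu.SysStartTime) = cu.Position
                 THEN 0 ELSE 1 END AS IsNewPosition
        FROM dbo.CompanyUser FOR SYSTEM_TIME ALL cu
    ) F
) T
GROUP BY UserId, CompanyId, Position, PositionGroup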
Result
UserId CompanyId Position SysStartTime SysEndTime
-------- ----------- ----------- ---------------------- ---------------------
1 1 Assistant 2019-12-01 13:00:00 2019-12-01 17:00:00
2 1 Manager 2019-12-01 13:00:00 2019-12-01 20:00:00
1 1 Manager 2019-12-01 17:00:00 2019-12-01 20:00:00
2 1 Executive 2019-12-01 20:00:00 9999-12-31 23:59:59
3 1 CEO 2019-12-01 13:00:00 9999-12-31 23:59:59
1 1 Assistant 2019-12-01 20:00:00 9999-12-31 23:59:59

SQL SELECT Difference between two days greater than 1 day

I have table T1
ID SCHEDULESTART SCHEDULEFINISH
1 2018-05-12 14:00:00 2018-05-14 11:00:00
2 2018-05-30 14:00:00 2018-06-01 11:00:00
3 2018-02-28 14:00:00 2018-03-02 11:00:00
4 2018-02-28 14:00:00 2018-03-01 11:00:00
5 2018-05-30 14:00:00 2018-05-31 11:00:00
I want to select all rows where difference in days (it's not important difference in hours) is greater than 1 day.
If SCHEDULESTART and SCHEDULEFINISH are on the same day, or SCHEDULEFINISH is on the next day, then these rows should NOT be selected.
So the result should return rows with IDs: 1 2 3
because the first row has a difference of two days, as does the second row (1st June is 2 days after 30th May) and the third row (2nd March is 2 days after 28th February).
Is this possible somehow?
I know the function DAY but this will return only day number in that one month!!!
I must begin my query with
SELECT ID FROM T1 WHERE ...
Thanks in advance
In DB2, this should work:
select id
from t1
where date(schedulestart) < date(schedulefinish) - 1 day;
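The comparison works on calendar dates only, so the time-of-day part never matters: a row qualifies when the start date is strictly earlier than the day before the finish date. As a quick check against the sample rows, here is a sketch using Python's sqlite3 module in place of DB2 (an assumption made for a self-contained example; SQLite writes the date arithmetic as date(x, '-1 day') instead of date(x) - 1 day).

```python
import sqlite3

# (ID, SCHEDULESTART, SCHEDULEFINISH) copied from the question
ROWS = [
    (1, "2018-05-12 14:00:00", "2018-05-14 11:00:00"),
    (2, "2018-05-30 14:00:00", "2018-06-01 11:00:00"),
    (3, "2018-02-28 14:00:00", "2018-03-02 11:00:00"),
    (4, "2018-02-28 14:00:00", "2018-03-01 11:00:00"),
    (5, "2018-05-30 14:00:00", "2018-05-31 11:00:00"),
]

QUERY = """
SELECT id
FROM t1
-- date() drops the time part; '-1 day' steps back one calendar day
WHERE date(schedulestart) < date(schedulefinish, '-1 day')
ORDER BY id
"""


def ids_spanning_more_than_one_day(rows):
    """Return the IDs whose finish date is at least two calendar days after the start date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (id INTEGER, schedulestart TEXT, schedulefinish TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = [r[0] for r in con.execute(QUERY)]
    con.close()
    return result


print(ids_spanning_more_than_one_day(ROWS))
```

Because the subtraction is done on real dates, month boundaries such as 28 February to 2 March are handled without any special casing, which is what the DAY function alone could not do.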

SQL previous value

I've created a SQL statement:
SELECT
    ROW_NUMBER() OVER (ORDER BY Q2.FUNCTIONAL_LOCATION) AS Rowy,
    Q1.FACT_MEASUREMENT_KEY,
    CONVERT(VARCHAR,Q1.Doc_Time,102) AS TIME
FROM
    dbo.DIM_PROJECT_TECH_OBJ Q2
INNER JOIN
    dbo.FACT_MEASUREMENT Q1 ON (Q1.PROJECT_TECH_OBJ_KEY = Q2.PROJECT_TECH_OBJ_KEY)
WHERE
    Q1.Measurement_Position = 'XXX'
Getting this result:
1 16124 08:00:00
2 53969 12:30:00
3 54282 17:15:00
4 55231 18:00:00
5 56196 15:00:00
6 16123 08:00:00
7 55393 12:30:00
8 55423 09:30:00
9 54283 08:00:00
My goal is to obtain the "Record-1" TIME in each row (expecting an error for the first one), like this:
1 16124 8:00:00
2 53969 12:30:00 8:00:00
3 54282 17:15:00 12:30:00
4 55231 18:00:00 17:15:00
5 56196 15:00:00 18:00:00
6 16123 8:00:00 15:00:00
7 55393 12:30:00 8:00:00
8 55423 9:30:00 12:30:00
9 54283 8:00:00 9:30:00
I've already failed trying to use:
ROW_NUMBER() OVER (PARTITION BY Q1.FACT_MEASUREMENT_KEY ORDER BY Q1.FACT_MEASUREMENT_KEY ) AS RowP
But I got error code 102.
This version of SQL Server does not support LAG statements.
Thanks in advance for any suggestion/help.
Regs
You can do this with a self join on the results of your query:
WITH t as (
      SELECT ROW_NUMBER() OVER (ORDER BY Q2.FUNCTIONAL_LOCATION) AS Rowy,
             Q1.FACT_MEASUREMENT_KEY,
             CONVERT(VARCHAR(255), Q1.Doc_Time, 102) AS TIME
      FROM dbo.DIM_PROJECT_TECH_OBJ Q2 INNER JOIN
           dbo.FACT_MEASUREMENT Q1
           ON Q1.PROJECT_TECH_OBJ_KEY = Q2.PROJECT_TECH_OBJ_KEY
      WHERE Q1.Measurement_Position = 'XXX'
     )
select t.*, tprev.time
from t left join
     t tprev
     on tprev.rowy = t.rowy - 1;
When you upgrade to SQL Server 2012+, you can replace this with lag().
Also, when you use varchar() (and related types in SQL Server), always use a length. SQL Server has different default lengths in different contexts -- and the default might not be good enough in some cases.
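To see that the self join on rowy - 1 and lag() give the same "previous row" value, here is a small sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption for a self-contained example; window functions need SQLite 3.25 or later). The table t is preloaded with the nine result rows from the question, and the column name doc_time is chosen for this sketch.

```python
import sqlite3

# (Rowy, FACT_MEASUREMENT_KEY, time) as shown in the question's result
ROWS = [
    (1, 16124, "08:00:00"),
    (2, 53969, "12:30:00"),
    (3, 54282, "17:15:00"),
    (4, 55231, "18:00:00"),
    (5, 56196, "15:00:00"),
    (6, 16123, "08:00:00"),
    (7, 55393, "12:30:00"),
    (8, 55423, "09:30:00"),
    (9, 54283, "08:00:00"),
]

# Works without window functions: join each row to the row numbered one less
SELF_JOIN = """
SELECT t.rowy, t.fact_measurement_key, t.doc_time, tprev.doc_time AS prev_time
FROM t
LEFT JOIN t AS tprev ON tprev.rowy = t.rowy - 1
ORDER BY t.rowy
"""

# Equivalent formulation once LAG is available
WITH_LAG = """
SELECT rowy, fact_measurement_key, doc_time,
       LAG(doc_time) OVER (ORDER BY rowy) AS prev_time
FROM t
ORDER BY rowy
"""


def run(query, rows):
    """Load rows into an in-memory table t and return the query result."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (rowy INTEGER, fact_measurement_key INTEGER, doc_time TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


self_join_rows = run(SELF_JOIN, ROWS)
lag_rows = run(WITH_LAG, ROWS)
for row in self_join_rows:
    print(row)
```

The first row has no predecessor, so its previous time comes back as NULL (None in Python) in both variants rather than raising an error.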