postgresql timeclock entry with no matching pair - sql

I have a table:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logtime TIME
, timetype VARCHAR(1)
);
INSERT INTO test VALUES
(1, '2013-01-01', '07:00', 'I'),
(1, '2013-01-01', '07:01', 'I'),
(1, '2013-01-01', '16:00', 'O'),
(1, '2013-01-01', '16:01', 'O'),
(2, '2013-01-01', '07:00', 'I'),
(2, '2013-01-01', '16:00', 'O'),
(1, '2013-01-02', '07:00', 'I'),
(1, '2013-01-02', '16:30', 'O'),
(2, '2013-01-02', '06:30', 'I'),
(2, '2013-01-02', '15:30', 'O'),
(2, '2013-01-02', '16:30', 'I'),
(2, '2013-01-02', '23:30', 'O'),
(3, '2013-01-01', '06:30', 'I'),
(3, '2013-01-02', '16:30', 'O'),
(4, '2013-01-01', '20:30', 'I'),
(4, '2013-01-02', '05:30', 'O'),
(5, '2013-01-01', '20:30', 'O'),
(5, '2013-01-02', '05:30', 'I');
I need to get the the time IN and OUT of each employee, disregarding duplicate entries
and identifying orphan entries (without a matching IN or OUT) so that I can put it in a separate list for notification of missing entries.
so far I have this sql that I modified which I got from Peter Larsson's Island and Gaps solution (link) :
WITH cteIslands ( employeeid, timetype, logdate, logtime, grp)
AS ( SELECT employeeid, timetype, logdate, logtime,
ROW_NUMBER()
OVER ( ORDER BY employeeid, logdate, logtime )
- ROW_NUMBER()
OVER ( ORDER BY timetype, employeeid,
logdate, logtime ) AS grp
FROM timeclock
),
cteGrouped ( employeeid, timetype, logdate, logtime )
AS ( SELECT employeeid, MIN(timetype), logdate,
CASE WHEN MIN(timetype) = 'I'
THEN MIN(logtime)
ELSE MAX(logtime)
END AS logtime
FROM cteIslands
GROUP BY employeeid, logdate, grp
)
select * from cteIslands
order by employeeid, logdate, logtime
The above works fine in satisfying the removal of duplicate entries but now I cant seem to get the orphan entries. I think LEAD or LAG can be used on this but I am new with postgresql. I hope someone here can help me on this.
Edit:
I somehow need to add a new field that I can use so that I know which records are orphaned.
somethine like the table below:
EMPID TYPE LOGDATE LOGTIME ORPHAN_FLAG
1 I 2013-01-01 07:00:00 0
1 O 2013-01-01 16:01:00 0
1 I 2013-01-02 07:00:00 0
1 O 2013-01-02 16:30:00 0
2 I 2013-01-01 07:00:00 0
2 O 2013-01-01 16:00:00 0
2 I 2013-01-02 06:30:00 0
2 O 2013-01-02 15:30:00 0
2 I 2013-01-02 16:30:00 0
2 O 2013-01-02 23:30:00 0
3 I 2013-01-01 06:30:00 0
3 O 2013-01-02 16:30:00 0
4 I 2013-01-01 20:30:00 0
4 O 2013-01-02 05:30:00 0
5 O 2013-01-01 20:30:00 1 <--- NO MATCHING IN
5 I 2013-01-02 05:30:00 1 <--- NO MATCHING OUT

First, I think you should rethink your design a little bit. It makes little sense to record a clock-out entry without having clocked in, and you can use things like partial indexes to ensure that clocked in entries are easy to look up when they don't have a clock out entry.
So I would start by considering moving your storage tables to something like:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logintime TIME
, logouttime time
, timetype VARCHAR(1)
);
The bad news is that if you can't do that, your orphanned report will be quite difficult to make perform well because you are doing a self join where you hope every row in a large table will have a corresponding other entry. This is going to require, at best, two sequential scans on the table and at worst a sequential scan with a nested loop index scan (assuming proper indexes, the alternative, a nested loop sequential scan would be even worse).
Handling this where you have rollover between dates (clock in at 11pm, clock out at 2am) will make this problem very hard to avoid.
Now since you have your CTE's working fine except for orphanned records, my recommendation is that union with another query on the same table looking for those which are not found properly in your current query.

Related

Denormalize column

I have data in my database like this:
Code
meta
meta_ID
date
A
1,2
1
01/01/2022 08:08:08
B
1,2
2
01/01/2022 02:00:00
B
null
2
01/01/1900 02:00:00
C
null
3
01/01/2022 02:00:00
D
8
8
01/01/2022 02:00:00
E
5,6,7
5
01/01/2022 02:00:00
F
1,2
2
01/01/2022 02:00:00
I want to have this with the last date (comparing with day, month year)
Code
meta
meta_ID
list_Code
date
A
2,3
1
A,B,F
01/01/2022 08:08:08
B
1,3
2
A,B,F
01/01/2022 02:00:00
C
null
3
C
01/01/2022 02:00:00
D
8
8
D
01/01/2022 02:00:00
E
5,6,7
5
E
01/01/2022 02:00:00
F
1,2
3
A,B,F
01/01/2022 02:00:00
I want to have the list of code having the same meta group, do you know how to do it with SQL Server?
The code below inputs the 1st table and outputs the 2nd table exactly. The Meta and Date columns had duplicate values, so in the CTE I took the MAX for both fields. Different logic can be applied if needed.
It uses XML Path to merge all rows into one column to create the List_Code column. The Stuff function removes the leading comma (,) delimiter.
CREATE TABLE MetaTable
(
Code VARCHAR(5),
Meta VARCHAR(100),
Meta_ID INT,
Date DATETIME
)
GO
INSERT INTO MetaTable
VALUES
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2','2', '01/01/2022 02:00:00'),
('B', NULL,'2', '01/01/1900 02:00:00'),
('C', NULL,'3', '01/01/2022 02:00:00'),
('D', '8','8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2','2', '01/01/2022 02:00:00')
GO
WITH CTE_Meta
AS
(
SELECT
Code,
MAX(Meta) AS 'Meta',
Meta_ID,
MAX(Date) AS 'Date'
FROM MetaTable
GROUP BY
Code,
Meta_ID
)
SELECT
T1.Code,
T1.Meta,
T1.Meta_ID,
STUFF
(
(
SELECT ',' + Code
FROM CTE_Meta T2
WHERE ISNULL(T1.Meta, '') = ISNULL(T2.Meta, '')
FOR XML PATH('')
), 1, 1, ''
) AS 'List_Code',
T1.Date
FROM CTE_Meta T1
ORDER BY 1
I like the first answer using XML. It's very concise. This is more verbose, but might be more flexible if the data can have different meta values spread about in different records. The CAST to varchar(12) in various places is just for the display. I use STRING_AGG and STRING_SPLIT instead of XML.
WITH TestData as (
SELECT t.*
FROM (
Values
('A', '1,2', '1', '01/01/2022 08:08:08'),
('B', '1,2', '2', '01/01/2022 02:00:00'),
('B', null, '2', '01/01/1900 02:00:00'),
('C', null, '3', '01/01/2022 02:00:00'),
('D', '8', '8', '01/01/2022 02:00:00'),
('E', '5,6,7', '5', '01/01/2022 02:00:00'),
('F', '1,2', '2', '01/01/2022 02:00:00'),
('G', '16', '17', '01/01/2022 02:00:00'),
('G', null, '17', '01/02/2022 03:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '19', '18', '01/03/2022 04:00:00'),
('G', '20', '19', '01/04/2022 05:00:00'),
('G', '20', '20', '01/05/2022 06:00:00')
) t (Code, meta, meta_ID, date)
), CodeLookup as ( -- used to find the Code from the meta_ID
SELECT DISTINCT meta_ID, Code
FROM TestData
), Normalized as ( -- split out the meta values, one per row
SELECT t.Code, s.Value as [meta], meta_ID, [date]
FROM TestData t
OUTER APPLY STRING_SPLIT(t.meta, ',') s
), MetaLookup as ( -- used to find the distinct list of meta values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta, ',') WITHIN GROUP ( ORDER BY n.meta ASC ) as varchar(12)) as [meta]
FROM (
SELECT DISTINCT Code, meta
FROM Normalized
WHERE meta is not NULL
) n
GROUP BY n.Code
), MetaIdLookup as ( -- used to find the distinct list of meta_ID values for a Code
SELECT n.Code, CAST(STRING_AGG(n.meta_ID, ',') WITHIN GROUP ( ORDER BY n.meta_ID ASC ) as varchar(12)) as [meta_ID]
FROM (
SELECT DISTINCT Code, meta_ID
FROM Normalized
) n
GROUP BY n.Code
), ListCodeLookup as ( -- for every code, get all codes for the meta values
SELECT l.Code, CAST(STRING_AGG(l.lookupCode, ',') WITHIN GROUP ( ORDER BY l.lookupCode ASC ) as varchar(12)) as [list_Code]
FROM (
SELECT DISTINCT n.Code, c.Code as [lookupCode]
FROM Normalized n
INNER JOIN CodeLookup c
ON c.meta_ID = n.meta
UNION -- every record needs it's own code in the list_code?
SELECT DISTINCT n.Code, n.Code as [lookupCode]
FROM Normalized n
) l
GROUP BY l.Code
)
SELECT t.Code, m.meta, mi.meta_ID, lc.list_Code, t.[date]
FROM (
SELECT Code, MAX([date]) as [date]
FROM TestData
GROUP BY Code
) t
LEFT JOIN MetaLookup m
ON m.Code = t.Code
LEFT JOIN MetaIdLookup mi
ON mi.Code = t.Code
LEFT JOIN ListCodeLookup lc
ON lc.Code = t.Code
Code meta meta_ID list_Code date
---- ------------ ------------ ------------ -------------------
A 1,2 1 A,B,F 01/01/2022 08:08:08
B 1,2 2 A,B,F 01/01/2022 02:00:00
C NULL 3 C 01/01/2022 02:00:00
D 8 8 D 01/01/2022 02:00:00
E 5,6,7 5 E 01/01/2022 02:00:00
F 1,2 2 A,B,F 01/01/2022 02:00:00
G 16,19,20 17,18,19,20 G 01/05/2022 06:00:00

need to get a subsequent record with a specific value

i have the following table :
Dt Status
05.23.2019 10:00:00 A
05.23.2019 11:00:00 B
05.23.2019 12:00:00 B
05.23.2019 13:00:00 D
05.23.2019 14:00:00 A
05.23.2019 15:00:00 B
05.23.2019 16:00:00 C
05.23.2019 17:00:00 D
05.23.2019 18:00:00 A
For each status A i need to get the next status D. The result should be like this :
Status1 Status2 Dt1 Dt2
A D 05.23.2019 10:00:00 05.23.2019 13:00:00
A D 05.23.2019 14:00:00 05.23.2019 17:00:00
A null 05.23.2019 18:00:00 null
I have my own solution based on cross/outer apply , In terms of performance i need solution without cross/outer apply.
We can try using ROW_NUMBER here along with some pivot logic:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY Status ORDER BY Dt) rn
FROM yourTable
WHERE Status IN ('A', 'D')
)
SELECT
MAX(CASE WHEN Status = 'A' THEN Status END) AS Status1,
MAX(CASE WHEN Status = 'D' THEN Status END) AS Status2,
MAX(CASE WHEN Status = 'A' THEN Dt END) AS Dt1,
MAX(CASE WHEN Status = 'D' THEN Dt END) AS Dt2
FROM cte
GROUP BY rn
ORDER BY rn;
Demo
The idea here is to generate a row number sequence along your entire table, for each separate Status value (A or D). Then, aggregate by that row number sequence to bring the A and D records together.
As result columns Status1 and Status2 always seem to be "A" and "D" respectively, I omitted them in my result.
DECLARE #Data TABLE
(
[Dt] SMALLDATETIME,
[Status] CHAR(1)
);
INSERT INTO #Data ([Dt], [Status]) VALUES
('2019-05-23 10:00:00', 'A'),
('2019-05-23 11:00:00', 'B'),
('2019-05-23 12:00:00', 'B'),
('2019-05-23 13:00:00', 'D'),
('2019-05-23 14:00:00', 'A'),
('2019-05-23 15:00:00', 'B'),
('2019-05-23 16:00:00', 'C'),
('2019-05-23 17:00:00', 'D'),
('2019-05-23 18:00:00', 'A'),
('2019-05-23 19:00:00', 'D'),
('2019-05-23 20:00:00', 'D'),
('2019-05-23 21:00:00', 'A'),
('2019-05-23 22:00:00', 'A'),
('2019-05-23 23:00:00', 'A');
SELECT
D.[Dt] AS [Dt1],
[LastDBeforeNextA].[Dt] AS [Dt2]
FROM
#Data AS D
OUTER APPLY (SELECT TOP (1) [Dt]
FROM #Data
WHERE [Status] = 'A' AND [Dt] > D.[Dt]
ORDER BY [Dt]) AS [NextA]
OUTER APPLY (SELECT TOP (1) [Dt]
FROM #Data
WHERE [Status] = 'D' AND [Dt] < [NextA].[Dt] AND [Dt] > D.[Dt]
ORDER BY [Dt] DESC) AS [LastDBeforeNextA]
WHERE
D.[Status] = 'A' AND
([NextA].[Dt] > [LastDBeforeNextA].[Dt] OR ([LastDBeforeNextA].[Dt] IS NULL AND [NextA].[Dt] IS NULL))
It initially gets all records from the table where status is 'A' (using expression D.[Status] = 'A' in the WHERE-clause).
For each record found, it joins the date of the next record with status A (table expression with alias NextA) and the date of the last record with status D that comes right before the next A-record but after the current A-record (table expression with alias LastDBeforeNextA).
Results are valid when a D-record is found (expression [NextA].[Dt] > [LastDBeforeNextA].[Dt] in the WHERE-clause) or when there is no D-record yet (expression [LastDBeforeNextA].[Dt] IS NULL in the WHERE-clause). In the latter case, you need to get the latest A-record, however (expression [NextA].[Dt] IS NULL in the WHERE-clause), since there can be multiple A-records after the last D-record.

Calculate total time worked in a day with multiple stops and starts

I can use DATEDIFF to find the difference between one set of dates like this
DATEDIFF(MINUTE, #startdate, #enddate)
but how would I find the total time span between multiple sets of dates? I don't know how many sets (stops and starts) I will have.
The data is on multiple rows with start and stops.
ID TimeStamp StartOrStop TimeCode
----------------------------------------------------------------
1 2017-01-01 07:00:00 Start 1
2 2017-01-01 08:15:00 Stop 2
3 2017-01-01 10:00:00 Start 1
4 2017-01-01 11:00:00 Stop 2
5 2017-01-01 10:30:00 Start 1
6 2017-01-01 12:00:00 Stop 2
This code would work assuming that your table only store data from one person, and they should be of the order Start/Stop/Start/Stop
WITH StartTime AS (
SELECT
TimeStamp
, ROW_NUMBER() PARTITION BY (ORDER BY TimeStamp) RowNum
FROM
<<table>>
WHERE
TimeCode = 1
), StopTime AS (
SELECT
TimeStamp
, ROW_NUMBER() PARTITION BY (ORDER BY TimeStamp) RowNum
FROM
<<table>>
WHERE
TimeCode = 2
)
SELECT
SUM (DATEDIFF( MINUTE, StartTime.TimeStamp, StopTime.TimeStamp )) As TotalTime
FROM
StartTime
JOIN StopTime ON StartTime.RowNum = StopTime.RowNum
This will work if your starts and stops are reliable. Your sample has two starts in order - 10:00 and 10:30 starts. I assume in production you will have an employee id to group on, so I added this to the sample data in place of the identity column.
Also in production, the CTE sets will be reduced by using a parameter on date. If there are overnight shifts, you would want your stops CTE to use dateadd(day, 1, #startDate) as your upper bound when retrieving end date.
Set up sample:
declare #temp table (
EmpId int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into #temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(1, '2017-01-01 08:15:00', 'Stop', 2),
(1, '2017-01-01 10:00:00', 'Start', 1),
(1, '2017-01-01 11:00:00', 'Stop', 2),
(2, '2017-01-01 10:30:00', 'Start', 1),
(2, '2017-01-01 12:00:00', 'Stop', 2)
Query:
;with starts as (
select t.EmpId,
t.TimeStamp as StartTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from #temp t
where Timecode = 1 --Start time code?
),
stops as (
select t.EmpId,
t.TimeStamp as EndTime,
row_number() over (partition by t.EmpId order by t.TimeStamp asc) as rn
from #temp t
where Timecode = 2 --Stop time code?
)
select cast(min(sub.StartTime) as date) as WorkDay,
sub.EmpId as Employee,
min(sub.StartTime) as ClockIn,
min(sub.EndTime) as ClockOut,
sum(sub.MinutesWorked) as MinutesWorked
from
(
select strt.EmpId,
strt.StartTime,
stp.EndTime,
datediff(minute, strt.StartTime, stp.EndTime) as MinutesWorked
from starts strt
inner join stops stp
on strt.EmpId = stp.EmpId
and strt.rn = stp.rn
)sub
group by sub.EmpId
This works assuming your table has an incremental ID and interleaving start/stop records
--Data sample as provided
declare #temp table (
Id int,
TimeStamp datetime,
StartOrStop varchar(55),
TimeCode int
);
insert into #temp
values
(1, '2017-01-01 07:00:00', 'Start', 1),
(2, '2017-01-01 08:15:00', 'Stop', 2),
(3, '2017-01-01 10:00:00', 'Start', 1),
(4, '2017-01-01 11:00:00', 'Stop', 2),
(5, '2017-01-01 10:30:00', 'Start', 1),
(6, '2017-01-01 12:00:00', 'Stop', 2)
--let's see every pair start/stop and discard stop/start
select start.timestamp start, stop.timestamp stop,
datediff(mi,start.timestamp,stop.timestamp) minutes
from #temp start inner join #temp stop
on start.id+1= stop.id and start.timecode=1
--Sum all for required result
select sum(datediff(mi,start.timestamp,stop.timestamp) ) totalMinutes
from #temp start inner join #temp stop
on start.id+1= stop.id and start.timecode=1
Results
+-------------------------+-------------------------+---------+
| start | stop | minutes |
+-------------------------+-------------------------+---------+
| 2017-01-01 07:00:00.000 | 2017-01-01 08:15:00.000 | 75 |
| 2017-01-01 10:00:00.000 | 2017-01-01 11:00:00.000 | 60 |
| 2017-01-01 10:30:00.000 | 2017-01-01 12:00:00.000 | 90 |
+-------------------------+-------------------------+---------+
+--------------+
| totalMinutes |
+--------------+
| 225 |
+--------------+
Maybe the tricky part is the join clause. We need to join #table with itself by deferring 1 ID. Here is where on start.id+1= stop.id did its work.
In the other hand, for excluding stop/start couple we use start.timecode=1. In case we don't have a column with this information, something like stop.id%2=0 works just fine.

SQL Server - Selecting periods without changes in data

What I am trying to do is to select periods of time where the rest of data in the table was stable based on one column and check was there a change in second column value in this period.
Table:
create table #stable_periods
(
[Date] date,
[Car_Reg] nvarchar(10),
[Internal_Damages] int,
[External_Damages] int
)
insert into #stable_periods
values ('2015-08-19', 'ABC123', 10, 10),
('2015-08-18', 'ABC123', 9, 10),
('2015-08-17', 'ABC123', 8, 9),
('2015-08-16', 'ABC123', 9, 9),
('2015-08-15', 'ABC123', 10, 10),
('2015-08-14', 'ABC123', 10, 10),
('2015-08-19', 'ABC456', 5, 3),
('2015-08-18', 'ABC456', 5, 4),
('2015-08-17', 'ABC456', 8, 4),
('2015-08-16', 'ABC456', 9, 4),
('2015-08-15', 'ABC456', 10, 10),
('2015-01-01', 'ABC123', 1, 1),
('2015-01-01', 'ABC456', NULL, NULL);
--select * from #stable_periods
-- Unfortunately I can’t post pictures yet but you get the point of how the table looks like
What I would like to receive is
Car_Reg FromDate ToDate External_Damages Have internal damages changed in this period?
ABC123 2015-08-18 2015-08-19 10 Yes
ABC123 2015-08-16 2015-08-17 9 Yes
ABC123 2015-08-14 2015-08-15 10 No
ABC123 2015-01-01 2015-01-01 1 No
ABC456 2015-08-19 2015-08-19 3 No
ABC456 2015-08-16 2015-08-18 4 Yes
ABC456 2015-08-15 2015-08-15 10 No
ABC456 2015-01-01 2015-01-01 NULL NULL
Basically to build period frames where [External_Damages] were constant and check did the [Internal_Damages] change in the same period (doesn't matter how many times).
I spend a lot of time trying but I am afraid that my level of abstraction thinking in much to low...
Will be great to see any suggestions.
Thanks,
Bartosz
I believe this is a form of Islands Problem.
Here is a solution using ROW_NUMBER and GROUP BY:
SQL Fiddle
WITH CTE AS(
SELECT *,
RN = DATEADD(DAY, - ROW_NUMBER() OVER(PARTITION BY Car_reg, External_Damages ORDER BY [Date]), [Date])
FROM #stable_periods
)
SELECT
Car_Reg,
FromDate = MIN([Date]),
ToDate = MAX([Date]) ,
External_Damages,
Change =
CASE
WHEN MAX(External_Damages) IS NULL THEN NULL
WHEN COUNT(DISTINCT Internal_Damages) > 1 THEN 'Yes'
ELSE 'No'
END
FROM CTE c
GROUP BY Car_Reg, External_Damages, RN
ORDER BY Car_Reg, ToDate DESC

SQL - Setting Value From Hierarchical Children

I am writing an application which gets task data from a project planning MS SQL table (let's call the table tasks). For simplicity the table fields can be thought of as follows:
task_id, parent_id, name, start_date, end_date
All parent tasks have NULL as start and end dates. Only the children (with no children of their own) have a start and end date.
I want to get the tasks data and in the process set the start date of each parent based upon the earliest start date of all the parent's children and recursive grandchildren and set the end date to be the latest end date of all the children and recursive grandchildren. Is this possible please?
I assume from your question that you use Sql Server. I think this is what you want. It is done with recursive common table expression. It begins with leaf children and goes up to top most parents:
DECLARE #t TABLE(id INT, pid INT, sd DATE, ed DATE)
INSERT INTO #t VALUES
(1, NULL, NULL, NULL),
(2, 1, NULL, NULL),
(3, 2, '20150201', '20150215'),
(4, 2, '20150101', '20150201'),
(5, 1, NULL, NULL),
(6, 5, '20150301', '20150401'),
(7, 1, NULL, NULL),
(8, 7, NULL, NULL),
(9, 8, '20140101', '20141230'),
(10, 8, '20140102', '20141231')
;WITH cte AS(
SELECT * FROM #t WHERE sd IS NOT NULL
UNION ALL
SELECT t.id, t.pid, c.sd, c.ed FROM #t t
JOIN cte c ON c.pid = t.id
)
SELECT id, pid, MIN(sd) AS sd, MAX(ed) AS ed
FROM cte
GROUP BY id, pid
ORDER BY id
Output:
id pid sd ed
1 NULL 2014-01-01 2015-04-01
2 1 2015-01-01 2015-02-15
3 2 2015-02-01 2015-02-15
4 2 2015-01-01 2015-02-01
5 1 2015-03-01 2015-04-01
6 5 2015-03-01 2015-04-01
7 1 2014-01-01 2014-12-31
8 7 2014-01-01 2014-12-31
9 8 2014-01-01 2014-12-30
10 8 2014-01-02 2014-12-31