How to extend data with respect to incomplete dates in T-SQL? - sql

I have the first table like below:
Node Date Value
01R-123 2023-01-10 09
01R-123 2023-01-09 11
01R-123 2023-01-08 18
01R-123 2023-01-07 87
01R-123 2023-01-06 32
01R-123 2023-01-05 22
01R-123 2023-01-04 16
01R-123 2023-01-03 24
01R-123 2023-01-02 24
01R-123 2023-01-01 24
And a second table like this:
Node Timestamp Method
01R-123 2023-01-10 Jet
01R-123 2023-01-09 Jet
01R-123 2023-01-08 Jet
01R-123 2023-01-05 Jet
01R-123 2023-01-04 Jet
01R-123 2023-01-03 Jet
01R-123 2022-12-30 Jet
01R-123 2022-12-29 Jet
01R-123 2022-12-28 Jet
01R-123 2022-12-25 Jet
These two tables are joined according to the detail below, based on two conditions:
First: a.[Node] = b.[Node]
Second: a.[Date] = b.[Timestamp]
Now the question is:
In the first table the dates are continuous, but in the second table they are not. When the two tables are joined on the conditions above, only the dates (and their corresponding values) present in the second table come through, but I need the result to cover the dates continuously.
In the end, I need data like the table below:
Node Date Value Method
01R-123 2023-01-10 09 Jet
01R-123 2023-01-09 11 Jet
01R-123 2023-01-08 18 Jet
01R-123 2023-01-07 87 Jet
01R-123 2023-01-06 32 Jet
01R-123 2023-01-05 22 Jet
01R-123 2023-01-04 16 Jet
01R-123 2023-01-03 24 Jet
01R-123 2023-01-02 24 Jet
01R-123 2023-01-01 24 Jet
Again, the joining condition between the two tables is the DATE.
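For reference, the plain join described above (a sketch, using the table names from the setup further down) returns NULL for Method on the dates missing from the second table:
select a.[Node], a.[Date], a.[Value], b.[Method]
from Table1 a
left join Table2 b
  on a.[Node] = b.[Node]
 and a.[Date] = b.[Timestamp]
order by a.[Date] desc;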

This will work for gaps of up to 3 days; you can easily extend it.
CREATE TABLE Table1
([Node] varchar(7), [Date] datetime, [Value] int)
;
INSERT INTO Table1
([Node], [Date], [Value])
VALUES
('01R-123', '2023-01-10 00:00:00', 09),
('01R-123', '2023-01-09 00:00:00', 11),
('01R-123', '2023-01-08 00:00:00', 18),
('01R-123', '2023-01-07 00:00:00', 87),
('01R-123', '2023-01-06 00:00:00', 32),
('01R-123', '2023-01-05 00:00:00', 22),
('01R-123', '2023-01-04 00:00:00', 16),
('01R-123', '2023-01-03 00:00:00', 24),
('01R-123', '2023-01-02 00:00:00', 24),
('01R-123', '2023-01-01 00:00:00', 24)
;
CREATE TABLE Table2
([Node] varchar(7), [Timestamp] datetime, [Method] varchar(3))
;
INSERT INTO Table2
([Node], [Timestamp], [Method])
VALUES
('01R-123', '2023-01-10 00:00:00', 'Jet'),
('01R-123', '2023-01-09 00:00:00', 'Jet'),
('01R-123', '2023-01-08 00:00:00', 'Jet'),
('01R-123', '2023-01-05 00:00:00', 'Jet'),
('01R-123', '2023-01-04 00:00:00', 'Jet'),
('01R-123', '2023-01-03 00:00:00', 'Jet'),
('01R-123', '2022-12-30 00:00:00', 'Jet'),
('01R-123', '2022-12-29 00:00:00', 'Jet'),
('01R-123', '2022-12-28 00:00:00', 'Jet'),
('01R-123', '2022-12-25 00:00:00', 'Jet')
;
select a.*, coalesce(b.method,c.method,d.method,e.method) Method
from table1 a
left join table2 b on
a.[Node] = b.[Node]
and a.[Date] = b.[Timestamp]
left join table2 c on
a.[Node] = c.[Node]
and a.[Date] = dateadd(d,1,c.[Timestamp])
left join table2 d on
a.[Node] = d.[Node]
and a.[Date] = dateadd(d,2,d.[Timestamp])
left join table2 e on
a.[Node] = e.[Node]
and a.[Date] = dateadd(d,3,e.[Timestamp])
order by a.[Date]
TEST
http://sqlfiddle.com/#!18/dd9a3/14

I suggest using last_value with the IGNORE NULLS option. Something like:
SELECT a.*, LAST_VALUE(b.method) IGNORE NULLS OVER (PARTITION BY a.node ORDER BY a.date DESC) method
FROM a LEFT OUTER JOIN b
ON a.node = b.node AND a.date = b.timestamp
ORDER BY a.node, a.date DESC
You can see a Fiddle of it here.
This looks for the previous non-null value of b.method within each a.node group (because of the PARTITION BY a.node), ordered by a.date descending. Since the window covers all previous values as well as the current row (the default behavior when no ROWS/RANGE frame is specified in the OVER clause), there's no need for a special case (e.g. CASE or COALESCE) to handle rows that already have a non-null value.
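For clarity, here is a sketch of the same query with the frame written out explicitly (my own variation, not the answer's code; with at most one row per node and date, ROWS behaves the same as the default RANGE frame here):
SELECT a.*,
       LAST_VALUE(b.method) IGNORE NULLS OVER (
           PARTITION BY a.node
           ORDER BY a.date DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS method
FROM a
LEFT OUTER JOIN b
  ON a.node = b.node AND a.date = b.timestamp
ORDER BY a.node, a.date DESC;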
For an older version of SQL Server that cannot do last_value with IGNORE NULLS (as requested in the comments), maybe you could do something like the following. I'm confident this works in SQL Server 2016, but I'm not sure if there are nicer ways of doing it.
SELECT node, date, value, method
FROM
(
SELECT a.*, b.method, row_number() OVER (partition by a.node, a.date ORDER BY CASE WHEN method IS NULL THEN 1 ELSE 0 END, datediff(day, a.date, b.timestamp)) rn
FROM a LEFT OUTER JOIN b
ON a.node = b.node AND a.date <= b.timestamp
) sq
WHERE rn = 1
ORDER BY node, date desc
The idea here is to do the join with <= (which will include the match we want as well as others), then use row_number to prioritize the best match: non-null values sort first, and datediff(day, a.date, b.timestamp) then picks the closest timestamp.
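Another option on older versions (a sketch of my own, not part of the answer above) is OUTER APPLY with TOP 1, which picks the nearest following timestamp directly:
SELECT a.*, m.method
FROM a
OUTER APPLY (
    -- nearest b row at or after this a.date, if any
    SELECT TOP 1 b.method
    FROM b
    WHERE b.node = a.node
      AND b.timestamp >= a.date
    ORDER BY b.timestamp
) m
ORDER BY a.node, a.date DESC;
APPLY runs the subquery once per row of a, so it expresses the same "closest timestamp on or after the date" rule without the row_number bookkeeping.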

Related

SQL join table and limiting condition matching result

I have a table of rates and a table of transactions, and I want to work out the conversion rate for each transaction based on the most recently updated currency rate at or before the transaction timestamp.
Table - rates
('2018-04-01 00:00:00', 'EUR', 'RUB', '1.71'),
('2018-04-01 01:00:05', 'EUR', 'RUB', '1.82'),
('2018-04-01 00:00:00', 'USD', 'RUB', '0.71'),
('2018-04-01 00:00:05', 'USD', 'RUB', '0.82'),
('2018-04-01 00:01:00', 'USD', 'RUB', '0.92'),
('2018-04-01 01:02:00', 'USD', 'RUB', '0.62'),
Table - transactions
('2018-04-01 00:00:00', 1, 'EUR', 2.45),
('2018-04-01 01:00:00', 1, 'EUR', 8.45),
('2018-04-01 01:30:00', 1, 'USD', 3.5),
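For anyone who wants to run these samples, here is a possible setup; the question does not show its CREATE TABLE statements, so the column names below (ts, from_currency, to_currency, rate, currency, amount and the hypothetical user_id) are assumptions inferred from the queries:
-- Assumed schema; column names inferred from the queries, not given in the question.
CREATE TABLE rates (
    ts            timestamp,
    from_currency varchar(3),
    to_currency   varchar(3),
    rate          numeric
);
CREATE TABLE transactions (
    ts       timestamp,
    user_id  int,            -- hypothetical name for the second column shown above
    currency varchar(3),
    amount   numeric
);
INSERT INTO rates VALUES
('2018-04-01 00:00:00', 'EUR', 'RUB', 1.71),
('2018-04-01 01:00:05', 'EUR', 'RUB', 1.82),
('2018-04-01 00:00:00', 'USD', 'RUB', 0.71),
('2018-04-01 00:00:05', 'USD', 'RUB', 0.82),
('2018-04-01 00:01:00', 'USD', 'RUB', 0.92),
('2018-04-01 01:02:00', 'USD', 'RUB', 0.62);
INSERT INTO transactions VALUES
('2018-04-01 00:00:00', 1, 'EUR', 2.45),
('2018-04-01 01:00:00', 1, 'EUR', 8.45),
('2018-04-01 01:30:00', 1, 'USD', 3.5);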
My attempt to limit that additional data:
select * from transactions tr1
left outer join rates ex1
on tr1.ts >= ex1.ts
and tr1.currency = ex1.from_currency;
The result I'm getting contains all of the exchange rate updates that happened previously:
2018-04-01 00:00:00 1 EUR 2.45 2018-04-01 00:00:00 EUR RUB 1.71 (correct)
2018-04-01 01:00:00 1 EUR 8.45 2018-04-01 00:00:00 EUR RUB 1.71 (correct)
2018-04-01 01:30:00 1 USD 3.5 2018-04-01 01:02:00 USD RUB 0.62 (only this should remain)
2018-04-01 01:30:00 1 USD 3.5 2018-04-01 00:01:00 USD RUB 0.92
2018-04-01 01:30:00 1 USD 3.5 2018-04-01 00:00:05 USD RUB 0.82
2018-04-01 01:30:00 1 USD 3.5 2018-04-01 00:00:00 USD RUB 0.71
I tried adding my own condition (to my previous query):
where ex1.ts = (select max(ex2.ts) from rates ex2
where ex2.from_currency=ex1.from_currency
and ex2.to_currency=ex1.from_currency);
But that doesn't return anything...
Postgres has the very handy distinct on for getting one row per group. This should do what you want:
select t.*, r.*
from transactions t left join
(select distinct on (from_currency, to_currency) r.*
from rates r
order by from_currency, to_currency, ts desc
) r
on r.from_currency = t.currency and
r.to_currency = 'RUB';
EDIT:
If you want the latest date for each transaction, then use a lateral join:
select t.*, r.*
from transactions t left join lateral
(select r.*
from rates r
where r.from_currency = t.currency and
r.ts <= t.ts
order by r.ts desc
limit 1
) r
on 1=1;
You can solve this using window/analytic functions. Within each partition (one per transaction), we order by ex1.ts in descending order, so that row number 1 is the rate closest to the transaction timestamp ts. The tr1.ts >= ex1.ts condition in the join ensures that we only pick up exchange rates on or before the transaction time.
select dt.*
from
(
select tr1.*, ex1.*,
row_number() over (partition by tr1.ts, tr1.currency order by ex1.ts desc) as rn -- one row per transaction
from transactions tr1
left outer join rates ex1
on tr1.ts >= ex1.ts
and tr1.currency = ex1.from_currency
) as dt
where dt.rn = 1

create dynamic records from time stamps

I have the following table:
Id Date Time Location leadHourDiff
3 2017-01-01 2017-01-01 13:00:00.000 Boston 2
15 2017-01-01 2017-01-01 13:00:00.000 Philly 1
16 2017-01-01 2017-01-01 15:00:00.000 Philly 1
and I would like to dynamically create the hour records between Time and (Time + leadHourDiff),
so the end result would be:
Date Time Location
2017-01-01 2017-01-01 13:00:00.000 Boston --main record
2017-01-01 2017-01-01 14:00:00.000 Boston --new record
2017-01-01 2017-01-01 15:00:00.000 Boston --new record
2017-01-01 2017-01-01 13:00:00.000 Philly --main record
2017-01-01 2017-01-01 14:00:00.000 Philly --new record
2017-01-01 2017-01-01 15:00:00.000 Philly --main record
2017-01-01 2017-01-01 16:00:00.000 Philly --new record
One option is to use a numbers table (this can be generated with a recursive CTE) and join the leadHourDiff column onto it.
with numbers(num) as (select 0
union all
select num+1 from numbers where num < 100 --change this as needed
)
select t.*,dateadd(hour,n.num,t.datetime_col) as new_datetime
from tbl t
join numbers n on t.leadHourDiff >= n.num
A simple way is to use a recursive CTE:
with cte as (
select id, date, time, Location, leadHourDiff
from t
union all
select id, date, dateadd(hour, 1, time), location, leadHourDiff - 1
from cte
where leadHourDiff > 0
)
select date, time, Location
from cte
order by location, date, time;
Here's how I ended up doing this. Also, I forgot to mention that I only wanted the missing time values; that was a typo on my part. Here's the whole solution:
CREATE TABLE #Orders(
Id int IDENTITY(1,1)
,[Time] datetime
,[Location] varchar(20)
,OrderAmt int
)
INSERT INTO #Orders
SELECT '2017-01-01 11:00:00', 'Boston', 23 UNION ALL
SELECT '2017-01-01 12:00:00', 'Boston', 31 UNION ALL
SELECT '2017-01-01 13:00:00', 'Boston', 45 UNION ALL
SELECT '2017-01-01 16:00:00', 'Boston', 45 UNION ALL ---15
SELECT '2017-01-01 17:00:00', 'Boston', 67 UNION ALL
SELECT '2017-01-01 18:00:00', 'Boston', 89 UNION ALL
SELECT '2017-01-01 19:00:00', 'Boston', 90 UNION ALL
SELECT '2017-01-01 20:00:00', 'Boston', 123 UNION ALL
SELECT '2017-01-01 21:00:00', 'Boston', 145 UNION ALL
SELECT '2017-01-01 22:00:00', 'Boston', 156 UNION ALL
SELECT '2017-01-01 23:00:00', 'Boston', 145 UNION ALL
SELECT '2017-01-02 00:00:00', 'Boston', 167 UNION ALL
SELECT '2017-01-01 11:00:00', 'Philly', 23 UNION ALL
SELECT '2017-01-01 12:00:00', 'Philly', 31 UNION ALL
SELECT '2017-01-01 13:00:00', 'Philly', 45 UNION ALL
SELECT '2017-01-01 15:00:00', 'Philly', 45 UNION ALL
SELECT '2017-01-01 17:00:00', 'Philly', 67 UNION ALL
SELECT '2017-01-01 18:00:00', 'Philly', 89 UNION ALL
SELECT '2017-01-01 19:00:00', 'Philly', 90 UNION ALL
SELECT '2017-01-01 20:00:00', 'Philly', 123 UNION ALL
SELECT '2017-01-01 21:00:00', 'Philly', 145 UNION ALL
SELECT '2017-01-01 22:00:00', 'Philly', 156 UNION ALL
SELECT '2017-01-01 23:00:00', 'Philly', 145 UNION ALL
SELECT '2017-01-02 00:00:00', 'Philly', 167
;WITH HourDiff AS (
SELECT *
FROM
(
SELECT
Id
,CAST([Time] AS date) AS [Date]
,[Time]
,[Location]
,COALESCE(lead(DATEPART(HOUR, [Time])) OVER(PARTITION BY [Location], CAST([Time] AS date) ORDER BY [Time] ASC ) - DATEPART(HOUR, [Time]),1)-1 AS leadHourDiff
FROM #Orders
) t1
WHERE t1.leadHourDiff <> 0
)
, CTE AS (
SELECT
Location
,DATEADD(HOUR, leadHourDiff, [Time]) AS missingTime
FROM HourDiff
UNION ALL
SELECT
Location
,DATEADD(HOUR, leadHourDiff - 1, [Time]) AS missingTime
FROM HourDiff
WHERE Time < DATEADD(HOUR, leadHourDiff - 1, [Time])
)
SELECT
Location
,CAST(missingTime AS time) AS missingTime
FROM CTE
ORDER BY Location, missingTime
DROP TABLE #Orders
Final result:
Location missingTime
Boston 14:00:00.000
Boston 15:00:00.000
Philly 14:00:00.000
Philly 16:00:00.000
UPDATE:
Here's an update: the final CTE was not working properly when I added new data for New York.
New data for New York:
SELECT '2017-01-01 11:00:00', 'New York', 23 UNION ALL
SELECT '2017-01-01 20:00:00', 'New York', 31 UNION ALL
new final CTE:
, CTE AS (
SELECT
Location
,DATEADD(HOUR, leadHourDiff, [Time]) AS missingTime
,[Time]
,leadHourDiff
FROM HourDiff
UNION ALL
SELECT
Location
,DATEADD(HOUR, leadHourDiff - 1 , [Time]) AS missingTime
,[Time]
,leadHourDiff - 1
FROM CTE
WHERE leadHourDiff >= 0
AND Time < DATEADD(HOUR, leadHourDiff - 1, [Time])
)
Final result:
Location missingTime
Boston 14:00:00.0000000
Boston 15:00:00.0000000
New York 12:00:00.0000000
New York 13:00:00.0000000
New York 14:00:00.0000000
New York 15:00:00.0000000
New York 16:00:00.0000000
New York 17:00:00.0000000
New York 18:00:00.0000000
New York 19:00:00.0000000
Philly 14:00:00.0000000
Philly 16:00:00.0000000

modify output from 1 query

I need your suggestion, guys. I don't know what to title my question, but I have a query which gives output like this picture:
and this is my query:
select to_char(aa.DATE_AWAL, 'dd/mm/yyyy hh24:mi') DATE_AWAL, to_char(aa.DATE_AKHIR, 'dd/mm/yyyy hh24:mi') DATE_AKHIR,
to_char(aa.DATE_AWAL, 'hh24:mi') TIME_AWAL, to_char(aa.DATE_AKHIR, 'hh24:mi') TIME_AKHIR,
cc.NAMARUANG,aa.IDMEETING from TMEETING_ROOM aa
inner join MMEETING_TYPE bb on aa.IDTYPE=bb.IDMEETING
inner join MMEETING_ROOM cc on aa.IDMEETINGROOM = cc.IDMEETINGROOM
inner join HR.VWKARYAWAN dd on aa.IDPENGUSUL=dd.IDKARYAWAN
inner join HR.MLOKASI ee on aa.IDLOKASI = ee.IDLOKASI
where aa.IS_DELETE IS NULL
and aa.IDCANCEL IS NULL
and (
wm_overlaps (
wm_period(aa.DATE_AWAL, aa.DATE_AKHIR),
wm_period(
TO_DATE(TO_CHAR(trunc(sysdate) + 08/24, 'yyyy-mm-dd hh24:mi'), 'yyyy-mm-dd hh24:mi'),
TO_DATE(TO_CHAR(trunc(sysdate) + 23/24, 'yyyy-mm-dd hh24:mi'), 'yyyy-mm-dd hh24:mi')
)
) = 1
) and aa.idlokasi = 'I' order by cc.NAMARUANG asc, aa.DATE_AWAL asc;
Can anybody give me a suggestion on how to make this query produce output like this picture:
I'm a newbie with Oracle SQL.
Note: the times and rooms are dynamic.
Here is an example of how you might achieve a "generic" pivot table in MySQL.
The technique used requires row numbering (in MySQL 8 there will be an easier way to do this), but for now it uses #variables.
Then, with each row number, we "transform" rows to columns using case expressions inside the max() function (conditional aggregates).
You will need to decide how many columns you need, and note that the order by inside the subquery t is vital to successfully arranging the data.
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE YourQueryHere
(`id` int, `date_column` datetime, `code_column` varchar(7), `data_for_cells1` varchar(5), `data_for_cells2` varchar(5))
;
INSERT INTO YourQueryHere
(`id`, `date_column`, `code_column`, `data_for_cells1`, `data_for_cells2`)
VALUES
(1, '2017-11-14 00:00:00', 'Bintang', '09:00', '10:30'),
(2, '2017-11-14 00:00:00', 'Bintang', '11:00', '12:30'),
(3, '2017-11-14 00:00:00', 'Bintang', '14:00', '17:00'),
(4, '2017-11-14 00:00:00', 'Sapporo', '11:30', '14:00'),
(5, '2017-11-14 00:00:00', 'Sapporo', '14:30', '15:00'),
(6, '2017-11-14 00:00:00', 'Tiger', '08:00', '09:30'),
(7, '2017-11-14 00:00:00', 'Tiger', '11:00', '12:00')
;
Query 1:
select
code_column
, max(case when RowNumber = 1 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol1
, max(case when RowNumber = 2 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol2
, max(case when RowNumber = 3 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol3
, max(case when RowNumber = 4 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol4
from (
select *
, #counter :=IF(#prev=code_column,#counter+1,1)AS RowNumber
, #prev := code_column
from YourQueryHere
cross join (select #counter:=0, #prev:= '') vars
order by
code_column, date_column
) t
group by
code_column
order by
code_column
;
Results:
| code_column | pivcol1 | pivcol2 | pivcol3 | pivcol4 |
|-------------|-------------|-------------|-------------|---------|
| Bintang | 09:00 10:30 | 11:00 12:30 | 14:00 17:00 | (null) |
| Sapporo | 11:30 14:00 | 14:30 15:00 | (null) | (null) |
| Tiger | 08:00 09:30 | 11:00 12:00 | (null) | (null) |
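As a footnote to the "easier way in v8" mentioned above, here is a hedged sketch (my own, not part of the original answer) of the same row numbering done with ROW_NUMBER() in MySQL 8, reusing the YourQueryHere table:
select
  code_column
, max(case when RowNumber = 1 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol1
, max(case when RowNumber = 2 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol2
, max(case when RowNumber = 3 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol3
, max(case when RowNumber = 4 then concat(data_for_cells1, ' ', data_for_cells2) end) as pivcol4
from (
  select yqh.*
       , row_number() over (partition by code_column
                            order by date_column, data_for_cells1) as RowNumber
  from YourQueryHere yqh
) t
group by code_column
order by code_column;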

Complex query with multiple conditions

The 'complex' part of the title might be subjective, but for me it is rather complex.
I have a table called Contracts (C) and a table called FinancialYears (FY). A contract will have multiple financial years (one per year), created automatically if a specific status is met (for example, cancelled contracts won't get new financial year records, but approved contracts will). It's the FY that has a specific status each year. For example:
--------------------FinancialYears-------------------
ContractID: 1 | 1 | 1
StatusID: 2 | 3 | 5
dStart: 01-01-2012 | 01-01-2013 | 01-01-2014
dEnd: 31-12-2012 | 31-12-2013 | 31-12-2014
Year: 2012 | 2013 | 2014
-----------------------------------------------------
(For example: StatusID (2, 3, 5), (Proposed, Approved, Cancelled))
Now assume a user wants to find out how many contracts are approved at this point in time; the query should then look at the most recent financial year of each contract, and that's what I'm having a hard time with.
I have to write a query that does the following:
SELECT *
FROM Contracts C
INNER JOIN FinancialYears FY ON FY.ContractID = C.ContractID
WHERE StatusID = X AND (dStart < GETDATE() AND dEnd > GETDATE())
-- This would search on the financial year of the contract whose validity period contains today.
But since a cancelled contract, for example, will not get a new financial year the next year, I would never find a contract cancelled in 2014 when querying today, so I somehow need to add the following condition to the query:
-- IF (dStart < GETDATE() AND dEnd > GETDATE()) RETURNS 0 ROWS, THEN DO INSTEAD:
SELECT TOP 1
-- ...
WHERE (dEnd < GETDATE())
ORDER BY dEnd DESC
-- In other words: if there is no ongoing financial year covering today,
-- then select the most recent financial year in the past.
Could anyone help me out here?
Thank you.
Here's a quick mock-up:
SELECT *
FROM Contracts C
cross apply (
    select top 1 FY.*
    from FinancialYears FY
    where FY.ContractID = C.ContractID
      and FY.dStart < GETDATE()
    order by FY.dEnd desc
) F
But you'll probably need some extra criteria to find all of the Contracts, for example customer code or something.
You just need to filter the contracts where the current date falls between the start and end date of contracts that are approved if I'm not mistaken.
Here's a demo SQL Fiddle
MS SQL Server Schema Setup:
CREATE TABLE FinancialYearContracts
([ContractID] int, [StatusID] int, [dStart] datetime, [dEnd] datetime, [Year] int)
;
INSERT INTO FinancialYearContracts
([ContractID], [StatusID], [dStart], [dEnd], [Year])
VALUES
(1, 2, '2012-01-01 00:00:00', '2012-12-31 00:00:00', 2012),
(1, 3, '2013-01-01 00:00:00', '2013-12-31 00:00:00', 2013),
(1, 5, '2014-01-01 00:00:00', '2014-12-31 00:00:00', 2014),
(2, 2, '2013-01-01 00:00:00', '2013-12-31 00:00:00', 2013),
(2, 3, '2014-01-01 00:00:00', '2014-12-31 00:00:00', 2014),
(2, 3, '2015-01-01 00:00:00', '2015-12-31 00:00:00', 2015),
(3, 2, '2014-01-01 00:00:00', '2014-12-31 00:00:00', 2014),
(3, 3, '2015-01-01 00:00:00', '2015-12-31 00:00:00', 2015),
(4, 2, '2014-01-01 00:00:00', '2014-12-31 00:00:00', 2014),
(5, 2, '2013-01-01 00:00:00', '2013-12-31 00:00:00', 2013),
(5, 3, '2014-01-01 00:00:00', '2014-12-31 00:00:00', 2014),
(5, 3, '2015-01-01 00:00:00', '2015-12-31 00:00:00', 2015),
(6, 2, '2013-01-01 00:00:00', '2013-12-31 00:00:00', 2012),
(6, 3, '2014-01-01 00:00:00', '2014-12-31 00:00:00', 2013),
(6, 5, '2015-01-01 00:00:00', '2015-12-31 00:00:00', 2014)
;
Query to generate results:
declare #DateFilter as datetime = GETDATE()
declare #Status as int = 3
SELECT *
FROM FinancialYearContracts
WHERE #DateFilter BETWEEN dStart AND dEnd AND StatusID = #Status
Results:
| CONTRACTID | STATUSID | DSTART | DEND | YEAR |
|------------|----------|--------------------------------|---------------------------------|------|
| 2 | 3 | January, 01 2015 00:00:00+0000 | December, 31 2015 00:00:00+0000 | 2015 |
| 3 | 3 | January, 01 2015 00:00:00+0000 | December, 31 2015 00:00:00+0000 | 2015 |
| 5 | 3 | January, 01 2015 00:00:00+0000 | December, 31 2015 00:00:00+0000 | 2015 |
This shows contracts that are currently in the approved status based on the sample data I put together.
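Combining the APPLY idea from the first answer with the status filter here, a hedged sketch (my own, using the question's Contracts/FinancialYears names) that prefers an ongoing financial year and otherwise falls back to the most recent one in the past, as the question describes:
SELECT C.*, FY.*
FROM Contracts C
OUTER APPLY (
    SELECT TOP 1 F.*
    FROM FinancialYears F
    WHERE F.ContractID = C.ContractID
      AND F.dStart <= GETDATE()      -- only years that have already started
    ORDER BY F.dEnd DESC             -- the ongoing year, if any, has the latest dEnd
) FY
WHERE FY.StatusID = 3;               -- e.g. Approved
Among the years that have already started, the ongoing one (if it exists) has the latest dEnd, so ordering by dEnd DESC gives the fallback behaviour for free.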

postgresql timeclock entry with no matching pair

I have a table:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logtime TIME
, timetype VARCHAR(1)
);
INSERT INTO timeclock VALUES
(1, '2013-01-01', '07:00', 'I'),
(1, '2013-01-01', '07:01', 'I'),
(1, '2013-01-01', '16:00', 'O'),
(1, '2013-01-01', '16:01', 'O'),
(2, '2013-01-01', '07:00', 'I'),
(2, '2013-01-01', '16:00', 'O'),
(1, '2013-01-02', '07:00', 'I'),
(1, '2013-01-02', '16:30', 'O'),
(2, '2013-01-02', '06:30', 'I'),
(2, '2013-01-02', '15:30', 'O'),
(2, '2013-01-02', '16:30', 'I'),
(2, '2013-01-02', '23:30', 'O'),
(3, '2013-01-01', '06:30', 'I'),
(3, '2013-01-02', '16:30', 'O'),
(4, '2013-01-01', '20:30', 'I'),
(4, '2013-01-02', '05:30', 'O'),
(5, '2013-01-01', '20:30', 'O'),
(5, '2013-01-02', '05:30', 'I');
I need to get the time IN and OUT of each employee, disregarding duplicate entries
and identifying orphan entries (those without a matching IN or OUT), so that I can put them in a separate list for notification of missing entries.
So far I have this SQL, which I modified from Peter Larsson's Islands and Gaps solution (link):
WITH cteIslands ( employeeid, timetype, logdate, logtime, grp)
AS ( SELECT employeeid, timetype, logdate, logtime,
ROW_NUMBER()
OVER ( ORDER BY employeeid, logdate, logtime )
- ROW_NUMBER()
OVER ( ORDER BY timetype, employeeid,
logdate, logtime ) AS grp
FROM timeclock
),
cteGrouped ( employeeid, timetype, logdate, logtime )
AS ( SELECT employeeid, MIN(timetype), logdate,
CASE WHEN MIN(timetype) = 'I'
THEN MIN(logtime)
ELSE MAX(logtime)
END AS logtime
FROM cteIslands
GROUP BY employeeid, logdate, grp
)
select * from cteGrouped
order by employeeid, logdate, logtime
The above works fine for removing the duplicate entries, but now I can't seem to get the orphan entries. I think LEAD or LAG could be used for this, but I am new to PostgreSQL. I hope someone here can help me with this.
Edit:
I somehow need to add a new field that tells me which records are orphaned,
something like the table below:
EMPID TYPE LOGDATE LOGTIME ORPHAN_FLAG
1 I 2013-01-01 07:00:00 0
1 O 2013-01-01 16:01:00 0
1 I 2013-01-02 07:00:00 0
1 O 2013-01-02 16:30:00 0
2 I 2013-01-01 07:00:00 0
2 O 2013-01-01 16:00:00 0
2 I 2013-01-02 06:30:00 0
2 O 2013-01-02 15:30:00 0
2 I 2013-01-02 16:30:00 0
2 O 2013-01-02 23:30:00 0
3 I 2013-01-01 06:30:00 0
3 O 2013-01-02 16:30:00 0
4 I 2013-01-01 20:30:00 0
4 O 2013-01-02 05:30:00 0
5 O 2013-01-01 20:30:00 1 <--- NO MATCHING IN
5 I 2013-01-02 05:30:00 1 <--- NO MATCHING OUT
First, I think you should rethink your design a little bit. It makes little sense to record a clock-out entry without having clocked in, and you can use things like partial indexes to ensure that clocked in entries are easy to look up when they don't have a clock out entry.
So I would start by considering moving your storage tables to something like:
CREATE TABLE timeclock(
employeeid INT
, logdate DATE
, logintime TIME
, logouttime time
, timetype VARCHAR(1)
);
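As a sketch of the partial-index idea mentioned above (assuming the columns in this layout), only the still-open rows end up in the index, which keeps the "who is clocked in" lookup cheap:
-- Sketch: index only the rows that have not clocked out yet.
CREATE INDEX timeclock_open_idx
    ON timeclock (employeeid, logdate)
    WHERE logouttime IS NULL;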
The bad news is that if you can't do that, your orphaned-entries report will be quite difficult to make perform well, because you are doing a self join where you hope every row in a large table has a corresponding other entry. This is going to require, at best, two sequential scans on the table, and at worst a sequential scan with a nested loop index scan (assuming proper indexes; the alternative, a nested loop of sequential scans, would be even worse).
Handling rollover between dates (clock in at 11pm, clock out at 2am) will make this problem very hard to avoid.
Now, since you have your CTEs working fine except for orphaned records, my recommendation is to UNION with another query on the same table that looks for the rows your current query does not match properly.
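One possible shape for that, sketched below (my own sketch, not the answer's code): reuse the islands CTEs from the question, then use LEAD/LAG (which the question already mentions) to flag rows whose neighbour is not the opposite type. It assumes that, after deduplication, entries should strictly alternate I/O per employee:
WITH cteIslands AS (
    SELECT employeeid, timetype, logdate, logtime,
           ROW_NUMBER() OVER (ORDER BY employeeid, logdate, logtime)
         - ROW_NUMBER() OVER (ORDER BY timetype, employeeid, logdate, logtime) AS grp
    FROM timeclock
),
cteGrouped AS (
    -- same deduplication step as in the question
    SELECT employeeid, MIN(timetype) AS timetype, logdate,
           CASE WHEN MIN(timetype) = 'I' THEN MIN(logtime) ELSE MAX(logtime) END AS logtime
    FROM cteIslands
    GROUP BY employeeid, logdate, grp
),
flagged AS (
    SELECT employeeid, timetype, logdate, logtime,
           LAG(timetype)  OVER (PARTITION BY employeeid ORDER BY logdate, logtime) AS prev_type,
           LEAD(timetype) OVER (PARTITION BY employeeid ORDER BY logdate, logtime) AS next_type
    FROM cteGrouped
)
SELECT employeeid, timetype, logdate, logtime,
       CASE
           WHEN timetype = 'I' AND (next_type IS NULL OR next_type <> 'O') THEN 1
           WHEN timetype = 'O' AND (prev_type IS NULL OR prev_type <> 'I') THEN 1
           ELSE 0
       END AS orphan_flag
FROM flagged
ORDER BY employeeid, logdate, logtime;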