I've been working on an issue for a few days now, and I can't seem to find the right fix. Does anybody have an idea?
Case
We want to create a new sequence number whenever an employee has been out of employment for more than 1 day. We have the delta between the current employment record and the previous one, so we can check the sequence. We also want to calculate the min(Start) and max(End) across the employment records that are not more than 1 day apart.
Data
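| Employee | Contract | Unit   | Start      | End        | Delta |
|----------|----------|--------|------------|------------|-------|
| John Doe | 1        | Unit A | 2014-01-01 | 2017-12-31 | NULL  |
| John Doe | 2        | Unit A | 2018-02-01 | 2018-12-31 | 31    |
| John Doe | 3        | Unit B | 2019-01-01 | 2020-05-31 | 1     |
| John Doe | 4        | Unit A | 2020-06-01 | NULL       | 1     |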
With the query it should give back:
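| Employee | Contract | Unit   | Start      | End        | Delta | Sequence |
|----------|----------|--------|------------|------------|-------|----------|
| John Doe | 1        | Unit A | 2014-01-01 | 2017-12-31 | NULL  | 1        |
| John Doe | 2        | Unit A | 2018-02-01 | 2018-12-31 | 31    | 2        |
| John Doe | 3        | Unit B | 2019-01-01 | 2020-05-31 | 1     | 2        |
| John Doe | 4        | Unit A | 2020-06-01 | NULL       | 1     | 2        |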
That is because sequence 1 ends on 2017-12-31 and the next record starts in February 2018, so the two records are separated by more than 1 day. The records after that all have sequence 2 because each one continues directly from the previous one.
Query
I've tried a few things already with lag() and lead(), but I keep working myself into a corner with the data sample that I have. When I run it on the full set it won't work.
SELECT
Employee,
Start,
End,
DeltaPrevious,
Delta,
DeltaNext,
case
when DeltaPrevious IS NULL AND Delta = 1 then 1
when DeltaPrevious = 1 AND Delta > 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
when DeltaPrevious > 1 AND Delta = 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
end as Sequence
FROM
Contracts
ORDER BY
Employee, Start ASC
Hope that someone has a great idea.
Thanks,
Basically, you want to use lag() to get the previous date and then do a cumulative sum. This looks like:
select c.*,
       sum(case when prev_end >= dateadd(day, -1, [start]) then 0 else 1
           end) over (partition by employee order by [start]) as ranking
from (select c.*,
             lag([end]) over (partition by employee order by [start]) as prev_end
      from contracts c
     ) c;
You mention that you might want to recalculate the new start and end. You would just use the above as a subquery/CTE and aggregate on employee and ranking.
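To make the lag-plus-cumulative-sum idea concrete, here is a minimal sketch of the same logic in Python rather than SQL, run on the sample rows from the question. The helper names `assign_sequences` and `merge_sequences` are made up for this illustration. One assumption that goes beyond the SQL: an open-ended contract (End is NULL) is treated as making the whole merged period open-ended, whereas a plain `max(end)` in SQL would skip the NULL.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (employee, start, end); end=None means still open.
contracts = [
    ("John Doe", date(2014, 1, 1), date(2017, 12, 31)),
    ("John Doe", date(2018, 2, 1), date(2018, 12, 31)),
    ("John Doe", date(2019, 1, 1), date(2020, 5, 31)),
    ("John Doe", date(2020, 6, 1), None),
]

def assign_sequences(rows):
    """Return (employee, start, end, sequence) tuples (hypothetical helper)."""
    out = []
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    for employee, group in groupby(rows, key=lambda r: r[0]):
        prev_end, seq = None, 0
        for _, start, end in group:
            # Same test as the SQL: stay in the sequence only if
            # prev_end >= start - 1 day; a missing prev_end starts a new one.
            if prev_end is None or prev_end < start - timedelta(days=1):
                seq += 1
            out.append((employee, start, end, seq))
            prev_end = end
    return out

def merge_sequences(sequenced):
    """One entry per (employee, sequence): (min start, max end or None if open)."""
    merged = {}
    for employee, start, end, seq in sequenced:
        key = (employee, seq)
        if key not in merged:
            merged[key] = [start, end]
        else:
            cur = merged[key]
            cur[0] = min(cur[0], start)
            # Assumption: any open-ended contract keeps the merged period open.
            cur[1] = None if end is None or cur[1] is None else max(cur[1], end)
    return {k: tuple(v) for k, v in merged.items()}

sequenced = assign_sequences(contracts)
periods = merge_sequences(sequenced)
```

The running counter `seq` plays the role of the windowed `sum(...)`, and `merge_sequences` corresponds to the outer aggregate on employee and ranking.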
If I understood the definition of Sequence in your second table correctly, you are more interested in DeltaNext than in Delta (DeltaPrevious). Here is an attempt, including the code to create sample input data with two more employees:
CREATE TABLE #input_table (Employee VARCHAR(255), [Contract] INT, Unit VARCHAR(6), [Start] DATE, [End] DATE)
INSERT INTO #input_table
VALUES
('John Doe', 1, 'Unit A', '2014-01-01', '2017-12-31'),
('John Doe', 2, 'Unit A', '2018-02-01', '2018-12-31'),
('John Doe', 3, 'Unit B', '2019-01-01', '2020-05-31'),
('John Doe', 4, 'Unit A', '2020-06-01', NULL),
('Alice', 1, 'Unit A', '2020-01-01', NULL),
('Bob', 1, 'Unit C', '2020-01-01', '2020-02-20')
First we create the deltas:
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee
ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
INTO #cte_delta -- I'll create a CTE at the end
FROM #input_table
Then we define Sequence:
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
INTO #cte_sequence
FROM #cte_delta
We then group the same Sequences by assigning a unique ROW_NUMBER for each employee with consecutive/ same Sequences:
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
INTO #cte_grp
FROM #cte_sequence
Finally we calculate the min and max of the contract duration:
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End])
OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM #cte_grp
The COUNT(*) and COUNT([End]) comparison is necessary because MAX ignores NULLs: without it, ContractEnd would be the largest non-NULL End in the group (2020-05-31 for John Doe's still-open second period) instead of NULL.
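The effect of that guard can be shown outside the database with a small Python sketch. The function name `contract_end` is hypothetical, and dates are kept as ISO strings only to keep the example short.

```python
def contract_end(ends):
    """Latest end date of a group, or None if any contract in it is still open."""
    known = [e for e in ends if e is not None]
    # Mirrors COUNT(*) = COUNT([End]): only trust the max when nothing is NULL.
    if ends and len(known) == len(ends):
        return max(known)
    return None

# End dates of John Doe's contracts 2 to 4; the last one is still open.
group = ["2018-12-31", "2020-05-31", None]

naive_max = max(e for e in group if e is not None)  # what a plain MAX([End]) sees
guarded = contract_end(group)
```

Here `naive_max` is the misleading closed date, while `guarded` stays `None`, which is the behaviour the CASE expression is after.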
The whole code with CTEs here:
WITH cte_delta AS (
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
FROM #input_table
)
, cte_sequence AS (
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
FROM cte_delta
)
, cte_grp AS (
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
FROM cte_sequence
)
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End]) OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM cte_grp
Here is the output:
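| Employee | Contract | Unit   | Start      | End        | DeltaPrev | DeltaNext | Sequence | GRP | ContractStart | ContractEnd |
|----------|----------|--------|------------|------------|-----------|-----------|----------|-----|---------------|-------------|
| Alice    | 1        | Unit A | 2020-01-01 | NULL       | NULL      | NULL      | 2        | 0   | 2020-01-01    | NULL        |
| Bob      | 1        | Unit C | 2020-01-01 | 2020-02-20 | NULL      | NULL      | 2        | 0   | 2020-01-01    | 2020-02-20  |
| John Doe | 1        | Unit A | 2014-01-01 | 2017-12-31 | NULL      | 32        | 1        | 0   | 2014-01-01    | 2017-12-31  |
| John Doe | 2        | Unit A | 2018-02-01 | 2018-12-31 | 32        | 1         | 2        | 1   | 2018-02-01    | NULL        |
| John Doe | 3        | Unit B | 2019-01-01 | 2020-05-31 | 1         | 1         | 2        | 1   | 2018-02-01    | NULL        |
| John Doe | 4        | Unit A | 2020-06-01 | NULL       | 1         | NULL      | 2        | 1   | 2018-02-01    | NULL        |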
Feel free to select DISTINCT records according to your needs.
I need to get all dates between DATE_FROM and DATE_TO of every ID of table LEAVE excluding weekends, work suspensions and holidays. Considering this record (ID, DATE_FROM, DATE_TO):
001 04-OCT-2018 09-OCT-2018
002 05-OCT-2018 05-OCT-2018
...
n 01-OCT-2018 05-OCT-2018
I need to get all the dates between those ranges in this format (ID, DAY_TOKEN):
001 04-OCT-2018
001 05-OCT-2018
001 08-OCT-2018
001 09-OCT-2018
002 05-OCT-2018
...
n 01-OCT-2018
n 02-OCT-2018
n 03-OCT-2018
n 04-OCT-2018
n 05-OCT-2018
I am using this query modified from the queries I found:
SELECT ID, a.date_from + rnum - 1 AS day_token
FROM (SELECT a.ID, a.date_from, a.date_to, ROWNUM AS rnum
FROM all_objects, leave a
-- Aside from ALL_OBJECT, I cross join it with my LEAVE table
WHERE ROWNUM <= a.date_to - a.date_from + 1) a
WHERE TO_CHAR (a.date_from + rnum - 1, 'DY') NOT IN ('SAT', 'SUN');
AND NOT EXISTS (SELECT 1
FROM holiday b
WHERE b.schedule = d.date_from + rnum - 1)
AND NOT EXISTS (SELECT 1
FROM suspension c
WHERE c.schedule = d.date_from + rnum - 1)
The problem is that only the first record expands properly; the other records are not included in the result set unless their DATE_FROM and DATE_TO are the same date.
I want to avoid using a PL/SQL function as much as possible, but if it's impossible to get the result set I need without one, please at least tell me why.
Here's how to create as many rows for each ID as there are days between FROM and TO dates, without weekends (Saturdays and Sundays):
SQL> with leave (id, date_from, date_to) as
2 (select '001', date '2018-10-04', date '2018-10-09' from dual union all
3 select '002', date '2018-10-05', date '2018-10-05' from dual union all
4 select '003', date '2018-10-02', date '2018-10-08' from dual
5 ),
6 inter as
7 (select l.id,
8 l.date_from + column_value - 1 datum,
9 to_char(l.date_from + column_value - 1, 'day') day
10 from leave l,
11 table(cast(multiset(select level from dual
12 connect by level <= l.date_to - l.date_from + 1
13 ) as sys.odcinumberlist))
14 )
15 select id, datum
16 from inter
17 where to_char(datum, 'dy') not in ('sat', 'sun');
ID DATUM
--- -----------
001 04-oct-2018
001 05-oct-2018
001 08-oct-2018
001 09-oct-2018
002 05-oct-2018
003 02-oct-2018
003 03-oct-2018
003 04-oct-2018
003 05-oct-2018
003 08-oct-2018
10 rows selected.
SQL>
As line 18 (and so forth), add additional conditions (remove holidays, suspensions, whatever).
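For readers who want to check the expected rows independently of Oracle, here is a minimal Python sketch of the same expansion: one row per day from DATE_FROM to DATE_TO inclusive, skipping Saturdays, Sundays and any excluded dates. The function name `working_days` and the holiday date are assumptions made for the illustration; the leave ranges come from the question.

```python
from datetime import date, timedelta

def working_days(date_from, date_to, excluded=frozenset()):
    """Dates in [date_from, date_to] that are weekdays and not in `excluded`."""
    days = []
    current = date_from
    while current <= date_to:
        # weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        if current.weekday() < 5 and current not in excluded:
            days.append(current)
        current += timedelta(days=1)
    return days

# Leave ranges taken from the question.
leave = {
    "001": (date(2018, 10, 4), date(2018, 10, 9)),
    "002": (date(2018, 10, 5), date(2018, 10, 5)),
}

# Hypothetical holiday, standing in for the HOLIDAY and SUSPENSION tables.
holidays = {date(2018, 10, 8)}

rows = [(leave_id, day)
        for leave_id, (start, end) in leave.items()
        for day in working_days(start, end)]
rows_without_holidays = [(leave_id, day)
                         for leave_id, (start, end) in leave.items()
                         for day in working_days(start, end, holidays)]
```

The key point is the same as in the SQL: the day counter is generated per leave record, not once for the whole table.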
I have the following dataset:
Here is the script for this data:
;with dataset AS (
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-07' AS DATE) AS CUT_DATE
UNION
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-15' AS DATE) AS CUT_DATE
UNION
select 'EMP02' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-09' AS DATE) AS CUT_DATE
)
select *
from dataset
I need to split these periods (PERIOD_START to PERIOD_END) at each CUT_DATE, excluding the cut dates themselves from the resulting periods. There can be any number of cut dates (3, 5, 8, etc.).
The expected result for the dataset above is:
If your version of SQL Server supports LAG, you can use this.
SELECT EMPLOYEE_ID,
ITEM_TYPE,
MIN(APPLY_DATE) AS STARTDATE,
MAX(APPLY_DATE) AS ENDDATE
FROM
(SELECT T.*,
SUM(CASE WHEN PREV_TYPE=ITEM_TYPE THEN 0 ELSE 1 END)
OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS GRP
FROM (SELECT D.*,
LAG(ITEM_TYPE) OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS PREV_TYPE
FROM DATA D
) T
) T
WHERE ITEM_TYPE IN ('Sickness','Vacation')
GROUP BY EMPLOYEE_ID,ITEM_TYPE,GRP
The logic is to get the previous row's item_type (based on ascending order of apply_date) and compare it with the current row's value. If they are equal, they belong to the same group. Else you start a new group. This is done in the sum window function. After groups are assigned, you just need to get the max and min date for an employee_id,item_type.
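The same compare-with-previous logic can be sketched in Python for a single employee. The rows below are hypothetical sample data, and `islands` is a made-up helper name; a new group starts whenever the item type differs from the previous row, which is what the running sum over the 0/1 flag achieves in the SQL.

```python
from datetime import date

# Hypothetical rows for one employee, already sorted by apply_date.
rows = [
    (date(2017, 5, 23), "Sickness"),
    (date(2017, 5, 24), "Sickness"),
    (date(2017, 5, 25), "Worked"),
    (date(2017, 5, 26), "Vacation"),
    (date(2017, 5, 29), "Vacation"),
    (date(2017, 5, 30), "Worked"),
    (date(2017, 6, 1), "Sickness"),
]

def islands(rows):
    """Collapse consecutive rows of the same item type into (type, start, end)."""
    result = []
    prev_type = None
    for apply_date, item_type in rows:
        if item_type != prev_type:
            # Type changed (or first row): open a new group.
            result.append([item_type, apply_date, apply_date])
        else:
            # Same type as the previous row: extend the current group.
            result[-1][2] = apply_date
        prev_type = item_type
    return [tuple(r) for r in result]

# Keep only absences, like the WHERE clause in the query.
absences = [g for g in islands(rows) if g[0] in ("Sickness", "Vacation")]
```

Note that the filter is applied after grouping, so the "Worked" rows still separate the two Sickness groups.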
You would use the LAG function.
If you order by something, the LAG function gives the previous value;
a full description can be found at: http://www.sqlservercentral.com/articles/T-SQL/106783/
Take a look at vkp's answer for a full query
This is another way, if lag is supported.
with tbl as
(select d.*
,case when (item_type = lag(item_type) over (partition by employee_id order by apply_date))
then 0
else 1
end grp_tmp
from DATA2 d
where
item_type <> 'Worked'
)
,tbl2 as
(select t.*
,sum(grp_tmp) over (order by employee_id,apply_date
rows between unbounded preceding and current row
)
as grp
from tbl t
)
select
EMPLOYEE_ID
,ITEM_TYPE
,(CONVERT(VARCHAR(24),min(apply_date),103)
+' - '
+CONVERT(VARCHAR(24),max(apply_date),103)
) as range
from tbl2
group by EMPLOYEE_ID,
ITEM_TYPE
,grp
order by
employee_id
,min(apply_date);
Output
+-------------+-----------+-------------------------+
| EMPLOYEE_ID | ITEM_TYPE | range |
+-------------+-----------+-------------------------+
| 1 | Sickness | 23/05/2017 - 24/05/2017 |
| 1 | Vacation | 26/05/2017 - 29/05/2017 |
| 1 | Sickness | 01/06/2017 - 01/06/2017 |
| 2 | Sickness | 25/05/2017 - 30/05/2017 |
+-------------+-----------+-------------------------+
Currently I have data in a table as shown below:
date id value
1-Jan-13 1 100
2-Jan-13 1 100
3-Jan-13 1 100
4-Jan-13 1 200
5-Jan-13 1 200
6-Jan-13 1 100
7-Jan-13 1 100
I am trying to group consecutive records by id and value, and to version each group with a start date and an end date.
Desired output:
start date end date id value
1-Jan-13 3-Jan-13 1 100
4-Jan-13 5-Jan-13 1 200
6-Jan-13 7-Jan-13 1 100
I'm not an expert in Teradata, but since windowing functions are supported (specifically ROW_NUMBER), you should most likely be able to do something like this:
SELECT MIN(date) start_date, MAX(date) end_date, id, value
FROM
(
SELECT date, id, value,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) -
ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date) island
FROM table1
) q
GROUP BY id, value, island
ORDER BY start_date, end_date
Sample output:
| START_DATE | END_DATE | ID | VALUE |
|------------|------------|----|-------|
| 2013-01-01 | 2013-01-03 | 1 | 100 |
| 2013-01-04 | 2013-01-05 | 1 | 200 |
| 2013-01-06 | 2013-01-07 | 1 | 100 |
Here is a SQLFiddle demo (it's a SQL Server demo, but it should work as expected in Teradata)
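Why the difference of the two ROW_NUMBER values identifies an island may not be obvious, so here is a small Python sketch that computes both counters by hand on the sample data from the question. The function name `islands_by_row_number` is made up for the illustration.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: (date, id, value).
table1 = [
    (date(2013, 1, 1), 1, 100),
    (date(2013, 1, 2), 1, 100),
    (date(2013, 1, 3), 1, 100),
    (date(2013, 1, 4), 1, 200),
    (date(2013, 1, 5), 1, 200),
    (date(2013, 1, 6), 1, 100),
    (date(2013, 1, 7), 1, 100),
]

def islands_by_row_number(rows):
    """Group rows by (id, value, rn_by_id - rn_by_id_value) and take min/max date."""
    rn_id = defaultdict(int)        # ROW_NUMBER() OVER (PARTITION BY id ORDER BY date)
    rn_id_value = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY date)
    groups = {}
    for day, id_, value in sorted(rows, key=lambda r: (r[1], r[0])):
        rn_id[id_] += 1
        rn_id_value[(id_, value)] += 1
        # Within a run of equal values both counters advance together,
        # so their difference stays constant until the value changes.
        island = rn_id[id_] - rn_id_value[(id_, value)]
        key = (id_, value, island)
        start, _ = groups.get(key, (day, day))
        groups[key] = (start, day)
    return sorted((s, e, k[0], k[1]) for k, (s, e) in groups.items())

result = islands_by_row_number(table1)
```

For this data the differences are 0 for the first run of 100, 3 for the run of 200 and 2 for the second run of 100, so the two runs of 100 end up in different groups even though they share id and value.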
The ROW_NUMBER version can be further simplified: modified SQL Fiddle
For Teradata:
SELECT
id,val,MIN(dt),MAX(dt)
FROM
(
SELECT
id,val,dt,
dt - ROW_NUMBER() OVER (PARTITION BY id ORDER BY val, dt) AS dummy
FROM table1
) AS dt
GROUP BY 1,2,dummy
And there are some little-known functions in TD13.10 for processing time series data:
WITH cte(id,val,pd) AS
(
SELECT id, val, PERIOD(dt, dt+1) AS pd
FROM table1
)
SELECT
id, val,
BEGIN(pd) AS start_dt,
LAST(pd) AS end_dt
FROM
TABLE (TD_NORMALIZE_MEET
(NEW VARIANT_TYPE(cte.id,cte.val)
,cte.pd)
RETURNS (id INTEGER
,val INTEGER
,pd PERIOD(DATE)
,Nrm_Count INTEGER)
HASH BY id
LOCAL ORDER BY id, val, pd
) A
ORDER BY start_dt, end_dt
I've got a bit of a messy table on my hands that has two fields, a date field and a time field that are both strings. What I need to do is get the minimum date from those fields, or just the record itself if there is no date/time attached to it. Here's some sample data:
ID First Last Date Time
1 Joe Smith 2013-09-06 04:00
1 Joe Smith 2013-09-06 02:00
2 Jack Jones
3 John Jack 2013-09-05 06:00
3 John Jack 2013-09-15 15:00
What I would want from a query is to get the following:
ID First Last Date Time
1 Joe Smith 2013-09-06 02:00
2 Jack Jones
3 John Jack 2013-09-05 06:00
That is the min date/time for IDs 1 and 3, and then just ID 2 as-is because he doesn't have a date/time. I came up with the following query, which gives me IDs 1 and 3 exactly as I want them:
SELECT *
FROM test as t
where
cast(t.date + ' ' + t.time as Datetime ) = (select top 1 cast(p.date + ' ' + p.time as Datetime ) as dtime from test as p where t.ID = p.ID order by dtime)
But it doesn't return row number 2 at all. I imagine there's a better way to go about doing this. Any ideas?
You can do this with row_number():
select ID, First, Last, Date, Time
from (select t.*,
row_number() over (partition by id order by date, time) as seqnum
from test t
) t
where seqnum = 1;
Although storing dates and times as strings is not recommended, you at least do it right. The values use the ISO standard format (or close enough) so alphabetic sorting is the same as date/time sorting.
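The claim about sort order is easy to check. The short Python sketch below sorts the question's date/time strings as text and as parsed timestamps, and contrasts that with a day-first format where the two orders diverge; the day-first strings are made-up examples.

```python
from datetime import datetime

# "yyyy-mm-dd hh:mm" strings built from the question's Date and Time columns.
stamps = ["2013-09-06 04:00", "2013-09-06 02:00",
          "2013-09-15 15:00", "2013-09-05 06:00"]

as_text = sorted(stamps)
as_time = sorted(stamps, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M"))
iso_orders_match = as_text == as_time

# Hypothetical dd-mm-yyyy strings: text order no longer follows time order.
day_first = ["06-09-2012", "05-10-2013"]
day_first_text = sorted(day_first)
day_first_time = sorted(day_first, key=lambda s: datetime.strptime(s, "%d-%m-%Y"))
day_first_orders_match = day_first_text == day_first_time
```

Because the most significant part (the year) comes first and every field is zero-padded, comparing the strings character by character gives the same result as comparing the timestamps.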
Assuming [Date] and [Time] are the types I think they are, and not strings:
SELECT ID,[First],[Last],[Date],[Time] FROM
(
SELECT ID,[First],[Last],[Date],[Time],rn = ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY [Date], [Time])
FROM dbo.test
) AS t WHERE rn = 1;
Example:
DECLARE @x TABLE
(
ID INT,
[First] VARCHAR(32),
[Last] VARCHAR(32),
[Date] DATE,
[Time] TIME(0)
);
INSERT @x VALUES
(1,'Joe ','Smith','2013-09-06','04:00'),
(1,'Joe ','Smith','2013-09-06','02:00'),
(2,'Jack','Jones',NULL, NULL ),
(3,'John','Jack ','2013-09-05','06:00'),
(3,'John','Jack ','2013-09-15','15:00');
SELECT ID,[First],[Last],[Date],[Time] FROM
(
SELECT ID, [First],[Last],[Date],[Time],rn = ROW_NUMBER()
OVER (PARTITION BY ID ORDER BY [Date], [Time])
FROM @x
) AS x WHERE rn = 1;
Results:
ID First Last Date Time
-- ----- ----- ---------- --------
1 Joe Smith 2013-09-06 02:00:00
2 Jack Jones NULL NULL
3 John Jack 2013-09-05 06:00:00
Try:
SELECT
*
FROM
test as t
WHERE
CAST(t.date + ' ' + t.time as Datetime) =
(
select top 1 cast(p.date + ' ' + p.time as Datetime ) as dtime
from test as p
where t.ID = p.ID
order by dtime
)
OR (t.date='' AND t.time='')