Pick right Effective Dates with backdated and future dates scenario - sql

Scenario 1:
Table as,
IF OBJECT_ID('TEMPDB..#RUN_ID') IS NOT NULL
DROP TABLE #RUN_ID
;WITH RUN_ID as (
SELECT 1 AS RUN_ID,1 AS EMP_ID, '1/1/2018' STARTDT, 'A' AS VALUE
UNION
SELECT 2 AS RUN_ID,1 AS EMP_ID, '2/1/2018' STARTDT, 'A' AS VALUE
UNION
SELECT 3 AS RUN_ID,1 AS EMP_ID, '12/1/2017' STARTDT, 'A' AS VALUE
UNION
SELECT 4 AS RUN_ID,1 AS EMP_ID, '3/1/2018' STARTDT, 'A' AS VALUE
UNION
SELECT 5 AS RUN_ID,1 AS EMP_ID, '2/1/2018' STARTDT, 'A' AS VALUE
)
SELECT * INTO #RUN_ID from RUN_ID
RUN_ID EMP_ID STARTDT VALUE
1 1 1/1/2018 A
2 1 2/1/2018 A
3 1 12/1/2017 A
4 1 3/1/2018 A
5 1 2/1/2018 A
RUN_ID is every day incremental value in the table. VALUE Column can be same or different.
Need to derive the result for the STARTDT's as below:
RUN_ID EMP_ID STARTDT VALUE
3 1 12/1/2017 A
5 1 2/1/2018 A
Note: The last records of RUN ID 5 is over writing all other records, where StartDt of 2/1/2018 record to be in target and RUN ID 3 should be in result, as its over writing the previous RUN ID StartDT
Scenario2:
RUN_ID EMP_ID STARTDT VALUE
1 1 1/1/2018 A
2 1 11/1/2017 A
3 1 12/1/2017 A
4 1 3/1/2018 A
5 1 2/1/2018 A
In this case, result should be
RUN_ID EMP_ID STARTDT VALUE
2 1 11/1/2017 A
3 1 12/1/2017 A
5 1 2/1/2018 A

IF OBJECT_ID('TEMPDB..#RUN_ID') IS NOT NULL
DROP TABLE #RUN_ID
;WITH RUN_ID as (
SELECT 1 AS RUN_ID,1 AS EMP_ID, CAST('1/1/2018' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 2 AS RUN_ID,1 AS EMP_ID, CAST('11/1/2017' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 3 AS RUN_ID,1 AS EMP_ID, CAST('12/1/2017' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 4 AS RUN_ID,1 AS EMP_ID, CAST('3/1/2018' AS DATE) STARTDT, 'A' AS VALUE
UNION
SELECT 5 AS RUN_ID,1 AS EMP_ID, CAST('2/1/2018' AS DATE) STARTDT, 'A' AS VALUE
)
SELECT * INTO #RUN_ID from RUN_ID
SELECT *
FROM (
SELECT *
,LAG(STARTDT) OVER (PARTITION BY EMP_ID ORDER BY RUN_ID DESC) LAG_DATE
,CASE WHEN LAG(STARTDT) OVER (PARTITION BY EMP_ID ORDER BY RUN_ID DESC) IS NULL THEN 0
WHEN STARTDT < LAG(STARTDT) OVER (PARTITION BY EMP_ID ORDER BY RUN_ID DESC) THEN 0 ELSE 1 END SCD_IND
FROM (
SELECT *
, RANK() OVER (PARTITION BY EMP_ID,STARTDT ORDER BY RUN_ID DESC) RN
FROM #RUN_ID
) A
WHERE A.RN=1
) A WHERE SCD_IND=0
RUN_ID EMP_ID STARTDT VALUE RN LAG_DATE SCD_IND
5 1 2018-02-01 A 1 NULL 0
3 1 2017-12-01 A 1 2018-03-01 0
2 1 2017-11-01 A 1 2017-12-01 0

Related

How can I create a "start" "end" time table from a timestamp list

I am trying to create a view that displays the time of employee stamps.
This is what the table looks like now:
Person
Person_Number
Date
Stamp_number
Time_Stamp
Paul
1
22-10-24
1
8:00
Paul
1
22-10-24
2
10:00
Paul
1
22-10-24
3
10:30
Paul
1
22-10-24
4
12:00
Jimmy
2
22-10-23
1
9:00
Jimmy
2
22-10-23
2
11:00
Jimmy
2
22-10-23
3
12:00
And I would like it to look like this using only a select query
Person
Person_Number
Date
Start
End
Duration
Paul
1
22-10-24
8:00
10:00
2:00
Paul
1
22-10-24
10:30
12:00
1:30
Jimmy
2
22-10-23
9:00
11:00
2:00
Jimmy
1
22-10-23
12:00
null
null
Is it possible ?
We can use conditional aggregation along with a ROW_NUMBER trick:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Person_Number, Date
ORDER BY Stamp_number) - 1 rn
FROM yourTable
)
SELECT Person, Person_Number, Date,
MAX(CASE WHEN rn % 2 = 0 THEN Time_Stamp END) AS [Start],
MAX(CASE WHEN rn % 2 = 1 THEN Time_Stamp END) AS [End],
DATEDIFF(MINUTE,
MAX(CASE WHEN rn % 2 = 0 THEN Time_Stamp END),
MAX(CASE WHEN rn % 2 = 1 THEN Time_Stamp END)) AS Duration
FROM cte
GROUP BY Person, Person_Number, Date, rn / 2
ORDER BY 2, 4;
Here is a working demo.
Try the following:
SELECT Person, Person_Number, Date, [Start], [End],
CONVERT(TIME(0), CONVERT(DATETIME, [End]) - CONVERT(DATETIME, [Start])) AS Duration
FROM
(
SELECT Person, Person_Number, Date, MIN(Time_Stamp) AS [Start],
CASE
WHEN MAX(Time_Stamp) <> MIN(Time_Stamp)
THEN MAX(Time_Stamp)
END AS [End] /* To select End as null when there is no End for a Start */
FROM table_name
GROUP BY Person, Person_Number, Date, (Stamp_number+1)/2
) T
ORDER BY Person_Number, Date, [Start]
See a demo.

Find max of total days using Partition By [duplicate]

This question already has answers here:
Merge overlapping date intervals
(9 answers)
Closed 4 months ago.
Need SQL Query to get a max of the sum of Days where enddate is equal to startdate
.Below is a table
ID
StartDate
EndDate
Days
121
01-01-2022
01-03-2022
2
121
01-03-2022
01-04-2022
1
121
01-04-2022
01-06-2022
2
121
01-07-2022
01-08-2022
1
121
01-08-2022
01-09-2022
1
In the above table, the 01-01-2022 to 01-06-2022 sum is 5 which is greater than the sum of 2 from 01-07-2022 to 01-09-2022.
Output required
ID
Days
121
5
I have written as per my understanding. correct me if I did worse
with table as (
select 121 id, "12-28-2021" startdate, "12-30-2021" enddate, 2 days
union all
select 121 id, "12-30-2021" startdate, "12-31-2021" enddate, 1 days
union all
select 121 id, "01-01-2022" startdate, "01-03-2022" enddate, 2 days
union all
select 121 id, "01-03-2022" startdate, "01-04-2022" enddate, 1 days
union all
select 121 id, "01-04-2022" startdate, "01-06-2022" enddate, 2 days
union all
select 121 id, "01-07-2022" startdate, "01-08-2022" enddate, 1 days
union all
select 121 id, "01-08-2022" startdate, "01-09-2022" enddate, 1 days
)
select
table3.id,
max(days) days
from
(
select
table2.id,
sum(days) days
from
(
select
table1.*,
case
when sum(x) over(rows between unbounded preceding and 1 preceding) is null then 0
else sum(x) over(rows between unbounded preceding and 1 preceding)
end as y
from
(
select
table.*,
case
when enddate = lead(startdate) over(order by startdate) then 0
else 1
end as x
from table
) table1
) table2
group by id,y
) table3
group by id

BigQuery - Picking latest not null value within 28 interval

I'm trying to add a column on this table and stuck for a little while
ID
Category 1
Date
Data1
A
1
2022-05-30
21
B
2
2022-05-21
15
A
2
2022-05-02
33
A
1
2022-02-11
3
B
2
2022-05-01
19
A
1
2022-05-15
null
A
1
2022-05-20
11
A
2
2022-04-20
22
to
ID
Category 1
Date
Data1
Picked_Data
A
1
2022-05-30
21
11
B
2
2022-05-21
15
19
A
2
2022-05-02
33
22
A
1
2022-02-11
3
some number or null
B
2
2022-05-01
19
some number or null
A
1
2022-05-15
null
some number or null
A
1
2022-05-20
11
some number or null
A
2
2022-04-20
22
some number or null
The logic is to partition by Category1 and ID then pick the latest none null value within the past 28 days. If there is no data exist, it'll be null
For the first row, ID = A and Category 1, it will pick 7th row as they are in the same category, ID and the date difference is <= 28. It skipped row 4th and 6th as the date is too far back and null value.
I've tried querying this by
select first_value(Data1) over (partition bty Category1 order by case when Data1 is not null and Date between Date - Inteverval 28 DAY and Date then 1 else 2) as Picked_Data
but it's picking incorrect rows,my guess is this query
Date between Date - Inteverval 28 DAY and Date
is not picking the correct date.. could anyone give me advise/suggestion how I could twick this query?
Consider below approach
select *,
first_value(data1 ignore nulls) over past_28_days as picked_data
from your_table
window past_28_days as (
partition by id, category_1
order by unix_date(date)
range between 29 preceding and 1 preceding
)
if applied to sample data in your question - output is
Consider below approach:
with sample_data as (
select 'A' as ID, 1 as category_1, date('2022-05-30') as date, 21 as data1,
union all select 'B' as ID, 2 as category_1, date('2022-05-21') as date, 15 as data1,
union all select 'A' as ID, 2 as category_1, date('2022-05-02') as date, 33 as data1,
union all select 'A' as ID, 1 as category_1, date('2022-02-11') as date, 3 as data1,
union all select 'B' as ID, 2 as category_1, date('2022-05-01') as date, 19 as data1,
union all select 'A' as ID, 1 as category_1, date('2022-05-15') as date, NULL as data1,
union all select 'A' as ID, 1 as category_1, date('2022-05-20') as date, 11 as data1,
union all select 'A' as ID, 2 as category_1, date('2022-04-20') as date, 22 as data1,
),
with_next_data as (
select *,
lag(date) over (partition by ID,category_1 order by date) as next_date,
lag(data1) over (partition by ID,category_1 order by date) as next_data,
from sample_data
)
select
id,
category_1,
date,
data1,
if(date_diff(date, next_date,day) <= 28, next_data, null) as picked_data
from with_next_data
Output:

Creating a new RANK based on delta of previous row

I've been working on an issue for a few days now, and I can't seem to find the right fix. Does anybody have an idea?
Case
We want to create a new a new sequence number whenever an employee has resigned for more than 1 day. We have the delta of the current employment record and the previous, so we can check the sequence. We want to calculate the min(Start) and max(End) of each employment record which isn't separated more than 1 day apart.
Data
Employee
Contract
Unit
Start
End
Delta
John Doe
1
Unit A
2014-01-01
2017-12-31
NULL
John Doe
2
Unit A
2018-02-01
2018-12-31
31
John Doe
3
Unit B
2019-01-01
2020-05-31
1
John Doe
4
Unit A
2020-06-01
NULL
1
With the query it should give back:
Employee
Contract
Unit
Start
End
Delta
Sequence
John Doe
1
Unit A
2014-01-01
2017-12-31
NULL
1
John Doe
2
Unit A
2018-02-01
2018-12-31
31
2
John Doe
3
Unit B
2019-01-01
2020-05-31
1
2
John Doe
4
Unit A
2020-06-01
NULL
1
2
That is because sequence 1 end at 31-12-2017, and a new one starts in February of 2018, so there has been more than 1 day of separation between the records. The following all have a sequence of 2 because it is continuing.
Query
I've tried a few things already with lag() and lead(), but I keep working myself into a corner with the data sample that I have. When I run it on the full set it won't work.
SELECT
Employee,
Start,
End,
DeltaPrevious,
Delta,
DeltaNext,
case
when DeltaPrevious IS NULL AND Delta = 1 then 1
when DeltaPrevious = 1 AND Delta > 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
when DeltaPrevious > 1 AND Delta = 1 then min(Contract) OVER (PARTITION BY Employee ORDER BY Contract ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
end as Sequence
FROM
Contracts
ORDER BY
Employee, Start ASC
Hope that someone has a great idea.
Thanks,
Basically, you want to use lag() to get the previous date and then do a cumulative sum. This looks like:
select c.*,
sum(case when prev_end >= dateadd(day, -1, start) then 0 else 1
end) over (partition by employee order by start) as ranking
from (select c.*,
lag(end) over (partition by employee order by start) as prev_end
from contracts c
) c;
You mention that you might want to recalculate the new start and end. You would just use the above as a subquery/CTE and aggregate on employee and ranking.
If I understood correctly from the definition of Sequence in your second table, you are more interested in the DeltaNext than in the Delta(Previous). Here an attempt, including the code to create a sample input date with two more employees:
CREATE TABLE #input_table (Employee VARCHAR(255), [Contract] INT, Unit VARCHAR(6), [Start] DATE, [End] DATE)
INSERT INTO #input_table
VALUES
('John Doe', 1, 'Unit A', '2014-01-01', '2017-12-31'),
('John Doe', 2, 'Unit A', '2018-02-01', '2018-12-31'),
('John Doe', 3, 'Unit B', '2019-01-01', '2020-05-31'),
('John Doe', 4, 'Unit A', '2020-06-01', NULL),
('Alice', 1, 'Unit A', '2020-01-01', NULL),
('Bob', 1, 'Unit C', '2020-01-01', '2020-02-20')
First we create the deltas:
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee
ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
INTO #cte_delta -- I'll create a CTE at the end
FROM #input_table
Then we define Sequence:
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
INTO #cte_sequence
FROM #cte_delta
We then group the same Sequences by assigning a unique ROW_NUMBER for each employee with consecutive/ same Sequences:
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
INTO #cte_grp
FROM #cte_sequence
Finally we calculate the min and max of the contract duration:
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End])
OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM cte_grp
The COUNT(*) and COUNT([End]) comparison is necessary or else the ContractEnd would be the max non-NULL value, i.e. 2018-02-01.
The whole code with CTEs here:
WITH cte_delta AS (
SELECT *
, DeltaPrev = DATEDIFF(DAY, LAG([End], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]), [Start]) -- Not relevant (?)
, DeltaNext = DATEDIFF(DAY, [End], LEAD([Start], 1, NULL) OVER(PARTITION BY Employee ORDER BY [Start]))
FROM #input_table
)
, cte_sequence AS (
SELECT *
, [Sequence] = CASE WHEN DeltaNext > 1 THEN 1 ELSE 2 END
FROM cte_delta
)
, cte_grp AS (
SELECT *
, GRP = ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY [Start]) - ROW_NUMBER() OVER(PARTITION BY Employee, [Sequence] ORDER BY [Start])
FROM cte_sequence
)
SELECT *
, MIN([Start]) OVER(PARTITION BY Employee, GRP) AS ContractStart
, CASE WHEN COUNT(*) OVER(PARTITION BY Employee, GRP) = COUNT([End]) OVER(PARTITION BY Employee, GRP) THEN MAX([End]) OVER(PARTITION BY Employee, GRP) ELSE NULL END AS ContractEnd
FROM cte_grp
Here the output:
Employee
Contract
Unit
Start
End
DeltaPrev
DeltaNext
Sequence
GRP
ContractStart
ContractEnd
Alice
1
Unit A
2020-01-01
NULL
NULL
NULL
2
0
2020-01-01
NULL
Bob
1
Unit C
2020-01-01
2020-02-20
NULL
NULL
2
0
2020-01-01
2020-02-20
John Doe
1
Unit A
2014-01-01
2017-12-31
NULL
32
1
0
2014-01-01
2017-12-31
John Doe
2
Unit A
2018-02-01
2018-12-31
32
1
2
1
2018-02-01
NULL
John Doe
3
Unit B
2019-01-01
2020-05-31
1
1
2
1
2018-02-01
NULL
John Doe
4
Unit A
2020-06-01
NULL
1
NULL
2
1
2018-02-01
NULL
Feel free to select DISTINCT records according to your needs.

Insert the table data based on grouping of two columns

I have a oracle table with the following format,
For eg:
JLID Dcode SID TDT QTY
8295783 3119255 9842 3/5/2018 14
8269771 3119255 9842 3/6/2018 11
8302211 3119255 1126 3/1/2018 19
Here I have different SID for the same Dcode, now I need to get the SID with the maximum Qty. (i.e) for SID 9842 - (14+11)=25, for SID 1126 it is 19, then the results should be on SID 9842. So, our query should returns the following results
JLID Dcode START_DT END_DT SID
111 3119255 3/1/2018 3/31/2018 12:00 9842
Startdate and enddate should be calculated from TDT (i.e) start date is the first date of the month and the end date is the last date of the month
Can anyone please suggest me some ideas to do it.
It might be as simple as this:
SELECT Dcode, start_date, end_date, SID FROM (
SELECT Dcode, SID, TRUNC(start_date, 'MONTH') AS start_date
, LAST_DAY(end_date) AS end_date
, ROW_NUMBER() OVER ( PARTITION BY Dcode ORDER BY total_qty DESC ) AS rn
FROM (
SELECT Dcode, SID, MIN(TDT) AS start_date, MAX(TDT) AS end_date
, SUM(QTY) AS total_qty
FROM mytable
GROUP BY Dcode, SID
)
) WHERE rn = 1
In the inner most subquery I aggregation to get the range of dates and total quantity for particular values of Dcode and SID. Then I use an anaylitic (window) function to get the row for which total quantity is the greatest. (You would want to use RANK() in place of ROW_NUMBER() in the event you want to return more than one value of SID with the same quantity.)
Here's one option which doesn't contain JLID = 111 in the final result as I have no idea where you took it from.
SQL> with test (jlid, dcode, sid, tdt, qty) as
2 (select 8295783, 3119255, 9842, date '2018-03-05', 14 from dual union
3 select 8269771, 3119255, 9842, date '2018-08-22', 11 from dual union
4 select 8302211, 3119255, 1126, date '2018-03-01', 19 from dual union
5 --
6 select 1234567, 1112223, 1000, date '2018-06-16', 88 from dual
7 )
8 select dcode,
9 min (trunc (tdt, 'mm')) start_dt, --> MIN
10 max (last_day (tdt)) end_dt, --> MAX
11 sid
12 from (select dcode,
13 sid,
14 tdt,
15 sqty,
16 rank () over (partition by dcode order by sqty desc) rnk
17 from (select dcode,
18 sid,
19 tdt,
20 sum (qty) over (partition by dcode, sid) sqty
21 from test))
22 where rnk = 1
23 group by dcode, sid; --> GROUP BY
DCODE START_DT END_DT SID
---------- ---------------- ---------------- ----------
1112223 01.06.2018 00:00 30.06.2018 00:00 1000
3119255 01.03.2018 00:00 31.08.2018 00:00 9842
SQL>