SQL logic to achieve the below mentioned scenario - sql

I would like to know the SQL logic to achieve the scenario below.
I need to load the data from the source into the target as described below.
Source
ID  Name    Place      Date
1   User 1  Chennai    01-Jun-22
1   User 1  Chennai    02-Jun-22
2   User 2  Bangalore  03-Jun-22
2   User 2  Bangalore  04-Jun-22
1   User 1  Bangalore  05-Jun-22
1   User 1  Bangalore  06-Jun-22
1   User 1  Bangalore  07-Jun-22
1   User 1  Chennai    08-Jun-22
Target
ID  Name    Place      From Date  To Date
1   User 1  Chennai    01-Jun-22  02-Jun-22
2   User 2  Bangalore  03-Jun-22  04-Jun-22
1   User 1  Bangalore  05-Jun-22  07-Jun-22
1   User 1  Chennai    08-Jun-22  08-Jun-22

Solution for your problem:
WITH CT1 AS
(
SELECT ID, Name, Place, "Date",
CASE WHEN CONCAT(ID,Place) != LAG(CONCAT(ID,Place),1,'0') OVER(ORDER BY "Date") THEN 1 ELSE 0 END as t
FROM Table1
),
CT2 AS
(
SELECT ID, Name, Place, "Date",
SUM(t) OVER(ORDER BY "Date") as grp
FROM CT1
)
SELECT ID, Name, Place,
MIN("Date") as From_Date,
MAX("Date") as To_Date
FROM CT2
GROUP BY ID, Name, Place,grp
ORDER BY From_Date;
Working Example : db<>fiddle Link
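To see why the flag-and-running-sum approach produces the target rows, here is a minimal Python sketch that runs an equivalent query against an in-memory SQLite database. The table name visits, the column name dt, the helper collapse_runs and the ISO date strings are assumptions made for this illustration, and it compares ID and Place directly instead of concatenating them. It assumes SQLite 3.25 or later, which is needed for window functions.

```python
import sqlite3

# Sample rows from the question; dates are rewritten as ISO strings so they sort correctly.
ROWS = [
    (1, "User 1", "Chennai", "2022-06-01"),
    (1, "User 1", "Chennai", "2022-06-02"),
    (2, "User 2", "Bangalore", "2022-06-03"),
    (2, "User 2", "Bangalore", "2022-06-04"),
    (1, "User 1", "Bangalore", "2022-06-05"),
    (1, "User 1", "Bangalore", "2022-06-06"),
    (1, "User 1", "Bangalore", "2022-06-07"),
    (1, "User 1", "Chennai", "2022-06-08"),
]

QUERY = """
WITH flagged AS (
    -- is_start = 1 when the row differs from the previous row (or has no previous row)
    SELECT id, name, place, dt,
           CASE WHEN id = LAG(id) OVER (ORDER BY dt)
                 AND place = LAG(place) OVER (ORDER BY dt)
                THEN 0 ELSE 1 END AS is_start
    FROM visits
),
grouped AS (
    -- the running sum of the start flags gives every run its own group number
    SELECT id, name, place, dt,
           SUM(is_start) OVER (ORDER BY dt) AS grp
    FROM flagged
)
SELECT id, name, place, MIN(dt) AS from_date, MAX(dt) AS to_date
FROM grouped
GROUP BY id, name, place, grp
ORDER BY from_date
"""


def collapse_runs(rows):
    """Load the rows into a hypothetical visits table and return one row per run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (id INTEGER, name TEXT, place TEXT, dt TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in collapse_runs(ROWS):
        print(row)
```

The second Chennai run for ID 1 gets its own group number because the running sum has already been incremented twice by the time that row is reached, so it is not merged with the first Chennai run.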

CREATE TABLE #Temp([ID] INT,[Name] VARCHAR(100),[Place] VARCHAR(100),[Date] DATETIME)
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('1','User1','Chennai','01-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('1','User1','Chennai','02-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('2','User2','Bangalore','03-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('2','User2','Bangalore','04-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('1','User1','Bangalore','05-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('1','User1','Bangalore','06-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('1','User1','Bangalore','07-06-2022')
INSERT INTO #Temp([ID],[Name],[Place],[Date]) VALUES('1','User1','Chennai','08-06-2022')
;WITH A AS(
SELECT
ROW_NUMBER() OVER(ORDER BY [Date]) [Rono],
*,
LEAD([Name]) OVER(ORDER BY [Date]) LeadName,
LEAD([Place]) OVER(ORDER BY [Date]) LeadPlace,
LAG([Name]) OVER(ORDER BY [Date]) LagName,
LAG([Place]) OVER(ORDER BY [Date]) LagPlace,
CASE WHEN LEAD([Name]) OVER(ORDER BY [Date])=[Name] AND LEAD([Place]) OVER(ORDER BY [Date])=[Place] THEN 1 ELSE 0 END F1,
CASE WHEN LAG([Name]) OVER(ORDER BY [Date])=[Name] AND LAG([Place]) OVER(ORDER BY [Date])=[Place] THEN 1 ELSE 0 END F2
FROM #Temp
),
B AS(
SELECT *,
CASE WHEN (A.F1=1 AND A.F2=0) OR (A.F1=0 AND A.F2=0) THEN LEAD([Rono]) OVER(ORDER BY [Date]) WHEN (A.F1=1 AND A.F2=1) THEN NULL ELSE 0 END [FF]
FROM A
WHERE A.F1+A.F2!=2
)
SELECT
B.[ID],B.[Name],B.[Place],
B.[Date] [StrtDate],
ISNULL(AB.[Date],B.[Date]) [EndDate]
FROM B
LEFT JOIN B AB ON B.FF=AB.Rono
WHERE B.FF!=0 OR B.FF IS NULL

Related

ROW_Number with Custom Group

I am trying to have row_number based on custom grouping but I am not able to produce it.
Below is my Query
CREATE TABLE mytbl (wid INT, id INT)
INSERT INTO mytbl Values(1,1),(2,1),(3,0),(4,2),(5,3)
Current Output
wid id
1 1
2 1
3 0
4 2
5 3
Query
SELECT *, RANK() OVER(PARTITION BY wid, CASE WHEN id = 0 THEN 0 ELSE 1 END ORDER BY ID)
FROM mytbl
I would like to rank the rows based on a custom condition: if ID is 0, then I have to start a new group until I have a non-zero ID.
Expected Output
wid id RN
1 1 1
2 1 1
3 0 1
4 2 2
5 3 2
Guessing here, as we don't have much clarification, but perhaps this:
SELECT wid,
id,
COUNT(CASE id WHEN 0 THEN 1 END) OVER (ORDER BY wid ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) +1 AS [Rank]
FROM mytbl ;
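The window frame in the query above counts the zeros seen strictly before the current row and adds 1. A minimal Python sketch of the same running count, assuming the values are already ordered by wid (the function name rank_by_zero_breaks is hypothetical):

```python
def rank_by_zero_breaks(ids):
    """Return one rank per id: 1 plus the number of zeros in the preceding rows."""
    ranks = []
    zeros_seen = 0  # zeros found in rows before the current one
    for value in ids:
        ranks.append(zeros_seen + 1)
        if value == 0:
            # the zero row itself keeps the old rank; the next row starts a new group
            zeros_seen += 1
    return ranks


if __name__ == "__main__":
    print(rank_by_zero_breaks([1, 1, 0, 2, 3]))
```

For the sample ids 1, 1, 0, 2, 3 this gives ranks 1, 1, 1, 2, 2, which matches the expected output in the question.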
If I understand you correctly, you may use the following approach. Note that you need an ordering column (I assume this is the wid column):
Statement:
;WITH ChangesCTE AS (
SELECT
*,
CASE WHEN LAG(id) OVER (ORDER BY wid) = 0 THEN 1 ELSE 0 END AS ChangeIndex
FROM mytbl
), GroupsCTE AS (
SELECT
*,
SUM(ChangeIndex) OVER (ORDER BY wid) AS GroupIndex
FROM ChangesCTE
)
SELECT
wid,
id,
DENSE_RANK() OVER (ORDER BY GroupIndex) AS Rank
FROM GroupsCTE
Result:
wid id Rank
1 1 1
2 1 1
3 0 1
4 2 2
5 3 2
Without much clarification on the logic required, my understanding is that you want to increase the Rank by 1 whenever id = 0:
select wid, id,
[Rank] = sum(case when id = 0 then 1 else 0 end) over(order by wid)
+ case when id <> 0 then 1 else 0 end
from mytbl
Try this,
CREATE TABLE #mytbl (wid INT, id INT)
INSERT INTO #mytbl Values(1,1),(2,1),(3,0)
,(4,2),(5,3),(6,0),(7,4),(8,5),(9,6)
;with CTE as
(
select *,ROW_NUMBER()over(order by wid)rn
from #mytbl where id=0
)
,CTE1 as
(
select max(rn)+1 ExtraRN from CTE
)
select a.* ,isnull(ca.rn,ca1.ExtraRN) from #mytbl a
outer apply(select top 1 * from CTE b
where a.wid<=b.wid order by b.wid)ca
cross apply(select ExtraRN from CTE1)ca1
drop table #mytbl
Here, neither OUTER APPLY nor CROSS APPLY increases the cardinality estimate, because each returns only one row.

Picking up latest 2 records from table in hive

Team, I have a scenario here.
I need to pick the 2 latest records through HQL.
I have tried row_number, but I do not seem to be getting the expected output.
Select
A.emp_ref_i,
A.last_updt_d,
A.start_date,
case when A.Last_updt_d=max(A.Last_updt_d) over (partition by A.emp_ref_i)
and A.start_date=max(a.start_date) over (partition by A.emp_ref_i)
then 'Y' else 'N' end as Valid_f,
a.CHANGE
from
(
select
distinct(emp_ref_i),
last_updt_d,
start_date,
CHANGE
from
PR) A
Currently getting output as
EMP_REF_I LAST_UPDT_D start_date Valid_f CHANGE
1 123 3/29/2020 2/3/2019 Y CHG3
2 123 3/30/2019 2/4/2018 N CHG2
3 123 3/29/2019 2/4/2018 N CHG1
but required:
EMP_REF_I LAST_UPDT_D start_date Valid_f CHANGE
1 123 3/29/2020 2/3/2019 Y CHG3
2 123 3/30/2019 2/4/2018 N CHG2
Use row_number and filter:
select s.emp_ref_i,
s.last_updt_d,
s.start_date,
case when rn=1 then 'Y' else 'N' end Valid_f,
s.change
from
(
Select
A.*,
row_number() over(partition by A.emp_ref_i order by a.Last_updt_d desc, a.start_date desc) rn
from (...) A
)s
where rn<=2;

Get consecutive days with condition

There is a table with three columns:
CREATE TABLE #t1 ( Id INT
,VisitDate DATE
,Counter INT)
And test data:
INSERT INTO #t1 VALUES (1,'2019-01-01', 50)
INSERT INTO #t1 VALUES (2,'2019-01-02', 15)
INSERT INTO #t1 VALUES (3,'2019-01-03', 7)
INSERT INTO #t1 VALUES (4,'2019-01-04', 7)
INSERT INTO #t1 VALUES (5,'2019-01-05', 18)
INSERT INTO #t1 VALUES (6,'2019-01-06', 19)
INSERT INTO #t1 VALUES (7,'2019-01-07', 11)
INSERT INTO #t1 VALUES (8,'2019-01-08', 1)
INSERT INTO #t1 VALUES (9,'2019-01-09', 19)
I need to find three or more consecutive days where Counter is greater than or equal to ten:
Id VisitDate Counter
5 2019-01-05 18
6 2019-01-06 19
7 2019-01-07 11
My SELECT statement is
;WITH cte AS
(
SELECT *
,IIF(Counter > 10, 1,0) AS MoreThanTen
FROM #t1
), lag_lead_cte AS
(
SELECT *
,LAG(MoreThanTen) OVER (ORDER BY VisitDate) AS LagShift
,(LAG(MoreThanTen) OVER (ORDER BY VisitDate) + MoreThanTen ) AS LagMoreThanTen
,LEAD(MoreThanTen) OVER (ORDER BY VisitDate) AS LeadShift
,(LEAD(MoreThanTen) OVER (ORDER BY VisitDate) + MoreThanTen ) AS LeadMoreThanTen
FROM cte
)
SELECT *
FROM lag_lead_cte
WHERE LagMoreThanTen = 2 OR LeadMoreThanTen = 2
But the result is not fully consistent
Id VisitDate Counter
1 2019-01-01 50
2 2019-01-02 15
5 2019-01-05 18
6 2019-01-06 19
7 2019-01-07 11
It looks like a gaps-and-islands problem.
Here is one way to do it.
I'm assuming SQL Server based on the T-SQL tag.
Run this query CTE-by-CTE and examine intermediate results to understand how it works.
Query
WITH
CTE_rn
AS
(
SELECT *
,CASE WHEN Counter>10 THEN 1 ELSE 0 END AS MoreThanTen
,ROW_NUMBER() OVER (ORDER BY VisitDate) AS rn1
,ROW_NUMBER() OVER (PARTITION BY CASE WHEN Counter>10 THEN 1 ELSE 0 END ORDER BY VisitDate) AS rn2
FROM #t1
)
,CTE_Groups
AS
(
SELECT
*
,rn1-rn2 AS Diff
,COUNT(*) OVER (PARTITION BY MoreThanTen, rn1-rn2) AS GroupLength
FROM CTE_rn
)
SELECT
ID
,VisitDate
,Counter
FROM CTE_Groups
WHERE
GroupLength >= 3
AND Counter > 10
ORDER BY VisitDate
;
Result
+----+------------+---------+
| ID | VisitDate | Counter |
+----+------------+---------+
| 5 | 2019-01-05 | 18 |
| 6 | 2019-01-06 | 19 |
| 7 | 2019-01-07 | 11 |
+----+------------+---------+
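The intermediate values that the CTEs produce can also be traced outside the database. Below is a minimal Python sketch of the same rn1 - rn2 bookkeeping, assuming the counters are already ordered by VisitDate and that Id equals the row position, as in the sample data. The function names and the threshold and min_length parameters are hypothetical; the comparison is strictly greater than 10, following the query above.

```python
from collections import Counter


def islands(counters, threshold=10):
    """Return one dict per row with rn1, rn2, diff and group_length, like CTE_Groups."""
    seen = {0: 0, 1: 0}  # running row count per flag value (the PARTITION BY of rn2)
    rows = []
    for rn1, value in enumerate(counters, start=1):
        flag = 1 if value > threshold else 0
        seen[flag] += 1
        rows.append({"rn1": rn1, "counter": value, "flag": flag,
                     "rn2": seen[flag], "diff": rn1 - seen[flag]})
    # rows that share (flag, diff) form one island of consecutive rows
    lengths = Counter((r["flag"], r["diff"]) for r in rows)
    for r in rows:
        r["group_length"] = lengths[(r["flag"], r["diff"])]
    return rows


def qualifying_ids(counters, threshold=10, min_length=3):
    """Row positions that sit in an above-threshold island of at least min_length rows."""
    return [r["rn1"] for r in islands(counters, threshold)
            if r["flag"] == 1 and r["group_length"] >= min_length]


if __name__ == "__main__":
    sample = [50, 15, 7, 7, 18, 19, 11, 1, 19]
    for row in islands(sample):
        print(row)
    print(qualifying_ids(sample))
```

On the sample data the diff column comes out as 0, 0, 2, 2, 2, 2, 2, 5, 3. Rows 3 to 7 all share diff 2, which is why MoreThanTen has to be part of the partition key: it separates the two low rows (3 and 4) from the three high rows (5, 6 and 7).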
Try this:
select Id, VisitDate, Counter from (
select Id, VisitDate, Counter, count(*) over (partition by grp) cnt from (
select *,
-- here I used difference between row number and day to group consecutive days
row_number() over (order by visitDate) - day(visitDate) grp
from #t1
where [Counter] > 10
) a
) a where cnt >= 3 --where group count is greater or equal to three
Based on the comment that the days do not need to be consecutive, only the rows, here is an updated query that uses a similar technique:
select id, visitdate, counter from (
select id, visitdate, counter, count(*) over (partition by grp) cnt from (
select *, rn - row_number() over (order by visitDate) grp from (
select *,
case when (Counter > 10) or (lag(Counter) over (order by visitDate) > 10 and Counter > 10) then
row_number() over (order by visitdate) end rn
from #t1
) a where rn is not null
) a
) a where cnt >= 3
I think this might be most simply handled by just looking at the sequences using lead() and lag():
select id, visitdate, counter
from (select t1.*,
lag(counter, 2) over (order by visitdate) as counter_2p,
lag(counter, 1) over (order by visitdate) as counter_1p,
lead(counter, 1) over (order by visitdate) as counter_1l,
lead(counter, 2) over (order by visitdate) as counter_2l
from #t1 t1
) t1
where counter >= 10 and
((counter_2p >= 10 and counter_1p >= 10) or
(counter_1p >= 10 and counter_1l >= 10) or
(counter_1l >= 10 and counter_2l >= 10)
);
A cross join also works for this question:
with result as (
select
t.Id as Id1,t.VisitDate as VisitDate1,t.Counter as Counter1
,tt.Id as Id2,tt.VisitDate as VisitDate2,tt.Counter as Counter2
from #t1 t cross join #t1 tt where DATEDIFF(Day,t.VisitDate,tt.visitDate)=1
and t.Counter>10 and tt.Counter>10
)
select Id1 as Id,VisitDate1 as VisitDate ,Counter1 as [Counter] from result
union
select Id2 as Id,VisitDate2 as VisitDate,Counter2 as [Counter] from result

Create episode for each value with new Begin and End Dates

This is in reference to the question below:
Loop through each value to the seq num
The client now wants to see the data differently, so I started a new thread for this question.
The requirement is below.
This is the data:
ID seqNum DOS Service End Date
1 1 1/1/2017 1/15/2017
1 2 1/16/2017 1/16/2017
1 3 1/17/2017 1/21/2017
1 4 1/22/2017 2/13/2017
1 5 2/14/2017 3/21/2017
1 6 2/16/2017 3/21/2017
Expected outPut:
ID SeqNum DOSBeg DOSEnd
1 1 1/1/2017 1/30/2017
1 2 1/31/2017 3/1/2017
1 3 3/2/2017 3/31/2017
For each DOSBeg, add 29 days to get DOSEnd. Then add 1 day to DOSEnd to get the new DOSBeg (1/31/2017).
Now add 29 days to 1/31/2017, which gives 3/1/2017 as the next DOSEnd. Repeat this until DOSEnd >= the max End Date, i.e. 3/21/2017.
Basically, we need episodes of 29 days for each ID.
I tried with this code and it is giving me duplicates.
with cte as (
select ID, minDate as DOSBeg,dateadd(day,29,mindate) as DOSEnd
from #temp
union all
select ID,dateadd(day,1,DOSEnd) as DOSBeg,dateadd(day,29,dateadd(day,1,DOSEnd)) as DOSEnd
from cte
)
select ID,DOSBeg,DOSEnd
from cte
OPTION (MAXRECURSION 0)
Here mindate is Minimum DOS for this ID i.e. 1/1/2017
I came up with the logic below, and it is working fine for me. Is there a better way than this?
declare @table table (id int, seqNum int identity(1,1), DOS date, ServiceEndDate date)
insert into @table
values
(1,'20170101','20170115'),
(1,'20170116','20170116'),
(1,'20170117','20170121'),
(1,'20170122','20170213'),
(1,'20170214','20170321'),
(1,'20170216','20170321'),
(2,'20170101','20170103'),
(2,'20170104','20170118')
select * into #temp from @table
--drop table #data
select distinct ID, cast(min(DOS) over (partition by ID) as date) as minDate
,row_Number() over (partition by ID order by ID, DOS) as SeqNum,
DOS,
max(ServiceEndDate) over (partition by ID)as maxDate
into #data
from #temp
--drop table #StartDateLogic
;with cte as
(select ID,mindate as startdate,maxdate
from #data
union all
select ID,dateadd(day,30,startdate) as startdate,maxdate
from cte
where maxdate >= dateadd(day,30,startdate))
select distinct ID,startdate
into #StartDateLogic
from cte
OPTION (MAXRECURSION 0)
--final Result set
select ID
,ROW_NUMBER() over (Partition by ID order by ID,StartDate) as SeqNum
,StartDate
,dateadd(day,29,startdate) as EndDate
from #StartDateLogic
You were on the right track with the recursive CTE, but you forgot the anchor.
declare @table table (id int, seqNum int identity(1,1), DOS date, ServiceEndDate date)
insert into @table
values
(1,'20170101','20170115'),
(1,'20170116','20170116'),
(1,'20170117','20170121'),
(1,'20170122','20170213'),
(1,'20170214','20170321'),
(1,'20170216','20170321'),
(2,'20170101','20170103'),
(2,'20170104','20170118')
;with dates as(
select top 1 with ties id, seqnum, DOSBeg = DOS, DOSEnd = dateadd(day,29,DOS)
from @table
order by row_number() over (partition by id order by seqnum)
union all
select t.id, t.seqNum, DOSBeg = dateadd(day,1,d.DOSEnd), DOSEnd = dateadd(day,29,dateadd(day,1,d.DOSEnd))
from dates d
inner join @table t on
d.id = t.id and t.seqNum = d.seqNum + 1
)
select *
from dates d
where d.DOSEnd <= (select max(dateadd(month,1,ServiceEndDate)) from @table where id = d.id)
order by id, seqNum
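The episode rule stated in the question (DOSEnd is DOSBeg plus 29 days, the next DOSBeg is DOSEnd plus 1 day, and the loop stops once DOSEnd reaches the maximum service end date) can be checked with plain date arithmetic. This is a minimal Python sketch of that rule only, not of either query above; the function name episodes and its parameters are hypothetical.

```python
from datetime import date, timedelta


def episodes(first_dos, max_end, span_days=29):
    """Return (seq, dos_beg, dos_end) tuples that cover first_dos through max_end."""
    result = []
    seq = 1
    start = first_dos
    while True:
        end = start + timedelta(days=span_days)
        result.append((seq, start, end))
        if end >= max_end:
            # stop once the episode end reaches the latest service end date
            break
        start = end + timedelta(days=1)
        seq += 1
    return result


if __name__ == "__main__":
    # ID 1 from the question: minimum DOS 1/1/2017, maximum end date 3/21/2017
    for seq, beg, end in episodes(date(2017, 1, 1), date(2017, 3, 21)):
        print(seq, beg.isoformat(), end.isoformat())
```

For ID 1 this yields three episodes (1/1/2017 to 1/30/2017, 1/31/2017 to 3/1/2017, 3/2/2017 to 3/31/2017), matching the expected output in the question. For ID 2 (1/1/2017 to 1/18/2017) it yields a single episode.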

SQL Server Row Partition on the basis of Data Combination

I have data as below:
Create table #PP
(
MM int,
PP Int,
DT date
)
insert into #PP values(1,1,'2016-01-01')
insert into #PP values(1,1,'2016-02-01')
insert into #PP values(1,1,'2016-03-01')
insert into #PP values(1,1,'2016-04-01')
insert into #PP values(1,2,'2016-05-01')
insert into #PP values(1,2,'2016-06-01')
insert into #PP values(1,2,'2016-07-01')
insert into #PP values(1,2,'2016-08-01')
insert into #PP values(1,1,'2016-09-01')
insert into #PP values(1,1,'2016-10-01')
insert into #PP values(1,1,'2016-11-01')
insert into #PP values(1,1,'2016-12-01')
select * from #PP
My Data and What I am looking for
MM PP DT Sr NO
1 1 01/01/2016 1
1 1 01/02/2016 2
1 1 01/03/2016 3
1 1 01/04/2016 4
1 2 01/05/2016 1
1 2 01/06/2016 2
1 2 01/07/2016 3
1 2 01/08/2016 4
1 1 01/09/2016 1
1 1 01/10/2016 2
1 1 01/11/2016 3
1 1 01/12/2016 4
I have written the query, but it's not working properly:
SELECT MM, PP, DT
, ROW_NUMBER() OVER(
PARTITION BY MM, PP
ORDER BY MM, PP
) SRNO
FROM #PP ORDER BY 1,2,3
My Query result is as below, which is wrong
This is my Query Result
This question is very subtle. It is important to note that the values are repeated in the MM and PP columns, but the row numbers should start again. This is easy enough to fix, using the difference of row numbers:
select mm, pp, dt,
row_number() over (partition by mm, pp, seqnum - seqnum_mp order by dt) as srno
from (select p.*,
row_number() over (partition by mm, pp order by dt) as seqnum_mp,
row_number() over (order by dt) as seqnum
from #pp p
) p;
Here is a SQL Fiddle showing that it works.
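As a quick check of the difference-of-row-numbers query, the following minimal Python sketch runs it against an in-memory SQLite database loaded with the twelve sample rows. The table name pp_rows and the helper names are assumptions made for this illustration, and it assumes SQLite 3.25 or later for window functions.

```python
import sqlite3

QUERY = """
SELECT mm, pp, dt,
       ROW_NUMBER() OVER (PARTITION BY mm, pp, seqnum - seqnum_mp ORDER BY dt) AS srno
FROM (SELECT p.*,
             ROW_NUMBER() OVER (PARTITION BY mm, pp ORDER BY dt) AS seqnum_mp,
             ROW_NUMBER() OVER (ORDER BY dt) AS seqnum
      FROM pp_rows p) p
ORDER BY dt
"""


def build_rows():
    """Twelve monthly rows for 2016: PP is 1 for Jan-Apr, 2 for May-Aug, 1 for Sep-Dec."""
    return [(1, 2 if 5 <= month <= 8 else 1, f"2016-{month:02d}-01")
            for month in range(1, 13)]


def restart_numbers():
    """Return the srno column, ordered by dt, for the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pp_rows (mm INTEGER, pp INTEGER, dt TEXT)")
    conn.executemany("INSERT INTO pp_rows VALUES (?, ?, ?)", build_rows())
    srno = [row[3] for row in conn.execute(QUERY)]
    conn.close()
    return srno


if __name__ == "__main__":
    print(restart_numbers())
```

The difference seqnum - seqnum_mp is 0 for the first block of PP = 1 and 4 for the second block, so the two blocks land in separate partitions and the numbering restarts at 1 each time.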
I wrote this, and it is working for me:
select mm, pp, dt,
row_number() over (partition by mm, pp, seqnum - seqnum_mp order by dt) as srno
from (select p.*,
row_number() over (partition by mm, pp order by dt) as seqnum_mp,
row_number() over (order by mm,dt) as seqnum
from #pp p
) p;