Minimum and maximum dates within continuous date range grouped by name - sql

I have date ranges with a start and end date for each person, and I want to get only the continuous date ranges per person:
Input:
NAME | STARTDATE | END DATE
--------------------------------------
MIKE | **2019-05-15** | 2019-05-16
MIKE | 2019-05-17 | **2019-05-18**
MIKE | 2020-05-18 | 2020-05-19
Expected output like:
MIKE | **2019-05-15** | **2019-05-18**
MIKE | 2020-05-18 | 2020-05-19
So basically output is MIN and MAX for each continuous period for the person.
Appreciate any help.
I have tried the below query:
With N AS (
SELECT Name, StartDate, EndDate
, LastStop = MAX(EndDate)
OVER (PARTITION BY Name ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
FROM Table
), B AS (
SELECT Name, StartDate, EndDate
, Block = SUM(CASE WHEN LastStop Is Null Then 1
WHEN LastStop < StartDate Then 1
ELSE 0
END)
OVER (PARTITION BY Name ORDER BY StartDate, LastStop)
FROM N
)
SELECT Name
, MIN(StartDate) DateFrom
, MAX(EndDate) DateTo
FROM B
GROUP BY Name, Block
ORDER BY Name, Block
But it's not merging the continuous periods. It returns the same rows as the input.

This is a type of gap-and-islands problem. There is no need to expand the data out by day! That seems very inefficient.
Instead, determine the "islands". This is where there is no overlap -- in your case lag() is sufficient. Then a cumulative sum and aggregation:
select name, min(startdate), max(enddate)
from (select t.*,
sum(case when prev_enddate >= dateadd(day, -1, startdate) then 0 else 1 end) over
(partition by name order by startdate) as grp
from (select t.*,
lag(enddate) over (partition by name order by startdate) as prev_enddate
from t
) t
) t
group by name, grp;
Here is a db<>fiddle.
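For anyone who wants to trace the logic outside the database, here is a minimal Python sketch of the same lag-then-cumulative-sum idea. It is an illustration under assumptions, not a translation of the query: the function name merge_ranges and the hard-coded rows (copied from the question) are made up for the example.

```python
from datetime import date, timedelta

# Sample rows from the question: (name, startdate, enddate).
rows = [
    ("MIKE", date(2019, 5, 15), date(2019, 5, 16)),
    ("MIKE", date(2019, 5, 17), date(2019, 5, 18)),
    ("MIKE", date(2020, 5, 18), date(2020, 5, 19)),
]

def merge_ranges(rows):
    """Hypothetical helper: number the islands per name, then take min/max per island."""
    islands = {}                      # (name, grp) -> (min start, max end)
    prev_name = prev_end = None
    grp = 0
    for name, start, end in sorted(rows):
        if name != prev_name:
            # New partition: lag(enddate) would be NULL on its first row.
            grp, prev_end = 0, None
        # Start a new island unless prev_enddate >= startdate - 1 day.
        if prev_end is None or prev_end < start - timedelta(days=1):
            grp += 1
        lo, hi = islands.get((name, grp), (start, end))
        islands[(name, grp)] = (min(lo, start), max(hi, end))
        prev_name, prev_end = name, end
    return [(name, lo, hi) for (name, _), (lo, hi) in islands.items()]

result = merge_ranges(rows)
for name, lo, hi in result:
    print(name, lo, hi)
```

The running counter grp plays the role of the cumulative sum, and the dictionary plays the role of GROUP BY name, grp.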

Here is an example using an ad-hoc tally table
Example or dbFiddle
;with cte as (
Select A.[Name]
,B.D
,Grp = datediff(day,'1900-01-01',D) - dense_rank() over (partition by [Name] Order by D)
From YourTable A
Cross Apply (
Select Top (DateDiff(DAY,StartDate,EndDate)+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),StartDate)
From master..spt_values n1,master..spt_values n2
) B
)
Select [Name]
,StartDate= min(D)
,EndDate = max(D)
From cte
Group By [Name],Grp
Returns
Name StartDate EndDate
MIKE 2019-05-15 2019-05-18
MIKE 2020-05-18 2020-05-19
Just to help with the Visualization, the CTE generates the following
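As a rough illustration of the technique (not the actual CTE output), the following Python sketch expands each range into one row per day and then groups by "day number minus rank", which stays constant within a run of consecutive days. The helper names expand_days and islands_by_rank are assumptions for the example, and date.toordinal() stands in for datediff(day,'1900-01-01',D); only the difference between days matters.

```python
from datetime import date, timedelta

# Sample rows from the question: (name, startdate, enddate).
rows = [
    ("MIKE", date(2019, 5, 15), date(2019, 5, 16)),
    ("MIKE", date(2019, 5, 17), date(2019, 5, 18)),
    ("MIKE", date(2020, 5, 18), date(2020, 5, 19)),
]

def expand_days(rows):
    """Hypothetical helper: one distinct (name, day) pair per day covered by any range."""
    days = set()
    for name, start, end in rows:
        for offset in range((end - start).days + 1):
            days.add((name, start + timedelta(days=offset)))
    return sorted(days)

def islands_by_rank(rows):
    """Group consecutive days using day ordinal minus per-name rank."""
    spans = {}                        # (name, grp) -> (first day, last day)
    rank = {}                         # per-name counter over distinct days
    for name, day in expand_days(rows):
        rank[name] = rank.get(name, 0) + 1
        grp = day.toordinal() - rank[name]
        lo, hi = spans.get((name, grp), (day, day))
        spans[(name, grp)] = (min(lo, day), max(hi, day))
    return [(name, lo, hi) for (name, _), (lo, hi) in spans.items()]

expanded = expand_days(rows)
result = islands_by_rank(rows)
for row in result:
    print(*row)
```

Because the days are de-duplicated before ranking, the simple counter behaves like dense_rank() here.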

This will give you the same result
SELECT subquery.name,min(subquery.startdate),max(subquery.enddate1)
FROM (SELECT NAME,startdate,
CASE WHEN EXISTS(SELECT yt1.startdate
FROM t yt1
WHERE yt1.startdate = DATEADD(day, 1, yt2.enddate)
) THEN null else yt2.enddate END as enddate1
FROM t yt2) as subquery
GROUP by NAME, CAST(MONTH(subquery.startdate) AS VARCHAR(2)) + '-' + CAST(YEAR(subquery.startdate) AS VARCHAR(4))
For the CASE WHEN EXISTS I referred to SQL CASE
For the group by month and year you can see this GROUP BY MONTH AND YEAR
DB_FIDDLE

Related

How do we group continuous date into single date span in SQL?

Here is my data:
id  customercode  startdate  enddate
1   122           20200812   20200814
2   122           20200816   20200817
3   122           20200817   20200819
4   122           20200821   20200822
5   122           20200823   20200824
I tried the following code:
select Customercode, min(startdate) as startdate, max(enddate) as enddate
from (
select Customercode, startdate, enddate,
sum(rst) over (order by Customercode, startdate) as grp
from (
select Customercode, startdate, enddate,
case when coalesce(lag(enddate) over (partition by Customercode order by Customercode, startdate), startdate) + 1 <> startdate then 1 end rst
from tbl
) t1
) t2
group by grp, Customercode
order by startdate
My result
id  customercode  startdate  enddate
1   122           20200812   20200814
2   122           20200816   20200817
3   122           20200817   20200819
4   122           20200821   20200824
The desired output should be like this. Please share your thoughts.
id  customercode  startdate  enddate
1   122           20200812   20200814
2   122           20200816   20200819
3   122           20200821   20200824
It is unclear if you want to group records whose start date is the same as the previous end date, or one day afterwards.
If you want to group on the same date, you would phrase the query as:
select customercode, min(startdate), max(enddate)
from (
select t.*,
sum(case when startdate = lag_enddate then 0 else 1 end)
over(partition by customercode order by startdate) as grp
from (
select t.*,
lag(enddate) over(partition by customercode order by startdate) as lag_enddate
from tbl t
) t
) t
group by customercode, grp
order by min(startdate)
You can also allow both cases at once by modifying the conditional window sum(). This requires a little date arithmetic, whose syntax varies across databases. In standard SQL:
sum(case when startdate <= lag_enddate + interval '1' day then 0 else 1 end)
over(partition by customercode order by startdate) as grp
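To see how the two grouping rules differ on the sample data, here is a small Python sketch. It is only an illustration: the function group_spans and its same_group callback are hypothetical names, and the yyyymmdd values from the question are assumed to be dates.

```python
from datetime import date, timedelta

# Sample rows from the question: (customercode, startdate, enddate).
rows = [
    (122, date(2020, 8, 12), date(2020, 8, 14)),
    (122, date(2020, 8, 16), date(2020, 8, 17)),
    (122, date(2020, 8, 17), date(2020, 8, 19)),
    (122, date(2020, 8, 21), date(2020, 8, 22)),
    (122, date(2020, 8, 23), date(2020, 8, 24)),
]

def group_spans(rows, same_group):
    """same_group(start, lag_end) decides whether a row continues the previous island."""
    spans = []
    prev_code = prev_end = None
    for code, start, end in sorted(rows):
        # lag(enddate) is NULL on the first row of each customercode partition.
        lag_end = prev_end if code == prev_code else None
        if lag_end is not None and same_group(start, lag_end):
            _, first, last = spans[-1]
            spans[-1] = (code, min(first, start), max(last, end))
        else:
            spans.append((code, start, end))
        prev_code, prev_end = code, end
    return spans

# Rule 1: merge only when the start date equals the previous end date.
same_day = group_spans(rows, lambda start, lag_end: start == lag_end)
# Rule 2: merge when the start date is on or before the day after the previous end.
touching = group_spans(rows, lambda start, lag_end: start <= lag_end + timedelta(days=1))

print(len(same_day), len(touching))
```

On this data the first rule yields four spans and the second yields the three spans from the desired output.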

SQL - unique users who are visiting for the first time

Given the following table visitorLog, write SQL to find the following by date.
Total_Visitors
VisitorGain - compare to previous day
VisitorLoss - compare to previous day
Total_New_Visitors - unique users who are visiting for the first time
visitorLog :
*----------------------*
| Date Visitor |
*----------------------*
| 01-Jan-2011 V1 |
| 01-Jan-2011 V2 |
| 01-Jan-2011 V3 |
| 02-Jan-2011 V2 |
| 03-Jan-2011 V2 |
| 03-Jan-2011 V4 |
| 03-Jan-2011 V5 |
*----------------------*
Expected output:
*---------------------------------------------------------------------*
| Date Total_Visitors VisitorGain VisitorLoss Total_New_Visitors |
*---------------------------------------------------------------------*
| 01-Jan-2011 3 3 0 3 |
| 02-Jan-2011 1 0 2 0 |
| 03-Jan-2011 3 2 0 2 |
*---------------------------------------------------------------------*
Here is my SQL and SQL fiddle.
with cte as
(
select
date,
total_visitors,
lag(total_visitors) over (order by date) as prev_visitors,
row_number() over (order by date ) as rnk
from
(
select
*,
count(visitor) over (partition by date) as total_visitors
from visitorLog
) val
group by
date,
total_visitors
),
cte2 as
(
select
date,
sum(case when rnk = 1 then 1 else 0 end) as total_new_visitors
from
(
select
date,
visitor,
row_number() over (partition BY visitor order by date) as rnk
from visitorLog
) t
group by
date
)
select
c.date,
sum(total_visitors) as total_visitors,
sum(
case
when rnk = 1 then total_visitors
when (rnk > 1 and prev_visitors < total_visitors) then (total_visitors - prev_visitors)
else
0
end
)visitorGain,
sum(
case
when rnk = 1 then 0
when prev_visitors > total_visitors then (prev_visitors - total_visitors)
else
0
end
) as visitorLoss,
sum(total_new_visitors) as total_new_visitors
from cte c
join cte2 c2
on c.date = c2.date
group by
c.date
order by
c.date
My solution is working as expected, but I am wondering if I am missing any edge cases that may break my logic. Any help would be great.
This logic does what you want:
select date, count(*) as num_visitor,
greatest(count(*) - lag(count(*)::int, 1, 0) over (order by date), 0) as visitor_gain,
greatest(lag(count(*)::int, 1, 0) over (order by date) - count(*), 0) as visitor_loss,
count(*) filter (where seqnum = 1) as num_new_visitors
from (select vl.*,
row_number() over (partition by visitor order by date) as seqnum
from visitorLog vl
) vl
group by date
order by date
Here is a db<>fiddle.
I would use window functions and aggregation:
select
date,
count(*) no_visitor,
count(*) - lag(count(*), 1, 0) over(order by date) no_visitor_diff,
count(*) filter(where rn = 1) no_new_visitors
from (
select t.*, row_number() over(partition by visitor order by date) rn
from visitorLog t
) t
group by date
order by date
The subquery ranks the visits of each customer using row_number() (the first visit of each customer gets row number 1). Then, the outer query aggregates by date, and uses lag() to get the visitor count of the "previous" day.
I don't really see the point to have two distinct columns for the difference of visitors compared to the last day, so this gives you a single column, with a value that's either positive or negative depending whether customers were gained or lost.
If you really want two columns, then:
greatest(count(*) - lag(count(*), 1, 0) over(order by date), 0) visitor_gain,
- least(count(*) - lag(count(*), 1, 0) over(order by date), 0) visitor_loss
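For checking the expected numbers by hand, here is a minimal Python sketch of the same per-day logic. It is an illustration only: the function name daily_stats is made up, dates are written as ISO strings so they sort correctly, and it assumes each visitor appears at most once per day (like count(*) in the queries above, it counts rows).

```python
from collections import Counter

# Sample rows from the question: (date, visitor).
visits = [
    ("2011-01-01", "V1"), ("2011-01-01", "V2"), ("2011-01-01", "V3"),
    ("2011-01-02", "V2"),
    ("2011-01-03", "V2"), ("2011-01-03", "V4"), ("2011-01-03", "V5"),
]

def daily_stats(visits):
    """Return (date, total, gain, loss, new_visitors) per day."""
    totals = Counter(day for day, _ in visits)
    # First day each visitor was seen (the rows where row_number() = 1).
    first_seen = {}
    for day, visitor in sorted(visits):
        first_seen.setdefault(visitor, day)
    new_per_day = Counter(first_seen.values())
    stats = []
    prev_total = 0                    # lag(..., 1, 0): default 0 for the first day
    for day in sorted(totals):
        diff = totals[day] - prev_total
        stats.append((day, totals[day], max(diff, 0), max(-diff, 0), new_per_day[day]))
        prev_total = totals[day]
    return stats

stats = daily_stats(visits)
for row in stats:
    print(*row)
```

Note that "previous day" here means the previous date present in the log, as with lag(); a calendar day with no visits at all is skipped rather than counted as zero.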

Caller whose first and last call was to the same person

I have a phonelog table that has information about callers' call history. I'd like to find out callers whose first and last call was to the same person on a given day.
Callerid Recipientid DateCalled
1 2 2019-01-01 09:00:00.000
1 3 2019-01-01 17:00:00.000
1 4 2019-01-01 23:00:00.000
2 5 2019-07-05 09:00:00.000
2 5 2019-07-05 17:00:00.000
2 3 2019-07-05 23:00:00.000
2 5 2019-07-06 17:00:00.000
2 3 2019-08-01 09:00:00.000
2 3 2019-08-01 17:00:00.000
2 4 2019-08-02 09:00:00.000
2 5 2019-08-02 10:00:00.000
2 4 2019-08-02 11:00:00.000
Expected Output
Callerid Recipientid Datecalled
2 5 2019-07-05
2 3 2019-08-01
2 4 2019-08-02
I wrote the below query but can't get it to return recipientid. Any help on this will be appreciated!
select pl.callerid,cast(pl.datecalled as date) as datecalled
from phonelog pl inner join (select callerid, cast(datecalled as date) as datecalled,
min(datecalled) as firstcall, max(datecalled) as lastcall
from phonelog
group by callerid, cast(datecalled as date)) as x
on pl.callerid = x.callerid and cast(pl.datecalled as date) = x.datecalled
and (pl.datecalled = x.firstcall or pl.datecalled = x.lastcall)
group by pl.callerid, cast(pl.datecalled as date)
having count(distinct recipientid) = 1
Another dbFiddle option
First, my prequery (alias PQ) gets, for each caller and day, the earliest and latest call time, with a HAVING clause to make sure the caller made at least two calls that day. From that, I re-join to the phone log table on the FIRST (MIN) call for that caller and day. Then I join one more time on the LAST (MAX) call for the same caller and day, and make sure the recipient of the first call is the same as the recipient of the last.
I do not have to join on the stripped-down "JustDate" column used for the grouping as the MIN/MAX qualifies the FULL date/time.
select
PQ.JustDate,
PQ.CallerID,
pl1.RecipientID
from
( select
callerID,
convert( date, dateCalled ) JustDate,
min( DateCalled ) minDateCall,
max( DateCalled ) maxDateCall
from
PhoneLog pl
group by
callerID,
convert( date, dateCalled )
having
count(*) > 1) PQ
JOIN PhoneLog pl1
on PQ.CallerID = pl1.CallerID
AND PQ.minDateCall = pl1.dateCalled
JOIN PhoneLog pl2
on PQ.CallerID = pl2.CallerID
AND PQ.maxDateCall = pl2.dateCalled
AND pl1.RecipientID = pl2.RecipientID
It's very easy with window functions:
WITH cte AS (
SELECT Callerid, CAST(DateCalled AS DATE) AS DateCalled
,FIRST_VALUE(Recipientid) OVER (PARTITION BY Callerid, CAST(DateCalled AS DATE) ORDER BY DateCalled ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) f
,LAST_VALUE(Recipientid) OVER (PARTITION BY Callerid, CAST(DateCalled AS DATE) ORDER BY DateCalled ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) l
FROM phonelog
)
SELECT DISTINCT Callerid, f AS Recipientid, DateCalled FROM cte
WHERE f = l
Since SQL Server 2012 you can use the first_value() and last_value() window functions.
SELECT DISTINCT
x1.callerid,
x1.fri,
x1.datecalled
FROM (SELECT pl1.callerid,
pl1.recipientid,
convert(date, pl1.datecalled) datecalled,
first_value(pl1.recipientid) OVER (PARTITION BY pl1.callerid,
convert(date, pl1.datecalled)
ORDER BY pl1.datecalled
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) fri,
last_value(pl1.recipientid) OVER (PARTITION BY pl1.callerid,
convert(date, pl1.datecalled)
ORDER BY pl1.datecalled
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) lri
FROM phonelog pl1) x1
WHERE x1.fri = x1.lri;
In older versions you can use correlated subqueries with TOP 1.
SELECT DISTINCT
x1.callerid,
x1.fri,
x1.datecalled
FROM (SELECT pl1.callerid,
pl1.recipientid,
convert(date, pl1.datecalled) datecalled,
(SELECT TOP 1
pl2.recipientid
FROM phonelog pl2
WHERE pl2.callerid = pl1.callerid
AND pl2.datecalled >= convert(date, pl1.datecalled)
AND pl2.datecalled < dateadd(day, 1, convert(date, pl1.datecalled))
ORDER BY pl2.datecalled ASC) fri,
(SELECT TOP 1
pl2.recipientid
FROM phonelog pl2
WHERE pl2.callerid = pl1.callerid
AND pl2.datecalled >= convert(date, pl1.datecalled)
AND pl2.datecalled < dateadd(day, 1, convert(date, pl1.datecalled))
ORDER BY pl2.datecalled DESC) lri
FROM phonelog pl1) x1
WHERE x1.fri = x1.lri;
db<>fiddle
If you don't want to return log rows where somebody just made one call on a day, which of course means the first and the last call of the day were to the same person, you can use GROUP BY and HAVING count(*) > 1 instead of DISTINCT.
SELECT x1.callerid,
x1.fri,
x1.datecalled
FROM (...) x1
WHERE x1.fri = x1.lri
GROUP BY x1.callerid,
x1.fri,
x1.datecalled
HAVING count(*) > 1;
You can use a CTE to compute the first and last call of each day by Callerid, and then self-JOIN that CTE to find callers whose first and last calls were to the same Recipientid:
WITH CTE AS (
SELECT Callerid, RecipientId, CONVERT(DATE, Datecalled) AS Datecalled,
ROW_NUMBER() OVER (PARTITION BY Callerid, CONVERT(DATE, Datecalled) ORDER BY Datecalled) AS rna,
ROW_NUMBER() OVER (PARTITION BY Callerid, CONVERT(DATE, Datecalled) ORDER BY Datecalled DESC) AS rnb
FROM phonelog
)
SELECT c1.Callerid, c1.RecipientId, c1.Datecalled
FROM CTE c1
JOIN CTE c2 ON c1.Callerid = c2.Callerid AND c1.Datecalled = c2.Datecalled AND c1.Recipientid = c2.Recipientid
WHERE c1.rna = 1 AND c2.rnb = 1
Output:
Callerid RecipientId Datecalled
2 5 2019-07-05
2 3 2019-08-01
2 4 2019-08-02
Demo on SQLFiddle
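The first-call/last-call comparison can also be sketched in a few lines of Python. This is only an illustration: the function same_first_last, its min_calls parameter, and the small call list are hypothetical and are not the data from the question.

```python
from datetime import date, datetime

# Hypothetical sample: (callerid, recipientid, datecalled).
calls = [
    (1, 2, datetime(2019, 1, 1, 9)),
    (1, 3, datetime(2019, 1, 1, 17)),
    (1, 2, datetime(2019, 1, 1, 23)),   # first and last call of the day both go to 2
    (1, 4, datetime(2019, 1, 2, 9)),
    (1, 5, datetime(2019, 1, 2, 18)),   # first and last recipients differ
    (2, 7, datetime(2019, 1, 1, 10)),   # a single call is both first and last
]

def same_first_last(calls, min_calls=1):
    """Return (callerid, recipientid, day) where the day's first and last recipient match."""
    days = {}   # (caller, day) -> [first recipient, last recipient, call count]
    for caller, recipient, called in sorted(calls, key=lambda c: (c[0], c[2])):
        entry = days.setdefault((caller, called.date()), [recipient, recipient, 0])
        entry[1] = recipient            # overwritten until the last call of the day
        entry[2] += 1
    return [
        (caller, first, day)
        for (caller, day), (first, last, count) in days.items()
        if first == last and count >= min_calls
    ]

all_matches = same_first_last(calls)
multi_call_only = same_first_last(calls, min_calls=2)
print(all_matches)
print(multi_call_only)
```

Passing min_calls=2 corresponds to the HAVING count(*) > 1 variants above, which drop days with only one call.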
As I understand it, you want to select each Callerid and Recipientid pair that occurs more than once on a day, to make sure there is both a first call and a last call. So you just need to group by the three columns, combined with HAVING COUNT(Recipientid) > 1, like this:
SELECT Callerid, Recipientid, CAST(Datecalled AS DATE) AS Datecalled
FROM phonelog
GROUP BY Callerid, Recipientid, CAST(Datecalled AS DATE)
HAVING COUNT(Recipientid) > 1
Demo on db<>fiddle
As per my understanding we have to rank Caller_id as well as Recipient_id along with the Date.
Below is my solution which is working well for this case.
with CTE as
(select *,
row_number() over (partition by callerid, convert(VARCHAR,datecalled,23) order by convert(VARCHAR,datecalled,23)) as first_recipient_id,
row_number() over (partition by receipientid, convert(VARCHAR,datecalled,23) order by convert(VARCHAR,datecalled,23) desc) as last_recipient_id
from activity
)
select t.callerid,t.receipientid,CONVERT(VARCHAR,t.datecalled) as DateCalled from CTE t
where t.first_recipient_id >1 AND t.last_recipient_id>1;
The result that I was able to get:
Result
I think we need to identify first and last call made by caller on a day and then compare it with first and last call by caller to a recipient for that day. Below code has firstcall and lastcall made by caller on a day. Then it finds first and last call by caller to respective recipient and then compare.
SELECT DISTINCT
callerid,
recipientid,
CONVERT(date,firstcall)
FROM
(
Select
callerid,
recipientid,
MIN(dateCalled) OVER(PARTITION BY callerid,CONVERT(date,DateCalled)) as firstcall,
MAX(DateCalled) OVER(PARTITION BY callerid,CONVERT(date,DateCalled)) as lastcall,
MIN(DateCalled) OVER(PARTITION BY callerid,recipientid,convert(date,DateCalled)) as recipfirstcall,
MAX(DateCalled) OVER(PARTITION BY callerid,recipientid,convert(date,DateCalled)) as reciplastcall
from phonelog
) as A
where A.firstcall=A.recipfirstcall and A.lastcall=A.reciplastcall

SQL partition by on date range

Assume this is my table:
ID NUMBER DATE
------------------------
1 45 2018-01-01
2 45 2018-01-02
3 45 2018-01-27
I need to separate using partition by and row_number where the difference between one date and another is greater than 5 days. Something like this would be the result of the above example:
ROWNUMBER ID NUMBER DATE
-----------------------------
1 1 45 2018-01-01
2 2 45 2018-01-02
1 3 45 2018-01-27
My actual query is something like this:
SELECT ROW_NUMBER() OVER(PARTITION BY NUMBER ORDER BY ID DESC) AS ROWNUMBER, ...
But as you can notice, it doesn't work for the dates. How can I achieve that?
You can use lag function :
select *, row_number() over (partition by number, grp order by id) as [ROWNUMBER]
from (select *, (case when datediff(day, lag(date,1,date) over (partition by number order by id), date) <= 1
then 1 else 2
end) as grp
from table
) t;
By using the lag and datediff functions:
select * from
(
select t.*,
datediff(day,
lag(DATE) over (partition by NUMBER order by id),
DATE
) as diff
from t
) as TT where diff>5
http://sqlfiddle.com/#!18/130ae/11
I think you want to identify the groups, using lag() and datediff() and a cumulative sum. Then use row_number():
select t.*,
row_number() over (partition by number, grp order by date) as rownumber
from (select t.*,
sum(grp_start) over (partition by number order by date) as grp
from (select t.*,
(case when lag(date) over (partition by number order by date) < dateadd(day, -5, date)
then 1 else 0
end) as grp_start
from t
) t
) t;
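The intended behaviour (restart the row number whenever the gap to the previous date exceeds 5 days) can be checked with a short Python sketch. It is an illustration under assumptions: the function name number_within_gaps and the max_gap_days parameter are made up, and the third row's ID is taken as 3, as in the expected output.

```python
from datetime import date

# Sample rows from the question: (id, number, date).
rows = [
    (1, 45, date(2018, 1, 1)),
    (2, 45, date(2018, 1, 2)),
    (3, 45, date(2018, 1, 27)),
]

def number_within_gaps(rows, max_gap_days=5):
    """Return (rownumber, id, number, date), restarting the count after a large gap."""
    out = []
    prev_number = prev_date = None
    rownumber = 0
    for row_id, number, day in sorted(rows, key=lambda r: (r[1], r[2])):
        # A new group starts on a new number or when the previous date is
        # more than max_gap_days before the current one.
        new_group = number != prev_number or (day - prev_date).days > max_gap_days
        rownumber = 1 if new_group else rownumber + 1
        out.append((rownumber, row_id, number, day))
        prev_number, prev_date = number, day
    return out

numbered = number_within_gaps(rows)
for row in numbered:
    print(*row)
```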

SQL Server - find absence date occurrences [duplicate]

This question already has an answer here:
SQL: Gaps and Islands, Grouped dates
(1 answer)
Closed 5 years ago.
I have the following dataset:
Here is script for this data:
;with dataset AS (
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-07' AS DATE) AS CUT_DATE
UNION
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-15' AS DATE) AS CUT_DATE
UNION
select 'EMP02' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-09' AS DATE) AS CUT_DATE
)
select *
from dataset
I need to split these periods (PERIOD_START to PERIOD_END) at each CUT_DATE, excluding the cut dates themselves from the resulting periods. There can be any number of cut dates (3, 5, 8, etc.).
The expected result for the dataset above is:
If your version of SQL Server supports LAG, you can use this.
SELECT EMPLOYEE_ID,
ITEM_TYPE,
MIN(APPLY_DATE) AS STARTDATE,
MAX(APPLY_DATE) AS ENDDATE
FROM
(SELECT T.*,
SUM(CASE WHEN PREV_TYPE=ITEM_TYPE THEN 0 ELSE 1 END)
OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS GRP
FROM (SELECT D.*,
LAG(ITEM_TYPE) OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS PREV_TYPE
FROM DATA D
) T
) T
WHERE ITEM_TYPE IN ('Sickness','Vacation')
GROUP BY EMPLOYEE_ID,ITEM_TYPE,GRP
The logic is to get the previous row's item_type (based on ascending order of apply_date) and compare it with the current row's value. If they are equal, they belong to the same group. Else you start a new group. This is done in the sum window function. After groups are assigned, you just need to get the max and min date for an employee_id,item_type.
Sample Demo
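The "compare with the previous row's type" logic described above can be sketched in Python as follows. This is an illustration only: the function absence_spans and the sample rows are hypothetical (one row per employee per day is assumed), not the data behind the demo.

```python
from datetime import date

# Hypothetical sample: (employee_id, apply_date, item_type), one row per day.
rows = [
    (1, date(2017, 5, 23), "Sickness"),
    (1, date(2017, 5, 24), "Sickness"),
    (1, date(2017, 5, 25), "Worked"),
    (1, date(2017, 5, 26), "Vacation"),
    (1, date(2017, 5, 27), "Vacation"),
    (2, date(2017, 5, 25), "Sickness"),
]

def absence_spans(rows, keep=("Sickness", "Vacation")):
    """Return (employee_id, item_type, startdate, enddate) for runs of the same type."""
    spans = []
    prev_emp = prev_type = None
    for emp, day, item_type in sorted(rows):
        if emp == prev_emp and item_type == prev_type:
            # Same type as the previous row: extend the current group.
            e, t, start, _ = spans[-1]
            spans[-1] = (e, t, start, day)
        else:
            # Type changed (or new employee): start a new group.
            spans.append((emp, item_type, day, day))
        prev_emp, prev_type = emp, item_type
    # Filter after grouping, like the WHERE clause in the outer query.
    return [span for span in spans if span[1] in keep]

spans = absence_spans(rows)
for span in spans:
    print(*span)
```

Filtering after the grouping matters: the "Worked" day still breaks the run between the sickness and vacation periods before it is dropped.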
You would use the LAG function.
If you order by something, the LAG function gives the previous value;
a full description can be found at: http://www.sqlservercentral.com/articles/T-SQL/106783/
Take a look at vkp's answer for a full query
This is another way, if lag is supported.
Rextester Sample
with tbl as
(select d.*
,case when (item_type = lag(item_type) over (partition by employee_id order by apply_date))
then 0
else 1
end grp_tmp
from DATA2 d
where
item_type <> 'Worked'
)
,tbl2 as
(select t.*
,sum(grp_tmp) over (order by employee_id,apply_date
rows between unbounded preceding and current row
)
as grp
from tbl t
)
select
EMPLOYEE_ID
,ITEM_TYPE
,(CONVERT(VARCHAR(24),min(apply_date),103)
+' - '
+CONVERT(VARCHAR(24),max(apply_date),103)
) as range
from tbl2
group by EMPLOYEE_ID,
ITEM_TYPE
,grp
order by
employee_id
,min(apply_date);
Output
+-------------+-----------+-------------------------+
| EMPLOYEE_ID | ITEM_TYPE | range |
+-------------+-----------+-------------------------+
| 1 | Sickness | 23/05/2017 - 24/05/2017 |
| 1 | Vacation | 26/05/2017 - 29/05/2017 |
| 1 | Sickness | 01/06/2017 - 01/06/2017 |
| 2 | Sickness | 25/05/2017 - 30/05/2017 |
+-------------+-----------+-------------------------+