How to find the continous range of dates in Sql Server? - sql

How do we find the continous range of dates from the following scenario?
Id modifiedDate StartDate EndDate
1 2019-01-01 2019-01-01 2019-12-31
1 2019-02-02 2019-02-01 2019-02-28
1 2019-02-27 2019-01-15 2019-03-15
1 2019-03-01 2019-03-01 2019-03-12
2 2019-01-01 2019-01-01 2019-03-01
2 2019-05-01 2019-05-01 2019-08-01
The Output i want to show is :
Id StartDate EndDate
1 2019-01-01 2019-01-15
1 2019-01-15 2019-02-01
1 2019-02-01 2019-02-28
1 2019-02-28 2019-03-01
1 2019-03-01 2019-03-12
2 2019-01-01 2019-03-01
2 2019-05-01 2019-08-01
What I have tried so far is :
With X As(
Select a.StartDate,a.EndDate,b.StartDate,b.EndDate
From table a Full Join table b ON a.endDate>b.StartDate
Where a.StartDate<>b.StartDate and b.endDate<>a.Enddate
)
Select StartDate,Enddate,Min(StartDtae)
From X
Group By StartDate,EndDate
But I couldn't get fill the gaps in between the dates. How can I fix this?

You can try this following script I have created with the Help of CTE and Row_Number(). I am getting 2 additional row considering your sample output from the the given input data. If you sample output is correct, you can ignore this solution.
CTE Only worked for MSSQL and Oracle. But you can convert the logic given, for any other databases.
WITH CTE
AS
(
SELECT DISTINCT id,Date, ROW_NUMBER() OVER(PARTITION BY id ORDER BY Date) RN
FROM
(
SELECT Id,StartDate Date FROM your_table
UNION ALL
SELECT Id,EndDate FROM your_table
) A
)
SELECT A.Id, A.Date StartDate,B.Date EndDate
FROM CTE A
INNER JOIN CTE B ON A.Id = B.Id AND A.RN = B.RN - 1
Output is-
Id StartDate EndDate
1 2019-01-01 2019-01-15
1 2019-01-15 2019-02-01
1 2019-02-01 2019-02-28
1 2019-02-28 2019-03-01
1 2019-03-01 2019-03-12
1 2019-03-12 2019-03-15 -- Not exist in your expected output
1 2019-03-15 2019-12-31 -- Not exist in your expected output
Note: Adding an additional Filtering at the as below will give you the exact output you have posted. But take your own decision which one best suits your requirement.
SELECT....
....
FROM CTE A
INNER JOIN CTE B ON A.Id = B.Id AND A.RN = B.RN - 1
WHERE B.DATE <= '2019-03-12'

The following query should give you the desired result:
WITH dates AS (SELECT StartDate
FROM TABLE
UNION
SELECT EndDate + 1
FROM TABLE)
SELECT StartDate
, (SELECT MIN(StartDate) - 1
FROM dates b
WHERE StartDate - 1 > a.StartDate) EndDate
FROM dates a

Just use lead() with union:
select t.id, t.dte as startdate,
lead(t.dte) over (partition by t.id order by t.dte) as enddate
from (select distinct t.id, v.dte
from t cross apply
(values (startdate), (enddate)) v(dte)
) t;
In addition to being concise, this probably has the best performance.

Related

split the date range by every 6 months dynamically. Start date can be any thing, but add up to month range

Please help to split the date range by every 6 moths and the start date could be anything but using the start date we need to add up to 09-30 only and the next day which is 10/01 should become start date. I tried using recursive cte but still not getting the exact result
startdate enddate
06-22-2018 09-30-2022
output
startdate enddate
06-22-2018 09-30-2018
10-01-2018 03-31-2019
04-01-2019 09-30-2019
10-01-2019 03-31-2020
04-01-2020 09-30-2020
Here is another option which uses an ad-hoc tally table
Example
Declare #YourTable table (startdate date, enddate date)
Insert Into #YourTable values
('06/22/2018','09/30/2022')
;with cte as (
Select *
,Grp = sum( case when day(D)=1 and month(D) in (4,10) then 1 else 0 end) over (order by d)
From #YourTable A
Cross Apply (
Select Top (DateDiff(DAY,startdate,enddate)+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),startdate)
From master..spt_values n1,master..spt_values n2
) B
)
Select StartDate = min(D)
,EndDate = max(D)
From cte
Group by Grp
Order By min(D)
Returns
StartDate EndDate
2018-06-22 2018-09-30
2018-10-01 2019-03-31
2019-04-01 2019-09-30
2019-10-01 2020-03-31
2020-04-01 2020-09-30
2020-10-01 2021-03-31
2021-04-01 2021-09-30
2021-10-01 2022-03-31
2022-04-01 2022-09-30
Option where we JOIN to an ad-hoc calendar table (note the TOP 10000 and base date of 2000-01-01)
Declare #YourTable table (id int,startdate date, enddate date)
Insert Into #YourTable values
(1,'06/22/2018','09/30/2022')
;with cte as (
Select A.*
,B.D
,Grp = sum( case when day(D)=1 and month(D) in (4,10) then 1 else 0 end) over (order by d)
From #YourTable A
Join (
Select Top 10000 D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),'2000-01-01')
From master..spt_values n1,master..spt_values n2
) B on D between startDate and EndDate
and (D in (startdate,EndDate)
or ( day(D) in (1,day(eomonth(d))) and month(D) in (3,4,9,10))
)
)
Select ID
,StartDate = min(D)
,EndDate = max(D)
From cte
Group by ID,Grp
Order By ID,min(D)
Returns
ID StartDate EndDate
1 2018-06-22 2018-09-30
1 2018-10-01 2019-03-31
1 2019-04-01 2019-09-30
1 2019-10-01 2020-03-31
1 2020-04-01 2020-09-30
1 2020-10-01 2021-03-31
1 2021-04-01 2021-09-30
1 2021-10-01 2022-03-31
1 2022-04-01 2022-09-30
You can use a recursive CTE:
with cte as (
select startdate, eomonth(datefromparts(year(startdate), 9, 1)) as enddate, enddate as orig_enddate
from t
union all
select dateadd(day, 1, enddate), eomonth(dateadd(month, 5, dateadd(day, 1, enddate))) as enddate, orig_enddate
from cte
where enddate < orig_enddate
)
select *
from cte;
Here is a db<>fiddle.
It is unclear what year you want for the first row. As per your question, this uses Sep 30th of the year of the startdate.
If you need more than 100 dates, then add option max(recursion 0).

Get max date for each from either of 2 columns

I have a table like below
AID BID CDate
-----------------------------------------------------
1 2 2018-11-01 00:00:00.000
8 1 2018-11-08 00:00:00.000
1 3 2018-11-09 00:00:00.000
7 1 2018-11-15 00:00:00.000
6 1 2018-12-24 00:00:00.000
2 5 2018-11-02 00:00:00.000
2 7 2018-12-15 00:00:00.000
And I am trying to get a result set as follows
ID MaxDate
-------------------
1 2018-12-24 00:00:00.000
2 2018-12-15 00:00:00.000
Each value in the id columns(AID,BID) should return the max of CDate .
ex: in the case of 1, its max CDate is 2018-12-24 00:00:00.000 (here 1 appears under BID)
in the case of 2 , max date is 2018-12-15 00:00:00.000 . (here 2 is under AID)
I tried the following.
1.
select
g.AID,g.BID,
max(g.CDate) as 'LastDate'
from dbo.TT g
inner join
(select AID,BID,max(CDate) as maxdate
from dbo.TT
group by AID,BID)a
on (a.AID=g.AID or a.BID=g.BID)
and a.maxdate=g.CDate
group by g.AID,g.BID
and 2.
SELECT
AID,
CDate
FROM (
SELECT
*,
max_date = MAX(CDate) OVER (PARTITION BY [AID])
FROM dbo.TT
) AS s
WHERE CDate= max_date
Please suggest a 3rd solution.
You can assemble the data in a table expression first, and the compute the max for each value is simple. For example:
select
id, max(cdate)
from (
select aid as id, cdate from t
union all
select bid, cdate from t
) x
group by id
You seem to only care about values that are in both columns. If this interpretation is correct, then:
select id, max(cdate)
from ((select aid as id, cdate, 1 as is_a, 0 as is_b
from t
) union all
(select bid as id, cdate, 1 as is_a, 0 as is_b
from t
)
) ab
group by id
having max(is_a) = 1 and max(is_b) = 1;

Oracle SQL - Select users between two date by month

I am learning SQL and I was wondering how to select active users by month, depending on their starting and ending date (both timestamp(6)). My table looks like this:
Cust_Num | Start_Date | End_Date
1 | 2018-01-01 | 2019-01-01
2 | 2018-01-01 | NULL
3 | 2019-01-01 | 2019-06-01
4 | 2017-01-01 | 2019-03-01
So, counting the active users by month, I should have an output like:
As of. | Count
2018-06-01 | 3
...
2019-02-01 | 3
2019-07-01 | 1
So far, I do a manual operation by entering each month:
Select
201906,
count(distinct a.cust_num)
From
active_users a
Where
to_date(‘20190630’,’yyyymmdd) between a.start_date and nvl (a.end_date, ‘31-dec-9999)
union all
Select
201905,
count(distinct a.cust_num)
From
active_users a
Where
to_date(‘20190531’,’yyyymmdd) between a.start_date and nvl (a.end_date, ‘31-dec-9999)
union all
...
Not very optimized and sustainable if I want to enter 10 years ao 120 months lol.
Any help is welcome. Thanks a lot!
This query shows the active-user-count effective as-of the end of the month.
How it works:
Convert each input row (with StartDate and EndDate value) into two rows that represent a point-in-time when the active-user-count incremented (on StartDate) and decremented (on EndDate). We need to convert NULL to a far-off date value because NULL values are sorted before instead of after non-NULL values:
This makes your data look like this:
OnThisDate Change
2018-01-01 1
2019-01-01 -1
2018-01-01 1
9999-12-31 -1
2019-01-01 1
2019-06-01 -1
2017-01-01 1
2019-03-01 -1
Then we simply SUM OVER the Change values (after sorting) to get the active-user-count as of that specific date:
So first, sort by OnThisDate:
OnThisDate Change
2017-01-01 1
2018-01-01 1
2018-01-01 1
2019-01-01 1
2019-01-01 -1
2019-03-01 -1
2019-06-01 -1
9999-12-31 -1
Then SUM OVER:
OnThisDate ActiveCount
2017-01-01 1
2018-01-01 2
2018-01-01 3
2019-01-01 4
2019-01-01 3
2019-03-01 2
2019-06-01 1
9999-12-31 0
Then we PARTITION (not group!) the rows by month and sort them by their date so we can identify the last ActiveCount row for that month (this actually happens in the WHERE of the outermost query, using ROW_NUMBER() and COUNT() for each month PARTITION):
OnThisDate ActiveCount IsLastInMonth
2017-01-01 1 1
2018-01-01 2 0
2018-01-01 3 1
2019-01-01 4 0
2019-01-01 3 1
2019-03-01 2 1
2019-06-01 1 1
9999-12-31 0 1
Then filter on that where IsLastInMonth = 1 (actually, where ROW_COUNT() = COUNT(*) inside each PARTITION) to give us the final output data:
At-end-of-month Active-count
2017-01 1
2018-01 3
2019-01 3
2019-03 2
2019-06 1
9999-12 0
This does result in "gaps" in the result-set because the At-end-of-month column only shows rows where the Active-count value actually changed rather than including all possible calendar months - but that's ideal (as far as I'm concerned) because it excludes redundant data. Filling in the gaps can be done inside your application code by simply repeating output rows for each additional month until it reaches the next At-end-of-month value.
Here's the query using T-SQL on SQL Server (I don't have access to Oracle right now). And here's the SQLFiddle I used to come to a solution: http://sqlfiddle.com/#!18/ad68b7/24
SELECT
OtdYear,
OtdMonth,
ActiveCount
FROM
(
-- This query adds columns to indicate which row is the last-row-in-month ( where RowInMonth == RowsInMonth )
SELECT
OnThisDate,
OtdYear,
OtdMonth,
ROW_NUMBER() OVER ( PARTITION BY OtdYear, OtdMonth ORDER BY OnThisDate ) AS RowInMonth,
COUNT(*) OVER ( PARTITION BY OtdYear, OtdMonth ) AS RowsInMonth,
ActiveCount
FROM
(
SELECT
OnThisDate,
YEAR( OnThisDate ) AS OtdYear,
MONTH( OnThisDate ) AS OtdMonth,
SUM( [Change] ) OVER ( ORDER BY OnThisDate ASC ) AS ActiveCount
FROM
(
SELECT
StartDate AS [OnThisDate],
1 AS [Change]
FROM
tbl
UNION ALL
SELECT
ISNULL( EndDate, DATEFROMPARTS( 9999, 12, 31 ) ) AS [OnThisDate],
-1 AS [Change]
FROM
tbl
) AS sq1
) AS sq2
) AS sq3
WHERE
RowInMonth = RowsInMonth
ORDER BY
OtdYear,
OtdMonth
This query can be flattened into fewer nested queries by using aggregate and window functions directly instead of using aliases (like OtdYear, ActiveCount, etc) but that would make the query much harder to understand.
I have created the query which will give the result of all the months starting from the minimum start date in the table till maximum end date.
You can change it using adding one condition in WHERE clause.
-- table creation
CREATE TABLE ACTIVE_USERS (CUST_NUM NUMBER, START_DATE DATE, END_DATE DATE)
-- data creation
INSERT INTO ACTIVE_USERS
SELECT * FROM
(
SELECT 1, DATE '2018-01-01', DATE '2019-01-01' FROM DUAL UNION ALL
SELECT 2, DATE '2018-01-01', NULL FROM DUAL UNION ALL
SELECT 3, DATE '2019-01-01', DATE '2019-06-01' FROM DUAL UNION ALL
SELECT 4, DATE '2017-01-01', DATE '2019-03-01' FROM DUAL
)
-- data in the actual table
SELECT * FROM ACTIVE_USERS ORDER BY CUST_NUM;
CUST_NUM START_DATE END_DATE
---------- ---------- ----------
1 2018-01-01 2019-01-01
2 2018-01-01
3 2019-01-01 2019-06-01
4 2017-01-01 2019-03-01
Query to fetch desired result
WITH CTE ( START_DATE, END_DATE ) AS
(
SELECT
ADD_MONTHS( START_DATE, LEVEL - 1 ),
ADD_MONTHS( START_DATE, LEVEL ) - 1
FROM
(
SELECT
MIN( START_DATE ) AS START_DATE,
MAX( END_DATE ) AS END_DATE
FROM
ACTIVE_USERS
)
CONNECT BY LEVEL <= CEIL( MONTHS_BETWEEN( END_DATE, START_DATE ) ) + 1
)
--
--
SELECT
C.START_DATE,
COUNT(1) AS CNT
FROM
CTE C
JOIN ACTIVE_USERS D ON
(
C.END_DATE BETWEEN
D.START_DATE
AND
CASE
WHEN D.END_DATE IS NOT NULL THEN D.END_DATE
ELSE C.END_DATE
END
)
GROUP BY
C.START_DATE
ORDER BY
C.START_DATE;
-- output --
START_DATE CNT
---------- ----------
2017-01-01 1
2017-02-01 1
2017-03-01 1
2017-04-01 1
2017-05-01 1
2017-06-01 1
2017-07-01 1
2017-08-01 1
2017-09-01 1
2017-10-01 1
2017-11-01 1
START_DATE CNT
---------- ----------
2017-12-01 1
2018-01-01 3
2018-02-01 3
2018-03-01 3
2018-04-01 3
2018-05-01 3
2018-06-01 3
2018-07-01 3
2018-08-01 3
2018-09-01 3
2018-10-01 3
START_DATE CNT
---------- ----------
2018-11-01 3
2018-12-01 3
2019-01-01 3
2019-02-01 3
2019-03-01 2
2019-04-01 2
2019-05-01 2
2019-06-01 1
30 rows selected.
Cheers!!

Subtract subsequent row from previous row based on User

I have the following data and I want to subtract current row from previous row based on the UserID. I tried the code below is not given me what I want
DECLARE #DATETBLE TABLE (UserID INT, Dates DATE)
INSERT INTO #DATETBLE VALUES
(1,'2018-01-01'), (1,'2018-01-02'), (1,'2018-01-03'),(1,'2018-01-13'),
(2,'2018-01-15'),(2,'2018-01-16'),(2,'2018-01-17'), (5,'2018-02-04'),
(5,'2018-02-05'),(5,'2018-02-06'),(5,'2018-02-11'), (5,'2018-02-17')
;with cte as (
select UserID,Dates, row_number() over (order by UserID) as seqnum
from #DATETBLE t
)
select t.UserID,t.Dates, datediff(day,tprev.Dates,t.Dates)as diff
from cte t left outer join
cte tprev
on t.seqnum = tprev.seqnum + 1;
Current Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 2
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 18
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
My Expected Output
UserID Dates diff
1 2018-01-01 NULL
1 2018-01-02 1
1 2018-01-03 1
1 2018-01-13 10
2 2018-01-15 NULL
2 2018-01-16 1
2 2018-01-17 1
5 2018-02-04 NULL
5 2018-02-05 1
5 2018-02-06 1
5 2018-02-11 5
5 2018-02-17 6
Your tag (sql-server-2008) suggests me to use APPLY :
select t.userid, t.dates, datediff(day, t1.dates, t.dates) as diff
from #DATETBLE t outer apply
( select top (1) t1.*
from #DATETBLE t1
where t1.userid = t.userid and
t1.dates < t.dates
order by t1.dates desc
) t1;
If you have SQL Server version 2012 or higher, you could use LAG() with a partition by UserID:
SELECT UserID
, DATEDIFF(dd,COALESCE(LAG_DATES, Dates), Dates) as diff
FROM
(
SELECT UserID
, Dates
, LAG(Dates) OVER (PARTITION BY UserID ORDER BY Dates) as LAG_DATES
FROM #DATETBLE
) exp
This will give you a 0 value instead of a NULL value for the first date in the sequence though.
Since you tagged the post with SQL Server 2008, however, you may need to use a method that doesn't rely on this windowed function.

Merging the two tables using sql / Spark

I have two data set as below and need to merge two data set based on the date range logic. Please suggest any idea? and the driver table is A
Table A
UID Start Date End Date A_Val
1 1980-01-01 00:00:00 1980-02-01 00:00:00 A
1 1980-02-02 00:00:00 1980-03-10 00:00:00 B
1 1980-03-11 00:00:00 1980-03-24 00:00:00 C
Table B
UID Start Date End Date B_Val
1 1980-01-10 00:00:00 1980-02-01 00:00:00 G
1 1980-02-02 00:00:00 1980-03-01 00:00:00 H
1 1980-03-02 00:00:00 1980-03-24 00:00:00 I
Result / out put needed as below
UID Start Date End Date A_Val B_Val
1 1980-01-01 00:00:00 1980-01-09 00:00:00 A NULL
1 1980-01-10 00:00:00 1980-02-01 00:00:00 A G
1 1980-02-02 00:00:00 1980-03-01 00:00:00 B H
1 1980-03-02 00:00:00 1980-03-10 00:00:00 B I
1 1980-03-11 00:00:00 1980-03-24 00:00:00 C I
Table Detail
Need the out put as below based on date range calculations
out put of Merged Table
You can do it in several ways, here is one:
find minimum and maximum date from whole set (subquery T),
create each day entry with hierarchical query (subquery D),
left join data from A and B,
assign groups to continuous periods, having same A_VAL and B_VAL (subquery G),
group data using assigned group numbers.
SQLFiddle demo
with
T as (select min(start_date) sd, max(end_date) ed
from (select start_date, end_date from a union all
select start_date, end_date from b)),
D as (select sd + level - 1 dt from t connect by sd + level - 1 <= ed),
G as (select dt, a_val, b_val,
row_number() over (order by dt) -
row_number() over (partition by a_val, b_val order by dt) grp
from d
left join a on dt between a.start_date and a.end_date
left join b on dt between b.start_date and b.end_date)
select min(dt) sd, max(dt) ed, min(a_val) a_val, min(b_val) b_val
from g group by grp order by sd
Result:
SD ED A_VAL B_VAL
----------- ----------- ----- -----
1980-01-01 1980-01-09 A
1980-01-10 1980-02-01 A G
1980-02-02 1980-03-01 B H
1980-03-02 1980-03-10 B I
1980-03-11 1980-03-24 C I
If you are doing this for one U_ID filter data at first. If for many U_ID's then you have to consider this value in partitioning and grouping.