Get a list of dates between few dates - sql

There are some quite similar questions, but none is exactly the same.
I have to solve the following problem.
From a table with this structure:
| DATE_FROM | DATE_TO |
|------------|------------|
| 2010-05-17 | 2010-05-19 |
| 2017-01-02 | 2017-01-04 |
| 2017-05-01 | NULL |
| 2017-06-12 | NULL |
I need to get a list like the one below
| DATE_LIST |
|------------|
| 2010-05-17 |
| 2010-05-18 |
| 2010-05-19 |
| 2017-01-02 |
| 2017-01-03 |
| 2017-01-04 |
| 2017-05-01 |
| 2017-06-12 |
How can I get it with SQL? SQL Server 2016.

Another option is with a CROSS APPLY and an ad-hoc tally table
Select Date_List=B.D
from YourTable A
Cross Apply (
Select Top (DateDiff(DAY,[DATE_FROM],IsNull([DATE_TO],[DATE_FROM]))+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),[DATE_FROM])
From master..spt_values n1,master..spt_values n2
) B
Returns
Date_List
2010-05-17
2010-05-18
2010-05-19
2017-01-02
2017-01-03
2017-01-04
2017-05-01
2017-06-12

One method uses a recursive CTE:
with cte as (
select date_from as date_list, date_to
from t
union all
select dateadd(day, 1, date_list), date_to
from cte
where date_list < date_to
)
select date_list
from cte;
By default, a recursive CTE is limited to a recursion depth of 100; exceeding it raises an error. That is enough for spans of up to about 100 days. You can remove the limit with OPTION (MAXRECURSION 0).
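For readers who want to see the expansion logic outside the database, here is a minimal Python sketch of what the recursive CTE computes. It is an illustration only, not part of the answer above: the function name expand_range and the rows list are made up for the example, and a NULL DATE_TO is assumed to mean a single-day range, as in the desired output.

```python
from datetime import date, timedelta
from typing import Iterator, Optional


def expand_range(date_from: date, date_to: Optional[date]) -> Iterator[date]:
    """Yield every day from date_from to date_to, both ends included."""
    # Assumption: a missing end date (NULL in the table) means a one-day range.
    last = date_to if date_to is not None else date_from
    current = date_from
    while current <= last:
        yield current
        current += timedelta(days=1)


# Sample rows copied from the question (hypothetical in-memory stand-in for the table).
rows = [
    (date(2010, 5, 17), date(2010, 5, 19)),
    (date(2017, 1, 2), date(2017, 1, 4)),
    (date(2017, 5, 1), None),
    (date(2017, 6, 12), None),
]

date_list = [d for date_from, date_to in rows for d in expand_range(date_from, date_to)]
for d in date_list:
    print(d.isoformat())
```

Unlike the CTE, a plain loop has no recursion limit, so long spans need no equivalent of MAXRECURSION.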

Although you could create the date range on the fly in your query, consider creating a permanent calendar table. This will provide better performance and can be extended with other attributes like day of week, fiscal quarter, etc. You can find many examples of loading such a table with an internet search.
Below is an example with 40 years of dates.
--example calendar table load script
CREATE TABLE dbo.Calendar(
CalendarDate date NOT NULL
CONSTRAINT PK_Calendar PRIMARY KEY
);
WITH
t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
,t64k AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) - 1 AS num FROM t256 AS a CROSS JOIN t256 AS b) --zero-based so 2000-01-01 is included
INSERT INTO dbo.Calendar WITH(TABLOCKX)
SELECT DATEADD(day, num, '20000101')
FROM t64k
WHERE DATEADD(day, num, '20000101') < '20400101'
GO
--example query
DECLARE @example TABLE(
DATE_FROM date NOT NULL
,DATE_TO date NULL
);
INSERT INTO @example VALUES
('2010-05-17', '2010-05-19')
, ('2017-01-02', '2017-01-04')
, ('2017-05-01', NULL)
, ('2017-06-12', NULL);
SELECT
c.CalendarDate
FROM @example AS e
JOIN dbo.Calendar AS c ON
c.CalendarDate BETWEEN e.DATE_FROM AND COALESCE(e.DATE_TO, e.DATE_FROM);

Related

Returning non-overlapping records within a date range

I have the following data. I have looked over a lot of threads about overlapping and non-overlapping dates but none seemed to help me.
===============================
PK | StartDate | EndDate
===============================
1 | 2016-05-01 | 2016-05-02
2 | 2016-05-02 | 2016-05-03
3 | 2016-05-03 | 2016-05-04
4 | 2016-05-04 | 2016-05-05
5 | 2016-05-07 | 2016-06-08
===============================
From this table with a SQL query I want to return the first record out of overlapping dates
or basically
===============================
PK | StartDate | EndDate
===============================
1 | 2016-05-01 | 2016-05-02
3 | 2016-05-03 | 2016-05-04
5 | 2016-05-07 | 2016-06-08
===============================
I have been struggling a lot with this query. Is this actually possible without too much of a performance hit, and is it better done on the backend or with a SQL query? I believe it would be easier for me to do it on the backend.
This can be achieved by creating a new column and partitioning on it to fetch only the first row of each group.
declare @tbl table
(pk int identity,StartDate date,EndDate date)
insert into @tbl
values('2016-05-01','2016-05-02')
,('2016-05-02','2016-05-03')
,('2016-05-03','2016-05-04')
,('2016-05-04','2016-05-05')
,('2016-05-07','2016-06-08')
select pk,startdate,enddate from(select pk,startdate,enddate
,ROW_NUMBER()over(partition by [overlappingdates] order by startdate)rn
from(
select *,case when ROW_NUMBER()over(order by startdate) % 2 = 0
then StartDate else EndDate end as [overlappingdates]
from
@tbl
)t
)t
where t.rn = 1
You need to self-join first to find all overlapping rows, unpivot the two rows' dates into a vertical format, and then take the MIN of each date.
Min of Start/End of Overlapping Date Ranges
DROP TABLE IF EXISTS #Dates
CREATE TABLE #Dates (PK INT IDENTITY(1,1),StartDate DATE,EndDate DATE)
INSERT INTO #Dates VALUES
('2016-05-01','2016-05-02')
,('2016-05-02','2016-05-03')
,('2016-05-03','2016-05-04')
,('2016-05-04','2016-05-05')
,('2016-05-07','2016-06-08')
SELECT A.PK
,StartDate = MIN(C.StartDate)
,EndDate = MIN(C.EndDate)
FROM #Dates AS A
INNER JOIN #Dates AS B
ON A.PK < B.PK /*Only join 1 way and don't join to itself*/
/*Find any overlap*/
AND A.StartDate <= B.EndDate
AND A.EndDate >= B.StartDate
CROSS APPLY (VALUES /*Puts in unpivoted(vertical) format so can run MIN() function*/
(A.StartDate,A.EndDate)
,(B.StartDate,B.EndDate)
) AS C(StartDate,EndDate)
GROUP BY A.PK
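As a cross-check of the expected result, here is a small Python sketch of one possible reading of the requirement: walk the rows in StartDate order and keep a row only if it starts after the end of the last row kept. This is an assumption about what "first record out of overlapping dates" means, not the method used in either query above. The function name is hypothetical, and a shared boundary day is treated as an overlap.

```python
from datetime import date


def first_of_each_overlap(rows):
    """rows: iterable of (pk, start, end). Returns the rows that are kept."""
    kept = []
    for pk, start, end in sorted(rows, key=lambda r: (r[1], r[0])):
        # Assumption: a row overlaps the last kept row when it starts
        # on or before that row's end date.
        if kept and start <= kept[-1][2]:
            continue
        kept.append((pk, start, end))
    return kept


# Sample data from the question.
sample = [
    (1, date(2016, 5, 1), date(2016, 5, 2)),
    (2, date(2016, 5, 2), date(2016, 5, 3)),
    (3, date(2016, 5, 3), date(2016, 5, 4)),
    (4, date(2016, 5, 4), date(2016, 5, 5)),
    (5, date(2016, 5, 7), date(2016, 6, 8)),
]

for row in first_of_each_overlap(sample):
    print(row)
```

On the sample data this keeps PK 1, 3 and 5. Because each decision depends on which rows were kept earlier, the logic is sequential, which is part of why it is awkward to express as a single set-based query.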

Create all months list from a date column in ORACLE SQL

CREATE TABLE dates(
alldates date);
INSERT INTO dates (alldates) VALUES ('1-May-2017');
INSERT INTO dates (alldates) VALUES ('1-Mar-2018');
I want to generate all months beginning between these two dates. I am very new to Oracle SQL. My solution is below, but it is not working properly.
WITH t1(test) AS (
SELECT MIN(alldates) as test
FROM dates
UNION ALL
SELECT ADD_MONTHS(test,1) as test
FROM t1
WHERE t1.test<= (SELECT MAX(alldates) FROM date)
)
SELECT * FROM t1
The result I want should look like
Test
2017-02-01
2017-03-01
...
2017-12-01
2018-01-01
2018-02-01
2018-03-01
You made a typo and wrote date instead of dates. You also need to make a second change and use ADD_MONTHS in the recursive query's WHERE clause, or you will generate one row too many.
WITH t1(test) AS (
SELECT MIN(alldates)
FROM dates
UNION ALL
SELECT ADD_MONTHS(test,1)
FROM t1
WHERE ADD_MONTHS(test,1) <= (SELECT MAX(alldates) FROM dates)
)
SELECT * FROM t1
Which outputs:
| TEST |
| :-------- |
| 01-MAY-17 |
| 01-JUN-17 |
| 01-JUL-17 |
| 01-AUG-17 |
| 01-SEP-17 |
| 01-OCT-17 |
| 01-NOV-17 |
| 01-DEC-17 |
| 01-JAN-18 |
| 01-FEB-18 |
| 01-MAR-18 |
However, a more efficient query would be to get the minimum and maximum values in the same query and then iterate using these pre-found bounds:
WITH t1(min_date, max_date) AS (
SELECT MIN(alldates),
MAX(alldates)
FROM dates
UNION ALL
SELECT ADD_MONTHS(min_date,1),
max_date
FROM t1
WHERE ADD_MONTHS(min_date,1) <= max_date
)
SELECT min_date AS month
FROM t1
db<>fiddle here
Update
Oracle 11gR2 has bugs handling recursive date queries; this is fixed in later Oracle versions but if you want to use SQL Fiddle and Oracle 11gR2 then you need to iterate over a numeric value and not a date. Something like this:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE dates(
alldates date);
INSERT INTO dates (alldates) VALUES ('1-May-2017');
INSERT INTO dates (alldates) VALUES ('1-Mar-2018');
Query 1:
WITH t1(min_date, month, total_months) AS (
SELECT MIN(alldates),
0,
MONTHS_BETWEEN(MAX(alldates),MIN(alldates))
FROM dates
UNION ALL
SELECT min_date,
month+1,
total_months
FROM t1
WHERE month+1<=total_months
)
SELECT ADD_MONTHS(min_date,month) AS month
FROM t1
Results:
| MONTH |
|----------------------|
| 2017-05-01T00:00:00Z |
| 2017-06-01T00:00:00Z |
| 2017-07-01T00:00:00Z |
| 2017-08-01T00:00:00Z |
| 2017-09-01T00:00:00Z |
| 2017-10-01T00:00:00Z |
| 2017-11-01T00:00:00Z |
| 2017-12-01T00:00:00Z |
| 2018-01-01T00:00:00Z |
| 2018-02-01T00:00:00Z |
| 2018-03-01T00:00:00Z |
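The idea of iterating over a numeric month offset instead of a date can be sketched in Python as follows. This is only an illustration of the approach: add_months and months_between are hypothetical helpers written for this example, they assume both dates fall on the first of the month, and they do not reproduce Oracle's end-of-month handling.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months. Assumes the day exists in the target month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, d.day)


def months_between(later: date, earlier: date) -> int:
    """Whole-month difference; only meaningful here for first-of-month dates."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


min_date = date(2017, 5, 1)
max_date = date(2018, 3, 1)

# Iterate over the offsets 0..total_months and convert to dates only at the end.
total_months = months_between(max_date, min_date)
months = [add_months(min_date, offset) for offset in range(total_months + 1)]

for m in months:
    print(m.isoformat())
```

Keeping the loop variable numeric means the stopping condition is a plain integer comparison, which mirrors the month+1<=total_months test in the query.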
You seem to want a recursive CTE. That syntax would be:
WITH CTE(min_date, max_date) as (
SELECT MIN(alldates) as min_date, MAX(alldates) as max_date
FROM dates
UNION ALL
SELECT add_months(min_date, 1), max_date
FROM CTE
WHERE min_date < max_date
)
SELECT min_date
FROM CTE;
Here is a db<>fiddle.
You just made a typo: date instead of dates:
WITH t1(test) AS (
SELECT MIN(alldates) as test
FROM dates
UNION ALL
SELECT ADD_MONTHS(test,1) as test
FROM t1
WHERE t1.test<= (SELECT MAX(alldates) FROM dateS) -- fixed here
)
SELECT * FROM t1

Select Multiple Rows from Timespan

Problem
In SQL Server 2014, I store projects in a table with these columns:
Startdate .. | Enddate ....| Projectname .................| Volume
2017-02-13 | 2017-04-12 | GenerateRevenue .........| 20.02
2017-04-02 | 2018-01-01 | BuildRevenueGenerator | 300.044
2017-05-23 | 2018-03-19 | HarvestRevenue ............| 434.009
I need a SELECT that gives me one row per month for each project. The day of the month doesn't have to be considered.
Date .......... | Projectname..................| Volume
2017-02-01 | GenerateRevenue .........| 20.02
2017-03-01 | GenerateRevenue .........| 20.02
2017-04-01 | GenerateRevenue .........| 20.02
2017-04-01 | BuildRevenueGenerator | 300.044
2017-05-01 | BuildRevenueGenerator | 300.044
2017-06-01 | BuildRevenueGenerator | 300.044
...
Extra
Ideally the logic of the SELECT allows me both to calculate the monthly volume and also the difference between each month and the previous.
Date .......... | Projectname..................| VolumeMonthly
2017-02-01 | GenerateRevenue .........| 6.6733
2017-03-01 | GenerateRevenue .........| 6.6733
2017-04-01 | GenerateRevenue .........| 6.6733
2017-04-01 | BuildRevenueGenerator | 30.0044
2017-05-01 | BuildRevenueGenerator | 30.0044
2017-06-01 | BuildRevenueGenerator | 30.0044
...
Also...
I know I can map it onto a temporary calendar table, but that tends to get bloated and complex very fast. I'm really looking for a better way to solve this problem.
Solution
Gordon's solution worked very nicely, and it doesn't require a second table or a mapping onto a calendar of some sort. I did have to change a few things, such as making sure both sides of the union have the same SELECT list.
Here is my adapted version:
with cte as (
select startdate as mondate, enddate, projectName, volume
from projects
union all
select dateadd(month, 1, mondate), enddate, projectName, volume
from cte
where eomonth(dateadd(month, 1, mondate)) <= eomonth(enddate)
)
select * from cte;
The monthly volume can be obtained by replacing volume with:
CAST(Cast(volume AS DECIMAL) / Cast(Datediff(month,
startdate,enddate)+ 1 AS DECIMAL) AS DECIMAL(15, 2))
AS [volumeMonthly]
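To make the arithmetic concrete, here is a minimal Python sketch of the same expansion: one row per calendar month touched by a project, with the volume divided evenly across those months. The helper names are made up for this example, and rounding to four decimals is an assumption chosen to match the figures in the question.

```python
from datetime import date


def month_span(start: date, end: date) -> int:
    """Calendar months touched, equivalent to DATEDIFF(month, start, end) + 1."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def expand_project(start: date, end: date, name: str, volume: float):
    """Return one (month_start, name, monthly_volume) row per month of the project."""
    n = month_span(start, end)
    monthly = round(volume / n, 4)  # assumption: four decimals, as in the question
    rows = []
    for i in range(n):
        index = start.year * 12 + (start.month - 1) + i
        rows.append((date(index // 12, index % 12 + 1, 1), name, monthly))
    return rows


generate_revenue = expand_project(
    date(2017, 2, 13), date(2017, 4, 12), "GenerateRevenue", 20.02
)
for row in generate_revenue:
    print(row)
```

For GenerateRevenue this gives three rows (February to April 2017) of about 6.6733 each, matching the desired output in the question.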
Another option is with an ad-hoc tally table
Example
-- Some Sample Data
Declare @YourTable table (StartDate date,EndDate date,ProjectName varchar(50), Volume float)
Insert Into @YourTable values
('2017-03-15','2017-07-25','Project X',25)
,('2017-04-01','2017-06-30','Project Y',50)
-- Set Your Desired Date Range
Declare @Date1 date = '2017-01-01'
Declare @Date2 date = '2017-12-31'
Select Period = D
,B.*
,MonthlyVolume = sum(Volume) over (Partition By convert(varchar(6),D,112))
From (Select Top (DateDiff(MONTH,@Date1,@Date2)+1) D=DateAdd(MONTH,-1+Row_Number() Over (Order By (Select Null)),@Date1)
From master..spt_values n1
) A
Join @YourTable B on convert(varchar(6),D,112) between convert(varchar(6),StartDate,112) and convert(varchar(6),EndDate,112)
Order by Period,ProjectName
Note: Use a LEFT JOIN to see gaps
You can use a recursive CTE to expand the rows for each project, based on the table:
with cte as (
select startdate as mondate, p.*
from projects p
union all
select dateadd(month, 1, mondate), . . . -- whatever columns you want here
from cte
where eomonth(dateadd(month, 1, mondate)) <= eomonth(enddate)
)
select *
from cte;
I'm not sure if this actually answers your question. When I first read the question, I figured the table had one row per project.
Using a couple of common table expressions, an adhoc calendar table for months and lag() (SQL Server 2012+) for the final delta calculation:
create table projects (id int identity(1,1), StartDate date, EndDate date, ProjectName varchar(32), Volume float);
insert into projects values ('20170101','20170330','SO Q1',240),('20170214','20170601','EX Q2',120)
declare @StartDate date = '20170101'
, @EndDate date = '20170731';
;with Months as (
select top (datediff(month,@StartDate,@EndDate)+1)
MonthStart = dateadd(month, row_number() over (order by number) -1, @StartDate)
, MonthEnd = dateadd(day,-1,dateadd(month, row_number() over (order by number), @StartDate))
from master.dbo.spt_values
)
, ProjectMonthlyVolume as (
select p.*
, MonthlyVolume = Volume/(datediff(month,p.StartDate,p.EndDate)+1)
, m.MonthStart
from Months m
left join Projects p
on p.EndDate >= m.MonthStart
and p.StartDate <= m.MonthEnd
)
select
MonthStart = convert(char(7),MonthStart,120)
, MonthlyVolume = isnull(sum(MonthlyVolume),0)
, Delta = isnull(sum(MonthlyVolume),0) - lag(Sum(MonthlyVolume)) over (order by MonthStart)
from ProjectMonthlyVolume pmv
group by MonthStart
rextester demo: http://rextester.com/DZL54787
returns:
+------------+---------------+-------+
| MonthStart | MonthlyVolume | Delta |
+------------+---------------+-------+
| 2017-01 | 80 | NULL |
| 2017-02 | 104 | 24 |
| 2017-03 | 104 | 0 |
| 2017-04 | 24 | -80 |
| 2017-05 | 24 | 0 |
| 2017-06 | 24 | 0 |
| 2017-07 | 0 | -24 |
+------------+---------------+-------+
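The totals and deltas in that result can be reproduced with a short Python sketch, shown here only to make the calculation explicit. The function names are hypothetical, and one simplification is assumed: a month with no projects counts as 0 when it is the previous month, whereas lag() in the query would see NULL.

```python
from collections import defaultdict
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months can be iterated as integers."""
    return d.year * 12 + d.month - 1


def monthly_report(projects, first_month: date, last_month: date):
    """projects: iterable of (start, end, volume). Returns [(label, total, delta)]."""
    totals = defaultdict(float)
    for start, end, volume in projects:
        first, last = month_index(start), month_index(end)
        share = volume / (last - first + 1)  # even split across touched months
        for m in range(first, last + 1):
            totals[m] += share

    report = []
    previous = None
    for m in range(month_index(first_month), month_index(last_month) + 1):
        total = totals.get(m, 0.0)
        delta = None if previous is None else total - previous
        report.append((f"{m // 12:04d}-{m % 12 + 1:02d}", total, delta))
        previous = total
    return report


# Sample projects from the answer above.
projects = [
    (date(2017, 1, 1), date(2017, 3, 30), 240.0),
    (date(2017, 2, 14), date(2017, 6, 1), 120.0),
]
report = monthly_report(projects, date(2017, 1, 1), date(2017, 7, 31))
for line in report:
    print(line)
```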

Convert start and end dates to normalized table

I have a table with customer records, with each customer having a start date and end date. I'm looking for the most efficient way to convert this into a table that counts the number of active customers for each day. For example:
Existing table (Table A):
Customer - Start Date - End Date
A - 1/1/2017 - 1/3/2017
B - 1/2/2017 - 1/5/2017
What I need (Table B):
Date - Customer_Count
1/1/2017 - 1
1/2/2017 - 2
1/3/2017 - 2
1/4/2017 - 1
1/5/2017 - 1
The method I'm using right now is simply joining a date reference table to the customer table, and then grouping by the reference date column. While this method works, the customer table is very large, and there are additional conditions I want to be able to apply (i.e., the geography of the customer, product, etc.) which will additionally impact performance.
Appreciate the help!
You can generate the dates from a tally source and then do a cross apply, as below:
select RowN as [Date], count(*) as Customer_Count from #yourcust cross apply
(
select top (datediff(day, startdate, enddate)+1) rowN = dateadd(day, row_number() over (order by s1.name) -1 , startdate) from master..spt_values s1,master..spt_values s2
) a
group by RowN
Output
+------------+----------------+
| Date | Customer_Count |
+------------+----------------+
| 2017-01-01 | 1 |
| 2017-01-02 | 2 |
| 2017-01-03 | 2 |
| 2017-01-04 | 1 |
| 2017-01-05 | 1 |
+------------+----------------+
A tally/calendar table would do the trick, but an ad-hoc tally table in concert with a Cross Apply may help as well
Example
Select Date
,Customer_Count = count(*)
From YourTable A
Cross Apply (
Select Top (DateDiff(DD,[Start Date] ,[End Date] )+1) Date=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),[Start Date])
From master..spt_values
) B
Group By Date
Order By Date
Returns
Date Customer_Count
2017-01-01 1
2017-01-02 2
2017-01-03 2
2017-01-04 1
2017-01-05 1
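Since the question asks about efficiency on a very large customer table, it may be worth noting a different technique from the ones shown above: instead of producing one row per customer per day, record +1 on each start date and -1 on the day after each end date, then take a running sum. The Python sketch below is an assumption-laden illustration of that idea, not a translation of either query; the function name is made up.

```python
from collections import Counter
from datetime import date, timedelta


def daily_active_counts(customers):
    """customers: iterable of (start, end), both inclusive. Returns [(day, count)]."""
    changes = Counter()
    for start, end in customers:
        changes[start] += 1                      # customer becomes active
        changes[end + timedelta(days=1)] -= 1    # inactive from the day after the end
    if not changes:
        return []

    result = []
    active = 0
    day = min(changes)
    last = max(changes)  # the day after the final end date, so it is excluded
    while day < last:
        active += changes.get(day, 0)
        result.append((day, active))
        day += timedelta(days=1)
    return result


# Sample data from the question.
customers = [
    (date(2017, 1, 1), date(2017, 1, 3)),  # customer A
    (date(2017, 1, 2), date(2017, 1, 5)),  # customer B
]
counts = daily_active_counts(customers)
for day, count in counts:
    print(day.isoformat(), count)
```

The work is proportional to the number of customers plus the number of days in the overall range, rather than to the total number of customer-days. Days inside the range with no active customers appear with a count of 0.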

Group rows into sequences using a sliding window on a DateTime column

I have a table that stores timestamped events. I want to group the events into 'sequences' by using 5-min sliding window on the timestamp column, and write the 'sequence ID' (any ID that can distinguish sequences) and 'order in sequence' into another table.
Input - event table:
+----+-------+-----------+
| Id | Name | Timestamp |
+----+-------+-----------+
| 1 | test | 00:00:00 |
| 2 | test | 00:06:00 |
| 3 | test | 00:10:00 |
| 4 | test | 00:14:00 |
+----+-------+-----------+
Desired output - sequence table. Here SeqId is the ID of the starting event, but it doesn't have to be, just something to uniquely identify a sequence.
+---------+-------+----------+
| EventId | SeqId | SeqOrder |
+---------+-------+----------+
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 3 |
+---------+-------+----------+
What would be the best way to do it? This is MSSQL 2008, I can use SSAS and SSIS if they make things easier.
CREATE TABLE #Input (Id INT, Name VARCHAR(20), Time_stamp TIME)
INSERT INTO #Input
VALUES
( 1 ,'test','00:00:00' ),
( 2 ,'test','00:06:00' ),
( 3 ,'test','00:10:00' ),
( 4 ,'test','00:14:00' )
SELECT * FROM #Input;
WITH cte AS -- add a sequential number
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY Id) AS sort
FROM #Input
), cte2 as -- find the Id's with a difference of more than 5min
(
SELECT cte.*,
CASE WHEN DATEDIFF(MI, cte_1.Time_stamp,cte.Time_stamp) < 5 THEN 0 ELSE 1 END as GrpType
FROM cte
LEFT OUTER JOIN
cte as cte_1 on cte.sort =cte_1.sort +1
), cte3 as -- assign a SeqId
(
SELECT GrpType, Time_Stamp,ROW_NUMBER() OVER(ORDER BY Time_stamp) SeqId
FROM cte2
WHERE GrpType = 1
), cte4 as -- find the Time_Stamp range per SeqId
(
SELECT cte3.*,cte_2.Time_stamp as TS_to
FROM cte3
LEFT OUTER JOIN
cte3 as cte_2 on cte3.SeqId =cte_2.SeqId -1
)
-- final query
SELECT
t.Id,
cte4.SeqId,
ROW_NUMBER() OVER(PARTITION BY cte4.SeqId ORDER BY t.Time_stamp) AS SeqOrder
FROM cte4 INNER JOIN #Input t ON t.Time_stamp>=cte4.Time_stamp AND (t.Time_stamp <cte4.TS_to OR cte4.TS_to IS NULL);
This code is slightly more complex, but it returns the expected output (which Gordon Linoff's solution as originally posted did not) and it's even slightly faster.
You seem to want things grouped together when they are less than five minutes apart. You can assign the groups by getting the previous time stamp and marking the beginning of a group. You then need to do a cumulative sum to get the group id:
with e as (
select e.*,
(case when datediff(minute, prev_timestamp, timestamp) < 5 then 0 else 1 end) as flag -- 1 marks the start of a new group
from (select e.*,
(select top 1 e2.timestamp
from events e2
where e2.timestamp < e.timestamp
order by e2.timestamp desc
) as prev_timestamp
from events e
) e
)
select e.eventId, e.seqId,
row_number() over (partition by seqId order by timestamp) as seqOrder
from (select e.*, (select sum(flag) from e e2 where e2.timestamp <= e.timestamp) as seqId
from e
) e;
By the way, this logic is easier to express in SQL Server 2012+ because the window functions are more powerful.
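The grouping rule itself (start a new sequence whenever the gap to the previous event is five minutes or more) is easy to state procedurally. The Python sketch below is only an illustration of that rule with a hypothetical function name. It uses the starting event's Id as the SeqId, as in the desired output, and compares exact time differences, whereas DATEDIFF(minute, ...) counts minute boundaries.

```python
from datetime import datetime, timedelta


def assign_sequences(events, gap=timedelta(minutes=5)):
    """events: iterable of (event_id, timestamp). Returns [(event_id, seq_id, seq_order)]."""
    result = []
    seq_id = None
    seq_order = 0
    previous = None
    for event_id, ts in sorted(events, key=lambda e: e[1]):
        if previous is None or ts - previous >= gap:
            # Gap reached (or first event): this event starts a new sequence.
            seq_id = event_id
            seq_order = 1
        else:
            seq_order += 1
        result.append((event_id, seq_id, seq_order))
        previous = ts
    return result


# Sample events from the question; the calendar date is arbitrary.
events = [
    (1, datetime(2000, 1, 1, 0, 0)),
    (2, datetime(2000, 1, 1, 0, 6)),
    (3, datetime(2000, 1, 1, 0, 10)),
    (4, datetime(2000, 1, 1, 0, 14)),
]
sequences = assign_sequences(events)
for row in sequences:
    print(row)
```

On the sample data this yields (1, 1, 1), (2, 2, 1), (3, 2, 2), (4, 2, 3), which matches the desired sequence table.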