SQL Union as Subquery to create Date Ranges from Start Date - sql

I have three tables, each of which has a date column (the date column is an INT field and needs to stay that way). I need a UNION across all three tables so that I get the list of unique dates in ascending order, like this:
20040602
20051215
20060628
20100224
20100228
20100422
20100512
20100615
Then I need to add a column to the result of the query where I subtract one from each date and place it one row above as the end date. Basically I need to generate the end date from the start date somehow and this is what I got so far (not working):
With Query1 As (
Select date_one As StartDate
From table_one
Union
Select date_two As StartDate
From table_two
Union
Select date_three e As StartDate
From table_three
Order By Date Asc
)
Select Query1.StartDate - 1 As EndDate
From Query1
Thanks a lot for your help!

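Building on your existing union CTE, we can use lead() in the outer query to get the start_date of the next record and subtract one day from it. Note that the order by has to move out of the CTE (SQL Server rejects it there without TOP or OFFSET) and into the window function and the outer query.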
with q as (
select date_one start_date from table_one
union select date_two from table_two
union select date_three from table_three
)
select
start_date,
dateadd(day, -1, lead(start_date) over(order by start_date)) end_date
from q
order by start_date
If the original columns are numeric, you need to do some casting before applying date functions:
with q as (
select cast(cast(date_one as varchar(8)) as date) start_date from table_one
union select cast(cast(date_two as varchar(8)) as date) from table_two
union select cast(cast(date_three as varchar(8)) as date) from table_three
)
select
start_date,
dateadd(day, -1, lead(start_date) over(order by start_date)) end_date
from q
order by start_date
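To see why the cast matters, here is a minimal Python sketch of the same lead() logic. Python is used only so the logic can be run outside a database, and the function name date_ranges is made up for illustration. Subtracting 1 from the raw INT would turn 20100301 into 20100300, which is not a date, so the values are parsed as dates first.

```python
from datetime import datetime, timedelta


def date_ranges(int_dates):
    """Pair each distinct start date with the day before the next start date."""
    # UNION semantics: de-duplicate, then sort ascending.
    starts = sorted({datetime.strptime(str(d), "%Y%m%d").date() for d in int_dates})
    ranges = []
    for i, start in enumerate(starts):
        # LEAD(start_date): the next row's start date, or None on the last row.
        nxt = starts[i + 1] if i + 1 < len(starts) else None
        end = nxt - timedelta(days=1) if nxt is not None else None
        ranges.append((start, end))
    return ranges


# A few of the dates from the question, with one duplicate.
for start, end in date_ranges([20040602, 20051215, 20060628, 20051215]):
    print(start, end)
```

As in the SQL, the last row has no following start date, so its end date comes back empty (None here, NULL in the query).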

Related

Select date ranges where periods do not overlap

I have two tables each containing the start and end dates of several periods. I want an efficient way to find periods (date ranges) where dates are within the ranges of the first table but not within ranges of the second table.
For example, if this is my first table (with dates that I want)
start_date end_date
2001-01-01 2010-01-01
2012-01-01 2015-01-01
And this is my second table (with dates that I do not want)
start_date end_date
2002-01-01 2006-01-01
2003-01-01 2004-01-01
2005-01-01 2009-01-01
2014-01-01 2018-01-01
Then output looks like
start_date end_date
2001-01-01 2001-12-31
2009-01-02 2010-01-01
2012-01-01 2013-12-31
We can safely assume that periods in the first table are non-overlapping, but we cannot assume that periods in the second table are non-overlapping.
I already have a method for doing this but it is an order of magnitude slower than I can accept. So hoping someone can propose a faster approach.
My present method looks like:
merge table 2 into non-overlapping periods
find the inverse of table 2
join overlapping periods from table 1 and inverted-table-2
I am sure there is a faster way if some of these steps can be merged together.
In more detail
/* (1) merge overlapping periods */
WITH
spell_starts AS (
SELECT [start_date], [end_date]
FROM table_2 s1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 s2
WHERE s2.[start_date] < s1.[start_date]
AND s1.[start_date] <= s2.[end_date]
)
),
spell_ends AS (
SELECT [start_date], [end_date]
FROM table_2 t1
WHERE NOT EXISTS (
SELECT 1
FROM table_2 t2
WHERE t2.[start_date] <= t1.[end_date]
AND t1.[end_date] < t2.[end_date]
)
)
SELECT s.[start_date], MIN(e.[end_date]) as [end_date]
FROM spell_starts s
INNER JOIN spell_ends e
ON s.[start_date] <= e.[end_date]
GROUP BY s.[start_date]
/* (2) inverse table 2 */
SELECT [start_date], [end_date]
FROM (
/* all forward looking spells */
SELECT DATEADD(DAY, 1, [end_date]) AS [start_date]
,LEAD(DATEADD(DAY, -1, [start_date]), 1, '9999-01-01') OVER ( ORDER BY [start_date] ) AS [end_date]
FROM merge_table_2
UNION ALL
/* back looking spell (to 'origin of time') created separately */
SELECT '1900-01-01' AS [start_date]
,DATEADD(DAY, -1, MIN([start_date])) AS [end_date]
FROM merge_table_2
) k
WHERE [start_date] <= [end_date]
AND '1900-01-01' <= [start_date]
AND [end_date] <= '9999-01-01'
/* (3) overlap spells */
SELECT IIF(t1.start_date < t2.start_date, t2.start_date, t1.start_date) AS start_date
,IIF(t1.end_date < t2.end_date, t1.end_date, t2.end_date) AS end_date
FROM table_1 t1
INNER JOIN inverse_merge_table_2 t2
ON t1.start_date < t2.end_date
AND t2.start_date < t1.end_date
Hope this helps. I have commented the two CTEs I am using for explanation purposes.
Here you go:
drop table table1
select cast('2001-01-01' as date) as start_date, cast('2010-01-01' as date) as end_date into table1
union select '2012-01-01', '2015-01-01'
drop table table2
select cast('2002-01-01' as date) as start_date, cast('2006-01-01' as date) as end_date into table2
union select '2003-01-01', '2004-01-01'
union select '2005-01-01', '2009-01-01'
union select '2014-01-01', '2018-01-01'
/***** Solution *****/
-- This CTE puts all dates into one column
with cte as
(
select t
from
(
select start_date as t
from table1
union all
select end_date
from table1
union all
select dateadd(day,-1,start_date) -- for table 2 we bring the start date back one day to make sure we have nothing in the forbidden range
from table2
union all
select dateadd(day,1,end_date) -- for table 2 we bring the end date forward one day to make sure we have nothing in the forbidden range
from table2
)a
),
-- This one adds an end date using the lead function
cte2 as (select t as s, coalesce(LEAD(t,1) OVER ( ORDER BY t ),t) as e from cte a)
-- this query gets all intervals not in table2 but in table1
select s, e
from cte2 a
where not exists(select 1 from table2 b where s between dateadd(day,-1,start_date) and dateadd(day,1,end_date) and e between dateadd(day,-1,start_date) and dateadd(day,1,end_date) )
and exists(select 1 from table1 b where s between start_date and end_date and e between start_date and end_date)
and s <> e
If you want performance, then you want to use window functions.
The idea is to:
Combine the dates with flags of being in-and-out of the two tables.
Use cumulative sums to determine where dates start being in-and-out.
Then you have a gaps-and-islands problem where you want to combine the results.
Finally, filter on the particular periods you want.
This looks like:
with dates as (
select start_date as dte, 1 as in1, 0 as in2
from table1
union all
select dateadd(day, 1, end_date), -1, 0
from table1
union all
select start_date, 0, 1 as in2
from table2
union all
select dateadd(day, 1, end_date), 0, -1
from table2
),
d as (
select dte,
sum(sum(in1)) over (order by dte) as ins_1,
sum(sum(in2)) over (order by dte) as ins_2
from dates
group by dte
)
select min(dte), max(next_dte)
from (select d.*, dateadd(day, -1, lead(dte) over (order by dte)) as next_dte,
row_number() over (order by dte) as seqnum,
row_number() over (partition by case when ins_1 >= 1 and ins_2 = 0 then 'in' else 'out' end order by dte) as seqnum_2
from d
) d
group by (seqnum - seqnum_2)
having max(ins_1) > 0 and max(ins_2) = 0
order by min(dte);
Here is a db<>fiddle.
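For anyone who wants to check the logic outside the database, the four steps can be sketched in plain Python. This is only an illustration under my own naming (subtract_periods is not part of the query above), and it assumes inclusive end dates as in the question.

```python
from collections import defaultdict
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def subtract_periods(wanted, unwanted):
    """Return the parts of the `wanted` periods not covered by any `unwanted` period.

    Periods are (start, end) tuples with inclusive end dates.
    """
    # Step 1: flag each date with +1 where a period starts and -1 on the
    # day after it ends, kept separately per table as [in1, in2].
    deltas = defaultdict(lambda: [0, 0])
    for start, end in wanted:
        deltas[start][0] += 1
        deltas[end + ONE_DAY][0] -= 1
    for start, end in unwanted:
        deltas[start][1] += 1
        deltas[end + ONE_DAY][1] -= 1

    events = sorted(deltas)
    result = []
    in1 = in2 = 0
    # The last event only closes periods, so it never starts a stretch.
    for i, day in enumerate(events[:-1]):
        # Step 2: cumulative sums say how many periods cover this stretch.
        in1 += deltas[day][0]
        in2 += deltas[day][1]
        # Step 4: keep stretches inside table 1 and outside table 2.
        if in1 > 0 and in2 == 0:
            stretch_end = events[i + 1] - ONE_DAY
            # Step 3: glue adjacent stretches together (gaps-and-islands).
            if result and result[-1][1] + ONE_DAY == day:
                result[-1] = (result[-1][0], stretch_end)
            else:
                result.append((day, stretch_end))
    return result


table1 = [(date(2001, 1, 1), date(2010, 1, 1)), (date(2012, 1, 1), date(2015, 1, 1))]
table2 = [
    (date(2002, 1, 1), date(2006, 1, 1)),
    (date(2003, 1, 1), date(2004, 1, 1)),
    (date(2005, 1, 1), date(2009, 1, 1)),
    (date(2014, 1, 1), date(2018, 1, 1)),
]
for start, end in subtract_periods(table1, table2):
    print(start, end)
```

With the sample tables from the question this yields the three periods listed as the desired output. The whole thing is one sort plus one pass, which is the reason the window-function version tends to scale better than self-joins.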
Thanks to @zip and @Gordon for their answers. Both were superior to my initial approach. However, the following solution was faster than both of their approaches in my environment & context:
WITH acceptable_starts AS (
SELECT [start_date] FROM table1 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.
UNION ALL
SELECT DATEADD(DAY, 1, [end_date]) FROM table2 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, 1, a.[end_date]) BETWEEN b.[start_date] AND b.[end_date]
)
),
acceptable_ends AS (
SELECT [end_date] FROM table1 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
)
UNION ALL
SELECT DATEADD(DAY, -1, [start_date]) FROM table2 AS a
WHERE NOT EXISTS (
SELECT 1 FROM table2 AS b
WHERE DATEADD(DAY, -1, a.[start_date]) BETWEEN b.[start_date] AND b.[end_date]
)
)
SELECT s.[start_date], MIN(e.[end_date]) AS [end_date]
FROM acceptable_starts AS s
INNER JOIN acceptable_ends AS e
ON s.[start_date] < e.[end_date]
GROUP BY s.[start_date]

Set based query to replace loop to populate all month end dates from given date for all records

I have a table that stores patient lab test results. There can be results from multiple tests like Albumin, Potassium, Phosphorus etc. First reading for each patient from each of these categories is stored in a table called #MetricFirstGroupReading.
CREATE TABLE #MetricFirstGroupReading (Patient_Key INT, Metric_Group VARCHAR(100),
Observation_Date DATE)
ALTER TABLE #MetricFirstGroupReading
ADD CONSTRAINT UQ_MetricFirst UNIQUE (Patient_Key, Metric_Group);
INSERT INTO #MetricFirstGroupReading
SELECT 1, 'Albumin', '2018-11-15' UNION
SELECT 1, 'Potassium', '2018-12-10' UNION
SELECT 2, 'Albumin', '2018-10-20' UNION
SELECT 2, 'Potassium', '2018-11-25'
Now I need to populate all month-end dates up to the current month into a new table, for each record in the #MetricFirstGroupReading table. Following is the expected result when the query is run in December 2018.
I know how to do it using WHILE loops. How to do this without loops, using set based SQL queries, in SQL Server 2016?
The following worked. It is an expansion of the idea presented in "tsql: How to retrieve the last date of each month between given date range".
Query
CREATE TABLE #AllMonthEnds (MonthEndDate DATE)
DECLARE @Start datetime
DECLARE @End datetime
SELECT @Start = '2000-01-01'
SELECT @End = DATEADD(MONTH,1,GETDATE())
;With CTE as
(
SELECT @Start as Date,Case When DatePart(mm,@Start)<>DatePart(mm,@Start+1) then 1 else 0 end as [Last]
UNION ALL
SELECT Date+1,Case When DatePart(mm,Date+1)<>DatePart(mm,Date+2) then 1 else 0 end from CTE
WHERE Date<@End
)
INSERT INTO #AllMonthEnds
SELECT [Date]
FROM CTE
WHERE [Last]=1
OPTION ( MAXRECURSION 0 )
SELECT T.Patient_Key, T.Metric_Group, T.Observation_Date AS First_Observation_Date,
DATEDIFF(MONTh,Observation_Date, MonthEndDate) AS MonthDiff,
A.MonthEndDate AS IterationDate
FROM #AllMonthEnds A
INNER JOIN
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Patient_Key, Metric_Group ORDER BY Observation_Date) AS RowVal
FROM #MetricFirstGroupReading M
)T
ON A.MonthEndDate >= T.Observation_Date
WHERE RowVal = 1
ORDER BY Patient_Key, Metric_Group, T.Observation_Date, A.MonthEndDate
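To sanity-check what the join should return, here is a small Python sketch of the same expansion. The names month_ends and expand are mine, and I am assuming the wanted range runs from the month of the first observation through the month in which the query is run.

```python
import calendar
from datetime import date


def month_ends(first_day, as_of):
    """Month-end dates from the month of first_day through the month of as_of."""
    ends = []
    year, month = first_day.year, first_day.month
    while (year, month) <= (as_of.year, as_of.month):
        last_day = calendar.monthrange(year, month)[1]  # number of days in the month
        ends.append(date(year, month, last_day))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return ends


def expand(readings, as_of):
    """One row per reading per month end, with the month difference as in DATEDIFF(MONTH, ...)."""
    rows = []
    for patient_key, metric_group, observed in readings:
        for month_diff, month_end in enumerate(month_ends(observed, as_of)):
            rows.append((patient_key, metric_group, observed, month_diff, month_end))
    return rows


readings = [
    (1, "Albumin", date(2018, 11, 15)),
    (1, "Potassium", date(2018, 12, 10)),
    (2, "Albumin", date(2018, 10, 20)),
    (2, "Potassium", date(2018, 11, 25)),
]
for row in expand(readings, as_of=date(2018, 12, 15)):
    print(row)
```

The month difference equals the position in the list because the first month end always falls in the same month as the observation.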
How about:
select MetricFirstGroupReading.*, datediff(month, MetricFirstGroupReading.Observation_Date, months.monthendval) monthdiff, months.*
into allmonths
from
(
SELECT 1 patientid, 'Albumin' test, '2018-11-15' Observation_Date UNION
SELECT 1 patientid, 'Potassium' test, '2018-12-10' Observation_Date UNION
SELECT 2 patientid, 'Albumin' test, '2018-10-20' Observation_Date UNION
SELECT 2 patientid, 'Potassium' test, '2018-11-25' Observation_Date) MetricFirstGroupReading
join
(
select '2018-10-31' monthendval union
select '2018-11-30' monthendval union
select '2018-12-31' monthendval
) months on MetricFirstGroupReading.Observation_Date< months.monthendval
Replace the first select union with your table, and add or remove month ends from the second inner select.
Consider building a temp table of all 12 month end dates, then join to main table by date range. Use DateDiff for month difference:
CREATE TABLE #MonthEndDates (Month_End_Value DATE)
INSERT INTO #MonthEndDates
VALUES ('2018-01-31'),
('2018-02-28'),
('2018-03-31'),
('2018-04-30'),
('2018-05-31'),
('2018-06-30'),
('2018-07-31'),
('2018-08-31'),
('2018-09-30'),
('2018-10-31'),
('2018-11-30'),
('2018-12-31')
SELECT m.Patient_Key, m.Metric_Group, m.Observation_Date,
DateDiff(month, m.Observation_Date, d.Month_End_Value) AS Month_Diff,
d.Month_End_Value
FROM #MetricFirstGroupReading m
INNER JOIN #MonthEndDates d
ON m.Observation_Date < d.Month_End_Value
GO
Rextester Demo

Lowest continuous date without break

I have a table and each record has a date. We can assume that a date range is contiguous if there's not a 3 month break. How can I find the start of the most recent contiguous date range?
For example, imagine if I had this data:
1990-5-1
1990-6-4
1990-10-28
1990-11-14
1990-12-19
1991-1-20
1991-4-30
1991-5-13
I'd like for it to return 1991-4-30 because it's the start of the most recent contiguous range of dates.
I think this does what you're looking for. Using my own table and column names as test data. This is on Oracle.
select * from (
select * from sm_ss_tickets t1 where exists (
select * from sm_ss_tickets t2 where t2.created_date between t1.created_date and t1.created_date+90 and t1.rowid <> t2.rowid
) order by created_date asc
) where rownum = 1;
Maybe something like the following would work:
WITH d1 AS (
SELECT date'1990-05-01' AS dt FROM dual
UNION ALL
SELECT date'1990-06-04' AS dt FROM dual
UNION ALL
SELECT date'1990-10-28' AS dt FROM dual
UNION ALL
SELECT date'1990-11-14' AS dt FROM dual
UNION ALL
SELECT date'1990-12-19' AS dt FROM dual
UNION ALL
SELECT date'1991-01-20' AS dt FROM dual
UNION ALL
SELECT date'1991-04-30' AS dt FROM dual
UNION ALL
SELECT date'1991-05-13' AS dt FROM dual
)
SELECT MAX(dt) FROM (
SELECT dt, LAG(dt) OVER ( ORDER BY dt ) AS prev_dt, LEAD(dt) OVER ( ORDER BY dt ) AS next_dt
FROM d1
) WHERE ( dt > ADD_MONTHS(prev_dt, 3) OR prev_dt IS NULL )
AND dt > ADD_MONTHS(next_dt, -3)
In the above, a date can only be the start of a contiguous sequence if there is no prior date within 3 months (either it is more than three months ago or it doesn't exist at all) and there is also a subsequent date within 3 months.
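The same rule can be checked in a few lines of Python. This is a sketch only: add_months and latest_range_start are names I made up, and add_months just clamps to the end of the target month, which is close to but not identical to Oracle's ADD_MONTHS (Oracle also maps a month-end input to a month-end result).

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def latest_range_start(dates):
    """Latest date with no prior date within 3 months and a following date within 3 months."""
    ordered = sorted(dates)
    starts = []
    for i, dt in enumerate(ordered):
        prev_dt = ordered[i - 1] if i > 0 else None                  # LAG(dt)
        next_dt = ordered[i + 1] if i + 1 < len(ordered) else None   # LEAD(dt)
        no_recent_prior = prev_dt is None or dt > add_months(prev_dt, 3)
        has_close_next = next_dt is not None and dt > add_months(next_dt, -3)
        if no_recent_prior and has_close_next:
            starts.append(dt)
    return max(starts) if starts else None


sample = [
    date(1990, 5, 1), date(1990, 6, 4), date(1990, 10, 28), date(1990, 11, 14),
    date(1990, 12, 19), date(1991, 1, 20), date(1991, 4, 30), date(1991, 5, 13),
]
print(latest_range_start(sample))
```

One thing to be aware of: because a start needs a following date within 3 months, a final date that stands alone is not reported as the start of its own range, in the query and in this sketch alike.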
You can use LAG and LEAD. Find the query below. I think it works fine.
tmp_year is the table I have created. tdate is the column.
The records in the table are
28-JAN-15
27-JAN-15
26-JAN-15
25-JAN-15
12-JUL-14
11-JUL-14
10-JUL-14
09-JUL-14
24-DEC-13
23-DEC-13
22-DEC-13
21-DEC-13
15-SEP-13
07-JUN-13
27-FEB-13
19-NOV-12
11-AUG-12
Please find the query which returns 25th Jan 2015.
select max(d.tdate) from (
select c.tdate,c.next_date,c.date_diff,lag(date_diff) over( order by tdate) prev_diff from (
select b.tdate ,b.next_date,(next_date-tdate) date_diff from
(select a.tdate,lead(a.tdate) over(order by a.tdate) next_date from tmp_year a ) b ) c) d where d.date_diff<90 and d.prev_diff>=90;

force number of rows to return in date range from SQL query

I'm running a query on our SQL (2012) database which returns a count of records in a given date range, grouped by the date.
For example:
Date Count
12/08 12
14/08 19
19/08 11
I need to fill in the blanks as the charts I plot get screwed up because there are missing values. Is there a way to force the SQL to report back a blank row, or a "0" value when it doesn't come across a result?
My query is
SELECT TheDate, count(recordID)
FROM myTable
WHERE (TheDate between '12-AUG-2013 00:00:00' and '20-AUG-2013 23:59:59')
GROUP BY TheDate
Would I need to create a temp table with the records in, then select from that and right join any records from myTable?
Thanks for any help!
If you create a (temporary or permanent) table of the date range, you can then left join it to your results to produce a result set that includes the blanks:
SELECT dates.TheDate, count(recordID)
FROM
( select
convert(date,dateadd(d,number,'2013-08-12')) as theDate
from master..spt_values
where type='p' and number < 9
) dates
left join yourtable on dates.thedate = convert(date,yourtable.thedate)
GROUP BY dates.TheDate
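If it helps to see what the left join is doing, here is the same idea as a short Python sketch (fill_missing_days is a made-up name, and the counts are the ones from the question, assuming the dd/mm values are August 2013). Every day in the range produces a row, and days without records fall back to 0.

```python
from datetime import date, timedelta


def fill_missing_days(counts, first_day, last_day):
    """Return one (day, count) pair per day in the range, using 0 where no row exists."""
    filled = []
    day = first_day
    while day <= last_day:
        # LEFT JOIN semantics: the calendar day always survives;
        # a day with no matching records gets a count of 0.
        filled.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return filled


counts = {date(2013, 8, 12): 12, date(2013, 8, 14): 19, date(2013, 8, 19): 11}
for day, n in fill_missing_days(counts, date(2013, 8, 12), date(2013, 8, 20)):
    print(day, n)
```

In the SQL, count(recordID) plays the role of the 0 default, since it does not count the NULLs produced for unmatched days.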
A temp table would do the job but for such a small date range you could go even simpler and use a UNION-ed subquery. E.g:
SELECT dates.TheDate, ISNULL(counts.Records, 0)
FROM
(SELECT TheDate, count(recordID) AS Records
FROM myTable
WHERE (TheDate between '12-AUG-2013 00:00:00' and '20-AUG-2013 23:59:59')
GROUP BY TheDate
) counts
RIGHT JOIN
(SELECT CAST('12-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('13-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('14-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('15-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('16-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('17-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('18-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('19-AUG-2013' AS DATETIME) AS TheDate
UNION ALL SELECT CAST('20-AUG-2013' AS DATETIME) AS TheDate
) dates
ON counts.TheDate = dates.TheDate
Here's a SQL Fiddle Demo.
If you need a more generic (but also more complex) solution, take a look at this excellent answer (by @RedFilter) to a similar question.

Total Count of Active Employees by Date

I have in the past written queries that give me counts by date (hires, terminations, etc...) as follows:
SELECT per.date_start AS "Date",
COUNT(peo.EMPLOYEE_NUMBER) AS "Hires"
FROM hr.per_all_people_f peo,
hr.per_periods_of_service per
WHERE per.date_start BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
AND per.date_start BETWEEN :PerStart AND :PerEnd
AND per.person_id = peo.person_id
GROUP BY per.date_start
I was now looking to create a count of active employees by date, however I am not sure how I would date the query as I use a range to determine active as such:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.current_employee_flag = 'Y'
and TRUNC(sysdate) BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
Here is a simple way to get started. This works for all the effective and end dates in your data:
select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
It works by adding one person for each start and subtracting one for each end (via num) and taking a cumulative sum. This might produce duplicate dates, so you might also aggregate to eliminate them:
select thedate, max(numActives)
from (select thedate,
SUM(num) over (order by thedate) as numActives
from ((select effective_start_date as thedate, 1 as num from hr.per_periods_of_service) union all
(select effective_end_date as thedate, -1 as num from hr.per_periods_of_service)
) dates
) t
group by thedate;
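The +1/-1 bookkeeping is easy to verify by hand with a small Python sketch. The function name active_counts and the sample periods are invented for illustration. Like the query, it subtracts on the end date itself, so a person is no longer counted on the day their period ends; it also skips the -1 for open-ended periods.

```python
from collections import defaultdict
from datetime import date


def active_counts(periods):
    """Return (date, active_count) pairs for every date on which a period starts or ends."""
    deltas = defaultdict(int)
    for start, end in periods:
        deltas[start] += 1        # one more active person from this date
        if end is not None:       # open-ended periods never subtract
            deltas[end] -= 1
    running = 0
    result = []
    # Summing the deltas per date first is what the GROUP BY achieves in SQL.
    for day in sorted(deltas):
        running += deltas[day]    # cumulative sum, like SUM(num) OVER (ORDER BY thedate)
        result.append((day, running))
    return result


periods = [
    (date(2020, 1, 1), date(2020, 3, 1)),
    (date(2020, 2, 1), None),
    (date(2020, 1, 1), date(2020, 2, 1)),
]
for day, n in active_counts(periods):
    print(day, n)
```

If the end date should count as an active day, the -1 would go on the day after the end date instead.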
If you really want all dates, then it is best to start with a calendar table, and use a simple variation on your original query:
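-- count a joined column rather than *, so dates with no active employees give 0 instead of 1
select c.thedate, count(pos.effective_start_date) as NumActives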
from calendar c left outer join
hr.per_periods_of_service pos
on c.thedate between pos.effective_start_date and pos.effective_end_date
group by c.thedate;
If you want to count all employees who were active during the entire input date range:
SELECT COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo
WHERE peo.[EFFECTIVE_START_DATE] <= :StartDate
AND (peo.[EFFECTIVE_END_DATE] IS NULL OR peo.[EFFECTIVE_END_DATE] >= :EndDate)
Here is my example based on Gordon Linoff's answer, with a small modification: in the subtracting subquery, every record showed up with -1 in num, even those whose end date was NULL, so I filter those rows out.
use AdventureWorksDW2012 -- selects the database to work with in MS SSMS;
-- may not work on other platforms
select
t.thedate
,max(t.numActives) AS "Total Active Employees"
from (
select
dates.thedate
,SUM(dates.num) over (order by dates.thedate) as numActives
from
(
(
select
StartDate as thedate
,1 as num
from DimEmployee
)
union all
(
select
EndDate as thedate
,-1 as num
from DimEmployee
where EndDate IS NOT NULL
)
) AS dates
) AS t
group by thedate
ORDER BY thedate
This worked for me; I hope it helps somebody.
I was able to get the results I was looking for with the following:
--Active Team Members by Date
SELECT "a_date",
COUNT(peo.EMPLOYEE_NUMBER) AS "CT"
FROM hr.per_all_people_f peo,
(SELECT DATE '2012-04-01'-1 + LEVEL AS "a_date"
FROM dual
CONNECT BY LEVEL <= DATE '2012-04-30'+2 - DATE '2012-04-01'-1
)
WHERE peo.current_employee_flag = 'Y'
AND "a_date" BETWEEN peo.effective_start_date AND peo.EFFECTIVE_END_DATE
GROUP BY "a_date"
ORDER BY "a_date"