Highlight multiple records in a date range - sql

Working with SQL Server 2008.
fromdate todate ID name
--------------------------------
1-Aug-16 7-Aug-16 x jack
3-Aug-16 4-Aug-16 x jack
5-Aug-16 6-Aug-16 x tom
1-Aug-16 2-Aug-16 x john
3-Aug-16 4-Aug-16 x harry
5-Aug-16 6-Aug-16 x mac
Is there a way to script this so that I know if there are multiple names tagged to an ID in the same date range?
For example above, I want to flag that ID x has Name Jack and Tom tagged in the same date range.
ID multiple_flag
------------------------------------------------
x yes
y no

If your table has a unique key (column i in my example; you could also generate one with ROW_NUMBER()), you can find overlapping date ranges with a self INNER JOIN:
CREATE TABLE #tmp (i int identity primary key,fromdate date,todate date,ID int,name varchar(32));
insert into #tmp (fromdate,todate,ID ,name) values
('1-Aug-16','7-Aug-16',3,'jack'),
('3-Aug-16','4-Aug-16',3,'tom'),
('5-Aug-16','6-Aug-16',3,'jack');
select a.*,b.name bname,b.i i2 from #tmp a
INNER join #tmp b on b.id=a.id AND b.i<>a.i
AND ( b.fromdate between a.fromdate and a.todate
OR b.todate between a.fromdate and a.todate)
(My id column is int). This will give you:
i fromdate todate ID name bname i2
- ---------- ---------- - ---- ----- --
1 2016-08-01 2016-08-07 3 jack tom 2
1 2016-08-01 2016-08-07 3 jack jack 3
Implement further filtering or grouping as required. I left a little demo here.

Please check the SQL below; it might not be the optimal one.
SELECT formdate,todate,id,tab1.name,
case when tab2.#Of >1 then 'yes' else 'no' end as multiple_flag
FROM tab1
inner join (SELECT Name, COUNT(*) as #Of
FROM tab1
GROUP BY Name) as tab2 on tab1.name=tab2.name
order by tab1.id ;
Add your WHERE condition before the ORDER BY if you need to restrict the query to a date range.
Change formdate to fromdate before running this SQL; I used formdate on my machine.

One way to do it is with EXISTS inside a CASE expression.
Please note this part of the query:
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
For an explanation of testing for overlapping ranges, read the overlap wiki.
Create and populate sample data (Please save us this step in your future questions)
CREATE TABLE #T
(
fromdate date,
todate date,
ID char(1),
name varchar(10)
)
INSERT INTO #T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac')
The query:
SELECT DISTINCT id,
CASE WHEN EXISTS
(
SELECT 1
FROM #T t2
WHERE t1.Id = t2.Id
-- make sure it's not the same record
AND t1.fromdate <> t2.fromdate
AND t1.todate <> t2.todate
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
)
THEN 'Yes'
ELSE 'No'
END As multiple_flag
FROM #T t1
Results:
id multiple_flag
---- -------------
x Yes
y No
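If the flag should be raised only when the overlapping rows carry different names, as the question asks, the overlap test can be combined with a name comparison. The following is a minimal sketch, not a definitive solution; the temp table #Ranges and its four rows are assumptions made up for illustration, modelled on the sample data above.

```sql
-- Hypothetical temp table mirroring the shape of the sample data.
CREATE TABLE #Ranges (fromdate date, todate date, ID char(1), name varchar(10));

INSERT INTO #Ranges VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry');

SELECT r1.ID,
       -- COUNT(r2.ID) counts only rows that found a match in the LEFT JOIN
       CASE WHEN COUNT(r2.ID) > 0 THEN 'yes' ELSE 'no' END AS multiple_flag
FROM #Ranges r1
LEFT JOIN #Ranges r2
       ON r2.ID = r1.ID
      AND r2.name <> r1.name          -- a different name...
      AND r2.fromdate <= r1.todate    -- ...whose date range overlaps
      AND r1.fromdate <= r2.todate
GROUP BY r1.ID;
```

With these four rows, x should come back as yes (jack and tom overlap) and y as no (john and harry do not). Comparing names instead of dates also avoids treating two rows as "the same record" just because they share a fromdate or todate.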

Related

Exclude rows where dates exist in another table

I have 2 tables, one is working pattern, another is absences.
1) Work pattern
ID | Shift Start | Shift End
123| 01-03-2017 | 02-03-2017
2) Absences
ID| Absence Start | Absence End
123| 01-03-2017 | 04-03-2017
What would be the best way, when selecting rows from work pattern, to exclude any that have a date marked as an absence in the absence table?
For example, I have a report that uses the work pattern table to count how many days a week an employee has worked. I don't want it to include days that are marked as an absence in the absence table, or any days that fall between the absence start and absence end dates.
If the span of the absence should always encompass the shift to be excluded you can use not exists():
select *
from WorkPatterns w
where not exists (
select 1
from Absences a
where a.Id = w.Id
and a.AbsenceStart <= w.ShiftStart
and a.AbsenceEnd >= w.ShiftEnd
)
rextester demo: http://rextester.com/DCODC76816
returns:
+-----+------------+------------+
| id | ShiftStart | ShiftEnd |
+-----+------------+------------+
| 123 | 2017-02-27 | 2017-02-28 |
| 123 | 2017-03-05 | 2017-03-06 |
+-----+------------+------------+
given this test setup:
create table WorkPatterns ([id] int, [ShiftStart] datetime, [ShiftEnd] datetime) ;
insert into WorkPatterns ([id], [ShiftStart], [ShiftEnd]) values
(123, '20170227', '20170228')
,(123, '20170301', '20170302')
,(123, '20170303', '20170304')
,(123, '20170305', '20170306')
;
create table Absences ([id] int, [AbsenceStart] datetime, [AbsenceEnd] datetime) ;
insert into Absences ([id], [AbsenceStart], [AbsenceEnd]) values
(123, '20170301', '20170304');
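If a shift should also be excluded when an absence covers only part of it, the same NOT EXISTS pattern can use an overlap test instead of a containment test. This is a sketch under that assumption, written against the WorkPatterns and Absences tables from the test setup above:

```sql
SELECT w.id, w.ShiftStart, w.ShiftEnd
FROM WorkPatterns w
WHERE NOT EXISTS (
    SELECT 1
    FROM Absences a
    WHERE a.id = w.id
      AND a.AbsenceStart <= w.ShiftEnd     -- absence starts before the shift ends
      AND a.AbsenceEnd   >= w.ShiftStart   -- and ends after the shift starts
);
```

Whether touching endpoints should count as an overlap depends on how the end dates are meant to be read; replace <= and >= with < and > if a shift that starts exactly when an absence ends should be kept.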
What would be the best way, when selecting rows from work pattern
If you are dealing only with dates (no time) and have control over the database schema, one approach is to create a calendar table.
Put in it all dates from when the company started to some years into the future, and fill that table once.
After that it is easy to join the other tables on dates and do the math.
If you have trouble constructing the T-SQL query, please edit the question with more details about the columns and values of the tables, their relations, and the results you need.
How about this:
SELECT WP_START.[id], WP_START.[shift_start], WP_START.[shift_end]
FROM work_pattern AS WP_START
INNER JOIN absences AS A ON WP_START.id = A.id
WHERE WP_START.[shift_start] NOT BETWEEN A.[absence_start] AND A.[absence_end]
UNION
SELECT WP_END.[id], WP_END.[shift_start], WP_END.[shift_end]
FROM work_pattern AS WP_END
INNER JOIN absences AS A ON WP_END.id = A.id
WHERE WP_END.[shift_end] NOT BETWEEN A.[absence_start] AND A.[absence_end]
See it on SQL Fiddle: http://sqlfiddle.com/#!6/49ae6/6
Here is my example that includes a Date Dimension table. If your DBAs won't add it, you can create #dateDim as a temp table, like I've done with SQLFiddle (didn't know I could do that). A typical date dimension would have a lot more details you need about the days, but if the table can't be added, just use what you need. You'll have to populate the other Holidays you need. The DateDim I use often is at https://github.com/shawnoden/SQL_Stuff/blob/master/sql_CreateDateDimension.sql
SQL Fiddle
MS SQL Server 2014 Schema Setup:
/* Tables for your test data. */
CREATE TABLE WorkPatterns ( id int, ShiftStart date, ShiftEnd date ) ;
INSERT INTO WorkPatterns ( id, ShiftStart, ShiftEnd )
VALUES
(123, '20170101', '20171031')
, (124, '20170601', '20170831')
;
CREATE TABLE Absences ( id int, AbsenceStart date, AbsenceEnd date ) ;
INSERT INTO Absences ( id, AbsenceStart, AbsenceEnd )
VALUES
( 123, '20170123', '20170127' )
, ( 123, '20170710', '20170831' )
, ( 124, '20170801', '20170820' )
;
/* ******** MAKE SIMPLE CALENDAR TABLE ******** */
CREATE TABLE dateDim (
theDate DATE NOT NULL
, IsWeekend BIT DEFAULT 0
, IsHoliday BIT DEFAULT 0
, IsWorkDay BIT DEFAULT 0
);
/* Populate basic details of dates. */
INSERT dateDim(theDate, IsWeekend, IsHoliday)
SELECT d
, CONVERT(BIT, CASE WHEN DATEPART(dw,d) IN (1,7) THEN 1 ELSE 0 END)
, CONVERT(BIT, CASE WHEN d = '20170704' THEN 1 ELSE 0 END) /* 4th of July. */
FROM (
SELECT d = DATEADD(DAY, rn - 1, '20170101')
FROM
(
SELECT TOP (DATEDIFF(DAY, '20170101', '20171231') + 1) /* + 1 so that 20171231 itself is included */
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y ;
/* If not a weekend or holiday, it's a WorkDay. */
UPDATE dateDim
SET IsWorkDay = CASE WHEN IsWeekend = 0 AND IsHoliday = 0 THEN 1 ELSE 0 END
;
Query For Calculation:
SELECT wp.ID, COUNT(d.theDate) AS workDayCount
FROM WorkPatterns wp
INNER JOIN dateDim d ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
AND d.IsWorkDay = 1
LEFT OUTER JOIN Absences a ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
AND wp.ID = a.ID
WHERE a.ID IS NULL
GROUP BY wp.ID
ORDER BY wp.ID
Results:
| ID | workDayCount |
|-----|--------------|
| 123 | 172 | << 216 total days, 44 non-working
| 124 | 51 | << 65 total days, 14 non-working

How to check the overlapping time intervals from one type 2 SCD dimension

I have one problem identifying and fixing some records having overlapping time intervals, for one scd type 2 dimension.
What I have is:
Bkey Uid startDate endDate
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp)
'John' 2 2016-11-03 (some time stamp) 2016-11-14 (some time stamp)
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp)
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp)
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp)
......
I want to find (first) all the Johns that have overlapping time periods, in a table holding lots and lots of Johns, and then figure out a way to correct those overlapping periods. For the latter I know there are functions such as LAG and LEAD that can handle it, but how to find the overlaps eludes me.
Any hints?
Regards,
[ 1 ] Following query will return overlapping time ranges:
SELECT *,
(
SELECT *
FROM #Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM #Dimension1 x
Full script:
CREATE TABLE #Dimension1 (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDate DATE NOT NULL,
endDate DATE NOT NULL,
CHECK(startDate < endDate)
);
INSERT #Dimension1
SELECT 'John', 1, '1990-01-01', '2017-01-10' UNION ALL
SELECT 'John', 2, '2016-11-03', '2016-11-14' UNION ALL
SELECT 'John', 3, '2016-11-14', '2016-12-29' UNION ALL
SELECT 'John', 4, '2016-12-29', '2017-01-10' UNION ALL
SELECT 'John', 5, '2017-01-11', '2017-04-22';
SELECT *,
(
SELECT *
FROM #Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM #Dimension1 x
Demo here
[ 2 ] In order to find distinct groups of time ranges with overlapping original rows I would use following approach:
-- Edit 1
CREATE TABLE #Groups (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDateNew DATE NOT NULL,
endDateNew DATE NOT NULL,
CHECK(startDateNew < endDateNew)
);
INSERT #Groups
SELECT x.Bkey, x.Uid, z.startDateNew, z.endDateNew
FROM #Dimension1 x
OUTER APPLY (
SELECT MIN(y.startDate) AS startDateNew, MAX(y.endDate) AS endDateNew
FROM #Dimension1 y
WHERE x.Bkey = y.Bkey
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
) z
-- End of Edit 1
-- This returns distinct groups identified by DistinctGroupId together with all overlapping Uid(s) from current group
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM #Groups a
) b
) c
OUTER APPLY (
SELECT d.Uid AS Overlapping_Uid
FROM #Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
) e
-- This returns distinct groups identified by DistinctGroupId together with an XML (XmlCol) which includes overlapping Uid(s)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM #Groups a
) b
) c
OUTER APPLY (
SELECT (
SELECT d.Uid AS Overlapping_Uid
FROM #Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
FOR XML RAW, TYPE
) AS XmlCol
) e
Note: Last range used in my example is 'John', 5, '2017-01-11', '2017-04-22'; and not 'John', 5, '2017-01-10', '2017-04-22';. Also, data type used is DATE and not DATETIME[2][OFFSET].
I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self join on the condition that a row on the left overlaps with any row on the right. All matching rows are those which overlap.
We can think of four possible overlap scenarios:
|---------|       |---------|     no overlap
|---------|
      |---------|                 1st end and 2nd start overlap
      |---------|
|---------|                       1st start and 2nd end overlap
|---------|
   |---|                          2nd completely contained inside 1st
                                  (could be 1st inside 2nd also)
SELECT DISTINCT
t1.Uid
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.startDate <= t2.endDate AND
t2.startDate <= t1.endDate AND
t1.Uid <> t2.Uid -- do not match a row with itself
WHERE
t1.Bkey = 'John' AND t2.Bkey = 'John'
This will at least let you identify overlapping records. Updating and separating them in a meaningful way will probably end up being an ugly gaps and islands problem, perhaps meriting another question.
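To see which rows collide, rather than only which Uids are involved, the self join can return each overlapping pair once. This is a sketch against the placeholder table yourTable used above. The strict comparisons are an assumption on my part, chosen so that a row ending exactly where the next one starts is not reported as an overlap.

```sql
SELECT t1.Bkey,
       t1.Uid AS Uid1, t1.startDate AS start1, t1.endDate AS end1,
       t2.Uid AS Uid2, t2.startDate AS start2, t2.endDate AS end2
FROM yourTable t1
INNER JOIN yourTable t2
        ON t2.Bkey = t1.Bkey              -- compare rows of the same business key only
       AND t1.Uid < t2.Uid                -- each pair once, never a row with itself
       AND t1.startDate < t2.endDate      -- strict: touching endpoints do not count
       AND t2.startDate < t1.endDate
ORDER BY t1.Bkey, t1.Uid, t2.Uid;
```

Taking the question's five John rows at face value (dates only, ignoring the unspecified time parts), this should pair Uid 1 with Uids 2, 3 and 4 and report nothing else. Joining on Bkey instead of filtering for 'John' lets the same query cover every key in the dimension.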
We can achieve this with a self join of the emp table.
A.emp_id != B.emp_id ensures that a row is not joined with itself.
The remaining comparisons check whether a row's start date or end date falls within another row's date range.
create table emp(name varchar(20), emp_id numeric(10), start_date date, end_date date);
insert into emp values('John', 1, '1990-01-01', '2017-01-10');
insert into emp values( 'John', 2, '2016-11-03', '2016-11-14');
insert into emp values( 'John', 3, '2016-11-14', '2016-12-29');
insert into emp values( 'John', 4, '2016-12-29', '2017-01-10');
insert into emp values( 'John', 5, '2017-01-11', '2017-04-22');
commit;
with A as (select * from EMP),
B as (select * from EMP)
select A.* from A,B where A.EMP_ID != B.EMP_ID
and A.START_DATE < B.END_DATE and B.START_DATE < A.END_DATE
and (A.START_DATE between B.START_DATE and B.END_DATE
or A.END_DATE between B.START_DATE and B.END_DATE);

How do I get a correct average number of appointments per day?

I want to see the average number of appointments per day for each appointment type. Basically I have the following tables and columns:
Table 1 - Dates
-----------
Date date (primary key)
Table 2 - Appointments
-----------
AppointmentStart Datetime
ApptId Numeric
FacilityId Numeric
ApptKind Numeric
Appointmentid Numeric
Table 3 AppointmentType
-----------
ApptTypeId Numeric
Name Varchar
Sample Data
============
Table 1 Date
---------------
date
1/1/2017
1/2/2017
...
Table 2 Appointment
----------------
ApptStart | ApptTypeId | FacilityId | ApptKind | ApptId
2017-1-1 9:00:00 1 2 1 2385525
2017-1-1 9:15:00 3 2 1 2385526
2017-1-1 9:30:00 2 2 1 2385527
...
Table 3 ApptType
-----------------
ApptTypeId | Name
1 Walk-in
2 MAT
3 Acute
...
There are about 30 different appointment types and not all of them occur every day. So far I have created a table that lists every date in the time range that I want then I do a left join with the count of appointments (nulls equal 0). I also remove Saturdays and Sundays. This works really well for one appointment type but when I do this with multiple appointment types zeroes only show up for the days where there are no appointments.
My solution:
Somehow insert each appointment type next to each day then do the left join with the NULL = 0 part although I don't know how to get the list to repeat for each day in the table.
Example:
At the end I want
EndResult
----------
Average(Count(appts)) | ApptType.Name
OR
EndResult
---------
Count(apptid) | ApptType.Name | Date
5 Acute 1/1/2017
0 MAT 1/1/2017
4 Walk-in 1/1/2017
0 Other 1/1/2017
Then repeat for the next day with the same appointment type names
This is how I would write a query that gets you to
End Result #2:
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1 CROSS JOIN Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join on A so the type name is present even for days without appointments
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
This assumes that ApptTypeID is indeed part of Table2. You can wrap this result up further to get your End Result #1:
-- 1.0 * forces a decimal average; AVG over an integer column returns an integer
SELECT Avg(1.0 * D.ApptCount) AS AvgApptCount, D.ApptTypeName
FROM (
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1 CROSS JOIN Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join on A so the type name is present even for days without appointments
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
) AS D
GROUP BY D.ApptTypeName
First we declare and populate table variables for example data.
CREATE TABLE #Dates (
Date DATE
)
INSERT #Dates
VALUES
('2017-01-01')
,('2017-01-02')
CREATE TABLE #Appointments (
AppointmentStart DATETIME
,ApptId INT
,FacilityId INT
,ApptKind INT
,Appointmentid INT
)
INSERT #Appointments
VALUES
('2017-01-01 09:00:00.000', 1, 2, 1, 2385525)
,('2017-01-01 09:15:00.000', 3, 2, 1, 2385526)
,('2017-01-01 09:30:00.000', 2, 2, 1, 2385527)
CREATE TABLE #ApptType (
ApptTypeId INT
,Name VARCHAR(32)
)
INSERT #ApptType
VALUES
(1, 'Walk-in')
,(2, 'MAT')
,(3, 'Acute')
This shows us the cartesian product of a full outer join of Dates and ApptType.
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM #Dates AS [Dates]
FULL OUTER JOIN #ApptType AS [ApptType]
ON 1 = 1
We can use the cartesian product as our left data set, and count the number of items in our right data set (#Appointments). By doing this with a left join, we ensure that every date/appointment type combination is included, even if there were no appointments of that type on that date.
SELECT
A.[Date]
,A.[Name]
,COUNT(B.Appointmentid)
FROM (
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM #Dates AS [Dates]
FULL OUTER JOIN #ApptType AS [ApptType]
ON 1 = 1) AS A
LEFT JOIN #Appointments AS B
ON A.[ApptTypeId] = B.[ApptId]
AND A.[Date] = CAST(B.[AppointmentStart] AS DATE)
GROUP BY
A.[Date]
,A.[Name]
ORDER BY
A.[Date]
,A.[Name]
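To get from these per-day counts to the average per appointment type, the grouped query can be wrapped once more. This is a sketch that assumes the three example tables above are available under the names #Dates, #Appointments and #ApptType. The aliases PerDay, ApptCount and AvgApptsPerDay are made up for illustration.

```sql
SELECT
     PerDay.[Name]
    ,AVG(1.0 * PerDay.ApptCount) AS AvgApptsPerDay   -- 1.0 * avoids integer averaging
FROM (
    SELECT
         D.[Date]
        ,T.[Name]
        ,COUNT(A.Appointmentid) AS ApptCount         -- 0 when nothing matched
    FROM #Dates AS D
    CROSS JOIN #ApptType AS T                        -- every date paired with every type
    LEFT JOIN #Appointments AS A
        ON A.[ApptId] = T.[ApptTypeId]               -- ApptId holds the type id in the example data
       AND CAST(A.[AppointmentStart] AS DATE) = D.[Date]
    GROUP BY
         D.[Date]
        ,T.[Name]
) AS PerDay
GROUP BY PerDay.[Name]
ORDER BY PerDay.[Name];
```

With the example rows (two dates, one appointment of each type on the first date and none on the second), each type should average 0.5 per day. Days without appointments pull the average down only because the inner query emits a zero row for them; averaging the raw counts directly would skip those days.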

How to compare date fields between two tables and get the less or equal date from the second table

I have two tables, table A and table B, and both have date fields. I need to compare those fields and build a table C holding, for each row of table A, the latest date from table B that is less than or equal to it; table A is the main table.
CONTEXT: Table A holds product expiration dates and table B holds business days. The user can update table B when it is determined
that a date is not to be considered as a "business day". Then delete
the date from table B and then go to table A to update all product
expirations that were registered with that date and assign them the
previous business day. So in my case I am creating table C, which
contains the Id of table A and the working date less or equal to the
date mentioned. Then I will make the respective update.
IF OBJECT_ID('tempdb..#tmpA') IS NOT NULL DROP TABLE #tmpA
IF OBJECT_ID('tempdb..#tmpB') IS NOT NULL DROP TABLE #tmpB
CREATE TABLE #tmpA(Id INT IDENTITY(100,1),Fecha date)
INSERT INTO #tmpA(Fecha)
VALUES
('20170101'),('20171003'),('20170504'),('2017-09-01')
SELECT * FROM #tmpA
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-03
102 2017-05-04
103 2017-09-01
CREATE TABLE #tmpB(Id INT IDENTITY(1,4),Fecha date)
INSERT INTO #tmpB(Fecha)
VALUES
('20170101'),('20171001'),('20170504')
SELECT * FROM #tmpB
Id Fecha
----------- ----------
1 2017-01-01
5 2017-10-01
9 2017-05-04
I want to get this result (The same number of records in table A):
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-01 --> **this row is less than 2017-10-03**
102 2017-05-04
103 2017-05-04 --> **this row is less than 2017-09-01**
I tried to build some queries, without success:
IF OBJECT_ID('tempdb..#tmpC') IS NOT NULL DROP TABLE #tmpC
SELECT A.* INTO #tmpC FROM #tmpA A LEFT JOIN #tmpB B ON A.Fecha = B.Fecha WHERE B.Fecha IS NULL
SELECT * FROM #tmpC
SELECT *
FROM #tmpA A INNER JOIN
(
SELECT *
FROM #tmpC
GROUP BY id, Fecha
) AS Q ON MAX(Q.Fecha) <= A.Fecha
UPDATE:
NOTE. The Id column is simply an identity, but it does not mean that it should be related. The important thing is the dates.
Regards
While I'm not sure if this will scale well (if you have more than 100k rows) this will bring back the results which you want.
Theoretically, the correct way for you to do this, in a fashion which will scale well, would be to have a view where you utilize RANK() and join both of these tables together, though this was the quick and easy way. Please try this and let me know if it meets your requirements.
For your edification, I have left both of the dates in there for you to be able to compare them.
SELECT
A.ID
,A.FECHA OLDDATE
,B.FECHA CORRECTDATE
FROM #TMPA A
LEFT OUTER JOIN #TMPB B ON 1=1
WHERE 1=1
AND B.FECHA = (
SELECT MAX(FECHA)
FROM #TMPB
WHERE FECHA <= A.FECHA)
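The same lookup, the latest date in #tmpB on or before the date in #tmpA, can also be written with OUTER APPLY, which keeps every row of #tmpA even when #tmpB has no earlier date. This is a sketch against the question's temp tables; the alias Prev is made up for illustration.

```sql
SELECT A.Id
      ,A.Fecha    AS OldDate
      ,Prev.Fecha AS CorrectDate       -- NULL when #tmpB has no date on or before A.Fecha
FROM #tmpA AS A
OUTER APPLY (
    SELECT TOP (1) x.Fecha
    FROM #tmpB AS x
    WHERE x.Fecha <= A.Fecha
    ORDER BY x.Fecha DESC              -- closest earlier (or equal) date first
) AS Prev;
```

On the question's data this should give 2017-01-01, 2017-10-01, 2017-05-04 and 2017-05-04 for Ids 100 to 103, matching the requested result. An index on #tmpB(Fecha) would let each lookup seek directly to the nearest date.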
Is this what you want?
select a.id,
(case when b.fecha < a.fecha then b.fecha else a.fecha end) as fecha
from #tmpA a left join
#tmpB b
on a.id = b.id;
You can get the minimum with UNION ALL:
select id, min(fecha) from (
select * from #tmpA
union all
select * from #tmpB
) a
group by a.id
@JotaPardo WHERE 1=1 is always true, so it does not change the result. It is a placeholder that lets every real condition start with AND, which makes conditions easy to add, remove, or comment out.

SQL to Return missing Row

I have a scenario where I need to find missing records in a table using plain SQL, without cursors, views, or stored procedures.
For a particular CustID the initial Start_Date will be 19000101 and the End_Date will be any random date.
The next record for the same CustID will have its Start_Date equal to the End_Date of the previous record + 1.
Its End_Date will again be any random date.
And so on…
For the last record of the same CustID the End_Date will be 99991231.
The following sample data explains it better.
CustID Start_Date End_Date
1 19000101 20121231
1 20130101 20130831
1 20130901 20140321
1 20140321 99991231
Basically I am trying to populate data like in SCD2 scenario.
Now I want to find the missing record (or its CustID).
In the data below, for example, the record with CustID = 4, Start_Date = 20120606 and End_Date = 20140101 is missing:
CustID Start_Date End_Date
4 19000101 20120605
4 20140102 99991231
Code for Creating Table
CREATE TABLE TestTable
(
CustID int,
Start_Date int,
End_Date int
)
INSERT INTO TestTable values (1,19000101,20121231)
INSERT INTO TestTable values (1,20130101,20130831)
INSERT INTO TestTable values (1,20130901,20140321)
INSERT INTO TestTable values (1,20140321,99991231)
INSERT INTO TestTable values (2,19000101,99991213)
INSERT INTO TestTable values (3,19000101,20140202)
INSERT INTO TestTable values (3,20140203,99991231)
INSERT INTO TestTable values (4,19000101,20120605)
--INSERT INTO TestTable values (4,20120606,20140101) --Missing Value
INSERT INTO TestTable values (4,20140102,99991231)
Now the SQL should return CustID = 4, as it has a missing value.
My idea is based on this logic. Let's assume 19000101 is 1 and 99991231 is 10. Then for every ID, if you compute End_date - Start_date for each row and add the results up, the total must equal 9 (10 - 1). You can do the same here:
SELECT ID, SUM(END_DATE - START_DATE) AS total FROM TABLE GROUP BY ID HAVING SUM(END_DATE - START_DATE) < (MAX_END_DATE - MIN_START_DATE)
You might want to find the function in your SQL dialect that gives the number of days between two dates and use that in the SUM part.
Let's take the following example:
1 1900 2003
1 2003 9999
2 1900 2222
2 2222 9977
3 1900 9999
The query will be evaluated as follows:
ID 1: (2003 - 1900) + (9999 - 2003) = 8099
ID 2: (2222 - 1900) + (9977 - 2222) = 8077
ID 3: (9999 - 1900) = 8099
The HAVING clause will eliminate 1 and 3 (8099 = 9999 - 1900), giving you only 2, which is what you want.
If you just need the CustID then this will do
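SELECT t1.CustID
FROM TestTable t1
LEFT JOIN TestTable t2
ON t2.CustID = t1.CustID
-- the columns are yyyymmdd integers, so convert to date before adding a day
AND t2.Start_Date = CASE WHEN t1.End_Date < 99991231 THEN
CONVERT(int, CONVERT(char(8), DATEADD(D, 1, CONVERT(date, CONVERT(char(8), t1.End_Date), 112)), 112)) END
WHERE t1.End_Date <> 99991231 -- the final row has no successor
AND t2.CustID IS NULL
GROUP BY t1.CustID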
You need a row if one of the following conditions is met:
Not a final row (99991231) and no matching next row
Not a start row (19000101) and no matching previous row
You can left join to the same table to find previous and next rows and filter the results where you don't find a row by checking the column values for null:
SELECT t1.CustID, t1.Start_Date, t1.End_Date
FROM TestTable t1
LEFT JOIN TestTable tPrevious on tPrevious.CustID = t1.CustID
and tPrevious.End_Date = t1.Start_Date - 1
LEFT JOIN TestTable tNext on tNext.CustID = t1.CustID
and tNext.Start_Date = t1.End_Date + 1
-- note: +1 / -1 on yyyymmdd integers does not roll over at month ends (20121231 + 1 = 20121232)
WHERE (t1.End_Date <> 99991231 and tNext.CustID is null) -- no following
or (t1.Start_Date <> 19000101 and tPrevious.CustID is null) -- no previous
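Because Start_Date and End_Date are stored as yyyymmdd integers, a gap test is more reliable when the values are converted to dates first. The following sketch uses LEAD, so it assumes SQL Server 2012 or later and assumes every stored integer is a valid date. The names Ordered, Next_Start, Gap_After and Gap_Before are made up for illustration.

```sql
WITH Ordered AS (
    SELECT CustID, Start_Date, End_Date,
           -- Start_Date of the following row for the same customer, NULL on the last row
           LEAD(Start_Date) OVER (PARTITION BY CustID ORDER BY Start_Date) AS Next_Start
    FROM TestTable
)
SELECT CustID,
       End_Date   AS Gap_After,    -- last covered day before the gap
       Next_Start AS Gap_Before    -- first covered day after the gap
FROM Ordered
WHERE Next_Start IS NOT NULL
  -- style 112 reads and writes yyyymmdd; more than one day apart means days are missing
  AND DATEDIFF(DAY,
               CONVERT(date, CONVERT(char(8), End_Date), 112),
               CONVERT(date, CONVERT(char(8), Next_Start), 112)) > 1;
```

On the question's TestTable rows this should return only CustID 4, with 20120605 and 20140102 as the edges of the gap. Rows whose Start_Date equals the previous End_Date (as in the last two rows of CustID 1) are overlaps, not gaps, and are left out by the > 1 test.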