Exclude rows where dates exist in another table - sql

I have 2 tables, one is working pattern, another is absences.
1) Work pattern
ID | Shift Start | Shift End
123| 01-03-2017 | 02-03-2017
2) Absences
ID| Absence Start | Absence End
123| 01-03-2017 | 04-03-2017
What would be the best way, when selecting rows from the work pattern table, to exclude any that have a date marked as an absence in the absences table?
For example, I have a report that uses the work pattern table to count how many days a week an employee has worked. I don't want it to include days that have been marked as an absence in the absences table, nor any days that fall between the absence start and absence end dates.

If the absence span must fully cover a shift for that shift to be excluded, you can use not exists():
select *
from WorkPatterns w
where not exists (
select 1
from Absences a
where a.Id = w.Id
and a.AbsenceStart <= w.ShiftStart
and a.AbsenceEnd >= w.ShiftEnd
)
rextester demo: http://rextester.com/DCODC76816
returns:
+-----+------------+------------+
| id | ShiftStart | ShiftEnd |
+-----+------------+------------+
| 123 | 2017-02-27 | 2017-02-28 |
| 123 | 2017-03-05 | 2017-03-06 |
+-----+------------+------------+
given this test setup:
create table WorkPatterns ([id] int, [ShiftStart] datetime, [ShiftEnd] datetime) ;
insert into WorkPatterns ([id], [ShiftStart], [ShiftEnd]) values
(123, '20170227', '20170228')
,(123, '20170301', '20170302')
,(123, '20170303', '20170304')
,(123, '20170305', '20170306')
;
create table Absences ([id] int, [AbsenceStart] datetime, [AbsenceEnd] datetime) ;
insert into Absences ([id], [AbsenceStart], [AbsenceEnd]) values
(123, '20170301', '20170304');
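The query above only drops shifts that lie entirely inside an absence. If a shift should also be dropped when it merely overlaps an absence (the question mentions days that fall between the absence start and end), a minimal sketch could look like the following. It assumes the same column names as the test setup, uses table variables so it runs on its own, and adds one extra shift row that is not in the original data to show a partial overlap.

```sql
-- Sketch only: table variables mirror the test setup above.
DECLARE @WorkPatterns TABLE (id int, ShiftStart date, ShiftEnd date);
DECLARE @Absences TABLE (id int, AbsenceStart date, AbsenceEnd date);

INSERT INTO @WorkPatterns (id, ShiftStart, ShiftEnd) VALUES
  (123, '20170227', '20170228')
, (123, '20170228', '20170301')  -- extra row (assumption): only partly inside the absence
, (123, '20170301', '20170302')
, (123, '20170305', '20170306');

INSERT INTO @Absences (id, AbsenceStart, AbsenceEnd) VALUES
  (123, '20170301', '20170304');

SELECT w.id, w.ShiftStart, w.ShiftEnd
FROM @WorkPatterns AS w
WHERE NOT EXISTS (
    SELECT 1
    FROM @Absences AS a
    WHERE a.id = w.id
      AND a.AbsenceStart <= w.ShiftEnd    -- the absence starts on or before the shift ends
      AND a.AbsenceEnd   >= w.ShiftStart  -- and ends on or after the shift starts
);
```

With this data only the 2017-02-27 and 2017-03-05 shifts remain; the containment version would also keep the 2017-02-28 shift, because the absence does not cover it completely.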

What would be the best way, when selecting rows from work pattern
If you are dealing only with dates (no time component) and have control over the database schema, one approach is to create a calendar table.
Put every date in it, from when the company started to some years into the future, and fill the table once.
After that it is easy to join other tables on dates and do date math.
If you have trouble constructing the T-SQL query, please edit the question with more details about the tables' columns and values, their relations, and the results you need.

How about this:
SELECT WP_START.[id], WP_START.[shift_start], WP_START.[shift_end]
FROM work_pattern AS WP_START
INNER JOIN absences AS A ON WP_START.id = A.id
WHERE WP_START.[shift_start] NOT BETWEEN A.[absence_start] AND A.[absence_end]
UNION
SELECT WP_END.[id], WP_END.[shift_start], WP_END.[shift_end]
FROM work_pattern AS WP_END
INNER JOIN absences AS A ON WP_END.id = A.id
WHERE WP_END.[shift_end] NOT BETWEEN A.[absence_start] AND A.[absence_end]
See it on SQL Fiddle: http://sqlfiddle.com/#!6/49ae6/6

Here is my example that includes a Date Dimension table. If your DBAs won't add it, you can create #dateDim as a temp table, like I've done with SQLFiddle (didn't know I could do that). A typical date dimension would have a lot more details you need about the days, but if the table can't be added, just use what you need. You'll have to populate the other Holidays you need. The DateDim I use often is at https://github.com/shawnoden/SQL_Stuff/blob/master/sql_CreateDateDimension.sql
SQL Fiddle
MS SQL Server 2014 Schema Setup:
/* Tables for your test data. */
CREATE TABLE WorkPatterns ( id int, ShiftStart date, ShiftEnd date ) ;
INSERT INTO WorkPatterns ( id, ShiftStart, ShiftEnd )
VALUES
(123, '20170101', '20171031')
, (124, '20170601', '20170831')
;
CREATE TABLE Absences ( id int, AbsenceStart date, AbsenceEnd date ) ;
INSERT INTO Absences ( id, AbsenceStart, AbsenceEnd )
VALUES
( 123, '20170123', '20170127' )
, ( 123, '20170710', '20170831' )
, ( 124, '20170801', '20170820' )
;
/* ******** MAKE SIMPLE CALENDAR TABLE ******** */
CREATE TABLE dateDim (
theDate DATE NOT NULL
, IsWeekend BIT DEFAULT 0
, IsHoliday BIT DEFAULT 0
, IsWorkDay BIT DEFAULT 0
);
/* Populate basic details of dates. */
INSERT dateDim(theDate, IsWeekend, IsHoliday)
SELECT d
, CONVERT(BIT, CASE WHEN DATEPART(dw,d) IN (1,7) THEN 1 ELSE 0 END)
, CONVERT(BIT, CASE WHEN d = '20170704' THEN 1 ELSE 0 END) /* 4th of July. */
FROM (
SELECT d = DATEADD(DAY, rn - 1, '20170101')
FROM
(
SELECT TOP (DATEDIFF(DAY, '20170101', '20171231'))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y ;
/* If not a weekend or holiday, it's a WorkDay. */
UPDATE dateDim
SET IsWorkDay = CASE WHEN IsWeekend = 0 AND IsHoliday = 0 THEN 1 ELSE 0 END
;
Query For Calculation:
SELECT wp.ID, COUNT(d.theDate) AS workDayCount
FROM WorkPatterns wp
INNER JOIN dateDim d ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
AND d.IsWorkDay = 1
LEFT OUTER JOIN Absences a ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
AND wp.ID = a.ID
WHERE a.ID IS NULL
GROUP BY wp.ID
ORDER BY wp.ID
Results:
| ID | workDayCount |
|-----|--------------|
| 123 | 172 | << 216 total days, 44 non-working
| 124 | 51 | << 65 total days, 14 non-working

Related

How to select rows with max date older than some value

I have Microsoft SQL Server 2008 and a table with data like this:
id | file_date [datetime] | file_path [varchar(255)]
____________________________________________________
1 | 01-01-1999 | C:\f1.txt
2 | 01-01-2020 | C:\f2.txt
3 | 05-05-1999 | C:\f3.txt
4 | 05-05-2020 | C:\f3.txt
5 | 05-05-1999 | C:\f4.txt
6 | 06-05-1999 | C:\f4.txt
I need to select all file_paths where file_date is old and no other row with the same file_path has a newer file_date.
For example, if I have to fetch rows with dates older than 2019, my result should be like this:
file_path
C:\f1.txt
C:\f4.txt
I have a solution:
SELECT rslt.file_path
FROM mytable rslt
GROUP BY rslt.file_path
HAVING MAX(rslt.file_date) < '2019-01-01'
The problem is that this script takes ~2 minutes to return ~62k rows from a table of 44.6 million rows, while a simple script that takes all rows older than the date (see below) takes 2-3 seconds:
SELECT * FROM mytable WHERE file_date < '2019-01-01'
So, is there any way to optimize my solution?
How long does this take?
SELECT t.file_path
FROM mytable t
WHERE NOT EXISTS (SELECT 1
FROM mytable t2
WHERE t2.file_path = t.file_path AND t2.file_date >= '2019-01-01'
);
You want an index on (file_path, file_date) for best performance.
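As a rough sketch, such an index could be created like this. The index name is made up for illustration; mytable, file_path and file_date are the names from the question.

```sql
-- Hypothetical index name; the key order (file_path, file_date) lets the
-- NOT EXISTS probe seek on a path and then check for dates >= the cutoff.
CREATE NONCLUSTERED INDEX IX_mytable_file_path_file_date
    ON mytable (file_path, file_date);
```

Whether it helps in practice depends on the existing indexes and the execution plan, so it is worth comparing plans before and after.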
Could you do a negation of your second faster query and do a NOT IN?
SELECT rslt.file_path
FROM mytable rslt
WHERE rslt.file_path NOT IN
(SELECT rslt2.file_path
FROM mytable rslt2
WHERE rslt2.file_path IS NOT NULL
AND rslt2.file_date >= '2019-01-01')
GROUP BY rslt.file_path;
NOT IN returns no rows at all if the subquery pulls back any NULL, so I put an IS NOT NULL in the WHERE of the inner query as well; it may not be necessary for you if file_path can never be NULL.
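The NULL behaviour of NOT IN can be seen in a small standalone sketch. The table variables and their rows are made up for illustration and assume the default ANSI_NULLS ON setting.

```sql
DECLARE @candidates TABLE (file_path varchar(255) NULL);
DECLARE @newer      TABLE (file_path varchar(255) NULL);

INSERT INTO @candidates (file_path) VALUES ('C:\f1.txt'), ('C:\f2.txt');
INSERT INTO @newer      (file_path) VALUES ('C:\f2.txt'), (NULL);

-- NOT IN: returns no rows. For 'C:\f1.txt' the comparison with NULL is UNKNOWN,
-- so the NOT IN predicate can never evaluate to TRUE.
SELECT c.file_path
FROM @candidates AS c
WHERE c.file_path NOT IN (SELECT n.file_path FROM @newer AS n);

-- NOT EXISTS: returns 'C:\f1.txt', because the NULL row simply never matches.
SELECT c.file_path
FROM @candidates AS c
WHERE NOT EXISTS (SELECT 1 FROM @newer AS n WHERE n.file_path = c.file_path);
```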
DECLARE @TargetDate date = '01-01-2019'
DECLARE @PathList TABLE (id int, file_date datetime, file_path varchar(255))
INSERT INTO @PathList VALUES
(1, '01-01-1999', 'C:\f1.txt')
, (2, '01-01-2020', 'C:\f2.txt')
, (3, '05-05-1999', 'C:\f3.txt')
, (4, '05-05-2020', 'C:\f3.txt')
, (5, '05-05-1999', 'C:\f4.txt')
, (6, '06-05-1999', 'C:\f4.txt')
;
SELECT DISTINCT
PL.file_path
FROM @PathList PL
LEFT JOIN @PathList PH ON PH.file_path = PL.file_path
AND PH.file_date >= @TargetDate
WHERE
PL.file_date < @TargetDate
AND PH.id IS NULL
Check this
SELECT rslt.file_path, MAX(rslt.file_date) as Max_file_date
into #t
FROM mytable rslt
GROUP BY rslt.file_path
Select file_path
From #t
Where Max_file_date < '2019-01-01'
or try
SELECT rslt.file_path
into #t
FROM mytable rslt
WHERE file_date < '2019-01-01'
GROUP BY rslt.file_path

How do I get a correct average number of appointments per day?

I want to see what the average number of appointments is for each appointment type. Basically I have the following tables and columns:
Table 1 - Dates
-----------
Date date (primary key)
Table 2 - Appointments
-----------
AppointmentStart Datetime
ApptId Numeric
FacilityId Numeric
ApptKind Numeric
Appointmentid Numeric
Table 3 AppointmentType
-----------
ApptTypeId Numeric
Name Varchar
Sample Data
============
Table 1 Date
---------------
date
1/1/2017
1/2/2017
...
Table 2 Appointment
----------------
ApptStart | ApptTypeId | FacilityId | ApptKind | ApptId
2017-1-1 9:00:00 1 2 1 2385525
2017-1-1 9:15:00 3 2 1 2385526
2017-1-1 9:30:00 2 2 1 2385527
...
Table 3 ApptType
-----------------
ApptTypeId | Name
1 Walk-in
2 MAT
3 Acute
...
There are about 30 different appointment types and not all of them occur every day. So far I have created a table that lists every date in the time range that I want then I do a left join with the count of appointments (nulls equal 0). I also remove Saturdays and Sundays. This works really well for one appointment type but when I do this with multiple appointment types zeroes only show up for the days where there are no appointments.
My solution:
Somehow insert each appointment type next to each day then do the left join with the NULL = 0 part although I don't know how to get the list to repeat for each day in the table.
Example:
At the end I want
EndResult
----------
Average(Count(appts)) | ApptType.Name
OR
EndResult
---------
Count(apptid) | ApptType.Name | Date
5 Acute 1/1/2017
0 MAT 1/1/2017
4 Walk-in 1/1/2017
0 Other 1/1/2017
Then repeat for the next day with the same appointment type names
This is how I would write a query that gets you to
End Result #2:
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
-- every date paired with every appointment type
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1, Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join the name on A so it is also filled in for date/type pairs with no appointments
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
This assumes that ApptTypeID is indeed part of Table2. You can wrap this result up further to get your End Result #1:
-- cast first: Avg over an int column returns a truncated int
SELECT Avg(Cast(D.ApptCount AS float)) AS AvgApptCount, D.ApptTypeName
FROM (
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1, Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join the name on A so it is also filled in for date/type pairs with no appointments
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
) AS D
GROUP BY D.ApptTypeName
First we declare and populate table variables for example data.
DECLARE @Dates TABLE (
Date DATE
)
INSERT @Dates
VALUES
('2017-01-01')
,('2017-01-02')
DECLARE @Appointments TABLE (
AppointmentStart DATETIME
,ApptId INT
,FacilityId INT
,ApptKind INT
,Appointmentid INT
)
INSERT @Appointments
VALUES
('2017-01-01 09:00:00.000', 1, 2, 1, 2385525)
,('2017-01-01 09:15:00.000', 3, 2, 1, 2385526)
,('2017-01-01 09:30:00.000', 2, 2, 1, 2385527)
DECLARE @ApptType TABLE (
ApptTypeId INT
,Name VARCHAR(32)
)
INSERT @ApptType
VALUES
(1, 'Walk-in')
,(2, 'MAT')
,(3, 'Acute')
This shows us the cartesian product of Dates and ApptType, built with a full outer join on 1 = 1 (which behaves like a CROSS JOIN when both tables are non-empty).
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1
We can use the cartesian product as our left data set, and count the number of items in our right data set (@Appointments). By doing this with a left join, we ensure that every date/appointment type combination is included, even if there were no appointments of that type on that date.
SELECT
A.[Date]
,A.[Name]
,COUNT(B.Appointmentid)
FROM (
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1) AS A
LEFT JOIN @Appointments AS B
ON A.[ApptTypeId] = B.[ApptId]
AND A.[Date] = CAST(B.[AppointmentStart] AS DATE)
GROUP BY
A.[Date]
,A.[Name]
ORDER BY
A.[Date]
,A.[Name]
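For the first end result (one average per appointment type), the per-day counts can be averaged in an outer query. The sketch below is only an illustration: it assumes the appointments table carries an ApptTypeId column as in the question's sample data, builds the date/type grid with CROSS JOIN, and uses made-up sample rows. It casts the count before averaging because AVG over an int column returns an int in SQL Server.

```sql
DECLARE @Dates        TABLE ([Date] date);
DECLARE @ApptType     TABLE (ApptTypeId int, Name varchar(32));
DECLARE @Appointments TABLE (AppointmentStart datetime, ApptTypeId int, Appointmentid int);

INSERT @Dates VALUES ('2017-01-02'), ('2017-01-03');
INSERT @ApptType VALUES (1, 'Walk-in'), (2, 'MAT');
INSERT @Appointments VALUES
  ('2017-01-02 09:00', 1, 1)
, ('2017-01-02 09:15', 1, 2)
, ('2017-01-03 09:30', 1, 3);

SELECT perDay.Name
     , AVG(CAST(perDay.ApptCount AS decimal(10, 2))) AS AvgPerDay
FROM (
    -- one row per date and appointment type, with 0 where nothing was booked
    SELECT d.[Date], t.Name, COUNT(a.Appointmentid) AS ApptCount
    FROM @Dates AS d
    CROSS JOIN @ApptType AS t
    LEFT JOIN @Appointments AS a
           ON a.ApptTypeId = t.ApptTypeId
          AND CAST(a.AppointmentStart AS date) = d.[Date]
    GROUP BY d.[Date], t.Name
) AS perDay
GROUP BY perDay.Name
ORDER BY perDay.Name;
```

With these rows Walk-in averages 1.5 appointments per day (2 and 1) and MAT averages 0, because the zero-count days are part of the average.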

Highlight multiple records in a date range

Working with SQL Server 2008.
fromdate todate ID name
--------------------------------
1-Aug-16 7-Aug-16 x jack
3-Aug-16 4-Aug-16 x jack
5-Aug-16 6-Aug-16 x tom
1-Aug-16 2-Aug-16 x john
3-Aug-16 4-Aug-16 x harry
5-Aug-16 6-Aug-16 x mac
Is there a way to script this so that I know if there are multiple names tagged to an ID in the same date range?
For example above, I want to flag that ID x has Name Jack and Tom tagged in the same date range.
ID multiple_flag
------------------------------------------------
x yes
y no
If there is a unique index in your table (in my example it is column i but you could also generate one by means of using ROW_NUMBER()) then you can do the following query based on an INNER JOIN to find overlapping date ranges:
CREATE TABLE #tmp (i int identity primary key,fromdate date,todate date,ID int,name varchar(32));
insert into #tmp (fromdate,todate,ID ,name) values
('1-Aug-16','7-Aug-16',3,'jack'),
('3-Aug-16','4-Aug-16',3,'tom'),
('5-Aug-16','6-Aug-16',3,'jack');
select a.*,b.name bname,b.i i2 from #tmp a
INNER join #tmp b on b.id=a.id AND b.i<>a.i
AND ( b.fromdate between a.fromdate and a.todate
OR b.todate between a.fromdate and a.todate)
(My id column is int). This will give you:
i fromdate todate ID name bname i2
- ---------- ---------- - ---- ----- --
1 2016-08-01 2016-08-07 3 jack tom 2
1 2016-08-01 2016-08-07 3 jack jack 3
Implement further filtering or grouping as required. I left a little demo here.
Please check the SQL below, though it might not be the optimal one:
SELECT formdate,todate,id,tab1.name,
case when tab2.#Of >1 then 'yes' else 'no' end as multiple_flag
FROM tab1
inner join (SELECT Name, COUNT(*) as #Of
FROM tab1
GROUP BY Name) as tab2 on tab1.name=tab2.name
order by tab1.id ;
Add your WHERE condition before the ORDER BY if you need to restrict the query to a date range.
Change formdate to fromdate before running this SQL; I used formdate on my machine.
One way to do it is using CASE with EXISTS:
Please note this part of the query:
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
for an explanation on testing for overlapping ranges, read the overlap wiki.
Create and populate sample data (Please save us this step in your future questions)
DECLARE @T as table
(
fromdate date,
todate date,
ID char(1),
name varchar(10)
)
INSERT INTO @T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac')
The query:
SELECT DISTINCT id,
CASE WHEN EXISTS
(
SELECT 1
FROM @T t2
WHERE t1.Id = t2.Id
-- make sure it's not the same record
AND t1.fromdate <> t2.fromdate
AND t1.todate <> t2.todate
-- make sure the records date ranges overlap
AND t1.fromdate <= t2.todate
AND t2.fromdate <= t1.todate
)
THEN 'Yes'
ELSE 'No'
END As multiple_flag
FROM @T t1
Results:
id multiple_flag
---- -------------
x Yes
y No
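The query above tells rows apart by their dates. Since the question asks about different names on the same ID, a variant that compares names and returns exactly one row per ID might look like the sketch below. It reuses the sample rows from this answer in a table variable so it runs on its own; the derived table alias ids is an assumption for illustration.

```sql
DECLARE @T TABLE (fromdate date, todate date, ID char(1), name varchar(10));

INSERT INTO @T VALUES
('2016-08-01', '2016-08-07', 'x', 'jack'),
('2016-08-03', '2016-08-04', 'x', 'tom'),
('2016-08-05', '2016-08-06', 'x', 'jack'),
('2016-08-01', '2016-08-02', 'y', 'john'),
('2016-08-03', '2016-08-04', 'y', 'harry'),
('2016-08-05', '2016-08-06', 'y', 'mac');

SELECT ids.ID,
       CASE WHEN EXISTS (
                SELECT 1
                FROM @T AS t1
                INNER JOIN @T AS t2
                        ON t2.ID = t1.ID
                       AND t2.name <> t1.name          -- a different name ...
                       AND t1.fromdate <= t2.todate    -- ... whose range overlaps
                       AND t2.fromdate <= t1.todate
                WHERE t1.ID = ids.ID
            )
            THEN 'Yes' ELSE 'No'
       END AS multiple_flag
FROM (SELECT DISTINCT ID FROM @T) AS ids
ORDER BY ids.ID;
```

For this data it flags x as Yes (jack and tom overlap) and y as No. Evaluating the flag once per distinct ID avoids getting both a Yes and a No row for the same ID when only some of its rows overlap.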

Grouping rows with a date range

I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
My rules are:
If the WH_OUT_DATETIME is on or within 24 hours of the WH_IN_DATETIME of another ADM_ID with the same WH_P_ID
I would like another column added to the results which identify the grouped value if possible as EP_ID.
e.g.
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
------ ------ -------------- ---------------
1 9 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2014-11-20 00:00:00 2014-11-21 00:00:00
5 5 2014-10-17 00:00:00 2014-10-18 00:00:00
Would return rows with:
ADM_ID WH_PID EP_ID EP_IN_DATETIME EP_OUT_DATETIME WH_IN_DATETIME WH_OUT_DATETIME
------ ------ ----- ------------------- ------------------- ------------------- -------------------
1 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2 2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
5 5 1 2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit.
This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has a WH_IN_DATETIME which follows on from the WH_OUT_DATETIME of another row for the same WH_PID, then that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the rows with that WH_PID within that EP_ID.
I hope this makes some sense.
Thanks,
MR
Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature dealy, which is updating a variable at the same time you update a column. Breaking down the complexity of this operation, I create a scratch table to hold the piece that is the hardest to calculate: the EP_ID. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME and EP_OUT_DATETIME fields.
The steps are:
Create the scratch table
Seed the scratch table with all of the ADM_ID values -- this lets us do an UPDATE as all of the rows already exist.
Update the scratch table
Do the final, simple select joining the scratch table to the main table
The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
ADM_ID INT NOT NULL PRIMARY KEY,
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
ADM_ID INT NOT NULL PRIMARY KEY,
EP_ID INT NOT NULL
-- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
SELECT ADM_ID, 0
FROM #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
ADM_ID INT NOT NULL,
EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
FROM #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE @EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
SELECT TOP (100) PERCENT
t1.*,
t2.WH_OUT_DATETIME AS [PriorOut],
t2.ADM_ID AS [PriorID],
ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
AS [RowNum]
FROM #Table t1
LEFT JOIN #Table t2
ON t2.WH_PID = t1.WH_PID
AND t2.ADM_ID <> t1.ADM_ID
AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET @EP_ID = sc.EP_ID = CASE
WHEN cte.RowNum = 1 THEN 1
WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
ELSE @EP_ID
END
FROM #Scratch sc
INNER JOIN cte
ON cte.ADM_ID = sc.ADM_ID
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
tab.WH_PID,
sc.EP_ID,
MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_IN_DATETIME],
MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_OUT_DATETIME],
tab.WH_IN_DATETIME,
tab.WH_OUT_DATETIME
FROM #Table tab
INNER JOIN #Scratch sc
ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "@variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
PRO: this method is generally pretty fast
CON: "The order of the UPDATE is controlled by the order of the clustered index". This behavior might rule out using this method depending on circumstances. But in this particular case, if the WH_PID values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).
I would do this using exists in a correlated subquery:
select t.*,
(case when exists (select 1
from yourtable t2
where t2.WH_PID = t.WH_PID and
t2.ADM_ID <> t.ADM_ID and -- compare against other rows, not the row itself
t.WH_OUT_DATETIME between t2.WH_IN_DATETIME and dateadd(day, 1, t2.WH_OUT_DATETIME)
)
then 1 else 0
end) as TimeFrameFlag
from yourtable t;
Try this query :
;WITH cte
AS (SELECT t1.ADM_ID AS EP_ID,*
FROM #yourtable t1
WHERE NOT EXISTS (SELECT 1
FROM #yourtable t2
WHERE t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
UNION ALL
SELECT t2.EP_ID,t1.ADM_ID,t1.WH_PID,t1.WH_IN_DATETIME,t1.WH_OUT_DATETIME
FROM #yourtable t1
JOIN cte t2
ON t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
cte_result
AS (SELECT t1.*,Dense_rank() OVER ( partition BY wh_pid ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
FROM #yourtable t1
LEFT OUTER JOIN (SELECT DISTINCT ADM_ID,
EP_ID
FROM cte) t2
ON t1.ADM_ID = t2.ADM_ID)
SELECT ADM_ID,WH_PID,EP_ID,Min(WH_IN_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_IN_DATETIME],Max(WH_OUT_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
WH_IN_DATETIME,
WH_OUT_DATETIME
FROM cte_result
ORDER BY ADM_ID
I assumed these things:
Rows that satisfy your rule form a group.
min(WH_IN_DATETIME) of the group is shown in the EP_IN_DATETIME column for all rows belonging to that group. Similarly, max(WH_OUT_DATETIME) of the group is shown in the EP_OUT_DATETIME column for all rows belonging to that group.
EP_ID is assigned to the groups of each WH_PID separately.
One thing your question does not explain is how EP_OUT_DATETIME and WH_IN_DATETIME of the 4th row become 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. I am assuming that this is a typo and that they should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000.
Explanation:
The first CTE, cte, returns the possible groups based on your rule. The second CTE, cte_result, assigns EP_ID to the groups. Finally, you can select min(WH_IN_DATETIME) and max(WH_OUT_DATETIME) over partitions of wh_pid, ep_id.
sqlfiddle
Here's yet another alternative... which may still not match your expected results.
I agree with @NoDisplayName that there appears to be an error in your ADM_ID 5 output; the 2 OUT dates should match, at least that seems logical to me. I can't understand why you would want an out date to ever be showing an in date value, but of course there could be a good reason. :)
Also, the wording of your question makes it sound like this is just a part of the problem and that you may take this output further. I'm not sure what you are really aiming for, but I've broken the query below up into 2 CTEs and you may find your final information in the 2nd CTE (as it sounds like you want to group the data back together).
Here's the complete structure & query on SQL Fiddle
-- The Cross Join ensures we always have a pair of first and last time pairs
-- The left join matches all overlapping combinations,
-- allowing the where clause to restrict to just the first and last
-- These first/last pairs are then grouped in the first CTE
-- Data is restricted in the second CTE
-- The final select is then quite simple
With GroupedData AS (
SELECT
(Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME, t1.WH_PID
FROM yourtable t1
CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
LEFT OUTER JOIN yourtable t2
ON t1.WH_PID = t2.WH_PID
AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME AND [First] = 0)
OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME AND [First] = 1))
WHERE t2.WH_PID IS NULL
), RestrictedData AS (
SELECT WH_PID, MIN(WH_IN_DATETIME) AS WH_IN_DATETIME, MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
FROM GroupedData
GROUP BY Grp, WH_PID
)
SELECT yourtable.ADM_ID, yourtable.WH_PID, RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME, RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME, yourtable.WH_IN_DATETIME, yourtable.WH_OUT_DATETIME
FROM RestrictedData
INNER JOIN yourtable
ON RestrictedData.WH_PID = yourtable.WH_PID
AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME AND RestrictedData.WH_OUT_DATETIME
ORDER BY yourtable.ADM_ID
A LEFT OUTER JOIN and the DATEDIFF function should help you filter the records. Finally, use a window function to create the group IDs.
create table #test
(ADM_ID int,WH_PID int,WH_IN_DATETIME DATETIME,WH_OUT_DATETIME DATETIME)
INSERT #test
VALUES ( 1,9,'2014-10-12 00:00:00','2014-10-13 15:00:00'),
(2,9,'2014-10-14 14:00:00','2014-10-15 15:00:00'),
(3,9,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(1,10,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(2,10,'2014-10-18 14:00:00','2014-10-19 15:00:00')
SELECT Row_number()OVER(partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
a.WH_PID,
a.WH_IN_DATETIME,
b.WH_OUT_DATETIME
FROM #test a
LEFT JOIN #test b
ON a.WH_PID = b.WH_PID
AND a.ADM_ID <> b.ADM_ID
where Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME)BETWEEN 0 AND 24
OUTPUT :
Group_Id WH_PID WH_IN_DATETIME WH_OUT_DATETIME
-------- ------ ----------------------- -----------------------
1 9 2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
2 9 2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
1 10 2014-10-16 14:00:00.000 2014-10-19 15:00:00.000

Select between dates and with initial values

I need to make a graph from a log. The log entries are not in regular intervals.
I would like to select rows between dates along with what the values were immediately before the start date (that is, from whenever the immediately preceding log was entered).
So, let's say:
table Foo has id and value columns,
table Bar has id, foo_id, and value columns, and
table BarLog has id, foo_id, bar_id, bar_value and timestamp.
So there can be many Bars for one Foo.
I need all rows from BarLog for all Bars given some foo_id between, say, 07/01/2012 and 07/31/2012 and the value (row) for each Bar as it was on 07/01/2012.
Hope that made sense, if not, I'll try to clarify.
EDIT (above left for context):
Let's simplify this down another step. If I have a table with two foreign keys, fk_a and fk_b, and a timestamp, how can I get the most recent rows with a given fk_a and a distinct fk_b?
As suggested, here's an example.
+----+------+------+-------------+
| id | fk_a | fk_b | timestamp |
+----+------+------+-------------+
| 1 | 1 | 1 | 01-JUL-2012 |
| 2 | 2 | 2 | 02-JUL-2012 |
| 3 | 1 | 1 | 04-JUL-2012 |
| 4 | 2 | 2 | 05-JUL-2012 |
| 5 | 1 | 3 | 07-JUL-2012 |
+----+------+------+-------------+
Given a fk_a of 1, I would want rows 3 and 5. So looking only at rows 1, 3, and 5 (those with fk_a of 1), get the most recent of each fk_b (where row 3 is more recent than row 1 for fk_b=1).
Thanks again.
Are you looking for something like this?
SELECT bl.bar_value, timestamp
FROM foo f, bar b, barlog bl
WHERE f.id = b.id
AND b.foo_id = bl.foo_id
AND timestamp BETWEEN '01-JUL-2012' AND '31-JUL-2012'
AND b.foo_id = :enter_value_here
ORDER BY timestamp DESC
Use the :enter_value_here to add the foo_id you need the data for...
What plotting tool are you using? You can take the data-set and push it into excel for plotting..in any case, hopefully the query above can get you closer to what you're trying to do.
For a dense set, create a date table and run the following query:
DECLARE @StartDate datetime
SET @StartDate = '2012-01-01'
SELECT f.ID as foo_id, b.bar_id, f.Value, GetDate() as DateStamp
FROM Foo f
inner join Bar b on f.id = b.foo_id
WHERE /*enter criteria for bar selection*/
UNION ALL
SELECT f.ID as foo_id, b.bar_id, f.Value, GetDate() as DateStamp
FROM (
SELECT MAX(bl.timestamp) as bl_timestamp, bl.bar_id as bar_id
FROM Dates d
INNER JOIN BarLog bl on bl.timestamp < d.Date
WHERE /*enter criteria for bar selection*/
GROUP BY bl.bar_id
) as pi
INNER JOIN BarLog bl on pi.bar_id = bl.bar_id and bl.timestamp = pi.bl_timestamp
WHERE d.Day_Of_Month = 1 and d.Date between @StartDate and getDate()
AND /*enter criteria for bar selection*/
The date table can be something like http://it.toolbox.com/wiki/index.php/Create_a_Time_Dimension_/_Date_Table or could be created temporarily for each query by:
CREATE TABLE #Dates ([Date] datetime, Day_Of_Month int)
DECLARE @cDate datetime
SET @cDate = @StartDate
WHILE @cDate < getdate()
BEGIN
INSERT INTO #Dates (Date, Day_Of_Month)
SELECT @cDate, Datepart(d, @cDate)
SET @cDate = DATEADD(m, 1 + DATEDIFF(m, 0, @cDate), 0)
END
with a DROP TABLE #Dates sitting after the select.
This query will return:
Foo_ID, Bar_ID, Value at datestamp, Datestamp
with the datestamps incrementing by 1 month at a time.
Finally found this question which had what I was looking for. Basically just joining with a grouped select. So the answer for my edit would be something like
SELECT a.* FROM SomeTable a
JOIN (
SELECT fk_b, MAX(timestamp) as latest
FROM SomeTable
WHERE fk_a = @someIdA
GROUP BY fk_b
) b
ON a.fk_b = b.fk_b AND a.timestamp = b.latest
WHERE a.fk_a = @someIdA
Which would return the latest row of each distinct fk_b for the specified fk_a.
The original question would just be a union of this with a simple get between dates
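To check the "latest row per fk_b" idea against the sample rows from the question, here is a self-contained sketch. The table variable @SomeTable, the date column type, and the bracketed [timestamp] column name are assumptions made so that the example runs on its own.

```sql
DECLARE @SomeTable TABLE (id int, fk_a int, fk_b int, [timestamp] date);

INSERT INTO @SomeTable (id, fk_a, fk_b, [timestamp]) VALUES
  (1, 1, 1, '2012-07-01')
, (2, 2, 2, '2012-07-02')
, (3, 1, 1, '2012-07-04')
, (4, 2, 2, '2012-07-05')
, (5, 1, 3, '2012-07-07');

DECLARE @someIdA int;
SET @someIdA = 1;

SELECT a.id, a.fk_a, a.fk_b, a.[timestamp]
FROM @SomeTable AS a
INNER JOIN (
    -- latest timestamp per fk_b, restricted to the requested fk_a
    SELECT fk_b, MAX([timestamp]) AS latest
    FROM @SomeTable
    WHERE fk_a = @someIdA
    GROUP BY fk_b
) AS b
   ON b.fk_b = a.fk_b
  AND b.latest = a.[timestamp]
WHERE a.fk_a = @someIdA
ORDER BY a.id;
```

For fk_a = 1 this returns rows 3 and 5, which matches the result the question asks for. If two rows share the same latest timestamp for one fk_b, both are returned.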