Estimated number of rows is way off in execution plan

I have a situation where the estimated number of rows in the execution plan is way off.
My columns in the join are varchar(50). I have tried different indexes but it does not reduce this problem. I have even tried with an index on the temp table. What else can I do?
PS this is the first place where the estimated number starts to drift... Also the tables are not big (48000 rows).
The code is:
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM TableA
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
I know it seems that this can be rewritten as:
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
But the results are not identical and I don't want to change the results since I am not sure if the creator of this code knew what they were doing.
Some statistics on the columns:
householdnumber is always equal to householdid. householdid is nvarchar(50) but householdnumber is varchar(40). The table has 48877 rows. Distinct combination of householdnumber, householdid, primaryCustomerID has 48029 rows. And distinct number of primaryCustomerID is 47152.

Regarding the code - it appears that the difference between the larger (original) version and your simpler GROUP BY version is that the original finds the minimum ProfileCreatedDate for anyone in that household, whereas your simpler version finds the minimum ProfileCreatedDate for each specific primaryCustomerID.
For example (using simpler data)
CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);
INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate) VALUES
(1, 1, 1, '20201001'),
(1, 1, 1, '20201002'),
(1, 1, 2, '20201003');
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM #TableA;
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN #TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
SELECT * FROM #Profile;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from #TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
SELECT * FROM #Profile2;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-03 00:00:00.000
*/
Notice that in the above, the PROFILECREATEDDATE for row 2 is different.
You could therefore try the following code that should give the same results as the original set - see how that goes for time (and confirm it matches the original results).
SELECT DISTINCT t1.householdnumber, t1.householdid, primaryCustomerID,
MIN([ProfileCreatedDate]) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE
INTO #Profile3
FROM #TableA t1;
SELECT * FROM #Profile3;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
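To check the equivalence on the sample rows without a SQL Server instance, here is a small sketch that runs both shapes of the query and compares the results. It assumes SQLite (3.25 or later, for window functions) through Python's sqlite3 module as a stand-in for SQL Server, and it uses a permanent table named TableA and ISO date strings instead of temp tables and datetime values.

```python
import sqlite3

# SQLite stands in for SQL Server here; this is an assumption of the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (householdnumber INT, householdid INT,
                     primaryCustomerID INT, ProfileCreatedDate TEXT);
INSERT INTO TableA VALUES
    (1, 1, 1, '2020-10-01'),
    (1, 1, 1, '2020-10-02'),
    (1, 1, 2, '2020-10-03');
""")

# Original shape: distinct households joined back to the table, then grouped.
join_version = conn.execute("""
    SELECT a.householdnumber, a.householdid, a.primaryCustomerID,
           MIN(b.ProfileCreatedDate) AS ProfileCreatedDate
    FROM (SELECT DISTINCT householdnumber, householdid, primaryCustomerID
          FROM TableA) AS a
    LEFT JOIN TableA AS b
           ON a.householdnumber = b.householdnumber
          AND a.householdid = b.householdid
    GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID
    ORDER BY 1, 2, 3
""").fetchall()

# Rewrite: household-level minimum computed with a window function.
window_version = conn.execute("""
    SELECT DISTINCT householdnumber, householdid, primaryCustomerID,
           MIN(ProfileCreatedDate) OVER (
               PARTITION BY householdnumber, householdid) AS ProfileCreatedDate
    FROM TableA
    ORDER BY 1, 2, 3
""").fetchall()

print(join_version == window_version)
```

One difference to keep in mind before swapping the queries on real data: if householdnumber or householdid can be NULL, the join never matches those rows (so their minimum comes back NULL), while PARTITION BY puts all NULL keys in one partition.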

Related

Can I SELECT the AVG of 'x' values from two tables but only select the values where the 'y' values match? (INNER JOIN)

I'm trying to get the average of two values which are in two different tables. I only want the average for rows where the 'Week' value is the same in both tables.
So e.g.:
Table1 Name= BicepsTable
Week | Biceps
1 | 33
2 | 33.2
3 | 34.1
.
Table2 Name=ThighTable
Week | Thigh
1 | 42.1
3 | 42.8
4 | 43
.
From these tables I want to have the values {(1, 37.55), (3, 38.45)}.
( (33+42.1)/2 = 37.55 and (34.1+42.8)/2 = 38.45 )
I tried to get this with the following code, but it gives me {(1, 37.55), (3, 37.55)}. The second value is wrong: it should be the average for the next row (week 3).
sql = 'SELECT BicepsTable.Week,
((SELECT BicepsTable.Biceps FROM BicepsTable INNER JOIN ThighTable ON BicepsTable.Week = ThighTable.Week)
+
(SELECT ThighTable.Thigh FROM ThighTable INNER JOIN BicepsTable ON ThighTable.Week = BicepsTable.Week)) /2
FROM BicepsTable INNER JOIN ThighTable ON BicepsTable.Week = ThighTable.Week'
Please help, if you don't understand my problem, or got questions, feel free to ask:)

I suggest taking a union of the two tables, and then taking the average of each week:
SELECT Week, AVG(rating) AS avg_rating
FROM
(
SELECT Week, Biceps AS rating FROM BicepsTable
UNION ALL
SELECT Week, Thigh FROM ThighTable
) t
GROUP BY Week
HAVING COUNT(*) = 2
ORDER BY Week;
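Aggregation, as used above, is a good option here, because AVG only operates on the values which are present. The HAVING COUNT(*) = 2 clause keeps only the weeks that appear in both tables (assuming one row per week in each table); without it, a week present in only one table would simply return that single value as its average.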
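As a quick check of the UNION ALL approach against the numbers in the question, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than the asker's actual database, and the rounding to two decimals is done in Python only to keep floating-point noise out of the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BicepsTable (Week INT, Biceps REAL);
CREATE TABLE ThighTable (Week INT, Thigh REAL);
INSERT INTO BicepsTable VALUES (1, 33), (2, 33.2), (3, 34.1);
INSERT INTO ThighTable VALUES (1, 42.1), (3, 42.8), (4, 43);
""")

raw = conn.execute("""
    SELECT Week, AVG(rating) AS avg_rating
    FROM (
        SELECT Week, Biceps AS rating FROM BicepsTable
        UNION ALL
        SELECT Week, Thigh FROM ThighTable
    ) t
    GROUP BY Week
    HAVING COUNT(*) = 2   -- keep only weeks present in both tables
    ORDER BY Week
""").fetchall()

# Round in Python so the comparison is not affected by float representation.
rows = [(week, round(avg, 2)) for week, avg in raw]
print(rows)
```

Weeks 2 and 4 drop out because each has only one measurement, which is what the HAVING clause is there for.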

declare @test1 as table
(id int,t1value float)
declare @test2 as table
(id int,t2value float)
insert into @test1
values(1,100),(2,150),(3,200)
insert into @test2
values(1,100),(3,150),(5,200)
select *,(a.t1value+b.t2value)/2 taverage from @test1 a
inner join @test2 b on a.[id]=b.[id]
group by a.id,a.t1value,b.id,b.t2value

Joining tables on ID and closest date

I have two tables, one showing when someone left and one showing when they came back (sometimes when they come back, they may forget to enter that they came back). I am trying to join the tables so that they look like the desired table from the image.

You can try this.
DECLARE @TableA TABLE(ID INT, Leave DATE)
INSERT INTO @TableA VALUES
(62175, '11/29/2019'),
(62175, '11/11/2019'),
(62175, '3/29/2019'),
(62175, '8/22/2019'),
(68454, '11/29/2019'),
(68454, '12/13/2019')
DECLARE @TableB TABLE(ID INT, [Return] DATE)
INSERT INTO @TableB VALUES
(62175, '4/4/2019'),
(62175, '11/16/2019'),
(62175, '11/30/2019'),
(68454, '11/30/2019'),
(68454, '12/14/2019')
SELECT TA.*, CASE WHEN ROW_NUMBER()OVER(PARTITION BY X.ID, X.[Return] ORDER BY TA.Leave DESC) = 1 THEN X.[Return] ELSE NULL END [Return]
FROM @TableA TA
OUTER APPLY (SELECT TOP 1 * FROM @TableB TB
WHERE TA.ID = TB.ID
AND TB.[Return] > TA.Leave
ORDER BY TB.[Return] ) X
ORDER BY TA.ID, TA.Leave
Result:
ID Leave Return
----------- ---------- ----------
62175 2019-03-29 2019-04-04
62175 2019-08-22 NULL
62175 2019-11-11 2019-11-16
62175 2019-11-29 2019-11-30
68454 2019-11-29 2019-11-30
68454 2019-12-13 2019-12-14

These tables are invalid, they should be in one table with 3 columns. ID, Leave, Return

Very tricky question. I think this does what you want:
with ab as (
select id, leave, null as return
from a
union all
select id, null, return
from b
)
select distinct id, coalesce(leave, prev_leave), coalesce(return, next_return)
from (select ab.*,
(case when leave is null
then lag(leave) over (partition by id order by coalesce(leave, return))
end) as prev_leave,
(case when leave is null
then lead(leave) over (partition by id order by coalesce(leave, return))
end) as next_return
from ab
) ab

How do I get a correct average number of appointments per day?

I want to see what the average number of appointments per day is for each appointment type. Basically I have the following tables and columns:
Table 1 - Dates
-----------
Date date (primary key)
Table 2 - Appointments
-----------
AppointmentStart Datetime
ApptId Numeric
FacilityId Numeric
ApptKind Numeric
Appointmentid Numeric
Table 3 AppointmentType
-----------
ApptTypeId Numeric
Name Varchar
Sample Data
============
Table 1 Date
---------------
date
1/1/2017
1/2/2017
...
Table 2 Appointment
----------------
ApptStart | ApptTypeId | FacilityId | ApptKind | ApptId
2017-1-1 9:00:00 1 2 1 2385525
2017-1-1 9:15:00 3 2 1 2385526
2017-1-1 9:30:00 2 2 1 2385527
...
Table 3 ApptType
-----------------
ApptTypeId | Name
1 Walk-in
2 MAT
3 Acute
...
There are about 30 different appointment types and not all of them occur every day. So far I have created a table that lists every date in the time range that I want then I do a left join with the count of appointments (nulls equal 0). I also remove Saturdays and Sundays. This works really well for one appointment type but when I do this with multiple appointment types zeroes only show up for the days where there are no appointments.
My solution:
Somehow insert each appointment type next to each day then do the left join with the NULL = 0 part although I don't know how to get the list to repeat for each day in the table.
Example:
At the end I want
EndResult
----------
Average(Count(appts)) | ApptType.Name
OR
EndResult
---------
Count(apptid) | ApptType.Name | Date
5 Acute 1/1/2017
0 MAT 1/1/2017
4 Walk-in 1/1/2017
0 Other 1/1/2017
Then repeat for the next day with the same appointment type names

This is how I would write a query that gets you to End Result #2:
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
-- every date paired with every appointment type
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1 CROSS JOIN Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
-- join on A (not B) so the type name is still returned when the count is 0
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
This assumes that ApptTypeID is indeed part of Table2. You can wrap this result up further to get your End Result #1:
-- cast to float so AVG does not use integer arithmetic
SELECT Avg(Cast(D.ApptCount AS float)) AS AvgApptCount, D.ApptTypeName
FROM (
SELECT IsNull(B.ApptCount, 0) AS ApptCount, C.Name AS ApptTypeName, A.Date
FROM (
SELECT Table1.Date, Table3.ApptTypeID
FROM Table1 CROSS JOIN Table3
) AS A LEFT JOIN (
SELECT Convert(Date, ApptStart) AS ApptDate, ApptTypeID, COUNT(ApptID) AS ApptCount
FROM Table2
GROUP BY Convert(Date, ApptStart), ApptTypeID
) AS B ON A.Date = B.ApptDate AND A.ApptTypeID = B.ApptTypeID
LEFT JOIN Table3 AS C ON A.ApptTypeID = C.ApptTypeID
) AS D
GROUP BY D.ApptTypeName

First we declare and populate table variables for example data.
DECLARE @Dates TABLE (
Date DATE
)
INSERT @Dates
VALUES
('2017-01-01')
,('2017-01-02')
DECLARE @Appointments TABLE (
AppointmentStart DATETIME
,ApptId INT
,FacilityId INT
,ApptKind INT
,Appointmentid INT
)
INSERT @Appointments
VALUES
('2017-01-01 09:00:00.000', 1, 2, 1, 2385525)
,('2017-01-01 09:15:00.000', 3, 2, 1, 2385526)
,('2017-01-01 09:30:00.000', 2, 2, 1, 2385527)
DECLARE @ApptType TABLE (
ApptTypeId INT
,Name VARCHAR(32)
)
INSERT @ApptType
VALUES
(1, 'Walk-in')
,(2, 'MAT')
,(3, 'Acute')
This shows us the Cartesian product of Dates and ApptType, produced here by a full outer join on a condition that is always true.
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1
We can use the Cartesian product as our left data set, and count the number of items in our right data set (@Appointments; in this sample data its ApptId column holds the appointment type). By doing this with a left join, we ensure that every date/appointment type combination is included, even if there were no appointments of that type on that date.
SELECT
A.[Date]
,A.[Name]
,COUNT(B.Appointmentid) AS ApptCount
FROM (
SELECT
[Dates].[Date]
,[ApptType].[ApptTypeID]
,[ApptType].[Name]
FROM @Dates AS [Dates]
FULL OUTER JOIN @ApptType AS [ApptType]
ON 1 = 1) AS A
LEFT JOIN @Appointments AS B
ON A.[ApptTypeId] = B.[ApptId]
AND A.[Date] = CAST(B.[AppointmentStart] AS DATE)
GROUP BY
A.[Date]
,A.[Name]
ORDER BY
A.[Date]
,A.[Name]
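To go from the per-day counts to the average per appointment type, the same date/type grid can be averaged in one more step. The following is only a sketch: it assumes SQLite through Python's sqlite3 module instead of SQL Server, uses CROSS JOIN for the grid, and uses made-up sample rows (two Walk-in appointments and one Acute on the first day, nothing on the second) with a column named ApptTypeId on the appointments table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dates (Date TEXT);
CREATE TABLE ApptType (ApptTypeId INT, Name TEXT);
CREATE TABLE Appointments (AppointmentStart TEXT, ApptTypeId INT, AppointmentId INT);
INSERT INTO Dates VALUES ('2017-01-01'), ('2017-01-02');
INSERT INTO ApptType VALUES (1, 'Walk-in'), (2, 'MAT'), (3, 'Acute');
-- Hypothetical sample rows for illustration only.
INSERT INTO Appointments VALUES
    ('2017-01-01 09:00:00', 1, 2385525),
    ('2017-01-01 09:15:00', 3, 2385526),
    ('2017-01-01 09:30:00', 1, 2385527);
""")

averages = conn.execute("""
    WITH daily AS (
        -- one row per (date, type), with 0 when nothing was booked
        SELECT d.Date, t.Name, COUNT(a.AppointmentId) AS ApptCount
        FROM Dates AS d
        CROSS JOIN ApptType AS t
        LEFT JOIN Appointments AS a
               ON a.ApptTypeId = t.ApptTypeId
              AND date(a.AppointmentStart) = d.Date
        GROUP BY d.Date, t.Name
    )
    SELECT Name, AVG(ApptCount) AS AvgPerDay
    FROM daily
    GROUP BY Name
    ORDER BY Name
""").fetchall()

print(averages)
```

The zero-count days are what pull the averages down, which is the whole point of building the grid first. In SQL Server, AVG over an integer count returns an integer, so the count would need a cast to a decimal or float type there; SQLite's AVG always returns a floating-point value.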

How to compare date fields between two tables and get the less or equal date from the second table

I have two tables, Table A and Table B. Both of them have date fields. I need to compare those fields and get a table C with, for each date in Table A, the latest date in Table B that is less than or equal to it, taking into account that Table A is the main table.
CONTEXT: Table A holds product expiration dates, and Table B holds business days. The user can update Table B when it is determined that a date should not be considered a "business day". That date is then deleted from Table B, and all product expirations in Table A that were registered with that date must be assigned the previous business day. So in my case I am creating table C, which contains the Id from Table A and the business date less than or equal to the date mentioned. Then I will make the respective update.
IF OBJECT_ID('tempdb..#tmpA') IS NOT NULL DROP TABLE #tmpA
IF OBJECT_ID('tempdb..#tmpB') IS NOT NULL DROP TABLE #tmpB
CREATE TABLE #tmpA(Id INT IDENTITY(100,1),Fecha date)
INSERT INTO #tmpA(Fecha)
VALUES
('20170101'),('20171003'),('20170504'),('2017-09-01')
SELECT * FROM #tmpA
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-03
102 2017-05-04
103 2017-09-01
CREATE TABLE #tmpB(Id INT IDENTITY(1,4),Fecha date)
INSERT INTO #tmpB(Fecha)
VALUES
('20170101'),('20171001'),('20170504')
SELECT * FROM #tmpB
Id Fecha
----------- ----------
1 2017-01-01
5 2017-10-01
9 2017-05-04
I want to get this result (The same number of records in table A):
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-01 --> **this row is less than 2017-10-03**
102 2017-05-04
103 2017-05-04 --> **this row is less than 2017-09-01**
I tried to build some queries, without success:
IF OBJECT_ID('tempdb..#tmpC') IS NOT NULL DROP TABLE #tmpC
SELECT A.* INTO #tmpC FROM #tmpA A LEFT JOIN #tmpB B ON A.Fecha = B.Fecha WHERE B.Fecha IS NULL
SELECT * FROM #tmpC
SELECT *
FROM #tmpA A INNER JOIN
(
SELECT *
FROM #tmpC
GROUP BY id, Fecha
) AS Q ON MAX(Q.Fecha) <= A.Fecha
UPDATE:
NOTE. The Id column is simply an identity, but it does not mean that it should be related. The important thing is the dates.
Regards

While I'm not sure if this will scale well (if you have more than 100k rows) this will bring back the results which you want.
Theoretically, the correct way for you to do this, in a fashion which will scale well, would be to have a view where you utilize RANK() and join both of these tables together, though this was the quick and easy way. Please try this and let me know if it meets your requirements.
For your edification, I have left both of the dates in there for you to be able to compare them.
SELECT
A.ID
,A.FECHA OLDDATE
,B.FECHA CORRECTDATE
FROM #TMPA A
LEFT OUTER JOIN #TMPB B ON 1=1
WHERE 1=1
AND B.FECHA = (
SELECT MAX(FECHA)
FROM #TMPB
WHERE FECHA <= A.FECHA)
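The same lookup can also be written with the correlated subquery directly in the SELECT list. The sketch below is an illustration rather than a tuned solution: it assumes SQLite through Python's sqlite3 module, permanent tables named tmpA and tmpB in place of the temp tables, and ISO date strings (which compare correctly as text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmpA (Id INT, Fecha TEXT);
CREATE TABLE tmpB (Id INT, Fecha TEXT);
INSERT INTO tmpA VALUES (100, '2017-01-01'), (101, '2017-10-03'),
                        (102, '2017-05-04'), (103, '2017-09-01');
INSERT INTO tmpB VALUES (1, '2017-01-01'), (5, '2017-10-01'), (9, '2017-05-04');
""")

rows = conn.execute("""
    SELECT A.Id,
           -- latest business day on or before the expiration date
           (SELECT MAX(B.Fecha) FROM tmpB AS B WHERE B.Fecha <= A.Fecha) AS Fecha
    FROM tmpA AS A
    ORDER BY A.Id
""").fetchall()

print(rows)
```

Written this way, a row in table A with no earlier business day is still returned (with NULL as the date), whereas the join with the WHERE filter above drops such rows. Which behavior is wanted depends on how the update should treat them.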

Is this what you want?
select a.id,
(case when b.fecha < a.fecha then b.fecha else a.fecha end) as fecha
from #tmpA a left join
#tmpB b
on a.id = b.id;

You can get the minimum by using UNION ALL:
select id, min(fecha) from (
select * from #tmpA
union all
select * from #tmpB
) a
group by a.id

@JotaPardo WHERE 1=1 is an always-true placeholder, so the WHERE clause always has at least one condition that holds. It does not change the result; it just lets every real condition that follows be written as an AND line, which makes conditions easy to add, remove, or comment out.

Grouping rows with a date range

I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
My rules are:
If the WH_OUT_DATETIME is on or within 24 hours of the WH_IN_DATETIME of another ADM_ID with the same WH_PID, the rows belong to the same group.
I would like another column added to the results which identify the grouped value if possible as EP_ID.
e.g.
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
------ ------ -------------- ---------------
1 9 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2014-11-20 00:00:00 2014-11-21 00:00:00
5 5 2014-10-17 00:00:00 2014-10-18 00:00:00
Would return rows with:
ADM_ID WH_PID EP_ID EP_IN_DATETIME EP_OUT_DATETIME WH_IN_DATETIME WH_OUT_DATETIME
------ ------ ----- ------------------- ------------------- ------------------- -------------------
1 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2 2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
5 5 1 2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit.
This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has a WH_IN_DATETIME which follows on the WH_OUT_DATETIME of another for the same WH_PID, then that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the WH_PIDs within that EP_ID.
I hope this makes some sense.
Thanks,
MR

Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature dealy, which is updating a variable at the same time you update a column. Breaking down the complexity of this operation, I create a scratch table to hold the piece that is the hardest to calculate: the EP_ID. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME and EP_OUT_DATETIME fields.
The steps are:
Create the scratch table
Seed the scratch table with all of the ADM_ID values -- this lets us do an UPDATE as all of the rows already exist.
Update the scratch table
Do the final, simple select joining the scratch table to the main table
The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
ADM_ID INT NOT NULL PRIMARY KEY,
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
ADM_ID INT NOT NULL PRIMARY KEY,
EP_ID INT NOT NULL
-- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
SELECT ADM_ID, 0
FROM #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
ADM_ID INT NOT NULL,
EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
FROM #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE @EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
SELECT TOP (100) PERCENT
t1.*,
t2.WH_OUT_DATETIME AS [PriorOut],
t2.ADM_ID AS [PriorID],
ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
AS [RowNum]
FROM #Table t1
LEFT JOIN #Table t2
ON t2.WH_PID = t1.WH_PID
AND t2.ADM_ID <> t1.ADM_ID
AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET @EP_ID = sc.EP_ID = CASE
WHEN cte.RowNum = 1 THEN 1
WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
ELSE @EP_ID
END
FROM #Scratch sc
INNER JOIN cte
ON cte.ADM_ID = sc.ADM_ID
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
tab.WH_PID,
sc.EP_ID,
MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_IN_DATETIME],
MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_OUT_DATETIME],
tab.WH_IN_DATETIME,
tab.WH_OUT_DATETIME
FROM #Table tab
INNER JOIN #Scratch sc
ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "@variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
PRO: this method is generally pretty fast
CON: "The order of the UPDATE is controlled by the order of the clustered index". This behavior might rule out using this method depending on circumstances. But in this particular case, if the WH_PID values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).
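For readers who are not tied to SQL Server 2008, the episode numbering can also be sketched without a scratch table by using LAG and a running SUM. Note that this is a different technique from the quirky update above, and LAG is not available in SQL Server 2008 (it arrived in SQL Server 2012). The sketch assumes SQLite through Python's sqlite3 module, a hypothetical table name Stays, and it compares each row only with the immediately preceding stay for the same WH_PID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Stays (ADM_ID INT, WH_PID INT,
                    WH_IN_DATETIME TEXT, WH_OUT_DATETIME TEXT);
INSERT INTO Stays VALUES
    (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00'),
    (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00'),
    (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00'),
    (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00'),
    (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
""")

episodes = conn.execute("""
    WITH flagged AS (
        -- A row continues the episode when it starts within 1 day of the
        -- previous row's WH_OUT_DATETIME; otherwise (or when there is no
        -- previous row) it starts a new episode.
        SELECT s.*,
               CASE WHEN julianday(WH_IN_DATETIME)
                         - julianday(LAG(WH_OUT_DATETIME) OVER (
                               PARTITION BY WH_PID ORDER BY WH_IN_DATETIME)) <= 1
                    THEN 0 ELSE 1 END AS starts_episode
        FROM Stays AS s
    ),
    numbered AS (
        -- Running total of episode starts gives the episode number.
        SELECT f.*,
               SUM(starts_episode) OVER (
                   PARTITION BY WH_PID ORDER BY WH_IN_DATETIME) AS EP_ID
        FROM flagged AS f
    )
    SELECT ADM_ID, WH_PID, EP_ID,
           MIN(WH_IN_DATETIME) OVER (PARTITION BY WH_PID, EP_ID) AS EP_IN_DATETIME,
           MAX(WH_OUT_DATETIME) OVER (PARTITION BY WH_PID, EP_ID) AS EP_OUT_DATETIME
    FROM numbered
    ORDER BY ADM_ID
""").fetchall()

for row in episodes:
    print(row)
```

On the sample rows this puts ADM_ID 1 to 3 in episode 1 for WH_PID 9, ADM_ID 4 in episode 2, and ADM_ID 5 in episode 1 for WH_PID 5. If stays for the same WH_PID can overlap or nest, comparing against the previous row alone may not be enough and a running maximum of WH_OUT_DATETIME would be needed instead.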

I would do this using exists in a correlated subquery:
select t.*,
(case when exists (select 1
from table t2
where t2.WH_P_ID = t.WH_P_ID and
t2.ADM_ID = t.ADM_ID and
t.WH_OUT_DATETIME between t2.WH_IN_DATETIME and dateadd(day, 1, t2.WH_OUT_DATETIME)
)
then 1 else 0
end) as TimeFrameFlag
from table t;

Try this query :
;WITH cte
AS (SELECT t1.ADM_ID AS EP_ID,*
FROM #yourtable t1
WHERE NOT EXISTS (SELECT 1
FROM #yourtable t2
WHERE t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
UNION ALL
SELECT t2.EP_ID,t1.ADM_ID,t1.WH_PID,t1.WH_IN_DATETIME,t1.WH_OUT_DATETIME
FROM #yourtable t1
JOIN cte t2
ON t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
cte_result
AS (SELECT t1.*,Dense_rank() OVER ( partition BY wh_pid ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
FROM #yourtable t1
LEFT OUTER JOIN (SELECT DISTINCT ADM_ID,
EP_ID
FROM cte) t2
ON t1.ADM_ID = t2.ADM_ID)
SELECT ADM_ID,WH_PID,EP_ID,Min(WH_IN_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_IN_DATETIME],Max(WH_OUT_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
WH_IN_DATETIME,
WH_OUT_DATETIME
FROM cte_result
ORDER BY ADM_ID
I assumed these things :
Those rows which follow your rule, are a group.
min(WH_IN_DATETIME) of the group will be shown in the EP_IN_DATETIME column for all rows belonging to that group. Similarly, max(WH_OUT_DATETIME) of the group will be shown in the EP_OUT_DATETIME column for all rows belonging to that group.
EP_ID will be assigned to groups of each WH_PID separately.
One thing your question does not explain is how EP_OUT_DATETIME and WH_IN_DATETIME of the 4th row become 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. I am assuming that this is a typo and that they should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000.
Explanation :
The first CTE, cte, returns the possible groups based on your rule. The second CTE, cte_result, assigns EP_ID to the groups. Finally, you can select min(WH_IN_DATETIME) and max(WH_OUT_DATETIME) in partitions of wh_pid, ep_id.
sqlfiddle

Here's yet another alternative... which may still miss your results.
I agree with @NoDisplayName that there appears to be an error in your ADM_ID 4 output, the 2 OUT dates should match - at least that seems logical to me. I can't understand why you would want an out date to ever be showing an in date value, but of course there could be a good reason. :)
Also, the wording of your question makes it sound like this is just a part of the problem and that you may then take this output further. I'm not sure what you are really aiming for, but I've broken the query below up into 2 CTEs and you may find your final information in the 2nd CTE (as it sounds like you want to group the data back together).
Here's the complete structure & query on SQL Fiddle
-- The Cross Join ensures we always have a pair of first and last time pairs
-- The left join matches all overlapping combinations,
-- allowing the where clause to restrict to just the first and last
-- These first/last pairs are then grouped in the first CTE
-- Data is restricted in the second CTE
-- The final select is then quite simple
With GroupedData AS (
SELECT
(Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME, t1.WH_PID
FROM yourtable t1
CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
LEFT OUTER JOIN yourtable t2
ON t1.WH_PID = t2.WH_PID
AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME AND [First] = 0)
OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME AND [First] = 1))
WHERE t2.WH_PID IS NULL
), RestrictedData AS (
SELECT WH_PID, MIN(WH_IN_DATETIME) AS WH_IN_DATETIME, MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
FROM GroupedData
GROUP BY Grp, WH_PID
)
SELECT yourtable.ADM_ID, yourtable.WH_PID, RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME, RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME, yourtable.WH_IN_DATETIME, yourtable.WH_OUT_DATETIME
FROM RestrictedData
INNER JOIN yourtable
ON RestrictedData.WH_PID = yourtable.WH_PID
AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME AND RestrictedData.WH_OUT_DATETIME
ORDER BY yourtable.ADM_ID

A Left Outer Join and DateDiff Function should help you to filter the records. Finally Use Window Function to create GroupID's
create table #test
(ADM_ID int,WH_PID int,WH_IN_DATETIME DATETIME,WH_OUT_DATETIME DATETIME)
INSERT #test
VALUES ( 1,9,'2014-10-12 00:00:00','2014-10-13 15:00:00'),
(2,9,'2014-10-14 14:00:00','2014-10-15 15:00:00'),
(3,9,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(1,10,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(2,10,'2014-10-18 14:00:00','2014-10-19 15:00:00')
SELECT Row_number()OVER(partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
a.WH_PID,
a.WH_IN_DATETIME,
b.WH_OUT_DATETIME
FROM #test a
LEFT JOIN #test b
ON a.WH_PID = b.WH_PID
AND a.ADM_ID <> b.ADM_ID
where Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME)BETWEEN 0 AND 24
OUTPUT :
Group_Id WH_PID WH_IN_DATETIME WH_OUT_DATETIME
-------- ------ ----------------------- -----------------------
1 9 2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
2 9 2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
1 10 2014-10-16 14:00:00.000 2014-10-19 15:00:00.000
