Grouping rows with a date range - sql

I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
My rules are:
If the WH_OUT_DATETIME of one row is on or within 24 hours of the WH_IN_DATETIME of another ADM_ID with the same WH_PID, the two rows belong to the same group.
I would like another column added to the results that identifies the group, if possible as EP_ID.
e.g.
ADM_ID WH_PID WH_IN_DATETIME      WH_OUT_DATETIME
------ ------ ------------------- -------------------
1      9      2014-10-12 00:00:00 2014-10-13 15:00:00
2      9      2014-10-14 14:00:00 2014-10-15 15:00:00
3      9      2014-10-16 14:00:00 2014-10-17 15:00:00
4      9      2014-11-20 00:00:00 2014-11-21 00:00:00
5      5      2014-10-17 00:00:00 2014-10-18 00:00:00
Would return rows with:
ADM_ID WH_PID EP_ID EP_IN_DATETIME      EP_OUT_DATETIME     WH_IN_DATETIME      WH_OUT_DATETIME
------ ------ ----- ------------------- ------------------- ------------------- -------------------
1      9      1     2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
2      9      1     2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
3      9      1     2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
4      9      2     2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
5      5      1     2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit.
This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has a WH_IN_DATETIME which follows on from the WH_OUT_DATETIME of another row for the same WH_PID, then that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the rows of that WH_PID within that EP_ID.
I hope this makes some sense.
Thanks,
MR

Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature dealy, which is updating a variable at the same time you update a column. Breaking down the complexity of this operation, I create a scratch table to hold the piece that is the hardest to calculate: the EP_ID. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME and EP_OUT_DATETIME fields.
The steps are:
Create the scratch table
Seed the scratch table with all of the ADM_ID values -- this lets us do an UPDATE as all of the rows already exist.
Update the scratch table
Do the final, simple select joining the scratch table to the main table
The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
ADM_ID INT NOT NULL PRIMARY KEY,
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
ADM_ID INT NOT NULL PRIMARY KEY,
EP_ID INT NOT NULL
-- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
SELECT ADM_ID, 0
FROM #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
ADM_ID INT NOT NULL,
EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
FROM #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE @EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
    SELECT TOP (100) PERCENT
           t1.*,
           t2.WH_OUT_DATETIME AS [PriorOut],
           t2.ADM_ID AS [PriorID],
           ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
               AS [RowNum]
    FROM #Table t1
    LEFT JOIN #Table t2
           ON t2.WH_PID = t1.WH_PID
          AND t2.ADM_ID <> t1.ADM_ID
          AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1) -- prior stay ended no more than 1 day earlier
          AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
    ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET @EP_ID = sc.EP_ID = CASE
                            WHEN cte.RowNum = 1 THEN 1                     -- first row of a WH_PID restarts the count
                            WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)  -- no prior row within 24 hours: new episode
                            ELSE @EP_ID                                    -- continues the current episode
                        END
FROM #Scratch sc
INNER JOIN cte
        ON cte.ADM_ID = sc.ADM_ID;
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
tab.WH_PID,
sc.EP_ID,
MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_IN_DATETIME],
MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_OUT_DATETIME],
tab.WH_IN_DATETIME,
tab.WH_OUT_DATETIME
FROM #Table tab
INNER JOIN #Scratch sc
ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "@variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
PRO: this method is generally pretty fast
CON: "The order of the UPDATE is controlled by the order of the clustered index". This behavior might rule out using this method depending on circumstances. But in this particular case, if the WH_PID values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).
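For comparison, the same EP_ID numbering can be sketched without a scratch table by flagging the start of each episode and taking a running sum of the flags. This is only an illustration under assumptions that are not part of the answer above: it needs SQL Server 2012 or later (LAG and ordered window aggregates do not exist in 2008), it looks only at the immediately preceding row per WH_PID, and it treats an overlapping prior stay as the same episode. The column name IsNewEpisode is made up for the example.

```sql
-- Sketch only: requires SQL Server 2012+ (not available on 2008).
-- Uses the #Table test setup from this answer.
;WITH flagged AS
(
    SELECT t.ADM_ID,
           t.WH_PID,
           t.WH_IN_DATETIME,
           t.WH_OUT_DATETIME,
           CASE
               WHEN LAG(t.WH_OUT_DATETIME)
                        OVER (PARTITION BY t.WH_PID ORDER BY t.WH_IN_DATETIME)
                    >= DATEADD(HOUR, -24, t.WH_IN_DATETIME)
               THEN 0   -- prior stay ended within 24 hours of this one starting: same episode
               ELSE 1   -- first row of the WH_PID (LAG is NULL) or a longer gap: new episode
           END AS IsNewEpisode   -- hypothetical helper column
    FROM #Table t
)
SELECT ADM_ID,
       WH_PID,
       SUM(IsNewEpisode) OVER (PARTITION BY WH_PID
                               ORDER BY WH_IN_DATETIME
                               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS EP_ID,
       WH_IN_DATETIME,
       WH_OUT_DATETIME
FROM flagged
ORDER BY ADM_ID;
```

With the five test rows this should number ADM_ID 1 to 3 as episode 1 and ADM_ID 4 as episode 2 for WH_PID 9, and ADM_ID 5 as episode 1 for WH_PID 5. EP_IN_DATETIME and EP_OUT_DATETIME can then be computed with the same MIN/MAX ... OVER (PARTITION BY WH_PID, EP_ID) expressions as in Step 3.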

I would do this using exists in a correlated subquery:
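-- yourtable stands in for the real table name
select t.*,
(case when exists (select 1
from yourtable t2
where t2.WH_PID = t.WH_PID and
t2.ADM_ID <> t.ADM_ID and
-- another row for the same WH_PID starts on or within 24 hours after this row ends
t2.WH_IN_DATETIME between t.WH_OUT_DATETIME and dateadd(day, 1, t.WH_OUT_DATETIME)
)
then 1 else 0
end) as TimeFrameFlag
from yourtable t;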

Try this query :
;WITH cte
AS (SELECT t1.ADM_ID AS EP_ID,*
FROM #yourtable t1
WHERE NOT EXISTS (SELECT 1
FROM #yourtable t2
WHERE t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
UNION ALL
SELECT t2.EP_ID,t1.ADM_ID,t1.WH_PID,t1.WH_IN_DATETIME,t1.WH_OUT_DATETIME
FROM #yourtable t1
JOIN cte t2
ON t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
cte_result
AS (SELECT t1.*,Dense_rank() OVER ( partition BY wh_pid ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
FROM #yourtable t1
LEFT OUTER JOIN (SELECT DISTINCT ADM_ID,
EP_ID
FROM cte) t2
ON t1.ADM_ID = t2.ADM_ID)
SELECT ADM_ID,WH_PID,EP_ID,Min(WH_IN_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_IN_DATETIME],Max(WH_OUT_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
WH_IN_DATETIME,
WH_OUT_DATETIME
FROM cte_result
ORDER BY ADM_ID
I assumed these things :
Those rows which follow your rule, are a group.
min(WH_IN_DATETIME) of the group will be shown in the EP_IN_DATETIME column for all rows belonging to that group. Similarly, max(WH_OUT_DATETIME) of the group will be shown in the EP_OUT_DATETIME column for all rows belonging to that group.
EP_ID will be assigned to groups of each WH_PID separately.
One thing your question does not explain is how EP_OUT_DATETIME and WH_IN_DATETIME of the 4th row become 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. I am assuming that this is a typo and that they should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000.
Explanation :
The first CTE, cte, returns the possible groups based on your rule. The second CTE, cte_result, assigns an EP_ID to each group. Finally, the outer query selects min(WH_IN_DATETIME) and max(WH_OUT_DATETIME) over partitions of wh_pid, ep_id.
sqlfiddle

Here's yet another alternative... which may miss your results still.
I agree with @NoDisplayName that there appears to be an error in your ADM_ID 4 output: the 2 OUT dates should match, at least that seems logical to me. I can't understand why you would want an out date to ever be showing an in date value, but of course there could be a good reason. :)
Also, the wording of your question makes it sound like this is just a part of the problem and that you may take this output further. I'm not sure what you are really aiming for, but I've broken the query below up into 2 CTEs and you may find your final information in the 2nd CTE (as it sounds like you want to group the data back together).
Here's the complete structure & query on SQL Fiddle
-- The Cross Join ensures we always have a pair of first and last time pairs
-- The left join matches all overlapping combinations,
-- allowing the where clause to restrict to just the first and last
-- These first/last pairs are then grouped in the first CTE
-- Data is restricted in the second CTE
-- The final select is then quite simple
With GroupedData AS (
SELECT
(Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME, t1.WH_PID
FROM yourtable t1
CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
LEFT OUTER JOIN yourtable t2
ON t1.WH_PID = t2.WH_PID
AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME AND [First] = 0)
OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME AND [First] = 1))
WHERE t2.WH_PID IS NULL
), RestrictedData AS (
SELECT WH_PID, MIN(WH_IN_DATETIME) AS WH_IN_DATETIME, MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
FROM GroupedData
GROUP BY Grp, WH_PID
)
SELECT yourtable.ADM_ID, yourtable.WH_PID, RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME, RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME, yourtable.WH_IN_DATETIME, yourtable.WH_OUT_DATETIME
FROM RestrictedData
INNER JOIN yourtable
ON RestrictedData.WH_PID = yourtable.WH_PID
AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME AND RestrictedData.WH_OUT_DATETIME
ORDER BY yourtable.ADM_ID

A LEFT OUTER JOIN and the DATEDIFF function should help you filter the records. Finally, use a window function to create the group IDs.
create table #test
(ADM_ID int,WH_PID int,WH_IN_DATETIME DATETIME,WH_OUT_DATETIME DATETIME)
INSERT #test
VALUES ( 1,9,'2014-10-12 00:00:00','2014-10-13 15:00:00'),
(2,9,'2014-10-14 14:00:00','2014-10-15 15:00:00'),
(3,9,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(1,10,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(2,10,'2014-10-18 14:00:00','2014-10-19 15:00:00')
SELECT Row_number()OVER(partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
a.WH_PID,
a.WH_IN_DATETIME,
b.WH_OUT_DATETIME
FROM #test a
LEFT JOIN #test b
ON a.WH_PID = b.WH_PID
AND a.ADM_ID <> b.ADM_ID
where Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME)BETWEEN 0 AND 24
OUTPUT :
Group_Id WH_PID WH_IN_DATETIME WH_OUT_DATETIME
-------- ------ ----------------------- -----------------------
1 9 2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
2 9 2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
1 10 2014-10-16 14:00:00.000 2014-10-19 15:00:00.000

Related

Select min date record from duplicates in table

Let say that I have this table "contract" which have duplicated records in the "END" column for the same ID.
ID  Begin       End
--  ----------  ----------
20  2016-01-01  9999-12-31
20  2020-01-01  9999-12-31
30  2018-01-01  2019-02-28
30  2019-03-01  9999-12-31
30  2020-02-01  9999-12-31
10  2019-01-01  2019-06-30
10  2019-07-01  2020-02-29
10  2020-03-01  9999-12-31
I want to get the oldest date in the "Begin" column for all the ID's that have duplicated records in the "END" column with the date "9999-12-31". So for this example I expect to get:
ID  Begin
--  ----------
20  2016-01-01
30  2019-03-01
I made an SQL script, but there should be a better way.
select ID, MIN([Begin]) as [Begin] from
(
select * from contract m where exists
(
select 1 from contract v where v.[End] = '9999-12-31' and v.ID = m.ID
having count(v.ID) = 2
)
and m.[End] = '9999-12-31'
)a
group by ID
If it is a big table, you really want to use EXISTS for finding duplicates because it will short circuit. Here are two ways to use EXISTS that might help with what you are trying to do.
DROP TABLE IF EXISTS #Test;
CREATE TABLE #Test
(
ID INT NOT NULL
,[Begin] DATE NOT NULL
,[End] DATE NOT NULL
)
;
INSERT INTO #Test
VALUES
(20,'2016-01-01','9999-12-31')
,(20,'2020-01-01','9999-12-31')
,(30,'2018-01-01','2019-02-28')
,(30,'2019-03-01','9999-12-31')
,(30,'2020-02-01','9999-12-31')
,(10,'2019-01-01','2019-06-30')
,(10,'2019-07-01','2020-02-29')
,(10,'2020-03-01','9999-12-31')
;
--See all duplicates with OldestBegin for context
SELECT
TST.ID
,TST.[Begin]
,TST.[End]
,OldestBegin = MIN([Begin]) OVER (PARTITION BY TST.ID,TST.[End])
FROM #Test AS TST
WHERE EXISTS
(
SELECT 1
FROM #Test AS _TST
WHERE TST.ID = _TST.ID
AND TST.[End] = _TST.[End]
AND TST.[Begin] <> _TST.[Begin]
)
;
--Get only oldest duplicate
SELECT
TST.ID
,TST.[End]
,[Begin] = MIN([Begin])
FROM #Test AS TST
WHERE EXISTS
(
SELECT 1
FROM #Test AS _TST
WHERE TST.ID = _TST.ID
AND TST.[End] = _TST.[End]
AND TST.[Begin] <> _TST.[Begin]
)
GROUP BY
TST.ID
,TST.[End]
;
Perhaps this will help:
DECLARE @Tab TABLE(ID INT,[Begin] DATE,[End] DATE)
INSERT @Tab
VALUES
(20,'2016-01-01','9999-12-31')
,(20,'2020-01-01','9999-12-31')
,(30,'2018-01-01','2019-02-28')
,(30,'2019-03-01','9999-12-31')
,(30,'2020-02-01','9999-12-31')
,(10,'2019-01-01','2019-06-30')
,(10,'2019-07-01','2020-02-29')
,(10,'2020-03-01','9999-12-31')
;WITH cte AS(
SELECT *
FROM @Tab
WHERE [End] = '9999-12-31'
)
SELECT ID, MIN([Begin]) AS [Begin]
FROM cte
GROUP BY ID
HAVING COUNT(*) > 1
Try this:
WITH test as (SELECT
count(*) as cnt, min([Begin]) as [Begin], ID from contract
where [End] = '9999-12-31' group by ID having count(*) > 1) select ID, [Begin] from test

Estimated number of rows is way off in execution plan

I have a situation where the estimated number of rows in the execution plan is way off
My columns in the join are varchar(50). I have tried different indexes but it does not reduce this problem. I have even tried with an index on the temp table. What else can I do?
PS this is the first place where the estimated number starts to drift... Also the tables are not big (48000 rows).
The code is:
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM TableA
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
I know it seems that this can be rewritten as:
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
But the results are not identical and I don't want to change the results since I am not sure if the creator of this code knew what they were doing.
Some statistics on the columns:
householdnumber is always equal to householdid. householdid is nvarchar(50) but householdnumber is varchar(40). The table has 48877 rows. Distinct combination of householdnumber, householdid, primaryCustomerID has 48029 rows. And distinct number of primaryCustomerID is 47152.
Regarding the code - it appears that the difference between the larger (original) version and your simpler GROUP BY version is that the original finds the minimum profilecreateddate for anyone in that household, whereas your simpler version finds the profilecreateddate for the specific primarycustomerid.
For example (using simpler data)
CREATE TABLE #TableA (householdnumber int, householdid int, primaryCustomerID int, ProfileCreatedDate datetime);
INSERT INTO #TableA (householdnumber, householdid, primaryCustomerID, ProfileCreatedDate) VALUES
(1, 1, 1, '20201001'),
(1, 1, 1, '20201002'),
(1, 1, 2, '20201003');
SELECT DISTINCT householdnumber, householdid, primaryCustomerID
INTO #Households
FROM #TableA;
SELECT
A.*,
MIN(B.[ProfileCreatedDate]) PROFILECREATEDDATE
INTO #Profile
from #Households AS a
LEFT JOIN #TableA AS B
ON A.[HouseholdNumber]= B.[HouseholdNumber] and A.[HouseholdId]=B.[HouseholdId]
GROUP BY a.householdnumber, a.householdid, a.primaryCustomerID;
SELECT * FROM #Profile;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
SELECT householdnumber, householdid, primaryCustomerID, MIN([ProfileCreatedDate]) AS PROFILECREATEDDATE
INTO #Profile2
from #TableA
GROUP BY householdnumber, householdid, primaryCustomerID;
SELECT * FROM #Profile2;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-03 00:00:00.000
*/
If you notice in the above, the PROFILECREATEDDATE for row 2 is different.
You could therefore try the following code that should give the same results as the original set - see how that goes for time (and confirm it matches the original results).
SELECT DISTINCT t1.householdnumber, t1.householdid, primaryCustomerID,
MIN([ProfileCreatedDate]) OVER (PARTITION BY t1.householdnumber, t1.householdid) AS PROFILECREATEDDATE
INTO #Profile3
FROM #TableA t1;
SELECT * FROM #Profile3;
/* -- Results
householdnumber householdid primaryCustomerID PROFILECREATEDDATE
1 1 1 2020-10-01 00:00:00.000
1 1 2 2020-10-01 00:00:00.000
*/
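To confirm that the rewrite matches the original results, one option is to compare the two temp tables in both directions with EXCEPT. This is a minimal sketch that assumes #Profile and #Profile3 were built as shown above and have the same four columns; both queries should return no rows when the results are identical.

```sql
-- Rows produced by the original code that the rewrite does not produce (expect none)
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile
EXCEPT
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile3;

-- Rows produced by the rewrite that the original code does not produce (expect none)
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile3
EXCEPT
SELECT householdnumber, householdid, primaryCustomerID, PROFILECREATEDDATE
FROM #Profile;
```

EXCEPT compares distinct rows, so it does not detect a difference in how many times the same row appears; comparing COUNT(*) on both tables covers that case.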

Exclude rows where dates exist in another table

I have 2 tables, one is working pattern, another is absences.
1) Work pattern
ID | Shift Start | Shift End
123| 01-03-2017 | 02-03-2017
2) Absences
ID| Absence Start | Absence End
123| 01-03-2017 | 04-03-2017
What would be the best way, when selecting rows from work pattern, to exclude any that have a date marked as an absence in the absence table?
For example, I have a report that uses the work pattern table to count how many days a week an employee has worked, however I don't want it to include the days that have been marked as an absence on the absence table, if that makes sense. I also don't want it to include any days that fall between the absence start and absence end date.
If the span of the absence should always encompass the shift to be excluded you can use not exists():
select *
from WorkPatterns w
where not exists (
select 1
from Absences a
where a.Id = w.Id
and a.AbsenceStart <= w.ShiftStart
and a.AbsenceEnd >= w.ShiftEnd
)
rextester demo: http://rextester.com/DCODC76816
returns:
+-----+------------+------------+
| id | ShiftStart | ShiftEnd |
+-----+------------+------------+
| 123 | 2017-02-27 | 2017-02-28 |
| 123 | 2017-03-05 | 2017-03-06 |
+-----+------------+------------+
given this test setup:
create table WorkPatterns ([id] int, [ShiftStart] datetime, [ShiftEnd] datetime) ;
insert into WorkPatterns ([id], [ShiftStart], [ShiftEnd]) values
(123, '20170227', '20170228')
,(123, '20170301', '20170302')
,(123, '20170303', '20170304')
,(123, '20170305', '20170306')
;
create table Absences ([id] int, [AbsenceStart] datetime, [AbsenceEnd] datetime) ;
insert into Absences ([id], [AbsenceStart], [AbsenceEnd]) values
(123, '20170301', '20170304');
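If a shift should also be excluded when an absence covers only part of it, the containment test can be swapped for an overlap test. This is a sketch against the same WorkPatterns and Absences test tables, assuming both date ranges are inclusive; whether partial overlap should exclude the whole shift is an assumption, not something the question states.

```sql
-- Exclude any shift that overlaps an absence at all, not only shifts
-- that lie completely inside one.
select w.*
from WorkPatterns w
where not exists (
    select 1
    from Absences a
    where a.Id = w.Id
      and a.AbsenceStart <= w.ShiftEnd    -- absence starts before the shift ends
      and a.AbsenceEnd   >= w.ShiftStart  -- and ends after the shift starts
    );
```

With the test data above this returns the same two shifts (2017-02-27 and 2017-03-05), because neither of the remaining shifts only partly overlaps the absence.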
What would be the best way, when selecting rows from work pattern
If you are dealing only with dates (no time) and have control over the db schema, one approach is to create a calendar table that holds every date from when the company started to some years in the future.
Fill that table once.
After that it is easy to join other tables with dates to it and do the math.
If you have trouble constructing the T-SQL query, please edit the question with more details about the columns and values of the tables, their relations, and the needed results.
How about this:
SELECT WP_START.[id], WP_START.[shift_start], WP_START.[shift_end]
FROM work_pattern AS WP_START
INNER JOIN absences AS A ON WP_START.id = A.id
WHERE WP_START.[shift_start] NOT BETWEEN A.[absence_start] AND A.[absence_end]
UNION
SELECT WP_END.[id], WP_END.[shift_start], WP_END.[shift_end]
FROM work_pattern AS WP_END
INNER JOIN absences AS A ON WP_END.id = A.id
WHERE WP_END.[shift_end] NOT BETWEEN A.[absence_start] AND A.[absence_end]
See it on SQL Fiddle: http://sqlfiddle.com/#!6/49ae6/6
Here is my example that includes a Date Dimension table. If your DBAs won't add it, you can create #dateDim as a temp table, like I've done with SQLFiddle (didn't know I could do that). A typical date dimension would have a lot more details you need about the days, but if the table can't be added, just use what you need. You'll have to populate the other Holidays you need. The DateDim I use often is at https://github.com/shawnoden/SQL_Stuff/blob/master/sql_CreateDateDimension.sql
SQL Fiddle
MS SQL Server 2014 Schema Setup:
/* Tables for your test data. */
CREATE TABLE WorkPatterns ( id int, ShiftStart date, ShiftEnd date ) ;
INSERT INTO WorkPatterns ( id, ShiftStart, ShiftEnd )
VALUES
(123, '20170101', '20171031')
, (124, '20170601', '20170831')
;
CREATE TABLE Absences ( id int, AbsenceStart date, AbsenceEnd date ) ;
INSERT INTO Absences ( id, AbsenceStart, AbsenceEnd )
VALUES
( 123, '20170123', '20170127' )
, ( 123, '20170710', '20170831' )
, ( 124, '20170801', '20170820' )
;
/* ******** MAKE SIMPLE CALENDAR TABLE ******** */
CREATE TABLE dateDim (
theDate DATE NOT NULL
, IsWeekend BIT DEFAULT 0
, IsHoliday BIT DEFAULT 0
, IsWorkDay BIT DEFAULT 0
);
/* Populate basic details of dates. */
INSERT dateDim(theDate, IsWeekend, IsHoliday)
SELECT d
, CONVERT(BIT, CASE WHEN DATEPART(dw,d) IN (1,7) THEN 1 ELSE 0 END)
, CONVERT(BIT, CASE WHEN d = '20170704' THEN 1 ELSE 0 END) /* 4th of July. */
FROM (
SELECT d = DATEADD(DAY, rn - 1, '20170101')
FROM
(
SELECT TOP (DATEDIFF(DAY, '20170101', '20171231') + 1) /* + 1 so that 20171231 itself is included */
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y ;
/* If not a weekend or holiday, it's a WorkDay. */
UPDATE dateDim
SET IsWorkDay = CASE WHEN IsWeekend = 0 AND IsHoliday = 0 THEN 1 ELSE 0 END
;
Query For Calculation:
SELECT wp.ID, COUNT(d.theDate) AS workDayCount
FROM WorkPatterns wp
INNER JOIN dateDim d ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
AND d.IsWorkDay = 1
LEFT OUTER JOIN Absences a ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
AND wp.ID = a.ID
WHERE a.ID IS NULL
GROUP BY wp.ID
ORDER BY wp.ID
Results:
| ID | workDayCount |
|-----|--------------|
| 123 | 172 | << 216 total days, 44 non-working
| 124 | 51 | << 65 total days, 14 non-working

How to compare date fields between two tables and get the less or equal date from the second table

I have two tables, Table A and Table B. Both of them have date fields. I need to compare those fields and build a Table C that holds, for each row of Table A, the latest date in Table B that is less than or equal to the Table A date. Table A is the main table.
CONTEXT: I have in Table A Expiration of products, and in table B on business days. The user can update table B when it is determined
that a date is not to be considered as a "business day". Then delete
the date from table B and then go to table A to update all product
expirations that were registered with that date and assign them the
previous business day. So in my case I am creating table C, which
contains the Id of table A and the working date less or equal to the
date mentioned. Then I will make the respective update.
IF OBJECT_ID('tempdb..#tmpA') IS NOT NULL DROP TABLE #tmpA
IF OBJECT_ID('tempdb..#tmpB') IS NOT NULL DROP TABLE #tmpB
CREATE TABLE #tmpA(Id INT IDENTITY(100,1),Fecha date)
INSERT INTO #tmpA(Fecha)
VALUES
('20170101'),('20171003'),('20170504'),('2017-09-01')
SELECT * FROM #tmpA
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-03
102 2017-05-04
103 2017-09-01
CREATE TABLE #tmpB(Id INT IDENTITY(1,4),Fecha date)
INSERT INTO #tmpB(Fecha)
VALUES
('20170101'),('20171001'),('20170504')
SELECT * FROM #tmpB
Id Fecha
----------- ----------
1 2017-01-01
5 2017-10-01
9 2017-05-04
I want to get this result (The same number of records in table A):
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-01 --> **this row is less than 2017-10-03**
102 2017-05-04
103 2017-05-04 --> **this row is less than 2017-09-01**
I tried to build some queries, without success:
IF OBJECT_ID('tempdb..#tmpC') IS NOT NULL DROP TABLE #tmpC
SELECT A.* INTO #tmpC FROM #tmpA A LEFT JOIN #tmpB B ON A.Fecha = B.Fecha WHERE B.Fecha IS NULL
SELECT * FROM #tmpC
SELECT *
FROM #tmpA A INNER JOIN
(
SELECT *
FROM #tmpC
GROUP BY id, Fecha
) AS Q ON MAX(Q.Fecha) <= A.Fecha
UPDATE:
NOTE. The Id column is simply an identity, but it does not mean that it should be related. The important thing is the dates.
Regards
While I'm not sure if this will scale well (if you have more than 100k rows) this will bring back the results which you want.
Theoretically, the correct way for you to do this, in a fashion which will scale well, would be to have a view where you utilize RANK() and join both of these tables together, though this was the quick and easy way. Please try this and let me know if it meets your requirements.
For your edification, I have left both of the dates in there for you to be able to compare them.
SELECT
A.ID
,A.FECHA OLDDATE
,B.FECHA CORRECTDATE
FROM #TMPA A
LEFT OUTER JOIN #TMPB B ON 1=1
WHERE 1=1
AND B.FECHA = (
SELECT MAX(FECHA)
FROM #TMPB
WHERE FECHA <= A.FECHA)
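The same lookup can also be sketched with OUTER APPLY, which runs the "latest date on or before" subquery once per row of #tmpA. This is an alternative under the assumption that every row of #tmpA should be kept; the alias names B2, OldDate and CorrectDate are only illustrative.

```sql
SELECT A.Id,
       A.Fecha AS OldDate,
       B.Fecha AS CorrectDate
FROM #tmpA AS A
OUTER APPLY
(
    -- latest business day on or before this row's date
    SELECT MAX(B2.Fecha) AS Fecha
    FROM #tmpB AS B2
    WHERE B2.Fecha <= A.Fecha
) AS B
ORDER BY A.Id;
```

Unlike the WHERE B.FECHA = (...) filter, this keeps a row of #tmpA even when #tmpB has no date on or before it; CorrectDate is then NULL.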
Is this what you want?
select a.id,
(case when b.fecha < a.fecha then b.fecha else a.fecha end) as fecha
from #tmpA a left join
#tmpB b
on a.id = b.id;
You can get the minimum with UNION ALL:
select id, min(fecha) from (
select * from #tmpA
union all
select * from #tmpB
) a
group by a.id
@JotaPardo WHERE 1=1 is always true, so it filters nothing. It is there so that every real condition can start with AND, which makes conditions easy to add, remove, or comment out without touching the WHERE keyword.

How to add missing dates to the result table?

I have a table:
accdate (DATETIME) | value (INT)
-------------------+------------
|
The accdate-column contains datasets on hour-granularity. That means, there are datetimes in the format YYYY-mm-dd HH:00:00. If I view the table using SELECT * FROM mytable ORDER BY accdate ASC I get an ordered table by accdate. But mytable does not contain all possible dates and hours between the first row and the last (some dates are missing in the times my program is not running). I want to have default-values for all possible date+hour-combinations between the first and the last row.
I know this can be solved by using a LEFT JOIN with another table that contains all possible dates in this range. But how do I construct such a table in an SQL statement? I don't think it is sensible to populate a table with dummy data if I can solve the problem in the query.
Example:
accdate (DATETIME) | value (INT)
---------------------+------------
2011-11-11 19:00:00 | 50
2011-11-11 20:00:00 | 53
2011-11-11 22:00:00 | 16
2011-11-12 06:00:00 | 15
2011-11-12 07:00:00 | 150
The date 2011-11-11 21:00:00 and the range between 23:00 and 05:00 are missing. For these dates there should be a row in the result table (containing a 0 in the value column).
I hope you understand my problem. If something is unclear, please comment. Thank you.
With SQLite 3.8.3 or later, you can use a common table expression to generate values out of nothing:
WITH RECURSIVE AllDates(accdate)
AS (VALUES('2011-11-11 00:00:00')
UNION ALL
SELECT datetime(accdate, '+1 hour')
FROM AllDates
WHERE accdate < '2011-11-12 10:00:00')
SELECT AllDates.accdate,
MyTable.value
FROM AllDates
LEFT JOIN MyTable USING (accdate)
The only way I can think of is to use a left join with the same table, adding to the desired field on the join and union the result with the actual results to complete the set:
Example setup:
CREATE TABLE tmp (
id INT IDENTITY,
number INT
);
-- insert some incomplete sequenced values
INSERT INTO tmp (number) VALUES(1);
INSERT INTO tmp (number) VALUES(3);
INSERT INTO tmp (number) VALUES(4);
Example query:
-- select your actual data
SELECT number
FROM tmp
UNION
-- select the missing data
SELECT a.number + 1
FROM tmp a
LEFT JOIN tmp b ON a.number + 1 = b.number
WHERE b.id IS NULL
-- order the complete set
ORDER BY number ASC;
This will not work if more than one value is missing between two results (e.g. between 1 and 4), but if your data only ever misses a single hour between results, this works like a charm.
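For gaps of any length, the missing hours can be generated with a recursive CTE and then left-joined to the data. This is a sketch in T-SQL, which is an assumption because the question does not name the database; it uses the table and column names from the question (mytable, accdate, value) and made-up variable names @first and @last.

```sql
DECLARE @first DATETIME, @last DATETIME;

-- Range to fill: first and last hour present in the table
SELECT @first = MIN(accdate), @last = MAX(accdate)
FROM mytable;

WITH AllHours AS
(
    SELECT @first AS accdate
    UNION ALL
    SELECT DATEADD(HOUR, 1, accdate)   -- step one hour at a time
    FROM AllHours
    WHERE accdate < @last
)
SELECT h.accdate,
       COALESCE(m.[value], 0) AS [value]   -- 0 for hours that have no row
FROM AllHours AS h
LEFT JOIN mytable AS m
       ON m.accdate = h.accdate
ORDER BY h.accdate
OPTION (MAXRECURSION 0);   -- lift the default limit of 100 recursion levels
```

MAXRECURSION 0 removes the safety limit entirely, so for very long ranges a permanent calendar or numbers table is probably the safer choice.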