SQL list account managers with predecessor - sql

I have the following problem that I can't seem to figure out. I'm trying to get a list of the account managers on an account, with their start/end dates and the new account manager who took over the account, on a single row.
Example:
DECLARE @accountManagerListing TABLE
(
accountNumber INT,
accountManager VARCHAR(8),
accountManagerStartDate DATE,
accountManagerEndDate DATE
)
INSERT INTO @accountManagerListing (accountNumber, accountManager, accountManagerStartDate, accountManagerEndDate)
VALUES (1, 'asmith', '01/01/2001', '01/31/2001'),
(1, 'bsmith', '02/01/2001', '03/01/2002'),
(1, 'csmith', '03/02/2002', '03/10/2002'),
(1, 'dsmith', '03/11/2002', '06/01/2017'),
(1, 'esmith', '06/02/2017', '08/17/2018'),
(2, 'fsmith', '02/11/2018', '06/01/2018'),
(2, 'gsmith', '06/02/2018', null)
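Expected results:
| Account Number | Old Account Manager | New Account Manager | Start Date | End Date |
|----------------|---------------------|---------------------|------------|------------|
| 1 | | asmith | 01/01/2001 | 01/31/2001 |
| 1 | asmith | bsmith | 02/01/2001 | 03/01/2002 |
| 1 | bsmith | csmith | 03/02/2002 | 03/10/2002 |
| 1 | csmith | dsmith | 03/11/2002 | 06/01/2017 |
| 1 | dsmith | esmith | 06/02/2017 | 08/17/2018 |
| 2 | | fsmith | 02/11/2018 | 06/01/2018 |
| 2 | fsmith | gsmith | 06/02/2018 | NULL |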

Use LAG():
select a.*,
lag(accountManager) over (partition by accountNumber order by accountManagerStartDate) as OldAccountManager
from @accountManagerListing a;
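To see what LAG() returns for the sample data, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's built-in sqlite3 module. Assumptions: SQLite 3.25+ (needed for window functions), a regular table instead of a table variable, and dates stored as ISO-8601 text so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE accountManagerListing (
           accountNumber INTEGER,
           accountManager TEXT,
           accountManagerStartDate TEXT,  -- ISO-8601 text sorts chronologically
           accountManagerEndDate TEXT)"""
)
conn.executemany(
    "INSERT INTO accountManagerListing VALUES (?, ?, ?, ?)",
    [
        (1, "asmith", "2001-01-01", "2001-01-31"),
        (1, "bsmith", "2001-02-01", "2002-03-01"),
        (1, "csmith", "2002-03-02", "2002-03-10"),
        (1, "dsmith", "2002-03-11", "2017-06-01"),
        (1, "esmith", "2017-06-02", "2018-08-17"),
        (2, "fsmith", "2018-02-11", "2018-06-01"),
        (2, "gsmith", "2018-06-02", None),
    ],
)

# LAG() looks one row back within each account, ordered by start date;
# the first manager of each account has no predecessor, so it gets NULL.
rows = conn.execute(
    """SELECT accountNumber,
              LAG(accountManager) OVER (PARTITION BY accountNumber
                                        ORDER BY accountManagerStartDate) AS oldAccountManager,
              accountManager AS newAccountManager,
              accountManagerStartDate,
              accountManagerEndDate
       FROM accountManagerListing
       ORDER BY accountNumber, accountManagerStartDate"""
).fetchall()

for row in rows:
    print(row)
```

Each row pairs a manager with whoever held the account immediately before them by start date, regardless of any gap between the two assignments.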

You can use a left join:
select aml.*, amlprev.accountManager as old_accountmanager
from @accountManagerListing aml left join
@accountManagerListing amlprev
on amlprev.accountNumber = aml.accountNumber and
amlprev.accountManagerEndDate = dateadd(day, -1, aml.accountManagerStartDate);
This finds the immediately preceding manager (if any). If there is a gap, then this returns no manager. This logic seems more aligned with your description of "took over".
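The difference from LAG() shows up when there is a gap between assignments. The sketch below runs the self-join in SQLite through Python's sqlite3, writing dateadd(day, -1, ...) as SQLite's date(..., '-1 day'). Account 3 and its managers hsmith and ismith are hypothetical rows added only to create a gap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE aml (accountNumber INTEGER, accountManager TEXT,
                         accountManagerStartDate TEXT, accountManagerEndDate TEXT)"""
)
conn.executemany(
    "INSERT INTO aml VALUES (?, ?, ?, ?)",
    [
        (1, "asmith", "2001-01-01", "2001-01-31"),
        (1, "bsmith", "2001-02-01", "2002-03-01"),
        # Hypothetical account with a one-month gap between managers.
        (3, "hsmith", "2019-01-01", "2019-01-31"),
        (3, "ismith", "2019-03-01", None),
    ],
)

# A predecessor matches only if its end date is exactly the day before the
# current manager's start date; otherwise the LEFT JOIN yields NULL.
rows = conn.execute(
    """SELECT cur.accountNumber, prev.accountManager, cur.accountManager
       FROM aml AS cur
       LEFT JOIN aml AS prev
         ON prev.accountNumber = cur.accountNumber
        AND prev.accountManagerEndDate = date(cur.accountManagerStartDate, '-1 day')
       ORDER BY cur.accountNumber, cur.accountManagerStartDate"""
).fetchall()

print(rows)
```

LAG() would report hsmith as ismith's predecessor; the join does not, because hsmith's assignment ended a month before ismith's began.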

Related

SQL window function to remove multiple values with different criteria

I have a data set where I'm trying to remove records with the following conditions:
If a practid has multiple records with the same date and at least one of them has a reason of "L&B", then I want all of that practid's records for that date to be removed.
DECLARE @t TABLE (practid int, statusdate date, reason varchar(100));
INSERT INTO @t VALUES (1, '2018-03-01', 'L&B'),
(1, '2018-03-01', NULL),
(1, '2018-04-01', 'R&D'),
(2, '2018-05-01', 'R&D'),
(2, '2018-05-01', 'R&D'),
(2, '2018-03-15', NULL),
(2, '2018-03-15', 'R&D'),
(3, '2018-07-01', 'L&B');
With this data set I would want the following result:
PractId StatusDate Reason
1 2018-04-01 R&D
2 2018-05-01 R&D
2 2018-05-01 R&D
2 2018-03-15 NULL
2 2018-03-15 R&D
I tried solving this with a window function but am getting stuck:
SELECT *, ROW_NUMBER() OVER
(PARTITION BY practid, statusdate, CASE WHEN reason = 'L&B' THEN 0 ELSE 1 END ORDER BY (SELECT NULL)) AS rn
FROM @t
With my query, I can't figure out how to keep practid = 2, since I want to keep all of its records.
To continue along your current approach, we can use COUNT as an analytic function. We can count the occurrences of the L&B reason over each practid/statusdate window, and then retain only groups where this reason never occurs.
SELECT practid, statusdate, reason
FROM
(
SELECT *,
COUNT(CASE WHEN reason = 'L&B' THEN 1 END) OVER
(PARTITION BY practid, statusdate) cnt
FROM yourTable
) t
WHERE cnt = 0;
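A minimal sketch of the COUNT ... OVER approach on the question's sample data, run in SQLite through Python's sqlite3. Assumptions: SQLite 3.25+, the table name t from the question, and the string 'NULL' in the question treated as a real NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (practid INTEGER, statusdate TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2018-03-01", "L&B"),
        (1, "2018-03-01", None),
        (1, "2018-04-01", "R&D"),
        (2, "2018-05-01", "R&D"),
        (2, "2018-05-01", "R&D"),
        (2, "2018-03-15", None),
        (2, "2018-03-15", "R&D"),
        (3, "2018-07-01", "L&B"),
    ],
)

# cnt is the number of L&B rows in the row's (practid, statusdate) group;
# CASE yields NULL for every other reason, and COUNT ignores NULLs.
rows = conn.execute(
    """SELECT practid, statusdate, reason
       FROM (SELECT *,
                    COUNT(CASE WHEN reason = 'L&B' THEN 1 END)
                        OVER (PARTITION BY practid, statusdate) AS cnt
             FROM t) AS x
       WHERE cnt = 0
       ORDER BY practid, statusdate, reason"""
).fetchall()

for row in rows:
    print(row)
```

The five surviving rows are exactly the expected result from the question; practid 3 disappears because its only row is an L&B row.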
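You can use NOT EXISTS with a correlated subquery. Correlate on both practid and statusdate, so that an L&B record only removes rows for the same practid on the same date:
Select *
from @t t1
where not exists (
select 1
from @t tt
where tt.reason = 'L&B' and tt.practid = t1.practid and tt.statusdate = t1.statusdate
)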
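Whether the NOT EXISTS subquery is correlated on practid as well as statusdate matters when two practitioners share a date. The sketch below compares both variants in SQLite through Python's sqlite3, on a trimmed copy of the question's data; the practid 4 row is hypothetical, added to share 2018-03-01 with practid 1's L&B record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (practid INTEGER, statusdate TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "2018-03-01", "L&B"),
        (1, "2018-03-01", None),
        (1, "2018-04-01", "R&D"),
        (2, "2018-05-01", "R&D"),
        (2, "2018-03-15", None),
        (3, "2018-07-01", "L&B"),
        (4, "2018-03-01", "R&D"),  # hypothetical: same date as practid 1's L&B
    ],
)

QUERY = """SELECT practid, statusdate, reason
           FROM t AS t1
           WHERE NOT EXISTS (SELECT 1 FROM t AS tt
                             WHERE tt.reason = 'L&B'
                               AND tt.statusdate = t1.statusdate
                               {extra})
           ORDER BY practid, statusdate"""

# Correlated on both columns: only the same practitioner's L&B removes a row.
by_practid_and_date = conn.execute(
    QUERY.format(extra="AND tt.practid = t1.practid")).fetchall()
# Correlated on the date only: practid 4 is wrongly removed as well.
by_date_only = conn.execute(QUERY.format(extra="")).fetchall()

print(by_practid_and_date)
print(by_date_only)
```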

Parent count based on pairing of multiple children

In the below example, I'm trying to count the number of drinks I can make based on the availability of ingredients per bar location that I have.
To clarify further: based on the figures highlighted in the chart below, I know that I can only make 1 Margarita on 6/30/2018 (in either DC or FL, if I ship the supplies to that location).
Sample of data table
Please use the below code to enter the relevant data above:
CREATE TABLE #drinks
(
a_date DATE,
loc NVARCHAR(2),
parent NVARCHAR(20),
line_num INT,
child NVARCHAR(20),
avail_amt INT
);
INSERT INTO #drinks VALUES ('6/26/2018','CA','Long Island','1','Vodka','7');
INSERT INTO #drinks VALUES ('6/27/2018','CA','Long Island','2','Gin','5');
INSERT INTO #drinks VALUES ('6/28/2018','CA','Long Island','3','Rum','26');
INSERT INTO #drinks VALUES ('6/26/2018','DC','Long Island','1','Vodka','15');
INSERT INTO #drinks VALUES ('6/27/2018','DC','Long Island','2','Gin','18');
INSERT INTO #drinks VALUES ('6/28/2018','DC','Long Island','3','Rum','5');
INSERT INTO #drinks VALUES ('6/26/2018','FL','Long Island','1','Vodka','34');
INSERT INTO #drinks VALUES ('6/27/2018','FL','Long Island','2','Gin','14');
INSERT INTO #drinks VALUES ('6/28/2018','FL','Long Island','3','Rum','4');
INSERT INTO #drinks VALUES ('6/30/2018','DC','Margarita','1','Tequila','6');
INSERT INTO #drinks VALUES ('7/1/2018','DC','Margarita','2','Triple Sec','3');
INSERT INTO #drinks VALUES ('6/29/2018','FL','Margarita','1','Tequila','1');
INSERT INTO #drinks VALUES ('6/30/2018','FL','Margarita','2','Triple Sec','0');
INSERT INTO #drinks VALUES ('7/2/2018','CA','Cuba Libre','1','Rum','1');
INSERT INTO #drinks VALUES ('7/8/2018','CA','Cuba Libre','2','Coke','5');
INSERT INTO #drinks VALUES ('7/13/2018','CA','Cuba Libre','3','Lime','14');
INSERT INTO #drinks VALUES ('7/5/2018','DC','Cuba Libre','1','Rum','0');
INSERT INTO #drinks VALUES ('7/19/2018','DC','Cuba Libre','2','Coke','12');
INSERT INTO #drinks VALUES ('7/31/2018','DC','Cuba Libre','3','Lime','9');
INSERT INTO #drinks VALUES ('7/2/2018','FL','Cuba Libre','1','Rum','1');
INSERT INTO #drinks VALUES ('7/19/2018','FL','Cuba Libre','2','Coke','3');
INSERT INTO #drinks VALUES ('7/17/2018','FL','Cuba Libre','3','Lime','2');
INSERT INTO #drinks VALUES ('6/30/2018','DC','Long Island','3','Rum','4');
INSERT INTO #drinks VALUES ('7/7/2018','FL','Cosmopolitan','5','Triple Sec','7');
The expected results are as follows:
Please note, as seen in the expected results, children are interchangeable. For example, on 7/7/2018 Triple Sec arrived for the drink Cosmopolitan; however, because that child is also a Margarita ingredient, it changes the availability of Margaritas for FL.
Also note the update to the DC region for Cuba Libres on both 06/30 and 06/31.
Please take into consideration that parts are interchangeable, and also that each time a new item arrives, it may make available a drink that could not be made before.
Lastly, it would be awesome if I could add another column that shows kit availability regardless of location, based only on the availability of the children. For example, if there is a child #3 in DC and none in FL, then FL can assume that it has enough inventory to make the drink based on the inventory in another location!
I've created a couple of extra tables to help with writing the query, but these could be generated from the #drinks table if you wanted:
CREATE TABLE #recipes
(
parent NVARCHAR(20),
child NVARCHAR(20)
);
INSERT INTO #recipes VALUES ('Long Island', 'Vodka');
INSERT INTO #recipes VALUES ('Long Island', 'Gin');
INSERT INTO #recipes VALUES ('Long Island', 'Rum');
INSERT INTO #recipes VALUES ('Maragrita', 'Tequila');
INSERT INTO #recipes VALUES ('Maragrita', 'Triple Sec');
INSERT INTO #recipes VALUES ('Cuba Libre', 'Coke');
INSERT INTO #recipes VALUES ('Cuba Libre', 'Rum');
INSERT INTO #recipes VALUES ('Cuba Libre', 'Lime');
INSERT INTO #recipes VALUES ('Cosmopolitan', 'Cranberry Juice');
INSERT INTO #recipes VALUES ('Cosmopolitan', 'Triple Sec');
CREATE TABLE #locations
(
loc NVARCHAR(20)
);
INSERT INTO #locations VALUES ('CA');
INSERT INTO #locations VALUES ('FL');
INSERT INTO #locations VALUES ('DC');
The query then becomes:
DECLARE @StartDateTime DATETIME
DECLARE @EndDateTime DATETIME
SET @StartDateTime = '2018-06-26'
SET @EndDateTime = '2018-07-31';
--First, build a range of dates that the report has to run for
WITH DateRange(a_date) AS
(
SELECT @StartDateTime AS a_date
UNION ALL
SELECT DATEADD(d, 1, a_date)
FROM DateRange
WHERE a_date < @EndDateTime
)
SELECT a_date, parent, loc, avail_amt
FROM (--available_recipes_inventory
SELECT a_date, parent, loc, avail_amt,
LAG(avail_amt, 1, 0) OVER (PARTITION BY loc, parent ORDER BY a_date) AS previous_avail_amt
FROM (--recipes_inventory
SELECT a_date, parent, loc,
--The least amount of the ingredients for a recipe is the most
--amount of drinks we can make for it
MIN(avail_amt) as avail_amt
FROM (--ingredients_inventory
SELECT dr.a_date, r.parent, r.child, l.loc,
--Default ingredients we don't have with a zero amount
ISNULL(d.avail_amt, 0) as avail_amt
FROM DateRange dr CROSS JOIN
#recipes r CROSS JOIN
#locations l OUTER APPLY
(
--Find the total amount available for each
--ingredient at each location for each date
SELECT SUM(d1.avail_amt) as avail_amt
FROM #drinks d1
WHERE d1.a_date <= dr.a_date
AND d1.loc = l.loc
AND d1.child = r.child
) d
) AS ingredients_inventory
GROUP BY a_date, parent, loc
) AS recipes_inventory
--Remove all recipes that we don't have enough ingredients for
WHERE avail_amt > 0
) AS available_recipes_inventory
--Selects the first time a recipe has enough ingredients to be made
WHERE previous_avail_amt = 0
--Selects when the amount of ingredients has changed
OR previous_avail_amt != avail_amt
ORDER BY a_date
--MAXRECURSION needed to generate the date range
OPTION (MAXRECURSION 0)
GO
The innermost SELECT creates a pseudo inventory table (ingredients_inventory) consisting of location, ingredient, date and amount available. When an ingredient is not available at a location for a particular date, then a zero is used.
The next SELECT out finds how many of each recipe can be made for each location/date (again, this may be zero).
The next SELECT out is an intermediate table that gathers how many of each recipe could be made at each location on the previous day (while also removing any drinks that could not be made).
Finally, the outermost SELECT uses the previous day's data to find when the quantity of each recipe that can be made has changed.
This query produces slightly different numbers from your table, but I think that may be because yours is wrong. Taking Florida as an example, an extra Rum comes in on 2 July, so the number of Long Islands that can be made goes up to 5, and 2 Cuba Libres can be made by the 19th.
Results:
+------------+-------------+-----+-----------+
| a_date | parent | loc | avail_amt |
+------------+-------------+-----+-----------+
| 2018-06-28 | Long Island | DC | 5 |
| 2018-06-28 | Long Island | CA | 5 |
| 2018-06-28 | Long Island | FL | 4 |
| 2018-06-30 | Long Island | DC | 9 |
| 2018-07-01 | Maragrita | DC | 3 |
| 2018-07-02 | Long Island | FL | 5 |
| 2018-07-07 | Maragrita | FL | 1 |
| 2018-07-13 | Cuba Libre | CA | 5 |
| 2018-07-19 | Cuba Libre | FL | 2 |
| 2018-07-31 | Cuba Libre | DC | 9 |
+------------+-------------+-----+-----------+
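The core idea of this query (a running total of each ingredient up to each date, then the MIN across a recipe's ingredients) can be checked on a small slice of the data. This sketch uses SQLite through Python's sqlite3, fixes the location to DC instead of cross joining #locations, and builds the date range with a recursive CTE; the drinks_possible alias is made up for the illustration, and this is not a drop-in replacement for the T-SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE drinks (a_date TEXT, loc TEXT, parent TEXT, child TEXT, avail_amt INTEGER);
    CREATE TABLE recipes (parent TEXT, child TEXT);
    INSERT INTO drinks VALUES
        ('2018-06-26', 'DC', 'Long Island', 'Vodka', 15),
        ('2018-06-27', 'DC', 'Long Island', 'Gin', 18),
        ('2018-06-28', 'DC', 'Long Island', 'Rum', 5),
        ('2018-06-30', 'DC', 'Long Island', 'Rum', 4);
    INSERT INTO recipes VALUES
        ('Long Island', 'Vodka'), ('Long Island', 'Gin'), ('Long Island', 'Rum');
    """
)

rows = conn.execute(
    """WITH RECURSIVE date_range(a_date) AS (
           SELECT '2018-06-26'
           UNION ALL
           SELECT date(a_date, '+1 day') FROM date_range WHERE a_date < '2018-06-30'
       ),
       ingredients_inventory AS (
           -- running total of each ingredient per date; 0 if none has arrived yet
           SELECT dr.a_date, r.parent, r.child,
                  COALESCE((SELECT SUM(d.avail_amt) FROM drinks AS d
                            WHERE d.a_date <= dr.a_date AND d.loc = 'DC'
                              AND d.child = r.child), 0) AS avail_amt
           FROM date_range AS dr CROSS JOIN recipes AS r
       )
       -- the scarcest ingredient limits how many drinks can be made
       SELECT a_date, parent, MIN(avail_amt) AS drinks_possible
       FROM ingredients_inventory
       GROUP BY a_date, parent
       ORDER BY a_date"""
).fetchall()

for row in rows:
    print(row)
```

The values for 2018-06-28 (5) and 2018-06-30 (9) agree with the DC Long Island rows in the results table above.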
I think this would give the required result.
I created a function that gets the inventory for a given date and location.
Create function GetInventoryByDateAndLocation
(@date DATE, @Loc NVARCHAR(2))
RETURNS TABLE
AS
RETURN
(
Select child,avail_amt from
(Select a_date, child,avail_amt,
ROW_NUMBER() over (partition by child order by a_date desc) as ranking
from drinks where loc = @Loc and a_date<=@date)c
where ranking = 1
)
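The function keeps only the most recent row per child (ROW_NUMBER() ordered by a_date DESC, then ranking = 1), so it treats the latest delivery as the current stock rather than adding deliveries up. A sketch of the same filter in SQLite through Python's sqlite3; inventory_as_of is a hypothetical helper standing in for GetInventoryByDateAndLocation, and only DC's rows are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE drinks (a_date TEXT, loc TEXT, child TEXT, avail_amt INTEGER)")
conn.executemany(
    "INSERT INTO drinks VALUES (?, ?, ?, ?)",
    [
        ("2018-06-26", "DC", "Vodka", 15),
        ("2018-06-27", "DC", "Gin", 18),
        ("2018-06-28", "DC", "Rum", 5),
        ("2018-06-30", "DC", "Rum", 4),
        ("2018-06-30", "DC", "Tequila", 6),
        ("2018-07-01", "DC", "Triple Sec", 3),
        ("2018-07-05", "DC", "Rum", 0),
    ],
)


def inventory_as_of(conn, loc, as_of):
    """Hypothetical helper: latest avail_amt per child at loc on or before as_of."""
    return conn.execute(
        """SELECT child, avail_amt
           FROM (SELECT child, avail_amt,
                        ROW_NUMBER() OVER (PARTITION BY child
                                           ORDER BY a_date DESC) AS ranking
                 FROM drinks
                 WHERE loc = ? AND a_date <= ?) AS c
           WHERE ranking = 1
           ORDER BY child""",
        (loc, as_of),
    ).fetchall()


print(inventory_as_of(conn, "DC", "2018-06-30"))
```

On 2018-06-30 the Rum figure is 4, because the 6/30 row replaces the 6/28 row; the running-total answer above counts 5 + 4 = 9 instead.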
Then the query:
with parentChild as
(Select distinct parent, line_num, child from drinks),
ParentChildNo as
(Select parent, max(line_num) as ChildNo from parentChild group by parent)
,Inventory as
(Select a_date,loc,s.* from drinks d cross apply
GetInventoryByDateAndLocation(d.a_date, d.loc)s)
, Available as
(Select a_date,parent,loc,count(*) as childAvailable,min(avail_amt) as quantity
from Inventory i
join parentChild c
on i.child = c.child
group by parent,loc,a_date)
Select a_date,a.parent,loc,quantity from available a
join ParentChildNo pc
on a.parent = pc.parent and a.childAvailable = pc.ChildNo
where quantity > 0 order by 1
This would give all the drinks which can be made from the inventory. Hope it solves your issue.
These are just my 2 cents. There are better ways of doing this, and I hope more people read this and suggest them.
I don't think this is exactly what you're looking for, but maybe it will help.
SELECT DISTINCT #drinks.loc,#drinks.parent,avail.Avail
FROM #drinks
LEFT OUTER JOIN (
SELECT DISTINCT #drinks.parent, MIN(availnow.maxavailnow / line_num)
OVER(PARTITION BY parent) as Avail
FROM #drinks
LEFT OUTER JOIN (
SELECT #drinks.child,SUM(avail_amt) maxavailnow
FROM #drinks
LEFT OUTER JOIN (SELECT MAX(a_date) date,loc,child FROM #drinks GROUP BY loc,child) maxx ON #drinks.loc = maxx.loc AND #drinks.child = maxx.child AND maxx.date = #drinks.a_date
GROUP BY #drinks.child
) availnow ON #drinks.child = availnow.child
) avail ON avail.parent = #drinks.parent
SELECT ( SELECT MAX(d2.a_date)
FROM #drinks AS d2
WHERE d2.parent = d.parent
AND d2.loc = d.loc) AS a_date
,d.loc
,d.parent
,SUM(d.avail_amt) AS [avail_amt(SUM)]
,COUNT(d.avail_amt) AS [avail_amt(COUNT)]
FROM #drinks AS d
GROUP BY d.loc
,d.parent
ORDER BY a_date

Exclude rows where dates exist in another table

I have 2 tables, one is working pattern, another is absences.
1) Work pattern
ID | Shift Start | Shift End
123| 01-03-2017 | 02-03-2017
2) Absences
ID| Absence Start | Absence End
123| 01-03-2017 | 04-03-2017
What would be the best way, when selecting rows from work pattern, to exclude any that have a date marked as an absence in the absence table?
For example, I have a report that uses the work pattern table to count how many days a week an employee has worked, but I don't want it to include the days that have been marked as an absence in the absence table. I also don't want it to include any days that fall between the absence start and absence end dates.
If the absence should always span the entire shift for the shift to be excluded, you can use not exists():
select *
from WorkPatterns w
where not exists (
select 1
from Absences a
where a.Id = w.Id
and a.AbsenceStart <= w.ShiftStart
and a.AbsenceEnd >= w.ShiftEnd
)
rextester demo: http://rextester.com/DCODC76816
returns:
+-----+------------+------------+
| id | ShiftStart | ShiftEnd |
+-----+------------+------------+
| 123 | 2017-02-27 | 2017-02-28 |
| 123 | 2017-03-05 | 2017-03-06 |
+-----+------------+------------+
given this test setup:
create table WorkPatterns ([id] int, [ShiftStart] datetime, [ShiftEnd] datetime) ;
insert into WorkPatterns ([id], [ShiftStart], [ShiftEnd]) values
(123, '20170227', '20170228')
,(123, '20170301', '20170302')
,(123, '20170303', '20170304')
,(123, '20170305', '20170306')
;
create table Absences ([id] int, [AbsenceStart] datetime, [AbsenceEnd] datetime) ;
insert into Absences ([id], [AbsenceStart], [AbsenceEnd]) values
(123, '20170301', '20170304');
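The same NOT EXISTS test can be reproduced in SQLite through Python's sqlite3 as a quick check of the output above. Assumption: dates are stored as ISO-8601 text instead of DATETIME, which keeps the <= and >= comparisons chronological.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE WorkPatterns (id INTEGER, ShiftStart TEXT, ShiftEnd TEXT);
    INSERT INTO WorkPatterns VALUES
        (123, '2017-02-27', '2017-02-28'),
        (123, '2017-03-01', '2017-03-02'),
        (123, '2017-03-03', '2017-03-04'),
        (123, '2017-03-05', '2017-03-06');
    CREATE TABLE Absences (id INTEGER, AbsenceStart TEXT, AbsenceEnd TEXT);
    INSERT INTO Absences VALUES (123, '2017-03-01', '2017-03-04');
    """
)

# Keep a shift unless some absence for the same employee covers all of it.
rows = conn.execute(
    """SELECT *
       FROM WorkPatterns AS w
       WHERE NOT EXISTS (SELECT 1
                         FROM Absences AS a
                         WHERE a.id = w.id
                           AND a.AbsenceStart <= w.ShiftStart
                           AND a.AbsenceEnd >= w.ShiftEnd)
       ORDER BY w.ShiftStart"""
).fetchall()

print(rows)
```

A shift that only partly overlaps an absence is kept by this condition; counting individual days needs a calendar table instead.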
What would be the best way, when selecting rows from work pattern
If you are dealing only with dates (no time) and have control over the database schema,
one approach is to create a calendar table
containing every date since the company started plus some years into the future.
Fill that table once.
After that, it is easy to join other tables on dates and do the math.
If you have trouble constructing the T-SQL query, please edit your question with more details about the tables' columns and values, their relationships, and the results you need.
How about this:
SELECT WP_START.[id], WP_START.[shift_start], WP_START.[shift_end]
FROM work_pattern AS WP_START
INNER JOIN absences AS A ON WP_START.id = A.id
WHERE WP_START.[shift_start] NOT BETWEEN A.[absence_start] AND A.[absence_end]
UNION
SELECT WP_END.[id], WP_END.[shift_start], WP_END.[shift_end]
FROM work_pattern AS WP_END
INNER JOIN absences AS A ON WP_END.id = A.id
WHERE WP_END.[shift_end] NOT BETWEEN A.[absence_start] AND A.[absence_end]
See it on SQL Fiddle: http://sqlfiddle.com/#!6/49ae6/6
Here is my example that includes a Date Dimension table. If your DBAs won't add it, you can create #dateDim as a temp table, like I've done with SQLFiddle (didn't know I could do that). A typical date dimension would have a lot more details you need about the days, but if the table can't be added, just use what you need. You'll have to populate the other Holidays you need. The DateDim I use often is at https://github.com/shawnoden/SQL_Stuff/blob/master/sql_CreateDateDimension.sql
SQL Fiddle
MS SQL Server 2014 Schema Setup:
/* Tables for your test data. */
CREATE TABLE WorkPatterns ( id int, ShiftStart date, ShiftEnd date ) ;
INSERT INTO WorkPatterns ( id, ShiftStart, ShiftEnd )
VALUES
(123, '20170101', '20171031')
, (124, '20170601', '20170831')
;
CREATE TABLE Absences ( id int, AbsenceStart date, AbsenceEnd date ) ;
INSERT INTO Absences ( id, AbsenceStart, AbsenceEnd )
VALUES
( 123, '20170123', '20170127' )
, ( 123, '20170710', '20170831' )
, ( 124, '20170801', '20170820' )
;
/* ******** MAKE SIMPLE CALENDAR TABLE ******** */
CREATE TABLE dateDim (
theDate DATE NOT NULL
, IsWeekend BIT DEFAULT 0
, IsHoliday BIT DEFAULT 0
, IsWorkDay BIT DEFAULT 0
);
/* Populate basic details of dates. */
INSERT dateDim(theDate, IsWeekend, IsHoliday)
SELECT d
, CONVERT(BIT, CASE WHEN DATEPART(dw,d) IN (1,7) THEN 1 ELSE 0 END)
, CONVERT(BIT, CASE WHEN d = '20170704' THEN 1 ELSE 0 END) /* 4th of July. */
FROM (
SELECT d = DATEADD(DAY, rn - 1, '20170101')
FROM
(
SELECT TOP (DATEDIFF(DAY, '20170101', '20171231') + 1) /* +1 so that 31 December is included */
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y ;
/* If not a weekend or holiday, it's a WorkDay. */
UPDATE dateDim
SET IsWorkDay = CASE WHEN IsWeekend = 0 AND IsHoliday = 0 THEN 1 ELSE 0 END
;
Query For Calculation:
SELECT wp.ID, COUNT(d.theDate) AS workDayCount
FROM WorkPatterns wp
INNER JOIN dateDim d ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
AND d.IsWorkDay = 1
LEFT OUTER JOIN Absences a ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
AND wp.ID = a.ID
WHERE a.ID IS NULL
GROUP BY wp.ID
ORDER BY wp.ID
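The calendar-table logic can be checked end to end with a minimal SQLite sketch through Python's sqlite3. Assumptions: a recursive CTE replaces the sys.all_objects number generator, strftime('%w', ...) (0 = Sunday, 6 = Saturday) replaces DATEPART(dw, ...), and the IsWeekend/IsHoliday columns are folded directly into IsWorkDay.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE WorkPatterns (id INTEGER, ShiftStart TEXT, ShiftEnd TEXT);
    INSERT INTO WorkPatterns VALUES
        (123, '2017-01-01', '2017-10-31'),
        (124, '2017-06-01', '2017-08-31');
    CREATE TABLE Absences (id INTEGER, AbsenceStart TEXT, AbsenceEnd TEXT);
    INSERT INTO Absences VALUES
        (123, '2017-01-23', '2017-01-27'),
        (123, '2017-07-10', '2017-08-31'),
        (124, '2017-08-01', '2017-08-20');

    -- Simple calendar table: one row per day of 2017 (31 December included).
    CREATE TABLE dateDim (theDate TEXT PRIMARY KEY, IsWorkDay INTEGER NOT NULL);
    WITH RECURSIVE days(d) AS (
        SELECT '2017-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2017-12-31'
    )
    INSERT INTO dateDim (theDate, IsWorkDay)
    SELECT d,
           CASE WHEN strftime('%w', d) IN ('0', '6') THEN 0  -- Sunday, Saturday
                WHEN d = '2017-07-04' THEN 0                    -- 4th of July
                ELSE 1 END
    FROM days;
    """
)

# Work days inside each pattern, minus any day covered by that employee's absences.
counts = conn.execute(
    """SELECT wp.id, COUNT(d.theDate) AS workDayCount
       FROM WorkPatterns AS wp
       JOIN dateDim AS d
         ON d.theDate BETWEEN wp.ShiftStart AND wp.ShiftEnd
        AND d.IsWorkDay = 1
       LEFT JOIN Absences AS a
         ON d.theDate BETWEEN a.AbsenceStart AND a.AbsenceEnd
        AND wp.id = a.id
       WHERE a.id IS NULL
       GROUP BY wp.id
       ORDER BY wp.id"""
).fetchall()

print(counts)
```

The counts agree with the results shown for SQL Server.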
Results:
| ID | workDayCount |
|-----|--------------|
| 123 | 172 | << 216 work days in range, 44 of them absent
| 124 | 51 | << 65 work days in range, 14 of them absent

Grouping rows with a date range

I am using SQL Server 2008 and need to create a query that shows rows that fall within a date range.
My table is as follows:
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
My rule is:
If the WH_OUT_DATETIME of one row is on or within 24 hours of the WH_IN_DATETIME of another ADM_ID with the same WH_PID, the two rows belong to the same group.
If possible, I would like another column added to the results that identifies the group, as EP_ID.
e.g.
ADM_ID WH_PID WH_IN_DATETIME WH_OUT_DATETIME
------ ------ -------------- ---------------
1 9 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2014-11-20 00:00:00 2014-11-21 00:00:00
5 5 2014-10-17 00:00:00 2014-10-18 00:00:00
Would return rows with:
ADM_ID WH_PID EP_ID EP_IN_DATETIME EP_OUT_DATETIME WH_IN_DATETIME WH_OUT_DATETIME
------ ------ ----- ------------------- ------------------- ------------------- -------------------
1 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-12 00:00:00 2014-10-13 15:00:00
2 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-14 14:00:00 2014-10-15 15:00:00
3 9 1 2014-10-12 00:00:00 2014-10-17 15:00:00 2014-10-16 14:00:00 2014-10-17 15:00:00
4 9 2 2014-11-20 00:00:00 2014-11-20 00:00:00 2014-10-16 14:00:00 2014-11-21 00:00:00
5 5 1 2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00
The EP_OUT_DATETIME will always be the latest date in the group. Hope this clarifies a bit.
This way, I can group by the EP_ID and find the EP_OUT_DATETIME and start time for any ADM_ID/PID that fall within.
Each should roll into the next, meaning that if another row has a WH_IN_DATETIME that follows on from the WH_OUT_DATETIME of another row for the same WH_PID, then that row's WH_OUT_DATETIME becomes the EP_OUT_DATETIME for all of the WH_PIDs within that EP_ID.
I hope this makes some sense.
Thanks,
MR
Since the question does not specify that the solution be a "single" query ;-), here is another approach: using the "quirky update" feature, which updates a variable at the same time as a column. To break down the complexity of this operation, I create a scratch table to hold the piece that is hardest to calculate: the EP_ID. Once that is done, it gets joined into a simple query and provides the window with which to calculate the EP_IN_DATETIME and EP_OUT_DATETIME fields.
The steps are:
Create the scratch table
Seed the scratch table with all of the ADM_ID values -- this lets us do an UPDATE as all of the rows already exist.
Update the scratch table
Do the final, simple select joining the scratch table to the main table
The Test Setup
SET ANSI_NULLS ON;
SET NOCOUNT ON;
CREATE TABLE #Table
(
ADM_ID INT NOT NULL PRIMARY KEY,
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
WH_OUT_DATETIME DATETIME NOT NULL
);
INSERT INTO #Table VALUES (1, 9, '2014-10-12 00:00:00', '2014-10-13 15:00:00');
INSERT INTO #Table VALUES (2, 9, '2014-10-14 14:00:00', '2014-10-15 15:00:00');
INSERT INTO #Table VALUES (3, 9, '2014-10-16 14:00:00', '2014-10-17 15:00:00');
INSERT INTO #Table VALUES (4, 9, '2014-11-20 00:00:00', '2014-11-21 00:00:00');
INSERT INTO #Table VALUES (5, 5, '2014-10-17 00:00:00', '2014-10-18 00:00:00');
Step 1: Create and Populate the Scratch Table
CREATE TABLE #Scratch
(
ADM_ID INT NOT NULL PRIMARY KEY,
EP_ID INT NOT NULL
-- Might need WH_PID and WH_IN_DATETIME fields to guarantee proper UPDATE ordering
);
INSERT INTO #Scratch (ADM_ID, EP_ID)
SELECT ADM_ID, 0
FROM #Table;
Alternate scratch table structure to ensure proper update order (since "quirky update" uses the order of the Clustered Index, as noted at the bottom of this answer):
CREATE TABLE #Scratch
(
WH_PID INT NOT NULL,
WH_IN_DATETIME DATETIME NOT NULL,
ADM_ID INT NOT NULL,
EP_ID INT NOT NULL
);
INSERT INTO #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID, EP_ID)
SELECT WH_PID, WH_IN_DATETIME, ADM_ID, 0
FROM #Table;
CREATE UNIQUE CLUSTERED INDEX [CIX_Scratch]
ON #Scratch (WH_PID, WH_IN_DATETIME, ADM_ID);
Step 2: Update the Scratch Table using a local variable to keep track of the prior value
DECLARE @EP_ID INT; -- this is used in the UPDATE
;WITH cte AS
(
SELECT TOP (100) PERCENT
t1.*,
t2.WH_OUT_DATETIME AS [PriorOut],
t2.ADM_ID AS [PriorID],
ROW_NUMBER() OVER (PARTITION BY t1.WH_PID ORDER BY t1.WH_IN_DATETIME)
AS [RowNum]
FROM #Table t1
LEFT JOIN #Table t2
ON t2.WH_PID = t1.WH_PID
AND t2.ADM_ID <> t1.ADM_ID
AND t2.WH_OUT_DATETIME >= (t1.WH_IN_DATETIME - 1)
AND t2.WH_OUT_DATETIME < t1.WH_IN_DATETIME
ORDER BY t1.WH_PID, t1.WH_IN_DATETIME
)
UPDATE sc
SET @EP_ID = sc.EP_ID = CASE
WHEN cte.RowNum = 1 THEN 1
WHEN cte.[PriorOut] IS NULL THEN (@EP_ID + 1)
ELSE @EP_ID
END
FROM #Scratch sc
INNER JOIN cte
ON cte.ADM_ID = sc.ADM_ID
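The UPDATE carries @EP_ID from row to row in clustered-index order. The same bookkeeping written as a plain loop over rows sorted by (WH_PID, WH_IN_DATETIME, ADM_ID) makes the rule easier to check. This is an illustrative sketch, not part of the answer: assign_episodes is a hypothetical helper, and it only compares each stay with the immediately preceding stay for the same WH_PID, whereas the CTE looks for any other stay that ended within the 24 hours before.

```python
from datetime import datetime, timedelta

# (ADM_ID, WH_PID, WH_IN_DATETIME, WH_OUT_DATETIME) from the test setup
rows = [
    (1, 9, datetime(2014, 10, 12, 0), datetime(2014, 10, 13, 15)),
    (2, 9, datetime(2014, 10, 14, 14), datetime(2014, 10, 15, 15)),
    (3, 9, datetime(2014, 10, 16, 14), datetime(2014, 10, 17, 15)),
    (4, 9, datetime(2014, 11, 20, 0), datetime(2014, 11, 21, 0)),
    (5, 5, datetime(2014, 10, 17, 0), datetime(2014, 10, 18, 0)),
]


def assign_episodes(rows, gap=timedelta(hours=24)):
    """Hypothetical helper: return {ADM_ID: EP_ID}, numbered separately per WH_PID."""
    ep_ids = {}
    prev_pid = prev_out = None
    ep_id = 0
    # Same order that the clustered index on the scratch table enforces.
    for adm_id, pid, wh_in, wh_out in sorted(rows, key=lambda r: (r[1], r[2], r[0])):
        if pid != prev_pid:
            ep_id = 1  # first row for this WH_PID (RowNum = 1)
        elif not (wh_in - gap <= prev_out < wh_in):
            ep_id += 1  # previous stay did not end within 24 hours: new episode
        ep_ids[adm_id] = ep_id
        prev_pid, prev_out = pid, wh_out
    return ep_ids


print(assign_episodes(rows))
```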
Step 3: Select Joining the Scratch Table
SELECT tab.ADM_ID,
tab.WH_PID,
sc.EP_ID,
MIN(tab.WH_IN_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_IN_DATETIME],
MAX(tab.WH_OUT_DATETIME) OVER (PARTITION BY tab.WH_PID, sc.EP_ID)
AS [EP_OUT_DATETIME],
tab.WH_IN_DATETIME,
tab.WH_OUT_DATETIME
FROM #Table tab
INNER JOIN #Scratch sc
ON sc.ADM_ID = tab.ADM_ID
ORDER BY tab.ADM_ID;
Resources
MSDN page for UPDATE
look for "@variable = column = expression"
Performance Analysis of doing Running Totals (not exactly the same thing as here, but not too far off)
This blog post does mention:
PRO: this method is generally pretty fast
CON: "The order of the UPDATE is controlled by the order of the clustered index". This behavior might rule out using this method depending on circumstances. But in this particular case, if the WH_PID values are not at least grouped together naturally via the ordering of the clustered index and ordered by WH_IN_DATETIME, then those two fields just get added to the scratch table and the PK (with implied clustered index) on the scratch table becomes (WH_PID, WH_IN_DATETIME, ADM_ID).
I would do this using exists in a correlated subquery:
select t.*,
(case when exists (select 1
from yourtable t2
where t2.WH_PID = t.WH_PID and
t2.ADM_ID <> t.ADM_ID and
t.WH_OUT_DATETIME between t2.WH_IN_DATETIME and dateadd(day, 1, t2.WH_OUT_DATETIME)
)
then 1 else 0
end) as TimeFrameFlag
from yourtable t;
Try this query:
;WITH cte
AS (SELECT t1.ADM_ID AS EP_ID,*
FROM #yourtable t1
WHERE NOT EXISTS (SELECT 1
FROM #yourtable t2
WHERE t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(Datediff(HH, t1.WH_OUT_DATETIME, t2.WH_IN_DATETIME)) <= 24)
UNION ALL
SELECT t2.EP_ID,t1.ADM_ID,t1.WH_PID,t1.WH_IN_DATETIME,t1.WH_OUT_DATETIME
FROM #yourtable t1
JOIN cte t2
ON t1.WH_PID = t2.WH_PID
AND t1.ADM_ID <> t2.ADM_ID
AND Abs(( Datediff(HH, t2.WH_IN_DATETIME, t1.WH_OUT_DATETIME) )) <= 24),
cte_result
AS (SELECT t1.*,Dense_rank() OVER ( partition BY wh_pid ORDER BY t1.WH_PID, ISNULL(t2.EP_ID, t1.ADM_ID)) AS EP_ID
FROM #yourtable t1
LEFT OUTER JOIN (SELECT DISTINCT ADM_ID,
EP_ID
FROM cte) t2
ON t1.ADM_ID = t2.ADM_ID)
SELECT ADM_ID,WH_PID,EP_ID,Min(WH_IN_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_IN_DATETIME],Max(WH_OUT_DATETIME)OVER(partition BY wh_pid, ep_id) AS [EP_OUT_DATETIME],
WH_IN_DATETIME,
WH_OUT_DATETIME
FROM cte_result
ORDER BY ADM_ID
I assumed the following:
Rows that follow your rule form a group.
min(WH_IN_DATETIME) of the group is shown in the EP_IN_DATETIME column for every row in that group. Similarly, max(WH_OUT_DATETIME) of the group is shown in the EP_OUT_DATETIME column for every row in that group.
EP_ID is assigned to the groups of each WH_PID separately.
One thing your question does not explain is how the EP_OUT_DATETIME and WH_IN_DATETIME of the 4th row become 2014-11-20 00:00:00 and 2014-10-16 14:00:00 respectively. I'm assuming these are typos and should be 2014-11-21 00:00:00.000 and 2014-11-20 00:00:00.000.
Explanation:
The first CTE, cte, returns the possible groups based on your rule. The second CTE, cte_result, assigns an EP_ID to each group. Finally, you can select min(WH_IN_DATETIME) and max(WH_OUT_DATETIME) over partitions of wh_pid, ep_id.
Here's yet another alternative, which may still not match your results exactly.
I agree with @NoDisplayName that there appears to be an error in your ADM_ID 4 output: the two OUT dates should match, or at least that seems logical to me. I can't understand why you would ever want an out date to show an in date value, but of course there could be a good reason. :)
Also, the wording of your question makes it sound like this is just part of the problem and that you may take this output further. I'm not sure what you are really aiming for, but I've broken the query below into 2 CTEs, and you may find your final information in the 2nd CTE (as it sounds like you want to group the data back together).
Here's the complete structure & query on SQL Fiddle
-- The Cross Join ensures we always have a pair of first and last time pairs
-- The left join matches all overlapping combinations,
-- allowing the where clause to restrict to just the first and last
-- These first/last pairs are then grouped in the first CTE
-- Data is restricted in the second CTE
-- The final select is then quite simple
With GroupedData AS (
SELECT
(Row_Number() OVER (ORDER BY t1.WH_PID, t1.WH_IN_DATETIME) - 1) / 2 Grp,
t1.WH_IN_DATETIME, t1.WH_OUT_DATETIME, t1.WH_PID
FROM yourtable t1
CROSS JOIN (SELECT 0 AS [First] UNION SELECT 1) SetOrder
LEFT OUTER JOIN yourtable t2
ON t1.WH_PID = t2.WH_PID
AND ((DATEADD(d,1,t1.WH_OUT_DATETIME) BETWEEN t2.WH_IN_DATETIME AND t2.WH_OUT_DATETIME AND [First] = 0)
OR (DATEADD(d,1,t2.WH_OUT_DATETIME) BETWEEN t1.WH_IN_DATETIME AND t1.WH_OUT_DATETIME AND [First] = 1))
WHERE t2.WH_PID IS NULL
), RestrictedData AS (
SELECT WH_PID, MIN(WH_IN_DATETIME) AS WH_IN_DATETIME, MAX(WH_OUT_DATETIME) AS WH_OUT_DATETIME
FROM GroupedData
GROUP BY Grp, WH_PID
)
SELECT yourtable.ADM_ID, yourtable.WH_PID, RestrictedData.WH_IN_DATETIME AS EP_IN_DATETIME, RestrictedData.WH_OUT_DATETIME AS EP_OUT_DATETIME, yourtable.WH_IN_DATETIME, yourtable.WH_OUT_DATETIME
FROM RestrictedData
INNER JOIN yourtable
ON RestrictedData.WH_PID = yourtable.WH_PID
AND yourtable.WH_IN_DATETIME BETWEEN RestrictedData.WH_IN_DATETIME AND RestrictedData.WH_OUT_DATETIME
ORDER BY yourtable.ADM_ID
A LEFT OUTER JOIN and the DATEDIFF function should help you filter the records. Finally, use a window function to create the group IDs.
create table #test
(ADM_ID int,WH_PID int,WH_IN_DATETIME DATETIME,WH_OUT_DATETIME DATETIME)
INSERT #test
VALUES ( 1,9,'2014-10-12 00:00:00','2014-10-13 15:00:00'),
(2,9,'2014-10-14 14:00:00','2014-10-15 15:00:00'),
(3,9,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(1,10,'2014-10-16 14:00:00','2014-10-17 15:00:00'),
(2,10,'2014-10-18 14:00:00','2014-10-19 15:00:00')
SELECT Row_number()OVER(partition by a.WH_PID ORDER BY a.WH_IN_DATETIME) Group_Id,
a.WH_PID,
a.WH_IN_DATETIME,
b.WH_OUT_DATETIME
FROM #test a
LEFT JOIN #test b
ON a.WH_PID = b.WH_PID
AND a.ADM_ID <> b.ADM_ID
where Datediff(hh, a.WH_OUT_DATETIME, b.WH_IN_DATETIME)BETWEEN 0 AND 24
OUTPUT :
Group_Id WH_PID WH_IN_DATETIME WH_OUT_DATETIME
-------- ------ ----------------------- -----------------------
1 9 2014-10-12 00:00:00.000 2014-10-15 15:00:00.000
2 9 2014-10-14 14:00:00.000 2014-10-17 15:00:00.000
1 10 2014-10-16 14:00:00.000 2014-10-19 15:00:00.000

Left join with complex join clause

I have two tables and want to left join them.
I want all entries from the Account table, but only the rows from the right table that match a criterion. If no row matches, I only want the account.
The following does not work as expected:
SELECT * FROM Account a
LEFT JOIN
Entries ef ON ef.account_id = a.account_id AND
(ef.entry_period_end_date BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.forecast_period_end BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.entry_period_end_date IS NULL
OR
ef.forecast_period_end IS NULL
)
because it also gives me the rows from the Entries table that are outside the requested period.
Example Data:
Account Table
AccountID | AccountName
1 Test
2 Foobar
3 Test1
4 Foobar2
Entries Table
id | AccountID | entry_period_end_date | forecast_period_end | amount
1 1 12/31/2009 12/31/2009 100
2 1 NULL 10/31/2009 150
3 2 NULL NULL 200
4 3 10/31/2009 NULL 250
5 4 10/31/2009 10/31/2009 300
So the query should return the following (when I set startDate = 12/01/2009 and endDate = 12/31/2009):
AccountID | id
1 1
2 NULL
3 NULL
4 NULL
Thx,
Martin
If either entry_period_end_date or forecast_period_end is NULL, the row will be returned, even if your other, non-NULL column is not within the period.
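To see the effect, the sketch below runs both ON clauses against the question's sample data in SQLite through Python's sqlite3. Assumptions: the period is hard-coded to 2009-12-01 through 2009-12-31 in place of the escaped PHP variables, dates are stored as ISO text, and the account column is called AccountID as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Account (AccountID INTEGER, AccountName TEXT);
    INSERT INTO Account VALUES (1, 'Test'), (2, 'Foobar'), (3, 'Test1'), (4, 'Foobar2');
    CREATE TABLE Entries (id INTEGER, AccountID INTEGER, entry_period_end_date TEXT,
                          forecast_period_end TEXT, amount INTEGER);
    INSERT INTO Entries VALUES
        (1, 1, '2009-12-31', '2009-12-31', 100),
        (2, 1, NULL, '2009-10-31', 150),
        (3, 2, NULL, NULL, 200),
        (4, 3, '2009-10-31', NULL, 250),
        (5, 4, '2009-10-31', '2009-10-31', 300);
    """
)

ORIGINAL_CONDITION = """(ef.entry_period_end_date BETWEEN '2009-12-01' AND '2009-12-31'
    OR ef.forecast_period_end BETWEEN '2009-12-01' AND '2009-12-31'
    OR ef.entry_period_end_date IS NULL
    OR ef.forecast_period_end IS NULL)"""
PERIOD_ONLY_CONDITION = """(ef.entry_period_end_date BETWEEN '2009-12-01' AND '2009-12-31'
    OR ef.forecast_period_end BETWEEN '2009-12-01' AND '2009-12-31')"""


def run(condition):
    return conn.execute(
        f"""SELECT a.AccountID, ef.id
            FROM Account AS a
            LEFT JOIN Entries AS ef
              ON ef.AccountID = a.AccountID AND {condition}
            ORDER BY a.AccountID, ef.id"""
    ).fetchall()


# Any NULL date satisfies the OR, so out-of-period entries are joined.
with_null_checks = run(ORIGINAL_CONDITION)
# BETWEEN on a NULL is unknown, so those entries are not joined.
period_only = run(PERIOD_ONLY_CONDITION)

print(with_null_checks)
print(period_only)
```

Only the second condition produces the result the question asks for.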
You probably meant this:
SELECT *
FROM Account a
LEFT JOIN
Entries ef
ON ef.account_id = a.account_id
AND
(
entry_period_end_date BETWEEN …
OR forecast_period_end BETWEEN …
)
This returns all rows where either entry_period_end_date or forecast_period_end falls within the given period.
Update:
A test script:
CREATE TABLE account (AccountID INT NOT NULL, AccountName VARCHAR(100) NOT NULL);
INSERT
INTO account
VALUES
(1, 'Test'),
(2, 'Foobar'),
(3, 'Test1'),
(4, 'Foobar1');
CREATE TABLE Entries (id INT NOT NULL, AccountID INT NOT NULL, entry_period_end_date DATETIME, forecast_period_end DATETIME, amount FLOAT NOT NULL);
INSERT
INTO Entries
VALUES
(1, 1, '2009-12-31', '2009-12-31', 100),
(2, 1, NULL, '2009-10-31', 100),
(3, 2, NULL, NULL, 100),
(4, 3, '2009-10-31', NULL, 100),
(5, 4, '2009-10-31', '2009-10-31', 100);
SELECT a.*, ef.id
FROM Account a
LEFT JOIN
Entries ef
ON ef.accountID = a.accountID
AND
(
entry_period_end_date BETWEEN '2009-12-01' AND '2009-12-31'
OR forecast_period_end BETWEEN '2009-12-01' AND '2009-12-31'
);
returns following:
1, 'Test', 1
2, 'Foobar', NULL
3, 'Test1', NULL
4, 'Foobar1', NULL
Edited to fix the logic so that the end date conditions are grouped together, followed by the forecast period conditions.
Now it checks for a "good" end date (NULL or within range) and then for a "good" forecast date (NULL or within range).
Since all the logic is on the Entries table, narrow it down first and then join:
SELECT a.*,temp.id FROM Account a
LEFT JOIN
(
SELECT id, account_id
FROM Entries ef
WHERE
((ef.entry_period_end_date BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.entry_period_end_date IS NULL
)
AND
(ef.forecast_period_end BETWEEN $periodStartDate_escaped AND LAST_DAY(date_add( $periodStartDate_escaped, INTERVAL $periodLengthInMonths_escaped MONTH))
OR
ef.forecast_period_end IS NULL
)
) temp
ON a.account_id = temp.account_id