Add X number of Working days to a date - sql

I have a table PostingPeriod that uses a company calendar to track all working days. Simplified, it looks like this:
Date        Year  Quarter  Month  Day  IsWorkingDay
25.06.2015  2015  2        6      25   1
26.06.2015  2015  2        6      26   1
27.06.2015  2015  2        6      27   0
I have another table that contains all purchase lines with the Orderdate, confirmed delivery date from the vendor and the maximum allowed timeframe in working days between orderdate and deliverydate:
PurchID  OrderDate   ConfDelivery  DeliveryDays
1234     14.04.2015  20.05.2015    30
1235     14.04.2015  24.05.2015    20
I want to create a new column that returns the maximum allowed Date (regardless of workday or not) for each order. The usual approach (Workingdays / 5 to get weeks, multiplied by 7 to get days) doesn't work, as all holidays etc need to be taken into consideration.
As this is for a DWH that will feed an OLAP database, performance is not an issue.

You could do this by assigning each working day an arbitrary index using ROW_NUMBER, e.g.
SELECT Date, WorkingDayIndex = ROW_NUMBER() OVER(ORDER BY Date)
FROM dbo.Calendar
WHERE IsWorkingDay = 1
Which will give you something like:
Date WorkingDayIndex
-----------------------------
2015-04-27 80
2015-04-28 81
2015-04-29 82
2015-04-30 83
2015-05-01 84
2015-05-05 85
2015-05-06 86
2015-05-07 87
Then if you want to know the date that is n working days from a given date, find the date with an index n higher, i.e. 2015-04-27 has an index of 80, therefore 5 working days later would have an index of 85 which yields 2015-05-05.
FULL WORKING EXAMPLE
/***************************************************************************************************************************/
-- CREATE TABLES AND POPULATE WITH TEST DATA
SET DATEFIRST 1;
DECLARE @Calendar TABLE (Date DATE, IsWorkingDay BIT);
INSERT @Calendar
SELECT TOP 365 DATEADD(DAY, ROW_NUMBER() OVER(ORDER BY object_id), '20141231'), 1 FROM sys.all_objects;
UPDATE @Calendar
SET IsWorkingDay = 0
WHERE DATEPART(WEEKDAY, Date) IN (6, 7)
OR Date IN ('2015-01-01', '2015-04-03', '2015-04-06', '2015-05-04', '2015-05-25', '2015-08-31', '2015-12-25', '2015-12-28');
DECLARE @T TABLE (PurchID INT, OrderDate DATE, ConfDeliveryDate DATE, DeliveryDays INT);
INSERT @T VALUES (1234, '20150414', '20150520', 30), (1235, '20150414', '20150524', 20);
/***************************************************************************************************************************/
-- ACTUAL QUERY
WITH WorkingDayCalendar AS
( SELECT *, WorkingDayIndex = ROW_NUMBER() OVER(ORDER BY Date)
FROM @Calendar
WHERE IsWorkingDay = 1
)
SELECT *
FROM @T AS t
INNER JOIN WorkingDayCalendar AS c1
ON c1.Date = t.OrderDate
INNER JOIN WorkingDayCalendar AS c2
ON c2.WorkingDayIndex = c1.WorkingDayIndex + t.DeliveryDays;
If this is a common requirement, then you could just make WorkingDayIndex a fixed field on your calendar table so you don't need to calculate it each time it is required.
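The index idea above is not tied to SQL. As a rough illustration only, here is the same lookup sketched in Python rather than T-SQL; the function names are made up for this sketch, and the calendar assumes Monday to Friday working weeks plus the holiday list from the test data above. Like the join on c1.Date, it expects the order date itself to be a working day.

```python
from bisect import bisect_left
from datetime import date, timedelta


def build_working_days(start, end, holidays):
    """Sorted list of working days in [start, end]: Mon-Fri, minus holidays."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5 and d not in holidays:
            days.append(d)
        d += timedelta(days=1)
    return days


def add_working_days(working_days, order_date, n):
    """Return the working day whose index is n higher than order_date's index."""
    i = bisect_left(working_days, order_date)
    if i == len(working_days) or working_days[i] != order_date:
        raise ValueError("order date is not a working day in the calendar")
    return working_days[i + n]


# Same holidays as in the T-SQL test data (assumption: weekends are Sat/Sun).
holidays = {
    date(2015, 1, 1), date(2015, 4, 3), date(2015, 4, 6), date(2015, 5, 4),
    date(2015, 5, 25), date(2015, 8, 31), date(2015, 12, 25), date(2015, 12, 28),
}
calendar = build_working_days(date(2015, 1, 1), date(2015, 12, 31), holidays)

print(add_working_days(calendar, date(2015, 4, 27), 5))  # 2015-05-05
```

The list position plays the role of WorkingDayIndex, so the lookup is a single index addition once the calendar has been built.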

If I understood correctly, you want the date you reach by advancing N (DeliveryDays) working days from OrderDate. Something like this:
select
    p.PurchID,
    p.OrderDate,
    p.ConfDelivery,
    p.DeliveryDays,
    myDays.[Date] as myWorkingDayDeliveryDate
from Purchases p
outer apply (
    select numbered.[Date]
    from (
        -- number the working days from the order date onwards;
        -- the order date itself is day 1 if it is a working day (use > to exclude it)
        select
            ROW_NUMBER() OVER (ORDER BY pp.[Date]) as WorkingDayNo,
            pp.[Date]
        from PostingPeriod pp
        where pp.IsWorkingDay = 1
          and pp.[Date] >= p.OrderDate
    ) numbered
    where numbered.WorkingDayNo = p.DeliveryDays
) myDays

You'd have to do something like
SELECT p.PurchID, p.OrderDate, p.DeliveryDays, Aux.Counter, Aux.Date
FROM Purchases p -- table name assumed; the question does not name the purchase table
CROSS APPLY (SELECT ROW_NUMBER() OVER (ORDER BY Date) AS Counter, Date FROM PostingPeriod WHERE IsWorkingDay = 1 AND Date > p.OrderDate) Aux
WHERE Aux.Counter = p.DeliveryDays
ORDER BY 1
Basically, you need every date present in the PostingPeriod table (weekends and holidays with IsWorkingDay = 0, all other days with 1).
Counting only the working days after each OrderDate then gives you the date that lies DeliveryDays working days later.

Related

How to generate temp table with integers 1-52 for weeks of year to join on?

I need to create a temporary table (I think) that contains a single WeekID field with values 1 through 52, indicating each week of the calendar year. I want to be able to left join against this table on the week number based on some data I have to indicate totals for each week of the year.
Preferably would like to do this in a single query.
What I have been using outputs the last 5 weeks in which records exist, as opposed to the actual last 5 weeks, in which totals may be 0.
Here is my errant query that gives me last 5 weeks totals where tickets actually got opened:
SET DATEFIRST 1
SELECT TOP 5 * FROM
(SELECT TOP 5
DATEPART(year, t.TicketQueuedDateTime) AS 'TicketYear',
DATEPART(week, t.TicketQueuedDateTime) AS 'TicketWeek',
COUNT(t.TicketStatus) AS 'WeekTotal'
FROM TicketTable t
GROUP BY DATEPART(year, t.TicketQueuedDateTime), DATEPART(week, t.TicketQueuedDateTime)
ORDER BY TicketYear DESC, TicketWeek DESC) val
ORDER BY val.TicketYear, val.TicketWeek
Current output:
TicketYear TicketWeek WeekTotal
2018 25 13
2018 26 10
2018 27 4
2018 29 2
2018 32 1
This works great; however, I want to show the totals for the actual last 5 weeks, even if there haven't been any tickets (a "0" should be filled in for "gap" weeks with no tickets as well).
Expected output (assuming for the sake of this post that we're in week 33 and there have been no tickets this week):
TicketYear TicketWeek WeekTotal
2018 29 2
2018 30 0
2018 31 0
2018 32 1
2018 33 0
(Note: weeks with no tickets are filled with a "0" value, and the output reflects the actual last 5 weeks, including the current week.)
MSSQL 2016 Enterprise Edition
Without creating a temporary table, you can simplify this query using a CTE, as below.
- Use recursive CTE to generate week numbers
- Get distinct years from TicketTable
- Cross join distinct years and weeks to get all combinations
- Then left join it with TicketTable to get count for each year-week
;With WEEK_CTE as (
Select 1 as WeekNo
UNION ALL
SELECT 1 + WeekNo from WEEK_CTE
WHERE WeekNo < 52
)
Select yr.Year AS 'TicketYear'
, wk.WeekNo AS 'TicketWeek'
, COUNT(t.TicketStatus) AS 'WeekTotal'
from Week_CTE wk
cross join (select distinct year(TicketQueuedDateTime) as [Year] from TicketTable) yr
left join TicketTable t on wk.WeekNo = DATEPART(WEEK, t.TicketQueuedDateTime) and yr.Year = YEAR(t.TicketQueuedDateTime)
group by yr.Year, wk.WeekNo
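The gap-filling steps (generate the weeks first, then attach counts, defaulting to 0) can also be sketched outside the database. The following is only an illustration in Python, not T-SQL: the function name is hypothetical, and it uses ISO week numbers from `date.isocalendar()`, which is an assumption and does not always agree with `DATEPART(week, ...)` under `SET DATEFIRST 1`.

```python
from collections import Counter
from datetime import date, timedelta


def last_n_week_totals(ticket_dates, today, n=5):
    """Ticket counts per (ISO year, ISO week) for the last n weeks, empty weeks included."""
    counts = Counter(d.isocalendar()[:2] for d in ticket_dates)
    # Generate the weeks independently of the data, oldest first.
    weeks = [(today - timedelta(weeks=k)).isocalendar()[:2] for k in range(n - 1, -1, -1)]
    # "Left join": every generated week appears, with 0 when no tickets exist.
    return [(year, week, counts.get((year, week), 0)) for year, week in weeks]


# Two tickets in week 29 and one in week 32 of 2018; "today" falls in week 33.
tickets = [date(2018, 7, 17), date(2018, 7, 18), date(2018, 8, 7)]
for row in last_n_week_totals(tickets, today=date(2018, 8, 15)):
    print(row)
```

Because the week list is built from the reference date rather than from the ticket rows, weeks 30, 31 and 33 show up with a total of 0.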
You could generate such a table in a number of ways. If you don't already have a tally table in your database (i.e. a table with sequential integers in it), I'd suggest creating one, as their usefulness is endless. Regardless, you can create one on the fly using row_number(). Then just subtract the integer value you generated from the current date in weeks, selecting the top 52 of em. Strip out the year and week, and you my friend, have got yourself the query to populate your join table.
-- Creating a numbers table
if object_id('tempdb.dbo.#Numbers') is not null drop table #Numbers
create table #Numbers
(
num int primary key clustered
)
-- Populating it with some numbers
insert into #Numbers (num)
select row_number() over (order by (select null)) - 1
from sys.all_objects
select top 52
WeeksAgo = num,
TicketYear = year(dateadd(week, -num, getdate())),
TicketWeek = datepart(week, dateadd(week, -num, getdate()))
from #Numbers
order by num -- without an ORDER BY, TOP 52 may return arbitrary rows
I reused @Xedni's query and came up with the query below:
if object_id('tempdb.dbo.#Numbers') is not null drop table #Numbers
create table #Numbers
(
num int primary key clustered
)
-- Populating it with some numbers
insert into #Numbers (num)
select row_number() over (order by (select null)) - 1
from sys.all_objects
select TicketYear = year(dateadd(week, -num, getdate())),
TicketWeek = datepart(week, dateadd(week, -num, getdate()))
from #Numbers
SELECT TOP 5 * FROM
(SELECT TOP 5
-- take year and week from the numbers table so that weeks without tickets still produce a row
year(dateadd(week, -n.num, getdate())) AS 'TicketYear',
datepart(week, dateadd(week, -n.num, getdate())) AS 'TicketWeek',
COUNT(t.TicketStatus) AS 'WeekTotal'
FROM #Numbers as n
LEFT OUTER JOIN TicketTable as t ON year(dateadd(week, -n.num, getdate())) = DATEPART(year, t.TicketQueuedDateTime) AND datepart(week, dateadd(week, -n.num, getdate())) = DATEPART(week, t.TicketQueuedDateTime)
WHERE n.num < 5 -- current week plus the four before it
GROUP BY n.num
ORDER BY TicketYear DESC, TicketWeek DESC) val
ORDER BY val.TicketYear, val.TicketWeek
PS: I was not able to test this and if you're looking for performance, this is probably not the best query to use. But try this out, if it works for you, we can work on improving the performance.
Cheers!

How do you find date intervals across rows when there are more than 2 rows?

I'm calculating patient readmission rates and need to find what patients have readmitted within a certain interval, and how often. I have admit data that looks like:
Subscriber_id New_Admission_date
01 2016-06-02
02 2016-06-01
03 2016-06-10
04 2016-06-08
02 2016-06-04
02 2016-06-30
03 2016-06-28
To find what patients have readmitted within 14 days and what the interval between admits was, I have this code:
select ra.Subscriber_id, DATEDIFF(d,ra.first_ad,ra.last_ad) as interval
from
(
select j.Subscriber_ID,
min(j.New_admission_date) as first_ad,
max (j.New_Admission_Date) as last_ad
from June_inpatients as j
inner join
(select j.Subscriber_ID, count(Subscriber_ID) as total
from June_inpatients as j
group by Subscriber_ID
having count(Subscriber_ID) >1 ) as r
on j.Subscriber_ID = r.Subscriber_ID
group by j.Subscriber_ID
) as ra
where DATEDIFF(d,ra.first_ad,ra.last_ad) < 15
The problem is that some patients, like patient ID 02 in the example data, have more than 2 admits. My code misses any intermediary admits since it's using min() and max(). How would I find the interval between a patient's first admit and second admit when there are three admits, and then find the interval between the second admit and the third?
Assuming you're using at least SQL Server 2012, you can use the LAG function.
The idea with LAG/LEAD is that we can query data from the previous/next rows returned.
In my full example below I use LAG twice, once on subscriber and once on the date. Ordering by the subscriber and date guarantees that the previous/next rows will be in the correct order. I then limit my where clause to ensure:
- that the previous row is for the same subscriber
- that the dates are less than 15 days apart
DECLARE @tbl TABLE (
pkey INT NOT NULL PRIMARY KEY IDENTITY,
subscriber INT NOT NULL,
dt DATETIME NOT NULL
);
INSERT INTO @tbl
( subscriber, dt )
VALUES
( 1, '2016-06-02'),
( 2, '2016-06-01'),
(3, '2016-06-10'),
(4, '2016-06-08'),
(2, '2016-06-04'),
(2, '2016-06-30'),
(3, '2016-06-28');
SELECT *
FROM @tbl
ORDER BY subscriber, dt
; WITH tmp AS (
SELECT subscriber, dt,
LAG(subscriber) OVER (ORDER BY subscriber, dt) previousSubscriber,
LAG(dt) OVER (ORDER BY subscriber, dt) previousDt
FROM @tbl
--ORDER BY subscriber, dt
)
SELECT tmp.*, DATEDIFF(DAY, previousDt, dt)
FROM tmp
WHERE tmp.subscriber = previousSubscriber
AND DATEDIFF(DAY, previousDt, dt) < 15
If you are using SQL Server 2012 or later, try this:
;WITH
cte As
(
SELECT Subscriber_id,
LAG(New_Admission_Date, 1) OVER (PARTITION BY Subscriber_id ORDER BY New_Admission_Date) AS PreviousAdmissionDate,
New_Admission_Date
FROM AdmissionTable
)
SELECT *
FROM cte
WHERE DATEDIFF(DAY, PreviousAdmissionDate, New_Admission_Date) <= 14
This will work without the LAG function.
;WITH J
AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY J1.Subscriber_id ORDER BY J1.New_Admission_date ) ROW_ID ,*
FROM June_inpatients J1
)
SELECT J1.Subscriber_id, J1.New_Admission_date Previous_Admission_date ,
J2.Subscriber_id, J2.New_Admission_date , DATEDIFF(DD,J1.New_Admission_date,J2.New_Admission_date) Interval
FROM J J1
INNER JOIN J J2 ON J1.Subscriber_id = J2.Subscriber_id AND J1.ROW_ID = J2.ROW_ID -1
WHERE DATEDIFF(DD,J1.New_Admission_date,J2.New_Admission_date)<15
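All three answers do the same thing: order each subscriber's admits by date and compare every admit with the one directly before it. As a hedged illustration in Python (not T-SQL; the function name and tuple layout are invented for this sketch), pairing each date with its predecessor stands in for LAG:

```python
from collections import defaultdict
from datetime import date


def readmissions_within(admits, max_days=14):
    """Return (subscriber, previous_admit, admit, interval_days) for consecutive
    admits of the same subscriber that are at most max_days apart."""
    by_subscriber = defaultdict(list)
    for subscriber, admit_date in admits:
        by_subscriber[subscriber].append(admit_date)

    result = []
    for subscriber in sorted(by_subscriber):
        dates = sorted(by_subscriber[subscriber])
        # zip(dates, dates[1:]) yields (previous, current) pairs, like LAG does.
        for previous, current in zip(dates, dates[1:]):
            interval = (current - previous).days
            if interval <= max_days:
                result.append((subscriber, previous, current, interval))
    return result


# Sample data from the question.
admits = [
    ("01", date(2016, 6, 2)), ("02", date(2016, 6, 1)), ("03", date(2016, 6, 10)),
    ("04", date(2016, 6, 8)), ("02", date(2016, 6, 4)), ("02", date(2016, 6, 30)),
    ("03", date(2016, 6, 28)),
]
print(readmissions_within(admits))
```

With the sample data, only subscriber 02 qualifies: the admits on 2016-06-01 and 2016-06-04 are 3 days apart, while 2016-06-04 to 2016-06-30 is 26 days and falls outside the window.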

trying to find the maximum number of occurrences over time T-SQL

I have data recording the StartDateTime and EndDateTime (both DATETIME2) of a process for all of the year 2013.
My task is to find the maximum number of times the process was running simultaneously at any specific time throughout the year.
I have written some code to check every minute/second how many processes were running at that time, but this takes a very long time, and it would be impossible to let it run for the whole year.
Here is the code (in this case it checks every minute of 24/10/2013):
CREATE TABLE dbo.#Hit
(
ID INT IDENTITY (1,1) PRIMARY KEY,
Moment DATETIME2,
COUNT INT
)
DECLARE @moment DATETIME2
SET @moment = '2013-10-24 00:00:00'
WHILE @moment < '2013-10-25'
BEGIN
INSERT INTO #Hit ( Moment, COUNT )
SELECT @moment, COUNT(*)
FROM dbo.tblProcessTimeLog
WHERE ProcessFK IN (25)
AND @moment BETWEEN StartDateTime AND EndDateTime
AND DelInd = 0
PRINT @moment
SET @moment = DATEADD(MINute,1,@moment)
END
SELECT * FROM #Hit
ORDER BY COUNT DESC
Can anyone think how I could get a similar result (I just need the maximum number of processes running at any given time), but for the whole year?
Thanks
DECLARE @d DATETIME = '20130101'; -- the first day of the year you care about
;WITH m(m) AS
( -- all the minutes in a day
SELECT TOP (1440) ROW_NUMBER() OVER (ORDER BY number) - 1
FROM master..spt_values
),
d(d) AS
( -- all the days in *that* year (accounts for leap years vs. hard-coding 365)
SELECT TOP (DATEDIFF(DAY, @d, DATEADD(YEAR, 1, @d))) DATEADD(DAY, number, @d)
FROM master..spt_values WHERE type = N'P' ORDER BY number
),
x AS
( -- all the minutes in *that* year
SELECT moment = DATEADD(MINUTE, m.m, d.d) FROM m CROSS JOIN d
)
SELECT TOP (1) WITH TIES -- in case more than one at the top
x.moment, [COUNT] = COUNT(l.ProcessFK)
FROM x
INNER JOIN dbo.tblProcessTimeLog AS l
ON x.moment >= l.StartDateTime
AND x.moment <= l.EndDateTime
WHERE l.ProcessFK = 25 AND l.DelInd = 0
GROUP BY x.moment
ORDER BY [COUNT] DESC;
See this post for why I don't think you should use BETWEEN for range queries, even in cases where it does semantically do what you want.
Create a table T whose rows represent time segments. This table could well be a temporary table (depending on your case). Say:
row 1 - [from=00:00:00, to=00:00:01)
row 2 - [from=00:00:01, to=00:00:02)
row 3 - [from=00:00:02, to=00:00:03)
and so on.
Then just join from your main table (tblProcessTimeLog, I think) to this table based on the datetime values recorded in tblProcessTimeLog.
A year has just over half a million minutes, so that is not many rows to store in T.
I recently pulled some code from SO trying to solve the 'gaps and islands' problem, and the algorithm for that should help you solve your problem.
The idea is that you want to find the point in time that has the most started processes, much like figuring out the deepest nesting of parenthesis in an expression:
( ( ( ) ( ( ( (deepest here, 6)))))
This sql will produce this result for you (I included a temp table with sample data):
/*
CREATE TABLE #tblProcessTimeLog
(
StartDateTime DATETIME2,
EndDateTime DATETIME2
)
-- delete from #tblProcessTimeLog
INSERT INTO #tblProcessTimeLog (StartDateTime, EndDateTime)
Values ('1/1/2012', '1/6/2012'),
('1/2/2012', '1/6/2012'),
('1/3/2012', '1/6/2012'),
('1/4/2012', '1/6/2012'),
('1/5/2012', '1/7/2012'),
('1/6/2012', '1/8/2012'),
('1/6/2012', '1/10/2012'),
('1/6/2012', '1/11/2012'),
('1/10/2012', '1/12/2012'),
('1/15/2012', '1/16/2012')
;
*/
with cteProcessGroups (EventDate, GroupId) as
(
select EVENT_DATE, (E.START_ORDINAL - E.OVERALL_ORDINAL) GROUP_ID
FROM
(
select EVENT_DATE, EVENT_TYPE,
MAX(START_ORDINAL) OVER (ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as START_ORDINAL,
ROW_NUMBER() OVER (ORDER BY EVENT_DATE, EVENT_TYPE) AS OVERALL_ORDINAL
from
(
Select StartDateTime AS EVENT_DATE, 1 as EVENT_TYPE, ROW_NUMBER() OVER (ORDER BY StartDateTime) as START_ORDINAL
from #tblProcessTimeLog
UNION ALL
select EndDateTime, 0 as EVENT_TYPE, NULL
FROM #tblProcessTimeLog
) RAWDATA
) E
)
select Max(EventDate) as EventDate, count(GroupId) as OpenProcesses
from cteProcessGroups
group by (GroupId)
order by COUNT(GroupId) desc
Results:
EventDate OpenProcesses
2012-01-05 00:00:00.0000000 5
2012-01-06 00:00:00.0000000 4
2012-01-15 00:00:00.0000000 2
2012-01-10 00:00:00.0000000 2
2012-01-08 00:00:00.0000000 1
2012-01-07 00:00:00.0000000 1
2012-01-11 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-16 00:00:00.0000000 1
Note that the 'in-between' rows don't give anything meaningful. Basically this output is only tuned to tell you when the most activity was. Looking at the other rows in the output, there wasn't just 1 process running on 1/8 (there were actually 3). But the way this code works is that by grouping the concurrent processes together, you can count the number of simultaneous processes. The date returned is when the maximum number of concurrent processes began. It doesn't tell you how long they were running, but you can solve that with an additional query. (Once you know the date on which the most were occurring, you can find the specific process IDs by using a BETWEEN condition on the date.)
Hope this helps.
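The nested-parentheses idea can be written as a single sweep over start and end events, which avoids checking every minute of the year. This is a minimal sketch in Python rather than T-SQL, and the function name is made up. It assumes, as the query above does by ordering EVENT_TYPE 0 before 1, that an end at a given instant is processed before a start at the same instant; the BETWEEN test in the question counts both endpoints instead, so swap the two type codes if you need that behaviour.

```python
from datetime import datetime


def max_concurrent(intervals):
    """Return (peak_count, moment_the_peak_began) for (start, end) pairs."""
    events = []
    for start, end in intervals:
        events.append((end, 0, -1))    # type 0: an end, sorts first at equal times
        events.append((start, 1, +1))  # type 1: a start
    events.sort()

    running = peak = 0
    peak_moment = None
    for moment, _, delta in events:
        running += delta               # current nesting depth
        if running > peak:
            peak, peak_moment = running, moment
    return peak, peak_moment


# Sample data from the commented-out INSERT above (month/day/2012).
sample = [
    (datetime(2012, 1, 1), datetime(2012, 1, 6)), (datetime(2012, 1, 2), datetime(2012, 1, 6)),
    (datetime(2012, 1, 3), datetime(2012, 1, 6)), (datetime(2012, 1, 4), datetime(2012, 1, 6)),
    (datetime(2012, 1, 5), datetime(2012, 1, 7)), (datetime(2012, 1, 6), datetime(2012, 1, 8)),
    (datetime(2012, 1, 6), datetime(2012, 1, 10)), (datetime(2012, 1, 6), datetime(2012, 1, 11)),
    (datetime(2012, 1, 10), datetime(2012, 1, 12)), (datetime(2012, 1, 15), datetime(2012, 1, 16)),
]
print(max_concurrent(sample))  # (5, datetime.datetime(2012, 1, 5, 0, 0))
```

The cost is dominated by sorting the 2n events, so a full year of log rows is handled in one pass regardless of the time resolution.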

SQL Query for a running aggregate of date value - 365 days with SUM and GROUP BY clauses involved

Setup: SQL Server 2012 with data being used in a line graph in SSRS using Report Builder 3.0
I have a table with sales data with the first transaction being on 10/10/2012. This table gets written to regularly updating the sales information and will continue to be written to into the future (current date being 11/26/13, which is why the sample table stops there).
Here's a sample of the original data I'm starting with:
StartDate Product PaidQty UnPaidQty
-------------------------------------------------------------
2012-10-10 Product A 100 150
2012-10-10 Product B 110 50
2012-10-10 Product C 10 100
2012-10-11 Product A 120 200
2012-10-11 Product D 140 230
2012-10-11 Product E 180 20
...
2013-05-01 Product H 120 60
2013-05-01 Product J 90 90
2013-05-01 Product K 120 160
...
2013-11-25 Product B 90 80
2013-11-25 Product F 190 180
2013-11-25 Product G 120 60
NOTE: There may not be values for every date. For instance there isn't a row for 12/25/12.
What I want to end with should look like this:
StartDate DailyTotal AnnualTotal
--------------------------------------------
2012-10-10 520 520 <-- The Sum of 10/10/12 through 10/10/12
2012-10-11 890 1410 <-- The Sum of 10/10/12 through 10/11/12
...
2013-05-01 640 278,000 <-- The Sum of 10/10/12 through 05/01/13
...
2013-11-25 720 450,500 <-- The Sum of 11/26/12 through 11/25/13
Getting the daily total column, combining PaidQty and UnPaidQty is simple enough:
SELECT StartDate, SUM(PaidQty) + SUM(UnPaidQty) AS Total
FROM Table
GROUP BY StartDate
ORDER BY StartDate
But, I need to end up with data that provides me a daily total (Paid + UnPaid) as well as a running total of the previous 365 days. So because no data exists prior to 10/10/2012, the running total will sum the totals from 10/10/2012 until 10/11/2013 at which point it will sum the totals of the StartDate value - 365 days. There is a large list of products, any of which can be sold on any day, but for my end result, I don't care about the products. This information comes into play with the use of the GROUP BY clause in order to return a single row for each date.
I'm just not grasping what I would need to do in order to add an additional column for a running annual total. I've tried using an OVER() clause, such as:
SELECT StartDate, SUM(PaidQty) + SUM(UnPaidQty) AS Total, SUM(PaidQty) + SUM(UnPaidQty) OVER (ORDER BY StartDate) AS AnnualTotal
FROM Table
GROUP BY StartDate
ORDER BY StartDate
But this errors out with the message that PaidQty and UnPaidQty need to be in an aggregate function or GROUP BY clause. If I add those to the GROUP BY clause, then I end up with multiple rows for each date again and the running value isn't correct.
EDIT: As suggested by Aaron's answer below, I've ended up with the following query:
IF OBJECT_ID('tempdb..#x') IS NOT NULL DROP TABLE #x
CREATE TABLE #x
(
StartDate DATETIME,
PaidQty INT,
UnPaidQty INT
)
INSERT INTO #x
SELECT StartDate, SUM(PaidQty), SUM(UnPaidQty)
FROM MyTable
GROUP BY StartDate
SELECT Date, PaidQty+UnPaidQty AS DailyTotal,
SUM(PaidQty+UnPaidQty) OVER (Order By Date ROWS 364 PRECEDING) AS AnnualTotal
FROM CalendarDates
LEFT JOIN #x ON Date = StartDate
WHERE Date BETWEEN '2012-10-10' AND GetDate()
ORDER BY Date
I created a table in my database called CalendarDates that simply contains a list of all dates from 1/1/2010 through 12/31/2100. This is used to fill in NULL for any dates that no sales were completed.
DECLARE @x TABLE(StartDate DATE, Product VARCHAR(30), PaidQty INT, UnPaidQty INT);
INSERT @x VALUES
('2012-10-10','Product A',100,150),
('2012-10-10','Product B',110,50 ),
('2012-10-10','Product C',10 ,100),
('2012-10-11','Product A',120,200),
('2012-10-11','Product D',140,230),
('2012-10-11','Product E',180,20 ),
('2012-10-12','Product B',90 ,80 ),
('2012-10-12','Product F',190,180),
('2012-10-12','Product G',120,60 );
;WITH x AS
(
SELECT StartDate, pq = SUM(PaidQty), uq = SUM(UnPaidQty)
FROM @x GROUP BY StartDate
)
SELECT StartDate, pq, uq,
-- current row plus the 364 rows before it = 365 rows
SUM(pq+uq) OVER (ORDER BY StartDate ROWS 364 PRECEDING)
FROM x;
Now, this assumes that you will have a row for every date. If not, you may want to generate a set of dates and use that as an anchor in a left join. Otherwise the previous 365 rows will represent more than 365 days. To do this:
DECLARE @minDate DATE, @maxDate DATE, @delta INT;
SELECT @maxDate = MAX(StartDate), @minDate = MIN(StartDate) FROM @x;
SET @delta = DATEDIFF(DAY, @minDate, @maxDate);
IF @delta > 364
SELECT @minDate = DATEADD(DAY, -364, @maxDate), @delta = 364;
;WITH n(n) AS
(
SELECT TOP (@delta+1) ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_columns
),
d(d) AS (SELECT DATEADD(DAY, n-1, @minDate) FROM n),
x AS
(
SELECT StartDate = d.d, pq = SUM(PaidQty), uq = SUM(UnPaidQty)
FROM d LEFT OUTER JOIN @x AS x
ON d.d = x.StartDate GROUP BY d.d
)
SELECT StartDate, pq, uq,
SUM(pq+uq) OVER (ORDER BY StartDate ROWS 364 PRECEDING)
FROM x
ORDER BY StartDate;
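To make the window arithmetic concrete, here is a small Python sketch (not T-SQL, and not the query above) of a trailing 365-day total computed over every calendar day, so that missing dates count as 0 instead of stretching the window. The function name and the `window_days` parameter are assumptions for illustration. A window of 365 days ending on a given day covers that day and the 364 days before it, so the value that drops out is the one from exactly 365 days earlier.

```python
from collections import defaultdict
from datetime import date, timedelta


def rolling_totals(rows, window_days=365):
    """rows: iterable of (start_date, paid_qty, unpaid_qty).
    Returns [(day, daily_total, trailing_total)] for every day from first to last date."""
    daily = defaultdict(int)
    for start_date, paid, unpaid in rows:
        daily[start_date] += paid + unpaid   # collapse products into one total per day

    first, last = min(daily), max(daily)
    result = []
    running = 0
    day = first
    while day <= last:
        running += daily.get(day, 0)
        # The day that just left the window [day - (window_days - 1), day].
        running -= daily.get(day - timedelta(days=window_days), 0)
        result.append((day, daily.get(day, 0), running))
        day += timedelta(days=1)
    return result


# First two days of the sample data from the question.
rows = [
    (date(2012, 10, 10), 100, 150), (date(2012, 10, 10), 110, 50), (date(2012, 10, 10), 10, 100),
    (date(2012, 10, 11), 120, 200), (date(2012, 10, 11), 140, 230), (date(2012, 10, 11), 180, 20),
]
for row in rolling_totals(rows):
    print(row)
```

For these rows the daily totals are 520 and 890, and the trailing totals are 520 and 1410, matching the first two lines of the expected output in the question.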

Select repeat occurrences within time period <x days

Suppose I had a large table (100,000+ entries) of service records or perhaps admission records. How would I find all the instances of re-occurrence within a set number of days?
The table setup could be something like this likely with more columns.
Record ID  Customer ID  Start Date Time   Finish Date Time
1          123456       24/04/2010 16:49  25/04/2010 13:37
3          654321       02/05/2010 12:45  03/05/2010 18:48
4          764352       24/03/2010 21:36  29/03/2010 14:24
9          123456       28/04/2010 13:49  31/04/2010 09:45
10         836472       19/03/2010 19:05  20/03/2010 14:48
11         123456       05/05/2010 11:26  06/05/2010 16:23
What I am trying to do is work out a way to select the records where there is a re-occurrence of the field [Customer ID] within a certain time period (< X days), where the time period is the Start Date Time of the 2nd occurrence minus the Finish Date Time of the first occurrence.
This is what I would like it to look like once it was run for say x=7
Record ID  Customer ID  Start Date Time   Finish Date Time  Re-occurrence
9          123456       28/04/2010 13:49  31/04/2010 09:45  1
11         123456       05/05/2010 11:26  06/05/2010 16:23  2
I can solve this problem with a smaller set of records in Excel but have struggled to come up with a SQL solution in MS Access. I do have some SQL queries that I have tried but I am not sure I am on the right track.
Any advice would be appreciated.
I think this is a clear expression of what you want. It's not extremely high performance but I'm not sure that you can avoid either correlated sub-query or a cartesian JOIN of the table to itself to solve this problem. It is standard SQL and should work in most any engine, although the details of the date math may differ:
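SELECT * FROM YourTable YT1 WHERE EXISTS
(SELECT * FROM YourTable YT2 WHERE
YT2.CustomerID = YT1.CustomerID AND YT2.RecordID <> YT1.RecordID -- an earlier record of the same customer
AND YT1.StartTime >= YT2.FinishTime AND YT1.StartTime < YT2.FinishTime + 7)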
In order to accomplish this you would need to make a self join as you are comparing the entire table to itself. Assuming similar names it would look something like this:
select r1.customer_id, min(r1.start_time), max(r1.finish_time), count(1) as reoccurrences
from records r1,
records r2
where r1.record_id > r2.record_id -- this ensures you don't double count the records
and r1.customer_id = r2.customer_id
and r1.start_time - r2.finish_time <= 7 -- later start minus earlier finish
group by r1.customer_id
You wouldn't be able to easily get both the record_id and the number of occurrences, but you could go back and find it by correlating the start time to the record number with that customer_id and start_time.
This will do it:
declare @t table(Record_ID int, Customer_ID int, StartDateTime datetime, FinishDateTime datetime)
insert @t values(1 ,123456,'2010-04-24 16:49','2010-04-25 13:37')
insert @t values(3 ,654321,'2010-05-02 12:45','2010-05-03 18:48')
insert @t values(4 ,764352,'2010-03-24 21:36','2010-03-29 14:24')
insert @t values(9 ,123456,'2010-04-28 13:49','2010-04-30 09:45')
insert @t values(10,836472,'2010-03-19 19:05','2010-03-20 14:48')
insert @t values(11,123456,'2010-05-05 11:26','2010-05-06 16:23')
declare @days int
set @days = 7
;with a as (
select record_id, customer_id, startdatetime, finishdatetime,
rn = row_number() over (partition by customer_id order by startdatetime asc)
from @t),
b as (
select record_id, customer_id, startdatetime, finishdatetime, rn, 0 recurrence
from a
where rn = 1
union all
select a.record_id, a.customer_id, a.startdatetime, a.finishdatetime,
a.rn, case when a.startdatetime - @days < b.finishdatetime then recurrence + 1 else 0 end
from b join a
on b.rn = a.rn - 1 and b.customer_id = a.customer_id
)
select record_id, customer_id, startdatetime, recurrence from b
where recurrence > 0
Result:
https://data.stackexchange.com/stackoverflow/q/112808/
I just realized it should be done in Access. I am so sorry, this was written for SQL Server 2005. I don't know how to rewrite it for Access.
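For anyone who cannot use a recursive CTE, the same counting logic is easy to express procedurally. The sketch below is Python, not Access SQL, and only mirrors the rule used in the query above: sort each customer's records by start time, and increase a counter whenever a record starts less than X days after the previous one finished, otherwise reset it. The function name and tuple layout are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def find_recurrences(records, max_days=7):
    """records: iterable of (record_id, customer_id, start, finish).
    Returns [(record_id, customer_id, recurrence_no)] sorted by record_id."""
    by_customer = defaultdict(list)
    for rec in records:
        by_customer[rec[1]].append(rec)

    result = []
    for customer_id, visits in by_customer.items():
        visits.sort(key=lambda r: r[2])              # order by start time
        recurrence = 0
        for previous, current in zip(visits, visits[1:]):
            if current[2] - previous[3] < timedelta(days=max_days):
                recurrence += 1                      # chain continues
                result.append((current[0], customer_id, recurrence))
            else:
                recurrence = 0                       # gap too long: reset the counter
    return sorted(result)


# Sample rows from the T-SQL test data above.
records = [
    (1, 123456, datetime(2010, 4, 24, 16, 49), datetime(2010, 4, 25, 13, 37)),
    (3, 654321, datetime(2010, 5, 2, 12, 45), datetime(2010, 5, 3, 18, 48)),
    (4, 764352, datetime(2010, 3, 24, 21, 36), datetime(2010, 3, 29, 14, 24)),
    (9, 123456, datetime(2010, 4, 28, 13, 49), datetime(2010, 4, 30, 9, 45)),
    (10, 836472, datetime(2010, 3, 19, 19, 5), datetime(2010, 3, 20, 14, 48)),
    (11, 123456, datetime(2010, 5, 5, 11, 26), datetime(2010, 5, 6, 16, 23)),
]
print(find_recurrences(records))  # [(9, 123456, 1), (11, 123456, 2)]
```

Records 9 and 11 are reported with re-occurrence numbers 1 and 2, which is the output the question asks for with x = 7.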