Trying to find the maximum number of occurrences over time (T-SQL)

I have data recording the StartDateTime and EndDateTime (both DATETIME2) of a process for all of the year 2013.
My task is to find the maximum number of instances of the process that were running at the same time at any point during the year.
I have written some code that checks, for every minute/second, how many processes were running at that moment, but it takes a very long time and would be impractical to run for the whole year.
Here is the code (in this case it checks every minute of 24/10/2013):
CREATE TABLE dbo.#Hit
(
ID INT IDENTITY (1,1) PRIMARY KEY,
Moment DATETIME2,
COUNT INT
)
DECLARE @moment DATETIME2
SET @moment = '2013-10-24 00:00:00'
WHILE @moment < '2013-10-25'
BEGIN
INSERT INTO #Hit ( Moment, COUNT )
SELECT @moment, COUNT(*)
FROM dbo.tblProcessTimeLog
WHERE ProcessFK IN (25)
AND @moment BETWEEN StartDateTime AND EndDateTime
AND DelInd = 0
PRINT @moment
SET @moment = DATEADD(MINute,1,@moment)
END
SELECT * FROM #Hit
ORDER BY COUNT DESC
Can anyone think of how I could get a similar result (I just need the maximum number of processes running at any given time), but for the whole year?
Thanks

DECLARE @d DATETIME = '20130101'; -- the first day of the year you care about
;WITH m(m) AS
( -- all the minutes in a day
SELECT TOP (1440) ROW_NUMBER() OVER (ORDER BY number) - 1
FROM master..spt_values
),
d(d) AS
( -- all the days in *that* year (accounts for leap years vs. hard-coding 365)
SELECT TOP (DATEDIFF(DAY, @d, DATEADD(YEAR, 1, @d))) DATEADD(DAY, number, @d)
FROM master..spt_values WHERE type = N'P' ORDER BY number
),
x AS
( -- all the minutes in *that* year
SELECT moment = DATEADD(MINUTE, m.m, d.d) FROM m CROSS JOIN d
)
SELECT TOP (1) WITH TIES -- in case more than one at the top
x.moment, [COUNT] = COUNT(l.ProcessFK)
FROM x
INNER JOIN dbo.tblProcessTimeLog AS l
ON x.moment >= l.StartDateTime
AND x.moment <= l.EndDateTime
WHERE l.ProcessFK = 25 AND l.DelInd = 0
GROUP BY x.moment
ORDER BY [COUNT] DESC;
See this post for why I don't think you should use BETWEEN for range queries, even in cases where it does semantically do what you want.

Create a table T whose rows represent some time segments.
This table could well be a temporary table (depending on your case).
Say:
row 1 - [from=00:00:00, to=00:00:01)
row 2 - [from=00:00:01, to=00:00:02)
row 3 - [from=00:00:02, to=00:00:03)
and so on.
Then just join from your main table
(tblProcessTimeLog, I think) to this table
based on the datetime values recorded in
tblProcessTimeLog.
A year has only about half a million minutes (525,600),
so that is not many rows to store in T.
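A minimal sketch of that idea, assuming SQL Server, one-minute segments, and the table and column names from the question. The temp table name #T, its columns SegFrom and SegTo, and the use of sys.all_objects as a row source are illustrative choices, not part of the answer above. Note that it counts every process that is active at any point inside a segment, so with coarse segments the figure is an upper bound on true simultaneity.

```sql
DECLARE @from DATETIME2 = '20130101', @to DATETIME2 = '20140101';

-- One row per half-open segment [SegFrom, SegTo).
CREATE TABLE #T
(
    SegFrom DATETIME2 NOT NULL PRIMARY KEY,
    SegTo   DATETIME2 NOT NULL
);

-- Generate 0 .. (minutes in the range - 1); the cross join is only a cheap row source.
WITH nums(n) AS
(
    SELECT TOP (DATEDIFF(MINUTE, @from, @to))
           CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS INT)
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO #T (SegFrom, SegTo)
SELECT DATEADD(MINUTE, n, @from), DATEADD(MINUTE, n + 1, @from)
FROM nums;

-- A process touches a segment if it starts before the segment ends
-- and ends at or after the segment starts.
SELECT TOP (1) WITH TIES t.SegFrom, COUNT(*) AS Running
FROM #T AS t
INNER JOIN dbo.tblProcessTimeLog AS l
    ON l.StartDateTime < t.SegTo
   AND l.EndDateTime >= t.SegFrom
WHERE l.ProcessFK = 25 AND l.DelInd = 0
GROUP BY t.SegFrom
ORDER BY Running DESC;
```

Keeping #T as a permanent, indexed table would let the same segments be reused for other years or other processes.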

I recently pulled some code from SO while trying to solve the 'gaps and islands' problem, and the algorithm for that should help you solve your problem.
The idea is that you want to find the point in time that has the most started processes, much like figuring out the deepest nesting of parentheses in an expression:
( ( ( ) ( ( ( (deepest here, 6)))))
This sql will produce this result for you (I included a temp table with sample data):
/*
CREATE TABLE #tblProcessTimeLog
(
StartDateTime DATETIME2,
EndDateTime DATETIME2
)
-- delete from #tblProcessTimeLog
INSERT INTO #tblProcessTimeLog (StartDateTime, EndDateTime)
Values ('1/1/2012', '1/6/2012'),
('1/2/2012', '1/6/2012'),
('1/3/2012', '1/6/2012'),
('1/4/2012', '1/6/2012'),
('1/5/2012', '1/7/2012'),
('1/6/2012', '1/8/2012'),
('1/6/2012', '1/10/2012'),
('1/6/2012', '1/11/2012'),
('1/10/2012', '1/12/2012'),
('1/15/2012', '1/16/2012')
;
*/
with cteProcessGroups (EventDate, GroupId) as
(
select EVENT_DATE, (E.START_ORDINAL - E.OVERALL_ORDINAL) GROUP_ID
FROM
(
select EVENT_DATE, EVENT_TYPE,
MAX(START_ORDINAL) OVER (ORDER BY EVENT_DATE, EVENT_TYPE ROWS UNBOUNDED PRECEDING) as START_ORDINAL,
ROW_NUMBER() OVER (ORDER BY EVENT_DATE, EVENT_TYPE) AS OVERALL_ORDINAL
from
(
Select StartDateTime AS EVENT_DATE, 1 as EVENT_TYPE, ROW_NUMBER() OVER (ORDER BY StartDateTime) as START_ORDINAL
from #tblProcessTimeLog
UNION ALL
select EndDateTime, 0 as EVENT_TYPE, NULL
FROM #tblProcessTimeLog
) RAWDATA
) E
)
select Max(EventDate) as EventDate, count(GroupId) as OpenProcesses
from cteProcessGroups
group by (GroupId)
order by COUNT(GroupId) desc
Results:
EventDate OpenProcesses
2012-01-05 00:00:00.0000000 5
2012-01-06 00:00:00.0000000 4
2012-01-15 00:00:00.0000000 2
2012-01-10 00:00:00.0000000 2
2012-01-08 00:00:00.0000000 1
2012-01-07 00:00:00.0000000 1
2012-01-11 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-06 00:00:00.0000000 1
2012-01-16 00:00:00.0000000 1
Note that the 'in-between' rows don't give anything meaningful. Basically, this output is only tuned to tell you when the most activity was. Looking at the other rows in the output, there wasn't just 1 process running on 1/8 (there were actually 3). The way this code works is that, by putting concurrent processes together in a group, you can count the number of simultaneous processes. The date returned is when the maximum number of concurrent processes began. It doesn't tell you how long they ran for, but you can solve that with an additional query. (Once you know the date on which the most were occurring, you can find the specific process IDs by using a BETWEEN predicate on the date.)
Hope this helps.
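The nesting-depth idea can also be sketched as a running sum: every start counts +1, every end counts -1, and the largest running total is the peak concurrency. This is an alternative formulation, not the query above, and it assumes SQL Server 2012 or later (for the ordered window SUM) plus the sample #tblProcessTimeLog table from this answer. The names EventDate, Delta and OpenProcesses are illustrative.

```sql
WITH events AS
(
    -- +1 when a process starts, -1 when it ends
    SELECT StartDateTime AS EventDate, 1 AS Delta FROM #tblProcessTimeLog
    UNION ALL
    SELECT EndDateTime, -1 FROM #tblProcessTimeLog
),
running AS
(
    SELECT EventDate,
           -- Ordering by Delta puts ends before starts at the same instant,
           -- i.e. intervals are treated as half-open [start, end).
           SUM(Delta) OVER (ORDER BY EventDate, Delta
                            ROWS UNBOUNDED PRECEDING) AS OpenProcesses
    FROM events
)
SELECT TOP (1) WITH TIES EventDate, OpenProcesses
FROM running
ORDER BY OpenProcesses DESC;
```

On the sample data this should agree with the top row of the results above (5 open processes starting 2012-01-05). If a process ending at exactly the moment another starts should count as overlapping, order by Delta DESC instead.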

Related

SQL Server - Split year into 4 weekly periods

I would like to split up the year into 13 periods with 4 weeks in each
52 weeks a year / 4 = 13 even periods
I would like each period to start on a saturday and end on a friday.
It should look like the below image
Obviously I could do this manually, but the dates would change each year and I am looking for a way to automate this with SQL rather than manually do this for each upcoming year
Is there a way to produce this yearly split automatically?
In this previous answer I show an approach to create a numbers/date table. Such a table is very handy in many places.
With this approach you might try something like this:
CREATE TABLE dbo.RunningNumbers(Number INT NOT NULL,CalendarDate DATE NOT NULL, CalendarYear INT NOT NULL,CalendarMonth INT NOT NULL,CalendarDay INT NOT NULL, CalendarWeek INT NOT NULL, CalendarYearDay INT NOT NULL, CalendarWeekDay INT NOT NULL);
DECLARE @CountEntries INT = 100000;
DECLARE @StartNumber INT = 0;
WITH E1(N) AS(SELECT 1 FROM(VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(N)), --10 ^ 1
E2(N) AS(SELECT 1 FROM E1 a CROSS JOIN E1 b), -- 10 ^ 2 = 100 rows
E4(N) AS(SELECT 1 FROM E2 a CROSS JOIN E2 b), -- 10 ^ 4 = 10,000 rows
E8(N) AS(SELECT 1 FROM E4 a CROSS JOIN E4 b), -- 10 ^ 8 = 100,000,000 rows
CteTally AS
(
SELECT TOP(ISNULL(@CountEntries,1000000)) ROW_NUMBER() OVER(ORDER BY(SELECT NULL)) -1 + ISNULL(@StartNumber,0) As Nmbr
FROM E8
)
INSERT INTO dbo.RunningNumbers
SELECT CteTally.Nmbr,CalendarDate.d,CalendarExt.*
FROM CteTally
CROSS APPLY
(
SELECT DATEADD(DAY,CteTally.Nmbr,{ts'1900-01-01 00:00:00'})
) AS CalendarDate(d)
CROSS APPLY
(
SELECT YEAR(CalendarDate.d) AS CalendarYear
,MONTH(CalendarDate.d) AS CalendarMonth
,DAY(CalendarDate.d) AS CalendarDay
,DATEPART(WEEK,CalendarDate.d) AS CalendarWeek
,DATEPART(DAYOFYEAR,CalendarDate.d) AS CalendarYearDay
,DATEPART(WEEKDAY,CalendarDate.d) AS CalendarWeekDay
) AS CalendarExt;
GO
NTILE - SQL Server 2008+ will create (almost) even chunks.
This is the actual query:
SELECT *,NTILE(13) OVER(ORDER BY CalendarDate) AS Periode
FROM RunningNumbers
WHERE CalendarWeekDay=6
AND CalendarDate>={d'2017-01-01'} AND CalendarDate <= {d'2017-12-31'};
GO
--Careful with existing data!
--DROP TABLE dbo.RunningNumbers;
Hint 1: Place indexes!
Hint 2: Read the link about NTILE, especially the Remark-section.
I think this will fit this case. You might think about using Prdp's approach with ROW_NUMBER() in connection with INT division. But (big advantage!) NTILE would allow PARTITION BY CalendarYear.
Hint 3: You might add a column to the table
...where you set the period's number as a fix value. This will make future queries very easy and would allow manual correction on special cases (53rd week..)
Here is one way using Calendar table
DECLARE @start DATE = '2017-04-01',
@end_date DATE = '2017-12-31'
SET DATEFIRST 7;
WITH Calendar
AS (SELECT 1 AS id,
@start AS start_date,
Dateadd(dd, 6, @start) AS end_date
UNION ALL
SELECT id + 1,
Dateadd(week, 1, start_date),
Dateadd(week, 1, end_date)
FROM Calendar
WHERE end_date < @end_date)
SELECT id,
( Row_number()OVER(ORDER BY id) - 1 ) / 4 + 1 AS Period,
start_date,
end_date
FROM Calendar
OPTION (maxrecursion 0)
I have generated the dates using a recursive CTE, but it is better to create a physical calendar table and use it in queries like this.
Firstly, you will never get exactly 52 even weeks in a year; there are overlap weeks in most calendar standards, and you will occasionally get a week 53.
You can tell SQL to use Saturday as the first day of the week with datefirst, then running a datepart on today's date with getdate() will tell you the week of the year:
SET datefirst 6 -- 6 is Saturday
SELECT datepart(ww,getdate()) as currentWeek
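You could then divide this by 4 and apply CEILING to get the 4-week split. Divide by 4.0 rather than 4, because integer division would truncate the result before CEILING sees it:
SET datefirst 6
SELECT DATEPART(ww,getdate()) as currentWeek,
CEILING(DATEPART(ww,getdate())/4.0) as four_week_split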
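As a self-contained sketch of the 13-period split itself, the query below lists thirteen 28-day periods for one year without relying on DATEFIRST or on a calendar table. It assumes SQL Server 2012 or later (for DATEFROMPARTS) and assumes that period 1 should begin on the first Saturday on or after 1 January, which the question does not state. The variable and column names are illustrative.

```sql
DECLARE @year INT = 2017;
DECLARE @jan1 DATE = DATEFROMPARTS(@year, 1, 1);

-- 1900-01-06 was a Saturday, so DATEDIFF(...) % 7 is the number of days
-- since the most recent Saturday, independent of the DATEFIRST setting.
DECLARE @firstSat DATE =
    DATEADD(DAY, (7 - DATEDIFF(DAY, '19000106', @jan1) % 7) % 7, @jan1);

SELECT p.n AS Period,
       DATEADD(DAY, (p.n - 1) * 28, @firstSat) AS PeriodStart, -- always a Saturday
       DATEADD(DAY, p.n * 28 - 1, @firstSat)   AS PeriodEnd    -- always a Friday
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13)) AS p(n)
ORDER BY p.n;
```

For 2017 this gives period 1 as 2017-01-07 to 2017-02-03 and period 13 as 2017-12-09 to 2018-01-05, so the last period spills into the next year. That is the "week 53" effect mentioned above, and how to handle it is a business decision.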

Group by contiguous dates and Count

I have a table which contains information about reports being accessed, along with the date. I need to group the accessed reports by date range and count them.
I'm using T-SQL
Table
EventId ReportId Date
60 4 11/24/2015
59 11 11/23/2015
58 6 11/22/2015
57 11 11/22/2015
56 9 11/21/2015
55 3 11/20/2015
54 5 11/20/2015
53 6 11/19/2015
52 5 11/19/2015
51 4 11/18/2015
50 3 11/17/2015
49 9 11/16/2015
If days' difference is 3 then I need result in the format
StartDate EndDate ReportsAccessed
11/22/2015 11/24/2015 4
11/19/2015 11/21/2015 5
11/16/2015 11/18/2015 3
but the difference between days could change.
Assuming you have values for all the dates, then you can calculate the difference in days between each date and the maximum (or minimum) date. Then divide this by three and use that for aggregation:
select min(date), max(date), count(*) as ReportsAccessed
from (select t.*, max(date) over () as maxd
from table t
) t
group by (datediff(day, date, maxd) / 3)
order by min(date);
"3" is what I think you are referring to as the "difference in days".
Those 2 blocks are simply for added clarity on what parameters you'd have to change
DECLARE @t as TABLE(
id int identity(1,1),
reportId int,
dateAccess date)
DECLARE @NumberOfDays int=3;
And here comes the actual select
Select StartDate, EndDate, COUNT(reportId) from
(
select *,
DATEADD(day, DATEDIFF(DAY, dateAccess, maxdate.maxdate)%@NumberOfDays, dateAccess) as EndDate,
DATEADD(day, DATEDIFF(DAY, dateAccess, maxdate.maxdate)%@NumberOfDays-@NumberOfDays+1, dateAccess) as StartDate
from @t, (select MAX(dateAccess) maxdate from @t t2) maxdate
) results
GROUP BY StartDate, EndDate
ORDER BY StartDate desc
There are a few places I'm unsure if it's optimized or not, for instance cross joining with select max(date) instead of using a subquery, but that returns the exact result from your OP.
Basically, I simply split the entries into groups based on how far they are from the MAX(date), and then use a COUNT. On that note, it might be more useful to use COUNT(distinct ...); otherwise, if someone looks at document #9 three times, it will tell you that 3 documents were checked when only 1 was truly looked at.
The upside with using MAX(date) over MIN(date) is that your first group will always have the maximal amount of days. This will prove very useful if you want to compare the last few periods to the average. The downside is that you don't have stable data. With every new entry (assuming it's a new day), your query will cycle itself to produce a new set of results. If you wanted to graph the data, you'd be better comparing to MIN(date) that way the first days won't change when you add a new one.
Depending on the usage, it could even be useful to extrapolate the number of accesses done in the last period (in that case MIN(date) is also preferable).
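A minimal sketch of the MIN(date)-anchored variant discussed above, using the sample rows from the question. The table variable, the @n parameter and the column aliases are illustrative, and COUNT(DISTINCT ...) is included only to show the difference mentioned earlier.

```sql
DECLARE @n INT = 3; -- window width in days
DECLARE @t TABLE (reportId INT, dateAccess DATE);

INSERT INTO @t (reportId, dateAccess) VALUES
(9, '20151116'), (3, '20151117'), (4, '20151118'),
(5, '20151119'), (6, '20151119'), (5, '20151120'), (3, '20151120'), (9, '20151121'),
(11, '20151122'), (6, '20151122'), (11, '20151123'), (4, '20151124');

SELECT DATEADD(DAY, b.bucket * @n, b.mind)          AS StartDate,
       DATEADD(DAY, b.bucket * @n + @n - 1, b.mind) AS EndDate,
       COUNT(*)                                     AS ReportsAccessed,
       COUNT(DISTINCT b.reportId)                   AS DistinctReports
FROM (
    -- bucket 0 starts at the earliest date, so existing buckets
    -- do not shift when newer rows arrive
    SELECT reportId,
           MIN(dateAccess) OVER () AS mind,
           DATEDIFF(DAY, MIN(dateAccess) OVER (), dateAccess) / @n AS bucket
    FROM @t
) AS b
GROUP BY b.bucket, b.mind
ORDER BY StartDate DESC;
```

Because the sample spans exactly nine days, this produces the same three windows as the expected output in the question (counts 4, 5 and 3); with other data the newest window may be only partly filled.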
Here's an adaptation of Gordon's answer that's probably much more optimized (it's at the very least much more aesthetic):
SELECT DateADD(day, -(datediff(day, dateAccess, maxdate)/3)*3, maxdate) as EndDate,
DateADD(day, -(datediff(day, dateAccess, maxdate)/3)*3-2, maxdate) as StartDate, -- EndDate minus 2 days (a 3-day window)
count(reportId)
from (select *, MAX(dateAccess) over() as maxdate from @t) t
GROUP BY datediff(day, dateAccess, maxdate)/3, maxdate
I would insist that the most efficient way of doing this is to use a tally table. That way you get sargable predicates, with all the benefits of indexes on the date column:
declare @c int = 3
;with minmax as(select min(date) as mind, max(date) as maxd from t),
tally as(select @c * (-1 + row_number() over(order by(select null))) as rn
from master..spt_values),
intervals as(select dateadd(dd, rn, mind) as f, dateadd(dd, rn + @c - 1, mind) t
from tally t cross join minmax m where dateadd(dd, rn, mind) <= maxd)
select i.f as [from], i.t as [to], count(*) as reports
from intervals i
join t on t.date >= i.f and t.date <= i.t
group by i.f, i.t
Explanation: minmax selects the minimum and maximum dates from the table.
tally generates the offsets 0, @c, 2*@c, and so on (how many depends on the system, but enough to calculate the intervals). intervals selects the resulting intervals. The last part is a simple join on intervals to calculate the counts per interval.
Fiddle http://sqlfiddle.com/#!3/c61d1/5

Finding most recent date based on consecutive dates

I have a table that lists absences (holidays) of all employees, and what we would like to find out is who is away today and the date on which they will return.
Unfortunately, absences aren't given IDs, so you can't just retrieve the max date from an absence ID if one of those dates is today.
However, absences are given an incrementing ID per day as they are input, so I need a query that will find the EmployeeID if there is an entry with today's date, then follow the incrementing AbsenceID column to find the max date of that absence.
Table Example (assuming today's date is 11/11/2014, UK format):
AbsenceID EmployeeID AbsenceDate
100 10 11/11/2014
101 10 12/11/2014
102 10 13/11/2014
103 10 14/11/2014
104 10 15/11/2014
107 21 11/11/2014
108 21 12/11/2014
120 05 11/11/2014
130 15 20/11/2014
140 10 01/03/2015
141 10 02/03/2015
142 10 03/03/2015
143 10 04/03/2015
So, from the above, we'd want the return dates to be:
EmployeeID ReturnDate
10 15/11/2014
21 12/11/2014
05 11/11/2014
Edit: note that the 140-143 range shouldn't be included in the results, as it lies in the future and none of the dates in that absence is today.
Presumably I need an iterative sub-function running on each entry with today's date where the employeeID matches.
So, based on what I believe you're asking, you want to return a list of the people who are off today and when they are expected back, based on the holidays that you have recorded in the system, which should work only on consecutive days.
SQL Fiddle Demo
Schema Setup:
CREATE TABLE EmployeeAbsence
([AbsenceID] int, [EmployeeID] int, [AbsenceDate] DATETIME)
;
INSERT INTO EmployeeAbsence
([AbsenceID], [EmployeeID], [AbsenceDate])
VALUES
(100, 10, '2014-11-11'),
(101, 10, '2014-11-12'),
(102, 10, '2014-11-13'),
(103, 10, '2014-11-14'),
(104, 10, '2014-11-15'),
(107, 21, '2014-11-11'),
(108, 21, '2014-11-12'),
(120, 05, '2014-11-11'),
(130, 15, '2014-11-20')
;
Recursive CTE to generate the output:
;WITH cte AS (
SELECT EmployeeID, AbsenceDate
FROM dbo.EmployeeAbsence
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
UNION ALL
SELECT e.EmployeeID, e.AbsenceDate
FROM cte
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
)
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Results:
| EMPLOYEEID | Return Date |
|------------|---------------------------------|
| 5 | November, 11 2014 00:00:00+0000 |
| 10 | November, 15 2014 00:00:00+0000 |
| 21 | November, 12 2014 00:00:00+0000 |
Explanation:
The first SELECT in the CTE gets employees that are off today with this filter:
WHERE AbsenceDate = CAST(GETDATE() AS DATE)
This result set is then UNIONED back to the EmployeeAbsence table with a join that matches EmployeeID as well as the AbsenceDate + 1 day to find the consecutive days recursively using:
-- add a day to the cte.AbsenceDate from the first SELECT
e.AbsenceDate = DATEADD(d,1,cte.AbsenceDate)
The final SELECT simply groups the cte results by employee with the MAX AbsenceDate that has been calculated per employee.
SELECT cte.EmployeeID, MAX(cte.AbsenceDate)
FROM cte
GROUP BY cte.EmployeeID
Excluding Weekends:
I've done a quick test based on your comment and the below modification to the INNER JOIN within the CTE should exclude weekends when adding the extra days if it detects that adding a day will result in a Saturday:
INNER JOIN dbo.EmployeeAbsence e ON e.EmployeeID = cte.EmployeeID
AND e.AbsenceDate = CASE WHEN datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7
THEN DATEADD(d,3,cte.AbsenceDate)
ELSE DATEADD(d,1,cte.AbsenceDate) END
So when you add a day: datepart(dw,DATEADD(d,1,cte.AbsenceDate)) = 7, if it results in Saturday (7), then you add 3 days instead of 1 to get Monday: DATEADD(d,3,cte.AbsenceDate).
You'd need to do a few things to get this data into a usable format. You need to be able to work out where a group begins and ends. This is difficult with this example because there is no straight forward grouping column.
So that we can calculate when a group starts and ends, you need to create a CTE containing all the columns and also use LAG() to get the AbsenceID and EmployeeID from the previous row for each row. In this CTE you should also use ROW_NUMBER() at the same time so that we have a way to re-order the rows into the same order again.
Something like:
WITH
[AbsenceStage] AS (
SELECT [AbsenceID], [EmployeeID], [AbsenceDate]
,[RN] = ROW_NUMBER() OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[AbsenceID_Prev] = LAG([AbsenceID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
,[EmployeeID_Prev] = LAG([EmployeeID]) OVER (ORDER BY [EmployeeID] ASC, [AbsenceDate] ASC, [AbsenceID] ASC)
FROM [HR_Absence]
)
Now that we have this we can compare each row to the previous to see if the current row is in a different "group" to the previous row.
The condition would be something like:
[EmployeeID_Prev] IS NULL -- We have a new group if the previous row is null
OR [EmployeeID_Prev] <> [EmployeeID] -- Or if the previous row is for a different employee
OR [AbsenceID_Prev] <> ([AbsenceID]-1) -- Or if the AbsenceID is not sequential
You can then use this to join the CTE to itself to find the first row in each group with something like:
....
FROM [AbsenceStage] AS [Row]
INNER JOIN [AbsenceStage] AS [First]
ON ([First].[RN] = (
-- Get the first row before ([RN] less than or equal to) this one where it is the start of a grouping
SELECT MAX([RN]) FROM [AbsenceStage]
WHERE [RN] <= [Row].[RN] AND (
[EmployeeID_Prev] IS NULL
OR [EmployeeID_Prev] <> [EmployeeID]
OR [AbsenceID_Prev] <> ([AbsenceID]-1)
)
))
...
You can then GROUP BY the [First].[RN] which will now act like a group id and allow you to get the start and end date of each absence group.
SELECT
[Row].[EmployeeID]
,MIN([Row].[AbsenceDate]) AS [Absence_Begin]
,MAX([Row].[AbsenceDate]) AS [Absence_End]
...
-- FROM and INNER JOIN from above
...
GROUP BY [First].[RN], [Row].[EmployeeID];
You could then put all that into a view giving you the EmployeeID with the start and end date of each absence. You can then easily pull out the employees currently off with a:
WHERE CAST(CURRENT_TIMESTAMP AS date) BETWEEN [Absence_Begin] AND [Absence_End]
SQL Fiddle
Like another answer here, I'm going to create the leave intervals, but via a different method. First the code:
declare @today date = getdate(); --use whatever date here
with g as (
select *, dateadd(day, -1 * row_number() over (partition by employeeid order by absencedate), AbsenceDate) as group_number
from employeeabsence
) , leave_intervals as (
select employeeid, min(absencedate) as [start], max(absencedate) as [end]
from g
group by EmployeeID, group_number
)
select employeeid, [start], [end]
from leave_intervals
where @today between [start] and [end]
By way of explanation, we first put a date value into a variable. I chose today, but this code will work for any date passed in. Next, we create a common table expression (CTE) that will add on a grouping column to your table. This is the meat of the solution, so it bears some treatment. Within a given interval, the AbsenceDate increases at a rate of one day per row. row_number() also increases at a rate of one per row. So, if we subtract a row_number() number of days from the AbsenceDate, we'll get another (arbitrary) date. The key here is to realize that that arbitrary date will be the same for every row in the interval, so we can use it to group by. From there, it's just a matter of doing just that; get the min and max per interval. Lastly, we find what intervals contain @today.
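To make the grouping trick concrete, here is a small sketch that shows only the intermediate group_number values for a few of employee 10's rows from the question. The table variable @a is illustrative, and the explicit CAST to INT is a defensive choice of mine rather than something the answer requires.

```sql
DECLARE @a TABLE (EmployeeID INT, AbsenceDate DATE);

INSERT INTO @a (EmployeeID, AbsenceDate) VALUES
(10, '20141111'), (10, '20141112'), (10, '20141113'), -- one run of 3 days
(10, '20150301'), (10, '20150302');                   -- a later run of 2 days

SELECT EmployeeID,
       AbsenceDate,
       ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS rn,
       -- date minus row number: constant within a run of consecutive days
       DATEADD(DAY,
               -CAST(ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY AbsenceDate) AS INT),
               AbsenceDate) AS group_number
FROM @a
ORDER BY AbsenceDate;
-- rn 1..3 all map to 2014-11-10; rn 4..5 both map to 2015-02-25
```

The group_number dates have no meaning of their own; they only need to be equal within a run and different between runs for the same employee.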

Modulo Time in SQL Server 2005 - Return data every n hours

I have something like this:
SELECt *
FROM (
SELECT prodid, date, time, tmp, rowid
FROM live_pilot_plant
WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
AND CONVERT(DATETIME, '3/31/2012', 101)
) b
WHERE b.rowid % 400 = 0
FYI: The reason for the CONVERT in the WHERE clause is that my date is stored as a varchar(10); I had to convert it to datetime in order to get the correct range of data. (I tried a bunch of different things and this worked.)
I'm wondering how I can return the data I want every 4 hours during those selected dates. I have data collected approximately every 5 seconds, with some breaks in the data; i.e., data wasn't collected during a 2-hour period, but then collection continues at 5-second increments.
In my example I just used a modulo on my rowid, and the syntax works. But as I mentioned above, there are some periods where data isn't collected, so logic like "data arrives every 5 seconds, so 4 hours corresponds to roughly 2,880 rows" won't work.
My time column is a varchar column and is in the form hh:mm:ss
My ideal output is:
| prodid | date | time | tmp |
| 4 | 3/19/2012 | 10:00:00 | 2.3 |
| 7 | 3/19/2012 | 14:00:24 | 3.2 |
As you can see I can be a bit off (in terms of seconds) - I more so need the approximate value in terms of time.
Thank you in advance.
This should work
select lpp.prodid, lpp.date, lpp.time, lpp.tmp, lpp.rowid -- qualified, because prodid exists in both lpp and filter
from live_pilot_plant as lpp
inner join (
select min(prodid) as prodid -- is prodid your PK?? if not change it to rowid or whatelse is your PK
from live_pilot_plant
WHERE date BETWEEN CONVERT(DATETIME, '3/19/2012', 101) -- or whatever you want
AND CONVERT(DATETIME, '3/31/2012', 101) -- for better performance it is on the inner select
group by date,
floor( -- floor makes the trick
convert(float,convert(datetime, time)) -- assumes "time" column is a varchar containing data like '19:23:05'
* 6 -- 6 comes form 24 hours / 4 hours
)
) as filter on lpp.prodid = filter.prodid -- if prodid is not the PK also correct here.
A side note for anyone else who has date + time data in a single datetime field, say one named "when_it_was": the group by can be as simple as:
group by floor(convert(float, when_it_was) * 6) -- again, 6 comes from 24/4; the convert is needed because datetime cannot be multiplied directly
Something along the lines of the following should work. Basically, create date + time partitions, each partition representing a block of 4 hours, and pick the first record (row number 1) from each partition:
select * from (
select *,
row_number() over (partition by date,cast(left( time, charindex( ':', time) - 1) as int) / 4 order by
date, time) as ranker from live_pilot_plant
) Z where ranker = 1
Assuming rowid is a PK and increases with date/time, just convert the time field to a 4-hour interval number with convert(int, substring(time,1,2))/4 and select MIN(rowid) from each of the 4-hour groups in a day:
select prodid, date, time, tmp, rowid from live_pilot_plant where rowid in
(
select min(rowid)
from live_pilot_plant
WHERE CONVERT(DATETIME, date, 101) BETWEEN CONVERT(DATETIME, '3/19/2012', 101)
AND CONVERT(DATETIME, '3/31/2012', 101)
group by date,convert(int,substring(time,1,2))/4
)
order by CONVERT(DATETIME, date, 101),time
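The bucketing expression used in these answers can be checked on its own with a few hand-picked times. This is only an illustration, written with UNION ALL so that it also runs on SQL Server 2005; the literal values and the column aliases are made up for the example.

```sql
SELECT s.t AS time_text,
       -- hour part divided by 4 with integer division: blocks 0..5 per day
       CONVERT(INT, SUBSTRING(s.t, 1, 2)) / 4 AS four_hour_block
FROM (
    SELECT '00:00:05' AS t
    UNION ALL SELECT '03:59:55'
    UNION ALL SELECT '10:00:00'
    UNION ALL SELECT '14:00:24'
    UNION ALL SELECT '23:59:59'
) AS s;
-- blocks: 0, 0, 2, 3, 5
```

Rows in the same block on the same date collapse to one representative, which is why gaps in data collection do not disturb the result the way a rowid modulo does.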

Select repeat occurrences within time period <x days

If I had a large table (100,000+ entries) holding service records or perhaps admission records, how would I find all the instances of re-occurrence within a set number of days?
The table setup could be something like this likely with more columns.
Record ID Customer ID Start Date Time Finish Date Time
1 123456 24/04/2010 16:49 25/04/2010 13:37
3 654321 02/05/2010 12:45 03/05/2010 18:48
4 764352 24/03/2010 21:36 29/03/2010 14:24
9 123456 28/04/2010 13:49 31/04/2010 09:45
10 836472 19/03/2010 19:05 20/03/2010 14:48
11 123456 05/05/2010 11:26 06/05/2010 16:23
What I am trying to do is work out a way to select the records where there is a re-occurrence of the field [Customer ID] within a certain time period (< X days), where the time period is the Start Date Time of the 2nd occurrence minus the Finish Date Time of the first occurrence.
This is what I would like it to look like once it was run for say x=7
Record ID Customer ID Start Date Time Finish Date Time Re-occurence
9 123456 28/04/2010 13:49 31/04/2010 09:45 1
11 123456 05/05/2010 11:26 06/05/2010 16:23 2
I can solve this problem with a smaller set of records in Excel but have struggled to come up with a SQL solution in MS Access. I do have some SQL queries that I have tried but I am not sure I am on the right track.
Any advice would be appreciated.
I think this is a clear expression of what you want. It's not extremely high performance but I'm not sure that you can avoid either correlated sub-query or a cartesian JOIN of the table to itself to solve this problem. It is standard SQL and should work in most any engine, although the details of the date math may differ:
SELECT * FROM YourTable YT1 WHERE EXISTS
(SELECT * FROM YourTable YT2 WHERE
YT2.CustomerID = YT1.CustomerID AND YT2.RecordID <> YT1.RecordID
AND YT1.StartTime >= YT2.FinishTime AND YT1.StartTime <= YT2.FinishTime + 7)
In order to accomplish this you would need to make a self join as you are comparing the entire table to itself. Assuming similar names it would look something like this:
select r1.customer_id, min(r2.start_time), max(r1.finish_time), count(1) as reoccurences
from records r1,
records r2
where r1.record_id > r2.record_id -- this ensures you don't double count the records
and r1.customer_id = r2.customer_id
and r1.start_time - r2.finish_time <= 7 -- gap from the earlier record's finish to the later record's start
group by r1.customer_id
You wouldn't be able to easily get both the record_id and the number of occurences, but you could go back and find it by correlating the start time to the record number with that customer_id and start_time.
This will do it:
declare @t table(Record_ID int, Customer_ID int, StartDateTime datetime, FinishDateTime datetime)
insert @t values(1 ,123456,'2010-04-24 16:49','2010-04-25 13:37')
insert @t values(3 ,654321,'2010-05-02 12:45','2010-05-03 18:48')
insert @t values(4 ,764352,'2010-03-24 21:36','2010-03-29 14:24')
insert @t values(9 ,123456,'2010-04-28 13:49','2010-04-30 09:45')
insert @t values(10,836472,'2010-03-19 19:05','2010-03-20 14:48')
insert @t values(11,123456,'2010-05-05 11:26','2010-05-06 16:23')
declare @days int
set @days = 7
;with a as (
select record_id, customer_id, startdatetime, finishdatetime,
rn = row_number() over (partition by customer_id order by startdatetime asc)
from @t),
b as (
select record_id, customer_id, startdatetime, finishdatetime, rn, 0 recurrence
from a
where rn = 1
union all
select a.record_id, a.customer_id, a.startdatetime, a.finishdatetime,
a.rn, case when a.startdatetime - @days < b.finishdatetime then recurrence + 1 else 0 end
from b join a
on b.rn = a.rn - 1 and b.customer_id = a.customer_id
)
select record_id, customer_id, startdatetime, recurrence from b
where recurrence > 0
Result:
https://data.stackexchange.com/stackoverflow/q/112808/
I just realized this should be done in Access. I am so sorry, this was written for SQL Server 2005. I don't know how to rewrite it for Access.