Clearing prioritized overlapping ranges in SQL Server - sql

This one is nastily complicated to solve.
I have a table containing date ranges; each date range has a priority. Highest priority means the date range is the most important.
Or, in SQL:
create table #ranges (Start int, Finish int, Priority int)
insert #ranges values (1 , 10, 0)
insert #ranges values (2 , 5 , 1)
insert #ranges values (3 , 4 , 2)
insert #ranges values (1 , 5 , 0)
insert #ranges values (200028, 308731, 0)
Start Finish Priority
----------- ----------- -----------
1 10 0
2 5 1
3 4 2
1 5 0
200028 308731 0
I would like to run a series of SQL queries on this table so that it ends up with no overlapping ranges, taking the highest-priority ranges over the lower ones, splitting ranges as required, and removing duplicate ranges. Gaps are allowed.
So the result should be:
Start Finish Priority
----------- ----------- -----------
1 2 0
2 3 1
3 4 2
4 5 1
5 10 0
200028 308731 0
Anyone care to give a shot at the SQL? I would also like it to be as efficient as possible.

This is most of the way there; a possible improvement would be joining up adjacent ranges of the same priority. It's full of cool trickery.
select Start,
       cast(null as int) as Finish,
       cast(null as int) as Priority
into #processed
from #ranges
union
select Finish, NULL, NULL
from #ranges

update p
set Finish =
(
    select min(p1.Start)
    from #processed p1
    where p1.Start > p.Start
)
from #processed p

create clustered index idxStart on #processed (Start, Finish, Priority)
create index idxFinish on #processed (Finish, Start, Priority)

update p
set Priority =
(
    select max(r.Priority)
    from #ranges r
    where (r.Start <= p.Start and r.Finish > p.Start)
       or (r.Start >= p.Start and r.Start < p.Finish)
)
from #processed p

delete from #processed
where Priority is null

select * from #processed
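The improvement mentioned above, joining up adjacent ranges of the same priority, could be sketched like this. This is an untested sketch; it assumes the #processed table produced above and uses LAG, which requires SQL Server 2012+ (newer than the original question):

```sql
-- Merge runs of rows that share a Priority and touch end-to-start,
-- without bridging gaps between non-touching ranges.
;with o as (
    select *,
           lag(Finish)   over (order by Start) as prevFinish,
           lag(Priority) over (order by Start) as prevPriority
    from #processed
), g as (
    select *,
           -- start a new group whenever a row does not continue the previous run
           sum(case when prevFinish = Start and prevPriority = Priority
                    then 0 else 1 end)
           over (order by Start rows unbounded preceding) as grp
    from o
)
select min(Start) as Start, max(Finish) as Finish, Priority
from g
group by grp, Priority
order by Start
```

On the sample result this leaves the rows unchanged (no two adjacent rows share a priority), but runs like (1,5,0),(5,10,0) would collapse into (1,10,0).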

Here is something to get you started. It is helpful if you use a calendar table:
CREATE TABLE dbo.Calendar
(
dt SMALLDATETIME NOT NULL
PRIMARY KEY CLUSTERED
)
GO
SET NOCOUNT ON
DECLARE #dt SMALLDATETIME
SET #dt = '20000101'
WHILE #dt < '20200101'
BEGIN
INSERT dbo.Calendar(dt) SELECT #dt
SET #dt = #dt + 1
END
GO
Code to setup the problem:
create table #ranges (Start DateTime NOT NULL, Finish DateTime NOT NULL, Priority int NOT NULL)
create table #processed (dt DateTime NOT NULL, Priority int NOT NULL)
ALTER TABLE #ranges ADD PRIMARY KEY (Start,Finish, Priority)
ALTER TABLE #processed ADD PRIMARY KEY (dt)
declare #day0 datetime,
#day1 datetime,
#day2 datetime,
#day3 datetime,
#day4 datetime,
#day5 datetime
select #day0 = '2000-01-01',
#day1 = #day0 + 1,
#day2 = #day1 + 1,
#day3 = #day2 + 1,
#day4 = #day3 + 1,
#day5 = #day4 + 1
insert #ranges values (#day0, #day5, 0)
insert #ranges values (#day1, #day4, 1)
insert #ranges values (#day2, #day3, 2)
insert #ranges values (#day1, #day4, 0)
Actual solution:
DECLARE #start datetime, #finish datetime, #priority int
WHILE 1=1 BEGIN
SELECT TOP 1 #start = start, #finish = finish, #priority = priority
FROM #ranges
ORDER BY priority DESC, start, finish
IF ##ROWCOUNT = 0
BREAK
INSERT INTO #processed (dt, priority)
SELECT dt, #priority FROM calendar
WHERE dt BETWEEN #start and #finish
AND NOT EXISTS (SELECT * FROM #processed WHERE dt = calendar.dt)
DELETE FROM #ranges WHERE #start=start AND #finish=finish AND #priority=priority
END
Results: SELECT * FROM #processed
dt Priority
----------------------- -----------
2000-01-01 00:00:00.000 0
2000-01-02 00:00:00.000 1
2000-01-03 00:00:00.000 2
2000-01-04 00:00:00.000 2
2000-01-05 00:00:00.000 1
2000-01-06 00:00:00.000 0
The solution is not in the exact same format, but the idea is there.

I'm a little confused about what you want to end up with. Is this the same as simply having a set of dates where one range continues until the next one starts (in which case you don't really need the Finish date, do you?)
Or can a range Finish and there's a gap until the next one starts sometimes?
If the range Start and Finish are explicitly set, then I'd be inclined to leave both, but have the logic to apply the higher priority during the overlap. I'd suspect that if dates start getting adjusted, you'll eventually need to roll back a range that got shaved, and the original setting will be gone.
And you'll never be able to explain "how it got that way".
Do you want simply a table with a row for each date, including its priority value? Then when you have a new rule, you can bump the dates that would be trumped by the new rule?
I did a medical office scheduling app once that started with work/vacation/etc. requests with range-type data (plus a default work-week template.) Once I figured out to store the active schedule info as user/date/timerange records, things fell into place a lot more easily. YMMV.

This can be done in one SQL statement. (I first made the query in Oracle using lag and lead, but since MSSQL doesn't support those functions I rewrote the query using row_number. I'm not sure the result is MSSQL-compliant, but it should be very close):
with x as (
    select rdate
         , row_number() over (order by rdate) as rn
    from (
        select Start as rdate from ranges
        union
        select Finish from ranges
    ) dates
)
select d.[begin]
     , d.[end]
     , max(r.Priority)
from (
    select b.rdate as [begin]
         , e.rdate as [end]
    from x b
    join x e on b.rn = e.rn - 1
) d
join ranges r
  on r.Start <= d.[begin]
 and r.Finish >= d.[end]
where d.[begin] <> d.[end]
group by d.[begin]
       , d.[end]
order by 1, 2
I first made a set (x) of all the dates, then turned it into buckets by joining x with itself on consecutive row numbers. After that I joined all the possible priorities to the result; taking max(Priority) gives the requested result.

Related

Row-by-row subtraction in a table column to find the value in SQL Server

id patient_date
1 10/5/2017
2 6/6/2017
3 6/10/2017
4 8/7/2017
5 9/19/2017
Output:
id patient_date days
1 10/5/2017 (6/6/2017-10/5/2017)
2 6/6/2017 (6/10/2017-6/6/2017)
3 6/10/2017 (8/7/2017-6/10/2017)
4 8/7/2017 (9/19/2017-8/7/2017)
5 9/19/2017
Here's a query with an extra column for you to choose from :)
declare #Table table(ID int identity(1,1), patient_date date)
insert into #Table values
('10/5/2017'),
('6/6/2017'),
('6/10/2017'),
('8/7/2017'),
('9/19/2017')
select A.ID,
       A.patient_date,
       cast(B.patient_date as varchar(10)) + ' - ' + cast(A.patient_date as varchar(10)) as Period, -- this column shows exactly what you asked for
       abs(datediff(day, B.patient_date, A.patient_date)) as DaysDifference -- absolute difference in days between the two dates
from #Table A
left join #Table B on A.ID = B.ID - 1
You can try this; it uses LEAD to find the next value. The last row pairs with the default value, so correct it as you need.
The date 1900-01-01 should be changed to whatever you prefer; it could also be NULL, in which case the last row won't be calculated.
DECLARE #table TABLE (ID int,Patient_date date)
INSERT INTO #table VALUES
(1, '10/5/2017'),
(2,'6/6/2017'),
(3,'6/10/2017'),
(4,'8/7/2017'),
(5,'9/19/2017')
select *,
       datediff(dd, Patient_date, NextDate) as DaysBetween,
       '(' + cast(Patient_date as varchar(50)) + ' - ' + cast(NextDate as varchar(50)) + ')' as DayString
from (
    select *, lead(Patient_date, 1, '1900-01-01') over (order by ID) as NextDate
    from #table
) x
In my result I used NULL instead of 1900-01-01. Also notice I use a different date format than you, but it shouldn't be a problem.

How to optimize SQL Server code?

I have a table with the columns: Id, time, value.
First step: given a signal id, a start time, and an end time as input parameters, I want to extract the rows with that signal id whose time is between the start time and the end time.
Second step: assume the first step selected 100 rows. Given another input parameter, max_num, I want to further select max_num samples out of those 100 rows in a uniform manner. For example, if max_num is set to 10, then I will select rows 1, 11, 21, ..., 91 out of the 100.
I am not sure if the stored procedure below is optimal, if you find any inefficiencies of the code, please point that out to me and give some suggestion.
create procedure data_selection
#sig_id bigint,
#start_time datetime2,
#end_time datetime2,
#max_num float
AS
BEGIN
declare #tot float
declare #step int
declare #selected table (id int primary key identity not null, Date datetime2, Value real)
-- first step
insert into #selected (Date, Value)
select Date, Value from Table
where Id = #sig_id
  and Date >= #start_time and Date <= #end_time
order by Date
-- second step
select #tot = count(1) from #selected
set #step = ceiling(#tot / #max_num)
select * from #selected
where id % #step = 1
END
EDITED to calculate step on the fly. I had first thought this was an argument.
;with data as (
select row_number() over (order by [Date]) as rn, *
from Table
where Id = #sig_id and Date between #start_time and #end_time
), calc as (
select cast(ceiling(max(rn) / #max_num) as int) as step from data
)
select * from data cross apply calc as c
where (rn - 1) % step = 0 --and rn <= (#max_num - 1) * step + 1
Or I guess you can just order/filter by your identity value as you already had it:
;with calc as (select cast(ceiling(max(rn) / #max_num) as int) as step from #selected)
select * from #selected cross apply calc as c
where (id - 1) % step = 0 --and id <= (#max_num - 1) * step + 1
I think that because you're rounding step up with ceiling you'll easily find scenarios where you get fewer rows than #max_num. You might want to round down instead: case when floor(max(rn) / #max_num) = 0 then 1 else floor(max(rn) / #max_num) end as step?
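A quick illustration of the difference, with made-up numbers (110 source rows, #max_num = 100):

```sql
-- ceiling rounds the step up, which can skip far more rows than intended
declare @tot float = 110, @max_num float = 100;
select ceiling(@tot / @max_num) as step_up,  -- 2: (rn - 1) % 2 = 0 keeps only 55 rows
       floor(@tot / @max_num)   as step_down -- 1: keeps all 110 rows
```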

Group data without changing query flow

It's hard for me to explain what I want, so the title may be unclear, but I hope I can describe it with code.
I have some data with two important values, time t and value f(t). It's stored in a table, for example:
1 - 1000
2 - 1200
3 - 1100
4 - 1500
...
I want to plot a graph from it, and this graph should contain N points. If the table has fewer rows than N, we just return the table. If it has more, we should group the points; for example, if N = Count/2, then for the example above:
1 - (1000+1200)/2 = 1100
2 - (1100+1500)/2 = 1300
...
I wrote an SQL script (it works fine for N >> Count) (MonitoringDateTime is t, and ResultCount is f(t)):
ALTER PROCEDURE [dbo].[usp_GetRequestStatisticsData]
#ResourceTypeID bigint,
#DateFrom datetime,
#DateTo datetime,
#EstimatedPointCount int
AS
BEGIN
SET NOCOUNT ON;
SET ARITHABORT ON;
declare #groupSize int;
declare #resourceCount int;
select #resourceCount = Count(*)
from ResourceType
where ID & #ResourceTypeID > 0
SELECT d.ResultCount
,MonitoringDateTime = d.GeneratedOnUtc
,ResourceType = a.ResourceTypeID,
ROW_NUMBER() OVER(ORDER BY d.GeneratedOnUtc asc) AS Row
into #t
FROM dbo.AgentData d
INNER JOIN dbo.Agent a ON a.CheckID = d.CheckID
WHERE d.EventType = 'Result' AND
a.ResourceTypeID & #ResourceTypeID > 0 AND
d.GeneratedOnUtc between #DateFrom AND #DateTo AND
d.Result = 1
select #groupSize = Count(*) / (#EstimatedPointCount * #resourceCount)
from #t
if #groupSize = 0 -- return all points
select ResourceType, MonitoringDateTime, ResultCount
from #t
else
select ResourceType, CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
from #t
where [Row] % #groupSize = 0
group by ResourceType, [Row]
order by MonitoringDateTime
END
, but it doesn't work for N ~= Count, and it spends a lot of time on inserts.
This is why I wanted to use CTEs, but they don't work with an if/else statement.
So I calculated a formula for the group number (to use in the GROUP BY clause), because we have
GroupNumber = Count < N ? Row : Row*NumberOfGroups
where Count is the number of rows in the table, and NumberOfGroups = Count/EstimatedPointCount.
Using some trivial mathematics we get the formula
GroupNumber = Row + (Row*Count/EstimatedPointCount - Row)*MAX(Count - Count/EstimatedPointCount,0)/(Count - Count/EstimatedPointCount)
but it doesn't work because of the Count aggregate function:
Column 'dbo.AgentData.ResultCount' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
My English is very bad and I know it (and I'm trying to improve it), but hope dies last, so please advise.
Results of the query:
SELECT d.ResultCount
, MonitoringDateTime = d.GeneratedOnUtc
, ResourceType = a.ResourceTypeID
FROM dbo.AgentData d
INNER JOIN dbo.Agent a ON a.CheckID = d.CheckID
WHERE d.GeneratedOnUtc between '2015-01-28' AND '2015-01-30' AND
a.ResourceTypeID & 1376256 > 0 AND
d.EventType = 'Result' AND
d.Result = 1
https://onedrive.live.com/redir?resid=58A31FC352FC3D1A!6118&authkey=!AATDebemNJIgHoo&ithint=file%2ccsv
Here's an example using NTILE and your simple sample data at the top of your question:
declare #samples table (ID int, sample int)
insert into #samples (ID,sample) values
(1,1000),
(2,1200),
(3,1100),
(4,1500)
declare #results int
set #results = 2
;With grouped as (
select *,NTILE(#results) OVER (order by ID) as nt
from #samples
)
select nt,AVG(sample) from grouped
group by nt
Which produces:
nt
-------------------- -----------
1 1100
2 1300
If #results is changed to 4 (or any higher number) then you just get back your original result set.
Unfortunately, I don't have your full data nor can I fully understand what you're trying to do with the full stored procedure, so the above would probably need to be adapted somewhat.
I haven't tried it, but how about instead of
select ResourceType, CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
from #t
where [Row] % #groupSize = 0
group by ResourceType, [Row]
order by MonitoringDateTime
perhaps something like
select ResourceType, CAST(AVG(CAST(#t.MonitoringDateTime AS DECIMAL( 18, 6))) AS DATETIME) MonitoringDateTime, AVG(ResultCount) ResultCount
from #t
group by ResourceType, convert(int,[Row]/#groupSize)
order by MonitoringDateTime
Maybe that points you in some new direction? By converting to int we truncate everything after the decimal, so I'm hoping that will give you a better grouping. You might need to put your row number over resource type for this to work.
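The truncation idea can be seen on a few rows, with a hypothetical group size of 2:

```sql
-- integer division truncates, so consecutive rows collapse into the same bucket
declare @groupSize int = 2;
select n as [Row], n / @groupSize as bucket
from (values (1),(2),(3),(4),(5),(6)) v(n)
-- buckets: 0, 1, 1, 2, 2, 3
```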

SQL Server 2008 filling gaps with dimension

I have a data table as below
#data
---------------
Account AccountType
---------------
1 2
2 0
3 5
4 2
5 1
6 5
AccountType 2 is headers and 5 is totals. Accounts of type 2 have to look ahead to the next 1 or 0 to determine their Dim value; totals of type 5 have to look back at the nearest 1 or 0 to determine theirs. Accounts of type 1 or 0 have their own type as Dim.
Accounts of type 2 appear as islands, so it's not enough to just check RowNumber + 1, and the same goes for accounts of type 5.
I have arrived at the following table using CTEs, but I can't find a quick way to go from here to my final result of Account, AccountType, Dim for all accounts:
T3
-------------------
StartRow EndRow AccountType Dim
-------------------
1 1 2 0
2 2 0 0
3 3 5 0
4 4 2 1
5 5 0 1
6 6 5 1
The code below is MS T-SQL; copy-paste it all and see it run. The final join on the CTE select statement is extremely slow: even for 500 rows it takes 30 seconds, and I have 100,000 rows to handle. I've done a cursor-based solution which does it in 10-20 seconds, which is workable, and a fast recursive CTE solution that does it in 5 seconds for 100,000 rows, but that one depends on the fragmentation of the #data table. I should add that this is simplified; the real problem has a lot more dimensions that need to be taken into account, but it will work the same way as this simple version.
Anyway, is there a fast way to do this using joins or another set-based solution?
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#data') IS NOT NULL
DROP TABLE #data
CREATE TABLE #data
(
Account INTEGER IDENTITY(1,1),
AccountType INTEGER,
)
BEGIN -- TEST DATA
DECLARE #Counter INTEGER = 0
DECLARE #MaxDataRows INTEGER = 50 -- Change here to check performance
DECLARE #Type INTEGER
WHILE(#Counter < #MaxDataRows)
BEGIN
SET #Type = CASE
WHEN #Counter % 10 < 3 THEN 2
WHEN #Counter % 10 >= 8 THEN 5
WHEN #Counter % 10 >= 3 THEN (CASE WHEN #Counter < #MaxDataRows / 2.0 THEN 0 ELSE 1 END )
ELSE 0
END
INSERT INTO #data VALUES(#Type)
SET #Counter = #Counter + 1
END
END -- TEST DATA END
;WITH groupIds_cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY AccountType ORDER BY Account) - Account AS GroupId
FROM #data
),
islandRanges_cte AS
(
SELECT
MIN(Account) AS StartRow,
MAX(Account) AS EndRow,
AccountType
FROM groupIds_cte
GROUP BY GroupId,AccountType
),
T3 AS
(
SELECT I.*, J.AccountType AS Dim
FROM islandRanges_cte I
INNER JOIN islandRanges_cte J
ON (I.EndRow + 1 = J.StartRow AND I.AccountType = 2)
UNION ALL
SELECT I.*, J.AccountType AS Dim
FROM islandRanges_cte I
INNER JOIN islandRanges_cte J
ON (I.StartRow - 1 = J.EndRow AND I.AccountType = 5)
UNION ALL
SELECT *, AccountType AS Dim
FROM islandRanges_cte
WHERE AccountType = 0 OR AccountType = 1
),
T4 AS
(
SELECT Account, Dim
FROM (
SELECT FlattenRow AS Account, StartRow, EndRow, Dim
FROM T3 I
CROSS APPLY (VALUES(StartRow),(EndRow)) newValues (FlattenRow)
) T
)
--SELECT * FROM T3 ORDER BY StartRow
--SELECT * FROM T4 ORDER BY Account
-- Final correct result but very very slow
SELECT D.Account, D.AccountType, I.Dim FROM T3 I
INNER JOIN #data D
ON D.Account BETWEEN I.StartRow AND I.EndRow
ORDER BY Account
EDIT with some time testing
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#times') IS NULL
CREATE TABLE #times
(
RecId INTEGER IDENTITY(1,1),
Batch INTEGER,
Method NVARCHAR(255),
MethodDescription NVARCHAR(255),
RunTime INTEGER
)
IF OBJECT_ID('tempdb..#batch') IS NULL
CREATE TABLE #batch
(
Batch INTEGER IDENTITY(1,1),
Bit BIT
)
INSERT INTO #batch VALUES(0)
IF OBJECT_ID('tempdb..#data') IS NOT NULL
DROP TABLE #data
CREATE TABLE #data
(
Account INTEGER
)
CREATE NONCLUSTERED INDEX data_account_index ON #data (Account)
IF OBJECT_ID('tempdb..#islands') IS NOT NULL
DROP TABLE #islands
CREATE TABLE #islands
(
AccountFrom INTEGER ,
AccountTo INTEGER,
Dim INTEGER,
)
CREATE NONCLUSTERED INDEX islands_from_index ON #islands (AccountFrom, AccountTo, Dim)
BEGIN -- TEST DATA
INSERT INTO #data
SELECT TOP 100000 ROW_NUMBER() OVER(ORDER BY t1.number) AS N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
INSERT INTO #islands
SELECT MIN(Account) AS Start, MAX(Account), Grp
FROM (SELECT *, NTILE(10) OVER (ORDER BY Account) AS Grp FROM #data) T
GROUP BY Grp ORDER BY Start
END -- TEST DATA END
--SELECT * FROM #data
--SELECT * FROM #islands
--PRINT CONVERT(varchar(20),DATEDIFF(MS,#RunDate,GETDATE()))+' ms Sub Query'
DECLARE #RunDate datetime
SET #RunDate=GETDATE()
SELECT Account, (SELECT Dim From #islands WHERE Account BETWEEN AccountFrom AND AccountTo) AS Dim
FROM #data
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'subquery','',DATEDIFF(MS,#RunDate,GETDATE()))
SET #RunDate=GETDATE()
SELECT D.Account, V.Dim
FROM #data D
CROSS APPLY
(
SELECT Dim From #islands V
WHERE D.Account BETWEEN V.AccountFrom AND V.AccountTo
) V
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'crossapply','',DATEDIFF(MS,#RunDate,GETDATE()))
SET #RunDate=GETDATE()
SELECT D.Account, I.Dim
FROM #data D
JOIN #islands I
ON D.Account BETWEEN I.AccountFrom AND I.AccountTo
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'join','',DATEDIFF(MS,#RunDate,GETDATE()))
SET #RunDate=GETDATE()
;WITH cte AS
(
SELECT Account, AccountFrom, AccountTo, Dim, 1 AS Counting
FROM #islands
CROSS APPLY (VALUES(AccountFrom),(AccountTo)) V (Account)
UNION ALL
SELECT Account + 1 ,AccountFrom, AccountTo, Dim, Counting + 1
FROM cte
WHERE (Account + 1) > AccountFrom AND (Account + 1) < AccountTo
)
SELECT Account, Dim, Counting FROM cte OPTION(MAXRECURSION 32767)
INSERT INTO #times VALUES ((SELECT MAX(Batch) FROM #batch) ,'recursivecte','',DATEDIFF(MS,#RunDate,GETDATE()))
You can select from the #times table to see the run times :)
I think you want a join, but using an inequality rather than an equality:
select tt.id, tt.dim1, it.dim2
from TallyTable tt join
IslandsTable it
on tt.id between it."from" and it."to"
This works for the data that you provide in the question.
Here is another idea that might work. Here is the query:
select d.*,
       (select top 1 AccountType
        from #data d2
        where d2.Account > d.Account and d2.AccountType not in (2, 5)
        order by d2.Account
       ) as nextAccountType
from #data d
order by d.account;
I just ran this on 50,000 rows and this version took 17 seconds on my system. Changing the table to:
CREATE TABLE #data (
Account INTEGER IDENTITY(1,1) primary key,
AccountType INTEGER,
);
Has actually slowed it down to about 1:33 -- quite to my surprise. Perhaps one of these will help you.

t-sql Summing differences between timestamps

I'm tracking machine state, which can be 0, 1, or 2,
and storing that data in a SQL table with a time_stamp.
I have a table in SQL Server with these fields:
id (int)
time_stamp (datetime)
machine_state (int)
Machine state corresponds to machine condition:
machine_state = 0: machine stopped
machine_state = 1: machine with alarm
machine_state = 2: machine running
Now I want to calculate how long machine was in each state in each shift.
Shifts are
8:00-17:00
17:00-01:00
01:00-08:00.
My problem is how to calculate the time spent in each machine state (sum_time_0, sum_time_1, sum_time_2) and group those times by shift. I want to calculate the time in seconds and then convert it to minutes.
To give a better picture, I exported part of the table:
EXPORT_TABLE
id time_stamp machine_state
1623 6.10.2009 17:09:00 1
1624 6.10.2009 17:17:00 2
1625 6.10.2009 17:17:00 1
1626 6.10.2009 17:17:00 2
1627 6.10.2009 17:18:00 1
1628 6.10.2009 17:18:00 2
1629 6.10.2009 18:04:00 1
1630 6.10.2009 18:06:00 2
1631 6.10.2009 18:07:00 1
1632 6.10.2009 18:12:00 2
1633 6.10.2009 18:28:00 1
1634 6.10.2009 18:28:00 2
1635 6.10.2009 19:16:00 1
1636 6.10.2009 19:21:00 2
1637 6.10.2009 19:49:00 1
1638 6.10.2009 20:23:00 2
Any advice will help.
Thanks in advance.
You can join the next machine state for each row then group by the state and sum the difference in time...
create table #t(id int identity(1,1), ts datetime, ms tinyint);
insert into #t
select '6.10.2009 17:09:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:17:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:18:00', 1
union select '6.10.2009 17:18:00', 2
union select '6.10.2009 18:04:00', 1
union select '6.10.2009 18:06:00', 2
union select '6.10.2009 18:07:00', 1
union select '6.10.2009 18:12:00', 2
union select '6.10.2009 18:28:00', 1
union select '6.10.2009 18:28:00', 2
union select '6.10.2009 19:16:00', 1
union select '6.10.2009 19:21:00', 2
union select '6.10.2009 19:49:00', 1
union select '6.10.2009 20:23:00', 2
select
t.ms,
sum(datediff(mi, t.ts, tn.ts)) as total_mintues
from
#t t
inner join #t tn on
tn.id = (select top 1 t2.id
from #t t2
where t2.id > t.id and t2.ms <> t.ms
order by t2.id)
group by
t.ms
/*
ms total_mintues
1 54
2 140
*/
drop table #t
Here's an outline of how I'd do it. I am making some assumptions which may be invalid or not apply to your situation, so I'm not coding everything out.
First, I'd break the problem into chunks: calculate the data for one shift at a time. (I'm guessing you run this once a day, or maybe once a week.)
I would implement this as a stored procedure with two parameters:
#ShiftDate, specifying the date to be calculated (use the date portion only, ignore any time value)
#Shift, specifying which shift to analyze (1, 2, 3, as you defined)
Build two "full" datetimes, one for the start of the shift, one for the end. For example, if #ShiftDate = 'Oct 22, 2009' and #Shift = 2, you'd get
#ShiftStart = 'Oct 22, 2009 17:00:00'
#ShiftStop = 'Oct 23, 2009 1:00:00'
Create a temp table to hold the subset of the data that we'll be analyzing. Populate it like so:
Copy over all the data for between #ShiftStart and #ShiftStop
Do NOT include any data where consecutive (by time) entries have the same state. If any such data exists, discard all but the earliest entry.
(It looks like your data is generated this way--but do you want to assume the data will always be good?)
Add a column for a uniformly incrementing counter (1, 2, 3, etc.). It looks like you've already got this too, but again, you want to be sure here.
Next, check if entries are present for both #ShiftStart and #ShiftStop. If there are no such entries:
For #ShiftStart, create the entry and set machine_state to the value of the most recent entry before #ShiftStart
For #ShiftStop, create the entry and set machine_state to, well, anything, as we won't reference that value
In both cases, make sure you correctly configure the counter column (#ShiftStart's counter is one less than the earliest value, #ShiftStop's counter is one greater than the last value)
(The above is why you make it a temp table. If you can't load these dummy rows, you'll have to use procedural code to walk through the tables, which is the kind of procedural code that bogs down database servers.)
You need these entries to get the data for the time between the start of the shift and the first recorded entry within that shift, and ditto for the end of the shift.
At this point, items are ordered in time, with a uniformly incrementing counter column (1, 2, 3). Assuming all the above, the following query
should return the data you're looking for:
SELECT
et.machine_state
,sum(datediff(ss, et.time_stamp, thru.time_stamp)) TotalSeconds
,sum(datediff(ss, et.time_stamp, thru.time_stamp)) / 60 TotalMinutes
from #EXPORT_TABLE et
inner join #EXPORT_TABLE thru
on thru.id = et.id + 1
group by et.machine_state
order by et.machine_state
Notes:
This is written for MS SQL Server. Your language syntax may differ.
I have not tested this code. Any typos were intentionally included so that your final version will be superior to mine.
EXPORT_TABLE is the temporary table described above.
In MS SQL, dividing the sum of an integer by an integer will produce a truncated integer, meaning 59 seconds will turn into 0 minutes.
If you need better accuracy, dividing by 60.0 would produce a decimal value.
This is just a framework. I think you'd be able to exapnd this to whatever conditions you have to deal with.
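The integer-division note above can be seen in one line:

```sql
-- T-SQL integer division truncates toward zero
select 59 / 60   as whole_minutes,   -- 0: integer division truncates
       59 / 60.0 as decimal_minutes  -- 0.983333
```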
You can use an exclusive join to find the previous row:
select
State = prev.ms,
MinutesInState = sum(datediff(mi, prev.ts, cur.ts))
from #t cur
inner join #t prev
on prev.id < cur.id
left join #t inbetween
on prev.id < inbetween.id
and inbetween.id < cur.id
where inbetween.id is null
group by prev.ms
The query then groups by machine state. The result differs from other answers here; I'm curious which one is right!
State MinutesInState
1 54
2 140
Here's the sample data I used:
declare #t table (id int identity(1,1), ts datetime, ms tinyint);
insert into #t
select '6.10.2009 17:09:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:17:00', 1
union select '6.10.2009 17:17:00', 2
union select '6.10.2009 17:18:00', 1
union select '6.10.2009 17:18:00', 2
union select '6.10.2009 18:04:00', 1
union select '6.10.2009 18:06:00', 2
union select '6.10.2009 18:07:00', 1
union select '6.10.2009 18:12:00', 2
union select '6.10.2009 18:28:00', 1
union select '6.10.2009 18:28:00', 2
union select '6.10.2009 19:16:00', 1
union select '6.10.2009 19:21:00', 2
union select '6.10.2009 19:49:00', 1
union select '6.10.2009 20:23:00', 2
If you just want quick and dirty, this will do:
select curr.*, prev.*
from EXPORT_TABLE curr
outer apply (
select top 1 * from EXPORT_TABLE prev
where curr.time_stamp > prev.time_stamp
order by time_stamp desc, id desc
) prev
And go from there.
But this method, and some of the similar methods on this page involving a non-equijoin, will not scale well with volume. To handle a high volume of data, we must use different techniques.
Your id appears sequential. Is it? This can be useful. If not, we should create one.
if object_id('tempdb..#pass1') is not null drop table #pass1
create table #pass1 (
id int
, time_stamp smalldatetime
, machine_state tinyint
, seqno int primary key -- this is important
)
insert #pass1
select
id
, time_stamp
, machine_state
, seqno = row_number() over (order by time_stamp, id)
from EXPORT_TABLE
Once we have a sequential id, we can equi-join on it:
if object_id('tempdb..#pass2') is not null drop table #pass2
create table #pass2 (
id int
, time_stamp smalldatetime
, machine_state tinyint
, seqno int primary key
, time_stamp_prev smalldatetime
)
insert #pass2
select
id
, time_stamp
, machine_state
, seqno
, time_stamp_prev = b.time_stamp
from #pass1 a
left join #pass1 b on a.seqno = b.seqno + 1
From here, your query should just about write itself. Look out for machine states that overlap a shift, though.
This method, though it looks expensive, will scale well with volume. You order the data once, and join once. If the id is sequential, you can skip the first step, make sure there is a clustered primary key on id, and join on id rather than seqno.
If you have a really high volume of data, you do this instead:
if object_id('tempdb..#export_table') is not null drop table #export_table
create table #export_table (
id int
, time_stamp smalldatetime
, machine_state tinyint
, seqno int primary key -- ensures proper ordering for the UPDATE
, time_stamp_prev smalldatetime
)
insert #export_table (
id
, time_stamp
, machine_state
, seqno
)
select
id
, time_stamp
, machine_state
, seqno = row_number() over (order by time_stamp, id)
from EXPORT_TABLE
-- do some magic
declare #time_stamp smalldatetime
update #export_table set
time_stamp_prev = #time_stamp
, #time_stamp = time_stamp
This will out-perform all other methods. And if your id is in the right order (it does not have to be sequential, just in the right order), you can skip the first step and define a clustered index on id instead, if it's not already there.
You can do something like this:
select t1.time_stamp time_start, t2.time_stamp time_finish, t1.machine_state
from EXPORT_TABLE t1, EXPORT_TABLE t2
where t2.time_stamp = (select min(time_stamp) from EXPORT_TABLE where time_stamp > t1.time_stamp)
This will return the interval in one row; after that it's easy to calculate the cumulative time for each state.
You can also look at this question. It seems to be almost identical to yours.
Thanks for the help.
I'm surprised at how detailed the answer is.
I will test your solution and let you know the result.
Again, I'm very surprised by the detailed answer.
I tested the first part (summing the time in machine states 0, 1, and 2) and it is OK.
Now I will test the rest of the answer.
The biggest problem for me was splitting time across the shift transition.
example:
'6.10.2009 16:30:00', 1
'6.10.2009 17:30:00', 2
'6.10.2009 19:16:00', 1
In the time between 16:30 and 17:00 the machine was in state 1, and that time I have to add to shift 1; between 17:00 and 17:30 the machine was in state 1, and that time I have to add to shift 2.
But first I will go through your answer to see whether you have already solved this.
Thanks again.
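One way to handle the shift-transition splitting described here is to clamp each state interval to the shift window before summing. This is only a sketch with hypothetical variables, using the 16:30-17:30 example and the 08:00-17:00 shift:

```sql
-- Clamp an interval [@s, @f) to a shift window [@shift_start, @shift_end)
-- and count only the overlapping part toward that shift.
declare @s datetime = '2009-10-06 16:30', @f datetime = '2009-10-06 17:30';
declare @shift_start datetime = '2009-10-06 08:00', @shift_end datetime = '2009-10-06 17:00';

select case when @f > @shift_start and @s < @shift_end
            then datediff(mi,
                          case when @s > @shift_start then @s else @shift_start end,
                          case when @f < @shift_end   then @f else @shift_end   end)
            else 0
       end as minutes_in_shift -- 30 for this example
```

Running the same expression against the next shift's window would credit the remaining 30 minutes there.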
CREATE PROCEDURE dbo.final #shiftdate datetime, #shift int
AS
BEGIN
DECLARE
#shiftstart as datetime ,
#shiftstop as datetime,
#date_m as varchar(33),
#timestart as char(8),
#smjena as int,
#ms_prev as int,
#t_rad as int,
#t_stop as int,
#t_alarm as int
if #shift = 1
begin
set #timestart = '08:00:00'
set #smjena=9
end
if #shift = 2
begin
set #timestart = '17:00:00'
set #smjena=8
end
if #shift = 3
begin
set #timestart = '01:00:00'
set #smjena=7
end
SELECT #date_m = convert(varchar, #shiftdate, 104) + ' ' + convert(varchar, #timestart, 114)
set #shiftstart = convert(datetime,#date_m,104)
select #shiftstop = dateadd(hh,#smjena,#shiftstart)
create table #t(id int identity(1,1), ts datetime, ms tinyint);
insert #t select time_stamp, stanje_stroja from perini where perini.time_stamp between #shiftstart and #shiftstop order by perini.time_stamp
if (select count(#t.id) from #t where #t.ts=#shiftstart)= 0
BEGIN
if (select count(perini.id) from perini where time_stamp < #shiftstart) > 0
begin
set #ms_prev = (select top 1 stanje_stroja from perini where time_stamp < #shiftstart order by time_stamp desc)
insert #t values (#shiftstart,#ms_prev)
end
end
if (select count(#t.id) from #t where #t.ts=#shiftstop)= 0
BEGIN
if (select count(perini.id) from perini where time_stamp > #shiftstop) > 0
begin
set #ms_prev = (select top 1 stanje_stroja from perini where time_stamp>#shiftstop order by time_stamp asc)
insert #t values (#shiftstop,#ms_prev)
end
end
select * into #t1 from #t where 1=2
insert into #t1 select ts, ms from #t order by ts
create table #t3(stanje int, trajanje int)
insert into #t3 select a.ms as stanje, convert(int,sum(datediff(ss,b.ts, a.ts))/60) as trajanje from
#t1 a left join #t1 b on a.id = b.id + 1
group by a.ms
set #t_rad = (select trajanje from #t3 where stanje = 2)
set #t_alarm = (select trajanje from #t3 where stanje = 1)
set #t_stop = (select trajanje from #t3 where stanje = 0)
insert into perini_smjene_new (smjena,t_rad, t_stop, t_alarm, time_stamp) values (#shift,#t_rad,#t_stop, #t_alarm, convert(datetime, #shiftdate, 103))
select * from #t3
END