I have 2 tables: one holds the current status and the other holds the finish status.
I want to calculate time difference of 2 rows that have the same PO_NO,MANAGEMENT_NO,PROCESS_NAME .
Each PROCESS_NAME has the STATUS (Start/Finish)
ID  INDEXNO  PO_NO    ITEM_CD    MANAGEMENT_NO  SEQ  PROCESS_NAME     STATUS  Time_Occurrence  TimeDiff (Minute)
43  126690   GV12762  332393961  616244         6    RFID             Start   17-03-18 13:28   NULL
44  126690   GV12762  332393961  616244         6    RFID             Finish  17-03-18 13:29   0
49  141646   GV14859  7E7060100  619005         2    Imprint          Start   19-03-18 13:23   NULL
50  141646   GV14859  7E7060100  619005         2    Imprint          Finish  19-03-18 13:30   7
48  141646   GV14859  7E7060100  619005         1    R.M.Requisition  Start   19-03-18 13:18   NULL
56  141646   GV14859  7E7060100  619005         1    R.M.Requisition  Finish  19-03-18 15:54   156
The expected result is : TimeDiff (Minute) column
select PO_NO, [MANAGEMENT_NO],[STATUS] [Time_Occurrence],
datediff(minute, (isnull((select [Time_Occurrence] from [TBL_FINISH_STATUS] t1 where t1.id=t2.id-1), dateadd(dd, 0, datediff(dd, 0, getdate())))), [Time_Occurrence])TimeDiff
from [PROC_MN].[dbo].[TBL_FINISH_STATUS] t2
ORDER BY PO_NO,MANAGEMENT_NO,ITEM_CD,Time_Occurrence
With the above query, the result is far from the expected result.
Could anyone help me, please?
Note: the IDs of a Start/Finish pair are not always consecutive, e.g. 48 and 56 for SEQ 1 of PO_NO GV14859.
If I understand what you want, then this seems like a simple query for it:
select INDEXNO, PO_NO, ITEM_CD, MANAGEMENT_NO, SEQ,
datediff(minute,
min(case when status = 'Start' then Time_Occurrence end),
max(case when status = 'Finish' then Time_Occurrence end)
) as timediff
from t
group by INDEXNO, PO_NO, ITEM_CD, MANAGEMENT_NO, SEQ;
Here is a SQL Fiddle.
It is not really clear what you are expecting as a result. Looking at your data sample, the design looks flawed from the start: there is too much redundancy for an SQL database. Maybe you don't have any control over the existing database. Anyway, this could be solved in many different ways. If my memory serves, the LEAD/LAG functions didn't exist in SQL Server 2008 (ROW_NUMBER is there as another option). I tried to create something that is compatible even with older versions, but I am not sure if this is the result you meant:
DECLARE @myTable TABLE([ID] INT,
[INDEXNO] INT,
[PO_NO] VARCHAR(7),
[ITEM_CD] VARCHAR(10),
[MANAGEMENT_NO] INT,
[SEQ] INT,
[PROCESS_NAME] VARCHAR(15),
[STATUS] VARCHAR(6),
[Time_Occurrence] DATETIME,
[TimeDiff] VARCHAR(4));
INSERT INTO @myTable([ID], [INDEXNO], [PO_NO], [ITEM_CD], [MANAGEMENT_NO], [SEQ], [PROCESS_NAME], [STATUS], [Time_Occurrence], [TimeDiff])
VALUES(43, 126690, 'GV12762', '332393961', 616244, 6, 'RFID', 'Start', '20180317 13:28', NULL),
(44, 126690, 'GV12762', '332393961', 616244, 6, 'RFID', 'Finish', '20180317 13:29', '0'),
(49, 141646, 'GV14859', '7E7060100', 619005, 2, 'Imprint', 'Start', '20180319 13:23', NULL),
(50, 141646, 'GV14859', '7E7060100', 619005, 2, 'Imprint', 'Finish', '20180319 13:30', '7'),
(48, 141646, 'GV14859', '7E7060100', 619005, 1, 'R.M.Requisition', 'Start', '20180318 13:18', NULL),
(56, 141646, 'GV14859', '7E7060100', 619005, 1, 'R.M.Requisition', 'Finish', '20180318 15:54', '156');
SELECT * FROM @myTable;
WITH
Starters AS (
SELECT ID, PO_NO, [MANAGEMENT_NO], [PROCESS_NAME], [Time_Occurrence]
FROM @myTable
WHERE STATUS='Start'
),
Finishers AS (
SELECT ID, PO_NO, [MANAGEMENT_NO], [PROCESS_NAME], [Time_Occurrence]
FROM @myTable
WHERE STATUS='Finish'
)
SELECT s.PO_NO, s.MANAGEMENT_NO, s.PROCESS_NAME,
s.Time_Occurrence as [Start], f.Time_Occurrence as [End],
DATEDIFF(MINUTE, s.Time_Occurrence, f.Time_Occurrence) AS TIMEdiff
FROM Starters s
LEFT JOIN Finishers f ON s.PO_NO=f.PO_NO
AND s.MANAGEMENT_NO=f.MANAGEMENT_NO
AND f.PROCESS_NAME=s.PROCESS_NAME;
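On SQL Server 2012 or later, the same Start/Finish pairing could probably be sketched with LAG instead of a join. This is only an illustration under assumptions: the table variable @t and its trimmed column list are made up for the example, and each (PO_NO, MANAGEMENT_NO, PROCESS_NAME) group is assumed to hold one Start followed by one Finish.

```sql
-- Hypothetical sample table with a trimmed column list (not the real schema).
DECLARE @t TABLE (ID INT, PO_NO VARCHAR(7), MANAGEMENT_NO INT,
                  PROCESS_NAME VARCHAR(15), [STATUS] VARCHAR(6),
                  Time_Occurrence DATETIME);

INSERT INTO @t (ID, PO_NO, MANAGEMENT_NO, PROCESS_NAME, [STATUS], Time_Occurrence)
VALUES (43, 'GV12762', 616244, 'RFID', 'Start', '20180317 13:28'),
       (44, 'GV12762', 616244, 'RFID', 'Finish', '20180317 13:29'),
       (49, 'GV14859', 619005, 'Imprint', 'Start', '20180319 13:23'),
       (50, 'GV14859', 619005, 'Imprint', 'Finish', '20180319 13:30'),
       (48, 'GV14859', 619005, 'R.M.Requisition', 'Start', '20180318 13:18'),
       (56, 'GV14859', 619005, 'R.M.Requisition', 'Finish', '20180318 15:54');

-- For every Finish row, look back at the previous row of the same process
-- and measure the distance in minutes; Start rows are left as NULL.
SELECT ID, PO_NO, MANAGEMENT_NO, PROCESS_NAME, [STATUS], Time_Occurrence,
       CASE WHEN [STATUS] = 'Finish'
            THEN DATEDIFF(MINUTE,
                          LAG(Time_Occurrence) OVER (PARTITION BY PO_NO, MANAGEMENT_NO, PROCESS_NAME
                                                     ORDER BY Time_Occurrence, ID),
                          Time_Occurrence)
       END AS TimeDiff
FROM @t
ORDER BY PO_NO, MANAGEMENT_NO, Time_Occurrence;
```

Because rows are paired by partition rather than by id - 1, the IDs of a Start/Finish pair do not have to be consecutive.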
My table looks a lot like the table shown in the following StackOverflow URL:
Calculating total time excluding overlapped time & breaks in SQLServer
My table also includes an OwnerID. Each person has an unique OwnerID, and I could easily join in the person name belonging to that ID.
The requested result should be just like in the linked question, but per owner. I tried modifying the accepted answer from that question, but it gives me the following error:
The statement terminated. The maximum recursion 100 has been exhausted before statement completion.
This is the query I try to run...
;WITH addNR AS ( -- Add row numbers
SELECT StartDate, EndDate, ROW_NUMBER() OVER (ORDER BY StartDate, EndDate) AS RowID
FROM dbo.FollowUp AS T
WHERE StartDate > '2017-10-02 08:30:00.000'
), createNewTable AS ( -- Recreate table according overlap time
SELECT StartDate, EndDate, RowID
FROM addNR
WHERE RowID = 1
UNION ALL
SELECT
CASE
WHEN a.StartDate <= AN.StartDate AND AN.StartDate <= a.EndDate THEN a.StartDate
ELSE AN.StartDate END AS StartTime,
CASE WHEN a.StartDate <= AN.EndDate AND AN.EndDate <= a.EndDate THEN a.EndDate
ELSE AN.EndDate END AS EndTime,
AN.RowID
FROM addNR AS AN
INNER JOIN createNewTable AS a
ON a.RowID + 1 = AN.RowID
), getMinutes AS ( -- Get difference in minutes
SELECT DATEDIFF(MINUTE,StartDate,MAX(EndDate)) AS diffMinutes
FROM createNewTable
GROUP BY StartDate
)
SELECT SUM(diffMinutes) AS Result
FROM getMinutes
I replaced StartTime with StartDate and EndTime with EndDate, since that is how my columns are named.
What a coincidence, @vitalygolub.
Try my script with various sample data. The time calendar table should be a permanent table, so it only has to be created once.
It is not recursive, so it should perform better. If output is thrown then DISTINCT can be avoided.
create table #tbl (ownerid int,StartTime datetime,enddate datetime);
insert into #tbl values
(1,'2014-10-01 10:30:00.000','2014-10-01 12:00:00.000') -- 90 mins
,(1,'2014-10-01 10:40:00.000','2014-10-01 12:00:00.000') -- 0 since its overlapped with previous
,(1,'2014-10-01 10:42:00.000','2014-10-01 12:20:00.000') -- 20 mins excluding overlapped time
,(1,'2014-10-01 10:40:00.000','2014-10-01 13:00:00.000') -- 40 mins
,(1,'2014-10-01 10:44:00.000','2014-10-01 12:21:00.000') -- 0 previous ones have already covered this time range
,(1,'2014-10-13 15:50:00.000','2014-10-13 16:00:00.000') -- 10 mins
create table #Timetable(timecol time primary key )
insert into #Timetable
select dateadd(minute,(c.rn-1),'00:00')
from(
select top (24*60) row_number()over(order by number)rn from
master..spt_values order by number)c
SELECT c.ownerid
,cast(c.StartTime AS DATE)
,count(DISTINCT timecol) TimeMin
FROM #Timetable t
CROSS APPLY (
SELECT *
FROM #tbl c
WHERE timecol >= cast(c.StartTime AS TIME)
AND timecol < cast(c.enddate AS TIME)
) c
GROUP BY c.ownerid
,cast(c.StartTime AS DATE)
drop table #Timetable
drop table #tbl
Ok, here is working code, am not sure about performance. The idea: create "calendar" with 1 minute precision, fill it for every OwnerId and calculate number of records
DECLARE @table TABLE (OwnerId int,StartTime DateTime2, EndTime DateTime2)
INSERT INTO @table SELECT 1,'2014-10-01 10:30:00.000', '2014-10-01 12:00:00.000'
INSERT INTO @table SELECT 1,'2014-10-01 10:40:00.000', '2014-10-01 12:00:00.000'
INSERT INTO @table SELECT 1,'2014-10-01 10:42:00.000', '2014-10-01 12:20:00.000'
INSERT INTO @table SELECT 1,'2014-10-01 10:40:00.000', '2014-10-01 13:00:00.000'
INSERT INTO @table SELECT 1,'2014-10-01 10:44:00.000', '2014-10-01 12:21:00.000'
INSERT INTO @table SELECT 1,'2014-10-13 15:50:00.000', '2014-10-13 16:00:00.000'
----------------------------------------------------------------------------
INSERT INTO @table SELECT 2,'2014-10-01 10:30:00.000', '2014-10-01 12:00:00.000'
INSERT INTO @table SELECT 2,'2014-10-01 10:40:00.000', '2014-10-01 12:00:00.000'
INSERT INTO @table SELECT 2,'2014-10-01 10:42:00.000', '2014-10-01 12:20:00.000'
declare @period int, @start datetime;
select @period=datediff(mi, MIN(starttime),MAX(endtime)),@start =MIN(StartTime) from @table;
declare @seconds table(num int identity(0,1),garbage bit not null);
insert into @seconds(garbage) values(0);
while( select COUNT(*) from @seconds) < @period
insert into @seconds(garbage ) select garbage from @seconds;
with a(ownerId, usedminute ) as
(
select distinct t.ownerID,s.num from @seconds s join @table t on
dateadd(mi,s.num, @start) between t.StartTime and dateadd(s,-1,t.EndTime)
)
select ownerId, count(*) time_in_minutes from a group by ownerID;
You can do this without while loops using a derived tally table and regular set based joins, which as a result will perform very efficiently:
-- Define test data
declare @table table (ownerid int,starttime datetime2, endtime datetime2);
insert into @table select 1,'2014-10-01 10:30:00.000', '2014-10-01 12:00:00.000';
insert into @table select 1,'2014-10-01 10:40:00.000', '2014-10-01 12:00:00.000';
insert into @table select 1,'2014-10-01 10:42:00.000', '2014-10-01 12:20:00.000';
insert into @table select 1,'2014-10-01 10:40:00.000', '2014-10-01 13:00:00.000';
insert into @table select 1,'2014-10-01 10:44:00.000', '2014-10-01 12:21:00.000';
insert into @table select 1,'2014-10-13 15:50:00.000', '2014-10-13 16:00:00.000';
----------------------------------------------------------------------------
insert into @table select 2,'2014-10-01 10:30:00.000', '2014-10-01 12:00:00.000';
insert into @table select 2,'2014-10-01 10:40:00.000', '2014-10-01 12:00:00.000';
insert into @table select 2,'2014-10-01 10:42:00.000', '2014-10-01 12:20:00.000';
-- Query
declare @MinStartTime datetime;
declare @Minutes int;
-- Define data boundaries
select @MinStartTime = min(starttime)
,@Minutes = datediff(minute,min(starttime), max(endtime))+1
from @table;
-- Initial Numbers Table - 10 rows
with t(t) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Create tally table of minutes by cross joining numbers table many times to generate 1m rows
,n(n) as (select top(@Minutes) dateadd(minute,row_number() over (order by (select null))-1,@MinStartTime) from t t1, t t2, t t3, t t4, t t5, t t6)
-- Define largest possible range for each OwnerID
,o(i,s,e) as (select ownerid, min(starttime), max(endtime) from @table group by ownerid)
select o.i as OwnerID
,cast(n.n as date) as DateValue
,count(n.n) as TotalMinutes
from o
join n -- Return minutes for each OwnerID range,
on n.n between o.s and o.e
where exists(select null -- where that minute should be included.
from @table as t
where t.ownerid = o.i -- Only consider intervals that belong to this owner.
and n.n >= t.starttime
and n.n < t.endtime)
group by o.i
,cast(n.n as date)
order by o.i
,DateValue
Output:
+---------+------------+--------------+
| OwnerID | DateValue  | TotalMinutes |
+---------+------------+--------------+
| 1       | 2014-10-01 | 150          |
| 1       | 2014-10-13 | 10           |
| 2       | 2014-10-01 | 110          |
+---------+------------+--------------+
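If generating one row per minute is too heavy, another option is the "gaps and islands" technique: merge each owner's overlapping intervals first, then sum the merged lengths. The sketch below is not taken from any of the answers above; it assumes SQL Server 2012 or later (for the ROWS window frame), uses a hypothetical table variable @intervals, and returns one total per owner rather than per date.

```sql
-- Hypothetical sample data, same shape as the test data used above.
DECLARE @intervals TABLE (ownerid INT, starttime DATETIME2, endtime DATETIME2);

INSERT INTO @intervals (ownerid, starttime, endtime)
VALUES (1, '2014-10-01 10:30:00', '2014-10-01 12:00:00'),
       (1, '2014-10-01 10:40:00', '2014-10-01 12:00:00'),
       (1, '2014-10-01 10:42:00', '2014-10-01 12:20:00'),
       (1, '2014-10-01 10:40:00', '2014-10-01 13:00:00'),
       (1, '2014-10-01 10:44:00', '2014-10-01 12:21:00'),
       (1, '2014-10-13 15:50:00', '2014-10-13 16:00:00'),
       (2, '2014-10-01 10:30:00', '2014-10-01 12:00:00'),
       (2, '2014-10-01 10:40:00', '2014-10-01 12:00:00'),
       (2, '2014-10-01 10:42:00', '2014-10-01 12:20:00');

WITH ordered AS (
    -- Latest end time seen so far among the earlier rows of the same owner.
    SELECT ownerid, starttime, endtime,
           MAX(endtime) OVER (PARTITION BY ownerid
                              ORDER BY starttime, endtime
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_max_end
    FROM @intervals
), flagged AS (
    -- A row starts a new island when it begins after everything before it has ended.
    SELECT ownerid, starttime, endtime,
           CASE WHEN prev_max_end >= starttime THEN 0 ELSE 1 END AS is_new_island
    FROM ordered
), grouped AS (
    -- A running sum of the flags gives every island its own number.
    SELECT ownerid, starttime, endtime,
           SUM(is_new_island) OVER (PARTITION BY ownerid
                                    ORDER BY starttime, endtime
                                    ROWS UNBOUNDED PRECEDING) AS island
    FROM flagged
), islands AS (
    SELECT ownerid, island, MIN(starttime) AS island_start, MAX(endtime) AS island_end
    FROM grouped
    GROUP BY ownerid, island
)
SELECT ownerid, SUM(DATEDIFF(MINUTE, island_start, island_end)) AS TotalMinutes
FROM islands
GROUP BY ownerid
ORDER BY ownerid;
```

With this sample data, owner 1 has two islands (10:30 to 13:00 and 15:50 to 16:00, 160 minutes in total) and owner 2 has one (10:30 to 12:20, 110 minutes).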
I have the following data:
CREATE TABLE Table1
(
ID varchar(10),
StudentName varchar(30),
Course varchar(15),
SECTION varchar(2),
DAY varchar(10),
START_TIME time,
END_TIME time,
actual_starttime time,
actual_endtime time
);
INSERT INTO Table1
VALUES (111, 'Mary', 'Science', 'A', 'Mon', '13:30:00.0000000', '16:20:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
INSERT INTO Table1
VALUES (111, 'Mary', 'Maths', 'A', 'Tue', '12:30:00.0000000', '13:20:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
INSERT INTO Table1
VALUES (111, 'Mary', 'Physics', 'C', 'Tue', '10:30:00.0000000', '11:10:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
INSERT INTO Table1
VALUES (112, 'Robert', 'Maths', 'A', 'Mon', '13:30:00.0000000', '16:20:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
The scenario is as follows: a student can have classes from 9:00 in the morning to 9:30 at night, Monday to Friday. I need to identify a timeslot in which all the students in the same section are free, so that a teacher can reschedule a class.
Example: both Mary and Robert are free in the morning from 9:00 to 1:30 in the afternoon on Monday. I would like to write query for this.
Please help.
Thanks in advance!
To return the full list of the timeslots available, you need to build a set of all the timeslots for each day of the week and then find if any of these slots have students being taught within it.
This is easily achieved with a recursive CTE to build your full timeslot set, from which you can JOIN into your Students data. The output of the query below is the day and time of each vacant session:
-- Build the dummy data sets:
declare @Data table
(
ID varchar(10),
StudentName varchar(30),
Course varchar(15),
SECTION varchar(2),
DAY varchar(10),
START_TIME time,
END_TIME time,
actual_starttime time,
actual_endtime time
);
insert into @Data values
(111, 'Mary', 'Science', 'A', 'Mon', '13:30:00.0000000', '16:20:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
,(111, 'Mary', 'Maths', 'A', 'Tue', '12:30:00.0000000', '13:20:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
,(111, 'Mary', 'Physics', 'C', 'Tue', '10:30:00.0000000', '11:10:00.0000000', '09:00:00.0000000', '21:20:00.0000000')
,(112, 'Robert', 'Maths', 'A', 'Mon', '13:30:00.0000000', '16:20:00.0000000', '09:00:00.0000000', '21:20:00.0000000');
-- Query the data:
with TimeSlots as -- Recursive CTE builds a table of all timeslots in TIME data type.
(
select cast('09:00:00' as time) as TimeSlotStart
,cast('09:30:00' as time) as TimeSlotEnd
union all
select dateadd(minute,30,TimeSlotStart)
,dateadd(minute,30,TimeSlotEnd)
from TimeSlots
where TimeSlotStart < cast('21:00:00' as time)
)
, TeachingDays as -- Used to return all the time slots above for each day of the week in CROSS JOIN below.
(
select 1 as DaySort
,'Mon' as TeachingDay
union all
select 2 as DaySort
,'Tue'
union all
select 3 as DaySort
,'Wed'
union all
select 4 as DaySort
,'Thu'
union all
select 5 as DaySort
,'Fri'
)
select td.TeachingDay
,t.TimeSlotStart
,t.TimeSlotEnd
from TimeSlots t -- Select all timeslots.
cross join TeachingDays td -- For each day.
left join @Data d -- And find all students that are being taught on that day at the specified time.
on(td.TeachingDay = d.DAY
and t.TimeSlotStart <= d.END_TIME
and t.TimeSlotEnd > d.START_TIME
)
where d.ID is null -- Then only return data where there are no students being taught at this timeslot.
order by td.DaySort
,t.TimeSlotStart;
You could create a stored procedure with the following steps.
Step 1: Predefine the timeslots in a separate table (09:00-10:00, 10:00-11:00, etc.).
Step 2: Select the count of students.
Step 3:
for all the slots
Begin
for all the students
Begin
if (students.actual_starttime = slots.actual_starttime and
students.actual_endtime = slots.actual_endtime)
break;
else count = count + 1;
End
End
Step 4: If the above count matches the total count of students, then the slot is free for all the students; otherwise the slot is not free for all the students.
Hope this helps. Let me know if you have difficulty with it.
You should have three more tables to make it more simple
i.e. Student, Section and slots
I tried to create 1 more table with half hour slots
create table table2(timeslot time);
insert into table2 values ('9:00:00.0000000');
insert into table2 values ('9:30:00.0000000');
insert into table2 values ('10:00:00.0000000');
insert into table2 values ('10:30:00.0000000');
insert into table2 values ('11:00:00.0000000');
insert into table2 values ('11:30:00.0000000');
insert into table2 values ('12:00:00.0000000');
insert into table2 values ('12:30:00.0000000');
insert into table2 values ('13:00:00.0000000');
insert into table2 values ('13:30:00.0000000');
insert into table2 values ('14:00:00.0000000');
insert into table2 values ('14:30:00.0000000');
insert into table2 values ('15:00:00.0000000');
insert into table2 values ('15:30:00.0000000');
insert into table2 values ('16:00:00.0000000');
insert into table2 values ('16:30:00.0000000');
insert into table2 values ('17:00:00.0000000');
insert into table2 values ('17:30:00.0000000');
insert into table2 values ('18:00:00.0000000');
insert into table2 values ('18:30:00.0000000');
insert into table2 values ('19:00:00.0000000');
insert into table2 values ('19:30:00.0000000');
insert into table2 values ('20:00:00.0000000');
insert into table2 values ('20:30:00.0000000');
insert into table2 values ('21:00:00.0000000');
insert into table2 values ('21:30:00.0000000');
Following SQL will give you free slot and name of student:
Query:
select t1.StudentName,t2.timeslot
from Table2 t2,
Table1 t1
where t2.timeslot<t1.start_time
and t2.timeslot<t1.end_time
and t1.section='A'
group by t1.StudentName,t2.timeslot
order by t2.timeslot
Output:
StudentName timeslot
1 Mary 09:00:00
2 Robert 09:00:00
3 Mary 09:30:00
4 Robert 09:30:00
5 Mary 10:00:00
6 Robert 10:00:00
7 Mary 10:30:00
8 Robert 10:30:00
9 Mary 11:00:00
10 Robert 11:00:00
11 Mary 11:30:00
12 Robert 11:30:00
13 Mary 12:00:00
14 Robert 12:00:00
15 Mary 12:30:00
16 Robert 12:30:00
17 Mary 13:00:00
18 Robert 13:00:00
This is only half of the task; I just showed you a way to achieve it. Introduce two more joins, with the student and section tables, to complete it.
Shred the day (09:00 to 21:30 interval) into minutes, find free minutes with respect to students of the group and days of interest and group minutes found back as intervals.
CREATE TABLE Table1 (ID varchar(10),StudentName varchar(30), Course varchar(15) ,SECTION varchar(2),DAY varchar(10),
START_TIME time , END_TIME time, actual_starttime time, actual_endtime time);
INSERT INTO Table1 VALUES (111, 'Mary','Science','A','Mon','13:30:00.0000000','16:20:00.0000000','09:00:00.0000000','21:20:00.0000000')
INSERT INTO Table1 VALUES (111, 'Mary','Maths','A','Tue','12:30:00.0000000','13:20:00.0000000','09:00:00.0000000','21:20:00.0000000')
INSERT INTO Table1 VALUES (111, 'Mary','Physics','C','Tue','10:30:00.0000000','11:10:00.0000000','09:00:00.0000000','21:20:00.0000000')
INSERT INTO Table1 VALUES (112, 'Robert','Maths','A','Mon','13:30:00.0000000','16:20:00.0000000','09:00:00.0000000','21:20:00.0000000')
;
-- parameters
declare @tds time = '09:00';
declare @tde time = '21:30';
declare @section varchar(2) = 'A';
create table #daysofinterest (DAY varchar(10) primary key);
insert #daysofinterest (DAY) values ('Mon'),('Tue'),('Fri');
create table #groupmembers(ID int primary key);
insert #groupmembers(ID) values (111),(112);
-- query
select DAY, startt = dateadd(minute, min(n), @tds), endt = dateadd (minute, max(n), @tds)
from (
select DAY, n, grp = n - row_number() over(partition by DAY order by n)
from (
-- all minutes of the day, @tds till @tde
select top (datediff(minute, @tds, @tde)) n = row_number() over(order by (select null))
from sys.all_objects
) tally
cross join #daysofinterest dd
join #groupmembers gm on
not exists (select 1 from table1 t
where t.ID = gm.ID and t.DAY = dd.DAY and SECTION = @section and
dateadd (minute, n, @tds) between t.START_TIME and t.END_TIME )
group by DAY, n
--this minute is free for every group member
having count(*) = (select count(*) from #groupmembers)
) g
group by DAY, grp
order by DAY, min(n)
I have data like shown below:
ID Duration Start Date End Date
------------------------------------------------------
10 2 2013-09-03 05:00:00 2013-09-03 05:02:00
I need output like below:
10 2 2013-09-03 05:00:00 2013-09-03 05:01:00 1
10 2 2013-09-03 05:01:00 2013-09-03 05:02:00 2
Based on the Duration column: if the value is 2, I need the row to be duplicated twice.
As the output shows, the Start Date and End Date times should be changed accordingly.
A row count as an additional column (1 / 2 in the output above), numbering the duplicated rows, would help a lot.
If Duration is 0 or 1, then do nothing; only when Duration > 1 should the rows be duplicated.
Finally, the additional column should hold the row sequence 1, 2, 3, showing how many rows were duplicated.
Try the SQL below; I added some comments where I thought it was necessary.
declare @table table(Id integer not null, Duration int not null, StartDate datetime, EndDate datetime)
insert into @table values (10,2, '2013-09-03 05:00:00', '2013-09-03 05:02:00')
insert into @table values (11,3, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
;WITH
numbers AS (
--this is the number series generator
--(limited to 100, you can change that to whatever you need
-- max possible duration in your case).
SELECT 1 AS num
UNION ALL
SELECT num+1 FROM numbers WHERE num+1<=100
)
SELECT t.Id
, t.Duration
, StartDate = DATEADD(MINUTE, IsNull(Num,1) - 1, t.StartDate)
, EndDate = DATEADD(MINUTE, IsNull(Num,1), t.StartDate)
, N.num
FROM @table t
LEFT JOIN numbers N
ON t.Duration >= N.Num
-- join it with numbers generator for Duration times
ORDER BY t.Id
, N.Num
This works better when Duration = 0:
declare @table table(Id integer not null, Duration int not null, StartDate datetime, EndDate datetime)
insert into @table values (10,2, '2013-09-03 05:00:00', '2013-09-03 05:02:00')
insert into @table values (11,3, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
insert into @table values (12,0, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
insert into @table values (13,1, '2013-09-04 05:00:00', '2013-09-04 05:03:00')
;WITH
numbers AS (
--this is the number series generator
--(limited to 100, you can change that to whatever you need
-- max possible duration in your case).
SELECT 1 AS num
UNION ALL
SELECT num+1 FROM numbers WHERE num+1<=100
)
SELECT
Id
, Duration
, StartDate
, EndDate
, num
FROM
(SELECT
t.Id
, t.Duration
, StartDate = DATEADD(MINUTE, Num - 1, t.StartDate)
, EndDate = DATEADD(MINUTE, Num, t.StartDate)
, N.num
FROM @table t
INNER JOIN numbers N
ON t.Duration >= N.Num ) A
-- join it with numbers generator for Duration times
UNION
(SELECT
t.Id
, t.Duration
, StartDate-- = DATEADD(MINUTE, Num - 1, t.StartDate)
, EndDate --= DATEADD(MINUTE, Num, t.StartDate)
, 1 AS num
FROM @table t
WHERE Duration = 0)
ORDER BY Id,Num
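The question also asks that rows with Duration 0 or 1 be left untouched. One way this might be handled in a single pass, without a recursive CTE, is sketched below. It is only an illustration: the table variable @rows is hypothetical, and the number series is taken from sys.all_objects on the assumption that it holds at least 1000 rows, which caps the supported Duration at 1000.

```sql
-- Hypothetical sample data covering Duration values 0, 1, 2 and 3.
DECLARE @rows TABLE (Id INT NOT NULL, Duration INT NOT NULL,
                     StartDate DATETIME, EndDate DATETIME);

INSERT INTO @rows (Id, Duration, StartDate, EndDate)
VALUES (10, 2, '2013-09-03 05:00:00', '2013-09-03 05:02:00'),
       (11, 3, '2013-09-04 05:00:00', '2013-09-04 05:03:00'),
       (12, 0, '2013-09-04 05:00:00', '2013-09-04 05:03:00'),
       (13, 1, '2013-09-04 05:00:00', '2013-09-04 05:03:00');

WITH numbers AS (
    -- Non-recursive number series 1..1000 (assumed upper bound for Duration).
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
    FROM sys.all_objects
)
SELECT t.Id,
       t.Duration,
       -- Only rows with Duration > 1 are split into one-minute slices.
       StartDate = CASE WHEN t.Duration > 1
                        THEN DATEADD(MINUTE, n.num - 1, t.StartDate)
                        ELSE t.StartDate END,
       EndDate   = CASE WHEN t.Duration > 1
                        THEN DATEADD(MINUTE, n.num, t.StartDate)
                        ELSE t.EndDate END,
       RowSeq    = n.num
FROM @rows AS t
JOIN numbers AS n
  ON n.num <= CASE WHEN t.Duration > 1 THEN t.Duration ELSE 1 END
ORDER BY t.Id, n.num;
```

Rows with Duration 0 or 1 come back once with their original StartDate and EndDate and a RowSeq of 1; every other row is repeated Duration times.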
I am facing a conceptual problem that I am having a hard time overcoming. I am hoping the SO folks can help me overcome it with a nudge in the right direction.
I am in the process of doing some ETL work with the source data being very similar and very large. I am loading it into a table that is intended for replication and I only want the most basic of information in this target table.
My source table looks something like this:
I need my target table to reflect it as such:
As you can see I didn't duplicate the InTransit status where it was duplicated in the source table. The steps I am trying to figure out how to achieve are
1. Get any new distinct rows entered since the last time the query ran. (Easy)
2. For each TrackingId, check whether each new status is already the most recent status in the target; if so, disregard it, otherwise go ahead and insert it. This means I also have to start at the earliest of the new statuses and go from there. (I have no *(!#in clue how I'll do this)
3. Do this every 15 minutes so that statuses are kept very recent, so step #2 must be performant.
My source table could easily consist of 100k+ rows but having the need to run this every 15 minutes requires me to make sure this is very performant thus why I am really trying to avoid cursors.
Right now the only way I can see to do this is using a CLR sproc but I think there may be better ways thus I am hoping you guys can nudge me in the right direction.
I am sure I am probably leaving something out that you may need so please let me know what info you may need and I'll happily provide.
Thank you in advance!
EDIT:
OK, I wasn't explicit enough in my question. My source table is going to contain multiple tracking IDs. It may be up to 100k+ rows containing multiple TrackingIds and multiple statuses for each TrackingId. I have to update the target table as above for each individual TrackingId, but my source will be an amalgam of TrackingIds.
Here's a solution without self-joins:
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY statusDate) AS rn,
ROW_NUMBER() OVER (PARTITION BY status ORDER BY statusDate) AS rns
FROM tracking
WHERE trackingId = @id
),
qs AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY rn - rns ORDER BY statusDate) AS rnn
FROM q
)
SELECT *
FROM qs
WHERE rnn = 1
ORDER BY
statusDate
Here's a script to check:
DECLARE @tracking TABLE
(
id INT NOT NULL PRIMARY KEY,
trackingId INT NOT NULL,
status INT,
statusDate DATETIME
)
INSERT
INTO @tracking
SELECT 1, 1, 1, DATEADD(d, 1, '2010-01-01')
UNION ALL
SELECT 2, 1, 2, DATEADD(d, 2, '2010-01-01')
UNION ALL
SELECT 3, 1, 2, DATEADD(d, 3, '2010-01-01')
UNION ALL
SELECT 4, 1, 2, DATEADD(d, 4, '2010-01-01')
UNION ALL
SELECT 5, 1, 3, DATEADD(d, 5, '2010-01-01')
UNION ALL
SELECT 6, 1, 3, DATEADD(d, 6, '2010-01-01')
UNION ALL
SELECT 7, 1, 4, DATEADD(d, 7, '2010-01-01')
UNION ALL
SELECT 8, 1, 2, DATEADD(d, 8, '2010-01-01')
UNION ALL
SELECT 9, 1, 2, DATEADD(d, 9, '2010-01-01')
UNION ALL
SELECT 10, 1, 1, DATEADD(d, 10, '2010-01-01')
;
WITH q AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY statusDate) AS rn,
ROW_NUMBER() OVER (PARTITION BY status ORDER BY statusDate) AS rns
FROM @tracking
),
qs AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY rn - rns ORDER BY statusDate) AS rnn
FROM q
)
SELECT *
FROM qs
WHERE rnn = 1
ORDER BY
statusDate
Here you go. I'll let you clean it up and do optimizations. One of the subqueries can go into a view and the messy date comparison can be cleaned up. If you're using SQL 2008 R2 then use CAST as DATE instead.
declare @tbl1 table(
id int, Trackingid int, Status varchar(50), StatusDate datetime
)
declare @tbl2 table(
id int, Trackingid int, Status varchar(50), StatusDate datetime
)
----Source data
insert into @tbl1 (id, trackingid, status, statusdate) values(1,1,'PickedUp','10/01/10 1:00') --
insert into @tbl1 (id, trackingid, status, statusdate) values(2,1,'InTransit','10/02/10 1:00') --
insert into @tbl1 (id, trackingid, status, statusdate) values(8,1,'InTransit','10/02/10 3:00')
insert into @tbl1 (id, trackingid, status, statusdate) values(4,1,'Delayed','10/03/10 1:00')
insert into @tbl1 (id, trackingid, status, statusdate) values(5,1,'InTransit','10/03/10 1:01')
insert into @tbl1 (id, trackingid, status, statusdate) values(6,1,'AtDest','10/03/10 2:00')
insert into @tbl1 (id, trackingid, status, statusdate) values(7,1,'Deliv','10/03/10 3:00') --
insert into @tbl1 (id, trackingid, status, statusdate) values(3,2,'InTransit','10/03/10 1:00')
insert into @tbl1 (id, trackingid, status, statusdate) values(9,2,'AtDest','10/04/10 1:00')
insert into @tbl1 (id, trackingid, status, statusdate) values(10,2,'Deliv','10/04/10 1:05')
insert into @tbl1 (id, trackingid, status, statusdate) values(11,1,'Delayed','10/02/10 2:05')
----Target data
insert into @tbl2 (id, trackingid, status, statusdate) values(1,1,'PickedUp','10/01/10 1:00')
insert into @tbl2 (id, trackingid, status, statusdate) values(2,1,'InTransit','10/02/10 1:00')
insert into @tbl2 (id, trackingid, status, statusdate) values(3,1,'Deliv','10/03/10 3:00')
select d.* from
(
select
* ,
ROW_NUMBER() OVER(PARTITION BY trackingid, CAST((STR( YEAR( statusdate ) ) + '/' +STR( MONTH(statusdate ) ) + '/' +STR( DAY( statusdate ) )) AS DATETIME) ORDER BY statusdate) AS 'RN'
from @tbl1
) d
where
not exists
(
select RN from
(
select
* ,
ROW_NUMBER() OVER(PARTITION BY trackingid, CAST((STR( YEAR( statusdate ) ) + '/' +STR( MONTH(statusdate ) ) + '/' +STR( DAY( statusdate ) )) AS DATETIME) ORDER BY statusdate) AS 'RN'
from @tbl1
)f where f.RN = d.RN + 1 and d.status = f.status and f.trackingid = d.trackingid and
CAST((STR( YEAR( f.statusdate ) ) + '/' +STR( MONTH(f.statusdate ) ) + '/' +STR( DAY( f.statusdate ) )) AS DATETIME) =
CAST((STR( YEAR( d.statusdate ) ) + '/' +STR( MONTH(d.statusdate ) ) + '/' +STR( DAY( d.statusdate ) )) AS DATETIME)
)
and
not exists
(
select 1 from @tbl2 t2
where (t2.trackingid = d.trackingid
and t2.statusdate = d.statusdate
and t2.status = d.status)
)
and (
not exists
(
select 1 from
(
select top 1 * from @tbl2 t2
where t2.trackingid = d.trackingid
order by t2.statusdate desc
) g
where g.status = d.status
)
or not exists
(
select 1 from
(
select top 1 * from @tbl2 t2
where t2.trackingid = d.trackingid
and t2.statusdate <= d.statusdate
order by t2.statusdate desc
) g
where g.status = d.status
)
)
order by trackingid,statusdate
How well this performs will depend on indexes, and particularly if you are targeting a single TrackingID at a time, but this is one way to use a CTE and self-join to obtain the desired results:
CREATE TABLE #foo
(
TrackingID INT,
[Status] VARCHAR(32),
StatusDate SMALLDATETIME
);
INSERT #foo SELECT 1, 'PickedUp', '2010-10-01 08:15';
INSERT #foo SELECT 1, 'InTransit', '2010-10-02 03:07';
INSERT #foo SELECT 1, 'InTransit', '2010-10-02 10:28';
INSERT #foo SELECT 1, 'Delayed', '2010-10-03 09:52';
INSERT #foo SELECT 1, 'InTransit', '2010-10-03 20:09';
INSERT #foo SELECT 1, 'AtDest', '2010-10-04 13:42';
INSERT #foo SELECT 1, 'Deliv', '2010-10-04 17:05';
WITH src AS
(
SELECT
TrackingID,
[Status],
StatusDate,
ab = ROW_NUMBER() OVER (ORDER BY [StatusDate])
FROM #foo
WHERE TrackingID = 1
),
realsrc AS
(
SELECT
a.TrackingID,
leftrow = a.ab,
rightrow = b.ab,
leftstatus = a.[Status],
leftstatusdate = a.StatusDate,
rightstatus = b.[Status],
rightstatusdate = b.StatusDate
FROM src AS a
LEFT OUTER JOIN src AS b
ON a.ab = b.ab - 1
)
SELECT
Id = ROW_NUMBER() OVER (ORDER BY [leftstatusdate]),
TrackingID,
[Status] = leftstatus,
[StatusDate] = leftstatusdate
FROM
realsrc
WHERE
rightrow IS NULL
OR (leftrow = rightrow - 1 AND leftstatus <> rightstatus)
ORDER BY
[StatusDate];
GO
DROP TABLE #foo;
If you need to support multiple TrackingIDs in the same query:
CREATE TABLE #foo
(
TrackingID INT,
[Status] VARCHAR(32),
StatusDate SMALLDATETIME
);
INSERT #foo SELECT 1, 'PickedUp', '2010-10-01 08:15';
INSERT #foo SELECT 1, 'InTransit', '2010-10-02 03:07';
INSERT #foo SELECT 1, 'InTransit', '2010-10-02 10:28';
INSERT #foo SELECT 1, 'Delayed', '2010-10-03 09:52';
INSERT #foo SELECT 1, 'InTransit', '2010-10-03 20:09';
INSERT #foo SELECT 1, 'AtDest', '2010-10-04 13:42';
INSERT #foo SELECT 1, 'Deliv', '2010-10-04 17:05';
INSERT #foo SELECT 2, 'InTransit', '2010-10-02 10:28';
INSERT #foo SELECT 2, 'Delayed', '2010-10-03 09:52';
INSERT #foo SELECT 2, 'InTransit', '2010-10-03 20:09';
INSERT #foo SELECT 2, 'AtDest', '2010-10-04 13:42';
WITH src AS
(
SELECT
TrackingID,
[Status],
StatusDate,
ab = ROW_NUMBER() OVER (PARTITION BY TrackingID ORDER BY [StatusDate])
FROM #foo
),
realsrc AS
(
SELECT
a.TrackingID,
leftrow = a.ab,
rightrow = b.ab,
leftstatus = a.[Status],
leftstatusdate = a.StatusDate,
rightstatus = b.[Status],
rightstatusdate = b.StatusDate
FROM src AS a
LEFT OUTER JOIN src AS b
ON a.ab = b.ab - 1
AND a.TrackingID = b.TrackingID
)
SELECT
Id = ROW_NUMBER() OVER (ORDER BY TrackingID, [leftstatusdate]),
TrackingID,
[Status] = leftstatus,
[StatusDate] = leftstatusdate
FROM
realsrc
WHERE
rightrow IS NULL
OR (leftrow = rightrow - 1 AND leftstatus <> rightstatus)
ORDER BY
TrackingID,
[StatusDate];
GO
DROP TABLE #foo;
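On SQL Server 2012 or later, the comparison with the neighbouring row could also be sketched with LAG, which avoids the self-join. This is an illustration under assumptions, not part of the answer above: the table variable @src is hypothetical, and the sketch keeps the first row of each run of identical statuses, whereas the self-join above keeps the last one.

```sql
-- Hypothetical sample data with two interleaved tracking IDs.
DECLARE @src TABLE (TrackingID INT, [Status] VARCHAR(32), StatusDate SMALLDATETIME);

INSERT INTO @src (TrackingID, [Status], StatusDate)
VALUES (1, 'PickedUp',  '2010-10-01 08:15'),
       (1, 'InTransit', '2010-10-02 03:07'),
       (1, 'InTransit', '2010-10-02 10:28'),
       (1, 'Delayed',   '2010-10-03 09:52'),
       (1, 'InTransit', '2010-10-03 20:09'),
       (1, 'AtDest',    '2010-10-04 13:42'),
       (1, 'Deliv',     '2010-10-04 17:05'),
       (2, 'InTransit', '2010-10-02 10:28'),
       (2, 'InTransit', '2010-10-03 09:52'),
       (2, 'AtDest',    '2010-10-04 13:42');

WITH s AS (
    -- Previous status of the same TrackingID, in chronological order.
    SELECT TrackingID, [Status], StatusDate,
           LAG([Status]) OVER (PARTITION BY TrackingID ORDER BY StatusDate) AS PrevStatus
    FROM @src
)
SELECT TrackingID, [Status], StatusDate
FROM s
WHERE PrevStatus IS NULL          -- first status of a TrackingID
   OR PrevStatus <> [Status]      -- status changed compared with the previous row
ORDER BY TrackingID, StatusDate;
```

Because the window is partitioned by TrackingID, rows of different tracking IDs never get compared with each other, even when their dates interleave.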
If this is SQL 2005 then you can use ROW_NUMBER with a sub query or CTE:
If the dataset is really huge though and performance is an issue then one of the above that got pasted while I was trying to get the code block to work could well be more efficient.
/**
* This is just to create a sample table to use in the test query
**/
DECLARE @test TABLE(ID INT, TrackingID INT, Status VARCHAR(20), StatusDate DATETIME)
INSERT @test
SELECT 1,1,'PickedUp', '01 jan 2010 08:00' UNION
SELECT 2,1,'InTransit', '01 jan 2010 08:01' UNION
SELECT 3,1,'InTransit', '01 jan 2010 08:02' UNION
SELECT 4,1,'Delayed', '01 jan 2010 08:03' UNION
SELECT 5,1,'InTransit', '01 jan 2010 08:04' UNION
SELECT 6,1,'AtDest', '01 jan 2010 08:05' UNION
SELECT 7,1,'Deliv', '01 jan 2010 08:06'
/**
* This is the select code that excludes the duplicate entries.
* Sorting desc in row_number would get latest instead of first
**/
;WITH n AS
(
SELECT ID,
TrackingID,
Status,
StatusDate,
--For each Status for a tracking ID number by ID (could use date but 2 may be the same)
ROW_NUMBER() OVER(PARTITION BY TrackingID, Status ORDER BY ID) AS [StatusNumber]
FROM @test
)
SELECT ID,
TrackingID,
Status,
StatusDate
FROM n
WHERE StatusNumber = 1
ORDER BY ID
I think this example will do what you're looking for:
CREATE TABLE dbo.srcStatus (
Id INT IDENTITY(1,1),
TrackingId INT NOT NULL,
[Status] VARCHAR(10) NOT NULL,
StatusDate DATETIME NOT NULL
);
CREATE TABLE dbo.tgtStatus (
Id INT IDENTITY(1,1),
TrackingId INT NOT NULL,
[Status] VARCHAR(10) NOT NULL,
StatusDate DATETIME NOT NULL
);
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'PickedUp','10/1/2010 8:15 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'InTransit','10/2/2010 3:07 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'InTransit','10/2/2010 10:28 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'PickedUp','10/1/2010 8:15 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'InTransit','10/2/2010 3:07 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'Delayed','10/2/2010 10:28 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'Delayed','10/3/2010 9:52 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'InTransit','10/3/2010 8:09 PM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'AtDest','10/4/2010 1:42 PM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 1,'Deliv','10/4/2010 5:05 PM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'InTransit','10/3/2010 9:52 AM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'InTransit','10/3/2010 8:09 PM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'AtDest','10/4/2010 1:42 PM');
INSERT INTO dbo.srcStatus ( TrackingId, [Status], StatusDate ) VALUES ( 2,'Deliv','10/4/2010 5:05 PM');
WITH cteSrcTrackingIds
AS ( SELECT DISTINCT
TrackingId
FROM dbo.srcStatus
),
cteAllTrackingIds
AS ( SELECT TrackingId ,
[Status] ,
StatusDate
FROM dbo.srcStatus
UNION
SELECT tgtStatus.TrackingId ,
tgtStatuS.[Status] ,
tgtStatus.StatusDate
FROM cteSrcTrackingIds
INNER JOIN dbo.tgtStatus ON cteSrcTrackingIds.TrackingId = tgtStatus.TrackingId
),
cteAllTrackingIdsWithRownums
AS ( SELECT TrackingId ,
[Status] ,
StatusDate ,
ROW_NUMBER() OVER ( PARTITION BY TrackingId ORDER BY StatusDate ) AS rownum
FROM cteAllTrackingIds
),
cteTrackingIdsWorkingSet
AS ( SELECT src.rownum AS [id] ,
src2.rownum AS [id2] ,
src.TrackingId ,
src.[Status] ,
src.StatusDate ,
ROW_NUMBER() OVER ( PARTITION BY src.TrackingId,
src.rownum ORDER BY src.StatusDate ) AS rownum
FROM cteAllTrackingIdsWithRownums AS [src]
LEFT OUTER JOIN cteAllTrackingIdsWithRownums AS [src2] ON src.TrackingId = src2.TrackingId
AND src.rownum < src2.rownum
AND src.[Status] != src2.[Status]
),
cteTrackingIdsSubset
AS ( SELECT id ,
TrackingId ,
[Status] ,
StatusDate ,
ROW_NUMBER() OVER ( PARTITION BY TrackingId, id2 ORDER BY id ) AS rownum
FROM cteTrackingIdsWorkingSet
WHERE rownum = 1
)
INSERT INTO dbo.tgtStatus
( TrackingId ,
[status] ,
StatusDate
)
SELECT cteTrackingIdsSubset.TrackingId ,
cteTrackingIdsSubset.[status] ,
cteTrackingIdsSubset.StatusDate
FROM cteTrackingIdsSubset
LEFT OUTER JOIN dbo.tgtStatus ON cteTrackingIdsSubset.TrackingId = tgtStatus.TrackingId
AND cteTrackingIdsSubset.[status] = tgtStatus.[status]
AND cteTrackingIdsSubset.StatusDate = tgtStatus.StatusDate
WHERE cteTrackingIdsSubset.rownum = 1
AND tgtStatus.id IS NULL
ORDER BY cteTrackingIdsSubset.TrackingId ,
cteTrackingIdsSubset.StatusDate;