Based on feedback, I am restructuring my question.
I am working with SQL on a Presto database.
My objective is to report on employees that take consecutive days of PTO or Sick Time since the beginning of 2018. My desired output would have the individual islands of time taken by employee with the start and end dates, along the lines of:
The main table I am using is d_employee_time_off
There are only two time_off_type_name: PTO and Sick Leave.
The ds is a datestamp and I use the latest ds (usually the current date)
I have access to a date table named d_date
I can join the tables on d_employee_time_off.time_off_date = d_date.full_date
I hope that I have structured this question in a fashion that is understandable.
I believe the need here is to join the days-off data to a calendar table.
In the example solution below I am generating the calendar "on the fly", but I think you already have your own solution for this. Also, in my example I have used the string 'Monday' and moved backward from that (or you could use 'Friday' and move forward). I'm not keen on language-dependent solutions, but as I'm not a Presto user I wasn't able to test anything on Presto. So the example below uses some of your own logic, but in SQL Server syntax, which I trust you can translate to Presto:
Query:
;WITH
Digits AS (
SELECT 0 AS digit UNION ALL
SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9
)
, cal AS (
SELECT
ca.number
, dateadd(day,ca.number,'20180101') as cal_date
, datename(weekday,dateadd(day,ca.number,'20180101')) weekday
FROM Digits [1s]
CROSS JOIN Digits [10s]
CROSS JOIN Digits [100s] /* add more like this as needed */
cross apply (
SELECT
[1s].digit
+ [10s].digit * 10
+ [100s].digit * 100 /* add more like this as needed */
AS number
) ca
)
, time_off AS (
select
*
from cal
inner join mytable t on (cal.cal_date = t.time_off_date and cal.weekday <> 'Monday')
or (cal.cal_date between dateadd(day,-2,t.time_off_date)
and t.time_off_date and datename(weekday,t.time_off_date) = 'Monday')
)
, starting_points AS (
SELECT
employee_id,
cal_date,
dense_rank() OVER(partition by employee_id
ORDER BY
time_off_date
) AS rownum
FROM
time_off A
WHERE
NOT EXISTS (
SELECT
*
FROM
time_off B
WHERE
B.employee_id = A.employee_id
AND B.cal_date = DATEADD(day, -1, A.cal_date)
)
)
, ending_points AS (
SELECT
employee_id,
cal_date,
dense_rank() OVER(partition by employee_id
ORDER BY
time_off_date
) AS rownum
FROM
time_off A
WHERE
NOT EXISTS (
SELECT
*
FROM
time_off B
WHERE
B.employee_id = A.employee_id
AND B.cal_date = DATEADD(day, 1, A.cal_date)
)
)
SELECT
S.employee_id,
S.cal_date AS start_range,
E.cal_date AS end_range
FROM
starting_points S
JOIN
ending_points E
ON E.employee_id = S.employee_id
AND E.rownum = S.rownum
order by employee_id
, start_range
Result:
employee_id start_range end_range
1 200035 02.01.2018 02.01.2018
2 200035 20.04.2018 27.04.2018
3 200037 27.01.2018 29.01.2018
4 200037 31.03.2018 02.04.2018
see: http://rextester.com/MISZ50793
CREATE TABLE mytable(
ID INT NOT NULL
,employee_id INTEGER NOT NULL
,type VARCHAR(3) NOT NULL
,time_off_date DATE NOT NULL
,time_off_in_days INT NOT NULL
);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (1,200035,'PTO','2018-01-02',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (2,200035,'PTO','2018-04-20',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (3,200035,'PTO','2018-04-23',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (4,200035,'PTO','2018-04-24',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (5,200035,'PTO','2018-04-25',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (6,200035,'PTO','2018-04-26',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (7,200035,'PTO','2018-04-27',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (8,200037,'PTO','2018-01-29',1);
INSERT INTO mytable(id,employee_id,type,time_off_date,time_off_in_days) VALUES (9,200037,'PTO','2018-04-02',1);
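For readers who want to check the island logic outside the database, here is a minimal Python sketch of the same idea, using the sample rows above. It is not a translation of the query: the helper names `next_working_day` and `islands` are made up for illustration, weekends are assumed to be Saturday and Sunday, public holidays are ignored, and (unlike the query) a Monday start is not pushed back to the preceding Saturday.

```python
from datetime import date, timedelta
from itertools import groupby

def next_working_day(d):
    """Return the first Monday-to-Friday date strictly after d."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def islands(rows):
    """rows: iterable of (employee_id, time_off_date).
    Yields (employee_id, start, end) for each run of consecutive working days off."""
    for emp, grp in groupby(sorted(set(rows)), key=lambda r: r[0]):
        start = prev = None
        for _, d in grp:
            if prev is not None and d != next_working_day(prev):
                # A working day lies between prev and d, so the island ends at prev.
                yield emp, start, prev
                start = d
            elif start is None:
                start = d
            prev = d
        yield emp, start, prev

rows = [
    (200035, date(2018, 1, 2)),
    (200035, date(2018, 4, 20)), (200035, date(2018, 4, 23)),
    (200035, date(2018, 4, 24)), (200035, date(2018, 4, 25)),
    (200035, date(2018, 4, 26)), (200035, date(2018, 4, 27)),
    (200037, date(2018, 1, 29)), (200037, date(2018, 4, 2)),
]
result = list(islands(rows))
for emp, start, end in result:
    print(emp, start, end)
```

The Friday 2018-04-20 and the following Monday-to-Friday week come out as one island because the only days between them are a weekend.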
I have a problem identifying and fixing some records that have overlapping time intervals in an SCD type 2 dimension.
What I have is:
Bkey Uid startDate endDate
'John' 1 1990-01-01 (some time stamp) 2017-01-10 (some time stamp)
'John' 2 2016-11-03 (some time stamp) 2016-11-14 (some time stamp)
'John' 3 2016-11-14 (some time stamp) 2016-12-29 (some time stamp)
'John' 4 2016-12-29 (some time stamp) 2017-01-10 (some time stamp)
'John' 5 2017-01-10 (some time stamp) 2017-04-22 (some time stamp)
......
I want to find (first) all the Johns that have overlapping time periods, in a table with lots and lots of Johns, and then figure out a way to correct those overlapping periods. For the latter I know there are functions such as LAG and LEAD that can handle it, but it eludes me how to find the overlaps.
Any hints?
Regards,
[ 1 ] The following query will return overlapping time ranges:
SELECT *,
(
SELECT *
FROM @Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM @Dimension1 x
Full script:
DECLARE @Dimension1 TABLE (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDate DATE NOT NULL,
endDate DATE NOT NULL,
CHECK(startDate < endDate)
);
INSERT @Dimension1
SELECT 'John', 1, '1990-01-01', '2017-01-10' UNION ALL
SELECT 'John', 2, '2016-11-03', '2016-11-14' UNION ALL
SELECT 'John', 3, '2016-11-14', '2016-12-29' UNION ALL
SELECT 'John', 4, '2016-12-29', '2017-01-10' UNION ALL
SELECT 'John', 5, '2017-01-11', '2017-04-22';
SELECT *,
(
SELECT *
FROM @Dimension1 y
WHERE x.Bkey = y.Bkey
AND x.Uid <> y.Uid
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
FOR XML RAW, ROOT, TYPE
) OverlappingTimeRanges
FROM @Dimension1 x
[ 2 ] In order to find distinct groups of time ranges with overlapping original rows I would use the following approach:
-- Edit 1
DECLARE @Groups TABLE (
Bkey VARCHAR(50) NOT NULL,
Uid INT NOT NULL,
startDateNew DATE NOT NULL,
endDateNew DATE NOT NULL,
CHECK(startDateNew < endDateNew)
);
INSERT @Groups
SELECT x.Bkey, x.Uid, z.startDateNew, z.endDateNew
FROM @Dimension1 x
OUTER APPLY (
SELECT MIN(y.startDate) AS startDateNew, MAX(y.endDate) AS endDateNew
FROM @Dimension1 y
WHERE x.Bkey = y.Bkey
AND NOT(x.startDate > y.endDate OR x.endDate < y.startDate)
) z
-- End of Edit 1
-- This returns distinct groups identified by DistinctGroupId together with all overlapping Uid(s) from current group
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM @Groups a
) b
) c
OUTER APPLY (
SELECT d.Uid AS Overlapping_Uid
FROM @Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
) e
-- This returns distinct groups identified by DistinctGroupId together with an XML (XmlCol) which includes overlapping Uid(s)
SELECT *
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY b.Bkey, b.startDateNew, b.endDateNew) AS DistinctGroupId, b.*
FROM (
SELECT DISTINCT a.Bkey, a.startDateNew, a.endDateNew
FROM @Groups a
) b
) c
OUTER APPLY (
SELECT (
SELECT d.Uid AS Overlapping_Uid
FROM @Groups d
WHERE c.Bkey = d.Bkey
AND c.startDateNew = d.startDateNew
AND c.endDateNew = d.endDateNew
FOR XML RAW, TYPE
) AS XmlCol
) e
Note: Last range used in my example is 'John', 5, '2017-01-11', '2017-04-22'; and not 'John', 5, '2017-01-10', '2017-04-22';. Also, data type used is DATE and not DATETIME[2][OFFSET].
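As a cross-check on the grouping, here is a rough Python sketch that merges overlapping ranges per Bkey. It uses a different technique from the OUTER APPLY above, namely the usual sort-and-sweep interval merge, so chains of overlaps are merged transitively. The function name `merge_ranges` is made up for illustration, and ranges that merely touch (end equals next start) are treated as overlapping, matching the NOT(... > ... OR ... < ...) test.

```python
from datetime import date

def merge_ranges(rows):
    """rows: iterable of (bkey, uid, start, end).
    Returns a list of (bkey, merged_start, merged_end, [uids in the group])."""
    merged = []
    for bkey, uid, start, end in sorted(rows, key=lambda r: (r[0], r[2], r[3])):
        last = merged[-1] if merged else None
        if last and last[0] == bkey and start <= last[2]:
            # Overlaps (or touches) the current group: extend it.
            last[2] = max(last[2], end)
            last[3].append(uid)
        else:
            merged.append([bkey, start, end, [uid]])
    return [tuple(m) for m in merged]

rows = [
    ("John", 1, date(1990, 1, 1), date(2017, 1, 10)),
    ("John", 2, date(2016, 11, 3), date(2016, 11, 14)),
    ("John", 3, date(2016, 11, 14), date(2016, 12, 29)),
    ("John", 4, date(2016, 12, 29), date(2017, 1, 10)),
    ("John", 5, date(2017, 1, 11), date(2017, 4, 22)),
]
groups = merge_ranges(rows)
for g in groups:
    print(g)
```

With the sample rows, Uids 1 to 4 collapse into one group and Uid 5 stays on its own.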
I think the tricky part of your query is being able to articulate the logic for overlapping ranges. We can self join on the condition that a row on the left overlaps with any row on the right. All matching rows are those which overlap.
We can think of four possible overlap scenarios:
|---------| |---------|    no overlap

|---------|
      |---------|          1st end and 2nd start overlap

      |---------|
|---------|                1st start and 2nd end overlap

|---------|
   |---|                   2nd completely contained inside 1st
                           (could be 1st inside 2nd also)
SELECT DISTINCT
t1.Uid
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.Uid <> t2.Uid AND -- a row always overlaps itself, so exclude that match
t1.startDate <= t2.endDate AND
t2.startDate <= t1.endDate
WHERE
t1.Bkey = 'John' AND t2.Bkey = 'John'
This will at least let you identify overlapping records. Updating and separating them in a meaningful way will probably end up being an ugly gaps and islands problem, perhaps meriting another question.
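To try the overlap test on the question's sample rows, here is a small sketch that runs a self join in an in-memory SQLite database from Python. The table name `dim` and the two 'Mary' rows are made up for illustration, and the timestamps are cut down to dates. One assumption to note: it uses strict < instead of <=, so a period that ends exactly where the next one begins is not reported as an overlap; switch to <= if touching endpoints should count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (Bkey TEXT, Uid INTEGER, startDate TEXT, endDate TEXT)")
conn.executemany(
    "INSERT INTO dim VALUES (?, ?, ?, ?)",
    [
        ("John", 1, "1990-01-01", "2017-01-10"),
        ("John", 2, "2016-11-03", "2016-11-14"),
        ("John", 3, "2016-11-14", "2016-12-29"),
        ("John", 4, "2016-12-29", "2017-01-10"),
        ("John", 5, "2017-01-10", "2017-04-22"),
        ("Mary", 6, "2015-01-01", "2016-01-01"),  # hypothetical clean history
        ("Mary", 7, "2016-01-01", "2017-01-01"),
    ],
)

# ISO-8601 date strings compare correctly as text, so plain < works here.
overlaps = conn.execute(
    """
    SELECT t1.Bkey, t1.Uid, t2.Uid
    FROM dim t1
    JOIN dim t2
      ON t1.Bkey = t2.Bkey
     AND t1.Uid < t2.Uid              -- each pair once, never a row with itself
     AND t1.startDate < t2.endDate
     AND t2.startDate < t1.endDate
    ORDER BY t1.Bkey, t1.Uid, t2.Uid
    """
).fetchall()
print(overlaps)
```

Only Uid 1 is flagged, against Uids 2, 3 and 4, which suggests it is the long 1990 to 2017 row that needs correcting.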
We can achieve this with a self join of the emp table.
a.emp_id != b.emp_id ensures that a row is not joined with itself.
The remaining comparison clauses check whether a row's start date or end date falls within another row's date range.
create table emp(name varchar(20), emp_id numeric(10), start_date date, end_date date);
insert into emp values('John', 1, '1990-01-01', '2017-01-10');
insert into emp values( 'John', 2, '2016-11-03', '2016-11-14');
insert into emp values( 'John', 3, '2016-11-14', '2016-12-29');
insert into emp values( 'John', 4, '2016-12-29', '2017-01-10');
insert into emp values( 'John', 5, '2017-01-11', '2017-04-22');
commit;
with A as (select * from EMP),
B as (select * from EMP)
select A.* from A,B where A.EMP_ID != B.EMP_ID
and A.START_DATE < B.END_DATE and B.START_DATE < A.END_DATE
and (A.START_DATE between B.START_DATE and B.END_DATE
or A.END_DATE between B.START_DATE and B.END_DATE);
I have a recursive query that I have working for the most part. Here is what I have so far:
DECLARE @table TABLE(mgrQID VARCHAR(64), QID VARCHAR(64), NTID VARCHAR(64), FullName VARCHAR(64), lvl int, dt DATETIME, countOfDirects INT)
WITH empList(mgrQID, QID, NTID, FullName, lvl, metadate)
AS
(
SELECT TOP 1 mgrQID, QID, NTID, FirstName+' '+LastName, 0, Meta_LogDate
FROM dbo.EmployeeTable_Historical
WHERE QID IN (SELECT director FROM dbo.attritionDirectors) AND Meta_LogDate <= @pit
ORDER BY Meta_LogDate DESC
UNION ALL
SELECT b.mgrQID, b.QID, b.NTID, b.FirstName+' '+b.LastName, lvl+1, b.Meta_LogDate
FROM empList a
CROSS APPLY dbo.Fetch_DirectsHistorical_by_qid(a.QID, @pit)b
)
INSERT INTO @table(mgrQID, QID, NTID, FullName, lvl, dt)
SELECT empList.mgrQID ,
empList.QID ,
empList.NTID ,
empList.FullName ,
empList.lvl ,
empList.metadate
FROM empList
ORDER BY lvl
OPTION(MAXRECURSION 10)
Now, @table has a list of QIDs in it. I then need to join my employee table and find out how many people report to each of those QIDs.
So, there will need to be an UPDATE on @table that provides the count of employees who report to each of those QIDs.
Here is the catch.. The employee table is a historical table that can contain multiple records for the same people. Any time a piece of their information is updated a new record is created with those changes.
If I wanted to pull the most recent record for someone right now, I would use this:
SELECT TOP 1 E.*
FROM employeeTable_historical AS E
WHERE E.qid = A.[subQID]
AND CONVERT (DATE, GETDATE()) > CONVERT (DATE, E.[Meta_LogDate])
ORDER BY meta_logDate DESC
The question..
I need to be able to get the count of employees in the historical table that report directly to each QID in @table. The historical table has a column called mgrQID. Is there a way I can get this count in the original recursive query?
I would recommend first that you look at the approach you're taking. The historical table you're dealing with will certainly need to select the greatest Meta_LogDate for any given employee, but in the structure you've set up here, you'll never select more than one record from matching attritionDirectors, thanks to the TOP 1 in your anchor query. As such, I'd recommend a lightweight function on which you base your query:
create function dbo.EmployeesAsOf(@date datetime)
returns table
as return
select mgrQID, QID, NTID, FirstName, LastName, Meta_LogDate
from dbo.EmployeeTable_Historical A
where Meta_LogDate = (select max(Meta_LogDate) from dbo.EmployeeTable_Historical B where A.QID = B.QID and Meta_LogDate <= @date)
This will allow you to get the most recent record for anyone as of the given date, and as long as EmployeeTable_Historical has an index on (QID, Meta_LogDate), this function will perform well.
Having said that, looking at your recursive query, you'll likely want to tweak the recursive query somewhat:
create function empList(@thisDate datetime)
returns @emptbl table (
mgrQID varchar(10)
, QID varchar(10)
, NTID varchar(10)
, Name varchar(21)
, Meta_LogDate datetime
, DirectsThisMany int
)
as
begin
;with empList AS (
select E.mgrQID, E.QID, E.NTID, E.FirstName + ' ' + E.LastName AS Name, E.Meta_LogDate
from dbo.EmployeesAsOf(@thisDate) E
-- the question selects the "director" column from attritionDirectors
inner join dbo.attritionDirectors D on E.QID = D.director
union all
select E.mgrQID, E.QID, E.NTID, E.FirstName + ' ' + E.LastName AS Name, E.Meta_LogDate
from dbo.EmployeesAsOf(@thisDate) E
inner join empList D on E.mgrQID = D.QID
)
insert into @emptbl
select A.mgrQID, A.QID, A.NTID, A.Name, A.Meta_LogDate, count(b.QID) AS DirectsThisMany
from empList A
left join empList B on A.QID = B.mgrQID
group by A.mgrQID, A.QID, A.NTID, A.Name, A.Meta_LogDate
return
end
In this way, you'll be able to feed in any date and get a read of the tables, including counts from history as of that date. The self-join of the CTE is what enables us to get the count of directs, as one can't use aggregates in the recursive member of a CTE. This function is easy to use, and the indexing strategy should become apparent by looking at the query plan in SSMS. A simple SELECT * FROM EmpList(GETDATE()) will give the current situation.
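The same shape (an as-of snapshot, a recursive walk, then an aggregate outside the recursion) can be tried end to end with SQLite from Python. This is only a sketch under assumptions: the table `emp_hist`, its trimmed-down columns, the sample rows and the helper `directs_as_of` are all made up for illustration, and SQLite syntax differs from T-SQL (WITH RECURSIVE, named parameters instead of @variables).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_hist (mgrQID TEXT, QID TEXT, log_date TEXT)")
conn.executemany("INSERT INTO emp_hist VALUES (?, ?, ?)", [
    (None, "D1", "2018-01-01"),   # the director
    ("D1", "E1", "2018-01-01"),
    ("D1", "E2", "2018-01-01"),
    ("E1", "E3", "2018-03-01"),
    ("E1", "E2", "2018-06-01"),   # E2 later moves under E1
])

QUERY = """
WITH RECURSIVE
asof AS (  -- latest record per employee on or before the point in time
    SELECT h.mgrQID, h.QID
    FROM emp_hist h
    WHERE h.log_date = (SELECT MAX(x.log_date) FROM emp_hist x
                        WHERE x.QID = h.QID AND x.log_date <= :pit)
),
tree(mgrQID, QID, lvl) AS (
    SELECT mgrQID, QID, 0 FROM asof WHERE QID = :director
    UNION ALL
    SELECT a.mgrQID, a.QID, t.lvl + 1
    FROM asof a JOIN tree t ON a.mgrQID = t.QID
)
SELECT t.QID, t.lvl, COUNT(c.QID) AS directs   -- aggregate outside the recursion
FROM tree t
LEFT JOIN asof c ON c.mgrQID = t.QID
GROUP BY t.QID, t.lvl
ORDER BY t.lvl, t.QID
"""

def directs_as_of(director, pit):
    """Return (QID, level, direct report count) rows as of the given date."""
    return conn.execute(QUERY, {"director": director, "pit": pit}).fetchall()

print(directs_as_of("D1", "2018-04-01"))
print(directs_as_of("D1", "2018-07-01"))
```

Running it for two dates shows the counts shifting once E2's newer history row takes effect.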
I have two tables. Table 1 is a master list of equipment with equipment_id and equipment_description. So, let's say for this table I have ten equipment_id's. 1,2,3....10 each with some description attached.
Table 2 logs when the equipment has been inspected:
equipment_id|inspection_date
1 | '1-22-2012'
2 | '1-22-2012'
4 | '1-22-2012'
2 | '1-23-2012'
3 | '1-23-2012'
I've created a view, v_dates which pulls out of table 2 all of the distinct inspection dates - not sure if I needed it but did it anyway.
I would like to create another view which shows all equipment that was NOT inspected for each date in the v_dates. So it would show:
3 | '1-22-2012'
5 | '1-22-2012'
and so on.
Rookie here and just not sure how to join these tables correctly. Can't get it to work and would appreciate any help.
Untested, but I think this should give the desired result:
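-- "equipment" is an assumed name for the master table (Table 1)
SELECT e.equipment_id, d.inspection_date FROM
equipment e
CROSS JOIN
( SELECT DISTINCT inspection_date FROM inspections ) d
LEFT JOIN
inspections i
ON i.equipment_id=e.equipment_id AND i.inspection_date=d.inspection_date
WHERE i.equipment_id IS NULL
ORDER BY 2,1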
As mentioned in the comments, a table of inspection dates would really help.
The following appears to work based on my test data using SQL SERVER 2005. I am using a CROSS JOIN of distinct dates along with a LEFT JOIN to throw out EQUIPMENT_ID records that exist for those dates.
Sorry, I am having problems getting my code formatting correct with tabs and spaces...
IF OBJECT_ID('tempdb..#EQUIPMENT') IS NOT NULL
DROP TABLE #EQUIPMENT
CREATE TABLE #EQUIPMENT
( EQUIPMENT_ID smallint,
EQUIPMENT_DESC varchar(32)
)
INSERT INTO #EQUIPMENT
( EQUIPMENT_ID, EQUIPMENT_DESC )
SELECT 1, 'AAA'
UNION SELECT 2, 'BBB'
UNION SELECT 3, 'CCC'
UNION SELECT 4, 'DDD'
UNION SELECT 5, 'EEE'
UNION SELECT 6, 'FFF'
UNION SELECT 7, 'GGG'
UNION SELECT 8, 'HHH'
UNION SELECT 9, 'III'
UNION SELECT 10, 'JJJ'
IF OBJECT_ID('tempdb..#INSPECTION') IS NOT NULL
DROP TABLE #INSPECTION
CREATE TABLE #INSPECTION
( EQUIPMENT_ID smallint,
INSPECTION_DATE smalldatetime
)
INSERT INTO #INSPECTION
( EQUIPMENT_ID, INSPECTION_DATE )
SELECT 1, '1-22-2012'
UNION SELECT 1, '1-27-2012'
UNION SELECT 3, '1-27-2012'
UNION SELECT 5, '1-29-2012'
UNION SELECT 7, '1-22-2012'
UNION SELECT 7, '1-27-2012'
UNION SELECT 7, '1-29-2012'
SELECT E.EQUIPMENT_ID, D.INSPECTION_DATE
FROM #EQUIPMENT E
CROSS JOIN ( SELECT DISTINCT INSPECTION_DATE
FROM #INSPECTION
) D
LEFT JOIN #INSPECTION I2
ON E.EQUIPMENT_ID = I2.EQUIPMENT_ID
AND D.INSPECTION_DATE = I2.INSPECTION_DATE
WHERE I2.EQUIPMENT_ID IS NULL
ORDER BY E.EQUIPMENT_ID, D.INSPECTION_DATE
As per my comment to the question, you really need a table of valid inspection dates. It makes the sql much more sensible, and besides it's the only way to do it if you want to see all items listed for dates when inspections were supposed to be done, but no inspections were done.
So, assuming the two tables:
create table inspections (equipment_id int, inspection_date date);
create table inspection_dates (id int, inspection_date date);
Then a join to get all the equipment that does not have an inspection on a date when an inspection should have taken place would be:
select i.equipment_id, id.inspection_date
from inspection_dates id,
(select distinct equipment_id from inspections) i
where not exists (select * from inspections i2
where i2.inspection_date = id.inspection_date
and i2.equipment_id = i.equipment_id);
You want the combos that do not exist. Thus the not exists predicate.
Note again, that presumably you would have a table for all the unique equipment_ids, but not knowing that I had to construct it myself in place.
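Putting the pieces together, here is a small sketch that runs the NOT EXISTS version against an in-memory SQLite database from Python. The table names `equipment` and `inspection_dates` are assumptions (the question does not name its tables), only five pieces of equipment are loaded to keep the output short, and 2012-01-24 is an invented date on which nothing was inspected, to show why a separate dates table matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE equipment (equipment_id INTEGER PRIMARY KEY);
    CREATE TABLE inspection_dates (inspection_date TEXT PRIMARY KEY);
    CREATE TABLE inspections (equipment_id INTEGER, inspection_date TEXT);
""")
conn.executemany("INSERT INTO equipment VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO inspection_dates VALUES (?)",
                 [("2012-01-22",), ("2012-01-23",), ("2012-01-24",)])
conn.executemany("INSERT INTO inspections VALUES (?, ?)", [
    (1, "2012-01-22"), (2, "2012-01-22"), (4, "2012-01-22"),
    (2, "2012-01-23"), (3, "2012-01-23"),
])

# Every (equipment, date) combination that has no matching inspection row.
missing = conn.execute("""
    SELECT e.equipment_id, d.inspection_date
    FROM inspection_dates d
    CROSS JOIN equipment e
    WHERE NOT EXISTS (SELECT 1 FROM inspections i
                      WHERE i.equipment_id = e.equipment_id
                        AND i.inspection_date = d.inspection_date)
    ORDER BY d.inspection_date, e.equipment_id
""").fetchall()
for row in missing:
    print(row)
```

For 2012-01-22 this lists equipment 3 and 5, as in the question, and for 2012-01-24 it lists every piece of equipment, which a list of dates derived from the inspections table alone could not do.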
I had this question in mind and since I just discovered this site I decided to post it here.
Let's say I have a table with a timestamp and a state for a given "object" (generic meaning, not OOP object); is there an optimal way to calculate the time between a state and the next occurrence of another (or same) state (what I call a "trip") with a single SQL statement (inner SELECTs and UNIONs aren't counted)?
Ex: For the following, the trip time between Initial and Done would be 6 days, but between Initial and Review it would be 2 days.
2008-08-01 13:30:00 - Initial
2008-08-02 13:30:00 - Work
2008-08-03 13:30:00 - Review
2008-08-04 13:30:00 - Work
2008-08-05 13:30:00 - Review
2008-08-06 13:30:00 - Accepted
2008-08-07 13:30:00 - Done
No need to be generic; just say which DBMS your solution is specific to if it is not generic.
Here's an Oracle methodology using an analytic function.
with data as (
SELECT 1 trip_id, to_date('20080801 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Initial' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080802 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Work' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080803 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Review' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080804 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Work' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080805 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Review' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080806 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Accepted' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080807 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Done' step from dual )
select trip_id,
step,
dt - lag(dt) over (partition by trip_id order by dt) trip_time
from data
/
1 Initial
1 Work 1
1 Review 1
1 Work 1
1 Review 1
1 Accepted 1
1 Done 1
These are very commonly used in situations where traditionally we might use a self-join.
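To make the lag idea concrete without a database, here is a minimal Python sketch over the question's sample rows. It is an illustration, not a port of the Oracle query: `steps` pairs each row with the time since the previous row, and the helper `trip_time` (a made-up name) measures from the first occurrence of one state to the first later occurrence of another.

```python
from datetime import datetime

events = [
    (datetime(2008, 8, 1, 13, 30), "Initial"),
    (datetime(2008, 8, 2, 13, 30), "Work"),
    (datetime(2008, 8, 3, 13, 30), "Review"),
    (datetime(2008, 8, 4, 13, 30), "Work"),
    (datetime(2008, 8, 5, 13, 30), "Review"),
    (datetime(2008, 8, 6, 13, 30), "Accepted"),
    (datetime(2008, 8, 7, 13, 30), "Done"),
]
events.sort()  # order by timestamp, like ORDER BY dt

# LAG-style: each state with the time elapsed since the previous row
# (None for the first row, which has no predecessor).
steps = [(events[0][1], None)] + [
    (cur[1], cur[0] - prev[0]) for prev, cur in zip(events, events[1:])
]

def trip_time(events, start_state, end_state):
    """Time from the first start_state to the first end_state after it.
    Raises StopIteration if either state is missing."""
    start = next(ts for ts, state in events if state == start_state)
    end = next(ts for ts, state in events if state == end_state and ts > start)
    return end - start

print(steps)
print(trip_time(events, "Initial", "Review"))
print(trip_time(events, "Initial", "Done"))
```

On this data the Initial to Review trip is 2 days and Initial to Done is 6 days, matching the figures in the question.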
PostgreSQL syntax :
DROP TABLE IF EXISTS ObjectState;
CREATE TABLE ObjectState (
object_id integer not null,--foreign key
event_time timestamp NOT NULL,
state varchar(10) NOT NULL,
--Other fields
CONSTRAINT pk_ObjectState PRIMARY KEY (object_id,event_time)
);
For a given state, find the first following state of a given type
select parent.object_id,parent.event_time,parent.state,min(child.event_time) as ch_event_time,min(child.event_time)-parent.event_time as step_time
from
ObjectState parent
join ObjectState child on (parent.object_id=child.object_id and parent.event_time<child.event_time)
where
--Starting state
parent.object_id=1 and parent.event_time=to_timestamp('01-Aug-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss')
--needed state
and child.state='Review'
group by parent.object_id,parent.event_time,parent.state;
This query is not the shortest possible, but it should be easy to understand and to use as part of other queries:
List events and their duration for given object
select parent.object_id,parent.event_time,parent.state,min(child.event_time) as ch_event_time,
CASE WHEN parent.state<>'Done' and min(child.event_time) is null THEN (select localtimestamp)-parent.event_time ELSE min(child.event_time)-parent.event_time END as step_time
from
ObjectState parent
left outer join ObjectState child on (parent.object_id=child.object_id and parent.event_time<child.event_time)
where parent.object_id=4
group by parent.object_id,parent.event_time,parent.state
order by parent.object_id,parent.event_time,parent.state;
List current states for objects that are not "done"
select states.object_id,states.event_time,states.state,(select localtimestamp)-states.event_time as step_time
from
(select parent.object_id,parent.event_time,parent.state,min(child.event_time) as ch_event_time,min(child.event_time)-parent.event_time as step_time
from
ObjectState parent
left outer join ObjectState child on (parent.object_id=child.object_id and parent.event_time<child.event_time)
group by parent.object_id,parent.event_time,parent.state) states
where
states.object_id not in (select object_id from ObjectState where state='Done')
and ch_event_time is null;
Test data
insert into ObjectState (object_id,event_time,state)
select 1,to_timestamp('01-Aug-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 1,to_timestamp('02-Aug-2008 13:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 1,to_timestamp('03-Aug-2008 13:50:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 1,to_timestamp('04-Aug-2008 14:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 1,to_timestamp('04-Aug-2008 16:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 1,to_timestamp('06-Aug-2008 18:00:00','dd-Mon-yyyy hh24:mi:ss'),'Accepted' union all
select 1,to_timestamp('07-Aug-2008 21:30:00','dd-Mon-yyyy hh24:mi:ss'),'Done';
insert into ObjectState (object_id,event_time,state)
select 2,to_timestamp('01-Aug-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 2,to_timestamp('02-Aug-2008 13:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 2,to_timestamp('07-Aug-2008 13:50:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 2,to_timestamp('14-Aug-2008 14:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 2,to_timestamp('15-Aug-2008 16:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 2,to_timestamp('16-Aug-2008 18:02:00','dd-Mon-yyyy hh24:mi:ss'),'Accepted' union all
select 2,to_timestamp('17-Aug-2008 22:10:00','dd-Mon-yyyy hh24:mi:ss'),'Done';
insert into ObjectState (object_id,event_time,state)
select 3,to_timestamp('12-Sep-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 3,to_timestamp('13-Sep-2008 13:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 3,to_timestamp('14-Sep-2008 13:50:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 3,to_timestamp('15-Sep-2008 14:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 3,to_timestamp('16-Sep-2008 16:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review';
insert into ObjectState (object_id,event_time,state)
select 4,to_timestamp('21-Aug-2008 03:10:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 4,to_timestamp('22-Aug-2008 03:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 4,to_timestamp('23-Aug-2008 03:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 4,to_timestamp('24-Aug-2008 04:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work';
I don't think you can get that answer with one SQL statement as you are trying to obtain one result from many records. The only way to achieve that in SQL is to get the timestamp field for two different records and calculate the difference (datediff). Therefore, UNIONS or Inner Joins are needed.
I'm not sure I understand the question exactly, but you can do something like the following which reads the table in one pass then uses a derived table to calculate it. SQL Server code:
CREATE TABLE #testing
(
eventdatetime datetime NOT NULL,
state varchar(10) NOT NULL
)
INSERT INTO #testing (
eventdatetime,
state
)
SELECT '20080801 13:30:00', 'Initial' UNION ALL
SELECT '20080802 13:30:00', 'Work' UNION ALL
SELECT '20080803 13:30:00', 'Review' UNION ALL
SELECT '20080804 13:30:00', 'Work' UNION ALL
SELECT '20080805 13:30:00', 'Review' UNION ALL
SELECT '20080806 13:30:00', 'Accepted' UNION ALL
SELECT '20080807 13:30:00', 'Done'
SELECT DATEDIFF(dd, Initial, Review)
FROM (
SELECT MIN(CASE WHEN state='Initial' THEN eventdatetime END) AS Initial,
MIN(CASE WHEN state='Review' THEN eventdatetime END) AS Review
FROM #testing
) AS A
DROP TABLE #testing
It is probably easier if you have a sequence number as well as the timestamp: in most RDBMSs you can create an auto-increment column without changing any of the INSERT statements. Then you join the table with a copy of itself to get the deltas:
select after.moment - before.moment, before.state, after.state
from object_states before, object_states after
where after.sequence = before.sequence + 1
(the details of the SQL syntax will vary according to the database system).
-- Oracle SQL
CREATE TABLE ObjectState
(
startdate date NOT NULL,
state varchar2(10) NOT NULL
);
-- Oracle requires a FROM clause, hence "from dual" on every branch
insert into ObjectState
select to_date('01-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Initial' from dual union all
select to_date('02-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Work' from dual union all
select to_date('03-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Review' from dual union all
select to_date('04-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Work' from dual union all
select to_date('05-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Review' from dual union all
select to_date('06-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Accepted' from dual union all
select to_date('07-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Done' from dual;
-- Days in between two states
select o2.startdate - o1.startdate as days
from ObjectState o1, ObjectState o2
where o1.state = 'Initial'
and o2.state = 'Review';
create table A (
At datetime not null,
State varchar(20) not null
)
go
insert into A(At,State)
select '2008-08-01T13:30:00','Initial' union all
select '2008-08-02T13:30:00','Work' union all
select '2008-08-03T13:30:00','Review' union all
select '2008-08-04T13:30:00','Work' union all
select '2008-08-05T13:30:00','Review' union all
select '2008-08-06T13:30:00','Accepted' union all
select '2008-08-07T13:30:00','Done'
go
--Find trip time from Initial to Review
select DATEDIFF(day,t1.At,t2.At)
from
A t1
inner join
A t2
on
t1.State = 'Initial' and
t2.State = 'Review' and
t1.At < t2.At
left join
A t3
on
t3.State = 'Initial' and
t3.At > t1.At and
t3.At < t2.At
left join
A t4
on
t4.State = 'Review' and
t4.At < t2.At and
t4.At > t1.At
where
t3.At is null and
t4.At is null
Didn't say whether joins were allowed or not. Joins to t3 and t4 (and their comparisons) let you say whether you want the earliest or latest occurrence of the start and end states (in this case, I'm asking for latest "Initial" and earliest "Review")
In real code, my start and end states would be parameters
Edit: Oops, need to include "t3.At < t2.At" and "t4.At > t1.At", to fix some odd sequences of States (e.g. If we removed the second "Review" and then queried from "Work" to "Review", the original query will fail)
I think that your steps (each record of your trip can be seen as a step) can be grouped together as part of the same activity. It is then possible to group your data on that activity, for example:
SELECT Min(Tbl_Step.dateTimeStep) as tripBegin,
Max(Tbl_Step.dateTimeStep) as tripEnd
FROM
Tbl_Step
WHERE
id_Activity = 'AAAAAAA'
Using this principle, you can then calculate other aggregates, such as the number of steps in the activity. But you will not find an SQL way to calculate values like the gap between two steps, as such data belongs to neither the first nor the second step. Some reporting tools use what they call "running sums" to calculate such intermediate data. Depending on your objectives, this might be a solution for you.
I tried to do this in MySQL. You would need to use a variable since there is no rank function in MySQL, so it would go like this:
set @trip1 = 0; set @trip2 = 0;
SELECT trip1.`date` as startdate, datediff(trip2.`date`, trip1.`date`) length_of_trip
FROM
(SELECT @trip1 := @trip1 + 1 as rank1, `date` from trip where state='Initial') as trip1
INNER JOIN
(SELECT @trip2 := @trip2 + 1 as rank2, `date` from trip where state='Done') as trip2
ON rank1 = rank2;
I am assuming that you want to calculate the time between 'Initial' and 'Done' states.
+---------------------+----------------+
| startdate | length_of_trip |
+---------------------+----------------+
| 2008-08-01 13:30:00 | 6 |
+---------------------+----------------+
Ok, this is a bit beyond geeky, but I built a web application to track my wife's contractions just before we had a baby so that I could see from work when it was getting close to time to go to the hospital. Anyway, I built this basic thing fairly easily as two views.
create table contractions (time_date timestamp primary key);
create view contraction_time as
SELECT a.time_date, max(b.prev_time) AS prev_time
FROM contractions a, ( SELECT contractions.time_date AS prev_time
FROM contractions) b
WHERE b.prev_time < a.time_date
GROUP BY a.time_date;
create view time_between as
SELECT contraction_time.time_date, contraction_time.prev_time, contraction_time.time_date - contraction_time.prev_time
FROM contraction_time;
This could be done as a subselect obviously as well, but I used the intermediate views for other things as well, and so this worked out well.