How to combine tables based on timestamps - sql

Imagine you have two tables of events. Both tables A and B have a single column, called timestamp, with multiple rows.
Now I'd like to combine these two tables into a table C with the following properties:
C has a row for every row in A
C has a timestamp column that perfectly reflects the contents of A
C has another column called near_event that is true if there is a row in B within 1s of the timestamp of this row, false otherwise
How might I do that efficiently?

mauro pointed me to this one, saying that Vertica could do better than that - and, indeed, it can, as it has a predicate that enables what we call the event series join. All you need to do is to run a non-inner join (left, right or full outer) and use INTERPOLATE PREVIOUS VALUE intelligently as the join predicate.
You might want to have a look at my LinkedIn post:
https://www.linkedin.com/pulse/verticas-event-series-join-joining-two-time-tables-marco-gessner/
which illustrates an even more complex use case.
Using the same tables as in that blog:
CREATE LOCAL TEMPORARY TABLE oilpressure (
op_ts,op_psi
) ON COMMIT PRESERVE ROWS AS (
SELECT TIMESTAMP '2015-04-01 07:00:00', 25.356
UNION ALL SELECT TIMESTAMP '2015-04-01 07:00:10', 35.124
UNION ALL SELECT TIMESTAMP '2015-04-01 07:00:20', 47.056
UNION ALL SELECT TIMESTAMP '2015-04-01 07:00:30', 45.225
)
;
CREATE LOCAL TEMPORARY TABLE revspeed (
rs_ts,rpm
) ON COMMIT PRESERVE ROWS AS (
SELECT TIMESTAMP '2015-04-01 07:00:00', 2201
UNION ALL SELECT TIMESTAMP '2015-04-01 07:00:08', 3508
UNION ALL SELECT TIMESTAMP '2015-04-01 07:00:15', 6504
UNION ALL SELECT TIMESTAMP '2015-04-01 07:00:20', 6608
)
;
Let oilpressure be your table A, and revspeed be your table B.
Then what you would want (if you only want the timestamps) is this:
SELECT
op_ts
, rs_ts
FROM oilpressure
LEFT JOIN revspeed
ON op_ts INTERPOLATE PREVIOUS VALUE rs_ts;
op_ts |rs_ts
2015-04-01 07:00:00|2015-04-01 07:00:00
2015-04-01 07:00:10|2015-04-01 07:00:08
2015-04-01 07:00:20|2015-04-01 07:00:20
2015-04-01 07:00:30|2015-04-01 07:00:20
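As a rough illustration of what the join above returns, the following Python sketch pairs each left-hand timestamp with the latest right-hand timestamp at or before it. The names asof_previous and at are made up for this example, and the binary search is only an assumption about how to reproduce the result, not a description of how Vertica implements it.

```python
from bisect import bisect_right
from datetime import datetime


def asof_previous(left, right):
    """Pair each timestamp in `left` with the latest timestamp in `right`
    that is less than or equal to it (None if there is no such row)."""
    right_sorted = sorted(right)
    pairs = []
    for ts in left:
        # bisect_right returns how many right-hand values are <= ts
        i = bisect_right(right_sorted, ts)
        pairs.append((ts, right_sorted[i - 1] if i > 0 else None))
    return pairs


def at(second):
    # Hypothetical helper: a timestamp on 2015-04-01 at 07:00:<second>
    return datetime(2015, 4, 1, 7, 0, second)


# Same timestamps as the oilpressure and revspeed tables above
oilpressure = [at(0), at(10), at(20), at(30)]
revspeed = [at(0), at(8), at(15), at(20)]

for op_ts, rs_ts in asof_previous(oilpressure, revspeed):
    print(op_ts, "|", rs_ts)
```

To answer the original question, near_event would still have to be derived from the matched pair, and a row in B shortly after the A row would need the mirrored lookup as well.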

You may be able to do this, if you don't have too many duplicates. Here is the idea:
select timestamp,
(case when timestamp < timestamp_add(second, 1, last_b_timestamp) or
timestamp > timestamp_add(second, -1, next_b_timestamp)
then 1 else 0
end) as flag
from (select timestamp, which,
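-- "ignore nulls" lets rows from a pick up the nearest b timestamp instead of their own NULL;
-- where the keyword goes (inside or after the parentheses) depends on the database
last_value(case when which = 'b' then timestamp end ignore nulls) over (order by timestamp) as last_b_timestamp,
last_value(case when which = 'b' then timestamp end ignore nulls) over (order by timestamp desc) as next_b_timestamp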
from ((select a.timestamp, 'a' as which from a) union all
(select b.timestamp, 'b' as which from b)
) ab
) ab
where which = 'a';
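The "last b before, next b after" idea can also be sketched outside the database. This is a minimal Python version, assuming timestamps are plain numbers of seconds and that "within 1s" means strictly less than one second apart, as in the query above. The function name near_event_flags is hypothetical.

```python
from bisect import bisect_left


def near_event_flags(a_times, b_times, window=1.0):
    """For each timestamp in a_times, report whether some timestamp in
    b_times lies strictly less than `window` seconds away."""
    b_sorted = sorted(b_times)
    flags = []
    for t in a_times:
        i = bisect_left(b_sorted, t)  # index of the first b >= t
        next_b = b_sorted[i] if i < len(b_sorted) else None
        last_b = b_sorted[i - 1] if i > 0 else None
        near = (next_b is not None and next_b - t < window) or (
            last_b is not None and t - last_b < window
        )
        flags.append(near)
    return flags


a = [0.0, 5.0, 10.0]
b = [0.5, 7.0, 11.0]
print(list(zip(a, near_event_flags(a, b))))
```

Only the nearest neighbour on each side has to be checked, so the cost is one sort of B plus one binary search per row of A.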

Related

How to compare date fields between two tables and get the less or equal date from the second table

I have two tables, A and B. Both have date fields. I need to compare those fields and build a table C that holds, for each row of table A, the greatest date in table B that is less than or equal to the date in table A. Table A is the main table.
CONTEXT: Table A holds product expirations, and table B holds business days. The user can update table B when a date is determined not to be a "business day": the date is deleted from table B, and then every product expiration in table A that was registered with that date has to be moved to the previous business day. So in my case I am creating table C, which contains the Id from table A and the business date that is less than or equal to the date mentioned. Then I will make the respective update.
IF OBJECT_ID('tempdb..#tmpA') IS NOT NULL DROP TABLE #tmpA
IF OBJECT_ID('tempdb..#tmpB') IS NOT NULL DROP TABLE #tmpB
CREATE TABLE #tmpA(Id INT IDENTITY(100,1),Fecha date)
INSERT INTO #tmpA(Fecha)
VALUES
('20170101'),('20171003'),('20170504'),('2017-09-01')
SELECT * FROM #tmpA
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-03
102 2017-05-04
103 2017-09-01
CREATE TABLE #tmpB(Id INT IDENTITY(1,4),Fecha date)
INSERT INTO #tmpB(Fecha)
VALUES
('20170101'),('20171001'),('20170504')
SELECT * FROM #tmpB
Id Fecha
----------- ----------
1 2017-01-01
5 2017-10-01
9 2017-05-04
I want to get this result (The same number of records in table A):
Id Fecha
----------- ----------
100 2017-01-01
101 2017-10-01 --> **this row is less than 2017-10-03**
102 2017-05-04
103 2017-05-04 --> **this row is less than 2017-09-01**
I tried to build some queries, without success:
IF OBJECT_ID('tempdb..#tmpC') IS NOT NULL DROP TABLE #tmpC
SELECT A.* INTO #tmpC FROM #tmpA A LEFT JOIN #tmpB B ON A.Fecha = B.Fecha WHERE B.Fecha IS NULL
SELECT * FROM #tmpC
SELECT *
FROM #tmpA A INNER JOIN
(
SELECT *
FROM #tmpC
GROUP BY id, Fecha
) AS Q ON MAX(Q.Fecha) <= A.Fecha
UPDATE:
NOTE. The Id column is simply an identity, but it does not mean that it should be related. The important thing is the dates.
Regards
While I'm not sure if this will scale well (if you have more than 100k rows) this will bring back the results which you want.
Theoretically, the correct way for you to do this, in a fashion which will scale well, would be to have a view where you utilize RANK() and join both of these tables together, though this was the quick and easy way. Please try this and let me know if it meets your requirements.
For your edification, I have left both of the dates in there for you to be able to compare them.
SELECT
A.ID
,A.FECHA OLDDATE
,B.FECHA CORRECTDATE
FROM #TMPA A
LEFT OUTER JOIN #TMPB B ON 1=1
WHERE 1=1
AND B.FECHA = (
SELECT MAX(FECHA)
FROM #TMPB
WHERE FECHA <= A.FECHA)
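For readers who want to try the "greatest date less than or equal to" lookup without SQL Server, here is a small sketch using Python's built-in sqlite3 module and the sample data from the question. The table names tmp_a and tmp_b and the function name are stand-ins chosen for this example. The dates are stored as ISO strings, which compare correctly as text.

```python
import sqlite3


def previous_business_dates():
    """Return (id, latest date in tmp_b that is <= the date in tmp_a)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE tmp_a (id INTEGER PRIMARY KEY, fecha TEXT);
        CREATE TABLE tmp_b (id INTEGER PRIMARY KEY, fecha TEXT);
        INSERT INTO tmp_a VALUES (100, '2017-01-01'), (101, '2017-10-03'),
                                 (102, '2017-05-04'), (103, '2017-09-01');
        INSERT INTO tmp_b VALUES (1, '2017-01-01'), (5, '2017-10-01'),
                                 (9, '2017-05-04');
    """)
    rows = con.execute("""
        SELECT a.id,
               (SELECT MAX(b.fecha)
                  FROM tmp_b AS b
                 WHERE b.fecha <= a.fecha) AS fecha
          FROM tmp_a AS a
         ORDER BY a.id
    """).fetchall()
    con.close()
    return rows


for row in previous_business_dates():
    print(row)
```

Putting the subquery in the select list, rather than in the WHERE clause as above, keeps rows of A that have no earlier date in B; they come back with NULL instead of disappearing.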
Is this what you want?
select a.id,
(case when b.fecha < a.fecha then b.fecha else a.fecha end) as fecha
from #tmpA a left join
#tmpB b
on a.id = b.id;
You can get the minimum with UNION ALL:
select id, min(fecha) from (
select * from #tmpA
union all
select * from #tmpB
) a
group by a.id
@JotaPardo WHERE 1=1 is always true, so it does not filter anything. It is just a placeholder that lets every real condition be written as AND ..., which makes conditions easy to add, remove, or comment out.

Update date range in Postgres table

I have a table with dates:
select id, date from date_ranges where range_id = 1;
1 2016-04-12
2 2016-04-13
3 2016-04-14
I also have an array, for example:
array('2016-04-11','2016-04-12','2016-04-13','2016-04-14','2016-04-15')
or
array('2016-04-13','2016-04-14','2016-04-15')
How can I insert the new values from the array into my table without changing the existing table values?
And with the second array, how can I delete the value 2016-04-12 from the table?
Please help; I need a single query.
WITH current_values AS (
SELECT generate_series('2016-04-13'::DATE, '2016-04-17'::DATE, '1 day')::DATE AS date
),
deleted_values AS (
DELETE FROM date_ranges WHERE date NOT IN (SELECT * FROM current_values) RETURNING id
)
INSERT INTO date_ranges ("date", range_id)
WITH new_values AS (
SELECT new."date"
FROM current_values AS new
LEFT JOIN date_ranges AS old
ON old."date" = new."date"
WHERE old.id IS NULL
)
SELECT date, 1 FROM new_values;

TSQL - Run date comparison for "duplicates"/false positives on initial query?

I'm pretty new to SQL and am working on pulling some data from several very large tables for analysis. The data is basically triggered events for assets on a system. The events all have a created_date (datetime) field that I care about.
I was able to put together the query below to get the data I need (YAY):
SELECT
event.efkey
,event.e_id
,event.e_key
,l.l_name
,event.created_date
,asset.a_id
,asset.asset_name
FROM event
LEFT JOIN asset
ON event.a_key = asset.a_key
LEFT JOIN l
ON event.l_key = l.l_key
WHERE event.e_key IN (350, 352, 378)
ORDER BY asset.a_id, event.created_date
However, while this gives me the data for the specific events I want, I still have another problem. Assets can trigger these events repeatedly, which can result in large numbers of "false positives" for what I'm looking at.
What I need to do is go through the result set of the query above and remove any events for an asset that occur closer than N minutes together (say 30 minutes for this example). So IF the asset_ID is the same AND the event.created_date is within 30 minutes of another event for that asset in the set THEN I want that removed. For example:
For the following records
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-01 12:35:31
a_id 1124 created 2016-02-01 12:40:33
a_id 1124 created 2016-02-01 12:45:42
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I'd want to return only:
a_id 1124 created 2016-02-01 12:30:30
a_id 1124 created 2016-02-02 12:30:30
a_id 1124 created 2016-02-02 13:00:30
a_id 1115 created 2016-02-01 12:30:30
I tried referencing this and this but I can't make the concepts there work for me. I know I probably need to do a SELECT * FROM (my existing query) but I can't seem to do that without ending up with tons of "multi-part identifier can't be bound" errors (and I have no experience creating temp tables, my attempts at that have failed thus far). I also am not exactly sure how to use DATEDIFF as the date filtering function.
Any help would be greatly appreciated! If you could dumb it down for a novice (or link to explanations) that would also be helpful!
This is a trickier problem than it initially appears. The hard part is capturing the previous good row and removing the next bad rows but not allowing those bad rows to influence whether or not the next row is good. Here is what I came up with. I've tried to explain what is going on with comments in the code.
--sample data since I don't have your table structure and your original query won't work for me
declare @events table
(
id int,
timestamp datetime
)
--note that I changed some of your sample data to test some different scenarios
insert into @events values( 1124, '2016-02-01 12:30:30')
insert into @events values( 1124, '2016-02-01 12:35:31')
insert into @events values( 1124, '2016-02-01 12:40:33')
insert into @events values( 1124, '2016-02-01 13:05:42')
insert into @events values( 1124, '2016-02-02 12:30:30')
insert into @events values( 1124, '2016-02-02 13:00:30')
insert into @events values( 1115, '2016-02-01 12:30:30')
--using a cte here to split the result set of your query into groups
--by id (you would want to partition by whatever criteria you use
--to determine that rows are talking about the same event)
--the row_number function gets the row number for each row within that
--id partition
--the over clause specifies how to break up the result set into groups
--(partitions) and what order to put the rows in within that group so
--that the numbering stays consistent
;with orderedEvents as
(
select id, timestamp, row_number() over (partition by id order by timestamp) as rn
from @events
--you would replace @events here with your query
)
--using a second recursive cte here to determine which rows are "good"
--and which ones are not.
, previousGoodTimestamps as
(
--this is the "seeding" part of the recursive cte where I pick the
--first rows of each group as being a desired result. Since they
--are the first in each group, I know they are good. I also assign
--their timestamp as the previous good timestamp since I know that
--this row is good.
select id, timestamp, rn, timestamp as prev_good_timestamp, 1 as is_good
from orderedEvents
where rn = 1
union all
--this is the recursive part of the cte. It takes the rows we have
--already added to this result set and joins those to the "next" rows
--(as defined by our ordering in the first cte). Then we output
--those rows and do some calculations to determine if this row is
--"good" or not. If it is "good" we set its timestamp as the
--previous good row timestamp so that rows that come after this one
--can use it to determine if they are good or not. If a row is "bad"
--we just forward along the last known good timestamp to the next row.
--
--We also determine if a row is good by checking if the last good row
--timestamp plus 30 minutes is less than or equal to the current row's
--timestamp. If it is then the row is good.
select e2.id
, e2.timestamp
, e2.rn
, last_good_timestamp.timestamp
, case
when dateadd(mi, 30, last_good_timestamp.timestamp) <= e2.timestamp then 1
else 0
end
from previousGoodTimestamps e1
inner join orderedEvents e2 on e2.id = e1.id and e2.rn = e1.rn + 1
--I used a cross apply here to calculate the last good row timestamp
--once. I could have used two identical subqueries above in the select
--and case statements, but I would rather not duplicate the code.
cross apply
(
select case
when e1.is_good = 1 then e1.timestamp --if the last row is good, just use its timestamp
else e1.prev_good_timestamp --the last row was bad, forward on what it had for the last good timestamp
end as timestamp
) last_good_timestamp
)
select *
from previousGoodTimestamps
where is_good = 1 --only take the "good" rows
Links to MSDN for some of the more complicated things here:
CTEs and Recursive CTEs
CROSS APPLY
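The "carry the last good timestamp forward" logic of the recursive CTE may be easier to follow as a plain loop. The sketch below is an illustration in Python, not a translation the answer provides. The function name keep_spaced_events is invented, and it uses the same modified sample data as the SQL above.

```python
from datetime import datetime, timedelta
from itertools import groupby


def keep_spaced_events(events, gap=timedelta(minutes=30)):
    """Keep the first event per id, then only events that occur at least
    `gap` after the previously kept event for that id."""
    kept = []
    ordered = sorted(events)  # sort by (id, timestamp)
    for asset_id, rows in groupby(ordered, key=lambda e: e[0]):
        last_good = None
        for _, ts in rows:
            # Dropped rows never update last_good, so they cannot
            # influence whether a later row is kept.
            if last_good is None or last_good + gap <= ts:
                kept.append((asset_id, ts))
                last_good = ts
    return kept


events = [
    (1124, datetime(2016, 2, 1, 12, 30, 30)),
    (1124, datetime(2016, 2, 1, 12, 35, 31)),
    (1124, datetime(2016, 2, 1, 12, 40, 33)),
    (1124, datetime(2016, 2, 1, 13, 5, 42)),
    (1124, datetime(2016, 2, 2, 12, 30, 30)),
    (1124, datetime(2016, 2, 2, 13, 0, 30)),
    (1115, datetime(2016, 2, 1, 12, 30, 30)),
]

for asset_id, ts in keep_spaced_events(events):
    print(asset_id, ts)
```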
-- Sample data.
declare @Samples as Table ( Id Int Identity, A_Id Int, CreatedDate DateTime );
insert into @Samples ( A_Id, CreatedDate ) values
( 1124, '2016-02-01 12:30:30' ),
( 1124, '2016-02-01 12:35:31' ),
( 1124, '2016-02-01 12:40:33' ),
( 1124, '2016-02-01 12:45:42' ),
( 1124, '2016-02-02 12:30:30' ),
( 1124, '2016-02-02 13:00:30' ),
( 1125, '2016-02-01 12:30:30' );
select * from @Samples;
-- Calculate the windows of 30 minutes before and after each CreatedDate and check for conflicts with other rows.
with Ranges as (
select Id, A_Id, CreatedDate,
DateAdd( minute, -30, S.CreatedDate ) as RangeStart, DateAdd( minute, 30, S.CreatedDate ) as RangeEnd
from @Samples as S )
select Id, A_Id, CreatedDate, RangeStart, RangeEnd,
-- Check for a conflict with another row with:
-- the same A_Id value and an earlier CreatedDate that falls inside the +/-30 minute range.
case when exists ( select 42 from @Samples where A_Id = R.A_Id and CreatedDate < R.CreatedDate and R.RangeStart < CreatedDate and CreatedDate < R.RangeEnd ) then 1
else 0 end as Conflict
from Ranges as R;

Most efficient way of selecting the changes between timestamped snapshots

I have a table that holds data about items that existed at a certain time - regular snapshots taken.
Simple example:
Timestamp ID
1 A
1 B
2 A
2 B
2 C
3 A
3 D
4 D
4 E
In this case, Item C gets created sometime between snapshot 1 and 2 and sometime between snapshot 2 and 3 B and C disappear and D gets created, etc.
The table is reasonably large (millions of records) and for each timestamp there are about 50 records.
What's the most efficient way of selecting the item IDs for items that disappear between two consecutive timestamps?
So for the above example ...
Between 1 and 2: NULL
Between 2 and 3: B, C
Between 3 and 4: A
If it doesn't make the query inefficient, can it be extended to automatically use the latest (i.e. MAX) timestamp and the previous one?
Another way to view this is that you want to find records that exist in timestamp #1 that do not exist in timestamp #2. The easiest way?
SELECT Timestamp
FROM records AS t1
WHERE NOT EXISTS (SELECT 1 FROM records AS t2 WHERE t2.id = t1.id AND t2.Timestamp = t1.Timestamp + 1)
Of course, I'm exploiting here the fact that your example timestamps are integers, when in reality I imagine they are genuine timestamps. But it turns out the integers work so well for this particular purpose, they'd be really handy to have around. So, perhaps we should make a numbered list of all available timestamps. The easiest way to get that?
CREATE TEMPORARY TABLE timestamp_map (
timestamp_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
timestamp_value DATETIME
);
INSERT INTO timestamp_map (timestamp_value) (SELECT DISTINCT timestamp FROM records ORDER BY timestamp);
(You could also maintain such a table permanently by use of triggers.)
It's a bit out there, but I've gotten similar techniques to work very efficiently in the past for data like what you describe, when lots of back-and-forth subqueries and NOT EXISTS proved too slow.
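As a small runnable illustration of the NOT EXISTS approach, the sketch below uses Python's built-in sqlite3 module with the sample data from the question. It is a variant rather than the exact query above: instead of assuming that the next snapshot is Timestamp + 1, it looks up the smallest later timestamp, and it skips the latest snapshot because nothing can have disappeared after it yet. The column name ts and the function name are assumptions made for this example.

```python
import sqlite3


def disappeared_items():
    """Return (ts, id) for every item present in snapshot ts but
    missing from the snapshot that immediately follows it."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE records (ts INTEGER, id TEXT);
        INSERT INTO records VALUES
            (1, 'A'), (1, 'B'),
            (2, 'A'), (2, 'B'), (2, 'C'),
            (3, 'A'), (3, 'D'),
            (4, 'D'), (4, 'E');
    """)
    rows = con.execute("""
        SELECT cur.ts, cur.id
          FROM records AS cur
         WHERE cur.ts < (SELECT MAX(ts) FROM records)
           AND NOT EXISTS (
                 SELECT 1
                   FROM records AS nxt
                  WHERE nxt.id = cur.id
                    AND nxt.ts = (SELECT MIN(r.ts)
                                    FROM records AS r
                                   WHERE r.ts > cur.ts))
         ORDER BY cur.ts, cur.id
    """).fetchall()
    con.close()
    return rows


for ts, item in disappeared_items():
    print("gone after snapshot", ts, ":", item)
```

On a table with millions of rows the inner MIN lookup is the expensive part, which is where a precomputed timestamp map like the one described above would help.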
Update:
See this entry in my blog for performance details:
MySQL: difference between sets
SELECT ts,
(
SELECT GROUP_CONCAT(id)
FROM mytable mi
WHERE mi.ts =
(
SELECT MAX(ts)
FROM mytable mp
WHERE mp.ts = mo.pts
)
AND NOT EXISTS
(
SELECT NULL
FROM mytable mn
WHERE mn.ts = mo.ts
AND mn.id = mi.id
)
)
FROM (
SELECT @r AS pts,
@r := ts AS ts
FROM (
SELECT @r := NULL
) vars,
(
SELECT DISTINCT ts
FROM mytable
) moo
) mo
To select only the last change:
SELECT ts,
(
SELECT GROUP_CONCAT(id)
FROM mytable mi
WHERE mi.ts =
(
SELECT MAX(ts)
FROM mytable mp
WHERE mp.ts < mo.ts
)
AND NOT EXISTS
(
SELECT NULL
FROM mytable mn
WHERE mn.ts = mo.ts
AND mn.id = mi.id
)
)
FROM (
SELECT MAX(ts) AS ts
FROM mytable
) mo
For this to be efficient, you need to have a composite index on mytable (timestamp, id) (in this order).

Trip time calculation in relational databases?

I had this question in mind and since I just discovered this site I decided to post it here.
Let's say I have a table with a timestamp and a state for a given "object" (generic meaning, not OOP object); is there an optimal way to calculate the time between a state and the next occurrence of another (or same) state (what I call a "trip") with a single SQL statement (inner SELECTs and UNIONs aren't counted)?
Ex: For the following, the trip time between Initial and Done would be 6 days, but between Initial and Review it would be 2 days.
2008-08-01 13:30:00 - Initial
2008-08-02 13:30:00 - Work
2008-08-03 13:30:00 - Review
2008-08-04 13:30:00 - Work
2008-08-05 13:30:00 - Review
2008-08-06 13:30:00 - Accepted
2008-08-07 13:30:00 - Done
No need to be generic; just say which DBMS your solution is specific to if it is not generic.
Here's an Oracle methodology using an analytic function.
with data as (
SELECT 1 trip_id, to_date('20080801 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Initial' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080802 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Work' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080803 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Review' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080804 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Work' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080805 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Review' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080806 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Accepted' step from dual UNION ALL
SELECT 1 trip_id, to_date('20080807 13:30:00','YYYYMMDD HH24:mi:ss') dt, 'Done' step from dual )
select trip_id,
step,
dt - lag(dt) over (partition by trip_id order by dt) trip_time
from data
/
1 Initial
1 Work 1
1 Review 1
1 Work 1
1 Review 1
1 Accepted 1
1 Done 1
These are very commonly used in situations where traditionally we might use a self-join.
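For readers unfamiliar with LAG, the per-row arithmetic it performs here can be sketched in a few lines of Python: sort by time, then subtract the previous row's timestamp from the current one. The function name step_durations is hypothetical, and the data is the example from the question.

```python
from datetime import datetime


def step_durations(events):
    """Return (state, time since the previous event) for each event in
    time order; the first event has no predecessor, so it gets None."""
    ordered = sorted(events)
    result = []
    previous_ts = None
    for ts, state in ordered:
        delta = None if previous_ts is None else ts - previous_ts
        result.append((state, delta))
        previous_ts = ts
    return result


events = [
    (datetime(2008, 8, 1, 13, 30), "Initial"),
    (datetime(2008, 8, 2, 13, 30), "Work"),
    (datetime(2008, 8, 3, 13, 30), "Review"),
    (datetime(2008, 8, 4, 13, 30), "Work"),
    (datetime(2008, 8, 5, 13, 30), "Review"),
    (datetime(2008, 8, 6, 13, 30), "Accepted"),
    (datetime(2008, 8, 7, 13, 30), "Done"),
]

for state, delta in step_durations(events):
    print(state, delta)
```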
PostgreSQL syntax :
DROP TABLE ObjectState;
CREATE TABLE ObjectState (
object_id integer not null,--foreign key
event_time timestamp NOT NULL,
state varchar(10) NOT NULL,
--Other fields
CONSTRAINT pk_ObjectState PRIMARY KEY (object_id,event_time)
);
For a given state, find the first following state of a given type:
select parent.object_id,parent.event_time,parent.state,min(child.event_time) as ch_event_time,min(child.event_time)-parent.event_time as step_time
from
ObjectState parent
join ObjectState child on (parent.object_id=child.object_id and parent.event_time<child.event_time)
where
--Starting state
parent.object_id=1 and parent.event_time=to_timestamp('01-Aug-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss')
--needed state
and child.state='Review'
group by parent.object_id,parent.event_time,parent.state;
This query is not the shortest possible, but it should be easy to understand and to use as part of other queries:
List events and their duration for given object
select parent.object_id,parent.event_time,parent.state,min(child.event_time) as ch_event_time,
CASE WHEN parent.state<>'Done' and min(child.event_time) is null THEN (select localtimestamp)-parent.event_time ELSE min(child.event_time)-parent.event_time END as step_time
from
ObjectState parent
left outer join ObjectState child on (parent.object_id=child.object_id and parent.event_time<child.event_time)
where parent.object_id=4
group by parent.object_id,parent.event_time,parent.state
order by parent.object_id,parent.event_time,parent.state;
List current states for objects that are not "done"
select states.object_id,states.event_time,states.state,(select localtimestamp)-states.event_time as step_time
from
(select parent.object_id,parent.event_time,parent.state,min(child.event_time) as ch_event_time,min(child.event_time)-parent.event_time as step_time
from
ObjectState parent
left outer join ObjectState child on (parent.object_id=child.object_id and parent.event_time<child.event_time)
group by parent.object_id,parent.event_time,parent.state) states
where
states.object_id not in (select object_id from ObjectState where state='Done')
and ch_event_time is null;
Test data
insert into ObjectState (object_id,event_time,state)
select 1,to_timestamp('01-Aug-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 1,to_timestamp('02-Aug-2008 13:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 1,to_timestamp('03-Aug-2008 13:50:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 1,to_timestamp('04-Aug-2008 14:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 1,to_timestamp('04-Aug-2008 16:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 1,to_timestamp('06-Aug-2008 18:00:00','dd-Mon-yyyy hh24:mi:ss'),'Accepted' union all
select 1,to_timestamp('07-Aug-2008 21:30:00','dd-Mon-yyyy hh24:mi:ss'),'Done';
insert into ObjectState (object_id,event_time,state)
select 2,to_timestamp('01-Aug-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 2,to_timestamp('02-Aug-2008 13:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 2,to_timestamp('07-Aug-2008 13:50:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 2,to_timestamp('14-Aug-2008 14:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 2,to_timestamp('15-Aug-2008 16:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 2,to_timestamp('16-Aug-2008 18:02:00','dd-Mon-yyyy hh24:mi:ss'),'Accepted' union all
select 2,to_timestamp('17-Aug-2008 22:10:00','dd-Mon-yyyy hh24:mi:ss'),'Done';
insert into ObjectState (object_id,event_time,state)
select 3,to_timestamp('12-Sep-2008 13:30:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 3,to_timestamp('13-Sep-2008 13:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 3,to_timestamp('14-Sep-2008 13:50:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 3,to_timestamp('15-Sep-2008 14:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 3,to_timestamp('16-Sep-2008 16:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review';
insert into ObjectState (object_id,event_time,state)
select 4,to_timestamp('21-Aug-2008 03:10:00','dd-Mon-yyyy hh24:mi:ss'),'Initial' union all
select 4,to_timestamp('22-Aug-2008 03:40:00','dd-Mon-yyyy hh24:mi:ss'),'Work' union all
select 4,to_timestamp('23-Aug-2008 03:20:00','dd-Mon-yyyy hh24:mi:ss'),'Review' union all
select 4,to_timestamp('24-Aug-2008 04:30:00','dd-Mon-yyyy hh24:mi:ss'),'Work';
I don't think you can get that answer with one SQL statement as you are trying to obtain one result from many records. The only way to achieve that in SQL is to get the timestamp field for two different records and calculate the difference (datediff). Therefore, UNIONS or Inner Joins are needed.
I'm not sure I understand the question exactly, but you can do something like the following which reads the table in one pass then uses a derived table to calculate it. SQL Server code:
CREATE TABLE #testing
(
eventdatetime datetime NOT NULL,
state varchar(10) NOT NULL
)
INSERT INTO #testing (
eventdatetime,
state
)
SELECT '20080801 13:30:00', 'Initial' UNION ALL
SELECT '20080802 13:30:00', 'Work' UNION ALL
SELECT '20080803 13:30:00', 'Review' UNION ALL
SELECT '20080804 13:30:00', 'Work' UNION ALL
SELECT '20080805 13:30:00', 'Review' UNION ALL
SELECT '20080806 13:30:00', 'Accepted' UNION ALL
SELECT '20080807 13:30:00', 'Done'
SELECT DATEDIFF(dd, Initial, Review)
FROM (
SELECT MIN(CASE WHEN state='Initial' THEN eventdatetime END) AS Initial,
MIN(CASE WHEN state='Review' THEN eventdatetime END) AS Review
FROM #testing
) AS A
DROP TABLE #testing
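A single pass over the rows can also be written as a loop. The following Python sketch is a variation on the idea rather than an exact equivalent: it takes the first occurrence of the start state and then the first occurrence of the end state after it, whereas the MIN(CASE ...) query takes the earliest occurrence of each state independently. The function name trip_time is hypothetical.

```python
from datetime import datetime


def trip_time(events, start_state, end_state):
    """Time from the first `start_state` event to the first `end_state`
    event that follows it, or None if no such pair exists."""
    start = None
    for ts, state in sorted(events):
        if start is None:
            if state == start_state:
                start = ts
        elif state == end_state:
            return ts - start
    return None


events = [
    (datetime(2008, 8, 1, 13, 30), "Initial"),
    (datetime(2008, 8, 2, 13, 30), "Work"),
    (datetime(2008, 8, 3, 13, 30), "Review"),
    (datetime(2008, 8, 4, 13, 30), "Work"),
    (datetime(2008, 8, 5, 13, 30), "Review"),
    (datetime(2008, 8, 6, 13, 30), "Accepted"),
    (datetime(2008, 8, 7, 13, 30), "Done"),
]

print(trip_time(events, "Initial", "Review"))
print(trip_time(events, "Initial", "Done"))
```

Because the start row itself is never tested as an end row, asking for the same state twice (for example "Work" to "Work") gives the time to its next occurrence.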
It is probably easier if you have a sequence number as well as the timestamp: in most RDBMSs you can create an auto-increment column and not change any of the INSERT statements. Then you join the table with a copy of itself to get the deltas:
select after.moment - before.moment, before.state, after.state
from object_states before, object_states after
where after.sequence = before.sequence + 1
(where the details of SQL syntax will vary according to which database system).
-- Oracle SQL
CREATE TABLE ObjectState
(
startdate date NOT NULL,
state varchar2(10) NOT NULL
);
insert into ObjectState
select to_date('01-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Initial' from dual union all
select to_date('02-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Work' from dual union all
select to_date('03-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Review' from dual union all
select to_date('04-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Work' from dual union all
select to_date('05-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Review' from dual union all
select to_date('06-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Accepted' from dual union all
select to_date('07-Aug-2008 13:30:00','dd-Mon-rrrr hh24:mi:ss'),'Done' from dual;
-- Days in between two states
select o2.startdate - o1.startdate as days
from ObjectState o1, ObjectState o2
where o1.state = 'Initial'
and o2.state = 'Review';
create table A (
At datetime not null,
State varchar(20) not null
)
go
insert into A(At,State)
select '2008-08-01T13:30:00','Initial' union all
select '2008-08-02T13:30:00','Work' union all
select '2008-08-03T13:30:00','Review' union all
select '2008-08-04T13:30:00','Work' union all
select '2008-08-05T13:30:00','Review' union all
select '2008-08-06T13:30:00','Accepted' union all
select '2008-08-07T13:30:00','Done'
go
--Find trip time from Initial to Review
select DATEDIFF(day,t1.At,t2.At)
from
A t1
inner join
A t2
on
t1.State = 'Initial' and
t2.State = 'Review' and
t1.At < t2.At
left join
A t3
on
t3.State = 'Initial' and
t3.At > t1.At and
t3.At < t2.At
left join
A t4
on
t4.State = 'Review' and
t4.At < t2.At and
t4.At > t1.At
where
t3.At is null and
t4.At is null
Didn't say whether joins were allowed or not. Joins to t3 and t4 (and their comparisons) let you say whether you want the earliest or latest occurrence of the start and end states (in this case, I'm asking for latest "Initial" and earliest "Review")
In real code, my start and end states would be parameters
Edit: Oops, need to include "t3.At < t2.At" and "t4.At > t1.At", to fix some odd sequences of States (e.g. If we removed the second "Review" and then queried from "Work" to "Review", the original query will fail)
I think your steps (each record of your trip can be seen as a step) can be grouped together as part of the same activity. You can then group your data on that activity, for example:
SELECT Min(Tbl_Step.dateTimeStep) as tripBegin,
Max(Tbl_Step.dateTimeStep) as tripEnd
FROM
Tbl_Step
WHERE
id_Activity = 'AAAAAAA'
Using this principle, you can then calculate other aggregates, such as the number of steps in the activity. But you will not find an SQL way to calculate values like the gap between 2 steps, as such data does not belong to either the first or the second step. Some reporting tools use what they call "running sums" to calculate such intermediate data. Depending on your objectives, this might be a solution for you.
I tried to do this in MySQL. You would need to use a variable since there is no rank function in MySQL, so it would go like this:
set @trip1 = 0; set @trip2 = 0;
SELECT trip1.`date` as startdate, datediff(trip2.`date`, trip1.`date`) length_of_trip
FROM
(SELECT @trip1 := @trip1 + 1 as rank1, `date` from trip where state='Initial') as trip1
INNER JOIN
(SELECT @trip2 := @trip2 + 1 as rank2, `date` from trip where state='Done') as trip2
ON rank1 = rank2;
I am assuming that you want to calculate the time between 'Initial' and 'Done' states.
+---------------------+----------------+
| startdate | length_of_trip |
+---------------------+----------------+
| 2008-08-01 13:30:00 | 6 |
+---------------------+----------------+
Ok, this is a bit beyond geeky, but I built a web application to track my wife's contractions just before we had a baby so that I could see from work when it was getting close to time to go to the hospital. Anyway, I built this basic thing fairly easily as two views.
create table contractions (time_date timestamp primary key);
create view contraction_time as
SELECT a.time_date, max(b.prev_time) AS prev_time
FROM contractions a, ( SELECT contractions.time_date AS prev_time
FROM contractions) b
WHERE b.prev_time < a.time_date
GROUP BY a.time_date;
create view time_between as
SELECT contraction_time.time_date, contraction_time.prev_time, contraction_time.time_date - contraction_time.prev_time
FROM contraction_time;
This could be done as a subselect obviously as well, but I used the intermediate views for other things as well, and so this worked out well.