Time difference between rows based on condition - SQL

I'm not really an expert with SQL and I'm having problems figuring out how to do this one.
I've got a table like this one:
+----+--------------+---------------------+------+
| ID | Message      | TimeStamp           | User |
+----+--------------+---------------------+------+
| 1  | Hello        | 2022-08-01 10:00:00 | A    |
| 1  | How are you? | 2022-08-01 10:00:05 | A    |
| 1  | Hello there  | 2022-08-01 10:00:10 | B    |
| 1  | I am okay    | 2022-08-01 10:00:12 | B    |
| 1  | Good to know | 2022-08-01 10:00:15 | A    |
| 1  | Bye          | 2022-08-01 10:00:25 | B    |
| 2  | Hello        | 2022-08-01 10:02:50 | A    |
| 2  | Hi           | 2022-08-01 10:03:50 | B    |
+----+--------------+---------------------+------+
I need to calculate the time difference each time there is a response from user B after a message from user A.
The expected result would be like this (difference in seconds):
+----+------------+
| ID | Difference |
+----+------------+
| 1  | 5          |
| 1  | 10         |
| 2  | 60         |
+----+------------+
I've been trying to use the LEAD function to obtain the next desired timestamp, but I'm not getting the expected result.
Any tips or advice?
Thanks

Even if it has already been answered, it's a nice use case for Vertica's MATCH() clause: looking for a pattern consisting of sender = 'A' followed by sender = 'B'.
You get a pattern_id, and you can then group by the other columns plus that pattern_id to get the max and min timestamps.
Also note that I renamed both "user" and "timestamp", as they are reserved words...
WITH
-- your input, don't use in final query
indata(ID,Message,ts,Usr) AS (
SELECT 1,'Hello' ,TIMESTAMP '2022-08-01 10:00:00','A'
UNION ALL SELECT 1,'How are you?',TIMESTAMP '2022-08-01 10:00:05','A'
UNION ALL SELECT 1,'Hello there' ,TIMESTAMP '2022-08-01 10:00:10','B'
UNION ALL SELECT 1,'I am okay' ,TIMESTAMP '2022-08-01 10:00:12','B'
UNION ALL SELECT 1,'Good to know',TIMESTAMP '2022-08-01 10:00:15','A'
UNION ALL SELECT 1,'Bye' ,TIMESTAMP '2022-08-01 10:00:25','B'
UNION ALL SELECT 2,'Hello' ,TIMESTAMP '2022-08-01 10:02:50','A'
UNION ALL SELECT 2,'Hi' ,TIMESTAMP '2022-08-01 10:03:50','B'
)
-- end of input, real query starts here , replace following comma with "WITH"
,
w_match_clause AS (
  SELECT
    *
  , event_name()
  , pattern_id()
  , match_id()
  FROM indata
  MATCH (
    PARTITION BY id ORDER BY ts
    DEFINE
      sentbya AS usr='A'
    , sentbyb AS usr='B'
    PATTERN
      p AS (sentbya sentbyb)
  )
-- ctl SELECT * FROM w_match_clause;
-- ctl ID | Message | ts | Usr | event_name | pattern_id | match_id
-- ctl ----+--------------+---------------------+-----+------------+------------+----------
-- ctl 1 | How are you? | 2022-08-01 10:00:05 | A | sentbya | 1 | 1
-- ctl 1 | Hello there | 2022-08-01 10:00:10 | B | sentbyb | 1 | 2
-- ctl 1 | Good to know | 2022-08-01 10:00:15 | A | sentbya | 2 | 1
-- ctl 1 | Bye | 2022-08-01 10:00:25 | B | sentbyb | 2 | 2
-- ctl 2 | Hello | 2022-08-01 10:02:50 | A | sentbya | 1 | 1
-- ctl 2 | Hi | 2022-08-01 10:03:50 | B | sentbyb | 1 | 2
)
SELECT
  id
, MAX(ts) - MIN(ts) AS difference
FROM w_match_clause
GROUP BY
  id
, pattern_id
ORDER BY
  id;
-- out id | difference
-- out ----+------------
-- out 1 | 00:00:05
-- out 1 | 00:00:10
-- out 2 | 00:01

With MySQL 8.0:
WITH cte AS (
  SELECT id, user, timestamp,
         -- partition by id so that separate conversations are never compared
         LEAD(user) OVER (PARTITION BY id ORDER BY timestamp) AS to_user,
         TIME_TO_SEC(TIMEDIFF(LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp), timestamp)) AS time_diff
  FROM msg_tab
)
SELECT id, time_diff
FROM cte
-- note: to_user IN ('B', NULL) never matches NULL, so the effective (and intended) filter is to_user = 'B'
WHERE user = 'A' AND to_user = 'B'

Related

SQL query to find the visitor together with the date time

My visitor log table has id, visitor, Visittime and Department_id fields.
id | visitor | Visittime           | Department_id
---+---------+---------------------+--------------
 1 |       1 | 2019-05-07 13:53:50 |             1
 2 |       2 | 2019-05-07 13:56:54 |             1
 3 |       1 | 2019-05-07 14:54:10 |             3
 4 |       2 | 2019-05-08 13:54:49 |             1
 5 |       1 | 2019-05-08 13:58:15 |             1
 6 |       2 | 2019-05-08 18:54:30 |             2
 7 |       1 | 2019-05-08 18:54:37 |             2
And I already have the following index:
CREATE INDEX Idx_VisitorLog_Visitor_VisitTime_Includes ON VisitorLog
(Visitor, VisitTime) INCLUDE (DepartmentId, ID)
From the above table, 4 filters are passed from the user interface: visitor 1, visitor 2, and the visiting start time and end time.
I need to find the departments where visitor 1 and visitor 2 were both present with a VisitTime difference within 5 minutes; those are the records that need to be returned.
The output should be:
id | visitor | Visittime           | Department_id
---+---------+---------------------+--------------
 1 |       1 | 2019-05-07 13:53:50 |             1
 2 |       2 | 2019-05-07 13:56:54 |             1
 4 |       2 | 2019-05-08 13:54:49 |             1
 5 |       1 | 2019-05-08 13:58:15 |             1
For that I used the following query:
;with CTE1 AS(
  Select id,visitor,Visittime,department_id from visitorlog where visitor=1
)
,CTE2 AS(
  Select id,visitor,Visittime,department_id from visitorlog where visitor=2
)
select * from CTE2 V2
Inner join CTE1 V1 on V2.department_id=V1.department_id and DATEDIFF(minute,V2.Visittime,V1.Visittime) between -5 and 5
The above query takes too much time to respond, because my table has almost 20 million records.
Could anyone suggest the correct way to meet my requirement?
Thanks in advance
This is a completely revised answer, based upon your additional information above.
After reviewing the data file above and the results you desire, this seems like the cleanest way to provide your results. First, we need a different index:
create index idx_POC_visitorlog on visitorlog
(visitor, Department_id, Visittime) include(id);
With this index, we can limit the queries to only the two passed in IDs. To simulate that, I created variables to hold their values. This query returns the data you are looking for.
DECLARE @Visitor1 int = 1,
        @Visitor2 int = 2
;with t as (
select Department_id,
dateadd(minute, -5, visittime) as EarlyTime,
dateadd(minute, 5, Visittime) as LateTime,
id
from visitorlog
where visitor = @Visitor1
),
v as (
select v.id,
t.id as tid
from visitorlog v
INNER JOIN t
ON v.visitor = @Visitor2
AND v.Department_id = t.Department_id
and v.Visittime BETWEEN t.EarlyTime and t.LateTime
)
SELECT *
FROM visitorlog vl
WHERE ID IN (
SELECT v.id
FROM v
UNION
SELECT v.tid
FROM v
)
ORDER BY visittime;
If your version of SQL Server supports the LAG and LEAD functions, try rewriting the query as follows:
with t as (
select
*,
dateadd(minute, 5,
lag(Visittime) over(partition by Department_id order by Visittime)) lag_visit_time,
dateadd(minute, -5,
lead(Visittime) over(partition by Department_id order by Visittime)) lead_visit_time
from visitorlog
where visitor in(1, 2)
)
select
id, visitor, visittime, department_id
from t
where lag_visit_time >= Visittime or lead_visit_time <= Visittime;
The index created above (idx_POC_visitorlog) is a POC index: it covers the Partitioning, Ordering, and Covering needs of the query.
Results:
+----+---------+----------------------+---------------+
| id | visitor | visittime | department_id |
+----+---------+----------------------+---------------+
| 1 | 1 | 2019-05-07T13:53:50Z | 1 |
| 2 | 2 | 2019-05-07T13:56:54Z | 1 |
| 4 | 2 | 2019-05-08T13:54:49Z | 1 |
| 5 | 1 | 2019-05-08T13:58:15Z | 1 |
| 6 | 2 | 2019-05-08T18:54:30Z | 2 |
| 7 | 1 | 2019-05-08T18:54:37Z | 2 |
+----+---------+----------------------+---------------+
Demo.

Possible to use a column name in a UDF in SQL?

I have a query in which a series of steps is repeated constantly over different columns, for example:
SELECT DISTINCT
MAX (
CASE
WHEN table_2."GRP1_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP1_MINIMUM_DATE",
MAX (
CASE
WHEN table_2."GRP2_MINIMUM_DATE" <= cohort."ANCHOR_DATE" THEN 1
ELSE 0
END)
OVER (PARTITION BY cohort."USER_ID")
AS "GRP2_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
I was considering writing a function to accomplish this, as doing so would save space in my query. I have been reading a bit about UDFs in SQL but don't yet understand whether it is possible to pass a column name in as a parameter (i.e. simply switch out "GRP1_MINIMUM_DATE" for "GRP2_MINIMUM_DATE" etc.). What I would like is a query which looks like this:
SELECT DISTINCT
FUNCTION(table_2."GRP1_MINIMUM_DATE") AS "GRP1_MINIMUM_DATE",
FUNCTION(table_2."GRP2_MINIMUM_DATE") AS "GRP2_MINIMUM_DATE",
FUNCTION(table_2."GRP3_MINIMUM_DATE") AS "GRP3_MINIMUM_DATE",
FUNCTION(table_2."GRP4_MINIMUM_DATE") AS "GRP4_MINIMUM_DATE"
FROM INPUT_COHORT cohort
LEFT JOIN INVOLVE_EVER table_2 ON cohort."USER_ID" = table_2."USER_ID"
Can anyone tell me if this is possible/point me to some resource that might help me out here?
Thanks!
There is no direct way to do that, as @Tejash already stated, but it looks like your database model is not ideal: it would be better to have a table with USER_ID and GRP_ID as keys and MINIMUM_DATE as a separate field.
Without changing the table structure, you can use an UNPIVOT query to mimic this design:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4))
Result:
| USER_ID | GRP_ID | MINIMUM_DATE |
|---------|--------|--------------|
| 1 | 1 | 09/09/19 |
| 1 | 2 | 09/09/19 |
| 1 | 3 | 09/09/19 |
| 1 | 4 | 09/09/19 |
| 2 | 1 | 09/08/19 |
| 2 | 2 | 09/07/19 |
| 2 | 3 | 09/06/19 |
| 2 | 4 | 09/05/19 |
With this you can write your query without further code duplication, and if needed you can use PIVOT syntax to get one line per USER_ID.
The final query could then look like this:
WITH INVOLVE_EVER(USER_ID, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE)
AS (SELECT 1, SYSDATE, SYSDATE, SYSDATE, SYSDATE FROM dual UNION ALL
SELECT 2, SYSDATE-1, SYSDATE-2, SYSDATE-3, SYSDATE-4 FROM dual)
, INPUT_COHORT(USER_ID, ANCHOR_DATE)
AS (SELECT 1, SYSDATE-1 FROM dual UNION ALL
SELECT 2, SYSDATE-2 FROM dual UNION ALL
SELECT 3, SYSDATE-3 FROM dual)
-- Above is sampledata query starts from here:
, unpiv AS (SELECT *
FROM INVOLVE_EVER
unpivot ( minimum_date FOR grp_id IN ( GRP1_MINIMUM_DATE AS 1, GRP2_MINIMUM_DATE AS 2, GRP3_MINIMUM_DATE AS 3, GRP4_MINIMUM_DATE AS 4)))
SELECT qcsj_c000000001000000 user_id, GRP1_MINIMUM_DATE, GRP2_MINIMUM_DATE, GRP3_MINIMUM_DATE, GRP4_MINIMUM_DATE
FROM INPUT_COHORT cohort
LEFT JOIN unpiv table_2
ON cohort.USER_ID = table_2.USER_ID
pivot (MAX(CASE WHEN minimum_date <= cohort."ANCHOR_DATE" THEN 1 ELSE 0 END) AS MINIMUM_DATE
FOR grp_id IN (1 AS GRP1,2 AS GRP2,3 AS GRP3,4 AS GRP4))
Result:
| USER_ID | GRP1_MINIMUM_DATE | GRP2_MINIMUM_DATE | GRP3_MINIMUM_DATE | GRP4_MINIMUM_DATE |
|---------|-------------------|-------------------|-------------------|-------------------|
| 3 | | | | |
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 1 | 1 | 1 |
This way you only have to write your calculation logic once (see line starting with pivot).

How to find next row in ordered table that matches a condition, given an initial match in SQL

I'm querying a table that contains state transitions for a state engine. The table is set up so that it has the previous_state, current_state, and timestamp of the transition, grouped by unique ids.
My goal is to find a sequence of target intervals, defined by the timestamp of the initial state transition (e.g. the timestamp when we shift from 1->2) and the timestamp of the next state transition that matches a specific condition (e.g. the next timestamp where current_state=3 OR current_state=4).
state_transition_table
+------------+---------------+-----------+----+
| prev_state | current_state | timestamp | id |
+------------+---------------+-----------+----+
| 1 | 2 | 4.5 | 1 |
| 2 | 3 | 5.2 | 1 |
| 3 | 1 | 5.4 | 1 |
| 1 | 2 | 10.3 | 1 |
| 2 | 5 | 10.4 | 1 |
| 5 | 4 | 10.8 | 1 |
| 4 | 1 | 11.0 | 1 |
| 1 | 2 | 12.3 | 1 |
| 2 | 3 | 13.5 | 1 |
| 3 | 1 | 13.6 | 1 |
+------------+---------------+-----------+----+
Within a given id, we want to find all intervals that start with 1->2 (easy enough query), and end with either state 3 or 4.
1->2->anything->3 or 4
An example output table given the input above would have the three states and the timestamps for when we transition between the states:
target output
+------------+---------------+------------+-----------+-----------+
| prev_state | current_state | end_state | curr_time | end_time |
+------------+---------------+------------+-----------+-----------+
| 1 | 2 | 3 | 4.5 | 5.2 |
| 1 | 2 | 4 | 10.3 | 10.8 |
| 1 | 2 | 3 | 12.3 | 13.5 |
+------------+---------------+------------+-----------+-----------+
The best query I could come up with uses window functions in a sub-table and then builds the new columns from that table. But this solution only finds the next row following the initial transition, and doesn't allow other states to occur between then and when our target state arrives.
WITH state_transitions as (
  SELECT
    id,
    prev_state, current_state,
    LEAD(current_state) OVER (PARTITION BY id ORDER BY timestamp) AS end_state,
    timestamp as curr_time,
    LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp) AS end_time
  FROM
    state_transition_table
)
SELECT
  prev_state,
  current_state,
  end_state,
  curr_time,
  end_time
FROM state_transitions
WHERE prev_state=1 and current_state=2
ORDER BY curr_time
This query would incorrectly give the second output row with end_state = 5, which is not what I am looking for.
How can I search the table for the next row that matches my target condition, e.g. end_state = 3 OR end_state = 4?
This requires a recursive query that checks each row against its siblings, and it accounts for paths of more than three rows. I assumed Oracle for the seed data; you may need to adapt the syntax to your database engine. I tried to document the query as well as I thought was needed.
WITH /*SEED DATA*/
state_transition_table(prev_state, current_state, time_stamp, id) as (
SELECT 1 , 2 , 4.5 , 1 --FROM DUAL
UNION ALL SELECT 2 , 3 , 5.2 , 1 --FROM DUAL
UNION ALL SELECT 3 , 1 , 5.4 , 1 --FROM DUAL
UNION ALL SELECT 1 , 2 , 10.3 , 1 --FROM DUAL
UNION ALL SELECT 2 , 5 , 10.4 , 1 --FROM DUAL
UNION ALL SELECT 5 , 4 , 10.8 , 1 --FROM DUAL
UNION ALL SELECT 4 , 1 , 11.0 , 1 --FROM DUAL
UNION ALL SELECT 1 , 2 , 12.3 , 1 --FROM DUAL
UNION ALL SELECT 2 , 3 , 13.5 , 1 --FROM DUAL
UNION ALL SELECT 3 , 1 , 13.6 , 1 --FROM DUAL
)
/*THE END STATES YOU ARE LOOKING FOR*/
, end_states (a_state) as (
select 3 --FROM DUAL
union all select 4 --FROM DUAL
)
/*ORDER THE STEPS TO USE THE order_id COLUMN TO EVALUATE THE NEXT NODE*/
, ordered_states as (
SELECT row_number() OVER (ORDER BY time_stamp) order_id
, prev_state
, current_state
, id
, time_stamp
FROM state_transition_table
)
/*RECURSIVE QUERY WITH ANSI SYNTAX*/
, recursive (
root_order_id
, order_id
, time_stamp
, prev_state
, current_state
--, id
, steps
)
as (
SELECT order_id root_order_id /*THE order_id OF EACH ROOT ROW*/
, order_id
, time_stamp
, prev_state
, current_state
, CAST(order_id as char(100)) as steps /*INITIAL VALIDATION PATH*/
FROM ordered_states
WHERE prev_state = 1 AND current_state = 2 /*INITIAL CONDITION*/
UNION ALL
SELECT prev.root_order_id
, this.order_id
, this.time_stamp
, prev.prev_state
, this.current_state
, CAST(CONCAT(CONCAT(RTRIM(LTRIM(prev.steps)), ', '), RTRIM(LTRIM(CAST(this.order_id as char(3))))) as char(100)) as steps
FROM recursive prev /*ANSI PSEUDO TABLE*/
, ordered_states this /*THE SIBLING ROW TO CHECK*/
WHERE prev.order_id = this.order_id - 1 /*ROW TO PREVIOUS ROW JOIN*/
and prev.current_state not in (select a_state from end_states) /*THE PREVIOUS ROW STATE IS NOT AN END STATE */
)
select init_state.prev_state
, init_state.current_state as mid_state /*this name is better, I think*/
, end_state.current_state
, init_state.time_stamp as initial_time /*initial_time is better, I think*/
, end_state.time_stamp as end_time /*end_time is better, I think*/
, recursive.steps as validation_path_by_order_id
from recursive
inner join ordered_states init_state
on init_state.order_id = recursive.root_order_id
inner join ordered_states end_state
on end_state.order_id = recursive.order_id
where recursive.current_state in (select a_state from end_states)
One final note: the resulting columns only account for three rows (prev_state, mid_state and current_state). As I said above, there are cases where you can have a path from (1) to (2) to (3 or 4) with more than three rows, let's say 1 to 2 to 5 to 2 to 3, so the mid_state is really just one state in the middle.
Final-final note: your desired results table was wrong, but you corrected it. 👍

Vertica dynamic pivot/transform

I have a table in Vertica:
id Timestamp Mask1 Mask2
-------------------------------------------
1 11:30 50 100
1 11:35 52 101
2 12:00 53 102
3 09:00 50 100
3 22:10 52 105
. . . .
. . . .
Which I want to transform into :
id rows 09:00 11:30 11:35 12:00 22:10 .......
--------------------------------------------------------------
1 Mask1 Null 50 52 Null Null .......
Mask2 Null 100 101 Null Null .......
2 Mask1 Null Null Null 53 Null .......
Mask2 Null Null Null 102 Null .......
3 Mask1 50 Null Null Null 52 .......
Mask2 100 Null Null Null 105 .......
The dots (...) indicate that I have many records.
Timestamp is for a whole day and is of format hours:minutes:seconds starting from 00:00:00 to 24:00:00 for a day (I have just used hours:minutes for the question).
I have defined just two extra columns Mask1 and Mask2. I have about 200 Mask columns to work with.
I have shown 5 records, but in reality I have about a million records.
What I have tried so far:
Dumping the records for each id into a CSV file.
Applying a transpose in Python pandas.
Joining the transposed tables.
A possible generic solution may be pivoting in Vertica (or a UDTF), but I am fairly new to this database.
I have been struggling with this logic for a couple of days. Can anyone please help me? Thanks a lot.
Below is the solution as I would code it for just the time values that you have in your data examples.
If you really want to be able to display all 86400 of '00:00:00' through '23:59:59', though, you won't be able to. Vertica's maximum number of columns is 1600.
You could, however, play with the Vertica function TIME_SLICE(timestamp::TIMESTAMP,1,'MINUTE')::TIME
(TIME_SLICE takes a timestamp as input and returns a timestamp, so you have to cast (::) back and forth), to reduce the number of rows to 1440 ...
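For example (a sketch only, reusing the exact TIME_SLICE expression above and the input relation defined in the script below), bucketing the timestamps to whole minutes before pivoting could look like:
-- sketch: collapse timestamps to one-minute slots before pivoting
SELECT id,
       TIME_SLICE(timestamp::TIMESTAMP, 1, 'MINUTE')::TIME AS ts_minute,
       Mask1, Mask2
FROM input;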
In any case, I would start with SELECT DISTINCT timestamp FROM input ORDER BY 1; and then, in the final query, generate one line per timestamp actually found in your data (hoping there won't be more than 1598), like these:
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
SQL in general has no variable number of output columns from any given query. If the number of final columns varies depending on the data, you will have to generate your final query from the data, and then run it.
Welcome to SQL and relational databases ..
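As a rough sketch of that generation step (my own addition, assuming the input relation and the val / timestamp column names used in the script below), you can build the per-timestamp lines with plain string concatenation and paste the output into the final SELECT list:
-- sketch only: emits one pivot line per distinct timestamp found in the data
SELECT ', SUM(CASE timestamp WHEN ''' || timestamp::VARCHAR
    || ''' THEN val END) AS "' || timestamp::VARCHAR || '"' AS pivot_line
FROM (SELECT DISTINCT timestamp FROM input) d
ORDER BY timestamp;
Each generated pivot_line is just text; append the lines after the id and rows columns of the query below, or have a small script assemble and run the full statement.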
Here's the complete script for your data. I pivot vertically first, along the "Mask-n" column names, and then I re-pivot horizontally, along the timestamps.
\pset null Null
-- ^ this is a vsql command to display nulls with the "Null" string
WITH
-- your input, not in final query
input(id,Timestamp,Mask1,Mask2) AS (
SELECT 1 , TIME '11:30' , 50 , 100
UNION ALL SELECT 1 , TIME '11:35' , 52 , 101
UNION ALL SELECT 2 , TIME '12:00' , 53 , 102
UNION ALL SELECT 3 , TIME '09:00' , 50 , 100
UNION ALL SELECT 3 , TIME '22:10' , 52 , 105
)
,
-- real WITH clause starts here
-- need an index for your 200 masks
i(i) AS (
SELECT MICROSECOND(ts) FROM (
SELECT TIMESTAMPADD(MICROSECOND, 1,TIMESTAMP '2000-01-01') AS tm
UNION ALL SELECT TIMESTAMPADD(MICROSECOND,200,TIMESTAMP '2000-01-01') AS tm
)x
TIMESERIES ts AS '1 MICROSECOND' OVER(ORDER BY tm)
)
,
-- verticalised masks
vertical AS (
SELECT
id
, i
, CASE i
WHEN 1 THEN 'Mask001'
WHEN 2 THEN 'Mask002'
WHEN 200 THEN 'Mask200'
END AS rows
, timestamp
, CASE i
WHEN 1 THEN Mask1
WHEN 2 THEN Mask2
WHEN 200 THEN 0 -- no mask200 present
END AS val
FROM input CROSS JOIN i
WHERE i <=2 -- only 2 masks present currently
)
-- test the vertical CTE ...
-- SELECT * FROM vertical order by id,rows,timestamp;
-- out id | i | rows | timestamp | val
-- out ----+---+---------+-----------+-----
-- out 1 | 1 | Mask001 | 11:30:00 | 50
-- out 1 | 1 | Mask001 | 11:35:00 | 52
-- out 1 | 2 | Mask002 | 11:30:00 | 100
-- out 1 | 2 | Mask002 | 11:35:00 | 101
-- out 2 | 1 | Mask001 | 12:00:00 | 53
-- out 2 | 2 | Mask002 | 12:00:00 | 102
-- out 3 | 1 | Mask001 | 09:00:00 | 50
-- out 3 | 1 | Mask001 | 22:10:00 | 52
-- out 3 | 2 | Mask002 | 09:00:00 | 100
-- out 3 | 2 | Mask002 | 22:10:00 | 105
SELECT
id
, rows
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
FROM vertical
GROUP BY
id
, rows
ORDER BY
id
, rows
;
-- out Null display is "Null".
-- out id | rows | 09:00 | 11:30 | 11:35 | 12:00 | 22:10
-- out ----+---------+-------+-------+-------+-------+-------
-- out 1 | Mask001 | Null | 50 | 52 | Null | Null
-- out 1 | Mask002 | Null | 100 | 101 | Null | Null
-- out 2 | Mask001 | Null | Null | Null | 53 | Null
-- out 2 | Mask002 | Null | Null | Null | 102 | Null
-- out 3 | Mask001 | 50 | Null | Null | Null | 52
-- out 3 | Mask002 | 100 | Null | Null | Null | 105
-- out (6 rows)
-- out
-- out Time: First fetch (6 rows): 28.143 ms. All rows formatted: 28.205 ms
You can use union all to unpivot the data and then conditional aggregation:
select id, which,
max(case when timestamp >= '09:00' and timestamp < '09:30' then mask end) as "09:00",
max(case when timestamp >= '09:30' and timestamp < '10:00' then mask end) as "09:30",
max(case when timestamp >= '10:00' and timestamp < '10:30' then mask end) as "10:00",
. . .
from ((select id, timestamp,
'Mask1' as which, Mask1 as mask
from t
) union all
(select id, timestamp, 'Mask2' as which, Mask2 as mask
from t
)
) t
group by t.id, t.which;
Note: This includes the id on each row. I strongly recommend doing that, but you could use:
select (case when which = 'Mask1' then id end) as id
If you really wanted to.

Teradata sql query from grouping records using Intervals

In Teradata SQL, how can I assign the same row number to a group of records created within an 8-second time interval?
Example:
Customerid Customername Itembought dateandtime
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01
100 ALex Circketball 2017-02-10 10:10:06
100 ALex Baseball 2017-02-10 10:10:08
100 ALex volleyball 2017-02-10 10:11:01
100 ALex footbball 2017-02-10 10:11:05
100 ALex ringball 2017-02-10 10:11:08
100 Alex football 2017-02-10 10:12:10
My expected result should have an additional Row_number column that assigns the same number to all purchases made by the customer within 8 seconds. Refer to the expected result below:
Customerid Customername Itembought dateandtime Row_number
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01 1
100 ALex Circketball 2017-02-10 10:10:06 1
100 ALex Baseball 2017-02-10 10:10:08 1
100 ALex volleyball 2017-02-10 10:11:01 2
100 ALex footbball 2017-02-10 10:11:05 2
100 ALex ringball 2017-02-10 10:11:08 2
100 Alex football 2017-02-10 10:12:10 3
This is one way to do it with a recursive CTE. Reset the running total of the difference from the previous row's timestamp to 0 when it gets > 8, and start a new group.
WITH ROWNUMS AS
(SELECT T.*
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TM) AS RNUM
/*Replace DATEDIFF with Teradata specific function*/
,DATEDIFF(SECOND,COALESCE(MIN(TM) OVER(PARTITION BY ID
ORDER BY TM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), TM),TM) AS DIFF
FROM T --replace this with your tablename and add columns as required
)
,RECURSIVE CTE(ID,TM,DIFF,SUM_DIFF,RNUM,GRP) AS
(SELECT ID,
TM,
DIFF,
DIFF,
RNUM,
CAST(1 AS int)
FROM ROWNUMS
WHERE RNUM=1
UNION ALL
SELECT T.ID,
T.TM,
T.DIFF,
CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN 0 ELSE C.SUM_DIFF+T.DIFF END,
T.RNUM,
CAST(CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN T.RNUM ELSE C.GRP END AS int)
FROM CTE C
JOIN ROWNUMS T ON T.RNUM=C.RNUM+1 AND T.ID=C.ID
)
SELECT ID,
TM,
DENSE_RANK() OVER(PARTITION BY ID ORDER BY GRP) AS row_num
FROM CTE
Demo in SQL Server
I am going to interpret the problem differently from vkp. Any row within 8 seconds of another row should be in the same group. Such values can chain together, so the overall span can be more than 8 seconds.
The advantage of this method is that recursive CTEs are not needed, so it should be faster. (Of course, this is not an advantage if the OP does not agree with the definition.)
The basic idea is to look at the previous date/time value; if it is more than 8 seconds away, then add a flag. The cumulative sum of the flag is the row number you are looking for.
select t.*,
sum(case when prev_dt >= dateandtime - interval '8' second
then 0 else 1
end) over (partition by customerid order by dateandtime
) as row_number
from (select t.*,
max(dateandtime) over (partition by customerid order by dateandtime rows between 1 preceding and 1 preceding) as prev_dt
from t
) t;
Using Teradata's PERIOD data type and the awesome td_normalize_overlap_meet:
Consider table test32:
SELECT * FROM test32
+----+----+------------------------+
| f1 | f2 | f3 |
+----+----+------------------------+
| 1 | 2 | 2017-05-11 03:59:00 PM |
| 1 | 3 | 2017-05-11 03:59:01 PM |
| 1 | 4 | 2017-05-11 03:58:58 PM |
| 1 | 5 | 2017-05-11 03:59:26 PM |
| 1 | 2 | 2017-05-11 03:59:28 PM |
| 1 | 2 | 2017-05-11 03:59:46 PM |
+----+----+------------------------+
The following will group your records:
WITH
normalizedCTE AS
(
SELECT *
FROM TABLE
(
td_normalize_overlap_meet(NEW VARIANT_TYPE(periodCTE.f1), periodCTE.fper)
RETURNS (f1 integer, fper PERIOD(TIMESTAMP(0)), recordCount integer)
HASH BY f1
LOCAL ORDER BY f1, fper
) as output(f1, fper, recordcount)
),
periodCTE AS
(
SELECT f1, f2, f3, PERIOD(f3, f3 + INTERVAL '9' SECOND) as fper FROM test32
)
SELECT t2.f1, t2.f2, t2.f3, t1.fper, DENSE_RANK() OVER (PARTITION BY t2.f1 ORDER BY t1.fper) as fgroup
FROM normalizedCTE t1
INNER JOIN periodCTE t2 ON
t1.fper P_INTERSECT t2.fper IS NOT NULL
Results:
+----+----+------------------------+-------------+
| f1 | f2 | f3 | fgroup |
+----+----+------------------------+-------------+
| 1 | 2 | 2017-05-11 03:59:00 PM | 1 |
| 1 | 3 | 2017-05-11 03:59:01 PM | 1 |
| 1 | 4 | 2017-05-11 03:58:58 PM | 1 |
| 1 | 5 | 2017-05-11 03:59:26 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:28 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:46 PM | 3 |
+----+----+------------------------+-------------+
A Period in Teradata is a special data type that holds a date or datetime range. The first parameter is the start of the range and the second is the end of the range (up to, but not including, which is why it's "+ 9 seconds"). The result is a Period covering each record's timestamp plus the following 8 seconds, so records within 8 seconds of each other "intersect".
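As a small illustration of that intersection idea (my own sketch with made-up literal values, not part of the original answer), two periods built the same way as fper above that do not overlap yield NULL from P_INTERSECT, so their rows would land in different groups:
-- non-overlapping periods: P_INTERSECT returns NULL
SELECT PERIOD(TIMESTAMP '2017-05-11 15:59:00', TIMESTAMP '2017-05-11 15:59:09')
       P_INTERSECT
       PERIOD(TIMESTAMP '2017-05-11 15:59:26', TIMESTAMP '2017-05-11 15:59:35') AS no_overlap;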
We then use td_normalize_overlap_meet to merge records that intersect, sharing the f1 field's value as the key. In your case that would be customerid. The result is three records for this one customer since we have three groups that "overlap" or "meet" each other's time periods.
We then join the td_normalize_overlap_meet output with the output from when we determined the periods. We use the P_INTERSECT function to see which periods from the normalized CTE INTERSECT with the periods from the initial Period CTE. From the result of that P_INTERSECT join we grab the values we need from each CTE.
Lastly, Dense_Rank() gives us a rank based on the normalized period for each group.