Vertica dynamic pivot/transform - sql

I have a table in Vertica:
id   Timestamp   Mask1   Mask2
-------------------------------------------
1    11:30       50      100
1    11:35       52      101
2    12:00       53      102
3    09:00       50      100
3    22:10       52      105
.    .           .       .
.    .           .       .
Which I want to transform into:
id   rows    09:00   11:30   11:35   12:00   22:10   .......
--------------------------------------------------------------
1    Mask1   Null    50      52      Null    Null    .......
     Mask2   Null    100     101     Null    Null    .......
2    Mask1   Null    Null    Null    53      Null    .......
     Mask2   Null    Null    Null    102     Null    .......
3    Mask1   50      Null    Null    Null    52      .......
     Mask2   100     Null    Null    Null    105     .......
The dots (...) indicate that I have many records.
Timestamp covers a whole day and has the format hours:minutes:seconds, running from 00:00:00 to 24:00:00 (I have used just hours:minutes in the question).
I have defined just two extra columns, Mask1 and Mask2, here; I have about 200 Mask columns to work with.
I have shown 5 records, but in reality I have about a million records.
What I have tried so far:
Dumping the records for each id into a CSV file.
Applying a transpose in Python pandas.
Joining the transposed tables.
A possible generic solution may be pivoting in Vertica (or a UDTF), but I am fairly new to this database.
I have been struggling with this logic for a couple of days. Can anyone please help me? Thanks a lot.

Below is the solution as I would code it, for just the time values that you have in your data examples.
If you really want to be able to display all 86400 values from '00:00:00' through '23:59:59', though, you won't be able to: Vertica's maximum number of columns is 1600.
You could, however, play with the Vertica function TIME_SLICE(timestamp::TIMESTAMP,1,'MINUTE')::TIME
(TIME_SLICE takes a timestamp as input and returns a timestamp, so you have to cast (::) back and forth) to reduce the number of distinct timestamps, and hence output columns, to at most 1440.
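For example, a sketch of that minute-bucketing against the sample input used below (note the round trip TIME -> TIMESTAMP -> TIME):
SELECT DISTINCT
  TIME_SLICE(timestamp::TIMESTAMP, 1, 'MINUTE')::TIME AS minute_slice
FROM input
ORDER BY 1;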
In any case, I would start with SELECT DISTINCT timestamp FROM input ORDER BY 1; and then, in the final query, generate one line per timestamp found (hoping there won't be more than 1598...), like the ones actually used for your data:
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
SQL in general does not allow a variable number of output columns from a given query. If the number of final columns depends on the data, you will have to generate your final query from the data, and then run it.
Welcome to SQL and relational databases...
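As a sketch of that generation step (reusing the input and val names from the script below), you can let Vertica build the column expression list for you and paste the generated lines into the final query:
SELECT ', SUM(CASE timestamp WHEN ''' || timestamp::VARCHAR
    || ''' THEN val END) AS "' || timestamp::VARCHAR || '"'
FROM (SELECT DISTINCT timestamp FROM input) d
ORDER BY 1;
-- each generated row looks like:
-- , SUM(CASE timestamp WHEN '09:00:00' THEN val END) AS "09:00:00"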
Here's the complete script for your data. I pivot vertically first, along the "Mask-n" column names, and then I re-pivot horizontally, along the timestamps.
\pset null Null
-- ^ this is a vsql command to display nulls with the "Null" string
WITH
-- your input, not in final query
input(id,Timestamp,Mask1,Mask2) AS (
            SELECT 1 , TIME '11:30' , 50 , 100
  UNION ALL SELECT 1 , TIME '11:35' , 52 , 101
  UNION ALL SELECT 2 , TIME '12:00' , 53 , 102
  UNION ALL SELECT 3 , TIME '09:00' , 50 , 100
  UNION ALL SELECT 3 , TIME '22:10' , 52 , 105
)
)
,
-- real WITH clause starts here
-- need an index for your 200 masks
i(i) AS (
  SELECT MICROSECOND(ts) FROM (
              SELECT TIMESTAMPADD(MICROSECOND,  1, TIMESTAMP '2000-01-01') AS tm
    UNION ALL SELECT TIMESTAMPADD(MICROSECOND,200, TIMESTAMP '2000-01-01') AS tm
  ) x
  TIMESERIES ts AS '1 MICROSECOND' OVER(ORDER BY tm)
)
,
-- verticalised masks
vertical AS (
  SELECT
    id
  , i
  , CASE i
      WHEN   1 THEN 'Mask001'
      WHEN   2 THEN 'Mask002'
      WHEN 200 THEN 'Mask200'
    END AS rows
  , timestamp
  , CASE i
      WHEN   1 THEN Mask1
      WHEN   2 THEN Mask2
      WHEN 200 THEN 0 -- no Mask200 present
    END AS val
  FROM input CROSS JOIN i
  WHERE i <= 2 -- only 2 masks present currently
)
-- test the vertical CTE ...
-- SELECT * FROM vertical order by id,rows,timestamp;
-- out  id | i |  rows   | timestamp | val
-- out ----+---+---------+-----------+-----
-- out   1 | 1 | Mask001 | 11:30:00  |  50
-- out   1 | 1 | Mask001 | 11:35:00  |  52
-- out   1 | 2 | Mask002 | 11:30:00  | 100
-- out   1 | 2 | Mask002 | 11:35:00  | 101
-- out   2 | 1 | Mask001 | 12:00:00  |  53
-- out   2 | 2 | Mask002 | 12:00:00  | 102
-- out   3 | 1 | Mask001 | 09:00:00  |  50
-- out   3 | 1 | Mask001 | 22:10:00  |  52
-- out   3 | 2 | Mask002 | 09:00:00  | 100
-- out   3 | 2 | Mask002 | 22:10:00  | 105
SELECT
  id
, rows
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
FROM vertical
GROUP BY
  id
, rows
ORDER BY
  id
, rows
;
-- out Null display is "Null".
-- out  id |  rows   | 09:00 | 11:30 | 11:35 | 12:00 | 22:10
-- out ----+---------+-------+-------+-------+-------+-------
-- out   1 | Mask001 |  Null |    50 |    52 |  Null |  Null
-- out   1 | Mask002 |  Null |   100 |   101 |  Null |  Null
-- out   2 | Mask001 |  Null |  Null |  Null |    53 |  Null
-- out   2 | Mask002 |  Null |  Null |  Null |   102 |  Null
-- out   3 | Mask001 |    50 |  Null |  Null |  Null |    52
-- out   3 | Mask002 |   100 |  Null |  Null |  Null |   105
-- out (6 rows)
-- out
-- out Time: First fetch (6 rows): 28.143 ms. All rows formatted: 28.205 ms

You can use union all to unpivot the data and then conditional aggregation:
select id, which,
       max(case when timestamp >= '09:00' and timestamp < '09:30' then mask end) as "09:00",
       max(case when timestamp >= '09:30' and timestamp < '10:00' then mask end) as "09:30",
       max(case when timestamp >= '10:00' and timestamp < '10:30' then mask end) as "10:00",
       . . .
from ((select id, timestamp, 'Mask1' as which, Mask1 as mask
       from t
      ) union all
      (select id, timestamp, 'Mask2' as which, Mask2 as mask
       from t
      )
     ) t
group by t.id, t.which;
Note: This includes the id on each row. I strongly recommend doing that, but you could use:
select (case when which = 'Mask1' then id end) as id
if you really wanted to.

Related

Time difference between rows based on condition

Not really an expert with SQL and I'm having problems figuring out how to do this one.
Got a table like this one:
ID | Message      | TimeStamp           | User
---+--------------+---------------------+-----
 1 | Hello        | 2022-08-01 10:00:00 | A
 1 | How are you? | 2022-08-01 10:00:05 | A
 1 | Hello there  | 2022-08-01 10:00:10 | B
 1 | I am okay    | 2022-08-01 10:00:12 | B
 1 | Good to know | 2022-08-01 10:00:15 | A
 1 | Bye          | 2022-08-01 10:00:25 | B
 2 | Hello        | 2022-08-01 10:02:50 | A
 2 | Hi           | 2022-08-01 10:03:50 | B
I need to calculate the time difference each time there is a response from the B user after a message from A.
Expected result would be like this:
ID | Difference
---+-----------
 1 |  5
 1 | 10
 2 | 60
I'm trying to use the LEAD function to obtain the next desired timestamp, but I'm not getting the expected result.
Any tips or advice?
Thanks
Even if it has already been answered, it's a nice use case for Vertica's MATCH() clause: look for a pattern consisting of usr = 'A' followed by usr = 'B'.
You get a pattern id, and then you can group by the other columns plus the pattern id to get the max and min timestamps.
Also note that I renamed both "user" and "timestamp", as they are reserved words...
WITH
-- your input, don't use in final query
indata(ID,Message,ts,Usr) AS (
SELECT 1,'Hello' ,TIMESTAMP '2022-08-01 10:00:00','A'
UNION ALL SELECT 1,'How are you?',TIMESTAMP '2022-08-01 10:00:05','A'
UNION ALL SELECT 1,'Hello there' ,TIMESTAMP '2022-08-01 10:00:10','B'
UNION ALL SELECT 1,'I am okay' ,TIMESTAMP '2022-08-01 10:00:12','B'
UNION ALL SELECT 1,'Good to know',TIMESTAMP '2022-08-01 10:00:15','A'
UNION ALL SELECT 1,'Bye' ,TIMESTAMP '2022-08-01 10:00:25','B'
UNION ALL SELECT 2,'Hello' ,TIMESTAMP '2022-08-01 10:02:50','A'
UNION ALL SELECT 2,'Hi' ,TIMESTAMP '2022-08-01 10:03:50','B'
)
-- end of input, real query starts here , replace following comma with "WITH"
,
w_match_clause AS (
  SELECT
    *
  , event_name()
  , pattern_id()
  , match_id()
  FROM indata
  MATCH (
    PARTITION BY id ORDER BY ts
    DEFINE
      sentbya AS usr='A'
    , sentbyb AS usr='B'
    PATTERN
      p AS (sentbya sentbyb)
  )
-- ctl SELECT * FROM w_match_clause;
-- ctl  ID |   Message    |         ts          | Usr | event_name | pattern_id | match_id
-- ctl ----+--------------+---------------------+-----+------------+------------+----------
-- ctl   1 | How are you? | 2022-08-01 10:00:05 | A   | sentbya    |          1 |        1
-- ctl   1 | Hello there  | 2022-08-01 10:00:10 | B   | sentbyb    |          1 |        2
-- ctl   1 | Good to know | 2022-08-01 10:00:15 | A   | sentbya    |          2 |        1
-- ctl   1 | Bye          | 2022-08-01 10:00:25 | B   | sentbyb    |          2 |        2
-- ctl   2 | Hello        | 2022-08-01 10:02:50 | A   | sentbya    |          1 |        1
-- ctl   2 | Hi           | 2022-08-01 10:03:50 | B   | sentbyb    |          1 |        2
)
SELECT
  id
, MAX(ts) - MIN(ts) AS difference
FROM w_match_clause
GROUP BY
  id
, pattern_id
ORDER BY
  id;
-- out  id | difference
-- out ----+------------
-- out   1 | 00:00:05
-- out   1 | 00:00:10
-- out   2 | 00:01
With MySQL 8.0:
WITH cte AS (
    SELECT id, user, timestamp,
           -- partition by id so messages from different conversations don't pair up
           LEAD(user) OVER (PARTITION BY id ORDER BY timestamp) AS to_user,
           TIME_TO_SEC(TIMEDIFF(LEAD(timestamp) OVER (PARTITION BY id ORDER BY timestamp),
                                timestamp)) AS time_diff
    FROM msg_tab
)
SELECT id, time_diff
FROM cte
-- to_user IN ('B', NULL) would never match a NULL to_user; plain equality is enough here,
-- since a NULL to_user (last message of a conversation) has a NULL time_diff anyway
WHERE user = 'A' AND to_user = 'B'

Calculate data in Pivot

I have the following SQL table with 4 columns.
Table Name: tblTimeTransaction
Columns: EmployeeNumber, TransactionDate, CodeType, TimeShowninSeconds
CodeType has values : REG, OT1, OT2, OT3 respectively
I want it to show like this, using PIVOT, in 15-day increments starting from Jan 1 2020 onwards:
Employee Number | Effective Date           | REG  | OT1 | OT2  | OT3
E12345          | Between 10-1 till 10-15  | 200  | 100 |   50 |  45
E15000          | Between 10-1 till 10-15  | 400  | 600 |  903 |  49
E12345          | Between 10-15 till 10-31 | 200  | 100 |   50 |  45
E15000          | Between 10-15 till 10-31 | 400  | 600 |  903 |  49
E12346          | Between 11-1 till 11-15  | 4200 | 100 |   50 |  45
E15660          | Between 11-1 till 11-15  | 1200 | 600 | 6903 |  49
My SQL code so far:
SELECT
    [Employee Number],
    [TransactionDate] AS [Effective Date],
    [REG],
    [OT1],
    [OT2],
    [OT3]
FROM
    ( SELECT [Employee Number], TransactionDate, CodeType, TimeInSeconds
      FROM [tblTimetransaction]
    ) ps
PIVOT
    ( SUM (TimeInSeconds)
      FOR CodeType IN ( [REG], [OT1], [OT2], [OT3] )
    ) AS pvt
WHERE TransactionDate BETWEEN '2020-01-01' AND '2020-12-31'
If I follow you correctly, you can truncate the effective_date to either the 1st or 15th of the month depending on the day of the month, then use conditional aggregation to compute the total time_in_seconds for each code_type:
select employee_number,
       datefromparts(year(effective_date), month(effective_date),
                     case when day(effective_date) < 15 then 1 else 15 end) as dt,
       sum(case when code_type = 'REG' then time_in_seconds else 0 end) as reg,
       sum(case when code_type = 'OT1' then time_in_seconds else 0 end) as ot1,
       sum(case when code_type = 'OT2' then time_in_seconds else 0 end) as ot2,
       sum(case when code_type = 'OT3' then time_in_seconds else 0 end) as ot3
from tblTimetransaction
where effective_date >= '20200101' and effective_date < '20210101'
group by employee_number,
         datefromparts(year(effective_date), month(effective_date),
                       case when day(effective_date) < 15 then 1 else 15 end)
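To see what the truncation expression does in isolation, here is a quick sketch (SQL Server syntax; the sample dates are made up):
select d,
       datefromparts(year(d), month(d),
                     case when day(d) < 15 then 1 else 15 end) as bucket_start
from (values (cast('2020-10-07' as date)),
             (cast('2020-10-20' as date))) v(d);
-- 2020-10-07 -> 2020-10-01
-- 2020-10-20 -> 2020-10-15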

Teradata SQL query for grouping records using intervals

In Teradata SQL, how do I assign the same row number to a group of records created within an 8-second time interval?
Example:
Customerid   Customername   Itembought   dateandtime (yyyy-mm-dd hh:mm:ss)
100          Alex           Basketball   2017-02-10 10:10:01
100          Alex           Cricketball  2017-02-10 10:10:06
100          Alex           Baseball     2017-02-10 10:10:08
100          Alex           volleyball   2017-02-10 10:11:01
100          Alex           football     2017-02-10 10:11:05
100          Alex           ringball     2017-02-10 10:11:08
100          Alex           football     2017-02-10 10:12:10
My expected result should have an additional Row_number column that assigns the same number to all purchases made by the customer within 8 seconds. Refer to the expected result below:
Customerid   Customername   Itembought   dateandtime (yyyy-mm-dd hh:mm:ss)   Row_number
100          Alex           Basketball   2017-02-10 10:10:01                 1
100          Alex           Cricketball  2017-02-10 10:10:06                 1
100          Alex           Baseball     2017-02-10 10:10:08                 1
100          Alex           volleyball   2017-02-10 10:11:01                 2
100          Alex           football     2017-02-10 10:11:05                 2
100          Alex           ringball     2017-02-10 10:11:08                 2
100          Alex           football     2017-02-10 10:12:10                 3
This is one way to do it with a recursive CTE. Reset the running total of differences from the previous row's timestamp to 0 when it gets > 8, and start a new group.
WITH ROWNUMS AS
(SELECT T.*
       ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TM) AS RNUM
       /*Replace DATEDIFF with the Teradata-specific function*/
       ,DATEDIFF(SECOND,
                 COALESCE(MIN(TM) OVER(PARTITION BY ID
                                       ORDER BY TM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW),
                          TM),
                 TM) AS DIFF
 FROM T --replace this with your table name and add columns as required
)
,RECURSIVE CTE(ID,TM,DIFF,SUM_DIFF,RNUM,GRP) AS
(SELECT ID,
        TM,
        DIFF,
        DIFF,
        RNUM,
        CAST(1 AS int)
 FROM ROWNUMS
 WHERE RNUM=1
 UNION ALL
 SELECT T.ID,
        T.TM,
        T.DIFF,
        CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN 0 ELSE C.SUM_DIFF+T.DIFF END,
        T.RNUM,
        CAST(CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN T.RNUM ELSE C.GRP END AS int)
 FROM CTE C
 JOIN ROWNUMS T ON T.RNUM=C.RNUM+1 AND T.ID=C.ID
)
SELECT ID,
       TM,
       DENSE_RANK() OVER(PARTITION BY ID ORDER BY GRP) AS row_num
FROM CTE
Demo in SQL Server
I am going to interpret the problem differently from vkp. Any row within 8 seconds of another row should be in the same group. Such values can chain together, so the overall span can be more than 8 seconds.
The advantage of this method is that recursive CTEs are not needed, so it should be faster. (Of course, this is not an advantage if the OP does not agree with the definition.)
The basic idea is to look at the previous date/time value; if it is more than 8 seconds away, then add a flag. The cumulative sum of the flag is the row number you are looking for.
select t.*,
       sum(case when prev_dt >= dateandtime - interval '8' second
                then 0 else 1
           end) over (partition by customerid order by dateandtime
                     ) as row_number
from (select t.*,
             -- prev_dt is the previous row's timestamp (max over the single preceding row)
             max(dateandtime) over (partition by customerid
                                    order by dateandtime
                                    rows between 1 preceding and 1 preceding) as prev_dt
      from t
     ) t;
Using Teradata's PERIOD data type and the awesome td_normalize_overlap_meet:
Consider table test32:
SELECT * FROM test32
+----+----+------------------------+
| f1 | f2 | f3 |
+----+----+------------------------+
| 1 | 2 | 2017-05-11 03:59:00 PM |
| 1 | 3 | 2017-05-11 03:59:01 PM |
| 1 | 4 | 2017-05-11 03:58:58 PM |
| 1 | 5 | 2017-05-11 03:59:26 PM |
| 1 | 2 | 2017-05-11 03:59:28 PM |
| 1 | 2 | 2017-05-11 03:59:46 PM |
+----+----+------------------------+
The following will group your records:
WITH
normalizedCTE AS
(
  SELECT *
  FROM TABLE
  (
    td_normalize_overlap_meet(NEW VARIANT_TYPE(periodCTE.f1), periodCTE.fper)
    RETURNS (f1 integer, fper PERIOD(TIMESTAMP(0)), recordCount integer)
    HASH BY f1
    LOCAL ORDER BY f1, fper
  ) as output(f1, fper, recordcount)
),
periodCTE AS
(
  SELECT f1, f2, f3, PERIOD(f3, f3 + INTERVAL '9' SECOND) as fper
  FROM test32
)
SELECT t2.f1, t2.f2, t2.f3, t1.fper,
       DENSE_RANK() OVER (PARTITION BY t2.f1 ORDER BY t1.fper) as fgroup
FROM normalizedCTE t1
INNER JOIN periodCTE t2 ON
  t1.fper P_INTERSECT t2.fper IS NOT NULL
Results:
+----+----+------------------------+-------------+
| f1 | f2 | f3 | fgroup |
+----+----+------------------------+-------------+
| 1 | 2 | 2017-05-11 03:59:00 PM | 1 |
| 1 | 3 | 2017-05-11 03:59:01 PM | 1 |
| 1 | 4 | 2017-05-11 03:58:58 PM | 1 |
| 1 | 5 | 2017-05-11 03:59:26 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:28 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:46 PM | 3 |
+----+----+------------------------+-------------+
A Period in Teradata is a special data type that holds a date or datetime range. The first parameter is the start of the range and the second is the end (up to, but not including, which is why it's "+ 9 seconds"). The result is that we get an 8-second time "Period" in which each record might "intersect" with another record.
We then use td_normalize_overlap_meet to merge records that intersect, sharing the f1 field's value as the key. In your case that would be customerid. The result is three records for this one customer, since we have three groups that "overlap" or "meet" each other's time periods.
We then join the td_normalize_overlap_meet output back to the periods we determined earlier. The P_INTERSECT operator tells us which periods from the normalized CTE intersect with the periods from the initial period CTE, and from that join we grab the values we need from each CTE.
Lastly, Dense_Rank() gives us a rank based on the normalized period for each group.
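As a minimal illustration of P_INTERSECT on its own (a sketch; the literals are made up to mimic the data above):
SELECT PERIOD(TIMESTAMP '2017-05-11 15:59:00', TIMESTAMP '2017-05-11 15:59:09')
       P_INTERSECT
       PERIOD(TIMESTAMP '2017-05-11 15:59:05', TIMESTAMP '2017-05-11 15:59:14');
-- returns PERIOD('2017-05-11 15:59:05', '2017-05-11 15:59:09');
-- non-overlapping periods return NULL, which is why
-- "t1.fper P_INTERSECT t2.fper IS NOT NULL" works as a join condition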

PostgreSQL - Detecting patterns in a series

Consider the following table:
id | date | status
1 | 2014-01-10 | 1
1 | 2014-02-10 | 1
1 | 2014-03-10 | 1
1 | 2014-04-10 | 1
1 | 2014-05-10 | 0
1 | 2014-06-10 | 0
------------------------
2 | 2014-01-10 | 1
2 | 2014-02-10 | 1
2 | 2014-03-10 | 0
2 | 2014-04-10 | 1
2 | 2014-05-10 | 0
2 | 2014-06-10 | 0
------------------------
3 | 2014-01-10 | 1
3 | 2014-02-10 | 0
3 | 2014-03-10 | 0
3 | 2014-04-10 | 1
3 | 2014-05-10 | 0
3 | 2014-06-10 | 0
------------------------
4 | 2014-01-10 | 0
4 | 2014-02-10 | 1
4 | 2014-03-10 | 1
4 | 2014-04-10 | 1
4 | 2014-05-10 | 0
4 | 2014-06-10 | 0
------------------------
5 | 2014-01-10 | 0
5 | 2014-02-10 | 1
5 | 2014-03-10 | 0
5 | 2014-04-10 | 1
5 | 2014-05-10 | 0
5 | 2014-06-10 | 0
------------------------
The Id field is the user id, the date field is when a certain checkpoint is due and the status indicates if the checkpoint is accomplished by its user.
I'm having big trouble trying to detect users that skipped some checkpoint, like the users with ids 2, 3, 4 and 5. I need a query that lists the ids that have a missing checkpoint at the start or in the middle of the series, returning only the ids.
I've tried hard to find a way of doing that with queries alone, but I couldn't create one. I know that I could do it by coding some script, but this project requires that I do it using just SQL.
Does anyone have the slightest idea how to accomplish that?
EDIT: As recommended by the mods here are more details and some things I unsuccessfully tried:
My most successful try was to count how many statuses were registered for each id with this query:
SELECT
id,
SUM(CASE WHEN status = 1 THEN 1 ELSE 0 END) AS check,
SUM(CASE WHEN status = 0 THEN 1 ELSE 0 END) AS non_check
FROM
example_table
GROUP BY
id
ORDER BY
id
Getting the following result:
id | check | non_check
 1 |     4 |         2
 2 |     3 |         3
 3 |     2 |         4
 4 |     3 |         3
 5 |     2 |         4
With that result I could select each id's entries, limiting by its check result and doing a SUM on the status field; if the SUM result equals the check result, the checkpoints are contiguous, like in:
WITH tbl AS (
SELECT id, status, SUM(status) AS "sum"
FROM (
SELECT id, status FROM example_table WHERE id = 1 ORDER BY date LIMIT 4
) AS tbl2
GROUP BY
status,id
)
SELECT
id,"sum"
FROM
tbl
WHERE
status = 1
Getting the following result:
id | sum
1 | 4
As the sum result equals check in the first query, I can determine that the checkpoints are contiguous. But take id 2 as an example this time; its query is:
WITH tbl AS (
SELECT id, status, SUM(status) AS "sum"
FROM (
SELECT id, status FROM example_table WHERE id = 2 ORDER BY date LIMIT 3
) AS tbl2
GROUP BY
status,id
)
SELECT
id,"sum"
FROM
tbl
WHERE
status = 1
Notice that I changed the id in the WHERE clause and the LIMIT value based on which id I'm working with and its check result from the first query, and I got the following result:
id | sum
2 | 2
As the sum value for id 2 in that query differs from its check value, I can say its checkpoints are not contiguous. That pattern can be repeated for every id.
As I said before, working the problem out that way would require a script, but in this specific case I need it to be in SQL.
I also found the following article:
postgres detect repeating patterns of zeros
The problem there resembles mine, except that he wants to detect repeating zeroes; it enlightened me a bit, but not enough to solve my own problem.
Thanks in advance!
The pattern you're looking for is a missed checkpoint followed by an accomplished checkpoint. Join each checkpoint from a user with the next (by timestamp) checkpoint then look for status 0 joined to status 1.
Here is an example:
create table tab (id int, date date, status int);
insert into tab values
(1,'2014-01-10',1),(1,'2014-02-10',1),(1,'2014-03-10',1),(1,'2014-04-10',1),(1,'2014-05-10',0),(1,'2014-06-10',0),
(2,'2014-01-10',1),(2,'2014-02-10',1),(2,'2014-03-10',0),(2,'2014-04-10',1),(2,'2014-05-10',0),(2,'2014-06-10',0),
(3,'2014-01-10',1),(3,'2014-02-10',0),(3,'2014-03-10',0),(3,'2014-04-10',1),(3,'2014-05-10',0),(3,'2014-06-10',0),
(4,'2014-01-10',0),(4,'2014-02-10',1),(4,'2014-03-10',1),(4,'2014-04-10',1),(4,'2014-05-10',0),(4,'2014-06-10',0),
(5,'2014-01-10',0),(5,'2014-02-10',1),(5,'2014-03-10',0),(5,'2014-04-10',1),(5,'2014-05-10',0),(5,'2014-06-10',0);
with tabwithrow as
(select *
      , row_number() over(partition by id order by date) rnum
 from tab)
select *
from tabwithrow a
join tabwithrow b on b.rnum = a.rnum + 1
                 and a.id = b.id
                 and a.status = 0
                 and b.status = 1;
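Since the question asks for only the ids, the same join can be reduced to a DISTINCT projection (a sketch reusing the CTE above):
with tabwithrow as
(select *
      , row_number() over(partition by id order by date) rnum
 from tab)
select distinct a.id
from tabwithrow a
join tabwithrow b on b.rnum = a.rnum + 1
                 and a.id = b.id
                 and a.status = 0
                 and b.status = 1
order by a.id;
-- returns 2, 3, 4 and 5 for the sample data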

Group Date column based on hours

I have a table in sqlite database where I store data about call logs. As an example assume that my table looks like this
| Calls_count | Calls_duration | Time_slice | Time_stamp |
| 10          | 500            | 21         | 1399369269 |
| 2           | 88             | 22         | 1399383668 |
Here
Calls_count is the number of calls made since the last observation.
Calls_duration is the duration of those calls in ms since the last observation.
Time_slice represents a portion of the week. Every day is divided into 4 portions of 6 hours each, such that:
    | 06:00-11:59 | 12:00-17:59 | 18:00-23:59 | 24:00-05:59 |
Mon |     11      |     12      |     13      |     14      |
Tue |     21      |     22      |     23      |     24      |
Wed |     31      |     32      |     33      |     34      |
Thu |     41      |     42      |     43      |     44      |
Fri |     51      |     52      |     53      |     54      |
Sat |     61      |     62      |     63      |     64      |
Sun |     71      |     72      |     73      |     74      |
And Time_stamp is the Unix epoch at which the observation was made / the record was inserted into the database.
Now I want to create a query so that if I specify a time_stamp for the start and the end of a week, the result is 168 rows of data giving me the sum of calls grouped by hour, i.e. 24 rows for each day of the week. This is an hourly breakdown of calls in a week.
SUM_CALLS | Time_Slice | Hour_of_Week
10        | 11         | 1
0         | 11         | 2
....
7         | 74         | 167
4         | 74         | 168
In the above example of the intended result:
The 1st row is Monday 06:00-06:59.
The 2nd row is Monday 07:00-07:59.
The last two rows are Sunday 04:00-04:59 and Sunday 05:00-05:59.
Since version 3.8.3, SQLite supports common table expressions, and this is a possible solution:
WITH RECURSIVE
hours(x,y) AS (
  SELECT CAST(STRFTIME('%s',STRFTIME('%Y-%m-%d %H:00:00', '2014-05-05 00:00:00')) AS INTEGER),
         CAST(STRFTIME('%s',STRFTIME('%Y-%m-%d %H:59:59', '2014-05-05 00:00:00')) AS INTEGER)
  UNION ALL
  SELECT x+3600, y+3600 FROM hours LIMIT 168
)
SELECT
  COALESCE(SUM(Calls_count),0) AS SUM_CALLS,
  CASE CAST(STRFTIME('%w',x,'unixepoch') AS INTEGER)
    WHEN 0 THEN 7 ELSE STRFTIME('%w',x,'unixepoch')
  END
  ||
  CASE
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '06:00:00' AND '11:59:59' THEN 1
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '12:00:00' AND '17:59:59' THEN 2
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '18:00:00' AND '23:59:59' THEN 3
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '00:00:00' AND '05:59:59' THEN 4
  END AS Time_Slice,
  ((x - (SELECT MIN(x) FROM hours)) / 3600) + 1 AS Hour_of_Week
FROM hours LEFT JOIN call_logs
  ON call_logs.time_stamp >= hours.x AND call_logs.time_stamp <= hours.y
GROUP BY Hour_of_Week
ORDER BY Hour_of_Week
;
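The recursive CTE above just enumerates 168 consecutive [x, y] epoch pairs, one per hour. A quick way to sanity-check the generator in isolation (a sketch; '2014-05-05' is a Monday):
WITH RECURSIVE
hours(x,y) AS (SELECT CAST(STRFTIME('%s','2014-05-05 00:00:00') AS INTEGER),
                      CAST(STRFTIME('%s','2014-05-05 00:59:59') AS INTEGER)
               UNION ALL
               SELECT x+3600, y+3600 FROM hours LIMIT 168)
SELECT DATETIME(x,'unixepoch') AS bucket_start,
       DATETIME(y,'unixepoch') AS bucket_end
FROM hours LIMIT 3;
-- 2014-05-05 00:00:00 | 2014-05-05 00:59:59
-- 2014-05-05 01:00:00 | 2014-05-05 01:59:59
-- 2014-05-05 02:00:00 | 2014-05-05 02:59:59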
This is tested with SQLite version 3.7.13, without CTEs:
DROP VIEW IF EXISTS digit;
CREATE TEMPORARY VIEW digit AS
  SELECT 0 AS d UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
  SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
;
DROP VIEW IF EXISTS hours;
CREATE TEMPORARY VIEW hours AS
  SELECT STRFTIME('%s','2014-05-05 00:00:00') + s        AS x,
         STRFTIME('%s','2014-05-05 00:00:00') + s + 3599 AS y
  FROM (SELECT (a.d || b.d || c.d) * 3600 AS s FROM digit a, digit b, digit c LIMIT 168)
;
SELECT
  COALESCE(SUM(Calls_count),0) AS SUM_CALLS,
  CASE CAST(STRFTIME('%w',x,'unixepoch') AS INTEGER)
    WHEN 0 THEN 7 ELSE STRFTIME('%w',x,'unixepoch')
  END
  ||
  CASE
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '06:00:00' AND '11:59:59' THEN 1
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '12:00:00' AND '17:59:59' THEN 2
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '18:00:00' AND '23:59:59' THEN 3
    WHEN STRFTIME('%H:%M:%S',x,'unixepoch') BETWEEN '00:00:00' AND '05:59:59' THEN 4
  END AS Time_Slice,
  ((x - (SELECT MIN(x) FROM hours)) / 3600) + 1 AS Hour_of_Week
FROM hours LEFT JOIN call_logs
  ON call_logs.time_stamp >= hours.x AND call_logs.time_stamp <= hours.y
GROUP BY Hour_of_Week
ORDER BY Hour_of_Week
;