I'm trying to write a query that is sort of like a running total, but not quite. I want to carry the previous weight (kg) forward for each day until another weight is recorded, then carry that one forward until the next weight is recorded, and so on. Below is an example of what I'm trying to accomplish (see the KG column).
Current results:
ENCOUNTER_ID | KG   | DATE_RECORDED | CALENDAR_DT
-------------|------|---------------|------------
100          | 10   | 2019-01-01    | 2019-01-01
NULL         | NULL | NULL          | 2019-01-02
100          | 12   | 2019-01-03    | 2019-01-03
NULL         | NULL | NULL          | 2019-01-04
NULL         | NULL | NULL          | 2019-01-05
NULL         | NULL | NULL          | 2019-01-06
100          | 13   | 2019-01-07    | 2019-01-07
NULL         | NULL | NULL          | 2019-01-08
Desired Results:
ENCOUNTER_ID | KG   | DATE_RECORDED | CALENDAR_DT
-------------|------|---------------|------------
100          | 10   | 2019-01-01    | 2019-01-01
NULL         | 10   | NULL          | 2019-01-02
100          | 12   | 2019-01-03    | 2019-01-03
NULL         | 12   | NULL          | 2019-01-04
NULL         | 12   | NULL          | 2019-01-05
NULL         | 12   | NULL          | 2019-01-06
100          | 13   | 2019-01-07    | 2019-01-07
NULL         | 13   | NULL          | 2019-01-08
In standard SQL, you would use lag() with the ignore nulls option:
select t.*,
lag(kg ignore nulls) over (order by calendar_dt)
from t;
Not all databases support ignore nulls. But it is standard SQL and you haven't specified the database you are using.
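If yours doesn't, a portable workaround (my own sketch, not part of the original answer) is to tag each row with the group opened by the most recent recorded weight, using a running count of non-NULL values, and then take that group's single KG:

select encounter_id,
       -- each group contains exactly one non-NULL KG, so max() returns it
       max(kg) over (partition by grp) as kg,
       date_recorded,
       calendar_dt
from (
    select t.*,
           -- running count of non-NULL KG values: every recorded weight starts a new group
           count(kg) over (order by calendar_dt) as grp
    from t
) t;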
A solution can be achieved by combining a CASE expression with a subquery that fetches the most recent non-NULL value, ordered by date.
See the below example using T-SQL.
create table dbo.WeightLog
(
ENCOUNTER_ID int null,
KG int null,
DATE_RECORDED date null,
CALENDAR_DT date not null
)
GO
insert into dbo.WeightLog values
(100 , 10, '2019-01-01', '2019-01-01'),
(NULL, NULL, NULL, '2019-01-02'),
(100 , 12, '2019-01-03', '2019-01-03'),
(NULL, NULL, NULL, '2019-01-04'),
(NULL, NULL, NULL, '2019-01-05'),
(NULL, NULL, NULL, '2019-01-06'),
(100 , 13, '2019-01-07', '2019-01-07'),
(NULL, NULL, NULL, '2019-01-08')
GO
select
wl.ENCOUNTER_ID,
case when wl.KG is null
then (select top 1 x.KG from dbo.WeightLog x where x.CALENDAR_DT < wl.CALENDAR_DT
and x.KG is not null order by x.CALENDAR_DT desc)
else wl.KG end as [Kg],
wl.DATE_RECORDED,
wl.CALENDAR_DT
from dbo.WeightLog wl
Results in:
ENCOUNTER_ID Kg DATE_RECORDED CALENDAR_DT
------------ ----------- ------------- -----------
100 10 2019-01-01 2019-01-01
NULL 10 NULL 2019-01-02
100 12 2019-01-03 2019-01-03
NULL 12 NULL 2019-01-04
NULL 12 NULL 2019-01-05
NULL 12 NULL 2019-01-06
100 13 2019-01-07 2019-01-07
NULL 13 NULL 2019-01-08
Note: this doesn't handle the case where the first record's KG is NULL, since there is no earlier value to fall back on.
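If a fallback is needed for that case, the expression can be wrapped in ISNULL (a sketch; the default of 0 is my own assumption):

isnull(case when wl.KG is null
            then (select top 1 x.KG from dbo.WeightLog x
                  where x.CALENDAR_DT < wl.CALENDAR_DT
                    and x.KG is not null
                  order by x.CALENDAR_DT desc)
            else wl.KG end, 0) as [Kg]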
I'm querying a large data set to figure out whether a bunch of campaign events (i.e. event 1, 2, ...) at different timepoints result in user activity (active, inactive) during the following 3 days after each event (but not on the same day as the campaign event itself).
I'm merging two tables to do this, and they look like this merged:
| date | user | events | day_activity |
| 2020-01-01 | 1 | event1 | active |
| 2020-01-01 | 2 | event1 | inactive |
| 2020-01-02 | 1 | null | inactive |
| 2020-01-02 | 2 | null | active |
| 2020-01-03 | 1 | null | inactive |
| 2020-01-03 | 2 | null | active |
| 2020-01-04 | 1 | null | active |
| 2020-01-04 | 2 | null | active |
What I am trying to achieve is, for each user/date/event combination (= row) where an event occurred, to add another column called 3_day_activity, containing the activity not on the event day (= current row) but over the following 3 days only (scoring 1 per active day). Here is how the first day of this table would look afterwards (I've marked the activity days counted in the added column with * for user 1 and # for user 2):
| date | user | events | day_activity | 3_day_activity
| 2020-01-01 | 1 | event1 | active | 1
| 2020-01-01 | 2 | event1 | inactive | 3
| 2020-01-02 | 1 | null | inactive * (0)| null (bco no event)
| 2020-01-02 | 2 | null | active # (1) | null (bco no event)
| 2020-01-03 | 1 | null | inactive * (0)| null (bco no event)
| 2020-01-03 | 2 | null | active # (1) | null (bco no event)
| 2020-01-04 | 1 | null | active * (1) | null (bco no event)
| 2020-01-04 | 2 | null | active # (1) | null (bco no event)
I tried solving this with a window function. It runs, but I think I misunderstood some important idea on how to design it, because the result contains a ton of repetitions...
SELECT
cm.date,
cm.user,
event,
day_activity,
COUNTIF(active_today = 'active') OVER 3d_later AS 3_day_activity
FROM `customer_message` cm
INNER JOIN `customer_day` ud
ON cm.user = ud.user
AND cm.date = ud.date
WHERE
cm.date > '2019-12-25'
WINDOW 3d_later AS (PARTITION BY user ORDER BY UNIX_DATE(cm.date) RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING)
EDIT:
I was asked to supply an example of how this repetition might look. Here's what I see if I add an "ORDER BY 3_day_activity" clause at the end of the query:
Row date user day_activity 3_day_activity
1 2020-01-01 2 active 243
2 2020-01-01 2 active 243
3 2020-01-01 2 active 243
4 2020-01-01 2 active 243
5 2020-01-01 2 active 243
6 2020-01-01 2 active 243
7 2020-01-02 2 active 243
8 2020-01-02 2 active 243
EDIT2 :
This remains unsolved. I have tried asking another question, as one commenter suggested, but I am blocked from doing so even though the problem is not identical (I suppose due to its similarity to this one). I have tried grouping by user and date, but then it throws an error about not aggregating in the COUNTIF clause.
This is the attempt mentioned: SQL: Error demanding aggregation when counting, grouping and windowing
Below example is for BigQuery Standard SQL
#standardSQL
SELECT *, IF(events IS NULL, 0, COUNTIF(day_activity = 'active') OVER(three_day_activity_window)) AS three_day_activity
FROM `project.dataset.table`
WINDOW three_day_activity_window AS (
PARTITION BY user
ORDER BY UNIX_DATE(date)
RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)
You can test and play with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2020-01-01' date , 1 user, 'event1' events, 'active' day_activity UNION ALL
SELECT '2020-01-01', 2, 'event1', 'inactive' UNION ALL
SELECT '2020-01-02', 1, NULL, 'inactive' UNION ALL
SELECT '2020-01-02', 2, NULL, 'active' UNION ALL
SELECT '2020-01-03', 1, NULL, 'inactive' UNION ALL
SELECT '2020-01-03', 2, NULL, 'active' UNION ALL
SELECT '2020-01-04', 1, NULL, 'active' UNION ALL
SELECT '2020-01-04', 2, NULL, 'active'
)
SELECT *, IF(events IS NULL, 0, COUNTIF(day_activity = 'active') OVER(three_day_activity_window)) AS three_day_activity
FROM `project.dataset.table`
WINDOW three_day_activity_window AS (
PARTITION BY user
ORDER BY UNIX_DATE(date)
RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)
ORDER BY date, user
with output
Row date user events day_activity three_day_activity
1 2020-01-01 1 event1 active 1
2 2020-01-01 2 event1 inactive 3
3 2020-01-02 1 null inactive 0
4 2020-01-02 2 null active 0
5 2020-01-03 1 null inactive 0
6 2020-01-03 2 null active 0
7 2020-01-04 1 null active 0
8 2020-01-04 2 null active 0
Update - to avoid registering the same user as active multiple times in one day (and tallying those up to a huge sum):
If you want to avoid counting all of a user's activity on the same day, use the adjusted version below (note the extra entry in the sample data, introducing multiple activity rows for the same user on the same day):
#standardSQL
WITH `project.dataset.table` AS (
SELECT DATE '2020-01-01' date , 1 user, 'event1' events, 'active' day_activity UNION ALL
SELECT '2020-01-01', 2, 'event1', 'inactive' UNION ALL
SELECT '2020-01-02', 1, NULL, 'inactive' UNION ALL
SELECT '2020-01-02', 2, NULL, 'active' UNION ALL
SELECT '2020-01-03', 1, NULL, 'inactive' UNION ALL
SELECT '2020-01-03', 2, NULL, 'active' UNION ALL
SELECT '2020-01-04', 1, NULL, 'active' UNION ALL
SELECT '2020-01-04', 1, NULL, 'active' UNION ALL
SELECT '2020-01-04', 2, NULL, 'active'
)
SELECT *,
IF(events IS NULL, 0, COUNTIF(day_activity = 'active') OVER(three_day_activity_window)) AS three_day_activity
FROM (
SELECT date, user, MAX(events) events, MIN(day_activity) day_activity
FROM `project.dataset.table`
GROUP BY date, user
)
WINDOW three_day_activity_window AS (
PARTITION BY user
ORDER BY UNIX_DATE(date)
RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)
ORDER BY date, user
You seem to be almost there. A range window is the right way to go. BigQuery only supports integers in such a frame, so we need to convert the date to a number; since you have dates with no time component, UNIX_DATE() comes to mind:
WINDOW 3d_later AS (
PARTITION BY user
ORDER BY UNIX_DATE(cm.date)
RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)
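Plugged into the rest of your query, this could look something like the sketch below (my own illustration, reusing the cm/ud aliases and column names from your attempt; the window is renamed because an identifier starting with a digit would need quoting):

SELECT
  cm.date,
  cm.user,
  cm.events,
  ud.day_activity,
  -- only rows carrying an event get a score; the window looks 1-3 days ahead
  IF(cm.events IS NULL, NULL,
     COUNTIF(ud.day_activity = 'active') OVER three_days_later) AS three_day_activity
FROM `customer_message` cm
JOIN `customer_day` ud
  ON cm.user = ud.user AND cm.date = ud.date
WHERE cm.date > '2019-12-25'
WINDOW three_days_later AS (
  PARTITION BY cm.user
  ORDER BY UNIX_DATE(cm.date)
  RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)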
I have an Audit table where we record changes to fields in our database. I have a query that gets a subset of the data from the Audit for a few columns, their recorded changes, and when they occurred, associated with the applicable IDs. Here is a sample of what the output looks like:
ID ada IsHD HDF DTStamp
-----------------------------------------------------
68 NULL 0 0 2020-04-28 21:12:21.287
68 NULL NULL NULL 2020-04-17 14:59:49.700
68 No/Unsure NULL NULL 2020-04-17 14:03:46.160
68 NULL 0 0 2020-04-17 13:49:49.720
102 NULL NULL NULL 2020-04-30 13:11:15.273
102 No/Unsure NULL NULL 2020-04-20 16:00:35.410
102 NULL 1 1 2020-04-20 15:59:55.750
105 No/Unsure 1 1 2020-04-17 12:06:10.833
105 NULL NULL NULL 2020-04-13 07:51:30.180
126 NULL NULL NULL 2020-05-01 17:59:24.460
126 NULL 0 0 2020-04-28 21:12:21.287
What I am trying to figure out is the most efficient means to "roll-up" the multiple rows of a given ID so that the newest Non-NULL value is kept, leaving only a single line for that ID.
That is, turn this:
68 NULL 0 0 2020-04-28 21:12:21.287
68 NULL NULL NULL 2020-04-17 14:59:49.700
68 No/Unsure NULL NULL 2020-04-17 14:03:46.160
68 NULL 0 0 2020-04-17 13:49:49.720
102 NULL NULL NULL 2020-04-30 13:11:15.273
102 No/Unsure NULL NULL 2020-04-20 16:00:35.410
102 NULL 1 1 2020-04-20 15:59:55.750
Into this:
68 No/Unsure 0 0 2020-04-28 21:12:21.287
102 No/Unsure 1 1 2020-04-30 13:11:15.273
...and so on down the list. It's almost as if you pushed down on the top of the results and squeezed out all the NULLs, as it were.
Dumping the above results into a temp table #audit, I then run the following query:
SELECT DISTINCT a.[ID]
, (SELECT TOP 1 [ADA]
FROM #audit
WHERE [ID] = a.[ID]
AND [ADA] IS NOT NULL
ORDER BY [DTStamp] DESC) AS 'ADA'
, (SELECT TOP 1 [IsHD]
FROM #audit
WHERE [ID] = a.[ID]
AND [IsHD] IS NOT NULL
ORDER BY [DTStamp] DESC) AS 'IsHD'
, (SELECT TOP 1 [HDF]
FROM #audit
WHERE [ID] = a.[ID]
AND [HDF] IS NOT NULL
ORDER BY [DTStamp] DESC) AS 'HDF'
, (SELECT Max([DTStamp])
FROM #audit
WHERE [ID] = a.[ID]) AS 'DTStamp'
FROM #audit a
ORDER BY [ID]
This is what I've come up with and it does work, but it feels very clunky and inefficient. Is there a better way to accomplish the end goal?
If you want one row per id, then use aggregation:
select id, max(ada), max(IsHD), max(HDF), max(DTStamp)
from #audit a
group by id;
This works for the data you have provided and seems to fit the rule that you want.
I understand that you want the "latest" non-null value per id for each column, using column DTStamp for ordering.
Your approach using multiple subqueries does what you want. An alternative would be to use multiple row_number()s and conditional aggregation. This might actually be more efficient, since it avoids multiple scans of the table.
select
id,
max(case when rn_ada = 1 then ada end) ada,
max(case when rn_isHd = 1 then isHd end) isHd,
max(case when rn_hdf = 1 then hdf end) hdf,
max(DTStamp) DTStamp
from (
select
a.*,
row_number() over(
partition by id
order by case when ada is not null then DTStamp end desc
) rn_ada,
row_number() over(
partition by id
order by case when isHd is not null then DTStamp end desc
) rn_isHd,
row_number() over(
partition by id
order by case when hdf is not null then DTStamp end desc
) rn_hdf
from #audit a
) t
group by id
order by id
Demo on DB Fiddle:
id | ada | isHd | hdf | DTStamp
--: | :-------- | ---: | --: | :----------------------
68 | No/Unsure | 0 | 0 | 2020-04-28 21:12:21.287
102 | No/Unsure | 1 | 1 | 2020-04-30 13:11:15.273
I'm using SQL Server. I have a table with 3 columns of (time series) data: date, hour beginning, and AwardStatus.
The award status for the most part is randomly generated. There can be two options, Awarded or Not Awarded.
However, the business requirement is that we MUST print 'NotAwarded' for 3 consecutive rows if the status is NotAwarded, and 4 consecutive rows if the status is Awarded.
Goal: a new column, ShouldBe, with the corrected status.
Once the minimum run length is met, it checks the current row's AwardStatus and continues applying the same logic from there.
Question: Is that possible in SQL without any kind of cursor/looping?
Here's an example:
AwardStatusMinimum 3
AwardStatusMaximum 4
Date Hour AwardStatus ShouldBe
--------------------------------------
1/1/2019 1 NotAwarded NotAwarded
1/1/2019 2 NotAwarded NotAwarded
1/1/2019 3 Awarded NotAwarded
1/1/2019 4 Awarded Awarded
1/1/2019 5 NotAwarded Awarded
1/1/2019 6 NotAwarded Awarded
1/1/2019 7 Awarded Awarded
1/1/2019 8 NotAwarded NotAwarded
1/1/2019 9 Awarded NotAwarded
1/1/2019 10 Awarded NotAwarded
Since recursion was mentioned, here's a solution that uses a recursive CTE.
Sample data:
CREATE TABLE Table1 (
[Date] DATETIME NOT NULL,
[Hour] INT NOT NULL,
[AwardStatus] VARCHAR(10)
);
INSERT INTO Table1
([Date], [Hour], [AwardStatus])
VALUES
('2019-01-01', 1, 'NotAwarded'),
('2019-01-01', 2, 'NotAwarded'),
('2019-01-01', 3, 'Awarded'),
('2019-01-01', 4, 'Awarded'),
('2019-01-01', 5, 'NotAwarded'),
('2019-01-01', 6, 'NotAwarded'),
('2019-01-01', 7, 'Awarded'),
('2019-01-01', 8, 'NotAwarded'),
('2019-01-01', 9, 'Awarded'),
('2019-01-01', 10, 'Awarded');
Query:
;with CTE_DATA AS
(
select *
, dense_rank()
over (order by cast([Date] as date)) as grp
, row_number()
over (partition by cast([Date] as date) order by [Hour]) as rn
from Table1
)
, RCTE_AWARDS as
(
select [Date], [Hour]
, AwardStatus
, grp
, rn
, 1 as Lvl
, AwardStatus AS CalcStatus
from CTE_DATA
where rn = 1
union all
select t.[Date], t.[Hour]
, t.AwardStatus
, t.grp
, t.rn
, case
when (c.lvl < 3)
or (c.lvl < 4 and c.CalcStatus = 'Awarded')
then c.lvl+1
else 1
end
, case
when (c.lvl = 3 and c.CalcStatus = 'NotAwarded')
or (c.lvl = 4)
then t.AwardStatus
else c.CalcStatus
end
from RCTE_AWARDS c
join CTE_DATA t
on t.grp = c.grp
and t.rn = c.rn + 1
)
select [Date], [Hour], AwardStatus
, CalcStatus AS NewAwardStatus
from RCTE_AWARDS
order by [Date], [Hour]
GO
Date | Hour | AwardStatus | NewAwardStatus
:---------------------- | ---: | :---------- | :-------------
2019-01-01 00:00:00.000 | 1 | NotAwarded | NotAwarded
2019-01-01 00:00:00.000 | 2 | NotAwarded | NotAwarded
2019-01-01 00:00:00.000 | 3 | Awarded | NotAwarded
2019-01-01 00:00:00.000 | 4 | Awarded | Awarded
2019-01-01 00:00:00.000 | 5 | NotAwarded | Awarded
2019-01-01 00:00:00.000 | 6 | NotAwarded | Awarded
2019-01-01 00:00:00.000 | 7 | Awarded | Awarded
2019-01-01 00:00:00.000 | 8 | NotAwarded | NotAwarded
2019-01-01 00:00:00.000 | 9 | Awarded | NotAwarded
2019-01-01 00:00:00.000 | 10 | Awarded | NotAwarded
A test on db<>fiddle here
This allows you to do it without using a cursor.
declare #date date
declare #hour int
declare #CurrentStatus varchar(50)
set #CurrentStatus=''
while exists(select * from Awards where ShouldBe is null)
begin
select top 1 #date=[date], #hour=[hour] , #CurrentStatus=AwardStatus
from Awards
where [ShouldBe] is null
order by [date],[hour]
if(#CurrentStatus='Awarded')
begin
update top(4) Awards
set ShouldBe=#CurrentStatus
where [date]=#date and [hour]>=#hour
end
else
begin
update top(3) Awards
set ShouldBe=#CurrentStatus
where [date]=#date and [hour]>=#hour
end
end
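For reference, the Awards table this loop expects isn't defined in the question; a minimal setup sketch (my own assumption, not from the answer) that copies the earlier sample data and adds a nullable ShouldBe column:

-- Copy the sample rows into an Awards table with an empty ShouldBe column,
-- so the loop above has something to fill in (table name is assumed).
SELECT [Date], [Hour], [AwardStatus],
       CAST(NULL AS VARCHAR(10)) AS ShouldBe
INTO Awards
FROM Table1;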
I have the following table in Vertica:
Item_id event_date Price
A 2019-01-01 100
A 2019-01-04 200
B 2019-01-05 150
B 2019-01-06 250
B 2019-01-09 350
As you can see, there are missing dates between 2019-01-01 and 2019-01-04, and also between 2019-01-06 and 2019-01-09.
What I need is to add, for each item_id, the missing dates between the existing ones, and since the Price for those new rows will be NULL, fill it with the previous date's Price.
So it will be like this:
Item_id event_date Price
A 2019-01-01 100
A 2019-01-02 100
A 2019-01-03 100
A 2019-01-04 200
B 2019-01-05 150
B 2019-01-06 250
B 2019-01-07 250
B 2019-01-08 250
B 2019-01-09 350
I tried to go with
SELECT Item_id, event_date,
  CASE Price
    WHEN 0 THEN NVL(LAG(CASE Price WHEN 0 THEN NULL ELSE Price END) IGNORE NULLS OVER (ORDER BY NULL), 0)
    ELSE Price
  END AS Price_new
FROM item_price_table
from this article https://blog.jooq.org/2015/12/17/how-to-fill-sparse-data-with-the-previous-non-empty-value-in-sql/ , but it seems it works for SQL Server and not for Vertica, as there is no IGNORE NULLS option there...
Does anyone know how to deal with it?
Let me assume you have a calendar table. In Vertica, you can then use last_value(ignore nulls) to fill in the rest:
select c.event_date, i.item_id,
coalesce(ipt.price,
last_value(ipt.price ignore nulls) over (partition by i.item_id order by c.event_date)
) as price
from calendar c cross join
(select distinct item_id from item_price_table) i left join
item_price_table ipt
on i.item_id = ipt.item_id and c.event_date = ipt.event_date
I was waiting for that one....!
I just love Vertica's TIMESERIES clause !
It works on TIMESTAMPs, not DATEs, so I have to cast back and forth, but it's unbeatable.
See here:
WITH
input(item_id,event_dt,price) AS (
SELECT 'A',DATE '2019-01-01',100
UNION ALL SELECT 'A',DATE '2019-01-04',200
UNION ALL SELECT 'B',DATE '2019-01-05',150
UNION ALL SELECT 'B',DATE '2019-01-06',250
UNION ALL SELECT 'B',DATE '2019-01-09',350
)
SELECT
item_id
, event_dts::DATE AS event_dt
, TS_FIRST_VALUE(price) AS price
FROM input
TIMESERIES event_dts AS '1 DAY' OVER(PARTITION BY item_id ORDER BY event_dt::timestamp)
-- out item_id | event_dt | price
-- out ---------+------------+-------
-- out A | 2019-01-01 | 100
-- out A | 2019-01-02 | 100
-- out A | 2019-01-03 | 100
-- out A | 2019-01-04 | 200
-- out B | 2019-01-05 | 150
-- out B | 2019-01-06 | 250
-- out B | 2019-01-07 | 250
-- out B | 2019-01-08 | 250
-- out B | 2019-01-09 | 350
-- out (9 rows)
-- out
-- out Time: First fetch (9 rows): 68.057 ms. All rows formatted: 68.221 ms
;
Need explanations?
Happy playing ...
I don't have the code off the top of my head, but to add the missing dates you'll want to create a calendar table and join to that. Then you can use the lag function to replace the NULL Price with the one above it. There's plenty of code out there if you search for a CTE that creates a calendar table.
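In Vertica, one way to generate such a calendar on the fly (a sketch only, reusing the TIMESERIES trick from the answer above; table and column names are taken from the question) is:

WITH bounds AS (
    -- overall date range present in the data
    SELECT MIN(event_date)::TIMESTAMP AS min_ts,
           MAX(event_date)::TIMESTAMP AS max_ts
    FROM item_price_table
)
SELECT ts::DATE AS event_date
FROM (
    SELECT min_ts AS t FROM bounds
    UNION ALL
    SELECT max_ts FROM bounds
) x
TIMESERIES ts AS '1 DAY' OVER (ORDER BY t);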
I have a table in vertica :
id Timestamp Mask1 Mask2
-------------------------------------------
1 11:30 50 100
1 11:35 52 101
2 12:00 53 102
3 09:00 50 100
3 22:10 52 105
. . . .
. . . .
Which I want to transform into :
id rows 09:00 11:30 11:35 12:00 22:10 .......
--------------------------------------------------------------
1 Mask1 Null 50 52 Null Null .......
Mask2 Null 100 101 Null Null .......
2 Mask1 Null Null Null 53 Null .......
Mask2 Null Null Null 102 Null .......
3 Mask1 50 Null Null Null 52 .......
Mask2 100 Null Null Null 105 .......
The dots (...) indicate that I have many records.
Timestamp spans a whole day in hours:minutes:seconds format, from 00:00:00 to 24:00:00 (I have just used hours:minutes in the question).
I have shown just two extra columns, Mask1 and Mask2, but I have about 200 Mask columns to work with.
I have shown 5 records, but in reality I have about a million records.
What I have tried so far:
Dumping the records for each id into a CSV file.
Applying transpose in Python pandas.
Joining the transposed tables.
A possible generic solution may be pivoting in Vertica (or a UDTF), but I am fairly new to this database.
I have been struggling with this logic for a couple of days. Can anyone please help me? Thanks a lot.
Below is the solution as I would code it for just the time values that you have in your data examples.
If you really want to be able to display all 86400 timestamps from '00:00:00' through '23:59:59', though, you won't be able to: Vertica's maximum number of columns is 1600.
You could, however, play with the Vertica function TIME_SLICE(timestamp::TIMESTAMP,1,'MINUTE')::TIME
(TIME_SLICE takes a timestamp as input and returns a timestamp, so you have to cast (::) back and forth) to reduce the number of distinct timestamps to 1440 ...
In any case, I would start with SELECT DISTINCT timestamp FROM input ORDER BY 1; and then, in the final query, generate one line per found timestamp (hoping there won't be more than 1598 of them...), like the ones actually used for your data:
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
SQL in general has no variable number of output columns from any given query. If the number of final columns varies depending on the data, you will have to generate your final query from the data, and then run it.
Welcome to SQL and relational databases ..
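For example, a small sketch (my own, using the input relation named in the script below) that generates those SUM(CASE ...) lines from the distinct timestamps, ready to be pasted into the final query:

SELECT
  '  , SUM(CASE timestamp WHEN ''' || timestamp::VARCHAR
  || ''' THEN val END) AS "' || timestamp::VARCHAR || '"' AS pivot_column_sql
FROM (SELECT DISTINCT timestamp FROM input) t
ORDER BY timestamp;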
Here's the complete script for your data. I pivot vertically first, along the "Mask-n" column names, and then I re-pivot horizontally, along the timestamps.
\pset null Null
-- ^ this is a vsql command to display nulls with the "Null" string
WITH
-- your input, not in final query
input(id,Timestamp,Mask1,Mask2) AS (
SELECT 1 , TIME '11:30' , 50 , 100
UNION ALL SELECT 1 , TIME '11:35' , 52 , 101
UNION ALL SELECT 2 , TIME '12:00' , 53 , 102
UNION ALL SELECT 3 , TIME '09:00' , 50 , 100
UNION ALL SELECT 3 , TIME '22:10' , 52 , 105
)
,
-- real WITH clause starts here
-- need an index for your 200 masks
i(i) AS (
SELECT MICROSECOND(ts) FROM (
SELECT TIMESTAMPADD(MICROSECOND, 1,TIMESTAMP '2000-01-01') AS tm
UNION ALL SELECT TIMESTAMPADD(MICROSECOND,200,TIMESTAMP '2000-01-01') AS tm
)x
TIMESERIES ts AS '1 MICROSECOND' OVER(ORDER BY tm)
)
,
-- verticalised masks
vertical AS (
SELECT
id
, i
, CASE i
WHEN 1 THEN 'Mask001'
WHEN 2 THEN 'Mask002'
WHEN 200 THEN 'Mask200'
END AS rows
, timestamp
, CASE i
WHEN 1 THEN Mask1
WHEN 2 THEN Mask2
WHEN 200 THEN 0 -- no mask200 present
END AS val
FROM input CROSS JOIN i
WHERE i <=2 -- only 2 masks present currently
)
-- test the vertical CTE ...
-- SELECT * FROM vertical order by id,rows,timestamp;
-- out id | i | rows | timestamp | val
-- out ----+---+---------+-----------+-----
-- out 1 | 1 | Mask001 | 11:30:00 | 50
-- out 1 | 1 | Mask001 | 11:35:00 | 52
-- out 1 | 2 | Mask002 | 11:30:00 | 100
-- out 1 | 2 | Mask002 | 11:35:00 | 101
-- out 2 | 1 | Mask001 | 12:00:00 | 53
-- out 2 | 2 | Mask002 | 12:00:00 | 102
-- out 3 | 1 | Mask001 | 09:00:00 | 50
-- out 3 | 1 | Mask001 | 22:10:00 | 52
-- out 3 | 2 | Mask002 | 09:00:00 | 100
-- out 3 | 2 | Mask002 | 22:10:00 | 105
SELECT
id
, rows
, SUM(CASE timestamp WHEN '09:00' THEN val END) AS "09:00"
, SUM(CASE timestamp WHEN '11:30' THEN val END) AS "11:30"
, SUM(CASE timestamp WHEN '11:35' THEN val END) AS "11:35"
, SUM(CASE timestamp WHEN '12:00' THEN val END) AS "12:00"
, SUM(CASE timestamp WHEN '22:10' THEN val END) AS "22:10"
FROM vertical
GROUP BY
id
, rows
ORDER BY
id
, rows
;
-- out Null display is "Null".
-- out id | rows | 09:00 | 11:30 | 11:35 | 12:00 | 22:10
-- out ----+---------+-------+-------+-------+-------+-------
-- out 1 | Mask001 | Null | 50 | 52 | Null | Null
-- out 1 | Mask002 | Null | 100 | 101 | Null | Null
-- out 2 | Mask001 | Null | Null | Null | 53 | Null
-- out 2 | Mask002 | Null | Null | Null | 102 | Null
-- out 3 | Mask001 | 50 | Null | Null | Null | 52
-- out 3 | Mask002 | 100 | Null | Null | Null | 105
-- out (6 rows)
-- out
-- out Time: First fetch (6 rows): 28.143 ms. All rows formatted: 28.205 ms
You can use union all to unpivot the data and then conditional aggregation:
select id, which,
max(case when timestamp >= '09:00' and timestamp < '09:30' then mask end) as "09:00",
max(case when timestamp >= '09:30' and timestamp < '10:00' then mask end) as "09:30",
max(case when timestamp >= '10:00' and timestamp < '10:30' then mask end) as "10:00",
. . .
from ((select id, timestamp,
'Mask1' as which, Mask1 as mask
from t
) union all
(select id, timestamp, 'Mask2' as which, Mask2 as mask
from t
)
) t
group by t.id, t.which;
Note: This includes the id on each row. I strongly recommend doing that, but you could use:
select (case when which = 'Mask1' then id end) as id
If you really wanted to.