Snowflake SQL Time Breakdown

I have a table with a timestamp for when an incident occurred and the downtime associated with that timestamp (in minutes). I want to break this table down by minute using TIME_SLICE and show the minute associated with each slice. For example, from:
Time   Duration
11:34  4.5
11:40  2
to:
Time   Duration
11:34  1
11:35  1
11:36  1
11:37  1
11:38  0.5
11:39  1
11:40  1
How can I accomplish this?

If you are fine with the same minute being listed many times when the input time + duration ranges overlap, then you can do this:
WITH big_list_of_numbers AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
    FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
    DATEADD('minute', r.rn, t.time) AS time,
    -- the final slice only gets the leftover fraction of a minute
    IFF(r.rn + 1 > t.duration, t.duration - r.rn, 1) AS duration
FROM my_table AS t -- your incidents table
JOIN big_list_of_numbers AS r
    ON r.rn < t.duration
ORDER BY 1
If you want the total per minute, you can put a grouping on it like:
WITH big_list_of_numbers AS (
    SELECT
        ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS rn
    FROM TABLE(GENERATOR(ROWCOUNT => 1000))
)
SELECT
    DATEADD('minute', r.rn, t.time) AS time,
    SUM(IFF(r.rn + 1 > t.duration, t.duration - r.rn, 1)) AS duration
FROM my_table AS t
JOIN big_list_of_numbers AS r
    ON r.rn < t.duration
GROUP BY 1
ORDER BY 1
The GENERATOR needs fixed input, so just use a huge number; it's not that expensive. Also, the SEQx() functions can (and do) have gaps in them, so for data where you need continuous values (like this example) the SEQx() output needs to be fed into ROW_NUMBER() to force non-distributed allocation of numbers.
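A quick way to see why the ROW_NUMBER() wrapper matters (a minimal sketch for a Snowflake session; note that GENERATOR is a table function and is wrapped in TABLE()):
-- SEQ4() may skip values when rows are generated in a distributed fashion;
-- ROW_NUMBER() over it is guaranteed gap-free.
SELECT SEQ4() AS raw_seq,
       ROW_NUMBER() OVER (ORDER BY SEQ4()) - 1 AS gap_free_rn
FROM TABLE(GENERATOR(ROWCOUNT => 10));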

Related

How to lift or increase BigQuery's limit on recursive iterations

It seems like Google BigQuery limits the number of iterations done for recursive queries:
A recursive CTE has reached the maximum number of iterations: 100
I cannot find any docs on how to lift or at least increase this limit. Is it possible?
Documentation says (emphasis mine):
If recursion does not terminate, the query fails after reaching 100 iterations, which can be customized at the project level.
However, I could not find the setting in the project.
It's not in the project's BigQuery Quota Policy or BigQuery API quotas. It's not mentioned in the quota documentation either, though that page does say one can request quota increases, albeit some quota increases need to be requested via Cloud Customer Care.
Maybe this is one such limit or maybe we need to wait until GA... 🤷🏻‍♂
The workaround I came up with for this was to just chain CTEs; that may or may not be possible depending on how you're doing it.
Example:
DECLARE StartDate DATE DEFAULT '2022-03-15';
WITH RECURSIVE
  Dates1 AS (
    SELECT
      StartDate AS Day,
      1 AS RowNumber,
      1 AS Ranking
    UNION ALL
    SELECT
      DATE_ADD(Day, INTERVAL 1 DAY) AS Day,
      RowNumber + 1 AS RowNumber,
      Ranking + 1 AS Ranking
    FROM Dates1
    WHERE Day < CURRENT_DATE()
      AND RowNumber < 100
  ),
  Dates2 AS (
    SELECT
      DATE_ADD(Day, INTERVAL 1 DAY) AS Day,
      1 AS RowNumber,
      Ranking + 1 AS Ranking
    FROM Dates1
    WHERE RowNumber = 100
      AND Day < CURRENT_DATE()
    UNION ALL
    SELECT
      DATE_ADD(Day, INTERVAL 1 DAY) AS Day,
      RowNumber + 1 AS RowNumber,
      Ranking + 1 AS Ranking
    FROM Dates2
    WHERE Day < CURRENT_DATE()
      AND RowNumber < 100
  ),
  Dates AS (
    SELECT Day, Ranking FROM Dates1
    UNION ALL
    SELECT Day, Ranking FROM Dates2
  )
SELECT *
FROM Dates
ORDER BY 2 DESC

SQL: calculate total downtime minutes

I'm working on a downtime management system that is capable of saving support tickets for problems in a database. My database has the following columns:
-ID
-DateOpen
-DateClosed
-Total
I want to obtain the sum of minutes in a day, taking into account that the tickets can be simultaneous, for example:
ID | DateOpen               | DateClosed          | Total
1  | 2019-04-01 08:00:00 AM | 2019-04-01 08:45:00 | 45
2  | 2019-04-01 08:10:00 AM | 2019-04-01 08:20:00 | 10
3  | 2019-04-01 09:06:00 AM | 2019-04-01 09:07:00 | 1
4  | 2019-04-01 09:06:00 AM | 2019-04-01 09:41:00 | 33
Can someone help me with that, please? :c
If I just SUM the Total column, it returns 89, but if you look at the dates you will understand that the actual result must be 78, because tickets 2 and 3 were launched while another ticket was already active...
DECLARE @DateOpen date = '2019-04-01'
SELECT AlarmID, DateOpen, DateClosed, TDT FROM AlarmHistory
WHERE CONVERT(date, DateOpen) = @DateOpen
What you need to do is generate a sequence of integers and use that to generate the times of the day. Join that sequence of times against the range between your open and close times, then count the number of distinct times.
Here is an example that will work with MySQL:
SET @row_num = 0;
SELECT COUNT(DISTINCT time_stamp)
-- this simulates your dateopen and dateclosed table
FROM (SELECT '2019-04-01 08:00:00' open_time, '2019-04-01 08:45:00' close_time
      UNION SELECT '2019-04-01 08:10:00', '2019-04-01 08:20:00'
      UNION SELECT '2019-04-01 09:06:00', '2019-04-01 09:07:00'
      UNION SELECT '2019-04-01 09:06:00', '2019-04-01 09:41:00') times_used
JOIN (
    -- generate the sequence of minutes in a day
    SELECT TIME(sequence*100) time_stamp
    FROM (
        -- create sequence 1 - 10000
        SELECT (@row_num := @row_num + 1) AS sequence
        FROM {table_with_10k+_records}
        LIMIT 10000
    ) minutes
    HAVING time_stamp IS NOT NULL
    LIMIT 1440
) times ON (time_stamp >= TIME(open_time) AND time_stamp < TIME(close_time));
Since you are selecting only the distinct times found in the result, minutes that overlap are counted only once.
NOTE: Depending on your database, there may be a better way to go about generating a sequence. MySQL does not have a generate-sequence function; I did it this way to show the basic idea, which can easily be converted to work with whatever database you are using.
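For what it's worth, on MySQL 8.0+ a recursive CTE can replace the user-variable trick and the need for a large base table (a minimal sketch; the session's cte_max_recursion_depth, which defaults to 1000, must be raised above 1440):
SET SESSION cte_max_recursion_depth = 2000; -- default of 1000 is below 1440

WITH RECURSIVE minutes (n) AS (
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM minutes WHERE n < 1439
)
SELECT SEC_TO_TIME(n * 60) AS time_stamp -- one row per minute of the day
FROM minutes;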
@drakin8564's answer adapted for SQL Server, which I believe you're using:
;WITH Gen AS
(
SELECT TOP 1440
CONVERT(TIME, DATEADD(minute, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), '00:00:00')) AS t
FROM sys.all_objects a1
CROSS
JOIN sys.all_objects a2
)
SELECT COUNT(DISTINCT t)
FROM incidents inci
JOIN Gen
ON Gen.t >= CONVERT(TIME, inci.DateOpen)
AND Gen.t < CONVERT(TIME, inci.DateClosed)
Your total for the last record is wrong: it says 33 while it's 35, so the query results in 80, not 78.
By the way, just as MarcinJ told you, 41 - 6 is 35, not 33. So the answer is 80, not 78.
The following solution works even if the date parameter spans more than one day (1,440 minutes). Even if it spans a month or a year, this solution still works.
Live demo: http://sqlfiddle.com/#!18/462ac/5
-- arranged the opening and closing downtime
with a as
(
select
DateOpen d, 1 status
from dt
union all
select
DateClosed, 2
from dt
)
-- don't compute the downtime from previous date
-- if the current date's status is opened
-- yet the previous status is closed
, downtime_minutes AS
(
select
*,
lag(status) over(order by d, status desc) as prev_status,
case when status = 1 and lag(status) over(order by d, status desc) = 2 then
null
else
datediff(minute, lag(d) over(order by d, status desc), d)
end as downtime
from a
)
select sum(downtime) as all_downtime from downtime_minutes;
Output:
| all_downtime |
|--------------|
| 80 |
See how it works:
It works by computing each downtime from the previous downtime row. Don't compute the downtime if the current row's status is open and the previous row's status is closed, which means the current downtime is a non-overlapping one. Non-overlapping downtimes are denoted by null.
For a newly opened downtime, its downtime is null initially; downtime is then computed on the succeeding rows up to the point it is closed.
We can make the code shorter by reversing the condition:
-- arranged the opening and closing downtime
with a as
(
select
DateOpen d, 1 status
from dt
union all
select
DateClosed, 2
from dt
-- order by d. postgres can do this?
)
-- don't compute the downtime from previous date
-- if the current date's status is opened
-- yet the previous status is closed
, downtime_minutes AS
(
select
*,
lag(status) over(order by d, status desc) as prev_status,
case when not ( status = 1 and lag(status) over(order by d, status desc) = 2 ) then
datediff(minute, lag(d) over(order by d, status desc), d)
end as downtime
from a
)
select sum(downtime) from downtime_minutes;
Not particularly proud of my original solution: http://sqlfiddle.com/#!18/462ac/1
As for the status desc in order by d, status desc: if a DateClosed is equal to another downtime's DateOpen, status desc sorts the DateClosed row first.
For this data, where 8:00 is present in both DateOpen and DateClosed:
INSERT INTO dt
([ID], [DateOpen], [DateClosed], [Total])
VALUES
(1, '2019-04-01 07:00:00', '2019-04-01 07:50:00', 50),
(2, '2019-04-01 07:45:00', '2019-04-01 08:00:00', 15),
(3, '2019-04-01 08:00:00', '2019-04-01 08:45:00', 45);
For equal times (e.g., 8:00), if we do not sort the closing row before the opening row, then the 7:00 downtime is computed up to 7:50 only instead of up to 8:00, because the 8:00 open row resets the computation to null. Here's how the open and closed downtimes are arranged and computed when there is no status desc for equal dates such as 8:00: the total downtime is 95 minutes only, which is wrong. It should be 105 minutes.
Here's how they are arranged and computed if we sort the DateClosed first before the DateOpen (by using status desc) when they have equal dates, e.g., 8:00: the total downtime is 105 minutes, which is correct.
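Reconstructed by hand from the three sample rows above (columns: d, status, downtime), the two arrangements are:

Without status desc (open sorted before close at 08:00):
| d     | status | downtime |
|-------|--------|----------|
| 07:00 | open   | (null)   |
| 07:45 | open   | 45       |
| 07:50 | close  | 5        |
| 08:00 | open   | (null)   |
| 08:00 | close  | 0        |
| 08:45 | close  | 45       |
Total: 95 minutes (wrong)

With status desc (close sorted before open at 08:00):
| d     | status | downtime |
|-------|--------|----------|
| 07:00 | open   | (null)   |
| 07:45 | open   | 45       |
| 07:50 | close  | 5        |
| 08:00 | close  | 10       |
| 08:00 | open   | (null)   |
| 08:45 | close  | 45       |
Total: 105 minutes (correct)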
Another approach, uses gaps and islands approach. Answer is based on SQL Time Packing of Islands
Live test: http://sqlfiddle.com/#!18/462ac/11
with gap_detector as
(
select
DateOpen, DateClosed,
case when
lag(DateClosed) over (order by DateOpen) is null
or lag(DateClosed) over (order by DateOpen) < DateOpen
then
1
else
0
end as gap
from dt
)
, downtime_grouper as
(
select
DateOpen, DateClosed,
sum(gap) over (order by DateOpen) as downtime_group
from gap_detector
)
-- group's open and closed detector. then computes the group's downtime
select
downtime_group,
min(DateOpen) as group_date_open,
max(DateClosed) as group_date_closed,
datediff(minute, min(DateOpen), max(DateClosed)) as group_downtime,
sum(datediff(minute, min(DateOpen), max(DateClosed)))
over(order by downtime_group) as downtime_running_total
from downtime_grouper
group by downtime_group
Output (reconstructed from the four sample tickets):

| downtime_group | group_date_open     | group_date_closed   | group_downtime | downtime_running_total |
|----------------|---------------------|---------------------|----------------|------------------------|
| 1              | 2019-04-01 08:00:00 | 2019-04-01 08:45:00 | 45             | 45                     |
| 2              | 2019-04-01 09:06:00 | 2019-04-01 09:41:00 | 35             | 80                     |
How it works
A DateOpen is the start of the series of downtime if it has no previous downtime (indicated by null lag(DateClosed)). A DateOpen is also a start of the series of downtime if it has a gap from the previous downtime's DateClosed.
with gap_detector as
(
select
lag(DateClosed) over (order by DateOpen) as previous_downtime_date_closed,
DateOpen, DateClosed,
case when
lag(DateClosed) over (order by DateOpen) is null
or lag(DateClosed) over (order by DateOpen) < DateOpen
then
1
else
0
end as gap
from dt
)
select *
from gap_detector
order by DateOpen;
Output (reconstructed from the four sample tickets; the tie order of the two 09:06 rows may vary):

| previous_downtime_date_closed | DateOpen            | DateClosed          | gap |
|-------------------------------|---------------------|---------------------|-----|
| (null)                        | 2019-04-01 08:00:00 | 2019-04-01 08:45:00 | 1   |
| 2019-04-01 08:45:00           | 2019-04-01 08:10:00 | 2019-04-01 08:20:00 | 0   |
| 2019-04-01 08:20:00           | 2019-04-01 09:06:00 | 2019-04-01 09:07:00 | 1   |
| 2019-04-01 09:07:00           | 2019-04-01 09:06:00 | 2019-04-01 09:41:00 | 0   |
After detecting the gap starters, we do a running total of the gap so we can group downtimes that are contiguous to each other.
with gap_detector as
(
select
DateOpen, DateClosed,
case when
lag(DateClosed) over (order by DateOpen) is null
or lag(DateClosed) over (order by DateOpen) < DateOpen
then
1
else
0
end as gap
from dt
)
select
DateOpen, DateClosed, gap,
sum(gap) over (order by DateOpen) as downtime_group
from gap_detector
order by DateOpen;
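Output (reconstructed from the four sample tickets):

| DateOpen            | DateClosed          | gap | downtime_group |
|---------------------|---------------------|-----|----------------|
| 2019-04-01 08:00:00 | 2019-04-01 08:45:00 | 1   | 1              |
| 2019-04-01 08:10:00 | 2019-04-01 08:20:00 | 0   | 1              |
| 2019-04-01 09:06:00 | 2019-04-01 09:07:00 | 1   | 2              |
| 2019-04-01 09:06:00 | 2019-04-01 09:41:00 | 0   | 2              |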
As we can see from the output above, we can now easily detect the downtime group's earliest DateOpen and latest DateClosed by applying MIN(DateOpen) and MAX(DateClosed) by grouping on downtime_group. On downtime_group 1, we have earliest DateOpen of 08:00 and latest DateClosed of 08:45. On downtime_group 2, we have earliest DateOpen of 09:06 and latest DateClosed of 9:41. And from that we can recalculate the correct downtime even if there are simultaneous downtimes.
We can make the code shorter by eliminating the detection of a null previous downtime (the case where the current row is the very first row in the table) by reversing the logic. Instead of detecting the gaps, we detect the islands (contiguous downtimes). Something is contiguous if the previous downtime's DateClosed overlaps the current downtime's DateOpen, denoted by 0. If it does not overlap, then it is a gap, denoted by 1.
Here's the query:
Live test: http://sqlfiddle.com/#!18/462ac/12
with gap_detector as
(
select
DateOpen, DateClosed,
case when lag(DateClosed) over (order by DateOpen) >= DateOpen
then
0
else
1
end as gap
from dt
)
, downtime_grouper as
(
select
DateOpen, DateClosed,
sum(gap) over (order by DateOpen) as downtime_group
from gap_detector
)
-- group's open and closed detector. then computes the group's downtime
select
downtime_group,
min(DateOpen) as group_date_open,
max(DateClosed) as group_date_closed,
datediff(minute, min(DateOpen), max(DateClosed)) as group_downtime,
sum(datediff(minute, min(DateOpen), max(DateClosed)))
over(order by downtime_group) as downtime_running_total
from downtime_grouper
group by downtime_group
If you are using SQL Server 2012 or higher:
iif(lag(DateClosed) over (order by DateOpen) >= DateOpen, 0, 1) as gap

How to return value based on the last available timestamp if the exact time is unavailable?

I am trying to return data in fifteen minute intervals. The first thing I thought to do was this:
select * from myTable where DATEPART(minute, Timestamp) % 15 = 0
But there are two problems with this approach. The first is that there will not necessarily be data with a timestamp at any given minute; the other is that sometimes there are multiple data points at a given minute with different second values. I want exactly one row for each fifteen-minute group, at :00, :15, :30, etc.
This data is only recorded when something changes, so if I don't have a data point at 12:30, for example, I could take the closest data point before that and use that value for 12:30 and it would be correct.
So basically I need to be able to return timestamps at exactly :00, :30, etc along with the data from the record closest to that time.
The data could span years but is more likely to be a shorter amount of time, days or weeks. This is what the expected output would look like:
Timestamp Value
1/1/2015 12:30:00 25
1/1/2015 12:45:00 41
1/1/2015 1:00:00 45
I'm having trouble thinking of a way to do this in SQL. Is it possible?
Given a fixed start time, all you need is a table of numbers to add your intervals to. If you don't already have a table of numbers (which is a useful thing to have), a quick way to generate one on the fly is:
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2)
SELECT *
FROM Numbers;
This simply generates a sequence from 1 to 10,000. For more reading on this see the following series:
Generate a set or sequence without loops – part 1
Generate a set or sequence without loops – part 2
Generate a set or sequence without loops – part 3
Then once you have your numbers you can generate your intervals:
DECLARE @StartDateTime SMALLDATETIME = '20150714 14:00',
        @EndDateTime SMALLDATETIME = '20150715 15:00';
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2)
SELECT Interval = DATEADD(MINUTE, 15 * (N - 1), @StartDateTime)
FROM Numbers
WHERE DATEADD(MINUTE, 15 * (N - 1), @StartDateTime) <= @EndDateTime
Which gives something like:
Interval
----------------------
2015-07-14 14:00:00
2015-07-14 14:15:00
2015-07-14 14:30:00
2015-07-14 14:45:00
2015-07-14 15:00:00
2015-07-14 15:15:00
2015-07-14 15:30:00
Then you just need to find the closest value on or before each interval using APPLY and TOP:
/*****************************************************************
SAMPLE DATA
*****************************************************************/
DECLARE @T TABLE ([Timestamp] DATETIME, Value INT);
INSERT @T ([Timestamp], Value)
SELECT DATEADD(SECOND, RAND(CHECKSUM(NEWID())) * -100000, GETDATE()),
       CEILING(RAND(CHECKSUM(NEWID())) * 100)
FROM sys.all_objects;
/*****************************************************************
QUERY
*****************************************************************/
DECLARE @StartDateTime SMALLDATETIME = '20150714 14:00',
        @EndDateTime SMALLDATETIME = '20150715 15:00';
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (N) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2),
Intervals AS
(   SELECT Interval = DATEADD(MINUTE, 15 * (N - 1), @StartDateTime)
    FROM Numbers
    WHERE DATEADD(MINUTE, 15 * (N - 1), @StartDateTime) <= @EndDateTime
)
SELECT i.Interval, t.[Timestamp], t.Value
FROM Intervals AS i
OUTER APPLY
(   SELECT TOP 1 t.[Timestamp], t.Value
    FROM @T AS t
    WHERE t.[Timestamp] <= i.Interval
    ORDER BY t.[Timestamp] DESC, t.Value
) AS t
ORDER BY i.Interval;
Edit
One point to note is that in the case of having two equal timestamps that are both on or closest to an interval, I have applied a secondary level of ordering by Value:
SELECT i.Interval, t.[Timestamp], t.Value
FROM Intervals AS i
OUTER APPLY
(   SELECT TOP 1 t.[Timestamp], t.Value
    FROM @T AS t
    WHERE t.[Timestamp] <= i.Interval
    ORDER BY t.[Timestamp] DESC, t.Value --- ORDERING HERE
) AS t
ORDER BY i.Interval;
This is arbitrary and could be anything you choose; it would be advisable to order by enough columns to ensure the results are deterministic, that is to say, if you ran the query on the same data many times, the same results would be returned because there is only one row that satisfies the criteria. If you had two rows like this:
Timestamp | Value | Field1
-----------------+---------+--------
2015-07-14 14:00 | 100 | 1
2015-07-14 14:00 | 100 | 2
2015-07-14 14:00 | 50 | 2
If you just order by timestamp, for the interval 2015-07-14 14:00, you don't know whether you will get a value of 50 or 100, and it could be different between executions depending on statistics and the execution plan. Similarly if you order by Timestamp and Value, then you don't know whether Field1 will be 1 or 2.
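For example (my own illustration; the sample @T table above has no such column), appending a unique key column to the ORDER BY makes the picked row fully deterministic:
SELECT TOP 1 t.[Timestamp], t.Value
FROM @T AS t
WHERE t.[Timestamp] <= i.Interval
ORDER BY t.[Timestamp] DESC, t.Value, t.ID -- ID assumed to be a unique key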
Like Shnugo mentions, you can use a tally table to get your data at 15-minute intervals, something like this.
I am creating a dynamic tally table using a CTE, but you can use a physical calendar table if that suits your needs.
DECLARE @StartTime DATETIME = '2015-01-01 00:00:00', @EndTime DATETIME = '2015-01-01 14:00:00'
DECLARE @TimeData TABLE ([Timestamp] datetime, [Value] int);
INSERT INTO @TimeData([Timestamp], [Value])
VALUES ('2015-01-01 12:30:00', 25),
       ('2015-01-01 12:45:00', 41),
       ('2015-01-01 01:00:00', 45);
;WITH CTE(rn) AS
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), CTE2 AS
(
    SELECT C1.rn
    FROM CTE C1 CROSS JOIN CTE C2
), CTE3 AS
(
    SELECT TOP (CEILING(DATEDIFF(minute, @StartTime, @EndTime) / 15)) ROW_NUMBER() OVER(ORDER BY C1.rn) - 1 rn
    FROM CTE2 C1 CROSS JOIN CTE2 C2
)
SELECT DATEADD(minute, rn * 15, @StartTime) CurrTime, T.Value
FROM CTE3
CROSS APPLY (SELECT TOP 1 Value FROM @TimeData WHERE [Timestamp] <= DATEADD(minute, rn * 15, @StartTime) ORDER BY [Timestamp] DESC) T;
OUTPUT
CurrTime Value
2015-01-01 01:00:00.000 45
2015-01-01 01:15:00.000 45
.
.
.
2015-01-01 12:00:00.000 45
2015-01-01 12:15:00.000 45
2015-01-01 12:30:00.000 25
2015-01-01 12:45:00.000 41
2015-01-01 13:00:00.000 41
2015-01-01 13:15:00.000 41
2015-01-01 13:30:00.000 41
2015-01-01 13:45:00.000 41
Now you really have enough ways to create your tally table :-)
DECLARE @startdate DATETIME = {ts'2015-06-01 00:00:00'};
WITH JumpsOf15 AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY object_id) * 15 AS Step
    FROM sys.objects -- take any large table here (should have many rows...)
)
SELECT Step, steppedDate.steppedDate
FROM JumpsOf15
CROSS APPLY(SELECT DATEADD(MINUTE, Step, @startdate) AS steppedDate) AS steppedDate
WHERE GETDATE() > steppedDate.steppedDate;
The question is missing original data and schema information, so I'll address the question mainly in general form.
You're looking for results in a range that won't have any missing records, covering data that can have missing records. Given that requirement, the normal solution is to create a projection for just the values you need on the left hand side, using a source like a Numbers table that has nothing to do with your actual data. The Numbers table will be guaranteed not to be missing any records in your range. For date projections, you just add the appropriate number of days or minutes to your starting value, for the number of records you expect in the results.
Once you have the projection, you make an OUTER JOIN from the projection against your actual data. In this case, the JOIN is complicated by the fact that some date values have extra records. I know of two ways to address this problem. One way is to GROUP BY the values in the projection. The other is to use an OUTER APPLY instead of a join. With an OUTER APPLY, you can just use a TOP 1 filter on the applied query to limit results to one item.
In summary, here is some pseudo-code that should help you get to where you need to be:
WITH Numbers AS
(
--select numbers here
),
DateProjection As
(
SELECT DATEADD(minute, 15*Numbers.Number, '2015-01-01') As RangeStart,
DATEADD(minute, 15*(Numbers.Number+1), '2015-01-01') AS RangeEnd
FROM Numbers
)
SELECT dp.RangeStart as TimeStamp, oa.Value
FROM DateProjection dp
OUTER APPLY (SELECT TOP 1 Value FROM [myTable] WHERE myTable.TimeStamp >= dp.RangeStart AND myTable.TimeStamp < dp.RangeEnd) oa
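For what it's worth, here is a runnable completion of the pseudo-code (my sketch, not the answerer's: the Numbers CTE reuses the stacked-CTE generator from the earlier answer, and the TOP 1 gets an ORDER BY so the chosen row is well defined):
WITH N1 AS (SELECT N FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t (N)),
N2 (N) AS (SELECT 1 FROM N1 AS N1 CROSS JOIN N1 AS N2),
Numbers (Number) AS (SELECT ROW_NUMBER() OVER(ORDER BY N1.N) FROM N2 AS N1 CROSS JOIN N2 AS N2),
DateProjection AS
(   SELECT DATEADD(minute, 15 * (Number - 1), '2015-01-01') AS RangeStart,
           DATEADD(minute, 15 * Number, '2015-01-01') AS RangeEnd
    FROM Numbers
)
SELECT dp.RangeStart AS [Timestamp], oa.Value
FROM DateProjection AS dp
OUTER APPLY
(   SELECT TOP 1 Value
    FROM myTable
    WHERE myTable.[Timestamp] >= dp.RangeStart
      AND myTable.[Timestamp] < dp.RangeEnd
    ORDER BY myTable.[Timestamp] DESC -- latest row inside the bucket
) AS oa
ORDER BY dp.RangeStart;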
Very tricky, but something along these lines may work:
select * from mytable where TimeStamp in (
    select max(TimeStamp) from (
        select TimeStamp, date(TimeStamp) dt, hour(TimeStamp) as hr,
            case when minute(TimeStamp) < 15 then 15 else
            case when minute(TimeStamp) < 30 then 30 else
            case when minute(TimeStamp) < 45 then 45 else 60 end end end as mint
        from mytable where TimeStamp between <some TS> and <some other TS>
    ) t group by dt, hr, mint
)
Of course this will not work if there are two readings with the exact same timestamp, in that case you need yet another group by. Messy querying no matter what.
I would use an OVER clause to partition the rows by the timestamp, rounded to the nearest quarter hour. Then order each partition by the difference between the timestamp and the rounded timestamp, ascending, and grab the first row of each partition. I think that would do what you want. This will give you the nearest rows to the 15 minute mark. However, it will not add extrapolated values where there are no rows within a 15 minute period.
WITH Ranked AS (
    SELECT ROW_NUMBER() OVER(PARTITION BY [Timestamp Moded to 15 minutes]
                             ORDER BY [Diff timestamp - timestamp moded to 15 minutes] ASC) AS RowNum, *
    FROM MyTable
)
SELECT * FROM Ranked WHERE RowNum = 1 -- a window alias can't be referenced in WHERE, hence the CTE
You can use the next query to group data into 15-minute intervals:
select *, CASE DATEPART(minute, timestamp) /15
WHEN 0 THEN '0-15' WHEN 1 THEN '15-30' WHEN 2 THEN '30-45' WHEN 3 THEN '45-60' END
AS [Time Group]
from myTable where
DATEPART(minute, timestamp) /15 = 2 /* for group 30-45 min*/
Taking date and hour into account:
select *,
CAST(CAST(timestamp as date) AS VARCHAR(MAX))+ ' ' +
CAST(DATEPART(hour, timestamp) AS VARCHAR(MAX)) + ':' +
CAST(
CASE DATEPART(minute, timestamp) /15
WHEN 0 THEN '0-15'
WHEN 1 THEN '15-30'
WHEN 2 THEN '30-45'
WHEN 3 THEN '45-60' END
AS VARCHAR(MAX)) AS [Interval]
from myTable
order by [Interval]

Query aggregated data with a given sampling time

Suppose my raw data is:
Timestamp High Low Volume
10:24.22345 100 99 10
10:24.23345 110 97 20
10:24.33455 97 89 40
10:25.33455 60 40 50
10:25.93455 40 20 60
With a sample time of 1 second, the output data should be as follows (with an additional Count column):
Timestamp High Low Volume Count
10:24 110 89 70 3
10:25 60 20 110 2
The sampling unit may vary from 1 second to 5 seconds, 1 minute, 1 hour, 1 day, ...
How can I query the sampled data quickly in PostgreSQL with Rails?
I want to fill all the intervals, but I get the error
ERROR: JOIN/USING types bigint and timestamp without time zone cannot be matched
SQL
SELECT
t.high,
t.low
FROM
(
SELECT generate_series(
date_trunc('second', min(ticktime)) ,
date_trunc('second', max(ticktime)) ,
interval '1 sec'
) FROM czces AS g (time)
LEFT JOIN
(
SELECT
date_trunc('second', ticktime) AS time ,
max(last_price) OVER w AS high ,
min(last_price) OVER w AS low
FROM czces
WHERE product_type ='TA' AND contract_month = '2014-08-01 00:00:00'::TIMESTAMP
WINDOW w AS (
PARTITION BY date_trunc('second', ticktime)
ORDER BY ticktime ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
) t USING (time)
ORDER BY 1
) AS t ;
Simply use date_trunc() before you aggregate. This works for the basic time units 1 second, 1 minute, 1 hour, and 1 day, but not for 5 seconds. Arbitrary intervals are slightly more complex; see the links below.
SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
, max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
ORDER BY 1;
If there are no rows for a sample point, you get no row in the result. If you need one row for every sample point:
SELECT g.timestamp, t.high, t.low, t.vol, t.ct
FROM (SELECT generate_series(date_trunc('second', min(timestamp))
                           , date_trunc('second', max(timestamp))
                           , interval '1 sec') AS timestamp -- or minute ...
      FROM tbl) g
LEFT JOIN (
     SELECT date_trunc('second', timestamp) AS timestamp -- or minute ...
          , max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
     FROM tbl
     GROUP BY 1
     ) t USING (timestamp)
ORDER BY 1;
The LEFT JOIN is essential.
For arbitrary intervals:
Best way to count records by arbitrary time intervals in Rails+Postgres
Retrieve aggregates for arbitrary time intervals
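For completeness, a minimal sketch of the arbitrary-interval case (my own example, not from the linked answers): bucket the timestamp with epoch arithmetic, shown here for 5-second buckets:
SELECT to_timestamp(floor(extract(epoch FROM timestamp) / 5) * 5) AS timestamp -- 5-second buckets
     , max(high) AS high, min(low) AS low, sum(volume) AS vol, count(*) AS ct
FROM tbl
GROUP BY 1
ORDER BY 1;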
Aside: Don't use timestamp as column name. It's a basic type name and a reserved word in standard SQL. It's also misleading for data that's not actually a timestamp.

Count occurrences of combinations of columns

I have daily time series (actually business days) for different companies and I work with PostgreSQL. There is also an indicator variable (called flag) taking the value 0 most of the time, and 1 on some rare event days. If the indicator variable takes the value 1 for a company, I want to further investigate the entries from two days before to one day after that event for the corresponding company. Let me refer to that as [-2,1] window with the event day being day 0.
I am using the following query
CREATE TABLE test AS
WITH cte AS (
SELECT *
, MAX(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN 1 preceding AND 2 following) Lead1
FROM mytable)
SELECT *
FROM cte
WHERE Lead1 = 1
ORDER BY day,company
The query takes the entries ranging from 2 days before the event to one day after the event, for the company experiencing the event.
The query does that for all events.
This is a small section of the resulting table.
day company flag
2012-01-23 A 0
2012-01-24 A 0
2012-01-25 A 1
2012-01-25 B 0
2012-01-26 A 0
2012-01-26 B 0
2012-01-27 B 1
2012-01-30 B 0
2013-01-10 A 0
2013-01-11 A 0
2013-01-14 A 1
Now I want to do further calculations for every [-2,1] window separately. So I need a variable that allows me to identify each [-2,1] window. The idea is that I count the number of windows for every company with the variable "occur", so that in further calculations I can use the clause
GROUP BY company, occur
Therefore my desired output looks like that:
day company flag occur
2012-01-23 A 0 1
2012-01-24 A 0 1
2012-01-25 A 1 1
2012-01-25 B 0 1
2012-01-26 A 0 1
2012-01-26 B 0 1
2012-01-27 B 1 1
2012-01-30 B 0 1
2013-01-10 A 0 2
2013-01-11 A 0 2
2013-01-14 A 1 2
In the example, the company B only occurs once (occur = 1). But the company A occurs two times. For the first time from 2012-01-23 to 2012-01-26. And for the second time from 2013-01-10 to 2013-01-14. The second time range of company A does not consist of all four days surrounding the event day (-2,-1,0,1) since the company leaves the dataset before the end of that time range.
As I said, I am working with business days. I don't care about holidays; I have data from Monday to Friday. Earlier I wrote the following function:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$BODY$
WITH alldates AS (
SELECT i,
$1 + (i * CASE WHEN $2 < 0 THEN -1 ELSE 1 END) AS date
FROM generate_series(0,(ABS($2) + 5)*2) i
),
days AS (
SELECT i, date, EXTRACT('dow' FROM date) AS dow
FROM alldates
),
businessdays AS (
SELECT i, date, d.dow FROM days d
WHERE d.dow BETWEEN 1 AND 5
ORDER BY i
)
-- adding business days to a date --
SELECT date FROM businessdays WHERE
CASE WHEN $2 > 0 THEN date >=$1 WHEN $2 < 0
THEN date <=$1 ELSE date =$1 END
LIMIT 1
offset ABS($2)
$BODY$
LANGUAGE 'sql' VOLATILE;
It can add/subtract business days to/from a given date and works like this:
select * from addbusinessdays('2013-01-14',-2)
delivers the result 2013-01-10. So in Jakub's approach we can change the second- and third-to-last lines to
w.day BETWEEN addbusinessdays(t1.day, -2) AND addbusinessdays(t1.day, 1)
and can deal with the business days.
Function
If you are going to use the function addbusinessdays(), consider this version instead:
CREATE OR REPLACE FUNCTION addbusinessdays(date, integer)
RETURNS date AS
$func$
SELECT day
FROM (
SELECT i, $1 + i * sign($2)::int AS day
FROM generate_series(0, ((abs($2) * 7) / 5) + 3) i
) sub
WHERE EXTRACT(ISODOW FROM day) < 6 -- truncate weekend
ORDER BY i
OFFSET abs($2)
LIMIT 1
$func$ LANGUAGE sql IMMUTABLE;
Major points
Never quote the language name sql. It's an identifier, not a string.
Why was the function VOLATILE? Make it IMMUTABLE for better performance in repeated use and more options (like using it in a functional index).
(ABS($2) + 5) * 2 is way too much padding. Replace it with (abs($2) * 7) / 5 + 3.
Multiple levels of CTEs were useless cruft.
ORDER BY in last CTE was useless, too.
As mentioned in my previous answer, extract(ISODOW FROM ...) is more convenient to truncate weekends.
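Usage of the rewritten function is unchanged; the example from the question above still holds:
SELECT addbusinessdays('2013-01-14', -2); -- returns 2013-01-10, as before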
Query
That said, I wouldn't use above function for this query at all. Build a complete grid of relevant days once instead of calculating the range of days for every single row.
Based on this assertion in a comment (should be in the question, really!):
two subsequent windows of the same firm can never overlap.
WITH range AS ( -- only with flag
SELECT company
, min(day) - 2 AS r_start
, max(day) + 1 AS r_stop
FROM tbl t
WHERE flag <> 0
GROUP BY 1
)
, grid AS (
SELECT company, day::date
FROM range r
,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
WHERE extract('ISODOW' FROM d.day) < 6
)
SELECT *, sum(flag) OVER(PARTITION BY company ORDER BY day
ROWS BETWEEN UNBOUNDED PRECEDING
AND 2 following) AS window_nr
FROM (
SELECT t.*, max(t.flag) OVER(PARTITION BY g.company ORDER BY g.day
ROWS BETWEEN 1 preceding
AND 2 following) in_window
FROM grid g
LEFT JOIN tbl t USING (company, day)
) sub
WHERE in_window > 0 -- only rows in [-2,1] window
AND day IS NOT NULL -- exclude missing days in [-2,1] window
ORDER BY company, day;
How?
Build a grid of all business days: CTE grid.
To keep the grid to its smallest possible size, extract minimum and maximum (plus buffer) day per company: CTE range.
LEFT JOIN actual rows to it. Now the frames for the ensuing window functions work with static numbers.
To get distinct numbers per flag and company (window_nr), just count flags from the start of the grid (taking buffers into account).
Only keep days inside your [-2,1] windows (in_window > 0).
Only keep days with actual rows in the table.
Voilà.
SQL Fiddle.
Basically the strategy is to first enumerate the flag days and then join the others with them:
WITH windows AS(
SELECT t1.day
,t1.company
,rank() OVER (PARTITION BY company ORDER BY day) as rank
FROM table1 t1
WHERE flag =1)
SELECT t1.day
,t1.company
,t1.flag
,w.rank
FROM table1 AS t1
JOIN windows AS w
ON
t1.company = w.company
AND
w.day BETWEEN
t1.day - interval '2 day' AND t1.day + interval '1 day'
ORDER BY t1.day, t1.company;
Fiddle.
However, there is a problem with business days, as those can mean different things (do holidays count?).
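If holidays do matter, one common fix (my sketch, assuming a hypothetical holidays(day date) table that is not part of the original setup) is to filter them out wherever the queries above keep only weekdays, e.g. as a drop-in replacement for the grid CTE:
, grid AS (
    SELECT company, day::date
    FROM range r
        ,generate_series(r.r_start, r.r_stop, interval '1d') d(day)
    WHERE extract('ISODOW' FROM d.day) < 6
      AND d.day::date NOT IN (SELECT day FROM holidays) -- holidays table is assumed
)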