Postgres consecutive time intervals - sql

The table:
timestamp
---------------------
2018-01-15 14:31:23
2018-01-15 14:31:25
2018-01-15 14:31:26
2018-01-15 14:31:28
2018-01-15 14:31:29
2018-01-15 14:31:30
2018-01-15 14:31:35
It would be really helpful if someone could show how to get consecutive time intervals using SQL. Consecutive means the seconds follow one another; if there is a gap of more than one second, the run is not consecutive and should be counted as a separate interval.
The following is expected result:
result
--------
1
2
3
1

I see. You can use the row_number() trick for this:
select grp, count(*)
from (select t.*,
             (timestamp - row_number() over (order by timestamp) * interval '1 second') as grp
      from t
     ) t
group by grp
order by min(timestamp);
The idea is to subtract a sequence of numbers from the timestamp. Timestamps that are in sequence all end up with the same value, which gives the groups you want.
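As a sanity check, the same trick can be run in SQLite from Python (SQLite 3.25+ is needed for window functions; epoch seconds via strftime('%s') stand in for Postgres interval arithmetic, and the table/column names here are made up):

```python
import sqlite3

rows = ["2018-01-15 14:31:23", "2018-01-15 14:31:25", "2018-01-15 14:31:26",
        "2018-01-15 14:31:28", "2018-01-15 14:31:29", "2018-01-15 14:31:30",
        "2018-01-15 14:31:35"]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (ts text)")
conn.executemany("insert into t values (?)", [(r,) for r in rows])
# Subtract the row number (in seconds) from each timestamp: rows in an
# unbroken 1-second run share the same grp value.
counts = [c for (c,) in conn.execute("""
    select count(*)
    from (select ts,
                 cast(strftime('%s', ts) as integer)
                   - row_number() over (order by ts) as grp
          from t)
    group by grp
    order by min(ts)
""")]
print(counts)  # [1, 2, 3, 1]
```

The counts match the expected result column (1, 2, 3, 1) from the question.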

Related

SQL: Extract failure/restoral events per user only if they are consecutive

I have a table of failure and restore events for users. I would like to find the time difference between the failure and restore events for every user.
Also, for some failure events there are multiple restore events; I want to find the time diff only between consecutive ones.
And there might be multiple failures before restores. I only want the difference between a consecutive failure and restore (ignoring all leading and trailing extra events for that user).
sr imei time event_type
1 1 2020-01-01 14:28:06.269000+00:00 failure
2 1 2020-01-01 14:28:29.910000+00:00 failure_restored
3 5 2020-01-01 15:24:52.714000+00:00 failure
4 5 2020-01-01 15:29:59.045000+00:00 failure_restored
5 6 2020-01-01 21:21:32.715000+00:00 failure_restored
6 7 2020-01-01 21:48:43.798000+00:00 failure_restored
7 9 2020-01-01 22:18:34.112000+00:00 failure_restored
8 9 2020-01-01 22:20:16.165000+00:00 failure
9 9 2020-01-01 22:25:29.648000+00:00 failure_restored
I want to find time diff between failure/failure_restored events for every imei.
For example: between rows 1 and 2, rows 3 and 4, and rows 8 and 9; rows 5, 6, and 7 should be ignored.
I could achieve it in pandas with custom functions, but I need help doing it in SQL.
You don't specify what to do if there are two "failure"s in a row. If you always know that the next row is a restore, then:
SELECT imei, TIMESTAMP_DIFF(next_time, time, MILLISECOND) as time_diff_in_milliseconds
FROM (SELECT t.*,
             LEAD(time) OVER (PARTITION BY imei ORDER BY time) as next_time
      FROM t
     )
WHERE event_type = 'failure';
If not, then check the next event type as well:
SELECT imei, TIMESTAMP_DIFF(next_time, time, MILLISECOND) as time_diff_in_milliseconds
FROM (SELECT t.*,
             LEAD(time) OVER (PARTITION BY imei ORDER BY time) as next_time,
             LEAD(event_type) OVER (PARTITION BY imei ORDER BY time) as next_event_type
      FROM t
     )
WHERE event_type = 'failure' and next_event_type = 'failure_restored';
Below is for BigQuery Standard SQL
#standardSQL
SELECT imei, TIMESTAMP_DIFF(restore_time, time, MILLISECOND) time_diff_in_milliseconds
FROM (
SELECT *, LEAD(IF(event_type = 'failure_restored', time, NULL)) OVER(PARTITION BY imei ORDER BY time) restore_time
FROM `project.dataset.table`
)
WHERE event_type = 'failure'
If applied to the sample data from your question, the result is:
Row imei time_diff_in_milliseconds
1 1 23641
2 5 306331
3 9 313483
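The LEAD(IF(...)) pattern ports to other engines. Here is a sketch of the same logic in SQLite from Python (CASE replaces BigQuery's IF, and julianday() differences scaled to milliseconds replace TIMESTAMP_DIFF; the table name is assumed):

```python
import sqlite3

rows = [
    (1, "2020-01-01 14:28:06.269", "failure"),
    (1, "2020-01-01 14:28:29.910", "failure_restored"),
    (5, "2020-01-01 15:24:52.714", "failure"),
    (5, "2020-01-01 15:29:59.045", "failure_restored"),
    (6, "2020-01-01 21:21:32.715", "failure_restored"),
    (7, "2020-01-01 21:48:43.798", "failure_restored"),
    (9, "2020-01-01 22:18:34.112", "failure_restored"),
    (9, "2020-01-01 22:20:16.165", "failure"),
    (9, "2020-01-01 22:25:29.648", "failure_restored"),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (imei int, time text, event_type text)")
conn.executemany("insert into t values (?, ?, ?)", rows)
# lead(case ...) carries the next row's time only if it is a restore;
# julianday() differences (in days) are scaled to milliseconds.
result = conn.execute("""
    select imei,
           cast(round((julianday(restore_time) - julianday(time)) * 86400000)
                as integer) as time_diff_in_milliseconds
    from (select *,
                 lead(case when event_type = 'failure_restored' then time end)
                   over (partition by imei order by time) as restore_time
          from t)
    where event_type = 'failure'
    order by imei
""").fetchall()
print(result)  # [(1, 23641), (5, 306331), (9, 313483)]
```

The output matches the three-row result table above.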

Sum of item count in an SQL query based on DATE_TRUNC

I've got a table which contains event status data, similar to this:
ID Time Status
------ -------------------------- ------
357920 2019-12-25 09:31:38.854764 1
362247 2020-01-02 09:31:42.498483 1
362248 2020-01-02 09:31:46.166916 1
362249 2020-01-02 09:31:47.430933 1
362300 2020-01-03 09:31:46.932333 1
362301 2020-01-03 09:31:47.231288 1
I'd like to construct a query which returns the number of successful events each day, so:
Time Count
-------------------------- -----
2019-12-25 00:00:00.000000 1
2020-01-02 00:00:00.000000 3
2020-01-03 00:00:00.000000 2
I've stumbled across this SO answer to a similar question, but the answer there is for all the data returned by the query, whereas I need the sum grouped by date range.
Also, I cannot use BETWEEN to select a specific date range, since this query is for a Grafana dashboard, and the date range is determined by the dashboard's UI. I'm using Postgres for the SQL dialect, in case that matters.
You need to remove the time component from the timestamp. In most databases, you can do this by converting to a date:
select cast(time as date) as dte,
       sum(case when status = 1 then 1 else 0 end) as num_successful
from t
group by cast(time as date)
order by dte;
This assumes that 1 means "successful".
The cast() syntax does not work in all databases. Alternatives include trunc(time), date_trunc('day', time), and date_trunc(time, day), depending on the database.
In Postgres, I would phrase this as:
select date_trunc('day', time) as dte,
       count(*) filter (where status = 1) as num_successful
from t
group by dte
order by dte;
How about like this:
SELECT date(Time), sum(status)
FROM table
GROUP BY date(Time)
ORDER BY min(Time)
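A quick check of the grouping on the sample data, here in SQLite via Python (date() plays the role of Postgres's date_trunc('day', ...)):

```python
import sqlite3

rows = [
    (357920, "2019-12-25 09:31:38.854764", 1),
    (362247, "2020-01-02 09:31:42.498483", 1),
    (362248, "2020-01-02 09:31:46.166916", 1),
    (362249, "2020-01-02 09:31:47.430933", 1),
    (362300, "2020-01-03 09:31:46.932333", 1),
    (362301, "2020-01-03 09:31:47.231288", 1),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, time text, status int)")
conn.executemany("insert into t values (?, ?, ?)", rows)
# Truncate each timestamp to its date, then count the successful events.
result = conn.execute("""
    select date(time) as dte,
           sum(case when status = 1 then 1 else 0 end) as num_successful
    from t
    group by date(time)
    order by dte
""").fetchall()
print(result)  # [('2019-12-25', 1), ('2020-01-02', 3), ('2020-01-03', 2)]
```

The per-day counts (1, 3, 2) match the expected result table.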

I need to calculate the time between dates in different lines. (PLSQL)

I have a table where I store all status changes and the time they were made. When I search for an order number in the times table, I get all the dates of my changes, but what I really want is the time (hours/minutes) that the order spent in each status.
The table of time seems like this
ID_ORDER | Status      | Date
---------|-------------|--------------------
1        | Waiting     | 27/09/2017 12:00:00
1        | Late        | 27/09/2017 14:00:00
1        | In progress | 28/09/2017 08:00:00
1        | Validating  | 30/09/2017 14:00:00
1        | Completed   | 30/09/2017 14:00:00
Thanks!
Use lead():
select t.*,
       (lead("Date") over (partition by id_order order by "Date") - "Date") as time_in_order
from t;
Note that date is a reserved word in Oracle, so the "Date" column has to be quoted (or renamed).
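For a concrete check of the lead() subtraction, here is the same idea in SQLite from Python. The timestamps are rewritten in ISO format for julianday(), the duration is reported in whole hours, and rowid is used purely as a tiebreaker (both assumptions, not part of the original question):

```python
import sqlite3

rows = [
    (1, "Waiting",     "2017-09-27 12:00:00"),
    (1, "Late",        "2017-09-27 14:00:00"),
    (1, "In progress", "2017-09-28 08:00:00"),
    (1, "Validating",  "2017-09-30 14:00:00"),
    (1, "Completed",   "2017-09-30 14:00:00"),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id_order int, status text, tm text)")
conn.executemany("insert into t values (?, ?, ?)", rows)
# lead() pairs each row with the next status change; rowid breaks the tie
# between the two 14:00:00 rows so the output is deterministic.
result = conn.execute("""
    select status,
           cast(round((julianday(lead(tm) over w) - julianday(tm)) * 24)
                as integer) as hours_in_status
    from t
    window w as (partition by id_order order by tm, rowid)
    order by tm, rowid
""").fetchall()
print(result)
# [('Waiting', 2), ('Late', 18), ('In progress', 54),
#  ('Validating', 0), ('Completed', None)]
```

The last status has no following change, so its duration is NULL.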

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For each bay number, I need to find the day with the highest number of fixes, and if that highest count is tied across days, the tied rows should also appear in the result set.
so would like to see the result set as follows
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts of each, but I'm struggling to get the max while keeping tied highest values. Can someone help, please?
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno, minfixdatetime, no_of_fixes
from (select bayno, minfixdatetime, no_of_fixes,
             dense_rank() over (partition by bayno order by no_of_fixes desc) rnk
      from (select t.*,
                   count(*) over (partition by bayno, cast(fixdatetime as date)) no_of_fixes,
                   min(fixdatetime) over (partition by bayno, cast(fixdatetime as date)) minfixdatetime
            from tablename t
           ) x
     ) y
where rnk = 1
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.
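The dense_rank() approach can be sketched end to end in SQLite from Python (dates rewritten as ISO, and date() standing in for cast(... as date); table and column names follow the question):

```python
import sqlite3

# Sample rows with the question's DD/MM/YYYY dates rewritten as ISO.
rows = [
    (1, "2015-05-04 16:15:00"), (1, "2015-05-12 00:15:00"),
    (1, "2015-05-12 08:15:00"), (1, "2016-05-04 08:11:00"),
    (2, "2015-05-13 19:30:00"), (2, "2015-05-14 08:00:00"),
    (2, "2015-05-15 10:30:00"), (2, "2015-05-20 12:30:00"),
    (2, "2016-05-12 09:30:00"), (2, "2016-05-12 10:30:00"),
    (2, "2016-06-12 10:30:00"),
    (4, "2016-05-12 05:30:00"), (4, "2016-05-17 15:05:00"),
]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (bayno int, fixdatetime text)")
conn.executemany("insert into t values (?, ?)", rows)
# Count fixes per (bay, day), rank days per bay, keep the top-ranked rows;
# ties (bay 4) survive because dense_rank gives them the same rank.
result = conn.execute("""
    select distinct bayno, minfixdatetime, no_of_fixes
    from (select bayno, minfixdatetime, no_of_fixes,
                 dense_rank() over (partition by bayno
                                    order by no_of_fixes desc) rnk
          from (select bayno,
                       count(*) over (partition by bayno, date(fixdatetime))
                         as no_of_fixes,
                       min(fixdatetime) over (partition by bayno, date(fixdatetime))
                         as minfixdatetime
                from t))
    where rnk = 1
    order by bayno, minfixdatetime
""").fetchall()
print(result)
# [(1, '2015-05-12 00:15:00', 2), (2, '2016-05-12 09:30:00', 2),
#  (4, '2016-05-12 05:30:00', 1), (4, '2016-05-17 15:05:00', 1)]
```

This reproduces the four-row expected result set, including both tied rows for bay 4.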

postgresql group by sequence time stamps

I'm about two weeks old in SQL years so if you could humor me a little it would be very helpful.
I'm having trouble figuring out how to group by a series of sequential timestamps (hour steps in this case).
For example:
ID time
1 2008-11-11 01:00:00
2 2008-11-11 02:00:00
3 2008-11-11 04:00:00
4 2008-11-11 05:00:00
5 2008-11-11 06:00:00
6 2008-11-11 08:00:00
I'd like to end up with grouping like so:
Group above_table_ID's
1 1,2
2 3,4,5
3 6
This would be easy to express in a Python loop, but I really don't understand how to express this type of logic in SQL/Postgres.
If anyone could help explain this process to me it would be greatly appreciated.
thank you
You can do this by subtracting an increasing number from the time stamps, in hours. Things that are sequential will have the same value.
select row_number() over (order by grp) as GroupId,
       string_agg(id::text, ',' order by id) as ids
from (select t.*,
             (time - row_number() over (order by time) * interval '1 hour') as grp
      from t
     ) t
group by grp
order by grp;
(Note that string_agg() expects text, so the integer id has to be cast.)
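The same hour-step islands logic can be checked in SQLite from Python (3600 epoch seconds stand in for interval '1 hour', and group_concat() replaces string_agg()):

```python
import sqlite3

rows = [(1, "2008-11-11 01:00:00"), (2, "2008-11-11 02:00:00"),
        (3, "2008-11-11 04:00:00"), (4, "2008-11-11 05:00:00"),
        (5, "2008-11-11 06:00:00"), (6, "2008-11-11 08:00:00")]
conn = sqlite3.connect(":memory:")
conn.execute("create table t (id int, tm text)")
conn.executemany("insert into t values (?, ?)", rows)
# Subtract row_number * 3600s from each timestamp: rows in an unbroken
# 1-hour run share the same grp. The ids of each island are sorted in
# Python because group_concat() order is unspecified in SQLite.
result = [(gid, tuple(sorted(map(int, ids.split(",")))))
          for gid, ids in conn.execute("""
    select row_number() over (order by grp) as group_id,
           group_concat(id, ',') as ids
    from (select id,
                 cast(strftime('%s', tm) as integer)
                   - row_number() over (order by tm) * 3600 as grp
          from t)
    group by grp
    order by grp
""")]
print(result)  # [(1, (1, 2)), (2, (3, 4, 5)), (3, (6,))]
```

The three islands (ids 1-2, 3-5, and 6) match the grouping asked for in the question.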