postgresql group by sequence time stamps - sql

I'm about two weeks old in SQL years, so if you could humor me a little it would be very helpful.
I'm having trouble figuring out how to group by a series of sequential timestamps (hour steps in this case).
For example:
ID time
1 2008-11-11 01:00:00
2 2008-11-11 02:00:00
3 2008-11-11 04:00:00
4 2008-11-11 05:00:00
5 2008-11-11 06:00:00
6 2008-11-11 08:00:00
I'd like to end up with grouping like so:
Group above_table_IDs
1 1,2
2 3,4,5
3 6
This would be easy to express in a Python loop or something, but I really don't understand how to express this type of logic in SQL/PostgreSQL.
If anyone could help explain this process to me it would be greatly appreciated.
thank you

You can do this by subtracting an increasing number of hours from the timestamps. Rows that are sequential end up with the same value, which you can then group on. (Note that table is a reserved word, so your_table is used below as a placeholder for the actual table name, and id is cast to text so string_agg accepts it.)
select row_number() over (order by grp) as GroupId,
       string_agg(id::text, ',' order by id) as ids
from (select t.*,
             (time - row_number() over (order by time) * interval '1 hour') as grp
      from your_table t
     ) t
group by grp;
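To see why this works, here is a minimal self-contained sketch that runs the inner calculation against the sample rows from the question (a VALUES list stands in for the real table); rows in consecutive hours end up with the same grp:
with sample(id, "time") as (
    values (1, timestamp '2008-11-11 01:00:00'),
           (2, timestamp '2008-11-11 02:00:00'),
           (3, timestamp '2008-11-11 04:00:00'),
           (4, timestamp '2008-11-11 05:00:00'),
           (5, timestamp '2008-11-11 06:00:00'),
           (6, timestamp '2008-11-11 08:00:00')
)
select id,
       "time",
       "time" - row_number() over (order by "time") * interval '1 hour' as grp
from sample;
-- ids 1 and 2 share grp 2008-11-11 00:00:00, ids 3-5 share 01:00:00, id 6 gets 02:00:00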

Related

PostgreSQL - Select split rows based on a column value

Could someone please suggest a query which splits items by working minutes per hour?
Source table
start_timestamp   item_id  total_working_minutes
2021-02-01 14:10  A        120
2021-02-01 14:30  B        20
2021-02-01 16:30  A        10
Expected result
timestamp_by_hour  item_id  working_minutes
2021-02-01 14:00   A        50
2021-02-01 14:00   B        20
2021-02-01 15:00   A        60
2021-02-01 16:00   A        20
Thanks in advance!
You can accomplish this with a recursive query, which should work in both Redshift and PostgreSQL. First, for each source row extract the hour it starts in, the number of minutes worked within that first hour, and the total minutes worked.
Then repeat by recursion for each row where the minutes worked in the current hour are less than the total minutes worked: increase the starting hour by 1 and reduce the total minutes worked by the minutes worked in the preceding hour.
Finally, aggregate the results by hour and ID.
with recursive split_times(timestamp_by_hour, item_id, working_minutes, total_working_minutes) as
(
    select date_trunc('hour', start_timestamp),
           item_id,
           least(total_working_minutes, 60 - extract(minute from start_timestamp)),
           total_working_minutes
    from work_time
    union all
    select timestamp_by_hour + interval '1 hour',
           item_id,
           least(total_working_minutes - working_minutes, 60),
           total_working_minutes - working_minutes
    from split_times
    where total_working_minutes > working_minutes
)
select timestamp_by_hour, item_id, sum(working_minutes) as working_minutes
from split_times
group by timestamp_by_hour, item_id
order by timestamp_by_hour, item_id;
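If you want to try the query, here is a minimal sketch of the sample data (the table and column names are assumed to match the query above, not taken from a real schema):
-- Hypothetical table matching the question's sample data.
create table work_time (
    start_timestamp       timestamp,
    item_id               text,
    total_working_minutes integer
);
insert into work_time values
    ('2021-02-01 14:10', 'A', 120),
    ('2021-02-01 14:30', 'B', 20),
    ('2021-02-01 16:30', 'A', 10);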

Postgres consecutive time intervals

The table:
timestamp
---------------------
2018-01-15 14:31:23
2018-01-15 14:31:25
2018-01-15 14:31:26
2018-01-15 14:31:28
2018-01-15 14:31:29
2018-01-15 14:31:30
2018-01-15 14:31:35
It would be really helpful if someone could show how to get consecutive time intervals using SQL. Consecutive means the seconds follow one after another; if there is a gap of more than 1 second, it is not consecutive and should be counted as a separate interval.
The following is the expected result:
result
--------
1
2
3
1
I see. You can use the row_number() trick for this:
select grp, count(*) as result
from (select t.*,
             ("timestamp" - row_number() over (order by "timestamp") * interval '1 second') as grp
      from t
     ) t
group by grp
order by min("timestamp");
The idea is to subtract an increasing sequence of numbers, as seconds, from the timestamps. Timestamps that are consecutive all end up with the same value, and those are exactly the groups you want.
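As a quick sanity check, here is a self-contained sketch (a VALUES list standing in for the real table) that reproduces the expected 1, 2, 3, 1:
with t("timestamp") as (
    values (timestamp '2018-01-15 14:31:23'),
           (timestamp '2018-01-15 14:31:25'),
           (timestamp '2018-01-15 14:31:26'),
           (timestamp '2018-01-15 14:31:28'),
           (timestamp '2018-01-15 14:31:29'),
           (timestamp '2018-01-15 14:31:30'),
           (timestamp '2018-01-15 14:31:35')
)
select count(*) as result
from (select "timestamp",
             "timestamp" - row_number() over (order by "timestamp") * interval '1 second' as grp
      from t
     ) x
group by grp
order by min("timestamp");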

I need to calculate the time between dates in different lines. (PLSQL)

I have a table where I store every status change and the time it was made. When I search for an order number in the times table, I get all the dates of the changes, but what I really want is the time (hours/minutes) that the order spent in each status.
The times table looks like this:
ID_ORDER | Status      | Date
1        | Waiting     | 27/09/2017 12:00:00
1        | Late        | 27/09/2017 14:00:00
1        | In progress | 28/09/2017 08:00:00
1        | Validating  | 30/09/2017 14:00:00
1        | Completed   | 30/09/2017 14:00:00
Thanks!
Use lead():
select t.*,
(lead(date) over (partition by id_order order by date) - date) as time_in_order
from t;
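Since this is Oracle, subtracting two DATE values gives the difference in days, so multiply by 24 to get hours. Also note that DATE is a reserved word in Oracle and cannot be used as an unquoted column name; the sketch below assumes hypothetical names order_status and status_date for the table and column:
-- Hypothetical table/column names (order_status, status_date); the real ones
-- in your schema will differ. Oracle date subtraction returns days, so * 24
-- converts the difference to hours.
select t.id_order,
       t.status,
       t.status_date,
       round((lead(t.status_date) over (partition by t.id_order order by t.status_date)
              - t.status_date) * 24, 2) as hours_in_status
from order_status t;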

Get MAX count but keep the repeated calculated value if highest

I have the following table (I am using SQL Server 2008):
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For each bay number, I need to find the day with the highest number of fixes, and if that highest count is tied across several days, each tied day should also appear in the result set.
So I would like to see the result set as follows:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each, but I'm struggling to get the max while keeping the highest value when it is repeated. Can someone help, please?
Use window functions.
Get the count of fixes for each day per bayno, and also find the min fixdatetime for each day per bayno.
Then use dense_rank to rank each bayno's days by the number of fixes.
Finally, keep only the highest-ranked rows.
select distinct bayno, minfixdatetime, no_of_fixes
from (select bayno, minfixdatetime, no_of_fixes,
             dense_rank() over (partition by bayno order by no_of_fixes desc) as rnk
      from (select t.*,
                   count(*) over (partition by bayno, cast(fixdatetime as date)) as no_of_fixes,
                   min(fixdatetime) over (partition by bayno, cast(fixdatetime as date)) as minfixdatetime
            from tablename t
           ) x
     ) y
where rnk = 1
You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.
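If you want to reproduce either query, here is a minimal sketch of the sample data (the table name tablename matches the first query, while the second uses t, so rename accordingly; the question's day/month/year dates are rewritten as ISO literals, which is an assumption about the intended format):
-- Hypothetical setup mirroring the question's sample rows (SQL Server);
-- datetime2 keeps the ISO date literals unambiguous.
create table tablename (BayNo int, FixDateTime datetime2, FixType varchar(50));
insert into tablename values
(1, '2015-05-04 16:15:00', 'tyre change'),
(1, '2015-05-12 00:15:00', 'oil change'),
(1, '2015-05-12 08:15:00', 'engine tuning'),
(1, '2016-05-04 08:11:00', 'car tuning'),
(2, '2015-05-13 19:30:00', 'puncture'),
(2, '2015-05-14 08:00:00', 'light repair'),
(2, '2015-05-15 10:30:00', 'super op'),
(2, '2015-05-20 12:30:00', 'wiper change'),
(2, '2016-05-12 09:30:00', 'denting'),
(2, '2016-05-12 10:30:00', 'wiper repair'),
(2, '2016-06-12 10:30:00', 'exhaust repair'),
(4, '2016-05-12 05:30:00', 'stereo unlock'),
(4, '2016-05-17 15:05:00', 'door handle repair');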

Generate cyclic column variable conditional on lag of another variable postgreSQL

So suppose I have a table as follows:
user date
a 10/15/2015
a 11/15/2015
a 12/15/2015
a 2/15/2016
b 1/15/2015
b 2/15/2015
b 4/15/2015
b 6/15/2015
I need to create three column variables (actually two, since I already figured out the time-lag variable): (1) one that counts the number of successive logins by month and restarts the counter if there is a lapse, (2) the number of days between logins (figured this one out), and (3) a cycle count that increases by one each time the counter resets. The resulting table should look as follows (I'm using 30 days as a 1-month span for illustrative purposes):
user date count timelapse cycle
a 10/15/2015 1 0 1
a 11/15/2015 2 30 1
a 12/15/2015 3 30 1
a 2/15/2016 1 60 2
b 1/15/2015 1 0 1
b 2/15/2015 2 30 1
b 4/15/2015 1 60 2
b 6/15/2015 1 60 3
Any ideas? I was able to get the count column to work, but I could not get it to reset when the timelapse was greater than 30. Since the cycle is conditional on two columns, I was at a bit of a loss there.
Any help or ideas would be greatly appreciated.
Here is the idea. Use lag() to determine when a gap occurs. You can do this by truncating the date to the beginning of the month, for comparison purposes.
Then, do a cumulative sum of the gap flags. This provides the cycle column. The count is then row_number() using the cycle:
select t.*,
       row_number() over (partition by "user", cycle order by date) as count
from (select t.*,
             sum(IsGap) over (partition by "user" order by date) as cycle
      from (select "user", date,
                   (case when date_trunc('month', date) =
                              date_trunc('month', lag(date) over (partition by "user" order by date) + interval '1 month')
                         then 0
                         else 1
                    end) as IsGap
            from t
           ) t
     ) t;
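A minimal sketch for trying this against the sample data (the table name t matches the query above; user is quoted because it is a reserved word in PostgreSQL):
-- Hypothetical setup for the sample rows; "user" must stay quoted because
-- USER is a reserved word in PostgreSQL.
create table t ("user" text, date date);
insert into t ("user", date) values
    ('a', '2015-10-15'), ('a', '2015-11-15'), ('a', '2015-12-15'), ('a', '2016-02-15'),
    ('b', '2015-01-15'), ('b', '2015-02-15'), ('b', '2015-04-15'), ('b', '2015-06-15');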