DB2 SQL group row count by time intervals of 10 minutes - sql

I have a table in DB2 which contains data such as the following:
completed_timestamp | details
--------------------+--------
2021-12-19-15.38.10 | abcd
2021-12-19-15.39.10 | efgh
2021-12-19-15.48.10 | ijkl
2021-12-19-15.49.10 | mnop
2021-12-19-15.54.10 | qrst
I want to be able to count the number of rows in the table for every 10 minutes e.g.
Time             | count
-----------------+------
2021-12-19-15.40 | 2
2021-12-19-15.50 | 2
2021-12-19-16.00 | 1
completed_timestamp = Timestamp, details = varchar
I've seen this done in other SQL dialects but so far have not figured out how to do it in DB2. How would I do that?

You can get the minute from the timestamp, divide it by ten, round up to the next integer, multiply by ten, and add that to the truncated hour as minutes:
WITH TABLE1 (completed_timestamp, details) AS (
    VALUES
        (timestamp '2021-12-19-15.38.10', 'abcd'),
        (timestamp '2021-12-19-15.39.10', 'efgh'),
        (timestamp '2021-12-19-15.48.10', 'ijkl'),
        (timestamp '2021-12-19-15.49.10', 'mnop'),
        (timestamp '2021-12-19-15.54.10', 'qrst')
),
trunc_timestamps (time, details) AS (
    SELECT trunc(completed_timestamp, 'HH24')
             + (ceiling(minute(completed_timestamp) / 10.0) * 10) MINUTES,
           details
    FROM table1
)
SELECT trunc_timestamps.time, count(*)
FROM trunc_timestamps
GROUP BY trunc_timestamps.time
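The bucketing arithmetic can be checked outside DB2. This is a small Python model (not DB2 code) of the same expression: truncate to the hour, then add ceil(minute / 10) * 10 minutes.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

def bucket_end(ts: datetime) -> datetime:
    """Round a timestamp up to the end of its 10-minute bucket, mirroring
    trunc(ts, 'HH24') + (ceiling(minute(ts) / 10.0) * 10) MINUTES."""
    hour_start = ts.replace(minute=0, second=0, microsecond=0)
    return hour_start + timedelta(minutes=math.ceil(ts.minute / 10.0) * 10)

rows = [
    (datetime(2021, 12, 19, 15, 38, 10), "abcd"),
    (datetime(2021, 12, 19, 15, 39, 10), "efgh"),
    (datetime(2021, 12, 19, 15, 48, 10), "ijkl"),
    (datetime(2021, 12, 19, 15, 49, 10), "mnop"),
    (datetime(2021, 12, 19, 15, 54, 10), "qrst"),
]

# Count rows per bucket, like the GROUP BY in the query.
counts = Counter(bucket_end(ts) for ts, _ in rows)
for bucket, n in sorted(counts.items()):
    print(bucket.strftime("%Y-%m-%d-%H.%M"), n)  # 15.40 2, 15.50 2, 16.00 1
```

This reproduces the expected output above, including 15.54 rolling over into the 16.00 bucket.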

Related

SQL query to detect when accumulated value reaches limit

Given the following table in PostgreSQL
CREATE TABLE my_table (
time TIMESTAMP NOT NULL,
minutes INTEGER NOT NULL);
I am forming a query to detect when the accumulated value of 'minutes' crosses an hour boundary. For example with the following data in the table:
time | minutes
-------------------------
<some timestamp> | 55
<some timestamp> | 4
I want to know how many minutes remain before we reach 60 (one hour). In the example the answer would be 1 since 55 + 4 + 1 = 60.
Further, I would like to know this at insert time, so if my last insert made the accumulated number of minutes cross an "hour boundary" I would like it to return boolean true.
My naive attempt, without the insert part, looks like this:
SELECT
make_timestamptz(
date_part('year', (SELECT current_timestamp))::int,
date_part('month', (SELECT current_timestamp))::int,
date_part('day', (SELECT current_timestamp))::int,
date_part('hour', (SELECT current_timestamp))::int,
0,
0
) AS current_hour,
SUM(minutes) as sum_minutes
FROM
my_table
WHERE
sum_minutes >= 60
I would then take a row count above 0 to mean we crossed the boundary. But it is hopelessly inelegant, and does not work. Is this even possible? Would be possible to make it somewhat performant?
I am using Timescaledb/PostgreSQL on linux.
Hmmm . . . insert doesn't really return values. But you can use a CTE to do the insert and then sum the values after the insert:
with i as (
      insert into my_table ( . . . )
          values ( . . . )
          returning *
     )
select ( coalesce(i.minutes, 0) + coalesce(t.minutes, 0) ) >= 60
from (select sum(minutes) as minutes from i) i cross join
     (select sum(minutes) as minutes from my_table) t
(Use >= 60 rather than > 60, since per your example reaching exactly 60 counts as crossing the boundary.)
The INSERT could look like this. Note that RETURNING can only reference columns of the inserted rows, so the comparison has to happen in an outer SELECT rather than in the RETURNING clause itself:
WITH cur_sum AS (
    SELECT coalesce(sum(minutes), 0) AS minutes
    FROM my_table
    WHERE date_trunc('hour', current_timestamp) = date_trunc('hour', time)
), ins AS (
    INSERT INTO my_table (time, minutes)
    SELECT current_timestamp, 12 FROM cur_sum
    RETURNING minutes
)
SELECT cur_sum.minutes + ins.minutes >= 60
FROM cur_sum CROSS JOIN ins;
This example inserts 12 minutes at the current time.
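The boundary test itself is just integer arithmetic; as a sanity check, here is a small Python model (hypothetical, not part of either query) of "does this insert make the accumulated total reach or pass the next multiple of 60":

```python
def crosses_hour_boundary(prev_total: int, added: int) -> bool:
    """True when adding `added` minutes makes the accumulated total
    reach or pass the next multiple of 60 (an "hour boundary")."""
    return (prev_total + added) // 60 > prev_total // 60

# The example from the question: 55 + 4 = 59 does not cross the
# boundary, but one more minute reaches 60 and does.
print(crosses_hour_boundary(0, 55))   # False
print(crosses_hour_boundary(55, 4))   # False
print(crosses_hour_boundary(59, 1))   # True
```

Comparing multiples of 60 (rather than a fixed `>= 60`) also handles totals that have already passed one boundary and are approaching the next.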

Count the number of minutes with datediff and substring

I have data like this
availabilities
[{"starts_at":"09:00","ends_at":"17:00"}]
I have query below and it works
select COALESCE(availabilities,'Total') as availabilities,
SUM(DATEDIFF(minute,start_at,end_at)) as 'Total Available Hours in Minutes'
from (
select cast(availabilities as NVARCHAR) as availabilities,
cast(SUBSTRING(availabilities,16,5) as time) as start_at,
cast(SUBSTRING(availabilities,34,5) as time) as end_at
from alfy.dbo.daily_availabilities
)x
GROUP by ROLLUP(availabilities);
Result
availabilities Total Available Hours in Minutes
[{"starts_at":"09:00","ends_at":"17:00"}] 480
What if the data looks like the below?
availabilities
[{"starts_at":"10:00","ends_at":"13:30"},{"starts_at":"14:00","ends_at":"18:00"}]
[{"starts_at":"09:00","ends_at":"12:30"},{"starts_at":"13:00","ends_at":"15:30"},{"starts_at":"16:00","ends_at":"18:00"}]
How to count the number of minutes over two or more time ranges?
Since you have JSON data use OPENJSON (Transact-SQL) to parse it, e.g.:
create table dbo.daily_availabilities (
id int,
availabilities nvarchar(max) --JSON
);
insert dbo.daily_availabilities (id, availabilities) values
(1, N'[{"starts_at":"09:00","ends_at":"17:00"}]'),
(2, N'[{"starts_at":"10:00","ends_at":"13:30"},{"starts_at":"14:00","ends_at":"18:00"}]'),
(3, N'[{"starts_at":"09:00","ends_at":"12:30"},{"starts_at":"13:00","ends_at":"15:30"},{"starts_at":"16:00","ends_at":"18:00"}]');
select id, sum(datediff(mi, starts_at, ends_at)) as total_minutes
from dbo.daily_availabilities
cross apply openjson(availabilities) with (
starts_at time,
ends_at time
) av
group by id
id | total_minutes
---+--------------
 1 | 480
 2 | 450
 3 | 480
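The shape of the computation is easy to verify outside SQL Server. This Python sketch (a model of the OPENJSON + DATEDIFF approach, not T-SQL) parses the same JSON and sums the minutes per row:

```python
import json

def total_minutes(availabilities_json: str) -> int:
    """Sum the minutes covered by each {"starts_at","ends_at"} range,
    mirroring OPENJSON ... WITH + SUM(DATEDIFF(minute, ...))."""
    total = 0
    for slot in json.loads(availabilities_json):
        sh, sm = map(int, slot["starts_at"].split(":"))
        eh, em = map(int, slot["ends_at"].split(":"))
        total += (eh * 60 + em) - (sh * 60 + sm)
    return total

print(total_minutes('[{"starts_at":"10:00","ends_at":"13:30"},'
                    '{"starts_at":"14:00","ends_at":"18:00"}]'))  # 450
```

210 minutes (10:00-13:30) plus 240 minutes (14:00-18:00) gives the 450 shown for id 2.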

SQLite group datetime by 168 hours (7 days) gives NULL back

I have an Orders table in my SQLite database. What I want to do is group the data by every 168 hours (7 days), and count the total Orders per 168 hours.
What I did was create an in memory "calendar table" and I joined my Orders table to that calendar set.
This works fine when I group by 12, 24, 48 or even 120 hours (5 days). But for some reason it doesn't work when I group by 168 hours (7 days). I get NULL values back instead of what count() should really return.
The following sql code is an example that groups by every 120 hours (5 days).
CREATE TABLE IF NOT EXISTS Orders (
Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
Key TEXT,
Timestamp TEXT NOT NULL
);
INSERT INTO Orders (Key, Timestamp) VALUES ('k1', '2019-10-01 10:00:23');
INSERT INTO Orders (Key, Timestamp) VALUES ('k2', '2019-10-01 15:45:19');
INSERT INTO Orders (Key, Timestamp) VALUES ('k3', '2019-10-02 17:05:19');
INSERT INTO Orders (Key, Timestamp) VALUES ('k4', '2019-10-03 20:12:19');
INSERT INTO Orders (Key, Timestamp) VALUES ('k5', '2019-10-04 08:49:19');
INSERT INTO Orders (Key, Timestamp) VALUES ('k6', '2019-10-05 11:24:19');
INSERT INTO Orders (Key, Timestamp) VALUES ('k7', '2019-10-07 11:24:19');
WITH RECURSIVE dates(date1) AS (
VALUES('2019-10-01 00:00:00')
UNION ALL
SELECT datetime(date1, '+120 hours')
FROM dates
WHERE date1 <= '2019-10-29 00:00:00'
)
SELECT date1 as __ddd, d2.* FROM dates AS d1
LEFT JOIN (
SELECT count(Key) AS OrderKey,
datetime((strftime('%s', timestamp) / 432000) * 432000, 'unixepoch') as __interval
FROM `Orders`
WHERE `Timestamp` >= '2019-09-29T00:00:00.000'
GROUP BY __interval LIMIT 10
) d2 ON d1.date1 = d2.__interval
Important note:
If you want to update this code to test it with 168 hours (7 days), then you should do the following:
Change +120 hours to +168 hours
Change 432000 (432000 == 120 hours) to 604800 (604800 == 168 hours)
note that this number occurs twice, both should be replaced
Anyone any idea why it stops working properly when I change the sql code to 168 hours?
Your problem is that when you change to a 7-day interval, the values in your dates CTE don't align with the intervals generated from your Orders table. You can fix that by making the dates CTE start on a similarly aligned date:
WITH RECURSIVE dates(date1) AS (
SELECT datetime((strftime('%s', '2019-10-01 00:00:00') / 604800) * 604800, 'unixepoch')
UNION ALL
SELECT datetime(date1, '+168 hours')
FROM dates
WHERE date1 <= '2019-10-29 00:00:00'
)
Output:
__ddd               | OrderKey | __interval
--------------------+----------+--------------------
2019-09-26 00:00:00 | 3        | 2019-09-26 00:00:00
2019-10-03 00:00:00 | 4        | 2019-10-03 00:00:00
2019-10-10 00:00:00 | null     | null
2019-10-17 00:00:00 | null     | null
2019-10-24 00:00:00 | null     | null
2019-10-31 00:00:00 | null     | null
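Since this question is about SQLite, the fixed query can be run directly from Python's built-in sqlite3 module. This sketch reproduces the schema, data, and aligned 168-hour query from the answer:

```python
import sqlite3

# Reproduce the Orders table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    Id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
    "Key" TEXT,
    Timestamp TEXT NOT NULL
);
INSERT INTO Orders ("Key", Timestamp) VALUES
    ('k1', '2019-10-01 10:00:23'),
    ('k2', '2019-10-01 15:45:19'),
    ('k3', '2019-10-02 17:05:19'),
    ('k4', '2019-10-03 20:12:19'),
    ('k5', '2019-10-04 08:49:19'),
    ('k6', '2019-10-05 11:24:19'),
    ('k7', '2019-10-07 11:24:19');
""")

rows = conn.execute("""
WITH RECURSIVE dates(date1) AS (
    -- start the calendar on a 604800-second-aligned boundary,
    -- so it matches the intervals derived from Orders
    SELECT datetime((strftime('%s', '2019-10-01 00:00:00') / 604800) * 604800,
                    'unixepoch')
    UNION ALL
    SELECT datetime(date1, '+168 hours') FROM dates
    WHERE date1 <= '2019-10-29 00:00:00'
)
SELECT d1.date1, d2.OrderKey
FROM dates AS d1
LEFT JOIN (
    SELECT count("Key") AS OrderKey,
           datetime((strftime('%s', Timestamp) / 604800) * 604800,
                    'unixepoch') AS __interval
    FROM Orders
    GROUP BY __interval
) d2 ON d1.date1 = d2.__interval
""").fetchall()

for date1, n in rows:
    print(date1, n)
```

The first two buckets carry counts 3 and 4, matching the output above; the later calendar rows have no matching orders and stay null.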

How do I give the condition to group by time period?

I need to get the count of records using PostgreSQL from 7:00:00 am until 6:59:59 am the next day, with the count resetting again at 7:00 am.
I am using Java (Spring Boot) as the backend.
The columns in my table are
id (primary_id)
createdon (timestamp)
name
department
createdby
How do I write the condition for this shift-wise grouping?
You'd need to pick a slice based on the current time-of-day (I am assuming this to be some kind of counter which will be auto-refreshed in some application).
One way to do that is using time ranges:
SELECT COUNT(*)
FROM mytable
WHERE createdon <@ (
SELECT CASE
WHEN current_time < '07:00'::time THEN
tsrange(CURRENT_DATE - '1d'::interval + '07:00'::time, CURRENT_DATE + '07:00'::time, '[)')
ELSE
tsrange(CURRENT_DATE + '07:00'::time, CURRENT_DATE + '1d'::interval + '07:00'::time, '[)')
END
)
;
Example with data: https://rextester.com/LGIJ9639
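The window-selection logic in the CASE expression can be modeled outside the database. This is a hypothetical Python sketch (not the Postgres code) of picking the half-open 07:00-to-07:00 slice that contains the current time:

```python
from datetime import datetime, time, timedelta

def shift_window(now: datetime):
    """Half-open [start, end) of the 07:00-to-07:00 shift containing
    `now`, mirroring the CASE over current_time in the tsrange answer."""
    seven = time(7, 0)
    if now.time() < seven:
        # Before 07:00 we are still in the shift that began yesterday.
        start = datetime.combine(now.date() - timedelta(days=1), seven)
    else:
        start = datetime.combine(now.date(), seven)
    return start, start + timedelta(days=1)

# At 06:30 we are still in yesterday's shift; at 07:00 a new one starts.
print(shift_window(datetime(2024, 1, 10, 6, 30)))
print(shift_window(datetime(2024, 1, 10, 7, 0)))
```

The half-open bound corresponds to the '[)' argument of tsrange: the end of one shift and the start of the next never overlap.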
As I understand the question, you need to have a separate group for values in each 24-hour period that starts at 07:00:00.
SELECT
(
date_trunc('day', (createdon - '7h'::interval))
+ '7h'::interval
) AS date_bucket,
count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
This uses the date and time functions and the GROUP BY clause:
Shift the timestamp value back 7 hours ((createdon - '7h'::interval)), so the distinction can be made by a change of date (at 00:00:00). Then,
Truncate the value to the date (date_trunc('day', …)), so that all values in a bucket are flattened to a single value (the date at midnight). Then,
Add 7 hours again to the value (… + '7h'::interval), so that it represents the starting time of the bucket. Then,
Group by that value (GROUP BY date_bucket).
A more complete example, with schema and data:
DROP TABLE IF EXISTS lorem;
CREATE TABLE lorem (
id serial PRIMARY KEY,
createdon timestamp not null
);
INSERT INTO lorem (createdon) (
SELECT
generate_series(
CURRENT_TIMESTAMP - '36h'::interval,
CURRENT_TIMESTAMP + '36h'::interval,
'45m'::interval)
);
Now the query:
SELECT
(
date_trunc('day', (createdon - '7h'::interval))
+ '7h'::interval
) AS date_bucket,
count(id) AS count
FROM lorem
GROUP BY date_bucket
ORDER BY date_bucket
;
produces this result:
date_bucket | count
---------------------+-------
2019-03-06 07:00:00 | 17
2019-03-07 07:00:00 | 32
2019-03-08 07:00:00 | 32
2019-03-09 07:00:00 | 16
(4 rows)
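The shift-and-truncate trick generalizes beyond Postgres. Here is a small Python model (not SQL) of the same three steps, subtract 7 hours, truncate to the day, add 7 hours back:

```python
from datetime import datetime, timedelta

def seven_am_bucket(ts: datetime) -> datetime:
    """date_trunc('day', ts - interval '7h') + interval '7h': the 07:00
    timestamp that starts the shift containing `ts`."""
    shifted = ts - timedelta(hours=7)
    midnight = shifted.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=7)

# 06:59 still belongs to the previous day's bucket; 07:00 starts a new one.
print(seven_am_bucket(datetime(2019, 3, 7, 6, 59)))  # 2019-03-06 07:00:00
print(seven_am_bucket(datetime(2019, 3, 7, 7, 0)))   # 2019-03-07 07:00:00
```

Grouping rows by this value is exactly what GROUP BY date_bucket does in the query above.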
You can use aggregation -- by subtracting 7 hours:
select (createdon - interval '7 hour')::date as dy, count(*)
from t
group by dy
order by dy;

Grouping data in SQL by difference in column values

I have following data in my logs table in postgres table:
logid => int (auto increment)
start_time => bigint (stores epoch value)
inserted_value => int
Following is the data stored in the table (where start time actual is not a column, just displaying start_time value in UTC format in 24 hour format)
logid user_id start_time inserted_value start time actual
1 1 1518416562 15 12-Feb-2018 06:22:42
2 1 1518416622 8 12-Feb-2018 06:23:42
3 1 1518417342 9 12-Feb-2018 06:35:42
4 1 1518417402 12 12-Feb-2018 06:36:42
5 1 1518417462 18 12-Feb-2018 06:37:42
6 1 1518418757 6 12-Feb-2018 06:59:17
7 1 1518418808 11 12-Feb-2018 07:00:08
I want to group and sum values according to difference in start_time
For above data, sum should be calculated in three groups:
user_id sum
1 15 + 8
1 9 + 12 + 18
1 6 + 11
So, the values in each group are 1 minute apart. This 1 minute could just as well be any x-minute difference.
I also tried the LAG function but could not understand it fully. I hope I've explained my question clearly.
You can use a plain group by to achieve what you want. Just make all start_time values equal that belong to the same minute. For example
select user_id, start_time/60, sum(inserted_value)
from log_table
group by user_id, start_time/60
I assume your start_time column contains integers representing epoch seconds, so integer division by 60 will properly truncate them to minutes. If the values are floats, you should use floor(start_time/60).
If you also want to select a human readable date of the minute you're grouping, you can add to_timestamp((start_time/60)*60) to the select list.
You can use LAG to check if current row is > 60 seconds more than previous row and set group_changed (a virtual column) each time this happens.
In next step, use running sum over that column. This creates a group_number which you can use to group results in the third step.
WITH cte1 AS (
SELECT
testdata.*,
CASE WHEN start_time - LAG(start_time, 1, start_time) OVER (PARTITION BY user_id ORDER BY start_time) > 60 THEN 1 ELSE 0 END AS group_changed
FROM testdata
), cte2 AS (
SELECT
cte1.*,
SUM(group_changed) OVER (PARTITION BY user_id ORDER BY start_time) AS group_number
FROM cte1
)
SELECT user_id, SUM(inserted_value)
FROM cte2
GROUP BY user_id, group_number
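The gaps-and-islands idea behind the two CTEs is easy to verify in plain Python. This sketch (a model of the LAG + running-sum approach, not the SQL itself) starts a new group whenever the gap to the previous row exceeds 60 seconds, then sums each group:

```python
# (user_id, start_time as epoch seconds, inserted_value) from the question
rows = [(1, 1518416562, 15), (1, 1518416622, 8), (1, 1518417342, 9),
        (1, 1518417402, 12), (1, 1518417462, 18), (1, 1518418757, 6),
        (1, 1518418808, 11)]

def group_sums(rows, gap=60):
    """Start a new group whenever the gap to the previous row exceeds
    `gap` seconds (the LAG + running-sum idea), then sum each group."""
    sums = []
    prev = None
    for user_id, start_time, value in rows:
        if prev is None or start_time - prev > gap:
            sums.append(0)          # group_changed = 1: open a new island
        sums[-1] += value           # accumulate within the current island
        prev = start_time
    return sums

print(group_sums(rows))  # [23, 39, 17]
```

These are the three expected groups: 15 + 8, 9 + 12 + 18, and 6 + 11. The SQL version does the same thing declaratively, with the running SUM(group_changed) playing the role of the island index.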