How to group records by hours considering start date and end date - sql

I'm trying to group records by hour, taking each record's duration into account. Assume there are long-running processes, and the log records when each process started and finished. I want an hourly report of how many processes were running.
The data looks like this
Process_name Start End
'A' '2019/01/01 14:10' '2019/01/01 14:55'
'B' '2019/01/01 14:20' '2019/01/01 16:30'
'C' '2019/01/01 15:05' '2019/01/01 15:10'
The result should be like this
Hour ProcessQount
14 2
15 2
16 1

You can do it if you join a recursive cte which returns all the hours of the day to the table:
with cte as (
select 0 as hour
union all
select hour + 1
from cte
where hour < 23
)
select c.hour Hour, count(*) ProcessQount
from cte c inner join tablename t
on c.hour between datepart(hh, t.[Start]) and datepart(hh, t.[End])
group by c.hour
See the demo.
Results:
> Hour | ProcessQount
> ---: | -----------:
> 14 | 2
> 15 | 2
> 16 | 1
If you change to a LEFT JOIN and count([Process_name]) then you get results for all the hours of the day:
> Hour | ProcessQount
.........................
> 12 | 0
> 13 | 0
> 14 | 2
> 15 | 2
> 16 | 1
> 17 | 0
> 18 | 0
.........................
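The join above can be tried end to end with a small script. This is only a sketch, assuming SQLite through Python's sqlite3 module instead of SQL Server, so strftime('%H', ...) stands in for datepart(hh, ...). The table name process_log and its column names are made up for the example, and the timestamps are rewritten in ISO format so SQLite can parse them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the rows are the question's sample data.
conn.execute("CREATE TABLE process_log (process_name TEXT, start_ts TEXT, end_ts TEXT)")
conn.executemany(
    "INSERT INTO process_log VALUES (?, ?, ?)",
    [
        ("A", "2019-01-01 14:10", "2019-01-01 14:55"),
        ("B", "2019-01-01 14:20", "2019-01-01 16:30"),
        ("C", "2019-01-01 15:05", "2019-01-01 15:10"),
    ],
)

QUERY = """
WITH RECURSIVE hours(hour) AS (
    SELECT 0
    UNION ALL
    SELECT hour + 1 FROM hours WHERE hour < 23
)
SELECT h.hour, COUNT(p.process_name) AS process_count
FROM hours AS h
LEFT JOIN process_log AS p
  ON h.hour BETWEEN CAST(strftime('%H', p.start_ts) AS INTEGER)
                AND CAST(strftime('%H', p.end_ts) AS INTEGER)
GROUP BY h.hour
ORDER BY h.hour
"""

# One row per hour of the day; hours with no running process get 0.
counts = dict(conn.execute(QUERY).fetchall())
busy_hours = {hour: n for hour, n in counts.items() if n > 0}
print(busy_hours)
```

Counting p.process_name rather than * is what keeps the empty hours at 0, because the LEFT JOIN fills that column with NULL when nothing matches.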

Generate the hours and then use inequalities and aggregation:
select v.h, count(t.process_name)
from (values (14), (15), (16)) v(h) left join
t
on datepart(hour, t.[start]) <= v.h and
datepart(hour, t.[end]) >= v.h
group by v.h
order by v.h;
For reasonable results, this assumes that all the data you are looking at is for one day, as in your sample data.
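The one-day assumption comes from comparing hour numbers only. One possible way around it, sketched here under assumptions, is to build one bucket per calendar hour between the earliest start and the latest end and then count interval overlaps. The sketch uses SQLite through Python's sqlite3 module rather than SQL Server; the table name process_log, its columns, and the two sample rows (which cross midnight) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; timestamps are ISO strings so text comparison matches time order.
conn.execute("CREATE TABLE process_log (process_name TEXT, start_ts TEXT, end_ts TEXT)")
conn.executemany(
    "INSERT INTO process_log VALUES (?, ?, ?)",
    [
        ("A", "2019-01-01 22:10:00", "2019-01-01 23:40:00"),
        ("B", "2019-01-01 23:20:00", "2019-01-02 01:05:00"),
    ],
)

QUERY = """
WITH RECURSIVE buckets(bucket_start, last_bucket) AS (
    -- Truncate the earliest start and the latest end to the hour.
    SELECT strftime('%Y-%m-%d %H:00:00', MIN(start_ts)),
           strftime('%Y-%m-%d %H:00:00', MAX(end_ts))
    FROM process_log
    UNION ALL
    SELECT datetime(bucket_start, '+1 hour'), last_bucket
    FROM buckets
    WHERE bucket_start < last_bucket
)
SELECT b.bucket_start, COUNT(p.process_name) AS process_count
FROM buckets AS b
LEFT JOIN process_log AS p
  ON p.start_ts < datetime(b.bucket_start, '+1 hour')  -- started before the bucket ends
 AND p.end_ts >= b.bucket_start                        -- still running when it begins
GROUP BY b.bucket_start
ORDER BY b.bucket_start
"""

hourly = conn.execute(QUERY).fetchall()
for bucket_start, process_count in hourly:
    print(bucket_start, process_count)
```

Because each bucket carries its full date, process B is counted in 23:00, 00:00 and 01:00, which a comparison of bare hour numbers (23 to 1) would miss.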

Related

SQL query group by using day startdatetime and end datetime

I have the following table Jobs:
|Id | StartDateTime | EndDateTime
+----+---------------------+----------------------
|1 | 2020-10-20 23:00:00 | 2020-10-21 05:00:00
|2 | 2020-10-21 10:00:00 | 2020-10-21 11:00:00
Note job id 1 spans October 20 and 21.
I am using the following query:
SELECT DAY(StartDateTime), COUNT(id)
FROM Jobs
GROUP BY DAY(StartDateTime)
It produces the output below. The problem is that day 21 does not include job id 1. Since that job spans two days, I want it counted on both day 20 and day 21.
Day | TotalJobs
----+----------
20 | 1
21 | 1
I am struggling to get the following expected output:
Day | TotalJobs
----+----------
20 | 1
21 | 2
One method is to generate the days that you want and then count overlaps:
with days as (
select convert(date, min(j.startdatetime)) as startd,
convert(date, max(j.enddatetime)) as endd
from jobs j
union all
select dateadd(day, 1, startd), endd
from days
where startd < endd
)
select days.startd, count(j.id)
from days left join
jobs j
on j.startdatetime < dateadd(day, 1, startd) and
j.enddatetime >= startd
group by days.startd;
Here is a db<>fiddle.
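For readers without a SQL Server instance, the same generate-days-then-count-overlaps idea can be sketched in SQLite through Python's sqlite3 module. This is an adaptation under assumptions, not the answer's exact query: date(x, '+1 day') replaces dateadd, and the column names start_dt and end_dt are placeholders for StartDateTime and EndDateTime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names are illustrative; the two rows are the question's sample jobs.
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, start_dt TEXT, end_dt TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [
        (1, "2020-10-20 23:00:00", "2020-10-21 05:00:00"),
        (2, "2020-10-21 10:00:00", "2020-10-21 11:00:00"),
    ],
)

QUERY = """
WITH RECURSIVE days(day, last_day) AS (
    SELECT date(MIN(start_dt)), date(MAX(end_dt)) FROM jobs
    UNION ALL
    SELECT date(day, '+1 day'), last_day FROM days WHERE day < last_day
)
SELECT d.day, COUNT(j.id) AS total_jobs
FROM days AS d
LEFT JOIN jobs AS j
  ON j.start_dt < date(d.day, '+1 day')  -- job began before the day ended
 AND j.end_dt >= d.day                   -- and ended on or after the day began
GROUP BY d.day
ORDER BY d.day
"""

per_day = conn.execute(QUERY).fetchall()
print(per_day)
```

The two conditions in the ON clause are the usual interval-overlap test, so a job spanning any number of days is counted once on every day it touches.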
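You can count the jobs that start and end on the same day, count the jobs that span different days once for the start day and once for the end day, and then sum the three counts per day. Note that a job spanning three or more days is counted only on its first and last day: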
SELECT a.date, SUM(counts) from (
SELECT DAY(StartDateTime) as date, COUNT(id) counts
FROM Table1
WHERE DAY(StartDateTime) = DAY(EndDateTime)
GROUP BY StartDateTime
UNION ALL
SELECT DAY(EndDateTime), COUNT(id)
FROM Table1
WHERE DAY(StartDateTime) != DAY(EndDateTime)
GROUP BY EndDateTime
UNION ALL
SELECT DAY(StartDateTime), COUNT(id)
FROM Table1
WHERE DAY(StartDateTime) != DAY(EndDateTime)
GROUP BY StartDateTime) a
GROUP BY a.date
Here is SQL Fiddle link
Also, replace Table1 with Jobs when running this against your own database.

SQL not returning a value if no row exist for time queried

I'm writing a SQL query that returns the number of records created in each hour over the last 24 hours. I only get results for hours that have a non-zero count. If no records were created in an hour, that hour is missing from the result entirely.
Here's my query:
SELECT HOUR(timeStamp) as hour, COUNT(*) as count
FROM `events`
WHERE timeStamp > DATE_SUB(NOW(), INTERVAL 24 HOUR)
GROUP BY HOUR(timeStamp)
ORDER BY HOUR(timeStamp)
The output of current Query:
+-----------------+----------+
| hour | count |
+-----------------+----------+
| 14 | 6 |
| 15 | 5 |
+-----------------+----------+
But I'm expecting 0 for the hours in which no records were created. Where am I going wrong?
One solution is to generate a table of numbers from 0 to 23 and left join it with your original table.
Here is a query that uses a recursive query to generate the list of hours (if you are running MySQL, this requires version 8.0):
with recursive hours as (
select 0 hr
union all select hr + 1 from hours where hr < 23
)
select h.hr, count(e.eventID) as cnt
from hours h
left join events e
on e.timestamp > now() - interval 1 day
and hour(e.timestamp) = h.hr
group by h.hr
If your RDBMS does not support recursive CTEs, then one option is to use an explicit derived table:
select h.hr, count(e.eventID) as cnt
from (
select 0 hr union all select 1 union all select 2 ... union all select 23
) h
left join events e
on e.timestamp > now() - interval 1 day
and hour(e.timestamp) = h.hr
group by h.hr
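A runnable sketch of the zero-filling join, assuming SQLite through Python's sqlite3 module instead of MySQL. The fixed NOW value, the column names event_id and created_at, and the sample rows are all made up so the result is reproducible; in the real query now() would take the place of the parameter.

```python
import sqlite3

# Hypothetical fixed "current time" so the example gives the same result every run.
NOW = "2024-03-10 16:30:00"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [
        ("2024-03-10 14:05:00",),
        ("2024-03-10 14:40:00",),
        ("2024-03-10 15:10:00",),
        ("2024-03-08 15:00:00",),  # older than 24 hours, must not be counted
    ],
)

QUERY = """
WITH RECURSIVE hours(hr) AS (
    SELECT 0
    UNION ALL
    SELECT hr + 1 FROM hours WHERE hr < 23
)
SELECT h.hr, COUNT(e.event_id) AS cnt
FROM hours AS h
LEFT JOIN events AS e
  ON e.created_at > datetime(:now, '-1 day')
 AND CAST(strftime('%H', e.created_at) AS INTEGER) = h.hr
GROUP BY h.hr
ORDER BY h.hr
"""

hourly_counts = dict(conn.execute(QUERY, {"now": NOW}).fetchall())
print(hourly_counts)
```

The 24-hour filter sits in the ON clause on purpose: moved to a WHERE clause it would discard the NULL rows produced by the LEFT JOIN, and the empty hours would disappear again.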

SQL select MAX data within each 12 hours

I have a table called temp. In this table I have Date and Value.
Date | Value
2016/04/01 07:00am | 1
2016/04/01 09:00am | 2
2016/04/01 11:00am | 3
...
2016/04/01 07:00pm | 5
2016/04/01 11:00pm | 2
...
2016/04/02 07:00am | 10
2016/04/02 09:00am | 13
2016/04/02 11:00am | 1
...
2016/04/02 07:00pm | 32
2016/04/02 09:00pm | 40
I would like to return:
Date | Value
04/01/2016 11:00am | 3
04/01/2016 07:00pm | 5
04/02/2016 09:00am | 13
04/02/2016 09:00pm | 40
The idea is to group in 12 hour intervals and then find the max value of said group.
So far I have:
SELECT t.date, max(t.value)
FROM temp t
WHERE t.Date between DATEADD(hour, 7, '04/01/2016') and DATEADD(minute, 1859, '04/02/2016')
GROUP BY DATEPART(Hour, t.date)%12, t.date
ORDER BY Date
But it returns all the data, no 12 hour groups.
Any ideas?
You don't want MAX with GROUP BY here, because you are not aggregating by date; you want the single row that has the largest value in each period. You can therefore use ROW_NUMBER with a PARTITION based on the date and the AM/PM period, ordered by t.value DESC, and keep only the first row of each partition:
SELECT date, value
FROM
(SELECT t.date,
t.value,
ROW_NUMBER()
OVER(PARTITION BY CAST(t.date AS date), CASE WHEN DATEPART(hour, t.date) < 12 THEN 0 ELSE 1 END
ORDER BY t.value DESC) AS rownum
FROM temp t
WHERE t.Date between DATEADD(hour, 7, '04/01/2016') and DATEADD(minute, 1859, '04/02/2016')
) max_val
WHERE max_val.rownum = 1
ORDER BY Date
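The partitioning can be checked against the question's sample rows with a short script. This is a sketch assuming SQLite 3.25 or later (needed for window functions) through Python's sqlite3 module; the table name temp_readings and column reading_ts are placeholders, the times are written in 24-hour ISO form, and the WHERE filter on the date range is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Placeholder names; the rows are the ones listed in the question.
conn.execute("CREATE TABLE temp_readings (reading_ts TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO temp_readings VALUES (?, ?)",
    [
        ("2016-04-01 07:00", 1), ("2016-04-01 09:00", 2), ("2016-04-01 11:00", 3),
        ("2016-04-01 19:00", 5), ("2016-04-01 23:00", 2),
        ("2016-04-02 07:00", 10), ("2016-04-02 09:00", 13), ("2016-04-02 11:00", 1),
        ("2016-04-02 19:00", 32), ("2016-04-02 21:00", 40),
    ],
)

QUERY = """
SELECT reading_ts, value
FROM (
    SELECT reading_ts,
           value,
           ROW_NUMBER() OVER (
               -- One partition per calendar day and half-day (0 = AM, 1 = PM).
               PARTITION BY date(reading_ts),
                            CASE WHEN CAST(strftime('%H', reading_ts) AS INTEGER) < 12
                                 THEN 0 ELSE 1 END
               ORDER BY value DESC
           ) AS rownum
    FROM temp_readings
) AS ranked
WHERE rownum = 1
ORDER BY reading_ts
"""

top_per_half_day = conn.execute(QUERY).fetchall()
for reading_ts, value in top_per_half_day:
    print(reading_ts, value)
```

If two rows in the same half-day tie on value, ROW_NUMBER keeps an arbitrary one of them; adding a second ORDER BY key would make that choice deterministic.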

GROUP BY next months over N years

I need to aggregate amounts grouped by "horizon", where each horizon is one of the next five 12-month windows.
Assuming today is 2015-08-15:
SUM amount from 0 to 12 next months (from 2015-08-16 to 2016-08-15)
SUM amount from 12 to 24 next months (from 2016-08-16 to 2017-08-15)
SUM amount from 24 to 36 next months ...
SUM amount from 36 to 48 next months
SUM amount from 48 to 60 next months
Here is a fiddled dataset example:
+----+------------+--------+
| id | date | amount |
+----+------------+--------+
| 1 | 2015-09-01 | 10 |
| 2 | 2015-10-01 | 10 |
| 3 | 2016-10-01 | 10 |
| 4 | 2017-06-01 | 10 |
| 5 | 2018-06-01 | 10 |
| 6 | 2019-05-01 | 10 |
| 7 | 2019-04-01 | 10 |
| 8 | 2020-04-01 | 10 |
+----+------------+--------+
Here is the expected result:
+---------+--------+
| horizon | amount |
+---------+--------+
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 20 |
| 5 | 10 |
+---------+--------+
How can I get these 12-month "horizons" as groups?
I tagged PostgreSQL, but I'm actually using an ORM, so I'm just looking for the idea. (By the way, I don't have access to the date formatting functions.)
I would split by 12 months time frame and group by this:
SELECT
FLOOR(
(EXTRACT(EPOCH FROM date) - EXTRACT(EPOCH FROM now()))
/ EXTRACT(EPOCH FROM INTERVAL '12 month')
) + 1 AS "horizon",
SUM(amount) AS "amount"
FROM dataset
GROUP BY horizon
ORDER BY horizon;
SQL Fiddle
Inspired by: Postgresql SQL GROUP BY time interval with arbitrary accuracy (down to milli seconds)
Assuming you need intervals from the current date to the same day next year and so on, I would write the query like this:
SELECT 1 AS horizon, SUM(amount) FROM dataset
WHERE date > now()
AND date < (now() + '12 months'::INTERVAL)
UNION
SELECT 2 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '12 months'::INTERVAL)
AND date < (now() + '24 months'::INTERVAL)
UNION
SELECT 3 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '24 months'::INTERVAL)
AND date < (now() + '36 months'::INTERVAL)
UNION
SELECT 4 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '36 months'::INTERVAL)
AND date < (now() + '48 months'::INTERVAL)
UNION
SELECT 5 AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + '48 months'::INTERVAL)
AND date < (now() + '60 months'::INTERVAL)
ORDER BY horizon;
You can generalize it with an additional variable:
SELECT number AS horizon, SUM(amount) FROM dataset
WHERE date > (now() + ((number - 1) * '12 months'::INTERVAL))
AND date < (now() + (number * '12 months'::INTERVAL));
where number is an integer in the range [1, 5].
Here is what I get from the Fiddle:
| horizon | sum |
|---------|-----|
| 1 | 20 |
| 2 | 20 |
| 3 | 10 |
| 4 | 20 |
| 5 | 10 |
Perhaps CTE?
WITH RECURSIVE grps AS
(
SELECT 1 AS Horizon, (date '2015-08-15') + interval '1' day AS FromDate, (date '2015-08-15') + interval '1' year AS ToDate
UNION ALL
SELECT Horizon + 1, ToDate + interval '1' day AS FromDate, ToDate + interval '1' year
FROM grps WHERE Horizon < 5
)
SELECT
Horizon,
(SELECT SUM(amount) FROM dataset WHERE date BETWEEN g.FromDate AND g.ToDate) AS SumOfAmount
FROM
grps g
SQL fiddle
Rather simply:
SELECT horizon, sum(amount) AS amount
FROM generate_series(1, 5) AS s(horizon)
JOIN dataset ON "date" >= current_date + (horizon - 1) * interval '1 year'
AND "date" < current_date + horizon * interval '1 year'
GROUP BY horizon
ORDER BY horizon;
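The series-join idea can be reproduced on the question's dataset with a small script. This sketch assumes SQLite through Python's sqlite3 module, where a recursive CTE stands in for generate_series and date(x, '+N years') for interval arithmetic. REFERENCE_DATE is pinned to the question's 2015-08-15 in place of current_date, and the column is called entry_date here purely for illustration.

```python
import sqlite3

# The question's "today"; a fixed value keeps the example reproducible.
REFERENCE_DATE = "2015-08-15"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, entry_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?)",
    [
        (1, "2015-09-01", 10), (2, "2015-10-01", 10), (3, "2016-10-01", 10),
        (4, "2017-06-01", 10), (5, "2018-06-01", 10), (6, "2019-05-01", 10),
        (7, "2019-04-01", 10), (8, "2020-04-01", 10),
    ],
)

QUERY = """
WITH RECURSIVE horizons(horizon) AS (
    SELECT 1
    UNION ALL
    SELECT horizon + 1 FROM horizons WHERE horizon < 5
)
SELECT h.horizon, SUM(d.amount) AS amount
FROM horizons AS h
JOIN dataset AS d
  -- Half-open window: [ref + (n-1) years, ref + n years)
  ON d.entry_date >= date(:ref, '+' || (h.horizon - 1) || ' years')
 AND d.entry_date <  date(:ref, '+' || h.horizon || ' years')
GROUP BY h.horizon
ORDER BY h.horizon
"""

totals = conn.execute(QUERY, {"ref": REFERENCE_DATE}).fetchall()
print(totals)
```

With an inner JOIN a horizon that has no rows is simply absent from the result; switching to a LEFT JOIN would keep it with a NULL sum.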
You need a union and an aggregate function:
select 1 as horizon,
sum(amount) amount
from the_table
where date >= current_date
and date < current_date + interval '12' month
union all
select 2 as horizon,
sum(amount) amount
from the_table
where date >= current_date + interval '12' month
and date < current_date + interval '24' month
union all
select 3 as horizon,
sum(amount) amount
from the_table
where date >= current_date + interval '24' month
and date < current_date + interval '36' month
... and so on ...
I don't know how to do that with an obfuscation layer (aka ORM), but I'm sure it supports (or should support) aggregation and unions.
This could easily be wrapped up into a PL/PgSQL function where you pass the "horizon" and the SQL is built dynamically so that all you need to call is something like: select * from sum_horizon(5) where 5 indicates the number of years.
Btw: date is a horrible name for a column. For one thing, it's a reserved word, but more importantly it doesn't document the meaning of the column. Is it a "release date"? A "due date"? An "order date"?
Try this
select
id,
sum(case when date>=current_date and date<current_date+interval '1 year' then amount else 0 end) as year1,
sum(case when date>=current_date+interval '1 year' and date<current_date+interval '2 year' then amount else 0 end) as year2,
sum(case when date>=current_date+interval '2 year' and date<current_date+interval '3 year' then amount else 0 end) as year3,
sum(case when date>=current_date+interval '3 year' and date<current_date+interval '4 year' then amount else 0 end) as year4,
sum(case when date>=current_date+interval '4 year' and date<current_date+interval '5 year' then amount else 0 end) as year5
from dataset
group by id

Oracle select sum by time window

Let's assume that we have an Oracle table with the following format and data:
TIMESTAMP MESSAGENO ORGMESSAGE
------------------------- ---------------------- -------------------------------------
27.04.13 1 START PERIOD
27.04.13 3 10
27.04.13 4 5
28.04.13 5 6
28.04.13 3 20
29.04.13 4 25
29.04.13 5 26
30.04.13 2 END PERIOD
30.04.13 1 START PERIOD
01.05.13 3 10
02.05.13 4 15
02.05.13 5 16
03.05.13 3 30
03.05.13 4 35
04.05.13 5 36
05.05.13 2 END PERIOD
I want to select the sum of ORGMESSAGE for every period (the window between START PERIOD and END PERIOD), grouped by MESSAGENO.
Example output would be:
PERIOD START PERIOD END MESSAGENO SUM
------------ ------------- -------- ----
27.04.13 30.04.13 3 25
27.04.13 30.04.13 4 30
27.04.13 30.04.13 5 32
30.04.13 05.05.13 3 45
30.04.13 05.05.13 4 50
30.04.13 05.05.13 5 52
I am guessing that an Oracle analytic function would be suitable, but I really don't know how or where to start.
Thanks in advance for any help.
If we assume that the period starts and ends match, then a simple way to find the matching messages is to count the preceding number of starts. This is a cumulative sum and it is easy in Oracle. The rest is just aggregation:
select min(timestamp) as periodstart, max(timestamp) as periodend, messageno, sum(to_number(orgmessage))
from (select om.*,
sum(case when messageno = 1 then 1 else 0 end) over (order by timestamp) as grp
from orgmessages om
) om
where messageno not in (1, 2)
group by grp, messageno;
Note that this method (as with the others) really wants the timestamp to be unique on each record. In the data presented, these solutions will work. But if you have multiple starts and ends on the same day, none of them will work assuming that timestamp only has the date.
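To see the running-count grouping in isolation, here is a sketch assuming SQLite 3.25 or later through Python's sqlite3 module rather than Oracle. It adds a seq column, which is not in the original table, to give every row a unique ordering key and so sidestep the tie problem just described. It returns the group number instead of the period dates to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is an assumed surrogate key that preserves the order in which rows were logged.
conn.execute(
    "CREATE TABLE orgmessages (seq INTEGER PRIMARY KEY, ts TEXT, messageno INTEGER, orgmessage TEXT)"
)
rows = [
    ("2013-04-27", 1, "START PERIOD"), ("2013-04-27", 3, "10"), ("2013-04-27", 4, "5"),
    ("2013-04-28", 5, "6"), ("2013-04-28", 3, "20"), ("2013-04-29", 4, "25"),
    ("2013-04-29", 5, "26"), ("2013-04-30", 2, "END PERIOD"),
    ("2013-04-30", 1, "START PERIOD"), ("2013-05-01", 3, "10"), ("2013-05-02", 4, "15"),
    ("2013-05-02", 5, "16"), ("2013-05-03", 3, "30"), ("2013-05-03", 4, "35"),
    ("2013-05-04", 5, "36"), ("2013-05-05", 2, "END PERIOD"),
]
conn.executemany(
    "INSERT INTO orgmessages (ts, messageno, orgmessage) VALUES (?, ?, ?)", rows
)

QUERY = """
SELECT grp, messageno, SUM(CAST(orgmessage AS INTEGER)) AS total
FROM (
    SELECT om.*,
           -- Number of START PERIOD rows seen so far = index of the current period.
           SUM(CASE WHEN messageno = 1 THEN 1 ELSE 0 END) OVER (ORDER BY seq) AS grp
    FROM orgmessages AS om
) AS numbered
WHERE messageno NOT IN (1, 2)
GROUP BY grp, messageno
ORDER BY grp, messageno
"""

period_sums = conn.execute(QUERY).fetchall()
for grp, messageno, total in period_sums:
    print(grp, messageno, total)
```

These totals are simply what the listed rows add up to; for MESSAGENO 3 that is 10 + 20 = 30 in the first period and 10 + 30 = 40 in the second.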
First find all period ends per period start. Then join with your table to group and sum.
select
dates.start_date,
dates.end_date,
messageno,
sum(to_number(orgmessage)) as period_sum
from mytable
join
(
select start_dates.timestmp as start_date, min(end_dates.timestmp) as end_date
from (select * from mytable where orgmessage = 'START PERIOD') start_dates
join (select * from mytable where orgmessage = 'END PERIOD') end_dates
on start_dates.timestmp < end_dates.timestmp
group by start_dates.timestmp
) dates on mytable.timestmp between dates.start_date and dates.end_date
where mytable.orgmessage not like '%PERIOD%'
group by dates.start_date, dates.end_date, messageno
order by dates.start_date, dates.end_date, messageno;
SQL fiddle: http://www.sqlfiddle.com/#!4/365de/15.
Please try this one, replacing rrr with your table name:
select periodstart, periodend, messageno, sum(to_number(orgmessage)) s
from (select TIMESTAMP periodstart,
(select min (TIMESTAMP) from rrr r2 where orgmessage = 'END PERIOD' and r2.TIMESTAMP > r.TIMESTAMP) periodend
from rrr r
where orgmessage = 'START PERIOD'
) borders, rrr r
where r.TIMESTAMP between borders.periodstart and borders.periodend
and r.orgmessage not in ('END PERIOD', 'START PERIOD')
group by periodstart, periodend, messageno
order by periodstart, periodend, messageno