Select overlapped hours - SQL

I have a table:
CREATE TABLE card_tab
(card_no NUMBER,
emp_no VARCHAR2(100),
DATA DATE,
start_time DATE,
end_time DATE);
insert into card_tab (CARD_NO, EMP_NO, DATA, START_TIME, END_TIME)
values (1, '100', to_date('15-11-2019', 'dd-mm-yyyy'), to_date('15-11-2019 20:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('16-11-2019 03:00:00', 'dd-mm-yyyy hh24:mi:ss'));
insert into card_tab (CARD_NO, EMP_NO, DATA, START_TIME, END_TIME)
values (2, '100', to_date('15-11-2019', 'dd-mm-yyyy'), to_date('15-11-2019 22:00:00', 'dd-mm-yyyy hh24:mi:ss'), to_date('15-11-2019 23:00:00', 'dd-mm-yyyy hh24:mi:ss'));
The card_no is just a sequence number. Emp_no 100 worked 7 hours in total:
SELECT t.*, (t.end_time - t.start_time) * 24 work_hours FROM card_tab t;
CARD_NO  EMP_NO  DATA        START_TIME           END_TIME             WORK_HOURS
      1  100     15.11.2019  15.11.2019 20:00:00  16.11.2019 03:00:00           7
      2  100     15.11.2019  15.11.2019 22:00:00  15.11.2019 23:00:00           1
If hours are overlapped, the time should be divided between the overlapping records.
In this example the result should be:
CARD_NO  WORK_HOURS
      1         6,5
      2         0,5
The sum of working hours is 7, so it's correct.
There can be more than two overlapping records. I wrote a lot of loops, but I think this can be done more easily. It looks like a gaps-and-islands problem, but I don't know how to solve it.

You can try this:
with data as
(
  select exact_hour
       , card_no
       , emp_no
       , ( select count(1)
             from card_tab
            where exact_hour >= start_time
              and exact_hour < end_time ) cnt
    from
    (
      select distinct start_time + (level - 1)/24 exact_hour
           , tab.card_no
           , emp_no
        from card_tab tab
      connect by nocycle level <= (end_time - start_time) * 24
    )
)
select card_no, sum(1 / cnt)
  from data
 group by card_no
;
Some explanation:
(level - 1)
It is written that way because we refer only to the start of each hour.
exact_hour >= start_time and exact_hour < end_time
We can't use the BETWEEN keyword because in your data the start of each period equals the end of the previous one, so for example the period 22:00-23:00 would be counted in two start hours.
sum(1 / cnt)
We want to sum every fraction of an hour, so we divide each hour by the number of different cards that were active during it (I don't know the exact business case).
I suppose everything except those three points is clear.
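The divide-by-concurrency idea is easy to sanity-check outside the database. Here is a minimal Python sketch (not part of the answer above; the helper names are my own) that expands each record into its start-of-hour slots and credits each slot with 1/«number of overlapping cards», mirroring the cnt subquery and sum(1 / cnt):

```python
from datetime import datetime, timedelta
from collections import Counter, defaultdict

# (card_no, start_time, end_time) records, mirroring the sample data
records = [
    (1, datetime(2019, 11, 15, 20), datetime(2019, 11, 16, 3)),
    (2, datetime(2019, 11, 15, 22), datetime(2019, 11, 15, 23)),
]

def hour_slots(start, end):
    """Yield the start of every full hour covered by [start, end)."""
    t = start
    while t < end:
        yield t
        t += timedelta(hours=1)

# How many cards cover each hour slot (the cnt subquery)
coverage = Counter(slot for _, s, e in records for slot in hour_slots(s, e))

# Each card gets 1/coverage for every slot it occupies (sum(1 / cnt))
work_hours = defaultdict(float)
for card, s, e in records:
    for slot in hour_slots(s, e):
        work_hours[card] += 1 / coverage[slot]

print(dict(work_hours))  # {1: 6.5, 2: 0.5}
```

The totals still add up to 7 hours, just as in the SQL version.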

Related

Use generate_series

I'm writing a PL/pgSQL procedure that reads a source table, then aggregates and writes into an aggregate table.
My source table contains two columns, beg and end, which record when a client connects to the website and when the client disconnects.
I want to calculate, for each client, the time spent. The purpose of using generate_series is to handle events that span more than one day.
My pseudo-code is below:
execute $$SELECT MAX(date_) FROM $$||aggregate_table INTO max_date;
IF max_date is not NULL THEN
execute $$DELETE FROM $$||aggregate_table||$$ WHERE date_ >= $$||quote_literal(max_date);
ELSE
max_date := 'XXXXXXX';
END IF;
SELECT * from (
select
Id, gs.date_,
(case
When TRIM(set) ~ '^OPT[0-9]{3}/MINUTE/$'
Then 'minute'
When TRIM(set) ~ '^OPT[0-9]{3}/SECOND/$'
Then 'second'
end) as TIME,
sum(extract(epoch from (least(s.end, gs.date_ + interval '1 day') -
greatest(s.beg, gs.date_)
)
) / 60) as Timing
from source s cross join lateral
generate_series(date_trunc('day', s.beg), date_trunc('day',
least(s.end,
CASE WHEN $$||quote_literal(max_date)||$$ = 'XXXXXXX'
THEN (current_date)
ELSE $$||quote_literal(max_date)||$$
END)
), interval '1 day') gs(date_)
where ( (beg, end) overlaps ($$||quote_literal(max_date)||$$'00:00:00', $$||quote_literal(max_date)||$$'23:59:59'))
group by id, gs.date_, TIME
) as X
where ($$||quote_literal(max_date)||$$ = X.date_ and $$||quote_literal(max_date)||$$ != 'XXXXXXX')
OR ($$||quote_literal(max_date)||$$ = 'XXXXXXX')
Data in the source table:
number, beg, end, id, set
(10, '2019-10-25 13:00:00', '2019-10-25 13:30:00', 1234, 'OPT111/MINUTE/'),
(11, '2019-10-25 13:00:00', '2019-10-25 14:00:00', 1234, 'OPT111/MINUTE/'),
(12, '2019-11-04 09:19:00', '2019-11-04 09:29:00', 1124, 'OPT111/SECOND/'),
(13, '2019-11-04 22:00:00', '2019-11-05 02:00:00', 1124, 'OPT111/MINUTE/')
Expected output (aggregate table):
2019-10-25, 1234, MINUTE, 90(1h30)
2019-11-04, 1124, SECOND, 10
2019-11-04, 1124, MINUTE, 120
2019-11-05, 1124, MINUTE, 120
The problem with my code is that it doesn't work if a new row is added tomorrow, for example (14, '2019-11-06 12:00:00', '2019-11-06 13:00:00', 1124, 'OPT111/MINUTE/').
Can anyone help? Thank you.
Here is my solution. I have changed the column names in order to avoid reserved words. You may need to adjust the formatting of the duration.
with mycte as
(
select -- the first / first and only days
id, col_beg,
case when col_beg::date = col_end::date then col_end else date_trunc('day', col_end) end as col_end
from mytable
union all
select -- the last days of multi-day periods
id, date_trunc('day', col_end) as col_beg, col_end
from mytable
where col_end::date > col_beg::date
union all
select -- the middle days of multi-day periods
id, rd as col_beg, rd::date + 1 as col_end
from mytable
cross join lateral generate_series(col_beg::date + 1, col_end::date - 1, interval '1 day') g(rd)
where col_end::date > col_beg::date + 1
)
select
col_beg::date as start_time, id, sum(col_end - col_beg) as duration
from mycte group by 1, 2 order by 1;
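The per-day split that the three UNION ALL branches perform can be sketched in Python (an illustration of mine, not part of the answer; function names are invented). It cuts an interval at every midnight it crosses, which is exactly what the first-day, middle-day, and last-day branches do together:

```python
from datetime import datetime, timedelta

def split_by_day(beg, end):
    """Split [beg, end) into pieces that never cross midnight,
    mirroring the first/middle/last-day branches of the CTE."""
    pieces = []
    cur = beg
    while cur < end:
        next_midnight = datetime(cur.year, cur.month, cur.day) + timedelta(days=1)
        piece_end = min(end, next_midnight)
        pieces.append((cur, piece_end))
        cur = piece_end
    return pieces

# Row 13 of the sample data: 2019-11-04 22:00 -> 2019-11-05 02:00
pieces = split_by_day(datetime(2019, 11, 4, 22), datetime(2019, 11, 5, 2))
for b, e in pieces:
    print(b.date(), (e - b).total_seconds() / 60, "minutes")
# 2019-11-04 120.0 minutes
# 2019-11-05 120.0 minutes
```

This matches the expected output rows for id 1124 on 2019-11-04 and 2019-11-05 (120 minutes each).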

Select a date within a year in SQL Oracle

Given:
INSERT INTO EP_ACCESS (PROFILE_ID, EPISODE_ID, START_TIMESTAMP, DISCONNECT_TIMESTAMP)
VALUES ('1', '1', TO_DATE('2020-01-01 00:00:01','yyyy-mm-dd hh24:mi:ss'), TO_DATE('2020-01-01 00:00:02','yyyy-mm-dd hh24:mi:ss'));
How can I select the rows whose START_TIMESTAMP is in 2020?
You would use:
where start_timestamp >= date '2020-01-01' and
start_timestamp < date '2021-01-01'
Of course, you can use a timestamp literal if you prefer typing longer strings.
There are several options.
1 - Use BETWEEN
SELECT *
FROM EP_ACCESS
WHERE START_TIMESTAMP BETWEEN TO_DATE('2020-01-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
AND TO_DATE('2020-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')
or
SELECT *
FROM EP_ACCESS
WHERE START_TIMESTAMP BETWEEN DATE '2020-01-01'
AND DATE '2021-01-01' - INTERVAL '1' SECOND
2 - Use EXTRACT
SELECT *
FROM EP_ACCESS
WHERE EXTRACT(YEAR FROM START_TIMESTAMP) = 2020
3 - Use TRUNC
SELECT *
FROM EP_ACCESS
WHERE TRUNC(START_TIMESTAMP, 'YYYY') = DATE '2020-01-01'
Of these options, BETWEEN (or the equivalent half-open range in the first answer) will probably provide the best performance, as the other two apply a function to the START_TIMESTAMP column in every row, which prevents a regular index on that column from being used. Note also that if START_TIMESTAMP is a TIMESTAMP with fractional seconds, an upper bound of 23:59:59 can miss values in the final second; the half-open form (>= DATE '2020-01-01' AND < DATE '2021-01-01') avoids that edge case.
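The fractional-second edge case is easy to demonstrate outside SQL. A small Python sketch (my own illustration, with invented sample values) compares the half-open range against BETWEEN with a 23:59:59 upper bound:

```python
from datetime import datetime

# Sample timestamps, including one in the last second of the year
rows = [
    datetime(2020, 1, 1, 0, 0, 1),
    datetime(2020, 12, 31, 23, 59, 59, 500000),  # fractional seconds
    datetime(2021, 3, 1),
]

lo, hi = datetime(2020, 1, 1), datetime(2021, 1, 1)

# Half-open range: >= lo AND < hi
half_open = [r for r in rows if lo <= r < hi]

# BETWEEN with an inclusive 23:59:59 upper bound
between = [r for r in rows if lo <= r <= datetime(2020, 12, 31, 23, 59, 59)]

print(len(half_open))  # 2: both 2020 values are caught
print(len(between))    # 1: the fractional value slips through
```

For a plain Oracle DATE column (second granularity) the two forms are equivalent; the difference only matters for TIMESTAMP columns.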

SQL query (of minute time series data points) to get all data points within a given hour plus the first data point of the following hour?

I have a table of data values keyed by a stream id and a timestamp; each row represents one minute of data for a specific stream, and the table has many streams and many minutes.
I'm trying to query, over a set of streams, all data points within a given hour plus the (chronologically) first data point of the following hour (this is the part I'm having trouble with).
It's also tricky because any of the 60+1 minute rows could be missing, and I want that single following data point even if it is in the middle of the hour, as long as it's the first one. So I can't just query over '2019-12-06 00:00:00' - '2019-12-06 01:01:00'.
Sorry if this is unclear, but if you look at my examples, I think it will make sense.
I made a couple of attempts that work on my test cases, but I have a feeling they are not universal, or that there is a better way.
SELECT stream_id, time_stamp, my_data
FROM data_points_minutes
WHERE
time_stamp >= '2019-12-06 00:00:00'
AND time_stamp < '2019-12-06 01:00:00'
AND stream_id IN (123, 456, 789)
UNION
SELECT DISTINCT ON (stream_id) stream_id, time_stamp, my_data
FROM data_points_minutes
WHERE
time_stamp >= '2019-12-06 01:00:00'
AND time_stamp < '2019-12-06 02:00:00'
AND stream_id IN (123, 456, 789)
ORDER BY
stream_id, time_stamp;
This works for my test data, but I'm worried that the SELECT DISTINCT ON is only working because the rows are already sorted by timestamp and would fail if they weren't, which led me to:
SELECT *
FROM (
SELECT stream_id, time_stamp, my_data
FROM
data_points_minutes
WHERE
time_stamp >= '2019-12-06 00:00:00'
AND time_stamp < '2019-12-06 01:00:00'
AND stream_id IN (123, 456, 789)
) AS q1
UNION
SELECT *
FROM (
SELECT
DISTINCT ON (stream_id) stream_id, time_stamp, my_data
FROM
data_points_minutes
WHERE
time_stamp >= '2019-12-06 01:00:00'
AND time_stamp < '2019-12-06 02:00:00'
AND stream_id IN (123, 456, 789)
ORDER BY
stream_id, time_stamp ASC
) AS q2
ORDER BY
stream_id, time_stamp;
and I think this is mostly working, and I might go with it, but nesting this way seems a little awkward, so I'm hoping someone can suggest something more elegant.
You could OR the condition on the upper bound of the date range with an equality check on the next timestamp, which can be computed with a subquery:
select stream_id, time_stamp, my_data
from data_points_minutes
where
stream_id in (123, 456, 789)
and time_stamp >= '2019-12-06 00:00:00'
and (
time_stamp < '2019-12-06 01:00:00'
or time_stamp = (
select min(d1.time_stamp)
from data_points_minutes d1
where d1.stream_id in (123, 456, 789) and d1.time_stamp >= '2019-12-06 01:00:00'
)
)
Or possibly, if you want the next data point for each stream_id, you can correlate the subquery:
select stream_id, time_stamp, my_data
from data_points_minutes d
where
stream_id in (123, 456, 789)
and time_stamp >= '2019-12-06 00:00:00'
and (
time_stamp < '2019-12-06 01:00:00'
or time_stamp = (
select min(d1.time_stamp)
from data_points_minutes d1
where d1.stream_id = d.stream_id and d1.time_stamp >= '2019-12-06 01:00:00'
)
)
What you basically want is the minimum timestamp for each stream in the given row set (the selection from the next hour) and its argmin, the row on which that minimum is achieved. There are a few ways to solve this, but probably the most readable one uses window functions.
Here is a query which generates some test values:
WITH Data AS (
select * from (values
(NOW() , 1),
(NOW() + interval '1m', 1),
(NOW() + interval '1m', 2),
(NOW() + interval '2m', 2)
) T(ts, stream)
)
SELECT * FROM Data;
ts | stream
-------------------------------+--------
2019-12-14 01:08:07.556573+00 | 1
2019-12-14 01:09:07.556573+00 | 1
2019-12-14 01:09:07.556573+00 | 2
2019-12-14 01:10:07.556573+00 | 2
A query which calculates the minimum timestamp and its argmin for each stream:
WITH Data AS (
select * from (values
(NOW() , 1),
(NOW() + interval '1m', 1),
(NOW() + interval '1m', 2),
(NOW() + interval '2m', 2)
) T(ts, stream)
),
RankedData AS (
SELECT ts,
RANK() OVER (PARTITION BY stream ORDER BY ts),
stream
FROM Data
)
SELECT * FROM RankedData WHERE rank=1;
ts | rank | stream
-------------------------------+------+--------
2019-12-14 01:12:08.676228+00 | 1 | 1
2019-12-14 01:13:08.676228+00 | 1 | 2
If you build Data as the selection of rows from the next hour, it will solve your problem:
SELECT stream_id, time_stamp, my_data
FROM data_points_minutes
WHERE
time_stamp >= '2019-12-06 00:00:00'
AND time_stamp < '2019-12-06 01:00:00'
AND stream_id IN (123, 456, 789)
UNION (
WITH Data AS (
SELECT stream_id, time_stamp, my_data
FROM data_points_minutes
WHERE
time_stamp >= '2019-12-06 01:00:00'
AND time_stamp < '2019-12-06 02:00:00'
AND stream_id IN (123, 456, 789)
),
RankedData AS (
SELECT time_stamp, my_data,
RANK() OVER (PARTITION BY stream_id ORDER BY time_stamp),
stream_id
FROM Data
)
SELECT stream_id, time_stamp, my_data FROM RankedData WHERE rank=1
)
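The overall shape of the result ("everything in the hour, plus the earliest following point per stream") can be pinned down in a Python sketch (my own illustration with invented sample data, not the poster's table). The per-stream min over the next hour plays the role of DISTINCT ON / RANK() = 1:

```python
from datetime import datetime, timedelta

# (stream_id, time_stamp) sample points; any minute may be missing
points = [
    (123, datetime(2019, 12, 6, 0, 5)),
    (123, datetime(2019, 12, 6, 0, 59)),
    (123, datetime(2019, 12, 6, 1, 17)),   # first point of next hour, mid-hour
    (456, datetime(2019, 12, 6, 0, 30)),
    (456, datetime(2019, 12, 6, 1, 0)),
    (456, datetime(2019, 12, 6, 1, 45)),
]

hour = datetime(2019, 12, 6, 0)
nxt = hour + timedelta(hours=1)

# All points within the given hour
result = [(s, t) for s, t in points if hour <= t < nxt]

# Plus, per stream, the earliest point of the following hour
for stream in {s for s, _ in points}:
    later = [t for s, t in points
             if s == stream and nxt <= t < nxt + timedelta(hours=1)]
    if later:
        result.append((stream, min(later)))

result.sort()
print(result)
```

Stream 123 contributes its mid-hour 01:17 point, and stream 456 contributes only 01:00, its earliest point of the next hour.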

Get count of records in every hour in the last 24 hours

I need the number of records in each hour of the last 24 hours, and I need my query to show 0 if there are no records in a particular hour. Currently I can only get data for the hours that actually appear in the table.
SELECT TRUNC(systemdate,'HH24') + (trunc(to_char(systemdate,'mi')/10)*10)/24/60 AS date1,
count(*) AS txncount
FROM transactionlog
GROUP BY TRUNC(systemdate,'HH24') + (trunc(to_char(systemdate,'mi')/10)*10)/24/60 order by date1 desc;
What should I do to get the count for each hour of the last 24 hours?
Expected data: the record count in each hour for the last 24 hours, starting from the current date-time; if no record exists in a particular hour, 0 is shown.
The following might be what you need. It seems to work when I run it against the all_objects view.
WITH date_range
AS (SELECT TRUNC(sysdate - (rownum/24),'HH24') as the_hour
FROM dual
CONNECT BY ROWNUM <= 1000),
the_data
AS (SELECT TRUNC(created, 'HH24') as cr_ddl, count(*) as num_obj
FROM all_objects
GROUP BY TRUNC(created, 'HH24'))
SELECT TO_CHAR(dr.the_hour,'DD/MM/YYYY HH:MI AM'), NVL(num_obj,0)
FROM date_range dr LEFT OUTER JOIN the_data ao
ON ao.cr_ddl = dr.the_hour
ORDER BY dr.the_hour DESC
The 'date_range' generates a record for each hour (1000 hours in this demo; only 24 are needed for your case).
The 'the_data' does a count of the number of records in your target table based on the date truncated to the hour.
The main query then outer joins the two of them showing the date and the count from the sub-query.
I prefer both parts of the query in their own CTE because it makes the actual query very obvious and 'clean'.
In terms of your query, you want this:
WITH date_range
AS (SELECT TRUNC(sysdate - (rownum/24),'HH24') as the_hour
FROM dual
CONNECT BY ROWNUM <= 24),
the_data
AS (SELECT TRUNC(systemdate, 'HH24') as log_date, count(*) as num_obj
FROM transactionlog
GROUP BY TRUNC(systemdate, 'HH24'))
SELECT TO_CHAR(dr.the_hour,'DD/MM/YYYY HH:MI AM'), NVL(trans_log.num_obj,0)
FROM date_range dr LEFT OUTER JOIN the_data trans_log
ON trans_log.log_date = dr.the_hour
ORDER BY dr.the_hour DESC
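The calendar-plus-outer-join idea translates directly into any language. A Python sketch (the sample timestamps are invented, not from the question) builds the 24 hour buckets, counts per truncated hour, and zero-fills the missing buckets like the LEFT JOIN with NVL:

```python
from datetime import datetime, timedelta
from collections import Counter

now = datetime(2019, 12, 6, 15, 30)   # stand-in for sysdate

# Sample transaction-log timestamps (invented)
log = [
    datetime(2019, 12, 6, 14, 12),
    datetime(2019, 12, 6, 14, 48),
    datetime(2019, 12, 6, 9, 5),
]

def trunc_hour(t):
    """Like TRUNC(t, 'HH24')."""
    return t.replace(minute=0, second=0, microsecond=0)

# date_range: one bucket per hour for the past 24 hours
buckets = [trunc_hour(now) - timedelta(hours=h) for h in range(24)]

# the_data: counts per truncated hour
counts = Counter(trunc_hour(t) for t in log)

# LEFT OUTER JOIN + NVL(num_obj, 0): every bucket appears, empty ones as 0
result = [(b, counts.get(b, 0)) for b in buckets]
for b, c in result:
    print(b, c)
```

Hours with no transactions show 0 instead of disappearing, which is exactly what the original GROUP BY query couldn't do.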
You could use this:
WITH transactionlog AS
(
SELECT TO_DATE('03/05/2018 01:12','dd/mm/yyyy hh24:mi') AS systemdate, 60 AS value
FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 01:32','dd/mm/yyyy hh24:mi'), 35 FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 09:44','dd/mm/yyyy hh24:mi'), 31 FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 08:56','dd/mm/yyyy hh24:mi'), 24 FROM dual UNION ALL
SELECT TO_DATE('03/05/2018 08:02','dd/mm/yyyy hh24:mi'), 98 FROM dual
)
, time_range AS
(
SELECT TRUNC(sysdate, 'hh24') - 23/24 + (ROWNUM - 1) / 24 AS time1
FROM all_objects
WHERE ROWNUM <= 24
)
SELECT TO_CHAR(r.time1, 'mm/dd/yyyy hh:mi AM') AS date1,
COUNT(t.systemdate) AS txncount
FROM time_range r
LEFT JOIN transactionlog t
ON r.time1 = TRUNC(t.systemdate, 'hh24') --+ 1/24
GROUP BY r.time1
ORDER BY r.time1;
If a transaction at 01:12 AM should be counted in the 02:00 AM bucket instead, then uncomment the --+ 1/24.
Reference: Generating Dates between two date ranges_AskTOM
Edited: For OP, you only need this:
WITH time_range AS
(
SELECT TRUNC(sysdate, 'hh24') - 23/24 + (ROWNUM - 1) / 24 AS time1
FROM all_objects
WHERE ROWNUM <= 24
)
SELECT TO_CHAR(r.time1, 'mm/dd/yyyy hh:mi AM') AS date1,
COUNT(t.systemdate) AS txncount
FROM time_range r
LEFT JOIN transactionlog t
ON r.time1 = TRUNC(t.systemdate, 'hh24') --+ 1/24
GROUP BY r.time1
ORDER BY r.time1;
You need to build a last-24-hours calendar table, then LEFT JOIN the calendar table to the original table.
count(t.systemdate): count t.systemdate (not *) because t.systemdate might be NULL for empty hours.
connect by: creates the last-24-hours calendar table.
The on clause TO_CHAR(t.systemdate,'YYYY/MM/DD hh24','nls_language=american') makes sure both sides use the same date-format language.
You can try this.
WITH Hours as
(
select sysdate - (level/24) dates
from dual
connect by level <= 24
)
SELECT TO_CHAR(h.dates,'YYYY-MM-DD HH24') AS dateHour, count(t.systemdate) AS totalcount
FROM Hours h
LEFT JOIN transactionlog t
on TO_CHAR(t.systemdate,'YYYY/MM/DD hh24','nls_language=american')
= TO_CHAR(h.dates,'YYYY/MM/DD hh24','nls_language=american')
GROUP BY h.dates
ORDER BY h.dates
sqlfiddle:http://sqlfiddle.com/#!4/73db7/2
Recursive CTE version
You can also use a recursive CTE to build the calendar table:
WITH Hours(dates,i) as
(
SELECT sysdate,1
FROM DUAL
UNION ALL
SELECT sysdate - (i/24),i+1
FROM Hours
WHERE i<24
)
SELECT TO_CHAR(h.dates,'YYYY-MM-DD HH24') AS dateHour, count(t.systemdate) AS totalcount
FROM Hours h
LEFT JOIN transactionlog t
on TO_CHAR(t.systemdate,'YYYY/MM/DD hh24','nls_language=american')
= TO_CHAR(h.dates,'YYYY/MM/DD hh24','nls_language=american')
GROUP BY h.dates
ORDER BY h.dates
sqlfiddle:http://sqlfiddle.com/#!4/73db7/7

Count max number of overlapping timeranges in a timerange

Let me explain it with an example. We have 5 events (each with a start and end date) which partly overlap:
create table event (
id integer primary key,
date_from date,
date_to date
);
--
insert into event (id, date_from, date_to) values (1, to_date('01.01.2016', 'DD.MM.YYYY'), to_date('03.01.2016 23:59:59', 'DD.MM.YYYY HH24:MI:SS'));
insert into event (id, date_from, date_to) values (2, to_date('05.01.2016', 'DD.MM.YYYY'), to_date('08.01.2016 23:59:59', 'DD.MM.YYYY HH24:MI:SS'));
insert into event (id, date_from, date_to) values (3, to_date('03.01.2016', 'DD.MM.YYYY'), to_date('05.01.2016 23:59:59', 'DD.MM.YYYY HH24:MI:SS'));
insert into event (id, date_from, date_to) values (4, to_date('03.01.2016', 'DD.MM.YYYY'), to_date('03.01.2016 23:59:59', 'DD.MM.YYYY HH24:MI:SS'));
insert into event (id, date_from, date_to) values (5, to_date('05.01.2016', 'DD.MM.YYYY'), to_date('07.01.2016 23:59:59', 'DD.MM.YYYY HH24:MI:SS'));
--
commit;
Here the events visualized:
1.JAN 2.JAN 3.JAN 4.JAN 5.JAN 6.JAN 7.JAN 8.JAN
---------1--------- ------------2-------------
---------3---------
--4-- ---------5---------
Now I would like to select the maximum number of events which overlap in a given timerange.
For the timerange 01.01.2016 00:00:00 - 08.01.2016 23:59:59 the result should be 3 because max 3 events overlap (between 03.01.2016 00:00:00 - 03.01.2016 23:59:59 and between 05.01.2016 00:00:00 - 05.01.2016 23:59:59).
For the timerange 06.01.2016 00:00:00 - 08.01.2016 23:59:59 the result should be 2 because max 2 events overlap (between 06.01.2016 00:00:00 - 07.01.2016 23:59:59).
Would there be a (performant) solution in SQL? I am thinking about performance because there could be many events in a wide timerange.
Update #1
I like MTO's answer most. It even works for the timerange 01.01.2016 00:00:00 - 01.01.2016 23:59:59. I adapted the SQL to my exact needs:
select max(num_events)
from (
select sum(startend) over (order by dt) num_events
from (
select e1.date_from dt,
1 startend
from event e1
where e1.date_to >= :date_from
and e1.date_from <= :date_to
union all
select e2.date_to dt,
-1 startend
from event e2
where e2.date_to >= :date_from
and e2.date_from <= :date_to
)
);
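The +1/−1 sweep behind this query can be sketched in Python (the event data is taken from the question; the function name is my own):

```python
from datetime import datetime

# (date_from, date_to) pairs from the sample data
events = [
    (datetime(2016, 1, 1), datetime(2016, 1, 3, 23, 59, 59)),
    (datetime(2016, 1, 5), datetime(2016, 1, 8, 23, 59, 59)),
    (datetime(2016, 1, 3), datetime(2016, 1, 5, 23, 59, 59)),
    (datetime(2016, 1, 3), datetime(2016, 1, 3, 23, 59, 59)),
    (datetime(2016, 1, 5), datetime(2016, 1, 7, 23, 59, 59)),
]

def max_overlap(events, range_from, range_to):
    # Keep only events touching the range (the :date_from/:date_to filter)
    pts = []
    for s, e in events:
        if e >= range_from and s <= range_to:
            pts.append((s, 1))    # event starts: +1
            pts.append((e, -1))   # event ends:   -1
    # Sweep in time order, like the running SUM(startend) OVER (ORDER BY dt);
    # starts sort before ends on ties, so touching events count as overlapping
    pts.sort(key=lambda p: (p[0], -p[1]))
    best = cur = 0
    for _, delta in pts:
        cur += delta
        best = max(best, cur)
    return best

print(max_overlap(events, datetime(2016, 1, 1), datetime(2016, 1, 8, 23, 59, 59)))  # 3
print(max_overlap(events, datetime(2016, 1, 6), datetime(2016, 1, 8, 23, 59, 59)))  # 2
```

The running counter peaks at 3 on 3 January and at 2 in the 6-8 January window, matching the expected results in the question.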
This will get all the time ranges and the count of events occurring within those ranges:
SELECT *
FROM (
SELECT dt AS date_from,
LEAD( dt ) OVER ( ORDER BY dt ) AS date_to,
SUM( startend ) OVER ( ORDER BY dt ) AS num_events
FROM (
SELECT date_from AS dt, 1 AS startend FROM event
UNION ALL
SELECT date_to, -1 FROM event
)
)
WHERE date_from < date_to;
If you only need the number and don't need to work with more precise time values, it is just something like this:
SELECT MAX(c) max_overlap FROM
(SELECT d, COUNT(1) c
FROM
(SELECT date_from d FROM event
UNION ALL
SELECT date_to FROM event
) A
GROUP BY A.d
) B
Otherwise you will need to use recursion etc.
You will have to break up the problem into several sub-problems:
Find all affected events within your given timerange
Find all starts of the affected events
For each start, look how many events are overlapping at this time
Find the maximum of these overlaps
You could try the following query, in which these sub-problems are modeled as WITH clauses (common table expressions):
with myinterval as (
select to_date('2016-01-01 0:00:00', 'yyyy-mm-dd hh24:mi:ss') as date_from,
to_date('2016-01-08 23:59:59', 'yyyy-mm-dd hh24:mi:ss') as date_to
from dual
), affected_events as (
select *
from event e
where wm_overlaps(
wm_period(myinterval.date_from, myinterval.date_to),
wm_period(e.date_from, e.date_to)
) = 1
), starts as (
select distinct date_from from affected_events
), overlapped as (
select starts.date_from, count(*) as cnt
from affected_events ae
join starts on (wm_overlaps(wm_period(starts.date_from, starts.date_from+0.001), wm_period(ae.date_from, ae.date_to))= 1)
group by starts.date_from
)
select max(cnt) from overlapped
This will return all peaks of overlaps of events, including the start of the peak and the end.
select distinct
max(e1.date_from),
case when
min(e1.date_to) < min(e2.date_to)
then
min(e1.date_to)
else
min(e2.date_to)
end,
count(1) + 1
from event e1 inner join event e2 on (e2.date_from <= e1.date_from and e2.date_to >= e1.date_from and e1.id != e2.id)
group by e1.ID;