Oracle SQL query to group consecutive records by date

With the sample data below, I am trying to group consecutive records that have the same rate.
id start_date end_date rate
-----------------------------------------------------------------
1 01/01/2017 12:00:00 am 01/01/2017 12:00:00 am 300
1 02/01/2017 12:00:00 am 02/01/2017 12:00:00 am 300
1 03/01/2017 12:00:00 am 03/01/2017 12:00:00 am 300
1 04/01/2017 12:00:00 am 04/01/2017 12:00:00 am 1000
1 05/01/2017 12:00:00 am 05/01/2017 12:00:00 am 500
1 06/01/2017 12:00:00 am 06/01/2017 12:00:00 am 500
1 07/01/2017 12:00:00 am 07/01/2017 12:00:00 am 1000
1 08/01/2017 12:00:00 am 08/01/2017 12:00:00 am 1000
1 09/01/2017 12:00:00 am 09/01/2017 12:00:00 am 300
What I've tried:
select distinct id, mn_date, mx_date,rate
from (
select id, min(start_date) over (partition by grp order by start_date) mn_date,
max(end_date) over(partition by grp order by start_date desc) mx_date, rate
from (
select t.*, row_number() over(partition by id order by start_date) -row_number() over(partition by rate order by start_date)grp
from t
)
)
order by mn_date;
Output :
id mn_date mx_date rate
--------------------------------------------------------
1 01/01/2017 12:00:00 am 03/01/2017 12:00:00 am 300
1 04/01/2017 12:00:00 am 04/01/2017 12:00:00 am 1000
1 05/01/2017 12:00:00 am 06/01/2017 12:00:00 am 500
1 07/01/2017 12:00:00 am 09/01/2017 12:00:00 am 300
1 07/01/2017 12:00:00 am 09/01/2017 12:00:00 am 1000
Desired Output:
id mn_date mx_date rate
--------------------------------------------------------
1 01/01/2017 12:00:00 am 03/01/2017 12:00:00 am 300
1 04/01/2017 12:00:00 am 04/01/2017 12:00:00 am 1000
1 05/01/2017 12:00:00 am 06/01/2017 12:00:00 am 500
1 07/01/2017 12:00:00 am 08/01/2017 12:00:00 am 1000
1 09/01/2017 12:00:00 am 09/01/2017 12:00:00 am 300
Final result to group by consecutive dates (thanks to Gordon):
select id, min(start_date), max(end_date), rate
from (
select id, start_date, end_date, rate, seqnum_i-seqnum_ir grp, sum(x) over(partition by id order by start_date) grp1
from (
select t.*,
row_number() over (partition by id order by start_date) as seqnum_i,
row_number() over (partition by id, rate order by start_date) as seqnum_ir,
case when LEAD(start_date) over (partition by id order by start_date)= end_date + 1
then 0
else 1
end x
from t
)
)
group by id, grp+grp1, rate
order by min(start_date);

Assuming we can just use start_date to identify the adjacent records (i.e., there are no gaps), then you can use the difference of row numbers approach:
select id, min(start_date) as mn_date, max(end_date) as mx_date, rate
from (select t.*,
row_number() over (partition by id order by start_date) as seqnum_i,
row_number() over (partition by id, rate order by start_date) as seqnum_ir
from t
) t
group by id, (seqnum_i - seqnum_ir), rate;
To see how this works, look at the results of the subquery. You should be able to "see" how the difference of the two row numbers defines the groups of adjacent records with the same rate.
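If it helps to trace the idea outside the database, here is a minimal Python sketch of the same difference-of-row-numbers logic, run on the sample rows from the question. The function name and the tuple layout are made up for illustration, and the dates are assumed to be dd/mm/yyyy (nine consecutive days in January 2017).

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (id, start_date, end_date, rate).
rates = [300, 300, 300, 1000, 500, 500, 1000, 1000, 300]
rows = [(1, date(2017, 1, d), date(2017, 1, d), r)
        for d, r in enumerate(rates, start=1)]


def group_by_row_number_difference(rows):
    """Return (id, min start, max end, rate) for each run of adjacent equal rates."""
    seq_i = defaultdict(int)   # row_number() over (partition by id order by start_date)
    seq_ir = defaultdict(int)  # row_number() over (partition by id, rate order by start_date)
    islands = {}               # (id, rate, seq_i - seq_ir) -> [min start, max end]
    for rid, start, end, rate in sorted(rows, key=lambda r: (r[0], r[1])):
        seq_i[rid] += 1
        seq_ir[(rid, rate)] += 1
        key = (rid, rate, seq_i[rid] - seq_ir[(rid, rate)])
        if key not in islands:
            islands[key] = [start, end]
        else:
            islands[key][0] = min(islands[key][0], start)
            islands[key][1] = max(islands[key][1], end)
    out = [(k[0], v[0], v[1], k[1]) for k, v in islands.items()]
    return sorted(out, key=lambda r: (r[0], r[1]))


result = group_by_row_number_difference(rows)
for row in result:
    print(row)
```

The difference stays constant while the rate repeats on adjacent rows and changes as soon as another rate interrupts the run, which is why the two separate runs of 1000 and of 300 end up in different groups.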

I found that the last value wasn't being grouped correctly as the calculation of X wasn't handling the NULL return, so I changed it to this:
,CASE
WHEN LEAD (start_date)
OVER (PARTITION BY id ORDER BY start_date)
IS NULL
THEN
0
WHEN LEAD (start_date)
OVER (PARTITION BY id ORDER BY start_date) =
end_date + 1
THEN
0
ELSE
1
END
x

Related

Bigquery merge row where start date for one row is the end date for another

In BigQuery, I have a customer table that records how much each customer spent between a start date and an end date, like this:
id start_date end_date   amount
-----------------------------------
1  2022-01-01 2022-01-10 100
1  2022-01-10 2022-01-15 30
1  2022-02-10 2022-02-18 10
1  2022-02-18 2022-02-20 30
1  2022-02-20 2022-02-25 50
1  2022-02-18 2022-03-20 5000
2  2022-01-12 2022-01-15 30
2  2022-01-15 2022-01-27 30
And I would like to have this:
id start_date end_date   amount
-----------------------------------
1  2022-01-01 2022-01-15 130
1  2022-02-10 2022-02-25 90
1  2022-02-18 2022-03-20 5000
2  2022-01-12 2022-01-27 60
The catch is that there can be multiple contiguous rows for the same id. When a merge is possible, we want to merge with the row that has the smallest time interval. In the example, the row with id=1, start_date=2022-02-18, end_date=2022-03-20 is therefore not merged.
Consider the approach below:
select id, min(start_date) start_date, max(end_date) end_date, sum(amount) amount
from (
select *, countif(ifnull(new_group, true)) over (partition by id order by end_date) grp
from (
select *, start_date != lag(end_date) over(partition by id order by end_date) new_group
from your_table
)
)
group by id, grp
Applied to the sample data in your question, this returns the four rows of the desired output.
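To see why the query groups rows the way it does, here is a rough Python sketch of the same logic on the question's sample rows: order by end_date within each id, start a new group whenever start_date differs from the previous row's end_date (the lag), and aggregate per group. The function name and tuple layout are assumptions for illustration only.

```python
from datetime import date

# Sample rows from the question: (id, start_date, end_date, amount).
rows = [
    (1, date(2022, 1, 1), date(2022, 1, 10), 100),
    (1, date(2022, 1, 10), date(2022, 1, 15), 30),
    (1, date(2022, 2, 10), date(2022, 2, 18), 10),
    (1, date(2022, 2, 18), date(2022, 2, 20), 30),
    (1, date(2022, 2, 20), date(2022, 2, 25), 50),
    (1, date(2022, 2, 18), date(2022, 3, 20), 5000),
    (2, date(2022, 1, 12), date(2022, 1, 15), 30),
    (2, date(2022, 1, 15), date(2022, 1, 27), 30),
]


def merge_chained(rows):
    """Merge rows whose start_date equals the previous row's end_date (same id)."""
    merged = []
    prev_id = prev_end = None
    for rid, start, end, amount in sorted(rows, key=lambda r: (r[0], r[2])):
        # New group: first row of an id, or start_date != lag(end_date).
        if rid != prev_id or start != prev_end:
            merged.append([rid, start, end, amount])
        else:
            merged[-1][1] = min(merged[-1][1], start)
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3] += amount
        prev_id, prev_end = rid, end
    return [tuple(m) for m in merged]


merged = merge_chained(rows)
for row in merged:
    print(row)
```

Because the rows are ordered by end_date, the long 2022-02-18 to 2022-03-20 row comes last for id 1 and does not chain onto anything, so it stays on its own.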

Write a query in which rate was not change in between these dates

Here are the input and the expected output.
Write a query that returns the date ranges during which the rate did not change.
input | output
===========================|===========================================
date rate | startdate end date rate
2014-09-18 270 | 2014-09-18 2014-09-19 270
2014-09-19 270 | 2014-09-20 2014-09-22 310
2014-09-20 310 | 2014-09-23 2014-09-23 320
2014-09-21 310 | 2014-09-24 2014-09-24 310
2014-09-22 310 | 2014-09-25 2014-09-25 320
2014-09-23 320 | 2014-09-26 2014-09-26 270
2014-09-24 310 |
2014-09-25 320 |
2014-09-26 270 |
This is a gaps and islands problem. One solution uses the difference in row numbers method:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (ORDER BY date) rn1,
ROW_NUMBER() OVER (PARTITION BY rate ORDER BY date) rn2
FROM yourTable
)
SELECT MIN(date) AS startdate, MAX(date) AS enddate, rate
FROM cte
GROUP BY rate, rn1-rn2
ORDER BY startdate;
with data as
(
select 1 as Id, '2021/10/01' as date
union all
select 1 as Id, '2021/10/02' as date
union all
select 1 as Id, '2021/10/03' as date
union all
select 2 as Id, '2021/10/04' as date
union all
select 2 as Id, '2021/10/05' as date
union all
select 2 as Id, '2021/10/06' as date
)
SELECT
DISTINCT
Id,
FIRST_VALUE(date) OVER (PARTITION BY Id ORDER BY date ) as FirstValue,
LAST_VALUE (date) OVER (PARTITION BY Id ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as LastValue
from data
GROUP BY id, date

How to count the number of entries between a time period

I have a sample table below which shows the ticket number, time when the ticket was opened and time when it was closed.
TKTNUM OPEN_DATE CLOSE_DATE
1234 12-Mar-19 08:36 14-Mar-19 08:36
1235 13-Mar-19 08:36 15-Mar-19 08:36
1236 14-Mar-19 08:36 16-Mar-19 08:36
1237 15-Mar-19 08:36
1238 16-Mar-19 08:36
1239 17-Mar-19 08:36
1240 18-Mar-19 08:36 20-Mar-19 08:36
1241 19-Mar-19 08:36 20-Mar-19 08:36
1242 20-Mar-19 08:36 21-Mar-19 08:36
I need to count the number of open/closed tickets on a given day...
DATE OPEN CLOSED
12-Mar-19 08:36 1 0
13-Mar-19 08:36 2 0
14-Mar-19 08:36 2 1
15-Mar-19 08:36 2 2
16-Mar-19 08:36 2 3
17-Mar-19 08:36 3 3
18-Mar-19 08:36 4 3
19-Mar-19 08:36 5 3
20-Mar-19 08:36 4 5
Any help is greatly appreciated. Thanks
I used the query (c/o Tejash) below on a sample job_history table:
EMPLOYEE_ID START_DATE END_DATE JOB_ID DEPARTMENT_ID
----------- -------------------- -------------------- ---------- -------------
200 17/SEP/1995 00:00:00 17/JUN/2001 00:00:00 AD_ASST 90
101 21/SEP/1997 00:00:00 27/OCT/2001 00:00:00 AC_ACCOUNT 110
102 13/JAN/2001 00:00:00 24/JUL/2006 00:00:00 IT_PROG 60
101 28/OCT/2001 00:00:00 15/MAR/2005 00:00:00 AC_MGR 110
200 01/JUL/2002 00:00:00 31/DEC/2006 00:00:00 AC_ACCOUNT 90
201 17/FEB/2004 00:00:00 19/DEC/2007 00:00:00 MK_REP 20
114 24/MAR/2006 00:00:00 31/DEC/2007 00:00:00 ST_CLERK 50
176 24/MAR/2006 00:00:00 31/DEC/2006 00:00:00 SA_REP 80
176 01/JAN/2007 00:00:00 31/DEC/2007 00:00:00 SA_MAN 80
122 01/JAN/2007 00:00:00 31/DEC/2007 00:00:00 ST_CLERK 50
With dates(dt)
As (Select mindt + level - 1 from
(Select min(start_date) mindt, max(end_date) maxdt from job_history)
Connect by level <= maxdt - mindt + 1)
Select dt,
sum(case when dt between start_date and coalesce(end_date,dt) then 1 end) as startdate,
Sum(case when dt >= end_date then 1 end) as enddate
From dates cross join job_history
Group by dt
Order by dt desc
On 17/JUN/2001, the query gave
DT STARTDATE ENDDATE
-------------------- ---------- ----------
31/DEC/2007 00:00:00 3 10
<SNIPPED>
17/JUN/2001 00:00:00 3 1
Instead of
DT STARTDATE ENDDATE
-------------------- ---------- ----------
31/DEC/2007 00:00:00 3 10
<SNIPPED>
17/JUN/2001 00:00:00 2 1
I tried to edit the query and now it's giving me
DT STARTDATE ENDDATE
-------------------- ---------- ----------
31/DEC/2007 00:00:00 <<< 10
<snipped>
18/JUN/2001 00:00:00 2 1
17/JUN/2001 00:00:00 2 <<< 1
16/JUN/2001 00:00:00 3 1
You can use a dates CTE to generate every day in the range and cross join it with the same table, as follows:
With dates(dt)
As
(
Select mindt + level - 1 from
(Select min(open_date) mindt, max(open_date) maxdt from your_table)
Connect by level <= maxdt - mindt + 1
)
Select dt,
sum(case when dt between open_date and coalesce(close_date,dt) then 1 end) as open,
Sum(case when dt >= close_date then 1 end) as closed
From dates cross join your_table
Group by dt;
Cheers!!
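For anyone who wants to check the calendar approach against the question's numbers, here is a small Python sketch of the same idea (one row per day, compared against every ticket). One deliberate difference, and it is my assumption rather than part of the query above: a ticket is treated as open only while open_date <= dt < close_date, whereas BETWEEN also counts it as open on the day it closes. With that half-open comparison the sketch reproduces the expected table from the question.

```python
from datetime import datetime, timedelta


def ts(day):
    """Timestamp on the given day of March 2019 at 08:36, as in the sample data."""
    return datetime(2019, 3, day, 8, 36)


# (TKTNUM, OPEN_DATE, CLOSE_DATE); None means the ticket is still open.
tickets = [
    (1234, ts(12), ts(14)), (1235, ts(13), ts(15)), (1236, ts(14), ts(16)),
    (1237, ts(15), None), (1238, ts(16), None), (1239, ts(17), None),
    (1240, ts(18), ts(20)), (1241, ts(19), ts(20)), (1242, ts(20), ts(21)),
]


def daily_counts(tickets):
    """Return (dt, open, closed) for each day from the first to the last open date."""
    dt = min(t[1] for t in tickets)
    last = max(t[1] for t in tickets)
    out = []
    while dt <= last:
        # Half-open interval: a ticket stops being open at its close time.
        open_n = sum(1 for _, o, c in tickets if o <= dt and (c is None or dt < c))
        closed_n = sum(1 for _, _, c in tickets if c is not None and c <= dt)
        out.append((dt, open_n, closed_n))
        dt += timedelta(days=1)
    return out


counts = daily_counts(tickets)
for dt, open_n, closed_n in counts:
    print(dt.strftime("%d-%b-%y %H:%M"), open_n, closed_n)
```

In SQL terms the equivalent change would be something like `dt >= open_date and (close_date is null or dt < close_date)` in the first CASE, but I have not run that against Oracle.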
You can unpivot and aggregate:
select dte, sum(is_open) as num_opens, sum(is_close) as num_closes
from ((select open_date as dte, 1 as is_open, 0 as is_close
from t
) union all
(select close_date, 0 as is_open, 1 as is_close
from t
)
) t
group by dte
order by dte;
Note: It is probably a good idea to truncate the date so it has no time component:
select trunc(dte), sum(is_open) as num_opens, sum(is_close) as num_closes
from ((select open_date as dte, 1 as is_open, 0 as is_close
from t
) union all
(select close_date, 0 as is_open, 1 as is_close
from t
)
) t
where dte is not null
group by trunc(dte)
order by trunc(dte);
And in Oracle 12C you can use a lateral join for this:
select trunc(dte), sum(is_open), sum(is_close)
from t cross join lateral
(select t.open_date as dte, 1 as is_open, 0 as is_close from dual union all
select t.close_date, 0 as is_open, 1 as is_close from dual
) t
group by trunc(dte)
order by trunc(dte);
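Here is a rough Python sketch of the unpivot-and-aggregate idea on the question's tickets, with dates already truncated to the day. The running totals at the end are an extra step that is not in the queries above; I added them only to show how the per-day event counts could be turned into the open/closed figures the question asks for.

```python
from collections import Counter
from datetime import date


def d(day):
    """A day in March 2019."""
    return date(2019, 3, day)


# (open_date, close_date) per ticket; None means not closed yet.
tickets = [
    (d(12), d(14)), (d(13), d(15)), (d(14), d(16)),
    (d(15), None), (d(16), None), (d(17), None),
    (d(18), d(20)), (d(19), d(20)), (d(20), d(21)),
]

# "Unpivot": one event per open date and one per non-null close date.
opens = Counter(o for o, _ in tickets)
closes = Counter(c for _, c in tickets if c is not None)

rows = []
total_opened = total_closed = 0
for day in sorted(set(opens) | set(closes)):
    total_opened += opens[day]
    total_closed += closes[day]
    # (day, num_opens, num_closes, currently open, closed so far)
    rows.append((day, opens[day], closes[day],
                 total_opened - total_closed, total_closed))

for row in rows:
    print(row)
```

Skipping the None close dates plays the same role as the `where dte is not null` filter.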

Calculating concurrency from a set of ranges

I have a set of rows containing a start timestamp and a duration. I want to perform various summaries using the overlap or concurrency.
For example: peak daily concurrency, peak concurrency grouped on another column.
Example data:
timestamp,duration
2016-01-01 12:00:00,300
2016-01-01 12:01:00,300
2016-01-01 12:06:00,300
I would like to know that peak for the period was 12:01:00-12:05:00 at 2 concurrent.
Any ideas on how to achieve this using BigQuery or, less exciting, a Map/Reduce job?
For a per-minute resolution, with session lengths of up to 255 minutes:
SELECT session_minute, COUNT(*) c
FROM (
SELECT start, DATE_ADD(start, i, 'MINUTE') session_minute FROM (
SELECT * FROM (
SELECT TIMESTAMP("2015-04-30 10:14") start, 7 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:15") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:15") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:18") start, 12 minutes
),(
SELECT TIMESTAMP("2015-04-30 10:23") start, 3 minutes
)
) a
CROSS JOIN [fh-bigquery:public_dump.numbers_255] b
WHERE a.minutes>b.i
)
GROUP BY 1
ORDER BY 1
STEP 1 - First you need to find all periods (start and end) with their respective concurrent entries
SELECT ts AS start, LEAD(ts) OVER(ORDER BY ts) AS finish,
SUM(entry) OVER(ORDER BY ts) AS concurrent_entries
FROM (
SELECT ts, SUM(entry)AS entry
FROM
(SELECT ts, 1 AS entry FROM yourTable),
(SELECT DATE_ADD(ts, duration, 'second') AS ts, -1 AS entry FROM yourTable)
GROUP BY ts
HAVING entry != 0
)
ORDER BY ts
Assuming input as below
(SELECT TIMESTAMP('2016-01-01 12:00:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:01:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:06:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:07:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:10:00') AS ts, 300 AS duration),
(SELECT TIMESTAMP('2016-01-01 12:11:00') AS ts, 300 AS duration)
the output of the above query will look something like this:
start finish concurrent_entries
2016-01-01 12:00:00 UTC 2016-01-01 12:01:00 UTC 1
2016-01-01 12:01:00 UTC 2016-01-01 12:05:00 UTC 2
2016-01-01 12:05:00 UTC 2016-01-01 12:07:00 UTC 1
2016-01-01 12:07:00 UTC 2016-01-01 12:10:00 UTC 2
2016-01-01 12:10:00 UTC 2016-01-01 12:12:00 UTC 3
2016-01-01 12:12:00 UTC 2016-01-01 12:15:00 UTC 2
2016-01-01 12:15:00 UTC 2016-01-01 12:16:00 UTC 1
2016-01-01 12:16:00 UTC null 0
You might still want to polish above query a little - but mainly it does what you need
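To make the mechanics of STEP 1 easier to follow, here is a minimal Python sketch of the same +1/-1 event technique on the six sample rows assumed above. The names are illustrative only. It sums the entries per timestamp, drops timestamps whose net change is zero (the HAVING clause), and keeps a running total.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# (ts, duration in seconds), matching the assumed input above.
sessions = [(datetime(2016, 1, 1, 12, m), 300) for m in (0, 1, 6, 7, 10, 11)]


def concurrency_periods(sessions):
    """Return (start, finish, concurrent_entries) for each period."""
    delta = defaultdict(int)
    for start, seconds in sessions:
        delta[start] += 1                                # session begins
        delta[start + timedelta(seconds=seconds)] -= 1   # session ends
    # Keep only timestamps where the level actually changes (HAVING entry != 0).
    points = sorted(t for t, change in delta.items() if change != 0)
    periods = []
    running = 0
    for i, t in enumerate(points):
        running += delta[t]
        finish = points[i + 1] if i + 1 < len(points) else None  # LEAD(ts)
        periods.append((t, finish, running))
    return periods


periods = concurrency_periods(sessions)
peak = max(periods, key=lambda p: p[2])
for p in periods:
    print(p)
print("peak:", peak)
```

On this input the peak is the 12:10 to 12:12 period with 3 concurrent entries, the same row that ranks first in the output table.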
STEP 2 - now you can do any stats off of above result
For example peak on whole period:
SELECT
start, finish, concurrent_entries, RANK() OVER(ORDER BY concurrent_entries DESC) AS peak
FROM (
SELECT ts AS start, LEAD(ts) OVER(ORDER BY ts) AS finish,
SUM(entry) OVER(ORDER BY ts) AS concurrent_entries
FROM (
SELECT ts, SUM(entry)AS entry FROM
(SELECT ts, 1 AS entry FROM yourTable),
(SELECT DATE_ADD(ts, duration, 'second') AS ts, -1 AS entry FROM yourTable)
GROUP BY ts
HAVING entry != 0
)
)
ORDER BY peak

how to query the count of records from first day to last day of the month

I would like to get the number of records for each day from my table.
For example I have a table “Employee” with the following fields ID, EmpNo, DateHired.
And I have the following records
ID EmpNo DateHired
1 000001 3/2/2013 12:00:00 AM
2 000002 3/14/2013 12:00:00 AM
3 000003 3/14/2013 12:00:00 AM
4 000004 3/21/2013 12:00:00 AM
5 000005 4/2/2013 12:00:00 AM
6 000006 4/3/2013 12:00:00 AM
7 000007 4/3/2013 12:00:00 AM
8 000008 4/3/2013 12:00:00 AM
9 000009 4/3/2013 12:00:00 AM
10 000010 4/4/2013 12:00:00 AM
11 000011 4/5/2013 12:00:00 AM
12 000012 5/1/2013 12:00:00 AM
If the current month is April, how can I get this result:
Count Day
0 4/1/2013 12:00:00 AM
1 4/2/2013 12:00:00 AM
4 4/3/2013 12:00:00 AM
1 4/4/2013 12:00:00 AM
1 4/5/2013 12:00:00 AM
0 4/6/2013 12:00:00 AM
0 4/7/2013 12:00:00 AM
0 4/8/2013 12:00:00 AM
Up to
0 4/30/2013 12:00:00 AM
You need to create a calendar for the whole month of April in order to get every date in the month. You can build it with a recursive Common Table Expression.
After creating the calendar, join it with the Employee table using a LEFT JOIN so that dates with no matching rows in Employee are still included in the result.
WITH April_Calendar
AS
(
SELECT CAST('20130401' as datetime) AS [date]
UNION ALL
SELECT DATEADD(dd, 1, [date])
FROM April_Calendar
WHERE DATEADD(dd, 1, [date]) <= '20130430'
)
SELECT a.date, COUNT(b.DateHired) totalCount
FROM April_Calendar a
LEFT JOIN Employee b
ON a.date = b.DateHired
GROUP BY a.date
ORDER BY a.date
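The calendar-plus-LEFT-JOIN idea can be sketched in a few lines of Python as well, using the DateHired values from the question. The function name is hypothetical. Every day in the range gets a row, and days without hires count as 0.

```python
from collections import Counter
from datetime import date, timedelta

# DateHired values from the question's Employee table.
hired = [
    date(2013, 3, 2), date(2013, 3, 14), date(2013, 3, 14), date(2013, 3, 21),
    date(2013, 4, 2), date(2013, 4, 3), date(2013, 4, 3), date(2013, 4, 3),
    date(2013, 4, 3), date(2013, 4, 4), date(2013, 4, 5), date(2013, 5, 1),
]


def counts_per_day(hired, first, last):
    """Return (day, count) for every day from first to last inclusive."""
    per_day = Counter(hired)  # a missing day yields 0, like COUNT over a LEFT JOIN miss
    n_days = (last - first).days + 1
    calendar = [first + timedelta(days=i) for i in range(n_days)]
    return [(day, per_day[day]) for day in calendar]


april = counts_per_day(hired, date(2013, 4, 1), date(2013, 4, 30))
for day, count in april:
    print(count, day)
```

The March and May hires fall outside the calendar, so they never show up in the result.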
Try
SELECT COUNT(*) as 'Count', DateHired as 'Day'
FROM Employee
WHERE DateHired BETWEEN %date1 AND %date2
GROUP BY DateHired
Untested, should work though.
This could be the query...
SELECT COUNT(ID) as Count, DateHired FROM Employee GROUP BY DateHired
Following can be helpful in the case...
http://www.sqlite.org/lang_datefunc.html
Hope it helps..