DENSE_RANK query in SQL (4 different columns)

I have a table as follows:
| Sn no. | t_time             | Value  | rate |
|--------|--------------------|--------|------|
| ABC    | 17-MAY-18 08:00:00 | 100.00 | 3    |
| ABC    | 17-MAY-18 22:00:00 | 200.00 | 1    |
| ABC    | 16-MAY-18 08:00:00 | 100.00 | 1    |
| XYZ    | 14-MAY-18 01:00:00 | 700.00 | 1    |
| XYZ    | 15-MAY-18 10:00:00 | 500.00 | 2    |
| XYZ    | 15-MAY-18 13:00:00 | 100.00 | 2    |
And I want to generate the output as follows:
| Sn no. | New_value |
|--------|-----------|
| ABC    | 150       |
| XYZ    | 450       |
It is grouped by the Sn no. The New_value is computed by taking, for each date, the value at the latest time multiplied by its rate, and then averaging those products together.
For example, ABC's new_value is
the average of [(100 * 1) and (200 * 1)]
It's a large dataset. How do I write a query for this in the most efficient way? Please help.

You can use the analytic function ROW_NUMBER() to achieve the result:
WITH cte_table (snno, t_time, value, rate) AS (
  SELECT 'ABC', to_date('2018-05-17 08:00:00', 'YYYY-MM-DD HH24:MI:SS'), 100.00, 3 FROM DUAL UNION ALL
  SELECT 'ABC', to_date('2018-05-17 22:00:00', 'YYYY-MM-DD HH24:MI:SS'), 200.00, 1 FROM DUAL UNION ALL
  SELECT 'ABC', to_date('2018-05-16 08:00:00', 'YYYY-MM-DD HH24:MI:SS'), 100.00, 1 FROM DUAL UNION ALL
  SELECT 'XYZ', to_date('2018-05-14 01:00:00', 'YYYY-MM-DD HH24:MI:SS'), 700.00, 1 FROM DUAL UNION ALL
  SELECT 'XYZ', to_date('2018-05-15 10:00:00', 'YYYY-MM-DD HH24:MI:SS'), 500.00, 2 FROM DUAL UNION ALL
  SELECT 'XYZ', to_date('2018-05-15 13:00:00', 'YYYY-MM-DD HH24:MI:SS'), 100.00, 2 FROM DUAL),
--------------------------------
-- End of data preparation
--------------------------------
rn_table AS (
  SELECT t.*,
         row_number() OVER (PARTITION BY snno, TRUNC(t_time) ORDER BY t_time DESC) AS rn
  FROM cte_table t)
SELECT snno,
       AVG(value * rate) AS new_value
FROM rn_table
WHERE rn = 1
GROUP BY snno;
Note that the partition includes snno as well as TRUNC(t_time), so rows from different Sn no. values on the same date are ranked separately.
Output:
SNNO NEW_VALUE
---- ----------
ABC 150
XYZ 450
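If you want to sanity-check the shape of this query outside Oracle, here is a minimal sketch using Python's stdlib sqlite3 module (SQLite 3.25+ for window functions). It is an assumption-laden demo, not the answer's exact code: `date(t_time)` stands in for Oracle's `TRUNC(t_time)`, and the table/column names simply mirror the sample data above.

```python
import sqlite3

# Demo of "latest row per (snno, day), then average value*rate".
# SQLite's date() truncates a datetime string to its day, like Oracle's TRUNC().
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cte_table (snno TEXT, t_time TEXT, value REAL, rate REAL);
INSERT INTO cte_table VALUES
  ('ABC', '2018-05-17 08:00:00', 100.0, 3),
  ('ABC', '2018-05-17 22:00:00', 200.0, 1),
  ('ABC', '2018-05-16 08:00:00', 100.0, 1),
  ('XYZ', '2018-05-14 01:00:00', 700.0, 1),
  ('XYZ', '2018-05-15 10:00:00', 500.0, 2),
  ('XYZ', '2018-05-15 13:00:00', 100.0, 2);
""")
rows = conn.execute("""
WITH rn_table AS (
  SELECT snno, t_time, value, rate,
         ROW_NUMBER() OVER (PARTITION BY snno, date(t_time)
                            ORDER BY t_time DESC) AS rn
  FROM cte_table)
SELECT snno, AVG(value * rate) AS new_value
FROM rn_table
WHERE rn = 1
GROUP BY snno
ORDER BY snno
""").fetchall()
print(rows)  # [('ABC', 150.0), ('XYZ', 450.0)]
```

The same two-step structure (rank in an inner query, aggregate in the outer one) carries over to the Oracle version unchanged.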

Use the ROW_NUMBER (or RANK/DENSE_RANK if it is more appropriate) analytic function in a sub-query and then aggregate in the outer query:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( Snno, t_time, Value, rate ) AS
SELECT 'ABC', TIMESTAMP '2018-05-17 08:00:00', 100.00, 3 FROM DUAL UNION ALL
SELECT 'ABC', TIMESTAMP '2018-05-17 22:00:00', 200.00, 1 FROM DUAL UNION ALL
SELECT 'ABC', TIMESTAMP '2018-05-16 08:00:00', 100.00, 1 FROM DUAL UNION ALL
SELECT 'XYZ', TIMESTAMP '2018-05-14 01:00:00', 700.00, 1 FROM DUAL UNION ALL
SELECT 'XYZ', TIMESTAMP '2018-05-15 10:00:00', 500.00, 2 FROM DUAL UNION ALL
SELECT 'XYZ', TIMESTAMP '2018-05-15 13:00:00', 100.00, 2 FROM DUAL;
Query 1:
SELECT snno,
       AVG( value * rate ) AS new_value
FROM (
  SELECT t.*,
         ROW_NUMBER() OVER (
           PARTITION BY snno, TRUNC( t_time )
           ORDER BY t_time DESC
         ) AS rn
  FROM table_name t
)
WHERE rn = 1
GROUP BY snno
Results:
| SNNO | NEW_VALUE |
|------|-----------|
| ABC  | 150       |
| XYZ  | 450       |

Related

SQL query to check date along with Store_id

I have two tables; both have millions of rows.
Table A:
| Store_id | Purchase_dt | Amount |
|----------|-------------|--------|
| 1001     | 02JAN19     | 12.20  |
| 1001     | 05MAY20     | 13.30  |
| 1002     | 07JUL21     | 10.97  |
Table B:
| Store_id | Valid_from | Valid_to | Profile_ID |
|----------|------------|----------|------------|
| 1001     | 01JAN17    | 08JUL19  | 56         |
| 1001     | 09JUL19    | 12DEC99  | 60         |
| 1002     | 01JAN20    | 12DEC99  | 70         |
I need to find only the transactions from stores that have a profile_id of 60 or 70, where Purchase_dt falls between Valid_from and Valid_to; the joining column is Store_id.
Target table expectation is:
| Store_id | Purchase_dt | Amount | Profile_ID |
|----------|-------------|--------|------------|
| 1001     | 05MAY20     | 13.30  | 60         |
| 1002     | 07JUL21     | 10.97  | 70         |
I tried with
Select
a.Store_id,
a.Purchase_dt,
a.Amount,
b.Profile_ID
from
table_a a,
table_b b
where
a.Store_id = b.Store_id
and
a.Purchase_dt between b.Valid_from and b.Valid_to
and
b.Profile_ID in (60,70)
but I am not getting the desired result. All dates are of DATE data type. Any help is appreciated!
If dates are really stored as strings (that's what the sample data you posted looks like), then, if you want BETWEEN to work properly, you first have to convert those strings into valid DATE datatype values (using the to_date function with an appropriate format model).
Moreover, you're asking for trouble by keeping 2-digit years; didn't the Y2K bug teach you anything?
I'd suggest you keep dates in DATE datatype columns and avoid many kinds of problems.
As for your current problem, here you are:
Sample data:
with
  table_a (store_id, purchase_dt, amount) as
    (select 1001, '02JAN19', 12.20 from dual union all
     select 1001, '05MAY20', 13.30 from dual union all
     select 1002, '07JUL21', 10.97 from dual
    ),
  table_b (store_id, valid_from, valid_to, profile_id) as
    (select 1001, '01JAN17', '08JUL19', 56 from dual union all
     select 1001, '09JUL19', '12DEC99', 60 from dual union all
     select 1002, '01JAN20', '12DEC99', 70 from dual
    )
Query begins here:
select a.store_id, a.purchase_dt, a.amount, b.profile_id
from table_a a join table_b b
  on a.store_id = b.store_id
  and to_date(a.purchase_dt, 'ddMONyy') between
      to_date(b.valid_from, 'ddMONyy') and to_date(b.valid_to, 'ddMONyy')
where b.profile_id in (60, 70);

STORE_ID PURCHAS     AMOUNT PROFILE_ID
-------- ------- ---------- ----------
    1001 05MAY20       13,3         60
    1002 07JUL21      10,97         70
If, as you commented, the date values really are DATEs, then it gets simpler.
Compare:
Strings:
and to_date(a.purchase_dt, 'ddMONyy') between
    to_date(b.valid_from, 'ddMONyy') and to_date(b.valid_to, 'ddMONyy')
Dates:
and a.purchase_dt between b.valid_from and b.valid_to
The whole query that deals with DATE datatype:
with
  table_a (store_id, purchase_dt, amount) as
    (select 1001, date '2019-01-02', 12.20 from dual union all
     select 1001, date '2020-05-05', 13.30 from dual union all
     select 1002, date '2021-07-07', 10.97 from dual
    ),
  table_b (store_id, valid_from, valid_to, profile_id) as
    (select 1001, date '2017-01-01', date '2019-07-08', 56 from dual union all
     select 1001, date '2019-07-09', date '2099-12-12', 60 from dual union all
     select 1002, date '2020-01-01', date '2099-12-12', 70 from dual
    )
select a.store_id, a.purchase_dt, a.amount, b.profile_id
from table_a a join table_b b
  on a.store_id = b.store_id
  and a.purchase_dt between b.valid_from and b.valid_to
where b.profile_id in (60, 70);

STORE_ID PURCHASE     AMOUNT PROFILE_ID
-------- -------- ---------- ----------
    1001 05.05.20       13,3         60
    1002 07.07.21      10,97         70
Your query applied to same sample data also works:
with
  table_a (store_id, purchase_dt, amount) as
    (select 1001, date '2019-01-02', 12.20 from dual union all
     select 1001, date '2020-05-05', 13.30 from dual union all
     select 1002, date '2021-07-07', 10.97 from dual
    ),
  table_b (store_id, valid_from, valid_to, profile_id) as
    (select 1001, date '2017-01-01', date '2019-07-08', 56 from dual union all
     select 1001, date '2019-07-09', date '2099-12-12', 60 from dual union all
     select 1002, date '2020-01-01', date '2099-12-12', 70 from dual
    )
This is your query:
select
  a.store_id,
  a.purchase_dt,
  a.amount,
  b.profile_id
from
  table_a a,
  table_b b
where
  a.store_id = b.store_id
and
  a.purchase_dt between b.valid_from and b.valid_to
and
  b.profile_id in (60, 70);

STORE_ID PURCHASE     AMOUNT PROFILE_ID
-------- -------- ---------- ----------
    1001 05.05.20       13,3         60
    1002 07.07.21      10,97         70
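The date-range join itself is portable. As a hedged aside, here is the same join sketched with Python's stdlib sqlite3; SQLite has no DATE type, but ISO-8601 strings compare correctly with BETWEEN, which plays the role Oracle's DATE comparison plays above. Table and column names just mirror the sample data.

```python
import sqlite3

# Demo: join purchases to the validity window that contains the purchase date,
# then keep only the wanted profile_ids. ISO date strings sort lexically,
# so BETWEEN works without any conversion here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (store_id INTEGER, purchase_dt TEXT, amount REAL);
CREATE TABLE table_b (store_id INTEGER, valid_from TEXT, valid_to TEXT,
                      profile_id INTEGER);
INSERT INTO table_a VALUES
  (1001, '2019-01-02', 12.20),
  (1001, '2020-05-05', 13.30),
  (1002, '2021-07-07', 10.97);
INSERT INTO table_b VALUES
  (1001, '2017-01-01', '2019-07-08', 56),
  (1001, '2019-07-09', '2099-12-12', 60),
  (1002, '2020-01-01', '2099-12-12', 70);
""")
rows = conn.execute("""
SELECT a.store_id, a.purchase_dt, a.amount, b.profile_id
FROM table_a a
JOIN table_b b
  ON  a.store_id = b.store_id
  AND a.purchase_dt BETWEEN b.valid_from AND b.valid_to
WHERE b.profile_id IN (60, 70)
ORDER BY a.store_id
""").fetchall()
print(rows)  # [(1001, '2020-05-05', 13.3, 60), (1002, '2021-07-07', 10.97, 70)]
```

In Oracle, as the answer stresses, you would rely on real DATE columns rather than string comparison.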

Time difference between different stages in data set

I was wondering if you could help me out. I'm not sure if this is possible, but given the table of data below, is it possible to write a query that shows the time taken for each car between the carViewed and carBought stages? Ideally I would like to see the carID along with the time. For example, the results should be something like this:
| CarID | TimeDifference |
|-------|----------------|
| 1     | 00:17:53       |
| 2     | 00:04:21       |
| 3     | 01:57:53       |
Data:
| CarID | Stage      | Timestamp           |
|-------|------------|---------------------|
| 1     | carArrived | 2022-01-20 13:00:00 |
| 1     | carViewed  | 2022-01-20 14:00:00 |
| 1     | carBought  | 2022-01-20 14:17:53 |
| 1     | carLeft    | 2022-01-20 15:17:53 |
| 2     | carArrived | 2022-01-21 15:00:00 |
| 2     | carViewed  | 2022-01-21 16:00:00 |
| 2     | carBought  | 2022-01-21 16:04:21 |
| 2     | carLeft    | 2022-01-21 16:27:53 |
| 3     | carArrived | 2022-01-22 13:00:00 |
| 3     | carViewed  | 2022-01-22 14:00:00 |
| 3     | carBought  | 2022-01-22 15:57:53 |
| 3     | carLeft    | 2022-01-22 16:17:53 |
Any help with this would be greatly appreciated. Thank you.
Use conditional aggregation:
SELECT carid,
MAX(CASE stage WHEN 'carBought' THEN timestamp END)
- MIN(CASE stage WHEN 'carViewed' THEN timestamp END) AS timeDifference
FROM table_name
GROUP BY carid
Which, for the sample data:
CREATE TABLE table_name (CarID, Stage, Timestamp) AS
SELECT 1, 'carArrived', TIMESTAMP '2022-01-20 13:00:00' FROM DUAL UNION ALL
SELECT 1, 'carViewed', TIMESTAMP '2022-01-20 14:00:00' FROM DUAL UNION ALL
SELECT 1, 'carBought', TIMESTAMP '2022-01-20 14:17:53' FROM DUAL UNION ALL
SELECT 1, 'carLeft', TIMESTAMP '2022-01-20 15:17:53' FROM DUAL UNION ALL
SELECT 2, 'carArrived', TIMESTAMP '2022-01-21 15:00:00' FROM DUAL UNION ALL
SELECT 2, 'carViewed', TIMESTAMP '2022-01-21 16:00:00' FROM DUAL UNION ALL
SELECT 2, 'carBought', TIMESTAMP '2022-01-21 16:04:21' FROM DUAL UNION ALL
SELECT 2, 'carLeft', TIMESTAMP '2022-01-21 16:27:53' FROM DUAL UNION ALL
SELECT 3, 'carArrived', TIMESTAMP '2022-01-22 13:00:00' FROM DUAL UNION ALL
SELECT 3, 'carViewed', TIMESTAMP '2022-01-22 14:00:00' FROM DUAL UNION ALL
SELECT 3, 'carBought', TIMESTAMP '2022-01-22 15:57:53' FROM DUAL UNION ALL
SELECT 3, 'carLeft', TIMESTAMP '2022-01-22 16:17:53' FROM DUAL;
Outputs:
| CARID | TIMEDIFFERENCE                |
|-------|-------------------------------|
| 1     | +000000000 00:17:53.000000000 |
| 2     | +000000000 00:04:21.000000000 |
| 3     | +000000000 01:57:53.000000000 |
db<>fiddle here
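The conditional-aggregation trick (one MAX/MIN per stage, then subtract) works in any SQL dialect. Here is a hedged sketch with Python's stdlib sqlite3; SQLite has no INTERVAL type, so the demo subtracts epoch seconds via strftime('%s', ...) instead of subtracting timestamps directly, and the table name `car_events` is made up for the demo.

```python
import sqlite3

# Demo: per car, pick the carBought and carViewed timestamps with CASE inside
# MAX/MIN, then take their difference in whole seconds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car_events (carid INTEGER, stage TEXT, ts TEXT);
INSERT INTO car_events VALUES
  (1, 'carViewed', '2022-01-20 14:00:00'), (1, 'carBought', '2022-01-20 14:17:53'),
  (2, 'carViewed', '2022-01-21 16:00:00'), (2, 'carBought', '2022-01-21 16:04:21'),
  (3, 'carViewed', '2022-01-22 14:00:00'), (3, 'carBought', '2022-01-22 15:57:53');
""")
rows = conn.execute("""
SELECT carid,
       MAX(CASE stage WHEN 'carBought' THEN CAST(strftime('%s', ts) AS INTEGER) END)
     - MIN(CASE stage WHEN 'carViewed' THEN CAST(strftime('%s', ts) AS INTEGER) END)
       AS diff_seconds
FROM car_events
GROUP BY carid
ORDER BY carid
""").fetchall()
print(rows)  # [(1, 1073), (2, 261), (3, 7073)]
```

1073 seconds is the 00:17:53 interval from the Oracle output, just expressed as a number.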

Generate date range groups based on daily data

I have a list of data on a daily basis below:
| Daytime | Item | Category| Value |
| -------- |------|------- |-------|
| 01.01.2022|A |1 |500 |
| 02.01.2022|A |1 |500 |
| 03.01.2022|A |1 |80000 |
| 04.01.2022|A |1 |500 |
| 05.01.2022|A |1 |500 |
| 01.01.2022|A |2 |600 |
| 02.01.2022|A |2 |600 |
| 03.01.2022|A |2 |600 |
| 04.01.2022|A |2 |600 |
| 05.01.2022|A |2 |600 |
| 01.01.2022|C |1 |600 |
| 02.01.2022|C |1 |600 |
| 03.01.2022|C |1 |600 |
| 04.01.2022|C |1 |600 |
| 05.01.2022|C |1 |600 |
How can I transform the data into this form?
| FromDate | ToDate | Item |Category| Value |
| --------- |--------- |------|------ |-------|
| 01.01.2022| 02.01.2022|A |1 |500 |
| 03.01.2022| 03.01.2022|A |1 |80000 |
| 04.01.2022| 05.01.2022|A |1 |500 |
| 01.01.2022| 05.01.2022|A |2 |600 |
| 01.01.2022| 05.01.2022|C |1 |600 |
I want to group the values (by item and category too) only if they are the same for consecutive dates. Please help, thank you!
The date format is DD.MM.YYYY and daytime's datatype is DATE.
Script for the sample data:
(SELECT to_date('01/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 1 Category, 500 Value FROM dual UNION ALL
SELECT to_date('02/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 1 Category, 500 Value FROM dual UNION ALL
SELECT to_date('03/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 1 Category, 80000 Value FROM dual UNION ALL
SELECT to_date('04/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 1 Category, 500 Value FROM dual UNION ALL
SELECT to_date('05/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 1 Category, 500 Value FROM dual UNION ALL
SELECT to_date('01/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 2 Category, 600 Value FROM dual UNION ALL
SELECT to_date('02/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 2 Category, 600 Value FROM dual UNION ALL
SELECT to_date('03/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 2 Category, 600 Value FROM dual UNION ALL
SELECT to_date('04/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 2 Category, 600 Value FROM dual UNION ALL
SELECT to_date('05/01/2022', 'dd/mm/yyyy') daytime, 'A' Item, 2 Category, 600 Value FROM dual UNION ALL
SELECT to_date('01/01/2022', 'dd/mm/yyyy') daytime, 'C' Item, 1 Category, 600 Value FROM dual UNION ALL
SELECT to_date('02/01/2022', 'dd/mm/yyyy') daytime, 'C' Item, 1 Category, 600 Value FROM dual UNION ALL
SELECT to_date('03/01/2022', 'dd/mm/yyyy') daytime, 'C' Item, 1 Category, 600 Value FROM dual UNION ALL
SELECT to_date('04/01/2022', 'dd/mm/yyyy') daytime, 'C' Item, 1 Category, 600 Value FROM dual UNION ALL
SELECT to_date('05/01/2022', 'dd/mm/yyyy') daytime, 'C' Item, 1 Category, 600 Value FROM dual)
You can use the common table expression (CTE) technique for that purpose.
with YourSample ( Daytime, Item, Category, Value) as (
select to_date('01.01.2022', 'DD.MM.YYYY'), 'A', 1, 500 from dual union all
select to_date('02.01.2022', 'DD.MM.YYYY'), 'A', 1, 500 from dual union all
select to_date('03.01.2022', 'DD.MM.YYYY'), 'A', 1, 80000 from dual union all
select to_date('04.01.2022', 'DD.MM.YYYY'), 'A', 1, 500 from dual union all
select to_date('05.01.2022', 'DD.MM.YYYY'), 'A', 1, 500 from dual union all
select to_date('01.01.2022', 'DD.MM.YYYY'), 'A', 2, 600 from dual union all
select to_date('02.01.2022', 'DD.MM.YYYY'), 'A', 2, 600 from dual union all
select to_date('03.01.2022', 'DD.MM.YYYY'), 'A', 2, 600 from dual union all
select to_date('04.01.2022', 'DD.MM.YYYY'), 'A', 2, 600 from dual union all
select to_date('05.01.2022', 'DD.MM.YYYY'), 'A', 2, 600 from dual union all
select to_date('01.01.2022', 'DD.MM.YYYY'), 'C', 1, 600 from dual union all
select to_date('02.01.2022', 'DD.MM.YYYY'), 'C', 1, 600 from dual union all
select to_date('03.01.2022', 'DD.MM.YYYY'), 'C', 1, 600 from dual union all
select to_date('04.01.2022', 'DD.MM.YYYY'), 'C', 1, 600 from dual union all
select to_date('05.01.2022', 'DD.MM.YYYY'), 'C', 1, 600 from dual
)
, YourSampleRanked (Daytime, Item, Category, Value, rnb) as (
select Daytime, Item, Category, Value
, row_number()over(PARTITION BY ITEM, CATEGORY ORDER BY DAYTIME) rnb
from YourSample
)
, cte (Daytime, Item, Category, Value, rnb, grp) as (
select Daytime, Item, Category, Value, rnb, 1 grp
from YourSampleRanked
where rnb = 1
union all
select t.Daytime, t.Item, t.Category, t.Value, t.rnb
, decode( t.Value, c.Value, c.grp, c.grp + 1 ) grp
from YourSampleRanked t
join cte c
on ( c.Category = t.Category and c.Item = t.Item and t.rnb = c.rnb + 1 )
)
select min(DAYTIME) FromDate, max(DAYTIME) ToDate, ITEM, CATEGORY, min(Value) Value
from cte
GROUP BY GRP, ITEM, CATEGORY
order by ITEM, CATEGORY, FromDate
;
demo on db<>fiddle
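SQLite also supports recursive CTEs and window functions, so the recursive approach above can be checked outside Oracle. This sketch, using Python's stdlib sqlite3, mirrors the answer's structure (rank per item/category, then propagate a group counter that increments whenever the value changes); `DECODE` becomes a `CASE` expression, and the names are made up for the demo.

```python
import sqlite3

# Demo of the recursive group-numbering: seed each (item, category) with grp=1,
# then walk rows in date order, bumping grp whenever the value changes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_sample (daytime TEXT, item TEXT, category INTEGER, value REAL);
INSERT INTO your_sample VALUES
  ('2022-01-01','A',1,500),('2022-01-02','A',1,500),('2022-01-03','A',1,80000),
  ('2022-01-04','A',1,500),('2022-01-05','A',1,500),
  ('2022-01-01','A',2,600),('2022-01-02','A',2,600),('2022-01-03','A',2,600),
  ('2022-01-04','A',2,600),('2022-01-05','A',2,600),
  ('2022-01-01','C',1,600),('2022-01-02','C',1,600),('2022-01-03','C',1,600),
  ('2022-01-04','C',1,600),('2022-01-05','C',1,600);
""")
rows = conn.execute("""
WITH RECURSIVE ranked AS (
  SELECT daytime, item, category, value,
         ROW_NUMBER() OVER (PARTITION BY item, category ORDER BY daytime) AS rnb
  FROM your_sample),
grouped AS (
  SELECT daytime, item, category, value, rnb, 1 AS grp
  FROM ranked WHERE rnb = 1
  UNION ALL
  SELECT t.daytime, t.item, t.category, t.value, t.rnb,
         CASE WHEN t.value = c.value THEN c.grp ELSE c.grp + 1 END
  FROM ranked t
  JOIN grouped c
    ON c.item = t.item AND c.category = t.category AND t.rnb = c.rnb + 1)
SELECT MIN(daytime) AS from_date, MAX(daytime) AS to_date,
       item, category, MIN(value) AS value
FROM grouped
GROUP BY item, category, grp
ORDER BY item, category, MIN(daytime)
""").fetchall()
for r in rows:
    print(r)
```

This reproduces the five expected ranges, e.g. ('2022-01-01', '2022-01-02', 'A', 1, 500.0) for the first run of 500s.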
You can also use the MATCH_RECOGNIZE clause for the same purpose if you are running Oracle 12c or later.
select FromDate, toDate, ITEM, CATEGORY, VALUE
from YourSample
MATCH_RECOGNIZE (
PARTITION BY ITEM, CATEGORY
ORDER BY DAYTIME
MEASURES first(STRT.VALUE) as VALUE,
first(STRT.DAYTIME) as FromDate,
nvl(last(SAME.DAYTIME), first(STRT.DAYTIME)) as toDate
ONE ROW PER MATCH
PATTERN (STRT Same*)
DEFINE
Same AS VALUE = PREV(VALUE)
) MR
ORDER BY ITEM, CATEGORY, FromDate, toDate
;
demo2 on fiddle
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row processing:
SELECT *
FROM table_name
MATCH_RECOGNIZE (
PARTITION BY item, category
ORDER BY daytime
MEASURES
FIRST(daytime) AS from_date,
LAST(daytime) AS to_date,
FIRST(value) AS value
ONE ROW PER MATCH
PATTERN (same_value+)
DEFINE
same_value AS FIRST(value) = value
)
Which, for the sample data:
CREATE TABLE table_name (daytime, item, category, value) AS
SELECT DATE '2022-01-01', 'A', 1, 500 FROM DUAL UNION ALL
SELECT DATE '2022-01-02', 'A', 1, 500 FROM DUAL UNION ALL
SELECT DATE '2022-01-03', 'A', 1, 80000 FROM DUAL UNION ALL
SELECT DATE '2022-01-04', 'A', 1, 500 FROM DUAL UNION ALL
SELECT DATE '2022-01-05', 'A', 1, 500 FROM DUAL UNION ALL
SELECT DATE '2022-01-01', 'A', 2, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-02', 'A', 2, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-03', 'A', 2, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-04', 'A', 2, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-05', 'A', 2, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-01', 'C', 1, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-02', 'C', 1, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-03', 'C', 1, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-04', 'C', 1, 600 FROM DUAL UNION ALL
SELECT DATE '2022-01-05', 'C', 1, 600 FROM DUAL
Outputs:
| ITEM | CATEGORY | FROM_DATE           | TO_DATE             | VALUE |
|------|----------|---------------------|---------------------|-------|
| A    | 1        | 2022-01-01 00:00:00 | 2022-01-02 00:00:00 | 500   |
| A    | 1        | 2022-01-03 00:00:00 | 2022-01-03 00:00:00 | 80000 |
| A    | 1        | 2022-01-04 00:00:00 | 2022-01-05 00:00:00 | 500   |
| A    | 2        | 2022-01-01 00:00:00 | 2022-01-05 00:00:00 | 600   |
| C    | 1        | 2022-01-01 00:00:00 | 2022-01-05 00:00:00 | 600   |
db<>fiddle here
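If MATCH_RECOGNIZE is not available, the same "islands of equal values" logic is easy to express row-by-row. As a plain-Python sketch of what the pattern above matches (not the answer's code), itertools.groupby breaks the sorted rows into consecutive runs with the same (item, category, value):

```python
from itertools import groupby

# rows: (daytime, item, category, value), already sorted by item, category, daytime.
# groupby collapses each run of consecutive equal keys into one output range.
rows = [
    ('2022-01-01', 'A', 1, 500), ('2022-01-02', 'A', 1, 500),
    ('2022-01-03', 'A', 1, 80000),
    ('2022-01-04', 'A', 1, 500), ('2022-01-05', 'A', 1, 500),
    ('2022-01-01', 'A', 2, 600), ('2022-01-02', 'A', 2, 600),
    ('2022-01-03', 'A', 2, 600), ('2022-01-04', 'A', 2, 600),
    ('2022-01-05', 'A', 2, 600),
    ('2022-01-01', 'C', 1, 600), ('2022-01-02', 'C', 1, 600),
    ('2022-01-03', 'C', 1, 600), ('2022-01-04', 'C', 1, 600),
    ('2022-01-05', 'C', 1, 600),
]
result = []
for (item, cat, val), run in groupby(rows, key=lambda r: (r[1], r[2], r[3])):
    run = list(run)
    # one output row per island: first date, last date, item, category, value
    result.append((run[0][0], run[-1][0], item, cat, val))
for r in result:
    print(r)
```

The five tuples printed correspond exactly to the five rows of the Outputs table above.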
This is a job for a GROUP BY using TRUNC(daytime, 'MM'). TRUNC(), when used with dates, truncates them to the beginning of a calendar / clock period.
SELECT TRUNC(Daytime, 'MM') FromDate,
ADD_MONTHS(TRUNC(Daytime, 'MM'), 1) ToDate,
Item, Category,
SUM(Value) Value
FROM my_table
GROUP BY TRUNC(Daytime, 'MM'), Item, Category
Or alternatively you can avoid those arcane Oracle date format specifiers like 'MM' and go with LAST_DAY().
SELECT ADD_MONTHS(LAST_DAY(Daytime) + 1, -1) FromDate,
LAST_DAY(Daytime) + 1 ToDate,
Item, Category,
SUM(Value) Value
FROM my_table
GROUP BY LAST_DAY(Daytime), Item, Category

SQL count consecutive rows

I have the following data in a table:
|event_id |starttime |person_id|attended|
|------------|-----------------|---------|--------|
| 11512997-1 | 01-SEP-16 08:00 | 10001 | N |
| 11512997-2 | 01-SEP-16 10:00 | 10001 | N |
| 11512997-3 | 01-SEP-16 12:00 | 10001 | N |
| 11512997-4 | 01-SEP-16 14:00 | 10001 | N |
| 11512997-5 | 01-SEP-16 16:00 | 10001 | N |
| 11512997-6 | 01-SEP-16 18:00 | 10001 | Y |
| 11512997-7 | 02-SEP-16 08:00 | 10001 | N |
| 11512997-1 | 01-SEP-16 08:00 | 10002 | N |
| 11512997-2 | 01-SEP-16 10:00 | 10002 | N |
| 11512997-3 | 01-SEP-16 12:00 | 10002 | N |
| 11512997-4 | 01-SEP-16 14:00 | 10002 | Y |
| 11512997-5 | 01-SEP-16 16:00 | 10002 | N |
| 11512997-6 | 01-SEP-16 18:00 | 10002 | Y |
| 11512997-7 | 02-SEP-16 08:00 | 10002 | Y |
I want to produce the following results, where the maximum number of consecutive occurrences where attended = 'N' is returned:
| person_id | consec_missed_max |
|-----------|-------------------|
| 10001     | 5                 |
| 10002     | 3                 |
How could this be done in Oracle (or ANSI) SQL? Thanks!
Edit:
So far I have tried:
WITH t1 AS
(SELECT t.person_id,
row_number() over(PARTITION BY t.person_id ORDER BY t.starttime) AS idx
FROM the_table t
WHERE t.attended = 'N'),
t2 AS
(SELECT person_id, MAX(idx) max_idx FROM t1 GROUP BY person_id)
SELECT t1.person_id, COUNT(1) ct
FROM t1
JOIN t2
ON t1.person_id = t2.person_id
GROUP BY t1.person_id;
The main work is in the factored subquery "prep". You seem to be somewhat familiar with analytic functions, but that is not enough here. This solution uses the so-called "tabibitosan" method to create groups of consecutive rows that share the same characteristic in one or more dimensions; in this case, you want to group consecutive 'N' rows, with a different group for each sequence. This is done with the difference of two ROW_NUMBER() calls - one partitioned by person only, and the other by person and attended. Google "tabibitosan" to read more about the idea if needed.
with
inputs ( event_id, starttime, person_id, attended ) as (
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10001, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual
),
prep ( starttime, person_id, attended, gp ) as (
select starttime, person_id, attended,
row_number() over (partition by person_id order by starttime) -
row_number() over (partition by person_id, attended
order by starttime)
from inputs
),
counts ( person_id, consecutive_absences ) as (
select person_id, count(*)
from prep
where attended = 'N'
group by person_id, gp
)
select person_id, max(consecutive_absences) as max_consecutive_absences
from counts
group by person_id
order by person_id;
OUTPUT:
PERSON_ID MAX_CONSECUTIVE_ABSENCES
---------- ---------------------------------------
10001 5
10002 3
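The difference-of-two-ROW_NUMBERs trick is portable to any engine with window functions. As a hedged sketch (SQLite via Python's stdlib sqlite3, with a made-up table name `attendance`), the same query structure gives the same answer:

```python
import sqlite3

# Tabibitosan demo: rn(overall) - rn(per attended value) is constant within
# each run of consecutive equal attended values, so it acts as a group key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attendance (person_id INTEGER, starttime TEXT, attended TEXT);
INSERT INTO attendance VALUES
  (10001,'2016-09-01 08:00','N'),(10001,'2016-09-01 10:00','N'),
  (10001,'2016-09-01 12:00','N'),(10001,'2016-09-01 14:00','N'),
  (10001,'2016-09-01 16:00','N'),(10001,'2016-09-01 18:00','Y'),
  (10001,'2016-09-02 08:00','N'),
  (10002,'2016-09-01 08:00','N'),(10002,'2016-09-01 10:00','N'),
  (10002,'2016-09-01 12:00','N'),(10002,'2016-09-01 14:00','Y'),
  (10002,'2016-09-01 16:00','N'),(10002,'2016-09-01 18:00','Y'),
  (10002,'2016-09-02 08:00','Y');
""")
rows = conn.execute("""
WITH prep AS (
  SELECT person_id, attended,
         ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY starttime)
       - ROW_NUMBER() OVER (PARTITION BY person_id, attended
                            ORDER BY starttime) AS gp
  FROM attendance),
counts AS (
  SELECT person_id, COUNT(*) AS n
  FROM prep
  WHERE attended = 'N'
  GROUP BY person_id, gp)
SELECT person_id, MAX(n) AS consec_missed_max
FROM counts
GROUP BY person_id
ORDER BY person_id
""").fetchall()
print(rows)  # [(10001, 5), (10002, 3)]
```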
If you are using Oracle 12c you could use MATCH_RECOGNIZE:
Data:
CREATE TABLE data AS
SELECT *
FROM (
with inputs ( event_id, starttime, person_id, attended ) as (
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10001, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10001, 'N' from dual union all
select '11512997-1', to_date('01-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-2', to_date('01-SEP-16 10:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-3', to_date('01-SEP-16 12:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-4', to_date('01-SEP-16 14:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-5', to_date('01-SEP-16 16:00', 'dd-MON-yy hh24:mi'), 10002, 'N' from dual union all
select '11512997-6', to_date('01-SEP-16 18:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual union all
select '11512997-7', to_date('02-SEP-16 08:00', 'dd-MON-yy hh24:mi'), 10002, 'Y' from dual
)
SELECT * FROM inputs
);
And query:
SELECT PERSON_ID, MAX(LEN) AS MAX_ABSENCES_IN_ROW
FROM data
MATCH_RECOGNIZE (
PARTITION BY PERSON_ID
ORDER BY STARTTIME
MEASURES FINAL COUNT(*) AS len
ALL ROWS PER MATCH
PATTERN(a b*)
DEFINE b AS attended = a.attended
)
WHERE attended = 'N'
GROUP BY PERSON_ID;
Output:
"PERSON_ID","MAX_ABSENCES_IN_ROW"
10001,5
10002,3
EDIT:
As @mathguy pointed out, it could be rewritten as:
SELECT PERSON_ID, MAX(LEN) AS MAX_ABSENCES_IN_ROW
FROM data
MATCH_RECOGNIZE (
PARTITION BY PERSON_ID
ORDER BY STARTTIME
MEASURES COUNT(*) AS len
PATTERN(a+)
DEFINE a AS attended = 'N'
)
GROUP BY PERSON_ID;
db<>fiddle demo

Event grouping in time series

I'm trying to build groups of precipitation events in my measurement data. I have a time, a measurement value, and a flag noting whether it was raining:
00:00, 32.4, 0
00:10, 32.4, 0
00:20, 32.6, 1
00:30, 32.7, 1
00:40, 32.9, 1
00:50, 33.2, 1
01:00, 33.2, 0
01:10, 33.2, 0
01:20, 33.2, 0
01:30, 33.5, 1
01:40, 33.6, 1
01:50, 33.6, 0
02:00, 33.6, 0
...
Now I'd like to generate an event id for the precipitation events:
00:00, 32.4, 0, NULL
00:10, 32.4, 0, NULL
00:20, 32.6, 1, 1
00:30, 32.7, 1, 1
00:40, 32.9, 1, 1
00:50, 33.2, 1, 1
01:00, 33.2, 0, NULL
01:10, 33.2, 0, NULL
01:20, 33.2, 0, NULL
01:30, 33.5, 1, 2
01:40, 33.6, 1, 2
01:50, 33.6, 0, NULL
02:00, 33.6, 0, NULL
...
Then I'll be able to use grouping to summarize the events. Any hint on how to do this in Oracle is much appreciated.
So far I was able to calculate the mentioned flag and the diff to the last row:
SELECT
  measured_at,
  station_id,
  ps, -- precipitation sum
  ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at ASC) as p, -- precipitation delta
  CASE
    WHEN ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at ASC) > 0 THEN 1
    ELSE 0
  END as rainflag
FROM measurements;
I think it must be possible to generate the required event id somehow, but I can't figure it out. Thanks for your time!
Final solution using mt0's answer:
DROP TABLE events;
CREATE TABLE events (measured_at, station_id, ps) AS
SELECT TO_DATE('2016-05-01 12:00', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.4 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:10', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.6 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:20', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.7 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:30', 'YYYY-MM-DD HH24:MI'), 'XYZ', 32.9 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:40', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 12:50', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:00', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:10', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:20', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.5 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:30', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.6 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:40', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.6 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 13:50', 'YYYY-MM-DD HH24:MI'), 'XYZ', 33.5 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 17:00', 'YYYY-MM-DD HH24:MI'), 'XYZ', 39.1 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 17:10', 'YYYY-MM-DD HH24:MI'), 'XYZ', 39.2 FROM DUAL UNION ALL
SELECT TO_DATE('2016-05-01 17:20', 'YYYY-MM-DD HH24:MI'), 'XYZ', 39.2 FROM DUAL;
WITH
flagged AS (
SELECT
measured_at,
station_id,
ps,
CASE
WHEN measured_at - lag(measured_at, 1, NULL) OVER (ORDER BY measured_at) = (1/144) THEN ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at)
ELSE NULL
END as delta_p,
CASE
WHEN ps - lag(ps, 1, NULL) OVER (ORDER BY measured_at) > 0 THEN 1
ELSE 0
END AS rain
FROM events
),
eventmarked AS (
SELECT
f.*,
CASE
WHEN f.delta_p >= 0 THEN f.delta_p
ELSE NULL
END AS p,
CASE rain
WHEN 1 THEN COUNT(1) OVER (ORDER BY measured_at) - SUM(rain) OVER (ORDER BY measured_at)
END as event
FROM flagged f
),
summarized AS (
SELECT
em.*,
sum(CASE p WHEN 0 THEN NULL ELSE p END) OVER (PARTITION BY event ORDER BY measured_at) as e_ps
FROM eventmarked em
)
SELECT measured_at, station_id, ps, p, e_ps FROM summarized
ORDER BY measured_at;
Oracle Setup:
CREATE TABLE events ( measured_at, station_id, ps ) AS
SELECT '00:00', 32.4, 0 FROM DUAL UNION ALL
SELECT '00:10', 32.4, 0 FROM DUAL UNION ALL
SELECT '00:20', 32.6, 1 FROM DUAL UNION ALL
SELECT '00:30', 32.7, 1 FROM DUAL UNION ALL
SELECT '00:40', 32.9, 1 FROM DUAL UNION ALL
SELECT '00:50', 33.2, 1 FROM DUAL UNION ALL
SELECT '01:00', 33.2, 0 FROM DUAL UNION ALL
SELECT '01:10', 33.2, 0 FROM DUAL UNION ALL
SELECT '01:20', 33.2, 0 FROM DUAL UNION ALL
SELECT '01:30', 33.5, 1 FROM DUAL UNION ALL
SELECT '01:40', 33.6, 1 FROM DUAL UNION ALL
SELECT '01:50', 33.6, 0 FROM DUAL UNION ALL
SELECT '02:00', 33.6, 0 FROM DUAL;
Query:
SELECT measured_at,
station_id,
ps,
CASE WHEN rainflag IS NOT NULL THEN DENSE_RANK() OVER ( ORDER BY rainflag ) END AS rainflag
FROM (
SELECT e.*,
CASE ps
WHEN 1
THEN COUNT( 1 ) OVER ( ORDER BY measured_at )
- SUM( ps ) OVER ( ORDER BY measured_at )
END AS rainflag
FROM events e
)
ORDER BY measured_at;
Query 2
SELECT measured_at,
station_id,
ps,
CASE ps WHEN 1
THEN SUM( rainflag ) OVER ( ORDER BY measured_at )
END AS rainflag
FROM (
SELECT e.*,
CASE WHEN ps > LAG( ps, 1, 0 ) OVER ( ORDER BY measured_at )
THEN 1
END AS rainflag
FROM events e
);
Output:
MEASURED_AT STATION_ID PS RAINFLAG
----------- ---------- ---------- ----------
00:00 32.4 0
00:10 32.4 0
00:20 32.6 1 1
00:30 32.7 1 1
00:40 32.9 1 1
00:50 33.2 1 1
01:00 33.2 0
01:10 33.2 0
01:20 33.2 0
01:30 33.5 1 2
01:40 33.6 1 2
01:50 33.6 0
02:00 33.6 0
Alternative solution using only the LAG function.
In the subquery, the column PS2 marks the rows where rain started. The main query simply sums this flag while ignoring the times when it is not raining.
with ev as (
select measured_at, station_id, ps,
case when ps = 1 and lag(ps,1,0) over (order by measured_at) = 0
then 1 else 0 end ps2
from events)
select measured_at, station_id, ps, ps2,
case when ps = 1 then
sum(ps2) over (order by measured_at) end rf
from ev
;
MEASURED_AT STATION_ID PS PS2 RF
----------- ---------- ---------- ---------- ----------
00:00 32,4 0 0
00:10 32,4 0 0
00:20 32,6 1 1 1
00:30 32,7 1 0 1
00:40 32,9 1 0 1
00:50 33,2 1 0 1
01:00 33,2 0 0
01:10 33,2 0 0
01:20 33,2 0 0
01:30 33,5 1 1 2
01:40 33,6 1 0 2
01:50 33,6 0 0
02:00 33,6 0 0
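The LAG-based event numbering generalizes beyond Oracle. Here is a sketch with Python's stdlib sqlite3; the column names (`value` for the measurement, `rain` for the 0/1 flag) are simplified stand-ins for the `station_id`/`ps` columns used in the setup above:

```python
import sqlite3

# Demo: mark each row where rain starts (rain=1 after rain=0), then a running
# SUM of those start markers numbers the rain events; non-rain rows stay NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (measured_at TEXT, value REAL, rain INTEGER);
INSERT INTO events VALUES
  ('00:00',32.4,0),('00:10',32.4,0),('00:20',32.6,1),('00:30',32.7,1),
  ('00:40',32.9,1),('00:50',33.2,1),('01:00',33.2,0),('01:10',33.2,0),
  ('01:20',33.2,0),('01:30',33.5,1),('01:40',33.6,1),('01:50',33.6,0),
  ('02:00',33.6,0);
""")
rows = conn.execute("""
WITH marked AS (
  SELECT measured_at, value, rain,
         CASE WHEN rain = 1
               AND LAG(rain, 1, 0) OVER (ORDER BY measured_at) = 0
              THEN 1 ELSE 0 END AS rain_start
  FROM events)
SELECT measured_at, value, rain,
       CASE WHEN rain = 1
            THEN SUM(rain_start) OVER (ORDER BY measured_at) END AS event_id
FROM marked
""").fetchall()
event_ids = [r[3] for r in rows]
print(event_ids)
```

The printed event ids match the question's desired column: NULL outside the rain, 1 for the first event, 2 for the second.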