Select records all within 10 minutes of each other - SQL

I have some data coming from a source into my Oracle database.
If a particular Office_ID has been deactivated and it has failures for all three clients (A, B, C) on a particular day, then we have to check whether all clients have gone down. If yes, then we need to check whether the timeframe for all clients is within 10 minutes.
If this repeats three times in a day for a particular office we declare the office closed.
Here is some sample data:
+-----------+-----------+--------------+--------+
| OFFICE_ID | FAIL_TIME | ACTIVITY_DAY | CLIENT |
+-----------+-----------+--------------+--------+
| 1002 | 5:39:00 | 23/01/2015 | A |
| 1002 | 17:49:00 | 23/12/2014 | A |
| 1002 | 18:41:57 | 1/5/2014 | B |
| 1002 | 10:32:00 | 1/7/2014 | A |
| 1002 | 10:34:23 | 1/7/2014 | B |
| 1002 | 10:35:03 | 1/7/2014 | C |
| 1002 | 12:08:52 | 1/7/2014 | B |
| 1002 | 12:09:00 | 1/7/2014 | A |
| 1002 | 12:26:10 | 1/7/2014 | B |
| 1002 | 13:31:32 | 1/7/2014 | B |
| 1002 | 15:24:06 | 1/7/2014 | B |
| 1002 | 15:55:06 | 1/7/2014 | C |
+-----------+-----------+--------------+--------+
The result should be like this:
1002 10:32:00 A
1002 10:34:23 B
1002 10:35:03 C
Any help would be appreciated. I am looking for a SQL query or a PL/SQL procedure.

A solution using the COUNT analytic function with a RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING window, which avoids self-joins:
SQL Fiddle
Oracle 11g R2 Schema Setup:
CREATE TABLE Test ( OFFICE_ID, FAIL_TIME, ACTIVITY_DAY, CLIENT ) AS
SELECT 1002, '5:39:00', '23/01/2015', 'A' FROM DUAL
UNION ALL SELECT 1002, '17:49:00', '23/12/2014', 'A' FROM DUAL
UNION ALL SELECT 1002, '18:41:57', '1/5/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '10:32:00', '1/7/2014', 'A' FROM DUAL
UNION ALL SELECT 1002, '10:34:23', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '10:35:03', '1/7/2014', 'C' FROM DUAL
UNION ALL SELECT 1002, '12:08:52', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '12:09:00', '1/7/2014', 'A' FROM DUAL
UNION ALL SELECT 1002, '12:26:10', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '13:31:32', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '15:24:06', '1/7/2014', 'B' FROM DUAL
UNION ALL SELECT 1002, '15:55:06', '1/7/2014', 'C' FROM DUAL
Query 1:
WITH Times AS (
SELECT OFFICE_ID,
TO_DATE( ACTIVITY_DAY || ' ' || FAIL_TIME, 'DD/MM/YYYY HH24:MI:SS' ) AS FAIL_DATETIME,
CLIENT
FROM Test
),
Next_Times As (
SELECT OFFICE_ID,
FAIL_DATETIME,
COUNT( CASE CLIENT WHEN 'A' THEN 1 END ) OVER ( PARTITION BY OFFICE_ID ORDER BY FAIL_DATETIME RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING ) AS COUNT_A,
COUNT( CASE CLIENT WHEN 'B' THEN 1 END ) OVER ( PARTITION BY OFFICE_ID ORDER BY FAIL_DATETIME RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING ) AS COUNT_B,
COUNT( CASE CLIENT WHEN 'C' THEN 1 END ) OVER ( PARTITION BY OFFICE_ID ORDER BY FAIL_DATETIME RANGE BETWEEN INTERVAL '10' MINUTE PRECEDING AND INTERVAL '10' MINUTE FOLLOWING ) AS COUNT_C
FROM Times
)
SELECT OFFICE_ID,
TO_CHAR( FAIL_DATETIME, 'HH24:MI:SS' ) AS FAIL_TIME,
TO_CHAR( FAIL_DATETIME, 'DD/MM/YYYY' ) AS ACTIVITY_DAY
FROM Next_Times
WHERE COUNT_A > 0
AND COUNT_B > 0
AND COUNT_C > 0
ORDER BY FAIL_DATETIME
Results:
| OFFICE_ID | FAIL_TIME | ACTIVITY_DAY |
|-----------|-----------|--------------|
| 1002 | 10:32:00 | 01/07/2014 |
| 1002 | 10:34:23 | 01/07/2014 |
| 1002 | 10:35:03 | 01/07/2014 |
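The window logic can be sanity-checked outside the database. Here is a minimal pure-Python sketch (an illustration only; helper names like `flagged` are invented) of the same rule: keep every failure that has an A, a B and a C within ±10 minutes of it, using the sample rows above.

```python
from datetime import datetime, timedelta

# sample rows: (office_id, "dd/mm/yyyy hh:mm:ss", client)
rows = [
    (1002, "23/01/2015 05:39:00", "A"),
    (1002, "23/12/2014 17:49:00", "A"),
    (1002, "01/05/2014 18:41:57", "B"),
    (1002, "01/07/2014 10:32:00", "A"),
    (1002, "01/07/2014 10:34:23", "B"),
    (1002, "01/07/2014 10:35:03", "C"),
    (1002, "01/07/2014 12:08:52", "B"),
    (1002, "01/07/2014 12:09:00", "A"),
    (1002, "01/07/2014 12:26:10", "B"),
    (1002, "01/07/2014 13:31:32", "B"),
    (1002, "01/07/2014 15:24:06", "B"),
    (1002, "01/07/2014 15:55:06", "C"),
]
parsed = [(o, datetime.strptime(ts, "%d/%m/%Y %H:%M:%S"), c) for o, ts, c in rows]
win = timedelta(minutes=10)

def flagged(office, when):
    """True when all three clients failed within +/-10 minutes of `when`."""
    near = {c for o, t, c in parsed if o == office and abs(t - when) <= win}
    return near >= {"A", "B", "C"}

hits = sorted(t for o, t, c in parsed if flagged(o, t))
# only the 10:32:00 / 10:34:23 / 10:35:03 cluster qualifies
```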

To identify records you can join the table to itself three times, like this:
SELECT
a.*, b.*, c.*
FROM FailLog a INNER JOIN
FailLog b ON b.OFFICE_ID = A.OFFICE_ID AND
a.CLIENT = 'A' AND
b.CLIENT = 'B' AND
b.ACTIVITY_DAY = a.ACTIVITY_DAY INNER JOIN
FailLog c ON c.OFFICE_ID = A.OFFICE_ID AND
c.CLIENT = 'C' AND
c.ACTIVITY_DAY = a.ACTIVITY_DAY AND
-- difference in minutes, assuming FAIL_TIME is stored as a DATE:
-- a DATE minus a DATE is a number of days, so scale by 24 * 60
( GREATEST (a.FAIL_TIME, b.FAIL_TIME, c.FAIL_TIME)
- LEAST (a.FAIL_TIME, b.FAIL_TIME, c.FAIL_TIME) ) * 24 * 60 <= 10
The output will give you one row instead of three as requested in the question, but that will be the right level for the fault data, as all three clients should have it.

The first thing we need is a way of comparing FAIL_TIME. As you haven't posted a table structure let's assume we're dealing with strings.
Oracle has some neat built-ins for casting between dates and strings. If we concatenate ACTIVITY_DAY and FAIL_TIME we can convert them to a DATE data type:
to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss')
We can cast that to a string representing the number of seconds past midnight:
to_char(to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss'), 'sssss')
Then we can cast that to a number, which we can use in some arithmetic to compare with other rows; ten minutes = 600 seconds.
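As a quick illustration of that arithmetic (a hedged sketch, not Oracle code; the helper name `sssss` is invented), the 'SSSSS' value is just hours, minutes and seconds folded into seconds past midnight:

```python
from datetime import datetime

def sssss(day: str, time_: str) -> int:
    """Seconds past midnight, mirroring to_char(..., 'sssss')."""
    dt = datetime.strptime(day + " " + time_, "%d/%m/%Y %H:%M:%S")
    return dt.hour * 3600 + dt.minute * 60 + dt.second

# ten minutes = 600 seconds, so two failures are "close" when
# abs(sssss(day, t1) - sssss(day, t2)) <= 600
```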
Next we can use subquery factoring (the WITH clause). One of the neat features of this syntax is that we can pass the output of one subquery into another, so we only need to write that gnarly nested cast expression once.
with t as
( select OFFICE_ID
, ACTIVITY_DAY
, FAIL_TIME
, to_number(to_char(to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss'), 'sssss')) FAIL_TIME_SSSSS
, CLIENT
from faillog
)
We can use this sub-query to build other subqueries which separate the table's rows into sets for each CLIENT for use in our main query.
Finally we can use an analytic COUNT() function to count how many qualifying FAIL_TIME rows we have for each OFFICE_ID and ACTIVITY_DAY combo.
count(*) over (partition by a.OFFICE_ID, a.ACTIVITY_DAY)
Putting it all together in an in-line view allows us to test for whether we can "declare the office as closed".
select * from (
with t as ( select OFFICE_ID
, ACTIVITY_DAY
, FAIL_TIME
, to_number(to_char(to_date(ACTIVITY_DAY||' '||FAIL_TIME, 'dd/mm/yyyy hh24:mi:ss'), 'sssss')) FAIL_TIME_SSSSS
, CLIENT
from faillog
)
, a as (select *
from t
where CLIENT = 'A' )
, b as (select *
from t
where CLIENT = 'B' )
, c as (select *
from t
where CLIENT = 'C' )
select a.OFFICE_ID
, a.ACTIVITY_DAY
, a.FAIL_TIME as a_fail_time
, b.FAIL_TIME as b_fail_time
, c.FAIL_TIME as c_fail_time
, count(*) over (partition by a.OFFICE_ID, a.ACTIVITY_DAY) as fail_count
from a
join b on a.OFFICE_ID = b.OFFICE_ID and a.ACTIVITY_DAY = b.ACTIVITY_DAY
join c on a.OFFICE_ID = c.OFFICE_ID and a.ACTIVITY_DAY = c.ACTIVITY_DAY
where a.FAIL_TIME_SSSSS between b.FAIL_TIME_SSSSS - 600 and b.FAIL_TIME_SSSSS + 600
and a.FAIL_TIME_SSSSS between c.FAIL_TIME_SSSSS - 600 and c.FAIL_TIME_SSSSS + 600
and b.FAIL_TIME_SSSSS between a.FAIL_TIME_SSSSS - 600 and a.FAIL_TIME_SSSSS + 600
and b.FAIL_TIME_SSSSS between c.FAIL_TIME_SSSSS - 600 and c.FAIL_TIME_SSSSS + 600
and c.FAIL_TIME_SSSSS between a.FAIL_TIME_SSSSS - 600 and a.FAIL_TIME_SSSSS + 600
and c.FAIL_TIME_SSSSS between b.FAIL_TIME_SSSSS - 600 and b.FAIL_TIME_SSSSS + 600
)
where fail_count >= 3
/
Notes
Obviously I have hard-coded the CLIENT identifier in the subqueries.
It would be possible to avoid the hard-coding, but the sample query is already complicated enough.
This query doesn't search for triplets. Providing there is one failure for each of A, B and C within a ten minute window, it doesn't matter how many instances of each CLIENT occur within the window. There's nothing in your business rules to say this is wrong.
Similarly, the same instance of one CLIENT can be matched with instances of other CLIENTs in overlapping windows. Now this may be undesirable: double or triple counting may inflate the FAIL_COUNT. But again, handling this will make the final query more complicated.
The query as presented has one row for each distinct combo of A, B and C FAIL_TIME values. The result set can be pivoted if you really need a row for each CLIENT/FAIL_TIME.

Related

SQL statement to return the Min and Max amount of stock per article for a given Month

I have a table from which I am trying to return the quantity per day that an article was in the system.
For example, in table Bestand there are multiple pallets of different articles, each with a booking-in and booking-out date; I am trying to find the min and max amount of stock that was in the system per article and month.
My thinking is that if I can return the stock quantity for each day, I can then read out the min and max values.
The Timespan would be set at the time of running the SQL and the articles would be fixed.
To find out the quantity for each day I have used the following SQL:
SELECT DISTINCT
a.artbez1 AS Artikelbezeichnung,
b.artikelnr AS Artikelnummer,
SUM(CASE WHEN TO_DATE('2019-11-01 00:00:00', 'YYYY-MM-DD HH24:MI:SS') BETWEEN b.neu_datum AND b.aender_datum THEN 1 * b.menge_ist ELSE 0 END) AS "01 Nov 2019"
FROM
artikel a, bestand b
WHERE
b.artikelnr IN ('273632002', .... (huge long list of numbers) ....)
AND b.artikelnr = a.artikelnr
GROUP BY
a.artbez1, b.artikelnr;
This returns for example:
| ARTIKELBEZEICHNUNG | ARTIKELNUMMER | 01 Nov 2019 |
| ------------------ | ------------- | ----------- |
| SC-4400.CW | 220450002 | 39 |
| S-320.FK120 | 220502004 | 0 |
| H-595.FK120 | 220800004 | 35 |
| AC-548.FK209 | 220948032 | 0 |
| AS-6800.CW | 221355002 | 20 |
I would like to return this for each day of the month and then, from that, return the min and max value for each article.
I have the following SQL to return the days of a given month, and was wondering if anyone had any ideas on how the two could be combined (if at all possible):
SELECT to_date('01.11.2019','dd.mm.yyyy')+LEVEL-1
FROM dual
CONNECT BY LEVEL <= TO_CHAR(LAST_DAY(to_date('01.11.2019','dd.mm.yyyy')),'DD')
DATES
2019-11-01 00:00:00
2019-11-02 00:00:00
2019-11-03 00:00:00
2019-11-04 00:00:00
2019-11-05 00:00:00
2019-11-06 00:00:00
2019-11-07 00:00:00
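For comparison, the same day list can be produced procedurally; a small Python sketch (illustrative only, helper name invented) of what the CONNECT BY LEVEL query generates:

```python
from calendar import monthrange
from datetime import date, timedelta

def days_of_month(first_day: date):
    """All dates of first_day's month, like the CONNECT BY LEVEL query."""
    n = monthrange(first_day.year, first_day.month)[1]  # days in the month
    return [first_day + timedelta(days=i) for i in range(n)]

november = days_of_month(date(2019, 11, 1))
```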
The result I am trying to get would be something like:
| ARTIKELBEZEICHNUNG | ARTIKELNUMMER | Nov 19 Min | Nov 19 Max |
| ------------------ | ------------- | ---------- | ---------- |
| SC-4400.CW | 220450002 | 5 | 39 |
| S-320.FK120 | 220502004 | 0 | 15 |
| H-595.FK120 | 220800004 | 2 | 35 |
| AC-548.FK209 | 220948032 | 0 | 0 |
| AS-6800.CW | 221355002 | 10 | 20 |
Is this at all possible in SQL?
Thanks for taking the time to read my post.
JeRi
You can use a partitioned outer join:
WITH calendar ( day ) AS (
SELECT DATE '2019-11-01'
FROM DUAL
UNION ALL
SELECT day + INTERVAL '1' DAY
FROM calendar
WHERE day < LAST_DAY( DATE '2019-11-01' )
),
daily_totals ( artbez1, Artikelnr, Day, total_menge_ist ) AS (
SELECT MAX( ab.artbez1 ),
ab.artikelnr,
c.day,
COALESCE( SUM( ab.menge_ist ), 0 )
FROM calendar c
LEFT OUTER JOIN
( SELECT a.artikelnr,
a.artbez1,
b.neu_datum,
b.aender_datum,
b.menge_ist
FROM artikel a
LEFT JOIN bestand b
ON ( a.artikelnr = b.artikelnr )
-- WHERE b.artikelnr IN ('273632002', .... (huge long list of numbers) ....)
) ab
PARTITION BY ( ab.artikelnr, ab.artbez1 )
ON ( c.day BETWEEN ab.neu_datum AND ab.aender_datum )
GROUP BY ab.artikelnr, c.day
)
SELECT MAX( artbez1 ) AS Artikelbezeichnung,
artikelnr AS Artikelnummer,
TRUNC( day, 'MM' ) AS month,
MIN( total_menge_ist ) AS min_total_menge_ist,
MAX( total_menge_ist ) AS max_total_menge_ist
FROM daily_totals
GROUP BY artikelnr, TRUNC( day, 'MM' );
Which, for the sample data:
CREATE TABLE artikel ( artikelnr, artbez1 ) AS
SELECT 220450002, 'SC-4400.CW' FROM DUAL UNION ALL
SELECT 220502004, 'S-320.FK120' FROM DUAL UNION ALL
SELECT 220800004, 'H-595.FK120' FROM DUAL UNION ALL
SELECT 220948032, 'AC-548.FK209' FROM DUAL UNION ALL
SELECT 221355002, 'AS-6800.CW' FROM DUAL;
CREATE TABLE bestand ( artikelnr, neu_datum, aender_datum, menge_ist ) AS
SELECT 220450002, DATE '2019-10-30', DATE '2019-11-01', 20 FROM DUAL UNION ALL
SELECT 220450002, DATE '2019-11-01', DATE '2019-11-05', 19 FROM DUAL UNION ALL
SELECT 220502004, DATE '2019-11-05', DATE '2019-11-03', 5 FROM DUAL UNION ALL
SELECT 220800004, DATE '2019-11-01', DATE '2019-11-15', 35 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-20', DATE '2019-11-05', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-25', DATE '2019-11-10', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-28', DATE '2019-11-13', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-10-30', DATE '2019-11-15', 5 FROM DUAL UNION ALL
SELECT 221355002, DATE '2019-11-05', DATE '2019-11-20', 5 FROM DUAL;
Outputs:
ARTIKELBEZEICHNUNG | ARTIKELNUMMER | MONTH | MIN_TOTAL_MENGE_IST | MAX_TOTAL_MENGE_IST
:----------------- | ------------: | :------------------ | ------------------: | ------------------:
SC-4400.CW | 220450002 | 2019-11-01 00:00:00 | 0 | 39
S-320.FK120 | 220502004 | 2019-11-01 00:00:00 | 0 | 0
AC-548.FK209 | 220948032 | 2019-11-01 00:00:00 | 0 | 0
H-595.FK120 | 220800004 | 2019-11-01 00:00:00 | 0 | 35
AS-6800.CW | 221355002 | 2019-11-01 00:00:00 | 0 | 25
db<>fiddle here
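The expansion the partitioned outer join performs can be mimicked step by step. A hedged Python sketch with a tiny invented subset of the sample data: expand each pallet's in-stock range into days, sum per day, then take min and max over the month:

```python
from datetime import date, timedelta

# (artikelnr, in_date, out_date, qty) -- invented mini sample
bestand = [
    (220450002, date(2019, 10, 30), date(2019, 11, 1), 20),
    (220450002, date(2019, 11, 1), date(2019, 11, 5), 19),
]
month_days = [date(2019, 11, 1) + timedelta(days=i) for i in range(30)]

# daily stock total: sum quantities of pallets whose range covers the day
daily = {d: 0 for d in month_days}
for _, lo, hi, qty in bestand:
    for d in month_days:
        if lo <= d <= hi:
            daily[d] += qty

lo_qty, hi_qty = min(daily.values()), max(daily.values())
```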

With Oracle SQL how can I find 3 days where total sum >= 150

I have a report that needs to list activity where total is >= 150 over 3 consecutive days.
Let's say I've created a temp table foo, to summarize daily totals.
| ID | Day | Total |
| -- | ---------- | ----- |
| 01 | 2020-01-01 | 10 |
| 01 | 2020-01-02 | 50 |
| 01 | 2020-01-03 | 50 |
| 01 | 2020-01-04 | 50 |
| 01 | 2020-01-05 | 20 |
| 02 | 2020-01-01 | 10 |
| 02 | 2020-01-02 | 10 |
| 02 | 2020-01-03 | 10 |
| 02 | 2020-01-04 | 10 |
| 02 | 2020-01-05 | 10 |
How would I write SQL to return ID 01, but not 02?
Example Result:
| ID |
| -- |
| 01 |
I suspect that you want window functions here:
select distinct id
from (
select
t.*,
sum(total) over(partition by id order by day rows between 2 preceding and current row) sum_total,
count(*) over(partition by id order by day rows between 2 preceding and current row) cnt
from mytable t
) t
where cnt = 3 and sum_total >= 150
This gives you the ids that have a total greater than the given threshold over 3 consecutive days - which is how I understood your question.
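As a cross-check of that window logic, here is a pure-Python sketch of the same rolling three-consecutive-day sum (an illustration only, assuming exactly one row per id per day, as in the sample):

```python
# daily totals per id, in consecutive-day order
rows = {
    "01": [10, 50, 50, 50, 20],
    "02": [10, 10, 10, 10, 10],
}

def qualifies(totals, window=3, threshold=150):
    """True if any `window` consecutive days sum to >= threshold."""
    return any(sum(totals[i:i + window]) >= threshold
               for i in range(len(totals) - window + 1))

ids = sorted(i for i, t in rows.items() if qualifies(t))
```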
If you just want to output the rows that have 3 consecutive days with a sum >= 150, you can use an analytic function to determine the moving total across each 3 day period per id, and then find the aggregate max value of the moving total per id, returning the id where it's >= 150.
E.g.:
WITH your_table AS (SELECT 1 ID, to_date('01/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 1 ID, to_date('02/01/2020', 'dd/mm/yyyy') dy, 50 total FROM dual UNION ALL
SELECT 1 ID, to_date('03/01/2020', 'dd/mm/yyyy') dy, 50 total FROM dual UNION ALL
SELECT 1 ID, to_date('04/01/2020', 'dd/mm/yyyy') dy, 50 total FROM dual UNION ALL
SELECT 1 ID, to_date('05/01/2020', 'dd/mm/yyyy') dy, 20 total FROM dual UNION ALL
SELECT 2 ID, to_date('01/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('02/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('03/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('04/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual UNION ALL
SELECT 2 ID, to_date('05/01/2020', 'dd/mm/yyyy') dy, 10 total FROM dual),
moving_sums AS (SELECT ID,
dy,
total,
SUM(total) OVER (PARTITION BY ID ORDER BY dy RANGE BETWEEN 2 PRECEDING AND CURRENT ROW) moving_sum
FROM your_table)
SELECT ID
FROM moving_sums
GROUP BY ID
HAVING MAX(moving_sum) >= 150;
ID
----------
1
You can use a HAVING clause with GROUP BY ID to list the desired ID values:
SELECT ID
FROM foo
GROUP BY ID
HAVING COUNT( distinct day )>=3 AND SUM( NVL(Total,0) ) >= 150
Demo
Use this if you want to specify dates:
WITH foo( ID, Day, Total ) AS
(
SELECT '01', date'2020-01-01' , 10 FROM dual
UNION ALL SELECT '01', date'2020-01-02' , 50 FROM dual
UNION ALL SELECT '01', date'2020-01-03' , 50 FROM dual
UNION ALL SELECT '01', date'2020-01-04' , 50 FROM dual
UNION ALL SELECT '01', date'2020-01-05' , 20 FROM dual
UNION ALL SELECT '02', date'2020-01-01' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-02' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-03' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-04' , 10 FROM dual
UNION ALL SELECT '02', date'2020-01-05' , 10 FROM dual
)SELECT
ID
FROM foo
WHERE day BETWEEN TO_DATE('2020-01-01', 'YYYY-MM-DD' ) AND TO_DATE('2020-01-04', 'YYYY-MM-DD' )
GROUP BY ID HAVING SUM(Total) >= 150;
RESULT:
ID|
--|
01|
Maybe you can try something like this :
SELECT
*
FROM foo
WHERE day BETWEEN DATE '2020-01-01' AND DATE '2020-01-04'
AND total > 150

SQL: Getting all dates between a set of date pairs

I have a table with some data and a time period i.e. start date and end date
------------------------------
| id | start_date | end_date |
|------------------------------|
| 0 | 1-1-2019 | 3-1-2019 |
|------------------------------|
| 1 | 6-1-2019 | 8-1-2019 |
|------------------------------|
I want to run a query that will return the id and all the dates that are within those time periods. for instance, the result of the query for the above table will be:
------------------
| id | date |
|------------------|
| 0 | 1-1-2019 |
|------------------|
| 0 | 2-1-2019 |
|------------------|
| 0 | 3-1-2019 |
|------------------|
| 1 | 6-1-2019 |
|------------------|
| 1 | 7-1-2019 |
|------------------|
| 1 | 8-1-2019 |
------------------
I am using Redshift, therefore I need it supported in Postgres; please take this into consideration.
Your help will be greatly appreciated.
The common way this is done is to create a calendar table with a list of dates. In fact, a calendar table can be extended to include columns like:
Day number (in year)
Week number
First day of month
Last day of month
Weekday / Weekend
Public holiday
Simply create the table in Excel, save as CSV and then COPY it into Redshift.
You could then just JOIN to the table, like:
SELECT
table.id,
calendar.date
FROM table
JOIN calendar
ON calendar.date BETWEEN table.start_date AND table.end_date
This question was originally tagged Postgres.
Use generate_series():
select t.id, gs.dte
from t cross join lateral
generate_series(t.start_date, t.end_date, interval '1 day') as gs(dte);
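To double-check what generate_series() should produce for the sample periods, a minimal Python sketch of the same expansion (illustrative only):

```python
from datetime import date, timedelta

# (id, start_date, end_date) from the question's table
periods = [(0, date(2019, 1, 1), date(2019, 1, 3)),
           (1, date(2019, 1, 6), date(2019, 1, 8))]

# one (id, date) row per day in each inclusive range
expanded = [(pid, start + timedelta(days=i))
            for pid, start, end in periods
            for i in range((end - start).days + 1)]
```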
OK, it took me a while to get there, but this is what I did (though I'm not really proud of it):
I created a query that generates a calendar for the last 6 years, cross joined it with my table, and then selected the relevant dates from the calendar.
WITH
days AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11 UNION select 12 UNION select 13 UNION select 14 UNION select 15 UNION select 16 UNION select 17 UNION select 18 UNION select 19 UNION select 20 UNION select 21 UNION select 22 UNION select 23 UNION select 24 UNION select 25 UNION select 26 UNION select 27 UNION select 28 UNION select 29 UNION select 30 UNION select 31),
month AS (select num from days where num <= 12),
years AS (select num from days where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start),
calendar as
(
select
DATEADD(years, -y.num, DATEADD( month, -m.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, month m, years y, rightnow n
)
select u.id, calendar.period_date
from periods u
cross join calendar
where date_part(DAY, u.finishedat) >= date_part(DAY, u.startedat) + 1 and date_part(DAY, calendar.period_date) < date_part(DAY, u.finishedat) and date_part(DAY, calendar.period_date) > date_part(DAY, u.startedat) and calendar.period_date < u.finishedat and calendar.period_date > u.startedat
This was based on the answer here: Using sql function generate_series() in redshift

Oracle - Assigning the correct Date from a Set

I have a table A like below
REGID | PKG_DESC | EVENT_DATE | IS_CON | IS_REN
-----------------------------------------------------
1234 | cc | 27-MAR-14 | 0 | 0
1234 | cc | 27-JUN-14 | 1 | 0
1234 | GUI | 27-MAR-14 | 0 | 0
1234 | GUI | 27-JUN-14 | 1 | 0
1234 | GUI | 27-SEPT-14 | 0 | 1
1234 | GUI | 27-SEPT-15 | 0 | 1
1234 | REMOTE | 27-MAR-14 | 0 | 0
1234 | REMOTE | 27-JUN-14 | 1 | 0
1234 | REMOTE | 27-SEPT-14 | 0 | 1
2431 | cc | 27-MAR-14 | 0 | 0
2431 | cc | 27-JUN-14 | 1 | 0
I have a query like below
select a.reg_id, b.sess_start_dt,
case when TRUNC(A.EVENT_DATE) - B.SESS_START_DT BETWEEN 0 AND 30 THEN 'DAYS 0_30'
WHEN TRUNC(A.EVENT_DATE) - B.SESS_START_DT BETWEEN 31 AND 60 THEN 'DAYS 31-60'
END
from tab a inner join tab b on a.reg_id = b.reg_id and a.is_ren = 1
union
select a.reg_id, b.sess_start_dt,
case when TRUNC(A.EVENT_DATE) - B.SESS_START_DT BETWEEN 0 AND 30 THEN 'DAYS 0_30'
WHEN TRUNC(A.EVENT_DATE) - B.SESS_START_DT BETWEEN 31 AND 60 THEN 'DAYS 31-60'
END
from tab a inner join tab b on a.reg_id = b.reg_id and a.is_con = 1
Tab B contains all the usage; for each reg_id there will be 100's of records. A sample of a few:
REGID | SESS_START_DT
1234 | 27-Jan-14
1234 | 20-MAR-12
1234 | 27-MAR-12
1234 | 01-sept-14
1234 | 07-sept-14
1234 | 29-JUL-14
1234 | 03-AUG-14
1234 | 27-MAR-13
1234 | 27-MAR-12
1234 | 27-MAR-12
1234 | 27-MAR-12
1234 | 27-MAR-12
1234 | 27-MAR-12
1234 | 27-MAR-12
2431 | 20-JUN-14
The above query needs to be corrected in the following way:
1) If the REG_ID has at least one IS_REN = 1, then that subscription should be considered a renewal subscription, and we need to get the 30-day and 60-day usage from table B from its IS_REN = 1 event_date (for REGID 1234 only the IS_REN query should execute).
2) If multiple IS_REN = 1 rows exist for a REGID, then the 30-day and 60-day usage needs to be taken from table B using the MIN(event_date); in this case the usage should be taken from 27-SEP-14 instead of 27-SEP-15.
3) If there is no IS_REN = 1 but there is an IS_CON = 1, then it's considered a conversion, and usage should be taken for the 60 days before the conversion date (for REGID 2431, usage needs to be taken 60 days back from 27-JUN-14, which is the event_date in the query).
The output should be like:
REGID | EVENT_DATE | DAYS 0_30 | DAYS 31-60 | CODE
1234 | 27-SEPT-14 | 2 | 2 | REN
2431 | 27-JUL-14 | 1 | 0 | CON
If my assumptions in my Comment are correct, this may be what you need. Notice the order by clause in row_number() - first the rows with is_ren = 1, then the rows with is_ren = 0 and is_con = 1, then all the other rows, and within each group order by event_date ascending. This way, the top row (rn = 1), which is the only one I use in the outer query, will have is_ren = 1 with the earliest possible date, or if no is_ren = 1 then the row with is_con = 1 and the earliest date, or else just the earliest date. (In the last case, the CODE will be null: this means there were no is_ren = 1 and no is_con = 1 for that regid.)
Not sure why you have 27-JUL-14 in the output for regid = 2431; that should be 27-JUN-14. Also, there are no four-letter month abbreviations in Oracle ("SEPT"). The output shows dates using my session parameters; if you need to format the dates, use to_char(event_date, .....) with the desired date format model. Also, since the data you provided is just dates (with no time-of-day component), I didn't truncate anything; you may need to, if your real data has time-of-day components.
with
table_a ( regid, pkg_desc, event_date, is_con, is_ren ) as (
select 1234, 'cc' , to_date ('27-MAR-14', 'dd-MON-rr'), 0, 0 from dual union all
select 1234, 'cc' , to_date ('27-JUN-14', 'dd-MON-rr'), 1, 0 from dual union all
select 1234, 'GUI' , to_date ('27-MAR-14', 'dd-MON-rr'), 0, 0 from dual union all
select 1234, 'GUI' , to_date ('27-JUN-14', 'dd-MON-rr'), 1, 0 from dual union all
select 1234, 'GUI' , to_date ('27-SEP-14', 'dd-MON-rr'), 0, 1 from dual union all
select 1234, 'GUI' , to_date ('27-SEP-15', 'dd-MON-rr'), 0, 1 from dual union all
select 1234, 'REMOTE', to_date ('27-MAR-14', 'dd-MON-rr'), 0, 0 from dual union all
select 1234, 'REMOTE', to_date ('27-JUN-14', 'dd-MON-rr'), 1, 0 from dual union all
select 1234, 'REMOTE', to_date ('27-SEP-14', 'dd-MON-rr'), 0, 1 from dual union all
select 2431, 'cc' , to_date ('27-MAR-14', 'dd-MON-rr'), 0, 0 from dual union all
select 2431, 'cc' , to_date ('27-JUN-14', 'dd-MON-rr'), 1, 0 from dual
),
table_b ( regid, sess_start_dt ) as (
select 1234, to_date ('27-JAN-14', 'dd-MON-rr') from dual union all
select 1234, to_date ('20-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('01-SEP-14', 'dd-MON-rr') from dual union all
select 1234, to_date ('07-SEP-14', 'dd-MON-rr') from dual union all
select 1234, to_date ('29-JUL-14', 'dd-MON-rr') from dual union all
select 1234, to_date ('03-AUG-14', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-13', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 1234, to_date ('27-MAR-12', 'dd-MON-rr') from dual union all
select 2431, to_date ('20-JUN-14', 'dd-MON-rr') from dual
),
prep ( regid, event_date, code, rn ) as (
select regid, event_date,
case when is_ren = 1 then 'REN' when is_con = 1 then 'CON' else null end,
row_number() over (partition by regid
order by case when is_ren = 1 then 0
when is_con = 1 then 1 else 2 end,
event_date)
from table_a
)
select p.regid, p.event_date,
count(case when b.sess_start_dt between p.event_date - 30 and p.event_date
then 1 end) as days_0_30,
count(case when b.sess_start_dt between p.event_date - 60 and p.event_date - 31
then 1 end) as days_31_60,
p.code
from prep p inner join table_b b on p.regid = b.regid
where rn = 1
group by p.regid, p.event_date, p.code
;
Output:
REGID EVENT_DATE DAYS_0_30 DAYS_31_60 COD
---------- ------------------- ---------- ---------- ---
1234 2014-09-27 00:00:00 2 2 REN
2431 2014-06-27 00:00:00 1 0 CON
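The two bucketed counts in the final query boil down to this check. A hedged Python rendering (illustration only) using the regid 1234 session dates from table B and the anchor date 27-SEP-14:

```python
from datetime import date, timedelta

anchor = date(2014, 9, 27)  # earliest is_ren = 1 event_date for regid 1234
sessions = [date(2014, 1, 27), date(2012, 3, 20), date(2012, 3, 27),
            date(2014, 9, 1), date(2014, 9, 7), date(2014, 7, 29),
            date(2014, 8, 3), date(2013, 3, 27)]

# sessions in [anchor - 30, anchor] and [anchor - 60, anchor - 31]
days_0_30 = sum(1 for s in sessions
                if anchor - timedelta(days=30) <= s <= anchor)
days_31_60 = sum(1 for s in sessions
                 if anchor - timedelta(days=60) <= s <= anchor - timedelta(days=31))
```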

Return a value when a different value changes

I have a query, which returns the following, EXCEPT for the last column, which is what I need to figure out how to create. For each given ObservationID I need to return the date on which the status changes; something like a LEAD() function that would take conditions and not just offsets. Can it be done?
I need to calculate the column Change Date; it should be the last date the status was not the current status.
+---------------+--------+-----------+--------+-------------+
| ObservationID | Region | Date | Status | Change Date | <-This field
+---------------+--------+-----------+--------+-------------+
| 1 | 10 | 1/3/2012 | Ice | 1/4/2012 |
| 2 | 10 | 1/4/2012 | Water | 1/6/2012 |
| 3 | 10 | 1/5/2012 | Water | 1/6/2012 |
| 4 | 10 | 1/6/2012 | Gas | 1/7/2012 |
| 5 | 10 | 1/7/2012 | Ice | |
| 6 | 20 | 2/6/2012 | Water | 2/10/2012 |
| 7 | 20 | 2/7/2012 | Water | 2/10/2012 |
| 8 | 20 | 2/8/2012 | Water | 2/10/2012 |
| 9 | 20 | 2/9/2012 | Water | 2/10/2012 |
| 10 | 20 | 2/10/2012 | Ice | |
+---------------+--------+-----------+--------+-------------+
a model clause (10g+) can do this in a compact way:
SQL> create table observation(ObservationID , Region ,obs_date, Status)
2 as
3 select 1, 10, date '2012-03-01', 'Ice' from dual union all
4 select 2, 10, date '2012-04-01', 'Water' from dual union all
5 select 3, 10, date '2012-05-01', 'Water' from dual union all
6 select 4, 10, date '2012-06-01', 'Gas' from dual union all
7 select 5, 10, date '2012-07-01', 'Ice' from dual union all
8 select 6, 20, date '2012-06-02', 'Water' from dual union all
9 select 7, 20, date '2012-07-02', 'Water' from dual union all
10 select 8, 20, date '2012-08-02', 'Water' from dual union all
11 select 9, 20, date '2012-09-02', 'Water' from dual union all
12 select 10, 20, date '2012-10-02', 'Ice' from dual ;
Table created.
SQL> select ObservationID, obs_date, Status, status_change
2 from observation
3 model
4 dimension by (Region, obs_date, Status)
5 measures ( ObservationID, obs_date obs_date2, cast(null as date) status_change)
6 rules (
7 status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
8 )
9 order by 1;
OBSERVATIONID OBS_DATE STATU STATUS_CH
------------- --------- ----- ---------
1 01-MAR-12 Ice 01-APR-12
2 01-APR-12 Water 01-JUN-12
3 01-MAY-12 Water 01-JUN-12
4 01-JUN-12 Gas 01-JUL-12
5 01-JUL-12 Ice
6 02-JUN-12 Water 02-OCT-12
7 02-JUL-12 Water 02-OCT-12
8 02-AUG-12 Water 02-OCT-12
9 02-SEP-12 Water 02-OCT-12
10 02-OCT-12 Ice
fiddle: http://sqlfiddle.com/#!4/f6687/1
i.e. we will dimension on region, date and status as we want to look at cells with the same region, but get the first date that the status differs on.
we also have to measure date too so i created an alias obs_date2 to do that, and we want a new column status_change to hold the date the status changed.
this line is the line that does all the working out for us:
status_change[any,any,any] = min(obs_date2)[cv(Region), obs_date > cv(obs_date), status != cv(status)]
it says, for our three dimensions, only look at the rows with the same region (cv(Region),) and look at rows where the date follows the date of the current row (obs_date > cv(obs_date)) and also the status is different from the current row (status != cv(status)) finally get the minimum date that satisfies this set of conditions (min(obs_date2)) and assign it to status_change. The any,any,any part on the left means this calculation applies to all rows.
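The rule can be paraphrased procedurally: for each row, take the minimum later date in the same region whose status differs. A small Python sketch over the same sample data (function name invented):

```python
from datetime import date

# (observation_id, region, obs_date, status) from the setup above
obs = [(1, 10, date(2012, 3, 1), "Ice"), (2, 10, date(2012, 4, 1), "Water"),
       (3, 10, date(2012, 5, 1), "Water"), (4, 10, date(2012, 6, 1), "Gas"),
       (5, 10, date(2012, 7, 1), "Ice"), (6, 20, date(2012, 6, 2), "Water"),
       (7, 20, date(2012, 7, 2), "Water"), (8, 20, date(2012, 8, 2), "Water"),
       (9, 20, date(2012, 9, 2), "Water"), (10, 20, date(2012, 10, 2), "Ice")]

def status_change(region, d, status):
    """Earliest later date in the same region with a different status."""
    later = [d2 for _, r, d2, s in obs if r == region and d2 > d and s != status]
    return min(later) if later else None

changes = {oid: status_change(r, d, s) for oid, r, d, s in obs}
```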
I've tried many times to understand the MODEL clause and never really quite managed it, so thought I would add another solution
This solution takes some of what Ronnis has done but instead uses the IGNORE NULLS clause of the LEAD function. I think that this is only new with Oracle 11 but you could probably replace it with the FIRST_VALUE function for Oracle 10 if necessary.
select
observation_id,
region,
observation_date,
status,
lead(case when is_change = 'Y' then observation_date end) ignore nulls
over (partition by region order by observation_date) as change_observation_date
from (
select
a.observation_id,
a.region,
a.observation_date,
a.status,
case
when status = lag(status) over (partition by region order by observation_date)
then null
else 'Y' end as is_change
from observations a
)
order by 1
I frequently do this when cleaning up overlapping from/to-dates and duplicate rows.
Your case is much simpler though, since you only have the "from-date" :)
Setting up the test data
create table observations(
observation_id number not null
,region number not null
,observation_date date not null
,status varchar2(10) not null
);
insert
into observations(observation_id, region, observation_date, status)
select 1, 10, date '2012-03-01', 'Ice' from dual union all
select 2, 10, date '2012-04-01', 'Water' from dual union all
select 3, 10, date '2012-05-01', 'Water' from dual union all
select 4, 10, date '2012-06-01', 'Gas' from dual union all
select 5, 10, date '2012-07-01', 'Ice' from dual union all
select 6, 20, date '2012-06-02', 'Water' from dual union all
select 7, 20, date '2012-07-02', 'Water' from dual union all
select 8, 20, date '2012-08-02', 'Water' from dual union all
select 9, 20, date '2012-09-02', 'Water' from dual union all
select 10, 20, date '2012-10-02', 'Ice' from dual;
commit;
The below query has three points of interest:
Identifying repeated information (the recording show the same as previous recording)
Ignoring the repeated recordings
Determining the date from the "next" change
with lagged as(
select a.*
,case when status = lag(status, 1) over(partition by region
order by observation_date)
then null
else rownum
end as change_flag -- 1
from observations a
)
select observation_id
,region
,observation_date
,status
,lead(observation_date, 1) over(
partition by region
order by observation_date
) as change_date --3
,lead(observation_date, 1, sysdate) over(
partition by region
order by observation_date
) - observation_date as duration
from lagged
where change_flag is not null -- 2
;