SQL - Selecting rows based on date difference

Suppose we have the table below:
Code  Dt
c1    2020-10-01
c1    2020-10-05
c1    2020-10-09
c1    2020-10-10
c1    2020-10-20
c2    2020-10-07
c2    2020-10-09
c2    2020-10-15
c2    2020-10-16
c2    2020-10-20
c2    2020-10-24
The combination of Code and Dt is unique. Rows are sorted by Code and Dt. The database is Oracle 12.
For every code, I want to get the list of its Dts in which each Dt is more than 7 days after the previously selected Dt. Therefore, the result should be:
Code  Dt
c1    2020-10-01
c1    2020-10-09
c1    2020-10-20
c2    2020-10-07
c2    2020-10-15
c2    2020-10-24
I've tried a self join based on row_number() to join every row with its previous row if the date difference is greater than 7. The challenge is that each row should be compared with the previously selected row, not with its previous row in the table. Any solutions? Thanks

You can solve this relatively easily using MATCH_RECOGNIZE:
with data(code, dt) as (
select 'c1', to_date('2020-10-01', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-05', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-09', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-10', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-20', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-07', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-09', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-15', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-16', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-20', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-24', 'YYYY-MM-DD') from dual
)
select *
from data match_recognize (
partition by code
order by dt
measures
init.dt dt
one row per match
pattern (init less_than_7_days*)
define
less_than_7_days as less_than_7_days.dt - init.dt < 7
)
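You partition by code, order by date, and then match any row (init) followed by zero or more rows (less_than_7_days*) whose date is less than 7 days after init's date. You return one row for the whole match (init plus the following rows), containing the date from init; the next match then starts at the first row that is 7 or more days after init. Note that with < 7, a date exactly 7 days after init starts a new match; to require strictly more than 7 days, as the question asks, use <= 7 in the DEFINE clause instead.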
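The greedy rule behind this (keep a date, then skip every later date that is too close to the last kept date, rather than to the previous row) can be sketched outside the database. This is a minimal Python illustration, not Oracle SQL; the function name select_spaced_dates and its min_gap_days parameter are hypothetical, and it applies the question's strict "more than 7 days" rule.

```python
from datetime import date
from itertools import groupby

# Sample data from the question: (code, dt) pairs.
rows = [
    ("c1", date(2020, 10, 1)), ("c1", date(2020, 10, 5)), ("c1", date(2020, 10, 9)),
    ("c1", date(2020, 10, 10)), ("c1", date(2020, 10, 20)),
    ("c2", date(2020, 10, 7)), ("c2", date(2020, 10, 9)), ("c2", date(2020, 10, 15)),
    ("c2", date(2020, 10, 16)), ("c2", date(2020, 10, 20)), ("c2", date(2020, 10, 24)),
]

def select_spaced_dates(rows, min_gap_days=7):
    """Hypothetical helper: per code, keep each date that is more than
    min_gap_days after the previously *kept* date (greedy scan)."""
    kept = []
    for code, group in groupby(sorted(rows), key=lambda r: r[0]):
        last_kept = None
        for _, dt in group:
            # Compare with the last selected date, not with the previous row.
            if last_kept is None or (dt - last_kept).days > min_gap_days:
                kept.append((code, dt))
                last_kept = dt
    return kept

selected = select_spaced_dates(rows)
```

Because each decision depends on the last kept date, a plain self-join on the previous row cannot express this rule, which is why the answers reach for pattern matching or recursion.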

Looks like a case for a recursive (hierarchical) query.
Compute the pairs, then traverse the chain:
with pairs(Code, Dt, dtnext) as (
select t1.Code, t1.dt, Min(t2.dt)
from tbl t1
join tbl t2 on t1.code=t2.code and t2.dt >= t1.dt + INTERVAL '7' DAY
group by t1.Code, t1.dt
),
h(Code, Dtn) as (
select Code, Min(dt)
from tbl
group by Code
union all
select h.Code, p.dtnext
from h
join pairs p on p.code=h.code and p.Dt= h.dtn
)
select *
from h
order by code, dtn
Returns
CODE DTN
c1 01-OCT-20
c1 09-OCT-20
c1 20-OCT-20
c2 07-OCT-20
c2 15-OCT-20
c2 24-OCT-20
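The recursive query does two things: the pairs CTE maps each date to the earliest date at least 7 days later, and h follows that mapping from the minimum date. Here is a minimal Python sketch of the same two steps for a single code (the name chain_dates is hypothetical). Note the >= comparison, so a date exactly 7 days later does qualify here, as in the SQL.

```python
from datetime import date, timedelta

def chain_dates(dates, gap=timedelta(days=7)):
    """Hypothetical helper mirroring the pairs/h CTEs for one code."""
    dates = sorted(dates)
    # pairs: each date -> earliest date that is at least `gap` later.
    successor = {}
    for dt in dates:
        later = [d for d in dates if d >= dt + gap]
        if later:
            successor[dt] = min(later)
    # h: start from the minimum date and follow the chain until it ends.
    chain = [dates[0]]
    while chain[-1] in successor:
        chain.append(successor[chain[-1]])
    return chain

c1 = chain_dates([date(2020, 10, d) for d in (1, 5, 9, 10, 20)])
c2 = chain_dates([date(2020, 10, d) for d in (7, 9, 15, 16, 20, 24)])
```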

Related

How to get the most repeated value of x column grouped by z column - ORACLE SQL

I have a select that returns a table such as:
weekOfTheYear  mostRepeatedID
01             a
01             b
01             a
02             b
02             b
02             a
and what I need is:
weekOfTheYear  mostRepeatedID
01             a
02             b
so that each week of the year appears only once, and the mostRepeatedID for each week is the value that appears most often.
You can use the DENSE_RANK analytic function to find the rows with the maximum count and then filter to only return the rows with the first rank:
SELECT weekOfTheYear,
mostRepeatedId
FROM table_name
GROUP BY
weekOfTheYear,
mostRepeatedId
ORDER BY
DENSE_RANK() OVER (
PARTITION BY weekOfTheYear
ORDER BY COUNT(*) DESC
)
FETCH FIRST ROW WITH TIES;
Which, for the sample data:
CREATE TABLE table_name (weekOfTheYear, mostRepeatedID) AS
SELECT '01', 'a' FROM DUAL UNION ALL
SELECT '01', 'b' FROM DUAL UNION ALL
SELECT '01', 'a' FROM DUAL UNION ALL
SELECT '02', 'b' FROM DUAL UNION ALL
SELECT '02', 'b' FROM DUAL UNION ALL
SELECT '02', 'a' FROM DUAL UNION ALL
SELECT '03', 'a' FROM DUAL UNION ALL
SELECT '03', 'b' FROM DUAL UNION ALL
SELECT '03', 'c' FROM DUAL;
Outputs:
WEEKOFTHEYEAR  MOSTREPEATEDID
01             a
02             b
03             a
03             b
03             c
Note: If you only want a single row per group, use ROW_NUMBER rather than DENSE_RANK; if you want the minimum count, use ORDER BY COUNT(*) rather than ORDER BY COUNT(*) DESC.
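The ranking step amounts to: count each ID per week, then keep every ID whose count equals that week's maximum, ties included, as DENSE_RANK with FETCH FIRST ROW WITH TIES does. A short Python sketch of that logic on the answer's sample data (most_repeated is a hypothetical name):

```python
from collections import Counter, defaultdict

rows = [("01", "a"), ("01", "b"), ("01", "a"),
        ("02", "b"), ("02", "b"), ("02", "a"),
        ("03", "a"), ("03", "b"), ("03", "c")]

def most_repeated(rows):
    """Return (week, id) for every id tied for the top count in its week."""
    counts = defaultdict(Counter)
    for week, rid in rows:
        counts[week][rid] += 1          # GROUP BY week, id ... COUNT(*)
    result = []
    for week in sorted(counts):
        top = max(counts[week].values())
        # Rank 1 = every id whose count equals the week's maximum.
        result.extend((week, rid) for rid in sorted(counts[week])
                      if counts[week][rid] == top)
    return result

modes = most_repeated(rows)
```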
What you need is the count of each mostRepeatedID per weekOfTheYear, sorted in descending order, for example:
WITH t1 AS
(
SELECT weekOfTheYear, mostRepeatedID, COUNT(*) AS cnt
FROM t -- your table
GROUP BY weekOfTheYear, mostRepeatedID
)
SELECT DISTINCT
weekOfTheYear,
MAX(mostRepeatedID) KEEP (DENSE_RANK FIRST ORDER BY cnt DESC)
OVER (PARTITION BY weekOfTheYear) AS mostRepeatedID
FROM t1
One drawback (if it matters to you): when several IDs tie for the highest count, only one of them is returned, namely the alphabetically last one, as MAX implies.
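To see the tie-breaking concretely, here is a small Python sketch of what MAX(...) KEEP (DENSE_RANK FIRST ORDER BY cnt DESC) returns per week: the largest ID among those with the top count. It reuses the tied week 03 from the earlier sample; the function name is hypothetical.

```python
from collections import Counter, defaultdict

rows = [("01", "a"), ("01", "b"), ("01", "a"),
        ("02", "b"), ("02", "b"), ("02", "a"),
        ("03", "a"), ("03", "b"), ("03", "c")]

def most_repeated_single(rows):
    """One id per week: top count, ties broken by the largest id (MAX)."""
    counts = defaultdict(Counter)
    for week, rid in rows:
        counts[week][rid] += 1
    result = {}
    for week, per_id in counts.items():
        top = max(per_id.values())
        result[week] = max(rid for rid, n in per_id.items() if n == top)
    return result

winners = most_repeated_single(rows)
```

Using MIN instead of MAX in the KEEP expression returns the alphabetically first of the tied values instead.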

Oracle SQL row concatenation by periods: maximum period

I have the below table:
LAUFD     ID  NEXDT     ORDER_ROW
20140305  C1  20140310  14
20140226  C1  20140305  13
20131125  C1  20131126  12
20131021  C1  20131022  11
20130821  C1  20130828  10
20130814  C1  20130821  9
20130807  C1  20130814  8
20130731  C1  20130807  7
20130724  C1  20130731  6
20130710  C1  20130724  5
20130708  C1  20130709  4
20130624  C1  20130707  3
20130603  C1  20130608  2
20130527  C1  20130603  1
I would like to have the below output:
ID  START     END
C1  20140226  20140310
The logic is: within each ID, ordered by ORDER_ROW, if the LAUFD of the next row equals this row's NEXDT, NEXDT + 1 or NEXDT + 2, continue with the next row. If not, close the period and generate an entry in the output table with its start (earliest LAUFD) and end (latest NEXDT).
Basically, it's the same question as in Oracle SQL row concatenation by periods but I'd like just the latest period as an output.
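Read from the latest row backwards, the rule says: extend the period while the earlier row's NEXDT is 0 to 2 days before the later row's LAUFD, and stop at the first larger gap. A minimal Python sketch of that rule for one ID (latest_period and max_gap_days are hypothetical names; the dates are the question's YYYYMMDD numbers):

```python
from datetime import date, datetime

def d(n):
    """Convert a YYYYMMDD number to a date."""
    return datetime.strptime(str(n), "%Y%m%d").date()

# (LAUFD, NEXDT, ORDER_ROW) for ID C1, from the question.
rows = [(d(a), d(b), r) for a, b, r in [
    (20140305, 20140310, 14), (20140226, 20140305, 13), (20131125, 20131126, 12),
    (20131021, 20131022, 11), (20130821, 20130828, 10), (20130814, 20130821, 9),
    (20130807, 20130814, 8), (20130731, 20130807, 7), (20130724, 20130731, 6),
    (20130710, 20130724, 5), (20130708, 20130709, 4), (20130624, 20130707, 3),
    (20130603, 20130608, 2), (20130527, 20130603, 1)]]

def latest_period(rows, max_gap_days=2):
    """Walk back from the highest ORDER_ROW while the gap is 0..max_gap_days."""
    rows = sorted(rows, key=lambda r: r[2], reverse=True)
    start, end = rows[0][0], rows[0][1]
    for later, earlier in zip(rows, rows[1:]):
        gap = (later[0] - earlier[1]).days   # next LAUFD - this NEXDT
        if not 0 <= gap <= max_gap_days:
            break                            # first break: the period ends here
        start = earlier[0]
    return start, end

period = latest_period(rows)
```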
Looks like this is what you need:
with t (LAUFD, ID, NEXDT, ORDER_ROW) as (
select 20140305,'C1', 20140310, 14 from dual union all
select 20140226,'C1', 20140305, 13 from dual union all
select 20131125,'C1', 20131126, 12 from dual union all
select 20131021,'C1', 20131022, 11 from dual union all
select 20130821,'C1', 20130828, 10 from dual union all
select 20130814,'C1', 20130821, 9 from dual union all
select 20130807,'C1', 20130814, 8 from dual union all
select 20130731,'C1', 20130807, 7 from dual union all
select 20130724,'C1', 20130731, 6 from dual union all
select 20130710,'C1', 20130724, 5 from dual union all
select 20130708,'C1', 20130709, 4 from dual union all
select 20130624,'C1', 20130707, 3 from dual union all
select 20130603,'C1', 20130608, 2 from dual union all
select 20130527,'C1', 20130603, 1 from dual
)
,t1 as (select id, order_row, to_date(laufd,'yyyymmdd') as laufd_dt, to_date(nexdt,'yyyymmdd') as nexdt_dt from t)
select id, dt_start, dt_end
from t1
match_recognize (
partition by id
order by order_row desc          -- latest row first
measures
min(laufd_dt) as dt_start,     -- earliest LAUFD in the period
max(nexdt_dt) as dt_end        -- latest NEXDT in the period
one row per match
pattern (^ a x*)               -- anchor the match at the latest row of each ID
define
-- the later row's LAUFD is this row's NEXDT, NEXDT + 1 or NEXDT + 2
x as prev(laufd_dt) - nexdt_dt between 0 and 2
);
For just the latest period, you could use the previous solution. Instead, though, you can look for the most recent "break" and then use only the rows after that break:
select id, min(laufd) as dt_start, max(next_dt) as dt_end
from (select t.*,
sum(case when prev_nextdt >= laufd - interval '2' day then 0 else 1 end) over
(partition by id order by order_row) as grp,
sum(case when prev_nextdt >= laufd - interval '2' day then 0 else 1 end) over (partition by id) as num_grps
from (select t.id, t.order_row, -- any other columns you need
to_date(laufd, 'YYYYMMDD') as laufd,
to_date(nexdt, 'YYYYMMDD') as next_dt,
lag(to_date(nexdt, 'YYYYMMDD')) over (partition by id order by order_row) as prev_nextdt
from t
) t
) t
where num_grps = grp
group by id;
This is basically the same logic. It just keeps the latest group, the one whose grp equals num_grps.
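The grp column is a running count of "breaks": every row whose previous NEXDT is missing or more than 2 days before its LAUFD starts a new group, and the latest period is the group numbered num_grps. A minimal Python sketch of that labelling (label_groups is a hypothetical name):

```python
from datetime import date, datetime, timedelta

def d(n):
    """Convert a YYYYMMDD number to a date."""
    return datetime.strptime(str(n), "%Y%m%d").date()

# (LAUFD, NEXDT) for ID C1, in ORDER_ROW order 1..14.
rows = [(d(a), d(b)) for a, b in [
    (20130527, 20130603), (20130603, 20130608), (20130624, 20130707),
    (20130708, 20130709), (20130710, 20130724), (20130724, 20130731),
    (20130731, 20130807), (20130807, 20130814), (20130814, 20130821),
    (20130821, 20130828), (20131021, 20131022), (20131125, 20131126),
    (20140226, 20140305), (20140305, 20140310)]]

def label_groups(rows, max_gap_days=2):
    """Running sum of breaks, like SUM(CASE ... END) OVER (ORDER BY order_row)."""
    groups, grp, prev_nexdt = [], 0, None
    for laufd, nexdt in rows:
        if prev_nexdt is None or prev_nexdt < laufd - timedelta(days=max_gap_days):
            grp += 1                 # break: a new period starts here
        groups.append(grp)
        prev_nexdt = nexdt
    return groups

groups = label_groups(rows)
last = [r for r, g in zip(rows, groups) if g == max(groups)]   # grp = num_grps
period = (min(r[0] for r in last), max(r[1] for r in last))
```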

Last changed Data with T as status in Oracle sql

My data is given below.
In the sample below, the latest record has status T, and the status last changed to T on 3-Apr-17, so that row needs to be displayed:
EMP EFFDT STATUS
11367 15-Apr-15 A
11367 14-Jun-15 A
11367 10-Aug-15 T
11367 2-Apr-17 A
11367 3-Apr-17 T *
11367 10-Apr-17 T
In the sample below, the latest record has status T, and the status last changed to T on 23-Feb-18, so that row needs to be displayed:
EMP EFFDT STATUS
20612 4-Sep-16 A
20612 23-Feb-18 T *
20612 20-Jul-18 T
In the sample below, the latest record has status T and it is the only occurrence, so display it:
EMP EFFDT STATUS
20644 12-Jul-15 A
20644 8-Aug-16 A
20644 6-Oct-16 T *
In the sample below, the latest record does not have status T, so there is no need to display it:
EMP EFFDT STATUS
21155 18-May-17 T
21155 21-Jun-17 A
21155 13-Mar-18 T
21155 15-Aug-18 A
My desired output (the * marked records) should be:
EMP EFFDT STATUS
11367 3-Apr-17 T
20612 23-Feb-18 T
20644 6-Oct-16 T
This is a gaps-and-islands problem.
The CTE finds the island of T statuses that contains the latest update: t counts the non-T rows from the latest record backwards, so t = 0 only for that final run of T rows.
WITH cte as (
SELECT "EMP",
"EFFDT",
SUM(CASE WHEN "STATUS" <> 'T'
THEN 1
ELSE 0
END) OVER (partition by "EMP" ORDER BY "EFFDT" DESC) as t
FROM Table1
)
SELECT "EMP", MIN("EFFDT") as "EFFDT", MAX('T') as "STATUS"
FROM cte
WHERE t = 0
GROUP BY "EMP"
OUTPUT
| EMP | EFFDT | STATUS |
|-------|-----------------------|--------|
| 11367 | 2017-04-03 00:00:00.0 | T |
| 20612 | 2018-02-23 00:00:00.0 | T |
| 20644 | 2016-10-06 00:00:00.0 | T |
For debugging, you can run
SELECT *
FROM cte
to see how the t values are computed.
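Since t counts non-T rows from the latest record backwards, it stays 0 only across the final run of T records, and an EMP whose latest record is not T has no t = 0 rows at all. A minimal Python sketch of that computation on the question's data (trailing_t_start is a hypothetical name):

```python
from datetime import date
from itertools import groupby

rows = [
    (11367, date(2015, 4, 15), "A"), (11367, date(2015, 6, 14), "A"),
    (11367, date(2015, 8, 10), "T"), (11367, date(2017, 4, 2), "A"),
    (11367, date(2017, 4, 3), "T"), (11367, date(2017, 4, 10), "T"),
    (20612, date(2016, 9, 4), "A"), (20612, date(2018, 2, 23), "T"),
    (20612, date(2018, 7, 20), "T"),
    (20644, date(2015, 7, 12), "A"), (20644, date(2016, 8, 8), "A"),
    (20644, date(2016, 10, 6), "T"),
    (21155, date(2017, 5, 18), "T"), (21155, date(2017, 6, 21), "A"),
    (21155, date(2018, 3, 13), "T"), (21155, date(2018, 8, 15), "A"),
]

def trailing_t_start(rows):
    """Return {emp: earliest EFFDT of the final run of T}, skipping EMPs
    whose latest record is not T."""
    result = {}
    for emp, group in groupby(sorted(rows), key=lambda r: r[0]):
        t, run = 0, []
        # ORDER BY EFFDT DESC: running count of non-T rows seen so far.
        for _, effdt, status in sorted(group, key=lambda r: r[1], reverse=True):
            t += status != "T"
            if t == 0:
                run.append(effdt)
        if run:
            result[emp] = min(run)   # MIN(EFFDT) ... WHERE t = 0
    return result

answer = trailing_t_start(rows)
```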
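WITH cte1
AS (
SELECT A.*
,lag(STATUS, 1, 'X') OVER (
PARTITION BY EMP ORDER BY EFFDT
) AS PRIOR_STATUS -- 'X' = no earlier row
,MAX(STATUS) KEEP (DENSE_RANK LAST ORDER BY EFFDT) OVER (
PARTITION BY EMP
) AS LAST_STATUS -- status of the latest row for this EMP
FROM Table1 A
)
SELECT EMP
,STATUS
,MAX(EFFDT) AS EFFDT
FROM cte1 A
WHERE A.STATUS = 'T'
AND A.PRIOR_STATUS <> 'T'
AND A.LAST_STATUS = 'T' -- skip EMPs whose latest record is not T
GROUP BY EMP
,STATUS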
SQL Fiddle here: http://sqlfiddle.com/#!4/458733/18
alter session set nls_date_format = 'dd-Mon-rr';
Solution (including simulated data in with clause):
with
simulated_data (EMP, EFFDT, STATUS) as (
select 11367, to_date('15-Apr-15'), 'A' from dual union all
select 11367, to_date('14-Jun-15'), 'A' from dual union all
select 11367, to_date('10-Aug-15'), 'T' from dual union all
select 11367, to_date( '2-Apr-17'), 'A' from dual union all
select 11367, to_date( '3-Apr-17'), 'T' from dual union all
select 11367, to_date('10-Apr-17'), 'T' from dual union all
select 20612, to_date( '4-Sep-16'), 'A' from dual union all
select 20612, to_date('23-Feb-18'), 'T' from dual union all
select 20612, to_date('20-Jul-18'), 'T' from dual union all
select 20644, to_date('12-Jul-15'), 'A' from dual union all
select 20644, to_date( '8-Aug-16'), 'A' from dual union all
select 20644, to_date( '6-Oct-16'), 'T' from dual union all
select 21155, to_date('18-May-17'), 'T' from dual union all
select 21155, to_date('21-Jun-17'), 'A' from dual union all
select 21155, to_date('13-Mar-18'), 'T' from dual union all
select 21155, to_date('15-Aug-18'), 'A' from dual
)
-- End of simulated data (for testing only).
-- SQL query (solution) begins BELOW THIS LINE.
select emp, min(effdt) as eff_dt, 'T' as status
from (
select emp, effdt, status,
row_number() over (partition by emp, status
order by effdt desc) as rn,
min(status) keep (dense_rank last order by effdt)
over (partition by emp) as last_status
from simulated_data
)
where last_status = 'T' and status = 'T' and rn <= 2
group by emp
;
Output:
EMP EFF_DT STATUS
---------- --------- ------
11367 03-Apr-17 T
20612 23-Feb-18 T
20644 06-Oct-16 T
Explanation:
In the subquery, we add two columns to the input data. Column RN gives a rank within each partition by EMP and STATUS, in descending order by EFFDT. LAST_STATUS uses the analytic version of the LAST function to assign either T or A as the last status for each EMP (and it attaches this value to EVERY row for the EMP, regardless of each row's own STATUS).
In the outer query, we only want to retain the EMPs whose last status is T. For those, we only keep the rows whose own status is in fact T (this always includes the last row for that EMP, which has RN = 1). Moreover, we only keep the rows where RN is 1 or, if the EMP has at least two rows with status T, 2. Of these one or two T rows for a given EMP, we want the EARLIEST date. That is the ONLY date if there is no row with RN = 2 for that partition; otherwise, it is the date from the earlier row, with RN = 2.
In the outer SELECT we select the EMP, the earliest date, and the status, which we already know is T (so no work is needed for it; actually it is not clear why the third column is even needed, since it is known beforehand to be T in all rows).
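The explanation above can be traced in a few lines of Python. This is a hypothetical sketch (answer_c_logic is an invented name) that reproduces the LAST_STATUS, RN and MIN steps on the same simulated data; like the SQL, it takes the earlier of the two most recent T rows.

```python
from collections import defaultdict
from datetime import date

rows = [
    (11367, date(2015, 4, 15), "A"), (11367, date(2015, 6, 14), "A"),
    (11367, date(2015, 8, 10), "T"), (11367, date(2017, 4, 2), "A"),
    (11367, date(2017, 4, 3), "T"), (11367, date(2017, 4, 10), "T"),
    (20612, date(2016, 9, 4), "A"), (20612, date(2018, 2, 23), "T"),
    (20612, date(2018, 7, 20), "T"),
    (20644, date(2015, 7, 12), "A"), (20644, date(2016, 8, 8), "A"),
    (20644, date(2016, 10, 6), "T"),
    (21155, date(2017, 5, 18), "T"), (21155, date(2017, 6, 21), "A"),
    (21155, date(2018, 3, 13), "T"), (21155, date(2018, 8, 15), "A"),
]

def answer_c_logic(rows):
    by_emp = defaultdict(list)
    for emp, effdt, status in rows:
        by_emp[emp].append((effdt, status))
    result = {}
    for emp, recs in by_emp.items():
        recs.sort(reverse=True)                        # ORDER BY EFFDT DESC
        if recs[0][1] != "T":                          # LAST_STATUS = 'T'
            continue
        t_dates = [dt for dt, s in recs if s == "T"]   # RN = 1, 2, ... among T rows
        result[emp] = min(t_dates[:2])                 # RN <= 2, then MIN(EFFDT)
    return result

answer = answer_c_logic(rows)
```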
Assuming that A and T are the only statuses, this should work.
WITH cte1
AS (
SELECT A.EMP, A.EFFDT, A.STATUS
,min(STATUS) OVER (
PARTITION BY EMP ORDER BY EFFDT RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS MIN_STATUS
FROM Table1 A
)
SELECT
cte1.EMP
,MIN(cte1.EFFDT) AS EFFDT
,MIN(cte1.STATUS) as STATUS
FROM cte1
WHERE cte1.MIN_STATUS = 'T'
GROUP BY EMP
EDIT: if you have other statuses, let's make it more robust. It's almost the same as what juan-carlos-oropeza proposed, but he missed the "RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING" part.
Oops, it IS the same solution: juan-carlos-oropeza used ORDER BY ... DESC instead of UNBOUNDED FOLLOWING.
with emp_status_log (EMP, EFFDT, STATUS) as
(
select 11367, to_date('15-Apr-15', 'dd-Mon-yy'), 'A' from dual union all
select 11367, to_date('14-Jun-15', 'dd-Mon-yy'), 'A' from dual union all
select 11367, to_date('10-Aug-15', 'dd-Mon-yy'), 'T' from dual union all
select 11367, to_date( '2-Apr-17', 'dd-Mon-yy'), 'A' from dual union all
select 11367, to_date( '3-Apr-17', 'dd-Mon-yy'), 'T' from dual union all
select 11367, to_date('10-Apr-17', 'dd-Mon-yy'), 'T' from dual union all
select 20612, to_date( '4-Sep-16', 'dd-Mon-yy'), 'A' from dual union all
select 20612, to_date('23-Feb-18', 'dd-Mon-yy'), 'T' from dual union all
select 20612, to_date('20-Jul-18', 'dd-Mon-yy'), 'T' from dual union all
select 20644, to_date('12-Jul-15', 'dd-Mon-yy'), 'A' from dual union all
select 20644, to_date( '8-Aug-16', 'dd-Mon-yy'), 'A' from dual union all
select 20644, to_date( '6-Oct-16', 'dd-Mon-yy'), 'T' from dual union all
select 21155, to_date('18-May-17', 'dd-Mon-yy'), 'T' from dual union all
select 21155, to_date('21-Jun-17', 'dd-Mon-yy'), 'A' from dual union all
select 21155, to_date('13-Mar-18', 'dd-Mon-yy'), 'T' from dual union all
select 21155, to_date('15-Aug-18', 'dd-Mon-yy'), 'A' from dual
)
,
-- End of simulated data (for testing only).
/* SQL query (solution) begins BELOW THIS LINE.
with--*/
cte1 as
(
select sl.*
,sum(decode(sl.STATUS, 'T', 0, 1)) OVER (
PARTITION BY sl.EMP ORDER BY sl.EFFDT RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS non_t_count
from emp_status_log sl
)
select
cte1.emp
, min(cte1.effdt) as effdt
, min(cte1.status) as status
from cte1
where cte1.non_t_count = 0
group by cte1.emp
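The two versions only behave differently when a third status exists. The sketch below computes, for each row, MIN(STATUS) and the count of non-T statuses over the current and all following rows (the RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING window). The status 'X' is hypothetical, chosen because it sorts after 'T'.

```python
def suffix_flags(statuses):
    """statuses in EFFDT order -> per row (min status, non-T count) over
    this row and every later row."""
    out = []
    running_min, non_t = None, 0
    for s in reversed(statuses):
        running_min = s if running_min is None else min(running_min, s)
        non_t += s != "T"
        out.append((running_min, non_t))
    return out[::-1]

only_a_t = suffix_flags(["A", "T", "T"])
with_x = suffix_flags(["A", "T", "X"])
```

With only A and T, MIN_STATUS = 'T' and non_t_count = 0 select the same rows. With the later 'X', the T row still gets MIN_STATUS = 'T' but non_t_count = 1, so only the count-based version correctly excludes it.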

Oracle deduping

I have the following data:
ID ID2 DATE
A AA 2017-01-01
A BB 2017-01-01
A CC 2017-01-01
B DD 2018-01-01
B DD 2018-01-01
C EE 2018-02-01
I would like to dedupe by ID, keeping only one ID2 and one date per ID. I am trying this SQL command, but it doesn't dedupe:
SELECT DISTINCT A.ID, A.ID2, A.DATE
FROM TABLE A
GROUP BY A.ID;
Any help will be appreciated.
As it seems that you don't care which ID2 and DATUM you keep (you can't name a column "DATE", since it is reserved for the data type), a simple option is:
with test (id, id2, datum) as
(select 'a', 'aa', date '2017-01-01' from dual union all
select 'a', 'bb', date '2017-01-01' from dual union all
select 'a', 'cc', date '2017-01-01' from dual union all
select 'b', 'dd', date '2018-01-01' from dual union all
select 'b', 'dd', date '2018-01-01' from dual union all
select 'c', 'ee', date '2018-02-01' from dual
)
select id, min(id2) id2, min(datum) datum
from test
group by id;
ID ID2 DATUM
--- --- ----------
a aa 2017-01-01
b dd 2018-01-01
c ee 2018-02-01
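GROUP BY id with MIN on each remaining column takes each minimum independently, so the returned ID2 and DATUM need not come from the same source row. A small Python sketch of that behavior (dedupe_min and the extra id 'x' rows are hypothetical):

```python
from datetime import date

def dedupe_min(rows):
    """One row per id: (id, MIN(id2), MIN(datum)), each column independently."""
    best = {}
    for id_, id2, datum in rows:
        if id_ in best:
            b_id2, b_datum = best[id_]
            best[id_] = (min(b_id2, id2), min(b_datum, datum))
        else:
            best[id_] = (id2, datum)
    return sorted((k, *v) for k, v in best.items())

sample = dedupe_min([
    ("a", "aa", date(2017, 1, 1)), ("a", "bb", date(2017, 1, 1)),
    ("a", "cc", date(2017, 1, 1)), ("b", "dd", date(2018, 1, 1)),
    ("b", "dd", date(2018, 1, 1)), ("c", "ee", date(2018, 2, 1)),
])
mixed = dedupe_min([("x", "bb", date(2017, 1, 1)), ("x", "aa", date(2017, 2, 1))])
```

In the mixed case, ("x", "aa", 2017-01-01) is returned even though no source row has that combination; this is harmless when any values will do, as here.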
You can use row_number():
select id, id2, datum -- DATE is reserved; datum is an assumed column name
from (select t.*, row_number() over (partition by id order by id) as seqnum
from t
) t
where seqnum = 1;
You can change the order by if there is a particular date that you want, such as the minimum or maximum date.
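ROW_NUMBER() ... WHERE seqnum = 1 keeps one whole row per id, namely the first one under the window's ORDER BY. A Python sketch of the same idea (first_per_group and its key parameter are hypothetical; key plays the role of the ORDER BY):

```python
from datetime import date

def first_per_group(rows, key):
    """Keep the first row of each id after ordering by (id, key(row))."""
    kept, seen = [], set()
    for row in sorted(rows, key=lambda r: (r[0], key(r))):
        if row[0] not in seen:       # seqnum = 1 for this id
            seen.add(row[0])
            kept.append(row)
    return kept

rows = [("a", "aa", date(2017, 1, 3)), ("a", "bb", date(2017, 1, 1)),
        ("b", "dd", date(2018, 1, 1))]
earliest = first_per_group(rows, key=lambda r: r[2])              # order by date
latest = first_per_group(rows, key=lambda r: -r[2].toordinal())   # order by date desc
```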

SQL - Finding differences in row order of two tables

I have two tables of IDs and dates, and I want to order both tables by date and see which IDs are not in the same order,
e.g.
table_1
id | date
------------
A 01/01/09
B 02/01/09
C 03/01/09
table_2
id | date
------------
A 01/01/09
B 03/01/09
C 02/01/09
and get the results
B
C
Now admittedly I could just dump the results of an order by query and diff them, but I was wondering if there is an SQL-y way of getting the same results.
Edit, to clarify: the dates are not necessarily the same between tables; they are only there to determine an order.
Thanks
If the dates are different in TABLE_1 and TABLE_2, you will have to join both tables on their rank. For example:
WITH table_1 AS (
SELECT 'A' ID, DATE '2009-01-01' dt FROM dual UNION ALL
SELECT 'B', DATE '2009-01-02' FROM dual UNION ALL
SELECT 'C', DATE '2009-01-03' FROM dual
), table_2 AS (
SELECT 'A' ID, DATE '2009-01-01' dt FROM dual UNION ALL
SELECT 'C', DATE '2009-01-02' FROM dual UNION ALL
SELECT 'B', DATE '2009-01-03' FROM dual
)
SELECT t1.ID
FROM (SELECT ID, row_number() over(ORDER BY dt) rn FROM table_1) t1
WHERE (ID, rn) NOT IN (SELECT ID,
row_number() over(ORDER BY dt) rn
FROM table_2);
ID
--
B
C
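Joining on rank rather than on date is what makes this work when the two tables hold different dates. A Python sketch of the same comparison (order_mismatches is a hypothetical name): number each table's IDs by date, as row_number() over (order by dt) does, and report the IDs whose positions differ.

```python
from datetime import date

def order_mismatches(table_1, table_2):
    """Return ids whose 1-based position by date differs between tables."""
    def ranks(rows):
        ordered = sorted(rows, key=lambda r: r[1])
        return {id_: rn for rn, (id_, _) in enumerate(ordered, start=1)}
    r1, r2 = ranks(table_1), ranks(table_2)
    return sorted(id_ for id_, rn in r1.items() if r2.get(id_) != rn)

table_1 = [("A", date(2009, 1, 1)), ("B", date(2009, 1, 2)), ("C", date(2009, 1, 3))]
table_2 = [("A", date(2009, 1, 1)), ("C", date(2009, 1, 2)), ("B", date(2009, 1, 3))]
diff = order_mismatches(table_1, table_2)
```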
Isn't it just a case of joining on the date and checking whether the IDs are the same? This assumes that table_1 is the master sequence.
SELECT table_1.id
FROM
table_1
INNER JOIN table_2
on table_1.[date] = table_2.[date]
WHERE table_1.id <> table_2.id
ORDER BY table_1.id
Something like this?
select table_1.id
from table_1, table_2
where table_1.id = table_2.id
and table_1.date <> table_2.date