Oracle deduping

I have the following data:
ID ID2 DATE
A AA 2017-01-01
A BB 2017-01-01
A CC 2017-01-01
B DD 2018-01-01
B DD 2018-01-01
C EE 2018-02-01
I would like to dedupe by ID, keeping only one row (one ID2 and one date) per ID. I am trying this SQL command, but it doesn't dedupe:
SELECT DISTINCT A.ID, A.ID2, A.DATE
FROM TABLE A
GROUP BY A.ID;
Any help will be appreciated.

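As it seems that you don't care which ID2 and DATUM you keep (I used DATUM because you can't name a column DATE without quoting it; DATE is reserved for the datatype), a simple option is to take the MIN of each per ID. Note that the two MINs are evaluated independently, so the returned ID2 and DATUM are not guaranteed to come from the same source row: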
SQL> with test (id, id2, datum) as
  (select 'a', 'aa', date '2017-01-01' from dual union all
   select 'a', 'bb', date '2017-01-01' from dual union all
   select 'a', 'cc', date '2017-01-01' from dual union all
   select 'b', 'dd', date '2018-01-01' from dual union all
   select 'b', 'dd', date '2018-01-01' from dual union all
   select 'c', 'ee', date '2018-02-01' from dual
  )
select id, min(id2) id2, min(datum) datum
from test
group by id;
ID ID2 DATUM
--- --- ----------
a aa 2017-01-01
b dd 2018-01-01
c ee 2018-02-01
SQL>
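Because MIN is evaluated separately for each column, the combination it returns is not necessarily an existing row. The following minimal sketch uses Python's built-in sqlite3 module as a stand-in for Oracle, since plain GROUP BY with MIN behaves the same way there. Dates are stored as ISO-8601 text. The two 'd' rows are hypothetical, added only to show MIN(id2) and MIN(datum) being taken from different source rows.

```python
import sqlite3

# In-memory SQLite as a stand-in for Oracle; GROUP BY + MIN behaves the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id TEXT, id2 TEXT, datum TEXT)")
source = [
    ("a", "aa", "2017-01-01"),
    ("a", "bb", "2017-01-01"),
    ("a", "cc", "2017-01-01"),
    ("b", "dd", "2018-01-01"),
    ("b", "dd", "2018-01-01"),
    ("c", "ee", "2018-02-01"),
    # Hypothetical rows: the smaller id2 and the earlier date sit in different rows.
    ("d", "zz", "2019-01-01"),
    ("d", "yy", "2019-06-01"),
]
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", source)

# Each MIN is computed independently per id.
rows = conn.execute(
    "SELECT id, MIN(id2), MIN(datum) FROM test GROUP BY id ORDER BY id"
).fetchall()

for row in rows:
    print(row)
# For id 'd' the result pairs id2 'yy' with date '2019-01-01',
# a combination that does not exist in the source data.
```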

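You can use row_number():
select id, id2, datum   -- datum: your date column (DATE is a reserved word)
from (select t.*, row_number() over (partition by id order by datum) as seqnum
      from t
     ) t
where seqnum = 1;
Ordering by datum keeps the row with the earliest date for each ID. Change the order by if you want a particular row, for example order by datum desc for the latest date. Because a whole row is kept, ID2 and the date always come from the same record.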
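Here is a minimal sketch of the ROW_NUMBER() approach. It assumes Python's sqlite3 module backed by SQLite 3.25 or newer (the first release with window functions) in place of Oracle. The column is called datum, as in the previous answer. The secondary sort on id2 is an assumption, added only to make the pick deterministic when dates tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id TEXT, id2 TEXT, datum TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("A", "AA", "2017-01-01"),
    ("A", "BB", "2017-01-01"),
    ("A", "CC", "2017-01-01"),
    ("B", "DD", "2018-01-01"),
    ("B", "DD", "2018-01-01"),
    ("C", "EE", "2018-02-01"),
])

# Number the rows within each id (earliest datum first, ties broken by id2)
# and keep only the first one, so every column comes from the same row.
rows = conn.execute("""
    SELECT id, id2, datum
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY id ORDER BY datum, id2) AS seqnum
          FROM t) s
    WHERE seqnum = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```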

Related

Date Entry of SQL ORACLE

Here I am using Oracle SQL, and I have a table with 2 columns, Keyword and Created_Date.
Is there any way to get a third column containing the next entry of the second column (Created_Date), following the order of the first column (Keyword)?
Thanks guys

This looks like a job for the LEAD analytic function. Sample data is in lines #1-10; the query begins at line #11.
SQL> with test (keyword, datum) as
2 (select 'A', date '2021-01-18' from dual union all
3 select 'A', date '2021-04-26' from dual union all
4 select 'B', date '2021-03-01' from dual union all
5 select 'B', date '2021-04-26' from dual union all
6 select 'B', date '2021-03-01' from dual union all
7 select 'C', date '2021-02-24' from dual union all
8 select 'C', date '2021-02-24' from dual union all
9 select 'C', date '2021-08-04' from dual
10 )
11 select keyword,
12 datum,
13 lead(datum) over (order by keyword, datum) next_entry_date
14 from test
15 order by keyword, datum;
KEYWORD DATUM NEXT_ENTRY
-------- ---------- ----------
A 18.01.2021 26.04.2021
A 26.04.2021 01.03.2021
B 01.03.2021 01.03.2021
B 01.03.2021 26.04.2021
B 26.04.2021 24.02.2021
C 24.02.2021 24.02.2021
C 24.02.2021 04.08.2021
C 04.08.2021
8 rows selected.
SQL>
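The following sketch reproduces the LEAD query with Python's sqlite3 module. It assumes SQLite 3.25+ for window functions and uses ISO-8601 text instead of DATE. Because the window is ordered over the whole table, the last date of one keyword is followed by the first date of the next. The second query is an assumed variant with PARTITION BY keyword, for the case where the next date should stay within the same keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (keyword TEXT, datum TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [
    ("A", "2021-01-18"), ("A", "2021-04-26"),
    ("B", "2021-03-01"), ("B", "2021-04-26"), ("B", "2021-03-01"),
    ("C", "2021-02-24"), ("C", "2021-02-24"), ("C", "2021-08-04"),
])

# As in the answer: one ordering over the whole table.
global_next = conn.execute("""
    SELECT keyword, datum,
           LEAD(datum) OVER (ORDER BY keyword, datum) AS next_entry_date
    FROM test
    ORDER BY keyword, datum, next_entry_date
""").fetchall()

# Assumed variant: restart the "next date" for every keyword.
per_keyword_next = conn.execute("""
    SELECT keyword, datum,
           LEAD(datum) OVER (PARTITION BY keyword ORDER BY datum) AS next_entry_date
    FROM test
    ORDER BY keyword, datum, next_entry_date
""").fetchall()

for row in global_next:
    print(row)
```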

SQL - Selecting rows based on date difference

Suppose we have the table below:
Code Dt
c1 2020-10-01
c1 2020-10-05
c1 2020-10-09
c1 2020-10-10
c1 2020-10-20
c2 2020-10-07
c2 2020-10-09
c2 2020-10-15
c2 2020-10-16
c2 2020-10-20
c2 2020-10-24
The combination of Code and Dt is unique. Rows are sorted by Code and Dt. The database is Oracle 12.
For every code, I want to get the list of its Dts in which each Dt is more than 7 days after the previously selected Dt. Therefore, the result should be:
Code Dt
c1 2020-10-01
c1 2020-10-09
c1 2020-10-20
c2 2020-10-07
c2 2020-10-15
c2 2020-10-24
I've tried a self join based on row_number() to join every row with its previous row if the date difference is greater than 7. The challenge is that each row should be compared with the previously selected row, not with its previous row in the table. Any solutions? Thanks

You can solve this relatively easily using match_recognize (available from Oracle 12c):
with data(code, dt) as (
select 'c1', to_date('2020-10-01', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-05', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-09', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-10', 'YYYY-MM-DD') from dual union all
select 'c1', to_date('2020-10-20', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-07', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-09', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-15', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-16', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-20', 'YYYY-MM-DD') from dual union all
select 'c2', to_date('2020-10-24', 'YYYY-MM-DD') from dual
)
select *
from data match_recognize (
partition by code
order by dt
measures
init.dt dt
one row per match
pattern (init less_than_7_days*)
define
less_than_7_days as less_than_7_days.dt - init.dt < 7
)
Partition by code, order by dt, and then match any row (init) followed by zero or more rows (less_than_7_days*) whose date is less than 7 days after init's date. ONE ROW PER MATCH returns a single row for the whole match (init plus the following rows), containing init's date. The next match then starts at the first row that is 7 or more days after init.
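The same "compare with the previously selected row" rule can be sketched procedurally. This is a plain-Python illustration, not the MATCH_RECOGNIZE implementation itself. The function name select_spaced and its min_gap_days parameter are hypothetical. Like the pattern above, it keeps a date when it is 7 or more days after the last kept date.

```python
from datetime import date, timedelta
from itertools import groupby


def select_spaced(rows, min_gap_days=7):
    """Per code, keep the first date, then every date that is at least
    min_gap_days after the last *kept* date (not the previous row)."""
    picked = []
    for code, group in groupby(sorted(rows), key=lambda r: r[0]):
        last_kept = None
        for _, dt in group:
            if last_kept is None or dt - last_kept >= timedelta(days=min_gap_days):
                picked.append((code, dt))
                last_kept = dt  # later rows are compared with this one
    return picked


data = [("c1", date(2020, 10, d)) for d in (1, 5, 9, 10, 20)]
data += [("c2", date(2020, 10, d)) for d in (7, 9, 15, 16, 20, 24)]

picked = select_spaced(data)
for code, dt in picked:
    print(code, dt.isoformat())
```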

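Looks like a case for a hierarchical (recursive) query.
Compute, for each date, the earliest date of the same code at least 7 days later, then traverse that chain starting from each code's first date: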
with pairs(Code, Dt, dtnext) as (
select t1.Code, t1.dt, Min(t2.dt)
from tbl t1
join tbl t2 on t1.code=t2.code and t2.dt >= t1.dt + INTERVAL '7' DAY
group by t1.Code, t1.dt
),
h(Code, Dtn) as (
select Code, Min(dt)
from tbl
group by Code
union all
select h.Code, p.dtnext
from h
join pairs p on p.code=h.code and p.Dt= h.dtn
)
select *
from h
order by code, dtn
Returns
CODE DTN
c1 01-OCT-20
c1 09-OCT-20
c1 20-OCT-20
c2 07-OCT-20
c2 15-OCT-20
c2 24-OCT-20
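Here is a sketch of the same recursive query, run through Python's sqlite3 module as a stand-in for Oracle. It assumes recursive CTEs as supported by SQLite and ISO-8601 text dates, and it uses SQLite's date(dt, '+7 days') in place of INTERVAL '7' DAY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (code TEXT, dt TEXT)")  # dt as ISO-8601 text
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [("c1", f"2020-10-{d:02d}") for d in (1, 5, 9, 10, 20)]
    + [("c2", f"2020-10-{d:02d}") for d in (7, 9, 15, 16, 20, 24)],
)

rows = conn.execute("""
    WITH RECURSIVE
    pairs(code, dt, dtnext) AS (
        -- for each row: earliest date of the same code at least 7 days later
        SELECT t1.code, t1.dt, MIN(t2.dt)
        FROM tbl t1
        JOIN tbl t2 ON t1.code = t2.code AND t2.dt >= date(t1.dt, '+7 days')
        GROUP BY t1.code, t1.dt
    ),
    h(code, dtn) AS (
        -- anchor: the first date of each code
        SELECT code, MIN(dt) FROM tbl GROUP BY code
        UNION ALL
        -- recursion: follow the chain until no later pair exists
        SELECT h.code, p.dtnext
        FROM h JOIN pairs p ON p.code = h.code AND p.dt = h.dtn
    )
    SELECT code, dtn FROM h ORDER BY code, dtn
""").fetchall()

for row in rows:
    print(row)
```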

How do I need to change my sql to get what I want in this case?

I have a table like the following:
id value date
1 5 2015-01-10
2 5 2015-06-13
3 5 2015-09-05
4 11 2015-02-11
5 11 2015-01-10
6 11 2015-01-25
As can be seen, each value appears 3 times with different dates. I want to write a query that returns, for each unique value, the row with the maximum date, which for the above table would be:
id value date
3 5 2015-09-05
4 11 2015-02-11
How could I do it?
This is the updated question:
The real problem I am encountering is a little more complicated than the simplified version above. I thought I could take it a step further once I knew the answer to the simplified version, but I guess I was wrong. So I am updating the question here.
I have 2 tables like the following:
Table 1
id id2 date
1 2 2015-01-10
2 5 2015-06-13
3 9 2015-09-05
4 10 2015-02-11
5 26 2015-01-10
6 65 2015-01-25
Table 2
id id2 data
1 2 A
2 5 A
3 9 A
4 10 B
5 26 B
6 65 B
Here, Table 1 and Table 2 are joined by id2.
What I want to get is two records, as follows:
id2 date data
9 2015-09-05 A
10 2015-02-11 B

You can use row_number() to select the row with the greatest date per data value (join on id2 as the question states; "date" is quoted because DATE is a reserved word in Oracle):
select * from (
select t2.id2, t1."date", t2.data,
row_number() over (partition by t2.data order by t1."date" desc) rn
from table1 t1
join table2 t2 on t1.id2 = t2.id2
) t where rn = 1
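Here is a minimal sketch of the row_number() approach on both tables. It uses Python's sqlite3 module in place of Oracle, assumes SQLite 3.25+ for window functions, and stores dates as ISO-8601 text. The query is run twice to show that the join column matters. Joining t1.id2 = t2.id2, as the question specifies, returns one row per data value, whereas t1.id = t2.id2 matches the wrong rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE table1 (id INTEGER, id2 INTEGER, "date" TEXT)')
conn.execute("CREATE TABLE table2 (id INTEGER, id2 INTEGER, data TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    (1, 2, "2015-01-10"), (2, 5, "2015-06-13"), (3, 9, "2015-09-05"),
    (4, 10, "2015-02-11"), (5, 26, "2015-01-10"), (6, 65, "2015-01-25"),
])
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (1, 2, "A"), (2, 5, "A"), (3, 9, "A"),
    (4, 10, "B"), (5, 26, "B"), (6, 65, "B"),
])

# Latest date per data value; {join} is filled in with the join condition.
QUERY = '''
    SELECT id2, "date", data
    FROM (SELECT t2.id2, t1."date", t2.data,
                 ROW_NUMBER() OVER (PARTITION BY t2.data
                                    ORDER BY t1."date" DESC) AS rn
          FROM table1 t1
          JOIN table2 t2 ON {join})
    WHERE rn = 1
    ORDER BY data
'''

rows = conn.execute(QUERY.format(join="t1.id2 = t2.id2")).fetchall()
wrong_join = conn.execute(QUERY.format(join="t1.id = t2.id2")).fetchall()

print(rows)
print(wrong_join)
```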

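For the original single-table version, join each row to the maximum date of its value:
select a.id, a.value, a."date"
from mytable a,
( select value, max("date") maxdate
from mytable b
group by value) b
where a.value = b.value
and a."date" = b.maxdate;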
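Here is a sketch of this greatest-date-per-group join for the simplified single-table question. It uses Python's sqlite3 module as a stand-in and assumes ISO-8601 text dates. Grouping by value is what produces one row per value. Grouping by id instead would return every row, because each id is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE mytable (id INTEGER, value INTEGER, "date" TEXT)')
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 5, "2015-01-10"), (2, 5, "2015-06-13"), (3, 5, "2015-09-05"),
    (4, 11, "2015-02-11"), (5, 11, "2015-01-10"), (6, 11, "2015-01-25"),
])

# Maximum date per value, joined back to recover the matching row.
rows = conn.execute('''
    SELECT a.id, a.value, a."date"
    FROM mytable a
    JOIN (SELECT value, MAX("date") AS maxdate
          FROM mytable
          GROUP BY value) b
      ON a.value = b.value AND a."date" = b.maxdate
    ORDER BY a.value
''').fetchall()

# For comparison: grouping by the unique id keeps every row.
by_id = conn.execute('''
    SELECT a.id
    FROM mytable a
    JOIN (SELECT id, MAX("date") AS maxdate FROM mytable GROUP BY id) b
      ON a.id = b.id AND a."date" = b.maxdate
''').fetchall()

print(rows)
print(len(by_id))
```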

Oracle Setup:
CREATE TABLE Table1 ( id, id2, "date" ) AS
SELECT 1, 2, DATE '2015-01-10' FROM DUAL UNION ALL
SELECT 2, 5, DATE '2015-06-13' FROM DUAL UNION ALL
SELECT 3, 9, DATE '2015-09-05' FROM DUAL UNION ALL
SELECT 4, 10, DATE '2015-02-11' FROM DUAL UNION ALL
SELECT 5, 26, DATE '2015-01-10' FROM DUAL UNION ALL
SELECT 6, 65, DATE '2015-01-25' FROM DUAL;
CREATE TABLE Table2 ( id, id2, data ) AS
SELECT 1, 2, 'A' FROM DUAL UNION ALL
SELECT 2, 5, 'A' FROM DUAL UNION ALL
SELECT 3, 9, 'A' FROM DUAL UNION ALL
SELECT 4, 10, 'B' FROM DUAL UNION ALL
SELECT 5, 26, 'B' FROM DUAL UNION ALL
SELECT 6, 65, 'B' FROM DUAL;
Query:
SELECT MAX( t1.id ) KEEP ( DENSE_RANK LAST ORDER BY t1."date" ) AS id,
MAX( t1.id2 ) KEEP ( DENSE_RANK LAST ORDER BY t1."date" ) AS id2,
MAX( t1."date" ) AS "date",
t2.data
FROM Table1 t1
INNER JOIN
Table2 t2
ON ( t1.id = t2.id AND t1.id2 = t2.id2 )
GROUP BY t2.data
Output:
ID ID2 date DATA
---------- ---------- ------------------- ----
3 9 2015-09-05 00:00:00 A
4 10 2015-02-11 00:00:00 B
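To illustrate what MAX(...) KEEP (DENSE_RANK LAST ORDER BY ...) computes, here is a plain-Python emulation. The helper max_keep_last is hypothetical, and NULL dates are assumed absent. Within each group, only the rows tied at the last (highest) date are kept, and MAX is applied to those rows. The extra row with id 7 is hypothetical and shows how a tie on the date is resolved.

```python
from collections import defaultdict

# Joined Table1/Table2 rows as (id, id2, date, data); ISO date strings
# compare correctly as text.
joined = [
    (1, 2, "2015-01-10", "A"), (2, 5, "2015-06-13", "A"), (3, 9, "2015-09-05", "A"),
    (4, 10, "2015-02-11", "B"), (5, 26, "2015-01-10", "B"), (6, 65, "2015-01-25", "B"),
]


def max_keep_last(rows, group_col, order_col, value_col):
    """Emulate MAX(value) KEEP (DENSE_RANK LAST ORDER BY order) ... GROUP BY group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    result = {}
    for key, members in groups.items():
        last = max(r[order_col] for r in members)            # the LAST rank
        kept = [r for r in members if r[order_col] == last]  # all rows tied at it
        result[key] = max(r[value_col] for r in kept)        # MAX over kept rows only
    return result


ids = max_keep_last(joined, group_col=3, order_col=2, value_col=0)
id2s = max_keep_last(joined, group_col=3, order_col=2, value_col=1)

# Hypothetical tie: a second 'A' row on the same latest date.
ids_tie = max_keep_last(joined + [(7, 99, "2015-09-05", "A")], 3, 2, 0)

print(ids, id2s, ids_tie)
```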
Query 2:
SELECT id,
id2,
"date",
data
FROM (
SELECT t1.*,
t2.data,
ROW_NUMBER() OVER ( PARTITION BY t2.data ORDER BY t1."date" DESC ) AS rn
FROM Table1 t1
INNER JOIN
Table2 t2
ON ( t1.id = t2.id AND t1.id2 = t2.id2 )
)
WHERE rn = 1;
Output:
ID ID2 date DATA
---------- ---------- ------------------- ----
3 9 2015-09-05 00:00:00 A
4 10 2015-02-11 00:00:00 B

SQL: Earliest Date After Latest Null If Exists

Using T-SQL, I am looking to return the min date after the latest NULL if one exists, and simply the min date for any products where there are no NULLs.
Table:
DateSold Product
12/31/2012 A
1/31/2013
2/28/2013 A
3/31/2013 A
4/30/2013 A
5/31/2013
6/30/2013 A
7/31/2013 A
8/31/2013 A
9/30/2013 A
12/31/2012 B
1/31/2013 B
2/28/2013 B
3/31/2013 B
4/30/2013 B
5/31/2013 B
6/30/2013 B
7/31/2013 B
8/31/2013 B
9/30/2013 B
For product “A” 6/30/2013 is the desired return while for product “B” 12/31/2012 is desired.
Result:
MinDateSold Product
6/30/2013 A
12/31/2012 B
Any solutions will greatly be appreciated. Thank you.

This does it for me, provided there's a grouping column involved; otherwise, how do you know whether the NULLs belong to the run of A or B products? I realise this may not be exactly what you're after, but I hope it helps anyway.
WITH DATA_IN AS (
SELECT 1 as grp,
convert(DateTime,'12/31/2012') as d_Date,
'A' AS d_ch
UNION ALL
SELECT 1, '1/31/2013', NULL UNION ALL
SELECT 1, '2/28/2013', 'A' UNION ALL
SELECT 1, '3/31/2013', 'A' UNION ALL
SELECT 1, '4/30/2013', 'A' UNION ALL
SELECT 1, '5/31/2013', NULL UNION ALL
SELECT 1, '6/30/2013', 'A' UNION ALL
SELECT 1, '7/31/2013', 'A' UNION ALL
SELECT 1, '8/31/2013', 'A' UNION ALL
SELECT 1, '9/30/2013', 'A' UNION ALL
SELECT 2, '12/31/2012', 'B' UNION ALL
SELECT 2, '1/31/2013', 'B' UNION ALL
SELECT 2, '2/28/2013', 'B' UNION ALL
SELECT 2, '3/31/2013', 'B' UNION ALL
SELECT 2, '4/30/2013', 'B' UNION ALL
SELECT 2, '5/31/2013', 'B' UNION ALL
SELECT 2, '6/30/2013', 'B' UNION ALL
SELECT 2, '7/31/2013', 'B' UNION ALL
SELECT 2, '8/31/2013', 'B' UNION ALL
SELECT 2, '9/30/2013', 'B'
)
SELECT
grp as YourGroup,
(SELECT Min(d3.d_date) -- first date in the same group after...
FROM DATA_IN d3
WHERE d3.grp = d1.grp AND d3.d_date >
Coalesce( -- either the latest NULL
(SELECT max(d_Date)
FROM DATA_IN d2
WHERE d2.grp=d1.grp AND d2.d_ch IS NULL
)
, '1/1/1901' -- or a base date if no NULLs
)
) as MinDateSold
FROM DATA_IN d1
GROUP BY grp
Results :
1 2013-06-30 00:00:00.000
2 2012-12-31 00:00:00.000
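The intended logic, evaluated per group, can be sketched in plain Python. The helper is hypothetical and assumes the grp column from the answer's sample data, with None standing for the NULL rows. Both the latest-NULL lookup and the MIN are restricted to the same group, and date.min plays the role of the '1/1/1901' base date.

```python
from datetime import date


def min_date_after_last_null(rows):
    """rows: (grp, d_date, d_ch) tuples, with d_ch None for the NULL rows.
    Returns {grp: earliest d_date after that grp's latest NULL, or the
    earliest d_date if the grp has no NULL (None if nothing follows)}."""
    result = {}
    for grp in sorted({g for g, _, _ in rows}):
        members = [(d, ch) for g, d, ch in rows if g == grp]
        null_dates = [d for d, ch in members if ch is None]
        # Latest NULL, or a base date when there is none (like the COALESCE).
        cutoff = max(null_dates) if null_dates else date.min
        result[grp] = min((d for d, _ in members if d > cutoff), default=None)
    return result


month_ends = [
    date(2012, 12, 31), date(2013, 1, 31), date(2013, 2, 28), date(2013, 3, 31),
    date(2013, 4, 30), date(2013, 5, 31), date(2013, 6, 30), date(2013, 7, 31),
    date(2013, 8, 31), date(2013, 9, 30),
]
null_months = {date(2013, 1, 31), date(2013, 5, 31)}
data = [(1, d, None if d in null_months else "A") for d in month_ends]
data += [(2, d, "B") for d in month_ends]

result = min_date_after_last_null(data)
print(result)
```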

One approach is to count, for each row, the number of NULL rows dated on or before it. This divides the dates into ranges (groups). For each group, take the minimum date, and then find the largest of these minimum dates for each product:
select product, minDate
from (select product, NumNulls, min(DateSold) as minDate,
row_number() over (partition by product order by min(DateSold) desc
) as seqnum
from (select t.*,
(select count(*)
from table t2
where t2.product is null and t2.DateSold <= t.DateSold
) as NumNulls
from table t
) t
group by Product, NumNUlls
) t
where seqnum = 1;
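In your data, there is no mixing of different products within a range, so this query more or less assumes that as well. Be aware that NULLs are assigned to ranges by date only. With the sample data, the NULL dates (1/31/2013 and 5/31/2013) also split product B's dates, so B would come out as 5/31/2013 rather than 12/31/2012 unless the NULL rows can be tied to a specific product.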

SQL - Finding differences in row order of two tables

I have two tables of IDs and dates, and I want to order both tables by date and see which IDs are not in the same order.
e.g.
table_1
id | date
------------
A 01/01/09
B 02/01/09
C 03/01/09
table_2
id | date
------------
A 01/01/09
B 03/01/09
C 02/01/09
and get the result:
B
C
Now, admittedly, I could just dump the results of an ORDER BY query and diff them, but I was wondering if there is an SQL-y way of getting the same result.
Edit: to clarify, the dates are not necessarily the same between the tables; they are just there to determine an order.
Thanks

If the dates are different in TABLE_1 and TABLE_2, you will have to join both tables on their rank. For example:
SQL> WITH table_1 AS (
  SELECT 'A' ID, DATE '2009-01-01' dt FROM dual UNION ALL
  SELECT 'B', DATE '2009-01-02' FROM dual UNION ALL
  SELECT 'C', DATE '2009-01-03' FROM dual
), table_2 AS (
  SELECT 'A' ID, DATE '2009-01-01' dt FROM dual UNION ALL
  SELECT 'C', DATE '2009-01-02' FROM dual UNION ALL
  SELECT 'B', DATE '2009-01-03' FROM dual
)
SELECT t1.ID
  FROM (SELECT ID, row_number() over(ORDER BY dt) rn FROM table_1) t1
 WHERE (ID, rn) NOT IN (SELECT ID,
                               row_number() over(ORDER BY dt) rn
                          FROM table_2);
ID
--
B
C
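Here is a sketch of the rank comparison using Python's sqlite3 module, assuming SQLite 3.25+ for ROW_NUMBER. It swaps the row-value NOT IN for an equivalent LEFT JOIN anti-join and uses ISO-8601 text dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (id TEXT, dt TEXT)")
conn.execute("CREATE TABLE table_2 (id TEXT, dt TEXT)")
conn.executemany("INSERT INTO table_1 VALUES (?, ?)",
                 [("A", "2009-01-01"), ("B", "2009-01-02"), ("C", "2009-01-03")])
conn.executemany("INSERT INTO table_2 VALUES (?, ?)",
                 [("A", "2009-01-01"), ("C", "2009-01-02"), ("B", "2009-01-03")])

# Rank each table by date, then keep table_1 ids whose (id, rank) pair
# has no match in table_2.
rows = conn.execute("""
    WITH r1 AS (SELECT id, ROW_NUMBER() OVER (ORDER BY dt) AS rn FROM table_1),
         r2 AS (SELECT id, ROW_NUMBER() OVER (ORDER BY dt) AS rn FROM table_2)
    SELECT r1.id
    FROM r1
    LEFT JOIN r2 ON r2.id = r1.id AND r2.rn = r1.rn
    WHERE r2.id IS NULL
    ORDER BY r1.id
""").fetchall()

print(rows)
```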

Isn't it just a case of joining on the date and checking that the IDs are the same? This assumes that table_1 is the master sequence.
SELECT table_1.id
FROM
table_1
INNER JOIN table_2
on table_1.[date] = table_2.[date]
WHERE table_1.id <> table_2.id
ORDER BY table_1.id

How about select table_1.id from table_1, table_2 where table_1.id = table_2.id and table_1.date <> table_2.date?

We Keep Coding
