SQL to deal with Duplicate Record Management

I have an Oracle database application which creates lists of duplicate records for people to review. Users can deduplicate the records, leave them, or mark them as being different from each other (legitimate duplicates). I need to report on the number of records in this queue.
The challenge is the legitimate duplicates: these are held in a second table, as two rows for each pair marked up (a≠b). However, because a duplicate group can contain several records, and a record can be marked as a legitimate duplicate more than once, I need to remove from each candidate group the records that are marked as legitimate duplicates of all the other records in that group. Hopefully that makes sense.
So a simplified view of the two tables would be:
Duplicate Candidates
Group Key
Dup-1 123
Dup-1 234
Dup-2 123
Dup-2 345
Dup-2 567
Dup-3 234
Dup-3 567
Dup-4 123
Dup-4 567
Dup-4 235
Legitimate Duplicates:
Group Key
A 123
A 234
B 345
B 456
C 123
C 567
D 123
D 235
The results I would like to return from this example would be:
Duplicate Candidates
Group Key
Dup-2 123
Dup-2 345
Dup-2 567
Dup-3 234
Dup-3 567
Dup-4 567
Dup-4 235
Dup-1 would not be returned as Legitimate Group A has both Keys. Dup-2 would be returned because, while Keys 123 and 345 are both marked up as legitimate duplicates, they are not marked as different from each other. Dup-3 should also be returned as its two records are not marked as legitimate duplicates. Finally, the row Dup-4 123 should not be returned as it is marked up as a legitimate duplicate of both of the other records in the group, but those two records should be returned as they are not marked as legitimate duplicates of each other.
I really need to carry this out in SQL as I will feed this data directly into a reporting solution (Business Objects or Tableau). Is anyone able to give me a nudge in the right direction on this? Unfortunately our software is completely black box, so I cannot reverse engineer this from the code that deals with this for users.

Using Exists. Return the Candidate if at least one other member of the group is not in any group of Legitimates where the tested Candidate is a member too.
select *
from Candidates c
where exists (
select 1
from Candidates c2
where c2.Grp = c.Grp and c2.Key <> c.Key
and not exists (
select 1
from Legitimate l
where l.Key = c.Key
and exists (
select 1
from Legitimate l2
where l2.Grp = l.Grp and l2.Key = c2.Key
)
)
)
order by Grp, Key
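The EXISTS logic above can be checked against the question's sample data. The following is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module, as a stand-in for Oracle. The table names candidates and legitimate and the column names grp and rec_key are assumptions chosen for the illustration, not the real schema.

```python
import sqlite3

# Sample rows from the question. grp and rec_key are hypothetical
# stand-ins for the question's Group and Key columns.
CANDIDATES = [
    ("Dup-1", 123), ("Dup-1", 234),
    ("Dup-2", 123), ("Dup-2", 345), ("Dup-2", 567),
    ("Dup-3", 234), ("Dup-3", 567),
    ("Dup-4", 123), ("Dup-4", 567), ("Dup-4", 235),
]
LEGITIMATE = [
    ("A", 123), ("A", 234),
    ("B", 345), ("B", 456),
    ("C", 123), ("C", 567),
    ("D", 123), ("D", 235),
]

# Keep a candidate if some other member of its group is NOT paired with it
# in any legitimate-duplicate group.
EXISTS_QUERY = """
SELECT c.grp, c.rec_key
FROM candidates c
WHERE EXISTS (
    SELECT 1
    FROM candidates c2
    WHERE c2.grp = c.grp AND c2.rec_key <> c.rec_key
      AND NOT EXISTS (
          SELECT 1
          FROM legitimate l
          WHERE l.rec_key = c.rec_key
            AND EXISTS (
                SELECT 1
                FROM legitimate l2
                WHERE l2.grp = l.grp AND l2.rec_key = c2.rec_key
            )
      )
)
ORDER BY c.grp, c.rec_key
"""


def open_candidates(candidates, legitimate):
    """Load the rows into an in-memory SQLite database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE candidates (grp TEXT, rec_key INTEGER)")
        conn.execute("CREATE TABLE legitimate (grp TEXT, rec_key INTEGER)")
        conn.executemany("INSERT INTO candidates VALUES (?, ?)", candidates)
        conn.executemany("INSERT INTO legitimate VALUES (?, ?)", legitimate)
        return conn.execute(EXISTS_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for grp, rec_key in open_candidates(CANDIDATES, LEGITIMATE):
        print(grp, rec_key)
```

On the sample data this returns the seven rows the question asks for: all of Dup-2 and Dup-3, plus 235 and 567 from Dup-4.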

Generate pairs of duplicate candidates using:
SELECT "GROUP",
PRIOR key AS key1,
key AS key2
FROM duplicate_candidates
WHERE LEVEL = 2
CONNECT BY PRIOR "GROUP" = "GROUP"
AND PRIOR key < key
and pairs of legitimate candidates using:
SELECT MIN(key), MAX(key)
FROM legitimate_duplicates
GROUP BY "GROUP"
Then you can find the duplicate pairs that are not in the legitimate candidate pairs and UNPIVOT the pairs and find the DISTINCT keys. In a single query:
SELECT DISTINCT
"GROUP",
key
FROM (
SELECT "GROUP",
PRIOR key AS key1,
key AS key2
FROM duplicate_candidates
WHERE LEVEL = 2
AND (PRIOR key, key) NOT IN (SELECT MIN(key), MAX(key)
FROM legitimate_duplicates
GROUP BY "GROUP")
CONNECT BY PRIOR "GROUP" = "GROUP"
AND PRIOR key < key
)
UNPIVOT (key FOR id IN (key1, key2))
Which, for the sample data:
CREATE TABLE duplicate_candidates ("GROUP", Key) AS
SELECT 'Dup-1', 123 FROM DUAL UNION ALL
SELECT 'Dup-1', 234 FROM DUAL UNION ALL
SELECT 'Dup-2', 123 FROM DUAL UNION ALL
SELECT 'Dup-2', 345 FROM DUAL UNION ALL
SELECT 'Dup-2', 567 FROM DUAL UNION ALL
SELECT 'Dup-3', 234 FROM DUAL UNION ALL
SELECT 'Dup-3', 567 FROM DUAL UNION ALL
SELECT 'Dup-4', 123 FROM DUAL UNION ALL
SELECT 'Dup-4', 567 FROM DUAL UNION ALL
SELECT 'Dup-4', 235 FROM DUAL;
CREATE TABLE Legitimate_Duplicates ("GROUP", Key) AS
SELECT 'A', 123 FROM DUAL UNION ALL
SELECT 'A', 234 FROM DUAL UNION ALL
SELECT 'B', 345 FROM DUAL UNION ALL
SELECT 'B', 456 FROM DUAL UNION ALL
SELECT 'C', 123 FROM DUAL UNION ALL
SELECT 'C', 567 FROM DUAL UNION ALL
SELECT 'D', 123 FROM DUAL UNION ALL
SELECT 'D', 235 FROM DUAL;
Note: GROUP is a reserved word and cannot be used as an unquoted identifier. It would be better to use a different name for the columns; you can use a quoted identifier, but it's not best practice.
Outputs:
GROUP KEY
Dup-2 123
Dup-2 345
Dup-2 567
Dup-3 234
Dup-3 567
Dup-4 235
Dup-4 567
db<>fiddle here
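CONNECT BY and UNPIVOT are Oracle-specific. As a rough cross-check of the pair-based approach, the sketch below swaps in a self-join to build the candidate pairs and a UNION to turn the surviving pairs back into distinct keys, and runs it in SQLite via Python's sqlite3 module. The column names grp and rec_key are assumptions that replace the quoted "GROUP" and Key columns. It also relies on each legitimate group holding exactly two rows, as the question states.

```python
import sqlite3

CANDIDATES = [
    ("Dup-1", 123), ("Dup-1", 234),
    ("Dup-2", 123), ("Dup-2", 345), ("Dup-2", 567),
    ("Dup-3", 234), ("Dup-3", 567),
    ("Dup-4", 123), ("Dup-4", 567), ("Dup-4", 235),
]
LEGITIMATE = [
    ("A", 123), ("A", 234),
    ("B", 345), ("B", 456),
    ("C", 123), ("C", 567),
    ("D", 123), ("D", 235),
]

PAIR_QUERY = """
WITH cand_pairs AS (
    -- every unordered pair inside a candidate group (smaller key first)
    SELECT a.grp, a.rec_key AS key1, b.rec_key AS key2
    FROM duplicate_candidates a
    JOIN duplicate_candidates b
      ON b.grp = a.grp AND a.rec_key < b.rec_key
),
legit_pairs AS (
    -- each legitimate group is assumed to hold exactly two rows
    SELECT MIN(rec_key) AS key1, MAX(rec_key) AS key2
    FROM legitimate_duplicates
    GROUP BY grp
),
open_pairs AS (
    SELECT p.grp, p.key1, p.key2
    FROM cand_pairs p
    WHERE NOT EXISTS (
        SELECT 1 FROM legit_pairs l
        WHERE l.key1 = p.key1 AND l.key2 = p.key2
    )
)
-- UNION removes duplicates, standing in for UNPIVOT plus DISTINCT
SELECT grp, key1 AS rec_key FROM open_pairs
UNION
SELECT grp, key2 FROM open_pairs
ORDER BY 1, 2
"""


def keys_in_open_pairs(candidates, legitimate):
    """Return (grp, rec_key) rows that belong to at least one unresolved pair."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE duplicate_candidates (grp TEXT, rec_key INTEGER)")
        conn.execute("CREATE TABLE legitimate_duplicates (grp TEXT, rec_key INTEGER)")
        conn.executemany("INSERT INTO duplicate_candidates VALUES (?, ?)", candidates)
        conn.executemany("INSERT INTO legitimate_duplicates VALUES (?, ?)", legitimate)
        return conn.execute(PAIR_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in keys_in_open_pairs(CANDIDATES, LEGITIMATE):
        print(*row)
```

The self-join produces the same pairs as the LEVEL = 2 hierarchical query, so the result matches the output listed above.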

Related

SQL needed to find if a property exists in multiple counties

I need some SQL to determine if a property exists in multiple counties.
I have a list of distinct property ids and county ids, but I'm not sure how to find if the property exists in more than one county.
TABLE: PROPERTIES
PROPERTYID COUNTYID
12345 1111
12345 1112
23456 1111
34567 2222
In this example, I need some sql that will only show me property 12345 since it exists in both county 1111 and 1112.
I'm sure there is some easy SQL, but I can't figure it out.

Sample data:
with properties (propertyid, countyid) as
(select 12345, 1111 from dual union all
select 12345, 1112 from dual union all
select 23456, 1111 from dual union all
select 34567, 2222 from dual
)
Query:
select propertyid
from properties
group by propertyid
having count(distinct countyid) > 1;
Result:
PROPERTYID
----------
12345
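To see why COUNT(DISTINCT ...) is used rather than COUNT(*), here is a small sketch that runs the same GROUP BY / HAVING query in SQLite through Python's sqlite3 module (a stand-in for Oracle). The function name multi_county_properties is made up for the example.

```python
import sqlite3

PROPERTIES = [(12345, 1111), (12345, 1112), (23456, 1111), (34567, 2222)]


def multi_county_properties(rows):
    """Return property ids that appear with more than one distinct county id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE properties (propertyid INTEGER, countyid INTEGER)")
        conn.executemany("INSERT INTO properties VALUES (?, ?)", rows)
        result = conn.execute(
            """
            SELECT propertyid
            FROM properties
            GROUP BY propertyid
            HAVING COUNT(DISTINCT countyid) > 1
            ORDER BY propertyid
            """
        ).fetchall()
        return [propertyid for (propertyid,) in result]
    finally:
        conn.close()


if __name__ == "__main__":
    print(multi_county_properties(PROPERTIES))
    # A repeated (property, county) row does not count as a second county.
    print(multi_county_properties(PROPERTIES + [(23456, 1111)]))
```

Both calls report only property 12345, because the repeated row for 23456 still refers to a single county.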

Find the discrepancies in sql

I am using Oracle database.
Suppose that I have a table called "MYTABLE" and it contains the tuples of every dog in the world and the owners:
NAME OWNER_ID
Aaron 81
Aaron 281
Aaron 404
Michael 81
Michael 281
Michael 404
Brendan 281
Brendan 81
Micon 404
Micon 81
Tyson 404
By default, every dog must be associated with 3 different owners; in this case the owners are identified by the ids 81, 281 and 404.
How can I find the dogs that are not associated with 3 rows in the table?
I would like this output:
Brendan
Micon
Tyson
These 3 dogs do not have 3 rows in the given table. They are not associated exactly with owners 81, 281 and 404.

To find out which name is not associated with all three of the 81, 281 and 404 owner_id, you can use conditional aggregation in a HAVING clause:
SELECT name
FROM table_name
GROUP BY name
HAVING COUNT(DISTINCT CASE WHEN OWNER_ID IN (81, 281, 404) THEN OWNER_ID END) < 3
Which, for the sample data:
CREATE TABLE table_name (NAME, OWNER_ID) AS
SELECT 'Aaron', 81 FROM DUAL UNION ALL
SELECT 'Aaron', 281 FROM DUAL UNION ALL
SELECT 'Aaron', 404 FROM DUAL UNION ALL
SELECT 'Michael', 81 FROM DUAL UNION ALL
SELECT 'Michael', 281 FROM DUAL UNION ALL
SELECT 'Michael', 404 FROM DUAL UNION ALL
SELECT 'Brendan', 281 FROM DUAL UNION ALL
SELECT 'Brendan', 81 FROM DUAL UNION ALL
SELECT 'Brendan', 123 FROM DUAL UNION ALL -- Different owner_id
SELECT 'Micon', 404 FROM DUAL UNION ALL
SELECT 'Micon', 81 FROM DUAL UNION ALL
SELECT 'Micon', 81 FROM DUAL UNION ALL -- Duplicate
SELECT 'Tyson', 404 FROM DUAL
Outputs:
NAME
Brendan
Micon
Tyson
If you just want the name where they do not have three different owner_id (any owner_id) then:
SELECT name
FROM table_name
GROUP BY name
HAVING COUNT(DISTINCT OWNER_ID) < 3
Which, for the same sample data, would output:
NAME
Micon
Tyson
If you just want the name where there are not three owner_id (either unique or not) then:
SELECT name
FROM table_name
GROUP BY name
HAVING COUNT(OWNER_ID) < 3
Which, for the same sample data, outputs:
NAME
Tyson
db<>fiddle here
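The three HAVING variants differ only in what they count. The sketch below runs all three against the answer's sample rows in SQLite through Python's sqlite3 module, as a stand-in for Oracle. The dictionary keys and the function name names_failing are labels invented for this illustration.

```python
import sqlite3

ROWS = [
    ("Aaron", 81), ("Aaron", 281), ("Aaron", 404),
    ("Michael", 81), ("Michael", 281), ("Michael", 404),
    ("Brendan", 281), ("Brendan", 81), ("Brendan", 123),  # different owner_id
    ("Micon", 404), ("Micon", 81), ("Micon", 81),         # duplicate row
    ("Tyson", 404),
]

# The three HAVING conditions from the answer, keyed by hypothetical labels.
HAVING_VARIANTS = {
    "required_owners": (
        "COUNT(DISTINCT CASE WHEN owner_id IN (81, 281, 404) "
        "THEN owner_id END) < 3"
    ),
    "distinct_owners": "COUNT(DISTINCT owner_id) < 3",
    "any_rows": "COUNT(owner_id) < 3",
}


def names_failing(variant, rows=ROWS):
    """Return the names selected by one of the HAVING variants."""
    sql = (
        "SELECT name FROM table_name GROUP BY name "
        f"HAVING {HAVING_VARIANTS[variant]} ORDER BY name"
    )
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE table_name (name TEXT, owner_id INTEGER)")
        conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
        return [name for (name,) in conn.execute(sql)]
    finally:
        conn.close()


if __name__ == "__main__":
    for variant in HAVING_VARIANTS:
        print(variant, names_failing(variant))
```

Brendan only fails the first check (three owners, but one is not in the required set), and Micon fails the first two (three rows, but only two distinct owners).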

You can GROUP BY the NAME and then use COUNT(*) to find all names that have fewer than 3 rows:
SELECT "NAME" FROM tab1 GROUP BY "NAME" HAVING COUNT(*) < 3
| NAME |
| :------ |
| Brendan |
| Micon |
| Tyson |
db<>fiddle here

update table column from view in another database in oracle sql developer

I have 2 databases in Oracle.
DATABASE TABLE/VIEW NAME
digidb1 CUSTOMER_REFERENCE
digidb2 CUST_REF_VIEW
This query displays all data in the table CUSTOMER_REFERENCE from database digidb1.
select * from CUSTOMER_REFERENCE
cust_id brch_code cust_name description
001 001 COMPANY TEST 1 TEST COMPANY 1
002 002 COMPANY TEST 2 TEST COMPANY 2
003 003 COMPANY TEST 3 TEST COMPANY 3
This query displays all data in the view CUST_REF_VIEW from database digidb2.
select * FROM CUST_REF_VIEW
WINBBN CUSTFULLNAME ISINDIVIDUAL MRGDATE
1234 COMPANY TEST 1 N 12-03-20
4567 COMPANY TEST 4 N 12-03-20
8901 COMPANY TEST 2 N 11-03-20
2345 COMPANY TEST 5 Y 10-03-20
6789 COMPANY TEST 3 N 12-03-20
Is it possible to update the table CUSTOMER_REFERENCE in database digidb1 with this data?
I want to update the cust_id column in CUSTOMER_REFERENCE in digidb1. The data will come from the view CUST_REF_VIEW in digidb2.
The conditions for the update are:
CUSTFULLNAME is equal to cust_name
MRGDATE is equal to the system date/today (12-03-20)
ISINDIVIDUAL is equal to N.
My expected result is:
cust_id brch_code cust_name description
1234 001 COMPANY TEST 1 TEST COMPANY 1
002 002 COMPANY TEST 2 TEST COMPANY 2
6789 003 COMPANY TEST 3 TEST COMPANY 3

If I understood you correctly, you don't actually want to update anything, but select data from those two views by joining them, using certain conditions. If that's so, then:
SQL> with
2 -- sample data
3 customer_reference (cust_id, brch_code, cust_name, description) as
4 (select '001', '001', 'CT1', 'TC1' from dual union all
5 select '002', '002', 'CT2', 'TC2' from dual union all
6 select '003', '003', 'CT3', 'TC3' from dual
7 ),
8 cust_ref_view (winbbn, custfullname, isindividual, mgrdate) as
9 (select '1234', 'CT1', 'N', date '2020-03-12' from dual union all
10 select '4567', 'CT4', 'N', date '2020-03-12' from dual union all
11 select '8901', 'CT2', 'N', date '2020-03-11' from dual union all
12 select '2345', 'CT5', 'Y', date '2020-03-10' from dual union all
13 select '6789', 'CT3', 'N', date '2020-03-12' from dual
14 )
15 -- query you need
16 select case when v.mgrdate = trunc(sysdate)
17 and v.isindividual = 'N'
18 then v.winbbn
19 else r.cust_id
20 end cust_id,
21 --
22 r.brch_code, r.cust_name, r.description
23 from customer_reference r join cust_ref_view v on v.custfullname = r.cust_name;
CUST BRC CUS DES
---- --- --- ---
1234 001 CT1 TC1
002 002 CT2 TC2
6789 003 CT3 TC3
SQL>
Now, depending on what you really call a "database", a database link might need to be involved if those really are different databases, e.g.
from customer_reference r join cust_ref_view@db_link_digidb2 v
----------------
this
If it is just about different users (schemas) within the same database, then you'll need to grant (at least) SELECT privilege from one user to another. It also means that you'd need to precede remote view name with its owner name, e.g.
from customer_reference r join digidb2.cust_ref_view v
--------
this
or - a simpler option - to create a synonym in one schema which will point to view in another schema. In that case, line #23 in query I posted would look exactly the same.
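If an actual UPDATE is wanted, as the question's wording suggests, one possible shape is a correlated update restricted to rows that have a matching view row. The following is only a sketch in SQLite via Python's sqlite3 module, not tested against Oracle. It models the view as a plain table, leaves out the database link or schema prefix, passes "today" as a parameter instead of using the system date, and assumes at most one matching view row per customer name.

```python
import sqlite3

CUSTOMER_REFERENCE = [
    ("001", "001", "COMPANY TEST 1", "TEST COMPANY 1"),
    ("002", "002", "COMPANY TEST 2", "TEST COMPANY 2"),
    ("003", "003", "COMPANY TEST 3", "TEST COMPANY 3"),
]
# Dates are stored as ISO text here; the question shows them as DD-MM-YY.
CUST_REF_VIEW = [
    ("1234", "COMPANY TEST 1", "N", "2020-03-12"),
    ("4567", "COMPANY TEST 4", "N", "2020-03-12"),
    ("8901", "COMPANY TEST 2", "N", "2020-03-11"),
    ("2345", "COMPANY TEST 5", "Y", "2020-03-10"),
    ("6789", "COMPANY TEST 3", "N", "2020-03-12"),
]

# The WHERE EXISTS guard keeps cust_id unchanged when no view row qualifies;
# without it, unmatched rows would be set to NULL.
UPDATE_SQL = """
UPDATE customer_reference
SET cust_id = (
    SELECT v.winbbn
    FROM cust_ref_view v
    WHERE v.custfullname = customer_reference.cust_name
      AND v.isindividual = 'N'
      AND v.mrgdate = :today
)
WHERE EXISTS (
    SELECT 1
    FROM cust_ref_view v
    WHERE v.custfullname = customer_reference.cust_name
      AND v.isindividual = 'N'
      AND v.mrgdate = :today
)
"""


def apply_update(today):
    """Run the update for the given ISO date and return the resulting table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE customer_reference "
            "(cust_id TEXT, brch_code TEXT, cust_name TEXT, description TEXT)"
        )
        conn.execute(
            "CREATE TABLE cust_ref_view "
            "(winbbn TEXT, custfullname TEXT, isindividual TEXT, mrgdate TEXT)"
        )
        conn.executemany("INSERT INTO customer_reference VALUES (?, ?, ?, ?)", CUSTOMER_REFERENCE)
        conn.executemany("INSERT INTO cust_ref_view VALUES (?, ?, ?, ?)", CUST_REF_VIEW)
        conn.execute(UPDATE_SQL, {"today": today})
        return conn.execute(
            "SELECT cust_id, brch_code, cust_name, description "
            "FROM customer_reference ORDER BY brch_code"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in apply_update("2020-03-12"):
        print(row)
```

With the date 2020-03-12, rows 001 and 003 receive the ids 1234 and 6789, and row 002 keeps its original cust_id, matching the expected result in the question.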

Oracle SQL Join query results

I have the following two tables
Table 1: SOURCE_SYSTEM
ID CODE Source ID Source Name
123 111 Monster Dice.com
456 111 Dice ABC COMPANY
456 888 Ticv A2 systems
4566 999 MOnster hgtt solutions
789 222 Monster ABC COMPANY
985 222 Dice Dice.com
Table 2: TARGET_SYSTEM
RECORDID AI CL ID Source Name Op Code
123 111 Dice.com Secondary
456 111 ABC COMPANY Primary
789 222 ABC COMPANY Secondary
985 222 Dice.com Primary
We have a process which gets the data from the source table and loads it into the target table. The process has a rule saying that the Primary row in Target should have the Source Name from the Source table row where Source ID = 'Monster'.
Here the following entry in Target is correct
RECORDID AI CL ID Source Name Op Code
123 111 Dice.com Secondary
456 111 ABC COMPANY Primary
But the following is wrong: the Primary Source Name is Dice.com, when it should be ABC COMPANY.
RECORDID AI CL ID Source Name Op Code
789 222 ABC COMPANY Secondary
985 222 Dice.com Primary
So I need a query which can identify all the rows in Target which have the same issue.

Why are the two rows for AI_CL_ID = 111 correct? They are wrong according to your specification, because recordid = 123 corresponds to 'Monster' but it has 'Secondary' in your target_system table.
To find all the rows in the target_system table with the wrong op_code you can use the following query. Assumptions: the pair (id, code) is unique in target_system; there are no NULLs in any column; the source_name in target_system is always correct (it matches the source_name in source_system when matched by id and code); the marker 'Primary' is special, but there may be other markers besides 'Secondary'.
The solution does not include the rows from "with" to the closing " ) " after the definition of target_system; the WITH clause is used to generate test data within the query itself, but in real life you should simply start with select t.id, ... and hit your base tables or views.
with
source_system ( id, code, source_id, source_name) as (
select 123, 111, 'Monster', 'Dice.com' from dual union all
select 456, 111, 'Dice' , 'ABC COMPANY' from dual union all
select 456, 888, 'Ticv' , 'A2 systems' from dual union all
select 4566, 999, 'MOnster', 'hgtt solutions' from dual union all
select 789, 222, 'Monster', 'ABC COMPANY' from dual union all
select 985, 222, 'Dice' , 'Dice.com' from dual
),
target_system ( recordid, ai_cl_id, source_name, op_code ) AS (
select 123, 111, 'Dice.com' , 'Secondary' from dual union all
select 456, 111, 'ABC COMPANY', 'Primary' from dual union all
select 789, 222, 'ABC COMPANY', 'Secondary' from dual union all
select 985, 222, 'Dice.com' , 'Primary' from dual
)
select t.recordid, t.ai_cl_id, t.source_name, t.op_code
from target_system t inner join source_system s
on t.recordid = s.id and t.ai_cl_id = s.code
where ( s.source_id = 'Monster' and t.op_code != 'Primary' )
or
( s.source_id != 'Monster' and t.op_code = 'Primary' )
order by ai_cl_id, recordid
;
Output (with your inputs; the output is different from that in your post since what you have in your post is wrong, as I explained).
RECORDID AI_CL_ID SOURCE_NAME OP_CODE
---------- ---------- ----------- ---------
123 111 Dice.com Secondary
456 111 ABC COMPANY Primary
789 222 ABC COMPANY Secondary
985 222 Dice.com Primary

How to get min value of a column which is present in three different table for a particular record

I have three tables. These tables need not have common members.
First is Opt_Out table:
**MemberId** **Opt_out_Date**
123 12-Jun-2014
234 7-Dec-2014
789 10-March-2014
Second is Cov_End table:
**MemberId** **Cov_End_Date**
123 30-Jun-2014
234 31-Dec-2014
345 30-Sept-2014
891 30-Oct-2014
Third Table is Decsd_Date table
**MemberId** **Deceased_Date**
123 23-Jun-2014
345 17-Sept-2014
456 23-Jun-2014
678 25-Aug-2014
The result should be like this:
**MemberId** **Min_Date**
123 12-Jun-2014
234 7-Dec-2014
345 17-Sept-2014
456 23-Jun-2014
678 25-Aug-2014
789 10-March-2014
891 30-Oct-2014
I want to achieve this result in best possible way and in single query.

You can use the union all operator to merge the tables and use the result as a derived table:
SELECT MemberId, MIN("Date") AS "Min_Date"
FROM (
SELECT MemberId, Cov_End_Date AS "Date" FROM Cov_End
UNION ALL
SELECT MemberId, Opt_out_date AS "Date" FROM Opt_Out
UNION ALL
SELECT MemberId, Deceased_Date AS "Date" FROM Decsd_Date
) src
GROUP BY MemberId
Sample SQL Fiddle (using MS SQL 2012)
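The UNION ALL approach can be tried on the question's sample rows. The sketch below uses SQLite through Python's sqlite3 module rather than the original database. The dates are stored as ISO-8601 text, an assumption made so that MIN compares them chronologically; a real DATE column would not need this. The column alias event_date replaces the quoted "Date" alias.

```python
import sqlite3

OPT_OUT = [(123, "2014-06-12"), (234, "2014-12-07"), (789, "2014-03-10")]
COV_END = [(123, "2014-06-30"), (234, "2014-12-31"),
           (345, "2014-09-30"), (891, "2014-10-30")]
DECSD_DATE = [(123, "2014-06-23"), (345, "2014-09-17"),
              (456, "2014-06-23"), (678, "2014-08-25")]

MIN_DATE_QUERY = """
SELECT memberid, MIN(event_date) AS min_date
FROM (
    SELECT memberid, cov_end_date AS event_date FROM cov_end
    UNION ALL
    SELECT memberid, opt_out_date AS event_date FROM opt_out
    UNION ALL
    SELECT memberid, deceased_date AS event_date FROM decsd_date
) src
GROUP BY memberid
ORDER BY memberid
"""


def earliest_dates():
    """Return (memberid, earliest date) across the three tables."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE opt_out (memberid INTEGER, opt_out_date TEXT)")
        conn.execute("CREATE TABLE cov_end (memberid INTEGER, cov_end_date TEXT)")
        conn.execute("CREATE TABLE decsd_date (memberid INTEGER, deceased_date TEXT)")
        conn.executemany("INSERT INTO opt_out VALUES (?, ?)", OPT_OUT)
        conn.executemany("INSERT INTO cov_end VALUES (?, ?)", COV_END)
        conn.executemany("INSERT INTO decsd_date VALUES (?, ?)", DECSD_DATE)
        return conn.execute(MIN_DATE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for memberid, min_date in earliest_dates():
        print(memberid, min_date)
```

Members that appear in only one table still show up, because UNION ALL stacks the rows instead of joining them.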
