Find duplicates out of multiple columns

I have a tricky SQL problem. Let me give you an example:
ID1  Name    Name2    Name3  Name4
100  Albert  Kevin    Jon    Alex
101  Albert  Jon      Kevin  Alex
102  Albert  Georg    Alex   Babera
103  Albert  Stefany
Let's say ID1 is a project ID and Name is the main person (Albert). Name2 to Name4 are the people who worked with Albert on that project. Now I want to count matches between these subgroups. First, I want to find exact matches, for example between 100 and 101.
Second, is it possible to count how many names match? Like one match between 101 and 100.
Thanks in advance

I know it is long and not bulletproof but it kind of does the job.
WITH source_t AS
(
SELECT 100 id, 'Albert' name, 'Kevin' name2, 'Jon' name3, 'Alex' name4 FROM DUAL UNION ALL
SELECT 101, 'Albert', 'Jon', 'Kevin', 'Alex' FROM DUAL UNION ALL
SELECT 102, 'Albert', 'Georg', 'Alex', 'Babera' FROM DUAL UNION ALL
SELECT 103, 'Albert', 'Stefany', NULL, NULL FROM DUAL
)
, tab_1 AS
(
SELECT id, name, name2 FROM source_t UNION ALL
SELECT id, name, name3 FROM source_t UNION ALL
SELECT id, name, name4 FROM source_t
)
, tab_2 AS
(
SELECT id
, name
, name2
, ROW_NUMBER() OVER (PARTITION BY id, name ORDER BY name2) AS r_number
FROM tab_1
)
, tab_3 AS
(
SELECT id
, name
, MAX(CASE WHEN r_number = 1 THEN name2 END) AS name2
, MAX(CASE WHEN r_number = 2 THEN name2 END) AS name3
, MAX(CASE WHEN r_number = 3 THEN name2 END) AS name4
FROM tab_2
GROUP BY
id
, name
)
SELECT tab_3.id
, tab_3.name
, tab_3.name2
, tab_3.name3
, tab_3.name4
, tab_4.n_count
FROM tab_3
LEFT JOIN
(
SELECT name
, name2
, name3
, name4
, COUNT(1) AS n_count
FROM tab_3
GROUP BY
name
, name2
, name3
, name4
) tab_4
ON tab_3.name = tab_4.name
and NVL(tab_3.name2, 'NULL') = NVL(tab_4.name2, 'NULL')
and NVL(tab_3.name3, 'NULL') = NVL(tab_4.name3, 'NULL')
and NVL(tab_3.name4, 'NULL') = NVL(tab_4.name4, 'NULL')
;
/*
102 Albert Alex Babera Georg 1
103 Albert Stefany NULL NULL 1
101 Albert Alex Jon Kevin 2
100 Albert Alex Jon Kevin 2
*/
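The query above handles the first part of the question (exact matches). For the second part, counting how many names two projects share, one possible approach is to turn Name2 to Name4 into rows and self-join on the name. This is only a sketch and not part of the original answer: it runs the SQL through Python's built-in sqlite3 module as a stand-in for Oracle, and the names members, id_a, id_b and shared are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source_t (id INTEGER, name TEXT, name2 TEXT, name3 TEXT, name4 TEXT);
INSERT INTO source_t VALUES
  (100, 'Albert', 'Kevin',   'Jon',   'Alex'),
  (101, 'Albert', 'Jon',     'Kevin', 'Alex'),
  (102, 'Albert', 'Georg',   'Alex',  'Babera'),
  (103, 'Albert', 'Stefany', NULL,    NULL);
""")

query = """
WITH members AS (
    -- one row per (project, co-worker); UNION drops repeats within a project
    SELECT id, name2 AS member FROM source_t WHERE name2 IS NOT NULL
    UNION
    SELECT id, name3 FROM source_t WHERE name3 IS NOT NULL
    UNION
    SELECT id, name4 FROM source_t WHERE name4 IS NOT NULL
)
SELECT a.id AS id_a, b.id AS id_b, COUNT(*) AS shared
FROM members a
JOIN members b ON a.member = b.member AND a.id < b.id  -- each pair once
GROUP BY a.id, b.id
ORDER BY a.id, b.id
"""

pair_counts = conn.execute(query).fetchall()
for id_a, id_b, shared in pair_counts:
    print(id_a, id_b, shared)
```

Pairs that share no name (for example 100 and 103) do not appear at all, because the inner join finds nothing to match. A pair whose shared count equals the number of names in both projects is an exact match.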

Related

Find all possible combinations column value in ORACLE SQL

Could you please help me with the query below?
I have the following table of data.
EmpNo  Name     City
1      John     US
2      Miranda  US
3      Pete     US
4      Jack     US
5      Kathy    UK
6      Tanni    UK
7      Sally    UAE
I want output like this:
City  Name1    Name2
US    John     Miranda
US    John     Pete
US    John     Jack
US    Miranda  Pete
US    Miranda  Jack
US    Pete     Jack
UK    Kathy    Tanni
In PL/SQL we can write a block to get this output. But is it possible to get it using SQL alone?

Looks like a self join.
with temp (empno, name, city) as
(select 1, 'John' , 'US' from dual union all
select 2, 'Miranda', 'US' from dual union all
select 3, 'Pete' , 'US' from dual union all
select 4, 'Jack' , 'US' from dual union all
select 5, 'Kathy' , 'UK' from dual union all
select 6, 'Tanni' , 'UK' from dual union all
select 7, 'Sally' , 'UAE' from dual
)
select a.city, a.name, b.name
from temp a join temp b on a.city = b.city and a.name < b.name
order by a.city, a.name;
CIT NAME    NAME
--- ------- -------
UK  Kathy   Tanni
US  Jack    Miranda
US  Jack    John
US  Jack    Pete
US  John    Pete
US  John    Miranda
US  Miranda Pete
7 rows selected.

with
input_table (empno, name, city) as (
select 1, 'John' , 'US' from dual union all
select 2, 'Miranda', 'US' from dual union all
select 3, 'Pete' , 'US' from dual union all
select 4, 'Jack' , 'US' from dual union all
select 5, 'Kathy' , 'UK' from dual union all
select 6, 'Tanni' , 'UK' from dual union all
select 7, 'Sally' , 'UAE' from dual
)
-- end of sample data (for testing only, not part of the query)
-- remove WITH clause and use your actual table name below
select t1.city, t1.name as name1, t2.name as name2
from input_table t1 inner join input_table t2
on t1.city = t2.city and t1.empno < t2.empno
order by t1.empno, t2.empno -- if needed
;
CITY NAME1 NAME2
----- -------- --------
US John Miranda
US John Pete
US John Jack
US Miranda Pete
US Miranda Jack
US Pete Jack
UK Kathy Tanni
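Both answers rely on a strict inequality in the join condition so that every unordered pair shows up only once. As a rough check of that idea, the sketch below runs the empno-based self-join in Python's built-in sqlite3 module (an assumption, used here as a stand-in for Oracle) and compares it with a `<>` join, which returns every pair twice.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle table (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE input_table (empno INTEGER, name TEXT, city TEXT);
INSERT INTO input_table VALUES
  (1, 'John', 'US'), (2, 'Miranda', 'US'), (3, 'Pete', 'US'), (4, 'Jack', 'US'),
  (5, 'Kathy', 'UK'), (6, 'Tanni', 'UK'), (7, 'Sally', 'UAE');
""")

# Template with a placeholder for the comparison operator between the two empno values.
pair_sql = """
SELECT t1.city, t1.name AS name1, t2.name AS name2
FROM input_table t1
JOIN input_table t2 ON t1.city = t2.city AND t1.empno {op} t2.empno
ORDER BY t1.empno, t2.empno
"""

# "<" keeps each unordered pair once; "<>" also returns the mirrored pair.
unique_pairs = conn.execute(pair_sql.format(op="<")).fetchall()
mirrored_pairs = conn.execute(pair_sql.format(op="<>")).fetchall()

for row in unique_pairs:
    print(row)
print(len(unique_pairs), len(mirrored_pairs))
```

Sally never appears because she is the only employee in UAE, so the join has no partner row for her.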

Multiple rows to multiple columns for co-applicant name and address for a unique ID

This is the query
SELECT b.ID, e.customername AS "Applicant name",
f.address AS "Applicant address",
x.customername AS "Co-Applicant name",
x.address AS "Co-Applicant address"
FROM table_1 b,
table_2 e,
table_3 f,
(SELECT b.customername, g.agreementid, a.address
FROM table_2 g, table_4 x, table_2 b, table_3 a
WHERE g.ID = x.ID
AND b.customerid = x.custid
AND b.customerid = a.custid
AND x.flag <> 'G') x
WHERE b.custid = e.customerid
AND f.custid = b.lesseeid
AND f.bptype = 'LS'
AND f.mailingaddress = 'Y'
AND b.ID = x.ID
AND b.ID='101'
The data comes back in the format below.
+-----+-------+----------+--------------+----------+
| ID | name | address | co-applicant | address |
+-----+-------+----------+--------------+----------+
| 101 | aamir | address1 | rahul | London |
| 101 | aamir | address1 | vijay | Paris |
| 101 | aamir | address1 | sanjay | New York |
+-----+-------+----------+--------------+----------+
I need the data in the format below:
ID   name   address   name_1  address  name_2  address
101  aamir  address1  rahul   London   vijay   Paris
102  Anil   address2  Suyash  Mumbai   Rajesh  Delhi    Prakash  Kolkata

You can use PIVOT as follows:
WITH DATAA AS
(
SELECT 101 ID, 'aamir' name, 'address1' address, 'rahul' co_applicant, 'london' co_address FROM DUAL UNION ALL
SELECT 101, 'aamir', 'address1', 'vijay', 'Paris' FROM DUAL UNION ALL
SELECT 101, 'aamir', 'address1', 'sanjay', 'New York' FROM DUAL
)
-- YOUR QUERY STARTS FROM HERE
SELECT * FROM
(
SELECT
T.*,
ROW_NUMBER() OVER(ORDER BY NULL) AS RN
FROM DATAA T
) PIVOT (
MAX ( CO_APPLICANT ) AS NAME, MAX ( CO_ADDRESS ) AS ADDRESS
FOR RN IN ( 1, 2,3 )
);
ID NAME ADDRESS 1_NAME 1_ADDRES 2_NAME 2_ADDRES 3_NAME 3_ADDRES
---------- ----- -------- ------ -------- ------ -------- ------ --------
101 aamir address1 rahul london vijay Paris sanjay New York
Note: This generates only 3 name/address pairs, because Oracle does not allow a dynamic number of columns in a query. If more than 3 co-applicants exist, only the data of 3 co-applicants is returned.
-- UPDATE --
Use a PARTITION BY clause in ROW_NUMBER if you have multiple IDs, as follows:
WITH DATAA AS
(
SELECT 101 ID, 'aamir' name, 'address1' address, 'rahul' co_applicant, 'london' co_address FROM DUAL UNION ALL
SELECT 101, 'aamir', 'address1', 'vijay', 'Paris' FROM DUAL UNION ALL
SELECT 101, 'aamir', 'address1', 'sanjay', 'New York' FROM DUAL UNION ALL
SELECT 102 ID, 'Tejash' name, 'address2' address, 'chetan' co_applicant, 'london' co_address FROM DUAL UNION ALL
SELECT 102, 'Tejash', 'address2', 'nirav', 'Paris' FROM DUAL UNION ALL
SELECT 102, 'Tejash', 'address2', 'pulkit', 'New York' FROM DUAL
)
-- YOUR QUERY STARTS FROM HERE
SELECT * FROM
(
SELECT
T.*,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY NULL) AS RN
FROM DATAA T
) PIVOT (
MAX ( CO_APPLICANT ) AS NAME, MAX ( CO_ADDRESS ) AS ADDRESS
FOR RN IN ( 1, 2,3 )
);
ID NAME ADDRESS 1_NAME 1_ADDRES 2_NAME 2_ADDRES 3_NAME 3_ADDRES
---------- ------ -------- ------ -------- ------ -------- ------ --------
101 aamir address1 rahul london vijay Paris sanjay New York
102 Tejash address2 chetan london nirav Paris pulkit New York
Cheers!!

I guess you can number the rows first and then do the conditional aggregation -
WITH DATAA AS (SELECT 101 ID, 'aamir' name, 'address1' address, 'rahul' co_applicant, 'london' co_address FROM DUAL
UNION ALL
SELECT 101, 'aamir', 'address1', 'vijay', 'Paris' FROM DUAL
UNION ALL
SELECT 101, 'aamir', 'address1', 'sanjay', 'New York' FROM DUAL
),
TEMP AS (select D.*, ROW_NUMBER() OVER(PARTITION BY name ORDER BY co_applicant) RN from DATAA D)
SELECT ID, NAME, ADDRESS
,MAX(CASE WHEN RN = 1 THEN CO_APPLICANT ELSE NULL END) AS name_1
,MAX(CASE WHEN RN = 1 THEN CO_ADDRESS ELSE NULL END) AS ADDRESS_1
,MAX(CASE WHEN RN = 2 THEN CO_APPLICANT ELSE NULL END) AS name_2
,MAX(CASE WHEN RN = 2 THEN CO_ADDRESS ELSE NULL END) AS ADDRESS_2
,MAX(CASE WHEN RN = 3 THEN CO_APPLICANT ELSE NULL END) AS name_3
,MAX(CASE WHEN RN = 3 THEN CO_ADDRESS ELSE NULL END) AS ADDRESS_3
FROM TEMP
GROUP BY ID, NAME, ADDRESS;
To clarify: SQL tables represent unordered sets until you specify a deterministic ORDER BY clause, so I order by the co-applicant name. As a result, with your sample data sanjay ends up in the 2nd column and vijay in the 3rd.
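To see the effect of the explicit ordering, here is a small sketch of the same conditional aggregation run through Python's built-in sqlite3 module. SQLite stands in for Oracle here (an assumption), window functions need SQLite 3.25 or newer, and the CTE name numbered is made up for the example. The sketch partitions by id rather than name.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dataa (id INTEGER, name TEXT, address TEXT,
                    co_applicant TEXT, co_address TEXT);
INSERT INTO dataa VALUES
  (101, 'aamir', 'address1', 'rahul',  'london'),
  (101, 'aamir', 'address1', 'vijay',  'Paris'),
  (101, 'aamir', 'address1', 'sanjay', 'New York');
""")

query = """
WITH numbered AS (
    -- ORDER BY co_applicant makes the row numbering deterministic
    SELECT d.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY co_applicant) AS rn
    FROM dataa d
)
SELECT id, name,
       MAX(CASE WHEN rn = 1 THEN co_applicant END) AS name_1,
       MAX(CASE WHEN rn = 2 THEN co_applicant END) AS name_2,
       MAX(CASE WHEN rn = 3 THEN co_applicant END) AS name_3
FROM numbered
GROUP BY id, name
"""

row = conn.execute(query).fetchone()
print(row)
```

With the alphabetical ordering the slots are filled as rahul, sanjay, vijay, regardless of the order in which the rows were inserted.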

Perform counting for number of occurrence in SQL

Is it common and convenient for SQL to perform such data manipulation, capturing only results in columns satisfying the conditions, and perform counting for number of occurrence? How to write SQL code to generate the desired output (if feasible).
Name is presented only when the conditions (Cond1 to Cond5) are yes.
Desired Input
ID Cond1 Cond2 Cond3 Cond4 Cond5 Name1 Name2 Name3 Name4 Name5
1 No Yes No No Yes (null) Result1 n/a (null) Result2
2 Yes No Yes No Yes Result3 n/a Result4 (null) Result5
Desired Output
ID Counting Name
1 1 Result1
1 2 Result2
2 1 Result3
2 2 Result4
2 3 Result5

This can be done with union all and row_number():
select id, row_number() over(partition by id order by seq) counting, name
from (
select id, name1 name, 1 seq from mytable where cond1 = 'Yes'
union all select id, name2, 2 from mytable where cond2 = 'Yes'
union all select id, name3, 3 from mytable where cond3 = 'Yes'
union all select id, name4, 4 from mytable where cond4 = 'Yes'
union all select id, name5, 5 from mytable where cond5 = 'Yes'
) x
order by id, counting
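As a quick check of the union all approach against the sample data from the question, the sketch below runs an equivalent query in Python's built-in sqlite3 module. SQLite is used as a stand-in for Oracle (an assumption), and row_number() requires SQLite 3.25 or newer.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
  id INTEGER,
  cond1 TEXT, cond2 TEXT, cond3 TEXT, cond4 TEXT, cond5 TEXT,
  name1 TEXT, name2 TEXT, name3 TEXT, name4 TEXT, name5 TEXT);
INSERT INTO mytable VALUES
  (1, 'No',  'Yes', 'No',  'No', 'Yes', NULL,      'Result1', 'n/a',     NULL, 'Result2'),
  (2, 'Yes', 'No',  'Yes', 'No', 'Yes', 'Result3', 'n/a',     'Result4', NULL, 'Result5');
""")

query = """
SELECT id,
       ROW_NUMBER() OVER (PARTITION BY id ORDER BY seq) AS counting,
       name
FROM (
    -- seq remembers which column pair each row came from
    SELECT id, name1 AS name, 1 AS seq FROM mytable WHERE cond1 = 'Yes'
    UNION ALL SELECT id, name2, 2 FROM mytable WHERE cond2 = 'Yes'
    UNION ALL SELECT id, name3, 3 FROM mytable WHERE cond3 = 'Yes'
    UNION ALL SELECT id, name4, 4 FROM mytable WHERE cond4 = 'Yes'
    UNION ALL SELECT id, name5, 5 FROM mytable WHERE cond5 = 'Yes'
) x
ORDER BY id, counting
"""

rows = conn.execute(query).fetchall()
for r in rows:
    print(r)
```

Ordering the row number by seq keeps the names in column order (Name1 first, Name5 last) rather than alphabetical order.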

You can use UNPIVOT with pairs of columns and then filter on the Yes rows and use the ROW_NUMBER analytic function to get the incremental index of the result:
Query:
SELECT id,
ROW_NUMBER() OVER ( PARTITION BY id ORDER BY value ) AS "COUNT",
name
FROM table_name
UNPIVOT ( ( cond, name ) FOR value IN (
( Cond1, Name1 ) AS 'V1',
( Cond2, Name2 ) AS 'V2',
( Cond3, Name3 ) AS 'V3',
( Cond4, Name4 ) AS 'V4',
( Cond5, Name5 ) AS 'V5'
) )
WHERE cond = 'Yes'
Test Data:
CREATE TABLE table_name (
ID NUMBER(10,0) PRIMARY KEY,
Cond1 VARCHAR2(3) CHECK ( Cond1 IN ( 'Yes', 'No' ) ),
Cond2 VARCHAR2(3) CHECK ( Cond2 IN ( 'Yes', 'No' ) ),
Cond3 VARCHAR2(3) CHECK ( Cond3 IN ( 'Yes', 'No' ) ),
Cond4 VARCHAR2(3) CHECK ( Cond4 IN ( 'Yes', 'No' ) ),
Cond5 VARCHAR2(3) CHECK ( Cond5 IN ( 'Yes', 'No' ) ),
Name1 VARCHAR2(10),
Name2 VARCHAR2(10),
Name3 VARCHAR2(10),
Name4 VARCHAR2(10),
Name5 VARCHAR2(10),
CHECK ( ( Cond1 = 'Yes' AND Name1 IS NOT NULL ) OR ( Cond1 = 'No' AND ( Name1 IS NULL OR Name1 = 'n/a' ) ) ),
CHECK ( ( Cond2 = 'Yes' AND Name2 IS NOT NULL ) OR ( Cond2 = 'No' AND ( Name2 IS NULL OR Name2 = 'n/a' ) ) ),
CHECK ( ( Cond3 = 'Yes' AND Name3 IS NOT NULL ) OR ( Cond3 = 'No' AND ( Name3 IS NULL OR Name3 = 'n/a' ) ) ),
CHECK ( ( Cond4 = 'Yes' AND Name4 IS NOT NULL ) OR ( Cond4 = 'No' AND ( Name4 IS NULL OR Name4 = 'n/a' ) ) ),
CHECK ( ( Cond5 = 'Yes' AND Name5 IS NOT NULL ) OR ( Cond5 = 'No' AND ( Name5 IS NULL OR Name5 = 'n/a' ) ) )
);
INSERT INTO table_name ( ID, Cond1, Cond2, Cond3, Cond4, Cond5, Name1, Name2, Name3, Name4, Name5 )
SELECT 1, 'No', 'Yes', 'No', 'No', 'Yes', null, 'Result1', 'n/a', null, 'Result2' FROM DUAL UNION ALL
SELECT 2, 'Yes', 'No', 'Yes', 'No', 'Yes', 'Result3', 'n/a', 'Result4', null, 'Result5' FROM DUAL;
Output:
ID | COUNT | NAME
-: | ----: | :------
1 | 1 | Result1
1 | 2 | Result2
2 | 1 | Result3
2 | 2 | Result4
2 | 3 | Result5

Another option:
with
test (id, cond1, cond2, cond3, cond4, cond5, name1, name2, name3, name4, name5) as
-- your sample data
(select 1, 'no' , 'yes', 'no' , 'no', 'yes', null , 'result1', 'n/a' , null, 'result2' from dual union all
select 2, 'yes', 'no' , 'yes', 'no', 'yes', 'result3', 'n/a' , 'result4', null, 'result5' from dual
),
temp as
-- values whose COND column is 'yes'
(select id,
decode(cond1, 'yes', name1) n1,
decode(cond2, 'yes', name2) n2,
decode(cond3, 'yes', name3) n3,
decode(cond4, 'yes', name4) n4,
decode(cond5, 'yes', name5) n5
from test
),
up as
-- unpivot data
(select *
from temp
unpivot (c_name for pc in (n1, n2, n3, n4, n5))
)
-- final result
select id,
row_number() over (partition by id order by c_name) counting,
c_name as name
from up
order by id;
ID COUNTING NAME
---------- ---------- -------
1 1 result1
1 2 result2
2 1 result3
2 2 result4
2 3 result5

You can use CONNECT BY LEVEL to achieve the desired result as follows:
SELECT
ID,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY LVL) AS "Counting",
NAME_ AS "Name"
FROM
(SELECT
T.ID,
DECODE(LVL, 1, COND1, 2, COND2, 3, COND3, 4, COND4, 5, COND5) AS COND,
DECODE(LVL, 1, NAME1, 2, NAME2, 3, NAME3, 4, NAME4, 5, NAME5) AS NAME_,
LVL AS LVL
FROM
YOUR_TABLE T join
(Select level as lvl from dual CONNECT BY LEVEL <= 5) on (1=1)
)
WHERE COND = 'Yes';
Cheers!!

Here is another option using UNPIVOT.
create table mytab(id number,
cond1 varchar2(3),
cond2 varchar2(3),
cond3 varchar2(3),
cond4 varchar2(3),
cond5 varchar2(3),
Name1 varchar2(7),
Name2 varchar2(7),
Name3 varchar2(7),
Name4 varchar2(7),
Name5 varchar2(7));
insert into mytab values(1,'No','Yes','No','No','Yes',null,'Result1','n/a',null,'Result2');
insert into mytab values(2,'Yes','No','Yes','No','Yes','Result3','n/a','Result4',null,'Result5');
commit;
select * from mytab;
Output:
ID COND1 COND2 COND3 COND4 COND5 NAME1 NAME2 NAME3 NAME4 NAME5
1 No Yes No No Yes (null) Result1 n/a (null) Result2
2 Yes No Yes No Yes Result3 n/a Result4 (null) Result5
UNPIVOT based solution.
with ns as (
select id,
n,
names
from mytab
unpivot(names for n in (name1 as 'n1',
name2 as 'n2',
name3 as 'n3',
name4 as 'n4',
name5 as 'n5'))),
cs as (
select id,
n,
condns
from mytab
unpivot(condns for n in (cond1 as 'n1',
cond2 as 'n2',
cond3 as 'n3',
cond4 as 'n4',
cond5 as 'n5')))
select ns.id,
row_number() over(partition by ns.id order by ns.n) counting,
ns.names
from ns inner join cs
on ns.id = cs.id
and ns.n = cs.n
and cs.condns = 'Yes'
order by 1,2;
Output:
ID COUNTING NAMES
1 1 Result1
1 2 Result2
2 1 Result3
2 2 Result4
2 3 Result5

Counting how many fields are the same in a row

I have data in a table organised as follows:
ID, name1, name2, name3, name4, name5, name6
Sample data
select ID, name1, name2, name3, name4, name5, name6
from datatable
123, bob, mark, jane, bob, jane, fred
124, mark, mark, mark, bob, bob, bob
and I need to end up with something like
123, bob, 2
123, mark, 1
123, jane, 2
123, fred, 1
124, mark, 3
124, bob, 3
Where the count is the number of times a name appears in the record. Is this possible?

The real reason you're having a problem here is because you have de-normalised data. Instead of 6 Name columns, you should have 1 column (called [Name]) and then another column to denote the "number" (ID?).
You can do this on the fly, however, using VALUES to Unpivot your data, and then perform a COUNT, but I strongly recommend you fix your table design in the long run:
SELECT DT.ID,
V.[Name],
COUNT(V.[Name]) AS Names
FROM dbo.DataTable DT
CROSS APPLY (VALUES(DT.name1),(DT.name2),(DT.name3),(DT.name4),(DT.name5),(DT.name6))V([Name])
GROUP BY DT.ID,
V.[Name];
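To illustrate the normalised design suggested above, here is a rough sketch with one row per name instead of six name columns. It is only an illustration under assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and the table name datatable_names and the column position are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalised layout: one row per (id, position) instead of name1..name6.
CREATE TABLE datatable_names (
  id       INTEGER,
  position INTEGER,   -- 1..6, replaces the column suffix
  name     TEXT,
  PRIMARY KEY (id, position));
INSERT INTO datatable_names VALUES
  (123, 1, 'bob'),  (123, 2, 'mark'), (123, 3, 'jane'),
  (123, 4, 'bob'),  (123, 5, 'jane'), (123, 6, 'fred'),
  (124, 1, 'mark'), (124, 2, 'mark'), (124, 3, 'mark'),
  (124, 4, 'bob'),  (124, 5, 'bob'),  (124, 6, 'bob');
""")

# With the names stored as rows, the count is a plain GROUP BY.
counts = conn.execute("""
    SELECT id, name, COUNT(*) AS names
    FROM datatable_names
    GROUP BY id, name
    ORDER BY id, name
""").fetchall()

for row in counts:
    print(row)
```

With this layout no unpivot step is needed, and adding a seventh name is a new row rather than a schema change.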

Try this:
declare @tbl table (ID int, name1 varchar(10), name2 varchar(10), name3 varchar(10), name4 varchar(10), name5 varchar(10), name6 varchar(10))
insert into @tbl values
(123, 'bob', 'mark', 'jane', 'bob', 'jane', 'fred'),
(124, 'mark', 'mark', 'mark', 'bob', 'bob', 'bob')
select id, [name], count(*) cnt from (
select ID, name1 [name] from @tbl
union all
select ID, name2 from @tbl
union all
select ID, name3 from @tbl
union all
select ID, name4 from @tbl
union all
select ID, name5 from @tbl
union all
select ID, name6 from @tbl
) a group by id, [name]

Using a union approach:
WITH cte AS (
SELECT ID, name1 AS name FROM yourTable UNION ALL
SELECT ID, name2 FROM yourTable UNION ALL
SELECT ID, name3 FROM yourTable UNION ALL
SELECT ID, name4 FROM yourTable UNION ALL
SELECT ID, name5 FROM yourTable UNION ALL
SELECT ID, name6 FROM yourTable
)
SELECT
ID,
name,
COUNT(*) AS Count
FROM cte
GROUP BY
ID,
name
ORDER BY
ID,
name;

How to copy column names if a column equal to something

I want to do the following:
If a column is 1, then copy the column name (to a new column). For example, for ID 1, the Name1 is 1, then we copy 'Name1' (to the 'Name' column). Else, do nothing.
If two columns (Name1, Name2) are both 1, then we will have two rows for each name. For example, ID 3.
Input
ID Name1 Name2
1 1 0
2 0 1
3 1 1
Output
ID Name
1 Name1
2 Name2
3 Name1
3 Name2
Do I need some advanced keywords to do that?

You should be able to use the UNPIVOT function to get the result. This converts your columns into rows, then you can filter the final result based on whether the value of the original column is 0 or 1:
select Id, Name
from <yourtable>
unpivot
(
value for
name in (Name1, Name2)
) u
where value <> 0

One way is using union all
select id,
'Name1' as name
from your_table
where name1 = 1
union all
select id,
'Name2' as name
from your_table
where name2 = 1
You could also use cross apply if there are more columns:
select t.id, x.name
from your_table t
cross apply (
values (case when t.name1 = 1 then 'Name1' end),
(case when t.name2 = 1 then 'Name2' end),
(case when t.name3 = 1 then 'Name3' end)
) x (name)
where x.name is not null;

You can use cross apply and get this as below:
select Id, nam as [Name] from #yournames
cross apply ( values (name1, 'name1'),(name2, 'name2')) v(n, nam)
where n = 1
Output:
+----+-------+
| Id | Name |
+----+-------+
| 1 | name1 |
| 2 | name2 |
| 3 | name1 |
| 3 | name2 |
+----+-------+

If there are only 3 columns, use union
select id, 'Name1' as Name from Input where Name1=1
union all
select id, 'Name2' as Name from Input where Name2=1
