Remove Duplicates concatenating columns - sql

I have the following data in Table A as input:
Col A | Col B | Col C
PG_1100000357_1100000356 | 1100000357 | 1100000356
PG_1100000356_1100000357 | 1100000356 | 1100000357
PG_10909099_12990909 | 10909099 | 12990909
PG_8989898_79797987 | 8989898 | 79797987
PG_8989898_79797987 | 8989898 | 79797987
I need to write a query that returns the output shown after these two rules:
1) Remove exact duplicates from the input when a record matches another record (for example, the 4th and 5th records).
2) Treat the concatenation of Col B and Col C as equal to the concatenation of Col C and Col B, and remove that duplicate as well (the 1st and 2nd records).
Note: Col A is derived as CONCAT('PG_', Col B, '_', Col C); don't worry about that.
Col A | Col B | Col C
PG_1100000357_1100000356 | 1100000357 | 1100000356
PG_10909099_12990909 | 10909099 | 12990909
PG_8989898_79797987 | 8989898 | 79797987
Could you please help me? Thanks very much in advance.

--Oracle SQL: row_number().
--The least and greatest functions work whether Col_B and Col_C are of number or varchar2 data type
with s (Col_A, Col_B, Col_C) as (
select 'PG_1100000357_1100000356', 1100000357, 1100000356 from dual union all
select 'PG_1100000356_1100000357', 1100000356, 1100000357 from dual union all
select 'PG_10909099_12990909' , 10909099 , 12990909 from dual union all
select 'PG_8989898_79797987' , 8989898 , 79797987 from dual union all
select 'PG_8989898_79797987' , 8989898 , 79797987 from dual)
select Col_A, Col_B, Col_C
from
(select s.*,
row_number () over (partition by least(Col_B, Col_C), greatest(Col_B, Col_C) order by Col_B desc) rn
from s
)
where rn = 1;
COL_A COL_B COL_C
------------------------ ---------- ----------
PG_8989898_79797987 8989898 79797987
PG_10909099_12990909 10909099 12990909
PG_1100000357_1100000356 1100000357 1100000356
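The partitioning idea can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: the helper name dedupe_pairs is hypothetical, and the tuples simply mirror the sample rows. It builds the same unordered key that least/greatest produce and keeps the row with the largest Col_B for each key.

```python
# Rows mirror the sample data: (Col_A, Col_B, Col_C).
rows = [
    ("PG_1100000357_1100000356", 1100000357, 1100000356),
    ("PG_1100000356_1100000357", 1100000356, 1100000357),
    ("PG_10909099_12990909", 10909099, 12990909),
    ("PG_8989898_79797987", 8989898, 79797987),
    ("PG_8989898_79797987", 8989898, 79797987),
]


def dedupe_pairs(rows):
    """Keep one row per unordered (Col_B, Col_C) pair, preferring the largest Col_B."""
    best = {}
    for row in rows:
        _, b, c = row
        key = (min(b, c), max(b, c))  # same role as least()/greatest() in the query
        # Mirrors "order by Col_B desc": a larger Col_B wins, the first row wins ties.
        if key not in best or b > best[key][1]:
            best[key] = row
    return list(best.values())


result = dedupe_pairs(rows)
for row in result:
    print(row)
```

The sketch returns rows in first-seen order, whereas the SQL query gives no ordering guarantee without an outer ORDER BY.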

It's not a good idea to store the same data in multiple columns. The values of Col_B and Col_C already exist in Col_A; you just need to split them and then apply group by with the least and greatest functions, as @akk0rd87 suggested (assuming Oracle, given the question's previous tag):
with Table_A(Col_A) as
(
select 'PG_1100000357_1100000356' from dual union all
select 'PG_1100000356_1100000357' from dual union all
select 'PG_10909099_12990909' from dual union all
select 'PG_8989898_79797987' from dual union all
select 'PG_8989898_79797987' from dual
), t as
(
select regexp_substr(Col_A, '[^_]+', 1, 1) col_one,
regexp_substr(Col_A, '[^_]+', 1, 2) col_two,
regexp_substr(Col_A, '[^_]+', 1, 3) col_three
from Table_A
)
select max(col_one||'_'||least(col_two,col_three)||'_'||
greatest(col_two,col_three))
as Col_A,
least(col_two,col_three) as Col_B, greatest(col_two,col_three) as Col_C
from t
group by least(col_two,col_three), greatest(col_two,col_three);
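One point worth checking in the split-based approach: regexp_substr returns strings, so least and greatest compare the pieces as text rather than as numbers. The deduplication still works because both orderings of a pair map to the same key, but the "smaller" value may not be the numerically smaller one. A minimal Python sketch of the difference follows; the function name normalize is hypothetical, and converting to integers is an assumption about the desired behaviour, not something the answer does.

```python
def normalize(col_a):
    """Split 'PG_<x>_<y>' and return (prefix, low, high) ordered numerically."""
    prefix, first, second = col_a.split("_")
    low, high = sorted((int(first), int(second)))
    return prefix, low, high


values = [
    "PG_1100000357_1100000356",
    "PG_1100000356_1100000357",
    "PG_10909099_12990909",
    "PG_8989898_79797987",
    "PG_8989898_79797987",
]

# A set removes both exact duplicates and reversed pairs.
unique = sorted({normalize(v) for v in values}, key=lambda t: (t[1], t[2]))

# Text comparison and numeric comparison disagree for this pair.
as_text = min("8989898", "79797987")
as_number = min(8989898, 79797987)

print(unique)
print(as_text, as_number)
```

In the SQL, wrapping the split values in to_number before calling least and greatest would give the numeric ordering.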

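The query below returns the expected result. It partitions by both the smaller and the larger of ColB and ColC, so only rows holding the same pair of values share a partition, and it orders by ColB descending so the row shown in the expected output is the one kept:
;WITH CTE AS( SELECT ColA, ColB, ColC,
ROW_NUMBER() OVER (PARTITION BY CASE WHEN ColB < ColC THEN ColB ELSE ColC END,
CASE WHEN ColB > ColC THEN ColB ELSE ColC END ORDER BY ColB DESC) RN
FROM TableA )
SELECT ColA, ColB, ColC FROM CTE WHERE RN = 1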

Related

Find value with same value in string ORACLE

We have a table like this:
Column A | Column B | Column C
Cell 1 | 201453 | 1000
Cell 2 | 201232 | 1000
Cell 3 | 213231 | 2000
Cell 2 | 201233 | 3000
Cell 1 | 200032 | 1000
Column A - may be repeated
Column B - unique
Column C - may be repeated
How do I find the values in Column A that share the same value in Column C?
I don't get it...
If I understand the request correctly, you need to list the values of Column A that share each value of Column C:
select LISTAGG("Column A",',') WITHIN GROUP (ORDER BY "Column A") from table1
group by "Column C"
Test: http://sqlfiddle.com/#!4/303780/5
Apparently, you need to find the values of col_a that occur more than once together with the same col_c. Grouping by those two columns and adding a HAVING clause will do the trick:
SELECT col_a
FROM t
GROUP BY col_a,col_c
HAVING COUNT(*) > 1
If you are looking for rows having the same (repeated) combination of COL_A and COL_C then try this:
WITH
tbl AS
(
Select 1 "COL_A", 201453 "COL_B", 1000 "COL_C" From Dual UNION ALL
Select 2 "COL_A", 201232 "COL_B", 1000 "COL_C" From Dual UNION ALL
Select 3 "COL_A", 213231 "COL_B", 2000 "COL_C" From Dual UNION ALL
Select 2 "COL_A", 201233 "COL_B", 3000 "COL_C" From Dual UNION ALL
Select 1 "COL_A", 200032 "COL_B", 1000 "COL_C" From Dual
)
SELECT
COL_A, COL_B, COL_C
FROM
( Select COL_A, COL_B, COL_C,
Count(COL_B) OVER(Partition By COL_A, COL_C) "COUNT_A_C"
From tbl )
WHERE COUNT_A_C > 1
Result:
COL_A COL_B COL_C
---------- ---------- ----------
1 201453 1000
1 200032 1000
This uses the analytic function Count() Over().
... Or with the aggregate function Count(), Group By and Having; the result is the same as above:
SELECT t.COL_A, t.COL_B, t.COL_C
FROM tbl t
INNER JOIN
( Select COL_A, COL_C, Count(COL_B) "COUNT_A_C"
From tbl
Group By COL_A, COL_C
Having Count(COL_B) > 1) c ON (c.COL_A = t.COL_A And c.COL_C = t.COL_C)
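Both variants do the same thing: count how often each (COL_A, COL_C) combination occurs and keep the rows whose combination occurs more than once. A minimal Python sketch of that logic, with tuples standing in for the sample rows (the variable names are illustrative only):

```python
from collections import Counter

# (COL_A, COL_B, COL_C), same values as the sample data above.
rows = [
    (1, 201453, 1000),
    (2, 201232, 1000),
    (3, 213231, 2000),
    (2, 201233, 3000),
    (1, 200032, 1000),
]

# Equivalent of Count(COL_B) Over(Partition By COL_A, COL_C).
pair_counts = Counter((a, c) for a, _, c in rows)

# Equivalent of WHERE COUNT_A_C > 1.
repeated = [row for row in rows if pair_counts[(row[0], row[2])] > 1]

for row in repeated:
    print(row)
```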

Oracle SQL - How to Filter Out Duplicate Row Based on 1 Column

I've been trying to get only the unique values for one column. For example, I have a table called "TBL":
COL_A COL_B COL_C
1 "HAT" "RED"
2 "HAT" "BLUE"
3 "SHIRT" "BLUE"
3 "SHOES" "GREEN"
I want to filter out all the rows whose COL_A value is duplicated, so the result would look like this: the 2 rows that share the ID 3 are removed, while all the columns are kept and only the duplicates in that one column are filtered out.
COL_A COL_B COL_C
1 "HAT" "RED"
2 "HAT" "BLUE"
I've tried multiple ways. First I used DISTINCT, but a row with the duplicated ID is still left; I also tried GROUP BY, and it also comes back with one row for that ID. If I can build a table that holds only the unique values of COL_A, I can join it with my other table to get all the other columns.
The statements I've tried were
SELECT DISTINCT COL_A FROM TBL
SELECT COL_A FROM TBL GROUP BY COL_A
I would have thought one of these statements would give me my result, but instead each returns 3 even though ID 3 has 2 rows. If I could get it to return only 1 and 2, I could join that with another table to get all the other data for the unique IDs.
COL_A
1
2
3
Any suggestions?
You can use an IN clause with a subselect:
SELECT "COL_A", "COL_B", "COL_C" FROM tabl1
WHERE "COL_B" IN (
SELECT
"COL_B"
FROM tabl1
GROUP BY "COL_B"
HAVING COUNT("COL_B") > 1
AND COUNT(DISTINCT "COL_C") > 1)
ORDER BY "COL_A"
COL_A | COL_B | COL_C
----: | :---- | :----
1 | HAT | RED
2 | HAT | BLUE
You can use NOT IN to find the ones that are not repeated. For example:
select *
from t
where col_a not in (
select col_a from t group by col_a having count(*) > 1
)
Result:
COL_A COL_B COL_C
------ ------ -----
1 HAT RED
2 HAT BLUE
The WITH clause just generates the sample data and, as such, is not part of the answer.
You can get what you want using analytic function (Count() Over()) as in SQL below...
WITH
tbl AS
(
Select 1 "COL_A", 'HAT' "COL_B", 'RED' "COL_C" From Dual Union All
Select 2 "COL_A", 'HAT' "COL_B", 'BLUE' "COL_C" From Dual Union All
Select 3 "COL_A", 'SHIRT' "COL_B", 'BLUE' "COL_C" From Dual Union All
Select 3 "COL_A", 'SHOES' "COL_B", 'GREEN' "COL_C" From Dual
)
-- ------------------------------------------------------------------------------
SELECT COL_A, COL_B, COL_C
FROM ( Select COL_A, COL_B, COL_C, Count(*) OVER(Partition By COL_A Order By COL_A) "ITERATIONS" From tbl )
WHERE ITERATIONS = 1
More about analytic functions and how to use them: https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions004.htm#SQLRF06174 (older documentation, but still good).
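The NOT IN and Count() Over() answers share one idea: count the rows per COL_A and keep only the rows whose count is 1. A minimal Python sketch of that logic, using the sample rows as tuples (the variable names are illustrative only):

```python
from collections import Counter

# (COL_A, COL_B, COL_C), same values as the sample data.
rows = [
    (1, "HAT", "RED"),
    (2, "HAT", "BLUE"),
    (3, "SHIRT", "BLUE"),
    (3, "SHOES", "GREEN"),
]

# Equivalent of Count(*) Over(Partition By COL_A).
occurrences = Counter(row[0] for row in rows)

# Equivalent of WHERE ITERATIONS = 1: drop every row of a repeated COL_A.
unique_rows = [row for row in rows if occurrences[row[0]] == 1]

for row in unique_rows:
    print(row)
```

This differs from DISTINCT and GROUP BY, which collapse the repeated ID to one row instead of removing it.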

Duplicate handling using case statement

I have a table tab1. Case 1: if there are no duplicates in col1, display the row as is. Case 2: if col1 has duplicates, take the row with the max sr_no, considering only rows where data = 'XYZ'; the others should be ignored.
Tab1 structure (not exact): Col1, Sr, Data
Could you please help me with the query? I tried a CASE expression but did not get the desired output.
For example:
Col1 | Sr | Data
1234 | 1 | ABC
1234 | 2 | MNO
1234 | 3 | XYZ
1234 | 4 | ABC
2345 | 1 | ABC
Output:
Col1 | Sr | Data
1234 | 3 | XYZ (duplicated, so select the max Sr among rows with Data = XYZ)
2345 | 1 | ABC (unique, so no check for max Sr or Data = XYZ)
I think you want row_number() with a priority for XYZ:
select t.*
from (select t.*,
row_number() over (partition by col1 order by (case when data = 'XYZ' then 1 else 2 end), sr desc) as seqnum
from t
) t
where seqnum = 1;
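The ordering inside row_number() does the work here: rows with XYZ sort first, and within the same priority the highest Sr sorts first, so the first row of each Col1 group is the one to keep. A minimal Python sketch of that ordering follows; the helper name pick_rows is hypothetical and the tuples mirror the sample data.

```python
from itertools import groupby

# (Col1, Sr, Data), same values as the sample data.
rows = [
    (1234, 1, "ABC"),
    (1234, 2, "MNO"),
    (1234, 3, "XYZ"),
    (1234, 4, "ABC"),
    (2345, 1, "ABC"),
]


def pick_rows(rows):
    """Return one row per Col1: XYZ rows first, then highest Sr."""
    # False sorts before True, so Data == 'XYZ' gets priority; -Sr gives "sr desc".
    ordered = sorted(rows, key=lambda r: (r[0], r[2] != "XYZ", -r[1]))
    # The first row of each Col1 group corresponds to seqnum = 1.
    return [next(group) for _, group in groupby(ordered, key=lambda r: r[0])]


picked = pick_rows(rows)
for row in picked:
    print(row)
```

Note that with this ordering a duplicated Col1 that has no XYZ row still returns its highest-Sr row; whether that is wanted is not stated in the question.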
Your logic appears to be:
SELECT Col1, Sr, Data
FROM (
SELECT t.*,
CASE max_cnt
WHEN 1
THEN 1
ELSE ROW_NUMBER() OVER ( PARTITION BY Col1 ORDER BY Sr DESC )
END AS rn
FROM (
SELECT t.*,
MAX( cnt ) OVER ( PARTITION BY Col1 ) AS max_cnt
FROM (
SELECT t.*,
COUNT(*) OVER ( PARTITION BY Col1, Data ) AS cnt
FROM table_name t
) t
) t
WHERE max_cnt = 1
OR data = 'XYZ'
)
WHERE rn = 1;
Which, for the sample data:
CREATE TABLE table_name ( Col1, Sr, Data ) AS
SELECT 1234, 1, 'ABC' FROM DUAL UNION ALL
SELECT 1234, 2, 'MNO' FROM DUAL UNION ALL
SELECT 1234, 3, 'XYZ' FROM DUAL UNION ALL
SELECT 1234, 4, 'ABC' FROM DUAL UNION ALL
SELECT 2345, 1, 'ABC' FROM DUAL;
Outputs:
COL1 | SR | DATA
1234 | 3 | XYZ
2345 | 1 | ABC

How to group duplicate key field and create new column for the values

I'm currently building a report using iReport 3.0.0 with Java.
I need to query table A, which has ID and Name fields, joined with table B, which has a Value field.
One ID can have more than one value.
I have a table as below:
ID Name Value
==== ==== ====
A ABC 1
A ABC 2
B BCD 1
C CDE 1
I have to read the data as:
ID Name Value
==== ==== ====
A ABC 1,2
B BCD 1
C CDE 1
or
ID Name Value1 Value2
==== ==== ==== ====
A ABC 1 2
B BCD 1
C CDE 1
Is it possible to do this in SQL alone, without a program?
Thanks in advance for your help !
You can use listagg function - it concatenates the values.
with
A as (
select 'A' id, 'ABC' name from dual union all
select 'B' id, 'BCD' name from dual union all
select 'C' id, 'CDE' name from dual
),
B as (
select 'A' id, '1' value from dual union all
select 'A' id, '2' value from dual union all
select 'B' id, '1' value from dual union all
select 'C' id, '1' value from dual
)
select A.id, A.name,
listagg(B.value,',') within group (order by B.value) list_of_values
from A join B on A.id = B.id
group by A.id, A.name
Note that the LISTAGG result is limited in length (4000 bytes by default).
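To see what the join plus listagg computes, here is a minimal Python sketch over the same sample rows. The variable names are illustrative only, and the sketch ignores the length limit mentioned above.

```python
from collections import defaultdict

# Table A: (id, name); table B: (id, value). Same values as the CTEs above.
a_rows = [("A", "ABC"), ("B", "BCD"), ("C", "CDE")]
b_rows = [("A", "1"), ("A", "2"), ("B", "1"), ("C", "1")]

# Collect the values of B per id.
values_by_id = defaultdict(list)
for id_, value in b_rows:
    values_by_id[id_].append(value)

# Inner join on id, then concatenate the sorted values with commas,
# as listagg(B.value, ',') within group (order by B.value) does.
report = [
    (id_, name, ",".join(sorted(values_by_id[id_])))
    for id_, name in a_rows
    if id_ in values_by_id
]

for line in report:
    print(line)
```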
Try the following query (SQL Server syntax; STUFF and FOR XML PATH are not available in Oracle):
SELECT Distinct id,Name,Value = STUFF(
(
select ',' + cast(row_number() over(partition by A.id,A.name order by value)as varchar(100))
from YOURTABLENAME A
where A.id = B.id FOR XML PATH('')), 1, 1, ''
)
From
(
select row_number() over(partition by id,name order by value) as r,id,name
from YOURTABLENAME
)B

Merge Columns in Oracle with distinct values

Need help to merge columns in Oracle with distinct values.
I've one table called TEST with below data.
ID ID1 ID2 ID3
1 A B C
1 B P A
2 X Y Z
2 Y Z K
Need output as follows
ID MergedValues
1 A;B;C;P
2 X;Y;Z;K
This solution is close:
SELECT id, listagg(v, ';') WITHIN GROUP (ORDER BY v) AS MergedValues
FROM (
SELECT id, id1 AS v
FROM test
UNION
SELECT id, id2 AS v
FROM test
UNION
SELECT id, id3 AS v
FROM test
) t
GROUP BY id
It does not retain the order of encounter of MergedValues as you seem to have requested implicitly, but produces this:
| ID | MERGEDVALUES |
|----|--------------|
| 1 | A;B;C;P |
| 2 | K;X;Y;Z |
You can unpivot the columns into rows, and find the distinct values to remove duplicates:
select distinct id, val
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3));
And then apply listagg() to that:
select id,
listagg(val, ';') within group (order by val) as mergedvalues
from (
select distinct id, val
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3))
)
group by id
order by id;
With your sample data as a CTE:
with test (ID, ID1, ID2, ID3) as (
select 1, 'A', 'B', 'C' from dual
union all select 1, 'B', 'P', 'A' from dual
union all select 2, 'X', 'Y', 'Z' from dual
union all select 2, 'Y', 'Z', 'K' from dual
)
select id,
listagg(val, ';') within group (order by val) as mergedvalues
from (
select distinct id, val
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3))
)
group by id
order by id;
ID MERGEDVALUES
---------- ------------------------------
1 A;B;C;P
2 K;X;Y;Z
If the order within the list needs to match what you showed then it seems almost to be based on the first column the value was seen in, so you can do:
select id,
listagg(val, ';') within group (order by min_pos) as mergedvalues
from (
select id, val, min(pos) as min_pos
from test
unpivot (val for pos in (id1 as 1, id2 as 2, id3 as 3))
group by id, val
)
group by id
order by id;
ID MERGEDVALUES
---------- ------------------------------
1 A;B;P;C
2 X;Y;Z;K
which is closer but has C and P reversed; it isn't clear what should control that. Perhaps there is another column you haven't shown which implies a row order.
Here's my approach:
(Note: after posting, I see this resembles Alex Poole's approach, except that I order the input rows first.)
1) Order the input rows within each ID: you don't say how, so I order by ID1, ID2, ID3.
2) Unpivot the data, assigning numbers from 1 to 3 to the columns.
3) Assign priorities to each value based on row order, then column order.
4) When a value appears more than once, keep only the minimum "priority".
5) Use LISTAGG, ordering by priority.
with data_with_rn as (
select t.*,
row_number() over(partition by id order by ID1,ID2,ID3) rn
from t
)
, unpivoted as (
select id, val,
row_number() over(partition by id order by rn, col) priority
from data_with_rn
unpivot(val for col in(ID1 as 1, ID2 as 2, ID3 as 3))
)
, grouped as (
select id, val, min(priority) priority
from unpivoted
group by id, val
)
select id, listagg(val, ';') within group(order by priority) vals
from grouped
group by id
order by id;
ID VALS
-- --------
1 A;B;C;P
2 X;Y;Z;K
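The priority approach amounts to "keep each value at the position where it is first encountered, reading row by row and column by column". A minimal Python sketch of that reading order follows; the function name merge_values is hypothetical, and sorting the rows by ID, ID1, ID2, ID3 is the same assumption the answer makes about row order.

```python
# (ID, ID1, ID2, ID3), same values as the TEST table.
rows = [
    (1, "A", "B", "C"),
    (1, "B", "P", "A"),
    (2, "X", "Y", "Z"),
    (2, "Y", "Z", "K"),
]


def merge_values(rows):
    """Merge ID1..ID3 per ID, keeping each value at its first encounter."""
    merged = {}
    # Sorting tuples orders by ID, then ID1, ID2, ID3 (the assumed row order).
    for id_, *vals in sorted(rows):
        seen = merged.setdefault(id_, [])
        for v in vals:  # column order within the row
            if v not in seen:  # a repeated value keeps its earliest position
                seen.append(v)
    return {id_: ";".join(vals) for id_, vals in merged.items()}


merged = merge_values(rows)
print(merged)
```

If the real table has a column that defines row order, sorting by that column instead would be the natural substitute.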