Remove duplicate (combination of 2 columns) values in a row - sql

I have a requirement to remove duplicate column-pair values within a row, like this:
C1 | C2 | C3 | C4 | C5 | C6
----------------------------
1 | 2 | 1 | 2 | 1 | 3
1 | 2 | 1 | 3 | 1 | 4
1 |NULL| 1 |NULL| 1 |NULL
OUTPUT of the query should be:
C1 | C2 | C3 | C4 | C5 | C6
----------------------------
1 | 2 | 1 | 3 |NULL|NULL
1 | 2 | 1 | 3 | 1 | 4
1 |NULL|NULL|NULL|NULL|NULL
As you can see, each combination of 2 columns should be unique within a row.
In row 1: the combination 1/2 appears twice, so the duplicate is removed and the 1/3 pair in C5/C6 is moved to C3/C4.
In row 2: there are no duplicates among the combinations 1/2, 1/3 and 1/4, so the row is unchanged.
In row 3: all three combinations are the same (1/NULL), so C3 through C6 are set to NULL.
Thanks in advance.

Maybe there is a more clever way, but you could convert the columns to pairs, de-duplicate them (the UNION does that here), then pivot back.
with pairs as (
  -- collapse the three column pairs into rows; UNION also removes duplicate pairs
  select id, c1 as x, c2 as y from mytable
  union
  select id, c3, c4 from mytable
  union
  select id, c5, c6 from mytable
)
select id,
       max(decode(rn, 1, x)) c1,
       max(decode(rn, 1, y)) c2,
       max(decode(rn, 2, x)) c3,
       max(decode(rn, 2, y)) c4,
       max(decode(rn, 3, x)) c5,
       max(decode(rn, 3, y)) c6
from (
  -- number the surviving pairs per row (the analytic clause needs an ORDER BY)
  select id, x, y, row_number() over (partition by id order by x, y) rn
  from pairs
) foo
group by id
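Note that this assumes mytable already has an id column identifying each row. If it does not, a minimal sketch of one workaround (Oracle is assumed here, since DECODE is used) is to expose ROWID under that name and reference the wrapped query in the pairs CTE instead of mytable:
-- Sketch: synthesize an id when mytable has none; ROWID is stable per row.
with mytable_with_id as (
  select t.rowid as id, t.c1, t.c2, t.c3, t.c4, t.c5, t.c6
  from mytable t
)
select *
from mytable_with_id
-- ...then use mytable_with_id in place of mytable in the pairs CTE above.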

This one works (test data is included), though it might take some time to understand.
A tip: un-comment the code snippets under the -- debug lines, copy the script up to and including that snippet, and paste that part into an SQL prompt to inspect the intermediate results (see the standalone sketch after this answer's output).
The principle: get a row identifier to "remember" the rows; pivot vertically, not 3 columns into one but 6 columns into 3 pairs of columns; use DISTINCT to de-dupe; assign an index per row identifier to the de-duped intermediate rows; then use that index to pivot horizontally again.
Like so:
WITH
input(c1,c2,c3,c4,c5,c6) AS (
SELECT 1, 2,1, 2,1, 3
UNION ALL SELECT 1, 2,1, 3,1, 4
UNION ALL SELECT 1,NULL::INT,1,NULL::INT,1,NULL::INT
)
,
-- need rowid
input_with_rowid AS (
SELECT ROW_NUMBER() OVER() AS rowid, * FROM input
)
,
-- three groups of 2 columns, so pivot using 3 indexes
idx3(idx) AS (SELECT 1 UNION SELECT 2 UNION SELECT 3)
,
-- pivot vertically, two columns at a time and de-dupe
pivot_pair AS (
SELECT DISTINCT
rowid
, CASE idx
WHEN 1 THEN c1
WHEN 2 THEN c3
WHEN 3 THEN c5
END AS c1
,
CASE idx
WHEN 1 THEN c2
WHEN 2 THEN c4
WHEN 3 THEN c6
END AS c2
FROM input_with_rowid CROSS JOIN idx3
)
-- debug
-- SELECT * FROM pivot_pair ORDER BY rowid;
,
-- add sequence per rowid
pivot_pair_with_seq AS (
SELECT
rowid
, ROW_NUMBER() OVER(PARTITION BY rowid) AS seq
, c1
, c2
FROM pivot_pair
)
-- debug
-- SELECT * FROM pivot_pair_with_seq;
SELECT
rowid
, MAX(CASE seq WHEN 1 THEN c1 END) AS c1
, MAX(CASE seq WHEN 1 THEN c2 END) AS c2
, MAX(CASE seq WHEN 2 THEN c1 END) AS c3
, MAX(CASE seq WHEN 2 THEN c2 END) AS c4
, MAX(CASE seq WHEN 3 THEN c1 END) AS c5
, MAX(CASE seq WHEN 3 THEN c2 END) AS c6
FROM pivot_pair_with_seq
GROUP BY rowid
ORDER BY rowid
;
rowid|c1|c2|c3|c4|c5|c6
    1| 1| 2| 1| 3| -| -
    2| 1| 2| 1| 3| 1| 4
    3| 1| -| -| -| -| -
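Following the debugging tip above, the first debug step can also be run on its own; here is a standalone sketch with the same sample data (same dialect as the script above):
-- Sketch: the pivot_pair debug step in isolation, using the sample data.
WITH
input(c1,c2,c3,c4,c5,c6) AS (
          SELECT 1, 2, 1, 2, 1, 3
UNION ALL SELECT 1, 2, 1, 3, 1, 4
UNION ALL SELECT 1, NULL::INT, 1, NULL::INT, 1, NULL::INT
)
,
input_with_rowid AS (
  SELECT ROW_NUMBER() OVER() AS rowid, * FROM input
)
,
idx3(idx) AS (SELECT 1 UNION SELECT 2 UNION SELECT 3)
,
pivot_pair AS (
  SELECT DISTINCT
    rowid
  , CASE idx WHEN 1 THEN c1 WHEN 2 THEN c3 WHEN 3 THEN c5 END AS c1
  , CASE idx WHEN 1 THEN c2 WHEN 2 THEN c4 WHEN 3 THEN c6 END AS c2
  FROM input_with_rowid CROSS JOIN idx3
)
SELECT * FROM pivot_pair ORDER BY rowid;
-- expected: one de-duplicated (c1, c2) pair per output row, keyed by rowid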

Using marcothesane's idea with the PIVOT/UNPIVOT operators. It is easier to maintain if more input columns need to be deduplicated, and it preserves the order of the source column pairs, whereas marcothesane's solution might reorder the pairs depending on the input data. It is also a little slower than marcothesane's, and it works only in Oracle 11gR1 and up.
WITH
input(c1,c2,c3,c4,c5,c6) AS (
SELECT 1, 2,1, 2,1, 3 from dual
UNION ALL SELECT 1, 2,1, 3,1, 4 from dual
UNION ALL SELECT 1,NULL ,1,NULL ,1,NULL from dual
)
,
-- need rowid
input_with_rowid AS (
SELECT ROW_NUMBER() OVER (order by 1) AS row_id, input.* FROM input
),
unpivoted_pairs AS (
  SELECT row_id, tuple_idx, val1, val2,
         ROW_NUMBER() OVER (PARTITION BY row_id, val1, val2 ORDER BY tuple_idx) AS keep_first
  FROM input_with_rowid
  UNPIVOT INCLUDE NULLS (
    (val1, val2)                      -- measure
    FOR tuple_idx IN ((c1,c2) AS 1,
                      (c3,c4) AS 2,
                      (c5,c6) AS 3)
  )
)
SELECT row_id,
       t1_val1 AS c1,
       t1_val2 AS c2,
       t2_val1 AS c3,
       t2_val2 AS c4,
       t3_val1 AS c5,
       t3_val2 AS c6
FROM (
  SELECT row_id, val1, val2,
         ROW_NUMBER() OVER (PARTITION BY row_id ORDER BY tuple_idx) AS tuple_order
  FROM unpivoted_pairs
  WHERE keep_first = 1
)
PIVOT (SUM(val1) AS val1, SUM(val2) AS val2
       FOR tuple_order IN (1 AS t1, 2 AS t2, 3 AS t3)
)

Related

splitting two columns containing comma separated values in oracle [duplicate]

I have two columns in a table with comma-separated values; how do I split them into rows?
Would this help?
with test (col1, col2) as
  (select 'Little,Foot,is,stupid', 'poor,bastard'         from dual union all
   select 'Green,mile,is,a'      , 'good,film,is,it,not?' from dual
  )
select regexp_substr(col1 ||','|| col2, '[^,]+', 1, column_value) str
from test cross join
     table(cast(multiset(select level from dual
                         connect by level <= regexp_count(col1 ||','|| col2, ',') + 1
                        ) as sys.odcinumberlist));
STR
--------------------------------------------------------------------------------
Little
Foot
is
stupid
poor
bastard
Green
mile
is
a
good
film
is
it
not?
15 rows selected.
Use a recursive sub-query factoring clause and simple string functions:
WITH splits ( id, c1, c2, idx, start_c1, end_c1, start_c2, end_c2 ) AS (
SELECT id,
c1,
c2,
1,
1,
INSTR( c1, ',', 1 ),
1,
INSTR( c2, ',', 1 )
FROM test_data
UNION ALL
SELECT id,
c1,
c2,
idx + 1,
CASE end_c1 WHEN 0 THEN NULL ELSE end_c1 + 1 END,
CASE end_c1 WHEN 0 THEN NULL ELSE INSTR( c1, ',', end_c1 + 1 ) END,
CASE end_c2 WHEN 0 THEN NULL ELSE end_c2 + 1 END,
CASE end_c2 WHEN 0 THEN NULL ELSE INSTR( c2, ',', end_c2 + 1 ) END
FROM splits
WHERE end_c1 > 0
OR end_c2 > 0
)
SELECT id,
idx,
CASE end_c1
WHEN 0
THEN SUBSTR( c1, start_c1 )
ELSE SUBSTR( c1, start_c1, end_c1 - start_c1 )
END AS c1,
CASE end_c2
WHEN 0
THEN SUBSTR( c2, start_c2 )
ELSE SUBSTR( c2, start_c2, end_c2 - start_c2 )
END AS c2
FROM splits s
ORDER BY id, idx;
So for the test data:
CREATE TABLE test_data ( id, c1, c2 ) AS
SELECT 1, 'a,b,c,d', 'e,f,g' FROM DUAL UNION ALL
SELECT 2, 'h', 'i' FROM DUAL UNION ALL
SELECT 3, NULL, 'j,k,l,m,n' FROM DUAL;
This outputs:
ID | IDX | C1 | C2
-: | --: | :--- | :---
1 | 1 | a | e
1 | 2 | b | f
1 | 3 | c | g
1 | 4 | d | null
2 | 1 | h | i
3 | 1 | null | j
3 | 2 | null | k
3 | 3 | null | l
3 | 4 | null | m
3 | 5 | null | n

BigQuery SQL - a way to pass values from more than one row and more than one column to User Defined Function

I want to create a User Defined Function (CREATE TEMPORARY FUNCTION) in BigQuery Standard SQL which will accept values aggregated from a bunch of rows.
My schema and table are similar to this:
| c1 | c2 | c3 | c4 |
|=======|=======|=======|=======|
| 1 | 1-1 | 3A | 4A |
| 1 | 1-1 | 3B | 4B |
| 1 | 1-1 | 3C | 4C |
| 1 | 1-2 | 3D | 4D |
| 2 | 2-1 | 3E | 4E |
| 2 | 2-1 | 3F | 4F |
| 2 | 2-2 | 3G | 4G |
| 2 | 2-2 | 3H | 4H |
I can't change the original schema to be made of nested or ARRAY fields.
I want to group by c1 and c2 and pass the values of c3 and c4 to a function, while still being able to match the c3 and c4 values that come from the same row.
One way of doing so is to use ARRAY_AGG and pass the values as arrays, but ARRAY_AGG is non-deterministic, so the c3 and c4 values might arrive in a different order than in the source table.
Example:
CREATE TEMPORARY FUNCTION
tempConcatStrFunction(c3 ARRAY<STRING>, c4 ARRAY<STRING>)
RETURNS STRING
LANGUAGE js AS """
return c3
  .map((item, index) => [item, c4[index]].join(','))
  .join(',');
""";
WITH T AS (
  SELECT c1, c2, ARRAY_AGG(c3) AS c3, ARRAY_AGG(c4) AS c4
  FROM yourTable   -- placeholder for the source table shown above
  GROUP BY c1, c2
)
SELECT c1, c2, tempConcatStrFunction(c3, c4) AS str FROM T
The result should be:
| c1 | c2 | str |
|=======|=======|======================|
| 1 | 1-1 | 3A,4A,3B,4B,3C,4C |
| 1 | 1-2 | 3D,4D |
| 2 | 2-1 | 3E,4E,3F,4F |
| 2 | 2-2 | 3G,4G,3H,4H |
Any ideas how to achieve such results?
I understand your question to be about how to keep c3 and c4 matched to each other in the final string. How about keeping it super simple, as below:
SELECT c1, c2, STRING_AGG(CONCAT(c3, ',', c4)) AS str
FROM yourTable
GROUP BY c1, c2
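If the order of the pairs within each group matters, note that BigQuery's STRING_AGG accepts an ORDER BY; a minimal sketch of the same query with an explicit order (yourTable is the same placeholder as above):
-- Sketch: the same aggregation with an explicit per-group order.
SELECT c1, c2, STRING_AGG(CONCAT(c3, ',', c4) ORDER BY c3, c4) AS str
FROM yourTable
GROUP BY c1, c2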
A couple of examples that may help with setting up a query:
WITH T AS (
SELECT 1 AS c1, '1-1' AS c2, '3A' AS c3, '4A' AS c4 UNION ALL
SELECT 1, '1-1', '3B', '4B' UNION ALL
SELECT 1, '1-1', '3C', '4C' UNION ALL
SELECT 1, '1-2', '3D', '4D' UNION ALL
SELECT 2, '2-1', '3E', '4E' UNION ALL
SELECT 2, '2-1', '3F', '4F' UNION ALL
SELECT 2, '2-2', '3G', '4G' UNION ALL
SELECT 2, '2-2', '3H', '4H'
)
SELECT
c1,
c2,
STRING_AGG(CONCAT(c3, ',', c4)) AS str
FROM T
GROUP BY 1, 2;
This takes the unaggregated inputs (as in Mikhail's answer) and does string concatenation.
If the inputs are already aggregated into arrays, ideally they would repeat together, e.g.:
WITH T AS (
SELECT 1 AS c1, '1-1' AS c2, '3A' AS c3, '4A' AS c4 UNION ALL
SELECT 1, '1-1', '3B', '4B' UNION ALL
SELECT 1, '1-1', '3C', '4C' UNION ALL
SELECT 1, '1-2', '3D', '4D' UNION ALL
SELECT 2, '2-1', '3E', '4E' UNION ALL
SELECT 2, '2-1', '3F', '4F' UNION ALL
SELECT 2, '2-2', '3G', '4G' UNION ALL
SELECT 2, '2-2', '3H', '4H'
),
U AS (
SELECT
c1,
c2,
ARRAY_AGG(STRUCT(c3, c4)) AS arr
FROM T
GROUP BY 1, 2
)
SELECT
c1,
c2,
(SELECT STRING_AGG(CONCAT(c3, ',', c4)) FROM UNNEST(arr)) AS str
FROM U;
If the arrays are separate, but have a consistent order (and length), you can recombine them after the fact:
WITH T AS (
SELECT 1 AS c1, '1-1' AS c2, '3A' AS c3, '4A' AS c4 UNION ALL
SELECT 1, '1-1', '3B', '4B' UNION ALL
SELECT 1, '1-1', '3C', '4C' UNION ALL
SELECT 1, '1-2', '3D', '4D' UNION ALL
SELECT 2, '2-1', '3E', '4E' UNION ALL
SELECT 2, '2-1', '3F', '4F' UNION ALL
SELECT 2, '2-2', '3G', '4G' UNION ALL
SELECT 2, '2-2', '3H', '4H'
),
U AS (
SELECT
c1,
c2,
ARRAY_AGG(c3 ORDER BY c3, c4) AS arr3,
ARRAY_AGG(c4 ORDER BY c3, c4) AS arr4
FROM T
GROUP BY 1, 2
)
SELECT
c1,
c2,
(SELECT STRING_AGG(CONCAT(c3, ',', arr4[OFFSET(off)]))
FROM UNNEST(arr3) AS c3 WITH OFFSET off) AS str
FROM U;

tSQL UNPIVOT of comma concatenated column into multiple rows

I have a table that has a value column. The value could be one value or it could be multiple values separated with a comma:
id | assess_id | question_key | item_value
---+-----------+--------------+-----------
1 | 859 | Cust_A_1 | 1,5
2 | 859 | Cust_B_1 | 2
I need to unpivot the data based on the item_value to look like this:
id | assess_id | question_key | item_value
---+-----------+--------------+-----------
1 | 859 | Cust_A_1 | 1
1 | 859 | Cust_A_1 | 5
2 | 859 | Cust_B_1 | 2
How does one do that in tSQL on SQL Server 2012?
We have a user defined function that we use for stuff like this that we called "split_delimiter":
CREATE FUNCTION [dbo].[split_delimiter](@delimited_string VARCHAR(8000), @delimiter_type CHAR(1))
RETURNS TABLE AS
RETURN
WITH cte10(num) AS
(
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)
,cte100(num) AS
(
SELECT 1
FROM cte10 t1, cte10 t2
)
,cte10000(num) AS
(
SELECT 1
FROM cte100 t1, cte100 t2
)
,cte1(num) AS
(
SELECT TOP (ISNULL(DATALENGTH(@delimited_string),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM cte10000
)
,cte2(num) AS
(
SELECT 1
UNION ALL
SELECT t.num+1
FROM cte1 t
WHERE SUBSTRING(@delimited_string,t.num,1) = @delimiter_type
)
,cte3(num,[len]) AS
(
SELECT t.num
,ISNULL(NULLIF(CHARINDEX(@delimiter_type,@delimited_string,t.num),0)-t.num,8000)
FROM cte2 t
)
SELECT delimited_item_num = ROW_NUMBER() OVER(ORDER BY t.num)
,delimited_value = SUBSTRING(@delimited_string, t.num, t.[len])
FROM cte3 t;
GO
It will take a varchar value up to 8000 characters and will return a table with the delimited elements broken into rows. In your example, you'll want to use an outer apply to turn those delimited values into separate rows:
SELECT my_table.id, my_table.assess_id, my_table.question_key, delimited_items.item_value
FROM my_table
OUTER APPLY(
SELECT delimited_value AS item_value
FROM my_database.dbo.split_delimiter(my_table.item_value, ',')
) AS delimited_items
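To sanity-check the splitter on its own, it can also be called directly with a literal; a minimal sketch:
-- Sketch: verify the function's output for a single delimited string.
SELECT delimited_item_num, delimited_value
FROM dbo.split_delimiter('1,5', ',');
-- expected rows: (1, '1') and (2, '5')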

Select first row in each GROUP BY group

I have a requirement in my project where I have this data:
C1 | C2 | C3 | C4
A | B | 2 | X
A | B | 3 | Y
C | D | 4 | Q
C | D | 1 | P
where C1, C2, C3 and C4 are column names in the database,
and I need to show the data like this:
C1 | C2 | C3 | C4
A | B | 5 | X
C | D | 5 | Q
The answer to this is fairly simple. Just follow my solution below:
--CREATE THE SAMPLE TABLE
CREATE TABLE TABLE1 (C1 char(1) NULL, C2 char(1) NULL, C3 int NULL, C4 char(1) NULL);
GO
--INSERT THE SAMPLE VALUES
INSERT INTO TABLE1 VALUES ('A', 'B', 2, 'X'), ('A', 'B', 3, 'Y'), ('C', 'D', 4, 'Q'), ('C','D', 1, 'P');
GO
--SELECT SUM(C3) AND GROUP BY ONLY C1 AND C2, THEN SELECT TOP 1 ONLY FROM C4
SELECT
C1,
C2,
SUM(C3) AS C3,
(SELECT TOP(1) C4 FROM TABLE1 AS B WHERE A.C1 = B.C1 AND A.C2 = B.C2) AS C4
FROM
TABLE1 AS A
GROUP BY
C1,
C2;
GO
--CLEAN UP THE DATABASE, DROP THE SAMPLE TABLE
IF EXISTS(SELECT name FROM sys.tables WHERE object_id = OBJECT_ID(N'TABLE1')) DROP TABLE TABLE1;
GO
Let me know if this helps.
Assuming you mean the first record ordered by c4 (grouped by c1 and c2), then this will work establishing a row_number and using max with case:
with cte as (
select *,
row_number() over (partition by c1, c2 order by c4) rn
from yourtable
)
select c1, c2, sum(c3), max(case when rn = 1 then c4 end) c4
from cte
group by c1, c2
However, if you don't want to order by c4, then you need some other column to ensure the correct order of the results. Without an order by clause, there's no guarantee on how they are returned.
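For example, if the table has an ID (or date) column reflecting insert order (an assumption here), the same pattern simply orders by that column instead:
-- Sketch: same row_number pattern, but "first" means lowest ID.
with cte as (
  select *,
         row_number() over (partition by c1, c2 order by id) rn
  from yourtable
)
select c1, c2, sum(c3) as c3, max(case when rn = 1 then c4 end) as c4
from cte
group by c1, c2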
I assume you chose 'X' and 'Q' because those rows were inserted first within each C1, C2 group.
I would suggest adding an identity column to your table and working based on it, as shown below.
Table:
DECLARE @DB TABLE (ID INT IDENTITY(1,1),C1 VARCHAR(10),C2 VARCHAR(10),C3 INT,C4 VARCHAR(10))
INSERT INTO @DB VALUES
('A','B',2,'X'),
('A','B',3,'Y'),
('C','D',4,'Q'),
('C','D',1,'P')
Code:
SELECT A.*,B.C4
FROM (
SELECT C1,C2,SUM(C3) C3 FROM @DB
GROUP BY C1,C2) A
JOIN
(
SELECT C1,C2,C4 FROM (
SELECT *,ROW_NUMBER() OVER (PARTITION BY C1,C2 ORDER BY ID) [ROW]
FROM @DB) LU WHERE LU.ROW = 1) B
ON A.C1 = B.C1 AND A.C2 = B.C2
Result:
C1 | C2 | C3 | C4
A  | B  | 5  | X
C  | D  | 5  | Q

Oracle grouping/changing rows to columns

I have the following table named foo:
ID | KEY | VAL
----------------
1 | 47 | 97
2 | 47 | 98
3 | 47 | 99
4 | 48 | 100
5 | 48 | 101
6 | 49 | 102
I want to run a select query and have the results show like this
UNIQUE_ID | KEY | ID1 | VAL1 | ID2 | VAL2 | ID3 | VAL3
--------------------------------------------------------------
47_1:97_2:98_3:99| 47 | 1 | 97 | 2 | 98 | 3 | 99
48_4:100_5:101 | 48 | 4 | 100 | 5 | 101 | |
49_6:102 | 49 | 6 | 102 | | | |
So, basically all rows with the same KEY get collapsed into 1 row. There can be anywhere from 1-3 rows per KEY value
Is there a way to do this in a sql query (without writing a stored procedure or scripts)?
If not, I could also work with the less desirable choice of
UNIQUE_ID | KEY | IDS | VALS
--------------------------------------------------------------
47_1:97_2:98_3:99| 47 | 1,2,3 | 97,98,99
48_4:100_5:101 | 48 | 4,5 | 100, 101
49_6:102 | 49 | 6 | 102
Thanks!
UPDATE:
Unfortunately my real-world problem seems to be much more difficult than this example, and I'm having trouble getting either example to work :( My query is over 120 lines so it's not very easy to post. It kind of looks like:
with v_table as (select ...),
v_table2 as (select foo from v_table where...),
v_table3 as (select foo from v_table where ...),
...
v_table23 as (select foo from v_table where ...)
select distinct (...) as "UniqueID", myKey, myVal, otherCol1, ..., otherCol18
from tbl1 inner join tbl2 on...
...
inner join tbl15 on ...
If I try any of the methods below it seems that I cannot do group-bys correctly because of all the other data being returned.
Ex:
with v_table as (select ...),
v_table2 as (select foo from v_table where...),
v_table3 as (select foo from v_table where ...),
...
v_table23 as (select foo from v_table where ...)
select "Unique ID",
myKey, max(decode(id_col,1,id_col)) as id_1, max(decode(id_col,1,myVal)) as val_1,
max(decode(id_col,2,id_col)) as id_2,max(decode(id_col,2,myVal)) as val_2,
max(decode(id_col,3,id_col)) as id_3,max(decode(id_col,3,myVal)) as val_3
from (
select distinct (...) as "UniqueID", myKey, row_number() over (partition by myKey order by id) as id_col, id, myVal, otherCol1, ..., otherCol18
from tbl1 inner join tbl2 on...
...
inner join tbl15 on ...
) group by myKey;
Gives me the error: ORA-00979: not a GROUP BY expression
This is because I am selecting the UniqueID from the inner select. I will need to do this as well as select other columns from the inner table.
Any help would be appreciated!
Take a look at this article about the LISTAGG function; it will help you get the comma-separated results. It is available only in 11g and later.
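A minimal LISTAGG sketch for the columns in the question (just the comma-separated lists, without the UNIQUE_ID column):
-- Sketch: comma-separated IDs and VALs per KEY (Oracle 11g and later).
select key,
       listagg(id, ',')  within group (order by id) as ids,
       listagg(val, ',') within group (order by id) as vals
from foo
group by key;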
You may try this
select key,
       max(decode(id_col,1,id)) as id_1, max(decode(id_col,1,val)) as val_1,
       max(decode(id_col,2,id)) as id_2, max(decode(id_col,2,val)) as val_2,
       max(decode(id_col,3,id)) as id_3, max(decode(id_col,3,val)) as val_3
from (
  select key, row_number() over (partition by key order by id) as id_col, id, val
  from your_table
)
group by key
As #O.D. suggests, you can generate the less desirable version with LISTAGG, for example (using a CTE to generate your sample data):
with foo as (
select 1 as id, 47 as key, 97 as val from dual
union select 2,47,98 from dual
union select 3,47,99 from dual
union select 4,48,100 from dual
union select 5,48,101 from dual
union select 6,49,102 from dual
)
select key ||'_'|| listagg(id ||':' ||val, '_')
within group (order by id) as unique_id,
key,
listagg(id, ',') within group (order by id) as ids,
listagg(val, ',') within group (order by id) as vals
from foo
group by key
order by key;
UNIQUE_ID KEY IDS VALS
----------------- ---- -------------------- --------------------
47_1:97_2:98_3:99 47 1,2,3 97,98,99
48_4:100_5:101 48 4,5 100,101
49_6:102 49 6 102
With a bit more manipulation you can get your preferred results:
with foo as (
select 1 as id, 47 as key, 97 as val from dual
union select 2,47,98 from dual
union select 3,47,99 from dual
union select 4,48,100 from dual
union select 5,48,101 from dual
union select 6,49,102 from dual
)
select unique_id, key,
max(id1) as id1, max(val1) as val1,
max(id2) as id2, max(val2) as val2,
max(id3) as id3, max(val3) as val3
from (
select unique_id,key,
case when r = 1 then id end as id1, case when r = 1 then val end as val1,
case when r = 2 then id end as id2, case when r = 2 then val end as val2,
case when r = 3 then id end as id3, case when r = 3 then val end as val3
from (
select key ||'_'|| listagg(id ||':' ||val, '_')
within group (order by id) over (partition by key) as unique_id,
key, id, val,
row_number() over (partition by key order by id) as r
from foo
)
)
group by unique_id, key
order by key;
UNIQUE_ID KEY ID1 VAL1 ID2 VAL2 ID3 VAL3
----------------- ---- ---- ---- ---- ---- ---- ----
47_1:97_2:98_3:99 47 1 97 2 98 3 99
48_4:100_5:101 48 4 100 5 101
49_6:102 49 6 102
Can't help feeling there ought to be a simpler way though...
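One possibly simpler variant (an untested sketch; it needs 11gR2 for analytic LISTAGG plus the PIVOT clause) lets PIVOT do the CASE/MAX work:
-- Sketch: same result via PIVOT; the implicit grouping is on key and unique_id.
with foo as (
  select 1 as id, 47 as key, 97 as val from dual
  union select 2,47,98 from dual
  union select 3,47,99 from dual
  union select 4,48,100 from dual
  union select 5,48,101 from dual
  union select 6,49,102 from dual
)
select *
from (
  select key, id, val,
         key ||'_'|| listagg(id ||':'|| val, '_')
             within group (order by id) over (partition by key) as unique_id,
         row_number() over (partition by key order by id) as r
  from foo
)
pivot (max(id) as id, max(val) as val
       for r in (1 as p1, 2 as p2, 3 as p3))
order by key;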