Find duplicate symmetric rows in a table - sql

I have a table which contains the following data:
col1 col2
a b
b a
c d
d c
a d
a c
For me, row 1 and row 2 are duplicates because (a, b) and (b, a) are the same. The same goes for row 3 and row 4.
I need an SQL (not PL/SQL) query which gives output as
col1 col2
a b
c d
a d
a c

select distinct least(col1, col2), greatest(col1, col2)
from your_table
Edit: for those using a DBMS that does not support the standard SQL functions least and greatest, this can be simulated using a CASE expression:
select distinct
case
when col1 < col2 then col1
else col2
end as least_col,
case
when col1 > col2 then col1
else col2
end as greatest_col
from your_table
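To check the CASE version against the sample data from the question, here is a minimal sketch that runs it on an in-memory SQLite database from Python. SQLite is only a stand-in for whatever DBMS is actually used, and the table name `pairs` is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for the real DBMS; the table name "pairs" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO pairs VALUES (?, ?)",
    [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "d"), ("a", "c")],
)

# Put the smaller value first and the larger one second, then let DISTINCT
# collapse (a, b) and (b, a) into a single row.
query = """
SELECT DISTINCT
       CASE WHEN col1 < col2 THEN col1 ELSE col2 END AS least_col,
       CASE WHEN col1 > col2 THEN col1 ELSE col2 END AS greatest_col
FROM pairs
ORDER BY least_col, greatest_col
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('a', 'b'), ('a', 'c'), ('a', 'd'), ('c', 'd')]
```

The ORDER BY is only there to make the output deterministic. Rows with NULL in either column are not handled specially in this sketch.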

Try this (note that it only detects a mirrored pair when the two rows are adjacent in the row numbering):
CREATE TABLE t_1(col1 varchar(10),col2 varchar(10))
INSERT INTO t_1
VALUES ('a','b'),
('b','a'),
('c','d'),
('d','c'),
('a','d'),
('a','c')
;with CTE as (select ROW_NUMBER() over (order by (select 0)) as id,col1,col2,col1+col2 as col3 from t_1)
,CTE1 as (
select id,col1,col2,col3 from CTE where id=1
union all
select c.id,c.col1,c.col2,CASE when c.col3=REVERSE(c1.col3) then null else c.col3 end from CTE c inner join CTE1 c1
on c.id-1=c1.id
)
select col1,col2 from CTE1 where col3 is not null

Related

SQL with having statement now want complete rows

Here is a mock table
MYTABLE ROWS
PKEY 1,2,3,4,5,6
COL1 a,b,b,c,d,d
COL2 55,44,33,88,22,33
I want to know which rows have duplicated COL1 values:
select col1, count(*)
from MYTABLE
group by col1
having count(*) > 1
This returns :
b,2
d,2
I now want all the rows that contain b and d. Normally I would use a WHERE ... IN clause, but with the count column I am not sure what type of statement I should use.
Maybe you need:
select * from MYTABLE
where col1 in
(
select col1
from MYTABLE
group by col1
having count(*) > 1
)
Use a CTE and a windowed aggregate:
WITH CTE AS(
SELECT Pkey,
Col1,
Col2,
COUNT(1) OVER (PARTITION BY Col1) AS C
FROM dbo.YourTable)
SELECT PKey,
Col1,
Col2
FROM CTE
WHERE C > 1;
There are lots of ways to solve this; here's another:
select * from MYTABLE
join
(
select col1, count(*) as cnt
from MYTABLE
group by col1
having count(*) > 1
) s on s.col1 = mytable.col1;

Create multiple columns from existing Hive table columns

How can I create multiple columns from an existing Hive table? The example data would be like below.
My requirement is to create 2 new columns from the existing table only when a condition is met:
col1 when code=1, col2 when code=2.
expected output:
Please help in how to achieve it in Hive queries?
If you aggregate values required into arrays, then you can explode and filter only those with matching positions.
Demo:
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
)
select c1.val as col1, c2.val as col2 from
(
select collect_set(case when code=1 then col else null end) as col1,
collect_set(case when code=2 then col else null end) as col2
from my_table where code in (1,2)
)s lateral view outer posexplode(col1) c1 as pos, val
lateral view outer posexplode(col2) c2 as pos, val
where c1.pos=c2.pos
Result:
col1 col2
a b
a1 b1
This approach will not work if arrays are of different size.
Another approach is to calculate row_number and full join on it; this works even if col1 and col2 have a different number of values (some values will be null):
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
),
ordered as
(
select code, col, row_number() over(partition by code order by col) rn
from my_table where code in (1,2)
)
select c1.col as col1, c2.col as col2
from (select * from ordered where code=1) c1
full join
(select * from ordered where code=2) c2 on c1.rn = c2.rn
Result:
col1 col2
a b
a1 b1
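The difference between the two approaches can be modeled outside Hive. The sketch below is plain Python, not Hive code, and the extra value `a2` is hypothetical. Matching on equal positions behaves like `zip`, which drops the unpaired value, while the full join on row number behaves like `zip_longest`, which keeps it and pads with None (NULL).

```python
from itertools import zip_longest

# Hypothetical values per code, with one extra value for code 1.
code1 = ["a", "a1", "a2"]
code2 = ["b", "b1"]

# Matching on equal positions only (like `where c1.pos = c2.pos`)
# drops the value that has no partner at the same position.
by_position = list(zip(code1, code2))

# Pairing on row number with a full join keeps it, padded with None (NULL).
full_join = list(zip_longest(code1, code2))

print(by_position)  # [('a', 'b'), ('a1', 'b1')]
print(full_join)    # [('a', 'b'), ('a1', 'b1'), ('a2', None)]
```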

SQL/Oracle return only field with identical value in 2nd column

I need to return column 1 only if identical values are found in the 2nd column of a repeating log. If any other value is seen, exclude it from the result.
A 2
A 2
A 2
A 2
A 2
Exclude
B 2
B 1
B 2
B 3
B 2
select b.column1
from
( select *
from table
where column2 != 1
) b
where b.column2 = 2
Results:
A
You could use aggregation and HAVING:
SELECT col1
FROM tab
GROUP BY col1
HAVING COUNT(DISTINCT col2) = 1;
or if you need original rows:
SELECT s.*
FROM (SELECT t.*, COUNT(DISTINCT col2) OVER(PARTITION BY col1) AS cnt
FROM tab t) s
WHERE s.cnt = 1;
If you need the original rows, I would recommend not exists:
select t.*
from t
where not exists (select 1 from t t2 where t2.col1 = t.col1 and t2.col2 <> t.col2);
If you just want the col1 values (which makes sense to me), then I would phrase the aggregation as:
select col1
from t
group by col1
having min(col2) = max(col2);
If you want to include "all-null" as a valid option, then:
having min(col2) = max(col2) or min(col2) is null
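To see the all-NULL variant as a complete query, here is a minimal sketch run on an in-memory SQLite database from Python. SQLite stands in for Oracle here, and group C with only NULL values is hypothetical data added for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; group "C" (all NULL) is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 2)] * 5
    + [("B", 2), ("B", 1), ("B", 2), ("B", 3), ("B", 2)]
    + [("C", None), ("C", None)],
)

# Keep a group when its smallest and largest values coincide,
# or when it has no non-NULL value at all (MIN is NULL).
query = """
SELECT col1
FROM t
GROUP BY col1
HAVING MIN(col2) = MAX(col2) OR MIN(col2) IS NULL
ORDER BY col1
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # ['A', 'C']
```

Because MIN and MAX ignore NULLs, a group that mixes NULLs with a single repeated value would also pass this filter.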
Try this query
select column1 from (select column1,column2 from Test group by column1,column2) a group by column1 having count(column1)=1;

sql to group sort and concat two columns into one text field

I am stuck with some sql and need a bit of help...
I have the following source table:
Col1 Col2 Col3
1 A B
1 B C
1 C D
2 D C
2 A D
3 E A
3 F D
My expected outcome is this:
Col1 Txt
1 A;B;C;D
2 A;C;D
3 A;D;E;F
So, group by Col1, and then find all the distinct values for Col2 and Col3, sort these and concat them into one field.
Any ideas/suggestions?
if you are using oracle 11.2 or newer you can use listagg
select
y,
listagg(x,';') WITHIN GROUP (ORDER BY x)
from
(select col1 y, col2 x from test_table
union
select col1 y, col3 x from test_table)
group by
y
if you are using an earlier version you could resort to this one (taken from an old post of the oracle-l mailing list)
select y, max(sys_connect_by_path(x, ' | ')) trans
from (
select y, x, row_number() over (partition by y order by x) cur, row_number() over (partition by y order by x) - 1 prev
from (select col1 y, col2 x from test_table
union
select col1 y, col3 x from test_table)
)
connect by prior cur = prev and prior y = y
start with cur = 1
group by y
This is a sample script to test both of them
create table test_table (col1 numeric, col2 varchar2(2), col3 varchar2(2));
insert into test_table values (1,'A','B');
insert into test_table values (1,'B','C');
insert into test_table values (1,'C','D');
insert into test_table values (2,'D','C');
insert into test_table values (2,'A','D');
insert into test_table values (3,'E','A');
insert into test_table values (3,'F','D');
You can UNION the columns col2 and col3 and then use LISTAGG
SELECT Col1,LISTAGG(Col2,';') WITHIN GROUP (ORDER BY Col2)
FROM (
SELECT DISTINCT Col1,Col2
FROM YourTable
UNION
SELECT DISTINCT Col1,Col3
FROM YourTable
)
GROUP BY Col1
Something like
select
col1,
listagg(col23, ';') within group (order by col23)
from
(select distinct col1, col23 from
(select col1, col2 col23 from your_table
union all
select col1, col3 from your_table))
group by
col1

sort items based on their appears count

I have data like this
d b c
a d
c b
a b
c a
c a d
c
if you analyse, you will find the appearance of each element as follows
a: 4
b: 3
c: 5
d: 3
According to appearance my sorted elements would be
c,a,b,d
and final output should be
c b d
a d
c b
a b
c a
c a d
c
Any clue how we can achieve this using an SQL query?
Unless there is another column which dictates the order of the input rows, it will not be possible to guarantee that the output rows are returned in the same order. I've made an assumption here to order them by the three column values so that the result is deterministic.
It is probably possible to compact this code into fewer steps, but this version shows the steps reasonably clearly.
Note that for a large dataset, it may be more efficient to partition some of these steps into SELECT INTO operations creating temporary tables or work tables.
CREATE TABLE #t
(col1 CHAR(1)
,col2 CHAR(1)
,col3 CHAR(1)
)
INSERT #t
SELECT 'd','b','c'
UNION SELECT 'a','d',NULL
UNION SELECT 'c','b',NULL
UNION SELECT 'a','b',NULL
UNION SELECT 'c','a',NULL
UNION SELECT 'c','a','d'
UNION SELECT 'c',NULL,NULL
;WITH freqCTE
AS
(
SELECT col1 FROM #t WHERE col1 IS NOT NULL
UNION ALL
SELECT col2 FROM #t WHERE col2 IS NOT NULL
UNION ALL
SELECT col3 FROM #t WHERE col3 IS NOT NULL
)
,grpCTE
AS
(
SELECT col1 AS val
,COUNT(1) AS cnt
FROM freqCTE
GROUP BY col1
)
,rowNCTE
AS
(
SELECT *
,ROW_NUMBER() OVER (ORDER BY col1
,col2
,col3
) AS rowN
FROM #t
)
,buildCTE
AS
(
SELECT rowN
,val
,cnt
,ROW_NUMBER() OVER (PARTITION BY rowN
ORDER BY ISNULL(cnt,-1) DESC
,ISNULL(val,'z')
) AS colOrd
FROM (
SELECT *
FROM rowNCTE AS t
JOIN grpCTE AS g1
ON g1.val = t.col1
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g2
ON g2.val = t.col2
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g3
ON g3.val = t.col3
) AS x
)
SELECT b1.val AS col1
,b2.val AS col2
,b3.val AS col3
FROM buildCTE AS b1
JOIN buildCTE AS b2
ON b2.rowN = b1.rowN
AND b2.colOrd = 2
JOIN buildCTE AS b3
ON b3.rowN = b1.rowN
AND b3.colOrd = 3
WHERE b1.colOrd = 1
ORDER BY b1.rowN
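For comparison, the same idea can be sketched outside SQL: count how often each item appears across all rows, then sort the items of each row by descending count. This is a minimal Python sketch, not a translation of the query above. Breaking ties alphabetically is an assumption, and the rows are kept in their input order.

```python
from collections import Counter

# Sample rows from the question; missing cells are simply left out.
rows = [
    ["d", "b", "c"],
    ["a", "d"],
    ["c", "b"],
    ["a", "b"],
    ["c", "a"],
    ["c", "a", "d"],
    ["c"],
]

# Count how often each item appears across all rows.
counts = Counter(item for row in rows for item in row)

# Sort each row by descending count; ties fall back to alphabetical order (an assumption).
sorted_rows = [sorted(row, key=lambda item: (-counts[item], item)) for row in rows]

for row in sorted_rows:
    print(" ".join(row))
```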