Combine row pairs with swapped columns - sql

I have a table that contains an even number of rows.
The rows form pairs with the same information but the content
of the first two columns swapped. Here is an example with three columns:
1 2 3
=======
A B W
B A W
C D X
D C X
E F Y
H G Z
F E Y
G H Z
My actual table has many more columns, but the content is always the same
within a pair.
I'm looking for an SQL-Statement that gets rid of one row of each pair.
The result should look like this:
1 2 3
=======
A B W
C D X
E F Y
G H Z
My table is generated by a script (which I can't change), so I assume
my input is correct (every row has a partner, and columns >= 3 are the same for each pair).
A statement that could check these preconditions would be extra cool.

The code below works for me, as long as the value in the third column is unique to each pair:
CREATE TABLE #TEST
(A CHAR(1),B CHAR(1),C CHAR(1))
INSERT INTO #TEST VALUES
('A','B','W'),
('B','A','W'),
('C','D','X'),
('D','C','X'),
('E','F','Y'),
('H','G','Z'),
('F','E','Y'),
('G','H','Z')
SELECT MIN(A) [1],
MAX(A) [2],
C [3]
FROM #TEST
GROUP BY C

Something similar using a CTE and row numbers would help. ROWNUM is assigned before ORDER BY is applied, so ROW_NUMBER() is used here to number the rows in sorted order:
with test_Data as
(
SELECT COL1, COL2, COL3, ROW_NUMBER() OVER (ORDER BY COL3, COL1) ROWCOUNT FROM
(
SELECT 'A' COL1, 'B' COL2, 'W' COL3 FROM DUAL UNION
SELECT 'B' COL1, 'A' COL2, 'W' COL3 FROM DUAL UNION
SELECT 'C' COL1, 'D' COL2, 'X' COL3 FROM DUAL UNION
SELECT 'D' COL1, 'C' COL2, 'X' COL3 FROM DUAL
)
)
SELECT TAB1.COL1, TAB1.COL2, TAB1.COL3 FROM TEST_DATA TAB1, TEST_DATA TAB2
WHERE
TAB1.COL1 = TAB2.COL2
AND TAB1.COL2 = TAB2.COL1
AND TAB1.COL3 = TAB2.COL3
AND TAB1.ROWCOUNT = TAB2.ROWCOUNT+1;
Your query without the test data would be:
with CTE as
(SELECT COL1, COL2, COL3, ROW_NUMBER() OVER (ORDER BY COL3, COL1) ROWCOUNT FROM MY_TABLE)
SELECT TAB1.COL1, TAB1.COL2, TAB1.COL3 FROM CTE TAB1, CTE TAB2
WHERE
TAB1.COL1 = TAB2.COL2
AND TAB1.COL2 = TAB2.COL1
AND TAB1.COL3 = TAB2.COL3
AND TAB1.ROWCOUNT = TAB2.ROWCOUNT+1;

You didn't state your DBMS so this is ANSI SQL:
select least(c1,c2),
greatest(c1,c2),
min(c3) -- choose min or max to your liking
from the_table
group by least(c1,c2), greatest(c1,c2)
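
For anyone who wants to try the grouping idea quickly, here is a minimal sketch in Python with sqlite3. It is an assumption-laden stand-in, not the query above verbatim: SQLite has no least()/greatest(), so its two-argument scalar min()/max() are used instead, and the table and column names simply mirror the answer.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (c1 TEXT, c2 TEXT, c3 TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [("A", "B", "W"), ("B", "A", "W"), ("C", "D", "X"), ("D", "C", "X"),
     ("E", "F", "Y"), ("H", "G", "Z"), ("F", "E", "Y"), ("G", "H", "Z")],
)

# Assumption: two-argument min()/max() are SQLite's scalar equivalents of
# least()/greatest(); the one-argument min(c3) is the usual aggregate.
deduped = conn.execute(
    """
    SELECT min(c1, c2) AS lo, max(c1, c2) AS hi, min(c3) AS c3
    FROM the_table
    GROUP BY min(c1, c2), max(c1, c2)
    ORDER BY lo, hi
    """
).fetchall()

for row in deduped:
    print(*row)
```

Each pair collapses to one row because both orderings of (c1, c2) map to the same (lo, hi) key.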

If every row has a counterpart where c1 and c2 are swapped then just select rows where c1 and c2 are in a certain order (i.e. c1 < c2).
The EXISTS part makes sure that only rows that have a counterpart are shown. If you want to show all unique rows regardless of whether or not they have a counterpart, then change the last condition from AND EXISTS to OR NOT EXISTS.
SELECT * FROM myTable t1
WHERE c1 < c2
AND EXISTS (
SELECT * FROM myTable t2
WHERE t2.c1 = t1.c2
AND t2.c2 = t1.c1
AND t2.c3 = t1.c3
) ORDER BY c1
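
The question also asked for a way to check the preconditions. One possible sketch, again in Python with sqlite3, flips the EXISTS into NOT EXISTS to list every row that has no swapped counterpart; an empty result means the input is well-formed. The helper name unpaired_rows is hypothetical, and the check assumes three columns as in the example.

```python
import sqlite3

def unpaired_rows(rows):
    """Return rows that lack a partner with c1/c2 swapped and the same c3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE myTable (c1 TEXT, c2 TEXT, c3 TEXT)")
    conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT c1, c2, c3 FROM myTable t1
        WHERE NOT EXISTS (
            SELECT 1 FROM myTable t2
            WHERE t2.c1 = t1.c2
              AND t2.c2 = t1.c1
              AND t2.c3 = t1.c3
        )
        ORDER BY c1, c2
        """
    ).fetchall()
    conn.close()
    return result

good = [("A", "B", "W"), ("B", "A", "W"), ("H", "G", "Z"), ("G", "H", "Z")]
bad = [("A", "B", "W"), ("B", "A", "Q"), ("H", "G", "Z")]
print(unpaired_rows(good))  # empty list: every row has a partner
print(unpaired_rows(bad))   # rows that violate the precondition
```

Note that this does not detect a pair that appears more than twice; that would need an extra count per pair.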

Related

Create multiple columns from existing Hive table columns

How can I create multiple columns from an existing Hive table? The example data is shown below.
My requirement is to create 2 new columns from the existing table only when a condition is met:
col1 when code=1, col2 when code=2.
Expected output:
How can I achieve this with Hive queries?
If you aggregate the required values into arrays, you can explode them and keep only the rows with matching positions.
Demo:
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
)
select c1.val as col1, c2.val as col2 from
(
select collect_set(case when code=1 then col else null end) as col1,
collect_set(case when code=2 then col else null end) as col2
from my_table where code in (1,2)
)s lateral view outer posexplode(col1) c1 as pos, val
lateral view outer posexplode(col2) c2 as pos, val
where c1.pos=c2.pos
Result:
col1 col2
a b
a1 b1
This approach will not work if the arrays are of different sizes.
Another approach: calculate row_number and full join on it. This works even if col1 and col2 have different numbers of values (some values will be null):
with
my_table as (--use your table instead of this CTE
select stack(8,
'a',1,
'b',2,
'c',3,
'b1',2,
'd',4,
'c1',3,
'a1',1,
'd1',4
) as (col, code)
),
ordered as
(
select code, col, row_number() over(partition by code order by col) rn
from my_table where code in (1,2)
)
select c1.col as col1, c2.col as col2
from (select * from ordered where code=1) c1
full join
(select * from ordered where code=2) c2 on c1.rn = c2.rn
Result:
col1 col2
a b
a1 b1
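
To make the row_number plus full join idea concrete outside Hive, here is a small Python sketch of the same pairing logic. It is only an illustration under assumptions: pivot_by_position is a hypothetical helper, sorting stands in for row_number() over(partition by code order by col), and zip_longest stands in for the full join on rn.

```python
from itertools import zip_longest

# Same sample data as the stack(...) call above: (col, code) pairs.
rows = [("a", 1), ("b", 2), ("c", 3), ("b1", 2),
        ("d", 4), ("c1", 3), ("a1", 1), ("d1", 4)]

def pivot_by_position(rows, left_code=1, right_code=2):
    """Pair the n-th value of one code with the n-th value of the other."""
    # Sorting each group plays the role of numbering rows within a partition.
    left = sorted(col for col, code in rows if code == left_code)
    right = sorted(col for col, code in rows if code == right_code)
    # zip_longest pads the shorter side with None, like a full join on rn.
    return list(zip_longest(left, right))

print(pivot_by_position(rows))
print(pivot_by_position(rows + [("a2", 1)]))  # uneven groups yield a None
```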

SQL - avoid select if there is pair

Is it possible to write an MS SQL query for this case? If there is a pair with 1 and -1, I don't want to select those entries at all.
COL1  COL2  NOTE
A     1     I don't want to select this entry because it is paired with A -1
A     -1    I don't want to select this entry because it is paired with A 1
A     1     OK to select - no pair (no -1 for this A)
B     1     OK to select - no pair
C     1     OK to select - no pair
D     1     I don't want to select this entry because it is paired with D -1
D     -1    I don't want to select this entry because it is paired with D 1
As I understand it, 1 and -1 are the only possible values for col2. If that is the case and the counts of 1s and -1s differ by at most one row per col1, then you can just add the values up:
select col1, sum(col2)
from mytable
group by col1
having sum(col2) <> 0;
If the counts can differ by more than one row, or there are other values besides 1 and -1, then we must generate row numbers:
select col1, max(col2)
from
(
select
col1,
col2,
row_number() over (partition by col1, col2 order by col2) as rn
from mytable
) numbered
group by col1, rn
having count(*) = 1;
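
As a quick sanity check, both queries can be run against the sample data from the question. This is a sketch using Python's sqlite3 rather than MS SQL, so it assumes a SQLite build with window-function support (3.25 or later); the ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("A", 1), ("A", -1), ("A", 1), ("B", 1), ("C", 1), ("D", 1), ("D", -1)],
)

# Variant 1: add the values up; a non-zero sum means something is unpaired.
by_sum = conn.execute(
    """
    SELECT col1, sum(col2) FROM mytable
    GROUP BY col1
    HAVING sum(col2) <> 0
    ORDER BY col1
    """
).fetchall()

# Variant 2: number rows per (col1, col2); the n-th 1 pairs with the n-th -1,
# so a (col1, rn) group with a single row has no partner.
by_row_number = conn.execute(
    """
    SELECT col1, max(col2)
    FROM (
        SELECT col1, col2,
               row_number() OVER (PARTITION BY col1, col2 ORDER BY col2) AS rn
        FROM mytable
    ) numbered
    GROUP BY col1, rn
    HAVING count(*) = 1
    ORDER BY col1
    """
).fetchall()

print(by_sum)
print(by_row_number)
```

On this data the two variants agree; they would differ if, say, a col1 value had three unpaired 1s, where the sum variant returns one row and the row-number variant returns three.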
One method is aggregation. Assuming there are only -1 and 1 and no duplicates with the same sign:
select col1, max(col2), col3
from t
group by col1, col3
having count(*) = 1;
Alternatively, you could use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.col3 = t.col3 and t2.col1 = t.col1 and
t2.col2 = - t.col2
);
If the sum of col2 for a given value of col1 is not 0, that value has an unpaired row.
Try this:
select *
from t
where col1 in
(select col1 from t group by col1 having sum(col2) <> 0);

SQL/Oracle return only field with identical value in 2nd column

I need to return column 1 only if all values in the 2nd column of a repeating log are identical. If any other value is seen, exclude it from the result.
A 2
A 2
A 2
A 2
A 2
Exclude
B 2
B 1
B 2
B 3
B 2
select b.column1
from
( select *
from table
where column2 != 1
) b
where b.column2 = 2
Results:
A
You could use aggregation and HAVING:
SELECT col1
FROM tab
GROUP BY col1
HAVING COUNT(DISTINCT col2) = 1;
or if you need original rows:
SELECT s.*
FROM (SELECT t.*, COUNT(DISTINCT col2) OVER(PARTITION BY col1) AS cnt
FROM tab t) s
WHERE s.cnt = 1;
If you need the original rows, I would recommend not exists:
select t.*
from t
where not exists (select 1 from t t2 where t2.col1 = t.col1 and t2.col2 <> t.col2);
If you just want the col1 values (which makes sense to me), then I would phrase the aggregation as:
select col1
from t
group by col1
having min(col2) = max(col2);
If you want to include "all-null" as a valid option, then:
having min(col2) = max(col2) or min(col2) is null
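
Here is a small sketch of both HAVING variants, run through Python's sqlite3 on the sample log plus a hypothetical all-null group C (not part of the question) to show what the extra "is null" condition changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 2)] * 5
    + [("B", 2), ("B", 1), ("B", 2), ("B", 3), ("B", 2)]
    + [("C", None), ("C", None)],  # hypothetical all-null group
)

# min = max holds only when every non-null col2 in the group is identical.
strict = conn.execute(
    "SELECT col1 FROM t GROUP BY col1 "
    "HAVING min(col2) = max(col2) ORDER BY col1"
).fetchall()

# For an all-null group min(col2) is NULL, so the comparison is unknown
# and the group only survives with the explicit IS NULL branch.
with_nulls = conn.execute(
    "SELECT col1 FROM t GROUP BY col1 "
    "HAVING min(col2) = max(col2) OR min(col2) IS NULL ORDER BY col1"
).fetchall()

print(strict)
print(with_nulls)
```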
Try this query:
select column1
from (select column1, column2
from Test
group by column1, column2) a
group by column1
having count(column1) = 1;

Find duplicate symmetric rows in a table

I have a table which contains data as
col1 col2
a b
b a
c d
d c
a d
a c
For me row 1 and row 2 are duplicate because a, b & b, a are the same. The same stands for row 3 and row 4.
I need an SQL (not PL/SQL) query which gives output as
col1 col2
a b
c d
a d
a c
select distinct least(col1, col2), greatest(col1, col2)
from your_table
Edit: for those using a DBMS that does not support the standard SQL functions least and greatest, this can be simulated using a CASE expression:
select distinct
case
when col1 < col2 then col1
else col2
end as least_col,
case
when col1 > col2 then col1
else col2
end as greatest_col
from your_table
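
The CASE version can be tried on any engine; as a sketch, here it is run through Python's sqlite3 on the sample rows from the question. The ORDER BY is an addition so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c"), ("a", "d"), ("a", "c")],
)

# Normalise each row so the smaller value comes first, then de-duplicate.
pairs = conn.execute(
    """
    SELECT DISTINCT
        CASE WHEN col1 < col2 THEN col1 ELSE col2 END AS least_col,
        CASE WHEN col1 > col2 THEN col1 ELSE col2 END AS greatest_col
    FROM your_table
    ORDER BY least_col, greatest_col
    """
).fetchall()

for least_col, greatest_col in pairs:
    print(least_col, greatest_col)
```

One caveat: if either column can be NULL, both CASE expressions fall through to col2, so such rows would need separate handling.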
Try this:
CREATE TABLE t_1(col1 varchar(10),col2 varchar(10))
INSERT INTO t_1
VALUES ('a','b'),
('b','a'),
('c','d'),
('d','c'),
('a','d'),
('a','c')
;with CTE as (select ROW_NUMBER() over (order by (select 0)) as id,col1,col2,col1+col2 as col3 from t_1)
,CTE1 as (
select id,col1,col2,col3 from CTE where id=1
union all
select c.id,c.col1,c.col2,CASE when c.col3=REVERSE(c1.col3) then null else c.col3 end from CTE c inner join CTE1 c1
on c.id-1=c1.id
)
select col1,col2 from CTE1 where col3 is not null

Sort items based on their appearance count

I have data like this
d b c
a d
c b
a b
c a
c a d
c
If you analyse it, you will find that each element appears as follows
a: 4
b: 3
c: 5
d: 3
According to appearance my sorted elements would be
c,a,b,d
and final output should be
c b d
a d
c b
a b
c a
c a d
c
Any clue how we can achieve this using an SQL query?
Unless there is another column which dictates the order of the input rows, it will not be possible to guarantee that the output rows are returned in the same order. I've made an assumption here to order them by the three column values so that the result is deterministic.
It's likely possible to compact this code into fewer steps, but this version shows the steps reasonably clearly.
Note that for a large dataset, it may be more efficient to partition some of these steps into SELECT INTO operations creating temporary tables or work tables.
CREATE TABLE #t
(col1 CHAR(1)
,col2 CHAR(1)
,col3 CHAR(1)
)
INSERT #t
SELECT 'd','b','c'
UNION SELECT 'a','d',NULL
UNION SELECT 'c','b',NULL
UNION SELECT 'a','b',NULL
UNION SELECT 'c','a',NULL
UNION SELECT 'c','a','d'
UNION SELECT 'c',NULL,NULL
;WITH freqCTE
AS
(
SELECT col1 FROM #t WHERE col1 IS NOT NULL
UNION ALL
SELECT col2 FROM #t WHERE col2 IS NOT NULL
UNION ALL
SELECT col3 FROM #t WHERE col3 IS NOT NULL
)
,grpCTE
AS
(
SELECT col1 AS val
,COUNT(1) AS cnt
FROM freqCTE
GROUP BY col1
)
,rowNCTE
AS
(
SELECT *
,ROW_NUMBER() OVER (ORDER BY col1
,col2
,col3
) AS rowN
FROM #t
)
,buildCTE
AS
(
SELECT rowN
,val
,cnt
,ROW_NUMBER() OVER (PARTITION BY rowN
ORDER BY ISNULL(cnt,-1) DESC
,ISNULL(val,'z')
) AS colOrd
FROM (
SELECT *
FROM rowNCTE AS t
JOIN grpCTE AS g1
ON g1.val = t.col1
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g2
ON g2.val = t.col2
UNION ALL
SELECT *
FROM rowNCTE AS t
LEFT JOIN grpCTE AS g3
ON g3.val = t.col3
) AS x
)
SELECT b1.val AS col1
,b2.val AS col2
,b3.val AS col3
FROM buildCTE AS b1
JOIN buildCTE AS b2
ON b2.rowN = b1.rowN
AND b2.colOrd = 2
JOIN buildCTE AS b3
ON b3.rowN = b1.rowN
AND b3.colOrd = 3
WHERE b1.colOrd = 1
ORDER BY b1.rowN
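
For comparison, the same logic (count every value across all columns, then reorder each row by descending count) can be sketched in a few lines of Python. This is an illustration rather than a replacement for the SQL: sort_by_frequency is a hypothetical helper, rows are modelled as lists without NULL padding, and ties are broken alphabetically, mirroring the ISNULL(val,'z') tiebreak above.

```python
from collections import Counter

# Sample rows from the question; missing columns are simply omitted.
rows = [["d", "b", "c"], ["a", "d"], ["c", "b"], ["a", "b"],
        ["c", "a"], ["c", "a", "d"], ["c"]]

def sort_by_frequency(rows):
    """Reorder the items of each row by how often they appear overall."""
    # Equivalent of freqCTE + grpCTE: one global count per value.
    counts = Counter(item for row in rows for item in row)
    # Equivalent of buildCTE: order within each row by count desc, then value.
    return [sorted(row, key=lambda item: (-counts[item], item)) for row in rows]

for row in sort_by_frequency(rows):
    print(" ".join(row))
```

Unlike the SQL version, this keeps the input row order, since a Python list has an inherent order while a table does not.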