SQL query for comparing rows

Let's suppose we have a table:
id1 id2
1 2
2 1
3 4
4 3
The expected output is
id1 id2
1 2
3 4
Rows 1,2 and 2,1 are the same pair, so only one of them should be output.
What's the SQL query for this?

Assuming your RDBMS supports LEAST and GREATEST (Oracle does):
SELECT DISTINCT LEAST(id1, id2), GREATEST(id1, id2)
FROM mytable
Cross-platform version:
SELECT DISTINCT
CASE WHEN id1 < id2 THEN id1 ELSE id2 END,
CASE WHEN id1 > id2 THEN id1 ELSE id2 END
FROM mytable
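As a minimal sketch of how the cross-platform CASE version behaves, the snippet below runs it against the question's sample rows in an in-memory SQLite database. The function name `distinct_unordered_pairs` and the column aliases `lo` and `hi` are illustrative assumptions, not part of the original answer.

```python
import sqlite3


def distinct_unordered_pairs(rows):
    """Return each unordered (id1, id2) pair once, smaller value first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # Normalise every row to (smaller, larger), then let DISTINCT
    # collapse the mirrored duplicates.
    result = conn.execute(
        """
        SELECT DISTINCT
               CASE WHEN id1 < id2 THEN id1 ELSE id2 END AS lo,
               CASE WHEN id1 > id2 THEN id1 ELSE id2 END AS hi
        FROM mytable
        ORDER BY lo, hi
        """
    ).fetchall()
    conn.close()
    return result


print(distinct_unordered_pairs([(1, 2), (2, 1), (3, 4), (4, 3)]))
# -> [(1, 2), (3, 4)]
```

Note that this approach always reports the smaller value first, even if the table only contains the pair in the other order.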

Select ...
From MyTable As T
Where Exists (
Select 1
From MyTable As T2
Where T2.id1 = T.id2
And T2.id2 = T.id1
)
And T.id1 < T.id2
Another solution using Union
Select T.id1, T.id2
From MyTable As T
Where T.id1 <= T.id2
Union
Select T.id2, T.id1
From MyTable As T
Where T.id1 > T.id2

My interpretation of what you're trying to do is: return rows where id1 matches another row's id2 and id2 matches that row's id1, but only return rows from that set when id1 is also less than or equal to id2.
select x.id1, x.id2 from
myTable x, myTable y
where x.id1 = y.id2 and y.id1 = x.id2 and x.id1 <= y.id1

Exact same question I also had to solve recently. See Eliminating duplicates.
select id1, id2
from t
where not exists (
select 1
from t t2
where t2.id1 = t.id2
and t2.id2 = t.id1
and t2.rowid > t.rowid
);
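The rowid-based NOT EXISTS idea can be tried out in SQLite, which also exposes a `rowid` pseudo-column. This is only a sketch: the inner table gets the alias `t2` so that the outer row can be referenced unambiguously, and the function name is an assumption made for illustration.

```python
import sqlite3


def keep_one_per_mirrored_pair(rows):
    """Drop a row when its mirror image exists with a higher rowid."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT id1, id2
        FROM t
        WHERE NOT EXISTS (
            SELECT 1
            FROM t AS t2
            WHERE t2.id1 = t.id2
              AND t2.id2 = t.id1
              AND t2.rowid > t.rowid
        )
        ORDER BY t.rowid
        """
    ).fetchall()
    conn.close()
    return result


print(keep_one_per_mirrored_pair([(1, 2), (2, 1), (3, 4), (4, 3)]))
# -> [(2, 1), (4, 3)]
```

Unlike the LEAST/GREATEST approach, this returns rows exactly as stored; of each mirrored pair, the row inserted last survives.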

Related

"duplicate" rows - how to select distinct

I have a table with this structure
id1 id2
--------------
10 2
2 10
12 15
I need to select "distinct" using SQL in the sense that rows 1 and 2 are considered the same
So I need a query that results in
10 2
12 15
or
2 10
12 15
Both are fine.
Any good ideas. This problem is driving me crazy :-)
One simple method is:
select t.*
from t
where id1 < id2 or
not exists (select 1 from t t2 where t2.id2 = t.id1 and t2.id1 = t.id2)
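The "smaller column first, or no mirrored row exists" filter can be checked with a small sketch against the question's sample data in SQLite. The column names `id1` and `id2` follow the question's table; the function name and the extra unmatched row `(20, 7)` are assumptions added for illustration.

```python
import sqlite3


def one_row_per_pair(rows):
    """Keep a row if id1 < id2, or if its mirror image is absent."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.id1, t.id2
        FROM t
        WHERE t.id1 < t.id2
           OR NOT EXISTS (
                SELECT 1 FROM t AS t2
                WHERE t2.id2 = t.id1 AND t2.id1 = t.id2
           )
        ORDER BY t.id1, t.id2
        """
    ).fetchall()
    conn.close()
    return result


print(one_row_per_pair([(10, 2), (2, 10), (12, 15), (20, 7)]))
# -> [(2, 10), (12, 15), (20, 7)]
```

The row `(20, 7)` survives unchanged because it has no mirrored counterpart, which is the case the NOT EXISTS branch is there to handle.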
In a DBMS that supports LEAST and GREATEST you can use these to get ordered pairs:
select distinct
least(id1, id2) as lesser_id,
greatest(id1, id2) as greater_id
from mytable;
In a DBMS that doesn't support these functions, you can use CASE expressions to achieve the same:
select distinct
case when id1 <= id2 then id1 else id2 end as lesser_id,
case when id1 >= id2 then id1 else id2 end as greater_id
from mytable;
I would do:
SELECT DISTINCT id1, id2
FROM (
SELECT id1, id2 FROM mytable
UNION
SELECT id2, id1 FROM mytable
) AS combinations
WHERE id1 <= id2
Another solution, using relations instead of a DISTINCT clause:
SELECT A.id1, A.id2
FROM mytable A LEFT JOIN mytable B ON A.id1 > B.id1 AND A.id1 = B.id2 AND A.id2 = B.id1
WHERE B.id1 IS NULL

Recursive SQL retrieve all levels

I am unable to retrieve the desired result from my query when using Oracle's recursive approach:
Foo
ID1 ID2
1 2
1 3
4 2
4 3
4 5
Query:
select sys_connect_by_path(id2,' -> ')
FROM Foo
START WITH id1 = 1
CONNECT BY PRIOR id1 = id2
ORDER BY 1;
Outputs only level 1 hierarchy (2,3). I want it to detect the tree ( 1 -> (2,3) -> 4 -> 5 ), such that selecting distinct ID2 yields (2,3,5). Thank you.
If you are using Oracle 11.2 or above, a recursive CTE (Common Table Expression) is preferred over Oracle's CONNECT BY clause.
WITH
aset -- Create pseudo table with ID2 as ID1 and vice versa
AS
(SELECT id1, id2
FROM (SELECT id1, id2
FROM foo
UNION
SELECT id2, id1
FROM foo)
WHERE id1 < id2),
bset (id1, id2) -- Extract hierarchy from pseudo table
AS
(SELECT id1, id2
FROM aset
WHERE id1 = 1
UNION ALL
SELECT aset.id1, aset.id2
FROM bset INNER JOIN aset ON bset.id2 = aset.id1
WHERE bset.id1 <> aset.id2)
SELECT DISTINCT bset.id2 -- Only keep values that were originally ID2
FROM bset INNER JOIN foo ON bset.id2 = foo.id2
ORDER BY id2;
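The recursive CTE can be exercised outside Oracle too. The following sketch ports it to SQLite, which needs the `WITH RECURSIVE` keyword and is used here only as a stand-in; the function name `reachable_id2` and the bound start parameter are illustrative assumptions.

```python
import sqlite3


def reachable_id2(edges, start):
    """Walk the undirected graph upward from `start`, keep original ID2s."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO foo VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE
        aset(id1, id2) AS (          -- both directions, smaller id first
            SELECT id1, id2 FROM (
                SELECT id1, id2 FROM foo
                UNION
                SELECT id2 AS id1, id1 AS id2 FROM foo
            ) WHERE id1 < id2
        ),
        bset(id1, id2) AS (          -- follow edges starting at `start`
            SELECT id1, id2 FROM aset WHERE id1 = ?
            UNION ALL
            SELECT aset.id1, aset.id2
            FROM bset JOIN aset ON bset.id2 = aset.id1
            WHERE bset.id1 <> aset.id2
        )
        SELECT DISTINCT bset.id2     -- only values that were ID2 in foo
        FROM bset JOIN foo ON bset.id2 = foo.id2
        ORDER BY bset.id2
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(reachable_id2([(1, 2), (1, 3), (4, 2), (4, 3), (4, 5)], 1))
# -> [2, 3, 5]
```

Because `aset` only keeps edges with `id1 < id2`, every recursive step moves to a strictly larger id, so the recursion terminates.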
Here is the same thing using CONNECT BY
WITH
aset
-- Create pseudo table with ID2 as ID1 and vice versa
AS
(SELECT id1, id2
FROM (SELECT id1, id2
FROM foo
UNION
SELECT id2, id1
FROM foo)
WHERE id1 < id2),
bset
-- Extract hierarchy from pseudo table
AS
( SELECT id2
FROM aset
START WITH id1 = 1
CONNECT BY PRIOR id2 = id1)
SELECT DISTINCT bset.id2
-- Only keep values that were originally ID2
FROM bset INNER JOIN foo ON bset.id2 = foo.id2
ORDER BY id2

Oracle 11.2 SQL - help to condense data in ordered set

I have a data-set with a timestamp column and multiple identifier columns. I want to condense it to a single row for each "block" of adjacent rows with equal identifiers, when ordered by the timestamp. The min and max timestamp for each block is required.
Source Data:
TSTAMP ID1 ID2
t1 A B <= start of new block
t2 A B
t3 C D <= start of new block
t4 E F <= start of new block
t5 E F
t6 E F
t7 A B <= start of new block
t8 G H <= start of new block
Desired Result:
MIN_TSTAMP MAX_TSTAMP ID1 ID2
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
I thought this was ripe for a windowing analytic function, but I cannot partition without grouping ALL equal combinations of IDn, rather than only those in adjacent rows when ordered by timestamp.
A workaround is to create a key column first in an in-line view that I can later group by i.e. with same value for each row in the block and different value for each block. I can do this using LAG analytic function to compare row values and then calling a PL/SQL function to return nextval/currval values of a sequence (calling nextval/currval directly in the SQL is restricted in this context).
select min(ilv.tstamp), max(ilv.tstamp), id1, id2
from (
select case when (id1 != lag(id1,1,'*') over (partition by (1) order by tstamp)
or id2 != lag(id2,1,'*') over (partition by (1) order by tstamp))
then
pk_seq_utils.gav_get_nextval
else
pk_seq_utils.gav_get_currval
end ident, t.*
from tab1 t
order by tstamp) ilv
group by ident, id1, id2
order by 1;
where the gav_get_xxx functions simply return currval/nextval from a sequence.
But I would like to use SQL only and avoid PL/SQL (as I could also write this easily in PL/SQL and pipe out the result-rows from a pipeline function).
Any ideas?
Thanks.
Tabibitosan to the rescue!
with sample_data as (select 't1' tstamp, 'A' id1, 'B' id2 from dual union all
select 't2' tstamp, 'A' id1, 'B' id2 from dual union all
select 't3' tstamp, 'C' id1, 'D' id2 from dual union all
select 't4' tstamp, 'E' id1, 'F' id2 from dual union all
select 't5' tstamp, 'E' id1, 'F' id2 from dual union all
select 't6' tstamp, 'E' id1, 'F' id2 from dual union all
select 't7' tstamp, 'A' id1, 'B' id2 from dual union all
select 't8' tstamp, 'G' id1, 'H' id2 from dual)
select min(tstamp) min_tstamp, max(tstamp) max_tstamp, id1, id2
from (select tstamp,
id1,
id2,
row_number() over (order by tstamp) - row_number() over (partition by id1, id2 order by tstamp) grp
from sample_data)
group by id1,
id2,
grp
order by min(tstamp);
MIN_TSTAMP MAX_TSTAMP ID1 ID2
---------- ---------- --- ---
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
You can use an analytic 'trick' to identify the gaps and islands: compare each row's position when ordered by tstamp across all rows with its position when ordered by tstamp within its own id1, id2 combination:
select tstamp, id1, id2,
row_number() over (partition by id1, id2 order by tstamp)
- row_number() over (order by tstamp) as block_id
from tab1;
TS I I BLOCK_ID
-- - - ----------
t1 A B 0
t2 A B 0
t3 C D -2
t4 E F -3
t5 E F -3
t6 E F -3
t7 A B -4
t8 G H -7
The actual value of block_id doesn't matter, just that it's unique for each block for the combination. You can then group using that:
select min(tstamp) as min_tstamp, max(tstamp) as max_tstamp, id1, id2
from (
select tstamp, id1, id2,
row_number() over (partition by id1, id2 order by tstamp)
- row_number() over (order by tstamp) as block_id
from tab1
)
group by id1, id2, block_id
order by min(tstamp);
MI MA I I
-- -- - -
t1 t2 A B
t3 t3 C D
t4 t6 E F
t7 t7 A B
t8 t8 G H
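To try the row-number-difference technique without an Oracle instance, here is a sketch that runs the same grouping query in SQLite. It assumes the bundled SQLite library is version 3.25 or later (the first with window functions); the function name `condense_blocks` is hypothetical.

```python
import sqlite3

SAMPLE = [
    ("t1", "A", "B"), ("t2", "A", "B"), ("t3", "C", "D"), ("t4", "E", "F"),
    ("t5", "E", "F"), ("t6", "E", "F"), ("t7", "A", "B"), ("t8", "G", "H"),
]


def condense_blocks(rows):
    """Collapse runs of adjacent rows with equal (id1, id2) into one row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab1 (tstamp TEXT, id1 TEXT, id2 TEXT)")
    conn.executemany("INSERT INTO tab1 VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MIN(tstamp), MAX(tstamp), id1, id2
        FROM (
            SELECT tstamp, id1, id2,
                   ROW_NUMBER() OVER (PARTITION BY id1, id2 ORDER BY tstamp)
                 - ROW_NUMBER() OVER (ORDER BY tstamp) AS block_id
            FROM tab1
        )
        GROUP BY id1, id2, block_id
        ORDER BY MIN(tstamp)
        """
    ).fetchall()
    conn.close()
    return result


for row in condense_blocks(SAMPLE):
    print(row)
```

Within one run of adjacent rows both row numbers advance together, so their difference stays constant; once another combination intervenes, the difference changes, which is what separates the two A/B blocks.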
You should be able to use the row_number window function to do this, like below:
select
min(tstamp) mints, max(tstamp) maxts, id1, id2
from (
select
t.*,
row_number() over (order by tstamp)
- row_number() over (partition by id1, id2 order by tstamp) as rn
from t
) subq
group by id1, id2, rn
order by min(tstamp)
I haven't been able to test it with any Oracle db, but it works with MSSQL and should work in Oracle too as the window function works the same way.
You need to do this step by step:
Detect ID changes with LAG marking each change with a flag = 1.
Generate keys for the groups (i.e. adjacent records with the same ID) with SUM over the ID change flags (running total).
Group by generated group key and get min/max timestamp.
Query:
select
min(tstamp) as min_tstamp,
max(tstamp) as max_tstamp,
min(id1) as id1,
min(id2) as id2
from
(
select
grouped.*,
sum(newgroup) over (order by tstamp) as groupkey
from
(
select
mytable.*,
case when id1 <> lag(id1) over (order by tstamp)
or id2 <> lag(id2) over (order by tstamp)
then 1 else 0 end as newgroup
from mytable
order by tstamp
) grouped
)
group by groupkey
order by groupkey;
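The three steps (flag changes with LAG, build a running total as a group key, then aggregate) can be sketched in SQLite as follows. This assumes SQLite 3.25 or later for window functions; the function name `condense_with_lag` and the inline-view alias `flagged` are illustrative choices.

```python
import sqlite3

ROWS = [
    ("t1", "A", "B"), ("t2", "A", "B"), ("t3", "C", "D"), ("t4", "E", "F"),
    ("t5", "E", "F"), ("t6", "E", "F"), ("t7", "A", "B"), ("t8", "G", "H"),
]


def condense_with_lag(rows):
    """Group adjacent rows using a LAG change flag and a running SUM."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (tstamp TEXT, id1 TEXT, id2 TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MIN(tstamp), MAX(tstamp), MIN(id1), MIN(id2)
        FROM (
            -- step 2: running total of change flags = group key
            SELECT flagged.*,
                   SUM(newgroup) OVER (ORDER BY tstamp) AS groupkey
            FROM (
                -- step 1: flag rows whose ids differ from the previous row
                SELECT mytable.*,
                       CASE WHEN id1 <> LAG(id1) OVER (ORDER BY tstamp)
                              OR id2 <> LAG(id2) OVER (ORDER BY tstamp)
                            THEN 1 ELSE 0 END AS newgroup
                FROM mytable
            ) AS flagged
        )
        GROUP BY groupkey            -- step 3: one row per group
        ORDER BY groupkey
        """
    ).fetchall()
    conn.close()
    return result


for row in condense_with_lag(ROWS):
    print(row)
```

For the first row LAG returns NULL, so the comparison is not true and the flag is 0; the first block therefore gets group key 0.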

SQL Query to retrieve values that belong exclusively to a group

Suppose I have one table with the following values and columns:
ID1 | ID2
1 | 1
2 | 1
3 | 1
4 | 1
4 | 2
3 | 3
4 | 3
4 | 4
4 | 4
I'd like to retrieve the ID2 values that belong exclusively to records where ID1 = 4. So for the above example, I'd like to see the following response:
ID1 | ID2
4 | 2
4 | 4
Try working it out contrapositively like this.
Finding the ID2 values that occur only with ID1 = 4 is the same as excluding every ID2 value that occurs with some ID1 other than 4.
CREATE TABLE #temp (ID1 NVARCHAR(10), ID2 NVARCHAR(10))
INSERT INTO #temp(ID1,ID2) VALUES (N'1',N'1')
INSERT INTO #temp(ID1,ID2) VALUES (N'2',N'1')
INSERT INTO #temp(ID1,ID2) VALUES (N'3',N'1')
INSERT INTO #temp(ID1,ID2) VALUES (N'4',N'1')
INSERT INTO #temp(ID1,ID2) VALUES (N'4',N'2')
INSERT INTO #temp(ID1,ID2) VALUES (N'3',N'3')
INSERT INTO #temp(ID1,ID2) VALUES (N'4',N'3')
INSERT INTO #temp(ID1,ID2) VALUES (N'4',N'4')
INSERT INTO #temp(ID1,ID2) VALUES (N'4',N'4')
SELECT * FROM #temp AS t
SELECT DISTINCT * FROM #temp AS t
WHERE id2 NOT IN (SELECT ID2 FROM #temp AS t WHERE ID1 <> 4)
These queries will probably be useful to you for the more general cases (and by general I mean when ID1 is something other than 4):
select distinct t1.id1, t1.id2
from T as t1
where not exists (
select 1
from T as t2
where t2.ID1 <> t1.ID1 and t2.ID2 = t1.ID2
)
select t1.id1, count(distinct t1.id2)
from T as t1
where not exists (
select 1
from T as t2
where t2.ID1 <> t1.ID1 and t2.ID2 = t1.ID2
)
group by t1.id1
You can also do this:
select min(id1) as id1, id2 from
(select distinct ID1 , ID2 from t) t1
group by id2
having count(*)=1 and min(id1)=4
There are a few ways to do this:
SELECT DISTINCT t1.id1, t1.id2
FROM mytable t1
WHERE t1.id1 = 4
AND NOT EXISTS ( SELECT 1 FROM mytable t2
WHERE t2.id2 = t1.id2
AND t2.id1 != 4 );
or:
SELECT id1, id2 FROM (
SELECT MIN(id1) AS id1, id2
FROM mytable
GROUP BY id2
HAVING COUNT(DISTINCT id1) = 1
) WHERE id1 = 4;
or:
SELECT DISTINCT id1, id2 FROM (
SELECT id1, id2,
MIN(id1) OVER ( PARTITION BY id2 ) AS min_id1,
MAX(id1) OVER ( PARTITION BY id2 ) AS max_id1
FROM mytable
) WHERE id1 = 4
AND min_id1 = max_id1;
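A quick way to sanity-check a candidate query is to run it against the question's sample rows, including the duplicated `4 | 4` row. The sketch below does this in SQLite for two formulations, a NOT EXISTS filter and a GROUP BY on id2 with `COUNT(DISTINCT id1)`; the function name `exclusive_id2` and the `owner` parameter are assumptions made for illustration.

```python
import sqlite3

DATA = [(1, 1), (2, 1), (3, 1), (4, 1), (4, 2), (3, 3), (4, 3), (4, 4), (4, 4)]


def exclusive_id2(rows, owner):
    """Return (id1, id2) pairs whose id2 occurs only with id1 = owner."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # Variant 1: no other id1 shares this id2.
    not_exists = conn.execute(
        """
        SELECT DISTINCT t1.id1, t1.id2
        FROM mytable AS t1
        WHERE t1.id1 = ?
          AND NOT EXISTS (SELECT 1 FROM mytable AS t2
                          WHERE t2.id2 = t1.id2 AND t2.id1 <> t1.id1)
        ORDER BY t1.id2
        """,
        (owner,),
    ).fetchall()
    # Variant 2: exactly one distinct id1 per id2, and it is the owner.
    grouped = conn.execute(
        """
        SELECT MIN(id1), id2
        FROM mytable
        GROUP BY id2
        HAVING COUNT(DISTINCT id1) = 1 AND MIN(id1) = ?
        ORDER BY id2
        """,
        (owner,),
    ).fetchall()
    conn.close()
    return not_exists, grouped


print(exclusive_id2(DATA, 4))
# -> ([(4, 2), (4, 4)], [(4, 2), (4, 4)])
```

Counting distinct id1 values rather than rows matters here: the sample contains `4 | 4` twice, and a plain `COUNT(*) = 1` test would wrongly drop it.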

How do I find groups of rows where all rows in each group have a specific column value

Sample data:
ID1 ID2 Num Type
---------------------
1 1 1 'A'
1 1 2 'A'
1 2 3 'A'
1 2 4 'A'
2 1 1 'A'
2 2 1 'B'
3 1 1 'A'
3 2 1 'A'
Desired result:
ID1 ID2
---------
1 1
1 2
3 1
3 2
Notice that I'm grouping by ID1 and ID2, but not Num, and that I'm looking specifically for groups where Type = 'A'. I know it's doable by joining two queries on the same table: one query to find all groups that have a single distinct Type, and another query to filter rows with Type = 'A'. But I was wondering if this can be done in a more efficient way.
I'm using SQL Server 2008, and my current query is:
SELECT ID1, ID2
FROM (
SELECT ID1, ID2
FROM T
GROUP BY ID1, ID2
HAVING COUNT( DISTINCT Type ) = 1
) AS SingleType
INNER JOIN (
SELECT ID1, ID2
FROM T
WHERE Type = 'A'
GROUP BY ID1, ID2
) AS TypeA ON
TypeA.ID1 = SingleType.ID1 AND
TypeA.ID2 = SingleType.ID2
EDIT: Updated sample data and query to indicate that I'm grouping on two columns, not just one.
SELECT ID1, ID2
FROM MyTable
GROUP BY ID1, ID2
HAVING COUNT(Type) = SUM(CASE WHEN Type = 'A' THEN 1 ELSE 0 END)
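The HAVING comparison keeps a group only when the number of rows with a Type equals the number of rows whose Type is 'A'. A minimal sketch against the question's sample data, run in SQLite rather than SQL Server, with a hypothetical function name and a bound parameter for the wanted type:

```python
import sqlite3

SAMPLE_ROWS = [
    (1, 1, 1, "A"), (1, 1, 2, "A"), (1, 2, 3, "A"), (1, 2, 4, "A"),
    (2, 1, 1, "A"), (2, 2, 1, "B"), (3, 1, 1, "A"), (3, 2, 1, "A"),
]


def groups_with_only_type(rows, wanted):
    """Return (ID1, ID2) groups in which every row has Type = wanted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (ID1 INTEGER, ID2 INTEGER, Num INTEGER, Type TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT ID1, ID2
        FROM MyTable
        GROUP BY ID1, ID2
        HAVING COUNT(Type) = SUM(CASE WHEN Type = ? THEN 1 ELSE 0 END)
        ORDER BY ID1, ID2
        """,
        (wanted,),
    ).fetchall()
    conn.close()
    return result


print(groups_with_only_type(SAMPLE_ROWS, "A"))
# -> [(1, 1), (1, 2), (2, 1), (3, 1), (3, 2)]
```

Because the grouping is on both ID1 and ID2, the single-row group (2, 1) also qualifies; only (2, 2), whose row has Type 'B', is filtered out.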
There are two alternatives that don't require the aggregation (but do require distinct)
ANTI-JOIN
SELECT DISTINCT t1.ID1, t1.ID2
FROM
MyTable t1
LEFT JOIN MyTable t2
ON t1.ID1 = t2.ID1
and t1.Type <> t2.Type
WHERE
t1.Type = 'A'
AND
t2.ID1 IS NULL
See it working at this data.se query Sample for 9132209 (Anti-Join)
NOT EXISTS
SELECT DISTINCT t1.ID1, t1.ID2
FROM
MyTable t1
WHERE
t1.Type = 'A'
AND
NOT EXISTS
(SELECT 1
FROM MyTable t2
WHERE t1.ID1 = t2.ID1 AND t2.Type <> 'A')
See it working at this data.se query Sample for 9132209 Not Exists