I want to get one row for each unique value of b: the row with the minimum value of c for that value of b. If more than one row shares that minimum, any one of them will do (just choose the first one).
myTable
a integer (unique)
b integer
c integer
I've tried this query
SELECT t1.*
FROM myTable t1,
(SELECT b,
MIN(c) as c
FROM myTable
GROUP BY b) t2
WHERE t1.b = t2.b
AND t1.c = t2.c
However, in this table it's possible for there to be more than 1 instance of the minimum value of c for a given value of b. The above query generates duplicates under these conditions.
I've got a feeling that I need to use rownum somewhere, but I'm not quite sure where.
You can use ROW_NUMBER:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY b ORDER BY c) AS rn
FROM myTable
) AS T1
WHERE rn = 1
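As a quick check of the ROW_NUMBER approach, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample rows are made up, SQLite is only a stand-in for whatever database the question uses, and the extra `a` in the ORDER BY is an addition so that the row picked among ties is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER);
    INSERT INTO myTable (a, b, c) VALUES
        (1, 10, 5),
        (2, 10, 5),   -- ties with a = 1 on the minimum c for b = 10
        (3, 10, 7),
        (4, 20, 3);
""")

# ORDER BY c, a: the extra key a (an addition to the answer's query)
# makes the choice among tied rows repeatable.
rows = conn.execute("""
    SELECT a, b, c
    FROM (
        SELECT t.*, ROW_NUMBER() OVER (PARTITION BY b ORDER BY c, a) AS rn
        FROM myTable t
    )
    WHERE rn = 1
    ORDER BY b
""").fetchall()

print(rows)  # [(1, 10, 5), (4, 20, 3)]
```

With ORDER BY c alone the database is free to return either of the tied rows, which is fine if any one of them will do.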
To tie-break between the equal c's, you will need to subquery one level further to get the min-a for each group of equal c's per b. (A mouthful!)
select t0.*
FROM myTable t0
inner join (
select t1.b, t1.c, MIN(a) as a
from myTable t1
inner join (
select b, min(c) as c
from myTable
group by b
) t2 on t1.b = t2.b and t1.c = t2.c
group by t1.b, t1.c
) t3 on t3.a = t0.a and t3.b = t0.b and t3.c = t0.c
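To see the difference the extra level makes, the sketch below runs both the join on MIN(c) alone and the nested version against made-up rows in an in-memory SQLite database (an assumption, chosen only because it is easy to run from Python). Because a is unique, the final join is written on a alone here; joining on b and c as well, as above, gives the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER);
    INSERT INTO myTable (a, b, c) VALUES (1, 10, 5), (2, 10, 5), (3, 10, 7), (4, 20, 3);
""")

# Joining back on (b, MIN(c)) alone keeps every tied row.
with_ties = conn.execute("""
    SELECT t1.a, t1.b, t1.c
    FROM myTable t1
    INNER JOIN (SELECT b, MIN(c) AS c FROM myTable GROUP BY b) t2
        ON t1.b = t2.b AND t1.c = t2.c
    ORDER BY t1.a
""").fetchall()

# One more level: MIN(a) within each (b, min c) group picks a single row.
one_per_b = conn.execute("""
    SELECT t0.a, t0.b, t0.c
    FROM myTable t0
    INNER JOIN (
        SELECT t1.b, t1.c, MIN(t1.a) AS a
        FROM myTable t1
        INNER JOIN (SELECT b, MIN(c) AS c FROM myTable GROUP BY b) t2
            ON t1.b = t2.b AND t1.c = t2.c
        GROUP BY t1.b, t1.c
    ) t3 ON t3.a = t0.a
    ORDER BY t0.a
""").fetchall()

print(with_ties)  # [(1, 10, 5), (2, 10, 5), (4, 20, 3)]
print(one_per_b)  # [(1, 10, 5), (4, 20, 3)]
```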
Related
I have written the query:
Select distinct a,b from t1 minus Select distinct a,b from t2.
Here t1 and t2 are two tables. I want the distinct values of a and b that occur in t1 but not in t2, so I'm using the minus operator. I want the values of both a and b, but I know that in some cases the value of b in t1 and t2 may be different. When that happens, minus does not remove the row, because it only removes rows where both a and b match, so I get values of a that are present in both tables. How can I do this successfully?
How can I get values of a and b that are present in table t1 but not in table t2 even though in some cases values of b might not match in both the tables?
table1:              table2:
column1  column2     column1  column2
1        a           1        c
2        b           3        d
In this case I would want values (2,b) only. I would not want (1,a) as 1 is also present in table2.
Start with not exists:
select distinct. . .
from t1
where not exists (select 1 from t2 where t2.a = t1.a and t2.b = t1.b);
From what you describe, you might want the comparison only on a:
select distinct a, b
from t1
where not exists (select 1 from t2 where t2.a = t1.a);
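A minimal sketch of the difference, using the sample rows from the question in an in-memory SQLite database. The column types are assumptions, and SQLite spells the set difference EXCEPT rather than MINUS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER, b TEXT);
    CREATE TABLE t2 (a INTEGER, b TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
    INSERT INTO t2 VALUES (1, 'c'), (3, 'd');
""")

# Set difference compares whole rows, so (1, 'a') survives: t2 has (1, 'c').
set_difference = conn.execute("""
    SELECT DISTINCT a, b FROM t1
    EXCEPT
    SELECT DISTINCT a, b FROM t2
    ORDER BY a
""").fetchall()

# NOT EXISTS on a alone drops every t1 row whose a appears in t2.
only_in_t1 = conn.execute("""
    SELECT DISTINCT a, b
    FROM t1
    WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.a = t1.a)
    ORDER BY a
""").fetchall()

print(set_difference)  # [(1, 'a'), (2, 'b')]
print(only_in_t1)      # [(2, 'b')]
```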
Another option is to use a subquery in the WHERE condition, as below:
SELECT A.*
FROM table1 A
WHERE A.column1 NOT IN
(SELECT DISTINCT column1 FROM table2)
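One thing to watch with NOT IN: if the subquery can return a NULL, the comparison is never true and no rows come back. The sketch below shows this in an in-memory SQLite database; the (NULL, 'e') row is hypothetical and is not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column1 INTEGER, column2 TEXT);
    CREATE TABLE table2 (column1 INTEGER, column2 TEXT);
    INSERT INTO table1 VALUES (1, 'a'), (2, 'b');
    -- the NULL row is hypothetical, added only to show the pitfall
    INSERT INTO table2 VALUES (1, 'c'), (3, 'd'), (NULL, 'e');
""")

# 2 NOT IN (1, 3, NULL) evaluates to NULL, not true, so the row is filtered out.
not_in = conn.execute("""
    SELECT A.column1, A.column2
    FROM table1 A
    WHERE A.column1 NOT IN (SELECT column1 FROM table2)
""").fetchall()

# Excluding NULLs inside the subquery restores the expected result.
not_in_filtered = conn.execute("""
    SELECT A.column1, A.column2
    FROM table1 A
    WHERE A.column1 NOT IN (SELECT column1 FROM table2 WHERE column1 IS NOT NULL)
""").fetchall()

# NOT EXISTS is not affected by the NULL row.
not_exists = conn.execute("""
    SELECT A.column1, A.column2
    FROM table1 A
    WHERE NOT EXISTS (SELECT 1 FROM table2 B WHERE B.column1 = A.column1)
""").fetchall()

print(not_in)           # []
print(not_in_filtered)  # [(2, 'b')]
print(not_exists)       # [(2, 'b')]
```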
You can also use a LEFT JOIN, which gives the same output as the NOT IN query above:
SELECT A.*
FROM table1 A
LEFT JOIN table2 B ON A.column1 = B.column1
WHERE B.column1 IS NULL
For rows in t1 that are not in t2, you can use either NOT EXISTS or a LEFT OUTER JOIN.
Here is the solution.
Using NOT EXISTS
SELECT DISTINCT A,B FROM T1 WHERE NOT EXISTS (SELECT 1 FROM T2 WHERE T2.A = T1.A AND T2.B = T1.B);
Using Left Join
SELECT DISTINCT T1.a, T1.b FROM T1 LEFT JOIN T2 ON T1.a = T2.a AND T1.b = T2.b WHERE T2.a IS NULL AND T2.b IS NULL
Hope it helps.
I am having an issue in extracting data using data of two tables in SQL.
select A, B, C, D
from Table_one T1
where A in (select T2.A from Table_two T2
where T2.E <> 'ZZZ');
This returns A, B, C, D where E in T2 is not ZZZ.
However, when I add another condition to the where clause like below,
it also returns data where T2.E is ZZZ.
select A, B, C, D
from Table_one T1
where A in (select T2.A from Table_two T2
where T2.E <> 'ZZZ')
and D <> 0 ;
This ignores "T2.E <> 'ZZZ'" part, but "D<>0" is not ignored.
Why is this happening?
Because you have duplicates in Table_two. For some of those duplicates, one has the value of ZZZ and the other does not.
You are using the wrong logic if you want to exclude rows that have a ZZZ in table_two. I would recommend NOT EXISTS:
select A, B, C, D
from Table_one T1
where not exists (select 1
from Table_two T2
where T1.A = T2.A and
T2.E = 'ZZZ'
) and
D <> 0 ;
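Here is a small sketch of that explanation, run in an in-memory SQLite database. The rows are made up to match the situation the answer guesses at (A = 1 appears twice in Table_two, once with ZZZ); the real data may look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table_one (A INTEGER, B TEXT, C TEXT, D INTEGER);
    CREATE TABLE Table_two (A INTEGER, E TEXT);
    INSERT INTO Table_one VALUES
        (1, 'b1', 'c1', 5), (2, 'b2', 'c2', 5), (3, 'b3', 'c3', 5), (4, 'b4', 'c4', 0);
    INSERT INTO Table_two VALUES
        (1, 'ZZZ'), (1, 'YYY'),   -- A = 1 appears twice, once with ZZZ
        (2, 'YYY'), (3, 'ZZZ'), (4, 'YYY');
""")

# IN keeps A = 1 because at least one of its Table_two rows is not ZZZ.
in_rows = conn.execute("""
    SELECT A
    FROM Table_one T1
    WHERE A IN (SELECT T2.A FROM Table_two T2 WHERE T2.E <> 'ZZZ')
      AND D <> 0
    ORDER BY A
""").fetchall()

# NOT EXISTS drops A = 1 because any ZZZ row for it is enough to exclude it.
not_exists_rows = conn.execute("""
    SELECT A
    FROM Table_one T1
    WHERE NOT EXISTS (SELECT 1 FROM Table_two T2 WHERE T1.A = T2.A AND T2.E = 'ZZZ')
      AND D <> 0
    ORDER BY A
""").fetchall()

print(in_rows)          # [(1,), (2,)]
print(not_exists_rows)  # [(2,)]
```

Note that NOT EXISTS would also keep a Table_one row that has no match in Table_two at all, which the IN version would not.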
Let's say I have a table with columns: A, B, C & D
Any two rows are considered a duplicate if:
A, B, C have equal values but not D
or
A, B, D have equal values but not C.
How do I get a set of duplicate rows? Using a CTE is OK.
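I think you can do it with a union of two queries, one per condition. Each one has to compare a row with the other rows of the table (comparing a, b, c and d with each other inside a single row will not find duplicates), and union rather than union all keeps a row that meets both conditions from being listed twice.
select t.* from tablename t
where exists (select 1 from tablename u where u.a = t.a and u.b = t.b and u.c = t.c and u.d <> t.d)
union
select t.* from tablename t
where exists (select 1 from tablename u where u.a = t.a and u.b = t.b and u.d = t.d and u.c <> t.c)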
Using a self join it's quite easy:
SELECT DISTINCT t1.*
FROM TableName t1
INNER JOIN TableName t2
ON T1.A = T2.A
AND T1.B = T2.B
AND (T1.C = T2.C OR T1.D = T2.D)
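This assumes, of course, that rows with all 4 columns equal count as duplicates as well. Be aware that in a self join every row (with non-NULL A, B and C) also matches itself, so as written this query returns every row; it only works if a unique key is available to exclude the self-match, for example AND T1.Id <> T2.Id, where Id is a hypothetical key column that the question does not mention.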
However, if for some strange reason these rows are not considered as duplicates, you can change the conditions in the ON clause to this:
SELECT DISTINCT t1.*
FROM TableName t1
INNER JOIN TableName t2
ON T1.A = T2.A
AND T1.B = T2.B
AND (
(T1.C = T2.C AND T1.D <> T2.D)
OR (T1.C <> T2.C AND T1.D = T2.D)
)
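The two ON conditions can be compared on a few made-up rows. This is only a sketch in an in-memory SQLite database, and the table contents are assumptions: the first three rows duplicate each other under the question's rules, and the last row has no duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableName (A INTEGER, B INTEGER, C INTEGER, D INTEGER);
    INSERT INTO TableName VALUES
        (1, 1, 1, 1),
        (1, 1, 1, 2),   -- same A, B, C as the first row, different D
        (1, 1, 3, 2),   -- same A, B, D as the second row, different C
        (2, 2, 2, 2);   -- no duplicate
""")

# C = C OR D = D: each row also joins to itself, so nothing is filtered out.
loose = conn.execute("""
    SELECT DISTINCT t1.A, t1.B, t1.C, t1.D
    FROM TableName t1
    INNER JOIN TableName t2
        ON t1.A = t2.A AND t1.B = t2.B
       AND (t1.C = t2.C OR t1.D = t2.D)
    ORDER BY t1.A, t1.B, t1.C, t1.D
""").fetchall()

# Exactly one of C, D differs: a row can no longer match itself.
strict = conn.execute("""
    SELECT DISTINCT t1.A, t1.B, t1.C, t1.D
    FROM TableName t1
    INNER JOIN TableName t2
        ON t1.A = t2.A AND t1.B = t2.B
       AND ((t1.C = t2.C AND t1.D <> t2.D) OR (t1.C <> t2.C AND t1.D = t2.D))
    ORDER BY t1.A, t1.B, t1.C, t1.D
""").fetchall()

print(loose)   # all four rows, including (2, 2, 2, 2)
print(strict)  # [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 3, 2)]
```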
You can use RANK() to detect duplicates without having to select from the table twice :
SELECT s.* FROM (
SELECT t.*,
RANK() OVER(PARTITION BY t.a,t.b,t.c ORDER BY t.d) as d_dif,
RANK() OVER(PARTITION BY t.a,t.b,t.D ORDER BY t.c) as c_dif
FROM YourTable t) s
WHERE s.d_dif > 1 or s.c_dif > 1
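RANK(), as opposed to ROW_NUMBER(), gives tied rows the same rank, so if d (or c) is the same in both records they both get rank 1 and are not selected. Note that a row that ranks first in both partitions is not returned even when other rows duplicate it, so this gives you the later rows of each set of duplicates rather than the whole set.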
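To make it concrete which rows the RANK() query returns, here is a sketch against made-up rows in an in-memory SQLite database (both the data and the choice of SQLite are assumptions). The inner table is given the alias t so that the t.* references resolve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (a INTEGER, b INTEGER, c INTEGER, d INTEGER);
    INSERT INTO YourTable VALUES
        (1, 1, 1, 1),
        (1, 1, 1, 2),   -- same a, b, c as the first row, different d
        (1, 1, 3, 2),   -- same a, b, d as the second row, different c
        (2, 2, 2, 2);   -- no duplicate
""")

rows = conn.execute("""
    SELECT s.a, s.b, s.c, s.d
    FROM (
        SELECT t.*,
               RANK() OVER (PARTITION BY t.a, t.b, t.c ORDER BY t.d) AS d_dif,
               RANK() OVER (PARTITION BY t.a, t.b, t.d ORDER BY t.c) AS c_dif
        FROM YourTable t
    ) s
    WHERE s.d_dif > 1 OR s.c_dif > 1
    ORDER BY s.a, s.b, s.c, s.d
""").fetchall()

# (1, 1, 1, 1) ranks first in both partitions, so it is not in the result.
print(rows)  # [(1, 1, 1, 2), (1, 1, 3, 2)]
```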
I have 3 tables I need to join together. Once I join the first two, I'm going to have two columns, let's call them A and B.
The relationship between A and B is many-to-many. So we can have:
A B
1 1
1 2
2 1
2 3
Then, I need to join with a third table on the B column, giving me:
A B C
1 1 5
1 2 6
2 1 9
2 3 2
Now for my final result I only want one row for each unique A value, and I want to select that row based upon the MAX C value across that given A.
So in this example the final value would be:
A B C
1 2 6
2 1 9
I have the following query which works as expected, but I am fairly certain it is not the best way of doing it:
SELECT
Temp.A,
Temp.B,
Temp.C1
FROM
(SELECT DISTINCT
T1.A,
T2.B,
MAX(T3.C) OVER(PARTITION BY T1.A) AS C1
FROM T1
INNER JOIN T2 ON T1.X = T2.X
INNER JOIN T3 ON T2.B = T3.B) Temp
INNER JOIN T3 ON T3.B = Temp.B
WHERE Temp.C1 = T3.C
Your query can be simplified:
You don't need select distinct in the subquery.
You don't need to join back to T3.
You can select the C value in the subquery.
Here is the revision:
SELECT Temp.A, Temp.B, Temp.C
FROM (SELECT T1.A, T2.B, T3.C, MAX(T3.C) OVER (PARTITION BY T1.A) AS C1
FROM T1 INNER JOIN
T2
ON T1.X = T2.X INNER JOIN
T3
ON T2.B = T3.B
) Temp
WHERE Temp.C1 = Temp.C
Do note that if T3 has duplicate maximum values, then this will return duplicates. To get just one, you can use row_number() instead:
SELECT Temp.A, Temp.B, Temp.C
FROM (SELECT T1.A, T2.B, T3.C,
ROW_NUMBER() OVER (PARTITION BY T1.A ORDER BY T3.C DESC) AS seqnum
FROM T1 INNER JOIN
T2
ON T1.X = T2.X INNER JOIN
T3
ON T2.B = T3.B
) Temp
WHERE seqnum = 1;
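The difference between the two revisions shows up as soon as there is a tie. In the sketch below, a single hypothetical table named joined stands in for the result of joining T1, T2 and T3; it holds the four rows from the question plus one made-up row, (2, 4, 9), that ties for the maximum C. It runs in an in-memory SQLite database, and the B in the ROW_NUMBER ordering is an added tie-breaker so the output is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- joined is a hypothetical stand-in for T1 JOIN T2 JOIN T3
    CREATE TABLE joined (A INTEGER, B INTEGER, C INTEGER);
    INSERT INTO joined VALUES
        (1, 1, 5), (1, 2, 6), (2, 1, 9), (2, 3, 2),
        (2, 4, 9);   -- made-up row that ties for the maximum C of A = 2
""")

# Window MAX: every row that equals the maximum is kept, ties included.
with_max = conn.execute("""
    SELECT Temp.A, Temp.B, Temp.C
    FROM (SELECT A, B, C, MAX(C) OVER (PARTITION BY A) AS C1 FROM joined) Temp
    WHERE Temp.C1 = Temp.C
    ORDER BY Temp.A, Temp.B
""").fetchall()

# ROW_NUMBER: exactly one row per A; B breaks the tie (an added assumption).
with_row_number = conn.execute("""
    SELECT Temp.A, Temp.B, Temp.C
    FROM (SELECT A, B, C,
                 ROW_NUMBER() OVER (PARTITION BY A ORDER BY C DESC, B) AS seqnum
          FROM joined) Temp
    WHERE seqnum = 1
    ORDER BY Temp.A
""").fetchall()

print(with_max)         # [(1, 2, 6), (2, 1, 9), (2, 4, 9)]
print(with_row_number)  # [(1, 2, 6), (2, 1, 9)]
```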
I was wondering if there was a way to convert a self subquery to a self join
Here is the self subquery
SELECT a,
b
FROM c AS t1
WHERE ( b IN (SELECT b
FROM c AS t2
WHERE ( t1.b = b )
AND ( t1.e <> e )) )
If you only want to find the duplicates, an EXISTS would probably be faster:
SELECT a,b FROM c WHERE EXISTS(SELECT NULL FROM c c2 WHERE c2.b=c.b AND c2.e<>c.e)
If you want to join every record with its duplicate but get only one record for each:
select t1.a
, t1.b
, t1.e as t1e
, t2.e as t2e
from c as t1
inner join c as t2
on t1.b = t2.b
and t1.e > t2.e
(note that I've used > instead of <>)
As e is the Primary Key another way of approaching this would be
SELECT a,
b
FROM (SELECT a,
b,
COUNT(*) OVER (PARTITION BY b) AS Cnt
FROM c) T1
WHERE Cnt > 1
SELECT t1.a, t2.b
FROM c as t1
join c as t2 on t1.b=t2.b
WHERE t1.e <> t2.e
select t1.a
, t1.b
from c as t1
join c as t2
on t1.b = t2.b
and t1.e <> t2.e
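One caveat when swapping the subquery for a plain self join on <>: the join returns a row once per matching partner, so the two forms only give the same rows when no value of b is shared by more than two rows. A minimal sketch with made-up data in an in-memory SQLite database (column types are assumptions; e is treated as the primary key, as one of the answers states):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (e INTEGER PRIMARY KEY, a TEXT, b INTEGER);
    INSERT INTO c VALUES (1, 'x', 1), (2, 'y', 1), (3, 'z', 1), (4, 'w', 2);
""")

# Subquery form: each qualifying row appears once.
subquery_rows = conn.execute("""
    SELECT a, b
    FROM c AS t1
    WHERE b IN (SELECT t2.b FROM c AS t2 WHERE t1.b = t2.b AND t1.e <> t2.e)
    ORDER BY a
""").fetchall()

# Self join form: three rows share b = 1, so each one matches two partners.
join_rows = conn.execute("""
    SELECT t1.a, t1.b
    FROM c AS t1
    JOIN c AS t2 ON t1.b = t2.b AND t1.e <> t2.e
    ORDER BY t1.a
""").fetchall()

# EXISTS keeps the one-row-per-match behaviour of the subquery form.
exists_rows = conn.execute("""
    SELECT a, b
    FROM c
    WHERE EXISTS (SELECT NULL FROM c c2 WHERE c2.b = c.b AND c2.e <> c.e)
    ORDER BY a
""").fetchall()

print(subquery_rows)   # [('x', 1), ('y', 1), ('z', 1)]
print(len(join_rows))  # 6
print(exists_rows)     # [('x', 1), ('y', 1), ('z', 1)]
```

Adding DISTINCT to the join removes the repeats, but it would also merge different rows that happen to share the same a and b.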