SQL Finding duplicate values in two of the three columns of each row - sql

Let's say we have three columns: A, B, and C.
I would like to filter the results as follows:
The values of A and B are the same (duplicated) for > 1 (more than 1) row, and the value of C is always different.
In the attached image, the values that appear selected would meet the conditions mentioned above.
What I've tried:
SELECT
a.notation as A, a.gene as B, b.id as C
FROM
`db-dummy`.sgdata c
join `db-dummy`.g_info a on a.rec_id = c.gen_id
join `db-dummy`.spec_data b on b.rec_id = c.spec_id GROUP BY A, B HAVING COUNT(*) > 1;
I thought that using GROUP BY and HAVING COUNT(*) > 1 I could get the desired result, but I get the following error:
SQL Error [1055] [42000]: (conn=1632) Expression #3 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'db-dummy.b.spec_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

If you had a single table, I would suggest just using exists. But because you have a join, use window functions. If you are. looking for different values of id:
SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
MIN(b.id) OVER (PARTITION BY a.notation, a.gene) as min_id,
MAX(b.id) OVER (PARTITION BY a.notation, a.gene) as max_id
FROM `db-dummy`.sgdata c JOIN
`db-dummy`.g_info a
ON a.rec_id = c.gen_id JOIN
`db-dummy`.spec_data b
ON b.rec_id = c.spec_id
) x
WHERE min_id <> max_id;
If you are just looking for multiple rows for a given A and B, then you can use:
SELECT A, B, C
FROM (SELECT a.notation as A, a.gene as B, b.id as C,
COUNT(*) OVER (PARTITION BY a.noation, a.gene) as cnt
FROM `db-dummy`.sgdata c JOIN
`db-dummy`.g_info a
ON a.rec_id = c.gen_id JOIN
`db-dummy`.spec_data b
ON b.rec_id = c.spec_id
) x
WHERE cnt > 1;

SELECT * FROM `db-dummy`.sgdata a
LEFT JOIN
(SELECT COUNT(Id) as count, notation, gene
FROM `db-dummy`.sgdata
GROUP BY notation, gene
HAVING COUNT(id) > 1) b
on a.notation = b.notation AND a.gene = b.gene

Related

Sql code for distinct fields

I was wondering if anyone can help me with this query.
I have two tables that I join together (DDS2ENVR.QBO AND KCA0001.ORTS)
THE QBO Table has a field labeled NIIN AND RIC. THE KCA0001.ORTS table has a field named SERVICE and OWN_RIC.
I Join the tables by QBO.RIC and ORTS.OWN_RIC. My dilemma is that under the NIIN field multiple rows can be identical but have different values for RIC.
Example:
NIIN RIC
123455 A
122222 B
123456 C
122222 A
I want to query a distinct count for NIINS that separates by the different service where it does not overlap. So example NIIN should only find distinct values only associated with A where the same NIIN is not found in B,C,D etc.
SELECT D.SERVICE, COUNT(C.NIIN)
FROM DDS2ENVR.QBO C
JOIN KCA0001.ORTS D ON D.OWN_RIC = C.RIC
WHERE C.SITE_ID = ('HEAA')
GROUP BY D.SERVICE
HAVING COUNT(DISTINCT C.NIIN) > 1
Please ask questions if this does not make any sense.
Using Not Exists
SELECT D.SERVICE, COUNT(C.NIIN)
FROM DDS2ENVR.QBO C
JOIN KCA0001.ORTS D ON D.OWN_RIC = C.RIC
WHERE C.SITE_ID = ('HEAA')
and NOT EXISTS (Select 1 from DDS2ENVR.QBO C1 where C1.NIIN = C.NIIN and C1.RIC <> C.RIC)
GROUP BY D.SERVICE
HAVING COUNT(DISTINCT C.NIIN) > 1
Also if the table DDS2ENVR.QBO doesn't contain duplicates and your dbms supports CTE
With cte as
(Select NIIN from DDS2ENVR.QBO group by NIIN having count(*) = 1)
SELECT D.SERVICE, COUNT(C.NIIN)
FROM DDS2ENVR.QBO C
JOIN KCA0001.ORTS D ON D.OWN_RIC = C.RIC
WHERE C.SITE_ID = ('HEAA')
and C.NIIN in (Select * from cte)
GROUP BY D.SERVICE
HAVING COUNT(DISTINCT C.NIIN) > 1

SQL order by can't find a solution

I'm struggling here to do an order by in a query.
Imagine the following result of current query.
TableA.Id ---- TableB.TagName ---- TableB.TagValue
1----------------A---------------------------Customer A
1----------------B---------------------------Contract B
1----------------C---------------------------Product Z
2----------------A---------------------------Customer B
2----------------B---------------------------Contract C
2----------------C---------------------------Product Y
3----------------A---------------------------Customer C
3----------------B---------------------------Contract D
3----------------C---------------------------Product X
So the sort is dynamic and can be A, B, C asc or desc (TableB.TagName)
If user selects C ASC as sort, the result should be:
TableA.Id ---- TableB.TagName ---- TableB.TagValue
3----------------A---------------------------Customer C
3----------------B---------------------------Contract D
3----------------C---------------------------Product X
2----------------A---------------------------Customer B
2----------------B---------------------------Contract C
2----------------C---------------------------Product Y
1----------------A---------------------------Customer A
1----------------B---------------------------Contract B
1----------------C---------------------------Product Z
Thank you so much for your help
Oh, I see.
You can use a subquery in the order by:
select a.id, b.tagname, b.tagvalue,
max(case when b.tagname = #usertagname then b.tagvalue end) over (partition by a.id) as tagtagvalue
from tablea a join
tableb b
on a.tagid = b.tagid
order by tagtagvalue, a.id, b.tagname;
To get the descending sort you need a case expression in the `order by.

Oracle SQL division

I have
SELECT
COUNT(*) AS a,
SUM(CASE WHEN r.hn IS NOT NULL THEN 1 ELSE 0 END) AS b,
SUM(CASE WHEN r.hn IS NULL THEN 1 ELSE 0 END) AS c,
( ____ / ____ ) AS d
FROM
x
LEFT JOIN (SELECT DISTINCT xn FROM yn) r ON x.xn = y.xn;
I need the blanks on line 4 to be the values set to 'a' and 'c', I'm not sure what the correct syntax is though.
You can't refer to column aliases in the same level of the query (except in the order by clause), so you have to either repeat the original expression as in #juergend's answer, or use an inline view:
SELECT a, b, c, a/c AS d
FROM (
SELECT
COUNT(*) AS a,
SUM(CASE WHEN y.hn IS NOT NULL THEN 1 ELSE 0 END) AS b,
SUM(CASE WHEN y.hn IS NULL THEN 1 ELSE 0 END) AS c
FROM x
LEFT JOIN (SELECT DISTINCT xn FROM yn) y ON y.xn = x.xn
);
For complicated expressions this is a bit simpler and easier to maintain - if the expression changes you only have to modify it in one place, reducing the risk of a mistake.
If you're trying to make d the ratio of nulls to the total then you just need the division reversed, as c/a; and if you wanted the percentage then100*c/a, possibly rounded or truncated to a certain precision.
And as Clockwork-Muse mentioned, since count() ignores nulls, you coudl use that instead of the two sum() calls:
SELECT a, b, c, a/c AS d
FROM (
SELECT
COUNT(*) AS a,
COUNT(y.hn) AS b,
COUNT(*) - COUNT(y.hn) AS c
FROM x
LEFT JOIN (SELECT DISTINCT xn FROM yn) y ON y.xn = x.xn
);
... or you could calculate c in the outer query too, as (b - a), though that makes the d calculation messier.
The correct syntax is to rewrite the statements again. You can't re-use alias names in the select clause.
SELECT COUNT(*) AS t,
count(r.hn) AS c,
SUM(case when r.hn IS NULL then 1 end) AS u,
count(r.hn) / SUM(case when r.hn IS NULL then 1 end) AS p
FROM h
LEFT JOIN (SELECT DISTINCT hn FROM r) r ON h.hn = r.hn;

How to find rows that have one equal value and one different value from the table

I have the following table:
ID Number Revision
x y 0
x y 1
z w 0
a w 0
a w 1
b m 0
b m 0
I need to return rows that for the same Number thare are more then one ID with the same Revision.Number can be "Null" and I don't need those values.
The output should be:
z w 0
a w 0
I have tried the following query:
SELECT a.id,a.number,a.revision,
FROM table a INNER JOIN
(SELECT id, number, revision FROM table where number > '0'
GROUP BY number HAVING COUNT(*) > 1
) b ON a.revision = b.revision AND a.id != b.id
A little addition- I have rows in my table with the same Number, ID and Revision- I don't need those rows in my query to be displayed!
It is not working! Please help me to figure out how to fix it.
Thanks.
Select t.Id,s.number,t.revision
from (Select number,count(*) 'c'
from table t1
where revision=0
group by number
having count(*) > 1
) s join table t on t.number= s.number
where revision = 0
Another simple approach:
SELECT DISTINCT b.id, b.Number, b.Revision
FROM tbl a
INNER JOIN tbl b
ON a.ID != b.ID AND a.Number = b.Number AND a.Revision = b.Revision;
This is tested in MySql 5, syntax might differ slightly.
You are not that far away with your query:
SELECT a.id,a.number,a.revision
FROM table a
JOIN (
-- multiple id for the same number and revision
SELECT number, revision
FROM table
GROUP BY number, revision
HAVING COUNT(*) > 1
) b
ON a.revision = b.revision
AND a.number = b.number
Untested, but you should get the idea. If your sql-server is a resent version you can solve this with OLAP functions as well.
To filter out rows where the whole row is duplicated we can select only unique rows via group by and having:
SELECT a.id,a.number,a.revision
FROM table a
JOIN (
-- multiple id for the same number and revision
SELECT number, revision
FROM table
GROUP BY number, revision
HAVING COUNT(*) > 1
) b
ON a.revision = b.revision
AND a.number = b.number
GROUP BY a.id,a.number,a.revision
HAVING COUNT(1) = 1

Restricting results to only rows where one value appears only once

I have a query that is more complex than the example here, but which needs to only return the rows where a certain field doesn't appear more than once in the data set.
ACTIVITY_SK STUDY_ACTIVITY_SK
100 200
101 201
102 200
100 203
In this example I don't want any records with an ACTIVITY_SK of 100 being returned because ACTIVITY_SK appears twice in the data set.
The data is a mapping table, and is used in many joins, but multiple records like this imply data quality issues and so I need to simply remove them from the results, rather than cause a bad join elsewhere.
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
I had tried something like this:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
WHERE A.ACTIVITY_SK NOT IN
(
SELECT
A.ACTIVITY_SK,
COUNT(*)
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
GROUP BY A.ACTIVITY_SK
HAVING COUNT(*) > 1
)
But there must be a less expensive way of doing this...
Something like this could be a bit "cheaper" to run:
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT
PROJECT B INNER JOIN
(SELECT
ACTIVITY_SK,
MIN(STATUS) STATUS,
FROM
ACTIVITY
GROUP BY ACTIVITY_SK
HAVING COUNT(ACTIVITY_SK) = 1 ) A
ON A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
Another alternative:
select * from (
SELECT
A.ACTIVITY_SK,
A.STATUS,
B.STUDY_ACTIVITY_SK,
B.NAME,
B.PROJECT,
count(distinct a.pk) over (partition by a.activity_sk) AS c
FROM
ACTIVITY A,
PROJECT B
WHERE
A.ACTIVITY_SK = B.STUDY_ACTIVITY_SK
) where c = 1;
(where a.pk refers to a unique identifier from the ACTIVITY table)