find records with matching values in a column - sql

I want to return all the records that have matching phoneBoxRecordIDs in the phoneBox DB.
SELECT * FROM phoneBox where phoneBoxRecordIDs MATCH
would return:
Id phoneBoxRecordIDs colour
4 492948 Blue
9 492948 Brown
27 492948 Pink

You could group by the field and keep the groups where the count is greater than 1, but this would only return the phoneBoxRecordIDs value and the number of records with that value:
SELECT Count(*) [Count]
, phoneBoxRecordIDs
FROM phoneBox
Group By phoneBoxRecordIDs
Having Count(*) > 1

If you want the rows where phoneBoxRecordIDs appears more than once, then an ANSI-standard method uses window functions:
select pb.*
from (select pb.*, count(*) over (partition by phoneBoxRecordIDs) as cnt
      from phoneBox pb
     ) pb
where cnt > 1
order by phoneBoxRecordIDs;
You could also do this by returning records only when a matching record exists:
select pb.*
from phoneBox pb
where exists (select 1
from phoneBox pb2
where pb2.phoneBoxRecordIDs = pb.phoneBoxRecordIDs and
pb2.id <> pb.id
);
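To check the EXISTS approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table, the two extra non-matching rows (ids 5 and 30), and the variable names are assumptions added for illustration; they are not part of the original question.

```python
import sqlite3

# In-memory database; the schema is an assumption based on the sample output.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE phoneBox (id INTEGER PRIMARY KEY, phoneBoxRecordIDs INTEGER, colour TEXT)"
)
conn.executemany(
    "INSERT INTO phoneBox VALUES (?, ?, ?)",
    [
        (4, 492948, "Blue"),
        (5, 111111, "Green"),  # hypothetical row with no match
        (9, 492948, "Brown"),
        (27, 492948, "Pink"),
        (30, 222222, "Red"),   # hypothetical row with no match
    ],
)

# Keep a row only if another row (different id) shares its phoneBoxRecordIDs.
exists_sql = """
SELECT pb.id, pb.phoneBoxRecordIDs, pb.colour
FROM phoneBox pb
WHERE EXISTS (SELECT 1
              FROM phoneBox pb2
              WHERE pb2.phoneBoxRecordIDs = pb.phoneBoxRecordIDs
                AND pb2.id <> pb.id)
ORDER BY pb.id
"""
matching_rows = conn.execute(exists_sql).fetchall()
print(matching_rows)
# [(4, 492948, 'Blue'), (9, 492948, 'Brown'), (27, 492948, 'Pink')]
```

The `pb2.id <> pb.id` condition relies on id being unique; without a unique column the window-function version is the safer choice.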

Related

Finding unique combination of columns associated with 1 non-unique column

Here's my table:
ItemID ItemName ItemBatch TrackingNumber
a bag 1 498239
a bag 1 498239
a bag 1 958103
b paper 2 123444
b paper 2 123444
I'm trying to find occurrences of ItemID + ItemName + ItemBatch that have a non-unique TrackingNumber. So in the example above, there are 3 occurrences of a bag 1 and at least 1 of those rows has a different TrackingNumber from any of the other rows. In this case 958103 is different from 498239 so it should be a hit.
For b paper 2 the TrackingNumber is unique for all the respective rows so we ignore this. Is there a query that can pull this combination of columns with 3 identical fields and 1 non-unique field?
Yet another option:
SELECT *
FROM tab
WHERE ItemBatch IN (SELECT ItemBatch
FROM tab
GROUP BY ItemBatch, TrackingNumber
HAVING COUNT(TrackingNumber) = 1)
This query finds the combination of (ItemBatch, TrackingNumber) that occur only once, then gets all rows corresponding to their ItemBatch values.
You can use GROUP BY and HAVING
SELECT
t.ItemID,
t.ItemName,
t.ItemBatch,
COUNT(*)
FROM YourTable t
GROUP BY
t.ItemID,
t.ItemName,
t.ItemBatch
HAVING COUNT(DISTINCT TrackingNumber) > 1;
Or, if you want each individual row, you can use a window function. You cannot use COUNT(DISTINCT ...) in a window function in SQL Server, but you can simulate it with DENSE_RANK and MAX:
SELECT
t.*
FROM (
SELECT *,
Count = MAX(dr) OVER (PARTITION BY t.ItemID, t.ItemName, t.ItemBatch)
FROM (
SELECT *,
dr = DENSE_RANK() OVER (PARTITION BY t.ItemID, t.ItemName, t.ItemBatch ORDER BY t.TrackingNumber)
FROM YourTable t
) t
) t
WHERE t.Count > 1;
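As a rough check of both ideas on the question's sample data, the sketch below runs them through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (needed for window functions), uses `AS` aliases instead of the T-SQL `alias = expression` form, and the names `row_count` and `distinct_cnt` are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (ItemID TEXT, ItemName TEXT, ItemBatch INTEGER, TrackingNumber INTEGER)"
)
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?, ?)",
    [
        ("a", "bag", 1, 498239),
        ("a", "bag", 1, 498239),
        ("a", "bag", 1, 958103),
        ("b", "paper", 2, 123444),
        ("b", "paper", 2, 123444),
    ],
)

# One line per combination that has more than one distinct TrackingNumber.
group_sql = """
SELECT ItemID, ItemName, ItemBatch, COUNT(*) AS row_count
FROM YourTable
GROUP BY ItemID, ItemName, ItemBatch
HAVING COUNT(DISTINCT TrackingNumber) > 1
"""
flagged_groups = conn.execute(group_sql).fetchall()

# Individual rows: DENSE_RANK numbers the distinct tracking numbers inside each
# combination, so the MAX of that rank per partition is the distinct count.
rows_sql = """
SELECT ItemID, ItemName, ItemBatch, TrackingNumber
FROM (
    SELECT r.*,
           MAX(dr) OVER (PARTITION BY ItemID, ItemName, ItemBatch) AS distinct_cnt
    FROM (
        SELECT t.*,
               DENSE_RANK() OVER (PARTITION BY ItemID, ItemName, ItemBatch
                                  ORDER BY TrackingNumber) AS dr
        FROM YourTable t
    ) r
) x
WHERE distinct_cnt > 1
ORDER BY TrackingNumber
"""
flagged_rows = conn.execute(rows_sql).fetchall()

print(flagged_groups)  # [('a', 'bag', 1, 3)]
print(flagged_rows)
```

One difference to keep in mind: DENSE_RANK treats NULL as a value of its own, while COUNT(DISTINCT ...) ignores NULLs, so the two can disagree when TrackingNumber is nullable.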

Exclude records where count > 5 and select top 1 of it

I want to exclude records where id > 5 and then select the top 1 of them, ordered by date. How can I achieve this? Each record has an audit_line field, which is unique for each record. My current SQL script is below:
SELECT *
FROM db.table
HAVING COUNT(id) > 5
If you want id > 5 then you want where:
select top (1) t.*
from db.table t
where id > 5
order by date;
You can use row-numbering for this.
Note that if you have no other column to order by, you can do ORDER BY (SELECT NULL), but then you may get different results on each run.
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY some_other_column) rn
FROM table
) t
WHERE rn = 5;

Count the total (N) of duplicates in a column

I'm attempting to count the total number of duplicates in a column (not the individual duplicates).
from outputs
GROUP BY journal_id
HAVING ( COUNT(doi) > 1 )
WHERE journal_id = 1
SQL TABLE
doi journal_id
123 1
123 2
123 1
124 1
The expected answer is 2
The number of entire row duplicates can be calculated by taking the total number of rows and subtracting the number of distinct rows:
select a.cnt_all - d.cnt_individual
from (select count(*) as cnt_all
from outputs
) a cross join
(select count(*) as cnt_individual
from (select distinct *
from outputs
) d
) d;
If you know your columns and your database supports multiple arguments to count(distinct), this can be radically simplified to:
select count(*) - count(distinct doi, journal_id)
from outputs;
Or, if your database doesn't support this:
select sum(cnt - 1)
from (select doi, journal_id, count(*) as cnt
from outputs
group by doi, journal_id
) o;
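To see what these formulas return on the question's four rows, here is a small sketch with Python's built-in sqlite3 module. The variable names are illustrative, and the third query is an assumption about how the asker counts (every row of a duplicated group rather than only the extra copies).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outputs (doi INTEGER, journal_id INTEGER)")
conn.executemany(
    "INSERT INTO outputs VALUES (?, ?)",
    [(123, 1), (123, 2), (123, 1), (124, 1)],
)

# Total rows minus distinct rows: counts only the extra copies.
extra_copies = conn.execute("""
SELECT (SELECT COUNT(*) FROM outputs)
     - (SELECT COUNT(*) FROM (SELECT DISTINCT doi, journal_id FROM outputs) d)
""").fetchone()[0]

# Same number from the grouped counts.
extra_copies_grouped = conn.execute("""
SELECT SUM(cnt - 1)
FROM (SELECT doi, journal_id, COUNT(*) AS cnt
      FROM outputs
      GROUP BY doi, journal_id) o
""").fetchone()[0]

# Assumption: count every row that belongs to a duplicated (doi, journal_id) group.
rows_in_duplicated_groups = conn.execute("""
SELECT COALESCE(SUM(cnt), 0)
FROM (SELECT doi, journal_id, COUNT(*) AS cnt
      FROM outputs
      GROUP BY doi, journal_id
      HAVING COUNT(*) > 1) o
""").fetchone()[0]

print(extra_copies, extra_copies_grouped, rows_in_duplicated_groups)  # 1 1 2
```

On this data the first two queries give 1, while the third gives 2, which matches the expected answer stated in the question.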
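Just sum up the counts of the individual duplicates within the journal. Aggregates cannot be nested directly in most databases, so the per-doi counts go in a derived table:
SELECT
    SUM(d.cnt) AS total_duplicates
FROM (
    SELECT doi, COUNT(*) AS cnt
    FROM outputs
    WHERE journal_id = 1
    GROUP BY doi
    HAVING COUNT(*) > 1
) d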

Retrieve rows on the basis of repeating value of a column

I want to retrieve data from my table Card.
table Card(
MembershipNumber,
EmbossLine,
status,
EmbossName
)
Only those rows should be returned that have a repeating MembershipNumber, i.e. a count greater than 1.
For example, if I have the following records:
(11,0321,'active','John')
(11,0322,'active','John')
(23,0350,'active','Mary')
(46,0383,'active','Fudge')
(46,0382,'active','Fudge')
(46,0381,'active','Fudge')
The query should return all records except the third one. Is it possible?
EDIT: I got the answer to my question, but I have another query. I want to filter the rows by status too, but when I run the following query I don't get the desired result:
SELECT EmbossLine,Membershipnumber,status,embossname,*
FROM (SELECT *,
Count(MembershipNumber)OVER(partition BY EmbossName) AS cnt
FROM card) A
WHERE cnt > 1 AND status='E0'
Before adding status to the WHERE clause, it works perfectly fine.
Use Count() Over() window function to do this.
SELECT *
FROM (SELECT *,
Count(MembershipNumber)OVER(partition BY EmbossName) AS cnt
FROM yourtable) A
WHERE cnt > 1
Demo
SELECT MembershipNumber,
[status],
EmbossName
FROM (SELECT *,
Count(MembershipNumber)OVER(partition BY EmbossName) AS cnt
FROM (VALUES (11.0321,'active','John'),
(11.0322,'active','John'),
(23.0350,'active','Mary'),
(46.0383,'active','Fudge'),
(46.0382,'active','Fudge'),
(46.0381,'active','Fudge')) tc (MembershipNumber, [status], EmbossName)) A
WHERE cnt > 1
SELECT * FROM CARD a WHERE
(SELECT COUNT(*) FROM CARD b WHERE b.MembershipNumber = a.MembershipNumber) > 1
should do it
Select c.* from card c
join (select MembershipNumber from card group by
      MembershipNumber having count(MembershipNumber) > 1) mem
on mem.MembershipNumber = c.MembershipNumber
Or
Select * from card where MembershipNumber in
(select MembershipNumber from card group by
 MembershipNumber having count(MembershipNumber) > 1)
Find the duplicates on EmbossName and join back to get the matching rows:
select t1.* from card as t1 inner join
(select EmbossName from card group by EmbossName having count(*) > 1) as t2
on t1.EmbossName = t2.EmbossName
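On the follow-up about filtering by status, one possible explanation is the order of operations: a status condition in the outer WHERE is applied after the duplicates have been counted over all rows. The sketch below, using Python's built-in sqlite3 module (SQLite 3.25 or newer for window functions), contrasts the two placements. Several assumptions are baked in: the 'E0' values assigned to some rows are hypothetical (the question's sample shows 'active' everywhere), EmbossLine is stored as text to keep its leading zero, and the partition is on MembershipNumber as in the question's wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Card (MembershipNumber INTEGER, EmbossLine TEXT, status TEXT, EmbossName TEXT)"
)
conn.executemany(
    "INSERT INTO Card VALUES (?, ?, ?, ?)",
    [
        (11, "0321", "E0", "John"),      # status values are hypothetical
        (11, "0322", "active", "John"),
        (23, "0350", "E0", "Mary"),
        (46, "0383", "E0", "Fudge"),
        (46, "0382", "E0", "Fudge"),
        (46, "0381", "active", "Fudge"),
    ],
)

# Count duplicates over ALL rows, then keep only the 'E0' rows.
filter_after = conn.execute("""
SELECT MembershipNumber, EmbossLine
FROM (SELECT c.*, COUNT(*) OVER (PARTITION BY MembershipNumber) AS cnt
      FROM Card c) a
WHERE cnt > 1 AND status = 'E0'
ORDER BY EmbossLine
""").fetchall()

# Keep only the 'E0' rows first, then count duplicates among them.
filter_before = conn.execute("""
SELECT MembershipNumber, EmbossLine
FROM (SELECT c.*, COUNT(*) OVER (PARTITION BY MembershipNumber) AS cnt
      FROM Card c
      WHERE status = 'E0') a
WHERE cnt > 1
ORDER BY EmbossLine
""").fetchall()

print(filter_after)   # [(11, '0321'), (46, '0382'), (46, '0383')]
print(filter_before)  # [(46, '0382'), (46, '0383')]
```

Which of the two is wanted depends on whether a membership number should count as repeated across all statuses or only within the filtered status.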

Getting rows with duplicate column values

I tried this with solutions available online, but none worked for me.
Table :
Id rank
1 100
1 100
2 75
2 45
3 50
3 50
I want Ids 1 and 3 returned, because they have duplicates.
I tried something like
select * from A where rank in (
select rank from A group by rank having count(rank) > 1)
This also returned ids without any duplicates. Please help.
Try this:
select id from table
group by id, rank
having count(*) > 1
select id, rank
from
(
select id, rank, count(*) cnt
from rank_tab
group by id, rank
having count(*) > 1
) t
This general idea should work:
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > 1 AND COUNT(DISTINCT rank) = 1
In plain English: get every id that exists in multiple rows, but all these rows have the same value in rank.
If you want ids that have some duplicated ranks (but not necessarily all), something like this should work:
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > COUNT(DISTINCT rank)
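The difference between the two HAVING clauses can be seen with a quick sketch in Python's built-in sqlite3 module. The rows for id 4 are a hypothetical addition (not in the question) chosen so that the two queries disagree, and the column name is quoted as "rank" in case the target database treats it as a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE your_table (id INTEGER, "rank" INTEGER)')
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [
        (1, 100), (1, 100),
        (2, 75), (2, 45),
        (3, 50), (3, 50),
        (4, 10), (4, 10), (4, 20),  # hypothetical: some ranks repeat, not all
    ],
)

# ids present in several rows where every row has the same rank
all_same_sql = """
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > 1 AND COUNT(DISTINCT "rank") = 1
ORDER BY id
"""
ids_all_same = [row[0] for row in conn.execute(all_same_sql)]

# ids where at least one rank value occurs more than once
some_dup_sql = """
SELECT id
FROM your_table
GROUP BY id
HAVING COUNT(*) > COUNT(DISTINCT "rank")
ORDER BY id
"""
ids_some_duplicate = [row[0] for row in conn.execute(some_dup_sql)]

print(ids_all_same)        # [1, 3]
print(ids_some_duplicate)  # [1, 3, 4]
```

On the question's own six rows both queries return ids 1 and 3; they only differ once a group mixes repeated and non-repeated ranks.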